Merge branch 'ddpg' of github.com:salesforce/warp-drive into ddpg
Emerald01 committed Feb 19, 2024
2 parents fc65445 + 162cbc1 commit 6b603d0
Showing 1 changed file with 20 additions and 11 deletions.
README.md — 31 changes: 20 additions & 11 deletions
@@ -4,18 +4,27 @@ WarpDrive is a flexible, lightweight, and easy-to-use open-source reinforcement learning (RL)
framework that implements end-to-end multi-agent RL on a single or multiple GPUs (Graphics Processing Units).

Using the extreme parallelization capability of GPUs, WarpDrive enables orders-of-magnitude
faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient as it avoids back-and-forth data copying between the CPU and the GPU,
and runs simulations across multiple agents and multiple environment replicas in parallel.

The main updates since the initial open-source release:
- version 1.3: provides auto-scaling tools to achieve the optimal throughput per device.
- version 1.4: supports distributed asynchronous training across multiple GPU devices.
- version 1.6: supports aggregating multiple GPU blocks for one environment replica.
- version 2.0: supports dual backends: CUDA C and JIT-compiled Numba. [(Our Blog article)](https://blog.salesforceairesearch.com/warpdrive-v2-numba-nvidia-gpu-simulations/)
- version 2.6: supports single-agent environments, including CartPole, MountainCar, and Acrobot.

Together, these capabilities allow the user to run thousands or even millions of concurrent simulations and train
on extremely large batches of experience, achieving at least 100x throughput over CPU-based counterparts.
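For a sense of how this looks in practice, below is a minimal sketch of a WarpDrive training setup, adapted from the project's tutorials: one `EnvWrapper` hosts all the environment replicas on the GPU, and a `Trainer` consumes the rollouts. Treat the class names, argument names (`num_envs`, `env_backend`), and the `TagContinuous` example environment as assumptions; exact signatures may differ between releases.

```python
# Hedged sketch of a typical WarpDrive training setup (adapted from the
# project tutorials; names and signatures may vary between versions).
from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.env_wrapper import EnvWrapper
from warp_drive.training.trainer import Trainer

# run_config: the nested training configuration (the repo ships YAML examples);
# loading it is elided here.
run_config = ...

# One wrapper hosts every environment replica directly in GPU memory,
# so rollout data never round-trips through the CPU.
env_wrapper = EnvWrapper(
    TagContinuous(**run_config["env"]),
    num_envs=run_config["trainer"]["num_envs"],
    env_backend="numba",  # or "pycuda" for the CUDA C backend
)

# Agents can share policy models: map each policy name to its agent ids.
policy_tag_to_agent_id_map = {
    "tagger": list(env_wrapper.env.taggers),
    "runner": list(env_wrapper.env.runners),
}

# Create the trainer and run end-to-end training on the GPU.
trainer = Trainer(
    env_wrapper=env_wrapper,
    config=run_config,
    policy_tag_to_agent_id_map=policy_tag_to_agent_id_map,
)
trainer.train()
```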

The table below provides an overview of WarpDrive's key features and its scalability across several dimensions.

| | Support | Concurrency | Version |
| :--- | :---: | :---: | :---: |
| Environments | Single ✅ Multi ✅ | 1 to 1000 per GPU | 1.0 |
| Agents | Single ✅ Multi ✅ | 1 to 1024 per environment | 1.0 |
| Agents | Multi across blocks ✅ | 1024 per block | 1.6 |
| Discrete Actions | Single ✅ Multi ✅ | - | 1.0 |
| Continuous Actions | Single ✅ Multi ✅ | - | 2.7 |
| On-Policy Policy Gradient | A2C ✅ PPO ✅ | - | 1.0 |
| Off-Policy Policy Gradient | DDPG ✅ | - | 2.7 |
| Auto-Scaling | ✅ | - | 1.3 |
| Distributed Simulation | 1 GPU ✅ 2-16 GPU node ✅ | - | 1.4 |
| Environment Backend | CUDA C ✅ | - | 1.0 |
| Environment Backend | CUDA C ✅ Numba ✅ | - | 2.0 |
| Training Backend | PyTorch ✅ | - | 1.0 |
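
To make the backend rows concrete, here is a purely illustrative Numba CUDA kernel that mimics WarpDrive's execution layout of one thread block per environment replica and one thread per agent. It is not WarpDrive's actual environment-step API; the kernel name, toy dynamics, and array shapes are assumptions for illustration only.

```python
# Illustrative only: a toy Numba CUDA kernel mirroring WarpDrive's layout of
# one thread block per environment replica and one thread per agent.
# This is NOT WarpDrive's actual environment-step API.
import numpy as np
from numba import cuda


@cuda.jit
def toy_step(states, actions, rewards):
    env_id = cuda.blockIdx.x      # one block per environment replica
    agent_id = cuda.threadIdx.x   # one thread per agent
    if env_id < states.shape[0] and agent_id < states.shape[1]:
        # Toy dynamics: each agent moves by its action and is rewarded
        # for staying close to the origin.
        states[env_id, agent_id] += actions[env_id, agent_id]
        rewards[env_id, agent_id] = -abs(states[env_id, agent_id])


num_envs, num_agents = 1000, 64  # 1000 replicas, 64 agents each, on one GPU
states = cuda.to_device(np.zeros((num_envs, num_agents), dtype=np.float32))
actions = cuda.to_device(
    np.random.uniform(-1.0, 1.0, (num_envs, num_agents)).astype(np.float32)
)
rewards = cuda.device_array((num_envs, num_agents), dtype=np.float32)

# Launch: a grid of num_envs blocks with num_agents threads per block.
toy_step[num_envs, num_agents](states, actions, rewards)
```

Because every replica's state stays in GPU memory for the whole rollout, stepping thousands of environments costs a single kernel launch rather than thousands of per-environment CPU calls.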


## Environments
1. We include several default multi-agent environments
