From 2df2bd2fcebae0f4f4baac3b5108cc98154efc1f Mon Sep 17 00:00:00 2001
From: Tian Lan <31748898+Emerald01@users.noreply.github.com>
Date: Sun, 18 Feb 2024 11:00:52 -0800
Subject: [PATCH 1/4] Update README.md

---
 README.md | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 48583fb..1d3353b 100644
--- a/README.md
+++ b/README.md
@@ -7,12 +7,20 @@ Using the extreme parallelization capability of GPUs, WarpDrive enables orders-o
 faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient as it avoids back-and-forth data copying between the CPU and the GPU,
 and runs simulations across multiple agents and multiple environment replicas in parallel.
 
-We have some main updates since its initial open source,
-- version 1.3: provides the auto scaling tools to achieve the optimal throughput per device.
-- version 1.4: supports the distributed asynchronous training among multiple GPU devices.
-- version 1.6: supports the aggregation of multiple GPU blocks for one environment replica.
-- version 2.0: supports the dual backends of both CUDA C and JIT compiled Numba. [(Our Blog article)](https://blog.salesforceairesearch.com/warpdrive-v2-numba-nvidia-gpu-simulations/)
-- version 2.6: supports single agent environments, including Cartpole, MountainCar, Acrobot
+| | Support | Concurrent Number | Version
+:--- | :---: | :---: | :---:
+| Environments | Single ✅ Multi ✅ | >= 1000 per GPU | 1.0
+| Agents | Single ✅ Multi ✅ | 1024 | 1.0
+| Agents | Multi across blocks ✅| 1024 per block | 1.6
+| Discrete Actions | Single ✅ Multi ✅| - | 1.0
+| Continuous Actions | Single ✅ Multi ✅| - | 2.7
+| On-Policy Policy Gradient | A2C ✅, PPO ✅ | - | 1.0
+| Off-Policy Policy Gradient| DDPG ✅ | - | 2.7
+| Auto-Scaling | ✅ | - | 1.3
+| Distributed Simulation | ✅ | 2 to 16 GPUs node| 1.4
+| Environment Backend | CUDA C ✅ | - | 1.0
+| Environment Backend | CUDA C ✅ Numba ✅ | - | 2.0
+| Training Backend | PyTorch ✅ | - | 1.0
 
 Together, these allow the user to run thousands or even millions of concurrent simulations and train
 on extremely large batches of experience, achieving at least 100x throughput over CPU-based counterparts.

From 3485b6ce5b2a4bc51ebb61e144e91b235409255b Mon Sep 17 00:00:00 2001
From: Tian Lan <31748898+Emerald01@users.noreply.github.com>
Date: Sun, 18 Feb 2024 11:01:35 -0800
Subject: [PATCH 2/4] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1d3353b..dafc1b3 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ Using the extreme parallelization capability of GPUs, WarpDrive enables orders-o
 faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient as it avoids back-and-forth data copying between the CPU and the GPU,
 and runs simulations across multiple agents and multiple environment replicas in parallel.
 
-| | Support | Concurrent Number | Version
+| | Support | Concurrency | Version
 :--- | :---: | :---: | :---:
 | Environments | Single ✅ Multi ✅ | >= 1000 per GPU | 1.0
 | Agents | Single ✅ Multi ✅ | 1024 | 1.0

From 7c2bb3c9fae7faf777b5ad89a5a871996b8a6cf7 Mon Sep 17 00:00:00 2001
From: Tian Lan <31748898+Emerald01@users.noreply.github.com>
Date: Sun, 18 Feb 2024 11:03:44 -0800
Subject: [PATCH 3/4] Update README.md

---
 README.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index dafc1b3..411a1d5 100644
--- a/README.md
+++ b/README.md
@@ -7,14 +7,17 @@ Using the extreme parallelization capability of GPUs, WarpDrive enables orders-o
 faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient as it avoids back-and-forth data copying between the CPU and the GPU,
 and runs simulations across multiple agents and multiple environment replicas in parallel.
 
+Together, these allow the user to run thousands or even millions of concurrent simulations and train
+on extremely large batches of experience, achieving at least 100x throughput over CPU-based counterparts.
+
 | | Support | Concurrency | Version
 :--- | :---: | :---: | :---:
-| Environments | Single ✅ Multi ✅ | >= 1000 per GPU | 1.0
-| Agents | Single ✅ Multi ✅ | 1024 | 1.0
+| Environments | Single ✅ Multi ✅ | 1 to 1000 per GPU | 1.0
+| Agents | Single ✅ Multi ✅ | 1 to 1024 per environment | 1.0
 | Agents | Multi across blocks ✅| 1024 per block | 1.6
 | Discrete Actions | Single ✅ Multi ✅| - | 1.0
 | Continuous Actions | Single ✅ Multi ✅| - | 2.7
-| On-Policy Policy Gradient | A2C ✅, PPO ✅ | - | 1.0
+| On-Policy Policy Gradient | A2C ✅ PPO ✅ | - | 1.0
 | Off-Policy Policy Gradient| DDPG ✅ | - | 2.7
 | Auto-Scaling | ✅ | - | 1.3
 | Distributed Simulation | ✅ | 2 to 16 GPUs node| 1.4
@@ -22,8 +25,6 @@ and runs simulations across multiple agents and multiple environment replicas in
 | Environment Backend | CUDA C ✅ Numba ✅ | - | 2.0
 | Training Backend | PyTorch ✅ | - | 1.0
 
-Together, these allow the user to run thousands or even millions of concurrent simulations and train
-on extremely large batches of experience, achieving at least 100x throughput over CPU-based counterparts.
 
 ## Environments
 1. We include several default multi-agent environments

From 162cbc18845dd92919dfade3d9bebed2854bef38 Mon Sep 17 00:00:00 2001
From: Tian Lan <31748898+Emerald01@users.noreply.github.com>
Date: Sun, 18 Feb 2024 13:15:20 -0800
Subject: [PATCH 4/4] Update README.md

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 411a1d5..5817d43 100644
--- a/README.md
+++ b/README.md
@@ -4,11 +4,11 @@ WarpDrive is a flexible, lightweight, and easy-to-use open-source reinforcement
 framework that implements end-to-end multi-agent RL on a single or multiple GPUs (Graphics Processing Unit).
 
 Using the extreme parallelization capability of GPUs, WarpDrive enables orders-of-magnitude
-faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient as it avoids back-and-forth data copying between the CPU and the GPU,
-and runs simulations across multiple agents and multiple environment replicas in parallel.
-
+faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient as it avoids back-and-forth data copying between the CPU and the GPU, and runs simulations across multiple agents and multiple environment replicas in parallel.
 Together, these allow the user to run thousands or even millions of concurrent simulations and train
-on extremely large batches of experience, achieving at least 100x throughput over CPU-based counterparts.
+on extremely large batches of experience, achieving at least 100x throughput over CPU-based counterparts.
+
+The table below provides a visual overview of WarpDrive's key features and scalability across various dimensions.
 
 | | Support | Concurrency | Version
 :--- | :---: | :---: | :---:
@@ -20,7 +20,7 @@ on extremely large batches of experience, achieving at least 100x throughput ove
 | On-Policy Policy Gradient | A2C ✅ PPO ✅ | - | 1.0
 | Off-Policy Policy Gradient| DDPG ✅ | - | 2.7
 | Auto-Scaling | ✅ | - | 1.3
-| Distributed Simulation | ✅ | 2 to 16 GPUs node| 1.4
+| Distributed Simulation | 1 GPU ✅ 2-16 GPU node ✅ | - | 1.4
 | Environment Backend | CUDA C ✅ | - | 1.0
 | Environment Backend | CUDA C ✅ Numba ✅ | - | 2.0
 | Training Backend | PyTorch ✅ | - | 1.0
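
The environment and training backend rows in the table come together in a few lines of user code: an environment is wrapped once, replicated many times on the GPU, and trained end-to-end on-device. Below is a minimal sketch modeled on the example in the WarpDrive repository's own README; it assumes the 2.x `EnvWrapper`/`Trainer` API, the bundled `TagContinuous` example environment, and a YAML run configuration at a hypothetical path, so exact module paths and argument names may differ between versions.

```python
# Illustrative sketch only, modeled on the WarpDrive README example.
# Assumes the 2.x EnvWrapper/Trainer API and the bundled TagContinuous
# environment; module paths and argument names may vary between versions.
import yaml

from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.env_wrapper import EnvWrapper
from warp_drive.training.trainer import Trainer

# Hypothetical path; WarpDrive ships YAML run configs for its example envs.
with open("run_configs/tag_continuous.yaml", "r") as f:
    run_config = yaml.safe_load(f)

# Wrap the environment and place many replicas on the GPU at once
# (the "Environments: 1 to 1000 per GPU" row in the table above).
env_wrapper = EnvWrapper(
    TagContinuous(**run_config["env"]),
    num_envs=run_config["trainer"]["num_envs"],
    env_backend="numba",  # or "pycuda" for the CUDA C backend (v2.0+)
)

# Agents can share policy models: map each policy name to its agent ids.
policy_tag_to_agent_id_map = {
    "tagger": list(env_wrapper.env.taggers),
    "runner": list(env_wrapper.env.runners),
}

# The Trainer rolls out all replicas and trains the PyTorch models on the
# GPU, avoiding back-and-forth data copies between the CPU and the GPU.
trainer = Trainer(
    env_wrapper=env_wrapper,
    config=run_config,
    policy_tag_to_agent_id_map=policy_tag_to_agent_id_map,
)
trainer.train()
```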