Copyright (c) 2021, salesforce.com, inc. \
All rights reserved. \
SPDX-License-Identifier: BSD-3-Clause \
For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause

**Try this notebook on [Colab](http://colab.research.google.com/github/salesforce/warp-drive/blob/master/tutorials/tutorial-2.a-warp_drive_sampler.ipynb)!**

# ⚠️ PLEASE NOTE:
This notebook runs on a GPU runtime.\
If running on Colab, choose Runtime > Change runtime type from the menu, then select `GPU` in the 'Hardware accelerator' dropdown menu.

In [None]:
import torch

assert torch.cuda.device_count() > 0, "This notebook needs a GPU to run!"

# Welcome to WarpDrive!

This is the second tutorial on WarpDrive, a PyCUDA-based framework for extremely parallelized multi-agent reinforcement learning (RL) on a single graphics processing unit (GPU). At this stage, we assume you have read our [first tutorial](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1-warp_drive_basics.ipynb) on WarpDrive basics.

In this tutorial, we describe **CUDASampler**, a lightweight and fast sampler that draws actions from the policy's action distribution for many RL agents and environment replicas at once. `CUDASampler` uses the GPU to sample a large number of actions in parallel.

Notably:

1. It reads the distribution on the GPU through PyTorch and samples actions entirely on the GPU, so no data is transferred between host and device.
2. It maximizes parallelism down to the individual thread level: each agent in each environment replica has its own random seed and an independent random sampling process.
3. It is fast: in the benchmark at the end of this tutorial, it is significantly faster than PyTorch's `Categorical` sampler.
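The CUDA kernel behind the sampler is not shown in this tutorial. As a mental model only, the NumPy sketch below shows one common technique for drawing a discrete action per (env, agent) pair, inverse-CDF sampling, vectorized so that every pair uses its own uniform draw. We are assuming this technique for illustration; it is not necessarily what the WarpDrive kernel does, and `sample_inverse_cdf` is a hypothetical helper.

```python
import numpy as np

def sample_inverse_cdf(distribution, rng):
    """Draw one action index per (env, agent).

    `distribution` has shape (n_envs, n_agents, n_actions); the result has
    shape (n_envs, n_agents). Each (env, agent) pair uses its own uniform draw.
    """
    n_actions = distribution.shape[-1]
    # Cumulative probabilities along the action axis.
    cdf = np.cumsum(distribution, axis=-1)
    # One uniform number per (env, agent), scaled by the row total so rows
    # that sum to slightly less than 1 (e.g. 3 x 0.333) are still covered.
    u = rng.random(distribution.shape[:-1]) * cdf[..., -1]
    # The sampled action is the number of CDF entries that u has passed.
    actions = (u[..., None] >= cdf).sum(axis=-1)
    # Guard against u rounding up to the row total in floating point.
    return np.minimum(actions, n_actions - 1).astype(np.int32)

rng = np.random.default_rng(0)
dist = np.array([[[0.2, 0.5, 0.3], [0.95, 0.02, 0.03]]])  # 1 env, 2 agents
draws = np.stack([sample_inverse_cdf(dist, rng) for _ in range(10000)])
print(draws.shape)  # (10000, 1, 2)
print((draws[:, 0, 0] == 1).mean())  # should be close to 0.5
```

Because each (env, agent) entry only reads its own row of the distribution and its own uniform number, the work maps naturally onto one GPU thread per entry.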

# Dependencies

You can install the warp_drive package either

- with the pip package manager, or
- by cloning the warp_drive repository and installing the requirements.

We will install the latest version of WarpDrive using pip.

In [None]:
!pip install -U rl_warp_drive

In [None]:
import numpy as np
from warp_drive.managers.pycuda_managers.pycuda_function_manager import PyCUDAFunctionManager, PyCUDASampler
from warp_drive.managers.pycuda_managers.pycuda_data_manager import PyCUDADataManager
from warp_drive.utils.constants import Constants
from warp_drive.utils.data_feed import DataFeed
from warp_drive.utils.common import get_project_root

_MAIN_FILEPATH = f"{get_project_root()}/warp_drive/cuda_includes"
_CUBIN_FILEPATH = f"{get_project_root()}/warp_drive/cuda_bin"
_ACTIONS = Constants.ACTIONS

In [None]:
# Set logger level e.g., DEBUG, INFO, WARNING, ERROR
import logging

logging.getLogger().setLevel(logging.INFO)

# Initialize PyCUDASampler

We first initialize the **PyCUDADataManager** and **PyCUDAFunctionManager**. To illustrate the sampler, we then load a pre-compiled binary file called "test_build.fatbin". Note that in end-to-end training and simulation, WarpDrive hides these low-level managers and modules and calls them automatically. In this and the next tutorials, we call some low-level APIs directly because we want to show how a few fundamental modules work and how they perform.

In [None]:
cuda_data_manager = PyCUDADataManager(num_agents=5, episode_length=10, num_envs=2)
cuda_function_manager = PyCUDAFunctionManager(
    num_agents=cuda_data_manager.meta_info("n_agents"),
    num_envs=cuda_data_manager.meta_info("n_envs"),
)

main_example_file = f"{_MAIN_FILEPATH}/test_build.cu"
bin_example_file = f"{_CUBIN_FILEPATH}/test_build.fatbin"

This binary is compiled together with the auxiliary files in `warp_drive/cuda_includes/core`, which contain several core CUDA services provided by WarpDrive, including the backend source code for `CUDASampleController`.

To make "test_build.fatbin" available, we compiled this test binary by calling `_compile()` from `PyCUDAFunctionManager`.
For this notebook demonstration, we have already provided a pre-compiled binary in the bin folder, but we suggest that you still execute the cell below to re-compile it and avoid possible binary incompatibility issues across platforms. (`_compile()` is a low-level API; users do not need to call such internal APIs directly for end-to-end WarpDrive simulation and training.)

In [None]:
cuda_function_manager._compile(main_file=main_example_file, 
                               cubin_file=bin_example_file)

Finally, we initialize **PyCUDASampler** and initialize its random number generator with `init_random()`. `PyCUDASampler` keeps the randomness independent across all threads and blocks. Notice that `PyCUDASampler` requires a `PyCUDAFunctionManager`, because the function manager holds all the CUDA function pointers, including those of the sampler. Also notice that this example uses 2 environment replicas and 5 agents.
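The text states that every (env, agent) pair has its own random seed and an independent sampling process. As a CPU-side analogy only (an assumption for illustration, not WarpDrive's implementation), the sketch below uses NumPy's `SeedSequence.spawn` to derive one independent random stream per (env, agent) pair from a single root seed. `make_thread_rngs` is a hypothetical helper.

```python
import numpy as np

def make_thread_rngs(seed, n_envs, n_agents):
    """Return an n_envs x n_agents grid (list of lists) of independent generators."""
    # SeedSequence.spawn derives statistically independent child seeds
    # from one root seed, so every (env, agent) pair gets its own stream.
    root = np.random.SeedSequence(seed)
    children = root.spawn(n_envs * n_agents)
    rngs = [np.random.default_rng(child) for child in children]
    return [rngs[e * n_agents:(e + 1) * n_agents] for e in range(n_envs)]

rngs = make_thread_rngs(seed=42, n_envs=2, n_agents=5)
# One uniform draw per "thread"; each comes from a different stream.
draws = np.array([[rng.random() for rng in row] for row in rngs])
print(draws.shape)  # (2, 5)
```

With a fixed root seed the whole grid of streams is reproducible, while the streams themselves remain independent of one another.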

In [None]:
cuda_function_manager.load_cuda_from_binary_file(
    bin_example_file, default_functions_included=True
)
cuda_sampler = PyCUDASampler(function_manager=cuda_function_manager)
cuda_sampler.init_random(seed=None)

# Sampling

## Actions Placeholder

Now, we push the **actions_a** placeholder to the GPU. It has the shape `(n_envs=2, n_agents=5, 1)`, with one action slot per agent in each environment replica. We also make it accessible to PyTorch (`torch_accessible=True`), because during RL training the actions are fed directly into the PyTorch trainer.
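The nested-list literal in the next cell is easy to get wrong for other numbers of agents or environments. A minimal NumPy sketch of generating it programmatically, assuming (as the cells below do) a trailing axis of size 1; we convert back with `.tolist()` so the result matches the list form passed to `DataFeed.add_data` in this tutorial.

```python
import numpy as np

n_envs, n_agents = 2, 5

# Build the same all-zeros placeholder as the nested list in the next cell,
# for any number of environments and agents.
placeholder = np.zeros((n_envs, n_agents, 1), dtype=np.int32)
print(placeholder.shape)  # (2, 5, 1)

# The nested-list literal used in the next cell has exactly this structure.
literal = [[[0], [0], [0], [0], [0]], [[0], [0], [0], [0], [0]]]
print(placeholder.tolist() == literal)  # True

# Dropping the trailing axis gives one action per (env, agent),
# which is what the sampling loop reads with [:, :, 0].
print(placeholder[:, :, 0].shape)  # (2, 5)
```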

In [None]:
data_feed = DataFeed()
data_feed.add_data(name=f"{_ACTIONS}_a", data=[[[0], [0], [0], [0], [0]], [[0], [0], [0], [0], [0]]])
cuda_data_manager.push_data_to_device(data_feed, torch_accessible=True)
assert cuda_data_manager.is_data_on_device_via_torch(f"{_ACTIONS}_a")

## Action Sampled Distribution

We define an action **distribution** here. During training, this distribution would be provided by the policy model implemented in PyTorch. The distribution has the shape `(n_envs, n_agents, n_actions)`. The last dimension, `n_actions`, is the size of the action space for a particular *discrete* action. For example, if the possible actions are up, down, left, right, and no-op, then `n_actions=5`.

**n_actions** needs to be registered with the sampler so that the sampler can pre-allocate global memory on the GPU to speed up action sampling. This is done by calling `sampler.register_actions()`.

In this tutorial, we check whether the sampled actions follow the given distribution. For example, the distribution [0.333, 0.333, 0.333] below means that the first agent has 3 possible actions, each with equal probability.
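Since the distribution normally comes from a PyTorch policy, a quick host-side sanity check can catch shape or normalization mistakes before sampling. The sketch below uses a hypothetical helper, `check_distribution`, that is not part of WarpDrive; its tolerance `atol=1e-2` is an arbitrary choice that admits rows like `[0.333, 0.333, 0.333]`, which sums to 0.999 rather than exactly 1.

```python
import numpy as np

def check_distribution(dist, n_envs, n_agents, n_actions, atol=1e-2):
    """Validate a (n_envs, n_agents, n_actions) action distribution.

    Returns the per-(env, agent) row sums; raises ValueError on a problem.
    """
    dist = np.asarray(dist, dtype=np.float64)
    expected_shape = (n_envs, n_agents, n_actions)
    if dist.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {dist.shape}")
    if (dist < 0).any():
        raise ValueError("probabilities must be non-negative")
    row_sums = dist.sum(axis=-1)
    if not np.allclose(row_sums, 1.0, atol=atol):
        raise ValueError(f"each row must sum to about 1, got {row_sums}")
    return row_sums

# The first two agents of env 0 below: 3 x 0.333 sums to 0.999, within atol.
sums = check_distribution([[[0.333, 0.333, 0.333], [0.2, 0.5, 0.3]]], 1, 2, 3)

# If exactly normalized rows are preferred, divide by the row sums.
dist = np.array([[[0.333, 0.333, 0.333]]])
normalized = dist / dist.sum(axis=-1, keepdims=True)
```

In this notebook, the check could be applied as `check_distribution(distribution.cpu().numpy(), 2, 5, 3)` after the distribution tensor is created.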

In [None]:
cuda_sampler.register_actions(
    cuda_data_manager, action_name=f"{_ACTIONS}_a", num_actions=3
)

distribution = np.array(
    [
        [
            [0.333, 0.333, 0.333],
            [0.2, 0.5, 0.3],
            [0.95, 0.02, 0.03],
            [0.02, 0.95, 0.03],
            [0.02, 0.03, 0.95],
        ],
        [
            [0.1, 0.7, 0.2],
            [0.7, 0.2, 0.1],
            [0.5, 0.5, 0.0],
            [0.0, 0.5, 0.5],
            [0.5, 0.0, 0.5],
        ],
    ]
)
distribution = torch.from_numpy(distribution).float().cuda()

In [None]:
# Run 10000 times to collect statistics
actions_batch = torch.from_numpy(np.empty((10000, 2, 5), dtype=np.int32)).cuda()

for i in range(10000):
    cuda_sampler.sample(cuda_data_manager, distribution, action_name=f"{_ACTIONS}_a")
    actions_batch[i] = cuda_data_manager.data_on_device_via_torch(f"{_ACTIONS}_a")[:, :, 0]
actions_batch_host = actions_batch.cpu().numpy()

In [None]:
actions_env_0 = actions_batch_host[:, 0]
actions_env_1 = actions_batch_host[:, 1]

In [None]:
print(
    "Sampled actions distribution versus the given distribution (in bracket) for env 0: \n"
)
for agent_id in range(5):
    print(
        f"Sampled action distribution for agent_id: {agent_id}:\n"
        f"{(actions_env_0[:, agent_id] == 0).sum() / 10000.0}({distribution[0, agent_id, 0]}), \n"
        f"{(actions_env_0[:, agent_id] == 1).sum() / 10000.0}({distribution[0, agent_id, 1]}), \n"
        f"{(actions_env_0[:, agent_id] == 2).sum() / 10000.0}({distribution[0, agent_id, 2]})  \n"
    )

In [None]:
print(
    "Sampled actions distribution versus the given distribution (in bracket) for env 1: "
)

for agent_id in range(5):
    print(
        f"Sampled action distribution for agent_id: {agent_id}:\n"
        f"{(actions_env_1[:, agent_id] == 0).sum() / 10000.0}({distribution[1, agent_id, 0]}), \n"
        f"{(actions_env_1[:, agent_id] == 1).sum() / 10000.0}({distribution[1, agent_id, 1]}), \n"
        f"{(actions_env_1[:, agent_id] == 2).sum() / 10000.0}({distribution[1, agent_id, 2]})  \n"
    )
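The two printing loops above compare frequencies by eye. A compact alternative, offered as a sketch with hypothetical helpers, computes all empirical frequencies at once and measures the largest deviation in units of the binomial standard error sqrt(p(1-p)/N). With N = 10000 draws that standard error is at most 0.005, so deviations of around 0.01 are ordinary sampling noise.

```python
import numpy as np

def empirical_distribution(actions, n_actions):
    """Action frequencies with shape (n_envs, n_agents, n_actions).

    `actions` has shape (n_samples, n_envs, n_agents), like actions_batch_host.
    """
    return np.stack(
        [(actions == k).mean(axis=0) for k in range(n_actions)], axis=-1
    )

def max_z_score(actions, expected):
    """Largest |observed - expected| in units of the binomial standard error."""
    n_samples = actions.shape[0]
    observed = empirical_distribution(actions, expected.shape[-1])
    diff = np.abs(observed - expected)
    # Standard error of a frequency estimated from N draws: sqrt(p * (1 - p) / N).
    se = np.sqrt(expected * (1.0 - expected) / n_samples)
    safe_se = np.where(se > 0, se, 1.0)
    # Where p is 0 (or 1) the outcome is deterministic, so any deviation
    # is flagged as infinitely unlikely.
    z = np.where(se > 0, diff / safe_se, np.where(diff > 0, np.inf, 0.0))
    return float(z.max())

# Demo on NumPy-generated reference samples for 1 env and 2 agents.
rng = np.random.default_rng(0)
expected = np.array([[[0.2, 0.5, 0.3], [0.5, 0.0, 0.5]]])
ref = np.stack(
    [rng.choice(3, size=10000, p=expected[0, a]) for a in range(2)], axis=-1
)[:, None, :]  # shape (10000, 1, 2)
print(empirical_distribution(ref, 3).round(3))
print(max_z_score(ref, expected))
```

In the notebook this could be called as `max_z_score(actions_batch_host, distribution.cpu().numpy())`. As a rough rule of thumb, scores up to about 4 are consistent with noise across the 30 entries here, while any sampled zero-probability action gives an infinite score.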

## Action Randomness Across Threads

Another important validation is whether the sampler provides independent randomness across different agents and environment replicas. Given the same policy model for all the agents and environment replicas, we can check whether the sampled actions are independently distributed.

Here, we assign all agents in all envs the same distribution [0.25, 0.25, 0.25, 0.25]. This is equivalent to a uniform distribution over the actions [0, 1, 2, 3] for 5 agents and 2 envs. We then compute the standard deviation of the sampled actions across the agents in each env, averaged over all draws.
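What value should that mean standard deviation take? If the 5 agents draw independently and uniformly from {0, 1, 2, 3}, each of the 4^5 = 1024 joint outcomes is equally likely, so the expected value of `.std(axis=2)` can be computed exactly by enumeration. A single action has variance 1.25, so the per-draw population variance (NumPy's default `ddof=0`) has expectation (4/5) x 1.25 = 1.0, and by Jensen's inequality the expected standard deviation is below 1.0. This reference computation is our addition, not part of WarpDrive.

```python
import itertools
import numpy as np

n_actions, n_agents = 4, 5

# Enumerate all 4**5 = 1024 equally likely joint actions of 5 independent agents.
all_joint = np.array(list(itertools.product(range(n_actions), repeat=n_agents)))
print(all_joint.shape)  # (1024, 5)

# Mean per-draw variance (ddof=0) equals (n - 1) / n * 1.25 = 1.0 exactly.
print(all_joint.var(axis=1).mean())

# Exact expected value of .std(axis=2) under independence; compare it with
# the sampler's and NumPy's per-env results.
expected_std = all_joint.std(axis=1).mean()
print(expected_std)
```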

In [None]:
data_feed = DataFeed()
data_feed.add_data(name=f"{_ACTIONS}_b", data=[[[0], [0], [0], [0], [0]], [[0], [0], [0], [0], [0]]])
cuda_data_manager.push_data_to_device(data_feed, torch_accessible=True)
assert cuda_data_manager.is_data_on_device_via_torch(f"{_ACTIONS}_b")

In [None]:
cuda_sampler.register_actions(
    cuda_data_manager, action_name=f"{_ACTIONS}_b", num_actions=4
)

In [None]:
distribution = np.array(
    [
        [
            [0.25, 0.25, 0.25, 0.25],
            [0.25, 0.25, 0.25, 0.25],
            [0.25, 0.25, 0.25, 0.25],
            [0.25, 0.25, 0.25, 0.25],
            [0.25, 0.25, 0.25, 0.25],
        ],
        [
            [0.25, 0.25, 0.25, 0.25],
            [0.25, 0.25, 0.25, 0.25],
            [0.25, 0.25, 0.25, 0.25],
            [0.25, 0.25, 0.25, 0.25],
            [0.25, 0.25, 0.25, 0.25],
        ],
    ]
)
distribution = torch.from_numpy(distribution).float().cuda()

In [None]:
# Run 10000 times to collect statistics.
actions_batch = torch.from_numpy(np.empty((10000, 2, 5), dtype=np.int32)).cuda()

for i in range(10000):
    cuda_sampler.sample(cuda_data_manager, distribution, action_name=f"{_ACTIONS}_b")
    actions_batch[i] = cuda_data_manager.data_on_device_via_torch(f"{_ACTIONS}_b")[:, :, 0]
actions_batch_host = actions_batch.cpu().numpy()

In [None]:
actions_batch_host

In [None]:
actions_batch_host.std(axis=2).mean(axis=0)

To check the independence of randomness among all threads, we can compare our sampler with a NumPy implementation. Here we use `numpy.random.choice(4, 5)` to repeat the same process for a uniform distribution over the actions [0, 1, 2, 3], with 5 agents and 2 envs. The mean standard deviation of the NumPy output should be very close to that of our sampler.

In [None]:
actions_batch_numpy = np.empty((10000, 2, 5), dtype=np.int32)
for i in range(10000):
    actions_batch_numpy[i, 0, :] = np.random.choice(4, 5)
    actions_batch_numpy[i, 1, :] = np.random.choice(4, 5)
actions_batch_numpy.std(axis=2).mean(axis=0)
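The mean standard deviation only checks the spread within each environment. It would not notice, for example, two environment replicas that accidentally share random streams and therefore produce identical actions. A complementary check, offered as a sketch and not part of the original tutorial, is the correlation between every pair of (env, agent) action streams across the 10000 draws. For independent streams each coefficient should be near 0, with a standard error of roughly 1/sqrt(10000) = 0.01. `max_cross_correlation` is a hypothetical helper.

```python
import numpy as np

def max_cross_correlation(actions):
    """Largest |Pearson r| between any two (env, agent) action streams.

    `actions` has shape (n_samples, n_envs, n_agents). A stream that never
    varies would make its correlations undefined (NaN).
    """
    n_samples = actions.shape[0]
    # One column per (env, agent) "thread".
    flat = actions.reshape(n_samples, -1).astype(np.float64)
    corr = np.corrcoef(flat, rowvar=False)
    # Ignore the diagonal, which is always 1.
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.abs(off_diag).max())

rng = np.random.default_rng(0)
independent = rng.integers(0, 4, size=(10000, 2, 5))
# Failure case: env 1 is an exact copy of env 0. Its within-env std looks
# normal, but the cross-correlation exposes the shared randomness.
base = rng.integers(0, 4, size=(10000, 1, 5))
duplicated_envs = np.concatenate([base, base], axis=1)
print(max_cross_correlation(independent))      # small, a few hundredths
print(max_cross_correlation(duplicated_envs))  # close to 1.0
```

In the notebook, `max_cross_correlation(actions_batch_host)` and `max_cross_correlation(actions_batch_numpy)` should both stay small if the streams are independent.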

## Running Speed

The total sampling time includes receiving a new distribution and sampling from it.
Compared with the [torch.distributions.Categorical sampler](https://pytorch.org/docs/stable/distributions.html), 
our sampler achieves a **7-8X** speed-up for the distribution above. 

*Note: our sampler runs in parallel across threads, so this speed-up stays almost constant when scaling up the number of agents or environment replicas, that is, the number of threads used.*
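The timing cells below record 1000 calls with CUDA events and no warm-up, so one-off costs of the first calls may be included in the measurement. A sketch of a slightly more careful host-side timer, assuming wall-clock timing with explicit synchronization is acceptable for your comparison: it runs a few warm-up calls and synchronizes before reading the clock. `time_per_call_ms` is a hypothetical helper; `torch.cuda.synchronize` is a real PyTorch function you would pass as `sync` when timing GPU work.

```python
import time

def time_per_call_ms(fn, n_iters=1000, n_warmup=10, sync=None):
    """Average wall-clock milliseconds per call of `fn`.

    `sync` is an optional callable that waits for outstanding device work,
    e.g. torch.cuda.synchronize when timing GPU code.
    """
    for _ in range(n_warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before starting the clock
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    if sync is not None:
        sync()  # wait for all timed work before stopping the clock
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / n_iters

# CPU-only demo:
print(time_per_call_ms(lambda: sum(range(1000)), n_iters=100))

# Hypothetical usage in this notebook (requires the GPU cells):
# t_warp = time_per_call_ms(
#     lambda: cuda_sampler.sample(cuda_data_manager, distribution, action_name=f"{_ACTIONS}_a"),
#     sync=torch.cuda.synchronize,
# )
# t_torch = time_per_call_ms(lambda: Categorical(distribution).sample(), sync=torch.cuda.synchronize)
# print(f"speed-up: {t_torch / t_warp:.1f}x")
```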

In [None]:
from torch.distributions import Categorical

In [None]:
distribution = np.array(
    [
        [
            [0.333, 0.333, 0.333],
            [0.2, 0.5, 0.3],
            [0.95, 0.02, 0.03],
            [0.02, 0.95, 0.03],
            [0.02, 0.03, 0.95],
        ],
        [
            [0.1, 0.7, 0.2],
            [0.7, 0.2, 0.1],
            [0.5, 0.5, 0.0],
            [0.0, 0.5, 0.5],
            [0.5, 0.0, 0.5],
        ],
    ]
)
distribution = torch.from_numpy(distribution).float().cuda()

In [None]:
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)

start_event.record()
for _ in range(1000):
    cuda_sampler.sample(cuda_data_manager, distribution, action_name=f"{_ACTIONS}_a")
end_event.record()
torch.cuda.synchronize()
print(f"time elapsed: {start_event.elapsed_time(end_event)} ms")

In [None]:
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)

start_event.record()
for _ in range(1000):
    Categorical(distribution).sample()
end_event.record()
torch.cuda.synchronize()
print(f"time elapsed: {start_event.elapsed_time(end_event)} ms")

# Learn More and Explore our Tutorials!

Next, we suggest you check out our advanced [tutorial](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-3-warp_drive_reset_and_log.ipynb) on WarpDrive's reset and log controller!

For your reference, all our tutorials are here:
1. [WarpDrive basics (intro and pycuda)](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1.a-warp_drive_basics.ipynb)
2. [WarpDrive basics (numba)](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1.b-warp_drive_basics.ipynb)
3. [WarpDrive sampler (pycuda)](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-2.a-warp_drive_sampler.ipynb)
4. [WarpDrive sampler (numba)](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-2.b-warp_drive_sampler.ipynb)
5. [WarpDrive resetter and logger](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-3-warp_drive_reset_and_log.ipynb)
6. [Create custom environments (pycuda)](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-4.a-create_custom_environments_pycuda.md)
7. [Create custom environments (numba)](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-4.b-create_custom_environments_numba.md)
8. [Training with WarpDrive](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-5-training_with_warp_drive.ipynb)
9. [Scaling Up training with WarpDrive](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-6-scaling_up_training_with_warp_drive.md)
10. [Training with WarpDrive + Pytorch Lightning](https://github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-7-training_with_warp_drive_and_pytorch_lightning.ipynb)