Copyright (c) 2021, salesforce.com, inc. \
All rights reserved. \
SPDX-License-Identifier: BSD-3-Clause \
For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause

**Try this notebook on [Colab](http://colab.research.google.com/github/salesforce/warp-drive/blob/master/tutorials/tutorial-3-warp_drive_reset_and_log.ipynb)!**

# ⚠️ PLEASE NOTE:
This notebook runs on a GPU runtime.\
If running on Colab, choose Runtime > Change runtime type from the menu, then select `GPU` in the 'Hardware accelerator' dropdown menu.

In [None]:
import torch

assert torch.cuda.device_count() > 0, "This notebook needs a GPU to run!"

# Welcome to WarpDrive!

This is our third (and an advanced) tutorial about WarpDrive, a framework for extremely parallelized multi-agent reinforcement learning (RL) on a single GPU. If you haven't yet, please also check out our previous tutorials:

- [WarpDrive basics](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1-warp_drive_basics.ipynb)
- [WarpDrive sampler](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-2-warp_drive_sampler.ipynb)

In this tutorial, we describe **CUDAEnvironmentReset** and **CUDALogController**. 

- CUDAEnvironmentReset works exclusively on the GPU to reset the environment in-place. 
- CUDALogController works exclusively on the GPU to log the episode history. 

They both play important roles in the WarpDrive framework.

# Dependencies

You can install the warp_drive package using

- the pip package manager, OR
- by cloning the warp_drive package and installing the requirements.

On Colab, we will do the latter.

In [None]:
import sys

IN_COLAB = "google.colab" in sys.modules

if IN_COLAB:
    ! git clone https://github.com/salesforce/warp-drive.git
    % cd warp-drive
    ! pip install -e .
else:
    ! pip install -U rl_warp_drive

In [None]:
import numpy as np
from warp_drive.managers.data_manager import CUDADataManager
from warp_drive.managers.function_manager import (
    CUDAFunctionManager,
    CUDALogController,
    CUDAEnvironmentReset,
)
from warp_drive.utils.constants import Constants
from warp_drive.utils.data_feed import DataFeed
from warp_drive.utils.common import get_project_root

_MAIN_FILEPATH = f"{get_project_root()}/warp_drive/cuda_includes"
_CUBIN_FILEPATH = f"{get_project_root()}/warp_drive/cuda_bin"
_ACTIONS = Constants.ACTIONS

In [None]:
# Set logger level e.g., DEBUG, INFO, WARNING, ERROR
import logging

logging.getLogger().setLevel(logging.INFO)

# CUDAEnvironmentReset and CUDALogController

Assuming you have developed a CUDA environment `step` function, here we show how WarpDrive facilitates the environment rollout by resetting and logging the environment on the GPU. If you do not have the `test_build` binary built (the cells below compile it to `test_build.fatbin`), you can refer to the previous tutorial [WarpDrive sampler](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-2-warp_drive_sampler.ipynb) to see how to build it automatically. 

In [None]:
cuda_data_manager = CUDADataManager(num_agents=5, num_envs=2, episode_length=2)
cuda_function_manager = CUDAFunctionManager(
    num_agents=cuda_data_manager.meta_info("n_agents"),
    num_envs=cuda_data_manager.meta_info("n_envs"),
)

In [None]:
main_example_file = f"{_MAIN_FILEPATH}/test_build.cu"
bin_example_file = f"{_CUBIN_FILEPATH}/test_build.fatbin"

cuda_function_manager._compile(main_file=main_example_file, 
                               cubin_file=bin_example_file)

In [None]:
cuda_function_manager.load_cuda_from_binary_file(bin_example_file)
cuda_env_resetter = CUDAEnvironmentReset(function_manager=cuda_function_manager)
cuda_env_logger = CUDALogController(function_manager=cuda_function_manager)

## Step Function

We have an example step function already checked in and compiled into the `test_build` binary loaded above. 

The source code of this dummy step function can be found [here](https://www.github.com/salesforce/warp-drive/blob/master/example_envs/dummy_env/test_step.cu). For each step, array `x` will be divided by `multiplier` while array `y` will be multiplied by the same `multiplier`:

```
x[index] = x[index] / multiplier;
y[index] = y[index] * multiplier;
```
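
To make the effect of this update concrete, here is a CPU-only NumPy sketch of the same arithmetic. It is not the CUDA kernel: `numpy_dummy_step` is a hypothetical stand-in introduced only for illustration, and the arrays reuse the `X`, `Y` and `multiplier` values that this tutorial pushes to the device later on.

```python
import numpy as np

# Same initial values as the X and Y arrays used later in this tutorial:
# shape (num_envs=2, num_agents=5).
x = np.array([[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]])
y = np.array([[6, 7, 8, 9, 10], [1, 2, 3, 4, 5]])
multiplier = 2.0


def numpy_dummy_step(x, y, multiplier):
    """Hypothetical CPU stand-in for the element-wise update shown above."""
    return x / multiplier, y * multiplier


# Apply the update twice, as in an episode of length 2.
x1, y1 = numpy_dummy_step(x, y, multiplier)
x2, y2 = numpy_dummy_step(x1, y1, multiplier)

# Per-step means for the first environment.
print(x1[0].mean(), x2[0].mean())
print(y1[0].mean(), y2[0].mean())
```

With a multiplier of 2, the mean of `x` for the first environment halves at every step while the mean of `y` doubles.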

Now we just need to initialize it with CUDAFunctionManager and wrap it up in a Python/CUDA step callable. In `dummy_env` this function is called `cuda_dummy_step()`. 

Note that we provide the **EnvWrapper** to automate most of the processes below. However, the Python/CUDA step callable you develop must be defined inside your environment so that **EnvWrapper** can find it and wrap it up. 

For concrete examples of how to define more complex `step` functions, you can refer to [example1](https://www.github.com/salesforce/warp-drive/blob/master/example_envs/tag_gridworld/tag_gridworld_step.cu) and [example2](https://www.github.com/salesforce/warp-drive/blob/master/example_envs/tag_continous/tag_continuous_step.cu).

In [None]:
cuda_function_manager.initialize_functions(["testkernel"])


def cuda_dummy_step(
    function_manager: CUDAFunctionManager,
    data_manager: CUDADataManager,
    env_resetter: CUDAEnvironmentReset,
    target: int,
    step: int,
):

    env_resetter.reset_when_done(data_manager)

    step = np.int32(step)
    target = np.int32(target)
    test_step = function_manager.get_function("testkernel")
    test_step(
        data_manager.device_data("X"),
        data_manager.device_data("Y"),
        data_manager.device_data("_done_"),
        data_manager.device_data(f"{_ACTIONS}"),
        data_manager.device_data("multiplier"),
        target,
        step,
        data_manager.meta_info("episode_length"),
        block=function_manager.block,
        grid=function_manager.grid,
    )

## Reset and Log Function

In `cuda_dummy_step()` above, besides the kernel managed by CUDAFunctionManager, you can see a call to `CUDAEnvironmentReset.reset_when_done()`. This function resets an environment to its initial state once its `done` flag becomes true on the GPU. Only the environments that are done are reset; the others are left untouched. 

To make it work properly, you need to specify which data (usually the feature arrays and observations) can be reset. 

This is where the flag **save_copy_and_apply_at_reset** comes into play. If the data has `save_copy_and_apply_at_reset` set to True, a dedicated copy will be maintained on the device for resetting. 
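
The idea of a saved copy that is applied only to finished environments can be sketched on the CPU with NumPy. This is an illustration of the semantics described above, not WarpDrive's implementation: the function `reset_when_done` below and the name `x_at_reset` are hypothetical, and the real reset runs entirely on the GPU.

```python
import numpy as np

# Initial state with shape (num_envs=2, num_agents=5).
x = np.array([[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]])
x_at_reset = x.copy()  # hypothetical name for the dedicated reset copy
done = np.zeros(2, dtype=np.int32)  # one done flag per environment


def reset_when_done(x, x_at_reset, done):
    """Restore only the environments whose done flag is set, then clear it."""
    mask = done.astype(bool)
    x[mask] = x_at_reset[mask]
    done[mask] = 0


# Pretend both environments took two steps (each dividing x by 2),
# and that only environment 1 has finished its episode.
x /= 4.0
done[1] = 1

reset_when_done(x, x_at_reset, done)
print(x[0].mean(), x[1].mean(), done)
```

In this sketch environment 0 keeps its stepped values, while environment 1 returns to its initial values and its flag is cleared.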

On the other hand, **log_data_across_episode** will create a buffer on the GPU for logs. This lets you record a complete episode. 

These two flags can be used independently!
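
As a rough mental model of the log buffer, the following NumPy sketch records one environment over an episode. It assumes a buffer with one slot per timestep plus slot 0 for the initial state; the name `x_log` and the division by 2 standing in for an environment step are illustrative choices, not WarpDrive internals.

```python
import numpy as np

episode_length = 2
env_id = 0  # only one environment is logged in this sketch

# Shape (num_envs=2, num_agents=5).
x = np.array([[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]])

# One slot per timestep, plus slot 0 for the initial state.
x_log = np.zeros((episode_length + 1, x.shape[1]))
x_log[0] = x[env_id]

for t in range(1, episode_length + 1):
    x = x / 2.0  # stand-in for one environment step
    x_log[t] = x[env_id]  # record the state reached at step t

print(x_log.mean(axis=1))
```

Indexing the buffer by timestep then gives the state of the logged environment at that step, with index 0 holding the state before any step was taken.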

In [None]:
data = DataFeed()
data.add_data(
    name="X",
    data=[[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]],
    save_copy_and_apply_at_reset=True,
    log_data_across_episode=True,
)

data.add_data(
    name="Y",
    data=np.array([[6, 7, 8, 9, 10], [1, 2, 3, 4, 5]]),
    save_copy_and_apply_at_reset=True,
    log_data_across_episode=True,
)
data.add_data(name="multiplier", data=2.0)

tensor = DataFeed()
tensor.add_data(
    name=f"{_ACTIONS}",
    data=[
        [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
        [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
    ],
)

cuda_data_manager.push_data_to_device(data)
cuda_data_manager.push_data_to_device(tensor, torch_accessible=True)

assert cuda_data_manager.is_data_on_device("X")
assert cuda_data_manager.is_data_on_device("Y")
assert cuda_data_manager.is_data_on_device_via_torch(f"{_ACTIONS}")

Now, we run a complete set of parallel episodes and inspect the log for the first environment.
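
Since the log is inspected per environment, it may help to recall the layout of the arrays pushed to the device: the first dimension indexes the environment and the second the agent. The NumPy check below is independent of WarpDrive and only restates the data defined above; the names `x_env_means` and `y_env_means` are introduced here for illustration.

```python
import numpy as np

num_envs, num_agents = 2, 5

x = np.array([[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]])
y = np.array([[6, 7, 8, 9, 10], [1, 2, 3, 4, 5]])
# The actions array above carries a trailing dimension of size 3 per agent.
actions = np.zeros((num_envs, num_agents, 3), dtype=np.int32)

# Axis 0 is the environment, axis 1 is the agent.
assert x.shape == (num_envs, num_agents)
assert y.shape == (num_envs, num_agents)
assert actions.shape == (num_envs, num_agents, 3)

# Per-environment means of the initial state.
x_env_means = x.mean(axis=1)
y_env_means = y.mean(axis=1)
print(x_env_means, y_env_means)
```

These per-environment means of the initial state are the reference values against which the stepped and reset arrays can be compared.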

## Test Run

In [None]:
# t = 0 is reserved for the initial state.
cuda_env_logger.reset_log(data_manager=cuda_data_manager, env_id=0)

for t in range(1, cuda_data_manager.meta_info("episode_length") + 1):
    cuda_dummy_step(
        function_manager=cuda_function_manager,
        data_manager=cuda_data_manager,
        env_resetter=cuda_env_resetter,
        target=100,
        step=t,
    )
    cuda_env_logger.update_log(data_manager=cuda_data_manager, step=t)

dense_log = cuda_env_logger.fetch_log(data_manager=cuda_data_manager, names=["X", "Y"])

# After two steps, check that the log buffers for X and Y have been updated.
X_update = dense_log["X_for_log"]
Y_update = dense_log["Y_for_log"]

assert abs(X_update[1].mean() - 0.15) < 1e-5
assert abs(X_update[2].mean() - 0.075) < 1e-5
assert Y_update[1].mean() == 16
assert Y_update[2].mean() == 32

# reset_when_done() has not been called since the final step,
# so all the done flags should be True now.

done = cuda_data_manager.pull_data_from_device("_done_")
print(f"The done array = {done}")

For this demo, we can explicitly reset the environment to see how it works. The `cuda_dummy_step()` function would also do this by itself at the start of the next step. After resetting, you can see that all the done flags go back to False and the `X` and `Y` arrays are restored to their initial values.

In [None]:
cuda_env_resetter.reset_when_done(data_manager=cuda_data_manager)

done = cuda_data_manager.pull_data_from_device("_done_")
assert done[0] == 0
assert done[1] == 0

X_after_reset = cuda_data_manager.pull_data_from_device("X")
Y_after_reset = cuda_data_manager.pull_data_from_device("Y")
# the 0th dim is env
assert abs(X_after_reset[0].mean() - 0.3) < 1e-5
assert abs(X_after_reset[1].mean() - 0.8) < 1e-5
assert Y_after_reset[0].mean() == 8
assert Y_after_reset[1].mean() == 3

# Learn More and Explore our Tutorials!

Now that you have familiarized yourself with WarpDrive, we suggest you take a look at our tutorials on [creating custom environments](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-4-create_custom_environments.md) and on how to use WarpDrive to perform end-to-end multi-agent reinforcement learning [training](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-5-training_with_warp_drive.ipynb)!

For your reference, all our tutorials are here:
1. [WarpDrive basics](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1-warp_drive_basics.ipynb)
2. [WarpDrive sampler](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-2-warp_drive_sampler.ipynb)
3. [WarpDrive reset and log](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-3-warp_drive_reset_and_log.ipynb)
4. [Creating custom environments](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-4-create_custom_environments.md)
5. [Training with WarpDrive](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-5-training_with_warp_drive.ipynb)
6. [Scaling Up training with WarpDrive](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-6-scaling_up_training_with_warp_drive.md)
7. [Training with WarpDrive + Pytorch Lightning](https://github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-7-training_with_warp_drive_and_pytorch_lightning.ipynb)