Copyright (c) 2022, salesforce.com, inc and MILA.  
All rights reserved.  
SPDX-License-Identifier: BSD-3-Clause  
For full license text, see the LICENSE file in the repo root  
or https://opensource.org/licenses/BSD-3-Clause  

# Install required packages

In [None]:
!pip install rl_warp_drive
!pip install "ray[rllib]"
!pip install matplotlib
!pip install -r requirements.txt

In [None]:
import os
import sys

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

# Put the scripts directory first on the import path so its modules can be imported.
sys.path.insert(0, os.path.join(os.getcwd(), "scripts"))

from desired_outputs import desired_outputs

# Train agents with GPU

To train with GPU, make sure that you have an **NVIDIA graphics card** and can install critical packages such as ``warp-drive`` and ``pytorch``. If you don't have an NVIDIA graphics card, refer to the section **Train agents with CPU** below.

In short, to install ``warp-drive``, run ``pip install rl_warp_drive``. If errors occur, please check [here](https://github.com/salesforce/warp-drive) for more details.

To install PyTorch with CUDA support in a conda virtual environment, a quick first attempt is ``conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch``. For more details, please refer to [here](https://pytorch.org/get-started/locally/).

If you encounter the following error, try reducing ``train_batch_size`` or ``num_envs``.

```
RuntimeError: CUDA out of memory. Tried to allocate
```
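
One way to apply this advice automatically is to retry with a smaller batch size whenever the error appears. The following is a minimal sketch, not part of the repository: ``train_with_backoff`` and ``min_batch_size`` are hypothetical names, and it assumes the training function accepts ``train_batch_size`` and ``num_envs`` as keyword arguments, as ``trainer`` does in the cells below. GPU memory is not guaranteed to be released after a failed attempt, so restarting the kernel may still be necessary.

```python
def train_with_backoff(train_fn, train_batch_size, num_envs, min_batch_size=64):
    """Call train_fn, halving train_batch_size after each CUDA out-of-memory error."""
    while True:
        try:
            return train_fn(train_batch_size=train_batch_size, num_envs=num_envs)
        except RuntimeError as err:
            # Only handle out-of-memory errors; re-raise anything else,
            # and give up once the batch size cannot be reduced further.
            if "out of memory" not in str(err) or train_batch_size <= min_batch_size:
                raise
            train_batch_size //= 2
            print(f"CUDA out of memory, retrying with train_batch_size={train_batch_size}")


# Hypothetical usage with the trainer imported below:
# train_with_backoff(
#     lambda **kw: trainer(negotiation_on=0, num_episodes=30000, lr=0.0005,
#                          model_params_save_freq=5000,
#                          desired_outputs=desired_outputs, **kw),
#     train_batch_size=1024,
#     num_envs=100,
# )
```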

In [None]:
from gpu_trainer import trainer

The cell below trains the agents without naive negotiation, ensembling results over 100 randomly initialized environments with a batch size of 1024. Training runs on a single GPU.

```python
negotiation_on = 0 # without naive negotiation
num_envs = 100 # ensemble results over 100 randomly initialized environments
train_batch_size = 1024 # train with a batch size of 1024
num_episodes = 30000 # number of episodes
lr = 0.0005 # learning rate
model_params_save_freq = 5000 # save the model parameters every 5000 steps
```

In [None]:
gpu_trainer_off, gpu_nego_off_ts = trainer(negotiation_on=0, num_envs=100, train_batch_size=1024, num_episodes=30000, lr=0.0005, model_params_save_freq=5000, desired_outputs=desired_outputs)

The cell below trains the agents with naive negotiation, ensembling results over 100 randomly initialized environments with a batch size of 1024. Training runs on a single GPU.

```python
negotiation_on = 1 # with naive negotiation
num_envs = 100 # ensemble results over 100 randomly initialized environments
train_batch_size = 1024 # train with a batch size of 1024
num_episodes = 30000 # number of episodes
lr = 0.0005 # learning rate
model_params_save_freq = 5000 # save the model parameters every 5000 steps
```

In [None]:
gpu_trainer_on, gpu_nego_on_ts = trainer(negotiation_on=1, num_envs=100, train_batch_size=1024, num_episodes=30000, lr=0.0005, model_params_save_freq=5000, desired_outputs=desired_outputs)

To customize the training script, please check ``gpu_trainer.py`` for more details.

# Train agents with CPU

When training agents with CPU, if the process is killed, you probably need to reduce ``num_envs`` and ``train_batch_size``. Expect training to take longer than on GPU. Note also that training with negotiation usually needs about **3x** the computational resources of training without negotiation.

In [None]:
from cpu_trainer import trainer

In [None]:
# This is necessary for rllib to get the correct path!
os.chdir(os.getcwd()+"/scripts")

The cell below trains the agents without naive negotiation, using a single environment and a batch size of 1024. Training runs on a single CPU (``num_workers=1``).

```python
negotiation_on = 0 # without naive negotiation
num_envs = 1 # use a single environment
train_batch_size = 1024 # train with a batch size of 1024
num_episodes = 300 # number of episodes
lr = 0.0005 # learning rate
model_params_save_freq = 5000 # save the model parameters every 5000 steps
```

In [None]:
cpu_trainer_off, cpu_nego_off_ts = trainer(negotiation_on=0, num_envs=1, train_batch_size=1024, num_episodes=300, lr=0.0005, model_params_save_freq=5000, desired_outputs=desired_outputs)

The cell below trains the agents with naive negotiation, using a single environment and a batch size of 1024. Training runs on a single CPU (``num_workers=1``).

```python
negotiation_on = 1 # with naive negotiation
num_envs = 1 # use a single environment
train_batch_size = 1024 # train with a batch size of 1024
num_episodes = 300 # number of episodes
lr = 0.0005 # learning rate
model_params_save_freq = 5000 # save the model parameters every 5000 steps
```

In [None]:
cpu_trainer_on, cpu_nego_on_ts = trainer(negotiation_on=1, num_envs=1, train_batch_size=1024, num_episodes=300, lr=0.0005, model_params_save_freq=5000, desired_outputs=desired_outputs)

# Save or load from previous training results

This section covers saving and loading the results (not the trainer). Both operations are based on ``pickle``.

In [None]:
from opt_helper import save, load

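To save the output time series, for example ``gpu_nego_off_ts`` and ``gpu_nego_on_ts`` returned by ``trainer`` above (written here as ``nego_off_ts`` and ``nego_on_ts``), one can do: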
```python
save({"nego_off":nego_off_ts, "nego_on":nego_on_ts}, "filename.pkl")
```

To load the output time series, one can do:
```python
dict_ts = load("filename.pkl")
nego_off_ts, nego_on_ts = dict_ts["nego_off"], dict_ts["nego_on"]
```
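
For readers unfamiliar with ``pickle``, the round trip can be sketched as follows. This is an illustration only: ``save_pickle`` and ``load_pickle`` are hypothetical names, and the actual ``save`` and ``load`` in ``opt_helper.py`` may differ in detail.

```python
import pickle


def save_pickle(obj, filename):
    """Serialize obj to filename using pickle."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(filename):
    """Read back an object previously written by save_pickle."""
    with open(filename, "rb") as f:
        return pickle.load(f)
```

Only load pickle files that you created yourself or otherwise trust, since unpickling can execute arbitrary code.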

In [None]:
# [uncomment the below to save]
# save({"nego_off":nego_off_ts, "nego_on":nego_on_ts}, "filename.pkl")

In [None]:
# Load previously saved results (comment these lines out to keep results from the current session)
dict_ts = load("filename.pkl")
nego_off_ts, nego_on_ts = dict_ts["nego_off"], dict_ts["nego_on"]

The next section shows the available data that we can plot.

# Plot results

In [None]:
from desired_outputs import desired_outputs

One may want to check the performance of the agents by plotting graphs. Below, we list all the logged variables. To log more variables of interest, edit ``desired_outputs.py``.

```python
desired_outputs = ['global_temperature', 'global_carbon_mass', 'capital_all_regions', 'labor_all_regions', 'production_factor_all_regions', 'intensity_all_regions', 'global_exogenous_emissions', 'global_land_emissions', 'timestep', 'activity_timestep', 'capital_depreciation_all_regions', 'savings_all_regions', 'mitigation_rate_all_regions', 'max_export_limit_all_regions', 'mitigation_cost_all_regions', 'damages_all_regions', 'abatement_cost_all_regions', 'utility_all_regions', 'social_welfare_all_regions', 'reward_all_regions', 'consumption_all_regions', 'current_balance_all_regions', 'gross_output_all_regions', 'investment_all_regions', 'production_all_regions', 'tariffs', 'future_tariffs', 'scaled_imports', 'desired_imports', 'tariffed_imports', 'stage', 'minimum_mitigation_rate_all_regions', 'promised_mitigation_rate', 'requested_mitigation_rate', 'proposal_decisions']
```

In [None]:
from opt_helper import plot_result

The ``plot_result`` function plots the time series of all the logged variables.

```python
plot_result(variables, nego_off_ts, nego_on_ts, k)
```
``variables`` can be either a list of variable names from the list above or a single variable of interest. ``nego_off_ts`` and ``nego_on_ts`` are the logged time series for these variables. ``k`` is the index of the dimension of interest in the data; in most situations it should be left at ``0``.
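
The handling of ``variables`` and ``k`` can be illustrated with a small sketch. It assumes, purely for illustration, that a logged time series is a dictionary mapping each variable name to a list of per-timestep rows; the real logging structure and the internals of ``plot_result`` may differ, and ``select_series`` is a hypothetical helper.

```python
def select_series(variables, logged_ts, k=0):
    """Return {name: values along dimension k} for one name or a list of names."""
    # Accept a single variable name as well as a list of names.
    if isinstance(variables, str):
        variables = [variables]
    # For every requested variable, keep entry k of each timestep's row.
    return {name: [row[k] for row in logged_ts[name]] for name in variables}


# Toy data in the assumed layout: two timesteps, two entries per row.
toy_ts = {
    "global_temperature": [[0.8, 0.0], [0.9, 0.1]],
    "timestep": [[0], [1]],
}
print(select_series("global_temperature", toy_ts, k=0))
```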

In [None]:
plot_result(desired_outputs, nego_off_ts, nego_on_ts, k=0)

In [None]:
plot_result("global_temperature", nego_off_ts, nego_on_ts, k=0)

# How to quickly evaluate the results

This section is for evaluating the trained agents. If you are interested in more metrics, you can edit the evaluation function ``val_metrics`` in ``evaluate_submission.py``.

To use the evaluation script, pass in the trainer, the logged variables, and the framework of the trainer.
The first two are returned by the ``trainer`` function as above. If the agents were trained with GPU, the framework should be ``warpdrive``. If they were trained with CPU, it should be ``rllib``.

We give one example below.

In [None]:
from evaluate_submission import val_metrics

In [None]:
val_metrics(trainer=gpu_trainer_off, logged_ts=gpu_nego_off_ts, framework="warpdrive")

# How to modify the simulation

## Introduction to the environment code

``rice.py``, ``rice_cuda.py``, ``rice_step.cu`` and ``rice_helpers.py`` make up the environment code.

Among them, ``rice_helpers.py`` includes all the socio-economic and climate dynamics. This file should not be changed.

``rice.py`` defines how agents interact with the environment and should be the main script to modify.

[GPU needed] ``rice_cuda.py`` connects the data between the Python script and the CUDA code.

[GPU needed] ``rice_step.cu`` includes the CUDA version of both the socio-economic and climate dynamics and the agent-environment interaction. To train the agents with GPU, the CUDA code must share the same logic as the Python code. The CUDA code mostly follows C++ syntax. Please refer to [here](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) for more details.

## How to add extra observations

To add extra observations, initialize them in the `reset()` function and add them in the `generate_observation()` function in `rice.py`.
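
The pattern can be sketched on a toy class. ``ToyEnv`` and the ``previous_mitigation_rate`` feature are hypothetical and far simpler than the real environment in `rice.py`; the sketch only shows that a new quantity has to be initialized when the environment is reset and then appended to each agent's observation.

```python
class ToyEnv:
    """Hypothetical stand-in for the environment class; not the real Rice API."""

    def __init__(self, num_regions=2):
        self.num_regions = num_regions
        self.state = {}

    def reset(self):
        # Existing state variable.
        self.state["global_temperature"] = 0.8
        # New state variable: initialize it here, otherwise
        # generate_observation() has nothing to read on the first step.
        self.state["previous_mitigation_rate"] = [0.0] * self.num_regions
        return self.generate_observation()

    def generate_observation(self):
        obs = {}
        for region in range(self.num_regions):
            obs[region] = [
                self.state["global_temperature"],
                # New feature appended to each region's observation.
                self.state["previous_mitigation_rate"][region],
            ]
        return obs


env = ToyEnv(num_regions=3)
observations = env.reset()
```

When training with GPU, the same change would also have to be mirrored in the CUDA code so that both versions share the same logic.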

## How to change the logic of taking actions

The baseline logic for taking actions is a naive bargaining process. It includes a ``proposal_step()`` for each agent to propose the next step and an ``evaluation_step()`` for each agent to evaluate the others' proposals and determine the tariffs and international trade volumes. Both are implemented in the ``step()`` function in ``rice.py``.
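
A two-stage process of this kind can be sketched as a ``step()`` function that alternates between the two stages. Everything below is illustrative: ``ToyNegotiation``, the meaning of the actions, and the acceptance rule are assumptions made for the sketch and do not reproduce the logic in ``rice.py``.

```python
class ToyNegotiation:
    """Hypothetical two-stage bargain; method names mirror the text, logic is illustrative."""

    def __init__(self):
        self.stage = 0  # 0: proposal stage, 1: evaluation stage
        self.proposals = {}
        self.accepted = {}

    def proposal_step(self, actions):
        # actions[region] is the mitigation rate that region asks the others to adopt.
        self.proposals = dict(actions)

    def evaluation_step(self, actions):
        # actions[region] is the highest requested rate that region is willing to accept.
        self.accepted = {
            region: {
                other: self.proposals[other] <= limit
                for other in self.proposals
                if other != region
            }
            for region, limit in actions.items()
        }

    def step(self, actions):
        # Dispatch on the current stage, then move on to the next one.
        if self.stage == 0:
            self.proposal_step(actions)
        else:
            self.evaluation_step(actions)
        self.stage = (self.stage + 1) % 2
        return self.stage


nego = ToyNegotiation()
nego.step({0: 0.3, 1: 0.6})  # proposal stage
nego.step({0: 0.5, 1: 0.5})  # evaluation stage
```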

We expect competitors to propose a mechanism to form a [dynamic climate club](https://williamnordhaus.com/publications/climate-clubs-overcoming-free-riding-international-climate-policy) so that countries in the club may enjoy more trade and lower tariffs, while those who contribute less to climate mitigation may face higher tariffs and less trade.
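
As a starting point for thinking about such a mechanism, here is a deliberately simple club rule. It is a sketch under stated assumptions, not a proposed solution: ``club_tariffs``, the membership threshold, and the tariff levels are all hypothetical values chosen for illustration.

```python
def club_tariffs(mitigation_rates, club_threshold=0.5,
                 member_tariff=0.0, outsider_tariff=0.2):
    """Return tariffs[importer][exporter] under a simple club rule."""
    # A region belongs to the club if its mitigation rate reaches the threshold.
    members = {r for r, rate in mitigation_rates.items() if rate >= club_threshold}
    tariffs = {}
    for importer in mitigation_rates:
        tariffs[importer] = {}
        for exporter in mitigation_rates:
            if exporter == importer:
                continue
            # Club members penalize imports from outsiders; every other
            # pair trades at the low tariff in this toy rule.
            if importer in members and exporter not in members:
                tariffs[importer][exporter] = outsider_tariff
            else:
                tariffs[importer][exporter] = member_tariff
    return tariffs


example = club_tariffs({"A": 0.7, "B": 0.6, "C": 0.1})
```

In this example, regions ``A`` and ``B`` form the club and trade with each other at the low tariff, while both apply the higher tariff to imports from ``C``.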

## What is masking?

TBC

## How to modify your masking?

TBC