# Algorithm usage tutorial

When developing a reinforcement learning application, we often train the agent with several different algorithms and compare their performance.
In this notebook, you will learn how to train an agent in the same environment using different algorithms.  
The tutorial has three steps:

(0. Preparation of this notebook)
1. Setting up the training environment 
2. Set up the DDPG algorithm and train
3. Change the training algorithm to SAC and train

## Preparation

Let's start by installing nnabla-rl and importing the packages required for training.

In [None]:
!pip install nnabla-rl

In [None]:
import gym
import nnabla as nn
from nnabla import functions as NF
from nnabla import parametric_functions as NPF
from nnabla import solvers as NS

import nnabla_rl
import nnabla_rl.algorithms as A
import nnabla_rl.writers as W
import nnabla_rl.functions as RF
from nnabla_rl.builders import SolverBuilder
from nnabla_rl.environments.environment_info import EnvironmentInfo
from nnabla_rl.models.q_function import QFunction
from nnabla_rl.environments.wrappers import NumpyFloat32Env, ScreenRenderEnv
from nnabla_rl.utils.evaluator import EpisodicEvaluator
from nnabla_rl.utils.reproductions import set_global_seed

In [None]:
!git clone https://github.com/sony/nnabla-rl.git
!bash nnabla-rl/interactive-demos/package_install.sh
%run nnabla-rl/interactive-demos/colab_utils.py

In [None]:
nn.clear_parameters()

## Setting up the training environment

Set up the "Pendulum" environment provided by OpenAI Gym.

In [None]:
def build_env(env_name):
    env = gym.make(env_name)
    env = NumpyFloat32Env(env)
    env = ScreenRenderEnv(env)  # for rendering environment
    env.seed(0) # optional
    return env

In [None]:
env_name = "Pendulum-v0"
env = build_env(env_name)
set_global_seed(0) # optional

## Preparation of Hook (optional)

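The render hook displays the environment while the agent trains. `RenderHook` is not among the imports above; it is provided by the `colab_utils.py` script run during preparation. The hook is optional and may slow down training.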

In [None]:
render_hook = RenderHook(env=env)

## Set up the DDPG algorithm and train

We are almost ready to start the training. Let's first try the DDPG algorithm to train the agent.

In [None]:
# gpu_id=0 selects the first GPU; a negative value such as -1 runs on the CPU.
config = A.DDPGConfig(gpu_id=0, start_timesteps=200)

In [None]:
ddpg = A.DDPG(
    env_or_env_info=env,
    config=config
)
ddpg.set_hooks([render_hook])

In [None]:
ddpg.train(env, total_iterations=50000)

Wait for a while and watch the pendulum swing up.
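
Watching the render gives only a rough impression. To put a number on the result, one option is to average the undiscounted return over a few evaluation episodes. The sketch below is a minimal illustration, not part of nnabla-rl: `average_return` is a hypothetical helper, it assumes the older Gym API used in this notebook (`reset()` returns the state and `step()` returns a 4-tuple), and it takes the policy as a plain callable from state to action.

```python
def average_return(policy, env, num_episodes=5):
    """Average undiscounted return of `policy` over `num_episodes` episodes.

    Hypothetical helper (not part of nnabla-rl). Assumes the older Gym API:
    env.reset() -> state, env.step(action) -> (state, reward, done, info).
    """
    total = 0.0
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)  # any callable mapping state -> action
            state, reward, done, _info = env.step(action)
            total += float(reward)
    return total / num_episodes
```

In this notebook the policy could be the trained agent's action function, for example `average_return(ddpg.compute_eval_action, env)`, assuming your nnabla-rl version exposes `compute_eval_action(state)`. The `EpisodicEvaluator` imported during preparation is the library's own utility for evaluation.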

## Change the training algorithm to SAC and train

Next, let's try training the agent with another reinforcement learning algorithm, SAC.  
You will find that changing the algorithm is very easy.

In [None]:
env.reset()
nn.clear_parameters()
render_hook.reset()

In [None]:
config = A.SACConfig(gpu_id=0, start_timesteps=500)  # same device as the DDPG run

In [None]:
sac = A.SAC(
    env_or_env_info=env,
    config=config
)
sac.set_hooks([render_hook])

In [None]:
sac.train(env, total_iterations=50000)

Changing the training algorithm is easy, right?
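
The two runs above differ only in the algorithm class and its config class, so the shared steps can be factored into one function. The following is a minimal sketch under that observation; `train_with` is a hypothetical helper, not an nnabla-rl function, and it relies only on the calls already shown in this notebook (the constructor taking `env_or_env_info` and `config`, `set_hooks`, and `train`).

```python
def train_with(algorithm_cls, config_cls, env, total_iterations,
               hooks=(), **config_kwargs):
    """Build a config, build the algorithm, attach hooks, and train.

    Hypothetical helper: `algorithm_cls` and `config_cls` are expected to
    behave like the A.DDPG / A.DDPGConfig pair used in this notebook.
    """
    config = config_cls(**config_kwargs)
    algorithm = algorithm_cls(env_or_env_info=env, config=config)
    if hooks:
        algorithm.set_hooks(list(hooks))
    algorithm.train(env, total_iterations=total_iterations)
    return algorithm
```

With this helper, the DDPG run would read `train_with(A.DDPG, A.DDPGConfig, env, 50000, hooks=[render_hook], gpu_id=0, start_timesteps=200)`, and passing `A.SAC` with `A.SACConfig` would give the SAC run. The environment, parameters, and hook still need to be reset between runs.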

## Note

To train an agent using different algorithms, you'll need to check the action type required by the environment.  
The required action type must be supported by the algorithm you want to use.  
In this example, we used the "Pendulum" environment, which takes continuous actions, and both DDPG and SAC support continuous-action environments.

See the [Algorithm catalog](https://github.com/sony/nnabla-rl/blob/master/nnabla_rl/algorithms/README.md) for the action types supported by each algorithm.
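
The compatibility check described in this note can be sketched in a few lines. This is an illustration only: the table and both functions are hypothetical and not part of nnabla-rl, the table lists just the two algorithms from this tutorial plus DQN as a familiar discrete-action example, and the action space is classified by the class name of a Gym-style space (`Box` or `Discrete`), which is a heuristic.

```python
# Hypothetical lookup table. DDPG and SAC come from this tutorial; DQN is
# included as a discrete-action example. The Algorithm catalog is the
# authoritative source.
SUPPORTED_ACTION_TYPES = {
    "DDPG": {"continuous"},
    "SAC": {"continuous"},
    "DQN": {"discrete"},
}


def action_type_of(action_space):
    """Classify a Gym-style action space by its class name (a heuristic)."""
    name = type(action_space).__name__
    if name == "Box":
        return "continuous"
    if name == "Discrete":
        return "discrete"
    raise ValueError(f"unsupported action space: {name}")


def is_compatible(algorithm_name, action_space):
    """Return True if the algorithm supports the space's action type."""
    return action_type_of(action_space) in SUPPORTED_ACTION_TYPES[algorithm_name]
```

For the environment built in this notebook, `is_compatible("SAC", env.action_space)` would be the corresponding call.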