# Solver tutorial

In this notebook, you will learn how to replace the default network solver with a solver of your choice by following the 4 steps below.

(0. Preparation of this notebook)
1. Setting up the training environment 
2. Create a SolverBuilder
3. Set up the DDPG algorithm
4. Run the training

## Preparation

Let's start by installing nnabla-rl and importing the packages required for training.

In [None]:
!pip install nnabla-rl

In [None]:
import gym
import nnabla as nn
from nnabla import functions as NF
from nnabla import parametric_functions as NPF
from nnabla import solvers as NS

import nnabla_rl
import nnabla_rl.algorithms as A
import nnabla_rl.writers as W
import nnabla_rl.functions as RF
from nnabla_rl.builders import SolverBuilder
from nnabla_rl.environments.environment_info import EnvironmentInfo
from nnabla_rl.models.q_function import QFunction
from nnabla_rl.environments.wrappers import NumpyFloat32Env, ScreenRenderEnv
from nnabla_rl.utils.evaluator import EpisodicEvaluator
from nnabla_rl.utils.reproductions import set_global_seed

In [None]:
!bash package_install.sh

In [None]:
%run ./colab_utils.py

In [None]:
nn.clear_parameters()
nnabla_rl.run_on_gpu(0)

## Setting up the training environment

Set up the "Pendulum" environment provided by OpenAI Gym.

In [None]:
def build_env(env_name):
    env = gym.make(env_name)
    env = NumpyFloat32Env(env)
    env = ScreenRenderEnv(env)  # for rendering screen
    env.seed(0)
    return env

In [None]:
env_name = "Pendulum-v0"
env = build_env(env_name)
set_global_seed(0)

## Create a SolverBuilder

To replace the default solver, you'll need to create a SolverBuilder.  
In this example, we replace Adam (the default solver of the DDPG algorithm) with RMSprop.

In [None]:
class MySolverBuilder(SolverBuilder):
    def build_solver(self,  # type: ignore[override]
                     env_info: EnvironmentInfo,
                     algorithm_config: A.DDPGConfig,
                     **kwargs) -> nn.solver.Solver:
        return NS.RMSprop(lr=algorithm_config.learning_rate)  # available config fields depend on the algorithm you use
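
To give some intuition for what the RMSprop solver chosen above does, here is a minimal pure-Python sketch of the commonly used RMSprop update rule for a single scalar parameter. The function name `rmsprop_step` and the hyperparameter values are illustrative assumptions, and nnabla's own implementation may differ in detail.

```python
import math


def rmsprop_step(w, grad, v, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSprop update for a scalar parameter (illustrative sketch only).

    v is a running average of squared gradients; the step is the gradient
    divided by the square root of that average, scaled by lr.
    """
    v_new = decay * v + (1.0 - decay) * grad * grad
    w_new = w - lr * grad / (math.sqrt(v_new) + eps)
    return w_new, v_new


# Toy problem (assumption for illustration): minimize f(w) = w^2, whose gradient is 2w.
w, v = 1.0, 0.0
for _ in range(2000):
    grad = 2.0 * w
    w, v = rmsprop_step(w, grad, v, lr=0.01)

print(w)  # ends up close to the minimum at 0
```

Because the gradient is normalized by its recent magnitude, the effective step size stays roughly on the order of `lr`, regardless of the gradient's scale.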

## Set up the DDPG algorithm

We are almost ready to start training. Finally, let's set up the DDPG algorithm.  
Here, we pass the SolverBuilder we just implemented to replace the default solvers of both the critic and the actor.

In [None]:
config = A.DDPGConfig(start_timesteps=200)

In [None]:
ddpg = A.DDPG(
    env_or_env_info=env,
    config=config,
    critic_solver_builder=MySolverBuilder(),
    actor_solver_builder=MySolverBuilder()
)
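
One plausible reason the algorithm accepts builders rather than ready-made solver objects is that each model can then get its own freshly built solver with independent internal state. The following pure-Python sketch illustrates that idea. `ToySolver`, `ToySolverBuilder`, and `ToyAlgorithm` are hypothetical stand-ins and not part of nnabla-rl, and this is not how the library is actually implemented.

```python
class ToySolver:
    """Hypothetical stand-in for an optimizer that keeps per-model state."""

    def __init__(self, lr):
        self.lr = lr
        self.state = {}  # e.g. running averages, step counters


class ToySolverBuilder:
    """Hypothetical builder: returns a fresh solver on every call."""

    def build_solver(self, lr):
        return ToySolver(lr=lr)


class ToyAlgorithm:
    """Hypothetical algorithm that asks its builders for solvers."""

    def __init__(self, critic_solver_builder, actor_solver_builder, lr=1e-3):
        self.critic_solver = critic_solver_builder.build_solver(lr)
        self.actor_solver = actor_solver_builder.build_solver(lr)


# Even when the same builder object is reused, each model gets its own solver.
builder = ToySolverBuilder()
algo = ToyAlgorithm(builder, builder)
algo.critic_solver.state["step"] = 1  # only the critic's solver state changes

print(algo.critic_solver is algo.actor_solver)
print(algo.actor_solver.state)
```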

## Preparation of Hook (optional)

We add a RenderHook so that we can visually check the training progress. This step is optional.

In [None]:
render_hook = RenderHook(env=env)

In [None]:
ddpg.set_hooks([render_hook])

## Run the training

Training takes about 10-20 minutes.  
By the end, you should see the agent swinging up the pendulum.

In [None]:
ddpg.train(env, total_iterations=50000)