## Swinging up a pendulum using DDPG
Let's learn how to implement DDPG for the pendulum swing-up task using Stable
Baselines. First, let's import the necessary libraries:

In [None]:
import gym
import numpy as np

from stable_baselines.ddpg.policies import MlpPolicy
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise, AdaptiveParamNoiseSpec
from stable_baselines import DDPG

Create the pendulum environment using gym:

In [None]:
env = gym.make('Pendulum-v0')

Get the dimension of the action space (the pendulum has a single continuous action, the applied torque):

In [3]:
n_actions = env.action_space.shape[-1]

We know that in DDPG, the actor outputs a deterministic action, so instead of executing that action directly, we add some noise generated by the Ornstein-Uhlenbeck process to ensure exploration. So, we create the action noise as:

In [4]:
action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=float(0.5) * np.ones(n_actions))
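To build some intuition for what this noise object produces, here is a minimal pure-Python sketch of a discretized Ornstein-Uhlenbeck process. It is not the Stable Baselines implementation: the class name `OUNoise` is hypothetical, and the values `theta=0.15` and `dt=1e-2` are assumptions chosen for illustration. At each call the noise is pulled back toward the mean and then perturbed by a Gaussian step, so consecutive samples are correlated in time.

```python
import math
import random


class OUNoise:
    """Illustrative Ornstein-Uhlenbeck noise, one value per action dimension."""

    def __init__(self, mean, sigma, theta=0.15, dt=1e-2, seed=None):
        self.mean = list(mean)
        self.sigma = list(sigma)
        self.theta = theta  # assumed mean-reversion rate
        self.dt = dt        # assumed time step
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Start from zero noise, e.g. at the beginning of an episode.
        self.state = [0.0] * len(self.mean)

    def __call__(self):
        # Euler-Maruyama step: drift toward the mean plus a scaled Gaussian kick.
        self.state = [
            x + self.theta * (mu - x) * self.dt
            + s * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x, mu, s in zip(self.state, self.mean, self.sigma)
        ]
        return list(self.state)


noise = OUNoise(mean=[0.0], sigma=[0.5], seed=0)
samples = [noise()[0] for _ in range(5)]
print(samples)  # successive values drift instead of jumping independently
```

In DDPG, a sample like this would be added to the actor's output before the action is sent to the environment.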

Instantiate the agent:

In [None]:
agent = DDPG(MlpPolicy, env, verbose=1, param_noise=None, action_noise=action_noise)

Now, we can train the agent as usual:

In [6]:
agent.learn(total_timesteps=25000)

<stable_baselines.ddpg.ddpg.DDPG at 0x7f4f1801f7f0>

After training, we can evaluate our agent by looking at the mean reward over 10 episodes:

In [7]:
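# Depending on the Stable Baselines version, the second return value is either
# the number of steps or the standard deviation of the episode rewards
mean_reward, n_steps = evaluate_policy(agent, agent.get_env(), n_eval_episodes=10)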
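Conceptually, this kind of evaluation runs the policy for several episodes and averages the total reward per episode. The sketch below shows that idea on its own. It is not the library's `evaluate_policy`: `ToyEnv` and `mean_episode_reward` are hypothetical names, and the environment is assumed to follow the older gym interface, where `reset()` returns a state and `step()` returns a 4-tuple.

```python
class ToyEnv:
    """Hypothetical stand-in environment: 5 steps per episode, reward -1 per step."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= 5
        return float(self.t), -1.0, done, {}


def mean_episode_reward(predict, env, n_episodes=10):
    """Average the undiscounted return over n_episodes."""
    returns = []
    for _ in range(n_episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            action = predict(state)
            state, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)


# A constant policy is enough to exercise the loop on the toy environment.
result = mean_episode_reward(lambda state: 0.0, ToyEnv(), n_episodes=10)
print(result)  # -5.0
```

With the trained agent, `predict` could be something like `lambda s: agent.predict(s)[0]`, since `agent.predict` returns the action together with the policy states.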

We can also have a look at how our trained agent performs in the environment:


In [None]:
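state = env.reset()
# Runs until interrupted; stop the kernel to end the rendering
while True:
    action, _states = agent.predict(state)
    next_state, reward, done, info = env.step(action)
    state = next_state
    env.render()
    # Start a new episode once the current one ends
    if done:
        state = env.reset()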

By rendering the environment, we can watch how our trained agent swings up the
pendulum. Can we also look at the computational graph of DDPG? Yes! In the next
section, we will learn how to do that.

## Viewing the computational graph in TensorBoard

With Stable Baselines, it is easy to view the computational graph of our model in
TensorBoard. To do that, we just need to pass the directory where the log files should be
stored while instantiating the agent, as shown below:

In [None]:
agent = DDPG(MlpPolicy, env, verbose=1, param_noise=None, action_noise=action_noise, tensorboard_log="logs")

Then, we can train the agent:

In [9]:
agent.learn(total_timesteps=25000)

<stable_baselines.ddpg.ddpg.DDPG at 0x7f4f10154898>

After training, open the terminal and run the following command to launch TensorBoard:


`tensorboard --logdir logs`