# Using the saved agent

<img src="images/restore/restore.png" width="500"></img>

## Step 1: Restoring the agent from the checkpoint

1. **Find the trainer class** for your algorithm in its implementation (linked from the [`rllib` algorithms page](https://docs.ray.io/en/master/rllib-algorithms.html)).
2. Import the trainer class.
3. Create an untrained agent by instantiating the trainer class. **Use the same configuration as the training experiment**, since the restored weights may not fit a model built from a different configuration.
4. Restore the agent's state from the checkpoint file.

In [1]:
import ray

ray.init()

{'node_ip_address': '192.168.0.90',
 'raylet_ip_address': '192.168.0.90',
 'redis_address': '192.168.0.90:6379',
 'object_store_address': '/tmp/ray/session_2022-01-04_13-12-43_027975_3647/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2022-01-04_13-12-43_027975_3647/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2022-01-04_13-12-43_027975_3647',
 'metrics_export_port': 59211,
 'node_id': '2f158cedc42e83d5aeb34f494af1533cf982fde785a7349fc0a2b2d0'}

In [2]:
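from ray.rllib.agents.ppo.ppo import PPOTrainer

# Build an untrained PPO agent with the same config that was used for training.
agent = PPOTrainer(
    config={
        "env": "CartPole-v1",
        "evaluation_interval": 2,
        "evaluation_num_episodes": 20,
    }
)
# Load the trained weights and trainer state from the saved checkpoint.
agent.restore("../18_saving_the_trained_agent/cartpole_v1/PPO/PPO_CartPole-v1_788b8_00000_0_2022-01-03_16-26-08/checkpoint_000016/checkpoint-16")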

2022-01-04 13:12:48,373	INFO trainer.py:722 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also want to then set `eager_tracing=True` in order to reach similar execution speed as with static-graph mode.
2022-01-04 13:12:48,374	INFO ppo.py:166 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
2022-01-04 13:12:48,375	INFO trainer.py:743 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
2022-01-04 13:12:52,231	INFO trainable.py:467 -- Restored on 192.168.0.90 from checkpoint: ../18_saving_the_trained_agent/cartpole_v1/PPO/PPO_CartPole-v1_788b8_00000_0_2022-01-03_16-26-08/checkpoint_000016/checkpoint-16
2022-01-04 13:12:52,233	INFO trainable.py:475 -- Current state after restoring: {'_iteration': 16, '_timesteps_total': 0, 
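
Retyping the configuration by hand makes it easy to drift from the settings used in training. As a minimal sketch, assuming Tune wrote a `params.pkl` file into the trial directory (two levels above the checkpoint file, as in the path used above), the config could be read back instead. The helper name `load_experiment_config` is hypothetical, and the file layout is an assumption that may not hold for every Ray version.

```python
import pickle
from pathlib import Path
from typing import Any, Dict


def load_experiment_config(checkpoint_file: str) -> Dict[str, Any]:
    """Load the config saved next to a trial's checkpoints.

    Assumes the layout <trial_dir>/checkpoint_NNNNNN/checkpoint-N and that
    <trial_dir>/params.pkl exists; neither is guaranteed for every setup.
    """
    # checkpoint file -> checkpoint directory -> trial directory
    trial_dir = Path(checkpoint_file).resolve().parent.parent
    with open(trial_dir / "params.pkl", "rb") as f:
        return pickle.load(f)
```

The returned dictionary could then be passed as `PPOTrainer(config=...)` before calling `agent.restore()`. Only unpickle files that you created yourself, because loading a pickle can execute arbitrary code.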

## Step 2: Use the agent

- At each step, compute the action for the current observation (according to the **trained policy**) using the `agent.compute_action()` method, and pass it to `env.step()` until the episode ends.

In [3]:
import gym

env = gym.make("CartPole-v1")
obs = env.reset()
done = False
while not done:
    # Ask the restored agent for an action, then advance the environment.
    action = agent.compute_action(obs)
    obs, reward, done, _ = env.step(action)
    env.render()
env.close()

Failed to establish dbus connection
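
The loop above discards the rewards, so it shows the agent acting but not how well it does. A minimal sketch of a rollout helper that also returns the episode return is given below. The name `run_episode` and the `max_steps` safety cap are hypothetical, and the code assumes the same classic Gym API as the cell above, where `step()` returns `(obs, reward, done, info)`.

```python
from typing import Any, Callable


def run_episode(env: Any, policy: Callable[[Any], Any], max_steps: int = 1000) -> float:
    """Roll out one episode and return the sum of its rewards.

    `policy` maps an observation to an action, e.g. agent.compute_action.
    """
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total_reward += reward
        if done:
            break
    return total_reward
```

With the restored agent this could be called as `run_episode(env, agent.compute_action)`. Since `CartPole-v1` gives a reward of 1 per step and ends episodes after at most 500 steps, a return of 500 means the pole stayed up for the whole episode.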

## Making videos of the agent in action

- Wrap the `env` in the `gym.wrappers.RecordVideo` class.
    - Pass the directory where the videos should be written as the second argument.

In [4]:
from gym.wrappers import RecordVideo

# Videos are written to the "ppo_video" directory.
env = RecordVideo(gym.make("CartPole-v1"), "ppo_video")
obs = env.reset()
done = False
while not done:
    action = agent.compute_action(obs)
    obs, reward, done, _ = env.step(action)
# Closing the env finalizes the video file.
env.close()
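
When several episodes are run through the same wrapped environment, it can help to control which ones are recorded. The sketch below assumes a gym version whose `RecordVideo` accepts an `episode_trigger` callable that takes the episode index and returns whether to record it. The helpers `every_nth_episode` and `make_recording_env` are hypothetical names introduced for illustration.

```python
from typing import Callable


def every_nth_episode(n: int) -> Callable[[int], bool]:
    """Return a trigger that is True for episodes 0, n, 2n, ..."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return lambda episode_id: episode_id % n == 0


def make_recording_env(env_name: str, video_dir: str, n: int):
    """Wrap an env so that every n-th episode is written to video_dir."""
    # Imported lazily so the trigger above can be used without gym installed.
    import gym
    from gym.wrappers import RecordVideo

    return RecordVideo(
        gym.make(env_name), video_dir, episode_trigger=every_nth_episode(n)
    )
```

For example, `make_recording_env("CartPole-v1", "ppo_video", 5)` would record episodes 0, 5, 10, and so on, under the assumption stated above.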