**To use this Jupyter notebook, make sure the following are installed (in the order listed):**
1. Python3
2. Matplotlib
3. Numpy
4. Scipy
5. tqdm
6. MuJoCo (install it before installing the Realworld RL Suite)
7. The Realworld RL Suite
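
As a quick sanity check of the list above, the following sketch reports which of the Python packages cannot be found on the import path. The import names in `REQUIRED_MODULES` are assumptions based on the usual package names, and `find_missing` is a hypothetical helper, not part of the suite. MuJoCo is left out because its import name depends on which binding is installed.

```python
import importlib.util

# Assumed import names for the packages listed above; adjust if yours differ.
REQUIRED_MODULES = ['matplotlib', 'numpy', 'scipy', 'tqdm', 'realworldrl_suite']


def find_missing(modules):
  """Returns the top-level module names that cannot be located."""
  return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print('Missing modules:', missing if missing else 'none')
```

An empty result only means the modules can be located; it does not verify their versions or that MuJoCo itself is configured.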

**We recommend using the `realworldrl_venv` virtual environment that you created when installing the `realworldrl_suite` package. To register it as a Jupyter kernel, you may need to run the following commands:**

```
pip3 install --user ipykernel
python3 -m ipykernel install --user --name=realworldrl_venv
```

Then, in this notebook, click 'Kernel' in the menu, click 'Change Kernel', and select `realworldrl_venv`.

**Note**: You may need to restart the Jupyter kernel to see the updated virtual environment in the Kernel list.
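
To confirm that the selected kernel really runs inside the virtual environment, a small check like the one below can be run in a cell. It is a minimal sketch that relies only on the standard library; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
  """Returns True when the interpreter runs inside a venv or virtualenv."""
  # Inside a virtual environment, sys.prefix points at the environment while
  # sys.base_prefix points at the base Python installation.
  return sys.prefix != getattr(sys, 'base_prefix', sys.prefix)


print('Interpreter:', sys.executable)
print('Inside a virtual environment:', in_virtualenv())
```

If the printed interpreter path does not point into `realworldrl_venv`, the kernel is probably still the default one.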


**Import the necessary libraries**

In [0]:
#@title 
# Copyright 2020 Google LLC.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

# https://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import matplotlib.pyplot as plt
import numpy as np
import tqdm

import collections
import os  # Needed for os.path.join when building the log path below.

import realworldrl_suite.environments as rwrl
from realworldrl_suite.utils import evaluators

**Define the environment and the policy**

In [0]:
total_episodes = 1000  # The analysis tools require at least 100 episodes.
domain_name = 'cartpole'
task_name = 'realworld_swingup'

# Define the challenge dictionaries
safety_spec_dict = dict(enable=True, safety_coeff=0.5)
delay_spec_dict = dict(enable=True, actions=20)

log_safety_violations = True

def random_policy(action_spec):

  def _act(timestep):
    del timestep
    return np.random.uniform(low=action_spec.minimum,
                             high=action_spec.maximum,
                             size=action_spec.shape)
  return _act


env = rwrl.load(
    domain_name=domain_name,
    task_name=task_name,
    safety_spec=safety_spec_dict,
    delay_spec=delay_spec_dict,
    log_output=os.path.join('/tmp/', 'log.npz'),
    environment_kwargs=dict(log_safety_vars=log_safety_violations, 
                            flat_observation=True,
                            log_every=10))

policy = random_policy(action_spec=env.action_spec())
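
The random policy above can be tried without MuJoCo by feeding it a stand-in action spec. In the sketch below, `FakeActionSpec` is a hypothetical placeholder that only mimics the `shape`, `minimum`, and `maximum` attributes used by `random_policy`; it is not the spec class returned by `env.action_spec()`.

```python
import collections

import numpy as np

# Hypothetical stand-in exposing only the attributes random_policy reads.
FakeActionSpec = collections.namedtuple(
    'FakeActionSpec', ['shape', 'minimum', 'maximum'])


def random_policy(action_spec):
  """Returns a policy that samples uniformly within the action bounds."""

  def _act(timestep):
    del timestep  # A random policy ignores the observation.
    return np.random.uniform(low=action_spec.minimum,
                             high=action_spec.maximum,
                             size=action_spec.shape)
  return _act


spec = FakeActionSpec(shape=(1,), minimum=-1.0, maximum=1.0)
policy = random_policy(spec)

# Sample many actions and check that they stay within the bounds.
actions = np.stack([policy(None) for _ in range(1000)])
print(actions.shape, actions.min(), actions.max())
```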

**Run the main loop**

In [0]:
rewards = []
episode_counter = 0
for _ in tqdm.tqdm(range(total_episodes)):
    timestep = env.reset()
    total_reward = 0.
    while not timestep.last():
        action = policy(timestep)
        timestep = env.step(action)
        total_reward += timestep.reward
    rewards.append(total_reward)
    episode_counter += 1
print('Random policy total reward per episode: {:.2f} +- {:.2f}'.format(
    np.mean(rewards), np.std(rewards)))

In [0]:
f = open(env.logs_path, "rb")   
stats = np.load(f, allow_pickle=True)
evaluator = evaluators.Evaluators(stats)

**Plot the average return as a function of the number of episodes**

In [0]:
fig = evaluator.get_return_plot()
plt.show()
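
To build intuition for an average-return curve, the sketch below smooths a sequence of per-episode returns with a sliding-window mean. This is an illustration with made-up data, not the computation performed by `evaluator.get_return_plot()`; `running_mean` and its `window` parameter are hypothetical.

```python
import numpy as np


def running_mean(values, window):
  """Returns the mean of each full window of `window` consecutive values."""
  values = np.asarray(values, dtype=float)
  kernel = np.ones(window) / window
  # mode='valid' keeps only positions where the window fits entirely.
  return np.convolve(values, kernel, mode='valid')


# Made-up returns for ten episodes: 0, 1, ..., 9.
episode_returns = np.arange(10, dtype=float)
smoothed = running_mean(episode_returns, window=5)
print(smoothed)
```

With ten episodes and a window of five, the result has six entries, one per full window.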

**Compute regret**

In [0]:
fig = evaluator.get_convergence_plot()
plt.show()

**Compute instability**

In [0]:
fig = evaluator.get_stability_plot()
plt.show()

**Safety violations plot (left figure) and the mean evolution of safety constraint violations during an episode (right figure)**

In [0]:
fig = evaluator.get_safety_plot()
plt.show()

**Multiple training seeds can be aggregated.**

In [0]:
# We emulate multiple runs by copying the same logs with added noise.

all_evaluators = []
for _ in range(10):
  another_evaluator = evaluators.Evaluators(stats)
  v = another_evaluator.stats['return_stats']['episode_totals']
  v += np.random.randn(*v.shape) * 100.
  all_evaluators.append(another_evaluator)

**Normalized regret across all runs.**

In [0]:
evaluators.get_regret_plot(all_evaluators)
plt.show()

**Return across all runs.**

In [0]:
evaluators.get_return_plot(all_evaluators, stride=500)
plt.show()
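
Aggregating several runs usually amounts to stacking the per-episode returns of each run and reducing across the run axis. The sketch below shows this with synthetic data; it assumes nothing about how `evaluators.get_return_plot` aggregates internally, and all names in it are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-episode returns shared by every emulated run.
base_returns = np.linspace(0., 100., num=50)

# Emulate 10 runs by adding independent Gaussian noise to the same curve.
runs = np.stack([
    base_returns + rng.normal(scale=10., size=base_returns.shape)
    for _ in range(10)
])

# Reduce across runs (axis 0) to get one mean and one spread per episode.
mean_return = runs.mean(axis=0)
std_return = runs.std(axis=0)
print(runs.shape, mean_return.shape, std_return.shape)
```

The mean and standard deviation can then be drawn as a line with a shaded band, for example with `plt.fill_between`.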

**Additional useful functions**

Multi-objective runs can be analyzed using:

```
evaluator.get_multiobjective_plot()  # For a single run.
evaluators.get_multiobjective_plot(all_evaluators)  # For multiple runs.
```