# Hands-on RL with Ray’s RLlib
## A beginner’s tutorial for working with multi-agent environments, models, and algorithms

### Overview
“Hands-on RL with Ray’s RLlib” is a beginner’s tutorial for working with reinforcement learning (RL) environments, models, and algorithms using Ray’s RLlib library. RLlib offers high scalability, a large list of algorithms to choose from (offline, model-based, model-free, etc.), support for TensorFlow and PyTorch, and a unified API for a variety of applications. This tutorial starts with a brief introduction to the core concepts (e.g. why RL?) before moving on to RLlib's (multi- and single-agent) environments, neural network models, hyperparameter tuning, debugging, student exercises, Q&A, and more. All code will be provided as .py files in a GitHub repo.

### Intended Audience
* Python programmers who want to get started with reinforcement learning and RLlib.

### Prerequisites
* Some Python programming experience.
* Some familiarity with machine learning.
* *Helpful, but not required:* Experience in reinforcement learning and Ray.
* *Helpful, but not required:* Experience with TensorFlow or PyTorch.

### Requirements/Dependencies

Install conda (https://www.anaconda.com/products/individual)

Then ...

#### Quick `conda` setup instructions (Mac and Linux):
```
$ conda create -n rllib python=3.8
$ conda activate rllib
$ pip install "ray[rllib]"
$ pip install tensorflow  # <- or: pip install torch (either one works!)
$ pip install jupyterlab
```

#### Quick `conda` setup instructions (Win10):
```
$ conda create -n rllib python=3.8
$ conda activate rllib
$ pip install "ray[rllib]"
$ pip install tensorflow  # <- or: pip install torch (either one works!)
$ pip install jupyterlab
$ conda install pywin32
```

Also, for Win10 Atari support, we have to install atari_py from a different source (gym does not support Atari envs on Windows).

```
$ pip install git+https://github.com/Kojoley/atari-py.git
```

### Opening these tutorial files:
```
$ git clone https://github.com/sven1977/rllib_tutorials
$ cd rllib_tutorials
$ jupyter-lab
```

### Key Takeaways
* What is reinforcement learning and why RLlib?
* Core concepts of RLlib: Environments, Trainers, Policies, and Models.
* How to configure, hyperparameter-tune, and parallelize RLlib.
* RLlib debugging best practices.

### Tutorial Outline
1. RL and RLlib in a nutshell.
1. Defining an RL-solvable problem: Our first environment.
1. Exercise No.1 (env loop)
1. Picking an algorithm and training our first RLlib Trainer.
1. Configurations and hyperparameters - Easy tuning with Ray Tune.
1. Fixing our experiment's config - Going multi-agent.
1. The "infinite laptop": Quick intro to using RLlib with Anyscale's product.
1. Exercise No.2 (run your own Ray RLlib+Tune experiment)
1. Neural network models - Provide your custom models using tf.keras or torch.nn.
1. Deeper dive into RLlib's parallelization architecture.
1. Specifying different compute resources and parallelization options through our config.
1. "Hacking in": Using callbacks to customize the RL loop and generate our own metrics.
1. Exercise No.3 (write your own custom callback)
1. "Hacking in (part II)" - Debugging with RLlib and PyCharm.
1. Checking on the "infinite laptop" - Did RLlib learn to solve the problem?

### Other Recommended Readings
* [Attention Nets and More with RLlib's Trajectory View API](https://medium.com/distributed-computing-with-ray/attention-nets-and-more-with-rllibs-trajectory-view-api-d326339a6e65)
* [Intro to RLlib: Example Environments](https://medium.com/distributed-computing-with-ray/intro-to-rllib-example-environments-3a113f532c70)
* [Reinforcement Learning with RLlib in the Unity Game Engine](https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d)


In [None]:
import numpy as np

import ray

# Start a new instance of Ray or connect to an already running one.
ray.init()
# In case you encounter this error during our tutorial:
# RuntimeError: Maybe you called ray.init twice by accident?
# Try: ray.shutdown() or ray.init(ignore_reinit_error=True)

<img src="images/rl-cycle.png" width=1200>

### Coding/defining our "problem" via an RL environment.

We will use the following (adversarial) multi-agent environment
throughout this tutorial to demonstrate a large fraction of RLlib's
APIs, features, and customization options.

<img src="images/environment.png" width=800>

### A word or two on Spaces:

In ML, spaces describe the possible (valid) values that the inputs and outputs of a neural network can take.

RL environments also use them to describe what their valid observations and actions are.

Spaces are usually defined by their shape (e.g. 84x84x3 RGB images) and datatype (e.g. uint8 for RGB values between 0 and 255).
However, spaces can also be composed of other spaces (see Tuple or Dict spaces) or simply be discrete with n fixed possible values
(represented by the integers 0 to n-1). For example, in our game, where each agent can only move up/down/left/right, the action space is "Discrete(4)"
(neither a datatype nor a shape needs to be defined here).

<img src="images/spaces.png" width=800>
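To make the idea concrete, here is a minimal, stdlib-only sketch of what a space "knows": which values are valid (`contains`) and how to draw a random valid value (`sample`). `ToyDiscrete` and `ToyMultiDiscrete` are hypothetical simplifications written for illustration, not gym's implementation; the real classes live in `gym.spaces` and additionally carry `shape` and `dtype` information.

```python
import random


class ToyDiscrete:
    """Hypothetical, simplified stand-in for gym's Discrete(n): valid values are the ints 0..n-1."""

    def __init__(self, n):
        self.n = n

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n

    def sample(self):
        return random.randrange(self.n)


class ToyMultiDiscrete:
    """Hypothetical, simplified stand-in for gym's MultiDiscrete(nvec): slot i holds an int in 0..nvec[i]-1."""

    def __init__(self, nvec):
        self.nvec = list(nvec)

    def contains(self, x):
        return len(x) == len(self.nvec) and all(
            isinstance(v, int) and 0 <= v < n for v, n in zip(x, self.nvec)
        )

    def sample(self):
        return [random.randrange(n) for n in self.nvec]


# Our game: 4 actions (0=up, 1=right, 2=down, 3=left) and, as observation,
# both agents' positions on a 10x10 grid, each encoded as one int in 0..99.
action_space = ToyDiscrete(4)
observation_space = ToyMultiDiscrete([10 * 10, 10 * 10])

print(action_space.contains(3))              # True
print(action_space.contains(4))              # False
print(observation_space.contains([0, 99]))   # True
print(observation_space.contains([0, 100]))  # False
```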

In [None]:
import gym
from gym.spaces import Discrete, MultiDiscrete
import random

from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MultiAgentArena(MultiAgentEnv):
    def __init__(self, config=None):
        # !LIVE CODING!
        config = config or {}
        self.width = config.get("width", 10)
        self.height = config.get("height", 10)

        # 0=up, 1=right, 2=down, 3=left.
        self.action_space = Discrete(4)
        self.observation_space = MultiDiscrete([self.width * self.height,
                                                self.width * self.height])
        # End an episode after this many timesteps.
        self.timestep_limit = config.get("ts", 100)
        # Reset env.
        self.reset()

    def reset(self):
        # !LIVE CODING!
        # Row-major coords.
        self.agent1_pos = [0, 0]
        self.agent2_pos = [self.height - 1, self.width - 1]
        # Reset agent1's visited states.
        self.agent1_visited_states = set()
        # How many timesteps have we done in this episode.
        self.timesteps = 0

        return self.get_obs()

    def step(self, action: dict):
        # !LIVE CODING!
        self.timesteps += 1
        # Determine who is allowed to move first.
        agent1_first = random.random() > 0.5
        # Move first agent (could be agent 1 or 2).
        if agent1_first:
            r1, r2 = self.move(self.agent1_pos, action["agent1"], is_agent1=True)
            add = self.move(self.agent2_pos, action["agent2"], is_agent1=False)
        else:
            r1, r2 = self.move(self.agent2_pos, action["agent2"], is_agent1=False)
            add = self.move(self.agent1_pos, action["agent1"], is_agent1=True)
        r1 += add[0]
        r2 += add[1]

        obs = self.get_obs()

        reward = {"agent1": r1, "agent2": r2}

        done = self.timesteps >= self.timestep_limit
        done = {"agent1": done, "agent2": done, "__all__": done}

        return obs, reward, done, {}

    def get_obs(self):
        ag1_discrete_pos = self.agent1_pos[0] * self.width + \
            (self.agent1_pos[1] % self.width)
        ag2_discrete_pos = self.agent2_pos[0] * self.width + \
            (self.agent2_pos[1] % self.width)
        return {
            "agent1": np.array([ag1_discrete_pos, ag2_discrete_pos]),
            "agent2": np.array([ag2_discrete_pos, ag1_discrete_pos]),
        }

    def move(self, coords, action, is_agent1):
        orig_coords = coords[:]
        # Change the row: 0=up (-1), 2=down (+1)
        coords[0] += -1 if action == 0 else 1 if action == 2 else 0
        # Change the column: 1=right (+1), 3=left (-1)
        coords[1] += 1 if action == 1 else -1 if action == 3 else 0

        # Solve collisions.
        # Make sure we don't end up on the other agent's position.
        # If yes, don't move (we are blocked).
        if (is_agent1 and coords == self.agent2_pos) or (not is_agent1 and coords == self.agent1_pos):
            coords[0], coords[1] = orig_coords
            # Agent2 blocked agent1 (agent1 tried to run into agent2)
            # OR Agent2 bumped into agent1 (agent2 tried to run into agent1)
            # -> +1 for agent2; -1 for agent1
            return -1.0, 1.0

        # No agent blocking -> check walls.
        if coords[0] < 0:
            coords[0] = 0
        elif coords[0] >= self.height:
            coords[0] = self.height - 1
        if coords[1] < 0:
            coords[1] = 0
        elif coords[1] >= self.width:
            coords[1] = self.width - 1

        # If agent1 -> +1.0 if new tile covered.
        if is_agent1 and not tuple(coords) in self.agent1_visited_states:
            self.agent1_visited_states.add(tuple(coords))
            return 1.0, -0.1
        # No new tile for agent1 -> Negative reward.
        return -0.5, -0.1

    # Optionally: Add `render` method returning some img.
    def render(self, mode=None):
        return np.random.randint(0, 256, (20, 20, 3), dtype=np.uint8)
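Each agent's observation is a `MultiDiscrete` pair of positions: `get_obs()` flattens a row-major `(row, col)` coordinate into a single integer via `row * width + col` (the `% self.width` there is a no-op, since columns are already clipped to the grid). The sketch below shows that encoding and its inverse; the `encode`/`decode` helper names are hypothetical, and the environment itself never needs to decode.

```python
def encode(row, col, width):
    """Map row-major (row, col) grid coordinates to a single discrete index."""
    return row * width + col


def decode(index, width):
    """Invert `encode`: recover (row, col) from a discrete index."""
    return divmod(index, width)


width, height = 10, 10
# Start positions as set in `MultiAgentArena.reset()`.
agent1_start = encode(0, 0, width)                   # top-left corner
agent2_start = encode(height - 1, width - 1, width)  # bottom-right corner
print(agent1_start, agent2_start)  # 0 99
print(decode(57, width))           # (5, 7)
```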

## Exercise No 1

<hr />

Write an "environment loop" using our `MultiAgentArena` class.

1. Create an env object.
1. `reset` your environment to get the first (initial) observation.
1. `step` through the environment using the provided
   `DummyTrainer.compute_action(obs)` method to compute action dicts (see the cell below, in which you can create a `DummyTrainer` object and query it for random actions).
1. When an episode is done, remember to `reset()` your environment before the next call to `step()`.
1. If this feels way too easy for you ;), try to extract each agent's reward, sum it up over one episode, and, at the end of an episode (when done=True), print out each agent's accumulated reward (also called "return").

**Good luck! :)**


In [None]:
class DummyTrainer:
    """Dummy Trainer class used in Exercise #1.

    Use its `compute_action` method to get a new action, given some environment
    observation.
    """

    def compute_action(self, obs):
        # Returns a random action.
        return {
            "agent1": np.random.randint(4),
            "agent2": np.random.randint(4)
        }

dummy_trainer = DummyTrainer()
# Check whether it's working.
for _ in range(3):
    print(dummy_trainer.compute_action({"agent1": np.array([0, 10]), "agent2": np.array([10, 0])}))

In [None]:
# Solution to Exercise #1:
#from gym.envs.classic_control.rendering import SimpleImageViewer
#simple_image_viewer = SimpleImageViewer()

env = MultiAgentArena(config={"width": 10, "height": 10})
obs = env.reset()
# Play through 10 episodes.
done = {"__all__": False}
return_ag1 = return_ag2 = 0.0
num_episodes = 0
while num_episodes < 10:
    action = dummy_trainer.compute_action(obs)
    obs, rewards, done, _ = env.step(action)
    return_ag1 += rewards["agent1"]
    return_ag2 += rewards["agent2"]    
    if done["__all__"]:
        print(f"Episode done. R1={return_ag1} R2={return_ag2}")
        num_episodes += 1
        return_ag1 = return_ag2 = 0.0
        obs = env.reset()
    # Optional:
    #img = env.render()
    #simple_image_viewer.imshow(img)


In [None]:
# 4) Plugging in RLlib.

# Import a Trainable (one of RLlib's built-in algorithms):
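# We use the PPO algorithm here because it's very flexible w.r.t. its supported
# action spaces and model types, and because it tends to learn well on a wide range of problems.
from ray.rllib.agents.ppo import PPOTrainer

# Specify a very simple config, defining our environment and some environment
# options. RLlib hands the `env_config` dict to our env's constructor as its
# `config` argument, so "width"/"height" must sit at the top level of
# `env_config` (nesting them under another key would make the env silently
# fall back to its defaults).
config = {
    "env": MultiAgentArena,
    "env_config": {
        "width": 10,
        "height": 10,
    },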
    # "framework": "torch",
    "create_env_on_driver": True,
}
# Instantiate the Trainer object using above config.
rllib_trainer = PPOTrainer(config=config)


In [None]:
# That's it, we are ready to train.
# Calling `train` once runs a single "training iteration". One iteration
# for most algos contains a) sampling from the environment(s) + b) using the
# sampled data (observations, actions taken, rewards) to update the policy
# model (neural network), such that it would pick better actions in the future,
# leading to higher rewards.
print(rllib_trainer.train())

In [None]:
# Run `train()` n times. Try to repeatedly call this to see rewards increase.
# Move on once you see episode rewards > -55.0.
for i in range(10):
    results = rllib_trainer.train()
    print(f"iteration {i}: R={results['episode_reward_mean']}")

In [None]:
# !LIVE CODING!
# Let's actually "look inside" our Trainer to see what's in there.
pol = rllib_trainer.get_policy()
print(f"Policy: {pol}; Observation-space: {pol.observation_space}; Action-space: {pol.action_space}")

print(f"Model: {pol.model}")

# Create a fake numpy B=1 (single) observation consisting of both agents' positions ("one-hot'd" and "concat'd").
from ray.rllib.utils.numpy import one_hot
single_obs = np.concatenate([one_hot(0, depth=100), one_hot(99, depth=100)])
single_obs = np.array([single_obs])
#single_obs.shape

# Generate the Model's output.
out, state_out = pol.model({"obs": single_obs})

# tf1.x (static graph) -> Need to run this through a tf session.
numpy_out = pol._sess.run(out)

# RLlib then passes the model's output to the policy's "action distribution" to sample an action.
action_dist = pol.dist_class(out)
action = action_dist.sample()

# Show us the actual action.
pol._sess.run(action)
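The fake observation above follows the "one-hot'd and concat'd" format the model sees for our `MultiDiscrete([100, 100])` observation space: each component becomes a one-hot vector of length 100, and the two vectors are concatenated into 200 inputs. A plain-Python sketch of that transformation (`one_hot` here is a hypothetical list-based re-implementation, not `ray.rllib.utils.numpy.one_hot`; `flatten_multi_discrete` is likewise hypothetical):

```python
def one_hot(index, depth):
    """Return a list of length `depth` with 1.0 at `index` and 0.0 elsewhere."""
    vec = [0.0] * depth
    vec[index] = 1.0
    return vec


def flatten_multi_discrete(obs, nvec):
    """One-hot each component of a MultiDiscrete observation and concatenate them."""
    out = []
    for value, depth in zip(obs, nvec):
        out.extend(one_hot(value, depth))
    return out


# agent1's view at episode start: itself at index 0, agent2 at index 99.
flat = flatten_multi_discrete([0, 99], [100, 100])
print(len(flat))           # 200
print(flat.index(1.0))     # 0
print(flat.index(1.0, 1))  # 199 (= 100 + 99)
```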

In [None]:
# Save our trainer.
checkpoint_path = rllib_trainer.save()
print(f"Trainer was saved in '{checkpoint_path}'!")

import os
os.listdir(os.path.dirname(checkpoint_path))

In [None]:
# Suppose we wanted to pick up training from a previous run:
new_trainer = PPOTrainer(config=config)
# Evaluate the new trainer (this should yield random results).
results = new_trainer._evaluate()
print(f"Evaluating new trainer: R={results['evaluation']['episode_reward_mean']}")

# Restoring the trained state into the `new_trainer` object.
new_trainer.restore(checkpoint_path)

# Evaluate again (this should yield results we saw after having trained our saved agent).
results = new_trainer._evaluate()
print(f"Evaluating restored trainer: R={results['evaluation']['episode_reward_mean']}")

In [None]:
# 5) Configuration dicts and Ray Tune.
# Where are the default configuration dicts stored?
import pprint
from ray.rllib.agents.ppo import DEFAULT_CONFIG as PPO_DEFAULT_CONFIG
print(f"PPO's default config is:")
pprint.pprint(PPO_DEFAULT_CONFIG)

#from ray.rllib.agents.dqn import DEFAULT_CONFIG as DQN_DEFAULT_CONFIG
#print(f"DQN's default config is:")
#pprint.pprint(DQN_DEFAULT_CONFIG)

#from ray.rllib.agents.trainer import COMMON_CONFIG
#print(f"RLlib Trainer's default config is:")
#pprint.pprint(COMMON_CONFIG)

In [None]:
# Plugging in Ray Tune.
# Note that this is the recommended way to run any experiments with RLlib.
# Reasons:
# - Tune allows you to do hyperparameter tuning in a user-friendly way
#   and at large scale!
# - Tune automatically allocates needed resources for the different
#   hyperparam trials and experiment runs.

from ray import tune

# Now that we will run things "automatically" through tune, we have to
# define one or more stopping criteria.
stop = {
    # Keys can be any metric present in the result dict returned by `trainer.train()` (see the print above).
    "training_iteration": 5,
    "episode_reward_mean": 9999.9,
}

# "PPO" is a registered name that points to RLlib's PPOTrainer.
# See `ray/rllib/agents/registry.py`
# Run our simple experiment until one of the stop criteria is met.
tune.run("PPO", config=config, stop=stop)
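With a `stop` dict, a trial ends as soon as whichever criterion is reached first. The sketch below spells out that rule in plain Python, assuming Tune's "metric >= threshold" comparison for each key; `should_stop` is a hypothetical helper, not Tune's own code. Note that with `"episode_reward_mean": 9999.9`, the reward criterion is effectively unreachable, so `training_iteration` is what actually ends this run.

```python
def should_stop(result, stop):
    """Return True once any metric in `stop` has reached its threshold.

    Hypothetical illustration of dict-based stopping: each key names a metric
    in the training result, and a trial stops as soon as
    result[key] >= threshold holds for at least one key.
    """
    return any(
        result.get(key, float("-inf")) >= threshold
        for key, threshold in stop.items()
    )


stop = {"training_iteration": 5, "episode_reward_mean": 9999.9}
print(should_stop({"training_iteration": 3, "episode_reward_mean": -40.0}, stop))  # False
print(should_stop({"training_iteration": 5, "episode_reward_mean": -40.0}, stop))  # True
```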


In [None]:
# Update our config dict and add hyperparameter-tuning options to it.
# Note: Hyperparameter tuning options (e.g. grid_search) only work
# if we run these configs via `tune.run`.
config.update(
    {
        # Try 2 different learning rates.
        "lr": tune.grid_search([0.0001, 0.5]),
        # NN model config to tweak the default model
        # that'll be created by RLlib for the policy.
        "model": {
            # e.g. change the dense layer stack.
            "fcnet_hiddens": [256, 256, 256],
            # Alternatively, you can specify a custom model here
            # (we'll cover that later).
            # "custom_model": ...
            # Pass kwargs to your custom model.
            # "custom_model_config": {}
        },
    }
)
# Repeat our experiment using tune's grid-search feature.
results = tune.run(
    "PPO",
    config=config,
    stop=stop,
    checkpoint_at_end=True,  # create a checkpoint when done.
    checkpoint_freq=1,  # create a checkpoint on every iteration.
)
print(results)
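`tune.grid_search` turns a config value into a list of alternatives, and Tune creates one trial per combination of all grid-searched values (a cartesian product). The cell above has a single grid dimension (`lr`), hence 2 trials; adding a second dimension, such as the hypothetical `train_batch_size` grid below, multiplies the trial count. A plain-Python sketch of that expansion (`expand_grid` is hypothetical and only handles flat keys, unlike Tune's own implementation, which also handles nested keys and sampling functions):

```python
import itertools


def expand_grid(base_config, grid):
    """Yield one trial config per combination of grid values (cartesian product)."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        trial_config = dict(base_config)  # Shallow copy of the shared settings.
        trial_config.update(zip(keys, values))
        yield trial_config


base = {"env": "MultiAgentArena"}
grid = {"lr": [0.0001, 0.5], "train_batch_size": [2000, 3000]}
trials = list(expand_grid(base, grid))
print(len(trials))  # 4
for t in trials:
    print(t["lr"], t["train_batch_size"])
```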


In [None]:
# 6) Going multi-policy: Our experiment is ill-configured because both
# agents, which should behave differently due to their different
# tasks and reward functions, learn the same policy (the "default_policy",
# which RLlib always provides if you don't configure anything else; remember
# that at Trainer setup time, RLlib does not know how many and which agents
# the environment will "produce").
# Let's fix this and introduce the "multiagent" API.

# 6.1.) Define an agent->policy mapping function.
# Which agents (defined by the environment) use which policies
# (defined by us)? Mapping is M (agents) -> N (policies), where M >= N.
def policy_mapping_fn(agent: str):
    assert agent in ["agent1", "agent2"], f"ERROR: invalid agent {agent}!"
    return "pol1" if agent == "agent1" else "pol2"
    
# 6.2.) Define details for our two policies.
#TODO: coding Sven: Make it possible to not need obs/action spaces
#  if they are the default anyways.
observation_space = rllib_trainer.workers.local_worker().env.observation_space
action_space = rllib_trainer.workers.local_worker().env.action_space
# Btw, the above is equivalent to saying:
# >>> rllib_trainer.get_policy("default_policy").observation_space  (or `.action_space`)
policies = {
    "pol1": (None, observation_space, action_space, {"lr": 0.0003}),
    "pol2": (None, observation_space, action_space, {"lr": 0.0004}),
}

#policies_to_train = ["pol1", "pol2"]

# 6.3) Adding the above to our config.
config.update({
    "multiagent": {
        "policies": policies,
        "policy_mapping_fn": policy_mapping_fn,
        #"policies_to_train": policies_to_train,
    },
})
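Conceptually, RLlib calls `policy_mapping_fn` with each agent ID to decide which policy computes that agent's actions and learns from its experiences. The plain-Python sketch below only mimics that routing: `toy_mapping_fn` mirrors the mapping defined above, while `group_by_policy` and the reward values are made up for illustration. The second call shows the M > N case, where several agents share one (hypothetical) policy.

```python
from collections import defaultdict


def toy_mapping_fn(agent):
    # Same mapping as `policy_mapping_fn` above: agent1 -> pol1, agent2 -> pol2.
    return "pol1" if agent == "agent1" else "pol2"


def group_by_policy(agent_data, mapping_fn):
    """Hypothetical helper: collect per-agent data under each agent's policy ID."""
    grouped = defaultdict(list)
    for agent_id, data in agent_data.items():
        grouped[mapping_fn(agent_id)].append((agent_id, data))
    return dict(grouped)


# Made-up per-agent rewards for one step.
rewards = {"agent1": 0.5, "agent2": -0.2}
print(group_by_policy(rewards, toy_mapping_fn))
# {'pol1': [('agent1', 0.5)], 'pol2': [('agent2', -0.2)]}

# M=2 agents -> N=1 policy: both agents feed the same (hypothetical) policy.
print(group_by_policy(rewards, lambda agent: "shared_pol"))
```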


## Exercise No 2

<hr />

Try learning our environment using Ray tune.run and a simple hyperparameter grid_search over:
- 2 different learning rates (pick your own values).
- AND 2 different `train_batch_size` settings (use 2000 and 3000).

Also, make RLlib use a [128, 128] dense layer stack as the NN model.

Also, use the config setting of `num_envs_per_worker=10` to increase the sampling throughput.

In case your local machine has fewer than 12 CPUs, try setting `num_workers=1` so that all 4 Tune trials can run at the same time (they then need 2 CPUs each, 8 CPUs in total).
Background: PPO by default uses 2 workers, which makes 1 trial use 3 CPUs (2 rollout workers plus the "driver", i.e. the "local worker"),
so the entire experiment uses 12 CPUs. If Tune cannot allocate enough CPUs at once, it runs trials in sequence
(which is also fine, but then takes longer).
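The CPU arithmetic from the paragraph above, as a small sketch. It assumes, as the text does, that each rollout worker and the driver take exactly 1 CPU and that no GPUs are involved; the helper names and the 8-CPU machine are hypothetical.

```python
def cpus_per_trial(num_workers):
    # One CPU per rollout worker, plus one for the trial's driver ("local worker").
    return num_workers + 1


def concurrent_trials(num_trials, num_workers, available_cpus):
    """How many trials fit on the machine at once; the rest wait in sequence."""
    return min(num_trials, available_cpus // cpus_per_trial(num_workers))


num_trials = 2 * 2  # 2 learning rates x 2 train_batch_size values
print(num_trials * cpus_per_trial(2))       # 12 CPUs needed with the default num_workers=2
print(concurrent_trials(num_trials, 2, 8))  # 2 -> the other 2 trials have to wait
print(concurrent_trials(num_trials, 1, 8))  # 4 -> all trials run at once
```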

Try to reach a total reward (sum of agent1 and agent2) of -25.0.

**Good luck! :)**


In [None]:
# Solution to Exercise #2:

# Update our config and set it up for 2x tune grid-searches (leading to 4 parallel trials in total).
config.update({
    "lr": tune.grid_search([0.0001, 0.0005]),
    "train_batch_size": tune.grid_search([2000, 3000]),
    "num_envs_per_worker": 10,
    # Change our model to be simpler.
    "model": {
        "fcnet_hiddens": [128, 128],
    },
})

# Run the experiment.
tune.run("PPO", config=config, stop={"episode_reward_mean": -25.0, "training_iteration": 100})

In [None]:
# 7) Infinite laptop:

# NOTE: The following cell will only work if you are already on-boarded to our Anyscale Inc. "Infinite Laptop".
# For more information, see https://www.anyscale.com/product

# Let's quickly divert from our MultiAgentArena and move to something much heavier in terms of environment/simulator complexity.
# We will now demonstrate how you can use Anyscale's infinite laptop to launch an RLlib experiment on a cloud machine
# with 4 GPUs + 32 CPUs, all from within this Jupyter cell.
# Start an experiment in the cloud using Anyscale's product, RLlib, and a more complex multi-agent env.
import anyscale



In [None]:
# 9) Custom Neural Network Models.

import tensorflow as tf


class MyModel(tf.keras.Model):
    def __init__(self,
                 input_space,
                 action_space,
                 num_outputs,
                 name="",
                 *,
                 layers=(256, 256)):
        super().__init__(name=name)

        self.dense_layers = []
        for i, layer_size in enumerate(layers):
            self.dense_layers.append(tf.keras.layers.Dense(
                layer_size, activation=tf.nn.relu, name=f"dense_{i}"))

        self.logits = tf.keras.layers.Dense(
            num_outputs,
            activation=tf.keras.activations.linear,
            name="logits")
        self.values = tf.keras.layers.Dense(
            1, activation=None, name="values")

    def call(self, inputs, training=None, mask=None):
        # Standardized input args:
        # - inputs: RLlib's input dict (a `SampleBatch` object, which is basically
        #   a dict of numpy arrays/tensors; the observations are under "obs").
        out = inputs["obs"]
        for l in self.dense_layers:
            out = l(out)
        logits = self.logits(out)
        values = self.values(out)

        # Standardized output:
        # - "normal" model output tensor (e.g. action logits).
        # - list of internal state outputs (only needed for RNN-/memory enhanced models).
        # - "extra outs", such as model's side branches, e.g. value function outputs.
        return logits, [], {"vf_preds": tf.reshape(values, [-1])}

In [None]:
# Do a quick test on the custom model class.
from gym.spaces import Box
test_model = MyModel(
    input_space=Box(-1.0, 1.0, (2, )),
    action_space=None,
    num_outputs=2,
)
test_model({"obs": np.array([[0.5, 0.5]])})

In [None]:
# Set up our custom model and re-run the experiment.

config.update({
    "model": {
        "custom_model": MyModel,
        "custom_model_config": {
            "layers": [128, 128],
        },
    },
    # Revert these to single trials (and use those hyperparams that performed well in our Exercise #2).
    "lr": 0.0005,
    "train_batch_size": 2000,
})

tune.run("PPO", config=config, stop=stop)

In [None]:
# "Hacking in": How do we customize our RL loop?
# RLlib offers a callbacks API that allows you to add custom behavior at
# all major events during the environment sampling and learning process.

# Our problem: So far, we can only see the total reward (sum for both agents).
# This does not give us enough insights into the question of which agent
# learns what (maybe agent2 doesn't learn anything and the reward we are observing
# is mostly due to agent1's progress in covering the map!).
# The following custom callbacks class allows us to add each agent's individual
# return to the reported metrics, which will then be displayed in TensorBoard.

# We will override RLlib's DefaultCallbacks class and implement the
# `on_episode_start`, `on_episode_step`, and `on_episode_end` methods therein.

from ray.rllib.agents.callbacks import DefaultCallbacks


class MyCallbacks(DefaultCallbacks):
    def on_episode_start(self, *, worker, base_env,
                         policies, episode,
                         env_index, **kwargs):
        episode.user_data["agent1_rewards"] = []
        episode.user_data["agent2_rewards"] = []

    def on_episode_step(self, *, worker, base_env,
                        episode, env_index, **kwargs):
        # Make sure this episode is ongoing.
        #assert episode.length > 0, \
        #    "ERROR: `on_episode_step()` callback should not be called right " \
        #    "after env reset!"
        ag1_r = episode.prev_reward_for("agent1")
        ag2_r = episode.prev_reward_for("agent2")
        #print("ag1_r={} ag2_r={}".format(ag1_r, ag2_r))
        episode.user_data["agent1_rewards"].append(ag1_r)
        episode.user_data["agent2_rewards"].append(ag2_r)

    def on_episode_end(self, *, worker, base_env,
                       policies, episode,
                       env_index, **kwargs):
        episode.custom_metrics["ag1_R"] = sum(episode.user_data["agent1_rewards"])
        episode.custom_metrics["ag2_R"] = sum(episode.user_data["agent2_rewards"])
        episode.hist_data["agent1_rewards"] = episode.user_data["agent1_rewards"]
        episode.hist_data["agent2_rewards"] = episode.user_data["agent2_rewards"]
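To see how the three hooks cooperate, here is a plain-Python simulation of one episode's lifecycle: RLlib calls `on_episode_start` once after the env reset, `on_episode_step` after every env step, and `on_episode_end` once the episode is done. `ToyEpisode`, the free-function hooks, and the reward values are hypothetical stand-ins; the real hooks are methods that receive RLlib's episode object and read rewards via `episode.prev_reward_for()`.

```python
class ToyEpisode:
    """Hypothetical stand-in for RLlib's episode object (only the fields used here)."""

    def __init__(self):
        self.user_data = {}
        self.custom_metrics = {}


def on_episode_start(episode):
    # Fresh per-episode buffers.
    episode.user_data["agent1_rewards"] = []
    episode.user_data["agent2_rewards"] = []


def on_episode_step(episode, step_rewards):
    # In RLlib, these would come from episode.prev_reward_for(agent_id).
    episode.user_data["agent1_rewards"].append(step_rewards["agent1"])
    episode.user_data["agent2_rewards"].append(step_rewards["agent2"])


def on_episode_end(episode):
    # Per-agent returns, reported as custom metrics.
    episode.custom_metrics["ag1_R"] = sum(episode.user_data["agent1_rewards"])
    episode.custom_metrics["ag2_R"] = sum(episode.user_data["agent2_rewards"])


# Simulate a 3-step episode with made-up per-agent rewards.
episode = ToyEpisode()
on_episode_start(episode)
for step_rewards in [{"agent1": 1.0, "agent2": -0.5},
                     {"agent1": -1.0, "agent2": 1.0},
                     {"agent1": 0.5, "agent2": -0.5}]:
    on_episode_step(episode, step_rewards)
on_episode_end(episode)
print(episode.custom_metrics)  # {'ag1_R': 0.5, 'ag2_R': 0.0}
```

RLlib then aggregates each `custom_metrics` entry over all episodes of a training iteration (reporting, e.g., its mean, min, and max), which is what shows up in TensorBoard.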



In [None]:
# Setting up our config to point to our new custom callbacks class:
config.update({
    "env": MultiAgentArena,  # force "reload"
    "callbacks": MyCallbacks,  # by default, this would point to `rllib.agents.callbacks.DefaultCallbacks`, which does nothing.
    #TODO: remove this once native keras models are supported!
    "model": {
        "custom_model": None,
    },
})

results = tune.run("PPO", config=config, stop={"training_iteration": 10})

### Let's check tensorboard for the new custom metrics!

1. Head over to ~/ray_results/PPO/PPO_MultiAgentArena_[some key]_00000_0_[date]_[time]/
1. In that directory, you should see an `events.out.tfevents...` file.
1. Run `tensorboard --logdir .` and head to http://localhost:6006

<img src="images/tensorboard.png" width=800>


## Exercise No 3

<hr />

Assume we would like to know exactly how much (new) ground agent1
covers on average in an episode.
Write your own custom callbacks class (subclass
`ray.rllib.agents.callbacks.DefaultCallbacks`) and override one or more of its methods
to collect the following data:
the number of (unique) tiles agent1 has covered in an episode.

Run a simple experiment using tune.run (and your custom callbacks class)
and confirm the new metric shows up in tensorboard.

**Good luck! :)**


In [None]:
# TODO: Last missing piece: Closer look at RLlib and its parallelization options and compute type options.