<a href="https://colab.research.google.com/github/lcipolina/Ray/blob/main/3_Ray_RLLIb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple Demo of Ray's RLlib

We show how to use Ray and RLlib to train a reinforcement learning agent on an environment built on top of OpenAI Gym.

Ray Resources

Video Tutorials
https://www.youtube.com/watch?v=Zy_AuL7xLpk&list=PLzTswPQNepXk44l2ElLcxzNTRrlrR95ud

https://medium.com/distributed-computing-with-ray/intro-to-rllib-example-environments-3a113f532c70

https://opendatascience.com/reinforcement-learning-with-ray-rllib/

https://towardsdatascience.com/ray-and-rllib-for-fast-and-parallel-reinforcement-learning-6d31ee21c96c

https://github.com/anyscale/academy/tree/64b5b7d149d1dfd3883948bbde0a247b57fbef0c


https://arxiv.org/pdf/1712.09381.pdf


## Ray and RLlib Overview

RLlib is a reinforcement learning library built on top of Ray, with RL algorithms and other RL features designed to work well with Ray.

It ships with implementations of some common RL algorithms.

You can also build your own, much as you would build a custom environment in Gym.

It integrates with Gym, so Gym's environments and all their features can be used directly.
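To make the Gym integration concrete, here is a minimal sketch of the `reset`/`step` interface that classic Gym environments expose. It deliberately avoids importing `gym` so that it runs anywhere. The class `CoinFlipEnv`, its reward rule, and the helper `run_episode` are hypothetical. A real environment would subclass `gym.Env` and declare an `observation_space` and an `action_space`.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment that mimics the classic Gym interface.

    At each step the agent guesses a coin flip (0 or 1). A correct guess
    earns a reward of 1.0. The episode ends after `max_steps` steps.
    """

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.steps = 0
        return 0

    def step(self, action):
        # Classic Gym convention: return (observation, reward, done, info).
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        done = self.steps >= self.max_steps
        return coin, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


env = CoinFlipEnv(max_steps=10)
episode_reward = run_episode(env, policy=lambda obs: 1)  # always guess 1
```

An RL algorithm interacts with an environment only through these two methods, which is why one algorithm can be applied to any environment that follows the interface.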

**List of supported algorithms**

https://docs.ray.io/en/releases-0.8.6/rllib-algorithms.html

**How to Install**

https://docs.ray.io/en/releases-0.8.6/installation.html 


In [33]:
import cv2
from google.colab.patches import cv2_imshow
import os

In [34]:
!curl -sS https://i.imgur.com/etN9r4z.png> kp.jpg 
img= cv2.imread("kp.jpg")
img= cv2.resize(img, (900,400), interpolation = cv2.INTER_AREA)
cv2_imshow(img)


In [None]:
!curl -sS https://i.imgur.com/jTSW7va.png > kp.jpg 
img= cv2.imread("kp.jpg")
img= cv2.resize(img, (900,300), interpolation = cv2.INTER_AREA)
cv2_imshow(img)

In [None]:
!curl -sS https://i.imgur.com/bNMNJFd.png > kp.jpg 
img= cv2.imread("kp.jpg")
img= cv2.resize(img, (900,300), interpolation = cv2.INTER_AREA)
cv2_imshow(img)

## CartPole with RLlib - PPO
https://medium.com/distributed-computing-with-ray/intro-to-rllib-example-environments-3a113f532c70

In [None]:
!pip install 'ray[rllib]'
!pip install 'ray[default]'
!pip install gym
!pip install tensorflow

Initializing Ray

`ray.init()` prints the address of the Ray dashboard, which runs on a local port.

In [None]:
import ray
import ray.rllib.agents.ppo as ppo

ray.shutdown()
ray.init(ignore_reinit_error=True)


In [None]:
!curl -sS https://i.imgur.com/VyEW7RX.png> kp.jpg 
img= cv2.imread("kp.jpg")
img= cv2.resize(img, (900,400), interpolation = cv2.INTER_AREA)
cv2_imshow(img)

Define directory for checkpoints

In [None]:
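import shutil

# Remove checkpoints left over from earlier runs.
CHECKPOINT_ROOT = "tmp/ppo/cart"
shutil.rmtree(CHECKPOINT_ROOT, ignore_errors=True)

# Also clear Ray's default results directory so only this run is logged.
ray_results = os.path.join(os.path.expanduser("~"), "ray_results")
shutil.rmtree(ray_results, ignore_errors=True)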

### Configuration
Train a policy with PPO

In [None]:
# Configuration
SELECT_ENV = "CartPole-v1"

config = ppo.DEFAULT_CONFIG.copy()
config["log_level"] = "WARN"

agent = ppo.PPOTrainer(config, env=SELECT_ENV)
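The configuration is an ordinary dictionary, so hyperparameters can be overridden before the trainer is built. The sketch below uses a small stand-in dictionary instead of `ppo.DEFAULT_CONFIG` so that it runs without Ray. The keys shown (`num_workers`, `lr`, `train_batch_size`, `log_level`) are assumed to match the PPO config of the RLlib version used here, and the values are placeholders that should be checked against the installed version. The helper `apply_overrides` is hypothetical.

```python
import copy

# Stand-in for ppo.DEFAULT_CONFIG (placeholder values, not RLlib's defaults).
DEFAULT_CONFIG = {
    "num_workers": 2,
    "lr": 5e-5,
    "train_batch_size": 4000,
    "log_level": "INFO",
}


def apply_overrides(defaults, overrides):
    """Return a new config; reject keys that are not in the defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    config = copy.deepcopy(defaults)  # never mutate the shared defaults
    config.update(overrides)
    return config


config = apply_overrides(DEFAULT_CONFIG, {"log_level": "WARN", "num_workers": 1})
```

A deep copy matters once nested entries are modified, because a shallow `.copy()` shares nested dictionaries with the defaults.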

### Training
Results are saved under `$HOME/ray_results` (`/root/ray_results` on Colab).

In [None]:
# Training
N_ITER = 5  # only 5 iterations to show the idea
# Columns: iteration, min/mean/max episode reward, mean episode length, checkpoint path
s = "{:3d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:6.2f} saved {}"

for n in range(N_ITER):
    result = agent.train()                   # run one training iteration
    file_name = agent.save(CHECKPOINT_ROOT)  # write a checkpoint after each iteration

    print(s.format(
        n + 1,
        result["episode_reward_min"],
        result["episode_reward_mean"],
        result["episode_reward_max"],
        result["episode_len_mean"],
        file_name,
    ))
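A fixed iteration count is enough for a demo, but a training loop is often stopped once the mean reward reaches a target. Here is a minimal sketch of that idea, assuming result dictionaries that carry the `episode_reward_mean` key used in the loop. The function `train_until` and the made-up reward curve `fake_results` are hypothetical stand-ins so that the example runs without Ray.

```python
def train_until(train_step, target_reward, max_iters):
    """Call train_step() until the mean reward reaches target_reward.

    train_step stands in for agent.train and must return a dict that
    contains "episode_reward_mean". Returns (iterations_run, last_result).
    """
    result = None
    for i in range(1, max_iters + 1):
        result = train_step()
        if result["episode_reward_mean"] >= target_reward:
            return i, result
    return max_iters, result


# Made-up reward curve used only to exercise the loop.
fake_results = iter(
    {"episode_reward_mean": r} for r in [20.0, 45.0, 90.0, 150.0, 200.0]
)
iters, last = train_until(lambda: next(fake_results), target_reward=100.0, max_iters=5)
```

With a real trainer, `train_step` would be `agent.train`, and `max_iters` caps the run if the target is never reached.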

### Tensorboard results
Note: one can also use Weights & Biases (WandB).

In [None]:
# From the command line:
# tensorboard --logdir=$HOME/ray_results/

In [None]:
!curl -sS https://i.imgur.com/kySTNnm.png> kp.jpg 
img= cv2.imread("kp.jpg")
img= cv2.resize(img, (900,400), interpolation = cv2.INTER_AREA)
cv2_imshow(img)

## Ray Parallelization
Ray runs several rollouts (i.e. episodes) at the same time by distributing them across many workers, so training finishes faster.
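The fan-out and gather pattern behind parallel rollouts can be illustrated without Ray. The sketch below swaps in a thread pool from Python's standard `concurrent.futures` module for Ray workers, so it shows only the shape of the idea: several independent episodes are launched at once and their results are collected. The `rollout` function and its random rewards are hypothetical. Ray itself runs workers as separate processes, possibly on several machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def rollout(seed, episode_len=50):
    """Simulate one episode with a random policy; return its total reward."""
    rng = random.Random(seed)  # one generator per rollout keeps runs independent
    return sum(rng.random() for _ in range(episode_len))


NUM_WORKERS = 4
seeds = range(8)  # eight episodes shared among four workers

with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    # map() dispatches the rollouts to the pool and returns results in seed order.
    rewards = list(pool.map(rollout, seeds))

mean_reward = sum(rewards) / len(rewards)
```

In Ray, the same pattern is expressed with `@ray.remote` functions and `ray.get`. In RLlib, the number of rollout workers is set through the `num_workers` entry of the trainer config.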