# Test an RL Algorithm Using the Ray Ecosystem
---

This notebook gives general guidelines for preparing and testing a workflow that trains a reinforcement learning (RL) Agent on a custom Environment. It breaks the workflow into its individual steps so that we can check whether the proposed solution actually works for the objectives we have in mind.

## Install Required Third-Party Python Modules
---


In this section we handle notebook housekeeping: installing the third-party Python modules the analysis relies on.

In [1]:
!pip install ray

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
!pip install lz4

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
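
Because the RLlib API changes between releases (the run recorded later in this notebook reports Ray 2.2.0), it can help to record which versions were actually installed. A minimal sketch using only the standard library; the helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this notebook relies on.
for name in ("ray", "lz4", "gym", "pandas", "scikit-learn"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```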


## Imports
---

This section lists the modules we are going to use.

First, we mount our Google Drive storage. It provides the input data for the training step as well as the output directory for intermediate and final results, such as checkpoint files, which are snapshots of the RL model at a given point during training.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Then, as mentioned earlier, we import the main packages, which provide standard utilities for handling the data to be analyzed.

In [4]:
import os
import gym
import random
import numpy as np
import pandas as pd
import ray
import tqdm

In [5]:
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

In [6]:

from ray.rllib.env.env_context import EnvContext
from ray.rllib.algorithms import appo
from ray.rllib.algorithms.appo import APPOConfig

## Globals

Global variables are exposed as Colab form inputs:

In [7]:
EMPLOYESS_FILE_PATH = '/content/drive/MyDrive/ANALYSES_AND_TESTS/RL_TESTS/data/employees.csv' #@param {type:"string"}

In [8]:
SEATS_FILE_PATH = '/content/drive/MyDrive/ANALYSES_AND_TESTS/RL_TESTS/data/seats_dataset.csv' #@param {type:"string"}

In [9]:
CHECKPOINTS_DIR = '/content/drive/MyDrive/ANALYSES_AND_TESTS/RL_TESTS/out/checkpoints' #@param {type:"string"}

In [10]:
FREQ_SHOW_RESULTS = 10 #@param {type:"integer"}
FREQ_SAVE_MODEL_CHECKPOINT = 100 #@param {type:"integer"}
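
Since both input files and the checkpoint directory live on the mounted Drive, a quick check before training can catch a wrong path early. The sketch below is an assumption about how such a check could look; `prepare_paths` is a hypothetical helper, and the demo runs on a temporary directory rather than the Drive paths above:

```python
import os
import tempfile


def prepare_paths(input_files, output_dir):
    """Fail early if an input file is missing; create the output directory if needed."""
    missing = [path for path in input_files if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(f"Missing input file(s): {missing}")
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


# Demo on a temporary directory (stand-in for the Drive locations).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "employees.csv")
    with open(csv_path, "w") as handle:
        handle.write("id;practice;client\n")
    out_dir = prepare_paths([csv_path], os.path.join(tmp, "out", "checkpoints"))
    created = os.path.isdir(out_dir)
    print(created)
```

In this notebook the call would be `prepare_paths([EMPLOYESS_FILE_PATH, SEATS_FILE_PATH], CHECKPOINTS_DIR)`.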

## Analysis

### Load and Prepare Input Data

We proceed step by step through the actions needed to shape the data correctly for the routines that train and evaluate our RL approach, together with the custom RL Environment that the Agent interacts with to learn the desired policy.

First, we load the data from the configured locations into data frames that later feed the learning system:

In [11]:
df_emp = pd.read_csv(EMPLOYESS_FILE_PATH, sep =';')

In [12]:
df_emp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         80 non-null     int64 
 1   role       80 non-null     object
 2   client     80 non-null     object
 3   practice   80 non-null     object
 4   job        80 non-null     object
 5   name       80 non-null     object
 6   birthdate  80 non-null     object
 7   username   80 non-null     object
 8   mail       80 non-null     object
 9   address    80 non-null     object
dtypes: int64(1), object(9)
memory usage: 6.4+ KB


In [13]:
df_emp.head()

Unnamed: 0,id,role,client,practice,job,name,birthdate,username,mail,address
0,0,EM,Brown-Lindsey,C&CA,"Production assistant, television",Jennifer Chavez,1944-11-01,wendy61,jamesmccall@hotmail.com,114 Bryan Throughway Suite 189\nSouth Andreamo...
1,1,EM,Davidson Inc,DCX,Brewing technologist,Kenneth Goodwin,1973-12-31,adam32,amysantiago@hotmail.com,"12111 Harris Shoals\nChadburgh, NE 62238"
2,2,EM,"Williams, Lopez and Brown",I&S,Chartered certified accountant,Andrew Patterson,1958-09-11,mward,weaverlaura@gmail.com,"73286 Becker Courts Suite 329\nNew Rachelside,..."
3,3,EM,Jenkins-Hall,Data Engineer,Film/video editor,Patrick Jenkins DDS,1912-09-30,eric66,jonathan41@hotmail.com,"331 Silva Ways\nLake Meghanshire, MD 15360"
4,4,EM,"Bender, Hamilton and Hendricks",Robotics,Health promotion specialist,Brandi Crawford,1919-06-22,sarahbarnes,wendy20@yahoo.com,"31349 Smith Light\nLindaberg, FL 97316"


In [14]:
df_seats = pd.read_csv(SEATS_FILE_PATH, sep =';')

In [15]:
df_seats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 114 entries, 0 to 113
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   id        114 non-null    int64  
 1   building  114 non-null    object 
 2   floor     114 non-null    int64  
 3   room      0 non-null      float64
 4   island    114 non-null    int64  
 5   x_coord   114 non-null    int64  
 6   y_coord   114 non-null    int64  
dtypes: float64(1), int64(5), object(1)
memory usage: 6.4+ KB


In [16]:
df_seats.head()

Unnamed: 0,id,building,floor,room,island,x_coord,y_coord
0,0,Artificial Test Building,1,,9,1161,463
1,1,Artificial Test Building,1,,14,393,463
2,2,Artificial Test Building,1,,7,1089,462
3,3,Artificial Test Building,1,,8,1017,462
4,4,Artificial Test Building,1,,13,602,462


Next, we subset the data frames, keeping only the columns that serve as the main features for the algorithm:

In [17]:
df_emp = df_emp[['id','practice','client']]

In [18]:
df_emp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        80 non-null     int64 
 1   practice  80 non-null     object
 2   client    80 non-null     object
dtypes: int64(1), object(2)
memory usage: 2.0+ KB
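
When a column subset is later extended with new columns, as is done for the seats table in this notebook, taking an explicit copy avoids pandas' `SettingWithCopyWarning`. A small sketch on toy data (the values are illustrative only):

```python
import pandas as pd

# Toy stand-in for the seats table.
raw = pd.DataFrame({
    "island": [9, 14, 7],
    "floor": [1, 1, 2],
    "room": [None, None, None],
})

# .copy() makes the subset an independent frame, so adding a column
# afterwards does not go through a possible view of `raw`.
subset = raw[["island", "floor"]].copy()
subset["id_seat"] = subset.index

print(subset)
```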


In [19]:
df_seats = df_seats[['island','floor','x_coord','y_coord']].copy()
df_seats['id_seat'] = df_seats.index


In [20]:
df_emp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        80 non-null     int64 
 1   practice  80 non-null     object
 2   client    80 non-null     object
dtypes: int64(1), object(2)
memory usage: 2.0+ KB


In [21]:
scaler = MinMaxScaler()

# Note: this is an alias, not a copy, so df_seats is scaled in place as well
df_seats_scaled = df_seats
df_seats_scaled[['x_coord','y_coord']] = scaler.fit_transform(df_seats_scaled[['x_coord','y_coord']])

df_seats_scaled['island_scaled'] = df_seats_scaled['island'] * 2

floor_scale_factor = df_seats_scaled['island_scaled'].max()
df_seats_scaled['floor_scaled'] = df_seats_scaled['floor'] * 2 * floor_scale_factor


In [22]:
df_seats_scaled

Unnamed: 0,island,floor,x_coord,y_coord,id_seat,island_scaled,floor_scaled
0,9,1,1.000000,0.708411,0,18,56
1,14,1,0.307484,0.708411,1,28,56
2,7,1,0.935077,0.706542,2,14,56
3,8,1,0.870153,0.706542,3,16,56
4,13,1,0.495942,0.706542,4,26,56
...,...,...,...,...,...,...,...
109,1,2,0.614968,0.082243,109,2,112
110,1,2,0.584310,0.050467,110,2,112
111,1,2,0.648332,0.048598,111,2,112
112,1,2,0.603246,0.035514,112,2,112
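
The scaling cell can be read as follows: coordinates are min-max scaled to [0, 1], island ids are doubled, and the floor is multiplied by twice the largest scaled island value. One plausible reading (an assumption, not stated in the source) is that this keeps seats on different floors farther apart than any two islands on the same floor. A sketch on toy data, using plain pandas arithmetic in place of `MinMaxScaler`:

```python
import pandas as pd

# Toy seats table (made-up values).
seats = pd.DataFrame({
    "island": [1, 5, 14],
    "floor": [1, 1, 2],
    "x_coord": [393, 602, 1161],
    "y_coord": [35, 462, 463],
})

scaled = seats.copy()
for col in ("x_coord", "y_coord"):
    lo, hi = seats[col].min(), seats[col].max()
    scaled[col] = (seats[col] - lo) / (hi - lo)  # min-max scaling to [0, 1]

scaled["island_scaled"] = scaled["island"] * 2
floor_scale_factor = scaled["island_scaled"].max()
scaled["floor_scaled"] = scaled["floor"] * 2 * floor_scale_factor

# Largest distance between two islands vs. distance between consecutive floors.
island_spread = scaled["island_scaled"].max() - scaled["island_scaled"].min()
floor_gap = 2 * floor_scale_factor

print(island_spread, floor_gap)
```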


Quick check: retrieve the floor for a given seat id:

In [23]:
(df_seats
    .set_index("id_seat")
    .loc[2, ["floor"]].values[0]
)

1.0
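
The lookup returns `1.0` rather than `1`, most likely because selecting a whole row from a frame that also holds float columns upcasts the values to a common dtype. A hedged alternative is scalar access with `.at`, which reads the value directly from the `floor` column. Toy data below:

```python
import pandas as pd

# Toy seats table with an integer floor column and a float coordinate column.
seats = pd.DataFrame({
    "id_seat": [0, 1, 2],
    "floor": [1, 1, 2],
    "x_coord": [1.0, 0.307484, 0.935077],
})

by_seat = seats.set_index("id_seat")
floor = by_seat.at[2, "floor"]  # scalar access keeps the column's integer dtype
print(floor)
```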

### Define Environment

In [24]:
class MyEnv_France(gym.Env):

    def __init__(self, config: EnvContext):
        self.df_seats = config["df_seats"]
        self.df_emp = config["df_employees"]
        self.max_seats = len(self.df_seats)
        self.max_employees = len(self.df_emp)
        
        self.fake_seats = self.max_seats + 1
        self.fake_emp = self.max_employees + 1
        
        self.action_space = gym.spaces.Discrete(self.max_seats) #Discrete(2) -> {0,1}
        
        # Try using only the indices; otherwise the code below needs some changes
        self.observation_space = gym.spaces.Dict(
            {"id_emp": gym.spaces.Box(low=0, high=self.max_employees + 1, shape=(self.max_employees + 1,), dtype=np.uint8), # check whether Box is the better choice for training
             "id_seat": gym.spaces.Box(low=0, high=self.max_seats + 1, shape=(self.max_seats + 1,), dtype=np.uint8)
            }
        )
        
    def reset(self):
        self.state = self.observation_space.sample()
        self.bound_random = np.random.randint(70, self.max_employees)
        
        self.state['id_seat'][:self.bound_random] = np.random.choice(range(self.max_seats), self.bound_random, replace=False)
        self.state['id_emp'][:self.bound_random] = np.random.choice(range(self.max_employees), self.bound_random, replace=False)
        self.state['id_seat'][self.bound_random:] = self.fake_seats
        self.state['id_emp'][self.bound_random:] = self.fake_emp
        
        info = {}

        observation = self.state
        
        self.index_count = 0   
        self.done = False
        self.actions = []
        
        return observation
    
    def step(self, action): 
         
        if self.index_count == self.bound_random:
            self.done = True
        else:
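            # NOTE: self.actions is never appended to, so this check always
            # passes and the -1.5 penalty branch below is never reached.
            if action not in self.actions:
                floor_number = (
                    self.df_seats
                    .set_index("id_seat")
                    .loc[action, ["floor"]].values[0]
                )
                # TODO: add a check to verify whether the action has already been taken
                self.state['id_seat'][self.index_count] = action
                
                # compute the number of steps to decide whether to continue or not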
                self.reward = self.score(self.state['id_seat'], self.state['id_emp'], self.index_count)
                if floor_number == 1:
                    self.reward = self.reward * 2
                else:
                    self.reward = self.reward * 1.5
                
                self.index_count += 1
            else:
                self.reward = -1.5
                             
        if self.reward > 1.9:
            self.done = True
                
        info = {}
        
        return self.state, self.reward, self.done, info
    
    def render(self):
        print("NEW STEP: \n")
    
    def score(self, seat_array, emp_array, index):
        #join
        self.seat_indices = seat_array[0:index+1]
        self.emp_indices = emp_array[0:index+1]
        seats = self.df_seats.iloc[self.seat_indices]
        emps = self.df_emp.iloc[self.emp_indices]
        emps_join = emps.copy()
        emps_join['id_seat'] = self.seat_indices
        merged_df = pd.merge(emps_join, seats, on='id_seat', how='inner')

        #score
        try:
            label_practice = merged_df['practice']
            label_client = merged_df['client']
            Z = merged_df[['x_coord', 'y_coord','island_scaled','floor_scaled']]
            total_score = silhouette_score(Z, label_practice) + silhouette_score(Z, label_client)
        except Exception:
            # e.g. silhouette_score raises ValueError with fewer than 2 distinct labels
            total_score = 0
        
        return total_score # seats, silhouette_score(Z, label_practice), silhouette_score(Z, label_client)
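
The `score` method sums two silhouette scores computed on the same seat features, one with `practice` labels and one with `client` labels, so the total lies in [-2, 2]. `silhouette_score` raises a `ValueError` unless the number of distinct labels is between 2 and `n_samples - 1`, which is what the `try`/`except` guards against early in an episode. A small illustration on made-up points:

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Four seats in two well-separated groups (made-up coordinates).
Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
practice = ["A", "A", "B", "B"]   # labels that match the spatial groups
client = ["X", "Y", "X", "Y"]     # labels that cut across the groups

practice_score = silhouette_score(Z, practice)
client_score = silhouette_score(Z, client)
total = practice_score + client_score

# With a single distinct label the score is undefined and sklearn raises.
try:
    silhouette_score(Z[:2], ["A", "A"])
    single_label_ok = True
except ValueError:
    single_label_ok = False

print(practice_score, client_score, total, single_label_ok)
```

Labels that follow the spatial grouping push the score towards 1, while labels that cut across it push the score below 0, which is the signal the reward builds on.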

### Init Ray

In [25]:
if not ray.is_initialized():
    ray.init()
    assert ray.is_initialized()

2023-01-26 08:14:34,587	INFO worker.py:1538 -- Started a local Ray instance.


0,1
Python version:,3.8.10
Ray version:,2.2.0


### Prepare config and Create RL Algorithm Instance to be Trained

Model Configs:

In [None]:
HORIZON = 10000 #@param {type: 'integer'}

Create Model Configs:

In [None]:
config = (
    APPOConfig()
    .rollouts(horizon=HORIZON)
    .environment(
        MyEnv_France,
        env_config={
            "df_seats":df_seats_scaled,
            "df_employees": df_emp
        }
    )
)

Create Model:

In [26]:
algo = appo.APPO(env=MyEnv_France, config=config)

2023-01-26 08:14:36,760	INFO algorithm_config.py:2503 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
2023-01-26 08:14:36,855	INFO tensorboardx.py:42 -- pip install "ray[tune]" to see TensorBoard files.
2023-01-26 08:14:36,888	INFO algorithm.py:501 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
[2m[36m(pid=13539)[0m   import imp
[2m[36m(pid=13538)[0m   import imp
2023-01-26 08:14:57,829	INFO trainable.py:172 -- Trainable.setup took 20.955 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.


### Train Defined Environment

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
mean_ppo = []

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    with tqdm.tqdm(total=1000) as pbar:
        for ii in range(1000):
            result = algo.train()
            if ii % FREQ_SHOW_RESULTS == 0:
                pbar.write(
                    "Average Episode reward: %.4f" % (result['episode_reward_mean'],)
                )
            if ii % FREQ_SAVE_MODEL_CHECKPOINT == 0:
                # checkpoint_dir = algo.save("/tmp/rllib_checkpoint")
                _ = algo.save(os.path.join(CHECKPOINTS_DIR, f"rllib_checkpoint_{ii}"))
            mean_ppo.append(result['episode_reward_mean'])
            pbar.update(1)

1it [00:17, 17.17s/it]

Average Episode reward: -55.5013


11it [02:21, 12.84s/it]

Average Episode reward: -54.3261


21it [04:21, 11.87s/it]

Average Episode reward: -54.3328


31it [06:38, 14.86s/it]

Average Episode reward: -53.9113


41it [08:48, 12.90s/it]

Average Episode reward: -53.9264


43it [09:13, 12.59s/it]
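
With `FREQ_SAVE_MODEL_CHECKPOINT = 100` and a 0-based loop of 1000 iterations, the `ii % ... == 0` test fires at iterations 0, 100, ..., 900, so the state after the last iteration (999) is not saved unless an extra save is added. A framework-free sketch of that cadence; `should_save` and its `always_save_last` flag are hypothetical names, not part of RLlib:

```python
def should_save(iteration: int, total: int, every: int, always_save_last: bool = True) -> bool:
    """Return True when a checkpoint should be written at this 0-based iteration."""
    is_periodic = iteration % every == 0
    is_last = iteration == total - 1
    return is_periodic or (always_save_last and is_last)


# Cadence of the loop as written, and the same cadence with a final save added.
original = [ii for ii in range(1000) if should_save(ii, 1000, 100, always_save_last=False)]
with_last = [ii for ii in range(1000) if should_save(ii, 1000, 100)]
print(original[-1], with_last[-1])
```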

### Shutdown Ray

In [None]:
if ray.is_initialized():
    ray.shutdown()
    assert not ray.is_initialized()

### Inference

In [None]:
# Run one episode with the trained policy
env = MyEnv_France(config={"df_seats": df_seats_scaled, "df_employees": df_emp})
episode_reward = 0
done = False
obs = env.reset()

obs_first = obs.copy()

while not done:
    action = algo.compute_single_action(obs)
    obs, reward, done, info = env.step(action)
    # score() requires an index; assumption: score up to the last assigned seat
    # (index_count has already been incremented inside step)
    print("Current score:", env.score(obs['id_seat'], obs['id_emp'], max(env.index_count - 1, 0)))
    print("Reward at this step:", reward)
    episode_reward += reward
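
The inference loop follows the classic Gym pattern (`reset`, then `step` until `done`). A self-contained sketch of the same pattern with a step cap as a safety net; `ToyEnv`, `random_policy`, `run_episode`, and `max_steps` are hypothetical stand-ins for the trained environment and `algo.compute_single_action`:

```python
import random


class ToyEnv:
    """Tiny stand-in with the 4-tuple step API used in this notebook."""

    def __init__(self, length: int = 5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else -1.0
        done = self.t >= self.length
        return self.t, reward, done, {}


def random_policy(obs):
    """Pick one of the two toy actions at random."""
    return random.choice([0, 1])


def run_episode(env, policy, max_steps: int = 1000):
    """Roll out one episode and return (total reward, number of steps)."""
    obs = env.reset()
    total, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
        steps += 1
    return total, steps


episode_reward, n_steps = run_episode(ToyEnv(), random_policy)
print(episode_reward, n_steps)
```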

## Summary

Summary notes that complete the analysis, if any, go here.

## References
---

### Colab Reference Manuals
- [How to produce Forms](https://colab.research.google.com/notebooks/forms.ipynb#scrollTo=3jKM6GfzlgpS)
- [How to provide Widgets](https://colab.research.google.com/notebooks/widgets.ipynb)

### Ray RLlib:
- [serve/tutorials](https://docs.ray.io/en/latest/serve/tutorials/rllib.html):
    - the article shows how to save RLlib models as intermediate checkpoints