## Bibliotecas de aprendizaje automático que utilizan Ray

Ahora discutiremos varias de las librerías hechas posibles por Ray, que también fueron impulsoras para la creación de Ray. Entre ellas se incluyen Ray RLlib para el aprendizaje por refuerzo, Ray Tune para el ajuste de hiperparámetros, Ray S_GD_ para el entrenamiento distribuido de modelos TensorFlow y PyTorch, y Ray Serve para el servicio de modelos.

### ¿Qué es el aprendizaje por refuerzo?
El aprendizaje por refuerzo (RL) es un tema amplio y diverso. No podemos hacerle justicia aquí, pero podemos explorar los aspectos más destacados y ver cómo Ray RLlib permite a los profesionales del RL trabajar de manera eficiente. RLlib también es modular y flexible para ayudar a los investigadores a explorar nuevos algoritmos y técnicas de RL.

<br>

![Image](https://i.ibb.co/mNjCs6n/wray-0301.png)

<br>

Un agente realiza acciones en un entorno, intentando maximizar una recompensa acumulativa. En cada paso, el agente observa el estado actual del entorno y la recompensa recibida por la acción anterior. A continuación, el agente decide la siguiente acción a realizar.

Aprender la mejor política que maximice la recompensa acumulada es la esencia de la RL. A menudo esto se hace por ensayo y error, probando episodios repetidos para determinar qué acciones son las mejores. El agente puede tener o no conocimientos previos sobre el entorno, es decir, un modelo. Por ejemplo, un entorno que representa un sistema físico podría modelarse con una simulación de la física implicada.

Una consideración clave es el equilibrio entre explotación y exploración. Si el agente descubre que una determinada acción siempre le reporta una buena recompensa, es posible que desee explotarla. Sin embargo, pueden existir acciones incluso mejores que el agente aún no haya descubierto, por lo que es necesario explorar, aunque la mayoría de las acciones alternativas puedan resultar inferiores. Equilibrar este equilibrio de forma eficaz es importante.

Otro reto clave es el problema de la asignación de créditos. Si estamos maximizando la recompensa acumulada, puede ser difícil saber en qué medida cada acción concreta durante un episodio largo ha contribuido a esa recompensa o la ha restado.

Un entorno de ejemplo popular es CartPole, parte de OpenAI Gym, que simula un carrito moviéndose a izquierda o derecha mientras intenta mantener en equilibrio un poste vertical. CartPole puede determinarse completamente por simple física, pero utilizaremos RL para aprender a equilibrar el poste por ensayo y error, del mismo modo que un humano aprendería esta tarea.

### Introducción a Ray Train

Soporte de frameworks: Train se abstrae de la complejidad de escalar el entrenamiento para los marcos de aprendizaje automático más comunes, como XGBoost, Pytorch y Tensorflow. Hay tres grandes categorías de entrenadores que Train ofrece:

In [6]:
#pip install --upgrade 'ray[rllib]'

In [7]:
import os

import ray
from ray.train.xgboost import XGBoostTrainer
from ray.air.config import ScalingConfig
import warnings

# Suppress noisy requests warnings.
warnings.filterwarnings("ignore")
os.environ["PYTHONWARNINGS"] = "ignore"

ray.init()

2022-12-22 15:27:04,260	INFO worker.py:1529 -- Started a local Ray instance. View the dashboard at [1m[32mhttp://127.0.0.1:8265 [39m[22m


0,1
Python version:,3.8.10
Ray version:,2.2.0
Dashboard:,http://127.0.0.1:8265


In [8]:
# Load data.
dataset = ray.data.read_csv("breast_cancer.csv")



In [9]:
# Split data into train and validation.
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)

Read progress: 100%|██████████| 1/1 [00:00<00:00, 777.15it/s]
Read progress: 100%|██████████| 1/1 [00:00<00:00, 855.81it/s]


## XGBoost

In [None]:
#!pip install xgboost
#!pip install xgboost_ray

In [10]:
trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(
        # Number of workers to use for data parallelism.
        num_workers=2,
        # Whether to use GPU acceleration.
        use_gpu=False,
    ),
    label_column="target",
    num_boost_round=20,
    params={
        # XGBoost specific params
        "objective": "binary:logistic",
        # "tree_method": "gpu_hist",  # uncomment this to use GPU for training
        "eval_metric": ["logloss", "error"],
    },
    datasets={"train": train_dataset, "valid": valid_dataset},
)
result = trainer.fit()
print(result.metrics)

0,1
Current time:,2022-12-22 15:27:49
Running for:,00:00:06.73
Memory:,6.3/102.3 GiB

Trial name,status,loc,iter,total time (s),train-logloss,train-error,valid-logloss
XGBoostTrainer_2d02b_00000,TERMINATED,10.3.17.115:63032,21,5.19852,0.0184957,0,0.0893879


[2m[36m(_RemoteRayXGBoostActor pid=63447)[0m [15:27:47] task [xgboost.ray]:140157672893456 got new rank 0
[2m[36m(_RemoteRayXGBoostActor pid=63448)[0m [15:27:47] task [xgboost.ray]:139978890763376 got new rank 1


Trial name,date,done,episodes_total,experiment_id,experiment_tag,hostname,iterations_since_restore,node_ip,pid,should_checkpoint,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,train-error,train-logloss,training_iteration,trial_id,valid-error,valid-logloss,warmup_time
XGBoostTrainer_2d02b_00000,2022-12-22_15-27-49,True,,c18e0a94714643cea2b5da56bd5cd8ef,0,lr17-1-poplar-8,21,10.3.17.115,63032,True,5.19852,0.119546,5.19852,1671722869,0,,0,0.0184957,21,2d02b_00000,0.0409357,0.0893879,0.00729012


2022-12-22 15:27:49,251	INFO tune.py:762 -- Total run time: 7.28 seconds (6.73 seconds for the tuning loop).


{'train-logloss': 0.01849572784173766, 'train-error': 0.0, 'valid-logloss': 0.08938791321002339, 'valid-error': 0.04093567251461988, 'time_this_iter_s': 0.11954593658447266, 'should_checkpoint': True, 'done': True, 'timesteps_total': None, 'episodes_total': None, 'training_iteration': 21, 'trial_id': '2d02b_00000', 'experiment_id': 'c18e0a94714643cea2b5da56bd5cd8ef', 'date': '2022-12-22_15-27-49', 'timestamp': 1671722869, 'time_total_s': 5.198515176773071, 'pid': 63032, 'hostname': 'lr17-1-poplar-8', 'node_ip': '10.3.17.115', 'config': {}, 'time_since_restore': 5.198515176773071, 'timesteps_since_restore': 0, 'iterations_since_restore': 21, 'warmup_time': 0.0072901248931884766, 'experiment_tag': '0'}


## Pytorch

In [13]:
#!pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu # CPU installation
#!pip3 install torch torchvision torchaudio # GPU installation

In [14]:
import torch
import torch.nn as nn
from ray import train
from ray.air import session, Checkpoint
from ray.train.torch import TorchTrainer
from ray.air.config import ScalingConfig

In [15]:
input_size = 1
layer_size = 15
output_size = 1
num_epochs = 3

In [16]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.layer1 = nn.Linear(input_size, layer_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(layer_size, output_size)

    def forward(self, input):
        return self.layer2(self.relu(self.layer1(input)))

In [17]:
def train_loop_per_worker():
    dataset_shard = session.get_dataset_shard("train")
    model = NeuralNetwork()
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    model = train.torch.prepare_model(model)

    for epoch in range(num_epochs):
        for batches in dataset_shard.iter_torch_batches(
            batch_size=32, dtypes=torch.float
        ):
            inputs, labels = torch.unsqueeze(batches["x"], 1), batches["y"]
            output = model(inputs)
            loss = loss_fn(output, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            print(f"epoch: {epoch}, loss: {loss.item()}")

        session.report(
            {},
            checkpoint=Checkpoint.from_dict(
                dict(epoch=epoch, model=model.state_dict())
            ),
        )

In [18]:
train_dataset = ray.data.from_items([{"x": x, "y": 2 * x + 1} for x in range(200)])
scaling_config = ScalingConfig(num_workers=3)
# If using GPUs, use the below scaling config instead.
# scaling_config = ScalingConfig(num_workers=3, use_gpu=True)
trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=scaling_config,
    datasets={"train": train_dataset},
)
result = trainer.fit()

0,1
Current time:,2022-12-22 15:33:06
Running for:,00:00:06.98
Memory:,6.6/102.3 GiB

Trial name,status,loc,iter,total time (s),_timestamp,_time_this_iter_s,_training_iteration
TorchTrainer_ea2a7_00000,TERMINATED,10.3.17.115:65474,3,2.51715,1671723183,0.122461,3


[2m[36m(RayTrainWorker pid=65641)[0m 2022-12-22 15:33:03,485	INFO config.py:86 -- Setting up process group for: env:// [rank=0, world_size=3]
[2m[36m(RayTrainWorker pid=65641)[0m 2022-12-22 15:33:03,579	INFO train_loop_utils.py:270 -- Moving model to device: cpu
[2m[36m(RayTrainWorker pid=65641)[0m 2022-12-22 15:33:03,579	INFO train_loop_utils.py:330 -- Wrapping provided model in DistributedDataParallel.


[2m[36m(RayTrainWorker pid=65642)[0m epoch: 0, loss: 47379.8671875
[2m[36m(RayTrainWorker pid=65642)[0m epoch: 0, loss: 4.158123624995226e+16
[2m[36m(RayTrainWorker pid=65642)[0m epoch: 0, loss: 1116853497757696.0
[2m[36m(RayTrainWorker pid=65641)[0m epoch: 0, loss: 42690.72265625
[2m[36m(RayTrainWorker pid=65641)[0m epoch: 0, loss: 3.0162896042328064e+16
[2m[36m(RayTrainWorker pid=65641)[0m epoch: 0, loss: 1116846921089024.0
[2m[36m(RayTrainWorker pid=65643)[0m epoch: 0, loss: 48171.13671875
[2m[36m(RayTrainWorker pid=65643)[0m epoch: 0, loss: 3.940379520506266e+16
[2m[36m(RayTrainWorker pid=65643)[0m epoch: 0, loss: 1116843565645824.0


Trial name,_time_this_iter_s,_timestamp,_training_iteration,date,done,episodes_total,experiment_id,experiment_tag,hostname,iterations_since_restore,node_ip,pid,should_checkpoint,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,training_iteration,trial_id,warmup_time
TorchTrainer_ea2a7_00000,0.122461,1671723183,3,2022-12-22_15-33-04,True,,69281c23640a4a7f965f62b2b6d17cd0,0,lr17-1-poplar-8,3,10.3.17.115,65474,True,2.51715,0.122184,2.51715,1671723184,0,,3,ea2a7_00000,0.0266662


[2m[36m(RayTrainWorker pid=65642)[0m epoch: 1, loss: 714780201451520.0
[2m[36m(RayTrainWorker pid=65642)[0m epoch: 1, loss: 457460590575616.0
[2m[36m(RayTrainWorker pid=65642)[0m epoch: 1, loss: 292778324000768.0
[2m[36m(RayTrainWorker pid=65641)[0m epoch: 1, loss: 714779261927424.0
[2m[36m(RayTrainWorker pid=65641)[0m epoch: 1, loss: 457459315507200.0
[2m[36m(RayTrainWorker pid=65641)[0m epoch: 1, loss: 292774968557568.0
[2m[36m(RayTrainWorker pid=65643)[0m epoch: 1, loss: 714780335669248.0
[2m[36m(RayTrainWorker pid=65643)[0m epoch: 1, loss: 457460187922432.0
[2m[36m(RayTrainWorker pid=65643)[0m epoch: 1, loss: 292773190172672.0
[2m[36m(RayTrainWorker pid=65642)[0m epoch: 2, loss: 187374977941504.0
[2m[36m(RayTrainWorker pid=65641)[0m epoch: 2, loss: 187374491402240.0
[2m[36m(RayTrainWorker pid=65643)[0m epoch: 2, loss: 187375045050368.0
[2m[36m(RayTrainWorker pid=65642)[0m epoch: 2, loss: 119920654286848.0
[2m[36m(RayTrainWorker pid=65642)[0

2022-12-22 15:33:06,407	INFO tune.py:762 -- Total run time: 7.09 seconds (6.98 seconds for the tuning loop).


## Tensorflow

In [None]:
#!pip install tensorflow

In [24]:
from ray.air import session, Checkpoint, ScalingConfig
from ray.train.tensorflow import TensorflowTrainer
import numpy as np

In [25]:
def train_func(config):
    import tensorflow as tf
    n = 100
    # create a toy dataset
    # data   : X - dim = (n, 4)
    # target : Y - dim = (n, 1)
    X = np.random.normal(0, 1, size=(n, 4))
    Y = np.random.uniform(0, 1, size=(n, 1))

    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
    with strategy.scope():
        # toy neural network : 1-layer
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="linear", input_shape=(4,))])
        model.compile(optimizer="Adam", loss="mean_squared_error", metrics=["mse"])

    for epoch in range(config["num_epochs"]):
        model.fit(X, Y, batch_size=20)
        checkpoint = Checkpoint.from_dict(
            dict(epoch=epoch, model_weights=model.get_weights())
        )
        session.report({}, checkpoint=checkpoint)

trainer = TensorflowTrainer(
    train_func,
    train_loop_config={"num_epochs": 5},
    scaling_config=ScalingConfig(num_workers=2),
)
result = trainer.fit()

0,1
Current time:,2022-12-22 15:37:36
Running for:,00:00:11.68
Memory:,6.9/102.3 GiB

Trial name,status,loc,iter,total time (s),_timestamp,_time_this_iter_s,_training_iteration
TensorflowTrainer_88870_00000,TERMINATED,10.3.17.115:68163,5,6.76956,1671723454,0.264796,5


[2m[36m(pid=68163)[0m 2022-12-22 15:37:26.311475: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
[2m[36m(pid=68163)[0m To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[2m[36m(pid=68163)[0m 2022-12-22 15:37:26.428655: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/poplar/lib:/opt/popart/lib
[2m[36m(pid=68163)[0m 2022-12-22 15:37:26.428696: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[2m[36m(pid=68163)[0m 2022-12-22 15:37:27.043662: W tensorflow/compiler/xla/stream_executor



Trial name,_time_this_iter_s,_timestamp,_training_iteration,date,done,episodes_total,experiment_id,experiment_tag,hostname,iterations_since_restore,node_ip,pid,should_checkpoint,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,training_iteration,trial_id,warmup_time
TensorflowTrainer_88870_00000,0.264796,1671723454,5,2022-12-22_15-37-34,True,,6510be2d03164196af4a2e2f4ac51f92,0,lr17-1-poplar-8,5,10.3.17.115,68163,True,6.76956,0.263286,6.76956,1671723454,0,,5,88870_00000,0.00570345


1/5 [=====>........................] - ETA: 0s - loss: 2.5980 - mse: 2.5980
1/5 [=====>........................] - ETA: 0s - loss: 2.5980 - mse: 2.5980


2022-12-22 15:37:36,792	INFO tune.py:762 -- Total run time: 11.79 seconds (11.67 seconds for the tuning loop).
