**Javier Rojas Herrera**

jrojash1995@gmail.com

**Hardware, Deployment, and MLOps**



### The fine print of Deep Learning

* **Training and inference require hardware specialized in numerical computing**
* Large volumes of labeled data are required



###  <center>[<img src="images/gpuvscpuvstpu.webp" width="80%"/> ](attachment:image.png)</center>

## GPU (graphics processing unit)
###  <center>[<img src="images/A100.jpg" width="60%"/> ](attachment:image.png)</center>

## GPU vs CPU
###  <center>[<img src="images/gpuvscpu.png" width="60%"/> ](attachment:image.png)</center>

## GPU vs CPU: Inference
###  <center>[<img src="images/gpuvscpu2.png" width="70%"/> ](attachment:image.png)</center>

## GPU summary table

| GPU | CUDA cores | Tensor cores | VRAM | Power | Price |
|----------|----------|----------|----------|----------|----------|
| T4 | 2560 | 320 | 16 GB | 70 W | 1100 USD |
| L4 | 7680 | 240 | 24 GB | 72 W | 2600 USD |
| L40 | 18176 | 568 | 48 GB | 300 W | 8400 USD |
| A100 | 6912 | 432 | 80 GB | 400 W | 12000 USD |
| H100 | 14592 | 456 | 80 GB | 350 W | 30000 USD |


In [1]:
!nvidia-smi

Mon Jun 10 18:57:33 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:07:00.0 Off |                  N/A |
|  0%   34C    P8              19W / 320W |    223MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                         

In [2]:
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
from torchvision.models import vit_b_16 , ViT_B_16_Weights
from torchvision.transforms import v2
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader,Subset
import torch
import matplotlib.pyplot as plt
import time
from trans import UnNormalize
from io import StringIO
import numpy as np

In [3]:
## Load the pretrained ViT-B/16 model
weights = ViT_B_16_Weights.DEFAULT
preprocess = weights.transforms()
model = vit_b_16(weights=weights)
## Replace the classification head with a 10-class head for CIFAR-10
model.heads.head = torch.nn.Linear(768,10)

## Set the optimizer and the loss
optim = torch.optim.Adam(model.parameters())
cross_entropy = torch.nn.CrossEntropyLoss()

## Load the data into a dataloader (random subset of 1000 training images)
data = CIFAR10("./", download=True, train=True, transform=preprocess)
subset_indices = torch.randperm(len(data))[:1000]
subset_cifar10 = Subset(data, subset_indices)
dataloader = DataLoader(subset_cifar10, batch_size=32, shuffle=False)

class_names = ["Airplane","Auto","Bird","Cat","Deer","Dog","Frog","Horse","Ship","Truck"]

Using downloaded and verified file: ./cifar-10-python.tar.gz
Extracting ./cifar-10-python.tar.gz to ./


In [4]:
std = weights.transforms().std
mean = weights.transforms().mean
invTrans=UnNormalize(mean,std)   # undoes the normalization so the image can be displayed
@interact
def show_sample(x=1000):
    image, label = data[x]       # fetch the sample once
    plt.figure(figsize=(5,3))
    print("Label: ",class_names[label])
    plt.axis('off')
    plt.imshow(invTrans(image).permute(1,2,0))   # CHW -> HWC for matplotlib

interactive(children=(IntSlider(value=1000, description='x', max=3000, min=-1000), Output()), _dom_classes=('w…

# Which elements use VRAM during training?

---

- Storage of the input tensors


- Storage of the model parameters (weights and biases)

- Storage of the gradients (backpropagation)

- Storage of the output tensors

---
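As a rough back-of-the-envelope check of the list above, the sketch below estimates how much memory an input batch, the parameters and the gradients take. The helper names (`tensor_mib`, `training_memory_mib`), the 224x224 image size, the round figure of 86 million parameters for a ViT-B/16-sized model and the optional optimizer-state term are assumptions added for illustration. Activations and the CUDA context are ignored, so the usage reported by `nvidia-smi` will be higher.

```python
from math import prod

BYTES_FP32 = 4  # one FP32 value takes 4 bytes


def tensor_mib(shape, bytes_per_value=BYTES_FP32):
    """Size in MiB of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_value / 2**20


def training_memory_mib(n_params, bytes_per_value=BYTES_FP32, optimizer_states=0):
    """Weights + gradients (+ optional per-parameter optimizer states), in MiB."""
    copies = 2 + optimizer_states  # one copy of the weights, one of the gradients
    return n_params * bytes_per_value * copies / 2**20


# Batch of 32 RGB images (assumed to be 224x224 after preprocessing)
batch_mib = tensor_mib((32, 3, 224, 224))

# Assumed round figure for a ViT-B/16-sized model (hypothetical, not measured)
n_params = 86_000_000
weights_and_grads_mib = training_memory_mib(n_params)
# Assumption: an Adam-style optimizer keeps two extra values per parameter
with_adam_mib = training_memory_mib(n_params, optimizer_states=2)

print(f"input batch:         {batch_mib:.1f} MiB")
print(f"weights + gradients: {weights_and_grads_mib:.0f} MiB")
print(f"+ 2 optimizer states: {with_adam_mib:.0f} MiB")
```

The point of the estimate is the relative size of each term: for a model of this size the parameters and gradients dominate over a single input batch.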

In [5]:
## Take one batch from the dataloader and move it to VRAM
for image,label in dataloader:
    image_=image.cuda()
    break
!nvidia-smi

Mon Jun 10 19:00:58 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:07:00.0 Off |                  N/A |
|  0%   35C    P2              25W / 320W |    462MiB / 10240MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                         

In [6]:
## Move the model to VRAM
model = model.cuda()
!nvidia-smi

Mon Jun 10 19:01:52 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:07:00.0 Off |                  N/A |
|  0%   37C    P2              24W / 320W |    844MiB / 10240MiB |     10%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                         

## Parameter madness


# <center>[ <img src="images/madness.png" width="60%"/>](attachment:image.png)</center>

## How can we get the most out of the available hardware?

In [7]:
linear_test= torch.nn.Linear(1,1)    # Define a linear layer with a single neuron
print(linear_test.weight[0].detach().numpy())  # Print the value of its weight

[0.3353274]


### How do computers store real numbers?


* The IEEE 754 standard defines how real numbers are stored in memory

* There are 16-, 32-, 64- and 128-bit floating-point formats. The most widely used is the 32-bit format, also called single precision

# FP32 vs FP16

 # <center>[ <img src="images/fp16.ppm" width="80%"/>](attachment:image.png)</center>


### FP32 in scientific notation
$(-1)^S \times 2^{(E-127)} \times 1.F$   

Where:

$S$ = sign (1 bit)

$E$ = exponent (8 bits)

$F$ = mantissa (23 bits)

### FP16 in scientific notation
$(-1)^S \times 2^{(E-15)} \times 1.F$   

Where:

$S$ = sign (1 bit)

$E$ = exponent (5 bits)

$F$ = mantissa (10 bits)
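To make the two formulas concrete, the following sketch splits a number into its $S$, $E$ and $F$ fields and rebuilds it with the expressions above. It uses only the standard library `struct` module; the function names are hypothetical, and `rebuild` only covers normal numbers (not zero, subnormals, infinities or NaN).

```python
import struct


def decompose_fp32(x):
    """Return (S, E, F) for the FP32 encoding of x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def decompose_fp16(x):
    """Return (S, E, F) for the FP16 encoding of x."""
    bits = struct.unpack(">H", struct.pack(">e", x))[0]
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


def rebuild(sign, exponent, fraction, bias, fraction_bits):
    """Apply (-1)^S * 2^(E - bias) * 1.F for a normal number."""
    return (-1) ** sign * 2.0 ** (exponent - bias) * (1 + fraction / 2**fraction_bits)


x = 0.15625  # exactly representable in both formats
s32, e32, f32 = decompose_fp32(x)
s16, e16, f16 = decompose_fp16(x)

print("FP32 fields:", s32, e32, f32, "->", rebuild(s32, e32, f32, bias=127, fraction_bits=23))
print("FP16 fields:", s16, e16, f16, "->", rebuild(s16, e16, f16, bias=15, fraction_bits=10))
```

Both calls rebuild the same value because 0.15625 needs very few mantissa bits. A number such as 0.1 would be rounded differently in each format.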

In [8]:
number= 0.123456789123456789123
sio = StringIO()
np.savetxt(sio, np.array([number], dtype=np.float64))
np.savetxt(sio, np.array([number], dtype=np.float32))
np.savetxt(sio, np.array([number], dtype=np.float16))
s = sio.getvalue()
print(s)


1.234567891234567838e-01
1.234567910432815552e-01
1.234741210937500000e-01



# FP32 vs FP16

 # <center>[ <img src="images/fp32vsfp16_tabla.png" width="80%"/>](attachment:image.png)</center>

In [None]:
A= np.random.rand(10000,10000).astype(np.float32)
B= np.random.rand(10000,10000).astype(np.float32)

In [None]:
%%time
print(np.matmul(A,B))

In [None]:
A= np.random.rand(10000,10000).astype(np.float64)
B= np.random.rand(10000,10000).astype(np.float64)

In [None]:
%%time
print(np.matmul(A,B))
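Besides compute time, the precision also sets how much memory each matrix needs. A minimal sketch of the arithmetic for the 10000 x 10000 matrices used above (the helper name `matrix_bytes` is hypothetical):

```python
def matrix_bytes(rows, cols, bits):
    """Bytes needed to store a dense rows x cols matrix at the given bit width."""
    return rows * cols * bits // 8


for bits in (16, 32, 64):
    megabytes = matrix_bytes(10_000, 10_000, bits) / 1e6
    print(f"FP{bits}: {megabytes:.0f} MB per matrix")
```

Halving the bit width halves the memory footprint, which also halves the data that has to be moved for every operation.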

## Can FP16 be used for training or inference?

* Some operations, such as convolutions and linear layers, can be performed entirely in FP16
* Other operations, such as reductions, often need the FP32 representation
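One way to see why reductions are sensitive is to accumulate many small values while rounding the running total to FP16 after every addition. The sketch below emulates FP16 on the CPU with the standard library's half-precision `struct` format. It is an illustration under that assumption, not how a GPU implements reductions, and the helper names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest FP16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_sum(values):
    """Accumulate in FP16: the running total is rounded after every addition."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


values = [0.001] * 10_000   # the exact sum is 10
total_fp16 = fp16_sum(values)
total_wide = sum(values)    # Python floats are 64-bit

print("FP16 accumulator:", total_fp16)
print("wide accumulator:", total_wide)
```

Once the running total is large enough, each new 0.001 is smaller than half the spacing between neighboring FP16 values and is rounded away, so the FP16 accumulator stops growing well before it reaches 10.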

## Automatic mixed precision (AMP)

 # <center>[ <img src="images/amp.png"/>](attachment:image.png)</center>

## Does using FP16 really pay off?

  # Tensor cores
 
 # <center>[ <img src="images/tensorop.png" width=100%/>](attachment:image.png)</center>

 # Tensor cores
 
 # <center>[ <img src="images/tensor_cores.gif"/>](attachment:image.png)</center>

### Traditional training (FP32)

In [14]:
for epoch in range(1):
    time_i=time.time()
    epoch_loss = 0.0
    for image,label in dataloader:
        optim.zero_grad() 
        image=image.cuda()  
        label=label.cuda()
        output = model(image)   
        loss= cross_entropy(output,label)        
        loss.backward()     
        epoch_loss+=loss.item()
        optim.step()
    print(f'Time per epoch: {time.time()-time_i} s | Epoch loss: {epoch_loss}')

Time per epoch: 12.544077634811401 s | Epoch loss: 79.52772259712219


## Training with mixed precision + tensor cores

In [15]:
for epoch in range(1):
    time_i=time.time()
    epoch_loss = 0.0
    for image,label in dataloader:
        optim.zero_grad()
        image=image.cuda()
        label=label.cuda()
        # Only the forward pass and the loss run under autocast
        with torch.autocast(device_type="cuda"):
            output = model(image)
            loss= cross_entropy(output,label)
        # backward() and the optimizer step stay outside the autocast block
        loss.backward()
        optim.step()
        epoch_loss+=loss.item()
    print(f'Time per epoch: {time.time()-time_i} s | Epoch loss: {epoch_loss}')

Time per epoch: 4.64338493347168 s | Epoch loss: 72.82086181640625


## What problems can arise when working with 16-bit precision?

* Gradient values computed during backpropagation may be too small to represent in FP16 and underflow to zero (vanishing gradients)

### Training with gradient scaling

In [16]:
scaler = torch.cuda.amp.GradScaler() 
a=torch.tensor([0.00045],requires_grad=True).cuda()
scaler.scale(a)

tensor([29.4912], device='cuda:0', grad_fn=<MulBackward0>)
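The scaled value above equals the input multiplied by 65536 (0.00045 x 65536 = 29.4912). The sketch below shows why such a factor helps: a tiny value that underflows to zero in FP16 survives once it is scaled, and can be unscaled afterwards in wider precision. FP16 is emulated with the standard library `struct` module, and the gradient value of 1e-8 is a hypothetical example, not a measurement from this model.

```python
import struct

SCALE = 65536.0  # 2**16, the factor implied by the scaled tensor shown above


def to_fp16(x):
    """Round a Python float to the nearest FP16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


tiny_grad = 1e-8                              # hypothetical gradient value
stored_plain = to_fp16(tiny_grad)             # stored directly in FP16
stored_scaled = to_fp16(tiny_grad * SCALE)    # scaled before storing in FP16
recovered = stored_scaled / SCALE             # unscaled in wider precision

print("stored directly:  ", stored_plain)
print("stored scaled:    ", stored_scaled)
print("after unscaling:  ", recovered)
```

This is the idea behind scaling the loss before `backward()` and unscaling before the optimizer step, which is the role `GradScaler` plays in the training loop.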

In [17]:
scaler = torch.cuda.amp.GradScaler()  
for epoch in range(3):
    time_i=time.time()
    epoch_loss = 0.0
    for image,label in dataloader:
        optim.zero_grad()
        image=image.cuda()
        label=label.cuda()
        with torch.autocast(device_type="cuda"):
            output = model(image)
            loss= cross_entropy(output,label)        
        scaler.scale(loss).backward()
        scaler.step(optim)
        scaler.update()
        epoch_loss+=loss.item()
    print(f'Time per epoch: {time.time()-time_i} s | Epoch loss: {epoch_loss}')

Time per epoch: 4.631499767303467 s | Epoch loss: 68.31180572509766
Time per epoch: 4.612504005432129 s | Epoch loss: 66.4228515625
Time per epoch: 4.608715772628784 s | Epoch loss: 65.20722198486328


# What if I have no hardware, or only need a few hours of compute?

### Main cloud services for creating virtual machines

# <center>[<img src="images/azurevs.jpg" width="80%"/>](attachment:image.png)</center>

### Pros of using virtual machines

* Easy to create and configure as needed

* Low cost in the short term

* Direct integration with other cloud services from the same provider

### Cons of using virtual machines

* The requested resources may not be available

* High cost in the long term

## Are VMs the most efficient way to run machine learning workloads in the cloud?

* Models served from a VM do not scale (inference)

* Training and deployment of models are complex to automate

# <center>[<img src="images/mlstudiovsvertex.png" width="60%"/>](attachment:image.png)</center>

### Advantages of specialized cloud ML services

* Scalable, automated model deployment

* Automated training (pipelines)

* Access to a family of pretrained models through an API

* Creation of Jupyter notebooks

# <center>[<img src="images/mlsteps.jpg" width="80%"/>](attachment:image.png)</center>

## Deployment of ML models

* Making models available for real use by end users

# <center>[<img src="images/depl.png" width="50%"/>](attachment:image.png)</center>

## Model as an API
# <center>[<img src="images/apimodel.png" width="70%"/>](attachment:image.png)</center>

## Frameworks for model deployment

# <center>[<img src="images/deploy.png" width="70%"/>](attachment:image.png)</center>

### Example: BentoML

In [18]:
get_ipython().system_raw('BENTOML_PORT=11000 bentoml serve server:svc &')

2024-06-10T20:01:59-0400 [INFO] [cli] Environ for worker 0: set CUDA_VISIBLE_DEVICES to 0
2024-06-10T20:01:59-0400 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "server:svc" can be accessed at http://localhost:11000/metrics.
2024-06-10T20:02:00-0400 [INFO] [cli] Starting production HTTP BentoServer from "server:svc" listening on http://0.0.0.0:11000 (Press CTRL+C to quit)
2024-06-10T20:03:04-0400 [INFO] [api_server:12] 190.217.221.19:58372 (scheme=http,method=GET,path=/docs,type=,length=) (status=404,type=text/plain; charset=utf-8,length=9) 3.529ms (trace=678284b775f8c40f454c993618b0dccd,span=eda6d4ce21265fc7,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:04-0400 [INFO] [api_server:12] 190.217.221.19:58372 (scheme=http,method=GET,path=/favicon.ico,type=,length=) (status=404,type=text/plain; charset=utf-8,length=9) 0.602ms (trace=550ec6a242f12c401ba9bd37dae97b5c,span=ea9152f6c56cafe2,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:07-0400 [INFO] [api_server:12] 190.15

2024-06-10T20:03:17-0400 [INFO] [api_server:3] 190.215.92.28:53237 (scheme=http,method=GET,path=/docs,type=,length=) (status=404,type=text/plain; charset=utf-8,length=9) 0.706ms (trace=19421eff0ed7131450cfd5e57cd6ca6a,span=dd318531ccd376f1,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:17-0400 [INFO] [api_server:2] 190.196.33.80:37876 (scheme=http,method=GET,path=/,type=,length=) (status=200,type=text/html; charset=utf-8,length=2945) 0.378ms (trace=2665afefbceb958388438e78551e1da5,span=96b7c3b8e4983c23,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:17-0400 [INFO] [api_server:2] 190.196.33.80:42029 (scheme=http,method=GET,path=/static_content/index.css,type=,length=) (status=200,type=text/css; charset=utf-8,length=1127) 5.153ms (trace=083ea936f8d23181f5c00ad4e64d3172,span=6f199dd48f4e9ebd,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:17-0400 [INFO] [api_server:2] 190.196.33.80:37876 (scheme=http,method=GET,path=/static_content/swagger-ui.css,type=,length=) (status=200,type=

2024-06-10T20:03:28-0400 [INFO] [api_server:7] 44.196.175.104:30597 (scheme=http,method=GET,path=/docs.json,type=,length=) (status=200,type=application/json,length=4572) 2.252ms (trace=fb6ff48ee715228439abac8a7a610828,span=6a36d9594f206eb5,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:29-0400 [INFO] [api_server:3] 201.189.202.30:56511 (scheme=http,method=GET,path=/,type=,length=) (status=200,type=text/html; charset=utf-8,length=2945) 0.364ms (trace=18ada799ccc63013507048a6593edb26,span=1c6eb5bcd14460ed,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:29-0400 [INFO] [api_server:3] 201.189.202.30:56511 (scheme=http,method=GET,path=/static_content/swagger-ui.css,type=,length=) (status=200,type=text/css; charset=utf-8,length=152059) 1.757ms (trace=f8d769e6d12ffd7453288aa752e5f8f5,span=94760468b02b9735,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:29-0400 [INFO] [api_server:12] 201.189.202.30:56541 (scheme=http,method=GET,path=/static_content/index.css,type=,length=) (status=200

2024-06-10T20:03:31-0400 [INFO] [api_server:9] 179.60.66.160:17778 (scheme=http,method=GET,path=/static_content/swagger-ui-standalone-preset.js,type=,length=) (status=200,type=text/javascript; charset=utf-8,length=230777) 1.651ms (trace=c254146d5e029ed68abaf2d76e6c2931,span=f0ea1be171c95b40,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:31-0400 [INFO] [api_server:5] 44.196.175.104:60936 (scheme=http,method=GET,path=/docs.json,type=,length=) (status=200,type=application/json,length=4572) 0.574ms (trace=3dcdf883d53f0415c5f406bce6169c3d,span=3d4a0bdac4669ce4,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:31-0400 [INFO] [api_server:7] 44.196.175.104:6329 (scheme=http,method=GET,path=/docs.json,type=,length=) (status=200,type=application/json,length=4572) 0.855ms (trace=74ec0d1771769fb362f5217d86742c97,span=853bf0a24fda66e6,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:31-0400 [INFO] [api_server:9] 179.60.66.160:17778 (scheme=http,method=GET,path=/docs.json,type=,length=) (stat

2024-06-10T20:03:44-0400 [INFO] [api_server:7] 179.56.131.197:55577 (scheme=http,method=GET,path=/,type=,length=) (status=200,type=text/html; charset=utf-8,length=2945) 0.225ms (trace=9bef1d176417fea274271c73644d6363,span=432c18d70dfb2773,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:44-0400 [INFO] [api_server:7] 179.56.131.197:55577 (scheme=http,method=GET,path=/static_content/swagger-ui.css,type=,length=) (status=200,type=text/css; charset=utf-8,length=152059) 1.356ms (trace=32adf5ecbf7c68d03bc8a48242c20a67,span=25b2ad1a9735b307,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:44-0400 [INFO] [api_server:7] 179.56.131.197:55578 (scheme=http,method=GET,path=/static_content/index.css,type=,length=) (status=200,type=text/css; charset=utf-8,length=1127) 0.851ms (trace=dcf43a01dbb3602852b19670e5fde1b7,span=1d046412c9ee056c,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:44-0400 [INFO] [api_server:7] 179.56.131.197:55584 (scheme=http,method=GET,path=/static_content/swagger-initial

2024-06-10T20:03:51-0400 [INFO] [api_server:7] 190.215.92.28:53273 (scheme=http,method=GET,path=/docs.json,type=,length=) (status=200,type=application/json,length=4572) 0.907ms (trace=ffcbc5fcad56258b836fdb4787da2681,span=99ef8232cced84dd,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:51-0400 [INFO] [api_server:7] 190.215.92.28:53273 (scheme=http,method=GET,path=/static_content/favicon-dark-32x32.png,type=,length=) (status=200,type=image/png,length=654) 1.535ms (trace=266f435c43a65347b4936a9333f01701,span=361b2d99368748c7,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:52-0400 [INFO] [api_server:5] 44.196.175.104:27316 (scheme=http,method=GET,path=/docs.json,type=,length=) (status=200,type=application/json,length=4572) 0.860ms (trace=8167b78bec86e65cccbdd535c5040bb1,span=70e6e3a1b20cca61,sampled=0,service.name=gpt2_demo)
2024-06-10T20:03:53-0400 [INFO] [api_server:11] 201.105.213.238:62692 (scheme=http,method=GET,path=/,type=,length=) (status=200,type=text/html; charset=utf-8,le

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


2024-06-10T20:04:15-0400 [INFO] [runner:gpt2:1] _ (scheme=http,method=POST,path=/predict,type=application/octet-stream,length=23) (status=200,type=application/vnd.bentoml.DefaultContainer,length=88) 715.074ms (trace=5c3595a6acff42a4addc9521f334f2a4,span=19ee5086a4b4950d,sampled=0,service.name=gpt2)
{'text': 'hi, i am'}
hi, i am a woman of the same gender, i am bisexual, i am into science and
2024-06-10T20:04:15-0400 [INFO] [api_server:9] 190.217.221.19:58389 (scheme=http,method=POST,path=/invocation,type=application/json,length=21) (status=200,type=application/json,length=88) 808.538ms (trace=5c3595a6acff42a4addc9521f334f2a4,span=775476b190881ae2,sampled=0,service.name=gpt2_demo)
2024-06-10T20:04:19-0400 [INFO] [api_server:1] 190.12.168.30:32347 (scheme=http,method=GET,path=/,type=,length=) (status=200,type=text/html; charset=utf-8,length=2945) 0.305ms (trace=0b8641886d87560c8ea3f6fde746fa38,span=3e19237a3d1842ba,sampled=0,service.name=gpt2_demo)
2024-06-10T20:04:19-0400 [INFO] [api_se

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


2024-06-10T20:04:24-0400 [INFO] [runner:gpt2:1] _ (scheme=http,method=POST,path=/predict,type=application/octet-stream,length=23) (status=200,type=application/vnd.bentoml.DefaultContainer,length=88) 106.491ms (trace=509524f7948f5963ad98463158b442e4,span=f8cbd788330a9a8e,sampled=0,service.name=gpt2)
{'text': 'Hi, I am'}
Hi, I am not sure, but it makes sense. I've been playing Halo since I was
2024-06-10T20:04:24-0400 [INFO] [api_server:11] 179.57.111.3:58923 (scheme=http,method=POST,path=/invocation,type=application/json,length=20) (status=200,type=application/json,length=88) 139.266ms (trace=509524f7948f5963ad98463158b442e4,span=b68687f88f2ebd34,sampled=0,service.name=gpt2_demo)
2024-06-10T20:04:41-0400 [INFO] [api_server:3] 179.56.131.197:55620 (scheme=http,method=GET,path=/,type=,length=) (status=200,type=text/html; charset=utf-8,length=2945) 0.356ms (trace=f758009df29020420410b82e12578d4b,span=ba795052dc88bb41,sampled=0,service.name=gpt2_demo)
2024-06-10T20:04:41-0400 [INFO] [api_se

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


2024-06-10T20:04:55-0400 [INFO] [runner:gpt2:1] _ (scheme=http,method=POST,path=/predict,type=application/octet-stream,length=20) (status=200,type=application/vnd.bentoml.DefaultContainer,length=91) 134.747ms (trace=39abbd6aa1e4f440a21232b10a571d89,span=aa7349d148e38fc3,sampled=0,service.name=gpt2)
{'text': 'hello'}
hello the news release from E. Coli

In a major new move, the city announced
2024-06-10T20:04:55-0400 [INFO] [api_server:6] 201.189.202.30:56744 (scheme=http,method=POST,path=/invocation,type=application/json,length=19) (status=200,type=application/json,length=93) 175.271ms (trace=39abbd6aa1e4f440a21232b10a571d89,span=10241e59cca1dbd5,sampled=0,service.name=gpt2_demo)
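The `POST` lines in the log above come from clients calling the served model. A minimal client sketch using only the standard library could look like this. The URL is an assumption built from the `BENTOML_PORT` value and the `/invocation` path seen in the log, the `{"text": ...}` payload mirrors the dictionaries printed by the server, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: port from BENTOML_PORT, path from the POST lines in the log
URL = "http://localhost:11000/invocation"


def build_request(prompt, url=URL):
    """Build (but do not send) a JSON POST request for the text-generation API."""
    body = json.dumps({"text": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, url=URL, timeout=30):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("hi, i am")
print(req.get_method(), req.full_url, req.data)
# generate("hi, i am")  # uncomment while the server is running
```

Building the request separately from sending it makes it possible to inspect the payload without a running server.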


## Model as an API
# <center>[<img src="images/depl2.png" width="70%"/>](attachment:image.png)</center>

## Problems with serving model APIs on VMs

# <center>[<img src="images/apin2.png" width="70%"/>](attachment:image.png)</center>

## Solution: scale model APIs in the cloud

# <center>[<img src="images/depl4.png" width="50%"/>](attachment:image.png)</center>

### Demo: scaling example on Vertex AI

## What is MLOps?

* A repeatable paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently.


# <center>[<img src="images/mlops.png" width="80%"/>](attachment:image.png)</center>

## Pipelines in MLOps

* A pipeline is a workflow made up of one or more components and their interactions through inputs and outputs.
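The definition can be illustrated without any framework. In the sketch below each component is a plain function, and the pipeline is a list that says which named values each component reads and which one it writes. `run_pipeline` and the tuple layout are hypothetical and are not part of any MLOps framework.

```python
def run_pipeline(steps, inputs):
    """Run components in order; each reads named inputs and writes one named output."""
    values = dict(inputs)
    for output_name, component, input_names in steps:
        values[output_name] = component(*(values[name] for name in input_names))
    return values


def add(a, b):
    return a + b


def mul(a, b):
    return a * b


steps = [
    ("sum", add, ("a", "b")),        # sum = a + b
    ("result", mul, ("a", "sum")),   # result = a * sum, consuming the previous output
]
outputs = run_pipeline(steps, {"a": 2.0, "b": 3.0})
print(outputs["result"])
```

Real pipeline frameworks add what this sketch leaves out, such as running each component in its own container, caching outputs and tracking artifacts.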

# <center>[<img src="images/compo.png" width="60%"/>](attachment:image.png)</center>

# <center>[<img src="images/pipeline.png" width="60%"/>](attachment:image.png)</center>

### Frameworks for MLOps

# <center>[<img src="images/mlops_frame2.png" width="70%"/>](attachment:image.png)</center>

### Cloud services for MLOps
# <center>[<img src="images/mlstudiovsvertex.png" width="60%"/>](attachment:image.png)</center>

### Toy pipeline example defined in Kubeflow

In [20]:
import kfp.dsl as dsl
from kfp.v2 import compiler

@dsl.component
def add(a: float, b: float) -> float:
    return a + b

@dsl.component
def mul(a: float, b: float) -> float:
    return a * b

@dsl.pipeline
def add_pipeline(a: float, b: float):
    # The output of the first component feeds the second one
    add_task = add(a=a, b=b)
    mul_task = mul(a=a, b=add_task.output)
    
compiler.Compiler().compile(pipeline_func=add_pipeline, package_path='add_pipeline.json')


  from kfp.v2 import compiler
  return component_factory.create_component_from_func(



### Example of a real pipeline on Vertex AI

# <center>[<img src="images/vertex.png" width="56%"/>](attachment:image.png)</center>