In [1]:
!pip install boto3
!pip install psycopg2-binary



### Checkpoint 6: Final mile(s)
+ Prediction pipeline: its `requires` should fetch the pkl from S3 (training) and depend on the FE validation having passed
  + Predictions are stored in S3 and in RDS
  + Prediction metadata in RDS -> model governance (store the uuid of the file with the predictions)
  + At least 2 validations on the predictions
+ Bias and fairness computation with the best model selected during training (through Python, not the web version!)
  + Persistence of bias/fairness metrics
  + Bias metadata
+ API to expose your predictions, at least 1 endpoint
+ Model monitoring dashboard
+ Complete README: anyone who visits your GitHub can reproduce your data product by following its instructions
  + Add the requirements.txt from your pyenv
  + Add a screenshot of your complete pipeline, all green!

Process:
+ Run bias and fairness on the best model
  + Pipeline visualization
  + Data persistence
  + Verify metadata persistence
+ Run predictions
  + Visualize your pipeline
+ Verify prediction metadata
+ Prediction validations with marbles
  + First run fails (if you store metadata, it is saved on failure)
  + Second run passes and stores metadata
+ Verify your API endpoint
  + Return predictions
+ Verify the monitoring dashboard

## 0.1 Reset

1. Delete models and predictions from S3 (make sure the credentials are also in the `[default]` profile, since these commands don't set `--profile`):
```
aws s3 rm s3://models-dpa --recursive
aws s3 rm s3://preds-dpa --recursive
```

2. Clear the tables (in psql)

```
delete from metadatos.bias;
delete from metadatos.models;
delete from predictions.train;
delete from predictions.test;
delete from metadatos.predictions;
```

3. Run the model
```
PYTHONPATH='.' AWS_PROFILE=dpa luigi --module modelling  RunModelSimple --local-scheduler
```

## 0.2 Initial setup

1. Connect to the EC2 instance (ssh)

```
ssh -i /home/paola/.ssh/dpa_prueba.pem ubuntu@ec2-54-226-157-22.compute-1.amazonaws.com

```


2. Run Docker

```
sudo docker run --rm -it \
-v /home/ubuntu/dpa_rita:/home  \
-v $HOME/.aws:/root/.aws:ro  \
-v $HOME/.rita:/root/.rita \
--entrypoint "/bin/bash" \
--net=host \
paolamedo/aws_rita:6.0.1
```

3. Start the Luigi scheduler
```
luigid
```

4. Create an SSH tunnel
```
ssh -i ~/.ssh/dpa_prueba.pem -L localhost:8082:localhost:8082 ubuntu@ec2-54-226-157-22.compute-1.amazonaws.com -v
```

5. Open the Luigi UI
```
localhost:8082
```

6. In another window, inside Docker, update the AWS credentials:
```
cd /root/.aws
nano credentials
```

Install the package and go to the orchestrators directory:
```
cd /home
python3 setup.py install
cd src/orquestadores
```


In [2]:
import sys
sys.path.append('./../')
%load_ext autoreload
%autoreload 2

import pandas as pd

from src.utils.s3_utils import create_bucket, get_s3_objects, describe_s3
from src.utils.db_utils import execute_query, show_select,get_select, get_dataframe

In [3]:
bucket_name = "models-dpa"
get_s3_objects(bucket_name)

s3.ObjectSummary(bucket_name='models-dpa', key='20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&.model.zip') 6667
s3.ObjectSummary(bucket_name='models-dpa', key='20200601_0-1.5_LR_=#iter#-%205$%#pca#-%9&.model.zip') 6685


In [4]:
query = "select * from metadatos.models order by fecha desc; "
get_dataframe(query)

|   | fecha | objetivo | model_name | s3_name | hyperparams | auroc | aupr | precision | recall | f1 | train_time | test_split | train_nrows |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20200601 | 0-1.5 | LR | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | `{"iter": 202, "pca": 7}` | 0.987856 | 0.986579 | 0.9721254355400696 | 0.9721254355400696 | 0.9721254355400696 | 37.55584955215454 | 0.2 | 1154 |
| 1 | 20200601 | 0-1.5 | LR | `20200601_0-1.5_LR_=#iter#-%205$%#pca#-%9&` | `{"iter": 205, "pca": 9}` | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 55.85569787025452 | 0.2 | 1195 |
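Comparing the `s3_name` and `hyperparams` columns above suggests that the name suffix is the hyperparameter JSON with some characters swapped (`{` to `=`, `"` to `#`, `:` to `-`, space to `%`, `,` to `$`, `}` to `&`). This rule is inferred from the names shown, not taken from the modelling code; under that assumption, a sketch of encoding and decoding it:

```python
import json

# Inferred mapping (assumption): { -> =, } -> &, ',' -> $, space -> %, '"' -> #, ':' -> -
_ENCODE = str.maketrans('{}, ":', "=&$%#-")
_DECODE = str.maketrans("=&$%#-", '{}, ":')


def encode_hyperparams(params):
    """Turn a hyperparameter dict into the S3 name suffix."""
    return json.dumps(params).translate(_ENCODE)


def parse_model_name(name):
    """Split '<fecha>_<objetivo>_<model>_<suffix>' and decode the suffix.

    Assumes the model label has no '_' and no hyperparameter value contains
    one of = & $ % # - (e.g. a negative number), because those characters
    are decoded back into JSON punctuation.
    """
    for ext in (".model.zip", ".preds"):
        if name.endswith(ext):
            name = name[: -len(ext)]
    fecha, objetivo, model, suffix = name.split("_", 3)
    return {
        "fecha": fecha,
        "objetivo": objetivo,
        "model": model,
        "hyperparams": json.loads(suffix.translate(_DECODE)),
    }


print(encode_hyperparams({"iter": 202, "pca": 7}))  # =#iter#-%202$%#pca#-%7&
```

With this, the `hyperparams` column can be cross-checked against the object name stored in S3.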


# 1. Run bias and fairness on the best model

## 1.1 Data persistence
Check the table before running:

In [12]:
#query = "delete from metadatos.bias;"
#execute_query(query)

query = "select * from metadatos.bias order by fecha desc; "
get_dataframe(query)

# To re-run on the same day, change the ID in orquestadores/bias.py

Error while fetching data from PostgreSQL Length mismatch: Expected axis has 0 elements, new values have 10 elements
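This error most likely just means the table is empty: pandas raises "Length mismatch: Expected axis has 0 elements, new values have N elements" when N column names are assigned to a frame built from zero rows (`metadatos.bias` has 10 columns). `src/utils/db_utils.py` isn't shown, so this is a guess about how `get_dataframe` works. A sketch of an empty-safe variant for any DB-API 2.0 connection (psycopg2 included), demoed with `sqlite3`; `fetch_dataframe` is a hypothetical name:

```python
import sqlite3

import pandas as pd


def fetch_dataframe(conn, query, params=None):
    """Run a SELECT on a DB-API 2.0 connection and return a DataFrame.

    Passing columns= to the constructor keeps the column names even when
    the query returns no rows, instead of raising a length mismatch.
    """
    cur = conn.cursor()
    try:
        # Only pass params when given, so literal '%' in psycopg2 queries is left alone
        if params is None:
            cur.execute(query)
        else:
            cur.execute(query, params)
        rows = cur.fetchall()
        # Item 0 of each cursor.description entry is the column name
        columns = [col[0] for col in cur.description]
    finally:
        cur.close()
    return pd.DataFrame(rows, columns=columns)


# Demo with an in-memory SQLite table standing in for metadatos.bias
conn = sqlite3.connect(":memory:")
conn.execute("create table bias (fecha text, s3_name text)")
print(fetch_dataframe(conn, "select * from bias").columns.tolist())  # ['fecha', 's3_name']
```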


## 1.2 Pipeline visualization

```
PYTHONPATH='.' AWS_PROFILE=dpa luigi --module bias EvaluateBias --local-scheduler
```

Both commands do the same thing (this one is nicer):
```
PYTHONPATH='.' AWS_PROFILE=dpa luigi --module luigi_main  Pipeline --local-scheduler  --type train
```

To run it with the Luigi Task Visualiser (the central `luigid` scheduler), drop `--local-scheduler`:

```
PYTHONPATH='.' AWS_PROFILE=dpa luigi --module luigi_main  Pipeline  --type train
```


## 1.3 Verify metadata persistence

In [15]:
query = "select * from metadatos.bias order by fecha desc; "
get_dataframe(query)

|   | fecha | s3_name | attribute_value_q1 | attribute_value_q2 | attribute_value_q3 | attribute_value_q4 | fpr_disparity_q1 | fpr_disparity_q2 | fpr_disparity_q3 | fpr_disparity_q4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20200601 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 335.00-461.00 | 461.00-733.00 | 733.00-2133.00 | 89.00-335.00 | 0.989373 | 0.933755 | 1.125718 | 1.0 |
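The `fpr_disparity_q*` columns compare false-positive rates across quartiles of `distance`; the quartile with disparity 1.0 (89.00-335.00) appears to be the reference group. The project's bias code isn't shown, so this is only a minimal pandas sketch of the usual definition (a group's FPR divided by the reference group's FPR), assuming binary `score` and `label_value` columns like those in `predictions.train`:

```python
import pandas as pd


def fpr_disparity(df, group_col, ref_group, score_col="score", label_col="label_value"):
    """FPR of each group divided by the FPR of ref_group.

    Assumes score is a 0/1 prediction. FPR = FP / (FP + TN) only involves
    rows whose true label is 0, so it equals the mean score within them.
    A reference group with FPR 0 would make the ratios infinite.
    """
    negatives = df[df[label_col] == 0]
    fpr = negatives.groupby(group_col)[score_col].mean()
    return fpr / fpr[ref_group]
```

Quartile groups could come from `pd.qcut(df["distance"], 4)`, which bins the attribute into four equal-frequency intervals.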


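## 1.4 Verify CopyToTable
Re-running the pipeline should not insert a duplicate row: CopyToTable records each completed `update_id` in the `table_updates` marker table and treats the task as done once that entry exists.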
```
PYTHONPATH='.' AWS_PROFILE=dpa luigi --module luigi_main  Pipeline --local-scheduler  --type train
```

In [16]:
query = "select * from metadatos.bias order by fecha desc; "
get_dataframe(query)

|   | fecha | s3_name | attribute_value_q1 | attribute_value_q2 | attribute_value_q3 | attribute_value_q4 | fpr_disparity_q1 | fpr_disparity_q2 | fpr_disparity_q3 | fpr_disparity_q4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20200601 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 335.00-461.00 | 461.00-733.00 | 733.00-2133.00 | 89.00-335.00 | 0.989373 | 0.933755 | 1.125718 | 1.0 |
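luigi's `CopyToTable` decides whether a copy already happened by looking up the task's `update_id` in a marker table (`table_updates` by default), which is why this notebook changes the ID or runs `delete from table_updates;` to force a re-run. The sketch below mimics only that idea with `sqlite3` so it runs without Postgres; `copy_once` is a hypothetical helper, not luigi's implementation:

```python
import sqlite3


def copy_once(conn, update_id, table, rows):
    """Insert rows into table unless update_id is already in table_updates.

    Returns True if rows were inserted, False if the copy was skipped.
    """
    conn.execute(
        "create table if not exists table_updates "
        "(update_id text primary key, target_table text)"
    )
    seen = conn.execute(
        "select 1 from table_updates where update_id = ?", (update_id,)
    ).fetchone()
    if seen or not rows:
        return False  # already copied (or nothing to copy): treat as complete
    placeholders = ", ".join("?" * len(rows[0]))
    # table comes from trusted pipeline code, not user input
    conn.executemany(f"insert into {table} values ({placeholders})", rows)
    # Record the marker in the same transaction as the data
    conn.execute("insert into table_updates values (?, ?)", (update_id, table))
    conn.commit()
    return True
```

Calling it twice with the same ID leaves a single row in the target table; only a new ID inserts again.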


# 2. Run predictions

+ Prediction validations with marbles
  + First run fails (if you store metadata, it is saved on failure)
  + Second run passes and stores metadata

+ Visualize your pipeline
+ Verify prediction metadata

## 2.1 Prediction validations with marbles

In [19]:
query = "delete from metadatos.testing_predict_cols;"
execute_query(query)

query = "select * from metadatos.testing_predict_cols order by fecha desc; "
get_dataframe(query)

PostgreSQL connection is closed
Error while fetching data from PostgreSQL Length mismatch: Expected axis has 0 elements, new values have 4 elements


In [20]:
query = "delete from metadatos.testing_predict_types;"
execute_query(query)

query = "select * from metadatos.testing_predict_types order by fecha desc; "
get_dataframe(query)

PostgreSQL connection is closed
Error while fetching data from PostgreSQL Length mismatch: Expected axis has 0 elements, new values have 4 elements


### 2.1.1 First run fails (if you store metadata, it is saved on failure)

1. Edit src/unit_tests/predict_columns.py (where it says `#PARA QUE FALLE`, "so that it fails")
    * If re-running on the same day, change the `update_id` in orquestadores/predictions.py
    
```
cd /home/src/unit_tests/
nano predict_columns.py
```
   
2. Run the predict task
```
cd /home
python3 setup.py install
cd src/orquestadores
PYTHONPATH='.' AWS_PROFILE=dpa luigi --module luigi_main  Pipeline --type predict
```


In [21]:
query = "select * from metadatos.testing_predict_cols order by fecha desc; "
get_dataframe(query)

|   | fecha | nombre_task | task_status | msg_error |
|---|---|---|---|---|
| 0 | 1062020 | check_columns | failure | number of columns do not match |


### 2.1.2 Second run passes and stores metadata

3. Edit the file again to undo the change
    
```
cd /home/src/unit_tests/
nano predict_columns.py
```

4. Run the predict task
```
cd /home
python3 setup.py install
cd src/orquestadores
PYTHONPATH='.' AWS_PROFILE=dpa luigi --module luigi_main  Pipeline  --type predict
```

In [24]:
query = "select * from metadatos.testing_predict_cols order by fecha desc; "
get_dataframe(query)

|   | fecha | nombre_task | task_status | msg_error |
|---|---|---|---|---|
| 0 | 1062020 | check_columns | failure | number of columns do not match |
| 1 | 1062020 | check_columns | success | none |


In [25]:
query = "select * from metadatos.testing_predict_types order by fecha desc; "
get_dataframe(query)

|   | fecha | nombre_task | task_status | msg_error |
|---|---|---|---|---|
| 0 | 1062020 | check_columns_types | success | none |


## 2.2 Visualize your pipeline
In the Luigi orchestrator UI on localhost (port 8082).

## 2.3 Verify prediction metadata

In [27]:
query = "select * from metadatos.predictions order by fecha; "
get_dataframe(query)

|   | fecha | s3_name_model | s3_name_pred | number_pred | binary_stats |
|---|---|---|---|---|---|
| 0 | 20200531 | `31052020_0-1.5_LR_=#iter#-%400$%#pca#-%8&` | `31052020_0-1.5_LR_=#iter#-%400$%#pca#-%8&.preds` | 1385 | 1.0 |
| 1 | 20200531 | `31052020_0-1.5_LR_=#iter#-%200$%#pca#-%6&` | `31052020_0-1.5_LR_=#iter#-%200$%#pca#-%6&.preds` | 1385 | 1.0 |
| 2 | 20200531 | `31052020_0-1.5_LR_=#iter#-%200$%#pca#-%6&` | `31052020_0-1.5_LR_=#iter#-%200$%#pca#-%6&.preds` | 1385 | 1.0 |
| 3 | 20200531 | `31052020_0-1.5_LR_=#iter#-%400$%#pca#-%8&` | `31052020_0-1.5_LR_=#iter#-%400$%#pca#-%8&.preds` | 1385 | 1.0 |
| 4 | 20200601 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&.preds` | 1441 | 1.0 |
| 5 | 20200601 | `31052020_0-1.5_LR_=#iter#-%200$%#pca#-%6&` | `31052020_0-1.5_LR_=#iter#-%200$%#pca#-%6&.preds` | 1385 | 1.0 |
| 6 | 27052020 | `18052020_0-1.5_LR_=#iter#-%1$%#pca#-%1&` | `18052020_0-1.5_LR_=#iter#-%1$%#pca#-%1&.preds` | 1000 | 1.0 |
| 7 | 29052020 | `18052020_0-1.5_LR_=#iter#-%1$%#pca#-%1&` | `18052020_0-1.5_LR_=#iter#-%1$%#pca#-%1&.preds` | 1000 | 1.0 |
| 8 | 30052020 | `30052020_0-1.5_LR_=#iter#-%200$%#pca#-%8&` | `30052020_0-1.5_LR_=#iter#-%200$%#pca#-%8&.preds` | 1000 | 1.0 |


## 2.4 Verify that the predictions were saved

In [28]:
bucket_name = "preds-dpa"
get_s3_objects(bucket_name)

s3.ObjectSummary(bucket_name='preds-dpa', key='20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&.preds') 106723
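Governance depends on each row in `metadatos.predictions` pointing at a file that still exists; in the outputs above, the bucket holds a single `.preds` object while the metadata table also lists files from earlier runs. A hypothetical helper for spotting such gaps in both directions, assuming the S3 keys and the `s3_name_pred` column use the same naming:

```python
import pandas as pd


def reconcile_predictions(s3_keys, metadata):
    """Compare S3 prediction files with metadatos.predictions rows.

    s3_keys: iterable of object keys in the predictions bucket.
    metadata: DataFrame with an s3_name_pred column.
    """
    keys = set(s3_keys)
    recorded = set(metadata["s3_name_pred"])
    return {
        # metadata points at a file that is not (or no longer) in S3
        "missing_in_s3": sorted(recorded - keys),
        # a file exists in S3 but no metadata row describes it
        "missing_in_metadata": sorted(keys - recorded),
    }


# Hypothetical usage (needs AWS credentials, so left commented out):
# import boto3
# keys = [obj.key for obj in boto3.resource("s3").Bucket("preds-dpa").objects.all()]
# meta = get_dataframe("select * from metadatos.predictions;")
# reconcile_predictions(keys, meta)
```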


In [29]:
query = " select * from predictions.test  order by s3_name desc limit 10; "
get_dataframe(query)

|   | flight_number | distance | prediction | s3_name | fecha |
|---|---|---|---|---|---|
| 0 | 1609.0 | 650.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200201.0 |
| 1 | 1613.0 | 1045.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200216.0 |
| 2 | 3324.0 | 402.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200206.0 |
| 3 | 3331.0 | 868.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200207.0 |
| 4 | 3322.0 | 335.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200219.0 |
| 5 | 3306.0 | 155.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200220.0 |
| 6 | 3320.0 | 733.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200227.0 |
| 7 | 1595.0 | 190.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200203.0 |
| 8 | 1606.0 | 993.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200207.0 |
| 9 | 1607.0 | 1021.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20200201.0 |


In [30]:
query = " select * from predictions.train  order by s3_name desc limit 10; "
get_dataframe(query)

|   | flight_number | originwac | distance | label_value | score | s3_name | fecha |
|---|---|---|---|---|---|---|---|
| 0 | 3320.0 | 22.0 | 733.0 | 0.0 | 0.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191207.0 |
| 1 | 3334.0 | 33.0 | 1013.0 | 0.0 | 0.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191207.0 |
| 2 | 3309.0 | 23.0 | 340.0 | 1.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191228.0 |
| 3 | 3302.0 | 21.0 | 719.0 | 1.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191229.0 |
| 4 | 3331.0 | 54.0 | 868.0 | 1.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191229.0 |
| 5 | 3320.0 | 22.0 | 733.0 | 0.0 | 0.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191207.0 |
| 6 | 1588.0 | 81.0 | 2133.0 | 0.0 | 0.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191202.0 |
| 7 | 3330.0 | 74.0 | 868.0 | 1.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191229.0 |
| 8 | 3325.0 | 33.0 | 402.0 | 1.0 | 1.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191229.0 |
| 9 | 3319.0 | 41.0 | 733.0 | 0.0 | 0.0 | `20200601_0-1.5_LR_=#iter#-%202$%#pca#-%7&` | 20191207.0 |


# 3. Verify your API endpoint

1. Install Flask-RESTX
```
pip install flask_restx
```

2. Create an SSH tunnel

```
ssh -i ~/.ssh/dpa_prueba.pem -L localhost:5000:localhost:5000 ubuntu@ec2-54-226-157-22.compute-1.amazonaws.com -v
```


3. Run the app
```
cd /home
python3 setup.py install
cd src/deploy
python3 app.py
```

4. View the predictions for a flight (1609 is a `flight_number` from `predictions.test`)
```
http://127.0.0.1:5000/predicts/1609
```

5. View the Swagger spec
```
http://127.0.0.1:5000/swagger.json
```



# 4. Verify the monitoring dashboard

- Magic in Shiny <3

# Extra. Run the whole pipeline

1. For CopyToTable, clear the marker table (in psql)
```
delete from table_updates;
```

2. Delete the targets

```
cd /home/src/orquestadores/target
rm *.txt
```


3. Change the IDs
```
cd /home/src/orquestadores/
nano predictions.py
nano bias.py
nano modelling.py  # change the model parameters
```

4. Run the full pipeline
```
cd /home
python3 setup.py install
cd src/orquestadores
PYTHONPATH='.' AWS_PROFILE=dpa luigi --module luigi_main  Pipeline  --type predict
```
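Luigi's default `complete()` checks whether a task's outputs exist, which is why step 2 deletes the `.txt` files in `target/`. Assuming those files are the tasks' local targets, that step could also be scripted from Python; `clear_targets` is a hypothetical helper:

```python
import tempfile
from pathlib import Path


def clear_targets(target_dir):
    """Delete *.txt target files in target_dir and return their names."""
    removed = []
    for path in sorted(Path(target_dir).glob("*.txt")):
        path.unlink()
        removed.append(path.name)
    return removed


# Demo in a throwaway directory (the file name is hypothetical)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "EvaluateBias.txt").write_text("done")
    print(clear_targets(tmp))  # ['EvaluateBias.txt']
```

Only `.txt` files are touched, mirroring `rm *.txt`; the `table_updates` rows still need the psql step.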