# Deep Learning Course

<img src="https://yaelmanuel.com/wp-content/uploads/2021/12/platzi-banner-logo-matematicas.png" width="500px">

---

## Building Our Own Churn Analysis 🤓📊

In this lab you will learn:

* [TensorFlow](https://www.tensorflow.org/)
* [Keras](https://keras.io/)
* How to download a dataset, prepare it, train a model on it, fine-tune the model, and save it.


### 1) Downloading the Dataset 🤓

We will use a dataset from a telecommunications provider, collected for its customer retention program.
<br>For more detail, see the dataset on Kaggle: [Telco Customer Churn](https://www.kaggle.com/datasets/blastchar/telco-customer-churn/data).


In [None]:
!pip install --upgrade --force-reinstall --no-deps kaggle

In [None]:
from google.colab import files
files.upload()

In [None]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
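The same credential setup can also be done from Python, which avoids hardcoding the home directory. This is a minimal sketch, assuming `kaggle.json` sits in the current working directory after the upload. The helper name `install_kaggle_credentials` is hypothetical and not part of the Kaggle package.

```python
import shutil
from pathlib import Path


def install_kaggle_credentials(src="kaggle.json", home=None):
    """Copy a Kaggle API token into <home>/.kaggle with owner-only permissions.

    Illustrative helper only; it is not provided by the kaggle package.
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # equivalent to `mkdir -p`
    target = target_dir / "kaggle.json"
    shutil.copy(src, target)                       # equivalent to `cp`
    target.chmod(0o600)                            # equivalent to `chmod 600`
    return target
```

Calling `install_kaggle_credentials()` with no arguments would mirror the three shell commands above for whichever user is running the notebook.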

In [None]:
!kaggle datasets list -s telco-customer-churn

In [None]:
!kaggle datasets download -d blastchar/telco-customer-churn

In [None]:
!unzip '/content/telco-customer-churn.zip'

### 2) Preparing the Data 👌

#### 2.1) Installing the Dependencies 🙌

In [None]:
!pip install ydata-profiling

In [None]:
import joblib
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import EarlyStopping

from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc

#### 2.2) Exploring the Dataset 🔍

In [None]:
!ls

In [None]:
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv", sep=",")

**Tip:** pandas truncates wide dataframes by default. Setting the `display.max_columns` option to `None` makes every column visible.

In [29]:
df.head(3)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes


In [30]:
pd.set_option('display.max_columns', None)

In [31]:
df.head(3)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes


#### 2.3) EDA (Exploratory Data Analysis)

Let's look at what the dataframe contains.

In [32]:
# Build a per-column summary using native pandas functions
def summarize_dataframe_with_pandas(df):
    summary = pd.DataFrame({
        'Type': df.dtypes,  # Data type of each column
        'Unique Values': df.nunique(),  # Number of distinct values
        # Up to three distinct non-null example values per column
        'Examples': pd.Series({col: df[col].dropna().unique()[:3].tolist() for col in df.columns}),
    })
    return summary

In [33]:
summarize_dataframe_with_pandas(df)

Unnamed: 0,Type,Unique Values,Examples
customerID,object,7043,"[7590-VHVEG, 5575-GNVDE, 3668-QPYBK]"
gender,object,2,"[Female, Male]"
SeniorCitizen,int64,2,"[0, 1]"
Partner,object,2,"[Yes, No]"
Dependents,object,2,"[No, Yes]"
tenure,int64,73,"[1, 34, 2]"
PhoneService,object,2,"[No, Yes]"
MultipleLines,object,3,"[No phone service, No, Yes]"
InternetService,object,3,"[DSL, Fiber optic, No]"
OnlineSecurity,object,3,"[No, Yes, No internet service]"


A more detailed, interactive view:

In [34]:
from ydata_profiling import ProfileReport

ProfileReport(df, minimal=True)



#### 2.4) Unique Values

Drop the identifier column, which has a different value in every row:

In [35]:
df = df.drop('customerID', axis=1)
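
A column such as `customerID`, with one distinct value per row, acts as an identifier and gives the model nothing to generalize from. As a rough sketch, such columns can be found by comparing the number of distinct values with the number of rows. The helper `find_identifier_columns` and the toy dataframe (including its fourth, made-up customer ID) are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def find_identifier_columns(df):
    """Return the names of columns whose values are all distinct (likely IDs)."""
    n_unique = df.nunique(dropna=False)
    return n_unique[n_unique == len(df)].index.tolist()


# Toy data standing in for the Telco dataframe
toy = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK", "0000-AAAAA"],
    "Partner": ["Yes", "No", "No", "No"],
    "tenure": [1, 34, 2, 2],
})
id_columns = find_identifier_columns(toy)  # only "customerID" is all-distinct here
```

A continuous numeric column can also have all-distinct values, so the result is a list of candidates to review rather than columns to drop automatically.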

Drop a column that could introduce bias:

In [36]:
df = df.drop('gender', axis=1)

#### 2.5) Missing Values

In [39]:
# Count the missing values in each column
df.isnull().sum()

Unnamed: 0,0
SeniorCitizen,0
Partner,0
Dependents,0
tenure,0
PhoneService,0
MultipleLines,0
InternetService,0
OnlineSecurity,0
OnlineBackup,0
DeviceProtection,0
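
A count of zero here only means that no cell is `NaN`. `isnull()` does not flag blank strings, and the Kaggle copy of this dataset is commonly reported to store `TotalCharges` as text with a few blank entries, which would also explain why that column is later picked up as categorical. The sketch below uses a toy series (an assumption, not the real column) to show one way to surface such values with `pd.to_numeric`. Whether to impute or drop them is left open.

```python
import pandas as pd

# Toy stand-in for a numeric column that was read as text (one blank entry)
total_charges = pd.Series(["29.85", "1889.5", " ", "108.15"])

# isnull() sees four non-null strings, so nothing is reported as missing
missing_before = int(total_charges.isnull().sum())

# errors="coerce" turns anything that cannot be parsed as a number into NaN
as_numeric = pd.to_numeric(total_charges, errors="coerce")
missing_after = int(as_numeric.isnull().sum())

print(missing_before, missing_after)  # prints "0 1"
```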


#### 2.6) Categorical Columns

Replace the binary values in the categorical columns

In [40]:
# Opt in to the future pandas behavior to avoid the downcasting warning raised by replace()
pd.set_option('future.no_silent_downcasting', True)

In [41]:
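# Every object-dtype column is treated as categorical.
# Note: this also picks up any numeric column that pandas read as text.
categorical_columns = df.select_dtypes(include='object').columns.tolist()

# Map the binary answers to integers; any other value is left unchanged
for col in categorical_columns:
    df[col] = df[col].replace({'Yes': 1, 'No': 0})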

Label Encoder

In [42]:
# Fit a separate LabelEncoder for each categorical column
label_encoders = {}
for col in categorical_columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col].astype(str))  # Encode the column as integer codes
    label_encoders[col] = le  # Keep each column's fitted encoder for later reuse

In [43]:
# Save the label encoders
joblib.dump(label_encoders, 'label_encoders.pkl')

['label_encoders.pkl']
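
`LabelEncoder` assigns integer codes by sorting the distinct values, which is why a three-valued column such as `MultipleLines` ends up with the codes 0, 1 and 2 once `Yes`/`No` have been replaced and the column is cast to string. The following sketch uses a toy column and a temporary file path (both assumptions) to show that mapping, and how encoders saved with `joblib` might be reloaded to encode new records consistently.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column after the Yes/No -> 1/0 replacement, cast to str as in the notebook
multiple_lines = pd.Series([1, 0, "No phone service", 0]).astype(str)

encoder = LabelEncoder()
codes = encoder.fit_transform(multiple_lines)

# classes_ holds the sorted distinct values; each value's position is its code
mapping = {label: code for code, label in enumerate(encoder.classes_)}

# Round trip through joblib, as a later prediction step might do
path = os.path.join(tempfile.mkdtemp(), "label_encoders.pkl")
joblib.dump({"MultipleLines": encoder}, path)
reloaded = joblib.load(path)["MultipleLines"]
new_codes = reloaded.transform(["No phone service", "1"])
```

Note that `transform` raises a `ValueError` for a value the encoder did not see during fitting, so new records need the same Yes/No replacement and string cast before they are encoded.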

In [44]:
df.head(3)

Unnamed: 0,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,0,1,0,1,0,2,1,0,1,0,0,0,0,0,1,2,29.85,2505,0
1,0,0,0,34,1,0,1,1,0,1,0,0,0,1,0,3,56.95,1466,0
2,0,0,0,2,1,0,1,1,1,0,0,0,0,0,1,3,53.85,157,1


#### 2.7) Numerical Columns

Scale the data

In [53]:
scale_cols = ['tenure','MonthlyCharges','TotalCharges']

scale = MinMaxScaler()
df[scale_cols] = scale.fit_transform(df[scale_cols])

In [54]:
# Save the fitted scaler
joblib.dump(scale, 'scaler.pkl')

['scaler.pkl']

In [55]:
df.head(3)

Unnamed: 0,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,0,1,0,0.013889,0,2,1,0,1,0,0,0,0,0,1,2,0.115423,0.383614,0
1,0,0,0,0.472222,1,0,1,1,0,1,0,0,0,1,0,3,0.385075,0.224502,0
2,0,0,0,0.027778,1,0,1,1,1,0,0,0,0,0,1,3,0.354229,0.024043,1
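
`MinMaxScaler` rescales each column to the range [0, 1] by computing `(x - min) / (max - min)`. The scaled `tenure` values above are consistent with a column that ranges from 0 to 72 (for example, 34 / 72 ≈ 0.472222). That range is inferred from the output rather than stated in the notebook, so the sketch below treats it as an assumption and uses toy values.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy tenure values; 0 and 72 are the assumed column minimum and maximum
tenure = np.array([[0.0], [1.0], [34.0], [2.0], [72.0]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(tenure)

# The same result computed by hand: (x - min) / (max - min)
by_hand = (tenure - tenure.min()) / (tenure.max() - tenure.min())

print(np.round(scaled.ravel(), 6))
```

Fitting the scaler on the whole dataframe, as done here, lets every row influence the minimum and maximum. Fitting on the training split only and reusing that scaler for the test data is a common alternative.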


### 3) Training 💪

### 4) Neural Network 😨

### 5) Metrics 📊

### 6) Saving the Model 💾

### 7) Making Predictions in Production 🤙

### 8) Conclusions

- Learn about the various objects and methods that TensorFlow + Keras offer.

- Carry out the complete process of training a model with TensorFlow + Keras.

- Pick up implementation tips for using the GPU.

<br>
<br>
<br>

---

<br>
<br>


<img src="https://static.platzi.com/media/avatars/platziteam_8cfe6fc7-1246-4c9a-9f5d-d10d467443ee.png" width="100px">

[Platzi](https://platzi.com/) 🚀

