<a href="https://colab.research.google.com/github/pablocelva/challenge-telecom-x-parte-2/blob/main/TelecomX_LATAM_parte_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Challenge Telecom X Latam (Part 2)

🧠 Challenge Objectives

- Prepare the data for modeling (cleaning, encoding, normalization).

- Run a correlation analysis and select variables.

- Train two or more classification models.

- Evaluate model performance with metrics.

- Interpret the results, including variable importance.

- Write a strategic conclusion that identifies the main factors driving churn.

# 🛠️ Data Preparation

## 1. Extracting the Treated Data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
url = 'https://raw.githubusercontent.com/pablocelva/challenge-telecom-x-parte-2/refs/heads/main/datos_tratados.csv'
datos = pd.read_csv(url)
datos.sample(5)

Unnamed: 0,Churn,gender,SeniorCitizen,Partner,Dependents,tenure,Contract,PaperlessBilling,PaymentMethod,Cuenta_Mensual,...,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Cuentas_Diarias,Total_Internet_Services
6916,0,Female,1,1,0,27,Month-to-month,1,Bank transfer (automatic),104.3,...,1,Fiber optic,1,0,0,1,1,1,3.476667,4
7215,0,Male,0,0,0,72,Two year,1,Electronic check,118.2,...,1,Fiber optic,1,1,1,1,1,1,3.94,6
4261,0,Female,0,0,1,2,One year,0,Bank transfer (automatic),20.5,...,0,No,0,0,0,0,0,0,0.683333,0
6017,0,Male,0,0,0,5,Month-to-month,1,Mailed check,55.75,...,0,DSL,1,0,0,1,0,0,1.858333,2
3247,0,Female,0,1,1,51,One year,1,Mailed check,95.15,...,1,Fiber optic,1,0,0,1,1,0,3.171667,3


## 2. Dropping Irrelevant Columns

In [3]:
df = datos.drop(['Cuentas_Diarias', 'Total_Internet_Services'], axis=1)
df.sample(5)

Unnamed: 0,Churn,gender,SeniorCitizen,Partner,Dependents,tenure,Contract,PaperlessBilling,PaymentMethod,Cuenta_Mensual,Cuenta_Total,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies
3307,0,Female,1,0,0,20,Month-to-month,1,Electronic check,81.45,1671.6,1,0,Fiber optic,0,0,0,0,0,1
2711,1,Male,1,0,0,2,Month-to-month,1,Electronic check,44.15,92.65,1,0,DSL,0,0,0,0,0,0
383,0,Male,1,0,0,51,One year,1,Bank transfer (automatic),79.6,3974.7,1,1,Fiber optic,0,1,0,0,0,0
5467,1,Male,0,0,0,1,Month-to-month,1,Electronic check,79.15,79.15,1,1,Fiber optic,0,1,0,0,0,0
5364,0,Male,0,1,1,13,Two year,0,Mailed check,40.55,590.35,0,0,DSL,1,1,0,1,0,0


## 3. Encoding

In [6]:
#categorical_cols = df.select_dtypes(include=['object', 'category']).columns
#print(f"Categorical columns to encode: {list(categorical_cols)}")

#df_encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

#df_encoded.sample(5)

In [4]:
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

In [14]:
X = df.drop('Churn', axis=1)
y = df['Churn']

In [15]:
X.sample(5)

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,Contract,PaperlessBilling,PaymentMethod,Cuenta_Mensual,Cuenta_Total,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies
6556,Female,0,0,0,55,Two year,1,Mailed check,64.75,3617.1,1,0,DSL,0,1,1,0,0,1
4289,Female,1,1,0,67,One year,1,Electronic check,105.6,7112.15,1,1,Fiber optic,1,1,0,0,1,1
5810,Female,0,1,1,33,Two year,1,Mailed check,24.5,740.3,1,1,No,0,0,0,0,0,0
4955,Male,1,0,0,46,Month-to-month,0,Electronic check,85.0,3969.4,1,0,Fiber optic,1,0,0,0,0,1
3777,Female,0,1,1,56,Two year,1,Credit card (automatic),61.3,3346.8,1,0,DSL,1,1,1,0,0,0


In [16]:
y.sample(5)

Unnamed: 0,Churn
3824,0
3462,1
5969,0
1862,0
6181,0


In [11]:
columnas = X.columns

In [9]:
one_hot = make_column_transformer(
    (OneHotEncoder(drop='if_binary'), ['Contract','PaymentMethod', 'InternetService']),
    remainder='passthrough',
    sparse_threshold=0,
    force_int_remainder_cols=False
)

In [17]:
X = one_hot.fit_transform(X)

In [19]:
one_hot.get_feature_names_out()

array(['onehotencoder__Contract_Month-to-month',
       'onehotencoder__Contract_One year',
       'onehotencoder__Contract_Two year',
       'onehotencoder__PaymentMethod_Bank transfer (automatic)',
       'onehotencoder__PaymentMethod_Credit card (automatic)',
       'onehotencoder__PaymentMethod_Electronic check',
       'onehotencoder__PaymentMethod_Mailed check',
       'onehotencoder__InternetService_DSL',
       'onehotencoder__InternetService_Fiber optic',
       'onehotencoder__InternetService_No', 'remainder__gender',
       'remainder__SeniorCitizen', 'remainder__Partner',
       'remainder__Dependents', 'remainder__tenure',
       'remainder__PaperlessBilling', 'remainder__Cuenta_Mensual',
       'remainder__Cuenta_Total', 'remainder__PhoneService',
       'remainder__MultipleLines', 'remainder__OnlineSecurity',
       'remainder__OnlineBackup', 'remainder__DeviceProtection',
       'remainder__TechSupport', 'remainder__StreamingTV',
       'remainder__StreamingMovies'], dt

In [21]:
X

array([[0.0, 1.0, 0.0, ..., 1, 1, 0],
       [1.0, 0.0, 0.0, ..., 0, 0, 1],
       [1.0, 0.0, 0.0, ..., 0, 0, 0],
       ...,
       [1.0, 0.0, 0.0, ..., 0, 0, 0],
       [0.0, 0.0, 1.0, ..., 1, 0, 1],
       [0.0, 0.0, 1.0, ..., 0, 1, 1]], dtype=object)

In [23]:
pd.DataFrame(X, columns=one_hot.get_feature_names_out())

Unnamed: 0,onehotencoder__Contract_Month-to-month,onehotencoder__Contract_One year,onehotencoder__Contract_Two year,onehotencoder__PaymentMethod_Bank transfer (automatic),onehotencoder__PaymentMethod_Credit card (automatic),onehotencoder__PaymentMethod_Electronic check,onehotencoder__PaymentMethod_Mailed check,onehotencoder__InternetService_DSL,onehotencoder__InternetService_Fiber optic,onehotencoder__InternetService_No,...,remainder__Cuenta_Mensual,remainder__Cuenta_Total,remainder__PhoneService,remainder__MultipleLines,remainder__OnlineSecurity,remainder__OnlineBackup,remainder__DeviceProtection,remainder__TechSupport,remainder__StreamingTV,remainder__StreamingMovies
0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,...,65.6,593.3,1,0,0,1,0,1,1,0
1,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,...,59.9,542.4,1,1,0,0,0,0,0,1
2,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...,73.9,280.85,1,0,0,0,1,0,0,0
3,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...,98.0,1237.85,1,0,0,1,1,0,1,1
4,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,...,83.9,267.4,1,0,0,0,0,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7262,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,...,55.15,742.9,1,0,1,0,0,1,0,0
7263,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...,85.1,1873.7,1,1,0,0,0,0,0,1
7264,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,...,50.3,92.75,1,0,0,1,0,0,0,0
7265,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,...,67.85,4627.65,1,0,1,0,1,1,0,1


In [26]:
df_encoded = pd.DataFrame(X, columns=one_hot.get_feature_names_out())
df_encoded.sample(5)

Unnamed: 0,onehotencoder__Contract_Month-to-month,onehotencoder__Contract_One year,onehotencoder__Contract_Two year,onehotencoder__PaymentMethod_Bank transfer (automatic),onehotencoder__PaymentMethod_Credit card (automatic),onehotencoder__PaymentMethod_Electronic check,onehotencoder__PaymentMethod_Mailed check,onehotencoder__InternetService_DSL,onehotencoder__InternetService_Fiber optic,onehotencoder__InternetService_No,...,remainder__Cuenta_Mensual,remainder__Cuenta_Total,remainder__PhoneService,remainder__MultipleLines,remainder__OnlineSecurity,remainder__OnlineBackup,remainder__DeviceProtection,remainder__TechSupport,remainder__StreamingTV,remainder__StreamingMovies
5096,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...,101.9,1667.25,1,0,1,0,0,1,1,1
49,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,...,44.3,44.3,1,0,0,0,0,0,0,0
5044,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,...,68.05,4158.25,1,0,1,1,1,1,0,0
2556,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...,84.35,1938.05,1,1,1,0,0,1,0,0
6205,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,...,85.6,1345.55,1,1,0,0,0,0,0,1


## 4. Checking the Churn Proportion

In [28]:
cuenta_churn = df['Churn'].value_counts()
print("Distribution of the 'Churn' variable:", cuenta_churn)

Distribution of the 'Churn' variable: Churn
0    5398
1    1869
Name: count, dtype: int64


In [29]:
proporcion_churn = df['Churn'].value_counts(normalize=True)
print("\nProportion of the 'Churn' variable:", proporcion_churn)


Proportion of the 'Churn' variable: Churn
0    0.74281
1    0.25719
Name: proportion, dtype: float64


In [30]:
proporcion_clase_min = proporcion_churn.min()

# Heuristic: flag an imbalance when the minority class is under 25% of the rows
if proporcion_clase_min < 0.25:
    print("\nThe 'Churn' variable may have a class imbalance.")
else:
    print("\nThe 'Churn' variable does not appear to have a significant class imbalance.")


The 'Churn' variable does not appear to have a significant class imbalance.
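
As a quick check on how close the data sits to that threshold, the sketch below recomputes the minority share and the majority-to-minority ratio from the counts printed above. The variable names are illustrative.

```python
# Counts copied from the value_counts() output above
no_churn, churn = 5398, 1869

minority_share = churn / (no_churn + churn)
ratio = no_churn / churn

print(f"Minority share: {minority_share:.5f}")
print(f"Majority-to-minority ratio: {ratio:.2f}")
```

The minority share is about 0.257, less than one percentage point above the 0.25 cutoff, so the "no imbalance" verdict depends heavily on where the threshold is set.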


## 5. Class Balancing
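This section has no code yet. If balancing turns out to be needed, one simple option is random oversampling of the minority class with `sklearn.utils.resample`. The sketch below is an assumption about how that could look. It runs on a made-up frame (`train`) with roughly the 74/26 split reported above, not on the notebook's data.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

# Made-up training frame with roughly a 74/26 class split
rng = np.random.default_rng(42)
train = pd.DataFrame({
    'tenure': rng.integers(1, 73, size=1000),
    'Churn': rng.choice([0, 1], size=1000, p=[0.74, 0.26]),
})

majority = train[train['Churn'] == 0]
minority = train[train['Churn'] == 1]

# Sample the minority class with replacement until it matches the majority
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)

# Recombine and shuffle the rows
balanced = pd.concat([majority, minority_upsampled]).sample(frac=1, random_state=42)
print(balanced['Churn'].value_counts())
```

Resampling should be applied to the training split only, so that duplicated rows never leak into the test set. Passing `class_weight='balanced'` to estimators that support it is an alternative that leaves the data untouched.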

## 6. Normalization or Standardization
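This section is also still empty. As a hedged sketch, standardization with `StandardScaler` could look like the following. The arrays are made-up values standing in for `tenure`, `Cuenta_Mensual`, and `Cuenta_Total`, and the choice of scaler is an assumption.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values standing in for tenure, Cuenta_Mensual and Cuenta_Total
rng = np.random.default_rng(0)
low, high = [1, 18, 18], [72, 120, 8700]
X_train = rng.uniform(low, high, size=(200, 3))
X_test = rng.uniform(low, high, size=(50, 3))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on train only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(3))
print(X_train_scaled.std(axis=0).round(3))
```

Scaling matters mainly for distance-based and regularized linear models. Tree-based models such as random forests are insensitive to the scale of the features.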

# 🎯 Correlation and Variable Selection

## 1. Correlation Analysis
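No analysis has been written for this step yet. A minimal sketch of ranking features by their Pearson correlation with `Churn` follows. It uses a synthetic frame (`demo`) in which shorter tenure is deliberately made to raise the churn probability. That relationship is an assumption for the demo, not a finding from the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
tenure = rng.integers(1, 73, size=n)

# Synthetic target: shorter tenure makes churn more likely (demo assumption)
churn = (rng.random(n) < 0.6 - 0.5 * tenure / 72).astype(int)

demo = pd.DataFrame({
    'tenure': tenure,
    'Cuenta_Mensual': rng.uniform(18, 120, size=n),
    'Churn': churn,
})

# DataFrame.corr() computes Pearson correlation by default
corr_with_churn = demo.corr()['Churn'].drop('Churn').sort_values()
print(corr_with_churn)
```

`DataFrame.corr()` needs numeric columns, so on the notebook's data every categorical column would have to be encoded first.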

## 2. Targeted Analysis
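The heading does not say which variables are targeted. One plausible reading is a per-class comparison of a few specific columns, such as `tenure` and `Cuenta_Total`, against `Churn`. The sketch below illustrates that idea on made-up rows. The values are invented for the example.

```python
import pandas as pd

# Made-up rows; in the notebook this would be df with its real columns
demo = pd.DataFrame({
    'Churn':        [0, 0, 0, 1, 1, 1],
    'tenure':       [60, 45, 30, 2, 5, 12],
    'Cuenta_Total': [4000.0, 3100.0, 1500.0, 90.0, 400.0, 950.0],
})

# Median of each variable within each churn class
summary = demo.groupby('Churn')[['tenure', 'Cuenta_Total']].median()
print(summary)
```

A boxplot per class (for example with `sns.boxplot`, already imported in the notebook) would show the same comparison visually.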

# 🤖 Predictive Modeling

## 1. Data Splitting
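This step has no code yet. A common approach, sketched below under the assumption of a 70/30 split, is `train_test_split` with `stratify` so that both partitions keep the same churn rate. `X_demo` and `y_demo` are synthetic placeholders for the notebook's `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholders for the encoded features and the target
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(1000, 5))
y_demo = rng.choice([0, 1], size=1000, p=[0.74, 0.26])

# stratify=y_demo preserves the class proportions in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

print(f"Train churn rate: {y_train.mean():.3f}")
print(f"Test churn rate:  {y_test.mean():.3f}")
```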

## 2. Model Creation
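The objectives call for two or more classification models, but none are defined here yet. The sketch below pairs a logistic regression with a random forest as one possible choice. Both the model selection and the hyperparameters are assumptions, and the data comes from `make_classification` rather than the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded features, with a ~74/26 class split
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.74, 0.26], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

models = {
    # Scaling lives inside the pipeline so it is fit on training data only
    'logistic_regression': make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    'random_forest': RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```

Pairing a scale-sensitive linear model with a tree ensemble gives two different views of the same data, which helps when comparing feature importance later.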

## 3. Model Evaluation
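No evaluation has been run yet. With roughly 74% of customers in the non-churn class, a model that always predicts "no churn" would already reach about 74% accuracy, so per-class metrics are more informative. The sketch below shows a confusion matrix, a classification report, and ROC AUC on synthetic data. The classifier and data are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a ~74/26 class split
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.74, 0.26], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)[:, 1]  # probability of the churn class

cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
auc = roc_auc_score(y_test, y_proba)

print(cm)
print(classification_report(y_test, y_pred, target_names=['No churn', 'Churn']))
print(f"ROC AUC: {auc:.3f}")
```

For a churn use case, recall on the churn class is often the metric to watch, since a missed churner usually costs more than a false alarm. That trade-off is a business assumption to confirm.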

# 📋 Interpretation and Conclusions

## 1. Variable Importance Analysis
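This section is still to be written. As an illustration, the sketch below ranks features by a random forest's impurity-based `feature_importances_`. The feature names are placeholders and the data is synthetic, so the ranking says nothing about the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = [f'feature_{i}' for i in range(6)]  # placeholder names

X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_demo, y_demo)

# Impurity-based importances sum to 1 across all features
importances = (
    pd.Series(forest.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```

Impurity-based importances can be biased toward features with many distinct values. `sklearn.inspection.permutation_importance` on a held-out set is a commonly used cross-check.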

## 2. Conclusion