# 📊 Telecom X - Customer Churn Prediction (Part 2)

This notebook contains **Part 2** of the Telecom X challenge: building predictive models that predict customer churn.

---

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from imblearn.over_sampling import SMOTE

import warnings
warnings.filterwarnings('ignore')

## 1. Load the cleaned data

In [None]:
df = pd.read_csv('TelecomX_limpo.csv')
df.head()

## 2. Data preprocessing

In [None]:
# Remove irrelevant columns (the customer ID has no predictive value)
if 'customerID' in df.columns:
    df = df.drop(columns=['customerID'])

# One-hot encode the categorical variables
df = pd.get_dummies(df, drop_first=True)

df.head()
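
The target name `Churn_Yes` used in the following cells is a product of this encoding step. As a minimal sketch, assuming a toy frame with hypothetical columns `tenure`, `Contract`, and `Churn` (not the real Telecom X data), this is how `drop_first=True` names the resulting columns:

```python
import pandas as pd

# Toy data (assumption): column names and values are illustrative only
toy = pd.DataFrame({
    'tenure': [1, 24, 60, 3],
    'Contract': ['Month-to-month', 'One year', 'Two year', 'Month-to-month'],
    'Churn': ['Yes', 'No', 'No', 'Yes'],
})

# drop_first=True drops the first category (in sorted order) of each
# categorical column, so 'Churn' becomes the single column 'Churn_Yes'
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

Dropping the first level avoids perfectly collinear dummy columns, which matters for the logistic regression more than for the random forest.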

## 3. Train/test split

In [None]:
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']

print('Class proportions:')
print(y.value_counts(normalize=True))

# Split before resampling so the test set contains only real customers
X_train_raw, X_test, y_train_raw, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
X_train_raw.shape, X_test.shape

## 4. Class balancing

In [None]:
# Apply SMOTE to the training split only: resampling before the split would
# leak synthetic neighbours of test rows into training and inflate the metrics
sm = SMOTE(random_state=42)
X_train, y_train = sm.fit_resample(X_train_raw, y_train_raw)

print('\nAfter SMOTE (training set):')
print(y_train.value_counts(normalize=True))
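
SMOTE balances the classes by creating synthetic minority rows rather than duplicating existing ones. The sketch below is a simplified NumPy illustration of that interpolation idea, not imblearn's implementation; the function name `smote_like_samples` and the toy points are assumptions made for the example.

```python
import numpy as np

def smote_like_samples(X_minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative only)."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)

    # Pairwise Euclidean distances between minority rows
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_minority))
        neighbour = neighbours[base, rng.integers(k)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[neighbour] - X_minority[base])
    return synthetic

# Toy minority class (assumption): the four corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_rows = smote_like_samples(X_min, n_new=5)
print(new_rows)
```

Each synthetic row lies on the segment between a real minority row and one of its nearest minority neighbours. Because such rows are built from their neighbours, a synthetic row and the real rows it came from should not end up on opposite sides of a train/test split.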

## 5. Modeling

We will train two models:
- Logistic Regression (with standardized features)
- Random Forest (on unscaled features)

In [None]:
# Standardize features: logistic regression converges more reliably and is
# regularized evenly across features when the inputs share a common scale
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Logistic Regression
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train_scaled, y_train)
y_pred_log = logreg.predict(X_test_scaled)

# Random Forest
rf = RandomForestClassifier(random_state=42, n_estimators=200)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
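
The scaler is fitted on the training data and only applied to the test data. One way to make that guarantee automatic is to bundle the scaler and the model in a scikit-learn pipeline. The following is a minimal sketch on synthetic data from `make_classification` (an assumption, standing in for the Telecom X features); the names `X_demo`, `y_demo`, and `logreg_pipe` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 500 rows, 8 numeric features
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# fit() learns the scaling statistics from X_tr only;
# score() and predict() reuse them on unseen data
logreg_pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg_pipe.fit(X_tr, y_tr)

accuracy = logreg_pipe.score(X_te, y_te)
print(f'Test accuracy: {accuracy:.3f}')
```

Tree-based models such as the random forest split on feature thresholds, so rescaling the inputs does not change their splits, which is why the forest is trained on the unscaled features.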

## 6. Model evaluation

In [None]:
print('--- Logistic Regression ---')
print(classification_report(y_test, y_pred_log))
ConfusionMatrixDisplay(confusion_matrix(y_test, y_pred_log)).plot()
plt.show()

print('--- Random Forest ---')
print(classification_report(y_test, y_pred_rf))
ConfusionMatrixDisplay(confusion_matrix(y_test, y_pred_rf)).plot()
plt.show()
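
The precision, recall, and F1 values in the classification report can be read directly off the confusion matrix. A small sketch, assuming scikit-learn's layout (rows are true labels, columns are predicted labels, so a binary matrix is `[[TN, FP], [FN, TP]]`) and made-up counts that are not results from this notebook:

```python
def binary_metrics(cm):
    """Precision, recall and F1 for the positive class of a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts for illustration only
example_cm = [[50, 10],
              [5, 35]]

precision, recall, f1 = binary_metrics(example_cm)
print(f'precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}')
# prints: precision=0.778 recall=0.875 f1=0.824
```

For churn, recall on the positive class is often the number to watch, since a missed churner (a false negative) usually costs more than an unnecessary retention offer.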

## 7. Feature importance

In [None]:
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
importances.sort_values(ascending=False).head(10).plot(kind='bar', figsize=(10,5))
plt.title('Top 10 most important features - Random Forest')
plt.show()
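
`feature_importances_` is computed from impurity reductions on the training data and tends to favour features with many distinct values. As a possible cross-check, permutation importance measures how much a held-out score drops when one feature is shuffled. This is a minimal sketch on synthetic data (an assumption, not the Telecom X dataset), with illustrative names such as `forest` and `feature_names`:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 6 features, 3 of them informative
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=3, random_state=42
)
feature_names = [f'feature_{i}' for i in range(X_demo.shape[1])]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Shuffle each feature 5 times on the held-out set and record the score drop
result = permutation_importance(forest, X_te, y_te, n_repeats=5, random_state=42)
perm_importances = pd.Series(result.importances_mean, index=feature_names)
perm_importances = perm_importances.sort_values(ascending=False)
print(perm_importances)
```

If both rankings agree on the top features, that is some reassurance that the bar chart reflects predictive signal rather than an artifact of the impurity measure.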