# _Churn_ prediction - _Telecom_

* Model for predicting customer contract cancellation;
* Developed from the data analysis project [Taxa _Churn_ - _Telecom_](https://github.com/mannalab/Data-Science/blob/main/An%C3%A1lise%20de%20dados/Taxa_Churn_Telecom.ipynb), by Manna.

---

[Open In Colab](https://colab.research.google.com/drive/1XECcYqpeGXbIhw9eueI-UEoGokYalXCQ?usp=sharing)

[Open in Kaggle](https://www.kaggle.com/leonichel/predict-churn-telecom)

## Learning model

### Libraries

In [None]:
!pip install gradio

In [None]:
import pandas as pd
import numpy as np
import gradio as gr
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler, LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.metrics import confusion_matrix, classification_report, precision_recall_curve, roc_curve, roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split, cross_val_predict
from sklearn.ensemble import RandomForestClassifier
from imblearn.under_sampling import RandomUnderSampler

### Reading the dataset

In [None]:
!wget 'https://storage.googleapis.com/kagglesdsdata/datasets/13996/18858/WA_Fn-UseC_-Telco-Customer-Churn.csv?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20210518%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20210518T232452Z&X-Goog-Expires=259199&X-Goog-SignedHeaders=host&X-Goog-Signature=4db3347fced42a1b3c52c68f5da6fa30b801c5353009372e736cf85f80994cacef926286750aa2b3136f27822112ed1fbf229b524ed608379f5761fb696d4cbd18fca1777ab5cbeab06ed36f2620a70516cae5a51ebd96249df9327fed22ec5f8a522eae5b1b2bb60ad52bf6c9909dc65854aaca88b309ef8f51e669de548c2da038b710bddf7a29a2f27380dc2b550562804cf654ec5491496037432c042e8ac4fb4376dc55b54d8883347a4cd2ea40c8c8a334df89ef4d19fe615e34057d2781da8e02cf306f95208bd19b3dc47cfd1aa4523cc9d0f18c81d78fb8855dd8c5aa63bbc69f746f8c5c50b5eaf601e74c7fb8a88c2d8936be4c553777fd638765' -O 'churn.csv'

In [None]:
df = pd.read_csv('churn.csv')

### Preprocessing

#### Remove _'customerID'_

In [None]:
df.drop('customerID', axis=1, inplace=True)

#### Convert the binary values of _'SeniorCitizen'_ to 'Yes' and 'No'

In [None]:
# Named helper instead of `filter`, which would shadow the Python built-in
def to_yes_no(flag):
    return 'Yes' if flag == 1 else 'No'

df['SeniorCitizen'] = df['SeniorCitizen'].apply(to_yes_no)

#### Remove rows with blank _'TotalCharges'_ values

In [None]:
df.drop(df[(df.TotalCharges == " ")].index, axis=0, inplace=True)

In [None]:
df.TotalCharges = pd.to_numeric(df.TotalCharges)
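The two cells above first drop the rows whose _'TotalCharges'_ is a blank string and then convert the column to numbers. As a minimal sketch of the same idea in one step, the conversion can coerce unparseable values to `NaN` and drop them afterwards. The helper name `clean_total_charges` and the toy data are hypothetical and only illustrate the technique; they are not part of the original notebook.

```python
import pandas as pd

def clean_total_charges(frame):
    """Return a copy with 'TotalCharges' as float and unparseable rows removed."""
    out = frame.copy()
    # errors="coerce" turns values that cannot be parsed (such as " ") into NaN
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Drop the rows that could not be converted
    return out.dropna(subset=["TotalCharges"])

# Toy frame (hypothetical values) with one blank entry
toy = pd.DataFrame({"tenure": [0, 12, 3],
                    "TotalCharges": [" ", "358.2", "89.1"]})
cleaned = clean_total_charges(toy)
print(cleaned)
```

Unlike the comparison with `" "`, this variant would also remove other malformed entries, which may or may not be desirable depending on the data.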

### Splitting the dataset

In [None]:
train, test = train_test_split(df, test_size=0.2, random_state=0)
test.info()

In [None]:
y = train['Churn']
# Non-inplace drop avoids pandas' SettingWithCopyWarning on the split frame
train = train.drop(columns=['Churn'])
X = train.copy()

y

### Building the _Pipelines_

In [None]:
numerical_features = train.select_dtypes(exclude=['object']).columns.tolist()
categorical_features = train.select_dtypes(include=['object']).columns.tolist()
categorical_features

In [None]:
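# Numerical: fill missing values with a constant (0 by default), then standardize
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant')),
    ('scaler', StandardScaler())])

# Categorical: fill missing values with 'missing', then one-hot encode.
# handle_unknown='ignore' keeps prediction from failing on unseen categories.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# Combine both transformers
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)])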

pipe_RF = Pipeline(
    steps = [('preprocessor', preprocessor),
            ('RF', RandomForestClassifier(class_weight='balanced', random_state=0))])

pipe_RF.fit(X, y)

### Prediction and validation on the training set

In [None]:
y_pred = cross_val_predict(pipe_RF, X, y, cv=5)
print(classification_report(y, y_pred))

### Prediction and validation on the test set

In [None]:
y_test = test['Churn']
# Non-inplace drop avoids pandas' SettingWithCopyWarning on the split frame
test = test.drop(columns=['Churn'])
X_test = test.copy()

y_test

In [None]:
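# Note: cross_val_predict refits the pipeline on folds of the test set;
# pipe_RF.predict(X_test) would instead score the model fitted on the training set.
y_pred_test = cross_val_predict(pipe_RF, X_test, y_test, cv=5)
print(classification_report(y_test, y_pred_test))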
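The report printed above lists a _recall_ per class and an overall accuracy. As a minimal sketch of what those two numbers mean, the snippet below computes them by hand in plain Python. The function names and the toy labels are hypothetical and are not results from this notebook.

```python
def recall_per_class(y_true, y_pred):
    """Fraction of the true samples of each class that were predicted correctly."""
    result = {}
    for label in sorted(set(y_true)):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        total = sum(1 for t in y_true if t == label)
        result[label] = hits / total
    return result

def accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Toy labels (hypothetical): 7 'No' and 3 'Yes'; only one 'Yes' is found
toy_true = ["No"] * 7 + ["Yes"] * 3
toy_pred = ["No"] * 7 + ["Yes", "No", "No"]

toy_recall = recall_per_class(toy_true, toy_pred)
toy_accuracy = accuracy(toy_true, toy_pred)
print(toy_recall, toy_accuracy)
```

In this toy case the accuracy is high while the recall for 'Yes' is low, which is the pattern an imbalanced target tends to produce.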

### Results

* The model has low _recall_ for the _'Yes'_ class, possibly because the dataset is imbalanced (70% of the target values are _'No'_); to address this, balancing the dataset with _undersampling_ or _oversampling_ techniques is recommended;
* The model reached about 77% overall accuracy;
* The model showed no signs of _underfitting_ or _overfitting_.
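The _undersampling_ suggested in the first item can be sketched in plain Python as follows: every class is randomly reduced to the size of the smallest one. The function name `undersample`, the seed, and the toy data are hypothetical. In the notebook itself, the already imported `RandomUnderSampler` from `imblearn` would be a natural candidate for the same job.

```python
import random
from collections import Counter

def undersample(rows, labels, seed=0):
    """Randomly keep as many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    # Group the rows by their label
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Size of the minority class
    n_min = min(len(group) for group in by_class.values())
    new_rows, new_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement from each class
        for row in rng.sample(group, n_min):
            new_rows.append(row)
            new_labels.append(label)
    return new_rows, new_labels

# Toy data (hypothetical) with the 70/30 split mentioned above
toy_rows = list(range(10))
toy_labels = ["No"] * 7 + ["Yes"] * 3
bal_rows, bal_labels = undersample(toy_rows, toy_labels)
print(Counter(bal_labels))
```

Resampling should be applied to the training data only, so that the test set keeps the original class distribution.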

## Building the interface

### Prediction function

In [None]:
def predict(gender, SeniorCitizen, Partner, Dependents, PhoneService,
            MultipleLines, InternetService, OnlineSecurity, OnlineBackup,
            DeviceProtection, TechSupport, StreamingTV, StreamingMovies,
            Contract, PaperlessBilling, PaymentMethod, tenure, MonthlyCharges,
            TotalCharges):
    # Arguments arrive in interface order (categorical first, then numerical);
    # reorder them to match the column order of X.
    row = [gender, SeniorCitizen, Partner, Dependents, tenure, PhoneService,
           MultipleLines, InternetService, OnlineSecurity, OnlineBackup,
           DeviceProtection, TechSupport, StreamingTV, StreamingMovies, Contract,
           PaperlessBilling, PaymentMethod, MonthlyCharges, TotalCharges]

    sample = pd.DataFrame([row], columns=X.columns)
    prediction = pipe_RF.predict(sample)

    # Predicted label ('Yes' or 'No') for the single sample
    return prediction[0]

### Interface options

In [None]:
# One list of observed values per categorical feature, most frequent first
options = [df[col].value_counts().index.tolist() for col in categorical_features]
options

In [None]:
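inputs = []
# One radio selector per categorical feature.
# `gr.Radio` is the component name in current Gradio releases;
# older releases exposed it as `gr.inputs.Radio`.
for opt in options:
    inputs.append(gr.Radio(choices=opt))

# One numeric field per numerical feature
for _ in numerical_features:
    inputs.append("number")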

### Interface

In [None]:
gr.Interface(fn=predict, inputs=inputs, outputs='text').launch(share=True);