# _Churn_ prediction - _Telecom_

* Model that predicts customer contract cancellation;
* Built on the data-analysis project [Taxa _Churn_ - _Telecom_](https://github.com/mannalab/Data-Science/blob/main/An%C3%A1lise%20de%20dados/Taxa_Churn_Telecom.ipynb), by Manna;

---

[Open In Colab](https://colab.research.google.com/drive/1XECcYqpeGXbIhw9eueI-UEoGokYalXCQ?usp=sharing)

[Open in Kaggle](https://www.kaggle.com/leonichel/predict-churn-telecom)

## Learning model

### Libraries

In [1]:
!pip install gradio

In [2]:
import pandas as pd
import numpy as np
import gradio as gr
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler, LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.metrics import confusion_matrix, classification_report, precision_recall_curve, roc_curve, roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split, cross_val_predict
from sklearn.ensemble import RandomForestClassifier
from imblearn.under_sampling import RandomUnderSampler



### Reading the dataset

In [3]:
!wget 'https://storage.googleapis.com/kagglesdsdata/datasets/13996/18858/WA_Fn-UseC_-Telco-Customer-Churn.csv?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20210518%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20210518T232452Z&X-Goog-Expires=259199&X-Goog-SignedHeaders=host&X-Goog-Signature=4db3347fced42a1b3c52c68f5da6fa30b801c5353009372e736cf85f80994cacef926286750aa2b3136f27822112ed1fbf229b524ed608379f5761fb696d4cbd18fca1777ab5cbeab06ed36f2620a70516cae5a51ebd96249df9327fed22ec5f8a522eae5b1b2bb60ad52bf6c9909dc65854aaca88b309ef8f51e669de548c2da038b710bddf7a29a2f27380dc2b550562804cf654ec5491496037432c042e8ac4fb4376dc55b54d8883347a4cd2ea40c8c8a334df89ef4d19fe615e34057d2781da8e02cf306f95208bd19b3dc47cfd1aa4523cc9d0f18c81d78fb8855dd8c5aa63bbc69f746f8c5c50b5eaf601e74c7fb8a88c2d8936be4c553777fd638765' -O 'churn.csv'

In [4]:
df = pd.read_csv('churn.csv')

### Preprocessing

#### Remove _'customerID'_

In [5]:
df.drop('customerID', axis=1, inplace=True)

#### Convert the binary values of _'SeniorCitizen'_ to 'Yes' and 'No'

In [6]:
# Named to_yes_no so that the built-in filter() is not shadowed
to_yes_no = lambda x: 'Yes' if x == 1 else 'No'
df['SeniorCitizen'] = df['SeniorCitizen'].apply(to_yes_no)
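
The same conversion can also be written as a dictionary lookup. The sketch below uses a hypothetical toy frame rather than the real dataset. Unlike the lambda, `map` turns any value other than 0 or 1 into NaN instead of 'No', which makes unexpected codes visible.

```python
import pandas as pd

# Hypothetical toy frame; only the column name mirrors the dataset.
toy = pd.DataFrame({"SeniorCitizen": [0, 1, 0]})

# Dictionary lookup: 1 -> 'Yes', 0 -> 'No', anything else -> NaN.
toy["SeniorCitizen"] = toy["SeniorCitizen"].map({1: "Yes", 0: "No"})
print(toy)
```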

#### Remove rows with blank _'TotalCharges'_ values

In [7]:
df.drop(df[(df.TotalCharges == " ")].index, axis=0, inplace=True)

In [8]:
df.TotalCharges = pd.to_numeric(df.TotalCharges)
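
The two cells above match the exact string `" "` and then convert the column. A minimal alternative sketch, assuming blank strings are the only non-numeric entries, lets `pd.to_numeric` coerce anything unparseable to NaN and then drops those rows. The toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; only the column name mirrors the dataset.
toy = pd.DataFrame({"TotalCharges": ["29.85", " ", "108.15"]})

# Unparseable strings become NaN instead of raising an error.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Drop the rows that could not be parsed.
toy = toy.dropna(subset=["TotalCharges"])
print(toy)
```

This variant also catches blanks of a different width, at the cost of silently discarding any other malformed value.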

### Splitting the dataset

In [9]:
train, test = train_test_split(df, test_size=0.2, random_state=0)
test.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1407 entries, 5561 to 2918
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   gender            1407 non-null   object 
 1   SeniorCitizen     1407 non-null   object 
 2   Partner           1407 non-null   object 
 3   Dependents        1407 non-null   object 
 4   tenure            1407 non-null   int64  
 5   PhoneService      1407 non-null   object 
 6   MultipleLines     1407 non-null   object 
 7   InternetService   1407 non-null   object 
 8   OnlineSecurity    1407 non-null   object 
 9   OnlineBackup      1407 non-null   object 
 10  DeviceProtection  1407 non-null   object 
 11  TechSupport       1407 non-null   object 
 12  StreamingTV       1407 non-null   object 
 13  StreamingMovies   1407 non-null   object 
 14  Contract          1407 non-null   object 
 15  PaperlessBilling  1407 non-null   object 
 16  PaymentMethod     1407 non-null   objec

In [10]:
y = train['Churn']
# Reassign rather than drop in place, so pandas does not emit SettingWithCopyWarning
train = train.drop(columns=['Churn'])
X = train.copy()

y

2964     No
5113     No
5363     No
5074     No
156      No
       ... 
4939     No
3269     No
1658    Yes
2612     No
2737     No
Name: Churn, Length: 5625, dtype: object
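
Because the target is imbalanced, a stratified split is one way to keep the class proportions identical in both parts. The sketch below is not what the notebook does (it splits without `stratify`); it uses a hypothetical toy frame to show the `stratify` argument of `train_test_split`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame with a 3:1 class imbalance.
toy = pd.DataFrame({
    "tenure": range(40),
    "Churn": ["No"] * 30 + ["Yes"] * 10,
})

# stratify keeps the share of each Churn value the same in both splits.
train_s, test_s = train_test_split(
    toy, test_size=0.2, random_state=0, stratify=toy["Churn"]
)

print(train_s["Churn"].value_counts().to_dict())
print(test_s["Churn"].value_counts().to_dict())
```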

### Building the _pipelines_

In [11]:
numerical_features = train.select_dtypes(exclude=['object']).columns.tolist()
categorical_features = train.select_dtypes(include=['object']).columns.tolist()
categorical_features

['gender',
 'SeniorCitizen',
 'Partner',
 'Dependents',
 'PhoneService',
 'MultipleLines',
 'InternetService',
 'OnlineSecurity',
 'OnlineBackup',
 'DeviceProtection',
 'TechSupport',
 'StreamingTV',
 'StreamingMovies',
 'Contract',
 'PaperlessBilling',
 'PaymentMethod']

In [12]:
# Numerical
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant')),
    ('scaler', StandardScaler())])

# Categorical
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder())])

# Combining
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)])

pipe_RF = Pipeline(
    steps = [('preprocessor', preprocessor),
            ('RF', RandomForestClassifier(class_weight='balanced', random_state=0))])

pipe_RF.fit(X, y)

Pipeline(memory=None,
         steps=[('preprocessor',
                 ColumnTransformer(n_jobs=None, remainder='drop',
                                   sparse_threshold=0.3,
                                   transformer_weights=None,
                                   transformers=[('num',
                                                  Pipeline(memory=None,
                                                           steps=[('imputer',
                                                                   SimpleImputer(add_indicator=False,
                                                                                 copy=True,
                                                                                 fill_value=None,
                                                                                 missing_values=nan,
                                                                                 strategy='constant',
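
One caveat of the default `OneHotEncoder()` used above is that `transform` raises an error when it meets a category that was absent at fit time. A minimal sketch of the `handle_unknown='ignore'` option, on hypothetical toy frames, shows the alternative behavior:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: 'Two year' is never seen during fit.
train_toy = pd.DataFrame({"Contract": ["Month-to-month", "One year"]})
new_toy = pd.DataFrame({"Contract": ["Two year"]})

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_toy)

# The unseen category is encoded as an all-zero row instead of raising.
encoded = encoder.transform(new_toy).toarray()
print(encoded)
```

Whether silently encoding unknown values as zeros is acceptable depends on the application; it mainly guards against a rare category appearing only in new data.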
                                                           

### Prediction and validation on the training set

In [13]:
y_pred = cross_val_predict(pipe_RF, X, y, cv=5)
print(classification_report(y, y_pred))

              precision    recall  f1-score   support

          No       0.82      0.90      0.86      4125
         Yes       0.63      0.47      0.54      1500

    accuracy                           0.78      5625
   macro avg       0.73      0.68      0.70      5625
weighted avg       0.77      0.78      0.77      5625



### Prediction and validation on the test set

In [14]:
y_test = test['Churn']
# Reassign rather than drop in place, so pandas does not emit SettingWithCopyWarning
test = test.drop(columns=['Churn'])
X_test = test.copy()

y_test

5561     No
5814     No
2645     No
3983    Yes
6438    Yes
       ... 
2757     No
5702    Yes
1662    Yes
2766     No
2918     No
Name: Churn, Length: 1407, dtype: object

In [15]:
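# Note: cross_val_predict refits pipe_RF on folds of the test set, so this report
# does not score the model fitted on X, y above; pipe_RF.predict(X_test) would do that.
y_pred_test = cross_val_predict(pipe_RF, X_test, y_test, cv=5)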
print(classification_report(y_test, y_pred_test))

              precision    recall  f1-score   support

          No       0.83      0.90      0.86      1038
         Yes       0.62      0.47      0.54       369

    accuracy                           0.79      1407
   macro avg       0.73      0.69      0.70      1407
weighted avg       0.77      0.79      0.78      1407
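
For comparison, a plain hold-out evaluation fits the model on the training split only and scores it on rows it has never seen. The sketch below illustrates that pattern on synthetic stand-in data from `make_classification` (not the Telco dataset); in this notebook the equivalent call would presumably be `pipe_RF.predict(X_test)`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (an assumption, not the real dataset).
X_toy, y_toy = make_classification(
    n_samples=500, n_features=8, weights=[0.73, 0.27], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

model = RandomForestClassifier(class_weight="balanced", random_state=0)
model.fit(X_tr, y_tr)        # fit on the training split only

y_hat = model.predict(X_te)  # predict on the held-out split
print(classification_report(y_te, y_hat))
```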



### Results

* The model has low _recall_ for the _'Yes'_ class (0.47), possibly because the dataset is imbalanced (about 73% of the training labels are _'No'_). To address this, balancing the dataset with _undersampling_ or _oversampling_ techniques is recommended;
* The model reached a weighted-average precision of about 77%, with an accuracy of 78% on the training set and 79% on the test set;
* The cross-validated metrics on the training and test sets are nearly identical, so the model shows no clear sign of _underfitting_ or _overfitting_;
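
The undersampling idea mentioned in the first point can be sketched with plain pandas. The notebook imports `RandomUnderSampler` from `imblearn`, which serves the same purpose; the version below avoids that dependency and runs on a hypothetical toy frame. In practice, resampling is usually applied to the training data only, so the evaluation data keeps its original class distribution.

```python
import pandas as pd

# Hypothetical toy frame with a 3:1 imbalance in the target.
toy = pd.DataFrame({
    "tenure": range(20),
    "Churn": ["No"] * 15 + ["Yes"] * 5,
})

# Size of the smallest class.
n_minority = toy["Churn"].value_counts().min()

# Keep n_minority randomly chosen rows from every class.
balanced = pd.concat(
    group.sample(n=n_minority, random_state=0)
    for _, group in toy.groupby("Churn")
)

print(balanced["Churn"].value_counts().to_dict())
```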

## Building the interface

### Prediction function

In [16]:
def predict(gender, SeniorCitizen, Partner, Dependents, PhoneService, 
            MultipleLines, InternetService, OnlineSecurity, OnlineBackup,
            DeviceProtection, TechSupport, StreamingTV, StreamingMovies, 
            Contract, PaperlessBilling, PaymentMethod, tenure, MonthlyCharges, 
            TotalCharges):

    # Reorder the arguments to match the column order of the training frame X
    row = [gender, SeniorCitizen, Partner, Dependents, tenure, PhoneService, 
        MultipleLines, InternetService, OnlineSecurity, OnlineBackup, 
        DeviceProtection, TechSupport, StreamingTV, StreamingMovies, Contract, 
        PaperlessBilling, PaymentMethod, MonthlyCharges, TotalCharges]

    # Build a one-row frame so the pipeline receives named columns
    x = pd.DataFrame([row], columns=X.columns)
    y = pipe_RF.predict(x)

    return y[0]
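
The function above depends on the positional list matching the column order of `X` exactly. A less order-sensitive sketch builds the row from keyword arguments and reindexes it by column name. The helper `to_row` and the three-column subset are hypothetical and serve only as illustration.

```python
import pandas as pd

# Hypothetical subset standing in for X.columns.
columns = ["gender", "tenure", "MonthlyCharges"]

def to_row(**features):
    """Build a one-row frame ordered like `columns`, whatever the keyword order."""
    return pd.DataFrame([features])[columns]

row = to_row(MonthlyCharges=70.5, gender="Female", tenure=12)
print(row)
```

Selecting by name raises a `KeyError` if a feature is missing, which surfaces mistakes that a positional list would hide.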

### Interface options

In [18]:
# One list of observed values per categorical feature, most frequent first
options = [df[col].value_counts().index.tolist() for col in categorical_features]
options

[['Male', 'Female'],
 ['No', 'Yes'],
 ['No', 'Yes'],
 ['No', 'Yes'],
 ['Yes', 'No'],
 ['No', 'Yes', 'No phone service'],
 ['Fiber optic', 'DSL', 'No'],
 ['No', 'Yes', 'No internet service'],
 ['No', 'Yes', 'No internet service'],
 ['No', 'Yes', 'No internet service'],
 ['No', 'Yes', 'No internet service'],
 ['No', 'Yes', 'No internet service'],
 ['No', 'Yes', 'No internet service'],
 ['Month-to-month', 'Two year', 'One year'],
 ['Yes', 'No'],
 ['Electronic check',
  'Mailed check',
  'Bank transfer (automatic)',
  'Credit card (automatic)']]

In [19]:
# One radio group per categorical feature, then one numeric box per numerical feature
inputs = [gr.inputs.Radio(opt) for opt in options]
inputs += ["number"] * len(numerical_features)

### Interface

In [20]:
gr.Interface(fn=predict, inputs=inputs, outputs='text').launch();

