# Using AutoViML for AutoML

- Handles categorical encoding.
- Performs feature selection.
- Produces graphical outputs that help explain the model.
- Plots multiple graphs, such as the AUC curve.
- Does some data cleaning.
- Tunes the hyperparameters as well.

# Installing AutoViML

In [0]:
! pip install autoviml

In [0]:
! pip install shap
# For model explainability
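Before moving on, it can help to confirm that both installs are visible to the running kernel. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, and `autoviml` and `shap` are assumed to be the import names of the two packages installed above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the two installs above.
missing = missing_packages(["autoviml", "shap"])
if missing:
    print("Not importable yet:", ", ".join(missing))
else:
    print("autoviml and shap are both importable")
```

If a package is reported as missing right after installation, restarting the notebook runtime is usually worth trying first.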

# Time to use it

## Loading Data

In [0]:
import os
import urllib.request  # a bare `import urllib` does not reliably expose urllib.request
import pandas as pd

In [0]:
BASE_DIR = '/tmp'
OUTPUT_FILE = os.path.join(BASE_DIR, 'churn_data.csv')

In [0]:
DATA_URL = 'https://raw.githubusercontent.com/srivatsan88/YouTubeLI/master/dataset/WA_Fn-UseC_-Telco-Customer-Churn.csv'
churn_data = urllib.request.urlretrieve(DATA_URL, OUTPUT_FILE)  # returns (local_path, headers)

In [0]:
churn_df = pd.read_csv(OUTPUT_FILE)

In [11]:
churn_df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [0]:
# Split into train (first 70% of rows) and test (remaining 30%); rows are not shuffled
size = int(0.7 * churn_df.shape[0])
train_df = churn_df[:size]
test_df = churn_df[size:]
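The split above takes the first 70% of rows in file order. If the row order carries any pattern, a shuffled split may be safer. The sketch below is one way to do that with the standard library; `split_positions` is a hypothetical helper, and the seed value is an arbitrary assumption. Note that using it would change the confusion matrices shown later, which come from the unshuffled split.

```python
import random


def split_positions(n_rows, train_frac=0.7, seed=42):
    """Shuffle row positions 0..n_rows-1 and cut them into train/test lists."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # seeded, so the split is reproducible
    size = int(train_frac * n_rows)
    return positions[:size], positions[size:]


# Small demonstration on 10 row positions.
train_pos, test_pos = split_positions(10, train_frac=0.5)
print(len(train_pos), len(test_pos))

# With the DataFrame loaded above, this would be applied as:
#   train_pos, test_pos = split_positions(churn_df.shape[0])
#   train_df, test_df = churn_df.iloc[train_pos], churn_df.iloc[test_pos]
```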

## Time for AutoViML

In [0]:
from autoviml.Auto_ViML import Auto_ViML

In [0]:
target = 'Churn'

hyper_param: Tuning options are GridSearch ('GS') and RandomizedSearch ('RS'). Default is 'GS'.

feature_reduction: Default is True, but it can be set to False if you don't want automatic feature reduction.

Boosting_Flag: you have 4 possible choices (default is False):
- None = builds a linear model
- False = builds a Random Forest or Extra Trees model (also known as bagging)
- True = builds an XGBoost model
- 'CatBoost' = builds a CatBoost model (provided you have CatBoost installed)
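The options listed above can be collected and sanity-checked before the call. This is only a sketch: `build_options` and the two constants are hypothetical helpers, not part of AutoViML, and the accepted values are just the ones described in this notebook.

```python
VALID_HYPER_PARAM = ('GS', 'RS')

# Boosting_Flag value -> kind of model it selects, as described above.
BOOSTING_FLAG_MODELS = {
    None: 'linear model',
    False: 'Random Forest or Extra Trees (bagging)',
    True: 'XGBoost',
    'CatBoost': 'CatBoost',
}


def build_options(hyper_param='GS', feature_reduction=True, boosting_flag=False):
    """Validate the three options and return them as keyword arguments."""
    if hyper_param not in VALID_HYPER_PARAM:
        raise ValueError(f"hyper_param must be one of {VALID_HYPER_PARAM}")
    if (not isinstance(boosting_flag, (bool, str, type(None)))
            or boosting_flag not in BOOSTING_FLAG_MODELS):
        raise ValueError("Boosting_Flag must be None, False, True or 'CatBoost'")
    return {
        'hyper_param': hyper_param,
        'feature_reduction': bool(feature_reduction),
        'Boosting_Flag': boosting_flag,
    }


options = build_options(hyper_param='RS', boosting_flag=True)
print(options, '->', BOOSTING_FLAG_MODELS[options['Boosting_Flag']])
```

The returned dictionary uses the same keyword names as the call in the next cell, so it could be unpacked there with `**options` alongside the remaining arguments.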


In [0]:
model, features, trainm, testm = Auto_ViML(
    train_df, target, test_df,
    sample_submission='',
    scoring_parameter='', KMeans_Featurizer=False,
    hyper_param='GS', feature_reduction=True,
    Boosting_Flag=None, Binning_Flag=False,
    Add_Poly=0, Stacking_Flag=False, Imbalanced_Flag=False,
    verbose=1)


In [28]:
features

In [0]:
trainm

In [0]:
testm

In [0]:
from sklearn.metrics import classification_report, confusion_matrix

In [32]:
print(confusion_matrix(test_df[target].values, testm['Churn_Bagging_predictions'].values))

[[1376  157]
 [ 303  277]]


In [33]:
print(confusion_matrix(test_df[target].values, testm['Churn_Ensembled_predictions'].values))

[[1389  144]
 [ 281  299]]


In [34]:
print(confusion_matrix(test_df[target].values, testm['Churn_Boosting_predictions'].values))

[[1491   42]
 [ 437  143]]


In [35]:
print(classification_report(test_df[target].values, testm['Churn_Boosting_predictions'].values))

              precision    recall  f1-score   support

          No       0.77      0.97      0.86      1533
         Yes       0.77      0.25      0.37       580

    accuracy                           0.77      2113
   macro avg       0.77      0.61      0.62      2113
weighted avg       0.77      0.77      0.73      2113
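The classification report can be cross-checked by hand from the confusion matrices printed above. The sketch below assumes the scikit-learn layout (rows are actual labels, columns are predicted labels, in the order No, Yes) and treats Yes as the positive class; `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(cm):
    """Compute metrics from cm = [[tn, fp], [fn, tp]] (rows actual, columns predicted)."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    return {'precision': precision, 'recall': recall, 'f1': f1, 'accuracy': accuracy}


# The three matrices printed above, copied as-is.
matrices = {
    'Bagging': [[1376, 157], [303, 277]],
    'Ensembled': [[1389, 144], [281, 299]],
    'Boosting': [[1491, 42], [437, 143]],
}

for name, cm in matrices.items():
    m = binary_metrics(cm)
    print(f"{name:<10} precision={m['precision']:.2f} recall={m['recall']:.2f} "
          f"f1={m['f1']:.2f} accuracy={m['accuracy']:.2f}")
```

For the Boosting matrix this reproduces the Yes row of the report (precision 0.77, recall 0.25, f1 0.37). Comparing the three rows side by side also shows that the Boosting predictions recover fewer churners than the Bagging or Ensembled ones, even though overall accuracy is similar.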

