### Comparing Models

Now that you have seen a variety of models for regression and classification problems, it is good to step back and weigh the pros and cons of these options.  In the case of classification models, there are at least three things to consider:

1. Is the model good at handling imbalanced classes?
2. Does the model train quickly?
3. Does the model yield interpretable results?

The importance of these considerations will vary from project to project, depending on your dataset and goals.  Your goal is to review the models covered so far and discuss the pros and cons of each.  Two example datasets are provided; they represent very different tasks in which the interpretability of the model may matter to different degrees.
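Each of the three questions can be turned into something you measure. The sketch below is one possible way to do that, using a synthetic dataset from `make_classification` as a stand-in (an assumption; it is not one of the course datasets) and logistic regression as the example model. The names `X_demo`, `y_demo`, `class_share`, `fit_seconds`, and `coefficients` are illustrative.

```python
import time

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption, not the course datasets):
# roughly 85% of rows in class 0 and 15% in class 1.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=8, weights=[0.85, 0.15], random_state=0
)

# 1. Class balance: the share of each class in the target.
class_share = pd.Series(y_demo).value_counts(normalize=True).sort_index()

# 2. Training speed: wall-clock time for a single fit.
start = time.perf_counter()
clf = LogisticRegression(max_iter=2000).fit(X_demo, y_demo)
fit_seconds = time.perf_counter() - start

# 3. Interpretability: a linear model exposes one signed coefficient per feature.
coefficients = pd.Series(clf.coef_[0], name="coefficient")

print(class_share)
print(f"fit time: {fit_seconds:.4f} s")
print(coefficients)
```

The same three measurements can be repeated for any other classifier, although what counts as "interpretable output" differs by model (for example, a tree's splits rather than coefficients).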

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

### Data and Task

Your goal is to discuss the pros and cons of Logistic Regression, Decision Trees, KNN, and SVM for the tasks below.  Consider at least the three questions above and list any additional considerations you believe are important to determining the "best" model for the task.  Share your response with your peers on the class discussion board.  

**TASK 1**: Predicting Customer Churn

Suppose you are tasked with producing a model to predict customer churn.  Which of your classification models would you use and what are the pros and cons of this model for this task?  Be sure to consider interpretability, imbalanced classes, and the speed of training.
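When classes are imbalanced, plain accuracy can look strong even for a model that never predicts the minority class. A minimal sketch of how one might check this, assuming a synthetic imbalanced dataset as a stand-in for the churn data: compare a majority-class baseline against a real model on both accuracy and balanced accuracy. The dictionary names and the 85/15 split are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in (an assumption): about 85% negatives, 15% positives.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=8, weights=[0.85, 0.15], random_state=0
)

candidates = {
    # Always predicts the most frequent class; a floor for comparison.
    "majority baseline": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=2000),
    # class_weight="balanced" reweights samples inversely to class frequency.
    "logistic regression (balanced)": LogisticRegression(
        max_iter=2000, class_weight="balanced"
    ),
}

scores = {}
for name, estimator in candidates.items():
    cv = cross_validate(
        estimator, X_demo, y_demo, cv=5,
        scoring=["accuracy", "balanced_accuracy"],
    )
    scores[name] = {
        "accuracy": cv["test_accuracy"].mean(),
        "balanced_accuracy": cv["test_balanced_accuracy"].mean(),
    }

for name, result in scores.items():
    print(f"{name:32s} acc={result['accuracy']:.3f} "
          f"bal_acc={result['balanced_accuracy']:.3f}")
```

A model whose accuracy is close to the baseline's is a signal to look at a metric that weights both classes, or at per-class recall, before drawing conclusions.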



The data is loaded below.  Note that the handwritten digit data is already split into features and target (`digits`, `labels`). 

In [2]:
churn = pd.read_csv('data/telecom_churn.csv')
digits, labels = load_digits(return_X_y=True)

In [17]:
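from sklearn.preprocessing import StandardScaler  # explicit import; does not rely on `sk` from a later cell

# Separate the features from the target column
X = churn.drop('Churn', axis=1).copy()
y = churn['Churn'].copy()

# Keep numeric features only, and drop 'Area code' (a numeric code, not a quantity)
X = X.drop(X.select_dtypes(object).columns, axis=1).drop('Area code', axis=1)
# Encode the boolean target as 1 (churned) / 0 (stayed)
y = y.map({True: 1, False: 0})

# Standardize every feature; set_output keeps the result a DataFrame
X = StandardScaler().set_output(transform="pandas").fit_transform(X)
X.head()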

,Account length,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls
0,0.676489,1.234883,1.566767,0.476643,1.567036,-0.07061,-0.05594,-0.070427,0.866743,-0.465494,0.866029,-0.085008,-0.601195,-0.08569,-0.427932
1,0.149065,1.307948,-0.333738,1.124503,-0.334013,-0.10808,0.144867,-0.107549,1.058571,0.147825,1.05939,1.240482,-0.601195,1.241169,-0.427932
2,0.902529,-0.59176,1.168304,0.675985,1.168464,-1.573383,0.496279,-1.5739,-0.756869,0.198935,-0.755571,0.703121,0.211534,0.697156,-1.188218
3,-0.42859,-0.59176,2.196596,-1.466936,2.196759,-2.742865,-0.608159,-2.743268,-0.078551,-0.567714,-0.078806,-1.303026,1.024263,-1.306401,0.332354
4,-0.654629,-0.59176,-0.24009,0.626149,-0.240041,-1.038932,1.098699,-1.037939,-0.276311,1.067803,-0.276562,-0.049184,-0.601195,-0.045885,1.092641


In [21]:
import sklearn as sk
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models_churn = {
    'LogisticRegression': dict(
        params = {
            'C': [0.1, 1, 10, 100],       
            # 'penalty': ['l2', None],  
        },
        model = LogisticRegression(max_iter=2000),  
    ),
    'KNN': dict(
        params = {
            'n_neighbors': [3, 5, 7, 9],
            'weights': ['uniform', 'distance'],
        },
        model = KNeighborsClassifier(),  
    ),
    'SVC': dict(
        params = {
            'C': [0.1, 1, 10, 100],       
            'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],  
            'gamma': ['scale', 'auto']
        },
        model = SVC(),  
    ),
    'DecisionTree': dict(
        params = {
            'criterion': ['gini', 'entropy'],
            'max_depth': [None, 10, 20, 30],
            'min_samples_split': [2, 5, 10],
            'min_samples_leaf': [1, 2, 4]
        },
        model = DecisionTreeClassifier(),  
    ),
}


In [23]:
from sklearn.model_selection import GridSearchCV

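# Run a grid search for each model and store the fitted search next to its config
for model_name, model_info in models_churn.items():
    models_churn[model_name]["gridcv"] = GridSearchCV(
        model_info["model"],
        param_grid=model_info["params"],
        verbose=3,                # log every fold
        n_jobs=-1,                # use all available cores
        return_train_score=True,  # also record train scores in cv_results_
    ).fit(X, y)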

Fitting 5 folds for each of 4 candidates, totalling 20 fits
[CV 1/5] END .........C=0.1;, score=(train=0.860, test=0.858) total time=   0.0s
[CV 5/5] END .........C=0.1;, score=(train=0.859, test=0.863) total time=   0.0s
[CV 4/5] END .........C=0.1;, score=(train=0.858, test=0.853) total time=   0.0s
[CV 1/5] END ...........C=1;, score=(train=0.860, test=0.859) total time=   0.0s
[CV 2/5] END ...........C=1;, score=(train=0.862, test=0.859) total time=   0.0s
[CV 3/5] END ...........C=1;, score=(train=0.856, test=0.855) total time=   0.0s
[CV 4/5] END ...........C=1;, score=(train=0.858, test=0.853) total time=   0.0s
[CV 5/5] END ...........C=1;, score=(train=0.856, test=0.865) total time=   0.0s
[CV 2/5] END .........C=0.1;, score=(train=0.862, test=0.856) total time=   0.0s
[CV 3/5] END .........C=0.1;, score=(train=0.857, test=0.856) total time=   0.0s
[CV 1/5] END ..........C=10;, score=(train=0.860, test=0.859) total time=   0.0s
[CV 2/5] END ..........C=10;, score=(train=0.862,

In [29]:
models_churn['LogisticRegression']['gridcv'].cv_results_

{'mean_fit_time': array([0.00723977, 0.00627837, 0.01200576, 0.00642438]),
 'std_fit_time': array([0.00064714, 0.00095094, 0.0056562 , 0.00114594]),
 'mean_score_time': array([0.00237455, 0.00206132, 0.00315189, 0.00196157]),
 'std_score_time': array([0.00039973, 0.00022789, 0.00126713, 0.00040139]),
 'param_C': masked_array(data=[0.1, 1.0, 10.0, 100.0],
              mask=[False, False, False, False],
        fill_value=1e+20),
 'params': [{'C': 0.1}, {'C': 1}, {'C': 10}, {'C': 100}],
 'split0_test_score': array([0.85757121, 0.85907046, 0.85907046, 0.85907046]),
 'split1_test_score': array([0.85607196, 0.85907046, 0.85907046, 0.85907046]),
 'split2_test_score': array([0.85607196, 0.85457271, 0.85457271, 0.85457271]),
 'split3_test_score': array([0.85285285, 0.85285285, 0.85285285, 0.85285285]),
 'split4_test_score': array([0.86336336, 0.86486486, 0.86486486, 0.86486486]),
 'mean_test_score': array([0.85718627, 0.85808627, 0.85808627, 0.85808627]),
 'std_test_score': array([0.00345157,

In [30]:
models_churn['KNN']['gridcv'].cv_results_

{'mean_fit_time': array([0.01427584, 0.0113709 , 0.00912328, 0.01305919, 0.01017351,
        0.01004782, 0.0093771 , 0.01214862]),
 'std_fit_time': array([0.00612737, 0.00401023, 0.00249208, 0.004234  , 0.00511053,
        0.00332488, 0.00312541, 0.00582174]),
 'mean_score_time': array([0.18969965, 0.06176577, 0.11491513, 0.08358765, 0.11605024,
        0.07979841, 0.12335153, 0.06689601]),
 'std_score_time': array([0.05293507, 0.00773914, 0.01570222, 0.0121597 , 0.00721101,
        0.02098021, 0.01197626, 0.00591616]),
 'param_n_neighbors': masked_array(data=[3, 3, 5, 5, 7, 7, 9, 9],
              mask=[False, False, False, False, False, False, False, False],
        fill_value=999999),
 'param_weights': masked_array(data=['uniform', 'distance', 'uniform', 'distance',
                    'uniform', 'distance', 'uniform', 'distance'],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'params': [{'n_neighbor

In [31]:
models_churn['SVC']['gridcv'].cv_results_

{'mean_fit_time': array([8.43851089e-02, 1.36428308e-01, 1.57465839e-01, 2.38653517e-01,
        9.86271381e-02, 1.43289757e-01, 1.33455086e-01, 2.04449701e-01,
        2.80190277e-01, 1.90803099e-01, 1.53210640e-01, 1.88111305e-01,
        3.39827633e-01, 2.23242426e-01, 1.60679865e-01, 2.38838339e-01,
        4.14069877e+00, 7.72818184e-01, 2.91994190e-01, 2.30380106e-01,
        3.46434064e+00, 6.80375433e-01, 2.16952896e-01, 2.34752798e-01,
        9.19940865e+02, 6.79256425e+00, 3.91476202e-01, 1.71282625e-01,
        8.82707377e+02, 8.04591293e+00, 4.03738737e-01, 2.03243780e-01]),
 'std_fit_time': array([9.66755162e-03, 9.43435511e-03, 1.17570095e-02, 2.35322065e-02,
        2.92224611e-02, 1.83183867e-02, 1.69501620e-02, 3.05269932e-02,
        2.38195975e-01, 9.41495398e-03, 4.84692792e-03, 7.76622382e-03,
        3.16996199e-01, 2.91281143e-02, 2.48093272e-02, 5.20275403e-02,
        1.27718940e+00, 1.63379398e-01, 5.41814728e-02, 3.62092701e-02,
        9.81118503e-01, 1.282

In [32]:
models_churn['DecisionTree']['gridcv'].cv_results_

{'mean_fit_time': array([0.17216406, 0.15032964, 0.17669353, 0.16776023, 0.15029273,
        0.11030211, 0.11773977, 0.1326468 , 0.13799253, 0.1105907 ,
        0.08681207, 0.09233489, 0.09054728, 0.0854629 , 0.09972754,
        0.09071102, 0.09586382, 0.0741148 , 0.11400766, 0.13803854,
        0.12399726, 0.12225971, 0.12033672, 0.16609764, 0.13974438,
        0.15498304, 0.14617953, 0.1755147 , 0.17993188, 0.23097482,
        0.18060293, 0.1651825 , 0.09137039, 0.11565475, 0.11060786,
        0.11797781, 0.12745304, 0.13661628, 0.125982  , 0.12479639,
        0.11036081, 0.14457831, 0.09386144, 0.10780501, 0.09926085,
        0.07919631, 0.11423845, 0.12365651, 0.1503933 , 0.15931873,
        0.15283065, 0.14328184, 0.14889746, 0.11098218, 0.13464565,
        0.11425452, 0.14949112, 0.13531222, 0.1432271 , 0.14801722,
        0.14296689, 0.10942464, 0.11801558, 0.13249316, 0.12366209,
        0.11414704, 0.10832858, 0.10091834, 0.09995508, 0.11664147,
        0.11642838, 0.13831043]
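Raw `cv_results_` dictionaries are hard to compare by eye. One possible way to line the models up is to reduce each fitted search to a single row: its best mean test score, the fit time of that best candidate, and an approximate total training time. The helper `summarize_searches` below is hypothetical (it is not part of scikit-learn), and the demo fits two small searches on synthetic data so that the block runs on its own.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def summarize_searches(searches):
    """Return one row per fitted GridSearchCV (hypothetical helper)."""
    rows = []
    for name, search in searches.items():
        results = search.cv_results_
        best = search.best_index_
        rows.append({
            "model": name,
            "best_mean_test_score": results["mean_test_score"][best],
            # Mean per-fold fit time of the best candidate only.
            "mean_fit_time_s": results["mean_fit_time"][best],
            # Approximate fit time over all candidates and folds (excludes refit).
            "total_fit_time_s": (results["mean_fit_time"] * search.n_splits_).sum(),
            "best_params": search.best_params_,
        })
    return (
        pd.DataFrame(rows)
        .set_index("model")
        .sort_values("best_mean_test_score", ascending=False)
    )


# Small synthetic demo (an assumption, not the churn data).
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=0)
demo_searches = {
    "LogisticRegression": GridSearchCV(
        LogisticRegression(max_iter=2000), {"C": [0.1, 1]}, cv=3
    ).fit(X_demo, y_demo),
    "DecisionTree": GridSearchCV(
        DecisionTreeClassifier(random_state=0), {"max_depth": [2, 4]}, cv=3
    ).fit(X_demo, y_demo),
}
summary = summarize_searches(demo_searches)
print(summary)
```

In this notebook, the same helper could be called as `summarize_searches({name: info["gridcv"] for name, info in models_churn.items()})`, which puts accuracy and training cost for all four models in one table.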

**TASK 2**: Recognizing Handwritten Digits

Suppose you are tasked with training a model to recognize handwritten digits.  Which of your classifiers would you use here and why?  Again, be sure to consider the balance of classes, speed of training, and importance of interpretability.



In [None]:
#example image
plt.imshow(digits[0].reshape(8, 8))
plt.title('This is a handwritten 0.');
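To ground the discussion for this task, the sketch below counts the examples per digit and then times a single fit of each of the four classifiers on a held-out split. It is a rough starting point under stated assumptions: default or near-default hyperparameters, one stratified train/test split, and standardized pixels for the scale-sensitive models. The names `class_counts`, `classifiers`, and `report` are illustrative.

```python
import time

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

digits, labels = load_digits(return_X_y=True)

# Class balance: number of images for each digit 0-9.
class_counts = np.bincount(labels)
print("images per digit:", class_counts)

X_train, X_test, y_train, y_test = train_test_split(
    digits, labels, stratify=labels, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
classifiers = {
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

report = {}
for name, clf in classifiers.items():
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start
    report[name] = {"fit_seconds": fit_seconds, "accuracy": clf.score(X_test, y_test)}
    print(f"{name:20s} fit={fit_seconds:.3f}s accuracy={report[name]['accuracy']:.3f}")
```

A single split gives only a rough picture; cross-validation or a grid search, as in Task 1, would give a steadier comparison.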