# 14.2 Case Study: Classification with k-Nearest Neighbors and the Digits Dataset, Part 1

**This file contains Sections 14.2 and 14.3 and all of their subsections and Self Check exercises**

### Classification Problems
### Our Approach

## 14.2.1 k-Nearest Neighbors Algorithm
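
To make the idea concrete, here is a minimal plain-Python sketch of k-nearest neighbors: measure the distance from a query point to every training sample, keep the `k` closest, and let them vote. The helper name `knn_predict` and the toy data are hypothetical, and this is an illustration of the technique, not scikit-learn's implementation. The default `k=5` mirrors the default `n_neighbors` of `KNeighborsClassifier`.

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)


def knn_predict(train_samples, train_labels, query, k=5):
    """Predict query's class by majority vote among its k nearest samples."""
    # Pair each training sample's distance to the query with its label
    distances = [(dist(sample, query), label)
                 for sample, label in zip(train_samples, train_labels)]
    # Keep the k closest samples
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Return the most common label among them
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D toy data: class 'a' near the origin, class 'b' near (5, 5)
samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ['a', 'a', 'a', 'b', 'b', 'b']

print(knn_predict(samples, labels, query=(1, 1), k=3))  # a
print(knn_predict(samples, labels, query=(5, 4), k=3))  # b
```

Choosing an odd `k` reduces the chance of tied votes in a two-class problem; with more classes, ties are still possible, and this sketch simply returns whichever tied label it encountered first.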
### Hyperparameters and Hyperparameter Tuning

## 14.2.2 Loading the Dataset

**We added `%matplotlib inline` to enable Matplotlib in this notebook.**

In [None]:
%matplotlib inline
from sklearn.datasets import load_digits

In [None]:
digits = load_digits()

### Displaying the Description

In [None]:
print(digits.DESCR)

### Checking the Sample and Target Sizes

In [None]:
digits.target[::100]

In [None]:
digits.data.shape

In [None]:
digits.target.shape

### A Sample Digit Image

In [None]:
digits.images[13]

### Preparing the Data for Use with Scikit-Learn

In [None]:
digits.data[13]
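
Each row of `digits.data` holds the same pixels as the corresponding 8-by-8 array in `digits.images`, flattened into 64 features. The plain-Python sketch below shows that row-major flattening on a hypothetical 3-by-3 "image" (the values are made up for illustration).

```python
# Hypothetical 3-by-3 "image"; each Digits image is 8-by-8
image = [[0, 5, 0],
         [7, 16, 7],
         [0, 5, 0]]

# Row-major flattening: rows are concatenated from top to bottom
flat = [pixel for row in image for pixel in row]

print(flat)       # [0, 5, 0, 7, 16, 7, 0, 5, 0]
print(len(flat))  # 9 features, just as an 8-by-8 image yields 64
```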

## 14.2.3 Visualizing the Data
### Creating the Diagram 

In [None]:
import matplotlib.pyplot as plt

In [None]:
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(6, 4))

### Displaying Each Image and Removing the Axes Labels 

for item in zip(axes.ravel(), digits.images, digits.target):
    ax, image, target = item  # ax avoids rebinding the axes array
    ax.imshow(image, cmap=plt.cm.gray_r)
    ax.set_xticks([])  # remove x-axis tick marks
    ax.set_yticks([])  # remove y-axis tick marks
    ax.set_title(target)
plt.tight_layout()

In [None]:
# This placeholder cell was added because we had to combine
# this section's snippets 12-13 for the visualization to work in Jupyter,
# and we want the subsequent snippet numbers to match the book.

## 14.2.4 Splitting the Data for Training and Testing 

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
     digits.data, digits.target, random_state=11)

### Training and Testing Set Sizes

In [None]:
X_train.shape

In [None]:
X_test.shape
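
By default, `train_test_split` shuffles the samples and reserves 25% of them for testing. The sketch below imitates that idea on index positions only, using a hypothetical helper named `simple_split`. It assumes the test-set size is rounded up, which is consistent with a 1347/450 split of the 1797 samples. The shuffled order will not match scikit-learn's, even with the same seed value.

```python
import math
import random


def simple_split(n_samples, test_fraction=0.25, seed=11):
    """Return (train_indices, test_indices) after shuffling range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = math.ceil(n_samples * test_fraction)  # round the test size up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = simple_split(1797)  # the Digits dataset has 1797 samples
print(len(train_idx), len(test_idx))  # 1347 450
```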

## 14.2.5 Creating the Model 

In [None]:
from sklearn.neighbors import KNeighborsClassifier

In [None]:
knn = KNeighborsClassifier()

## 14.2.6 Training the Model 

In [None]:
knn.fit(X=X_train, y=y_train)

## 14.2.7 Predicting Digit Classes 

In [None]:
predicted = knn.predict(X=X_test)

In [None]:
expected = y_test

In [None]:
predicted[:20]

In [None]:
expected[:20]

In [None]:
wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e]

In [None]:
wrong

# 14.3 Case Study: Classification with k-Nearest Neighbors and the Digits Dataset, Part 2
## 14.3.1 Metrics for Model Accuracy 
### Estimator Method `score`

In [None]:
print(f'{knn.score(X_test, y_test):.2%}')

### Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
confusion = confusion_matrix(y_true=expected, y_pred=predicted)

In [None]:
confusion
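
A confusion matrix is a table of counts: the row is the expected class and the column is the predicted class, so correct predictions fall on the diagonal. The following plain-Python sketch builds one from hypothetical labels for a 3-class problem; `confusion_counts` is an illustrative helper, not part of scikit-learn.

```python
def confusion_counts(expected, predicted, n_classes):
    """Build an n_classes-by-n_classes table of (expected, predicted) counts."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for e, p in zip(expected, predicted):
        matrix[e][p] += 1  # row = expected class, column = predicted class
    return matrix


# Hypothetical labels for a 3-class problem
expected_demo = [0, 0, 1, 1, 2, 2, 2]
predicted_demo = [0, 1, 1, 1, 2, 0, 2]

demo = confusion_counts(expected_demo, predicted_demo, n_classes=3)
for row in demo:
    print(row)

# Accuracy is the diagonal total (correct predictions) over all predictions
correct = sum(demo[i][i] for i in range(3))
print(f'{correct / len(expected_demo):.2%}')  # 71.43%
```

Off-diagonal entries show which classes are mistaken for which: here one class-0 sample was predicted as 1, and one class-2 sample was predicted as 0.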

### Classification Report

In [None]:
from sklearn.metrics import classification_report

In [None]:
names = [str(digit) for digit in digits.target_names]

In [None]:
print(classification_report(expected, predicted, 
       target_names=names))
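
The precision, recall and f1-score columns of the report can all be derived from a confusion matrix. As a rough sketch (the helper `per_class_metrics` and the 3-class matrix are hypothetical): precision divides a class's diagonal entry by its column total, recall divides it by its row total, and f1 is the harmonic mean of the two.

```python
def per_class_metrics(matrix):
    """Return {class: (precision, recall, f1)} from a confusion matrix
    whose rows are expected classes and columns are predicted classes."""
    n = len(matrix)
    metrics = {}
    for c in range(n):
        true_pos = matrix[c][c]
        predicted_c = sum(matrix[r][c] for r in range(n))  # column total
        expected_c = sum(matrix[c])                        # row total
        precision = true_pos / predicted_c if predicted_c else 0.0
        recall = true_pos / expected_c if expected_c else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics


# Hypothetical 3-class confusion matrix
demo_matrix = [[1, 1, 0],
               [0, 2, 0],
               [1, 0, 2]]

for label, (p, r, f) in per_class_metrics(demo_matrix).items():
    print(f'{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}')
```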

### Visualizing the Confusion Matrix

In [None]:
import pandas as pd

In [None]:
confusion_df = pd.DataFrame(confusion, index=range(10),
     columns=range(10))

In [None]:
import seaborn as sns

In [None]:
axes = sns.heatmap(confusion_df, annot=True, 
                    cmap='nipy_spectral_r')

## 14.3.2 K-Fold Cross-Validation
### KFold Class

In [None]:
from sklearn.model_selection import KFold

In [None]:
kfold = KFold(n_splits=10, random_state=11, shuffle=True)
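
Conceptually, k-fold cross-validation shuffles the sample indices, cuts them into `n_splits` nearly equal folds, and uses each fold once for testing while training on the rest. The generator below is a simplified plain-Python sketch of that bookkeeping; `kfold_indices` is a hypothetical name, and the shuffled order will differ from what `KFold` produces.

```python
import random


def kfold_indices(n_samples, n_splits=10, seed=11):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # analogous to shuffle=True
    # Spread any remainder over the first folds so sizes differ by at most 1
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


fold_sizes = [len(test) for _, test in kfold_indices(1797)]
print(fold_sizes)  # seven folds of 180 samples, then three of 179
```

Every sample lands in exactly one test fold, so each one contributes to exactly one of the ten accuracy scores.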

### Using the `KFold` Object with Function `cross_val_score` 

In [None]:
from sklearn.model_selection import cross_val_score

In [None]:
scores = cross_val_score(estimator=knn, X=digits.data, 
     y=digits.target, cv=kfold)

In [None]:
scores

In [None]:
print(f'Mean accuracy: {scores.mean():.2%}')

In [None]:
print(f'Accuracy standard deviation: {scores.std():.2%}')

## 14.3.3 Running Multiple Models to Find the Best One 

In [None]:
from sklearn.svm import SVC

In [None]:
from sklearn.naive_bayes import GaussianNB

In [None]:
estimators = {
    'KNeighborsClassifier': knn,
    'SVC': SVC(gamma='scale'),
    'GaussianNB': GaussianNB()}

In [None]:
for estimator_name, estimator_object in estimators.items():
    kfold = KFold(n_splits=10, random_state=11, shuffle=True)
    scores = cross_val_score(estimator=estimator_object,
        X=digits.data, y=digits.target, cv=kfold)
    print(f'{estimator_name:>20}: ' +
          f'mean accuracy={scores.mean():.2%}; ' +
          f'standard deviation={scores.std():.2%}')

### Scikit-Learn Estimator Diagram

## 14.3.4 Hyperparameter Tuning 

In [None]:
for k in range(1, 20, 2):
    kfold = KFold(n_splits=10, random_state=11, shuffle=True)
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(estimator=knn,
        X=digits.data, y=digits.target, cv=kfold)
    print(f'k={k:<2}; mean accuracy={scores.mean():.2%}; ' +
          f'standard deviation={scores.std():.2%}')
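
The same tuning loop can be sketched without scikit-learn to show why `k` matters. This toy version swaps 10-fold cross-validation for leave-one-out validation (each point is held out once) and uses hypothetical 2-D data with one deliberately noisy point; the helper names are illustrative only.

```python
from collections import Counter
from math import dist


def knn_vote(train, query, k):
    """Majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Fraction of points classified correctly when each is held out in turn."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += (knn_vote(rest, point, k) == label)
    return hits / len(data)


# Hypothetical toy data: two clusters plus one 'b' point inside the 'a' cluster
toy = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'), ((1, 1), 'a'),
       ((0.5, 0.5), 'b'),
       ((5, 5), 'b'), ((5, 6), 'b'), ((6, 5), 'b'), ((6, 6), 'b')]

results = {k: leave_one_out_accuracy(toy, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)  # ties go to the smallest k listed
for k, accuracy in results.items():
    print(f'k={k}; accuracy={accuracy:.2%}')
print(f'best k: {best_k}')
```

With `k=1`, the single noisy point misleads every nearby prediction; larger values of `k` outvote it. On real data, very large `k` values eventually blur class boundaries, which is why a range of values is compared.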

# More Info 
* See **video** Lesson 14 in [**Python Fundamentals LiveLessons** on Safari Online Learning](https://learning.oreilly.com/videos/python-fundamentals/9780135917411)
* See **book** Chapter 14 in [**Python for Programmers** on Safari Online Learning](https://learning.oreilly.com/library/view/python-for-programmers/9780135231364/), or see **book** Chapter 15 in **Intro to Python for Computer Science and Data Science**

In [None]:
##########################################################################
# (C) Copyright 2019 by Deitel & Associates, Inc. and                    #
# Pearson Education, Inc. All Rights Reserved.                           #
#                                                                        #
# DISCLAIMER: The authors and publisher of this book have used their     #
# best efforts in preparing the book. These efforts include the          #
# development, research, and testing of the theories and programs        #
# to determine their effectiveness. The authors and publisher make       #
# no warranty of any kind, expressed or implied, with regard to these    #
# programs or to the documentation contained in these books. The authors #
# and publisher shall not be liable in any event for incidental or       #
# consequential damages in connection with, or arising out of, the       #
# furnishing, performance, or use of these programs.                     #
##########################################################################
