<hr style="border: solid 3px blue;">

# Introduction


![](https://c.tenor.com/HG1t-Q7GmW0AAAAC/oh-this-new-thing-oh-this.gif)

CNNs generally achieve the best performance on MNIST, and many notebooks already cover them. If CNNs are what you are interested in, those notebooks are a good place to look.

In this notebook, we look at the MNIST dataset from a different angle: we model it with methods other than CNNs and try to understand how those models behave.

This notebook proceeds in the following order.
* Detect anomalies (outliers) and inspect them.
* Apply dimensionality reduction to plot the MNIST dataset in a lower dimension and understand the dataset.
* Fit tree-based models and understand how they make decisions.

---------------------------------------------------------------------
# Setting Up

In [None]:
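try:
    import pycaret
except ImportError:
    # Install PyCaret only when it is missing, then import it so the name is defined
    !pip install pycaret
    import pycaret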

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.datasets import mnist
import plotly
import plotly.express as px
import plotly.graph_objects as go

from sklearn.model_selection import train_test_split
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

In [None]:
train_df = pd.read_csv('../input/digit-recognizer/train.csv')
test_df = pd.read_csv('../input/digit-recognizer/test.csv')
submission_df = pd.read_csv('../input/digit-recognizer/sample_submission.csv')

In [None]:
# Split the pixel columns (features) from the digit label (target)
train_x = train_df.drop('label',axis=1)
train_y = train_df['label']

--------------------------
# Checking Target Imbalance

In [None]:
sns.set(style="ticks", context="talk",font_scale = 1)
plt.style.use("dark_background")
plt.figure(figsize = (20,10))
ax = train_y.value_counts().sort_values(ascending=False).plot(kind='bar',
                                                                        grid = False,
                                                                        fontsize=20,
                                                                        color='grey')
plt.xticks(rotation=0)
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+ p.get_width() / 2., height + 30, height, ha = 'center', size = 30)
sns.despine()

<span style="color:Blue"> **Observation**

* The classes appear to be roughly balanced.
* 1 is the most frequent digit and 5 is the least frequent.
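
Balance can also be checked numerically rather than by eye. The following is a minimal sketch on synthetic labels: the `labels` array is a random stand-in, not the competition data. With the notebook's data, `train_y.to_numpy()` would take its place.

```python
import numpy as np

# Stand-in labels for illustration only; these are NOT the real MNIST counts.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=42_000)

counts = np.bincount(labels, minlength=10)   # number of samples per digit 0..9
shares = counts / counts.sum()               # fraction of the dataset per digit
imbalance_ratio = counts.max() / counts.min()

for digit, (n, share) in enumerate(zip(counts, shares)):
    print(f"digit {digit}: {n:5d} samples ({share:.1%})")
print(f"largest / smallest class: {imbalance_ratio:.2f}")
```

A ratio close to 1 means the largest and smallest classes are of similar size.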

<hr style="border: solid 3px blue;">

# Checking Anomaly

![](https://miro.medium.com/max/1400/1*Bl1pi1ZHwncJqiuugKYndQ.png)

Picture Credit: https://miro.medium.com

> In data analysis, anomaly detection (also referred to as outlier detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the data.

Ref: https://en.wikipedia.org/wiki/Anomaly_detection

Here, we run anomaly detection to find which records are flagged as outliers. We then draw those records as images to check and understand the outliers visually.
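
To give a feel for how a PCA-based detector can score records, here is a simplified sketch that uses the PCA reconstruction error as the anomaly score. This is an assumption made for illustration: the detector PyCaret builds may compute its score differently. The toy data, the function name `pca_reconstruction_scores`, and the 5% cutoff are all hypothetical.

```python
import numpy as np

def pca_reconstruction_scores(X, n_components):
    """Squared distance between each row and its projection onto the top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]                            # top principal directions
    recon = Xc @ W.T @ W                             # project and map back
    return np.sum((Xc - recon) ** 2, axis=1)

rng = np.random.default_rng(0)
# Toy data: 200 points close to a 2-D plane embedded in 10-D, plus 10 points far from it.
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 10))
inliers = latent @ basis + 0.05 * rng.normal(size=(200, 10))
outliers = rng.normal(scale=3.0, size=(10, 10))
X = np.vstack([inliers, outliers])

scores = pca_reconstruction_scores(X, n_components=2)
threshold = np.quantile(scores, 0.95)                # assumed cutoff: flag the top 5%
flags = scores > threshold
print("flagged rows:", np.flatnonzero(flags))
```

Records that the leading components cannot reconstruct well receive a high score, which is why unusual handwriting tends to rank near the top.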

In [None]:
from pycaret.anomaly import *

In [None]:
pycaret.anomaly.setup(
    data=train_df,
    silent=True)

In [None]:
pca = pycaret.anomaly.create_model('pca')

In [None]:
plot_model(pca, plot = 'umap')

<span style="color:Blue"> **Observation**

* The yellow dots are outliers. There don't seem to be many outliers.

In [None]:
pca_df = pycaret.anomaly.assign_model(pca)

In [None]:
abnormal_data = pca_df[pca_df.Anomaly == 1].sort_values(by='Anomaly_Score', ascending=False)
print("the size of anomaly = ",len(abnormal_data))
abnormal_data.head(10).style.set_properties(**{'background-color': 'black',
                           'color': 'white',
                           'border-color': 'white'})

<span style="color:Blue"> **Observation**

* There are 2100 anomalies.
* The last column holds an anomaly score, which indicates how anomalous each record is.

**Let's look at the 10 highest-scoring outliers.**

In [None]:
top10 = abnormal_data.drop('label',axis=1)[:10]
top10 = top10.loc[:,:'pixel783']

In [None]:
# Reshape each flattened 784-pixel row back into a 28x28 image
img = np.array(top10).reshape(-1, 28, 28)
fig = px.imshow(img,color_continuous_scale='Blues_r',facet_col=0, binary_string=True, facet_col_wrap=5,labels={'facet_col':'img'})
fig.show()

<span style="color:Blue"> **Observation**

* Sorting by anomaly score and drawing the top 10 shows that some of the outliers are hard to read even for the human eye.

In [None]:
sns.set(style="ticks", context="talk",font_scale = 1)
plt.style.use("dark_background")
plt.figure(figsize = (20,10))
ax = abnormal_data.label.value_counts().sort_values(ascending=False).plot(kind='bar',
                                                                        grid = False,
                                                                        fontsize=20,
                                                                        color='grey')
plt.xticks(rotation=0)
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+ p.get_width() / 2., height + 10, height, ha = 'center', size = 30)
sns.despine()

<span style="color:Blue"> **Observation**

* Many of the outliers are 7s, 2s, and 6s.
* The digit 1 seems to have the fewest outliers.
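
Raw counts depend on how often each digit occurs, so a per-class rate can be a fairer comparison. The helper below is a hypothetical sketch (the function name and the tiny toy arrays are made up for illustration); it divides the number of flagged records by the class size.

```python
import numpy as np

def anomaly_rate_per_class(labels, flags, n_classes=10):
    """Fraction of records flagged as anomalies within each class."""
    labels = np.asarray(labels)
    flags = np.asarray(flags).astype(bool)
    totals = np.bincount(labels, minlength=n_classes)            # records per class
    flagged = np.bincount(labels[flags], minlength=n_classes)    # flagged records per class
    # Classes with no records keep a rate of 0 instead of dividing by zero
    return np.divide(flagged, totals, out=np.zeros(n_classes), where=totals > 0)

# Toy example: 3 classes, 10 records, 3 of them flagged
toy_labels = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
toy_flags = np.array([1, 0, 0, 0, 0, 0, 1, 1, 0, 0])
rates = anomaly_rate_per_class(toy_labels, toy_flags, n_classes=3)
print(rates)
```

With the notebook's data, the call would look like `anomaly_rate_per_class(pca_df['label'], pca_df['Anomaly'])`.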

------------------------------------------------------------------------------------------
# Checking train dataset

Let's draw pictures by randomly picking 8 pieces of data.

In [None]:
# Draw 8 random samples, each reshaped from 784 pixels to a 28x28 image
img = np.array(train_x.sample(8)).reshape(-1, 28, 28)
fig = px.imshow(img,facet_col=0, binary_string=True, facet_col_wrap=4,labels={'facet_col':'img'})
fig.show()

In [None]:
img = np.array(train_x.sample(1))[0].reshape( 28, 28)
sns.set(style="ticks", context="talk",font_scale = 1)
plt.style.use("dark_background")
fig = plt.figure(figsize = (30,30)) 
ax = fig.add_subplot(111)
ax.imshow(img)
width, height = img.shape
thresh = img.max()/2.5
for x in range(width):
    for y in range(height):
        val = round(img[x][y],2) if img[x][y] !=0 else 0
        ax.annotate(str(val), xy=(y,x),
                    horizontalalignment='center',
                    verticalalignment='center',
                    color='white' if img[x][y]<thresh else 'black')

<span style="color:Blue"> **Observation**
    
* Each digit image in MNIST is nothing more than a 28 × 28 grid of pixel intensities, stored as 784 numbers.
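
The relation between the flat 784-value row and the 28 × 28 picture can be sketched as follows. The `flat` array is a stand-in whose values equal the pixel indices, which makes the layout easy to see.

```python
import numpy as np

flat = np.arange(784)            # stand-in for one row of pixel0..pixel783
image = flat.reshape(28, 28)     # rows are filled one after another (row-major order)

print(image.shape)
print(image[0, :5])              # the first five pixels of the top row
print(image[1, 0])               # the second row starts at pixel28

restored = image.ravel()         # flattening gives back the original order
```

Reshaping changes only how the numbers are arranged, not the numbers themselves.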

In [None]:
print(train_x.shape)

# Keep only the first 5,000 rows so the dimensionality-reduction steps below run on a smaller sample
sample_size = 5000

train_x = train_x[:sample_size]
train_y = train_y[:sample_size]

<hr style="border: solid 3px blue;">

# Plotting after Dimensional Reduction

![](https://miro.medium.com/max/698/1*WVFe7w1rzZWsmghdvaoXag.png)

Picture Credit: https://miro.medium.com

We want to understand the characteristics of the dataset by projecting the dataset to a low dimension through PCA and UMAP.

-------------------------------------------------
## PCA

PCA is the most widely used dimensionality-reduction method. It rotates the axes of multidimensional data so that they point along the directions of largest variance. The stronger the dependence between variables, the fewer principal components are needed to represent the original data. However, PCA captures only variance and linear correlation, which summarize the data well when it is roughly normally distributed, so it tends to be less suitable for features with heavily skewed distributions.
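
The link between dependence and the number of components needed can be illustrated with a small sketch. It computes the share of variance explained by each principal component for two toy datasets, one with strongly dependent features and one with independent features. The helper name and the toy data are assumptions made for this illustration.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)                      # center the data
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values of the centered data
    var = s ** 2                                 # proportional to the component variances
    return var / var.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=500)
noise = rng.normal(size=500)

correlated = np.column_stack([x, 2 * x + 0.1 * noise])   # second feature depends on the first
independent = np.column_stack([x, noise])                 # two unrelated features

corr_ratio = explained_variance_ratio(correlated)
indep_ratio = explained_variance_ratio(independent)
print("correlated :", corr_ratio.round(4))
print("independent:", indep_ratio.round(4))
```

In the correlated case a single component carries almost all of the variance, while in the independent case the variance is split between the two components.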

In [None]:
from sklearn.decomposition import PCA

pca = PCA()
x_pca = pca.fit_transform(train_x)
markers=['o','v','^','<','>','8','s','P','*','X']
# plot in 2D by class
sns.set(style="white", context="talk",font_scale = 1)
plt.style.use("dark_background")
sns.set_palette("bright")
plt.figure(figsize=(10,10))
for i,marker in enumerate(markers):
    mask = train_y == i
    plt.scatter(x_pca[mask, 0], x_pca[mask, 1], label=i, s=10, alpha=1,marker=marker)
plt.legend(bbox_to_anchor=(1.00, 1), loc='upper left',fontsize=15)

-----------------------------------
## UMAP plot

> Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data
> 
> 1. The data is uniformly distributed on Riemannian manifold;
> 2. The Riemannian metric is locally constant (or can be approximated as such);
> 3. The manifold is locally connected.
>
> From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.

Ref: https://umap-learn.readthedocs.io/en/latest/

In [None]:
import umap.plot
mapper = umap.UMAP().fit(train_x)
umap.plot.points(mapper, labels=train_y, theme='fire')

It looks like an orderly universe. A few galaxies seem to be visible as well.

------------------------------------------------------------
## UMAP 3D plot

In [None]:
from umap import UMAP

umap_3d = UMAP(n_components=3, init='random', random_state=0)
x_umap = umap_3d.fit_transform(train_x)
umap_df = pd.DataFrame(x_umap)
train_y_sr = pd.Series(train_y,name='label')
# Put the three embedding coordinates and the digit label side by side
new_df = pd.concat([umap_df,train_y_sr],axis=1)
fig = px.scatter_3d(
    new_df, x=0, y=1, z=2,
    color='label', labels={'label': 'number'}
)
fig.update_traces(marker_size=1)
fig.show()

<span style="color:Blue"> **Observation**
    
* In the low-dimensional plots, the boundaries between classes are visible even to the naked eye. In other words, models probably get better results on MNIST than on many other datasets because its classes are as well separated as the figures above suggest.
    

<hr style="border: solid 3px blue;">

# Simple is better!

![](https://image.freepik.com/free-photo/simple-is-better_360032-968.jpg)

We create a model using classical ML and try to understand how it works. Using PyCaret, we build the model and tune its hyperparameters with short code. After that, we try to understand the behavior of these models through various visualizations.

A simple way to do all of this is to use a well-made library, and PyCaret is a good option.

------------------------------------------
## Setting Up

> This function initializes the training environment and creates the transformation pipeline. Setup function must be called before executing any other function.

Ref: https://pycaret.readthedocs.io/en/latest/api/classification.html

In [None]:
from pycaret.classification import *
setup(data = train_df, 
             target = 'label',
             preprocess = False,
             silent=True)

--------------------------------------------
## Creating Model

> This function trains and evaluates the performance of a given estimator using cross validation. The output of this function is a score grid with CV scores by fold.

Ref: https://pycaret.readthedocs.io/en/latest/api/classification.html
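
To make "CV scores by fold" concrete, here is a minimal k-fold sketch in plain NumPy. It is not PyCaret's implementation: the nearest-centroid classifier, the plain (non-stratified) split, the choice of 10 folds, and the toy blobs are all assumptions chosen to keep the example short.

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va):
    """Fit class centroids on the training fold and score accuracy on the validation fold."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_va[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_va).mean())

def k_fold_scores(X, y, k=10, seed=0):
    """Return one validation accuracy per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)   # k disjoint index sets
    scores = []
    for i in range(k):
        val_idx = folds[i]                                # fold i is held out
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(nearest_centroid_accuracy(X[train_idx], y[train_idx],
                                                X[val_idx], y[val_idx]))
    return np.array(scores)

# Toy data: three well-separated 2-D blobs with 100 points each
rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 100)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = centers[y] + rng.normal(size=(300, 2))

scores = k_fold_scores(X, y, k=10)
print("per-fold accuracy:", scores.round(3))
print("mean accuracy:", round(float(scores.mean()), 3))
```

Each row of the score grid corresponds to one such held-out fold, and the mean summarizes them.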

In [None]:
et = create_model('et')

In [None]:
dt = create_model('dt')

-------------------------------------
## Tuning Hyperparameters

> This function tunes the hyperparameters of a given estimator. The output of this function is a score grid with CV scores by fold of the best selected model based on optimize parameter. 

Ref: https://pycaret.readthedocs.io/en/latest/api/classification.html
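
One common way to tune hyperparameters is random search: sample a few parameter combinations from a grid, score each one, and keep the best. The sketch below shows that idea only; whether `tune_model` searches this way depends on its settings. The `toy_score` function is a hypothetical stand-in for "mean cross-validated accuracy of a model built with these parameters", and the grid values are made up.

```python
import random

def random_search(score_fn, grid, n_iter=10, seed=0):
    """Try n_iter random parameter combinations and return the best one with its score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Hypothetical objective that peaks at max_depth=8 and n_estimators=200
    return (1.0
            - 0.01 * abs(params["max_depth"] - 8)
            - 0.001 * abs(params["n_estimators"] - 200))

grid = {"max_depth": [2, 4, 8, 16], "n_estimators": [50, 100, 200, 400]}
best_params, best_score = random_search(toy_score, grid, n_iter=10)
print(best_params, round(best_score, 3))
```

Restricting the grid to a single value, as done for the decision tree with `max_depth` set to 2, simply forces that value to be used.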

In [None]:
tuned_et = tune_model(et, optimize = 'Accuracy',early_stopping = True)

In [None]:
params = { "max_depth":[2]}
tuned_dt = tune_model(dt, optimize = 'Accuracy',early_stopping = True,custom_grid = params)

-------------------------------
## Interpreting the Model

> This function analyzes the predictions generated from a trained model. Most plots in this function are implemented based on the SHAP (SHapley Additive exPlanations)

Ref: https://pycaret.readthedocs.io/en/latest/api/classification.html

In [None]:
with plt.rc_context({'figure.facecolor':'white'}):
    interpret_model(tuned_et)

<span style="color:Blue"> **Observation**

* Pixel 378 was identified as an important feature from the model's point of view.
* Feature importance differs from class to class.
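
To see where an important pixel such as pixel 378 sits in the image, its column name can be converted to a row and column. This small helper is a hypothetical convenience function and assumes the row-major layout of the `pixel0` to `pixel783` columns.

```python
def pixel_position(name, width=28):
    """Convert a column name such as 'pixel378' to (row, column) in the 28x28 image."""
    index = int(name.replace("pixel", ""))
    return divmod(index, width)     # row = index // width, column = index % width

row, col = pixel_position("pixel378")
print(row, col)  # 13 14
```

Row 13, column 14 lies close to the center of the 28 × 28 image.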

In [None]:
with plt.rc_context({'figure.facecolor':'black','text.color':'blue'}):
    plot_model(tuned_dt, plot='tree')

A decision tree is one of the most basic ML methods. The figure above is a simplified tree (its depth is limited to 2) that shows how the decision tree model makes decisions on the MNIST dataset.

Looking at the figure, you can see how the classes are separated according to the value of each pixel.
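
How a tree picks a pixel threshold can be sketched with the Gini impurity, a common splitting criterion for classification trees. The code below searches the best threshold for a single feature; the function names and the tiny toy data (one pixel, digits 1 and 7) are assumptions for illustration, not values taken from the fitted model.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of integer class labels (0 means the set is pure)."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    """Find the threshold on one feature that minimizes the weighted Gini impurity."""
    best_t, best_impurity = None, np.inf
    values = np.unique(feature)
    for t in (values[:-1] + values[1:]) / 2:          # midpoints between sorted unique values
        left, right = labels[feature <= t], labels[feature > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_t, best_impurity = t, weighted
    return best_t, best_impurity

# Toy data: intensity of one pixel and the digit each sample shows
pixel = np.array([0, 0, 10, 20, 200, 220, 250, 255])
digit = np.array([1, 1, 1, 1, 7, 7, 7, 7])

threshold, impurity = best_threshold(pixel, digit)
print(threshold, impurity)
```

A full tree repeats this search over every pixel at every node, which is what produces the pixel-by-pixel splits in the figure.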

--------------------------------
## Calibrating Model

> This function calibrates the probability of a given estimator using isotonic or logistic regression.

Ref: https://pycaret.readthedocs.io/en/latest/api/classification.html
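
The isotonic option can be illustrated with the pool-adjacent-violators algorithm, which fits a non-decreasing step function to the observed outcomes. This is a bare-bones sketch of the technique, not the code that `calibrate_model` runs; the function name and the toy outcomes are made up for illustration.

```python
def isotonic_fit(outcomes):
    """Least-squares non-decreasing fit to a sequence (pool-adjacent-violators)."""
    blocks = []                                   # each block holds [sum, count]
    for value in outcomes:
        blocks.append([float(value), 1])
        # Merge backwards while an earlier block has a larger mean than the one after it
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)    # every member gets its block mean
    return fitted

# Outcomes (1 = prediction was correct) ordered by the model's raw score, lowest first
outcomes = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
calibrated = isotonic_fit(outcomes)
print([round(p, 3) for p in calibrated])
```

The fitted values can be read as calibrated probabilities: they never decrease as the raw score increases, and their average matches the observed hit rate.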

In [None]:
cali_model = calibrate_model(tuned_et)

------------------------------------
## Finalizing Model

> This function trains a given estimator on the entire dataset including the holdout set.

Ref: https://pycaret.readthedocs.io/en/latest/api/classification.html

In [None]:
final_model = finalize_model(cali_model)

---------------------------------------------------
## Plotting using the final model

In [None]:
sns.set_style("white")
sns.set_palette("bright")
plt.figure(figsize=(10, 10))
with plt.rc_context({'figure.facecolor':'grey'}):
    plot_model(final_model, plot='boundary')

<span style="color:Blue"> **Observation**

* The figure above shows the boundaries determined for each class after the data is reduced to two dimensions.
* The boundaries for most classes seem to be well determined.
* Where the data overlaps, in the middle left of the figure, the boundaries are more complicated.
* Since the boundaries are well determined even after reducing 784 dimensions to 2, more detailed boundaries can presumably be drawn in the full 784-dimensional space. However, we can neither draw nor visualize a 784-dimensional picture.
    


In [None]:
sns.set(style="ticks", context="talk",font_scale = 1)
plt.style.use("dark_background")
plot_model(final_model, plot='confusion_matrix')

<span style="color:Blue"> **Observation**

* Some samples of class 9 are misclassified as 4 or 7. Given the shape of a 9, such mistakes are understandable.
* Some samples of class 1 are misclassified as 2 or 8.
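
The most frequent mix-ups can also be read off programmatically. The sketch below builds a confusion matrix with rows as true classes and columns as predicted classes (a convention chosen here) and lists the largest off-diagonal cells. The helper names and the ten toy predictions are made up for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    """Count how often each true class (row) is predicted as each class (column)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def top_confusions(cm, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    off = cm.copy()
    np.fill_diagonal(off, 0)                              # ignore correct predictions
    flat_order = np.argsort(off, axis=None)[::-1][:k]     # largest counts first
    pairs = np.column_stack(np.unravel_index(flat_order, off.shape))
    return [(int(t), int(p), int(off[t, p])) for t, p in pairs]

# Toy predictions: 9 is mistaken for 4 twice and for 7 once, 1 is mistaken for 8 once
y_true = np.array([9, 9, 9, 9, 4, 4, 7, 1, 1, 1])
y_pred = np.array([9, 4, 4, 7, 4, 4, 7, 1, 1, 8])

cm = confusion_matrix(y_true, y_pred)
top = top_confusions(cm, k=3)
print(top)
```

Sorting the off-diagonal cells this way makes pairs such as 9 and 4 stand out without reading the whole matrix.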

--------------------------------------------
## Predicting using the final model

In [None]:
pred_unseen = predict_model(final_model, data = test_df)

In [None]:
submission_df['Label'] = pred_unseen['Label']
submission_df.to_csv('submission.csv',header =  ['ImageId', 'Label' ], index = None)