<a href="https://colab.research.google.com/github/neharika12/Women-Cloth-Reviews-/blob/main/Hand_Written_Digit_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Title of Project**: Handwritten Digit Prediction - Classification
---

### **OBJECTIVE** :
>The digits dataset consists of 8x8 pixel images of handwritten digits. The `images` attribute of the dataset stores an 8x8 array of grayscale values for each image. We will use these arrays to visualize the first 25 images. The `target` attribute of the dataset stores the digit each image represents.

## **Data Source**:
>YBI Foundation Dataset

# Import Library

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import Data

In [None]:
from sklearn.datasets import load_digits
df = load_digits()


# Describe Data

In [None]:
print(df.images)

In [None]:
# Print description
print(df.DESCR)

In [None]:
# Print the number of images and labels
print("Images shape: ", df.images.shape)
print("Target shape: ", df.target.shape)

# Data Visualization

In [None]:
# Visualize some images
plt.figure(figsize=(10, 10))

for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(df.images[i], cmap=plt.cm.binary)
    plt.xlabel(df.target[i])
plt.show()

# Data Preprocessing

In [None]:
df.images.shape

In [None]:
df.images[0]

In [None]:
df.images[0].shape

In [None]:
len(df.images)

In [None]:
n_samples = len(df.images)
data = df.images.reshape((n_samples, -1))

In [None]:
data[0]

In [None]:
data[0].shape

In [None]:
data.shape
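
The flattening step above can be cross-checked against the dataset's own `data` attribute, which scikit-learn provides already flattened to 64 features per image. This is a minimal sketch, assuming scikit-learn is installed; the names `digits`, `flat`, `matches_builtin`, and `restored` are illustrative and do not appear in the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Flatten each 8x8 image into a 64-element row vector.
n_samples = len(digits.images)
flat = digits.images.reshape((n_samples, -1))
print(flat.shape)

# The object returned by load_digits also exposes the flattened pixels as `data`.
matches_builtin = np.array_equal(flat, digits.data)
print(matches_builtin)

# Reshaping a row back to 8x8 recovers the original image.
restored = flat[0].reshape(8, 8)
print(np.array_equal(restored, digits.images[0]))
```

Using `-1` in `reshape` lets NumPy infer the second dimension (64) from the total number of elements, so the same line would work for images of a different size.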

# Scaling Image Data

In [None]:
data.min()

In [None]:
data.max()

In [None]:
data = data/16

In [None]:
data.min()

In [None]:
data.max()

In [None]:
data[0]
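
The division by 16 works because the pixel intensities in this dataset range from 0 to 16. The sketch below shows one way to derive the divisor from the data rather than hard-coding it; `pixels`, `max_intensity`, and `scaled` are illustrative names, not part of the original notebook.

```python
from sklearn.datasets import load_digits

digits = load_digits()
pixels = digits.data  # shape (n_samples, 64), raw intensities

# Derive the divisor from the data instead of hard-coding 16.
max_intensity = pixels.max()
scaled = pixels / max_intensity

print(pixels.min(), pixels.max())  # raw range
print(scaled.min(), scaled.max())  # range after scaling
```

Tree-based models such as random forests split on thresholds, so this kind of rescaling is unlikely to change their predictions much. It tends to matter more for distance-based or gradient-based models.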

# Train Test Split Data

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(data, df.target, test_size=0.3)

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape
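
The split above draws a different random partition on every run. If repeatable results are wanted, one option is to pass `random_state`, and optionally `stratify` to keep the digit proportions similar in both parts. The following is a sketch under those assumptions; the helper `split` and the seed value `0` are hypothetical choices, not taken from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X = digits.data / 16
y = digits.target


def split(seed):
    """Return a stratified 70/30 split that is fixed by `seed`."""
    return train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )


X_train, X_test, y_train, y_test = split(0)
X_train_again, _, _, _ = split(0)

print(X_train.shape, X_test.shape)
# The same seed reproduces exactly the same partition.
print(np.array_equal(X_train, X_train_again))
```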

# Random Forest Model

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
rf = RandomForestClassifier()

In [None]:
rf.fit(X_train, y_train)

# Predict Test Data

In [None]:
y_pred = rf.predict(X_test)

In [None]:
y_pred
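
The fit and predict steps above can be collected into one repeatable run. This is a minimal sketch, assuming a fixed `random_state` of `0` for both the split and the forest (an illustrative choice, not a setting from the original notebook); it also shows `predict_proba`, which returns class probabilities averaged over the trees.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data / 16, digits.target, test_size=0.3, random_state=0
)

# Fixing random_state makes the trained forest repeatable.
rf = RandomForestClassifier(random_state=0)
rf.fit(X_train, y_train)

# Hard class predictions for the held-out images.
y_pred = rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")

# Per-class probabilities for the first test image (one column per digit).
proba = rf.predict_proba(X_test[:1])
print(proba.shape)
```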

# Model Evaluation

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

In [None]:
confusion_matrix(y_test, y_pred)

In [None]:
print(classification_report(y_test, y_pred))

# Explanation
---
1. **Confusion Matrix**: The confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
    
    In this case, the confusion matrix looks like this:
    
    ```
    [[47,  0,  0,  0,  1,  0,  0,  0,  0,  0],
     [ 0, 50, ... ],
     ...
     [ ..., ..., ..., ..., ..., ..., ..., ..., ..., 46]]
    ```
    
    This is a `10x10` matrix because there are `10` classes. The rows represent the actual class and the columns represent the predicted class.
    
    The diagonal elements represent the number of points for which the predicted label is equal to the true label. So for example `47` at position `(0,0)` indicates that there are `47` instances where the actual output was `0` and the model also correctly predicted them as `0`.
    
2. **Classification Report**: This report displays the precision, recall, F1 score and support for each class.
    
    - **Precision** is defined as: (True Positives)/(True Positives + False Positives). It represents how many of our model's positive predictions were actually positive.
        
    - **Recall** (also known as sensitivity) is defined as: (True Positives)/(True Positives + False Negatives). It represents how many of all actual positives were correctly identified by our model.
        
    - **F1-score** is the harmonic mean of precision and recall, 2 * (Precision * Recall)/(Precision + Recall). The best score is `1.0` and the worst score is `0`.
        
    - **Support** is the number of occurrences of each class in `y_true` (the actual test set labels).
        
    
    The 'macro average' computes each metric independently for every class and then takes the unweighted mean, so every class counts equally. The 'weighted average' also computes the metric per class, but averages the results weighted by support (the number of true instances for each label).
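
The definitions above can be worked through by hand on a small confusion matrix. The sketch below uses a made-up 3-class matrix (not output from this notebook) with rows as actual classes and columns as predicted classes, and uses only NumPy.

```python
import numpy as np

# Hypothetical 3x3 confusion matrix: rows = actual, columns = predicted.
cm = np.array([
    [5, 1, 0],
    [0, 4, 2],
    [1, 0, 7],
])

true_positives = np.diag(cm)    # correct predictions per class
predicted_counts = cm.sum(axis=0)  # column sums: TP + FP
support = cm.sum(axis=1)        # row sums: TP + FN

precision = true_positives / predicted_counts
recall = true_positives / support
f1 = 2 * precision * recall / (precision + recall)

# Macro average: plain mean over classes.
macro_f1 = f1.mean()
# Weighted average: mean over classes weighted by support.
weighted_f1 = np.average(f1, weights=support)

print("precision:", precision.round(3))
print("recall:   ", recall.round(3))
print("f1:       ", f1.round(3))
print("macro F1:", round(macro_f1, 3), "weighted F1:", round(weighted_f1, 3))
```

When the classes have similar support, as in this example, the macro and weighted averages end up close to each other; they diverge more when some classes are much rarer than others.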
    

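In this case, the classification report suggests that the model performs well across all classes, with an overall accuracy of `96%`. Precision, recall, and F1-score are also high (`>= 90%`) for every class, which indicates a good fit on the held-out test set. Because neither the split nor the model fixes a random seed, the exact figures vary slightly from run to run.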
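
A claim such as "every class scores at least 90%" can be checked programmatically by asking `classification_report` for a dictionary instead of text. This is a sketch on made-up labels (`y_true_toy` and `y_pred_toy` are hypothetical, not results from this notebook); the `threshold` value mirrors the 90% figure quoted above.

```python
from sklearn.metrics import classification_report

# Hypothetical labels for illustration only.
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 0, 1, 2, 2, 2]

# output_dict=True returns nested dicts keyed by class label (as strings).
report = classification_report(y_true_toy, y_pred_toy, output_dict=True)

threshold = 0.9
class_labels = ["0", "1", "2"]

# Collect classes where any of the three metrics falls below the threshold.
weak_classes = [
    label
    for label in class_labels
    if min(report[label][m] for m in ("precision", "recall", "f1-score")) < threshold
]

print("accuracy:", round(report["accuracy"], 3))
print("classes below threshold:", weak_classes)
```

With the notebook's own `y_test` and `y_pred`, the same check could be run with `class_labels` set to the ten digit labels as strings.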