# __MNIST Digit Classification Project__

#### This project trains a __Convolutional Neural Network (CNN)__, a type of neural network well suited to image tasks, to classify handwritten digits from the MNIST dataset. It uses __Pandas__ to load the data, __TensorFlow__ (Keras) to build and train the model, and __scikit-learn__ to split the data into training and validation sets.

### __Libraries Used:__
1. __pandas__ : Loads the data from the CSV file and separates the pixel features from the labels.

2. __tensorflow.keras.models.Sequential and other keras layers__: The __Sequential__ model builds a linear stack of layers. __Conv2D__ layers apply 2D convolutions (common in image processing). __MaxPooling2D__ layers downsample the feature maps. The __Flatten__ layer flattens the multi-dimensional feature maps into a 1D vector. __Dense__ layers are fully connected layers.

3. __tensorflow.keras.utils.to_categorical__: Converts integer labels into one-hot encoded vectors, the label format expected by the `categorical_crossentropy` loss used in this project.

4. __sklearn.model_selection.train_test_split__: It is used to split the preprocessed data into training and validation sets, allowing the model to be trained on one subset of the data and tested on another independent subset.
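
To make the one-hot encoding from item 3 concrete, here is a minimal NumPy sketch of the idea. It does not call `to_categorical`; the helper name `one_hot` and the sample labels are illustrative assumptions, not part of the project code.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return an array of shape (len(labels), num_classes) with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Set the column matching each label to 1.0, one row per sample
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([3, 0, 9])  # hypothetical digit labels
print(encoded.shape)
print(encoded[0])
```

Each row has exactly one non-zero entry, at the index of the digit it represents.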

In [16]:
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split



In [None]:
# If the CSV dataset is not available, MNIST can be loaded directly from Keras instead:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)


##### Dataset (MNIST in CSV, Kaggle): https://www.kaggle.com/datasets/oddrationale/mnist-in-csv/download?datasetVersionNumber=2

In [17]:

data = pd.read_csv('mnist_train.csv')  # Load the dataset
# Separate the features (pixel columns) from the labels
X = data.drop('label', axis=1).values
y = data['label'].values
# Reshape each 784-pixel row to 28x28x1 and scale pixel values to [0, 1]
X = X.reshape(-1, 28, 28, 1).astype('float32') / 255
# One-hot encode the digit labels
y = to_categorical(y)
# Hold out 20% of the rows for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
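
The reshape and scaling step above can be checked on a small stand-in array. The sketch below uses made-up pixel rows (the name `rows` and its values are assumptions for illustration) rather than the real CSV, and confirms that a flat 784-value row maps to a 28x28x1 image in row-major order with values in [0, 1].

```python
import numpy as np

# Two fake "CSV rows" of 784 pixel values in the range 0..255 (illustrative data only)
rows = np.arange(2 * 784).reshape(2, 784) % 256

# Same transformation as in the cell above: reshape to (N, 28, 28, 1) and scale
images = rows.reshape(-1, 28, 28, 1).astype("float32") / 255

print(images.shape)
print(images.min(), images.max())
# Flat index 28 is the first pixel of the second image row
print(images[0, 1, 0, 0] == np.float32(rows[0, 28]) / 255)
```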


In [13]:
# Create the model: two Conv2D + MaxPooling2D blocks followed by two Dense layers
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
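
As a cross-check of the architecture above, the feature-map sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults for these layers (stride 1 and `'valid'` padding for `Conv2D`, stride equal to `pool_size` for `MaxPooling2D`); the helper names and the dictionary keys are illustrative, not the names Keras assigns.

```python
def conv_out(size, kernel):
    # 'valid' padding with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool):
    # Non-overlapping pooling with 'valid' padding floors the division
    return size // pool

side = 28
side = conv_out(side, 3)   # after first Conv2D
side = pool_out(side, 2)   # after first MaxPooling2D
side = conv_out(side, 3)   # after second Conv2D
side = pool_out(side, 2)   # after second MaxPooling2D
flat = side * side * 64    # length of the Flatten output

# Parameters = weights + biases for each trainable layer
params = {
    "conv_1": 3 * 3 * 1 * 32 + 32,
    "conv_2": 3 * 3 * 32 * 64 + 64,
    "dense_1": flat * 128 + 128,
    "dense_2": 128 * 10 + 10,
}
total = sum(params.values())

print(side, flat)
print(params)
print(total)
```

Under these assumptions, most of the parameters sit in the first `Dense` layer; the figures can be compared against the output of `model.summary()`.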

In [14]:
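# Compile the model: cross-entropy loss for one-hot labels, Adam optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train for 10 epochs, reporting validation metrics after each epoch
model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_val, y_val))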
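
For a sense of how much work the `fit` call does, the number of gradient updates can be estimated from the split and batch size. This is a rough sketch that assumes `mnist_train.csv` contains 60,000 rows; that row count is an assumption about the file, not something stated in this notebook.

```python
import math

n_rows = 60000              # assumption: number of rows in mnist_train.csv
n_val = n_rows // 5         # test_size=0.2 holds out 20% for validation
n_train = n_rows - n_val    # remaining rows used for training

batch_size = 64
epochs = 10

# The last batch of an epoch may be smaller, hence the ceiling
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, n_val)
print(steps_per_epoch, total_updates)
```

The steps-per-epoch figure should match the step counter Keras prints in its progress bar during training.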


In [21]:
loss, accuracy = model.evaluate(X_val, y_val)  # Evaluate on the validation set
print(f'Validation accuracy: {accuracy:.5f}')


Validation accuracy: 0.99092
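
The accuracy metric compares the most probable class in each softmax output with the class marked in the one-hot label. A minimal NumPy sketch of that comparison follows; the probabilities and labels are made up for illustration and use three classes instead of ten to keep it short.

```python
import numpy as np

# Hypothetical softmax outputs for three samples over three classes
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
])
# Hypothetical one-hot labels for the same three samples
y_true = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
])

pred = probs.argmax(axis=1)    # predicted class per sample
true = y_true.argmax(axis=1)   # true class per sample
accuracy = float((pred == true).mean())

print(pred, true)
print(accuracy)
```

The same `argmax(axis=1)` step turns the output of `model.predict` into predicted digits.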


In [22]:
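import joblib
# Export the trained model. Pickling a Keras model with joblib depends on the installed
# TensorFlow/Keras version; Keras' own model.save(...) is the native alternative.
joblib.dump(model, 'trained_model_2.pkl')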

['trained_model_2.pkl']