
**Handwritten Digit Prediction**

**Objective**

The objective of handwritten digit prediction is to develop a machine learning model that accurately classifies images of handwritten digits (0-9). The task underpins applications such as optical character recognition (OCR), document digitization, and automated form processing.

**Data Source**

Handwritten digit models, such as convolutional neural networks (CNNs), are trained on labeled image datasets like MNIST or USPS and learn to classify new digits from the patterns in that training data. This notebook mainly uses MNIST as shipped with Keras (60,000 training and 10,000 test images, each 28x28 grayscale pixels); a few cells instead use scikit-learn's built-in 8x8 digits dataset or the OpenML copy of MNIST (`mnist_784`).

**Import Library**


In [None]:
# Importing necessary libraries
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten


**Import DataSet**

In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize the pixel values to be between 0 and 1
x_train, x_test = x_train / 255.0, x_test / 255.0

# Print the shapes of the datasets
print("Training data shape:", x_train.shape)  # (60000, 28, 28) - 60000 samples of 28x28 images
print("Training labels shape:", y_train.shape)  # (60000,) - 60000 labels (one for each image)

print("Test data shape:", x_test.shape)  # (10000, 28, 28) - 10000 samples of 28x28 images
print("Test labels shape:", y_test.shape)  # (10000,) - 10000 labels (one for each image)
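
The normalization step can be tried without downloading MNIST. The sketch below uses a small synthetic batch (random pixels, not real digits) with the same layout as the MNIST arrays; the names `images` and `scaled` are illustrative only.

```python
import numpy as np

# Synthetic stand-in for MNIST: 4 grayscale 28x28 images stored as
# unsigned 8-bit integers (0 = darkest, 255 = brightest).
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Cast to float32 first, then scale. Dividing the uint8 array by 255.0
# directly also works but yields float64, which doubles the memory used.
scaled = images.astype("float32") / 255.0

print(images.dtype, images.min(), images.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

After scaling, every pixel lies in the interval [0, 1], which is the input range the models in this notebook are trained on.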


**Describe Data**


In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.utils import to_categorical

# Load and preprocess data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Build the model
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=5, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc}')
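
As a rough check on the CNN defined in the cell above, the arithmetic below traces the tensor shape through each layer and counts trainable parameters. It assumes the Keras defaults for `Conv2D` (stride 1, `'valid'` padding) and `MaxPooling2D` (stride equal to the pool size); the variable names are illustrative.

```python
# Input: 28x28 grayscale image, one channel.
height = 28
channels = 1

# Conv2D(32, kernel_size=(3, 3)) with 'valid' padding and stride 1.
filters, kernel = 32, 3
conv_side = height - kernel + 1                            # 28 - 3 + 1
conv_params = (kernel * kernel * channels + 1) * filters   # weights + 1 bias per filter

# MaxPooling2D(pool_size=(2, 2)) halves each spatial dimension.
pool = 2
pool_side = conv_side // pool

# Flatten turns the pooled feature maps into one vector per image.
flat_units = pool_side * pool_side * filters

# Dense(128) and Dense(10): one weight per input-output pair plus biases.
dense_params = flat_units * 128 + 128
output_params = 128 * 10 + 10

total_params = conv_params + dense_params + output_params
print("conv output side:", conv_side)
print("pooled output side:", pool_side)
print("flattened units:", flat_units)
print("total trainable parameters:", total_params)
```

Most of the parameters sit in the first dense layer, which is why adding further convolution and pooling stages before `Flatten` tends to shrink the model.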



**Data Visualization**


In [None]:
import matplotlib.pyplot as plt
from sklearn import datasets

# Load scikit-learn's built-in digits dataset (8x8 images, not MNIST)
digits = datasets.load_digits()

# Create a figure to display the digits
fig = plt.figure(figsize=(8, 8))

# Plot several digits
for i in range(10):
    ax = fig.add_subplot(2, 5, i + 1)
    ax.matshow(digits.images[i], cmap='gray_r')
    ax.set_title(f"Digit: {digits.target[i]}")
    ax.axis('off')

plt.tight_layout()
plt.show()



**Data Preprocessing**


In [None]:
import numpy as np
from tensorflow.keras.datasets import mnist  # If using TensorFlow

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize the pixel values to be between 0 and 1
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# Flatten the images from 28x28 to 784-dimensional vectors
X_train = X_train.reshape((X_train.shape[0], 28*28))
X_test = X_test.reshape((X_test.shape[0], 28*28))

# Print the shape of the data for verification
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)
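
The flattening step only changes how the pixels are indexed, not their values. A minimal sketch on a synthetic batch (the array contents are made up for illustration) shows where each pixel ends up and that the original shape can be recovered:

```python
import numpy as np

# Two synthetic 28x28 "images" whose pixel values are simply 0, 1, 2, ...
batch = np.arange(2 * 28 * 28, dtype="float32").reshape(2, 28, 28)

# Flatten each image into a 784-element row; -1 lets NumPy infer 28*28.
flat = batch.reshape(batch.shape[0], -1)

# With NumPy's default row-major order, the pixel at (row, col)
# lands at position row * 28 + col in the flattened vector.
row, col = 5, 7
print(flat.shape, flat[0, row * 28 + col] == batch[0, row, col])

# Reshaping back restores the original image layout.
restored = flat.reshape(-1, 28, 28)
```

Flattening discards the explicit 2D neighborhood structure, so it suits dense networks; convolutional layers expect the unflattened `(28, 28, 1)` layout instead.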


**Define Target Variable (y) and Feature Variables (X)**

In [None]:
import numpy as np
from sklearn.datasets import fetch_openml

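# Load the MNIST dataset from OpenML as NumPy arrays (downloads on first use)
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
# OpenML stores the labels as strings, so convert them to integers
X, y = mnist.data, mnist.target.astype(int)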

# Print the shapes of X and y
print("Shape of X:", X.shape)  # Shape of X: (70000, 784) -> 70000 samples, 784 features (pixels)
print("Shape of y:", y.shape)  # Shape of y: (70000,) -> 70000 labels (digits)

# Optionally, you can normalize the pixel values to be between 0 and 1
X = X / 255.0


**Train Test Split**




In [None]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the dataset
digits = load_digits()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2, random_state=42)

# Print the shapes of the training and testing sets
print("Training set shape:", X_train.shape, y_train.shape)
print("Testing set shape:", X_test.shape, y_test.shape)
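
To make the split sizes concrete, here is a NumPy-only sketch of a shuffled hold-out split. It is not scikit-learn's implementation and will not reproduce the same indices as `train_test_split`; the helper `shuffled_split` is hypothetical, and 1797 is the number of samples in scikit-learn's digits dataset.

```python
import numpy as np

def shuffled_split(n_samples, test_fraction, seed):
    """Return (train_indices, test_indices) for a random hold-out split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)           # shuffle 0..n_samples-1
    n_test = int(np.ceil(test_fraction * n_samples))
    return order[n_test:], order[:n_test]

train_idx, test_idx = shuffled_split(1797, test_fraction=0.2, seed=42)
print("train:", len(train_idx), "test:", len(test_idx))

# No sample may appear in both sets, otherwise test accuracy is inflated.
overlap = np.intersect1d(train_idx, test_idx)
print("overlap:", overlap.size)
```

Fixing the seed, like `random_state=42` in the cell above, makes the split repeatable across runs.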


**Modeling**



In [None]:
# Importing necessary libraries
import tensorflow as tf
from tensorflow.keras import layers, models, datasets
import matplotlib.pyplot as plt

# Load the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()

# Preprocess the data
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")

# Plot training history
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
plt.show()
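
The output layer and loss used above can be sketched in plain NumPy. This is an illustration of what a softmax activation and sparse categorical cross-entropy compute, not the Keras implementation; the logits and labels are made-up values for three classes.

```python
import numpy as np

def softmax(logits):
    """Convert each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true integer labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.2, 3.0]])
labels = np.array([0, 2])          # integer class ids, not one-hot vectors

probs = softmax(logits)
loss = sparse_categorical_crossentropy(probs, labels)
print(probs.round(3))
print("loss:", round(loss, 4))
```

The "sparse" variant takes integer labels directly, which is why this cell skips the `to_categorical` one-hot step that `categorical_crossentropy` requires.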

**Model Evaluation**



In [None]:
# Import necessary libraries
from sklearn.metrics import classification_report, accuracy_score
import numpy as np

# Placeholder labels for illustration only; in practice, use the test labels
# and the model's predictions, e.g. model.predict(x_test).argmax(axis=1)
y_true = np.array([3, 8, 1, 0, 7])  # Example true labels
y_pred = np.array([3, 8, 1, 1, 7])  # Example predicted labels

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy: {accuracy}')

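# Generate classification report (zero_division=0 avoids a warning for
# classes that never appear in y_pred, such as 0 in this example)
report = classification_report(y_true, y_pred, zero_division=0)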
print('Classification Report:\n', report)
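
A confusion matrix shows which digits get mistaken for which, and accuracy is its diagonal sum divided by the total. The sketch below builds one with NumPy from the same placeholder labels as the cell above; it is a hand-rolled illustration rather than scikit-learn's `confusion_matrix`.

```python
import numpy as np

y_true = np.array([3, 8, 1, 0, 7])  # example true labels
y_pred = np.array([3, 8, 1, 1, 7])  # example predicted labels

# Rows are true digits, columns are predicted digits.
n_classes = 10
confusion = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)   # count each (true, predicted) pair

# Correct predictions sit on the diagonal.
accuracy = np.trace(confusion) / confusion.sum()

print("true 0 predicted as 1:", confusion[0, 1])
print("accuracy:", accuracy)
```

Off-diagonal cells point at specific confusions, which a single accuracy figure hides.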



**Prediction**


In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Load and preprocess the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the model
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('\nTest accuracy:', test_acc)

# Make predictions
predictions = model.predict(x_test[:5])
predicted_labels = [tf.argmax(prediction).numpy() for prediction in predictions]

print('Predicted labels:', predicted_labels)
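
The per-row `argmax` above can also be done in one vectorized call, which additionally makes it easy to read off how confident each prediction is. The probabilities below are invented stand-ins for the output of `model.predict`, and the names `probs`, `labels`, and `confidence` are illustrative.

```python
import numpy as np

# Made-up softmax outputs for 3 images over 10 digit classes.
# Each row sums to 1: nine entries of 0.02 plus one of 0.82.
probs = np.full((3, 10), 0.02)
probs[0, 7] = 0.82
probs[1, 2] = 0.82
probs[2, 1] = 0.82

labels = probs.argmax(axis=1)      # index of the most probable digit per row
confidence = probs.max(axis=1)     # probability assigned to that digit

for label, conf in zip(labels, confidence):
    print(f"predicted digit {label} with probability {conf:.2f}")
```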



**Explanation**


Handwritten digit prediction shows how machine learning models can interpret and classify image data. Deep learning, and CNNs in particular, has pushed test accuracy on standard benchmarks such as MNIST above 99%, a level often described as comparable to human performance. The task also has practical value wherever handwritten digits must be read automatically, for example in document digitization and form processing.