# MNIST Handwritten Digits Dataset

This notebook loads the MNIST dataset of handwritten digits and explores its shapes, sample images, and label distribution.

## Load the MNIST Dataset

MNIST contains 70,000 grayscale images of handwritten digits (0-9):
- 60,000 training images
- 10,000 test images
- Each image is 28x28 pixels

In [None]:
# Load MNIST through the Keras datasets module that ships with TensorFlow
from tensorflow.keras.datasets import mnist

# Load the data (train and test sets)
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print(f"Training images shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test images shape: {X_test.shape}")
print(f"Test labels shape: {y_test.shape}")
print(f"\nPixel value range: {X_train.min()} to {X_train.max()}")
# Convert to plain Python ints so the list prints as [0, 1, ..., 9]
print(f"Unique labels: {sorted(set(y_train.tolist()))}")
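
The same checks can be wrapped in a small helper, which is convenient when comparing the training and test splits. This is only a minimal sketch: `summarize_split` is a hypothetical name, and the demo arrays are placeholders that merely assume MNIST's layout (`uint8`, 28x28) so the code runs without downloading the data.

```python
import numpy as np

def summarize_split(images, labels):
    """Return basic shape, dtype, and value-range facts for one data split."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    return {
        "num_images": int(images.shape[0]),
        "image_shape": tuple(images.shape[1:]),
        "dtype": str(images.dtype),
        "pixel_min": int(images.min()),
        "pixel_max": int(images.max()),
        "classes": np.unique(labels).tolist(),
    }

# Placeholder data shaped like a tiny MNIST split (assumption: uint8, 28x28)
demo_images = np.zeros((20, 28, 28), dtype=np.uint8)
demo_images[:, 10:18, 10:18] = 255  # a bright square standing in for ink
demo_labels = np.arange(20, dtype=np.uint8) % 10

summary = summarize_split(demo_images, demo_labels)
print(summary)
```

With the real data, `summarize_split(X_train, y_train)` and `summarize_split(X_test, y_test)` would report the two splits side by side.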

## Visualize Sample Images

Let's look at some sample handwritten digits from the training set.

In [None]:
# Visualize sample images from the dataset
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 5, figsize=(12, 5))
fig.suptitle('Sample MNIST Handwritten Digits', fontsize=16)

for i, ax in enumerate(axes.flat):
    ax.imshow(X_train[i], cmap='gray')
    ax.set_title(f'Label: {y_train[i]}')
    ax.axis('off')

plt.tight_layout()
plt.show()
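
The first ten training images are not guaranteed to cover every digit. One way to pick exactly one example per class is to look up the first occurrence of each label. The sketch below is an assumption-laden illustration: `first_index_per_digit` is a hypothetical helper, and `demo_labels` is a made-up label array rather than the real `y_train`.

```python
import numpy as np

def first_index_per_digit(labels, num_classes=10):
    """Map each digit to the index of its first occurrence in labels."""
    labels = np.asarray(labels)
    indices = {}
    for digit in range(num_classes):
        matches = np.flatnonzero(labels == digit)
        if matches.size:  # skip digits that never appear
            indices[digit] = int(matches[0])
    return indices

# Made-up labels for illustration only
demo_labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4,
                        3, 5, 3, 6, 1, 7, 2, 8, 6, 9])
picks = first_index_per_digit(demo_labels)
print(picks)
```

With the real data, `X_train[picks[d]]` could stand in for `X_train[i]` in the plotting loop so that each subplot shows a different digit.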

## Examine a Single Image

Let's look at the first image in detail to understand the data structure.

In [None]:
# Display a single image and its properties
sample_image = X_train[0]
sample_label = y_train[0]

print(f"Image shape: {sample_image.shape}")
print(f"Label: {sample_label}")
print("\nPixel values (first 5x5 corner):")
print(sample_image[:5, :5])

# Visualize the image
plt.figure(figsize=(6, 6))
plt.imshow(sample_image, cmap='gray')
plt.title(f'Label: {sample_label}', fontsize=16)
plt.colorbar()
plt.show()
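
Because the digits sit near the center of the frame, a corner slice is usually all background. To inspect pixel values where the stroke actually is, one option is to crop to the bounding box of the nonzero pixels. This is a sketch under stated assumptions: `ink_bounding_box` is a hypothetical helper, and `demo_image` is a synthetic 28x28 array, not a real MNIST digit.

```python
import numpy as np

def ink_bounding_box(image):
    """Return (row_start, row_stop, col_start, col_stop) of nonzero pixels.

    The stop values are exclusive, so they can be used directly in slices.
    Returns None for a completely blank image.
    """
    rows = np.flatnonzero(image.any(axis=1))
    cols = np.flatnonzero(image.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1

# Synthetic image: a crude vertical stroke on a black background
demo_image = np.zeros((28, 28), dtype=np.uint8)
demo_image[6:22, 12:16] = 200

box = ink_bounding_box(demo_image)
r0, r1, c0, c1 = box
crop = demo_image[r0:r1, c0:c1]
print(box, crop.shape)
```

Applied to `sample_image`, the crop would show the stroke's intensity values instead of a block of zeros.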

## Data Statistics

Let's analyze how the ten digit classes are distributed across the training set.

In [None]:
# Analyze label distribution
import numpy as np

# Count occurrences of each digit
unique, counts = np.unique(y_train, return_counts=True)

print("Training set label distribution:")
for digit, count in zip(unique, counts):
    print(f"Digit {digit}: {count} images ({count/len(y_train)*100:.2f}%)")

# Visualize distribution
plt.figure(figsize=(10, 5))
plt.bar(unique, counts, color='steelblue', alpha=0.7)
plt.xlabel('Digit', fontsize=12)
plt.ylabel('Number of Images', fontsize=12)
plt.title('Distribution of Digits in Training Set', fontsize=14)
plt.xticks(unique)
plt.grid(axis='y', alpha=0.3)
plt.show()
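
The bar chart gives a visual impression of class balance. A single number can summarize it as well, for example the ratio between the most and least frequent class. The sketch below uses `np.bincount`. The helper names `label_fractions` and `imbalance_ratio` are hypothetical, and `demo_labels` is synthetic data, not the real label counts.

```python
import numpy as np

def label_fractions(labels, num_classes=10):
    """Return the fraction of samples belonging to each class."""
    counts = np.bincount(np.asarray(labels), minlength=num_classes)
    return counts / counts.sum()

def imbalance_ratio(labels, num_classes=10):
    """Return max class count divided by min class count (inf if a class is empty)."""
    counts = np.bincount(np.asarray(labels), minlength=num_classes)
    if counts.min() == 0:
        return float("inf")
    return float(counts.max() / counts.min())

# Synthetic labels: 100 samples with digit 0 slightly over-represented
# and digit 5 slightly under-represented
demo_counts = [12, 10, 10, 10, 10, 8, 10, 10, 10, 10]
demo_labels = np.repeat(np.arange(10), demo_counts)

fractions = label_fractions(demo_labels)
ratio = imbalance_ratio(demo_labels)
print(fractions)
print(f"Imbalance ratio: {ratio:.2f}")
```

A ratio close to 1 indicates roughly balanced classes. Running the same helpers on `y_train` and `y_test` would also show whether the two splits have similar distributions.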