# 🧠 Classifying Handwritten Digits with TensorFlow, PyTorch, and Keras

Welcome to this hands-on session where we'll teach a computer to read handwritten numbers! This is a classic first project in AI called image classification.

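We will build the *same network architecture* with three of the most popular deep learning toolkits: **TensorFlow**, **PyTorch**, and standalone **Keras**. A few details (batch size, input scaling, label encoding) differ between the versions, and seeing them side by side will give you a great feel for how each toolkit works.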

--- 

### 📘 Learning Objectives for Today:

By the end of this 2-hour session, you will be able to:
1.  **Understand** the task of classifying the MNIST handwritten digit dataset.
2.  **Load and prepare** image data for a neural network.
3.  **Implement** the same neural network model using TensorFlow/Keras, PyTorch, and standalone Keras.
4.  **Recognize** the key similarities and differences between these powerful frameworks.
5.  **Train and evaluate** each model to see how well it performs.

## The Task: The MNIST Dataset ✍️

Our goal is to classify images from the famous **MNIST dataset**. 
- It contains 60,000 images for training and 10,000 for testing.
- Each image is a small, 28x28 pixel grayscale picture of a handwritten digit (0 through 9).
- We will train a neural network to look at an image and predict which digit it is.
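
To make the data format concrete, here is a minimal sketch of what one example looks like: a grid of brightness values from 0 to 255, plus an integer label. The 5x5 grid and its values are made up for illustration; a real MNIST image is 28x28.

```python
# Hypothetical 5x5 stand-in for a 28x28 MNIST image of the digit 1.
# 0 is the black background, 255 is the brightest ink.
image = [
    [0,   0, 255,   0, 0],
    [0, 200, 255,   0, 0],
    [0,   0, 255,   0, 0],
    [0,   0, 255,   0, 0],
    [0, 180, 255, 180, 0],
]
label = 1  # the digit this image is supposed to show

height, width = len(image), len(image[0])
num_values = height * width  # a real MNIST image has 28 * 28 = 784 values

# Render bright pixels as '#' so the digit is visible in plain text.
rows = ["".join("#" if pixel > 127 else "." for pixel in row) for row in image]
print("\n".join(rows))
print(f"label={label}, values per image={num_values}")
```

The network never sees a "picture", only these numbers, and it learns to map them to the label.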

# Topic 1: TensorFlow (with the integrated Keras API)

**TensorFlow** is a powerful and flexible open-source library for machine learning developed by Google. It's often used with its user-friendly high-level API, **Keras**, which is now fully integrated into TensorFlow. This combination gives us both power and simplicity!

Let's build our digit classifier.

In [None]:
# for tensorflow
# pip install tensorflow -i https://pypi.tuna.tsinghua.edu.cn/simple
# for pytorch
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

In [2]:
# First, install TensorFlow by running this command in Anaconda Prompt:
# pip install tensorflow -i https://pypi.tuna.tsinghua.edu.cn/simple
# Step 1: Import TensorFlow
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Step 2: Load and preprocess the data
# The data is already included in TensorFlow!
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values from 0-255 to 0-1. This helps the network learn better.
x_train, x_test = x_train / 255.0, x_test / 255.0

print("Data loaded and preprocessed.")

Data loaded and preprocessed.
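
The normalization step is just a division. This framework-free sketch applies it to a few made-up pixel values to show where 0, 255, and the values in between end up.

```python
# Example grayscale intensities (made up); real MNIST pixels are also 0-255.
raw_pixels = [0, 51, 102, 255]

# Dividing by 255.0 maps the integer range 0-255 onto floats in 0.0-1.0.
scaled = [pixel / 255.0 for pixel in raw_pixels]

for raw, value in zip(raw_pixels, scaled):
    print(f"{raw:>3} -> {value:.2f}")
```

Keeping inputs in a small, consistent range tends to make gradient-based training better behaved, which is why this step appears in all three versions of the model.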


In [3]:
# Step 3: Build the model architecture
model_tf = Sequential([
    # This layer flattens the 28x28 image into a single 784-pixel line.
    Flatten(input_shape=(28, 28)),  
    
    # This is our hidden layer with 128 neurons. 'relu' is a common activation function.
    Dense(128, activation='relu'),   
    
    # This is the output layer. It has 10 neurons (one for each digit 0-9).
    # 'softmax' gives a probability for each digit.
    Dense(10, activation='softmax')   
])

print("TensorFlow model built successfully!")

TensorFlow model built successfully!
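
It can help to count how many numbers this small network actually learns. The sketch below does the arithmetic by hand; `dense_params` is a helper name introduced here for illustration, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Trainable values in a fully connected layer:
    one weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

flatten_outputs = 28 * 28                      # Flatten has no parameters
hidden = dense_params(flatten_outputs, 128)    # hidden layer
output = dense_params(128, 10)                 # output layer
total = hidden + output

print(f"hidden: {hidden}, output: {output}, total: {total}")
# hidden: 100480, output: 1290, total: 101770
```

If you call `model_tf.summary()`, the total it reports should agree with this count.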


In [4]:
# Step 4: Compile the model
# Here we define the optimizer, how to measure loss, and what metric to track.
model_tf.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Step 5: Train the model
# We show the model the training data for 5 'epochs' (5 full passes).
print("Starting training...")
model_tf.fit(x_train, y_train, epochs=5)
print("Training finished.")

# Step 6: Evaluate the model
# Let's see how well it performs on the test data it has never seen before.
print("\nEvaluating on test data:")
loss, accuracy = model_tf.evaluate(x_test, y_test, verbose=2)
print(f"\nTest Accuracy: {accuracy*100:.2f}%")

Starting training...
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 8s 3ms/step - accuracy: 0.9257 - loss: 0.2582
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.9667 - loss: 0.1134
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9763 - loss: 0.0784
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 13s 7ms/step - accuracy: 0.9817 - loss: 0.0588
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 20s 6ms/step - accuracy: 0.9857 - loss: 0.0461
Training finished.

Evaluating on test data:
313/313 - 2s - 6ms/step - accuracy: 0.9742 - loss: 0.0821

Test Accuracy: 97.42%
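
The `1875/1875` and `313/313` counters in the log count batches, not images. Since we did not pass a `batch_size`, Keras falls back to its default of 32. The sketch below reproduces both numbers; `steps_per_epoch` is a helper defined here for illustration.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once.
    The last batch may be smaller, hence the rounding up."""
    return math.ceil(num_samples / batch_size)

train_steps = steps_per_epoch(60_000, 32)  # training set
test_steps = steps_per_epoch(10_000, 32)   # test set

print(train_steps, test_steps)  # 1875 313
```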


### 🎯 Practice Task (TensorFlow/Keras)

Our model has one hidden layer with 128 neurons.

🧪 **Try this:** In the cell below, create a new model (you can call it `model_tf_2`) with **256 neurons** in the hidden layer. Then compile, train, and evaluate it. Does the accuracy improve?

In [5]:
# Your code here to build and train a model with 256 neurons in the Dense layer.

# Topic 2: PyTorch 🔥

**PyTorch** is another hugely popular open-source machine learning library, originally developed by Facebook's AI Research lab (now part of Meta). It's known for its flexibility and more 'Python-like' feel, which makes it a favorite in the research community.

You'll notice the process is a bit more manual, giving you more control.

In [7]:
# Step 1: Import PyTorch libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Step 2: Define transformations and load the data
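# We define how to prepare the data: ToTensor() converts each image to a tensor scaled to [0, 1],
# and Normalize((0.5,), (0.5,)) then shifts and rescales those values to [-1, 1].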
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

# PyTorch has built-in data loaders for MNIST.
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

print("PyTorch data loaded successfully.")

PyTorch data loaded successfully.
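
Note that the PyTorch inputs end up in a different range from the Keras ones. `Normalize` computes `(value - mean) / std` per channel, so with a mean and std of 0.5 the `[0, 1]` output of `ToTensor()` becomes `[-1, 1]`. A plain-Python sketch of that formula (the `normalize` function here is our own stand-in, not the torchvision class):

```python
def normalize(value, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize for a single value."""
    return (value - mean) / std

# Black, mid-gray, and white after ToTensor() has scaled them to [0, 1].
inputs = [0.0, 0.5, 1.0]
outputs = [normalize(v) for v in inputs]

print(outputs)  # [-1.0, 0.0, 1.0]
```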


In [8]:
# Step 3: Build the model architecture using a class
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28*28, 128) # Input is 784, output is 128
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10) # Input is 128, output is 10

    def forward(self, x):
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model_pytorch = Net()
print("PyTorch model built successfully!")

PyTorch model built successfully!
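
Unlike the Keras model, this `forward` method has no softmax at the end: it returns raw scores, usually called logits. That is intentional, because `nn.CrossEntropyLoss` expects logits and applies the log-softmax itself. The following is a minimal pure-Python sketch of that combination for a single example with made-up scores; it illustrates the math, not PyTorch's actual implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])

logits = [2.0, 0.5, -1.0]         # hypothetical scores for 3 classes
probs = softmax(logits)
loss_if_class_0 = cross_entropy(logits, 0)  # model is fairly confident and right
loss_if_class_2 = cross_entropy(logits, 2)  # model is confident and wrong

print([round(p, 3) for p in probs])
print(loss_if_class_0 < loss_if_class_2)  # True
```

The loss is small when the correct class already has the highest score and grows as the model puts its confidence elsewhere.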


In [9]:
# Step 4: Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_pytorch.parameters(), lr=0.001)

# Step 5: Train the model (The Training Loop)
# In PyTorch, you write the training loop yourself.
print("Starting training...")
for epoch in range(5):
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()    # Reset the gradients
        outputs = model_pytorch(images) # Forward pass
        loss = criterion(outputs, labels) # Calculate loss
        loss.backward()          # Backward pass (calculate gradients)
        optimizer.step()         # Update weights
        running_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}")
print("Training finished.")

Starting training...
Epoch 1, Loss: 0.39683890608010264
Epoch 2, Loss: 0.20958272183437082
Epoch 3, Loss: 0.1521850965702648
Epoch 4, Loss: 0.12384487968335338
Epoch 5, Loss: 0.10275054311077954
Training finished.
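
The five lines inside the loop follow a pattern that is easier to see on a toy problem. The sketch below fits a single weight to made-up data following `y = 3 * x`, with the gradient worked out by hand. It uses plain gradient descent rather than Adam, so treat it as an analogy for the loop's structure, not as what `optim.Adam` does internally.

```python
# Toy data following y = 3 * x; the single weight should approach 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
weight = 0.0
learning_rate = 0.05

for epoch in range(20):
    running_loss = 0.0
    for x, y in data:
        grad = 0.0                         # like optimizer.zero_grad()
        prediction = weight * x            # forward pass
        loss = (prediction - y) ** 2       # squared-error loss
        grad = 2 * (prediction - y) * x    # like loss.backward(): d(loss)/d(weight)
        weight -= learning_rate * grad     # like optimizer.step()
        running_loss += loss
    average_loss = running_loss / len(data)

print(f"learned weight: {weight:.4f}, last average loss: {average_loss:.6f}")
```

PyTorch's autograd computes `grad` for every parameter automatically, which is what makes the same loop work for a network with over a hundred thousand weights.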


In [10]:
# Step 6: Evaluate the model
print("\nEvaluating on test data:")
correct = 0
total = 0
with torch.no_grad(): # We don't need to calculate gradients during evaluation
    for images, labels in test_loader:
        outputs = model_pytorch(images)
        # The predicted digit is the index of the largest output score.
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f'Test Accuracy: {accuracy:.2f}%')


Evaluating on test data:
Test Accuracy: 96.57%
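
The evaluation logic boils down to "pick the highest score, compare with the label, count the matches". Here is a pure-Python sketch of that on one made-up batch of four examples with three classes; the scores and labels are invented for illustration.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical model outputs (one row of scores per image) and true labels.
batch_outputs = [
    [0.1, 2.3, -0.5],
    [1.7, 0.2, 0.9],
    [-1.0, 0.0, 3.2],
    [0.4, 0.3, 0.2],
]
batch_labels = [1, 0, 2, 1]

predicted = [argmax(scores) for scores in batch_outputs]
correct = sum(p == t for p, t in zip(predicted, batch_labels))
accuracy = 100 * correct / len(batch_labels)

print(predicted)                  # [1, 0, 2, 0]
print(f"Accuracy: {accuracy:.2f}%")  # Accuracy: 75.00%
```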


### 🎯 Practice Task (PyTorch)

The training loop ran for 5 epochs.

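🧪 **Try this:** Copy the training and evaluation code into the cell below. Create a fresh model and optimizer first (`model_pytorch = Net()` and a new `optim.Adam(...)`); otherwise training simply continues from weights that have already seen 5 epochs. Then change `range(5)` to `range(3)` to train for only 3 epochs. How does this affect the final accuracy?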

In [11]:
# Your code here to train and evaluate the PyTorch model for 3 epochs.

# Topic 3: Keras (Standalone) ✨

**Keras** started as a high-level, user-friendly API that could run on top of different backends like TensorFlow, Theano, or CNTK. It's known for making model building fast and easy.

While it is now integrated into TensorFlow, you can still use it as a separate library. The code will look *very* similar to our first TensorFlow example, with some minor differences.

In [12]:
# Step 1: Import Keras libraries
# Note: You might need to install keras separately (`pip install keras`)
# if it's not already part of your TensorFlow installation.
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.datasets import mnist
from keras.utils import to_categorical

# Step 2: Load and preprocess the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize the pixel values
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# One-hot encode the labels. (e.g., 5 -> [0,0,0,0,0,1,0,0,0,0])
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

print("Standalone Keras data loaded and preprocessed.")

Standalone Keras data loaded and preprocessed.


In [13]:
# Step 3 & 4: Build and Compile the model
# This part is almost identical to the tf.keras example!
model_keras = Sequential()
model_keras.add(Flatten(input_shape=(28, 28)))
model_keras.add(Dense(128, activation='relu'))
model_keras.add(Dense(10, activation='softmax'))

# A key difference from the first example is the loss function.
# Since we one-hot encoded the labels, we use 'categorical_crossentropy'.
model_keras.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Standalone Keras model built and compiled!")

Standalone Keras model built and compiled!
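
The switch from `sparse_categorical_crossentropy` to `categorical_crossentropy` changes the label format, not the quantity being minimized. This pure-Python sketch, using made-up probabilities for a 4-class problem, shows both forms giving the same number; the function names are our own, not Keras APIs.

```python
import math

def one_hot(label, num_classes):
    """Integer label -> vector with a single 1.0 at the label's position."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def sparse_cross_entropy(probs, label):
    """Loss when the label is an integer class index."""
    return -math.log(probs[label])

def categorical_cross_entropy(probs, one_hot_label):
    """Loss when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

probs = [0.05, 0.05, 0.70, 0.20]  # hypothetical softmax output
label = 2

sparse = sparse_cross_entropy(probs, label)
categorical = categorical_cross_entropy(probs, one_hot(label, 4))

print(one_hot(5, 10))
print(abs(sparse - categorical) < 1e-12)  # True
```

So the choice between the two comes down to how your labels are stored.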


In [14]:
# Step 5 & 6: Train and Evaluate the model
print("Starting training...")
model_keras.fit(x_train, y_train, epochs=5, batch_size=128)

print("\nEvaluating on test data:")
score = model_keras.evaluate(x_test, y_test)
print(f'Test loss: {score[0]:.4f}')
print(f'Test accuracy: {score[1]*100:.2f}%')

Starting training...
Epoch 1/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.9009 - loss: 0.3578
Epoch 2/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9518 - loss: 0.1696
Epoch 3/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9659 - loss: 0.1196
Epoch 4/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.9740 - loss: 0.0910
Epoch 5/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9786 - loss: 0.0734

Evaluating on test data:
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9725 - loss: 0.0889
Test loss: 0.0889
Test accuracy: 97.25%


### 🎯 Practice Task (Standalone Keras)

The `batch_size` determines how many images the model looks at before updating its weights. We used 128.

🧪 **Try this:** Rebuild and recompile the model so that training starts from fresh weights, then change the `batch_size` in the `model_keras.fit()` call to `32`. How does this affect the training time per epoch and the final accuracy? (Smaller batches mean more weight updates per epoch, which often takes longer but can sometimes improve accuracy.)

In [None]:
# Your code here to train the Keras model with a batch_size of 32.

## 🚀 Final Revision Assignment

Congratulations! You've successfully built and trained an image classifier in three different ways. You've seen that the core ideas are universal:

**Load Data -> Define Model -> Train Model -> Evaluate Model**

Here are a few challenges to solidify your understanding. Try to solve at least three!

1.  **Deeper Model:** Go back to the first **TensorFlow/Keras** model. Add a *second* hidden `Dense` layer with 64 neurons (and a `'relu'` activation) between the existing 128-neuron layer and the final output layer. Does making the model 'deeper' improve performance?

2.  **Optimizer Choice:** In all three models, we used the 'Adam' optimizer. Research another optimizer like `'sgd'` (Stochastic Gradient Descent). Try swapping `optimizer='adam'` with `optimizer='sgd'` in one of the Keras models. How does it affect the training and final accuracy?

3.  **PyTorch Learning Rate:** The `lr=0.001` in the PyTorch optimizer is the learning rate. This controls how big a step the optimizer takes on each update. Change it to a much larger value like `lr=0.1`. What happens? Does the model fail to learn?

4.  **Data Normalization Impact:** We divided the pixel values by 255 to scale them between 0 and 1. Remove this division in one of the Keras models and retrain it. Does the model still work? (Hint: training is usually less stable, and accuracy often suffers.)

5.  **Predict a Single Image:** After training one of the models, write code to select just *one* image from the `x_test` dataset, make a prediction on it, and print both the predicted digit and the correct digit from `y_test`. (In the standalone Keras example, `y_test` is one-hot encoded, so you will need `argmax` to recover the digit.)
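
As a warm-up for challenge 3, this toy sketch shows why a learning rate that is too large can stop a model from learning. It minimizes `f(w) = w**2` with plain gradient descent. The function and the two rates are chosen for illustration only; the point at which a real network becomes unstable depends on the model and optimizer, so it will not match these numbers.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(w) = w**2, whose minimum is at w = 0."""
    w = start
    for _ in range(steps):
        gradient = 2 * w              # derivative of w**2
        w -= learning_rate * gradient
    return w

small_rate_result = descend(0.1)  # each step shrinks w by a factor of 0.8
large_rate_result = descend(1.1)  # each step overshoots: w is multiplied by -1.2

print(f"lr=0.1 -> w = {small_rate_result:.4f}")
print(f"lr=1.1 -> w = {large_rate_result:.4f}")
```

With the small rate, `w` creeps toward the minimum; with the large one, every step jumps past it and lands farther away than before.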

## ✅ Well Done!

You've taken a massive step in your AI journey. Understanding how to implement models in different frameworks is a critical skill. The best way to learn is by experimenting, so feel free to change parameters and see what happens.

Keep learning and stay curious!