In [1]:
# import libraries
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
import copy
from sklearn.model_selection import train_test_split


# NEW! for importing data
import torchvision

import matplotlib.pyplot as plt
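import matplotlib_inline.backend_inline
# render figures as SVG (replaces the deprecated display.set_matplotlib_formats)
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')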


In [2]:
# use GPU if available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Import the data

The code snippet demonstrates how to download the EMNIST dataset, specifically its letters and digits splits, using PyTorch's `torchvision.datasets` module. The EMNIST dataset extends the original MNIST dataset to include handwritten letters, making it a valuable resource for tasks that require recognizing both digits and letters. Here's how the dataset downloading process is carried out for both subsets:

### Dataset Downloading Process
- **Letter Dataset**: The `torchvision.datasets.EMNIST` class is instantiated with `split='letters'`, which selects the letters part of EMNIST. The `root` parameter names the directory (`'emnist'`) where the files are saved, and `download=True` downloads the data if it is not already present in that directory. The resulting dataset object is stored in the variable `letterdata`.
- **Number Dataset**: The digits part is obtained the same way with `split='digits'`; all other arguments are identical. The variable `numberdata` holds the resulting dataset object.

### Utility
Downloading both the letters and digits splits of the EMNIST dataset prepares the data for tasks involving the recognition of handwritten characters beyond the original MNIST digits. This capability is especially useful in applications that require a broader understanding of handwritten text, such as automated form processing or educational software that assists in handwriting learning.

This straightforward method of accessing and downloading standardized datasets underscores the convenience provided by PyTorch and its accompanying `torchvision` module, facilitating quick setup for machine learning and deep learning experiments.
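
The two calls described above can be sketched as follows. This is a minimal sketch, not the notebook's original cell: the helper name `load_emnist_splits` is hypothetical, and it assumes a torchvision release whose EMNIST download location is still reachable.

```python
def load_emnist_splits(root="emnist", download=True):
    """Return the EMNIST 'letters' and 'digits' training splits.

    Hypothetical helper: the name and the defaults are illustrative only.
    """
    # Imported lazily so that defining this helper does not require torchvision.
    from torchvision.datasets import EMNIST

    # split='letters' -> 26 letter classes plus an 'N/A' placeholder class
    letterdata = EMNIST(root=root, split="letters", download=download)

    # split='digits' -> the 10 digit classes
    numberdata = EMNIST(root=root, split="digits", download=download)

    return letterdata, numberdata
```

Calling `letterdata, numberdata = load_emnist_splits()` would produce the two variables that the later cells in this notebook refer to.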


In [4]:
pip install --upgrade torchvision


Collecting torchvision
  Downloading torchvision-0.17.2-cp310-cp310-win_amd64.whl (1.2 MB)
     ---------------------------------------- 1.2/1.2 MB 6.1 MB/s eta 0:00:00
Collecting torch==2.2.2
  Downloading torch-2.2.2-cp310-cp310-win_amd64.whl (198.6 MB)
     ------------------------------------- 198.6/198.6 MB 4.8 MB/s eta 0:00:00
Collecting typing-extensions>=4.8.0
  Downloading typing_extensions-4.11.0-py3-none-any.whl (34 kB)
Installing collected packages: typing-extensions, torch, torchvision
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.4.0
    Uninstalling typing_extensions-4.4.0:
      Successfully uninstalled typing_extensions-4.4.0
  Attempting uninstall: torch
    Found existing installation: torch 2.1.0
    Uninstalling torch-2.1.0:
      Successfully uninstalled torch-2.1.0
Note: you may need to restart the kernel to use updated packages.


ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\fafzali\\AppData\\Local\\anaconda3\\Lib\\site-packages\\~orch\\lib\\asmjit.dll'
Consider using the `--user` option or check the permissions.



In [6]:
# Import the requests module to handle HTTP requests
import requests

# The URL from which the EMNIST dataset will be downloaded
url = 'https://cloudstor.aarnet.edu.au/plus/s/ZNmuFiuQTqZlu9W/download'

# Send a GET request to the specified URL
response = requests.get(url)

# Ensure the request was successful (HTTP status code 200)
if response.status_code == 200:
    # Open a file in binary write mode. The 'wb' parameter is crucial for binary files.
    with open('/mnt/data/emnist.zip', 'wb') as file:
        # Write the content of the response to the file.
        file.write(response.content)
    print("Download successful. File saved as '/mnt/data/emnist.zip'.")
else:
    print(f"Failed to download the file. HTTP status code: {response.status_code}")


Failed to download the file. HTTP status code: 410


In [7]:
from torchvision.datasets import EMNIST

emnist = EMNIST(root='./emnist_data/', split = 'letters', train=True, transform=None, download=True).data.numpy()

indexes = np.arange(len(emnist))
np.random.shuffle(indexes)

N = 10  # hypothetical sample count; N was not defined in the original cell
samples = emnist[indexes[:N]]

Downloading https://www.nist.gov/itl to ./emnist_data/EMNIST\raw\gzip.zip


100%|████████████████████████████████████████████████████████████████████████████| 112768/112768 [00:00<00:00, 2189737.38it/s]


RuntimeError: File not found or corrupted.

In [8]:
trainset = torchvision.datasets.EMNIST(root='Datasets/EMNIST', split='letters', train=True, download=True)
testset = torchvision.datasets.EMNIST(root='Datasets/EMNIST', split='letters', train=False, download=True)

Downloading https://www.nist.gov/itl to Datasets/EMNIST\EMNIST\raw\gzip.zip


100%|████████████████████████████████████████████████████████████████████████████| 112768/112768 [00:00<00:00, 1499477.77it/s]


RuntimeError: File not found or corrupted.

The code snippet details the process of applying transformations to the EMNIST letters dataset, preparing it for use with convolutional neural networks (CNNs) in PyTorch. This process includes data cleaning, tensor reshaping, normalization, dataset splitting, and DataLoader creation. Here's a breakdown of each step:

### Data Preparation and Transformations
- **Removing N/A Class**: The class list of the EMNIST letters split begins with a placeholder 'N/A' entry, so the labels run from 1 to 26. The placeholder is dropped by keeping every class except the first (`letterCategories = letterdata.classes[1:]`), and the labels are shifted down by one to the range 0 to 25 (`labels = copy.deepcopy(letterdata.targets)-1`).

- **Tensor Reshaping and Normalization**: The images are reshaped into a 4-dimensional tensor suitable for CNNs, with dimensions representing the number of images, number of channels (1 for grayscale images), and image height and width (28x28 pixels). The data type is also converted from `uint8` to `float`, and the pixel values are normalized to a range of `[0, 1]` by dividing by the maximum pixel value in the dataset.

### Dataset Splitting
- The dataset is split into training and testing sets using `train_test_split`, with 10% of the data reserved for testing. This split allows for the evaluation of the model on unseen data.

### DataLoader Creation
- **PyTorch Datasets**: The training and testing sets are wrapped into `TensorDataset` objects, pairing the images with their corresponding labels. This encapsulation facilitates the handling of data and labels together.
  
- **DataLoaders**: DataLoader objects for the training and testing sets are instantiated (`letter_train_loader` and `letter_test_loader`). For the training DataLoader, `batch_size` is set to 32, `shuffle=True` ensures the data is shuffled to promote model generalization, and `drop_last=True` discards the last batch if it contains fewer samples than the specified batch size. The testing DataLoader retrieves the entire test set in a single batch, which is practical for model evaluation purposes.

### Utility
This procedure showcases a comprehensive approach to dataset preparation for deep learning tasks, emphasizing the importance of data cleaning, normalization, and batching. By converting the EMNIST letters dataset into a format compatible with CNN architectures and organizing it into manageable batches, the code lays the groundwork for efficient model training and evaluation.
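
The label shift, reshape, and normalization steps can be checked without downloading anything. The sketch below uses synthetic tensors as a stand-in for `letterdata.data` and `letterdata.targets`; the assumption is that the real data are `uint8` images of shape `[N, 28, 28]` with integer labels from 1 to 26. The names `fake_images` and `fake_targets` are hypothetical.

```python
import torch

# Synthetic stand-in for letterdata.data / letterdata.targets (assumption:
# uint8 images of shape [N, 28, 28] and integer labels 1..26).
torch.manual_seed(0)
fake_images = torch.randint(0, 256, (100, 28, 28), dtype=torch.uint8)
fake_images[0, 0, 0] = 255  # guarantee the brightest value is present
fake_images[0, 0, 1] = 0    # guarantee the darkest value is present
fake_targets = torch.randint(1, 27, (100,))

# shift labels from 1..26 to 0..25 so they index the 26 remaining classes
labels = fake_targets - 1

# add a channel dimension, convert to float, and scale by the maximum value
images = fake_images.view(fake_images.shape[0], 1, 28, 28).float()
images /= torch.max(images)

print(images.shape, images.dtype)
print(images.min().item(), images.max().item())
print(labels.min().item(), labels.max().item())
```

The printed shape should have four dimensions (images, channels, height, width), and the pixel range should span 0 to 1.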


In [None]:
# transformations on the letter data

# remove N/A class
letterCategories = letterdata.classes[1:]
labels = copy.deepcopy(letterdata.targets)-1

# transform to 4D tensor for conv layers (and transform from uint8 to float)
letterImages = letterdata.data.view([letterdata.data.shape[0],1,28,28]).float()
letterImages /= torch.max(letterImages)


# split the images and convert to dataloaders
train_data,test_data, train_labels,test_labels = train_test_split(letterImages, labels, test_size=.1)

# convert into PyTorch Datasets
train_data = torch.utils.data.TensorDataset(train_data,train_labels)
test_data  = torch.utils.data.TensorDataset(test_data,test_labels)

# translate into dataloader objects
batchsize = 32
letter_train_loader = DataLoader(train_data,batch_size=batchsize,shuffle=True,drop_last=True)
letter_test_loader  = DataLoader(test_data,batch_size=test_data.tensors[0].shape[0])

The code snippet outlines the process of transforming the EMNIST digits dataset for compatibility with convolutional neural network (CNN) architectures in PyTorch, including dataset splitting and DataLoader creation. This process ensures the data is in the correct format for efficient model training and evaluation. Here’s a step-by-step guide:

### Data Transformation for CNNs
- **Tensor Reshaping and Normalization**: The digit images are first reshaped into a 4-dimensional tensor suitable for CNNs. This tensor has dimensions indicating the number of images, number of channels (1 for grayscale), and the dimensions of the images (28x28 pixels). The pixel values are converted from `uint8` to `float` for processing in neural networks and normalized to a `[0, 1]` range by dividing by the maximum value in the dataset.

### Dataset Splitting
- **Training and Testing Split**: The reshaped and normalized images are split into training and testing sets using the `train_test_split` function, with 10% of the data allocated for testing. This split facilitates model validation on unseen data, ensuring that the model's performance is robust and generalizable.

### DataLoader Creation
- **PyTorch Datasets**: Both the training and testing sets are wrapped into `TensorDataset` objects, pairing each image with its corresponding label. This step is crucial for handling the data and labels together efficiently during the training process.
  
- **DataLoader Objects**: DataLoader instances for the training and testing data (`number_train_loader` and `number_test_loader`) are created with specified batch sizes. For the training DataLoader, `shuffle=True` randomizes the order of the data, promoting model generalization, and `drop_last=True` ensures that all batches have a consistent number of samples by discarding the last incomplete batch. The testing DataLoader is configured to process the entire test set in a single batch, optimizing for evaluation speed and simplicity.

### Utility
By converting the EMNIST digits dataset into a format that is directly usable by CNNs and organizing it into efficiently manageable batches, this code facilitates straightforward model training, testing, and evaluation. The careful preparation and batching of the dataset exemplify best practices in data preprocessing for deep learning, ensuring that models can learn from and adapt to the data effectively.


In [None]:
### transformations on numbers data

# transform to 4D tensor for conv layers (and transform from uint8 to float)
numberImages = numberdata.data.view([numberdata.data.shape[0],1,28,28]).float()
numberImages /= torch.max(numberImages)


# split the images and convert to dataloaders
train_data,test_data, train_labels,test_labels = train_test_split(numberImages, numberdata.targets, test_size=.1)

# convert into PyTorch Datasets
train_data = torch.utils.data.TensorDataset(train_data,train_labels)
test_data  = torch.utils.data.TensorDataset(test_data,test_labels)

# translate into dataloader objects
batchsize = 32
number_train_loader = DataLoader(train_data,batch_size=batchsize,shuffle=True,drop_last=True)
number_test_loader  = DataLoader(test_data,batch_size=test_data.tensors[0].shape[0])

In [None]:
# visualize some letters
fig,axs = plt.subplots(3,7,figsize=(13,6))

# get a batch of letter data
X,y = next(iter(letter_train_loader))

for i,ax in enumerate(axs.flatten()):
  
  # extract the image and its target letter
  I = np.squeeze( X[i,:,:] )
  letter = letterCategories[y[i]]
  
  # visualize
  ax.imshow(I.T,cmap='gray',vmin=0,vmax=1)
  ax.set_title('The letter "%s"'%letter,fontsize=10)
  ax.set_xticks([])
  ax.set_yticks([])

plt.show()

In [None]:
# visualize some numbers
fig,axs = plt.subplots(3,7,figsize=(13,6))

# get a batch of number data
X,y = next(iter(number_train_loader))

for i,ax in enumerate(axs.flatten()):
  
  # extract the image and its target number
  I = np.squeeze( X[i,:,:] )
  number = y[i].item()
  
  # visualize
  ax.imshow(I.T,cmap='gray',vmin=0,vmax=1)
  ax.set_title('The number "%s"'%number,fontsize=10)
  ax.set_xticks([])
  ax.set_yticks([])

plt.show()

# Create the DL model

The provided code snippet details the creation of a PyTorch neural network model, `emnistnet`, designed for classifying EMNIST dataset images. This model incorporates both convolutional layers for feature extraction and linear layers for classification, making it well-suited for tasks involving image data. Here's an in-depth explanation of its architecture and functionalities:

### Model Architecture
- **Feature Map Layers**:
  - The model begins with two convolutional layers (`self.conv1` and `self.conv2`), each followed by a max-pooling operation to reduce the spatial dimensions by half, and batch normalization to stabilize the learning by normalizing the layer outputs.
  - The first convolutional layer expands the input from 1 channel to 6, applying a 3x3 kernel with padding to maintain the spatial dimensions. The second convolutional layer maintains the channel depth at 6, further processing the features.
  
- **Linear Decision Layers**:
  - After feature extraction, the data is flattened and passed through two linear layers (`self.fc1` and `self.fc2`). The first linear layer reduces the dimensionality to 50, and the second maps these features to the 26 output classes corresponding to the letters in the EMNIST letters dataset.
  
- **Activation Functions**: Leaky ReLU is used after batch normalization in both convolutional blocks and the first linear layer to introduce non-linearity, allowing the model to learn complex patterns.

### Forward Pass
- The `forward` method defines the data flow through the network. Optional printing controlled by `printtoggle` can provide insights into the tensor shapes at various stages, aiding in understanding the transformation of data through the model.

### Model Instantiation, Loss Function, and Optimizer
- An instance of the `emnistnet` class is created, equipped with the capability to print tensor sizes during the forward pass if `printtoggle` is set to True.
- The model uses cross-entropy loss (`nn.CrossEntropyLoss()`), suitable for multi-class classification tasks.
- Adam optimizer is chosen for updating the model parameters, with a learning rate of 0.001, balancing fast learning while avoiding overshooting the minima.

### Utility
This model demonstrates a structured approach to building convolutional neural networks in PyTorch, tailored for classifying images into multiple categories. By combining convolutional layers for automatic feature extraction with linear layers for decision making, the `emnistnet` provides a solid foundation for tackling image classification problems. This setup showcases best practices in neural network design, including the use of batch normalization and leaky ReLU activation functions to improve training stability and model performance.
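
The layer sizes in this architecture follow from the standard convolution output formula, `floor((size + 2*padding - kernel) / stride) + 1`, followed by a halving from each 2x2 max-pool. The sketch below works through that arithmetic in plain Python; the helper names `conv_output_size` and `block_output_size` are hypothetical and are not part of PyTorch.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial size after a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def block_output_size(size, kernel=3, padding=1, pool=2):
    """Size after one convolution followed by a pool x pool max-pool."""
    return conv_output_size(size, kernel, padding) // pool


after_block1 = block_output_size(28)            # conv keeps 28, pooling halves it
after_block2 = block_output_size(after_block1)  # conv keeps 14, pooling halves it
flattened = after_block2 * after_block2 * 6     # 6 channels in the last conv layer

print(after_block1, after_block2, flattened)  # prints: 14 7 294
```

The final value, 294, is the `7*7*6` input size of the first linear layer.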


In [None]:
# create a class for the model
def makeTheNet(printtoggle=False):

  class emnistnet(nn.Module):
    def __init__(self,printtoggle):
      super().__init__()
      
      # print toggle
      self.print = printtoggle

      ### -------------- feature map layers -------------- ###
      # first convolution layer
      self.conv1  = nn.Conv2d(1,6,3,padding=1)
      self.bnorm1 = nn.BatchNorm2d(6) # input the number of channels in this layer
      # output size: (28+2*1-3)/1 + 1 = 28, then 28/2 = 14 (/2 b/c maxpool)

      # second convolution layer
      self.conv2  = nn.Conv2d(6,6,3,padding=1)
      self.bnorm2 = nn.BatchNorm2d(6) # input the number of channels in this layer
      # output size: (14+2*1-3)/1 + 1 = 14, then 14/2 = 7 (/2 b/c maxpool)

      
      ### -------------- linear decision layers -------------- ###
      self.fc1 = nn.Linear(7*7*6,50)
      self.fc2 = nn.Linear(50,26)

    def forward(self,x):
      
      if self.print: print(f'Input: {list(x.shape)}')
      
      # first block: convolution -> maxpool -> batchnorm -> relu
      x = F.max_pool2d(self.conv1(x),2)
      x = F.leaky_relu(self.bnorm1(x))
      if self.print: print(f'First CPR block: {list(x.shape)}')

      # second block: convolution -> maxpool -> batchnorm -> relu
      x = F.max_pool2d(self.conv2(x),2)
      x = F.leaky_relu(self.bnorm2(x))
      if self.print: print(f'Second CPR block: {list(x.shape)}')

      # reshape for linear layer
      nUnits = x.shape.numel()/x.shape[0]
      x = x.view(-1,int(nUnits))
      if self.print: print(f'Vectorized: {list(x.shape)}')
      
      # linear layers
      x = F.leaky_relu(self.fc1(x))
      x = self.fc2(x)
      if self.print: print(f'Final output: {list(x.shape)}')

      return x

  # create the model instance
  net = emnistnet(printtoggle)
  
  # loss function
  lossfun = nn.CrossEntropyLoss()

  # optimizer
  optimizer = torch.optim.Adam(net.parameters(),lr=.001)

  return net,lossfun,optimizer

# Create a function that trains the model

The code defines a function, `function2trainTheModel`, to train a given neural network model on specified training data and evaluate its performance on test data across a number of epochs. The function meticulously records both loss and error rates for each epoch, providing a comprehensive view of the model's learning progress and generalization capability. Here's a breakdown of its components:

### Model Preparation and Training Loop
- **Device Allocation**: The model (`net`) is transferred to a GPU if available (`net.to(device)`), optimizing computational efficiency.
- **Initialization**: Arrays for tracking training and test losses (`trainLoss`, `testLoss`) and error rates (`trainErr`, `testErr`) are initialized to zero tensors with lengths equal to the number of epochs (`numepochs`).

### Epoch Iteration
- For each epoch, the function iterates over batches of data from `train_loader`, performing the following steps:
  - **Forward Pass**: The model processes input data (`X`), generating predictions (`yHat`).
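  - **Loss Calculation**: The loss between the predictions and true labels (`y`) is computed with `lossfun`. The function does not take `lossfun` as an argument; it reads the variable from the notebook's global scope, so `makeTheNet()` must have been called first.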
  - **Backpropagation**: Gradients are calculated and the optimizer updates the model parameters.
  - **Metrics Recording**: The loss and error rate for each batch are recorded. The error rate is calculated as the percentage of incorrect predictions.

### Test Performance Evaluation
- After processing all training batches for an epoch, the model's performance is evaluated on the test dataset (`test_loader`):
  - **Model Evaluation Mode**: The model is switched to evaluation mode (`net.eval()`), so that batch normalization uses its running statistics rather than the statistics of the current batch. (This model contains no dropout layers.)
  - **Loss and Error Calculation**: Loss and error rate on the test dataset are calculated similarly to the training phase but within a `torch.no_grad()` context to prevent gradient computation.
  
### Metrics Aggregation
- At the end of each epoch, average loss and error rate for training batches are stored, along with the loss and error rate from the test dataset.

### Function Output
- The function returns the recorded training and test losses, error rates, and the trained model. This comprehensive output enables detailed analysis of the model's training progress and its ability to generalize to unseen data.

### Utility
This training function exemplifies a structured approach to neural network training and evaluation in PyTorch, emphasizing careful monitoring of both loss and accuracy metrics. By including both training and test phases within each epoch, it provides a holistic view of the model's performance, guiding further tuning and improvements.
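
The error-rate calculation used in both the training and test phases can be illustrated on a tiny hand-made example. The logits and labels below are illustrative values only, and the sketch uses `dim=1` where the notebook uses the equivalent `axis=1`.

```python
import torch

# Hand-made logits for 4 samples and 3 classes (illustrative values only).
yHat = torch.tensor([[2.0, 1.0, 0.0],
                     [0.0, 3.0, 1.0],
                     [0.0, 1.0, 5.0],
                     [4.0, 0.0, 0.0]])
y = torch.tensor([0, 1, 1, 0])

predictions = torch.argmax(yHat, dim=1)   # class with the largest logit per row
mistakes = (predictions != y).float()     # 1.0 where the prediction is wrong
error_rate = 100 * torch.mean(mistakes).item()

print(predictions.tolist())  # [0, 1, 2, 0]
print(error_rate)            # 25.0
```

Only the third sample is misclassified, so one of four predictions is wrong.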


In [None]:
# a function that trains the model

def function2trainTheModel(net,optimizer,train_loader,test_loader,numepochs=10):

  # send the model to the GPU
  net.to(device)

  # initialize losses
  trainLoss = torch.zeros(numepochs)
  testLoss  = torch.zeros(numepochs)
  trainErr  = torch.zeros(numepochs)
  testErr   = torch.zeros(numepochs)


  # loop over epochs
  for epochi in range(numepochs):

    # loop over training data batches
    net.train()
    batchLoss = []
    batchErr  = []
    for X,y in train_loader:

      # push data to GPU
      X = X.to(device)
      y = y.to(device)

      # forward pass and loss
      yHat = net(X)
      loss = lossfun(yHat,y)

      # backprop
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      # loss and error from this batch
      batchLoss.append(loss.item())
      batchErr.append( torch.mean((torch.argmax(yHat,axis=1) != y).float()).item() )
    # end of batch loop...

    # and get average losses and error rates across the batches
    trainLoss[epochi] = np.mean(batchLoss)
    trainErr[epochi]  = 100*np.mean(batchErr)



    ### test performance
    net.eval()
    X,y = next(iter(test_loader)) # extract X,y from test dataloader

    # push data to GPU
    X = X.to(device)
    y = y.to(device)

    with torch.no_grad(): # deactivates autograd
      yHat = net(X)
      loss = lossfun(yHat,y)
      
    # get loss and error rate from the test batch
    testLoss[epochi] = loss.item()
    testErr[epochi]  = 100*torch.mean((torch.argmax(yHat,axis=1) != y).float()).item()

  # end epochs

  # function output
  return trainLoss,testLoss,trainErr,testErr,net

# Train the model on the letters

In [None]:
# create a new model
letterNet,lossfun,optimizer = makeTheNet()

trainLoss,testLoss,trainErr,testErr,letterNet = function2trainTheModel(
                                                letterNet,optimizer,letter_train_loader,letter_test_loader,5)

In [None]:
fig,ax = plt.subplots(1,2,figsize=(16,5))

ax[0].plot(trainLoss,'s-',label='Train')
ax[0].plot(testLoss,'o-',label='Test')
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel('Loss (cross-entropy)')
ax[0].set_title('Model loss')
ax[0].legend()

ax[1].plot(trainErr,'s-',label='Train')
ax[1].plot(testErr,'o-',label='Test')
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel('Error rates (%)')
ax[1].set_title(f'Final model test error rate: {testErr[-1]:.2f}%')
ax[1].legend()

plt.show()

# Test the model on the number data

In [None]:
# extract X,y from NUMBER test dataloader
X,y = next(iter(number_test_loader))
X = X.to(device)
y = y.to(device)

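letterNet.eval()
with torch.no_grad(): # no gradients needed for evaluation
  yHat = letterNet(X)

# the test: percentage of misclassified digits
numberErr = 100*torch.mean((torch.argmax(yHat,axis=1)!=y).float()).item()

print(f'letterNet error rate on NUMBER data: {numberErr:.2f}%')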

# Fine-tune the model with one training epoch

The code snippet demonstrates a method for transferring learned weights from one neural network model (`letterNet`) to another (`numberNet`). This technique is commonly used in transfer learning, where a model trained on one task is adapted for a related but distinct task. The code ensures that the target model (`numberNet`) inherits the knowledge encoded in the weights of the source model (`letterNet`). Here’s a detailed overview:

### Target Model Creation
- **Model Instantiation**: A new model, referred to as `numberNet`, is created using the `makeTheNet()` function. Alongside the model, the function returns a loss function (`lossfun`) and an optimizer (`optimizer`). Because `makeTheNet()` builds the same architecture that `letterNet` uses, the parameters of the two models line up one-to-one.

### Weight Transfer Process
- **Iterative Weight Copying**: The code iterates over the parameters (weights and biases) of both `numberNet` (target) and `letterNet` (source) simultaneously using the `zip` function. For each pair of corresponding parameters, the values from the source model are copied to the target model. This copying is performed using `copy.deepcopy()`, so the target receives its own copy rather than a reference to the source tensors. Note that `named_parameters()` covers weights and biases only; the running mean and variance of the batch-norm layers are buffers and are not copied by this loop.

### Practical Implications and Considerations
- **Immediate Application**: After the weight transfer, `numberNet` is immediately enriched with the knowledge gained by `letterNet` during its training process. This can significantly accelerate learning on new tasks, especially if they are closely related to the original task `letterNet` was trained on.
- **Flexibility in Transfer Learning**: This approach highlights the flexibility of neural networks in leveraging pre-trained models. By adjusting only the final layers or fine-tuning the transferred weights, `numberNet` can be effectively applied to new domains or tasks.

### Ensuring Model Compatibility
- It's crucial that `numberNet` and `letterNet` share a compatible architecture, at least in the layers where weights are being transferred. This compatibility ensures that the weight dimensions match, allowing for a successful transfer process.

This technique exemplifies the power of transfer learning in machine learning and deep learning, showcasing how pre-existing models can be repurposed to bootstrap performance on new tasks, saving both time and computational resources.
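
One detail of the parameter-copy loop is worth checking: `named_parameters()` yields weights and biases, while batch-norm running statistics are stored as buffers. The sketch below compares a parameter-only copy with `load_state_dict`, which copies both. It is an illustration under stated assumptions, not the notebook's method: `TinyNet` is a hypothetical stand-in model.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with one conv layer and one batch-norm layer."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, padding=1)
        self.bnorm1 = nn.BatchNorm2d(6)

    def forward(self, x):
        return self.bnorm1(self.conv1(x))


torch.manual_seed(0)
source = TinyNet()
source.train()
source(torch.randn(8, 1, 28, 28))  # a training-mode pass updates the running statistics

# Option 1: copy parameters only, as in a loop over named_parameters()
target_a = TinyNet()
with torch.no_grad():
    for (_, t), (_, s) in zip(target_a.named_parameters(), source.named_parameters()):
        t.copy_(s)

# Option 2: copy the full state dict (parameters and buffers)
target_b = TinyNet()
target_b.load_state_dict(source.state_dict())

same_weights_a = torch.equal(target_a.conv1.weight, source.conv1.weight)
same_stats_a = torch.equal(target_a.bnorm1.running_mean, source.bnorm1.running_mean)
same_stats_b = torch.equal(target_b.bnorm1.running_mean, source.bnorm1.running_mean)
print(same_weights_a, same_stats_a, same_stats_b)
```

With the parameter-only copy the weights match but the running statistics keep their default values; with `load_state_dict` both match. Whether this matters depends on how much further training the target model receives before it is evaluated.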


In [None]:
# create the target model
numberNet,lossfun,optimizer = makeTheNet()

# then replace all the weights in TARGET model from SOURCE model
for target,source in zip(numberNet.named_parameters(),letterNet.named_parameters()):
  target[1].data = copy.deepcopy( source[1].data )

In [None]:
# check out the network
print(numberNet)
print(' ')

# and the final layer
print(numberNet.fc2)

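# replace the final layer to have 10 outputs instead of 26
numberNet.fc2 = nn.Linear(50,10)

# re-create the optimizer so that it tracks the new fc2 parameters
# (the one returned by makeTheNet() still references the old 26-unit layer)
optimizer = torch.optim.Adam(numberNet.parameters(),lr=.001)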

# and check it again
print(' ')
print(numberNet)

In [None]:
# now re-train the network on the numbers data

trainLoss,testLoss,trainErr,testErr,numberNet = function2trainTheModel(
                                                   numberNet,optimizer,number_train_loader,number_test_loader,1)

In [None]:
print(f'numberNet TRAIN error rate: {trainErr[-1]:.2f}%')
print(f'numberNet TEST error rate: {testErr[-1]:.2f}%')

# Try again, freezing the convolution and batch-norm layers

The code snippet outlines an advanced application of transfer learning where the weights of a pre-trained neural network model (`letterNet`) are transferred to a new target model (`numberNet2`). After the weight transfer, the architecture of the target model is slightly modified, and certain layers are frozen to prevent their weights from being updated during further training. Here’s an in-depth explanation:

### Target Model Creation
- **Model Instantiation**: A new model, `numberNet2`, is created using the `makeTheNet()` function, which also returns a loss function (`lossfun`) and an optimizer (`optimizer`). This setup prepares `numberNet2` for subsequent training or fine-tuning.

### Weight Transfer Process
- **Iterative Weight Copying**: The weights and biases from `letterNet` (source model) are iteratively copied to `numberNet2` (target model) using a loop that iterates over the named parameters of both models. The `copy.deepcopy()` function ensures an exact, deep copy of the weights, preserving the learned features from `letterNet` without linking the two sets of parameters.

### Architecture Adjustment
- **Output Layer Modification**: The final fully connected layer (`fc2`) of `numberNet2` is replaced by a new layer with 10 output units, one per digit class in the EMNIST digits split. The new layer starts from randomly initialized weights.

### Layer Freezing
- **Freezing Specific Layers**: Convolutional (`conv`) and batch normalization (`bnorm`) layers in `numberNet2` are frozen by setting `requires_grad` to `False` for their parameters. Their weights and biases then receive no gradient updates during further training, which can be beneficial when the transferred features are already well-suited to the new task and only the linear layers need fine-tuning. The batch-norm running statistics are buffers rather than parameters, so they still update whenever the model is in training mode.

### Utility and Implications
- This approach leverages the representational power of `letterNet`, trained on a potentially related task, to jump-start the performance of `numberNet2` on a new task. By freezing certain layers, the model can maintain the generic feature-detecting capabilities of its early layers while adapting its higher-level representations and output layer to the specifics of the new task, such as classifying digits.
- Fine-tuning a model in this manner can lead to significant improvements in learning efficiency and model performance, especially when labeled data for the new task is limited.

The method exemplified in this code demonstrates a strategic blend of weight transfer, architectural adjustment, and selective parameter freezing—a potent combination for adapting neural networks to new tasks with minimal training.
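
When a layer is replaced after the optimizer has been created, the optimizer keeps pointing at the old layer's tensors. The sketch below checks this on a small stand-in model and then rebuilds the optimizer over the parameters that are still trainable. `TinyNet` and `optimizer_param_ids` are hypothetical names used only for this illustration.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained network."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, padding=1)
        self.fc1 = nn.Linear(6 * 28 * 28, 50)
        self.fc2 = nn.Linear(50, 26)

    def forward(self, x):
        x = torch.relu(self.conv1(x)).flatten(1)
        return self.fc2(torch.relu(self.fc1(x)))


def optimizer_param_ids(optimizer):
    """Collect the ids of every tensor the optimizer will update."""
    return {id(p) for group in optimizer.param_groups for p in group["params"]}


net = TinyNet()
old_optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

# replace the output layer and freeze the convolution layer
net.fc2 = nn.Linear(50, 10)
for name, p in net.named_parameters():
    if "conv" in name:
        p.requires_grad = False

# the optimizer built earlier still points at the discarded 26-unit layer
new_layer_tracked = id(net.fc2.weight) in optimizer_param_ids(old_optimizer)

# rebuild the optimizer over the parameters that should still be trained
trainable = [p for p in net.parameters() if p.requires_grad]
new_optimizer = torch.optim.Adam(trainable, lr=0.001)
new_ids = optimizer_param_ids(new_optimizer)

print(new_layer_tracked)                # False
print(id(net.fc2.weight) in new_ids)    # True
print(id(net.conv1.weight) in new_ids)  # False
```

Filtering on `requires_grad` is one design choice among several; passing all parameters to the optimizer also works, because parameters without gradients are skipped during the update step.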


In [None]:
# create the target model
numberNet2,lossfun,optimizer = makeTheNet()

# then replace all the weights in TARGET model from SOURCE model
for target,source in zip(numberNet2.named_parameters(),letterNet.named_parameters()):
  target[1].data = copy.deepcopy( source[1].data )

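# adjust number of output units
numberNet2.fc2 = nn.Linear(50,10)

# freeze convolution and batch-norm layers
for p in numberNet2.named_parameters():
  if ('conv' in p[0]) or ('bnorm' in p[0]):
    p[1].requires_grad = False

# re-create the optimizer over the parameters that remain trainable,
# so that it includes the new fc2 layer
optimizer = torch.optim.Adam([p for p in numberNet2.parameters() if p.requires_grad],lr=.001)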

In [None]:
# now re-train the network on the numbers data

trainLoss,testLoss,trainErr,testErr,numberNet2 = function2trainTheModel(
                                                   numberNet2,optimizer,number_train_loader,number_test_loader,1)

In [None]:
print(f'numberNet2 TRAIN error rate: {trainErr[-1]:.2f}%')
print(f'numberNet2 TEST error rate: {testErr[-1]:.2f}%')