In [3]:
!pip install -r requirements.txt


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [4]:
import json
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision.datasets import CocoDetection
from torch.utils.data import DataLoader
import numpy as np
import torch.nn.functional as F

## Hyperparameters:
- `num_epochs (10)`: The number of full passes the model makes over the training dataset. More epochs give the model more opportunities to update its weights but can lead to overfitting if the number is too high.
- `batch_size (4)`: The number of training examples used in one iteration to update the model's weights. A smaller batch size gives more updates per epoch but noisier gradient estimates, whereas a larger batch size gives a more accurate estimate of the dataset's gradient at the cost of more memory and computation per step.
- `learning_rate (0.001)`: Controls the size of the step the optimizer takes when updating the weights. A higher learning rate can make the model converge quickly but may overshoot the minimum, while a lower learning rate takes smaller, more stable steps at the risk of converging slowly or stalling.
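
As a rough illustration of how these values interact, the number of optimizer updates follows from the dataset size and the batch size. The sketch below assumes a hypothetical training set of 70 images (`num_train_images` is not a value taken from this notebook) and simply does the arithmetic.

```python
import math

# Hyperparameters from this notebook
num_epochs = 10
batch_size = 4

# Hypothetical dataset size, chosen only for illustration
num_train_images = 70

# The last batch may be smaller than batch_size, hence the ceiling
steps_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * num_epochs

print(steps_per_epoch, last_batch_size, total_updates)
```

Under that assumption, halving `batch_size` would roughly double the number of weight updates per epoch.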

In [5]:
num_epochs = 10
batch_size = 4
learning_rate = 0.001

## Transformations
The transformation sequence applied to preprocess input data includes resizing images, converting them to PyTorch tensors, and normalizing their pixel values. This process ensures consistency and optimal conditions for model training.

### Components of the Transformation Sequence:
- **Resize to 224x224 Pixels:** Ensures all images have a uniform size, which is necessary for models that require specific input dimensions. The transform is applied to the image only; the bounding boxes in the annotations keep their original pixel coordinates.
- **Conversion to PyTorch Tensors:** Converts each image from a height x width x channels layout with 8-bit values (0 to 255) to a float tensor with channels first (channels x height x width). This step also scales pixel values to the [0, 1] range.
- **Normalization:** Subtracts a fixed mean and divides by a fixed standard deviation for each of the RGB channels. The values used here are the commonly used ImageNet statistics, which matter most when a model pretrained on ImageNet is used. Normalization helps stabilize learning and can lead to faster convergence.

These preprocessing steps are critical for preparing the data, ensuring that it meets the model's requirements and is in a form that facilitates efficient learning.
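
The arithmetic behind the tensor conversion and normalization can be sketched for a single pixel in plain Python. This is only an illustration of the formula `(value / 255 - mean) / std`, not torchvision's implementation; the helper `normalize_pixel` and the example pixels are hypothetical.

```python
# Per-channel statistics passed to transforms.Normalize in this notebook
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Mimic ToTensor scaling followed by Normalize for one 8-bit RGB pixel."""
    scaled = [v / 255.0 for v in rgb]  # [0, 255] -> [0, 1]
    return [(s - m) / d for s, m, d in zip(scaled, MEAN, STD)]

# Hypothetical pixel values, for illustration only
black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))

print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
```

After normalization the values are no longer confined to [0, 1]: dark pixels map to negative numbers and bright pixels to positive ones.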

In [6]:
# Define transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize images to 224x224 pixels
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalization
])


## Load COCO Dataset
A dataset in COCO format (ZiggoPortStatus-2) is used for training and testing. Each split is loaded with `CocoDetection`, which takes the path to the image directory and the corresponding annotation file in COCO JSON format. The same transformations are applied to both splits to ensure consistency.

### Key Points:
- **Training and Testing Datasets:** Both datasets are prepared with a series of transformations to format the data correctly for the model, involving resizing, converting to tensors, and normalizing.
- **Annotations:** The COCO JSON format annotations provide detailed object information, crucial for tasks like object detection.

This setup aims to standardize data preparation for effective model training and evaluation, using the widely recognized COCO format for object detection tasks.

In [7]:
train_dataset = CocoDetection(root='./dataset/ZiggoPortStatus-2/train', annFile='./dataset/ZiggoPortStatus-2/train/_annotations.coco.json', transform=transform)
test_dataset = CocoDetection(root='./dataset/ZiggoPortStatus-2/test', annFile='./dataset/ZiggoPortStatus-2/test/_annotations.coco.json', transform=transform)

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!


In [8]:
# Print one image
image, annotations = train_dataset[0]  # Change the index as needed
print(f"Image shape: {image.shape}, Annotations: {annotations}")

Image shape: torch.Size([3, 224, 224]), Annotations: [{'id': 0, 'image_id': 0, 'category_id': 4, 'bbox': [261, 120, 57.5, 40], 'area': 2300, 'segmentation': [], 'iscrowd': 0}, {'id': 1, 'image_id': 0, 'category_id': 4, 'bbox': [250, 155, 81.5, 32.5], 'area': 2648.75, 'segmentation': [], 'iscrowd': 0}, {'id': 2, 'image_id': 0, 'category_id': 2, 'bbox': [260, 199, 61, 40.5], 'area': 2470.5, 'segmentation': [], 'iscrowd': 0}, {'id': 3, 'image_id': 0, 'category_id': 2, 'bbox': [256, 235, 59, 30.5], 'area': 1799.5, 'segmentation': [], 'iscrowd': 0}]


This section computes and displays the maximum number of annotations attached to any single image in the training dataset. This value, `max_labels`, determines how much padding is needed later so that every image's label list has the same length: shorter label lists are padded to match the longest one. Note that the generator iterates over the full dataset, so every image is loaded and transformed once just to count its annotations.

In [9]:
# Get the maximum number of labels for padding
max_labels = max(len(ann) for _, ann in train_dataset)
print(f"\nmax_labels is {max_labels}")


max_labels is 6


This snippet loads the category information from the COCO-format annotation file and builds a mapping from category ID to class name. It then prints the categories defined in the dataset: "Port-Status", "Eth-Conn", "Eth-Not-Conn", "TEL-Conn", and "TEL-Not-Conn".

In [10]:

# Load category information
with open('./dataset/ZiggoPortStatus-2/train/_annotations.coco.json', 'r') as f:
    coco_info = json.load(f)

# Get the class names
class_names = {cat['id']: cat['name'] for cat in coco_info['categories']}

# Print the class names
print("Class Names:")
for class_id, class_name in class_names.items():
    print(f"Class ID: {class_id}, Class Name: {class_name}")


Class Names:
Class ID: 0, Class Name: Port-Status
Class ID: 1, Class Name: Eth-Conn
Class ID: 2, Class Name: Eth-Not-Conn
Class ID: 3, Class Name: TEL-Conn
Class ID: 4, Class Name: TEL-Not-Conn
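
Once the ID-to-name mapping exists, it can also be used to check how the annotations are distributed across classes. The sketch below works on a small, hypothetical COCO-style dictionary rather than the real annotation file; only the `categories` and `annotations` keys of the COCO format are relied on.

```python
from collections import Counter

# Hypothetical, heavily trimmed COCO-style dictionary (illustration only)
coco_info = {
    "categories": [
        {"id": 2, "name": "Eth-Not-Conn"},
        {"id": 4, "name": "TEL-Not-Conn"},
    ],
    "annotations": [
        {"image_id": 0, "category_id": 4},
        {"image_id": 0, "category_id": 4},
        {"image_id": 0, "category_id": 2},
        {"image_id": 1, "category_id": 2},
    ],
}

class_names = {cat["id"]: cat["name"] for cat in coco_info["categories"]}

# Count how many annotations refer to each category ID
counts = Counter(ann["category_id"] for ann in coco_info["annotations"])

for class_id, n in sorted(counts.items()):
    print(f"{class_names[class_id]}: {n} annotations")
```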


This code defines a custom collate function for a data loader that deals with batches of images and their annotations, where each image may have a variable number of labels due to differing numbers of objects detected within them. The function:

- Groups together images and their corresponding targets (annotations) from the batch.
- Extracts the category IDs from the annotations to use as labels for each image.
- Pads the labels for each image to ensure that every image in the batch has the same number of labels, using -1 as the padding value. This uniformity is necessary for batch processing in neural network models.
- Returns a batch of images and their padded labels, ready for training or evaluation in a model.

This approach ensures that the data fed into the model is consistently formatted, accommodating the inherent variability in the number of objects per image within datasets used for object detection tasks.
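
The padding step can be tried in isolation on plain Python lists. The helper `pad_labels` and the example category IDs below are hypothetical; the padding expression itself mirrors the one used in the collate function.

```python
PAD_VALUE = -1

def pad_labels(labels, max_labels):
    """Right-pad each label list with PAD_VALUE up to max_labels entries."""
    return [l + [PAD_VALUE] * (max_labels - len(l)) for l in labels]

# Hypothetical category IDs for a batch of three images
batch_labels = [[4, 4, 2, 2], [1], [3, 2, 2, 1, 4, 4]]
padded = pad_labels(batch_labels, max_labels=6)

for row in padded:
    print(row)
```

If a list were longer than `max_labels` (for example in a split that was not scanned when computing it), no padding would be added and the rows would no longer share one length, so building a tensor from them would fail.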

In [11]:
def custom_collate_fn(batch):
    images, targets = zip(*batch)
    
    # Extract category IDs as labels/classes from the target annotations
    labels = [[ann['category_id'] for ann in target] for target in targets]
#    print(f"\nlabels  is {labels}")

    # Pad labels with -1 so that each batch has the same number of labels
    padded_labels = [l + [-1]*(max_labels - len(l)) for l in labels]
#    print(f"\npadded_labels  is {padded_labels}")

    return torch.stack(images), torch.tensor(padded_labels)


This snippet sets up data loaders for both training and testing datasets using the previously defined custom collate function. The data loaders are responsible for efficiently loading the data in batches with the specified batch size and applying the custom collate function to handle batches with a variable number of labels per image. For the training data loader, data is shuffled to ensure diversity in the batches seen by the model during training, which helps in generalization. For the testing data loader, shuffling is disabled to maintain the order of the dataset, which can be important for evaluating model performance. The custom collate function ensures each batch has uniformly shaped targets by padding, facilitating training and evaluation with datasets where images contain different numbers of annotations.

In [12]:
# Create data loaders with custom collate function
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, collate_fn=custom_collate_fn)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, collate_fn=custom_collate_fn)


This code defines a Convolutional Neural Network (CNN) model for image processing tasks. The CNN comprises layers designed to automatically and adaptively learn spatial hierarchies of features from input images. Here's a brief overview of its structure and functionality:

- **Initialization (__init__):** The model initializes with two convolutional layers (conv1 and conv2) and two fully connected (linear) layers (fc1 and fc2). The convolutional layers are responsible for extracting features from the input images, while the fully connected layers perform classification based on these features.
- **Convolutional Layers:** The first layer (conv1) takes an input with 3 color channels (assuming RGB images) and produces 16 feature maps using a kernel size of 3x3 with a stride of 1. The second layer (conv2) takes these 16 feature maps as input and outputs 32 feature maps, also with a 3x3 kernel and stride of 1. These layers help in capturing patterns such as edges, textures, and other visual elements from the images.
- **Pooling:** After each convolutional layer, a max pooling operation reduces the spatial size of the representation, decreasing the number of parameters and computation in the network, and thereby controlling overfitting.
- **Fully Connected Layers:** Before reaching the first fully connected layer (fc1), the feature map is flattened. The fc1 layer has 128 neurons and is followed by the output layer (fc2), whose size is set to `max_labels` (6). Each output is later treated as the score for one category ID, so the layer only needs at least as many outputs as there are categories (5 here); sizing it by the number of categories would be the more direct choice.
- **Forward Pass (forward):** Defines the forward propagation of the input through the CNN. It sequentially applies ReLU activations after convolutional layers, performs max pooling, flattens the output, and passes it through fully connected layers with ReLU activation between them.

The model's architecture is designed to be adjusted based on the specific dimensions of the resized input images and the unique requirements of the dataset (e.g., the number of labels). This flexibility allows the CNN to be tailored to a wide range of image recognition tasks.
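
The `32*54*54` input size of fc1 follows from the layer parameters. A small sketch of that arithmetic, assuming no padding in the convolutions and 2x2 max pooling with stride 2 as in the model below (the helper names `conv_out` and `pool_out` are hypothetical):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output length of max pooling (no padding, floor rounding)."""
    return (size - kernel) // stride + 1

size = 224
size = conv_out(size, kernel=3)  # conv1
size = pool_out(size)            # first max pool
size = conv_out(size, kernel=3)  # conv2
size = pool_out(size)            # second max pool

flattened = 32 * size * size     # 32 channels after conv2
print(size, flattened)
```

Changing the resize dimensions in the transform would change `size`, and the hard-coded value in fc1 would have to be updated to match.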

In [13]:
# Define CNN model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, 1)
        self.conv2 = nn.Conv2d(16, 32, 3, 1)
        # Spatial size: 224 -> 222 (conv1) -> 111 (pool) -> 109 (conv2) -> 54 (pool)
        self.fc1 = nn.Linear(32*54*54, 128)  # Adjust the input size if the resized image dimensions change
        self.fc2 = nn.Linear(128, max_labels)  # One output per category ID; max_labels (6) covers IDs 0-4

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 32*54*54)  # Adjust the input size according to your resized image dimensions
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


This continuation sets up the necessary components for training the previously defined Convolutional Neural Network (CNN) model:
- **Model Instantiation:** A new instance of the CNN model is created, preparing it for training and inference. This step initializes the model's weights and biases, setting it up for the upcoming training process.
- **Loss Function:** The `nn.MultiLabelSoftMarginLoss` is specified as the loss function. This loss is suitable for multi-label classification tasks, where each label is treated independently, and the model predicts the presence or absence of each label in the image. The loss function calculates the difference between the model's predictions and the actual target values, guiding the model to adjust its parameters to minimize this difference over time.
- **Optimizer:** The `Adam` optimizer is chosen to update the model's weights based on the computed gradients. Adam is a popular optimization algorithm in deep learning because it combines the advantages of two other extensions of stochastic gradient descent: Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). It is known for its efficiency in handling sparse gradients on noisy problems. The learning rate (`lr`) is set according to the predefined `learning_rate` hyperparameter, influencing the size of the steps the optimizer takes towards minimizing the loss function.

By defining the model, loss function, and optimizer, the setup for training the CNN on a multi-label image classification task is complete. The next steps typically involve training the model on the training dataset using the defined data loaders and evaluating its performance on the test dataset to measure its accuracy and effectiveness in classifying images according to the specified labels.
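
For a single sample, `nn.MultiLabelSoftMarginLoss` averages a per-class binary cross-entropy, computed on the sigmoid of each logit, over all outputs. The plain-Python sketch below follows that formula for illustration; it is not PyTorch's implementation and lacks its numerical safeguards, and the example targets and logits are hypothetical.

```python
import math

def multilabel_soft_margin_loss(logits, targets):
    """Mean over classes of the binary cross-entropy on sigmoid(logit)."""
    total = 0.0
    for x, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))  # sigmoid
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(logits)

# Hypothetical multi-hot target for one image with 6 outputs
targets = [0, 0, 1, 0, 1, 0]

# All-zero logits: the model has no preference for any class
loss_uninformed = multilabel_soft_margin_loss([0.0] * 6, targets)

# Logits that agree with the target on every class
loss_confident = multilabel_soft_margin_loss([-3, -3, 3, -3, 3, -3], targets)

print(loss_uninformed, loss_confident)
```

With all-zero logits the loss equals ln 2 (about 0.693), which is close to the first value in the training log; a freshly initialized model typically starts near that level.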

In [14]:
# Instantiate the model
model = CNN()

# Define loss function and optimizer
criterion = nn.MultiLabelSoftMarginLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)


This snippet outlines the training process for the Convolutional Neural Network (CNN) model, focusing on handling multi-label image classification. The training loop iterates over the dataset multiple times (epochs), each time processing batches of images and their corresponding labels.

- **Epoch Iteration:** The outer loop iterates through the dataset for a predefined number of epochs, allowing the model to learn from the entire dataset multiple times.
- **Batch Processing:** Within each epoch, the inner loop iterates over batches of images and labels. Each batch is processed through the model to generate predictions (outputs).
- **Multi-Hot Encoding of Labels:** Since the task is multi-label classification, the padded category IDs of each image are converted into a multi-hot vector with the same shape as the model's output, allowing for direct comparison. Each category that occurs in the image is marked with a 1; all other positions stay 0, and padding entries (-1) are skipped. Repeated categories collapse into a single 1, so per-image counts are not encoded.
- **Loss Calculation:** The loss is computed using the model's predictions and the one-hot encoded labels. This loss measures how well the model's predictions match the actual labels, guiding the optimization process.
- **Optimization:** The backward pass calculates gradients for the model's parameters based on the loss. The optimizer then updates the model's weights to minimize the loss, using the learning rate to determine the size of the step towards loss minimization.
- **Logging:** The loop prints the loss after every step (the `if (i+1) % 100 == 0` guard is commented out), showing how the loss evolves during training.
- **Completion:** After training through all epochs, the process concludes, and the model is considered trained, ready for evaluation or deployment.
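
The label encoding step can be illustrated on a single padded label list. The helper `to_multi_hot` is hypothetical; the input uses the category IDs of the first training image printed earlier, padded to 6 entries.

```python
PAD_VALUE = -1

def to_multi_hot(padded_labels, num_outputs):
    """Mark each category ID that occurs in the row; skip padding entries."""
    row = [0.0] * num_outputs
    for label in padded_labels:
        if label != PAD_VALUE:
            row[label] = 1.0
    return row

# Category IDs 4, 4, 2, 2 followed by two padding entries
padded = [4, 4, 2, 2, -1, -1]
target = to_multi_hot(padded, num_outputs=6)
print(target)
```

Skipping the padding value matters here: in Python, an index of -1 would otherwise silently set the last position of the vector.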

In [15]:

# Training loop
total_steps = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):

        # Forward pass
        outputs = model(images)
        
        # Convert padded category IDs to multi-hot targets with the same shape as outputs
        labels_one_hot = torch.zeros(outputs.shape[0], max_labels)
        for batch_idx, label_batch in enumerate(labels):
            for label in label_batch:
                if label != -1:  # skip padding entries
                    labels_one_hot[batch_idx, label] = 1
 
#        print(f"\noutput shape after loop is {outputs}")
        
#        print(f"\nlabels_one_hot shape after loop is {labels_one_hot}")
 
        # Calculate loss
        loss = criterion(outputs, labels_one_hot) 

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

#        if (i+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_steps}], Loss: {loss.item()}')

print('Finished Training')


Epoch [1/10], Step [1/18], Loss: 0.6953925490379333
Epoch [1/10], Step [2/18], Loss: 0.5800410509109497
Epoch [1/10], Step [3/18], Loss: 0.7747972011566162
Epoch [1/10], Step [4/18], Loss: 1.2767571210861206
Epoch [1/10], Step [5/18], Loss: 0.3531678318977356
Epoch [1/10], Step [6/18], Loss: 0.4733739495277405
Epoch [1/10], Step [7/18], Loss: 0.3368566036224365
Epoch [1/10], Step [8/18], Loss: 0.21362152695655823
Epoch [1/10], Step [9/18], Loss: 0.3257046937942505
Epoch [1/10], Step [10/18], Loss: 0.34739813208580017
Epoch [1/10], Step [11/18], Loss: 0.40003806352615356
Epoch [1/10], Step [12/18], Loss: 0.4502297043800354
Epoch [1/10], Step [13/18], Loss: 0.36195579171180725
Epoch [1/10], Step [14/18], Loss: 0.32516318559646606
Epoch [1/10], Step [15/18], Loss: 0.4769258499145508
Epoch [1/10], Step [16/18], Loss: 0.4650411903858185
Epoch [1/10], Step [17/18], Loss: 0.28603094816207886
Epoch [1/10], Step [18/18], Loss: 0.42104488611221313
Epoch [2/10], Step [1/18], Loss: 0.2479106187820
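
Because the model is trained against multi-hot targets, one option for scoring it is to threshold each logit and compare the result with the ground truth class by class. The sketch below is an assumption about a suitable metric, not the evaluation used in this notebook; the helpers `predict_classes` and `per_class_accuracy`, the threshold, and the example values are all hypothetical.

```python
def predict_classes(logits, threshold=0.0):
    """Classes whose logit exceeds the threshold (sigmoid > 0.5 at 0.0)."""
    return {i for i, x in enumerate(logits) if x > threshold}

def per_class_accuracy(pred_sets, true_sets, num_outputs):
    """Fraction of (image, class) decisions that match the ground truth."""
    correct = 0
    for pred, true in zip(pred_sets, true_sets):
        correct += sum((c in pred) == (c in true) for c in range(num_outputs))
    return correct / (len(true_sets) * num_outputs)

# Hypothetical logits and padded labels for two images (illustration only)
logits = [[-5.0, 1.2, 3.4, -0.7, 2.1, -4.0],
          [-6.0, -1.0, 2.5, 0.3, -2.0, -3.5]]
padded_labels = [[4, 4, 2, 2, -1, -1], [2, 3, -1, -1, -1, -1]]

# Drop padding and duplicates to get the set of classes present per image
true_sets = [{l for l in row if l != -1} for row in padded_labels]
pred_sets = [predict_classes(row) for row in logits]
accuracy = per_class_accuracy(pred_sets, true_sets, num_outputs=6)

print(pred_sets, true_sets, accuracy)
```

A metric of this kind keeps each image's prediction paired with its own labels and stays within the 0 to 1 range.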

In [16]:

model.eval()            
with torch.no_grad():
    total = 0
    correct = 0           
    
    for i, (images, labels) in enumerate(test_loader):
        # Forward pass
        outputs = model(images)
        print(f"\nlogits is {outputs}")


        # Take the highest-scoring class per image (one prediction per image)
        predicted_labels = torch.argmax(outputs, dim=1)
        print(f"\npredicted_labels is {predicted_labels}")

        # Flatten the padded labels of the whole batch (padding entries are -1)
        labels_flat = labels.view(-1)
        print(f"\nlabels_flat is {labels_flat}")

        # Broadcast each image's prediction against every label in the batch.
        # Caveat: a prediction is compared with the labels of all images in the
        # batch, not only its own, so `correct` can exceed `total`.
        predicted_labels = predicted_labels.unsqueeze(1).expand(-1, labels_flat.size(0))
        print(f"\npredicted_labels is {predicted_labels}")

        # Accumulate counts (total includes the -1 padding entries)
        total += labels_flat.size(0)
        correct += torch.sum(labels_flat == predicted_labels.squeeze()).item()

    accuracy = correct / total
    
    print(f'Test Accuracy: {accuracy * 100:.2f}%')


logits is tensor([[-35.4060,   4.8578,  12.3973,  -2.3474,  18.6544, -30.3625],
        [-37.8067,   2.0152,  12.0022,  -0.3005,  16.6834, -28.4175],
        [-40.6194,   5.2102,  13.1429,  -4.4290,  15.2267, -29.0108],
        [-33.5529,   0.2645,   5.4808,  -2.9439,  10.4664, -22.3965]])

predicted_labels is tensor([4, 4, 4, 4])

labels_flat is tensor([ 3,  4,  1,  1,  2,  2,  4,  4,  2,  2,  2,  2,  4,  4,  1,  2,  2, -1,
         3,  2,  2, -1, -1, -1])

predicted_labels is tensor([[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
        [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
        [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
        [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]])

logits is tensor([[-34.2780,   3.6210,   5.6375,   3.3390,  10.4850, -24.4280],
        [-31.9222,   3.1985,   8.7384,   1.1024,  12.9561, -24.2745],
        [-34.6463,  -3.3063,   8.945