### We first implement the baseline model proposed in the paper "Yoga-82: A New Dataset for Fine-grained Classification of Human Poses".

In the paper, the authors propose modifying DenseNet-201 by adding branches for the different levels of classification.

In this notebook, we implement variant 2, which branches out before DenseBlock 4 for the first- and second-level classifications, as shown in the figure below.

![Screenshot 2025-04-01 215751.png](attachment:1d8c94b9-bce9-4965-b012-3744ca501c25.png)

Let's define the model using PyTorch:

We first load DenseNet-201. Then we split the model before DenseBlock 4.

From the implementation at https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py, we learn that the `features` module of DenseNet-201 consists of the following 12 layers:

0. conv0
1. norm0
2. relu0
3. pool0
4. denseblock1
5. transition1
6. denseblock2
7. transition2
8. denseblock3
9. transition3
10. denseblock4
11. norm5

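We will branch out after denseblock3 (index 8) and before transition3 (index 9) for the first- and second-level classifications, so that transition3 stays on the main path together with denseblock4.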
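
The channel width at the branch point determines the size of the branch head. As a rough sketch, the widths can be derived by hand, assuming the DenseNet-201 hyperparameters used in the torchvision implementation (growth rate 32, block configuration (6, 12, 48, 32), 64 initial features) and that each transition layer halves the channel count. The function name `channels_after_each_block` is a hypothetical helper introduced here for illustration.

```python
# Assumed DenseNet-201 hyperparameters (from the torchvision implementation).
GROWTH_RATE = 32
BLOCK_CONFIG = (6, 12, 48, 32)  # dense layers in denseblock1..denseblock4
NUM_INIT_FEATURES = 64          # output channels of conv0


def channels_after_each_block(growth_rate, block_config, num_init_features):
    """Return the channel count at the output of each dense block."""
    channels = num_init_features
    outputs = []
    for i, num_layers in enumerate(block_config):
        # Every dense layer concatenates `growth_rate` new channels.
        channels += num_layers * growth_rate
        outputs.append(channels)
        # A transition layer follows every block except the last one
        # and halves the number of channels.
        if i != len(block_config) - 1:
            channels //= 2
    return outputs


block_channels = channels_after_each_block(
    GROWTH_RATE, BLOCK_CONFIG, NUM_INIT_FEATURES
)
print(block_channels)  # [256, 512, 1792, 1920]
```

Under these assumptions the branch taken after denseblock3 sees 1792 channels, and the main path after denseblock4 sees 1920 channels.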

In addition, following the paper, we apply batch normalization, ReLU activation, and global average pooling to every branch.
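
To illustrate what this head does to a feature map, here is a small standalone sketch. The input tensor is a random, hypothetical stand-in for the denseblock3 output (1792 channels on a 14x14 grid, which is what a 224x224 image would give); it is not real data, and the head is built inline rather than taken from the model class.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the denseblock3 output:
# batch of 2, 1792 channels, 14x14 spatial grid.
feature_map = torch.randn(2, 1792, 14, 14)

head = nn.Sequential(
    nn.BatchNorm2d(1792),          # normalize each channel
    nn.ReLU(inplace=True),         # non-linearity
    nn.AdaptiveAvgPool2d((1, 1)),  # global average pooling -> (N, C, 1, 1)
    nn.Flatten(),                  # drop the 1x1 spatial dims -> (N, C)
)

feature_vector = head(feature_map)
print(feature_vector.shape)  # torch.Size([2, 1792])
```

Each image is reduced to a single 1792-dimensional vector, which is the input size a linear classifier on this branch would need.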

In [2]:
import torch
import torch.nn as nn
import torchvision.models as models

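class DenseNet201Variant2(nn.Module):
    """DenseNet-201 with a side branch after denseblock3 (variant 2).

    The side branch predicts the first- and second-level classes; the main
    path predicts the third-level classes.
    """

    def __init__(self, num_classes1, num_classes2, num_classes3):
        super().__init__()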
        
        # Load DenseNet-201 backbone without any weights
        densenet = models.densenet201(weights=None)
        features_list = list(densenet.features.children())

        # branch out after denseblock3 and before transition3
        self.branch_features = nn.Sequential(*features_list[:9])
        self.main_features = nn.Sequential(*features_list[9:])

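        # Head layers (batch norm + ReLU) for the branch and main features.
        # denseblock3 outputs 1792 channels; denseblock4/norm5 output 1920.
        self.branch_head = nn.Sequential(
            nn.BatchNorm2d(1792),
            nn.ReLU(inplace=True)
        )
        # Note: main_features already ends with norm5, so this applies a
        # second batch norm to the main path.
        self.main_head = nn.Sequential(
            nn.BatchNorm2d(1920),
            nn.ReLU(inplace=True)
        )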

        # Global average pooling converts feature maps into a feature vector.
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))

        # Fully connected layers for classification at each hierarchical level.
        self.fc_level1 = nn.Linear(1792, num_classes1)
        self.fc_level2 = nn.Linear(1792, num_classes2)
        self.fc_level3 = nn.Linear(1920, num_classes3)
    
    def forward(self, x):
        
        # Extract features for the main and branch.
        x_branch = self.branch_features(x)
        x_main = self.main_features(x_branch)
        
        # Process branch features.
        branch_feature = self.branch_head(x_branch)
        branch_feature = self.global_pool(branch_feature).view(x.size(0), -1)
        
        # Process main branch features.
        main_feature = self.main_head(x_main)
        main_feature = self.global_pool(main_feature).view(x.size(0), -1)
        
        # Get predictions from fully connected layers.
        class1 = self.fc_level1(branch_feature)
        class2 = self.fc_level2(branch_feature)
        class3 = self.fc_level3(main_feature)
        
        return class1, class2, class3

In [None]:
Next we will define 