<a href="https://colab.research.google.com/github/msmsd778/Fused_Tile_Partitioning/blob/main/FTP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Implementation of Fused Tile Partitioning (FTP), introduced in the paper "DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters"

https://ieeexplore.ieee.org/ielaam/43/8496924/8493499-aam.pdf

In [424]:
import torch
import torch.nn as nn
import numpy as np
from torchvision import models, transforms
from torch.autograd import Variable
from PIL import Image
import requests
import psutil

First, we define the required functions and modified classes. get_layer_properties takes a model and returns a list of (kernel size, stride, padding) tuples for its convolutional (nn.Conv2d) and max-pooling (nn.MaxPool2d) layers. It iterates over the model's direct children; for each matching layer it records the first element of each property when the property is a tuple, so square kernels are assumed. The skip_first_layer flag skips the first matching layer at each level. When a child is an nn.Sequential, the function recurses into it and appends the nested results. Children of any other container type (for example, ResNet's BasicBlock) are not descended into, so convolutions inside them are not recorded.

In [425]:
# Function to get layer properties (kernel size, stride, padding) for convolutional and pooling layers
def get_layer_properties(model):
    properties = []
    skip_first_layer = True  # Skipping the properties of the first layer
    for layer in model.children():
        if isinstance(layer, nn.Conv2d) or isinstance(layer, nn.MaxPool2d):
            if skip_first_layer:
                skip_first_layer = False
                continue
            kernel_size = layer.kernel_size[0] if isinstance(layer.kernel_size, tuple) else layer.kernel_size
            stride = layer.stride[0] if isinstance(layer.stride, tuple) else layer.stride
            padding = layer.padding[0] if isinstance(layer.padding, tuple) else layer.padding
            properties.append((kernel_size, stride, padding))
        elif isinstance(layer, nn.Sequential):
            properties.extend(get_layer_properties(layer))
    return properties


calculate_input_offsets takes the coordinates (x1, y1, x2, y2) of an output region, the model, and a 1-based layer index, and returns the region of that layer's input needed to compute it. The kernel size and stride come from entry layer_idx - 1 of the global layer_properties list, and the formulas follow Section IV of the DeepThings paper. For a convolutional layer, the lower bounds are stride * x1 - kernel_size // 2 and stride * y1 - kernel_size // 2, clamped at 0, and the upper bounds are stride * x2 + kernel_size // 2 and stride * y2 + kernel_size // 2. For a max-pooling layer (detected with an isinstance check against nn.MaxPool2d), the lower bounds are stride * x1 and stride * y1, and the upper bounds are stride * x2 + stride - 1 and stride * y2 + stride - 1. In this implementation the upper bounds are clamped against the first two fields of the layer's property tuple (kernel size and stride), which is where the width and height of the input feature map would be expected. The function returns four values: x1l_minus1, y1l_minus1, x2l_minus1, and y2l_minus1.

In [426]:
# Calculate the input region that layer `layer_idx` (1-based) needs for a given output region
def calculate_input_offsets(x1, y1, x2, y2, model, layer_idx):
    # layer_properties is a module-level list of (kernel_size, stride, padding) tuples
    props = layer_properties[layer_idx - 1]
    kernel_size, stride, _ = props
    # NOTE: the upper bounds are clamped with props[0] and props[1] (kernel size and stride);
    # the width and height of the input feature map would be the expected limits here.
    max_x, max_y = props[0] - 1, props[1] - 1

    if isinstance(model.features[layer_idx - 1], nn.MaxPool2d):
        # Pooling layer: each output pixel maps to a stride-sized window
        x1l_minus1 = stride * x1
        y1l_minus1 = stride * y1
        x2l_minus1 = min(stride * x2 + stride - 1, max_x)
        y2l_minus1 = min(stride * y2 + stride - 1, max_y)
    else:
        # Convolutional layer: extend the region by half a kernel on each side
        x1l_minus1 = max(0, stride * x1 - kernel_size // 2)
        y1l_minus1 = max(0, stride * y1 - kernel_size // 2)
        x2l_minus1 = min(stride * x2 + kernel_size // 2, max_x)
        y2l_minus1 = min(stride * y2 + kernel_size // 2, max_y)

    return x1l_minus1, y1l_minus1, x2l_minus1, y2l_minus1
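
The backward step can also be tried out on plain integers. The version below is a minimal, dependency-free sketch rather than the notebook's function: it assumes the upper bounds are clamped to the width and height of the layer's input feature map, passed in as the hypothetical parameters in_w and in_h. The name backward_region and its is_pool flag are introduced here for illustration only.

```python
def backward_region(x1, y1, x2, y2, kernel_size, stride, in_w, in_h, is_pool=False):
    """Map an inclusive output region (x1, y1, x2, y2) to the input region it depends on.

    in_w and in_h are the width and height of the layer's input feature map
    (assumed parameters, not part of the notebook's function).
    """
    if is_pool:
        # Pooling: each output pixel is assumed to cover a stride-sized window.
        return (
            stride * x1,
            stride * y1,
            min(stride * x2 + stride - 1, in_w - 1),
            min(stride * y2 + stride - 1, in_h - 1),
        )
    half = kernel_size // 2
    # Convolution: extend by half a kernel on each side, clamped to the input.
    return (
        max(0, stride * x1 - half),
        max(0, stride * y1 - half),
        min(stride * x2 + half, in_w - 1),
        min(stride * y2 + half, in_h - 1),
    )


# A 3x3, stride-1 convolution on a 16x16 input: the region grows by one pixel per side.
conv_region = backward_region(4, 4, 7, 7, kernel_size=3, stride=1, in_w=16, in_h=16)
# A stride-2 pooling layer on a 32x32 input: each output pixel covers a 2x2 block.
pool_region = backward_region(4, 4, 7, 7, kernel_size=2, stride=2, in_w=32, in_h=32, is_pool=True)
print(conv_region, pool_region)
```

Applying such a function layer by layer, from the last layer back to the first, gives the input tile that one output tile depends on.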

partition_image_with_ftp splits an input image tensor of shape [batch, channels, H, W] into an N-by-M grid (N rows, M columns) of overlapping tiles. For each grid cell it computes the cell boundaries, widens them by the global overlap (in pixels), and clamps them to the image. It then walks backward through layer_properties, calling calculate_input_offsets once per layer to map the region back to input coordinates. The resulting region is widened by overlap once more, clamped, and sliced out of the image tensor. The function returns the tiles as a flat list in row-major order.

In [427]:
# Function to partition the image into a grid with FTP
def partition_image_with_ftp(image, M, N, model):
    _, _, H, W = image.shape

    # Calculate output offsets
    x1L = [(W * j) // M for j in range(M)]
    y1L = [(H * i) // N for i in range(N)]
    x2L = [(W * (j + 1)) // M for j in range(M)]
    y2L = [(H * (i + 1)) // N for i in range(N)]

    partitions = []
    for i in range(N):
        for j in range(M):
            x1L[j] = max((W * j) // M - overlap, 0)
            y1L[i] = max((H * i) // N - overlap, 0)
            x2L[j] = min((W * (j + 1)) // M + overlap, W)
            y2L[i] = min((H * (i + 1)) // N + overlap, H)

            # Recursive backward traversal to calculate required tile region
            x1, y1, x2, y2 = x1L[j], y1L[i], x2L[j] - 1, y2L[i] - 1
            for l in range(len(layer_properties), 0, -1):
                x1, y1, x2, y2 = calculate_input_offsets(x1, y1, x2, y2, model, l)

            # Calculate start and end points with overlap
            start_h = max(y1 - overlap, 0)
            end_h = min(y2 + overlap, H)
            start_w = max(x1 - overlap, 0)
            end_w = min(x2 + overlap, W)
            partition = image[:, :, start_h:end_h, start_w:end_w]
            partitions.append(partition)

    return partitions
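
To see what slice sizes this arithmetic produces, it can be replayed along one axis with plain integers. This is a sketch under one assumption: the backward traversal leaves the coordinates unchanged, which is the case when layer_properties is empty. axis_slices is a hypothetical helper, not part of the notebook.

```python
def axis_slices(size, parts, overlap):
    """Replay the notebook's start/end arithmetic along one image axis."""
    slices = []
    for k in range(parts):
        # Grid cell boundaries, widened by the overlap and clamped to the axis.
        lo = max((size * k) // parts - overlap, 0)
        hi = min((size * (k + 1)) // parts + overlap, size)
        # Inclusive coordinates, as passed to the backward traversal.
        first, last = lo, hi - 1
        # Second widening and clamping, applied just before slicing.
        start = max(first - overlap, 0)
        end = min(last + overlap, size)
        slices.append((start, end))
    return slices


slices = axis_slices(size=224, parts=4, overlap=10)
widths = [end - start for start, end in slices]
print(slices)
print(widths)
```

For a 224-pixel axis split four ways with overlap = 10, this yields widths of 75, 95, 95, and 76, which are the side lengths that appear in the partition shapes printed at the end of the notebook.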

ModifiedResNet18 wraps the original ResNet18 without its last two children (average pooling and the fully connected layer). It inherits from nn.Module, stores the remaining layers in an nn.Sequential (self.features), and applies them in forward. The output is therefore the final convolutional feature map rather than class scores.

In [428]:
# Modified ResNet Model to allow intermediate outputs
class ModifiedResNet18(nn.Module):
    def __init__(self, original_model):
        super(ModifiedResNet18, self).__init__()
        self.features = nn.Sequential(*list(original_model.children())[:-2])

    def forward(self, x):
        x = self.features(x)
        return x


RemainingLayers is the counterpart of ModifiedResNet18: an nn.Module that keeps the average pooling (avgpool) and fully connected (fc) layers of the original model. Its forward method applies average pooling, flattens the result, and passes it through the fully connected layer, turning a feature map into class scores.

In [429]:
class RemainingLayers(nn.Module):
    def __init__(self, original_model):
        super(RemainingLayers, self).__init__()
        self.avgpool = original_model.avgpool
        self.fc = original_model.fc

    def forward(self, x):
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

The next cell creates a global dictionary, intermediate_outputs, and defines forward_pass_with_reuse, which takes the model and a list of input partitions. For each partition it prints the shape, then checks whether an output for that partition index (idx) is already stored in intermediate_outputs. If so, it reuses the stored output; otherwise it runs the model on the partition and stores the result. The function returns one output per partition. The cache is keyed only by the index, so it must be cleared (intermediate_outputs.clear()) before processing a different image or a different grid.

In [430]:
intermediate_outputs = {}

# Modified function to perform forward pass and store intermediate outputs
def forward_pass_with_reuse(model, partitions):
    outputs = []
    for idx, partition in enumerate(partitions):
        print(f"Partition {idx} shape: {partition.shape}")
        if idx in intermediate_outputs:
            output = intermediate_outputs[idx]
        else:
            output = model(partition)
            intermediate_outputs[idx] = output
        outputs.append(output)
    return outputs
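
Because the cache is keyed by a partition's position rather than its content, a second call reuses earlier results even when the partitions have changed. The dependency-free sketch below illustrates this with a stand-in model (a plain function that records its calls). run_with_cache and fake_model are hypothetical names, and passing the cache explicitly is a choice made for the sketch, not something the notebook does.

```python
def run_with_cache(model, partitions, cache):
    """Return one output per partition, reusing cache[idx] when it exists."""
    outputs = []
    for idx, partition in enumerate(partitions):
        if idx not in cache:
            cache[idx] = model(partition)
        outputs.append(cache[idx])
    return outputs


calls = []

def fake_model(x):
    # Stand-in for a network: records the call and doubles the input.
    calls.append(x)
    return 2 * x


cache = {}
first = run_with_cache(fake_model, [1, 2, 3], cache)
second = run_with_cache(fake_model, [10, 20, 30], cache)  # same indices, new data
print(first, second, len(calls))

# Clearing the cache forces recomputation for the new input.
cache.clear()
third = run_with_cache(fake_model, [10, 20, 30], cache)
print(third)
```

Here the second call returns the outputs computed for the first input, and only the call after cache.clear() reflects the new data.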


The next two snippets check whether a GPU is available and select the device accordingly. On a GPU (cuda), the second snippet prints GPU information with !nvidia-smi. On a CPU (cpu), it uses the psutil library to print the total, available, and used system memory, along with the usage percentage.

In [431]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

In [432]:
if device.type == 'cuda':
  !nvidia-smi
elif device.type == 'cpu':
  cpu_memory = psutil.virtual_memory()
  print(f"Total CPU Memory: {cpu_memory.total / (1024 ** 3):.2f} GB")
  print(f"Available CPU Memory: {cpu_memory.available / (1024 ** 3):.2f} GB")
  print(f"Used CPU Memory: {cpu_memory.used / (1024 ** 3):.2f} GB")
  print(f"CPU Memory Usage Percentage: {cpu_memory.percent:.2f}%")

Total CPU Memory: 12.68 GB
Available CPU Memory: 11.48 GB
Used CPU Memory: 0.92 GB
CPU Memory Usage Percentage: 9.40%


calculate_grid_size(device) chooses the grid size from the device's total memory. On a GPU it returns (2, 2) if the total memory is below 1200 MiB; on a CPU it returns (2, 2) if the total system memory is below 4 GiB. In all other cases it returns the default grid size of (4, 4). Devices with less memory therefore get fewer, larger partitions.

In [433]:
def calculate_grid_size(device):
    if device.type == 'cuda':
        gpu_memory = torch.cuda.get_device_properties(0).total_memory
        threshold_gpu_memory = 1200 * 1024**2 # Setting GPU threshold to 1200 MiB
        if gpu_memory < threshold_gpu_memory:
            return 2, 2
    elif device.type == 'cpu':
        cpu_memory = psutil.virtual_memory().total
        threshold_cpu_memory = 4 * 1024**3  # Setting CPU threshold to 4 GiB
        if cpu_memory < threshold_cpu_memory:
            return 2, 2

    # Default grid size
    return 4, 4
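
The decision rule itself does not depend on torch or psutil, so it can be exercised in isolation. The following is a minimal sketch, assuming the total memory is passed in as a byte count; grid_size_for_memory is a hypothetical helper, and the thresholds are the ones used above.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def grid_size_for_memory(total_bytes, device_type):
    """Return (M, N): a 2x2 grid below the device's threshold, otherwise 4x4."""
    thresholds = {"cuda": 1200 * MIB, "cpu": 4 * GIB}
    threshold = thresholds.get(device_type)
    if threshold is not None and total_bytes < threshold:
        return 2, 2
    return 4, 4


print(grid_size_for_memory(2 * GIB, "cpu"))      # below the 4 GiB CPU threshold
print(grid_size_for_memory(12 * GIB, "cpu"))     # above the CPU threshold
print(grid_size_for_memory(1000 * MIB, "cuda"))  # below the 1200 MiB GPU threshold
```

Separating the rule from the memory query makes it easy to check the thresholds without access to a small device.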

Next, we load the pre-trained ResNet18 model, move it to the selected device, wrap it in the ModifiedResNet18 class, and print the modified model to show its architecture.

In [434]:
original_model = models.resnet18(pretrained=True).to(device)
modified_model = ModifiedResNet18(original_model).to(device)
print(modified_model)

ModifiedResNet18(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_s

Finally, the remaining snippets download an image and run it through the partitioned pipeline. They extract the layer properties, preprocess the image, partition it with FTP, wrap the modified model in nn.DataParallel, and run the forward pass with output reuse. The partition outputs are averaged, the remaining layers turn the mean feature map into class scores, and the predicted label is the class with the highest softmax probability.

In [435]:
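# Get layer properties for modified_model. get_layer_properties already skips the stem
# convolution, and [1:] drops the next entry (the max-pooling layer) as well. No other
# layers are collected for this model, so the resulting list is empty.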
layer_properties = get_layer_properties(modified_model)[1:]

In [436]:
pip install wget



In [437]:
import wget

image_url = 'https://media.istockphoto.com/id/877369086/photo/lion-panthera-leo-10-years-old-isolated-on-white.jpg?s=612x612&w=0&k=20&c=J__Jx_BX_FN7iehO965TJtPFYUl0A-bwFgIYaK32R3Y='
# image_url = 'https://i.guim.co.uk/img/media/c67da314f21e43b027db4fd9525ab4047cd5d358/76_188_1940_1164/master/1940.jpg?width=1200&height=900&quality=85&auto=format&fit=crop&s=76e6bdd3a91c0313c698cabd7c1e361f'
image_path = wget.download(image_url)

In [438]:
imagenet_labels_url = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
imagenet_labels = requests.get(imagenet_labels_url).json()

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

modified_model.eval()
original_model.eval()

image = Image.open(image_path).convert("RGB")
image = transform(image)
image = image.unsqueeze(0)  # add the batch dimension

overlap = 10
M, N = calculate_grid_size(device)
partitions = partition_image_with_ftp(image, M, N, modified_model)



# For multiple edge devices/nodes with NVIDIA GPUs; additional setup beyond this is required.
# parallel_model = nn.parallel.DistributedDataParallel(
#     modified_model, device_ids=[torch.cuda.current_device()]
# )



# For multiple GPUs on a single machine; used here to demonstrate the implementation in Colab
parallel_model = nn.DataParallel(modified_model)

# Run each partition of the transformed image through the feature extractor
with torch.no_grad():
    outputs = forward_pass_with_reuse(parallel_model, partitions)

remaining_layers = RemainingLayers(original_model)
remaining_layers.eval()

mean_output = torch.stack(outputs).mean(dim=0)
final_output = remaining_layers(mean_output)

softmax = nn.Softmax(dim=1)
probabilities = softmax(final_output)

predicted_class = torch.argmax(probabilities, dim=1)
predicted_label = imagenet_labels[predicted_class.item()]

print(f"Predicted Class: {predicted_label}")

Partition 0 shape: torch.Size([1, 3, 75, 75])
Partition 1 shape: torch.Size([1, 3, 75, 95])
Partition 2 shape: torch.Size([1, 3, 75, 95])
Partition 3 shape: torch.Size([1, 3, 75, 76])
Partition 4 shape: torch.Size([1, 3, 95, 75])
Partition 5 shape: torch.Size([1, 3, 95, 95])
Partition 6 shape: torch.Size([1, 3, 95, 95])
Partition 7 shape: torch.Size([1, 3, 95, 76])
Partition 8 shape: torch.Size([1, 3, 95, 75])
Partition 9 shape: torch.Size([1, 3, 95, 95])
Partition 10 shape: torch.Size([1, 3, 95, 95])
Partition 11 shape: torch.Size([1, 3, 95, 76])
Partition 12 shape: torch.Size([1, 3, 76, 75])
Partition 13 shape: torch.Size([1, 3, 76, 95])
Partition 14 shape: torch.Size([1, 3, 76, 95])
Partition 15 shape: torch.Size([1, 3, 76, 76])
Predicted Class: lion


The output above lists the shape of each partition. Each shape has four entries: [batch size, number of channels, height, width].
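
Averaging the partition outputs with torch.stack only works if every partition yields a feature map of the same shape. One way to check this is to push the side lengths through the standard output-size formula floor((n + 2p - k) / s) + 1 for each downsampling stage. The sketch below assumes the five stride-2 stages of torchvision's ResNet18 feature extractor (the stem convolution, the max-pooling layer, and the first convolution of layer2, layer3, and layer4) and ignores the stride-1 3x3 convolutions, which preserve the size with padding 1. out_size and feature_side are hypothetical helpers.

```python
def out_size(n, kernel, stride, padding):
    """Output length of a conv/pool layer along one axis (dilation 1, floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the size-changing stages, in order (assumed from ResNet18).
DOWNSAMPLING_STAGES = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]

def feature_side(n):
    """Side length of the final feature map for an input side of n pixels."""
    for kernel, stride, padding in DOWNSAMPLING_STAGES:
        n = out_size(n, kernel, stride, padding)
    return n


sides = {n: feature_side(n) for n in (75, 76, 95, 224)}
print(sides)
```

Under these assumptions the three partition side lengths 75, 76, and 95 all map to 3, so every partition output has a 3x3 spatial size and the outputs can be stacked, while the full 224-pixel image maps to 7.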