Increasing the number of input features (`in_channels`) makes each filter deeper (slices `w0[:, :, 0]`, `w0[:, :, 1]`), and increasing the number of output features (`out_channels`) increases the number of filters (`w0`, `w1`). Because a layer holds `out_channels × in_channels × kernel_height × kernel_width` weights, doubling both feature counts quadruples the number of weights and, for a fixed output size, the amount of computation.

![](./2-filters/2025-02-19%2016.24.21.gif)

Reference:
1. https://cs231n.github.io/convolutional-networks/#conv
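
The quadrupling can be checked with plain arithmetic. The sketch below assumes a standard (non-grouped) convolution without bias and counts one multiply-accumulate per weight per output position. The helper name `conv2d_cost` and the 32×32 output size are illustrative choices, not taken from the reference.

```python
def conv2d_cost(in_channels, out_channels, kernel_size, out_h, out_w):
    """Return (weights, multiply-accumulates) for a bias-free standard conv."""
    # Each of the out_channels filters spans all in_channels.
    weights = out_channels * in_channels * kernel_size * kernel_size
    # Every weight is used once per output position.
    macs = weights * out_h * out_w
    return weights, macs

# Hypothetical 32x32 output feature map, 3x3 kernel.
base = conv2d_cost(16, 32, 3, 32, 32)
wider_in = conv2d_cost(32, 32, 3, 32, 32)   # only in_channels doubled
wider_out = conv2d_cost(16, 64, 3, 32, 32)  # only out_channels doubled
both = conv2d_cost(32, 64, 3, 32, 32)       # both doubled

print("in_channels x2 :", wider_in[1] / base[1])
print("out_channels x2:", wider_out[1] / base[1])
print("both x2        :", both[1] / base[1])
```

Doubling either count alone doubles the cost; the factor of four appears only when both are doubled.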

In [1]:
import torch
import torch.nn as nn

# Define kernel size for demonstration
kernel_size = 3

# Original convolution layer:
# in_channels = 16, out_channels = 32.
conv_original = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=kernel_size)
params_original = conv_original.weight.numel()
print("Original conv weight shape:", conv_original.weight.shape)
print("Number of parameters in original conv:", params_original)

# Doubling both the in_channels and out_channels:
# in_channels becomes 32, out_channels becomes 64.
conv_doubled = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=kernel_size)
params_doubled = conv_doubled.weight.numel()
print("Doubled conv weight shape:", conv_doubled.weight.shape)
print("Number of parameters in doubled conv:", params_doubled)

# Show how the parameter count quadruples (weights only, bias excluded):
expected_params_doubled = 4 * params_original
print("Expected parameters (4x original):", expected_params_doubled)
print("Parameter count quadrupled:", params_doubled == expected_params_doubled)

Original conv weight shape: torch.Size([32, 16, 3, 3])
Number of parameters in original conv: 4608
Doubled conv weight shape: torch.Size([64, 32, 3, 3])
Number of parameters in doubled conv: 18432
Expected parameters (4x original): 18432
Parameter count quadrupled: True


In [None]:
import torch
import torch.nn as nn

kernel_size = 3

# --- Demonstration of "makes the filters deeper" ---
# Here, each filter must cover each input channel.
# Original convolution with 3 input channels (e.g., an RGB image).
conv_standard = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=kernel_size)
print("Standard filter shape (3 input channels, 16 filters):", conv_standard.weight.shape)
# Shape: (16, 3, 3, 3)

# Increasing the number of input channels to 6. 
# Now, each filter becomes deeper to cover all 6 input channels.
conv_deeper_filters = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=kernel_size)
print("Deeper filter shape (6 input channels, 16 filters):", conv_deeper_filters.weight.shape)
# Shape: (16, 6, 3, 3)

# --- Demonstration of "increases the number of filters" ---
# Here, we increase the number of output channels.
# conv_standard above has 16 filters; with out_channels=32 we get 32 filters,
# each still 3 channels deep.
conv_more_filters = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=kernel_size)
print("More filters shape (3 input channels, 32 filters):", conv_more_filters.weight.shape)
# Shape: (32, 3, 3, 3)

Standard filter shape (3 input channels, 16 filters): torch.Size([16, 3, 3, 3])
Deeper filter shape (6 input channels, 16 filters): torch.Size([16, 6, 3, 3])
More filters shape (3 input channels, 32 filters): torch.Size([32, 3, 3, 3])


# Depthwise Separable Convolution

![](./2-filters/depthwise.png)
![](./2-filters/depthwise-separable-convolution.png)

Reference:
1. https://www.youtube.com/watch?v=vVaRhZXovbw
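
As a minimal sketch of how the two stages fit together, the block below chains a depthwise and a pointwise `nn.Conv2d` and checks that the result has the same output shape as a single standard convolution. The channel counts, the 8×8 input, and `padding=1` are assumptions made for illustration.

```python
import torch
import torch.nn as nn

in_channels, out_channels, kernel_size = 4, 16, 3

# Depthwise stage: groups=in_channels, so each channel gets its own 3x3 filter.
# Pointwise stage: a 1x1 conv mixes the channels and sets the output width.
depthwise_separable = nn.Sequential(
    nn.Conv2d(in_channels, in_channels, kernel_size, padding=1, groups=in_channels),
    nn.Conv2d(in_channels, out_channels, kernel_size=1),
)

# Standard convolution with the same input and output channel counts.
standard = nn.Conv2d(in_channels, out_channels, kernel_size, padding=1)

x = torch.randn(1, in_channels, 8, 8)  # (batch, channels, height, width)
y_separable = depthwise_separable(x)
y_standard = standard(x)

print("Separable output shape:", y_separable.shape)
print("Standard output shape: ", y_standard.shape)
```

The two layers are interchangeable in terms of shapes, but they do not compute the same function: the separable version is a cheaper, more constrained replacement.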

In [None]:
import torch
import torch.nn as nn

kernel_size = 3

# --- Depthwise Convolution ---
# Here, in_channels=4 with groups=4 means each channel is convolved independently.
depthwise_conv = nn.Conv2d(in_channels=4, out_channels=4, kernel_size=kernel_size, groups=4)
print("Depthwise conv weight shape:", depthwise_conv.weight.shape)
# Expected shape: (4, 1, 3, 3) – one filter per input channel
#                 (number of filters, filter depth, kernel height, kernel width)

# --- Pointwise Convolution ---
# A 1x1 convolution that mixes the channels, increasing the number of filters.
# This example mixes 4 channels and produces 16 output channels.
pointwise_conv = nn.Conv2d(in_channels=4, out_channels=16, kernel_size=1)
print("Pointwise conv weight shape:", pointwise_conv.weight.shape)
# Expected shape: (16, 4, 1, 1)

Depthwise conv weight shape: torch.Size([4, 1, 3, 3])
Pointwise conv weight shape: torch.Size([16, 4, 1, 1])
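
The filter depth of 1 in the depthwise case follows from how `groups` splits the input channels: each filter only sees `in_channels / groups` channels. The helper below is a hypothetical pure-Python sketch of that shape rule for square kernels; `conv2d_weight_shape` is not a PyTorch function.

```python
def conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Weight shape of a 2D conv with a square kernel:
    (number of filters, filter depth, kernel height, kernel width)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Each filter spans only the input channels of its own group.
    return (out_channels, in_channels // groups, kernel_size, kernel_size)

print("Depthwise:", conv2d_weight_shape(4, 4, 3, groups=4))
print("Pointwise:", conv2d_weight_shape(4, 16, 1))
print("Standard: ", conv2d_weight_shape(4, 4, 3))
```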


In [27]:
import torch
import torch.nn as nn

kernel_size = 3

# --- Standard Convolution ---
standard_conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=kernel_size)
print("Standard conv weight shape:", standard_conv.weight.shape)

# --- Depthwise Convolution ---
depthwise_conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=kernel_size, groups=8)
print("Depthwise conv weight shape:", depthwise_conv.weight.shape)

# --- Pointwise Convolution ---
pointwise_conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=1)
print("Pointwise conv weight shape:", pointwise_conv.weight.shape)

Standard conv weight shape: torch.Size([8, 8, 3, 3])
Depthwise conv weight shape: torch.Size([8, 1, 3, 3])
Pointwise conv weight shape: torch.Size([8, 8, 1, 1])


In [32]:
import torch
import torch.nn as nn

kernel_size = 3
in_channels = 8
out_channels = 8

# --- Standard Convolution ---
# This applies out_channels independent filters each spanning all in_channels.
standard_conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size)
standard_params = standard_conv.weight.numel()

print("Standard Convolution:")
print("Weight shape:", standard_conv.weight.shape)
print("Total parameters:", standard_params)
# Weight shape: (8, 8, 3, 3)

# --- Depthwise Separable Convolution ---
# Step 1: Depthwise Convolution: groups = in_channels to convolve each channel independently.
depthwise_conv = nn.Conv2d(in_channels=in_channels, out_channels=in_channels, kernel_size=kernel_size, groups=in_channels)
depthwise_params = depthwise_conv.weight.numel()

# Step 2: Pointwise Convolution: 1x1 convolution to mix channels.
pointwise_conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1)
pointwise_params = pointwise_conv.weight.numel()

total_ds_params = depthwise_params + pointwise_params

print("\nDepthwise Separable Convolution:")
print("Depthwise weight shape:", depthwise_conv.weight.shape)
print("Pointwise weight shape:", pointwise_conv.weight.shape)
print("Total parameters:", total_ds_params)

# --- Comparison ---
print("\nParameter reduction factor: {:.2f}".format(standard_params / total_ds_params))

Standard Convolution:
Weight shape: torch.Size([8, 8, 3, 3])
Total parameters: 576

Depthwise Separable Convolution:
Depthwise weight shape: torch.Size([8, 1, 3, 3])
Pointwise weight shape: torch.Size([8, 8, 1, 1])
Total parameters: 136

Parameter reduction factor: 4.24
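
The reduction factor generalizes beyond the 8-channel case. Ignoring biases (as `weight.numel()` does above), a standard convolution has `C_out * C_in * K^2` weights and the separable version has `C_in * K^2 + C_in * C_out`, so the ratio is `K^2 * C_out / (K^2 + C_out)`, which stays below `K^2`. The function name `separable_reduction` and the channel counts tried below are illustrative assumptions.

```python
def separable_reduction(in_channels, out_channels, kernel_size):
    """Weight-count ratio: standard conv / depthwise separable conv (no bias)."""
    standard = out_channels * in_channels * kernel_size ** 2
    depthwise = in_channels * kernel_size ** 2   # one K x K filter per input channel
    pointwise = out_channels * in_channels       # 1x1 conv that mixes channels
    return standard / (depthwise + pointwise)

# The saving grows with the channel count but is bounded by K^2 (9 for a 3x3 kernel).
for channels in (8, 64, 512):
    factor = separable_reduction(channels, channels, 3)
    print(f"{channels:>3} channels: {factor:.2f}x fewer weights")
```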
