# 1. Implement average pooling through a convolution.

In [6]:
import torch
import torch.nn.functional as F

def average_pooling_conv(input_tensor, pool_size):
    # Input layout is (batch, channels, height, width)
    channels = input_tensor.size(1)

    # One averaging kernel per channel: every weight is 1 / (pool_size ** 2).
    # Match the input's dtype and device so conv2d accepts the kernel.
    kernel = torch.ones(
        (channels, 1, pool_size, pool_size),
        dtype=input_tensor.dtype, device=input_tensor.device,
    ) / (pool_size * pool_size)

    # groups=channels applies each kernel to its own channel only (depthwise);
    # stride=pool_size makes the pooling windows non-overlapping
    output_tensor = F.conv2d(input_tensor, kernel, stride=pool_size, padding=0, groups=channels)

    return output_tensor

# Example usage
input_tensor = torch.randn(1, 1, 6, 6)  # Batch size of 1, 1 channel, 6x6 input
pool_size = 2
output_tensor = average_pooling_conv(input_tensor, pool_size)
print(input_tensor)
print(output_tensor)

tensor([[[[-0.2548, -0.9004,  1.2047, -0.3124,  0.6214,  0.5526],
          [-0.0407, -0.2220, -0.1048, -0.3434, -0.2535,  2.1091],
          [ 0.7223, -0.4832, -1.2391,  0.2195, -1.2479,  0.7798],
          [-0.7320, -1.7425,  0.1385,  1.4043,  0.1163, -0.2561],
          [-0.5119, -0.8785, -0.7798, -1.1799, -1.0041,  0.3349],
          [-0.4481, -0.0313, -1.8601, -0.4983,  0.5341, -0.3495]]]])
tensor([[[[-0.3545,  0.1110,  0.7574],
          [-0.5589,  0.1308, -0.1520],
          [-0.4674, -1.0795, -0.1211]]]])
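
As a sanity check, the convolution-based version can be compared against PyTorch's built-in `F.avg_pool2d`. The sketch below is self-contained, so it redefines the pooling function under the hypothetical name `avg_pool_via_conv`; the seed and the input shape are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

def avg_pool_via_conv(x, pool_size):
    # Depthwise convolution with a constant kernel of value 1 / pool_size**2
    channels = x.shape[1]
    kernel = torch.full(
        (channels, 1, pool_size, pool_size),
        1.0 / pool_size ** 2,
        dtype=x.dtype, device=x.device,
    )
    return F.conv2d(x, kernel, stride=pool_size, groups=channels)

torch.manual_seed(0)                 # arbitrary seed, only for reproducibility
x = torch.randn(2, 3, 6, 6)          # 2 images, 3 channels, 6x6
result = avg_pool_via_conv(x, 2)
reference = F.avg_pool2d(x, kernel_size=2, stride=2)

# Both paths should agree up to floating-point rounding
matches = torch.allclose(result, reference, atol=1e-6)
print(result.shape, matches)
```

Using more than one channel here also exercises the `groups=channels` argument, which the single-channel example does not.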


# 2. Prove that max-pooling cannot be implemented through a convolution alone.

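A convolution (without bias) is a linear map: for a fixed kernel and any two inputs $x$ and $y$, $conv(x+y)=conv(x)+conv(y)$. Max-pooling does not have this property. Take a pooling window that holds two values: for $x=(1,0)$ and $y=(0,1)$ we get $max(x)+max(y)=1+1=2$, but $max(x+y)=max(1,1)=1$. Since max-pooling is not additive, no choice of kernel weights can reproduce it. Adding a bias does not help: it makes the convolution affine, and because max-pooling maps the zero input to zero, an affine map that matched it would have to be additive as well. Max-pooling therefore cannot be implemented through a convolution alone; some non-linearity is required.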
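
The linearity argument can also be checked numerically. The following is a minimal sketch: the random kernel `w` and the two hand-picked windows `a` and `b` are illustrative assumptions, not part of the exercise.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 4, 4)
y = torch.randn(1, 1, 4, 4)
w = torch.randn(1, 1, 2, 2)  # arbitrary convolution kernel

def conv(t):
    return F.conv2d(t, w, stride=2)

def pool(t):
    return F.max_pool2d(t, kernel_size=2)

# Convolution is additive: conv(x + y) == conv(x) + conv(y)
conv_additive = torch.allclose(conv(x + y), conv(x) + conv(y), atol=1e-5)

# Max-pooling is not: a single 2x2 window serves as a counterexample
a = torch.tensor([[[[1.0, 0.0], [0.0, 0.0]]]])
b = torch.tensor([[[[0.0, 1.0], [0.0, 0.0]]]])
pool_additive = torch.equal(pool(a + b), pool(a) + pool(b))

print(conv_additive, pool_additive)
```

Here `pool(a)` and `pool(b)` are both 1, so their sum is 2, while `pool(a + b)` is 1.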

# 3. Max-pooling can be accomplished using ReLU operations, i.e., $ReLU(x)=max(0,x)$.

## 3.1 Express $max(a,b)$ by using only ReLU operations.

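$max(a,b) = ReLU(a-b)+b$

The trailing $b$ is not yet a ReLU term. Since $b = ReLU(b) - ReLU(-b)$, an expression that uses only ReLU outputs is

$max(a,b) = ReLU(a-b) + ReLU(b) - ReLU(-b)$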
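
A quick numerical check of the identity, as a sketch on random inputs. It tests both the form $ReLU(a-b)+b$ and a variant in which $b$ itself is rewritten as $ReLU(b)-ReLU(-b)$, so that only ReLU outputs are combined; the variable names are illustrative.

```python
import torch

torch.manual_seed(0)
a = torch.randn(1000)
b = torch.randn(1000)

expected = torch.maximum(a, b)

# Form with a bare b term
mixed = torch.relu(a - b) + b

# Form built from ReLU outputs only, using b = ReLU(b) - ReLU(-b)
relu_only = torch.relu(a - b) + torch.relu(b) - torch.relu(-b)

mixed_ok = torch.allclose(mixed, expected, atol=1e-6)
relu_only_ok = torch.allclose(relu_only, expected, atol=1e-6)
print(mixed_ok, relu_only_ok)
```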

## 3.2 Use this to implement max-pooling by means of convolutions and ReLU layers.
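
One possible construction for a $2\times 2$ window with stride 2 is sketched below. It applies $max(a,b) = ReLU(a-b) + ReLU(b) - ReLU(-b)$ twice: first to the two rows of the window, then to the two row maxima. The function name `max_pool_2x2_via_conv_relu` and the layer layout (a $2\times 2$ convolution with 6 channels, a $1\times 1$ convolution with 3 channels, and a final $1\times 1$ convolution, with a ReLU after each of the first two) are assumptions of this sketch, and it is not claimed to be the smallest possible network.

```python
import torch
import torch.nn.functional as F

def max_pool_2x2_via_conv_relu(x):
    batch, channels, height, width = x.shape
    # Treat every channel as its own single-channel image
    x = x.reshape(batch * channels, 1, height, width)
    kw = dict(dtype=x.dtype, device=x.device)

    # Layer 1: 2x2 conv, stride 2. Window entries in row-major order: a, b, c, d
    w1 = torch.tensor([
        [1., -1., 0., 0.],   # a - b
        [0., 1., 0., 0.],    # b
        [0., -1., 0., 0.],   # -b
        [0., 0., 1., -1.],   # c - d
        [0., 0., 0., 1.],    # d
        [0., 0., 0., -1.],   # -d
    ], **kw).reshape(6, 1, 2, 2)
    h1 = F.relu(F.conv2d(x, w1, stride=2))

    # Layer 2: 1x1 conv. With m1 = max(a, b) = h1[0] + h1[1] - h1[2]
    # and m2 = max(c, d) = h1[3] + h1[4] - h1[5], emit m1 - m2, m2, -m2
    w2 = torch.tensor([
        [1., 1., -1., -1., -1., 1.],   # m1 - m2
        [0., 0., 0., 1., 1., -1.],     # m2
        [0., 0., 0., -1., -1., 1.],    # -m2
    ], **kw).reshape(3, 6, 1, 1)
    h2 = F.relu(F.conv2d(h1, w2))

    # Layer 3: 1x1 conv computing ReLU(m1 - m2) + ReLU(m2) - ReLU(-m2) = max(m1, m2)
    w3 = torch.tensor([[1., 1., -1.]], **kw).reshape(1, 3, 1, 1)
    out = F.conv2d(h2, w3)
    return out.reshape(batch, channels, *out.shape[-2:])

torch.manual_seed(0)
x = torch.randn(2, 3, 6, 6)
result = max_pool_2x2_via_conv_relu(x)
reference = F.max_pool2d(x, kernel_size=2, stride=2)
matches = torch.allclose(result, reference, atol=1e-5)
print(result.shape, matches)
```

The final layer is a plain linear combination with no ReLU after it, since the pooled maximum may be negative.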



## 3.3 How many channels and layers do you need for a $2\times 2$ convolution? How many for a $3\times 3$ convolution?



# 4. What is the computational cost of the pooling layer? Assume that the input to the pooling layer is of size $c\times h\times w$, the pooling window has a shape of $p_h\times p_w$ with a padding of $(p_h, p_w)$ and a stride of $(s_h, s_w)$.



# 5. Why do you expect max-pooling and average pooling to work differently?



# 6. Do we need a separate minimum pooling layer? Can you replace it with another operation?
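
One candidate replacement follows from the identity $min(a,b) = -max(-a,-b)$: negate the input, apply max-pooling, and negate the result. The sketch below assumes this approach; `min_pool2d` is a hypothetical helper, not a PyTorch function, and the check against an explicit window minimum uses `Tensor.unfold` purely for verification.

```python
import torch
import torch.nn.functional as F

def min_pool2d(x, kernel_size, stride=None, padding=0):
    # min over a window equals the negated max over the negated window
    return -F.max_pool2d(-x, kernel_size, stride=stride, padding=padding)

torch.manual_seed(0)
x = torch.randn(1, 2, 4, 4)
result = min_pool2d(x, kernel_size=2)

# Reference: cut the input into non-overlapping 2x2 windows and take the minimum of each
windows = x.unfold(2, 2, 2).unfold(3, 2, 2)   # shape (1, 2, 2, 2, 2, 2)
reference = windows.amin(dim=(-1, -2))

matches = torch.allclose(result, reference)
print(result.shape, matches)
```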



# 7. We could use the softmax operation for pooling. Why might it not be so popular?