The PyTorch Profiler traces execution time and memory usage on the CPU and GPU during training and inference.

In [None]:
import torch
import torch.nn as nn
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

model = models.resnet18()
inputs = torch.randn(5, 3, 224, 224)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)
    
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
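
The cell above sorts by CPU time only. To see memory as well, the profiler can be asked to record allocations. The following is a minimal sketch, assuming a small stand-in model (the `nn.Sequential` network and its layer sizes are illustrative and not part of the original notebook) so that it runs quickly on CPU:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Small stand-in model (hypothetical; chosen only to keep the run fast).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
inputs = torch.randn(32, 128)

# profile_memory=True asks the profiler to record tensor allocations and frees.
with profile(
    activities=[ProfilerActivity.CPU],
    profile_memory=True,
    record_shapes=True,
) as prof:
    model(inputs)

# Sort operators by the memory they allocated themselves (excluding child ops).
memory_table = prof.key_averages().table(
    sort_by="self_cpu_memory_usage", row_limit=10
)
print(memory_table)
```

On a machine with a GPU, adding `ProfilerActivity.CUDA` to `activities` should extend the trace to device kernels as well.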


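register_buffer registers a tensor as part of the module's state without making it a parameter. A buffer is saved and restored through the state_dict (unless it is registered with persistent=False) and moves with the module on calls such as .to(device), but it is not trained by the optimizer.

Buffers are not returned by model.parameters().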


In [None]:
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.linear_layer = nn.Linear(1,1)
        
        # Non-trainable constant; persistent=False keeps it out of the state_dict
        const_val = torch.tensor([2.0])
        self.register_buffer('constant', const_val, persistent=False)
    
    def forward(self, x):
        x = x * self.constant
        x = self.linear_layer(x)
        return x
model = Model()

model(torch.tensor([2.0]))
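
To make the difference between parameters, persistent buffers, and non-persistent buffers concrete, here is a small sketch. The `Scaler` class and the buffer names `saved_const` and `temp_const` are hypothetical, introduced only for this illustration:

```python
import torch
import torch.nn as nn

class Scaler(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(1, 1)
        # Persistent buffer (the default): included in the state_dict.
        self.register_buffer("saved_const", torch.tensor([2.0]))
        # Non-persistent buffer: part of the module, but left out of the state_dict.
        self.register_buffer("temp_const", torch.tensor([3.0]), persistent=False)

model = Scaler()
param_names = [name for name, _ in model.named_parameters()]
buffer_names = [name for name, _ in model.named_buffers()]
state_keys = list(model.state_dict().keys())

print(param_names)   # only the linear layer's weight and bias
print(buffer_names)  # both buffers
print(state_keys)    # parameters plus the persistent buffer only
```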

storage() returns the underlying one-dimensional storage that holds the data of x in memory; printing it shows the elements along with the dtype and size.
The storage object can be assigned to a variable to inspect its type.


In [None]:
x = torch.randint(0, 10, (3, 3))
print(x)
# x = x.as_strided(size=(4, 5), stride=(1,1))
x_t = x.transpose(0,1)
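# Compare the data pointers: a transpose is a view over the same storage.
# (id() on the temporary storage wrappers is not a reliable test.)
print(x_t.storage().data_ptr() == x.storage().data_ptr())
print(x.storage(), x_t.storage())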
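
A quick way to check that a transpose shares memory with the original tensor is to compare data pointers and to write through one of the two views. This is a minimal sketch; the values are arbitrary:

```python
import torch

x = torch.arange(9).reshape(3, 3)
x_t = x.transpose(0, 1)

# Both tensors start at the same address in memory: no data was copied.
shares_memory = x.data_ptr() == x_t.data_ptr()
print(shares_memory)

# Writing through one view is visible through the other.
x[0, 1] = 100
print(x_t[1, 0].item())

# Only the strides differ between the tensor and its transpose.
print(x.stride(), x_t.stride())
```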


stride() returns, for each dimension, how many elements must be skipped in the storage array to reach the next element along that dimension. Transposing a tensor does not move any data: the transposed tensor is a view over the ORIGINAL storage, and only the order of its strides changes.

Example: a 3 x 3 tensor

tensor([[5, 7, 4],
        [1, 3, 2],
        [7, 3, 8]])

with stride=(3, 1) and storage offset=1, backed by this storage array:

[6,5,7,4,1,3,2,7,3,8]
   ^ offset=1: skip one element, so the tensor starts at 5
[6,5,7,4,1,3,2,7,3,8]
     ^ stride[1]=1: move one element, from 5 to 7 (next column)
[6,5,7,4,1,3,2,7,3,8]
         ^ stride[0]=3: move three elements, from 5 to 1 (next row)
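
The walk-through above boils down to one formula: element [i, j] lives at storage position offset + i * stride[0] + j * stride[1]. The following pure-Python sketch applies it to the same storage array; `read_view` is a hypothetical helper written for this illustration, not a PyTorch function:

```python
storage = [6, 5, 7, 4, 1, 3, 2, 7, 3, 8]

def read_view(storage, size, stride, offset):
    """Materialise a 2-D view as nested lists using an offset and strides."""
    rows, cols = size
    return [
        [storage[offset + i * stride[0] + j * stride[1]] for j in range(cols)]
        for i in range(rows)
    ]

# Original layout: rows are 3 elements apart, columns 1 element apart.
original = read_view(storage, size=(3, 3), stride=(3, 1), offset=1)
# Transposed view: same storage, strides swapped.
transposed = read_view(storage, size=(3, 3), stride=(1, 3), offset=1)

print(original)    # [[5, 7, 4], [1, 3, 2], [7, 3, 8]]
print(transposed)  # [[5, 1, 7], [7, 3, 3], [4, 2, 8]]
```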


In [None]:
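# Each row is a window of 5 storage elements, shifted by 1 from the previous row
x = x.as_strided(size=(5,5), stride=(1,1))
print(x)
# Rows start 2 elements apart; a stride of 0 repeats one element across each row
x = x.as_strided(size=(5,5), stride=(2,0))
print(x)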


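x.as_strided(size=(a,a), stride=(b,c))
This can be thought of as flattening the tensor into its one-dimensional storage:

tensor([57,1,75,62,97,61,19,85,40,28,90,19,58,40,6,78,39,36,45,62,37,77,16,79,75,10,84,22,23,66])

and then filling an a x a result in which element [i, j] is read from flat position i*b + j*c. Each row takes a elements spaced c apart, and each successive row starts b elements after the previous one. For example, a=6 with stride=(1,1) gives:

[[57,1,75,62,97,61], [1,75,62,97,61,19],...]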
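
As a sanity check of this reading, the sketch below compares as_strided with a manual lookup at flat position i*b + j*c. The values a=3, b=2, c=1 are illustrative choices, and only the first ten elements of the flattened tensor are used:

```python
import torch

flat = torch.tensor([57, 1, 75, 62, 97, 61, 19, 85, 40, 28])
a, b, c = 3, 2, 1  # illustrative size and strides

# Built-in strided view over the flat storage.
windows = flat.as_strided(size=(a, a), stride=(b, c))

# Manual version: element [i, j] comes from flat position i*b + j*c.
manual = [[flat[i * b + j * c].item() for j in range(a)] for i in range(a)]

print(windows.tolist())  # [[57, 1, 75], [75, 62, 97], [97, 61, 19]]
print(manual == windows.tolist())
```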

gather() is an advanced indexing function.

Pass the dimension to index along (dim) and an index tensor to gather(); the output has the same shape as index. With dim=0 and a 2 x 2 index, it is equivalent to:
torch.Tensor([
    [tensor[index[0,0],0], tensor[index[0,1],1]],
    [tensor[index[1,0],0], tensor[index[1,1],1]] 
])

In [None]:
tensor = torch.Tensor([
[57,  1, 75, 62, 97, 61],
[19, 85, 40, 28, 90, 19],
[58, 40,  6, 78, 39, 36],
[45, 62, 37, 77, 16, 79],
[75, 10, 84, 22, 23, 66]
])
# gather() requires an int64 index; torch.tensor infers int64 from Python ints
index = torch.tensor([[1,2],[3,4]])
tensor.gather(dim=0, index=index)
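
The choice of dim decides which coordinate is replaced by the index value. The sketch below uses a smaller 3 x 3 tensor with arbitrary values to compare dim=0 and dim=1, and checks the dim=0 result against an explicit loop:

```python
import torch

tensor = torch.tensor([[57, 1, 75], [19, 85, 40], [58, 40, 6]])
index = torch.tensor([[1, 2], [0, 1]])

# dim=0: out[i][j] = tensor[index[i][j]][j]  (index picks the row)
rows = tensor.gather(dim=0, index=index)
# dim=1: out[i][j] = tensor[i][index[i][j]]  (index picks the column)
cols = tensor.gather(dim=1, index=index)

# Explicit loop that mirrors the dim=0 rule.
manual_rows = [
    [tensor[index[i][j]][j].item() for j in range(index.shape[1])]
    for i in range(index.shape[0])
]

print(rows.tolist())  # [[19, 40], [57, 85]]
print(cols.tolist())  # [[1, 75], [19, 85]]
```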