# Description

This notebook calculates the memory trace for different variants of the autoencoder model that I had experimented with. 

* The code calculates the memory blocks occupied while performing data loading and encoding. 

* The codes were run on a Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz, which has 8 cores.

* Since the code is executed in jupyter, there could be memory overhead due to IPython too, which uniformly affects all the scripts. So the allocation should be considered relatively while drawing any conclusion.


***Please note that to calculate the trace for a new model, the notebook should be restarted and a fresh model should be loaded. Re-runing cells or executing same commands within the same kernel changes the memory allocations.***

### Load packages

In [1]:
import sys
BIN = 'utils/'
sys.path.append(BIN)

import tracemalloc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.utils.data
from torch.autograd import Variable

from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset

import fastai
from fastai.callbacks import ActivationStats
from fastai import basic_train, basic_data

import matplotlib as mpl
import my_matplotlib_style as ms
mpl.rc_file(BIN + 'my_matplotlib_rcparams')
%matplotlib inline

In [2]:
from nn_utils import AE_3D_200, AE_bn_ELU, AE_bn_LeakyReLU


### Check memory trace for loading the data and performing normalization

In [3]:
tracemalloc.start()

train = pd.read_pickle('../datasets/non_normalized_train_4D_100_percent').astype(np.float32)
test = pd.read_pickle('../datasets/non_normalized_test_4D_100_percent').astype(np.float32)

# Perform normalization (using standard normalization here)
train_mean = train.mean()
train_std = train.std()

train = (train - train_mean) / train_std
test = (test - train_mean) / train_std

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
tracemalloc.stop()

# Print the total memory allocation traced
stat = top_stats
total_size = 0
for s in stat:
    total_size += s.size/(1024*1024)
print("Total memory allocation: {:.4f}".format(total_size) + " Megabytes")
print()

# Pick the 2 biggest objects
stat = top_stats[0:2]
for s in stat:
    print(str(s.traceback) + "; Size: {:.4f}".format(s.size/(1024*1024)) + " Megabytes")

Total memory allocation: 3.6679 Megabytes

/home/honey/miniconda3/envs/pytorch/lib/python3.6/site-packages/pandas/core/internals/managers.py:1848; Size: 2.1322 Megabytes
/home/honey/miniconda3/envs/pytorch/lib/python3.6/site-packages/pandas/io/pickle.py:181; Size: 1.0688 Megabytes


### Prepare data and model to find model memory trace

In [4]:
train_x = train
test_x = test
train_y = train_x  
test_y = test_x

train_ds = TensorDataset(torch.tensor(train_x.values), torch.tensor(train_y.values))
valid_ds = TensorDataset(torch.tensor(test_x.values), torch.tensor(test_y.values))

def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

train_dl, valid_dl = get_data(train_ds, valid_ds, bs=256)

------------------

## Base model with 7 layers, tanh activation, no batch-norm

In [5]:
model = AE_3D_200()#AE_bn_ELU([4,200,100,50,3,50,100,200,4])
loss_func = nn.MSELoss()
bn_wd = False  
true_wd = True 
wd = 1e-6

In [6]:
# Prepare database for training
db = basic_data.DataBunch(train_dl, valid_dl)

#Initialize the trainer
learn = basic_train.Learner(data=db, model=model, loss_func=loss_func, wd=wd, callback_fns=ActivationStats, bn_wd=bn_wd, true_wd=true_wd)

### Load the model

In [7]:
learn.load('AE_3D_200_no1cycle_std_norm')

model.to('cpu')

AE_3D_200(
  (en1): Linear(in_features=4, out_features=200, bias=True)
  (en2): Linear(in_features=200, out_features=100, bias=True)
  (en3): Linear(in_features=100, out_features=50, bias=True)
  (en4): Linear(in_features=50, out_features=3, bias=True)
  (de1): Linear(in_features=3, out_features=50, bias=True)
  (de2): Linear(in_features=50, out_features=100, bias=True)
  (de3): Linear(in_features=100, out_features=200, bias=True)
  (de4): Linear(in_features=200, out_features=4, bias=True)
  (tanh): Tanh()
)

### Check memory trace for encoding and decoding the entire test set
The test set contains 27945 samples. The following code calculates the memory trace of the encoding operation of the model. It prints the total memory space occupied and the top few objects memory-allocation-wise.

### Encoding 

In [8]:
number_of_events = torch.tensor(test.values).size()[0]
print("Number of events: " + str(number_of_events))

tracemalloc.start()

compressed = learn.model.encode(torch.tensor(test.values)).detach().numpy()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')

# Print the total memory allocation traced
stat = top_stats
total_size = 0
for s in stat:
    total_size += s.size/(1024*1024)
print("Total memory allocation: {:.4f}".format(total_size) + " Megabytes")
print()

# Pick the 2 objects according to their memory block allocation
for i in range(2):
    stat = top_stats[i]
    print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
    print([line for line in stat.traceback.format()])

Number of events: 27945
Total memory allocation: 0.0024 Megabytes

1 memory blocks: 0.5 KiB
['  File "utils/nn_utils.py", line 126', '    m1 = self.en1(x)']
1 memory blocks: 0.4 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532', '    result = self.forward(*input, **kwargs)']


### Decoding

In [9]:
number_of_events = torch.tensor(test.values).size()[0]
print("Number of events: " + str(number_of_events))

tracemalloc.start()

reconstructed = learn.model.decode(torch.tensor(compressed)).detach().numpy()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')

# Print the total memory allocation traced
stat = top_stats
total_size = 0
for s in stat:
    total_size += s.size/(1024*1024)
print("Total memory allocation: {:.4f}".format(total_size) + " Megabytes")
print()

# Pick the 2 objects according to their memory block allocation
for i in range(2):
    stat = top_stats[i]
    print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
    print([line for line in stat.traceback.format()])

Number of events: 27945
Total memory allocation: 0.1780 Megabytes

1330 memory blocks: 136.3 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/linecache.py", line 137', '    lines = fp.readlines()']
308 memory blocks: 15.0 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/site-packages/IPython/core/compilerop.py", line 101', '    return compile(source, filename, symbol, self.flags | PyCF_ONLY_AST, 1)']


-------- 

## 7 layer model with LeakyReLU and batch-norm

### Prepare data and model to find model memory trace

In [5]:
model = AE_bn_LeakyReLU([4,200,100,50,3,50,100,200,4])
loss_func = nn.MSELoss()
bn_wd = False  
true_wd = True 
wd = 1e-6

In [6]:
# Prepare database for training
db = basic_data.DataBunch(train_dl, valid_dl)

#Initialize the trainer
learn = basic_train.Learner(data=db, model=model, loss_func=loss_func, wd=wd, callback_fns=ActivationStats, bn_wd=bn_wd, true_wd=true_wd)

### Load the model

In [7]:
learn.load('AE_3D_200_ReLU_BN_custom_norm')

model.to('cpu')

AE_bn_LeakyReLU(
  (encoder): Sequential(
    (0): Linear(in_features=4, out_features=200, bias=True)
    (1): LeakyReLU(negative_slope=0.01)
    (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Linear(in_features=200, out_features=100, bias=True)
    (4): LeakyReLU(negative_slope=0.01)
    (5): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): Linear(in_features=100, out_features=50, bias=True)
    (7): LeakyReLU(negative_slope=0.01)
    (8): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): Linear(in_features=50, out_features=3, bias=True)
    (10): LeakyReLU(negative_slope=0.01)
    (11): BatchNorm1d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (decoder): Sequential(
    (0): Linear(in_features=3, out_features=50, bias=True)
    (1): LeakyReLU(negative_slope=0.01)
    (2): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_

### Encoding 

In [8]:
number_of_events = torch.tensor(test.values).size()[0]
print("Number of events: " + str(number_of_events))

tracemalloc.start()

compressed = learn.model.encode(torch.tensor(test.values)).detach().numpy()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')

# Print the total memory allocation traced
stat = top_stats
total_size = 0
for s in stat:
    total_size += s.size/(1024*1024)
print("Total memory allocation: {:.4f}".format(total_size) + " Megabytes")
print()

# Pick the 2 objects according to their memory block allocation
for i in range(2):
    stat = top_stats[i]
    print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
    print([line for line in stat.traceback.format()])

Number of events: 27945
Total memory allocation: 0.0076 Megabytes

4 memory blocks: 1.8 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532', '    result = self.forward(*input, **kwargs)']
3 memory blocks: 0.6 KiB
['  File "<ipython-input-8-416b51b8dfab>", line 6', '    compressed = learn.model.encode(torch.tensor(test.values)).detach().numpy()']


### Decoding

In [10]:
number_of_events = torch.tensor(test.values).size()[0]
print("Number of events: " + str(number_of_events))

tracemalloc.start()

reconstructed = learn.model.decode(torch.tensor(compressed)).detach().numpy()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')

# Print the total memory allocation traced
stat = top_stats
total_size = 0
for s in stat:
    total_size += s.size/(1024*1024)
print("Total memory allocation: {:.4f}".format(total_size) + " Megabytes")
print()

# Pick the 2 objects according to their memory block allocation
for i in range(2):
    stat = top_stats[i]
    print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
    print([line for line in stat.traceback.format()])

Number of events: 27945
Total memory allocation: 0.2240 Megabytes

1550 memory blocks: 157.5 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/linecache.py", line 137', '    lines = fp.readlines()']
356 memory blocks: 18.3 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/site-packages/IPython/core/compilerop.py", line 101', '    return compile(source, filename, symbol, self.flags | PyCF_ONLY_AST, 1)']


---------------- 

## 7 layer model with ELU and batch-norm

### Prepare data and model to find model memory trace

In [5]:
model = AE_bn_ELU([4,200,100,50,3,50,100,200,4])
loss_func = nn.MSELoss()
bn_wd = False  
true_wd = True 
wd = 1e-6

In [6]:
# Prepare database for training
db = basic_data.DataBunch(train_dl, valid_dl)

#Initialize the trainer
learn = basic_train.Learner(data=db, model=model, loss_func=loss_func, wd=wd, callback_fns=ActivationStats, bn_wd=bn_wd, true_wd=true_wd)

### Load the model

In [7]:
learn.load('AE_3D_200_ELU_BN_custom_norm')

model.to('cpu')

AE_bn_ELU(
  (encoder): Sequential(
    (0): Linear(in_features=4, out_features=200, bias=True)
    (1): ELU(alpha=1.0)
    (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Linear(in_features=200, out_features=100, bias=True)
    (4): ELU(alpha=1.0)
    (5): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): Linear(in_features=100, out_features=50, bias=True)
    (7): ELU(alpha=1.0)
    (8): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): Linear(in_features=50, out_features=3, bias=True)
    (10): ELU(alpha=1.0)
    (11): BatchNorm1d(3, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (decoder): Sequential(
    (0): Linear(in_features=3, out_features=50, bias=True)
    (1): ELU(alpha=1.0)
    (2): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Linear(in_features=50, out_features=100, bias=True)
    (

### Encoding 

In [8]:
number_of_events = torch.tensor(test.values).size()[0]
print("Number of events: " + str(number_of_events))

tracemalloc.start()

compressed = learn.model.encode(torch.tensor(test.values)).detach().numpy()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')

# Print the total memory allocation traced
stat = top_stats
total_size = 0
for s in stat:
    total_size += s.size/(1024*1024)
print("Total memory allocation: {:.4f}".format(total_size) + " Megabytes")
print()

# Pick the 2 objects according to their memory block allocation
for i in range(2):
    stat = top_stats[i]
    print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
    print([line for line in stat.traceback.format()])

Number of events: 27945
Total memory allocation: 0.0064 Megabytes

3 memory blocks: 1.4 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532', '    result = self.forward(*input, **kwargs)']
1 memory blocks: 0.6 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 107', '    exponential_average_factor, self.eps)']


### Decoding

In [9]:
number_of_events = torch.tensor(test.values).size()[0]
print("Number of events: " + str(number_of_events))

tracemalloc.start()

reconstructed = learn.model.decode(torch.tensor(compressed)).detach().numpy()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')

# Print the total memory allocation traced
stat = top_stats
total_size = 0
for s in stat:
    total_size += s.size/(1024*1024)
print("Total memory allocation: {:.4f}".format(total_size) + " Megabytes")
print()

# Pick the 2 objects according to their memory block allocation
for i in range(2):
    stat = top_stats[i]
    print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
    print([line for line in stat.traceback.format()])

Number of events: 27945
Total memory allocation: 0.1945 Megabytes

1381 memory blocks: 148.3 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/linecache.py", line 137', '    lines = fp.readlines()']
309 memory blocks: 15.0 KiB
['  File "/home/honey/miniconda3/envs/pytorch/lib/python3.6/site-packages/IPython/core/compilerop.py", line 101', '    return compile(source, filename, symbol, self.flags | PyCF_ONLY_AST, 1)']
