<img src="https://miro.medium.com/max/1400/1*7oukapIBInsovpHkQB3QZg.jpeg" alt="Google Colab" width="500" height="600">


# <center>PLUG AND PLAY</center>

[Image credits](https://www.kdnuggets.com/2020/06/google-colab-deep-learning.html)

# Table of contents <a id='0.1'></a>

* [Introduction](#0)
* [STEP 1 ](#1)
* [STEP 2 ](#2)
* [STEP 3 ](#3)
* [STEP 4 ](#4)
* [STEP 5 ](#5)
* [Checkpoint](#6)
* [Full Notebook Overview with checkpoint](#7)
* [Problems Faced](#8)
* [Reference](#9)

# <a id='0'>Introduction</a>
[Table of contents](#0.1)

**Training on your local machine has its perks, but training neural networks on a huge dataset is hard for most of us. Beginners and many Kagglers do not have access to high-end hardware, and training takes a lot of time and effort. In this notebook I will demonstrate how you can train your model with a large batch size on Google Colab. So let's get started...**

**COPY AND PASTE CODE FROM THIS NOTEBOOK TO YOUR COLAB NOTEBOOK AND RUN THE CELLS. YOUR OUTPUT SHOULD LOOK SIMILAR TO THE IMAGES I HAVE PROVIDED BELOW IN EACH STEP**.

**IT TOOK ME 7 DAYS TO TRAIN WITH SUCH A HUGE BATCH SIZE OF 64 USING COLAB. TRAINING DURATION DEPENDS ON SIZE OF YOUR IMAGE TOO.**

**PLEASE DO READ THIS NOTEBOOK ONCE BEFORE FOLLOWING STEPS MENTIONED IN THIS NOTEBOOK**.

**FOR A FULL OVERVIEW OF THE CODE, CLICK [HERE](#7).**

### **We are going to use a larger batch size of 64 for training, which takes a long time. It took me 7 days to complete 30,000 training steps (9 hours per day). Don't let Colab stay disconnected for more than 10 minutes. We will save checkpoints so that training can be resumed without losing our progress and hard work. Let's move ahead.**

### **I HAVE ALSO DESCRIBED THE PROBLEMS I FACED WHILE TRAINING [HERE](#8)**

In [None]:
import os
from IPython.display import Image

# IMAGE DIR
PATH = '../input/lyftl5googlecolab/'

# <a id='1'>STEP 1</a>
[Table of contents](#0.1)

First and most important, add the [curlwget](https://chrome.google.com/webstore/detail/curlwget/jmocjfidanebdlinpbcdkcmgdifblncg?hl=en) extension from the Chrome Web Store. Once it is installed, its icon appears in the top right corner of your browser. With it, downloading huge datasets from **Kaggle** takes just a few minutes.

### **In the image below you can see the curlwget icon (a > sign) in the top right corner of the browser, next to the other extensions**.

In [None]:
Image(os.path.join(PATH,'curl.png'))

# <a id='2'>STEP 2</a>
[Table of contents](#0.1)

First we need to create a separate directory for our project on the Colab host machine. Also mount your Google Drive (the mount cell is shown in the [full overview](#7)), since the checkpoints are saved there. Copy and paste the code below into a Colab notebook cell.

```
    
import os 

os.makedirs('/content/lyft-motion-prediction-autonomous-vehicles/', exist_ok=True)
os.chdir('/content/lyft-motion-prediction-autonomous-vehicles/')

%cd '/content/lyft-motion-prediction-autonomous-vehicles/'
!pwd
```

In [None]:
Image(filename='../input/lyftl5googlecolab/Screenshot (117).png')

# <a id='3'>STEP 3</a>
[Table of contents](#0.1)

Paste this code cell into your Colab notebook but don't run it yet. We first need to generate the download link for the dataset (not the whole dataset, since Colab has limited disk space).

```
os.makedirs('/content/lyft-motion-prediction-autonomous-vehicles/scenes/', exist_ok=True)
os.chdir('/content/lyft-motion-prediction-autonomous-vehicles/scenes/')

!wget --header="Host: storage.googleapis.com" --header="User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36" --header="Accept: .........3Dscenes.zip" -c -O 'scenes.zip'
!unzip ./scenes.zip
```

# <a id='4'>STEP 4</a>
[Table of contents](#0.1)

* Now click on the **scenes** folder in the [data section](https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/data).
* You will see the download icon. Click on it.
* Your files will start downloading.

In [None]:
Image(os.path.join(PATH, 'scene.png'))

* As soon as the download starts, cancel it and click on the curlwget icon. You will see the generated wget command. Click on it to copy it, then paste it into your Colab notebook, replacing the `!wget` line in the code cell from [STEP 3](#3).

In [None]:
Image(os.path.join(PATH, 'scene_curl.png'))

* Run the code cell given in [STEP 3](#3) inside your Colab notebook.
* This will create the **scenes** folder inside your **[lyft-motion-prediction-autonomous-vehicles](#2)** root directory.

In [None]:
Image(os.path.join(PATH, 'download.png'))

* As you can see, the **scenes folder is created inside your root directory**.
* As the image below shows, the cell starts unzipping the scenes data once the download finishes. This step takes some time, so please wait for it to complete.

In [None]:
Image(os.path.join(PATH, 'inf.png'))

* You will see a warning message. Click on ignore and continue with the next steps. See the image below.

In [None]:
Image(os.path.join(PATH, 'ignore.png'))

* This will almost fill up your storage. To proceed, copy and paste the code below into three separate cells. Here we remove **validate.zarr**, **sample.zarr** and **scenes.zip** to free up some space.

```
%cd '/content/lyft-motion-prediction-autonomous-vehicles/scenes/'
!pwd
!ls
```
```
%rm -r validate.zarr/
%rm -r sample.zarr/
%rm scenes.zip
```
```
%cd ..
!pwd
!ls
```
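Because the unzipped scenes nearly fill the Colab disk, it can help to check the free space before and after the cleanup cells. The following is a minimal sketch using only the standard library; `free_gb` is a hypothetical helper name that is not part of the original notebook.

```python
import shutil

def free_gb(path: str = ".") -> float:
    """Return the free disk space at `path` in gigabytes (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 1e9

# In Colab you would typically pass '/content' instead of the current directory.
print(f"Free space: {free_gb():.1f} GB")
```

Running it once before and once after the `%rm` cell shows how much space the cleanup actually recovered.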

* After running the code cells, your output should look similar to the output in the image below.

In [None]:
Image(os.path.join(PATH, 'normal.png'))

* Copy the code given below and paste it into a Colab notebook cell. Don't run it yet: we first need to generate the wget link for the **aerial_map** data.

```
os.makedirs('/content/lyft-motion-prediction-autonomous-vehicles/aerial_map/', exist_ok=True)
os.chdir('/content/lyft-motion-prediction-autonomous-vehicles/aerial_map/')

!wget --header="Host: storage.googleapis.com" --header="User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36" --header="Accept: ... filename%3Daerial_map.zip" -c -O 'aerial_map.zip'
!unzip ./aerial_map.zip
```
<a id='0.2'></a>
* Now we follow a similar procedure to download the **aerial_map** folder. Go to the [data section](https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/data) of the competition page, click on the aerial_map folder and then on the download icon. As soon as the download starts, cancel it, click on the curlwget icon, copy the link and paste it in place of the `!wget` line in the code cell above. Then run the cell. It will create the **aerial_map** folder inside your **[lyft-motion-prediction-autonomous-vehicles](#2)** root directory. Check the image below to get the idea.

In [None]:
Image(os.path.join(PATH, 'aerial.png'))

* After the cell finishes executing, copy and paste these two code cells into your Colab notebook.

```
%rm -r aerial_map.zip
```
* After running the code cell below, you will see the aerial_map and scenes folders inside your **[lyft-motion-prediction-autonomous-vehicles](#2)** root directory. See the image below for reference.

```
%cd ..
!pwd
!ls
```

In [None]:
Image(os.path.join(PATH, 'aerial_dir.png'))

* Copy and paste the code cell below into your notebook. We need to generate the wget link before moving ahead. Follow the same steps as mentioned [above](#0.2), replace the `!wget` line with the link you copied, and run the cell. This will create the **semantic_map** folder inside your **[lyft-motion-prediction-autonomous-vehicles](#2)** root directory.

```
os.makedirs('/content/lyft-motion-prediction-autonomous-vehicles/semantic_map/', exist_ok=True)
os.chdir('/content/lyft-motion-prediction-autonomous-vehicles/semantic_map/')

!wget --header="Host: storage.googleapis.com" --header="User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36" --header="Accept: .......semantic_map.zip" -c -O 'semantic_map.zip'
!unzip ./semantic_map.zip
```

* Now copy the code below into your Colab notebook and run it.

```
%rm -r semantic_map.zip
%cd ..
!pwd
!ls
```
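The same download, unzip and delete pattern is repeated for scenes, aerial_map and semantic_map. As an alternative to the `!unzip` and `%rm` cells, the extract-then-delete step can be sketched in plain Python with the standard `zipfile` module. `extract_and_remove` is a hypothetical helper introduced here for illustration, not something the original notebook defines.

```python
import os
import zipfile

def extract_and_remove(zip_path: str, dest_dir: str) -> list:
    """Extract `zip_path` into `dest_dir`, delete the archive, return member names."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    # Only reached if extraction succeeded, so a broken archive is never deleted.
    os.remove(zip_path)
    return names
```

For example, `extract_and_remove('semantic_map.zip', '.')` would unpack the archive into the current folder and free the space taken by the zip file in one call.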

In [None]:
Image(os.path.join(PATH, 'fin.png'))

* Upload meta.json to the **[lyft-motion-prediction-autonomous-vehicles](#2)** root directory.

## **IF EVERYTHING GOES RIGHT YOUR FILE STRUCTURE SHOULD LOOK LIKE THIS**

In [None]:
Image(os.path.join(PATH, 'structure.png'))
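Instead of comparing against the screenshot by eye, the structure can also be checked programmatically. The sketch below assumes the archives unpack to the file names that the `cfg` keys in STEP 5 refer to; `EXPECTED` and `missing_paths` are hypothetical names introduced for this example.

```python
import os

# Relative paths taken from the cfg keys used later in this notebook.
EXPECTED = [
    "scenes/train.zarr",
    "aerial_map/aerial_map.png",
    "semantic_map/semantic_map.pb",
    "meta.json",
]

def missing_paths(root, expected=EXPECTED):
    """Return the expected entries that do not exist under `root`."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]

root = "/content/lyft-motion-prediction-autonomous-vehicles/"
print("Missing:", missing_paths(root) or "nothing")
```

An empty result means every file the configuration points to is in place.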

* Now that all the necessary files are fetched and arranged properly in your Colab, we are ready to train our model. Copy and paste this code into your Colab notebook and run it. It installs L5Kit and its dependencies.

```
# this script installs l5kit and its dependencies
!pip -q install pymap3d==2.1.0 
!pip -q install protobuf==3.12.2 
!pip -q install transforms3d 
!pip -q install zarr 
!pip -q install ptable

!pip -q install --no-dependencies l5kit
```

In [None]:
Image(os.path.join(PATH, 'dependency.png'))

# <a id='5'>STEP 5</a>
[Table of contents](#0.1)

We need to upload the **meta.json** file to our **[lyft-motion-prediction-autonomous-vehicles](#2)** root directory. After that, copy and paste all the code cells below into your notebook and run them one by one.

```
# import packages
from google.colab import files
import numpy as np
import torch
import gc, os

from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision.models.resnet import resnet18, resnet34, resnet50
from torchvision.models.densenet import densenet121
from tqdm import tqdm
from typing import Dict

from l5kit.data import LocalDataManager, ChunkedDataset
from l5kit.dataset import AgentDataset, EgoDataset
from l5kit.rasterization import build_rasterizer
```

This is our root path.

```
INPUT_DIR = '/content/lyft-motion-prediction-autonomous-vehicles/'

```
# Configuration

This is our configuration. It controls the model, rasterizer, data loader and training parameters. Please keep **num_workers=0**, otherwise you will hit a memory error in Colab. We also save a checkpoint every 1000 steps to ensure that progress isn't lost.

```
cfg = {
    'format_version': 4,
    'model_params': {
        'model_architecture': 'resnet18',
        'history_num_frames': 10,
        'history_step_size': 1,
        'history_delta_time': 0.1,
        'future_num_frames': 50,
        'future_step_size': 1,
        'future_delta_time': 0.1
    },

    'raster_params': {
        'raster_size': [300, 300],
        'pixel_size': [0.5, 0.5],
        'ego_center': [0.25, 0.5],
        'map_type': 'py_semantic',
        'satellite_map_key': 'aerial_map/aerial_map.png',
        'semantic_map_key': 'semantic_map/semantic_map.pb',
        'dataset_meta_key': 'meta.json',
        'filter_agents_threshold': 0.5
    },
    
    'train_data_loader': {
        'key': 'scenes/train.zarr',
        'batch_size': 64,
        'shuffle': True,
        'num_workers': 0
    },
    
    'train_params': {
        'max_num_steps': 30000,
        'checkpoint_every_n_steps': 1000,
    }
}
```
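To get a feel for what these settings imply, here is a small back-of-the-envelope calculation. The numbers are copied from `cfg` and from the 7 days at 9 hours per day mentioned in the introduction. The seconds-per-step figure is only a rough estimate from that single run, not a measured benchmark.

```python
# Values copied from cfg above.
batch_size = 64
max_num_steps = 30000
checkpoint_every_n_steps = 1000

samples_seen = batch_size * max_num_steps                    # agents processed in total
num_checkpoints = max_num_steps // checkpoint_every_n_steps  # files written to Drive

# Rough timing from the introduction: 7 days at 9 hours per day.
total_seconds = 7 * 9 * 3600
seconds_per_step = total_seconds / max_num_steps

print(samples_seen)               # 1920000
print(num_checkpoints)            # 30
print(f"{seconds_per_step:.2f}")  # 7.56
```

With a checkpoint every 1000 steps, a disconnect costs at most 999 steps of work, which at this pace is roughly two hours.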

```
# set env variable for data
os.environ["L5KIT_DATA_FOLDER"] = INPUT_DIR
dm = LocalDataManager(None)

# get config
print(cfg)
```
# Initialize dataset

```
train_cfg = cfg["train_data_loader"]

# Rasterizer
rasterizer = build_rasterizer(cfg, dm)

# Train dataset/dataloader
train_zarr = ChunkedDataset(dm.require(train_cfg["key"])).open()
train_dataset = AgentDataset(cfg, train_zarr, rasterizer)
train_dataloader = DataLoader(train_dataset,
                              shuffle=train_cfg["shuffle"],
                              batch_size=train_cfg["batch_size"],
                              num_workers=train_cfg["num_workers"])

print(train_dataset)
```

```
gc.collect()
```
# Model: resnet18

```
class LyftModel(nn.Module):
    
    def __init__(self, cfg: dict):
        super().__init__()
        
        self.backbone = resnet18(pretrained=True, progress=True)
        
        num_history_channels = (cfg["model_params"]["history_num_frames"] + 1) * 2
        num_in_channels = 3 + num_history_channels

        self.backbone.conv1 = nn.Conv2d(
            num_in_channels,
            self.backbone.conv1.out_channels,
            kernel_size=self.backbone.conv1.kernel_size,
            stride=self.backbone.conv1.stride,
            padding=self.backbone.conv1.padding,
            bias=False,
        )
        
        backbone_out_features = 512

        # X, Y coords for the future positions (output shape: Bx50x2)
        num_targets = 2 * cfg["model_params"]["future_num_frames"]

        self.head = nn.Sequential(
            # nn.Dropout(0.2),
            nn.Linear(in_features=backbone_out_features, out_features=4096),
        )

        self.logit = nn.Linear(4096, out_features=num_targets)
        
    def forward(self, x):
        x = self.backbone.conv1(x)
        x = self.backbone.bn1(x)
        x = self.backbone.relu(x)
        x = self.backbone.maxpool(x) 

        x = self.backbone.layer1(x)
        x = self.backbone.layer2(x)
        x = self.backbone.layer3(x)
        x = self.backbone.layer4(x)

        x = self.backbone.avgpool(x)
        x = torch.flatten(x, 1)
        
        x = self.head(x)
        x = self.logit(x)

        return x
```
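The channel and target counts in the model follow directly from the configuration. The arithmetic below uses the same formulas as `LyftModel.__init__`. The interpretation in the comments (two layers per frame plus three map channels) is my reading of the rasterizer output and should be treated as an assumption.

```python
# Values copied from cfg["model_params"].
history_num_frames = 10
future_num_frames = 50

# (history frames + current frame) x 2 layers per frame (assumed: agents and ego)
num_history_channels = (history_num_frames + 1) * 2
# plus 3 channels for the rasterized map (assumed: RGB)
num_in_channels = 3 + num_history_channels
# one (x, y) pair per future frame
num_targets = 2 * future_num_frames

print(num_in_channels, num_targets)  # 25 100
```

This is why the first convolution is rebuilt with 25 input channels, and why the flat output of size 100 can later be reshaped to 50 x 2 per sample.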
# Initialize model

```
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = LyftModel(cfg)
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Later we have to filter the invalid steps.
criterion = nn.MSELoss(reduction="none")
```

```
device
```
# Start Training

While training we save a model checkpoint together with the step count (stored under the key `epoch`) and the optimizer state. We need all three in order to resume training.

```
tr_it = iter(train_dataloader)

progress_bar = tqdm(range(cfg["train_params"]["max_num_steps"]))
losses_train = []

for itr in progress_bar:

    try:
        data = next(tr_it)
    except StopIteration:
        tr_it = iter(train_dataloader)
        data = next(tr_it)

    model.train()
    torch.set_grad_enabled(True)
    
    # Forward pass
    inputs = data["image"].to(device)
    target_availabilities = data["target_availabilities"].unsqueeze(-1).to(device)
    targets = data["target_positions"].to(device)
    
    outputs = model(inputs).reshape(targets.shape)
    loss = criterion(outputs, targets)

    # not all the output steps are valid, but we can filter them out from the loss using availabilities
    loss = loss * target_availabilities
    loss = loss.mean()

    # Backward pass
    optimizer.zero_grad()
    loss.backward() 
    optimizer.step()

    losses_train.append(loss.item())

    if (itr+1) % cfg['train_params']['checkpoint_every_n_steps'] == 0:
        torch.save({'epoch': itr + 1,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict()},
                   f'/content/drive/My Drive/Lyft L5 Motion Prediction/resnet18_300x300_model_state_{itr}.pth')
    progress_bar.set_description(f"loss: {loss.item()} loss(avg): {np.mean(losses_train[-100:])}")
```
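The availability masking in the loop is worth a closer look. Multiplying by `target_availabilities` zeroes out the error of invalid steps, but `loss.mean()` still divides by the total number of elements, including the masked ones. The toy example below uses plain Python lists with made-up numbers to show the difference between that and averaging over valid steps only. The second form is shown purely for comparison and is not what the loop above does.

```python
# One trajectory with 4 future steps; the last two have no ground truth.
squared_errors = [[1.0, 1.0], [4.0, 4.0], [9.0, 9.0], [16.0, 16.0]]  # (x, y) per step
availabilities = [1.0, 1.0, 0.0, 0.0]

# Same idea as `loss = loss * target_availabilities`.
masked = [[e * a for e in step] for step, a in zip(squared_errors, availabilities)]
flat = [e for step in masked for e in step]

mean_over_all = sum(flat) / len(flat)    # what loss.mean() computes
num_valid = 2 * sum(availabilities)      # x and y for each valid step
mean_over_valid = sum(flat) / num_valid  # alternative normalisation

print(mean_over_all, mean_over_valid)  # 1.25 2.5
```

The masked steps contribute no gradient in either case. Only the scale of the reported loss differs.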

# <a id='6'>Checkpoint</a>
[Table of contents](#0.1)

This is the most important part of the whole training procedure. If your notebook crashes, add this cell to resume training from the last checkpoint: give the path to your .pth file and load the last successfully saved checkpoint.

### **ALSO, YOU NEED TO CHANGE THIS LINE OF YOUR TRAINING LOOP: ADD epoch AS THE START OF THE range FUNCTION.**

### **progress_bar = tqdm(range(epoch, cfg["train_params"]["max_num_steps"]))**

### **SEE THE FULL OVERVIEW BELOW TO GET THE IDEA.**

```
WEIGHT_FILE = '/content/drive/My Drive/PATH TO YOUR .pth file'
checkpoint = torch.load(WEIGHT_FILE, map_location=device)
epoch = checkpoint['epoch']
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
```
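Picking the right file by hand is error-prone once many checkpoints have accumulated. A minimal sketch for finding the newest one, assuming the file names follow the `..._model_state_{itr}.pth` pattern used by `torch.save` in the training loop. `latest_checkpoint` is a hypothetical helper that the original notebook does not define.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the step number at the end of names like resnet18_300x300_model_state_15999.pth
STEP_RE = re.compile(r"_model_state_(\d+)\.pth$")

def latest_checkpoint(folder: str) -> Optional[Path]:
    """Return the .pth file with the highest step number, or None if none match."""
    candidates = []
    for path in Path(folder).glob("*.pth"):
        match = STEP_RE.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare step numbers numerically, so 15999 beats 999.
    return max(candidates, key=lambda c: c[0])[1]
```

The result could be passed to `torch.load` as `WEIGHT_FILE`, for example `latest_checkpoint('/content/drive/My Drive/Lyft L5 Motion Prediction')`.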

# <a id='7'>Full Notebook Overview with Checkpoint</a>
[Table of contents](#0.1)

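Use this section to check that everything in your Colab notebook is correct. Note that this overview uses a `raster_size` of [400, 400] and its own checkpoint path, while STEP 5 uses [300, 300]. Keep the raster size and the checkpoint file names consistent in your own notebook.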
```
from google.colab import drive
drive.mount('/content/drive')
```

```
import os 

os.makedirs('/content/lyft-motion-prediction-autonomous-vehicles/', exist_ok=True)
os.chdir('/content/lyft-motion-prediction-autonomous-vehicles/')

%cd '/content/lyft-motion-prediction-autonomous-vehicles/'
!pwd
```

```
os.makedirs('/content/lyft-motion-prediction-autonomous-vehicles/scenes/', exist_ok=True)
os.chdir('/content/lyft-motion-prediction-autonomous-vehicles/scenes/')

!wget --header="Host: storage.googleapis.com" --header="User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36" --header="Accept: ...........................Dscenes.zip" -c -O 'scenes.zip'
!unzip ./scenes.zip
```

```
%cd '/content/lyft-motion-prediction-autonomous-vehicles/scenes/'
!pwd
!ls
```

```
%rm -r validate.zarr/
%rm -r sample.zarr/
%rm scenes.zip
```

```
%cd ..
!pwd
!ls
```

```
os.makedirs('/content/lyft-motion-prediction-autonomous-vehicles/aerial_map/', exist_ok=True)
os.chdir('/content/lyft-motion-prediction-autonomous-vehicles/aerial_map/')

!wget --header="Host: storage.googleapis.com" --header="User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36" --header="Accept: ................Daerial_map.zip" -c -O 'aerial_map.zip'
!unzip ./aerial_map.zip
```

```
%rm -r aerial_map.zip
```

```
%cd ..
!pwd
!ls
```

```
os.makedirs('/content/lyft-motion-prediction-autonomous-vehicles/semantic_map/', exist_ok=True)
os.chdir('/content/lyft-motion-prediction-autonomous-vehicles/semantic_map/')

!wget --header="Host: storage.googleapis.com" --header="User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36" --header="Accept: ..............semantic_map.zip" -c -O 'semantic_map.zip'
!unzip ./semantic_map.zip
```

```
%rm -r semantic_map.zip
%cd ..
!pwd
!ls
```

```
# this script installs l5kit and its dependencies
!pip -q install pymap3d==2.1.0 
!pip -q install protobuf==3.12.2 
!pip -q install transforms3d 
!pip -q install zarr 
!pip -q install ptable

!pip -q install --no-dependencies l5kit
```

```
# import packages
from google.colab import files
import numpy as np
import torch
import gc, os

from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision.models.resnet import resnet18, resnet34, resnet50
from torchvision.models.densenet import densenet121
from tqdm import tqdm
from typing import Dict

from l5kit.data import LocalDataManager, ChunkedDataset
from l5kit.dataset import AgentDataset, EgoDataset
from l5kit.rasterization import build_rasterizer
```

```
INPUT_DIR = '/content/lyft-motion-prediction-autonomous-vehicles/'
```

```
cfg = {
    'format_version': 4,
    'model_params': {
        'model_architecture': 'resnet18',
        'history_num_frames': 10,
        'history_step_size': 1,
        'history_delta_time': 0.1,
        'future_num_frames': 50,
        'future_step_size': 1,
        'future_delta_time': 0.1
    },

    'raster_params': {
        'raster_size': [400, 400],
        'pixel_size': [0.5, 0.5],
        'ego_center': [0.25, 0.5],
        'map_type': 'py_semantic',
        'satellite_map_key': 'aerial_map/aerial_map.png',
        'semantic_map_key': 'semantic_map/semantic_map.pb',
        'dataset_meta_key': 'meta.json',
        'filter_agents_threshold': 0.5
    },
    
    'train_data_loader': {
        'key': 'scenes/train.zarr',
        'batch_size': 64,
        'shuffle': True,
        'num_workers': 0
    },
    
    'train_params': {
        'max_num_steps': 30000,
        'checkpoint_every_n_steps': 1000,
    }
}
```

```
# set env variable for data
os.environ["L5KIT_DATA_FOLDER"] = INPUT_DIR
dm = LocalDataManager(None)

# get config
print(cfg)
```

```
####################
#INITIALIZE DATASET#
####################

train_cfg = cfg["train_data_loader"]

# Rasterizer
rasterizer = build_rasterizer(cfg, dm)

# Train dataset/dataloader
train_zarr = ChunkedDataset(dm.require(train_cfg["key"])).open()
train_dataset = AgentDataset(cfg, train_zarr, rasterizer)
train_dataloader = DataLoader(train_dataset,
                              shuffle=train_cfg["shuffle"],
                              batch_size=train_cfg["batch_size"],
                              num_workers=train_cfg["num_workers"])

print(train_dataset)
```

```
gc.collect()
```

```
class LyftModel(nn.Module):
    
    def __init__(self, cfg: Dict):
        super().__init__()
        
        self.backbone = resnet18(pretrained=True, progress=True)
        
        num_history_channels = (cfg["model_params"]["history_num_frames"] + 1) * 2
        num_in_channels = 3 + num_history_channels

        self.backbone.conv1 = nn.Conv2d(
            num_in_channels,
            self.backbone.conv1.out_channels,
            kernel_size=self.backbone.conv1.kernel_size,
            stride=self.backbone.conv1.stride,
            padding=self.backbone.conv1.padding,
            bias=False,
        )
        
        backbone_out_features = 512

        # X, Y coords for the future positions (output shape: Bx50x2)
        num_targets = 2 * cfg["model_params"]["future_num_frames"]

        self.head = nn.Sequential(
            # nn.Dropout(0.2),
            nn.Linear(in_features=backbone_out_features, out_features=4096),
        )

        self.logit = nn.Linear(4096, out_features=num_targets)
        
    def forward(self, x):
        x = self.backbone.conv1(x)
        x = self.backbone.bn1(x)
        x = self.backbone.relu(x)
        x = self.backbone.maxpool(x) 

        x = self.backbone.layer1(x)
        x = self.backbone.layer2(x)
        x = self.backbone.layer3(x)
        x = self.backbone.layer4(x)

        x = self.backbone.avgpool(x)
        x = torch.flatten(x, 1)
        
        x = self.head(x)
        x = self.logit(x)

        return x
```

```
##################
#INITIALIZE MODEL#
##################

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = LyftModel(cfg)
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Later we have to filter the invalid steps.
criterion = nn.MSELoss(reduction="none")
```

```
device
```
## **ADD THIS CELL TO RESUME TRAINING FROM LAST CHECKPOINT**

```
WEIGHT_FILE = '/content/drive/My Drive/Kaggle/Lyft L5 Motion Prediction/resnet18_400x400_model_state_15999.pth'
checkpoint = torch.load(WEIGHT_FILE, map_location=device)
epoch = checkpoint['epoch']
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
```

```
#################
# TRAINING LOOP #
#################

tr_it = iter(train_dataloader)

progress_bar = tqdm(range(epoch, cfg["train_params"]["max_num_steps"]))
losses_train = []

for itr in progress_bar:

    try:
        data = next(tr_it)
    except StopIteration:
        tr_it = iter(train_dataloader)
        data = next(tr_it)

    model.train()
    torch.set_grad_enabled(True)
    
    # Forward pass
    inputs = data["image"].to(device)
    target_availabilities = data["target_availabilities"].unsqueeze(-1).to(device)
    targets = data["target_positions"].to(device)
    
    outputs = model(inputs).reshape(targets.shape)
    loss = criterion(outputs, targets)

    # not all the output steps are valid, but we can filter them out from the loss using availabilities
    loss = loss * target_availabilities
    loss = loss.mean()

    # Backward pass
    optimizer.zero_grad()
    loss.backward() 
    optimizer.step()

    losses_train.append(loss.item())

    if (itr+1) % cfg['train_params']['checkpoint_every_n_steps'] == 0:
        torch.save({'epoch': itr + 1,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict()},
                   f'/content/drive/My Drive/Lyft L5 Motion Prediction/resnet18_300x300_model_state_{itr}.pth')
    progress_bar.set_description(f"loss: {loss.item()} loss(avg): {np.mean(losses_train[-100:])}")
```

# <a id='8'>Problems Faced</a>
[Table of contents](#0.1)

* You will get a GPU cooldown message when training for more than 2 sessions after using up your free GPU quota. In this case you can either wait a few hours or access the notebook from another account 😛 by turning on sharing.
* Your notebook may suddenly stop. This happens very rarely, and you can simply reconnect and everything will work fine.
* Training with a batch size of 64 takes a lot of time, so please be patient.
* When resuming training, make sure to load the correct checkpoint, and don't forget to delete checkpoints that are of no use to you.
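
Deleting stale checkpoints can be scripted as well. The sketch below keeps the newest few files and removes the rest, again assuming the `..._model_state_{itr}.pth` naming from the training loop. `prune_checkpoints` and its `keep` and `dry_run` parameters are hypothetical names chosen for this example. It defaults to a dry run, so nothing is deleted until you ask for it.

```python
import re
from pathlib import Path

STEP_RE = re.compile(r"_model_state_(\d+)\.pth$")

def prune_checkpoints(folder: str, keep: int = 2, dry_run: bool = True) -> list:
    """Remove all but the `keep` newest checkpoints; return the affected paths."""
    found = []
    for path in Path(folder).glob("*.pth"):
        match = STEP_RE.search(path.name)
        if match:
            found.append((int(match.group(1)), path))
    found.sort(key=lambda c: c[0])  # oldest step first
    cutoff = max(len(found) - keep, 0)
    doomed = [path for _, path in found[:cutoff]]
    if not dry_run:
        for path in doomed:
            path.unlink()
    return doomed
```

Calling it first with the default `dry_run=True` lists what would be removed, which is a cheap safeguard before freeing Drive space for real.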

# <a id='9'>Reference</a>
[Table of contents](#0.1)

* https://www.kaggle.com/pestipeti/pytorch-baseline-inference
* https://www.kaggle.com/pestipeti/pytorch-baseline-train
* https://www.kaggle.com/pestipeti/lyft-l5kit-unofficial-fix