# Preface
**In this notebook we look at fine-tuning an [EfficientNet model](https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html). EfficientNets are a family of convolutional networks for computer vision tasks developed by Google. They achieve state-of-the-art performance on such tasks, so I chose one as the base model for this task.**

**This notebook contains only the training part of the model. Inference and ensembling should be carried out in separate notebooks so that you can iterate faster.**

**You can also look at my [ensemble notebook](https://www.kaggle.com/forwet/lyft-ensemble-and-submission) for a structured way to perform ensembling.**
 

**Version 2 Updates:**
1. raster_size - 300
2. max_num_steps - 500
3. train_batch_size - 12
4. train_num_workers - 4
5. model_save - On minimum training loss + At final iteration
6. WEIGHT_FILE - None


**Version 3 Updates:**
1. pixel_size = 0.45
2. raster_size = 222 (calculated from the pixel size)
3. epochs = 5
4. Used the [Catalyst library](https://catalyst-team.github.io/catalyst/) to train the model.
5. Trained the model on a subset of `12000` samples of the original dataset.
6. Validated the model on a subset of `500` samples.
7. Added a function that computes the image size from the pixel size.

In [None]:
# Declaring the path to load efficientNet models.
import sys
sys.path.append('../input/efficientnet-pytorch/EfficientNet-PyTorch/EfficientNet-PyTorch-master')

In [None]:
#IMPORTS

# PyTorch
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision.models.resnet import resnet50

# L5kit
from l5kit.configs import load_config_data
from l5kit.geometry import transform_points
from l5kit.rasterization import build_rasterizer
from l5kit.dataset import AgentDataset, EgoDataset
from l5kit.data import LocalDataManager, ChunkedDataset
from l5kit.evaluation.metrics import neg_multi_log_likelihood, time_displace
from l5kit.visualization import PREDICTED_POINTS_COLOR, TARGET_POINTS_COLOR, draw_trajectory
from l5kit.evaluation import write_pred_csv, compute_metrics_csv, read_gt_csv, create_chopped_dataset

# EfficientNet 
from efficientnet_pytorch import model as enet

# Catalyst module
from catalyst import dl
from catalyst.utils import metrics
from catalyst.dl import utils

# Miscellaneous
import os
import gc
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
from typing import Dict
import matplotlib.pyplot as plt
from tempfile import gettempdir
from prettytable import PrettyTable

# Configurations

In [None]:
# L5KIT CONFIGURATION

os.environ["L5KIT_DATA_FOLDER"] = "../input/lyft-motion-prediction-autonomous-vehicles"
dm = LocalDataManager()
cfg = {
        'model_params': {'model_architecture': 'efficientnet-b6',
          'history_num_frames': 0,
          'history_step_size': 1,
          'history_delta_time': 0.1,
          'future_num_frames': 50,
          'future_step_size': 1,
          'future_delta_time': 0.1},

        'raster_params': {'raster_size': [300, 300],
          'pixel_size': [0.33, 0.33],
          'ego_center': [0.25, 0.5],
          'map_type': 'py_semantic',
          'satellite_map_key': 'aerial_map/aerial_map.png',
          'semantic_map_key': 'semantic_map/semantic_map.pb',
          'dataset_meta_key': 'meta.json',
          'filter_agents_threshold': 0.5},

        'train_data_loader': {'key': 'scenes/train.zarr',
          'batch_size': 12,
          'shuffle': True,
          'num_workers': 4},

        "valid_data_loader":{"key": "scenes/validate.zarr",
                            "batch_size": 8,
                            "shuffle": False,
                            "num_workers": 4},
    
        }

**Building on Peter's great explanation of `raster size` and `pixel size`, I wrote a function that calculates the `raster size` parameter from a given `pixel size`. It uses the same velocity and time values that Peter used in his explanation.**

In [None]:
def calc_img_size(px_size):
    return int(100/px_size)

In [None]:
# CONFIGURATION

WEIGHT_FILE = None # Path to the state_dict of a previously trained model
MODEL_NAME = "efficientnet-b0" # Used by build_model (cfg's 'model_architecture' is not read in this notebook)
IMG_SIZE = calc_img_size(cfg["raster_params"]["pixel_size"][0])
VALIDATION = True # Set to False to skip validation

cfg["raster_params"]["raster_size"] = [IMG_SIZE, IMG_SIZE]

`MODEL_NAME` can be one of:
- "efficientnet-b0"
- "efficientnet-b1"
- "efficientnet-b2"
- "efficientnet-b3"
- "efficientnet-b4"
- "efficientnet-b5"
- "efficientnet-b6"
- "efficientnet-b7" (likely to cause an out-of-memory error because of its larger architecture)

`IMG_SIZE`: you can use one of the following settings or try other combinations:
- 300 (pixel size = 0.33)
- 224 (pixel size = 0.45)
- 267 (pixel size = 0.38)
- 245 (pixel size = 0.41)

**More information on `raster_size` and `pixel_size` can be found [here](https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/178323).**
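
To make the relation between `pixel_size` and `raster_size` concrete, the sketch below reruns the notebook's `calc_img_size` formula for the pixel sizes listed above. It assumes `pixel_size` is expressed in metres per pixel, so `size * px` is the side length the raster covers. The helper returns 303, 263, 243 and 222, so the values in the list are close to, but not exactly, what the formula gives.

```python
def calc_img_size(px_size):
    # Same formula as the notebook's helper: a fixed extent of 100 divided by the pixel size.
    return int(100 / px_size)

# Pixel sizes taken from the list above.
sizes = {px: calc_img_size(px) for px in (0.33, 0.38, 0.41, 0.45)}

for px, size in sizes.items():
    # size * px is the side length covered by the raster,
    # assuming pixel_size is given in metres per pixel.
    print(f"pixel_size={px} -> raster_size={size} (covers about {size * px:.1f} m)")
```

Because `int()` truncates, the covered extent always ends up slightly under 100.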

# Utility Scripts

In [None]:
%time
def build_model(cfg) -> torch.nn.Module:
    """Creates an EfficientNet with custom input and output layers (from_name builds the architecture without pretrained weights)"""
    model = enet.EfficientNet.from_name(MODEL_NAME)
    
    num_history_channels = (cfg["model_params"]["history_num_frames"] + 1) * 2
    num_in_channels = 3 + num_history_channels
    num_targets = 2*cfg["model_params"]["future_num_frames"]
    
    model._conv_stem = nn.Conv2d(
        num_in_channels,
        model._conv_stem.out_channels,
        kernel_size=model._conv_stem.kernel_size,
        stride=model._conv_stem.stride,
        padding=model._conv_stem.padding,
        bias=False
    )
    
    model._fc = nn.Linear(in_features=model._fc.in_features, out_features=num_targets)
    return model

def forward(data, model, device, criterion):
    """Forward propagation: returns the availability-masked loss and the reshaped outputs"""
    inputs = data["image"].to(device)
    target_availabilities = data["target_availabilities"].unsqueeze(-1).to(device)
    targets = data["target_positions"].to(device)
    
    outputs = model(inputs)
    outputs = outputs.reshape(targets.shape)
    loss = criterion(outputs, targets)
    
    loss = loss * target_availabilities
    loss = loss.mean()

    # RMSE is disabled; the plain MSE loss is used instead.
    # loss = torch.sqrt(loss)  # uncomment to use RMSE
    return loss, outputs


def get_dataloader(config, zarr_data, subset_len, map_type="py_semantic"):
    """Creates DataLoader instance for the given dataset."""
    cfg["raster_params"]["map_type"] = map_type
    rasterizer = build_rasterizer(cfg, dm)
    chunk_data = ChunkedDataset(zarr_data).open()
    agent_data = AgentDataset(cfg, chunk_data, rasterizer)
    
    # Sample the dataset
    subset_data = torch.utils.data.Subset(agent_data, range(0, subset_len))
    
    dataloader = DataLoader(subset_data, 
                            batch_size=config["batch_size"],
                            num_workers=config["num_workers"],
                            shuffle=config["shuffle"]
                           )
    return dataloader
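
The channel arithmetic in `build_model` can be checked by hand. The sketch below repeats it in plain Python; `model_io_sizes` is a hypothetical helper name used only for illustration and is not part of the notebook or of l5kit.

```python
def model_io_sizes(history_num_frames, future_num_frames):
    """Mirror the input/output size arithmetic used in build_model."""
    # Two raster channels per frame, counting the current frame as well.
    num_history_channels = (history_num_frames + 1) * 2
    # Three extra channels for the RGB map layer.
    num_in_channels = 3 + num_history_channels
    # One (x, y) pair per predicted future step.
    num_targets = 2 * future_num_frames
    return num_in_channels, num_targets

# Settings from this notebook's cfg: no history frames, 50 future frames.
print(model_io_sizes(0, 50))   # (5, 100)
# A hypothetical setting with 10 history frames.
print(model_io_sizes(10, 50))  # (25, 100)
```

With the current configuration, the replaced `_conv_stem` therefore takes 5 input channels and the replaced `_fc` layer produces 100 outputs, which `forward` reshapes to match `target_positions`.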

In [None]:
%time
def train(opt=None, criterion=None, lrate=1e-2):
    """Train the model with Catalyst and return it.

    Note: `criterion` is currently unused; LyftRunner computes the masked MSE itself.
    """
    print("Building model...")
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = build_model(cfg).to(device)
    optimizer = optim.Adam(model.parameters(), lr=lrate) if opt is None else opt

    # Optionally resume from previously saved weights.
    if WEIGHT_FILE is not None:
        state_dict = torch.load(WEIGHT_FILE, map_location=device)
        model.load_state_dict(state_dict)

    print("Preparing dataloaders...")
    loaders = {
        "train": get_dataloader(cfg["train_data_loader"], dm.require("scenes/train.zarr"), 12000)
    }
    # Build the validation loader only when requested, so that
    # VALIDATION = False no longer raises a NameError.
    if VALIDATION:
        loaders["valid"] = get_dataloader(cfg["valid_data_loader"], dm.require("scenes/validate.zarr"), 500)

    print("Training...")
    runner = LyftRunner(device=device)
    runner.train(
        model=model,
        optimizer=optimizer,
        loaders=loaders,
        logdir="./logs",
        num_epochs=5,
        verbose=True,
        load_best_on_end=True
    )
    return model

class LyftRunner(dl.Runner):

    def predict_batch(self, batch):
        # Batches are dicts produced by AgentDataset, so index by key rather than position.
        return self.model(batch["image"].to(self.device))

    def _handle_batch(self, batch):
        x, y = batch['image'], batch['target_positions']
        y_hat = self.model(x).reshape(y.shape)
        target_availabilities = batch["target_availabilities"].unsqueeze(-1)
        criterion = torch.nn.MSELoss(reduction="none")
        loss = criterion(y_hat, y)
        loss = loss * target_availabilities
        loss = loss.mean()
        self.batch_metrics.update(
            {"loss": loss}
        )

        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
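
Both `forward` and `_handle_batch` multiply the per-element squared error by `target_availabilities` and then call `.mean()`, which divides by every element, including the masked ones. The plain-Python sketch below works through a tiny hand-checkable case. `masked_mse` is a hypothetical helper, and the second value it returns (normalising by available elements only) is an alternative shown for comparison, not something the notebook does.

```python
def masked_mse(pred, target, avail):
    """Return (mean over all elements, mean over available elements only)."""
    total, n_all, n_avail = 0.0, 0, 0
    for p_xy, t_xy, a in zip(pred, target, avail):
        for p, t in zip(p_xy, t_xy):
            total += a * (p - t) ** 2  # unavailable steps contribute zero error
            n_all += 1
            n_avail += a
    return total / n_all, total / max(n_avail, 1)

pred = [[1.0, 1.0], [2.0, 2.0], [9.0, 9.0]]
target = [[0.0, 0.0], [2.0, 4.0], [0.0, 0.0]]
avail = [1, 1, 0]  # the third future step has no ground truth

print(masked_mse(pred, target, avail))  # (1.0, 1.5)
```

The first value matches the notebook's reduction. It shrinks as more steps are unavailable, which is worth keeping in mind when comparing losses across batches.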

# Visualizing images

**Let's just have a look at how the image would be formed using the current settings for `raster_size` and `pixel_size`.**
> The more you decrease the `pixel_size`, the more the image will be zoomed-in. 

In [None]:
# Testing pixel_size and raster_size START 

In [None]:
# Preparing the EgoDataset from sample zarr file.
sample_zarr = dm.require("scenes/sample.zarr")
sample_chunk = ChunkedDataset(sample_zarr).open()
rasterizer = build_rasterizer(cfg, dm)
sample_ego = EgoDataset(cfg, sample_chunk, rasterizer)

In [None]:
fig, ax = plt.subplots(3, 3, figsize=(15,15))
ax = ax.flatten()
for i in range(9):
    idx = np.random.randint(500)
    data = sample_ego[idx]
    im = data["image"].transpose(1, 2, 0)
    im = sample_ego.rasterizer.to_rgb(im)
    data_positions = transform_points(data["target_positions"]+data["centroid"], data["world_to_image"])
    draw_trajectory(im, data_positions, data["target_yaws"], TARGET_POINTS_COLOR)
    ax[i].imshow(im[::-1])
plt.show()

In [None]:
# Testing END
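
The zoom effect noted above can also be expressed in numbers. The sketch below computes how many metres a raster covers for a fixed `raster_size`. It assumes that `pixel_size` is in metres per pixel, that `ego_center` is given as a fraction of the raster, and that the ego vehicle faces the positive x direction of the image. `field_of_view` is a hypothetical helper, not an l5kit function.

```python
def field_of_view(raster_size, pixel_size, ego_center):
    """Return (width_m, height_m, ahead_m, behind_m) for a raster configuration."""
    width_m = raster_size[0] * pixel_size[0]
    height_m = raster_size[1] * pixel_size[1]
    # ego_center[0] is the fraction of the width that lies behind the ego vehicle
    # (assumption: the ego faces the +x direction of the raster).
    behind_m = ego_center[0] * width_m
    ahead_m = width_m - behind_m
    return width_m, height_m, ahead_m, behind_m

fov_fine = field_of_view([300, 300], [0.33, 0.33], [0.25, 0.5])
fov_coarse = field_of_view([300, 300], [0.45, 0.45], [0.25, 0.5])

for name, fov in (("pixel_size=0.33", fov_fine), ("pixel_size=0.45", fov_coarse)):
    print(f"{name}: {fov[0]:.1f} m wide, {fov[2]:.1f} m ahead, {fov[3]:.1f} m behind")
```

At the same `raster_size`, the smaller `pixel_size` covers less ground, which is why the image appears more zoomed in.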

# Train the model

In [None]:
# Use the above defined utility script to train the model.
model = train()

In [None]:
# Saving model on final iteration
torch.save(model.state_dict(), f"{MODEL_NAME}.pth")

<body>
    <p style="color:#bf190d;font-size:18px;">Hope you liked my work.</p>
    <p style="color:#bf19dd;font-size:18px;">Please upvote if you did.</p>
</body>

<body>
    <p style="color:#bf190d;font-size:15px;">I will be updating my <a href="https://www.kaggle.com/forwet/lyft-ensemble-and-submission">ensemble notebook</a> to evaluate this model. Stay tuned!</p>
    <p style="color:#6a018a;font-size:15px;">If you have any questions, please comment below. Thanks!</p>
</body>

# References

- https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
- https://arxiv.org/pdf/1905.11946.pdf
- https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/178323
- https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/180468
- https://www.kaggle.com/nxrprime/lyft-understanding-the-data-catalyst-training