In [None]:
import numpy as np
import pandas as pd

# Raster Size Ensembling

**This notebook 1) gives a simple demo of how to make a multi-mode submission and 2) discusses ensembling raster sizes. As explained [here](https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/178323), the `raster_size` parameter controls how much of the scene the rasterized image shows: at a fixed `pixel_size`, a larger raster size enlarges the region the model sees during training. That makes it an important hyperparameter. Does our model need to see agents from very far away? Or should we force it to focus on agents near it? Why not do both? We can train models on different raster sizes and then ensemble their predictions through this competition's multi-mode submission format.**


# Notes

**All the pre-trained models are saved [here](https://www.kaggle.com/tuckerarrants/lyftpretrainedmodels), so you can just add this dataset to a Kaggle kernel and run inference with it, like I do [here](https://www.kaggle.com/tuckerarrants/lyft-inference-resnet18) to get the different raster size predictions that I am blending. You could also just download the output of this kernel if you want the prediction `.csv`.**

**Current LB scores:**
* (raster_size 300, pixel_size 0.5, history_num_frames 10, 75000 steps) - 129.99
* (raster_size 450, pixel_size 0.5, history_num_frames 10, 25000 steps) - 135.899
* (raster_size 600, pixel_size 0.2, history_num_frames 10, 75000 steps) - 121.555
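
**To compare how much of the scene each configuration above covers, multiply the raster size (in pixels) by the pixel size. The sketch below assumes square rasters and that `pixel_size` is given in meters per pixel; the helper name `raster_extent_m` is hypothetical and not part of any library.**

```python
def raster_extent_m(raster_size_px: int, pixel_size_m: float) -> float:
    """Side length, in meters, of the square region covered by the raster."""
    return raster_size_px * pixel_size_m

# (raster_size, pixel_size) pairs taken from the LB list above
configs = [(300, 0.5), (450, 0.5), (600, 0.2)]

for raster_size, pixel_size in configs:
    extent = raster_extent_m(raster_size, pixel_size)
    print(f"raster_size={raster_size}, pixel_size={pixel_size}: ~{extent:.0f} m per side")
```

**Under these assumptions, the 600-pixel raster at `pixel_size` 0.2 covers a smaller area than the 300-pixel raster at 0.5, but at a finer resolution, so the three models differ in both field of view and level of detail.**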


**See Peter's notebook [here](https://www.kaggle.com/pestipeti/pytorch-baseline-train) for further training-specific details.**

In [None]:
#steal multi mode template from sample sub
multi_mode_submission = pd.read_csv('../input/lyft-motion-prediction-autonomous-vehicles/multi_mode_sample_submission.csv')

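#column groups, so we can set confidences and copy trajectories easily
cols = list(multi_mode_submission.columns)
confs = cols[2:5]      #the three mode confidences
conf0 = cols[5:105]    #100 coordinate columns of mode 0
conf1 = cols[105:205]  #100 coordinate columns of mode 1
conf2 = cols[205:305]  #100 coordinate columns of mode 2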

In [None]:
sub0 = pd.read_csv('../input/lyftsubmissions/submission_r300_px.5_74999.csv')
sub1 = pd.read_csv('../input/lyftsubmissions/submission_r450_px.5_24999.csv')
sub2 = pd.read_csv('../input/lyftsubmissions/submission_r600_px.2_74999.csv')

**You can probably get better scores by changing the confidences assigned to the different submissions, or by training with your own raster size and adding that prediction to the ensemble to continue the experiment.**

In [None]:
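#change this to experiment with different ensemble weights (they should sum to 1)
multi_mode_submission[confs] = [.3,.2,.5]

#each single-model prediction is read from its mode 0 columns;
#copy by position (.to_numpy()) so pandas does not try to align column labels
multi_mode_submission[conf0] = sub0[conf0].to_numpy()
multi_mode_submission[conf1] = sub1[conf0].to_numpy()
multi_mode_submission[conf2] = sub2[conf0].to_numpy()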
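
**The same blending step can be sketched on toy data to check the logic before running it on the real files. This is a minimal sketch, not the exact code used for the scores above. The column names `m0_x`, `m0_y`, etc. are hypothetical, shortened stand-ins for the 100 coordinate columns per mode, and `toy_submission` and `ensemble_modes` are hypothetical helpers. It assumes each single-model file keeps its prediction in the mode 0 columns, that all files list the same agents in the same order, and that the metric expects the confidences to sum to 1.**

```python
import numpy as np
import pandas as pd

ID_COLS = ["timestamp", "track_id"]
CONF_COLS = ["conf_0", "conf_1", "conf_2"]
# Hypothetical, shortened coordinate columns: 2 values per mode instead of 100.
MODE_COLS = [[f"m{m}_x", f"m{m}_y"] for m in range(3)]

def toy_submission(value: float) -> pd.DataFrame:
    """Fake single-model output: prediction stored in mode 0, other modes zero."""
    data = {"timestamp": [1, 2], "track_id": [10, 20]}
    data.update({c: [0.0, 0.0] for c in CONF_COLS})
    data["conf_0"] = [1.0, 1.0]
    for m, cols in enumerate(MODE_COLS):
        for c in cols:
            data[c] = [value if m == 0 else 0.0] * 2
    return pd.DataFrame(data)

def ensemble_modes(subs, weights):
    """Place each submission's mode 0 trajectory into its own mode slot."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so confidences sum to 1
    out = subs[0].copy()
    for sub in subs[1:]:
        # Rows must describe the same agents in the same order.
        assert (sub[ID_COLS].to_numpy() == out[ID_COLS].to_numpy()).all()
    for m, (sub, w) in enumerate(zip(subs, weights)):
        out[CONF_COLS[m]] = w
        # to_numpy() copies by position, avoiding column-label alignment.
        out[MODE_COLS[m]] = sub[MODE_COLS[0]].to_numpy()
    return out

result = ensemble_modes(
    [toy_submission(1.0), toy_submission(2.0), toy_submission(3.0)],
    weights=[3, 2, 5],
)
print(result)
```

**Normalising the weights inside the helper means you can experiment with arbitrary positive numbers without having to make them sum to 1 by hand.**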

In [None]:
#sanity check
multi_mode_submission[conf0].head()

In [None]:
#another sanity check
multi_mode_submission[conf1].head()

In [None]:
#okay last one
multi_mode_submission[conf2].head()

In [None]:
#save to .csv for submission
multi_mode_submission.to_csv('submission.csv', index=False, float_format='%.6g')

# Model Architecture (For Reference)

**Below is the architecture of the model that was trained on the different raster sizes:**

In [None]:
import torch
import torch.nn as nn
from torchvision.models import resnet18
from typing import Dict

class LyftModel(nn.Module):
    
    def __init__(self, cfg: Dict):
        super().__init__()
        
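        # ResNet-18 backbone; no ImageNet weights are loaded here
        self.backbone = resnet18(pretrained=False)
        
        # 2 channels per frame (current frame + history), plus 3 more input channels
        num_history_channels = (cfg["model_params"]["history_num_frames"] + 1) * 2
        num_in_channels = 3 + num_history_channels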

        self.backbone.conv1 = nn.Conv2d(
            num_in_channels,
            self.backbone.conv1.out_channels,
            kernel_size=self.backbone.conv1.kernel_size,
            stride=self.backbone.conv1.stride,
            padding=self.backbone.conv1.padding,
            bias=False,
        )
        
        num_targets = 2 * cfg["model_params"]["future_num_frames"]

        self.head = nn.Sequential(
            nn.Linear(in_features=512, out_features=4096),
        )

        self.logit = nn.Linear(4096, out_features=num_targets)
        
    def forward(self, x):
        x = self.backbone.conv1(x)
        x = self.backbone.bn1(x)
        x = self.backbone.relu(x)
        x = self.backbone.maxpool(x)

        x = self.backbone.layer1(x)
        x = self.backbone.layer2(x)
        x = self.backbone.layer3(x)
        x = self.backbone.layer4(x)

        x = self.backbone.avgpool(x)
        x = torch.flatten(x, 1)
        
        x = self.head(x)
        x = self.logit(x)
        
        return x
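
**The input and output sizes of this model follow directly from the config. The sketch below reproduces that arithmetic in plain Python, with hypothetical helper names. `history_num_frames = 10` comes from the LB list above; `future_num_frames = 50` is an assumption, inferred from the 100 coordinate columns per mode in the submission file.**

```python
def input_channels(history_num_frames: int) -> int:
    """Channels fed to conv1: 3 plus 2 per frame (current frame + history),
    mirroring the computation in LyftModel.__init__."""
    return 3 + (history_num_frames + 1) * 2

def num_targets(future_num_frames: int) -> int:
    """Output size: one (x, y) pair per future frame."""
    return 2 * future_num_frames

print(input_channels(10))  # history_num_frames from the LB list
print(num_targets(50))     # future_num_frames assumed to be 50
```

**With these values the first convolution takes 25 input channels and the final linear layer produces 100 outputs, which matches the 100 coordinate columns copied per mode earlier in the notebook.**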