# Acknowledgments

## For PID data

 - [Mikhail Hushchyn](https://www.hse.ru/org/persons/213369348)

## For SparseVD code

 - [Arsenii Ashukha](https://www.hse.ru/en/org/persons/204848606) from BayesLab for making the code public:
     - https://github.com/senya-ashukha/sparse-vd-pytorch

# About PID

There are six particle types: electron, proton, muon, kaon, pion, and ghost. A ghost is either a particle of a type other than the first five or detector noise.

Different particle types leave different responses in the detector systems (subdetectors). There are five systems: the tracking system, the ring imaging Cherenkov detector (RICH), the electromagnetic and hadron calorimeters, and the muon system.

![pid](pic/pid.jpg)

## Download data

If you are not on CoCalc, use these lines:

In [1]:
# !wget --no-check-certificate "https://onedrive.live.com/download?cid=D058F2D9E01D1496&resid=D058F2D9E01D1496%21107&authkey=AF7V2Rm2NTg2iDk" -O "training.csv.gz"
# training_file = './training.csv.gz'

If you are on CoCalc, use this one:

In [8]:
training_file = '~/share/pid_sparse/training.csv.gz'

## All necessary imports

In [9]:
!pip install tabulate

Defaulting to user installation because normal site-packages is not writeable




In [10]:
%matplotlib inline
import pandas
import numpy
import math
import torch
import numpy as np

import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
from torch.nn import Parameter

from logger import Logger
from torchvision import datasets, transforms
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
import matplotlib.pyplot as plt
import seaborn as sns

label_class_correspondence = {'Electron': 0, 'Ghost': 1, 'Kaon': 2, 'Muon': 3, 'Pion': 4, 'Proton': 5}
class_label_correspondence = {0: 'Electron', 1: 'Ghost', 2: 'Kaon', 3: 'Muon', 4: 'Pion', 5: 'Proton'}


def get_class_ids(labels):
    """
    Convert particle type names into class ids.

    Parameters:
    -----------
    labels : array_like
        Array of particle type names ['Electron', 'Muon', ...].

    Return:
    -------
    class ids : array_like
        Array of class ids [1, 0, 3, ...].
    """
    return np.array([label_class_correspondence[alabel] for alabel in labels])

In [11]:
!nvidia-smi

Sat Jul 25 16:27:01 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.126.02   Driver Version: 418.126.02   CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 00000001:00:00.0 Off |                    0 |
| N/A   51C    P0    69W / 149W |   2090MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000002:00:00.0 Off |                    0 |
| N/A   62C    P0    60W / 149W |     82MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000003:00:00.0 Off |                    

+-----------------------------------------------------------------------------+


In [12]:
device = torch.device('cuda:3')

# Load data

Load the data used to train the classifiers.

### Read training file

In [13]:
data = pandas.read_csv(training_file)
data.head()

Unnamed: 0,TrackP,TrackNDoFSubdetector2,BremDLLbeElectron,MuonLooseFlag,FlagSpd,SpdE,EcalDLLbeElectron,DLLmuon,RICHpFlagElectron,EcalDLLbeMuon,...,TrackNDoF,RICHpFlagMuon,RICH_DLLbeKaon,RICH_DLLbeElectron,HcalE,MuonFlag,FlagMuon,PrsE,RICH_DLLbeMuon,RICH_DLLbeProton
0,74791.156263,15.0,0.232275,1.0,1.0,3.2,-2.505719,6.604153,1.0,1.92996,...,28.0,1.0,-7.2133,-0.2802,5586.589846,1.0,1.0,10.422315,-2.081143e-07,-24.8244
1,2738.489989,15.0,-0.357748,0.0,1.0,3.2,1.864351,0.263651,1.0,-2.061959,...,32.0,1.0,-0.324317,1.707283,-7e-06,0.0,1.0,43.334935,2.771583,-0.648017
2,2161.409908,17.0,-999.0,0.0,0.0,-999.0,-999.0,-999.0,0.0,-999.0,...,27.0,0.0,-999.0,-999.0,-999.0,0.0,0.0,-999.0,-999.0,-999.0
3,15277.73049,20.0,-0.638984,0.0,1.0,3.2,-2.533918,-8.724949,1.0,-3.253981,...,36.0,1.0,-35.202221,-14.742319,4482.803707,0.0,1.0,2.194175,-3.070819,-29.291519
4,7563.700195,19.0,-0.638962,0.0,1.0,3.2,-2.087146,-7.060422,1.0,-0.995816,...,33.0,1.0,25.084287,-10.272412,5107.55468,0.0,1.0,1.5e-05,-5.373712,23.653087


### List of columns in the samples

Here, **Spd** stands for Scintillating Pad Detector, **Prs** for Preshower, **Ecal** for electromagnetic calorimeter, **Hcal** for hadronic calorimeter, and **Brem** denotes traces of particles that were deflected by the detector.

- ID - id value for tracks (present only in the test file, for submission purposes)
- Label - string-valued observable denoting the particle type. Can take the values "Electron", "Muon", "Kaon", "Proton", "Pion" and "Ghost". This column is absent in the test file.
- FlagSpd - flag (0 or 1), if reconstructed track passes through Spd
- FlagPrs - flag (0 or 1), if reconstructed track passes through Prs
- FlagBrem - flag (0 or 1), if reconstructed track passes through Brem
- FlagEcal - flag (0 or 1), if reconstructed track passes through Ecal
- FlagHcal - flag (0 or 1), if reconstructed track passes through Hcal
- FlagRICH1 - flag (0 or 1), if reconstructed track passes through the first RICH detector
- FlagRICH2 - flag (0 or 1), if reconstructed track passes through the second RICH detector
- FlagMuon - flag (0 or 1), if reconstructed track passes through muon stations (Muon)
- SpdE - energy deposit associated to the track in the Spd
- PrsE - energy deposit associated to the track in the Prs
- EcalE - energy deposit associated to the track in the Ecal
- HcalE - energy deposit associated to the track in the Hcal
- PrsDLLbeElectron - delta log-likelihood for a particle candidate to be electron using information from Prs
- BremDLLbeElectron - delta log-likelihood for a particle candidate to be electron using information from Brem
- TrackP - particle momentum
- TrackPt - particle transverse momentum
- TrackNDoFSubdetector1 - number of degrees of freedom for track fit using hits in the tracking sub-detector1
- TrackQualitySubdetector1 - chi2 quality of the track fit using hits in the tracking sub-detector1
- TrackNDoFSubdetector2 - number of degrees of freedom for track fit using hits in the tracking sub-detector2
- TrackQualitySubdetector2 - chi2 quality of the track fit using hits in the tracking sub-detector2
- TrackNDoF - number of degrees of freedom for track fit using hits in all tracking sub-detectors
- TrackQualityPerNDoF - chi2 quality of the track fit per degree of freedom
- TrackDistanceToZ - distance between track and z-axis (beam axis)
- Calo2dFitQuality - quality of the 2d fit of the clusters in the calorimeter 
- Calo3dFitQuality - quality of the 3d fit in the calorimeter with assumption that particle was electron
- EcalDLLbeElectron - delta log-likelihood for a particle candidate to be electron using information from Ecal
- EcalDLLbeMuon - delta log-likelihood for a particle candidate to be muon using information from Ecal
- EcalShowerLongitudinalParameter - longitudinal parameter of Ecal shower
- HcalDLLbeElectron - delta log-likelihood for a particle candidate to be electron using information from Hcal
- HcalDLLbeMuon - delta log-likelihood for a particle candidate to be muon using information from Hcal
- RICHpFlagElectron - flag (0 or 1) if momentum is greater than threshold for electrons to produce Cherenkov light
- RICHpFlagProton - flag (0 or 1) if momentum is greater than threshold for protons to produce Cherenkov light
- RICHpFlagPion - flag (0 or 1) if momentum is greater than threshold for pions to produce Cherenkov light
- RICHpFlagKaon - flag (0 or 1) if momentum is greater than threshold for kaons to produce Cherenkov light
- RICHpFlagMuon - flag (0 or 1) if momentum is greater than threshold for muons to produce Cherenkov light
- RICH_DLLbeBCK  - delta log-likelihood for a particle candidate to be background using information from RICH
- RICH_DLLbeKaon - delta log-likelihood for a particle candidate to be kaon using information from RICH
- RICH_DLLbeElectron - delta log-likelihood for a particle candidate to be electron using information from RICH
- RICH_DLLbeMuon - delta log-likelihood for a particle candidate to be muon using information from RICH
- RICH_DLLbeProton - delta log-likelihood for a particle candidate to be proton using information from RICH
- MuonFlag - muon flag (is this track muon) which is determined from muon stations
- MuonLooseFlag - muon flag (is this track muon) which is determined from muon stations using looser criteria
- MuonLLbeBCK - log-likelihood for a particle candidate to be not muon using information from muon stations
- MuonLLbeMuon - log-likelihood for a particle candidate to be muon using information from muon stations
- DLLelectron - delta log-likelihood for a particle candidate to be electron using information from all subdetectors
- DLLmuon - delta log-likelihood for a particle candidate to be muon using information from all subdetectors
- DLLkaon - delta log-likelihood for a particle candidate to be kaon using information from all subdetectors
- DLLproton - delta log-likelihood for a particle candidate to be proton using information from all subdetectors
- GhostProbability - probability for a particle candidate to be ghost track. This variable is an output of classification model used in the tracking algorithm.

Delta log-likelihood in the feature descriptions means the difference between the log-likelihood for the mass hypothesis that a given track was left by some particle (for example, an electron) and the log-likelihood for the mass hypothesis that it was left by a pion (so DLLpion = 0, which is why there is no such column). This is done because most tracks (~80%) are left by pions, and in practice we need to discriminate other particles from pions. In other words, the null hypothesis is that the particle is a pion.
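
As a minimal sketch of this definition, a delta log-likelihood can be computed by subtracting the pion log-likelihood from that of the tested hypothesis. The function name `delta_log_likelihood` and the numeric likelihoods are hypothetical, chosen only for illustration; they are not taken from the dataset.

```python
import math


def delta_log_likelihood(loglik_hypothesis, loglik_pion):
    """Return log L(hypothesis) - log L(pion) for one track."""
    return loglik_hypothesis - loglik_pion


# Hypothetical per-hypothesis likelihoods for a single track (illustrative only).
loglik = {
    'Electron': math.log(0.02),
    'Kaon': math.log(0.30),
    'Pion': math.log(0.60),
}

dll = {name: delta_log_likelihood(value, loglik['Pion'])
       for name, value in loglik.items()}

# The pion entry is zero by construction, which is why the dataset
# has no DLLpion column.
print(dll)
```

A negative value means the pion hypothesis is more likely for that track than the tested one.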

### Look at the labels set

The training data contains six classes. Each class corresponds to a particle type. Your task is to predict the type of a particle.

In [14]:
set(data.Label)

{'Electron', 'Ghost', 'Kaon', 'Muon', 'Pion', 'Proton'}

Convert the particle types into class numbers.

In [15]:
data['Class'] = get_class_ids(data.Label.values)
set(data.Class)

{0, 1, 2, 3, 4, 5}
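
The inverse dictionary `class_label_correspondence` from the imports cell maps ids back to names, which can be handy when inspecting predictions. A small sketch follows; `get_class_labels` is a hypothetical helper that is not part of the original notebook.

```python
import numpy as np

label_class_correspondence = {'Electron': 0, 'Ghost': 1, 'Kaon': 2,
                              'Muon': 3, 'Pion': 4, 'Proton': 5}
# Build the inverse mapping instead of typing it by hand.
class_label_correspondence = {v: k for k, v in label_class_correspondence.items()}


def get_class_labels(class_ids):
    """Hypothetical inverse of get_class_ids: class ids -> particle names."""
    return np.array([class_label_correspondence[int(i)] for i in class_ids])


ids = np.array([3, 0, 4])
labels = get_class_labels(ids)
print(labels)
```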

### Define training features

The following set of features describes particle responses in the detector systems:

![features](pic/features.jpeg)

There are also several combined features. The full list is built below.

In [16]:
features = list(set(data.columns) - {'Label', 'Class'})
features = sorted(features)

### Divide training data into 2 parts

In [17]:
training_data, validation_data = train_test_split(data, random_state=11, train_size=0.10)

In [18]:
len(training_data), len(validation_data)

(120000, 1080000)
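
With `train_size=0.10`, only 10% of the rows go to the training part and the remaining 90% are held out, which matches the sizes printed above. The sketch below reproduces this on a toy frame. The `stratify` argument is an optional addition (not used in the original cell) that keeps class proportions the same in both parts; the toy data is invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the real table: 1000 rows, two classes in an 80/20 ratio.
toy = pd.DataFrame({
    'feature': np.arange(1000, dtype=float),
    'Class': np.array([0] * 800 + [1] * 200),
})

train_part, valid_part = train_test_split(
    toy, random_state=11, train_size=0.10, stratify=toy['Class']
)

# Sizes of the two parts and the share of class 1 in each.
print(len(train_part), len(valid_part))
print(train_part['Class'].mean(), valid_part['Class'].mean())
```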

# Torch Neural Network

In this step, your task is to train a **Torch** NN classifier that achieves a lower **log loss** with fewer parameters.

In [19]:
from sparse_model import SparseNet
from sparse_vd import SGVLB

In [20]:
model = SparseNet(threshold=0.001, input_dim=len(features), device=device).to(device)
optimizer = torch.optim.Adam(lr=1e-3, params=model.parameters()) # optimizer

fmt = {'tr_los': '3.1e', 'te_loss': '3.1e', 'sp_0': '.3f', 'sp_1': '.3f', 'sp_2': '.3f', 'lr': '3.1e', 'kl': '.2f'}
logger = Logger('sparse_vd', fmt=fmt)

sgvlb = SGVLB(model, len(training_data[features].values))

In [21]:
# train dataloader
tensor_x = torch.Tensor(training_data[features].values) # transform to torch tensor
tensor_y = torch.Tensor(training_data.Class.values).long()

train_dataset = TensorDataset(tensor_x, tensor_y) # create your dataset
train_loader = DataLoader(train_dataset, batch_size=256) # create your dataloader

# validation dataloader
# NOTE: this reuses training_data, so the "te_*" metrics below are measured on
# the training rows; switch to validation_data for a held-out estimate.
tensor_x = torch.Tensor(training_data[features].values) # transform to torch tensor
tensor_y = torch.Tensor(training_data.Class.values).long()
val_dataset = TensorDataset(tensor_x, tensor_y) # create your dataset
val_loader = DataLoader(val_dataset, batch_size=256) # create your dataloader

In [22]:
nllloss = torch.nn.NLLLoss()

In [23]:
from tqdm import tqdm

kl_weight = 0.0
epochs = 1

for epoch in tqdm(range(1, epochs + 1)):
    model.train()
    train_loss, train_acc = 0, 0 
    logger.add_scalar(epoch, 'kl', kl_weight)
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.to(device)
        target = target.to(device)
        optimizer.zero_grad()
        
        output = model(data)
        pred = output.data.max(1)[1] 
        loss = sgvlb(output, target, kl_weight)
        loss.backward()
        optimizer.step()
        
        train_loss += float(loss) 
        train_acc += np.sum(pred.cpu().numpy() == target.data.cpu().numpy())
        
    logger.add_scalar(epoch, 'tr_los', train_loss / len(train_loader.dataset))
    logger.add_scalar(epoch, 'tr_acc', train_acc / len(train_loader.dataset) * 100)
    
    model.eval()
    test_loss, test_acc = 0, 0
    test_nll = 0
    with torch.no_grad():  # gradients are not needed during evaluation
        for batch_idx, (data, target) in enumerate(val_loader):
            data = data.to(device)
            target = target.to(device)
            output = model(data)

            test_loss += float(sgvlb(output, target, kl_weight))
            test_nll += nllloss(output, target).item()
            pred = output.data.max(1)[1]
            test_acc += np.sum(pred.cpu().numpy() == target.data.cpu().numpy())
        
    logger.add_scalar(epoch, 'te_loss', test_loss / len(val_loader.dataset))
    logger.add_scalar(epoch, 'te_acc', test_acc / len(val_loader.dataset) * 100)
    logger.add_scalar(epoch, 'te_nll', test_nll / len(val_loader))
    print(test_nll / len(val_loader))
    for i, c in enumerate(model.children()):
        if hasattr(c, 'kl_reg'):
            effective, total = c.count_parameters()
            logger.add_scalar(
                epoch, 
                'sp_%s' % i, 
                effective / total
            )
    
    logger.iter_info()

  0%|          | 0/1 [00:00<?, ?it/s]

1.2612428164431282


  epoch    kl    tr_los    tr_acc    te_loss    te_acc    te_nll    sp_0    sp_1    sp_2
-------  ----  --------  --------  ---------  --------  --------  ------  ------  ------
      1  0.00   7.5e+02      39.8    5.9e+02      45.1       1.3   0.852   0.771   0.731


100%|██████████| 1/1 [00:07<00:00,  7.84s/it]





## Score calculation

In [24]:
baseline_neurons = 10000

def get_score(logloss):
    k = -1. / 0.9
    b = 1.2 / 0.9
    score = b + k * logloss
    score = max(score, 0)
    score = min(score, 1)
    return score

### 1. Estimate average negative log-likelihood (log loss) on validation data

In [25]:
test_nll = 0
for batch_idx, (data, target) in enumerate(val_loader):
    data = data.to(device)

    target = target.to(device)
    output = model(data)

    test_nll += nllloss(output, target).item()
test_nll = test_nll / len(val_loader)

In [26]:
test_nll

1.2612428164431282

### 2. Count __effective__ number of parameters in the network

In [27]:
from sparse_vd import LinearSVDO
effecive_number_parameters = 0
total_number_parameters = 0
for module in model.children():
    if isinstance(module, LinearSVDO):
        effecive_number_parameters += module.count_parameters()[0]
        total_number_parameters += module.count_parameters()[1]
    else:
        for param in module.parameters():
            effecive_number_parameters += param.numel()
            total_number_parameters += param.numel()

In [28]:
effecive_number_parameters, total_number_parameters

(12491, 15706)

###### Final score is a combination of the compression score and quality of the classification.

In [29]:
final_score = get_score(test_nll) * np.log(1 + baseline_neurons / effecive_number_parameters)

In [30]:
final_score

0.0
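
The zero above follows directly from the formula: `get_score` maps log loss linearly so that 1.2 or worse gives 0 and 0.3 or better gives 1, and the log loss of about 1.26 measured here is past the cut-off. The sketch below restates both factors in one place. `final_score` and `BASELINE_NEURONS` are names introduced for this illustration, and the "improved" inputs are hypothetical.

```python
import numpy as np

BASELINE_NEURONS = 10000  # same value as baseline_neurons in the notebook


def get_score(logloss):
    """Linear quality score clipped to [0, 1]; equals (1.2 - logloss) / 0.9."""
    return min(max(1.2 / 0.9 - logloss / 0.9, 0.0), 1.0)


def final_score(logloss, effective_parameters):
    """Quality score times a compression bonus that grows as the net shrinks."""
    compression = np.log(1 + BASELINE_NEURONS / effective_parameters)
    return get_score(logloss) * compression


# Values reported in this notebook: log loss above 1.2, so the product is zero
# regardless of how sparse the network is.
current = final_score(1.2612428164431282, 12491)

# Hypothetical improved model: lower log loss and fewer effective parameters.
improved = final_score(0.6, 2000)
print(current, improved)
```

Because the two factors are multiplied, pruning alone cannot help until the log loss drops below 1.2.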

## Save weights

In [31]:
torch.save(model.state_dict(), "model_weights.pt")

# Check submission files

## Important notes!

 1. Weights should be named as `model_weights.pt`
 2. Edit model in `sparse_model.py`-file
 3. `sparse_model.py` should use `LinearSVDO`-layer from `sparse_vd.py`-file
 4. __Do not change__ `LinearSVDO`-layer definition in `sparse_vd.py`-file

Let's run a script to check that everything is correct and estimate the score on the __train__ dataset.

In [35]:
!python3 test_submission.py /home/user/share/pid_sparse/training.csv.gz

Score on train dataset is: 0.0


# Prepare submission

Select your best classifier and prepare submission file.

In [36]:
!zip submission.zip model_weights.pt sparse_model.py

updating: model_weights.pt (deflated 16%)
updating: sparse_model.py (deflated 60%)


In [37]:
from IPython.display import FileLink
FileLink("submission.zip")

# Ideas on how to improve the solution

1. Optimize `kl_weight` and `threshold`
2. Use an individual `threshold` for each layer
3. Add a learning rate scheduler
4. Start from `kl_weight = 1e-5` and gradually increase it
5. ????
6. Your ideas :)
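
For idea 4, one possible schedule is a linear ramp of `kl_weight` over the first epochs. This is only a sketch: `kl_weight_schedule`, `warmup_epochs`, and the end value of 1.0 are assumptions made for illustration, not settings from the original notebook.

```python
def kl_weight_schedule(epoch, start=1e-5, end=1.0, warmup_epochs=50):
    """Linearly increase the KL weight from `start` to `end`, then hold it.

    `epoch` is 1-based, matching `range(1, epochs + 1)` in the training loop.
    """
    if warmup_epochs <= 1:
        return end
    progress = min(max(epoch - 1, 0) / (warmup_epochs - 1), 1.0)
    return start + (end - start) * progress


# Inside the training loop one would set, at the start of each epoch:
#     kl_weight = kl_weight_schedule(epoch)
weights = [kl_weight_schedule(e, warmup_epochs=5) for e in range(1, 8)]
print(weights)
```

Starting with a small weight lets the network fit the data first; the KL term then gradually pushes more weights toward being pruned.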