# TrackML Particle Tracking Challenge

This kernel is a code-along of [Joshua Bonatt's TrackML EDA, etc.](https://www.kaggle.com/jbonatt/trackml-eda-etc/notebook) kernel. I want to get comfortable reproducing his results more or less from scratch.


# Contents
1. [Imports & Setup](#imports)
2. [Hits](#hits)
3. [Cells](#cells)
4. [Particles](#particles)
5. [Truth](#truth)

## <a name="imports">Imports & Setup</a>

[TrackML library](https://github.com/LAL/trackml-library) downloaded via: **Settings** >> **Add a custom package** >> *Github user/repo:* `LAL/trackml-library`

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from mpl_toolkits import mplot3d
import seaborn as sns

In [2]:
from trackml.dataset import load_event, load_dataset
from trackml.randomize import shuffle_hits
from trackml.score import score_event

## Load one event for EDA

In [3]:
# One event of 8850
event_id = 'event000001000'
# "All methods either take or return pandas.DataFrame objects"
hits, cells, particles, truth = load_event('../input/train_1/' + event_id)

## Data Files

**hits**:
- `hit_id`: numerical identifier of the hit inside the event
- `x,y,z`: measured x,y,z position [mm] of hit in global coords
- `volume_id`: numerical identifier of detector group
- `layer_id`: numerical identifier of detector layer inside group
- `module_id`: numerical identifier of detector module inside layer

**cells**:
- `hit_id`: num id of hit as defined in hits file
- `ch0, ch1`: channel id/coords unique w/n 1 module
- `value`: signal value information: particle charge deposition

**particles**:
- `particle_id`: num id of particle inside event
- `vx,vy,vz`: initial position or vertex [mm] in global coords
- `px,py,pz`: initial momentum [GeV/c] along each global axis
- `q`: particle charge (multiple of absolute electron charge)
- `nhits`: number of hits by this particle

**truth**:
- `hit_id`: num id of hit as defined in hits file
- `particle_id`: num id of particle as defined in particles file.
    - `0`: hit did *not* originate from a reconstructible particle, i.e. detector noise
- `tx,ty,tz`: true intersection point in global coords [mm] between particle trajectory and sensing surface.
- `tpx,tpy,tpz`: true particle momentum [GeV/c] in global coords at intersection point. Corresponding vector is *tangent* to particle traj at intersection point
- `weight`: per-hit weight used for scoring metric. **`Σ`**`(weights in 1 event) = 1`
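
Two of the facts above (noise hits carry `particle_id == 0`, and the weights in one event sum to 1) are easy to check numerically. A minimal sketch, run here on a tiny made-up `truth`-like frame rather than real data; `truth_summary` and `toy_truth` are hypothetical names introduced only for illustration:

```python
import pandas as pd

def truth_summary(truth):
    """Return (noise_fraction, weight_sum) for one event's truth table."""
    # particle_id == 0 marks hits that do not belong to a reconstructible particle
    noise_fraction = float((truth.particle_id == 0).mean())
    weight_sum = float(truth.weight.sum())
    return noise_fraction, weight_sum

# Toy stand-in for `truth` (values invented for illustration only)
toy_truth = pd.DataFrame({
    'hit_id':      [1, 2, 3, 4],
    'particle_id': [0, 7, 7, 9],
    'weight':      [0.0, 0.25, 0.25, 0.5],
})
noise_fraction, weight_sum = truth_summary(toy_truth)
```

Calling `truth_summary(truth)` on the loaded event should give a weight sum close to 1 if the description above holds.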


## <a name="hits">Hits</a>

In [4]:
hits.head()

In [5]:
hits.tail()

In [6]:
hits.describe()

The mean of `x,y,z` is only a few mm from the detector's center, but the standard deviations are large: 305, 305, and 1061 mm. Hit locations are therefore widely spread, most of all along z.

## Spatial Distribution of Hits

In [7]:
# plt.figure(figsize=(10,10))
# plt.scatter(hits.x,hits.y, s=1)
# plt.show()
# Same as above, but include Univariate plots & Pearson correlation coeffs:
radialview = sns.jointplot(x=hits.x, y=hits.y, height=10, s=1)  # `height` replaces the older `size` argument
radialview.set_axis_labels('x [mm]', 'y [mm]')
plt.show()

The solid core is likely more detectors. Zooming in to find out:

In [8]:
# Apply both cuts to the same rows so x and y stay paired
center = hits[(hits.x.abs() < 200) & (hits.y.abs() < 200)]
radialview = sns.jointplot(x=center.x, y=center.y, height=10, s=1)
radialview.set_axis_labels('x [mm]', 'y [mm]');
# plt.show()

The scatter between the rings comes from hits on the vertical detectors. The plots below remove these caps and again show the concentric nature of the inner detector (*I still don't exactly know what that means; I guess the scatter comes from hits further along z, at larger |z|*):

$\rightarrow$ *Yeah, so you have cylindrical detectors near the center and disc-shaped detectors further out along z. Setting the z limit to 200 effectively ignores detections from the discs further away, so you only see discrete radial rings instead of a spread.*

In [9]:
def radial_display(dat, lim=2000):
    # Apply both cuts to the same rows so x and y stay paired
    sel = dat[(dat.x.abs() < lim) & (dat.y.abs() < lim)]
    radialview = sns.jointplot(x=sel.x, y=sel.y, height=10, s=1)
    radialview.set_axis_labels('x [mm]', 'y [mm]')
    plt.show()

In [10]:
nocap = hits[hits.z.abs() < 200]
radial_display(nocap, lim=200)

The detectors are layered as flat shingled rectangles. Zooming into the center-most detectors:

In [11]:
radial_display(nocap, lim=50)

In [12]:
def side_display(fgsz = (24,8)): 
    plt.figure(figsize=fgsz)
    axialview = plt.scatter(hits.z, hits.y, s=1)
    plt.xlabel('z (mm)')
    plt.ylabel('y (mm)')
    plt.show()
side_display()

3D plot of a random sample of hits.

In [13]:
def iso_display():
    plt.figure(figsize=(15,15))
    ax = plt.axes(projection='3d')
    sample = hits.sample(30000)
    ax.scatter(sample.z, sample.x, sample.y, s=5, alpha=0.5)
    ax.set_xlabel('z (mm)')
    ax.set_ylabel('x (mm)')
    ax.set_zlabel('y (mm)')
    # These two added to widen the 3D space
    ax.scatter(3000,3000,3000, s=0)
    ax.scatter(-3000,-3000,-3000, s=0)
    plt.show()
iso_display()

## Location of Individual Detector Groups

Plotting each detector group in a different color:

In [14]:
volumes = hits.volume_id.unique()

fg,ax = plt.subplots(figsize=(15,15))
for volume in volumes:
    v = hits[hits.volume_id == volume]
    ax.scatter(v.x, v.y, s=10, label='Volume '+str(volume), alpha=0.5)
ax.set_title('Detector Volumes, Radial View')
ax.set_xlabel('x [mm]')
ax.set_ylabel('y [mm]')
ax.legend()
plt.show()

volumes = hits.volume_id.unique()

fg,ax = plt.subplots(figsize=(24,8))
for volume in volumes:
    v = hits[hits.volume_id == volume]
    ax.scatter(v.z, v.y, s=10, label='Volume '+str(volume), alpha=0.5)
ax.set_title('Detector Volumes, Axial View')
ax.set_xlabel('z [mm]')
ax.set_ylabel('y [mm]')
ax.legend()
plt.show()

Plotting this in 3D:

In [15]:
sample = hits.sample(30000)
plt.figure(figsize=(20,20))
ax = plt.axes(projection='3d')
for volume in volumes:
    v = sample[sample.volume_id == volume]
    ax.scatter(v.z, v.x, v.y, s=5, label='Volume '+str(volume), alpha=0.5)
ax.set_xlabel('z (mm)'); ax.set_ylabel('x (mm)'); ax.set_zlabel('y (mm)')
ax.legend()
# added to widen the 3D space:
ax.scatter(3000,3000,3000, s=0); ax.scatter(-3000,-3000,-3000, s=0)
plt.show()

We can also look at the layers:

In [21]:
# RADIAL
layers = hits.layer_id.unique()
fg,ax  = plt.subplots(figsize=(15,15))
for l_name in layers:
    l = hits[hits.layer_id == l_name]
    ax.scatter(l.x, l.y, s=10, label='Layer '+str(l_name), alpha=0.5)
ax.set_title('Detector Layers, Radial View'); ax.set_xlabel('x [mm]'); ax.set_ylabel('y [mm]')
ax.legend()
plt.show()
# AXIAL
fg,ax  = plt.subplots(figsize=(24,8))
for l_name in layers:
    l = hits[hits.layer_id == l_name]
    ax.scatter(l.z, l.y, s=10, label='Layer '+str(l_name), alpha=0.5)
ax.set_title('Detector Layers, Axial View'); ax.set_xlabel('z [mm]'); ax.set_ylabel('y [mm]')
ax.legend()
plt.show()
# ISOMETRIC
sample = hits.sample(30000)
plt.figure(figsize=(20,20))
ax = plt.axes(projection='3d')
for layer in layers:
    l = sample[sample.layer_id == layer]
    ax.scatter(l.z, l.x, l.y, s=5, label='Layer '+str(layer), alpha=0.5)
ax.set_xlabel('z (mm)'); ax.set_ylabel('x (mm)'); ax.set_zlabel('y (mm)'); ax.legend()
# added to widen the 3D space
ax.scatter(3000,3000,3000, s=0); ax.scatter(-3000,-3000,-3000, s=0)
plt.show()

Modules are not plotted for now (there are too many).

## Detector Group Inquiry

Modules make up Layers. Layers make up Volumes. `module_id` is a subdivision of `layer_id`, which is a subdivision of `volume_id`. Cells are the smallest unit of resolution and are thus a subdivision of `module_id`. We can look at the population of these:
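
Because each ID is only unique inside its parent, counting layers or modules needs the full parent key. A minimal sketch of that nesting, using a small invented frame in place of `hits`; `hierarchy_counts` and `toy_hits` are hypothetical names:

```python
import pandas as pd

def hierarchy_counts(hits):
    """Count distinct layers per volume and distinct modules per (volume, layer)."""
    layers_per_volume = hits.groupby('volume_id').layer_id.nunique()
    modules_per_layer = hits.groupby(['volume_id', 'layer_id']).module_id.nunique()
    return layers_per_volume, modules_per_layer

# Toy stand-in for `hits`; note layer_id 2 appears in both volumes
toy_hits = pd.DataFrame({
    'volume_id': [7, 7, 7, 8, 8],
    'layer_id':  [2, 2, 4, 2, 2],
    'module_id': [1, 2, 1, 1, 1],
})
layers_per_volume, modules_per_layer = hierarchy_counts(toy_hits)
```

Only layers and modules that recorded at least one hit in the event are counted this way, so the numbers are a lower bound on the detector's real layout.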

In [38]:
groups = [hits.volume_id, hits.layer_id, hits.module_id, cells.ch0, cells.ch1]
fig,axes = plt.subplots(1,5, figsize=(30,10))
for i,ax in enumerate(axes):
    sns.distplot(groups[i], ax=ax)

A guess is that low-ID layers, modules, and cells are closer to the center (due to the higher number of hits). We can plot hits by their radius to see this more clearly:

In [41]:
radius2 = np.sqrt(hits.x**2 + hits.y**2)
radius3 = np.sqrt(hits.x**2 + hits.y**2 + hits.z**2)
z2 = hits.z**2
rads = [radius2, radius3, z2]

axlbls = ['sqrt(x^2 + y^2)', 'sqrt(x^2 + y^2 + z^2)', 'z^2']  # third series is hits.z**2

fig,axes = plt.subplots(1,3, figsize=(30,10))
for i,ax in enumerate(axes):
    sns.distplot(rads[i], axlabel=axlbls[i], ax=ax)

The overall distribution of hits depends on the radius. Plotting groups by radius:

In [44]:
labels = [['volume_id','radius'],['layer_id','radius'],['module_id','radius']]
groups = [hits.volume_id, hits.layer_id, hits.module_id]
fig,axes = plt.subplots(1,3, figsize=(30,10))
for i,ax in enumerate(axes):
    ax.set_xlabel(labels[i][0]); ax.set_ylabel(labels[i][1])
    ax.scatter(groups[i], radius2)

From these plots:
- **Volumes** are named for **left**, **center**, **right** of detector center.
- **Layers** are named for their **radius** from the detector center, like onion layers.
- **Modules** are named for their **rotation** about the detector center.

This can be seen in the 3D plots above as well.
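
The same naming pattern can be checked numerically by summarizing where each volume sits in radius and in z. A minimal sketch under that assumption; `volume_extent` is a hypothetical helper, and the toy coordinates and volume IDs are invented for illustration:

```python
import numpy as np
import pandas as pd

def volume_extent(hits):
    """Per-volume range of the cylindrical radius r and of z."""
    df = hits.assign(r=np.sqrt(hits.x**2 + hits.y**2))
    return df.groupby('volume_id').agg(
        r_min=('r', 'min'), r_max=('r', 'max'),
        z_min=('z', 'min'), z_max=('z', 'max'),
    )

# Toy stand-in for `hits`
toy_hits = pd.DataFrame({
    'x': [30.0, 60.0, 0.0, 32.0],
    'y': [40.0, 80.0, 32.0, 0.0],
    'z': [-1500.0, -1000.0, -100.0, 100.0],
    'volume_id': [7, 7, 8, 8],
})
extent = volume_extent(toy_hits)
```

On the real `hits` frame, volumes whose z range is entirely negative or entirely positive would correspond to the left and right groups, and volumes straddling z = 0 to the central ones.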

Viewing group distribution:


In [45]:
hits.volume_id.value_counts()

In [46]:
hits.layer_id.value_counts()

In [47]:
hits.module_id.value_counts().head()

## Hit Feature Correlations
Plotting each feature (x,y,z, volume, layer, module) against each other.

In [48]:
# Pairplotting 120k hits takes too long - so a random sample of 3k.
sample = hits.sample(3000)
# Color coding by group
sns.pairplot(sample, hue='volume_id', height=8)  # `height` replaces the older `size` argument
plt.show()

Qualitatively, there are correlations between hit features. Usable ones may be easy to extract.

Plotting correlation heatmap, dropping `hit_id`:

In [52]:
fg,ax = plt.subplots(figsize=(10,10))
hitscorr = hits.drop('hit_id', axis=1).corr()
sns.heatmap(hitscorr, cmap='coolwarm', square=True, ax=ax)
ax.set_title('Hits Correlation Heatmap');

`module_id` is correlated with `layer_id`, and `layer_id` is correlated with `volume_id`.

## <a name="cells">Cells</a>

In [53]:
cells.head()

In [54]:
cells.tail()

In [55]:
cells.describe()

In [57]:
fig,axe = plt.subplots(figsize=(10,10))
cellscorr = cells.drop('hit_id', axis=1).corr()
sns.heatmap(cellscorr, cmap='coolwarm', square=True, ax=axe)
axe.set_title('Cells Correlation Heatmap');

## <a name="particles">Particles</a>

In [58]:
particles.head()

In [59]:
particles.tail()

In [60]:
particles.describe()

Considerations:
- The particle vertex doesn't necessarily have to be the center of the detector.
- Charge `q` is always ±1. No ions.
- `nhits` can be as low as `0`.
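
These three observations can be read off `describe()`, but they can also be computed directly. A minimal sketch on an invented `particles`-like frame; `particle_checks` and `toy_particles` are hypothetical names:

```python
import numpy as np
import pandas as pd

def particle_checks(particles):
    """Return (distinct charges, fraction with zero hits, largest vertex distance in mm)."""
    charges = sorted(particles.q.unique().tolist())
    zero_hit_fraction = float((particles.nhits == 0).mean())
    # Distance of each vertex from the origin of the global coordinates
    vertex_r = np.sqrt(particles.vx**2 + particles.vy**2 + particles.vz**2)
    return charges, zero_hit_fraction, float(vertex_r.max())

# Toy stand-in for `particles` (values invented for illustration only)
toy_particles = pd.DataFrame({
    'q':     [1, -1, 1, -1],
    'nhits': [0, 12, 10, 0],
    'vx':    [0.0, 3.0, 0.0, 0.0],
    'vy':    [0.0, 4.0, 0.0, 0.0],
    'vz':    [0.0, 0.0, 5.0, 12.0],
})
charges, zero_hit_fraction, max_vertex_r = particle_checks(toy_particles)
```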

Plotting histograms:

In [65]:
plt.figure(figsize=(15,10)); 
plt.subplot(1,2,1); plt.xlabel('Charge (e)'); plt.ylabel('Counts')
particles.q.hist(bins=3)
plt.subplot(1,2,2); plt.xlabel('nhits')
particles.nhits.hist(bins=particles.nhits.max())
plt.show();

There's about a 20% difference between +q and -q particles. `nhits` looks like a spike at zero plus a roughly Gaussian peak around 12. `nhits` and momentum may be related.

Checking if `nhits` is proportional to total momentum `p`:

In [68]:
fig,axe = plt.subplots(figsize=(10,10))
p = np.sqrt(particles.px**2 + particles.py**2 + particles.pz**2)
axe.scatter(particles.nhits, p)
axe.set_yscale('log'); axe.set_xlabel('nhits'); axe.set_ylabel('Momentum [GeV/c]')
plt.show();

No correlation is apparent. Looking at a histogram of momentum (log axis):
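
A scatter plot on a log axis can hide a weak monotonic trend, so a rank correlation is a useful cross-check. A minimal sketch that computes a Spearman-style coefficient as the Pearson correlation of the ranks; `rank_correlation` is a hypothetical helper and the toy numbers are invented, so they say nothing about the real event:

```python
import pandas as pd

def rank_correlation(a, b):
    """Spearman-style rank correlation: Pearson correlation of the ranks."""
    return float(pd.Series(a).rank().corr(pd.Series(b).rank()))

# Toy values: momentum rises with nhits, then the same values reversed
toy_nhits = [0, 5, 10, 12, 14]
toy_p = [0.1, 0.3, 1.0, 3.0, 20.0]
rho_up = rank_correlation(toy_nhits, toy_p)
rho_down = rank_correlation(toy_nhits, toy_p[::-1])
```

On the real data this would be `rank_correlation(particles.nhits, p)`; a value near zero would support the visual impression.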

In [69]:
fig,axes = plt.subplots(1,2,figsize=(15,8))
axes[0].hist(np.sqrt(particles.px**2 + particles.py**2), bins=100, log=True)
axes[0].set_xlabel('Transverse momentum [GeV/c]'); axes[0].set_ylabel('Counts')
axes[1].hist(particles.pz.abs(), bins=100, log=True)
axes[1].set_xlabel('Z momentum [GeV/c]')
plt.show();

Comparing Transverse and Z momenta:

In [70]:
fig,axe = plt.subplots(figsize=(10,10))
axe.scatter(np.sqrt(particles.px**2 + particles.py**2), particles.pz, s=1)
axe.set_xscale('log'); axe.set_xlabel('Transverse momentum [GeV/c]'); axe.set_ylabel('Z momentum [GeV/c]')
plt.show();

Clipping out outliers, such as the pz=500 particle at top right.

In [73]:
p = particles[particles.pz < 200]

fig,axe = plt.subplots(figsize=(10,10))
axe.scatter(np.sqrt(p.px**2 + p.py**2), p.pz, s=3, alpha=0.5)
axe.plot([.1,.1],[p.pz.min(), p.pz.max()], c='g')
axe.plot([.1,np.sqrt(p.px**2 + p.py**2).max()], [.1,.1], c='r', linestyle='--')
axe.set_xscale('log'); axe.set_xlabel('Transverse momentum [GeV/c]'); axe.set_ylabel('Z momentum [GeV/c]')
plt.show();

Plot of momentum in the z direction (down the length of the accelerator) against transverse momentum (perpendicular to the beamline). $\Rightarrow$ Particles near the green line (small transverse momentum) travel nearly parallel to the beamline, and those near the red line (small z momentum) travel nearly perpendicular to it. This plot shows that particles spread in a cone-like fashion.
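
The cone-like spread can be expressed as an angle to the beamline. A minimal sketch, assuming z is the beam axis as in the plot above; `polar_angle` and `pseudorapidity` are helper names introduced here, not part of the TrackML library:

```python
import numpy as np

def polar_angle(px, py, pz):
    """Angle in radians between the momentum vector and the +z axis."""
    pt = np.hypot(px, py)          # transverse momentum
    return np.arctan2(pt, pz)

def pseudorapidity(px, py, pz):
    """eta = -ln(tan(theta / 2)), computed as arcsinh(pz / pt); requires pt > 0."""
    pt = np.hypot(px, py)
    return np.arcsinh(pz / pt)

theta_perp = polar_angle(1.0, 0.0, 0.0)     # purely transverse momentum
theta_along = polar_angle(0.0, 0.0, 1.0)    # purely along the beamline
eta_perp = pseudorapidity(1.0, 0.0, 0.0)
eta_45 = pseudorapidity(1.0, 0.0, 1.0)      # 45 degrees to the beamline
```

Both functions also accept pandas columns, for example `polar_angle(particles.px, particles.py, particles.pz)`.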

Plotting correlation:

In [75]:
f,axe = plt.subplots(figsize=(10, 10))
particlescorr = particles.drop('particle_id', axis=1).corr()
sns.heatmap(particlescorr, cmap='coolwarm', square=True, ax = axe)
axe.set_title('Particles Correlation Heatmap')
plt.show();

## <a name="truth">Truth</a>

Each entry maps 1 hit to 1 particle.
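
Since every row is one hit, grouping by `particle_id` gives the number of hits per particle, which could then be compared against `particles.nhits`. A minimal sketch on an invented frame; `hits_per_particle` and `toy_truth` are hypothetical names:

```python
import pandas as pd

def hits_per_particle(truth):
    """Count hits per particle from the truth table, ignoring noise (particle_id 0)."""
    real = truth[truth.particle_id != 0]
    return real.groupby('particle_id').size()

# Toy stand-in for `truth` (IDs invented for illustration only)
toy_truth = pd.DataFrame({
    'hit_id':      [1, 2, 3, 4, 5],
    'particle_id': [0, 7, 7, 9, 7],
})
counts = hits_per_particle(toy_truth)
```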

In [76]:
truth.head()

In [77]:
truth.tail()

In [78]:
# looking at a particle
truth[truth.particle_id == 22525763437723648]

In [79]:
# Number of unique particles
len(truth.particle_id.unique())

## Plotting Particle Tracks

In [83]:
# get every kth particle
k = 100; tracks = truth.particle_id.unique()[1::k]

f = plt.figure(figsize=(15,15))  # plain figure; plt.subplots would add an extra empty 2D axes
ax = f.add_subplot(1,1,1, projection='3d')
for track in tracks:
    t = truth[truth.particle_id == track]
    ax.plot3D(t.tz, t.tx, t.ty)
ax.set_xlabel('z [mm]'); ax.set_ylabel('x [mm]'); ax.set_zlabel('y [mm]')
# These two added to widen the 3D space
ax.scatter(3000,3000,3000, s=0); ax.scatter(-3000,-3000,-3000, s=0)
plt.show();

Many particles don't start at the detector center but originate somewhere else. Also, more-helical trajectories have less z-momentum.
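
The link between helical shape and z-momentum follows from the motion of a charged particle in a uniform magnetic field along z: the transverse momentum sets the circle radius and the z-momentum sets how far the track advances per turn. A rough sketch of that relation; the field strength `b_tesla=2.0` is an assumption made here for illustration (this notebook does not state it), and both function names are hypothetical:

```python
import numpy as np

C_FACTOR = 0.299792458  # converts GeV/c per (tesla * elementary charge) to metres

def helix_radius_mm(px, py, q=1, b_tesla=2.0):
    """Radius of the transverse circle, R = pT / (0.2998 * |q| * B), in mm."""
    pt = np.hypot(px, py)                      # GeV/c
    return 1000.0 * pt / (C_FACTOR * abs(q) * b_tesla)

def z_per_turn_mm(px, py, pz, q=1, b_tesla=2.0):
    """Distance travelled along z during one full turn: 2*pi*R * pz / pT."""
    pt = np.hypot(px, py)
    return 2.0 * np.pi * helix_radius_mm(px, py, q, b_tesla) * pz / pt

radius = helix_radius_mm(1.0, 0.0)             # pT = 1 GeV/c in the assumed 2 T field
pitch_flat = z_per_turn_mm(1.0, 0.0, 0.0)      # no z-momentum: a closed circle
pitch_1 = z_per_turn_mm(1.0, 0.0, 1.0)
pitch_2 = z_per_turn_mm(1.0, 0.0, 2.0)
```

With less z-momentum the track advances less per turn, so more turns fit inside the detector length and the track looks more helical.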