In [None]:
%load_ext autoreload
%autoreload 2

# Event selection

## Data processing: from raw data to high-level objects
After acquisition, the NA62 data are first reconstructed: starting from the raw (binary) data, we extract for each event the signals recorded by each detector individually. For each detector, these signals are reconstructed as hits containing information such as time, position, and geometrical channel ID (i.e. related to the physical position rather than the electronics position). Hits are then grouped together to form candidates: more complex objects, generally meant to correspond to a single particle interacting with the detector. For some detectors these candidates go by other names: LKr -> cluster, Spectrometer -> track, GTK -> beam track.

At the analysis level, candidates from multiple detectors can be associated together in space (geometrically) and in time to form higher level objects:
 - beam kaon: GTK track associated to a KTAG candidate
 - downstream track: spectrometer track associated to any matching candidates from CHOD, NewCHOD, MUV1-3, LKr, RICH
 - vertex: association between GTK and spectrometer tracks, or between spectrometer tracks only, or between LKr clusters (neutral vertex)

## Pile-up
As you may have guessed, the data presented to you here consist of events built by associating several of these high-level objects. However, reaching that point already required a sizable amount of selection at the analysis level to make these associations. In fact, a single event usually contains many more high-level objects, coming from pile-up. Pile-up refers to valid events (a beam kaon decay, a beam pion decay, or a single beam muon) which are *not* (with a caveat) related to the event that generated the trigger. To make sure we have **all** the information relating to that specific event, we request the detector signals that fall within ~100 ns of the trigger time. This means each event contains a lot of pile-up, spread randomly in time within the event.

By applying timing cuts at the analysis level, we can easily discard a very large fraction of those pile-up events. A small fraction still remains, which happens to be at the same time as the triggered event. As it is not possible to distinguish these from the triggered event, additional selection criteria must be applied to ensure the selected event is a valid kaon decay.

As the pile-up event is at the same time as the kaon decay, we usually have several possibilities to combine the high-level objects. For instance, if you have one track and 3 isolated clusters and are looking for a K2pi decay, you have three possible associations for the clusters: combine clusters 1 and 2 to form a pi0, combine clusters 2 and 3, or combine clusters 1 and 3. This is called combinatorics, and you can apply conditions on each of the possible options. If you manage to find a single option that satisfies all your requirements, this could be the correct one and you go ahead and form an event with this selection of high-level objects. If not, you usually reject the event.
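
The combinatorics described above can be sketched in a few lines. This is only an illustration: the function name `select_unique_pair`, the plain list of cluster times and the 5 ns window are assumptions made for the example, not part of the `na62` package or of the actual pre-selection code.

```python
from itertools import combinations


def select_unique_pair(cluster_times, max_dt=5.0):
    """Return the only pair of cluster indices compatible in time, or None.

    cluster_times: cluster times in ns (hypothetical input format).
    max_dt: maximum time difference in ns between the two clusters
            (illustrative value).
    """
    # All ways of choosing 2 clusters out of N; for 3 clusters this gives
    # (0, 1), (0, 2) and (1, 2).
    passing = [
        (i, j)
        for i, j in combinations(range(len(cluster_times)), 2)
        if abs(cluster_times[i] - cluster_times[j]) < max_dt
    ]
    # Keep the event only if exactly one combination satisfies the condition.
    return passing[0] if len(passing) == 1 else None


print(list(combinations(range(3), 2)))        # [(0, 1), (0, 2), (1, 2)]
print(select_unique_pair([0.3, 1.1, 14.0]))   # (0, 1)
print(select_unique_pair([0.3, 1.1, 2.0]))    # None: ambiguous, event rejected
```

In a real selection the condition inside the list comprehension would combine all the requirements placed on the pair (timing, distance, energy), but the "exactly one passing option" logic stays the same.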

This process of eliminating the pile-up has already been done for you in the data available here, which is why each of your events has only one to three tracks and zero to two clusters. One category of pile-up still contaminates your sample (the caveat mentioned above): events that are **not** a kaon decay, but a combination of independent particles that looks like a kaon decay. These require additional selection criteria, some of which have already been applied and some of which we can apply ourselves to these data.

## Pre-selection
As mentioned above, a comprehensive pre-selection has already been applied to the data available here. You will find below the details of the criteria that have been used for each of the selected 'event_type'.

### K3pi selection
 - Control trigger
 - Exactly one three-track vertex within 6 ns of the trigger time
 - Each track must be in the acceptance of NewCHOD and 4 spectrometer chambers
 - Tracks must be further than 10 cm away from each other at Straw1
 - $q_\text{vtx} = +1$
 - Vertex between 104 m and 180 m and $\chi^2<25$
 - Reconstructed total momentum $p_\text{tot}$ must be within 3 GeV of 75 GeV
 - Reconstructed three-pion invariant mass $m_{3\pi}$ must be between 490 MeV and 497 MeV
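
The last two K3pi conditions act on quantities reconstructed from the three track momenta. A minimal sketch of how they could be computed is given below, assuming each track is a charged pion and that momenta are given as `(px, py, pz)` tuples in GeV. The function names and the toy decay are illustrative only and are not taken from the `na62` package.

```python
import math

M_PION = 0.13957   # charged pion mass in GeV
M_KAON = 0.493677  # charged kaon mass in GeV


def energy(p3, mass):
    """Energy in GeV of a particle with 3-momentum p3 (GeV) and given mass."""
    return math.sqrt(sum(c * c for c in p3) + mass * mass)


def k3pi_reco_quantities(track_momenta):
    """Return (p_tot, m_3pi) in GeV, assuming all three tracks are pions."""
    e_tot = sum(energy(p, M_PION) for p in track_momenta)
    p_sum = [sum(p[i] for p in track_momenta) for i in range(3)]
    p_tot = math.sqrt(sum(c * c for c in p_sum))
    m_3pi = math.sqrt(max(e_tot**2 - p_tot**2, 0.0))
    return p_tot, m_3pi


def passes_k3pi_reco_cuts(track_momenta):
    """Apply the total momentum and invariant mass windows listed above."""
    p_tot, m_3pi = k3pi_reco_quantities(track_momenta)
    return abs(p_tot - 75.0) < 3.0 and 0.490 < m_3pi < 0.497


def boost_along_z(p3, mass, p_parent, m_parent):
    """Boost a rest-frame 3-momentum to the frame where the parent moves along z."""
    gamma = math.sqrt(p_parent**2 + m_parent**2) / m_parent
    beta_gamma = p_parent / m_parent
    px, py, pz = p3
    return (px, py, gamma * pz + beta_gamma * energy(p3, mass))


# Toy decay: three pions emitted symmetrically (120 degrees apart) in the kaon
# rest frame, then boosted to a 75 GeV kaon travelling along z.
q = math.sqrt((M_KAON / 3) ** 2 - M_PION**2)
angles = (0.0, 2 * math.pi / 3, 4 * math.pi / 3)
rest_frame = [(q * math.cos(a), q * math.sin(a), 0.0) for a in angles]
tracks = [boost_along_z(p, M_PION, 75.0, M_KAON) for p in rest_frame]

print(k3pi_reco_quantities(tracks))   # p_tot close to 75 GeV, m_3pi close to M_KAON
print(passes_k3pi_reco_cuts(tracks))  # True
```

With real data, detector resolution smears both quantities, which is why the selection uses windows rather than exact values.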

### Kmu2 selection
 - Control trigger
 - Less than 10 tracks
 - Exactly one good track:
     - Must have CHOD or NewCHOD associated signal
     - Within 10 ns from the trigger time
     - Signal in 4 spectrometer chambers
     - Track $\chi^2<20$
     - Not forming a vertex with any other track. Vertex if:
         - cda < 50 mm
         - z vertex between 60 m and 200 m
 - $q=+1$
 - Track momentum between 5 GeV and 70 GeV
 - Track in geometric acceptance of all 4 spectrometer stations, MUV3
 - Vertex between 120 m and 180 m with CDA<40 mm
 - Closest KTAG candidate in time must be within 2 ns of the track CHOD time, or 5 ns from the track NewCHOD time if CHOD time is not available
 - No in-time activity in the LAV, IRC and SAC
 - Track must have MUV3 signal associated to the track within 1.5 ns of the KTAG time, and within 2 ns of the track CHOD time (5 ns of NewCHOD if CHOD not available)
 - Track $E/p < 0.2$
 - Missing mass squared (muon hypothesis) $m_\text{miss}^2(\mu) < 0.02~\text{GeV}^2$
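
The KTAG timing condition above uses a fallback: the CHOD time is the reference when it exists, otherwise the NewCHOD time is used with a wider window. The sketch below shows one way to express this logic. The function name, the use of `None` to flag a missing CHOD time and the input format are assumptions for the illustration, not the actual implementation.

```python
def ktag_in_time(ktag_times, chod_time, newchod_time):
    """Check that the closest KTAG candidate is in time with the track.

    ktag_times: list of KTAG candidate times in ns.
    chod_time: track CHOD time in ns, or None if not available (assumption).
    newchod_time: track NewCHOD time in ns.
    """
    # Choose the reference time and the matching window.
    if chod_time is not None:
        reference, window = chod_time, 2.0
    else:
        reference, window = newchod_time, 5.0

    if not ktag_times:
        return False

    # Closest KTAG candidate in time to the chosen reference.
    closest = min(ktag_times, key=lambda t: abs(t - reference))
    return abs(closest - reference) < window


print(ktag_in_time([-12.0, 0.8, 30.5], chod_time=0.1, newchod_time=0.4))  # True
print(ktag_in_time([-12.0, 3.5], chod_time=0.1, newchod_time=0.4))        # False
print(ktag_in_time([-12.0, 3.5], chod_time=None, newchod_time=0.4))       # True
```

The same pattern applies to the MUV3 condition, which uses the same CHOD/NewCHOD fallback.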

### K2pi selection
 - Control trigger
 - At least 1 track and 3 clusters
 - At least 2 good clusters:
     - Within 20 ns of the trigger time
     - Further than 20 mm from a dead cell
     - Cluster must be electromagnetic
     - At least 1 GeV
     - Not associated to a Spectrometer track
     - Must be isolated
 - Build pairs of clusters as pi0:
   - Within 5 ns of each other
   - Further than 200 mm apart
   - Energy sum between 2 GeV and 75 GeV
 - Exactly 1 good track definition:
     - At least one pi0 within 2 ns of the track
     - $q=+1$
     - Not fake
     - Less than 20 GeV between raw momentum and fitted momentum
     - Momentum between 5 GeV and 70 GeV
     - Vertex wrt. beam axis must be between 105 m and 180 m
     - CDA < 30 mm
     - Inside the geometric acceptance of NewCHOD, all 4 Spectrometer stations, LKr
 - Select closest pi0 in time
 - No MUV3 signal associated to the track
 - Must have LKr cluster associated to the track, and $E/p < 0.9$
 - No in-time activity in the LAV, IRC and SAC
 - Reconstructed kaon mass $m_{\pi\gamma\gamma}$ between 460 MeV and 520 MeV
 - Missing mass squared (pion hypothesis) $m_\text{miss}^2(\pi) < 0.015~\text{GeV}^2$

### Kmu3 selection
 - Control trigger
 - At least one track
 - Exactly one good track:
     - Must have MUV3 association
     - Vertex wrt. beam axis must be between 110 m and 180 m
     - CDA < 25 mm
     - Momentum between 5 GeV and 50 GeV
     - $q = +1$
     - Track $\chi^2 < 20$
     - In acceptance of NewCHOD, 4 spectrometer chambers, LKr and MUV3
 - No other track within 10 ns of the good track
 - Must have exactly two good LKr clusters:
     - Energy > 2 GeV
     - Within 6 ns of the track
     - Further than 150 mm from the track impact point on LKr
 - Neutral vertex within 10 m of charged vertex
 - No in-time activity in the LAV, IRC and SAC
 - Reconstructed total momentum $p_\text{tot}$ must be between 15 GeV and 70 GeV
 - Reconstructed transverse total momentum $p_{\text{tot},T}$ must be between 40 and 250 MeV
 - Missing mass squared (muon hypothesis) $m_\text{miss}^2(\mu) < 0.01~\text{GeV}^2$

### Ke3 selection
 - Control trigger
 - At least one track
 - Exactly one good track:
     - Must not have MUV3 association
     - Vertex wrt. beam axis must be between 110 m and 180 m
     - CDA < 25 mm
     - Momentum between 5 GeV and 50 GeV
     - $q = +1$
     - Track $\chi^2 < 20$
     - In acceptance of NewCHOD, 4 spectrometer chambers, LKr and MUV3
 - No other track within 10 ns of the good track
 - Must have exactly two good LKr clusters:
     - Energy > 2 GeV
     - Within 6 ns of the track
     - Further than 150 mm from the track impact point on LKr
 - Neutral vertex within 10 m of charged vertex
 - No in-time activity in the LAV, IRC and SAC
 - Reconstructed total momentum $p_\text{tot}$ must be between 15 GeV and 70 GeV
 - Reconstructed transverse total momentum $p_{\text{tot},T}$ must be between 40 and 250 MeV
 - Missing mass squared (electron hypothesis) $m_\text{miss}^2(e) < 0.01~\text{GeV}^2$
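
Several of the selections above cut on the missing mass squared under a given mass hypothesis, $m_\text{miss}^2 = (P_K - P_\text{track})^2$, where $P_K$ and $P_\text{track}$ are the kaon and track four-momenta. The sketch below illustrates the computation with a toy $K^+ \to \mu^+ \nu$ decay. It assumes a nominal 75 GeV kaon along z (in a real analysis the kaon momentum would come from the beam track), and the names `missing_mass_sq` and `MASS_HYPOTHESES` are made up for the example.

```python
import math

M_KAON = 0.493677  # charged kaon mass in GeV
MASS_HYPOTHESES = {"mu": 0.105658, "pi": 0.139570, "e": 0.000511}  # GeV


def energy(p3, mass):
    """Energy in GeV of a particle with 3-momentum p3 (GeV) and given mass."""
    return math.sqrt(sum(c * c for c in p3) + mass * mass)


def missing_mass_sq(p_kaon, p_track, hypothesis):
    """Missing mass squared in GeV^2, assigning the hypothesis mass to the track."""
    e_miss = energy(p_kaon, M_KAON) - energy(p_track, MASS_HYPOTHESES[hypothesis])
    p_miss_sq = sum((k - t) ** 2 for k, t in zip(p_kaon, p_track))
    return e_miss**2 - p_miss_sq


# Toy K -> mu nu decay: muon emitted transversally in the kaon rest frame,
# then boosted along z with the kaon.
p_kaon = (0.0, 0.0, 75.0)
m_mu = MASS_HYPOTHESES["mu"]
q = (M_KAON**2 - m_mu**2) / (2 * M_KAON)       # muon momentum in the rest frame
e_star = (M_KAON**2 + m_mu**2) / (2 * M_KAON)  # muon energy in the rest frame
p_muon = (q, 0.0, p_kaon[2] / M_KAON * e_star)  # beta*gamma = p_K / M_K

print(missing_mass_sq(p_kaon, p_muon, "mu"))  # close to 0: the neutrino is massless
print(missing_mass_sq(p_kaon, p_muon, "pi"))  # shifted away from 0 by the wrong mass
```

The comparison between the two hypotheses shows why the cut is tied to a specific mass assignment: the same track gives a different missing mass squared depending on the particle it is assumed to be.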



## Criteria categories
From the selections above we can roughly create some "categories" of criteria:
 - Trigger
 - Basic objects requirements
 - Object selection conditions
     - Timing
     - Track quality
     - Charge
     - Momentum
     - Vertex
     - Geometric acceptance
 - Additional requirements
 - Pile-up veto
 - Reconstructed quantities

This can be useful to roughly compare the selections and identify differences. The table below summarises the selections according to these categories.

| Type | Sample | Trigger | Basic requirements | O. timing | O. quality | O. charge | O. momentum | O. vertex | O. Geo. Acc. | Add. Req. | Veto | Reco quantities |
| --:    | --:    | :--:    | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| 3-track | K3pi   | Control | One 3-track vertex | $|t_\text{vtx} - t_\text{trigger}| < 6~\text{ns}$ | $d_{i,j} > 10~\text{cm}$ | +1 | N/A | $104~m < Z_\text{vtx} < 180~m$ & $\chi^2<25$ | NewCHOD & 4 Straw | N/A | N/A | $|p_\text{tot} - 75~\text{GeV}|<3~\text{GeV}$ & $490~\text{MeV} < m_{3\pi} < 497~\text{MeV}$ |
| 1-track | Kmu2   | Control | < 10 tracks | $|t_\text{track} - t_\text{trigger}| < 10~\text{ns}$ | $\chi^2 < 20$ & No other vertex | +1 | $5~\text{GeV} < p < 70~\text{GeV}$ | $120~m < Z_\text{vtx} < 180~m$ & $\text{CDA} < 40~mm$ | MUV3 & 4 Straw | Signal in 4 Straw and (CHOD or NewCHOD) & KTAG in-time & MUV3 in-time signal & $E/p < 0.2$ | LAV, IRC, SAC | $m_\text{miss}^2(\mu) < 0.02~\text{GeV}^2$ |
| 1-track & 2-clusters | K2pi   | Control | 1 track & 3 clusters | Track: $|t_\text{track}-t_{\pi^0}| < 2~\text{ns}$; Cluster: $|t_\text{cluster} - t_\text{trigger}|<20~\text{ns}$ & $\Delta t_{i,j}<5~\text{ns}$ | Track: Not fake & $\Delta p_\text{raw,fit}<20~\text{GeV}$; Cluster: $\Delta d_\text{dead-cell} > 20~\text{mm}$ & Electromagnetic & Not associated to track & Isolated & $\Delta d_{i,j} > 200~\text{mm}$ | Track: +1 | Track: $5~\text{GeV} < p < 70~\text{GeV}$; Cluster: $E>1~\text{GeV}$ & $2~\text{GeV} < E_1 + E_2 < 75~\text{GeV}$ | $105~m < Z_\text{vtx} < 180~m$ & $\text{CDA} < 30~mm$ | NewCHOD & 4 Straw & LKr | Track: no MUV3 & $E/p<0.9$ | LAV, IRC, SAC | $460~\text{MeV} < m_K < 520~\text{MeV}$ & $m_\text{miss}^2(\pi) < 0.015~\text{GeV}^2$ |
| 1-track & 2-clusters | Kmu3 | Control | N/A | $|t_\text{cluster} - t_\text{track}| < 6~\text{ns}$ | Track: $\chi^2 < 20$ & MUV3 association; Cluster: $\Delta d_\text{cls,track} > 150~\text{mm}$ | Track: +1 | Track: $5~\text{GeV} < p < 50~\text{GeV}$; Cluster: $E>2~\text{GeV}$ | Track: $110~m < Z_\text{ch. vtx} < 180~m$ & $\text{CDA} < 25~mm$; Cluster: $|Z_\text{ch. vtx} - Z_\text{neut. vtx}|<10~m$ | Track: NewCHOD & 4 Straw & LKr & MUV3 | No track within 10 ns | LAV, IRC, SAC | $15~\text{GeV} < p_\text{tot} < 70~\text{GeV}$ & $40~\text{MeV} < p_{\text{tot},T} < 250~\text{MeV}$ & $m_\text{miss}^2(\mu) < 0.01~\text{GeV}^2$ |
| 1-track & 2-clusters | Ke3 | Control | N/A | $|t_\text{cluster} - t_\text{track}| < 6~\text{ns}$ | Track: $\chi^2 < 20$ & No MUV3 association; Cluster: $\Delta d_\text{cls,track} > 150~\text{mm}$ | Track: +1 | Track: $5~\text{GeV} < p < 50~\text{GeV}$; Cluster: $E>2~\text{GeV}$ | Track: $110~m < Z_\text{ch. vtx} < 180~m$ & $\text{CDA} < 25~mm$; Cluster: $|Z_\text{ch. vtx} - Z_\text{neut. vtx}|<10~m$ | Track: NewCHOD & 4 Straw & LKr & MUV3 | No track within 10 ns | LAV, IRC, SAC | $15~\text{GeV} < p_\text{tot} < 70~\text{GeV}$ & $40~\text{MeV} < p_{\text{tot},T} < 250~\text{MeV}$ & $m_\text{miss}^2(e) < 0.01~\text{GeV}^2$ |

Even though all the selections seem to contain a similar set of conditions, the cut values sometimes differ slightly. As we will eventually compare event types selected with different selections, it is best to harmonize the cut values. This avoids introducing systematic uncertainties due to different behaviour and different agreement between MC and data in different ranges of the variables on which we introduce conditions.

The conditions where we can see obvious differences that can be harmonized are listed below. When selecting common criteria, we have to use the most constraining values, as we cannot "undo" what was already applied before these data were presented to us.
 - $Z_\text{vtx}$: common range is between 120 m and 180 m
 - LKr distance between track and cluster: common limit of at least 150 mm
 - LKr distance between clusters: common limit of at least 200 mm
 - Cluster Energy: common limit of at least 2 GeV
 - Vertex CDA: common limit of less than 25 mm
 - Distance between charged and neutral vertices: common limit of less than 10 m
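
The "most constraining value" rule can be made explicit: for a range, take the largest lower bound and the smallest upper bound; for an upper limit, take the smallest one. The sketch below derives the common $Z_\text{vtx}$ and CDA cuts from the per-selection values listed earlier and applies them to a toy `pandas` DataFrame. The column names `vtx_z_m` and `cda_mm` are hypothetical and do not necessarily match the columns of the actual data.

```python
import pandas as pd

# Per-selection cut values copied from the pre-selection lists above.
Z_VTX_RANGES_M = {
    "k3pi": (104, 180),
    "kmu2": (120, 180),
    "k2pi": (105, 180),
    "kmu3": (110, 180),
    "ke3": (110, 180),
}
CDA_MAX_MM = {"kmu2": 40, "k2pi": 30, "kmu3": 25, "ke3": 25}

# Most constraining common values: intersection of all the ranges.
z_min = max(low for low, _ in Z_VTX_RANGES_M.values())
z_max = min(high for _, high in Z_VTX_RANGES_M.values())
cda_max = min(CDA_MAX_MM.values())
print(z_min, z_max, cda_max)  # 120 180 25


def apply_common_cuts(events):
    """Keep only events passing the harmonized vertex cuts.

    events: DataFrame with hypothetical columns 'vtx_z_m' (vertex z in m)
            and 'cda_mm' (CDA in mm).
    """
    mask = (
        (events["vtx_z_m"] > z_min)
        & (events["vtx_z_m"] < z_max)
        & (events["cda_mm"] < cda_max)
    )
    return events[mask]


# Toy events, for illustration only.
toy = pd.DataFrame({"vtx_z_m": [110, 130, 150, 175], "cda_mm": [10, 30, 20, 5]})
print(apply_common_cuts(toy))  # keeps the last two rows
```

Applying the same mask to every sample (data and each MC channel) is what guarantees that all event types are compared within the same range of each variable.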

In [None]:
# Let's first import all we need
import uproot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from na62 import prepare, hlf, extract, constants, stats, histo

In [None]:
data = prepare.import_root_files(["data/run12450.root"])
k2pi = prepare.import_root_files(["data/k2pi.root"], total_limit=1000000)
k3pi = prepare.import_root_files(["data/k3pi.root"], total_limit=1000000)
kmu2 = prepare.import_root_files(["data/kmu2.root"], total_limit=1000000)
kmu3 = prepare.import_root_files(["data/kmu3.root"], total_limit=1000000)
ke3 = prepare.import_root_files(["data/ke3.root"], total_limit=1000000)