# Event selection

In [None]:
# Let's first import everything we need
import uproot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from functools import partial
from na62 import prepare, hlf, extract, constants, stats, histo

## Data processing: from raw data to high-level objects
After acquisition, the NA62 data are first reconstructed: we take the raw (binary) data and extract, for each event, the signals recorded by each detector individually. For each detector, these signals are reconstructed as hits carrying information such as time, position, and geometrical channel ID (i.e. related to the physical position rather than the position in the electronics). Hits are then grouped together to form candidates: more complex objects that are generally meant to correspond to a single particle interacting with the detector. For some detectors the candidates go by other names: LKr -> cluster, Spectrometer -> track, GTK -> beam track.

At the analysis level, candidates from multiple detectors can be associated together in space (geometrically) and in time to form higher level objects:
 - beam kaon: GTK track associated to a KTAG candidate
 - downstream track: spectrometer track associated to possible candidates from CHOD, NewCHOD, MUV1-3, LKr, RICH
 - vertex: association between GTK and spectrometer tracks, or association between spectrometer tracks only, or association between LKr clusters (neutral vertex)
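
As an illustration of what an association in time can look like, here is a minimal sketch. The function name `closest_in_time`, the candidate times and the 2 ns window are made up for this example and are not part of the na62 package; the actual reconstruction code is not shown in this notebook.

```python
import numpy as np

def closest_in_time(reference_time, candidate_times, max_dt):
    """Return the index of the candidate closest in time to reference_time,
    or None if no candidate lies within max_dt (same units as the times)."""
    candidate_times = np.asarray(candidate_times, dtype=float)
    if candidate_times.size == 0:
        return None
    dt = np.abs(candidate_times - reference_time)
    best = int(np.argmin(dt))
    return best if dt[best] < max_dt else None

# Made-up KTAG candidate times (ns) compared to a track time of 0 ns
ktag_times = [-12.4, 0.6, 3.9]
match = closest_in_time(0.0, ktag_times, max_dt=2.0)
no_match = closest_in_time(0.0, [-12.4, 3.9], max_dt=2.0)
```

A spatial association would follow the same pattern, with a distance replacing the time difference.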

## Pile-up
As you may have guessed, the data presented to you here consist of events built by associating several of these high-level objects. Reaching that point has already required a sizable amount of selection at the analysis level to make these associations. In fact a single event usually contains many more high-level objects, coming from pile-up. Pile-up refers to valid events (a beam kaon decay, a beam pion decay, or a single beam muon) which are *not* (with a caveat) related to the event that generated the trigger. To make sure we have **all** the information relating to that specific event, we request signals from the detectors spread within ~100 ns of the trigger time. This means each event contains a lot of pile-up, randomly spread in time within the event.

By applying timing cuts at the analysis level, we can easily discard a very large fraction of the pile-up. A small fraction remains, which happens to be at the same time as the triggered event. As it cannot be distinguished from the triggered event by timing, additional selection criteria must be applied to ensure that we select a valid kaon decay.

Because the pile-up is at the same time as the kaon decay, there are usually several ways to combine the high-level objects. For instance, if you have one track and 3 isolated clusters and are looking for a K2pi decay, there are three possible associations of the clusters: combine clusters 1 and 2 to form a pi0, combine clusters 2 and 3, or combine clusters 1 and 3. This is called combinatorics, and you can apply conditions to each of the possible options. If you manage to find a single option that satisfies all your requirements, this could be the correct one and you go ahead and form an event with this selection of high-level objects. If not, you usually reject the event.
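
The combinatorics of the three-cluster example can be sketched in a few lines. The cluster times and energies below are invented, and the only condition applied is the 5 ns pairing window quoted in the K2pi pre-selection; a real selection would apply more conditions to each option.

```python
from itertools import combinations

# Hypothetical toy clusters: (label, time in ns, energy in GeV)
clusters = [("cluster1", 0.3, 12.0), ("cluster2", -0.8, 20.0), ("cluster3", 7.5, 3.0)]

def pi0_candidates(clusters, max_dt=5.0):
    """Return every pair of clusters closer in time than max_dt (ns)."""
    pairs = []
    for (name_a, t_a, _), (name_b, t_b, _) in combinations(clusters, 2):
        if abs(t_a - t_b) < max_dt:
            pairs.append((name_a, name_b))
    return pairs

# All possible pairings of three clusters: (1,2), (1,3), (2,3)
all_pairs = list(combinations([c[0] for c in clusters], 2))
good_pairs = pi0_candidates(clusters)

# Keep the event only if exactly one option satisfies the requirements
keep_event = len(good_pairs) == 1
```

Here only clusters 1 and 2 are close enough in time, so a single option survives and the event would be kept.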

This process of eliminating pile-up has already been done for you in the data available here, which is why each of your events has only one to three tracks and zero to two clusters. One category of pile-up still contaminates your sample (the caveat mentioned above): events that are **not** a kaon decay, but a combination of independent particles that looks like a kaon decay. These require additional selection criteria, some of which have already been applied and some of which we can apply ourselves to these data.

## Pre-selection
As mentioned above, a comprehensive pre-selection has already been applied to the data available here. You will find below the details of the criteria that have been used for each of the selected 'event_type'.

### K3pi selection
 - Control trigger
 - Exactly one three-track vertex within 6 ns of the trigger time
 - Each track must be in the acceptance of NewCHOD and 4 spectrometer chambers
 - Tracks must be further than 10 cm away from each other at Straw1
 - $q_\text{vtx} = +1$
 - Vertex between 104 m and 180 m and $\chi^2<25$
 - Reconstructed total momentum $p_\text{tot}$ must be within 3 GeV of 75 GeV
 - Reconstructed three-pion invariant mass $m_{3\pi}$ must be between 490 MeV and 497 MeV

### Kmu2 selection
 - Control trigger
 - Less than 10 tracks
 - Exactly one good track:
     - Must have CHOD or NewCHOD associated signal
     - Within 10 ns from the trigger time
     - Signal in 4 spectrometer chambers
     - Track $\chi^2<20$
     - Not forming a vertex with any other track. Vertex if:
         - cda < 50 mm
         - z vertex between 60 m and 200 m
 - $q=+1$
 - Track momentum between 5 GeV and 70 GeV
 - Track in geometric acceptance of all 4 spectrometer stations, MUV3
 - Vertex between 120 m and 180 m with CDA<40 mm
 - Closest KTAG candidate in time must be within 2 ns of the track CHOD time, or 5 ns from the track NewCHOD time if CHOD time is not available
 - No in-time activity in the LAV, IRC and SAC
 - Track must have MUV3 signal associated to the track within 1.5 ns of the KTAG time, and within 2 ns of the track CHOD time (5 ns of NewCHOD if CHOD not available)
 - Track $E/p < 0.2$
 - Missing mass squared (muon hypothesis) $m_\text{miss}^2(\mu) < 0.02~\text{GeV}^2$

### K2pi selection
 - Control trigger
 - At least 1 track and 3 clusters
 - At least 2 good clusters:
     - Within 20 ns of the trigger time
     - Further than 20 mm from a dead cell
     - Cluster must be electromagnetic
     - At least 1 GeV
     - Not associated to a Spectrometer track
     - Must be isolated
 - Build pairs of clusters as pi0:
   - Within 5 ns of each other
   - Further than 200 mm apart
   - Energy sum between 2 GeV and 75 GeV
 - Exactly 1 good track:
     - At least one pi0 within 2 ns of the track
     - $q=+1$
     - Not fake
     - Less than 20 GeV between raw momentum and fitted momentum
     - Momentum between 5 GeV and 70 GeV
     - Vertex wrt. beam axis must be between 105 m and 180 m
     - CDA < 30 mm
     - Inside the geometric acceptance of NewCHOD, all 4 Spectrometer stations, LKr
 - Select closest pi0 in time
 - No MUV3 signal associated to the track
 - Must have LKr cluster associated to the track, and $E/p < 0.9$
 - No in-time activity in the LAV, IRC and SAC
 - Reconstructed kaon mass $m_{\pi\gamma\gamma}$ between 460 MeV and 520 MeV
 - Missing mass squared (pion hypothesis) $m_\text{miss}^2(\pi) < 0.015~\text{GeV}^2$

### Kmu3 selection
 - Control trigger
 - At least one track
 - Exactly one good track:
     - Must have MUV3 association
     - Vertex wrt. beam axis must be between 110 m and 180 m
     - CDA < 25 mm
     - Momentum between 5 GeV and 50 GeV
     - $q = +1$
     - Track $\chi^2 < 20$
     - In acceptance of NewCHOD, 4 spectrometer chambers, LKr and MUV3
 - No other track within 10 ns of the good track
 - Must have exactly two good LKr clusters:
     - Energy > 2 GeV
     - Within 6 ns of the track
     - Further than 150 mm from the track impact point on LKr
 - Neutral vertex within 10 m of charged vertex
 - No in-time activity in the LAV, IRC and SAC
 - Reconstructed total momentum $p_\text{tot}$ must be between 15 GeV and 70 GeV
 - Reconstructed transverse total momentum $p_{\text{tot},T}$ must be between 40 and 250 MeV
 - Missing mass squared (muon hypothesis) $m_\text{miss}^2(\mu) < 0.01~\text{GeV}^2$

### Ke3 selection
 - Control trigger
 - At least one track
 - Exactly one good track:
     - Must not have MUV3 association
     - Vertex wrt. beam axis must be between 110 m and 180 m
     - CDA < 25 mm
     - Momentum between 5 GeV and 50 GeV
     - $q = +1$
     - Track $\chi^2 < 20$
     - In acceptance of NewCHOD, 4 spectrometer chambers, LKr and MUV3
 - No other track within 10 ns of the good track
 - Must have exactly two good LKr clusters:
     - Energy > 2 GeV
     - Within 6 ns of the track
     - Further than 150 mm from the track impact point on LKr
 - Neutral vertex within 10 m of charged vertex
 - No in-time activity in the LAV, IRC and SAC
 - Reconstructed total momentum $p_\text{tot}$ must be between 15 GeV and 70 GeV
 - Reconstructed transverse total momentum $p_{\text{tot},T}$ must be between 40 and 250 MeV
 - Missing mass squared (electron hypothesis) $m_\text{miss}^2(e) < 0.01~\text{GeV}^2$



## Criteria categories
From the selections above we can roughly create some "categories" of criteria:
 - Trigger
 - Basic objects requirements
 - Object selection conditions
     - Timing
     - Track quality
     - Charge
     - Momentum
     - Vertex
     - Geometric acceptance
 - Additional requirements
 - Pile-up veto
 - Reconstructed quantities

This can be useful to roughly compare the selections and identify differences. The table below summarises the selections according to these categories.

| Type | Sample | Trigger | Basic requirements | O. timing | O. quality | O. charge | O. momentum | O. vertex | O. Geo. Acc. | Add. Req. | Veto | Reco quantities |
| --:    | --:    | :--:    | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| 3-track | K3pi   | Control | One 3-track vertex | $|t_\text{vtx} - t_\text{trigger}| < 6~\text{ns}$ | $d_{i,j} > 10~\text{cm}$ | +1 | N/A | $104~m < Z_\text{vtx} < 180~m$ & $\chi^2<25$ | NewCHOD & 4 Straw | N/A | N/A | $|p_\text{tot} - 75~\text{GeV}|<3~\text{GeV}$ & $490~\text{MeV} < m_{3\pi} < 497~\text{MeV}$ |
| 1-track | Kmu2   | Control | < 10 tracks | $|t_\text{track} - t_\text{trigger}| < 10~\text{ns}$ | $\chi^2 < 20$ & No other vertex | +1 | $5~\text{GeV} < p < 70~\text{GeV}$ | $120~m < Z_\text{vtx} < 180~m$ & $\text{CDA} < 40~mm$ | MUV3 & 4 Straw | Signal in 4 Straw and (CHOD or NewCHOD) & KTAG in-time & MUV3 in-time signal & $E/p < 0.2$ | LAV, IRC, SAC | $m_\text{miss}^2(\mu) < 0.02~\text{GeV}^2$ |
| 1-track & 2-clusters | K2pi   | Control | 1 track & 3 clusters | $|t_\text{track}-t_{\pi_0}| < 2~\text{ns}$; Cluster: $|t_\text{cluster} - t_\text{trigger}|<20~\text{ns}$ & $\Delta t_{i,j}<5~\text{ns}$ | Track: Not fake & $\Delta p_\text{raw,fit}<20~\text{GeV}$; Cluster: $\Delta d_\text{dead-cell} > 20~\text{mm}$ & Electromagnetic & Not associated to track & Isolated & $\Delta d_{i,j} > 200~\text{mm}$ | Track: +1 | Track: $5~\text{GeV} < p < 70~\text{GeV}$; Cluster: $E>1~\text{GeV}$ & $2~\text{GeV} < E_1 + E_2 < 75~\text{GeV}$ | $105~m < Z_\text{vtx} < 180~m$ & $\text{CDA} < 30~mm$ | NewCHOD & 4 Straw | Track: no MUV3 &  $E/p<0.9$ | LAV, IRC, SAC | $460~\text{MeV} < m_K < 520~\text{MeV}$ & $m_\text{miss}^2(\pi) < 0.015~\text{GeV}^2$ |
| 1-track & 2-clusters | Kmu3 | Control | N/A | $|t_\text{cluster} - t_\text{track}| < 6~\text{ns}$ | Track: $\chi^2 < 20$ & MUV3 association; Cluster: $\Delta d_\text{cls,track} > 150~\text{mm}$ | Track: +1 | Track: $5~\text{GeV} < p < 50~\text{GeV}$; Cluster: $E>2~\text{GeV}$ | Track: $110~m < Z_\text{ch. vtx} < 180~m$ & $\text{CDA} < 25~mm$; Cluster: $|Z_\text{ch. vtx} - Z_\text{neut. vtx}|<10~m$ | Track: NewCHOD & 4 Straw & LKr & MUV3 | No track within 10 ns | LAV, IRC, SAC | $15~\text{GeV} < p_\text{tot} < 70~\text{GeV}$ & $40~\text{MeV} < p_{\text{tot},T} < 250~\text{MeV}$ & $m_\text{miss}^2(\mu) < 0.01~\text{GeV}^2$ |
| 1-track & 2-clusters | Ke3 | Control | N/A | $|t_\text{cluster} - t_\text{track}| < 6~\text{ns}$ | Track: $\chi^2 < 20$ & No MUV3 association; Cluster: $\Delta d_\text{cls,track} > 150~\text{mm}$ | Track: +1 | Track: $5~\text{GeV} < p < 50~\text{GeV}$; Cluster: $E>2~\text{GeV}$ | Track: $110~m < Z_\text{ch. vtx} < 180~m$ & $\text{CDA} < 25~mm$; Cluster: $|Z_\text{ch. vtx} - Z_\text{neut. vtx}|<10~m$ | Track: NewCHOD & 4 Straw & LKr & MUV3 | No track within 10 ns | LAV, IRC, SAC | $15~\text{GeV} < p_\text{tot} < 70~\text{GeV}$ & $40~\text{MeV} < p_{\text{tot},T} < 250~\text{MeV}$ & $m_\text{miss}^2(e) < 0.01~\text{GeV}^2$ |

Even though all the selections contain a similar set of conditions, the cut values sometimes differ slightly. As we will eventually compare event types selected with different selections, it is best to harmonize the cut values. This avoids introducing systematic uncertainties due to different behaviour, and different agreement between MC and data, in different ranges of the variables on which we place conditions.

The conditions with obvious differences that can be harmonized are listed below. When choosing common criteria, we have to use the most constraining values, as we cannot "undo" what was already applied before these data were presented to us.
 - $Z_\text{vtx}$: common range is between 120 m and 180 m
 - LKr distance between track and cluster: common limit of at least 150 mm
 - LKr distance between clusters: common limit of at least 200 mm
 - Cluster Energy: common limit of at least 2 GeV
 - Vertex CDA: common limit of less than 25 mm
 - Distance between charged and neutral vertices: common limit of less than 10 m
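
The "most constraining value" rule can be checked mechanically: for a lower limit the common value is the largest of the individual limits, for an upper limit it is the smallest. The sketch below uses the cut values quoted in the pre-selection lists; the dictionary and function names are chosen for this illustration only.

```python
# Cut values copied from the pre-selection lists (m for Z, mm for CDA, GeV for energy)
z_vtx_min = {"k3pi": 104, "kmu2": 120, "k2pi": 105, "kmu3": 110, "ke3": 110}
cda_max = {"kmu2": 40, "k2pi": 30, "kmu3": 25, "ke3": 25}
cluster_energy_min = {"k2pi": 1, "kmu3": 2, "ke3": 2}

def tightest_lower(limits):
    """Most constraining lower limit: the largest one."""
    return max(limits.values())

def tightest_upper(limits):
    """Most constraining upper limit: the smallest one."""
    return min(limits.values())

common_z_min = tightest_lower(z_vtx_min)
common_cda_max = tightest_upper(cda_max)
common_cluster_energy_min = tightest_lower(cluster_energy_min)
```

The results (120 m, 25 mm and 2 GeV) reproduce the corresponding entries of the list above.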

## Technical aside 
### Functional cuts
As you have seen, we are working in this project with pandas DataFrames. Among other things, a dataframe allows us to implement cuts easily, as you have already experienced in the previous notebooks. So far we have done this in multiple steps (the example selects a track momentum range):
 - Identify all events satisfying the lower limit: `low_cond = df["track1_momentum_mag"]>15000`
 - Identify all events satisfying the upper limit: `upper_cond = df["track1_momentum_mag"]<40000`
 - Combine both criteria: `cond = low_cond & upper_cond`
 - Select events satisfying both conditions: `df.loc[cond]`

The first three steps each create a boolean array with one entry per event: True or False. Only the last step actually filters out the events we are not interested in. We often shortened this to a single line:
```python
df.loc[(df["track1_momentum_mag"]>15000) & (df["track1_momentum_mag"]<40000)]
df.loc[(df["track2_momentum_mag"]>15000) & (df["track2_momentum_mag"]<40000)]
df.loc[(df["track3_momentum_mag"]>15000) & (df["track3_momentum_mag"]<40000)]
```

However these lines are a bit verbose, with a lot of redundancy: `df` is present three times, `track1_momentum_mag` twice. They also lack flexibility: if we want to apply this condition to all three tracks, we have to write the line three times, with two small changes. And if we want to try different values of pmin and pmax, we may have to change them in many places, at the risk of forgetting one. This could be improved by writing functions and loops, but the code would lose readability. When you read selection code, you do not care how a momentum cut was applied to three tracks, only that it was, and which cut values were used.

Instead we can take advantage of a functionality provided by pandas: the argument of the `.loc[]` indexer can be a `Callable` (i.e. a function or a lambda) taking a single argument (the dataframe) and returning a boolean array used for the filtering. We can therefore use it like this:
```python
def t1_momentum_cut_15_40(df):
    return (df["track1_momentum_mag"]>15000) & (df["track1_momentum_mag"]<40000)
def t2_momentum_cut_15_40(df):
    return (df["track2_momentum_mag"]>15000) & (df["track2_momentum_mag"]<40000)
def t3_momentum_cut_15_40(df):
    return (df["track3_momentum_mag"]>15000) & (df["track3_momentum_mag"]<40000)

df.loc[t1_momentum_cut_15_40]
df.loc[t2_momentum_cut_15_40]
df.loc[t3_momentum_cut_15_40]
```
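
To convince yourself that passing the function gives the same result as passing the boolean array, here is a small self-contained check on a toy dataframe. The momenta are made up and assumed to be in the same units as the cut values.

```python
import pandas as pd

# Toy dataframe with made-up momenta
toy = pd.DataFrame({"track1_momentum_mag": [10000, 20000, 35000, 50000]})

def t1_momentum_cut_15_40(df):
    return (df["track1_momentum_mag"] > 15000) & (df["track1_momentum_mag"] < 40000)

# Calling the function ourselves gives a boolean Series, one entry per event
mask = t1_momentum_cut_15_40(toy)
selected_with_mask = toy.loc[mask]

# Passing the function itself: pandas calls it on the dataframe for us
selected_with_callable = toy.loc[t1_momentum_cut_15_40]
```

Both selections contain the same two events (20000 and 35000).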

So far this is not much of an improvement, except when we use these conditions in multiple places: in that case we have to change only one function per track. We can make a further improvement using another Python feature: we can write functions that create functions!
```python
def make_momentum_cut(min_p, max_p, which_object = None):
    # We can do some pre-processing
    if which_object is None:
        which_object = ""
    else:
        which_object = f"{which_object}_"

    # We define the function from the example above
    def cut(df):
        # This function also works if we provide directly the series!
        if isinstance(df, pd.DataFrame):
            serie_cut = df[f"{which_object}momentum_mag"]
        else:
            serie_cut = df
        # We implement the case where we don't have a lower or an upper limit
        # (tested against None, so that a limit of 0 is still applied)
        min_momentum_range = serie_cut > min_p if min_p is not None else True
        max_momentum_range = serie_cut < max_p if max_p is not None else True
        return min_momentum_range & max_momentum_range
    # Return the function we created
    return cut

from functools import partial
# Create a new function that binds the min/max parameters, but not which_object
momentum_cut_15_40 = partial(make_momentum_cut, 15000, 40000)

df.loc[momentum_cut_15_40("track1")]
df.loc[momentum_cut_15_40("track2")]
df.loc[momentum_cut_15_40("track3")]
```

True, the overall code is longer, but on the other hand the interesting part (the last 4 lines) is now much easier to read (reading each line, we understand immediately which condition is applied), less error-prone (there is a single place to change if we want to modify the cut everywhere), and more flexible (the same function can be applied to any track without redefinition, and we can easily define another cut in a single line). The `make_momentum_cut` function is implemented in na62.hlf, taking these principles even further by leveraging a generic *min_max* cut function. It is now very short:
```python
def make_momentum_cut(min_p, max_p, which_object = None) -> Callable:
    return make_min_max_cut(min_p, max_p, which_value="momentum_mag", which_object=which_object)
```
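
The body of `make_min_max_cut` is not shown in this notebook. The sketch below is only a guess at how such a generic function could be written, based on the momentum example above; it is deliberately named `make_min_max_cut_sketch` to make clear that it is not the na62.hlf implementation.

```python
from typing import Callable, Optional, Union
import pandas as pd

def make_min_max_cut_sketch(min_val: Optional[float], max_val: Optional[float],
                            which_value: str, which_object: Optional[str] = None) -> Callable:
    # Column name is "<object>_<value>", or just "<value>" when no object is given
    column = f"{which_object}_{which_value}" if which_object else which_value

    def cut(df: Union[pd.DataFrame, pd.Series]) -> pd.Series:
        # Accept either the full dataframe or directly the series to cut on
        values = df[column] if isinstance(df, pd.DataFrame) else df
        # Start from "every event passes", then apply each limit that is set
        passed = pd.Series(True, index=values.index)
        if min_val is not None:
            passed &= (values > min_val)
        if max_val is not None:
            passed &= (values < max_val)
        return passed

    return cut

# Usage on a toy dataframe with made-up values: lower limit only
toy = pd.DataFrame({"track1_momentum_mag": [10000, 20000, 50000]})
above_15 = toy.loc[make_min_max_cut_sketch(15000, None, "momentum_mag", "track1")]
```

Starting from an all-True Series means the returned object is always a boolean Series, even when neither limit is set.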

Many "make_*_cut" condition functions have been defined in this way in the na62.hlf module. Have a look at the documentation to see which are available.

### Data/MC histograms
Because we now have MC samples, we will want to compare our data with the simulation. This is done by plotting the data and the MC together on a single plot. There are however a few things to take into account:
 - The data comprise the sum of all channels, therefore the MC contributions should be stacked (plotted on **top** of each other and not beside each other).
 - The MC events available for each channel do not represent the same number of kaons. Three parameters have to be taken into account to be able to normalize them relative to each other:
   - The "norm" parameters retrieved in the file. It is the total number of events that were generated (how many kaon decays). Using these we can take into account their relative size (in terms of generation).
   - The branching fraction of each sample.
   - The selection acceptance/efficiency of each sample. The selection cuts do not have the same effect on each sample (because their kinematic distributions are different). 
 - Once the MC samples are correctly normalized relative to each other, the total integrated MC sample must also be normalized to correspond to the number of data events that we have collected.
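
The ingredients above can be combined into a single expression. The notation is introduced here for illustration only and is not taken from the na62 package: $\Phi_K$ is the number of kaons, $\mathrm{BR}_i$ the branching fraction of channel $i$, $N_i^\text{gen}$ the number of generated events (the "norm" parameter) and $n_i^\text{sel}$ the number of MC events passing the selection.

$$
N_i^\text{exp} = \Phi_K \cdot \mathrm{BR}_i \cdot \frac{n_i^\text{sel}}{N_i^\text{gen}},
\qquad
w_i = \frac{\Phi_K \cdot \mathrm{BR}_i}{N_i^\text{gen}}
$$

Each selected MC event of channel $i$ therefore carries the weight $w_i$. When the kaon flux is not known, $\Phi_K$ can be replaced by the common factor that makes $\sum_i N_i^\text{exp}$ equal to the number of selected data events.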

**Exercise**: Imagine that you simulate 1000 kmu2 events out of which 500 pass your selection, and 2000 K3pi events out of which 100 pass your selection. How many kaons does that correspond to for each sample? What should be their relative weights? If we select a data sample of 10000 events (consisting only of kmu2 and k3pi - your selection is very pure), how many kmu2 and k3pi does that correspond to?

In [None]:
n_kmu2 = 1000
n_k3pi = 2000
n_kmu2_sel = 500
n_k3pi_sel = 100
ndata_sel = 10000

# First compute the number of kaons that correspond to those simulated events
n_kmu2_kaons = n_kmu2 / constants.kaon_br_map["kmu2"]
n_k3pi_kaons = n_k3pi / constants.kaon_br_map["k3pi"]

print(f"Number of Kaons for kmu2: {n_kmu2_kaons:.2f}")
print(f"Number of Kaons for k3pi: {n_k3pi_kaons:.2f}")
print(f"Relative weight: {n_k3pi_kaons/n_kmu2_kaons:.2f} (1 k3pi kaon is worth {n_k3pi_kaons/n_kmu2_kaons:.0f} kmu2 kaons)")

# Now compute the weights for each selected event
w_k3pi = (n_k3pi_sel/n_k3pi_kaons)
w_kmu2 = (n_kmu2_sel/n_kmu2_kaons)
print(f"Relative selected weight: {w_k3pi/w_kmu2:.2f} (1 selected k3pi kaon is worth {w_k3pi/w_kmu2:.0f} selected kmu2 kaons)")

mc_normalization = w_k3pi + w_kmu2

print(f"Number of Kmu2 in the data: {ndata_sel*w_kmu2/mc_normalization:.0f}")
print(f"Number of K3pi in the data: {ndata_sel*w_k3pi/mc_normalization:.0f}")

There are two ways to normalize your MC histograms to the data.
 1. Make sure that your MC histograms are correctly normalized relative to each other, then scale them all so that their total integral is the same as the integral of the data. This method is simpler to apply early on, as it only requires a data histogram and MC histograms, without additional ingredients. However it assumes that the events in all the histograms have been rigorously selected in exactly the same way. Otherwise the integrals will be identical but some features may look very different.
 2. Implement a normalization selection (see later in the notebook) to determine a kaon flux (how many kaons entered the experiment) and scale everything relative to that number. This has the advantage that the number is valid whatever selection criteria are applied. It allows us to better compare data and MC and to clearly see features that are not consistent between the two, without the discrepancy being diluted across the whole histogram.
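
Reduced to bin contents, the two methods can be sketched as follows. The function names and the two-bin histograms are invented for this illustration; this is not the code of the `histo` module, only the arithmetic described above.

```python
import numpy as np

def scale_to_data(mc_counts, rel_weights, n_data):
    """Method 1: weight each MC histogram relative to the others,
    then rescale the stack so that its total integral equals n_data."""
    weighted = [w * np.asarray(c, dtype=float) for c, w in zip(mc_counts, rel_weights)]
    total = sum(h.sum() for h in weighted)
    return [h * n_data / total for h in weighted]

def scale_to_flux(mc_counts, brs, n_generated, kaon_flux):
    """Method 2: expected counts = flux * BR * (selected / generated), bin by bin."""
    return [kaon_flux * br * np.asarray(c, dtype=float) / n_gen
            for c, br, n_gen in zip(mc_counts, brs, n_generated)]

# Made-up two-bin histograms for two channels
mc_counts = [np.array([300, 100]), np.array([50, 150])]
brs = [0.75, 0.25]
n_generated = [1000, 1000]
rel_weights = [br / n for br, n in zip(brs, n_generated)]

method1 = scale_to_data(mc_counts, rel_weights, n_data=500)
method2 = scale_to_flux(mc_counts, brs, n_generated, kaon_flux=2000)
```

With the first method the stack always integrates to the number of data events, whatever was cut away in the MC. With the second, the integral follows from the flux and can differ from the data if the selections or the model differ.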
    
**Example**: To illustrate the two methods above, we are going to generate a dataset according to a complex distribution (representing data), and another dataset with a slightly different distribution (representing MC). We are then going to produce histograms according to both methods discussed above.

In [None]:
# First define our test distributions:
#  1. Two Gaussians with completely different parameters on top of a linearly falling background. The amplitude of the second Gaussian can be modified through a parameter.
#  2. One simple Gaussian
# These distributions will correspond to two "channels" featuring different behaviour.
from lmfit.models import GaussianModel, LinearModel
def test_distribution1(x, amp2=100):
    # Create the model
    distrib_model = GaussianModel(prefix="g1_") + GaussianModel(prefix="g2_") + LinearModel()

    # Set the parameters
    pars = distrib_model.make_params()
    pars["g1_center"].value = 2
    pars["g1_sigma"].value = 0.5
    pars["g1_amplitude"].value = 60
    pars["g2_center"].value = 5
    pars["g2_amplitude"].value = amp2
    pars["intercept"].value = 30
    pars["slope"].value = -3

    # Return the model evaluated on the provided x values
    return distrib_model.eval(x=x, params=pars)

def test_distribution2(x):
    # Create the model
    distrib_model = GaussianModel(prefix="g1_")

    # Set the parameters
    pars = distrib_model.make_params()
    pars["g1_center"].value = 2
    pars["g1_sigma"].value = 0.5
    pars["g1_amplitude"].value = 20

    # Return the model evaluated on the provided x values
    return distrib_model.eval(x=x, params=pars)
    
# We will then randomly pick a dataset corresponding to our distribution. We can do this 
# using the accept-reject method
def accept_reject(distrib, x_range, y_max, n_samples, plot_it=False):
    # Generate x and y uniformly distributed in a 2D box
    # (x_range and y_max avoid shadowing the built-ins range and max)
    x = np.random.uniform(x_range[0], x_range[1], n_samples)
    y = np.random.uniform(0, y_max, n_samples)
    # Compute for each x the expected y value according to the input distribution
    y_test = distrib(x)

    # Accept only the values which are below the curve
    accept = np.where(y<=y_test)
    reject = np.where(y>y_test)

    # We can plot to illustrate
    if plot_it:
        plt.scatter(x[reject], y[reject], marker=".", label="Rejected")
        plt.scatter(x[accept], y[accept], marker=".", label="Accepted")
        plt.scatter(x, y_test, marker="x", label="Exact distribution")
        plt.legend(loc="lower left")
        plt.xlabel("x")
        plt.ylabel("y")
    return x[accept]

distrib_max = 100 # For accept-reject method
# We are going to generate data corresponding to a certain flux, and a number of MC that we fix (to have something equivalent to our physics problem with data and MC).
# We are also going to simulate the fact that each MC sample has a different branching ratio (0.75 and 0.25)
flux = 100000
n_simu = 200000
n_simu2 = 150000
br1 = 0.75
br2 = 0.25

x = np.arange(0,10,0.1)
ratio = 0.7528 # Don't worry about this number. It's a trick to keep the BR of the distribution1 the same after changing the amplitude of the second gaussian

# Generate the data
data = accept_reject(lambda x: test_distribution1(x) + test_distribution2(x), (0,10), distrib_max, flux, plot_it=True)
# Generate the MC twice using the first distribution: once with the same distribution as the data, once changing the amplitude of the second Gaussian 
mc_same = accept_reject(test_distribution1, (0,10), distrib_max*br1, n_simu)
plt.figure()
mc = accept_reject(partial(test_distribution1, amp2=50), (0,10), distrib_max*ratio, n_simu)
# And once for the second distribution
mc2 = accept_reject(test_distribution2, (0,10), distrib_max*br2, n_simu2)
# Then we do a third sample from the one with the same distribution as the data, but we remove all the events 
# with x<1 (this would correspond to an additional cut in a data selection)
mc_same_with_cut = mc_same[mc_same>1]
mc2_with_cut = mc2[mc2>1]

First please notice the nice illustration of the "Accept-Reject" sampling method, where we randomly generate points in a 2D space and reject all those that are above the theoretical curve that we defined.
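
The same method can be checked on a case where the answer is known analytically. This minimal sketch, independent of the functions defined in this notebook, samples a density proportional to $x$ on $[0, 1]$: half of the points should be accepted (area under the curve divided by the area of the box) and the mean of the accepted values should be close to 2/3.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_trials = 100_000

# Target shape f(x) = x on [0, 1]; its maximum is 1, so the box is [0, 1] x [0, 1]
x = rng.uniform(0.0, 1.0, n_trials)
y = rng.uniform(0.0, 1.0, n_trials)

# Keep only the points lying below the curve
accepted = x[y <= x]

acceptance = len(accepted) / n_trials  # expected: close to 1/2
mean_x = accepted.mean()               # expected: close to 2/3
```

Note that the fraction of accepted points depends on the height of the box: a taller box accepts a smaller fraction of the generated points.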

Now let's histogram our data: three histograms according to the first normalization method.

In [None]:
fig, ax = plt.subplots(1,3, figsize=(21,5))
weights = [0.75/n_simu, 0.25/n_simu2]
histo.hist_data(data, bins=100, range=(0,10), ax=ax[0])
histo.stack_mc_scale([mc_same, mc2], bins=100, range=(0,10), ndata=len(data), weights=weights, ax=ax[0])
ax[0].set_title("MC generated according to same distribution as data")
ax[0].set_xlabel("x")

histo.hist_data(data, bins=100, range=(0,10), ax=ax[1])
histo.stack_mc_scale([mc_same_with_cut, mc2_with_cut], bins=100, range=(0,10), ndata=len(data), weights=weights, ax=ax[1])
ax[1].set_title("MC generated according to same distribution as data,\n with cut x>1")
ax[1].set_xlabel("x")

histo.hist_data(data, bins=100, range=(0,10), ax=ax[2])
histo.stack_mc_scale([mc, mc2], bins=100, range=(0,10), ndata=len(data), weights=weights, ax=ax[2])
ax[2].set_title("MC generated according to distribution where\n the amplitude of the second Gaussian is different")
ax[2].set_xlabel("x")

On these plots, we can clearly see the disadvantage of this normalization technique:
 - Everything looks fine on the first plot, which has been carefully engineered to have MC samples corresponding to the data.
 - On the second plot, we applied an additional cut to the MC and as a result the whole MC histogram is shifted upwards.
 - On the last plot, the largest MC contribution is actually slightly different from the data (the second Gaussian is a bit smaller) and as a result the whole plot looks bad, including the first Gaussian, which is actually correct in MC.

On the other hand these plots have been normalized easily, with the only required input being the BR and the number of generated MC samples. Those values are generally easy to use as they are provided externally (you know the BR of your kaon decay channels, and you know how many events you simulated).

Now for the second normalization technique.

In [None]:
# The `stack_mc_flux` function uses internally the `constants.kaon_br_map` for the BR of each sample. So let's introduce fake channels with the BR that we used
# for the test distributions
constants.kaon_br_map["test1"] = 0.75
constants.kaon_br_map["test2"] = 0.25

fig, ax = plt.subplots(1,3, figsize=(21,5))
histo.hist_data(data, bins=100, range=(0,10), ax=ax[0])
histo.stack_mc_flux({"test1": mc_same, "test2": mc2}, {"test1": n_simu, "test2": n_simu2}, bins=100, range=(0,10), kaon_flux=flux, ax=ax[0])
ax[0].set_title("MC generated according to same distribution as data")
ax[0].set_xlabel("x")

histo.hist_data(data, bins=100, range=(0,10), ax=ax[1])
histo.stack_mc_flux({"test1": mc_same_with_cut, "test2": mc2_with_cut}, {"test1": n_simu, "test2": n_simu2}, bins=100, range=(0,10), kaon_flux=flux, ax=ax[1])
ax[1].set_title("MC generated according to same distribution as data,\n with cut x>1")
ax[1].set_xlabel("x")

histo.hist_data(data, bins=100, range=(0,10), ax=ax[2])
histo.stack_mc_flux({"test1": mc, "test2": mc2}, {"test1": n_simu, "test2": n_simu2}, bins=100, range=(0,10), kaon_flux=flux, ax=ax[2])
ax[2].set_title("MC generated according to distribution where\n the amplitude of the second Gaussian is different")
ax[2].set_xlabel("x")

All the plots produced using this second method look better:
 - The first plot is obviously OK, as the MC model corresponds to the data model.
 - The second plot also looks good. The normalization does not change when the cuts applied to data and MC are not exactly the same: the region where they are the same (x>1) still agrees.
 - In the third plot, aside from the second Gaussian, which is different by design, the rest of the distribution is correctly normalized. In particular the first Gaussian still looks OK, as it should because its parameters are the same as in data.

On the other hand we needed a bit more information to produce these plots: the number of simulated MC events and the BR (as for the previous method), but also the kaon flux. For this example, this number was simply given as input (because this is a made-up example), but in general it is unknown and depends on the data sample. A complex data selection must be performed to estimate it, together with its uncertainty.

Any serious data analysis will however ultimately have to use this second technique, while the first one can be useful to produce quick, preliminary plots. Obtaining the kaon flux will be the topic of one of the next sections of this notebook.

## Selections

In [None]:
# Load the data, but also the MC simulations we have for each channel
# N.B. Contrary to what we were doing in previous notebooks, we are now
#      retrieving the second output value from import_root_files.
data, data_norm = prepare.import_root_files(["data/run12450.root"])
k2pi, k2pi_norm = prepare.import_root_files(["data/k2pi.root"])
k3pi, k3pi_norm = prepare.import_root_files(["data/k3pi.root"])
kmu2, kmu2_norm = prepare.import_root_files(["data/kmu2.root"])
kmu3, kmu3_norm = prepare.import_root_files(["data/kmu3.root"])
ke3, ke3_norm = prepare.import_root_files(["data/ke3.root"])

# And we create a dictionary with the normalization parameters that we retrieved
normalization_dict = {"k2pi": k2pi_norm, "k3pi": k3pi_norm, "kmu2": kmu2_norm, "kmu3": kmu3_norm, "ke3": ke3_norm}

In [None]:
# Compute the weights that we will need for plotting
total_data = len(data)
weights = histo.compute_samples_weights(normalization_dict)

### One track selection
As you can see, several channels (all except K3pi) feature a single track in the final state. It therefore makes sense to design a common selection for those channels. This has the advantage that later comparisons between samples from each channel will have similar systematic errors, because the selection criteria are as similar as possible.

*Note*: All examples in this section will use the first method of normalization (scaling to data) because we do not have a normalization selection yet.

In [None]:
# Let's first create a function that will plot the invariant mass for the data and MC passed, this will save us a lot of code later
from typing import Dict, Tuple, List, Callable
def plot_invariant_mass(data: pd.DataFrame, mc_dict: Dict[str, pd.DataFrame], weights_dict: Dict[str, float], 
                        mass_assignment: Dict) -> Tuple[int, Dict[str, float]]:
    # Plot the histogram for data
    ndata = histo.hist_data(hlf.invariant_mass(data, mass_assignment), bins=400, range=(200,600))

    # Plot the histogram stack for the MC samples
    nmc = histo.stack_mc_scale([hlf.invariant_mass(mc_dict[mc_name], mass_assignment) for mc_name in mc_dict.keys()], 
        bins=400, range=(200, 600), weights=weights_dict, labels=mc_dict.keys(), ndata=ndata)

    # Some default display parameters
    plt.legend()
    plt.yscale("log")
    plt.ylim(bottom=0.8)

    # Return the number of data events plotted and a dictionary of number of MC events plotted by sample
    return ndata, nmc

# We can also create a function that takes a list of selection criteria and applies them to the data sample and all the MC samples at the same time
def select_all(data: pd.DataFrame, mc_dict: Dict[str, pd.DataFrame], selection_conditions: List[Callable]) -> Tuple[pd.DataFrame, Dict[str, pd.DataFrame]]:
    data_sel = hlf.select(data, selection_conditions)
    mc_dict_sel = {mc_name: hlf.select(mc_dict[mc_name], selection_conditions) for mc_name in mc_dict}
    return data_sel, mc_dict_sel

Let's start with the single-track, two-cluster events, from which we can plot the invariant mass. In the following we will work in the pion hypothesis (considering k2pi as our test case).
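For reference, the invariant mass of a track and two clusters follows from summing their four-momenta. The sketch below is not the `hlf.invariant_mass` implementation. It assumes that each object comes with a momentum 3-vector in MeV (for a cluster, this would have to be built from its energy and its direction from the vertex) and that the energy is derived from the assigned mass. The function names and the numbers are hypothetical.

```python
import numpy as np

def four_momentum(p3, mass: float) -> np.ndarray:
    """Build (E, px, py, pz) from a momentum 3-vector and a mass hypothesis."""
    p3 = np.asarray(p3, dtype=float)
    energy = np.sqrt(np.dot(p3, p3) + mass**2)
    return np.concatenate(([energy], p3))

def invariant_mass(four_momenta) -> float:
    """Invariant mass of the sum of several four-momenta."""
    total = np.sum(np.asarray(four_momenta), axis=0)
    m2 = total[0]**2 - np.dot(total[1:], total[1:])
    # Protect against tiny negative values caused by rounding
    return float(np.sqrt(max(m2, 0.0)))

# Toy check: a particle of mass 135 MeV at rest decaying to two back-to-back photons
photon_1 = four_momentum([0.0, 0.0, 67.5], mass=0.0)
photon_2 = four_momentum([0.0, 0.0, -67.5], mass=0.0)
m_gg = invariant_mass([photon_1, photon_2])
print(m_gg)
```

Changing the mass hypothesis of the track (pion, muon, electron) changes only its energy component, which is why the same events give different invariant mass distributions under different mass assignments.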

In [None]:
# Make the k2pi assumption
k2pi_mass_assignment = {"track1": constants.pion_charged_mass, "cluster1": constants.photon_mass, "cluster2": constants.photon_mass}
plot_invariant_mass_k2pi = partial(plot_invariant_mass, mass_assignment=k2pi_mass_assignment)

In [None]:
ndata, nmc = plot_invariant_mass_k2pi(data, {"k3pi": k3pi, "kmu2": kmu2, "k2pi": k2pi, "kmu3": kmu3, "ke3": ke3}, weights)
print(f"K2pi selection purity: {nmc['k2pi']/sum(nmc.values()):.2%}")

The data/MC agreement does not look good. The reason is that we use an inconsistent mix of pre-selections whose efficiencies differ from sample to sample. We did not even require events with 1 track and 2 clusters, which means the plot includes essentially random events.

Furthermore, as already mentioned, the pre-selections share a set of similar cuts, but not necessarily at the same values. We will therefore reconcile these cuts to common values as described earlier. We define hereafter the list of cuts that can be applied to all (1-track & 2-cluster) selections.

In [None]:
# Select the correct topology
cond_1T2C = hlf.make_exists_cut(["track1", "cluster1", "cluster2"], [])

# Distance/position cuts
lkr_dtrack_cluster_cond = partial(hlf.make_lkr_distance_cut, 150, None)
lkr_dclusters_cond = hlf.make_lkr_distance_cut(200, None, "cluster1", "cluster2")
z_vertex_cond = hlf.make_z_vertex_cut(120000, 180000)
cda_cond = hlf.make_cda_cut(None, 25)
neutral_vtx_cond = hlf.make_charged_neutral_vertex_cut(None, 10000, "cluster1", "cluster2", constants.pion_neutral_mass)

# Energy/momentum cuts
cluster_energy_cond = partial(hlf.make_energy_cut, 2000, None)

# PID cuts
rich_e_cond = partial(hlf.make_rich_cut, constants.rich_hypothesis_map["e"])
rich_pi_cond = partial(hlf.make_rich_cut, constants.rich_hypothesis_map["pi"])
rich_mu_cond = partial(hlf.make_rich_cut, constants.rich_hypothesis_map["mu"])
lkr_e_cond = partial(hlf.make_eop_cut, 0.95, 1.05)
lkr_pi_cond = partial(hlf.make_eop_cut, None, 0.9)
lkr_mu_cond = partial(hlf.make_eop_cut, None, 0.2)
muv3_mu_cond = partial(hlf.make_muv3_cut, True, time_window=1.5)
muv3_not_mu_cond = partial(hlf.make_muv3_cut, False, time_window=2.5)

Then we redo the plot, this time selecting at least the correct topology:

In [None]:
data_sel, mc_sel = select_all(data, {"k3pi": k3pi, "kmu2": kmu2, "k2pi": k2pi, "kmu3": kmu3, "ke3": ke3}, [cond_1T2C])
ndata, nmc = plot_invariant_mass_k2pi(data_sel, mc_sel, weights)
print(f"K2pi selection purity: {nmc['k2pi']/sum(nmc.values()):.2%}")

This is already better, but we still have issues. Let's apply the remaining cuts to align the selections.

In [None]:
data_sel, mc_sel = select_all(data_sel, mc_sel, 
                             [z_vertex_cond, lkr_dtrack_cluster_cond("track1", "cluster1"), lkr_dtrack_cluster_cond("track1", "cluster2"),
                              lkr_dclusters_cond, cluster_energy_cond("cluster1"), cluster_energy_cond("cluster2"), 
                              cda_cond, neutral_vtx_cond])
ndata, nmc = plot_invariant_mass_k2pi(data_sel, mc_sel, weights)
print(f"K2pi selection purity: {nmc['k2pi']/sum(nmc.values()):.2%}")

These cuts have not changed the picture much. We can confirm this by looking at the acceptance of each cut (i.e. which additional fraction of events the cut will let pass).

In [None]:
mc_sel["k2pi"].attrs["acceptances"]

We can see that indeed each cut has less than 1% effect. The reason however is clear: these cuts have already been applied in the pre-selection, we are just applying fine tuning here (like cutting at cda<25 instead of cda<30).
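As an illustration of what a per-cut acceptance means, the following sketch applies a list of cuts one after the other to a toy DataFrame and records, for each cut, the fraction of the events surviving the previous cuts that also pass this one. It is not the `hlf.select` implementation: the `cut_flow` helper, the `cda` column name and the toy values are assumptions made for this example.

```python
import pandas as pd
from typing import Callable, Dict, List, Tuple

Cut = Callable[[pd.DataFrame], pd.Series]

def cut_flow(df: pd.DataFrame, cuts: List[Tuple[str, Cut]]) -> Tuple[pd.DataFrame, Dict[str, float]]:
    """Apply cuts sequentially and return the selected events and per-cut acceptances."""
    acceptances: Dict[str, float] = {}
    for name, cut in cuts:
        n_before = len(df)
        df = df[cut(df)]  # keep only rows where the boolean mask is True
        acceptances[name] = len(df) / n_before if n_before else 0.0
    return df, acceptances

# Toy sample with made-up values (mm)
toy = pd.DataFrame({
    "vtx_z": [110000, 125000, 150000, 170000],
    "cda": [5, 28, 10, 20],
})

selected, acc = cut_flow(toy, [
    ("z_vertex", lambda d: d["vtx_z"].between(120000, 180000)),
    ("cda", lambda d: d["cda"] < 25),
])
print(acc)
print(len(selected))
```

Because each acceptance is relative to the events that survived the earlier cuts, the product of all per-cut acceptances gives the overall fraction of events kept.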

There are still two major differences between the selections: the PID and the reconstructed kinematic cuts. At this point, however, we cannot remain generic and we have to select specific channels. We will commit fully to the k2pi selection and move to other selections later.

In [None]:
# Define K2pi reconstructed kinematic cuts
k2pi_mmiss2_cond = hlf.make_missing_mass_sqr_cut(min_mm2=None, max_mm2=0.015*1e6, mass_assignments=k2pi_mass_assignment) # 0.015 GeV^2 converted to MeV^2
k2pi_inv_mass_cond = hlf.make_invariant_mass_cut(min_mass=460, max_mass=520, mass_assignments=k2pi_mass_assignment)

In [None]:
# Apply all PID techniques
data_sel_pid, mc_sel_pid = select_all(data_sel, mc_sel, [lkr_pi_cond(which_track="track1"), muv3_not_mu_cond(which_track="track1"), rich_pi_cond(which_track="track1")])
ndata, nmc = plot_invariant_mass_k2pi(data_sel_pid, mc_sel_pid, weights)
plt.ylim(bottom=1e-2)
print(f"K2pi selection pollution (1-purity): {1-nmc['k2pi']/sum(nmc.values()):.2e}")
nmc

Much better, but we can still see a total of ~1.9 estimated background events, mostly due to ke3 (for a pollution of the order of $10^{-6}$, which is already very good). We can further improve by adding the final kinematic cuts.
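The purity and pollution printed in these cells are simple ratios of the estimated MC yields. A minimal sketch, using a made-up yield dictionary in place of the real `nmc` (the helper name and the numbers are hypothetical):

```python
from typing import Dict

def purity(yields: Dict[str, float], signal: str) -> float:
    """Fraction of the total estimated yield that comes from the signal channel."""
    return yields[signal] / sum(yields.values())

# Made-up yields: a large signal sample and ~1.9 background events
toy_nmc = {"k2pi": 1_000_000.0, "ke3": 1.5, "kmu3": 0.4}

k2pi_purity = purity(toy_nmc, "k2pi")
k2pi_pollution = 1 - k2pi_purity
print(f"purity = {k2pi_purity:.6%}, pollution = {k2pi_pollution:.2e}")
```

For very pure samples the pollution (1 - purity) is the more readable number, which is why it is printed in scientific notation.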

In [None]:
data_sel_pid, mc_sel_pid = select_all(data_sel_pid, mc_sel_pid, [k2pi_mmiss2_cond, k2pi_inv_mass_cond])
ndata, nmc = plot_invariant_mass_k2pi(data_sel_pid, mc_sel_pid, weights)
plt.ylim(bottom=1e-2)
print(f"K2pi selection pollution (1-purity): {1-nmc['k2pi']/sum(nmc.values()):.2e}")
nmc

This last operation gained us another order of magnitude in purity. But what about the acceptance?

In [None]:
display(mc_sel_pid["k2pi"].attrs["acceptances"])
print(f"K2pi selection acceptance: {len(mc_sel_pid['k2pi'])/normalization_dict['k2pi']:.2%}")

We see here that the total selection acceptance is 9.08%, which means that out of all simulated K2pi events, only 9.08% make it to the end of the selection. This may not seem like a lot, but it is in fact the typical performance of a selection in NA62 if we want a pure sample. The acceptance of each individual cut that we have applied is high and, as already mentioned, the reason is that most of them have already been applied in the pre-selection, so their full effect is included in the original 15.7% acceptance of the pre-selection.

What stands out is that two cuts in particular have a large impact: 
 - The cut on the z vertex. In fact, moving from 105 m to 120 m reduces the allowed decay volume by 20%. This translates to a 14% loss in acceptance (because of the kinematics and geometrical acceptance, the vertex Z distribution is not completely uniform - see next plot). This is a large loss, but unfortunately a necessary one if we want to keep this selection as close as possible to the other 1-track & 2-cluster selections.
 - The RICH cut on the PID induces an enormous acceptance loss of 31.5% (relative). Let's see if it is really necessary, or if E/P and MUV3 are enough.
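The numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes that the decay volume ends at 180 m (the upper bound of the z vertex cut) and that both quoted losses are relative, so that they multiply the pre-selection acceptance.

```python
# Decay volume reduction when raising the lower z bound from 105 m to 120 m
z_max = 180.0         # m, upper bound of the z vertex cut (assumed end of the decay volume)
z_min_presel = 105.0  # m, lower bound in the pre-selection
z_min_common = 120.0  # m, lower bound in the common selection
volume_loss = (z_min_common - z_min_presel) / (z_max - z_min_presel)
print(f"Decay volume reduction: {volume_loss:.0%}")

# Effect of the two dominant cuts on the pre-selection acceptance
presel_acc = 0.157    # pre-selection acceptance quoted in the text
z_cut_loss = 0.14     # relative loss from the z vertex cut
rich_loss = 0.315     # relative loss from the RICH PID cut
after_two_cuts = presel_acc * (1 - z_cut_loss) * (1 - rich_loss)
print(f"Acceptance after the two dominant cuts: {after_two_cuts:.2%}")
```

The result is slightly above the quoted 9.08% total, which is consistent with the remaining cuts each removing only a small additional fraction of events.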

In [None]:
# Plot the Z vertex distribution of k2pi to illustrate the point made in the previous paragraph.
k2pi["vtx_z"].hist(bins=80, range=(100000, 180000))
plt.title("K2pi sample vertex Z distribution")
plt.xlabel(r"$Z_\mathrm{vtx}$ [mm]")

In [None]:
# Apply PID techniques but not RICH, directly apply also the reconstructed kinematics cuts
data_sel_pid, mc_sel_pid = select_all(data_sel, mc_sel, [lkr_pi_cond(which_track="track1"), muv3_not_mu_cond(which_track="track1"), k2pi_mmiss2_cond, k2pi_inv_mass_cond])
ndata, nmc = plot_invariant_mass_k2pi(data_sel_pid, mc_sel_pid, weights)
plt.ylim(bottom=1e-2)
print(f"K2pi selection pollution (1-purity): {1-nmc['k2pi']/sum(nmc.values()):.2e}")
nmc

In [None]:
display(mc_sel_pid["k2pi"].attrs["acceptances"])
print(f"K2pi selection acceptance: {len(mc_sel_pid['k2pi'])/normalization_dict['k2pi']:.2%}")

The selection acceptance has increased by 4% absolute. On the other hand, the pollution has increased by two orders of magnitude! Depending on our objective (purity vs. acceptance), we can decide whether or not to use this extra PID condition.

### Normalization selection

Let's assume in the following that we are interested in a pure selection, because we do not actually want to measure K2pi but only use it as normalization. In NA62 this process is necessary as we do not have an absolute measurement of the kaon flux (i.e. how many kaons enter our experiment). Instead we must use a well known normalization channel (such as K2pi), measure a flux based on a normalization selection (a pure selection for K2pi) and measure our signal channel relative to the normalization channel. We can then move from a relative value to an absolute value by using the K2pi BR as an external input (with its own uncertainties, which is why we want to use a channel with existing precision measurements).

As an example, the process for measuring the Ke3 branching ratio using K2pi as normalization is the following:
 - Perform the K2pi and Ke3 selections, estimate the K2pi selection acceptance $A_\text{k2pi}$ (on K2pi events) and Ke3 selection acceptance $A_\text{ke3}$ (on Ke3 events).
 - We know that in general the number of selected events is $N_\text{sel} = N_K\cdot\text{BR}\cdot A\cdot\varepsilon_\text{trigg}$ where $N_K$ is the kaon flux, BR is the branching fraction of the channel, $A$ is the selection acceptance, and $\varepsilon_\text{trigg}$ is the trigger efficiency.
 - Applying this to the K2pi normalization we have: $$N_K = \frac{N_\text{k2pi,sel}}{A_\text{k2pi}\cdot\text{BR(k2pi)}\cdot\varepsilon_\text{trigg,k2pi}}$$
 - We have the equivalent equation for Ke3, into which we can substitute $N_K$ estimated from K2pi: $$N_\text{ke3,sel} = N_K\cdot A_\text{ke3}\cdot\text{BR(ke3)}\cdot\varepsilon_\text{trigg,ke3}$$
$$\Rightarrow\text{BR(ke3)} = \frac{N_\text{ke3,sel}}{N_K\cdot A_\text{ke3}\cdot\varepsilon_\text{trigg,ke3}}$$
$$\Rightarrow\text{BR(ke3)} = \frac{N_\text{ke3,sel}}{N_\text{k2pi,sel}}\cdot\frac{A_\text{k2pi}}{A_\text{ke3}}\cdot\frac{\varepsilon_\text{trigg,k2pi}}{\varepsilon_\text{trigg,ke3}}\cdot\text{BR(k2pi)}$$
This last expression gives us our final measurement as a function of the external parameter BR(k2pi). As you can see, we also take ratios of equivalent quantities between Ke3 and K2pi. This means that, under the assumption that these quantities are obtained and applied in a similar way for both channels (selections are closely related, trigger lines are the same and applied under similar conditions, data are acquired at the same time, ...), they will have similar systematic uncertainties, which will cancel out in the ratio at first order.
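The final expression translates directly into a few lines of code. The sketch below uses made-up event counts and acceptances, and the function name is hypothetical. The trigger efficiencies default to 1 as a simplifying assumption. It also checks that the one-step ratio agrees with the two-step route through the kaon flux $N_K$.

```python
def br_from_normalization(n_sig: float, n_norm: float,
                          acc_sig: float, acc_norm: float,
                          br_norm: float,
                          eff_trig_sig: float = 1.0,
                          eff_trig_norm: float = 1.0) -> float:
    """BR(signal) = (N_sig/N_norm) * (A_norm/A_sig) * (eff_norm/eff_sig) * BR(norm)."""
    return (n_sig / n_norm) * (acc_norm / acc_sig) * (eff_trig_norm / eff_trig_sig) * br_norm

# Made-up inputs, for illustration only
n_ke3_sel, n_k2pi_sel = 500, 2000
acc_ke3, acc_k2pi = 0.05, 0.10
br_k2pi = 0.2

# One-step ratio
br_ke3 = br_from_normalization(n_ke3_sel, n_k2pi_sel, acc_ke3, acc_k2pi, br_k2pi)

# Two-step route: first the flux from the normalization channel, then the signal BR
n_kaons = n_k2pi_sel / (acc_k2pi * br_k2pi)
br_ke3_two_step = n_ke3_sel / (n_kaons * acc_ke3)

print(br_ke3, br_ke3_two_step)
```

Writing the measurement as a ratio makes it explicit that only the ratios of acceptances and trigger efficiencies enter, which is where the partial cancellation of systematic uncertainties comes from.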

In the following we will also assume that the trigger efficiencies are 100% (the real values are very high anyway, well above 90%). Please note also that the kaon flux computed is an **effective** flux, not the absolute one. It depends on the trigger line, on the trigger downscaling, on some geometry parameters, ...

In [None]:
# Let's define our complete K2pi selection
common_1track_2cluster_selection = [cond_1T2C, z_vertex_cond, 
                                    lkr_dtrack_cluster_cond("track1", "cluster1"), lkr_dtrack_cluster_cond("track1", "cluster2"), lkr_dclusters_cond, 
                                    cluster_energy_cond("cluster1"), cluster_energy_cond("cluster2"), 
                                    cda_cond, neutral_vtx_cond]

k2pi_selection = [lkr_pi_cond(which_track="track1"), muv3_not_mu_cond(which_track="track1"), rich_pi_cond(which_track="track1"), 
                  k2pi_mmiss2_cond, k2pi_inv_mass_cond]

Having defined our K2pi selection for normalization, we can count the selected candidates $N_\text{k2pi,sel}$ and compute the kaon flux $N_K$ from them.

In [None]:
# Select first common 1-track & 2-clusters sample so we can use it multiple times without re-selecting it
data_1T2C, mc_1T2C = select_all(data, {"k3pi": k3pi, "kmu2": kmu2, "k2pi": k2pi, "kmu3": kmu3, "ke3": ke3}, common_1track_2cluster_selection)

In [None]:
k2pi_mc_sel = hlf.select(mc_1T2C["k2pi"], k2pi_selection)
data_k2pi_sel = hlf.select(data_1T2C, k2pi_selection)
k2pi_acc = len(k2pi_mc_sel)/normalization_dict["k2pi"]
# The trailing factor 1.0 is the trigger efficiency, assumed to be 100%
k2pi_N_K = len(data_k2pi_sel)/(k2pi_acc * constants.kaon_br_map["k2pi"] * 1.0)

print(f"K2pi selection acceptance: {k2pi_acc:.2%}")
print(f"Number of K2pi candidates in data: {len(data_k2pi_sel)} candidates")
print(f"Kaon flux: {k2pi_N_K:.2e} kaons")
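As mentioned earlier, the flux should come with an uncertainty. A minimal sketch of the statistical part only is given below. It assumes a Poisson uncertainty on the number of selected data candidates and a binomial uncertainty on the MC acceptance, combined in quadrature. The uncertainty on the external BR and all systematic effects are ignored. The function name and the input numbers are hypothetical.

```python
import math
from typing import Tuple

def flux_with_stat_uncertainty(n_data_sel: int, n_mc_sel: int, n_mc_gen: int,
                               br: float) -> Tuple[float, float]:
    """Return (N_K, relative statistical uncertainty on N_K)."""
    acceptance = n_mc_sel / n_mc_gen
    flux = n_data_sel / (acceptance * br)
    # Poisson uncertainty on the data count
    rel_data = 1.0 / math.sqrt(n_data_sel)
    # Binomial uncertainty on the acceptance (n_mc_sel successes out of n_mc_gen trials)
    rel_acc = math.sqrt(acceptance * (1.0 - acceptance) / n_mc_gen) / acceptance
    return flux, math.hypot(rel_data, rel_acc)

# Made-up inputs, for illustration only
toy_flux, toy_rel_unc = flux_with_stat_uncertainty(
    n_data_sel=10_000, n_mc_sel=9_000, n_mc_gen=100_000, br=0.2)
print(f"Kaon flux: {toy_flux:.3e} +/- {toy_rel_unc:.2%} (stat.)")
```

In this toy case the data and MC statistics contribute about equally. In a real analysis one would typically generate enough MC events for the acceptance term to be negligible.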