# Data manipulation

This notebook describes the NA62 data made available to you and explains how to read and use them for some basic data analysis later on. Low-level functionality to help you use pandas dataframes is provided, and we will guide you through building higher-level functionality based on particle physics (manipulation of three- and four-momenta, invariant masses, ...).

Let's start by loading a small amount of data and looking at the variables that are provided.

In [2]:
# Import useful packages
import pandas as pd
import numpy as np
from na62 import prepare
from typing import List, Union

In [None]:
# Then load some data (just 10 events)
data = prepare.import_root_files(["run12450.root.10631"], total_limit=10)
data

We can see 10 events here, each containing many variables. Let's look at the variables in more detail.

## Information about the data structure

In [None]:
# This prints information about the dataframe structure
data.info()

The command above gives us the full list of variables with their respective data types.

Vectors are important variables. The flat data structure cannot contain a vector object, so each vector is spread across four variables, `[name]_direction_{x,y,z}` and `[name]_momentum_mag`, containing respectively the direction and the magnitude of the momentum (the direction vector is a unit vector, i.e. it has magnitude 1). `[name]` indicates the object to which the momentum refers.
As can be seen above, the data structure contains information about:
 - The event: run number, burst number, event_time. These three values uniquely identify an event in the NA62 data: no two events share the same triplet of values.
 - The beam: momentum, position at z = 102.4 m.
 - Placeholders for three tracks (track1, track2, track3): for each track, a variable indicates whether the track exists for the event. If the track exists, information is filled in about its momentum, time and charge, whether the track has an associated MUV3 signal, RICH information (hypothesis, ring radius, position and number of hits), and the associated LKr energy. For convenience, the EoP (energy over momentum) is already calculated and stored.
 - Similar placeholders for up to two clusters (cluster1, cluster2). If present, these clusters of energy in the LKr are not associated with any of the tracks, and the variables giving their energy, position on the LKr front plane (na62.constants.lkr_position), and time are filled.
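
The uniqueness of the (run, burst, event time) triplet described above can be checked directly with pandas. The sketch below uses a small hand-built dataframe; the column names `run`, `burst`, `event_time` and `track1_exists` are assumptions chosen for illustration, so check the output of `data.info()` for the actual names.

```python
import pandas as pd

# Toy data: column names and values are assumptions for illustration only.
toy = pd.DataFrame({
    "run": [12450, 12450, 12450],
    "burst": [1, 1, 2],
    "event_time": [100.5, 200.25, 100.5],
    "track1_exists": [True, False, True],
})

# The three variables that together identify an event.
key = ["run", "burst", "event_time"]

# duplicated() flags rows whose key triplet already appeared in an earlier row.
n_duplicates = int(toy.duplicated(subset=key).sum())
print(f"Duplicated event keys: {n_duplicates}")

# A boolean "exists" column can be used as a mask to keep only the events
# in which the first track placeholder is filled.
with_track1 = toy[toy["track1_exists"]]
print(f"Events with a first track: {len(with_track1)}")
```

The same two calls should work on the real dataframe once the assumed column names are replaced by the ones listed by `data.info()`.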

In addition, each event was pre-identified, and the variable `event_type` indicates which kaon decay channel was detected. Similarly, the `[name]_rich_hypothesis` variables indicate the most likely particle type for each measured track.

This is the basic information reconstructed from (part of) the NA62 detector, and it will be enough for the kind of analysis that we want to do here. However, we need to be able to combine this information according to the mathematical and physical principles that you know. The dataframe does not provide such a facility, so you have to develop it yourself as an exercise.

# Exercises
Each exercise will ask you to implement some functions to manipulate the data. The inputs and outputs of the functions are well defined. You can pass each function to a test suite that will tell you whether the implementation is correct.
The test suite is available through the `tests` module of the `na62` package.

# Three-vector operations
The first functions that will be needed, and that are not provided by the dataframe, are vector operations (sum, product, magnitude). We ask you to fill in the functions below to provide this functionality.

You can assume that the dataframes passed as arguments (as a list or as a single dataframe, depending on the function) contain the following variables: `direction_{x,y,z}` and `momentum_mag`.
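
The functions below expect un-prefixed columns, whereas the full dataframe stores each vector under a `[name]_` prefix. As a minimal sketch of how the two could be bridged, the helper `extract_vector` below selects the four vector columns of one object and strips the prefix, and `cartesian_components` scales the unit direction by the magnitude. Both function names are hypothetical (they are not part of the `na62` package), and the prefixed column names are assumed from the description of the data structure.

```python
import pandas as pd

VECTOR_COLUMNS = ["direction_x", "direction_y", "direction_z", "momentum_mag"]


def extract_vector(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Select the vector columns of one object and strip the '<name>_' prefix.

    Hypothetical helper for illustration, not part of the na62 package.
    """
    prefixed = [f"{name}_{col}" for col in VECTOR_COLUMNS]
    return df[prefixed].rename(columns=dict(zip(prefixed, VECTOR_COLUMNS)))


def cartesian_components(vector: pd.DataFrame) -> pd.DataFrame:
    """Scale the unit direction by the magnitude to obtain (px, py, pz)."""
    directions = vector[["direction_x", "direction_y", "direction_z"]]
    # Multiply each row of the direction by that row's magnitude.
    components = directions.mul(vector["momentum_mag"], axis=0)
    components.columns = ["px", "py", "pz"]
    return components


# Toy event with a single track; the values are made up for illustration.
toy = pd.DataFrame({
    "track1_direction_x": [0.0],
    "track1_direction_y": [0.6],
    "track1_direction_z": [0.8],
    "track1_momentum_mag": [10.0],
})

track1 = extract_vector(toy, "track1")
momentum = cartesian_components(track1)
print(momentum)
```

Working with Cartesian components is one convenient intermediate step, since component-wise operations on vectors are simple in that form; the exercises still require the results to be converted back to the direction-plus-magnitude format.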

In [16]:
def three_vector_sum(vectors: List[pd.DataFrame]) -> pd.DataFrame:
    # Check that there are any vectors to sum
    if len(vectors) == 0:
        return pd.Series()

    # [FILL HERE]
    # The code below should sum all the three-vectors and return
    # a new dataframe containing the summed vector in the same format
    # as the input (the variables "direction_x", "direction_y", "direction_z", "momentum_mag").
    # Make sure that the direction vector is a unit vector.

    return # [SOMETHING]


def three_vector_mag(vector: pd.DataFrame) -> pd.Series:
    # [FILL HERE]
    # Return the magnitude of the three-vector

    return # [SOMETHING]

In [None]:
# Perform the tests
from na62.tests.test_vectors import Test_ThreeVector
Test_ThreeVector().run_tests(three_vector_sum, three_vector_mag)