# 1. Getting Started

The EIC Detector 1 collaboration will provide full simulation data files in the ROOT data format, based on the EDM4hep data model, through the Jefferson Lab XRootD service. This allows analysis without downloading any data files. For this tutorial, we use a very similar dataset from the ATHENA proto-collaboration, as the software toolkit for EIC Detector 1 is still being defined.

In this notebook we show how to load a file from the XRootD service using the [uproot](https://pypi.org/project/uproot/) Python library. This allows for seamless interfacing with many data science and machine learning tools.

## Importing uproot

Depending on the versions of uproot and XRootD that you have installed, you may encounter a warning from uproot below. Because the ROOT files use a simple data format, this warning can be ignored.

In [None]:
import uproot as ur
print('Uproot version: ' + ur.__version__)

## Opening a file with uproot

To test uproot, we will open a sample file (a single-particle simulation of interest to those who wish to study detector performance):

In [None]:
server = 'root://sci-xrootd.jlab.org//osgpool/eic/'
file = 'ATHENA/RECO/master/SINGLE/pi+/1GeV/3to50deg/pi+_1GeV_3to50deg.0001.root'

In [None]:
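# Open the 'events' tree directly; the output library is chosen later,
# when arrays are read (it is an argument of arrays(), not of open()).
events = ur.open(server + file + ':events')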

## Exploring the file contents

We can now look into the file, including all its branches. Let's take a look at the possible 'keys':

In [None]:
events.keys()

That is a lot of branches!

Maybe we are only interested in a few branches. Let's look at the branch with particles reconstructed by the track reconstruction algorithms:

In [None]:
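# Pass the filter by keyword, since newer uproot versions may not accept it positionally
events.keys(filter_name='ReconstructedParticles.*')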
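
To see which names a filter such as `'ReconstructedParticles.*'` selects, the sketch below applies the same pattern to a short list of branch names using the standard-library `fnmatch` module. It assumes that uproot treats a filter string containing `*` as a glob pattern. The branch names in the list are hypothetical and only serve as an illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical branch names, for illustration only; the real file has many more.
branch_names = [
    "ReconstructedParticles",
    "ReconstructedParticles.momentum",
    "ReconstructedParticles.energy",
    "mcparticles",
    "mcparticles.pdgID",
]

# Glob pattern: the literal prefix "ReconstructedParticles." followed by anything.
pattern = "ReconstructedParticles.*"

# Keep only the names that match the pattern (case-sensitive).
matches = [name for name in branch_names if fnmatchcase(name, pattern)]
print(matches)
```

Note that the pattern requires the literal dot, so it selects the sub-branches but not a name that is exactly `ReconstructedParticles`.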

## Making a simple plot

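Of course, we came here to create plots, not just look at branches. Uproot can return the data from branches as arrays: by default these are `awkward` arrays, which can hold a variable number of particles per event, and `numpy` arrays are available on request. From there, we can use `matplotlib` to create a histogram. Let's do this with the momentum.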

In [None]:
reconstructed_particles = events['ReconstructedParticles'].arrays()

If you are running this on a Jupyter instance that displays memory use (for example on Binder), you will see that the previous step increases memory use. This is important to keep in mind: since you are accessing files that are, in some cases, several GB in size, you will likely want to avoid reading all arrays from an entire file into memory, even on regular servers.

We can of course have multiple detected particles per event, even for a single-particle final state:
* 0 particles $\rightarrow$ we did not detect this particle
* 1 particle $\rightarrow$ most likely the particle we generated
* 2+ particles $\rightarrow$ this might be due to our particle interacting with the detector material!

To make a histogram of all particles in all events, we need to "flatten" the array: we go from a 2D array (one list of N particles per event, where N varies from event to event) to a 1D array of all particles. This is accomplished with the `flatten` function from the `awkward` package.

In [None]:
import awkward as ak
recmom = ak.flatten(reconstructed_particles['ReconstructedParticles.momentum'])

In [None]:
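import matplotlib.pyplot as plt
# Histogram of the reconstructed momentum of all particles in all events
plt.hist(recmom, bins=100)
plt.xlabel('Reconstructed momentum (GeV)')  # assumes momenta are stored in GeV
plt.ylabel('Particles')
plt.show()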