# Getting Started

The ATHENA collaboration provides full simulation data files in the ROOT data format through the Jefferson Lab XRootD service. This allows analysis without downloading any data files.

In this notebook we show how to load a file from the XRootD service using the [uproot](https://pypi.org/project/uproot/) Python library. This allows for seamless interfacing with many data science and machine learning tools.

## Importing uproot

Depending on the versions of uproot and XRootD that you have installed, you may encounter a warning from uproot below. Because the ATHENA ROOT files use a simple data format, this warning can be ignored.

In [None]:
import uproot as ur
print('Uproot version: ' + ur.__version__)

## Opening a file with uproot

To test uproot, we will open a sample file (a single-particle simulation of interest to those who wish to study detector performance):

In [None]:
server = 'root://sci-xrootd.jlab.org//osgpool/eic/'
file = 'ATHENA/RECO/master/SINGLE/pi+/1GeV/3to50deg/pi+_1GeV_3to50deg.0001.root'

In [None]:
events = ur.open(server + file + ':events', library = 'np')
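
The string passed to `ur.open` has three parts: the XRootD server prefix, the path of the file on that server, and, after a colon, the name of the object to read from the file (here the `events` tree). A minimal sketch of that composition follows, using a hypothetical helper `build_url` that is not part of uproot and only does string handling:

```python
def build_url(server, path, tree=None):
    """Join an XRootD server prefix, a file path and an optional tree name.

    Hypothetical helper for illustration; it performs no I/O.
    """
    # Avoid accidentally doubling or dropping the slash between the two parts
    url = server.rstrip('/') + '/' + path.lstrip('/')
    # Appending ':name' asks uproot to return that object instead of the file
    if tree is not None:
        url += ':' + tree
    return url


server = 'root://sci-xrootd.jlab.org//osgpool/eic/'
file = 'ATHENA/RECO/master/SINGLE/pi+/1GeV/3to50deg/pi+_1GeV_3to50deg.0001.root'
url = build_url(server, file, tree='events')
print(url)
```

The resulting string is identical to `server + file + ':events'` as used above; the helper only guards against a missing or duplicated slash when other files are substituted.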

## Exploring the file contents

We can now look into the file, including all its branches. Let's take a look at the possible 'keys':

In [None]:
events.keys()

That is a lot of branches!

Maybe we are only interested in a few branches. Let's look at the branch with particles reconstructed by the track reconstruction algorithms:

In [None]:
events.keys('ReconstructedParticles.*')
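
The argument to `keys` here is a glob-style pattern, so `ReconstructedParticles.*` selects branches whose names begin with `ReconstructedParticles.`. The sketch below mimics that matching with Python's standard `fnmatch` module on a made-up list of branch names. The names are placeholders for illustration, not the actual contents of the file.

```python
from fnmatch import fnmatchcase

# Placeholder branch names for illustration only; the real file has many more
branch_names = [
    'ReconstructedParticles',
    'ReconstructedParticles.momentum',
    'ReconstructedParticles.energy',
    'SomeOtherCollection.value',
]

pattern = 'ReconstructedParticles.*'
# In a glob pattern '*' matches any run of characters and '.' is a literal dot
selected = [name for name in branch_names if fnmatchcase(name, pattern)]
print(selected)
```

Note that the bare name `ReconstructedParticles` is not selected in this sketch, because the pattern requires a literal dot after the collection name.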

## Making a simple plot

Of course, we came here to create plots, not just look at branches. Uproot can give us the data from branches in `numpy` arrays. From there, we can use `matplotlib` to create a histogram. Let's do this with the momentum.

In [None]:
reconstructed_particles = events['ReconstructedParticles'].arrays()

If you are running this on a Jupyter instance that displays memory use (for example on Binder), you will see that the previous step increases memory use. This is important to keep in mind: since some of the files you are accessing are several GB in size, you will likely want to avoid reading all arrays from an entire file into memory, even on regular servers.
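
One way to limit memory use is to read a single branch, and only a range of entries, rather than every array in a collection. The sketch below wraps this in a hypothetical helper, `read_branch_head`. It relies on the `entry_stop` and `library` arguments of uproot's `array` method; the branch name and the entry count in the example call are illustrative assumptions. Nothing is read until the function is called.

```python
def read_branch_head(url, branch, n_entries=1000):
    """Read only the first n_entries of one branch from a remote tree.

    Hypothetical helper: url is expected in 'file.root:tree' form.
    """
    import uproot as ur  # imported here so defining the helper has no side effects

    events = ur.open(url)
    # entry_stop limits how many events are read from the branch
    return events[branch].array(entry_stop=n_entries, library='np')


# Example call (requires network access to the XRootD server):
# momentum = read_branch_head(server + file + ':events',
#                             'ReconstructedParticles.momentum', n_entries=1000)
```

Raising `n_entries` step by step while watching the memory display is a simple way to judge how much of a file fits comfortably on a given machine.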

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.hist(reconstructed_particles['ReconstructedParticles.momentum'])
plt.show()
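
A branch that stores a collection per event typically holds a variable-length list of values for each event, whereas a histogram needs one flat sequence. Assuming that is the case for the momentum branch, the sketch below shows the flattening step on made-up numbers, with plain Python lists standing in for the per-event arrays. With Awkward Arrays, `ak.flatten` does the same job.

```python
from itertools import chain


def flatten_events(per_event_values):
    """Concatenate per-event lists of values into one flat list."""
    return list(chain.from_iterable(per_event_values))


# Made-up momenta for three events with 1, 2 and 0 reconstructed particles
per_event_momentum = [[0.98], [1.01, 0.35], []]
flat = flatten_events(per_event_momentum)
print(flat)

# The flat list can then be histogrammed, for example:
# plt.hist(flat, bins=50)
# plt.show()
```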