This notebook describes how to use the provided tools to interface with the data. It covers importing the tools, retrieving the data, and opening the data within a notebook.
<br>
<br>

# Importing the tools and data
[Video Tutorial on how to import tools into Jupyter](https://www.youtube.com/watch?v=dWzWwhLmJgw)
<br>
<br>
If you have worked with Python notebooks before, you are probably familiar with common general-purpose libraries such as `numpy`, `math`, and `matplotlib`. You may not use all of their functionality in every activity, but they are very useful to have available. We can bring them in with the standard `import` command.



In [None]:
import numpy as np
import math
import matplotlib.pyplot as plt   # pyplot is the recommended plotting interface (pylab is discouraged)
%matplotlib notebook

Now we need the particle-physics-specific libraries. If you installed them from the command shell as shown in the [local setup tutorial](https://www.youtube.com/watch?v=oQXFZU9RuCY), then most of the work needed to use them is already done. From here it should be as simple as using the `import` command, just as for the general-purpose libraries.
<br>
<br>
You will also need to bring in the file downloading tool from the `file_download_tools.py` file included in this directory. Its `download_file` function grabs files from the web and downloads them for use in these notebooks.

In [None]:
import h5hep 
import pps_tools as hep

from file_download_tools import download_file

With all of the tools imported, we now need our data files. While there are a few small test files included in the `playground/data` directory already, most of the files you will need for the activities are located on [this webpage](http://www.sos.siena.edu/~mbellis/ppp_data). To download these files into the `data` folder, we use the `download_file` function we imported. Here, we use a file from the CMS top quark analysis as an example.

In [None]:
url = 'http://www.sos.siena.edu/~mbellis/ppp_data/mc_ww.hdf5'
download_file(url)
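Because the file is downloaded in one cell and opened by path in another, it can help to derive the local path from the URL so the two always agree. The sketch below assumes that `download_file` saves into `../data` under the file name at the end of the URL; this is inferred from the path opened in the next cell, not from the tool's documentation. `local_path_for` is a hypothetical helper.

```python
import os
from urllib.parse import urlparse

# URL used in the download cell above
url = 'http://www.sos.siena.edu/~mbellis/ppp_data/mc_ww.hdf5'

# ASSUMPTION: download_file stores files in ../data using the URL's file name
data_dir = os.path.join('..', 'data')

def local_path_for(url, data_dir=data_dir):
    """Hypothetical helper: the local path a downloaded file is expected to have."""
    filename = os.path.basename(urlparse(url).path)   # e.g. 'mc_ww.hdf5'
    return os.path.join(data_dir, filename)

infile = local_path_for(url)
print(infile)
```

You could then pass `infile` to `h5hep.load` instead of retyping the path.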

You'll notice that the files we are using are in the `.hdf5` format. HDF5 (Hierarchical Data Format, version 5) is commonly used in particle physics and other sciences to store large, structured datasets, optionally with compression. To unpack these files into a form Python can work with, we use the `h5hep` tools:

In [None]:
data,event = h5hep.load('../data/mc_ww.hdf5')

Now you should have all the tools you need to start interfacing with and analyzing the data. 
<br>
<br>
# Interfacing with the data
[Video overview on data interfacing](https://www.youtube.com/watch?v=tI2foOcuRVM)
<br>
## Reading data and navigating lists
The ```h5hep``` tools we used to unpack the data files put the data into lists that are accessed through dictionary keys. You can view all of the keys the data is tagged with using the following command. This step is entirely optional; it simply gives you an idea of what kind of data the file contains.

In [None]:
# Print the keys to see what is in the dictionary   OPTIONAL
for key in event.keys():
    print(key)
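If the key list is long, grouping it can make it easier to read. The sketch below assumes, as an unverified guess, that keys look like `group/property`; check the output of the cell above before relying on it. The key names in `example_keys` are hypothetical, and `group_keys` is a hypothetical helper.

```python
from collections import defaultdict

# HYPOTHETICAL key names; replace with list(event.keys()) from your own file
example_keys = ['njets', 'jets/e', 'jets/px', 'jets/py', 'jets/pz',
                'nmuons', 'muons/e', 'muons/px']

def group_keys(keys):
    """Group keys of the (assumed) form 'group/property' by their group name."""
    groups = defaultdict(list)
    for key in keys:
        if '/' in key:
            group, prop = key.split('/', 1)
            groups[group].append(prop)
        else:
            groups['(other)'].append(key)   # keys without a '/' separator
    return dict(groups)

for group, props in group_keys(example_keys).items():
    print(group, props)
```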

To organize the data in a way that makes it easy to find what you need, you will use the ```hep``` tools we imported. This can be done in two ways.
<br>
### Simple way
The first way uses fewer commands, but gives you less control over how much data you use. This command organizes ALL of the data, so if you are using a large data file, it can take a long time. However, it is usually the best choice if you want to use all of the data.
<br>
<br>
NOTE: This command will need to change depending on which experiment your activity is aligned with. For instance, the top quark activity uses CMS data, so for the ```experiment``` argument in ```get_collisions```, we put ```'CMS'```. The other possible values are ```'CLEO'``` and ```'BaBar'```.

In [None]:
infile = '../data/mc_ww.hdf5'

collisions = hep.get_collisions(infile,experiment='CMS',verbose=False)
print(len(collisions), " collisions")  # This line is optional, and simply tells you how many events are in the file.

This returns a list called ```collisions``` with one entry per collision event. Each event is a dictionary whose keys are the different types of particles involved in that collision (for example, ```'jets'```). Each particle type holds a list with one entry per individual particle of that type, and each particle is in turn a dictionary of its four-momentum and other characteristics (for example, ```'e'``` for energy).
<br>
This structure can be confusing until you get used to it, so let's walk through an example:

In [None]:
second_collision = collisions[1]        # the second event (indexing starts at 0)
print("Second event: ", second_collision)
all_jets = second_collision['jets']     # all of the jets in the second event
print("All jets: ", all_jets)
first_jet = all_jets[0]                 # the first jet in the second event (IndexError if the event has no jets)
print("First jet: ", first_jet)
jet_energy = first_jet['e']             # the energy of that jet
print("First jet's energy: ", jet_energy)
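To experiment with this nesting without loading a file, you can build a tiny stand-in by hand. Everything below is hypothetical: only the `'jets'` and `'e'` names come from this notebook, while the `'muons'` key and all energy values are invented, and real particle entries carry more properties than just `'e'`.

```python
# HYPOTHETICAL hand-made stand-in for the list returned by get_collisions
example_collisions = [
    {                                   # first event (index 0)
        'jets':  [{'e': 45.2}],
        'muons': [],
    },
    {                                   # second event (index 1)
        'jets':  [{'e': 80.5}, {'e': 33.1}],
        'muons': [{'e': 25.0}],
    },
]

# list of events -> dict of particle types -> list of particles -> dict of properties
example_event = example_collisions[1]   # events: indexed by position
example_jets = example_event['jets']    # particle types: indexed by name
example_jet = example_jets[0]           # particles: indexed by position
print(example_jet['e'])                 # properties: indexed by name; prints 80.5
```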

Notice that individual events are accessed from ```collisions``` by their entry number (position in the list), and individual particles are accessed the same way from their particle-type lists. Particle types, however, can only be accessed from an event by their names, and the characteristics of each particle can only be accessed by the name of the characteristic. You can find the exact names to use by printing the keys of ```event``` as shown above.
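Since particle types are dictionary keys, you can loop over them directly, and since an event may contain no particles of a given type, it is worth checking before indexing `[0]`. This is a minimal sketch on a hypothetical event: the `'muons'` key, the energies, and the helper `first_energy` are invented for illustration.

```python
# HYPOTHETICAL single event laid out like one entry of `collisions`
sample_collision = {
    'jets':  [{'e': 80.5}, {'e': 33.1}],
    'muons': [],
}

# .items() gives each particle type together with its list of particles
for particle_type, particles in sample_collision.items():
    print(particle_type, len(particles), "particle(s)")

def first_energy(collision, particle_type):
    """Energy of the first particle of a given type, or None if there is none."""
    particles = collision.get(particle_type, [])   # a missing type acts like an empty list
    if len(particles) == 0:
        return None
    return particles[0]['e']

print(first_energy(sample_collision, 'jets'))    # 80.5
print(first_energy(sample_collision, 'muons'))   # None
```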

Because ```get_collisions``` puts ALL of the data in one list, you can take everything you need for your analysis from that list. For instance, to find the energies of all the jets in the entire list of collisions, we can use nested loops:

In [None]:
energies = []

for collision in collisions:      # loops over all the events in the file
    jets = collision['jets']      # gets the list of all jets in the event

    for jet in jets:              # loops over each jet in the current event
        e = jet['e']              # gets the energy of the jet

        energies.append(e)        # adds the energy to the list
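The nested loop above can also be written as a single list comprehension with the same two levels (events outside, jets inside). The sketch runs on a small hypothetical `toy_collisions` list so it works on its own; in the notebook you would use `collisions` instead.

```python
# HYPOTHETICAL stand-in for `collisions` so this cell runs on its own
toy_collisions = [
    {'jets': [{'e': 45.2}]},
    {'jets': [{'e': 80.5}, {'e': 33.1}]},
    {'jets': []},                      # an event with no jets contributes nothing
]

# Outer loop over events, inner loop over the jets in each event
toy_energies = [jet['e'] for collision in toy_collisions for jet in collision['jets']]

print(len(toy_energies), "jets")
print("Mean jet energy:", sum(toy_energies) / len(toy_energies))
```

The explicit loop is easier to extend (for example, adding cuts), while the comprehension is more compact; both produce the same list.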

### More involved way
Alternatively, you can use the following series of commands to organize the data. It is a little more involved, but gives you more control over the data. For instance, it lets you use only some of the events rather than all of them, which can also reduce computing time.



In [None]:
infile = '../data/mc_ww.hdf5' 

alldata = hep.get_all_data(infile,verbose=False)
nentries = hep.get_number_of_entries(alldata)

print("# entries: ",nentries)   # This optional line tells you how many events are in the file


The above commands do not yet make the data directly usable; we need one more step for that, which is the ```get_collision``` function. Unlike the ```get_collisions``` function used in the simpler method, it pulls out the information for only a single event. To get information from multiple events, you therefore call it inside a loop, and the range of that loop determines which events you actually use.
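Conceptually, based on the description in this notebook, collecting `get_collision` over every entry gives the same list that `get_collisions` returns, while calling it for only some entries gives a subset. The mock functions below are hypothetical stand-ins, NOT the `pps_tools` functions; they only mimic the described behavior on a toy list.

```python
# HYPOTHETICAL toy events; not read from any real file
toy_events = [{'jets': [{'e': 45.2}]}, {'jets': [{'e': 80.5}]}, {'jets': []}]

def mock_get_collision(all_events, entry_number):
    """Mock: return one event, as get_collision is described to do."""
    return all_events[entry_number]

def mock_get_collisions(all_events):
    """Mock: return every event at once, as get_collisions is described to do."""
    return [mock_get_collision(all_events, i) for i in range(len(all_events))]

# Working one entry at a time lets you choose which events to process
first_two = [mock_get_collision(toy_events, i) for i in range(2)]
print(len(mock_get_collisions(toy_events)), len(first_two))
```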
<br>
<br>
NOTE: Depending on which activity you are doing, you will have to change the ```experiment``` argument to ```'CMS'```, ```'CLEO'```, or ```'BaBar'```. You will also need to set the ```entry_number``` argument to the loop variable.



In [None]:
for entry in range(nentries):                 # This range loops over ALL of the events
    collision = hep.get_collision(alldata,entry_number=entry,experiment='CMS')

for entry in range(0, nentries // 2):         # This range loops over the first half of the events
    collision = hep.get_collision(alldata,entry_number=entry,experiment='CMS')

for entry in range(nentries // 2, nentries):  # This range loops over the second half of the events
    collision = hep.get_collision(alldata,entry_number=entry,experiment='CMS')
    
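If you often switch between fractions of the file, a small helper can build these ranges for you. `entry_range` is a hypothetical helper, not part of `pps_tools`; it truncates toward zero, so consecutive fractions such as 0 to 0.5 and 0.5 to 1 cover every entry exactly once, though floating-point products like `0.29 * 100` can land one entry lower than you might expect.

```python
def entry_range(nentries, start_fraction=0.0, stop_fraction=1.0):
    """Hypothetical helper: range of entry numbers covering a fraction of the file."""
    if not 0.0 <= start_fraction <= stop_fraction <= 1.0:
        raise ValueError("fractions must satisfy 0 <= start <= stop <= 1")
    start = int(start_fraction * nentries)
    stop = int(stop_fraction * nentries)
    return range(start, stop)

# Example with a made-up number of entries
print(list(entry_range(10, 0.0, 0.5)))   # first half:  [0, 1, 2, 3, 4]
print(list(entry_range(10, 0.5, 1.0)))   # second half: [5, 6, 7, 8, 9]
```

In the notebook this would replace the explicit ranges, e.g. `for entry in entry_range(nentries, 0.0, 0.5):` followed by the same `get_collision` call.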

Apart from retrieving only one event at a time, `get_collision` organizes the information the same way `get_collisions` does. You can work with its output exactly as you would with any individual event from the full list that `get_collisions` returns.

For instance, finding the energies of all jets in the selected events looks very similar to the simpler method. Because we already loop over each event to call `get_collision`, we can nest the rest of our code inside that same loop.

In [None]:
energies = []

# 'entry' rather than 'event', so the event dictionary loaded with h5hep is not overwritten
for entry in range(0, nentries // 3):     # loops over the first third of all events

    collision = hep.get_collision(alldata,entry_number=entry,experiment='CMS')    # organizes the data so you can interface with it
    jets = collision['jets']               # gets the list of all jets in the current event

    for jet in jets:                       # loops over all jets in the event
        e = jet['e']                       # gets the energy of the jet

        energies.append(e)                 # adds the energy to the list
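A list like `energies` is usually the input to a histogram, for example `plt.hist(energies, bins=50)` using the `plt` imported at the top of this notebook. To show what that binning does, here is a stdlib-only sketch; `bin_counts` is a hypothetical helper and the energies are invented. Unlike `plt.hist`, whose last bin includes its upper edge, it treats every bin as half-open.

```python
def bin_counts(values, lo, hi, nbins):
    """Count values in nbins equal-width bins covering [lo, hi); others are skipped."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / width), nbins - 1)   # guard against float round-up
            counts[index] += 1
    return counts

# HYPOTHETICAL jet energies standing in for the `energies` list filled above
toy_jet_energies = [45.2, 80.5, 33.1, 12.0]
print(bin_counts(toy_jet_energies, 0.0, 100.0, 4))   # [1, 2, 0, 1]
```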