# Learn how to use hax

Aalbers and Tunnell, February 2016

This tutorial describes how to use the basic functionality of the `hax` analysis library from Python.

# Table of Contents
 <p><div class="lev1"><a href="#Learn-how-to-use-hax"><span class="toc-item-num">1 - </span>Learn how to use hax</a></div><div class="lev2"><a href="#Introduction"><span class="toc-item-num">1.1 - </span>Introduction</a></div><div class="lev2"><a href="#Boilerplate-startup"><span class="toc-item-num">1.2 - </span>Boilerplate startup</a></div><div class="lev1"><a href="#Using-mini-trees"><span class="toc-item-num">2 - </span>Using mini-trees</a></div><div class="lev2"><a href="#Using-standard-data"><span class="toc-item-num">2.1 - </span>Using standard data</a></div><div class="lev2"><a href="#Selecting-your-own-variables"><span class="toc-item-num">2.2 - </span>Selecting your own variables</a></div><div class="lev1"><a href="#Looping-over-ROOT-files"><span class="toc-item-num">3 - </span>Looping over ROOT files</a></div><div class="lev1"><a href="#Selecting-datasets"><span class="toc-item-num">4 - </span>Selecting datasets</a></div>

## Introduction

The default pax output format is a ROOT file containing an event class, which is fully documented [here](http://xenon1t.github.io/pax/format.html). While you could analyze this with TTree.Draw (in python or C++), this has several disadvantages:
  * You never get access to the actual values of the data;
  * It is difficult to compute things that are not directly in the tree;
  * You need to re-loop over the data every time you want to make a new plot.

You can get much more flexibility by looping over the events in the ROOT file(s) you want to analyze (we will cover this in section 3 below). However, looping over all the ROOT events every time you want to adjust a plot or cut is still very inconvenient.

To extract data, hax lets you make **mini-trees**: small, flat ROOT files that contain just the data you need for every event. Because a mini-tree is a simple tabular structure, Python analyses can read it into a [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/10min.html), one of the most-used objects in data science. Since this is the fastest way to the data, we'll cover mini-trees in section 2.
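To give a feel for why a flat table is convenient, here is a minimal sketch that uses a synthetic DataFrame in place of a real mini-tree. The column names `s1` and `s2`, the value ranges, and the cut thresholds are made up for illustration and are not taken from hax. It shows a derived quantity and a cut, both applied without looping over events again.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a mini-tree: one row per event, one column per variable.
# Column names and value ranges are assumptions made for this illustration.
rng = np.random.RandomState(0)
events = pd.DataFrame({
    's1': rng.uniform(1, 100, size=1000),
    's2': rng.uniform(100, 10000, size=1000),
})

# Compute a quantity that is not stored in the table itself.
events['log_s2_over_s1'] = np.log10(events['s2'] / events['s1'])

# Apply a cut with a boolean mask; the underlying events are not re-read.
selected = events[(events['s1'] > 10) & (events['s2'] > 500)]
print(len(selected), 'of', len(events), 'events pass the cut')
```

Changing the cut only means re-evaluating the last two lines, which is the main practical gain over re-looping over ROOT files.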

## Boilerplate startup

A Jupyter notebook is a complete application, so you always need some startup code, which is usually the same across notebooks. In the future we will hide this (since it isn't very exciting), but in case you are interested we go through it here.

First, we import the math library [numpy](https://en.wikipedia.org/wiki/NumPy)
and the plotting library [matplotlib](https://en.wikipedia.org/wiki/Matplotlib):

In [1]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

The command below makes matplotlib plots actually appear in the notebook. By default they pop up in a new window, which is inconvenient and does not work at all if you are accessing a notebook server remotely.

In [2]:
%matplotlib inline 

Next, we set up a few more sensible defaults for matplotlib. In the future we can expand this to our very own XENON plot style...

In [3]:
matplotlib.rc('font', size=16)                   # Use big fonts...
plt.rcParams['figure.figsize'] = (12.0, 10.0)    # ... and big plots
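One possible way to grow these defaults into a shared plot style is to collect them in a dictionary and apply them in one call. The sketch below is only an illustration: the name `XENON_STYLE` is hypothetical, and it contains nothing beyond the two settings from the cell above.

```python
import matplotlib

# Hypothetical shared style dictionary; it holds the same two settings
# as the cell above, expressed as rcParams keys.
XENON_STYLE = {
    'font.size': 16,                   # Use big fonts...
    'figure.figsize': (12.0, 10.0),    # ... and big plots
}

# Apply all settings at once to the global matplotlib configuration.
matplotlib.rcParams.update(XENON_STYLE)
```

Keeping the settings in one dictionary makes it easy to reuse the same style across notebooks.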

Now it is time to import `hax`. This may give a "ShimWarning" which you can ignore (we're trying to get rid of this):

In [4]:
import hax



In [5]:
hax.config.CONFIG['main_data_paths'].append('/cfs/klemming/projects/xenon/common/PaxReprocessed_9/good')
hax.runs.update_datasets()

Finally, you may want to set some special hax configuration options. For example, use:

    hax.config.CONFIG['main_data_paths'].append('/path/to/my/secret/data')
    hax.runs.update_datasets()

to add `/path/to/my/secret/data` to the paths hax searches for datasets, as was done for the reprocessed data in the cell above.
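When adding paths, note the difference between `append` and `extend` on a Python list. The sketch below uses a plain dictionary as a stand-in for `hax.config.CONFIG`; it assumes `main_data_paths` is an ordinary list of path strings, which the `append` call with a string in cell 5 suggests but which is not confirmed here. All paths are placeholders.

```python
# Stand-in for hax.config.CONFIG (assumption: 'main_data_paths' is a list of strings).
config = {'main_data_paths': ['/default/data']}

# append adds exactly one entry, so pass a single path string.
config['main_data_paths'].append('/path/to/my/secret/data')

# extend adds every element of an iterable, so use it for several paths at once.
config['main_data_paths'].extend(['/path/a', '/path/b'])

# Every entry is still a string. Calling append with a list instead would
# have nested a list inside the list of paths.
assert all(isinstance(p, str) for p in config['main_data_paths'])
print(config['main_data_paths'])
```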

# Using mini-trees

## Using standard data

For basic analyses, you need only some very basic data (s1, s2, positions, etc). You can load this in as follows:

In [8]:
#dataset = 'xe100_120402_2000'
dataset = 'xe100_111002_2248'
data = hax.minitrees.load(dataset)

                                                       

Created minitree Basics for dataset xe100_111002_2248




You have now loaded in a dataframe with info from the 'Basics' minitree. The **variables and units are documented [here](http://hax.readthedocs.org/en/latest/hax.treemakers.html)**; we can get an overview like so: