<a href="https://colab.research.google.com/github/mjsully/ATLAS-Open-Data-notebook/blob/master/ATLAS_Open_Data_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Simple ATLAS Open Data analysis comparing the production of a top-antitop pair from gluon-gluon fusion with production from the decay of a BSM Z'.

#### The following cells import the required libraries and define the helper functions used in the rest of the notebook.

In [0]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import io

In [0]:
def applyselections(df):
  
  # These are your selections. Write each one in the form "(df.YOUR_VARIABLE OPERATOR VALUE)", where OPERATOR is >, >=, ==, <= or <.
  # Combine selections with & (and) or | (or), keeping each one inside its own parentheses.
  temp_df = df[ (df.jet_n > 3) & ((df.trigE == 1) | (df.trigM == 1)) & (df.lep_n == 1) & (df.lep_trigMatched == 1)]
  return temp_df
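
The way the selection mask is built can be checked on a small table before running it on the full samples. The sketch below is illustrative only: it reuses the column names from `applyselections`, but the four events in `toy_df` are made up.

```python
import pandas as pd

# Toy events: column names follow the selection above, the values are invented.
toy_df = pd.DataFrame({
    'jet_n':           [4, 2, 5, 4],
    'trigE':           [1, 1, 0, 0],
    'trigM':           [0, 0, 1, 0],
    'lep_n':           [1, 1, 1, 1],
    'lep_trigMatched': [1, 1, 1, 1],
})

# Each comparison gives a boolean Series; & and | combine them row by row.
mask = (
    (toy_df.jet_n > 3)
    & ((toy_df.trigE == 1) | (toy_df.trigM == 1))
    & (toy_df.lep_n == 1)
    & (toy_df.lep_trigMatched == 1)
)

# Indexing with the mask keeps only the rows where it is True.
selected = toy_df[mask]
print(mask.tolist())
print(len(selected.index))
```

The second event fails the jet requirement and the fourth fires neither trigger, so only two of the four toy events survive.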

In [0]:
def makeplots(plot_variables, plot_bins, zdf, ttdf):
  
  for index, variable in enumerate(plot_variables):
    
    plt.figure(index)
    # Overlay the Z' signal (red) and the SM tt background (blue) on a log scale.
    plt.hist(zdf[variable], color='r', alpha=0.3, label="Z'", log=True, bins=plot_bins[index])
    plt.hist(ttdf[variable], color='b', alpha=0.3, label='SM tt', log=True, bins=plot_bins[index])
    plt.legend(loc='best')
    plt.xlabel(variable)
    plt.ylabel('Events')
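
When `bins` is an integer, `plt.hist` chooses the bin range from the data it is given, so the two overlaid histograms can end up with different bin edges. One possible way around this, sketched here with made-up values standing in for a real variable, is to compute a single set of edges from both samples with NumPy.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Made-up stand-ins for one variable in the two samples (not real ATLAS values).
signal_values = rng.exponential(scale=80.0, size=1000)
background_values = rng.exponential(scale=40.0, size=1000)

# Ten equal-width bins spanning the combined range of both samples.
combined = np.concatenate([signal_values, background_values])
shared_edges = np.histogram_bin_edges(combined, bins=10)

# Both samples are now counted in exactly the same bins.
signal_counts, _ = np.histogram(signal_values, bins=shared_edges)
background_counts, _ = np.histogram(background_values, bins=shared_edges)
print(shared_edges)
```

An array such as `shared_edges` can be passed as the `bins` argument of both `plt.hist` calls in place of the integer.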

In [0]:
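def calc_eff_pur(nsig, nsig_init, nbkg, nbkg_init):
  
  # Purity: fraction of the selected events that are signal.
  # Efficiency: fraction of the initial signal events that pass the selections.
  # nbkg_init is accepted as an argument but is not used in either quantity.
  print('------')
  purity = nsig / (nsig + nbkg)
  efficiency = nsig / nsig_init
  print('Signal events: {}'.format( nsig ))
  print('Background events: {}'.format( nbkg ))
  print('Purity: {}'.format( round(purity, 2) ))
  print('Efficiency: {}'.format( round( efficiency, 2) ))
  print('------')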
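
As a worked example of the two formulas, the sketch below returns the values instead of printing them. The function name `efficiency_and_purity` and the event counts are hypothetical, and the guards against empty samples are an addition that `calc_eff_pur` does not have. As in `calc_eff_pur`, the counts are raw and unweighted.

```python
def efficiency_and_purity(nsig, nsig_init, nbkg):
    """Return (efficiency, purity) for raw event counts."""
    # Efficiency: selected signal divided by initial signal.
    efficiency = nsig / nsig_init if nsig_init > 0 else float('nan')
    # Purity: selected signal divided by all selected events.
    total = nsig + nbkg
    purity = nsig / total if total > 0 else float('nan')
    return efficiency, purity

# Hypothetical counts: 1000 signal events before selections, 400 after,
# with 100 background events also passing.
eff, pur = efficiency_and_purity(nsig=400, nsig_init=1000, nbkg=100)
print(eff, pur)
```

With these counts the efficiency is 400 / 1000 = 0.4 and the purity is 400 / (400 + 100) = 0.8.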

#### First, we download the two CSV files, the Z' signal sample and the SM tt background sample, from the Liverpool server.

In [0]:
!wget http://hep.ph.liv.ac.uk/~msullivan/mc_110902.ZPrime750.csv
!wget http://hep.ph.liv.ac.uk/~msullivan/mc_117050.ttbar_lep.csv

#### The following four lines load the Z' and SM tt samples into pandas DataFrames and count the events before selections.

In [0]:
zprime_df = pd.read_csv('mc_110902.ZPrime750.csv')
ttbar_df  = pd.read_csv('mc_117050.ttbar_lep.csv')
nsig_init = len(zprime_df.index)
nbkg_init = len(ttbar_df.index)

#### The following lines apply the selections to both DataFrames and count the events that pass.

In [0]:
zprime_df_selected  = applyselections(zprime_df)
ttbar_df_selected = applyselections(ttbar_df)
nsig = len(zprime_df_selected.index)
nbkg = len(ttbar_df_selected.index)
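
To see how many events each individual selection removes, the cuts can be applied one at a time and the survivors counted after each step. This is a sketch on a made-up table: the column names match `applyselections`, while `toy_df`, `cuts` and `cutflow` are illustrative names that do not appear elsewhere in this notebook.

```python
import pandas as pd

# Six invented events using the same column names as the selections.
toy_df = pd.DataFrame({
    'jet_n':           [4, 2, 5, 4, 6, 4],
    'trigE':           [1, 1, 0, 0, 1, 1],
    'trigM':           [0, 0, 1, 0, 0, 0],
    'lep_n':           [1, 1, 1, 1, 2, 1],
    'lep_trigMatched': [1, 1, 1, 1, 1, 0],
})

# Each cut is a label plus a function returning a boolean mask.
cuts = [
    ('jet_n > 3',       lambda df: df.jet_n > 3),
    ('trigE or trigM',  lambda df: (df.trigE == 1) | (df.trigM == 1)),
    ('lep_n == 1',      lambda df: df.lep_n == 1),
    ('lep_trigMatched', lambda df: df.lep_trigMatched == 1),
]

# Apply the cuts in order, recording how many events remain after each.
remaining = toy_df
cutflow = [('all events', len(remaining.index))]
for name, cut in cuts:
    remaining = remaining[cut(remaining)]
    cutflow.append((name, len(remaining.index)))

for name, count in cutflow:
    print('{:<18} {}'.format(name, count))
```

The final count equals what a single combined mask would give. The intermediate counts show which requirement costs the most events.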

#### Finally, we draw the variable plots and calculate the efficiency and purity!

In [0]:
plot_variables = ['met_et', 'jet_pt', 'lep_pt']
plot_bins      = [10, 10, 10]
makeplots(plot_variables, plot_bins, zprime_df_selected, ttbar_df_selected)
calc_eff_pur(nsig, nsig_init, nbkg, nbkg_init)