## CopperHead V2 tutorial

This framework builds on the coffea 202x Python package, a columnar analysis platform, and uses awkward arrays together with dask distributed for parallelization.

First we set up our config by specifying the era/year on which we will run our analysis.

# Pre-stage
We run ```make_parameters.py``` and specify the year with the ```--year``` flag. The output of the Python script will be saved in ```./config/parameters.json```.

In [None]:
! python make_parameters.py --year 2018

Now we prepare the list of samples on which we will perform our analysis. This is done by executing the ```run_prestage.py``` script, specifying the chunk size with the ```--chunksize``` flag and listing the samples we would like to analyze with the ```--input_string``` flag.

The chunksize value is simple: it is an integer giving the number of rows of data in each "chunk" that a worker processes during the parallelized workflow.
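To make the idea of chunks concrete, here is a minimal sketch of how a number of rows could be divided into ranges of at most ```chunksize``` rows. The helper ```chunk_ranges``` and the row count are hypothetical and only illustrate the concept; the framework's own chunking is handled internally and may differ.

```python
def chunk_ranges(n_rows, chunksize):
    """Return (start, stop) row ranges, each covering at most `chunksize` rows."""
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    return [
        (start, min(start + chunksize, n_rows))
        for start in range(0, n_rows, chunksize)
    ]


# Hypothetical input with 450,000 rows, split as with --chunksize 200000
ranges = chunk_ranges(450_000, 200_000)
print(ranges)
# [(0, 200000), (200000, 400000), (400000, 450000)]
```

Each range would be handed to one worker, so a smaller chunksize yields more, lighter tasks, while a larger one yields fewer, heavier tasks.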

The value for ```--input_string``` needs a bit more explaining. The general format is:

"Year_{2016pre or 2016post or 2017 or 2018}/DataRun_{A,B,C,D,E,F,G,H}/Bkg_{DY,TT}/Sig_{ggH, VBF}"

Here we define the era after "Year_", the data runs after "DataRun_", the MC background after "Bkg_" and the MC signal after "Sig_". For example, Year_2018/DataRun_A,C/Bkg_TT/Sig_ selects data_A, data_C and the ttbar MC background for the year 2018, with no signal samples. This operation takes about a minute.


In [None]:
! python run_prestage.py --chunksize 200000 --input_string "Year_2018/DataRun_A,C/Bkg_TT/Sig_"
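As a rough illustration of how the input string breaks down, the sketch below splits it into its four fields. The function ```parse_input_string``` is hypothetical and is not taken from ```run_prestage.py```; it only mirrors the format described above.

```python
def parse_input_string(input_string):
    """Split 'Key_value1,value2/...' into a dict of {key: [values]}."""
    parsed = {}
    for field in input_string.split("/"):
        # Split only on the first underscore: 'DataRun_A,C' -> 'DataRun', 'A,C'
        key, _, values = field.partition("_")
        # An empty value (e.g. 'Sig_') becomes an empty list
        parsed[key] = [v.strip() for v in values.split(",") if v.strip()]
    return parsed


selection = parse_input_string("Year_2018/DataRun_A,C/Bkg_TT/Sig_")
print(selection)
# {'Year': ['2018'], 'DataRun': ['A', 'C'], 'Bkg': ['TT'], 'Sig': []}
```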

If we wish to run our analysis on only a subset of our samples, for example to save time, we can do so with the ```--change_fraction``` flag, followed by a floating-point value representing the fraction of the samples we want to work on.

For example, running the cell below would trim the sample list saved in ```./config/fraction_processor_samples.json``` to approximately ten percent of the full list.

In [None]:
! python run_prestage.py --change_fraction 0.1

The code above will take less than a second. It saves a new config file, ```./config/fraction_processor_samples.json```. Please note that we don't overwrite the original full config file produced by the prestage step. This is so that if you would like to change your fraction value, you can do so quickly, instead of waiting a full minute to redo the whole prestage step.
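The idea of trimming without touching the full list can be sketched as follows. This is only an illustration under assumptions: the layout of the sample config (a mapping from sample name to a list of files), the sample and file names, and the helper ```trim_samples``` are all hypothetical, and the real script may select the fraction differently.

```python
import math


def trim_samples(samples, fraction):
    """Return a new dict keeping roughly `fraction` of each sample's files."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    # Build a new dict so the full sample list is left unchanged;
    # keep at least one file per sample.
    return {
        name: files[: max(1, math.ceil(len(files) * fraction))]
        for name, files in samples.items()
    }


# Hypothetical full sample list
full = {
    "data_A": [f"data_A_{i}.root" for i in range(20)],
    "ttbar": [f"ttbar_{i}.root" for i in range(5)],
}
trimmed = trim_samples(full, 0.1)
print({name: len(files) for name, files in trimmed.items()})
```

Because ```full``` is never modified, a different fraction can be applied again later without regenerating the full list.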

# Running Stage 1

Now we're ready to execute stage 1 of the analysis, which refers to the baseline selections we apply just before the categorization into Higgs decay categories. We do this by simply running ```run_stage1.py```, though we recommend also adding the ```-W ignore``` option to suppress warnings. This operation takes the most time, ranging from about 30 minutes for a fraction of around 0.25 to hours for a full sample run. The outputs of ```run_stage1.py``` will be saved as a collection of ```.parquet``` files in the directory defined by ```save_path``` in the script.

In [None]:
! python -W ignore run_stage1.py
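Once the run finishes, a quick way to check that outputs were written is to list the ```.parquet``` files under the output directory. The sketch below uses only the standard library; the path ```./stage1_output``` is a placeholder and should be replaced by whatever ```save_path``` is set to in ```run_stage1.py```.

```python
from pathlib import Path


def find_parquet_files(save_path):
    """Return all .parquet files below `save_path`, sorted by path."""
    return sorted(Path(save_path).rglob("*.parquet"))


# Placeholder directory; use the save_path defined in run_stage1.py
files = find_parquet_files("./stage1_output")
print(f"found {len(files)} parquet files")
```

The resulting files can then be loaded with a parquet reader such as ```dask.dataframe.read_parquet```.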

# Stage 1 Validation
Now we validate our stage 1 outputs by plotting validation histograms. As with the ```run_prestage.py``` script, we specify the plot options via the ```--input_string``` flag, but with a different format, this time consisting mostly of boolean values:


Ratio_{Y or N}/LogY_{Y or N}/ShowLumi_{Y or N}/Status_{work or prelim}

Here "Y" means yes and "N" means no. After ```Ratio_``` we specify whether we want a Data/MC ratio plot in the bottom panel, after ```LogY_``` whether to plot the y axis in log scale, after ```ShowLumi_``` whether to show the integrated luminosity of the run, and after ```Status_``` the status of the plot, where the options are "work" for "Work in Progress", "prelim" for "Preliminary" and an empty string ("") for no mention of the status at all.

For example, Ratio_Y/LogY_Y/ShowLumi_N/Status_work produces a plot with the Data/MC ratio in the bottom panel and a logarithmic y axis, without the integrated luminosity value, and with a "Work in Progress" label.


In [None]:
! python run_stage1_validation.py --fraction 0.001 --input_string "Ratio_Y/LogY_Y/ShowLumi_N/Status_work"
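The option string can be read as three yes/no switches plus a status label. The sketch below shows one way to decode it; ```parse_plot_options``` and ```STATUS_LABELS``` are hypothetical names used for illustration and are not taken from ```run_stage1_validation.py```.

```python
# Status keywords and the labels described above
STATUS_LABELS = {"work": "Work in Progress", "prelim": "Preliminary", "": ""}


def parse_plot_options(input_string):
    """Decode 'Ratio_Y/LogY_Y/ShowLumi_N/Status_work' into a dict of options."""
    # 'Ratio_Y' -> ('Ratio', 'Y'); split only on the first underscore
    fields = dict(field.partition("_")[::2] for field in input_string.split("/"))
    options = {}
    for key in ("Ratio", "LogY", "ShowLumi"):
        if fields[key] not in ("Y", "N"):
            raise ValueError(f"{key} must be 'Y' or 'N', got {fields[key]!r}")
        options[key] = fields[key] == "Y"
    options["Status"] = STATUS_LABELS[fields["Status"]]
    return options


print(parse_plot_options("Ratio_Y/LogY_Y/ShowLumi_N/Status_work"))
# {'Ratio': True, 'LogY': True, 'ShowLumi': False, 'Status': 'Work in Progress'}
```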
