Skip to content

Commit

Permalink
Merge pull request jemrobinson#16 from mickypaganini/master
Browse files Browse the repository at this point in the history
Updated README + tabular print out of asimov significances
  • Loading branch information
jemrobinson committed Oct 7, 2016
2 parents 50fc323 + 6ba3ead commit a4b0605
Show file tree
Hide file tree
Showing 4 changed files with 216 additions and 69 deletions.
136 changes: 114 additions & 22 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,34 +1,126 @@
# bbyy_jet_classifier
This package is used by the ATLAS HH->bbyy search.
It defines a classifier to choose which non-b-tagged jet to pair with the b-tagged jet in 1-tag events.
This code uses the ```skTMVA``` class from ```https://github.com/yuraic/koza4ok``` to convert ```scikit-learn``` output into ROOT TMVA xml input.
This package is used by the ATLAS HH->bbyy search. <br/>
It defines a classifier to choose which non-b-tagged jet to pair with the b-tagged jet in 1-tag events. <br/>
This code uses the `skTMVA` class from https://github.com/yuraic/koza4ok to convert ```scikit-learn``` output into `ROOT TMVA` xml input.

# Example: training on SM inputs
./run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/SM_hh.root --exclude Delta_phi_jb --strategy RootTMVA sklBDT --max_events 1000
## Dependencies
This package relies on two different Machine Learning libraries to train BDTs, and the user can control which one to use. One is [`scikit-learn`](http://scikit-learn.org/stable/install.html "Scikit-learn Installation"); the other is [`TMVA`](http://tmva.sourceforge.net/), a CERN specific data science package included in [`ROOT`](https://root.cern.ch/). <br/>
Both `ROOT` and `scikit-learn` are required even if the user decides to only use one of the two for the ML portion of the project. `ROOT` is required to open the `.root` input files and `scikit-learn` is used in different capacities during the data pre-processing stages.
Other necessary libraries include:
* [joblib](https://pythonhosted.org/joblib/installing.html)
* [matplotlib](http://matplotlib.org/faq/installing_faq.html)
* [numpy](http://docs.scipy.org/doc/numpy-1.10.0/user/install.html)
* [rootpy](http://www.rootpy.org/install.html)
* [root_numpy](https://rootpy.github.io/root_numpy/install.html)

# Example: testing previously trained classifier on BSM inputs -- NB. be careful about double counting here
./run_classifier.py --input inputs/X275_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
For updated requirements check the requirements.txt file.

./run_classifier.py --input inputs/X300_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
## Inputs
The most recent set of input TTrees are in:
```
/afs/cern.ch/user/j/jrobinso/work/public/ML_inputs
```

./run_classifier.py --input inputs/X325_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
## Structure
There are two main script to execute: `run_classifier.py` and `evaluate_event_performance.py`. These need to be executed sequentially whenever the output of the former looks satisfactory. <br/>

./run_classifier.py --input inputs/X350_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
### Usage of `run_classifier.py`:
This is the main script. It is composed of different parts that handle data processing, plotting, training and/or testing. This script connects all the steps in the pipeline. <br/>
There are multiple options and flags that can be passed to this script.
```
usage: run_classifier.py [-h] --input INPUT [INPUT ...] [--tree TREE]
[--output OUTPUT]
[--exclude VARIABLE_NAME [VARIABLE_NAME ...]]
[--strategy STRATEGY [STRATEGY ...]] [--grid_search]
[--ftrain FTRAIN] [--training_sample TRAINING_SAMPLE]
[--max_events MAX_EVENTS]
./run_classifier.py --input inputs/X400_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
Run ML algorithms over ROOT TTree input
# Example: training and testing on all inputs
./run_classifier.py --input inputs/*root --exclude Delta_phi_jb
optional arguments:
-h, --help show this help message and exit
--input INPUT [INPUT ...]
List of input file names
--tree TREE Name of the tree in the ntuples. Default: events_1tag
--output OUTPUT Output directory. Default: output
--exclude VARIABLE_NAME [VARIABLE_NAME ...]
List of variables that are present in the tree but
should not be used by the classifier
--strategy STRATEGY [STRATEGY ...]
Type of BDT to use. Options are: RootTMVA, sklBDT.
Default: both
--grid_search Pass this flag to run a grid search to determine BDT
parameters
--ftrain FTRAIN Fraction of events to use for training. Default: 0.6.
Set to 0 for testing only.
--training_sample TRAINING_SAMPLE
Directory with pre-trained BDT to be used for testing
--max_events MAX_EVENTS
Maximum number of events to use (for debugging).
Default: all
```
### Usage of `evaluate_event_performance.py`:
Event level performance evaluation quantified in terms of Asimov significance. <br/>
It compares the performance of three old strategies (`mHmatch`, `pThigh`, `pTjb`) with that of the BDT. The BDT performance is evaluated after excluding events in which the highest BDT score is < threshold. For many threshold values, the performance can be computed in paralled. <br/>
It outputs a plot and a pickled dictionary.
```
usage: evaluate_event_performance.py [-h] [--strategy STRATEGY]
[--category CATEGORY]
[--intervals INTERVALS]
# Example: training and testing on low-mass inputs
./run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/X275_hh.root inputs/X300_hh.root inputs/X325_hh.root inputs/X350_hh.root --exclude Delta_phi_jb --output low_mass
Check event level performance
# Example: training and testing on high-mass inputs
./run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/X400_hh.root inputs/SM_hh.root --exclude Delta_phi_jb --output high_mass
optional arguments:
-h, --help show this help message and exit
--strategy STRATEGY Strategy to evaluate. Options are: root_tmva, skl_BDT.
Default: skl_BDT
--category CATEGORY Trained classifier to use for event-level evaluation.
Examples are: low_mass, high_mass. Default: low_mass
--intervals INTERVALS
Number of threshold values to test. Default: 21
```

# Example: evaluate event-level performance
./evaluate_event_performance.py X275_hh X300_hh X325_hh X400_hh SM_bkg_photon_jet SM_hh
# Examples:
### Training on SM inputs
This will train two BDTs (one with `TMVA`, one with `scikit-learn`) on 1000 events (for debugging purposes) from the Standard Model input files, without usin the Delta_phi_jb variable.
```
python run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/SM_hh.root --exclude Delta_phi_jb --strategy RootTMVA sklBDT --max_events 1000
```

# Inputs
The most recent set of input TTrees are in:
/afs/cern.ch/user/j/jrobinso/work/public/ML_inputs
### Testing previously trained classifier on different individual BSM inputs -- NB. be careful about double counting here
Note: `--ftrain 0` indicates that 0% of the input samples should be used for training; therefore, this will only <i>test</i> the performance of a previously trained net (the location of which is indicated by `--training_sample`) on the input files specified after `--input`.
```
python run_classifier.py --input inputs/X275_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
python run_classifier.py --input inputs/X300_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
python run_classifier.py --input inputs/X325_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
python run_classifier.py --input inputs/X350_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
python run_classifier.py --input inputs/X400_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
```

### Training and testing on all inputs
By default, 60% of the input files will be used for training, 40% for testing.
```
python run_classifier.py --input inputs/*root --exclude Delta_phi_jb
```

### Training and testing on low-mass inputs
The signal samples in the mass range between 275 and 350 GeV are commonly identified as the "low mass" samples. A classifier trained on those will then be identified by that tag `low_mass` and placed in an homonymous folder.
```
python run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/X275_hh.root inputs/X300_hh.root inputs/X325_hh.root inputs/X350_hh.root --exclude Delta_phi_jb --output low_mass
```

### Training and testing on high-mass inputs
Conversely, the Standard Model signal samples and the BSM sample with resonant mass of 400 GeV are commonly identified as the "high mass samples. A classifier trained on those will then be identified by that tag `high_mass` and placed in an homonymous folder.
```
python run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/X400_hh.root inputs/SM_hh.root --exclude Delta_phi_jb --output high_mass
```

### Evaluate event-level performance
`run_classifier.py` only handles the pipeline to the individual jet pair classification stage. A final step is to evaluate the event-level performance of the tagger, by selecting in each event the jet pair that has the highest BDT score and checking how often that corresponds to the correct pair. The event-level performance is then compared with that of other methods that were previosuly tested in the analysis.
```
python evaluate_event_performance.py --category low_mass --strategy root_tmva --intervals 21
```
49 changes: 25 additions & 24 deletions bbyy_jet_classifier/plotting/plot_asimov.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,36 +2,37 @@
import matplotlib
import cPickle
import numpy as np
import plot_atlas

def bdt_old_ratio(data, strategy, baseline_strategy, lower_bound):

plot_atlas.set_style()
figure = plt.figure(figsize=(6, 6), dpi=100)
axes = plt.axes()
figure = plt.figure(figsize=(6, 6), dpi=100)
axes = plt.axes()

# matplotlib.rcParams.update({'font.size': 16})
plt.hlines(1, lower_bound, 0.95, linestyles='dashed', linewidth=0.5)
color = iter(plt.cm.rainbow(np.linspace(0, 1, len(data.keys()))))
# matplotlib.rcParams.update({'font.size': 16})
plt.hlines(1, lower_bound, 0.95, linestyles='dashed', linewidth=0.5)
color = iter(plt.cm.rainbow(np.linspace(0, 1, len(data.keys()))))

for ss in sorted(data.keys()): # loop through the different signal samples
c = next(color)
if len(data[ss]['BDT'][1]) > 1:
_ = plt.plot(data[ss]['BDT'][0], data[ss]['BDT'][1] / data[ss][baseline_strategy][1], label=ss, color=c, linewidth=2)
maxidx = np.argmax((data[ss]['BDT'][1] / data[ss][baseline_strategy][1])[:-1])
_ = plt.scatter(
data[ss]['BDT'][0][maxidx],
(data[ss]['BDT'][1] / data[ss][baseline_strategy][1])[maxidx],
color=c
)
else:
raise ValueError("There are no data points to plot for class " + ss)
for ss in sorted(data.keys()): # loop through the different signal samples
c = next(color)
if len(data[ss]['BDT'][1]) > 1:
_ = plt.plot(data[ss]['BDT'][0], data[ss]['BDT'][1] / data[ss][baseline_strategy][1], label=ss, color=c, linewidth=2)
maxidx = np.argmax((data[ss]['BDT'][1] / data[ss][baseline_strategy][1])[:-1])
_ = plt.scatter(
data[ss]['BDT'][0][maxidx],
(data[ss]['BDT'][1] / data[ss][baseline_strategy][1])[maxidx],
color=c
)
else:
raise ValueError("There are no data points to plot for class " + ss)

plt.title('Asimov significance ratio wrt {} strategy'.format(baseline_strategy))
plt.xlabel('BDT Threshold Value')
plt.ylabel(r'$Z_{\mathrm{Asimov}}^{\mathrm{BDT}} / Z_{\mathrm{Asimov}}^{\mathrm{baseline}}$')
plt.xlim(xmin=lower_bound, xmax=0.95)
plt.ylim(ymin=0.2, ymax=2.8)
plt.title('Asimov significance ratio wrt {} strategy'.format(baseline_strategy))
plt.xlabel('BDT Threshold Value')
plt.ylabel(r'$Z_{\mathrm{Asimov}}^{\mathrm{BDT}} / Z_{\mathrm{Asimov}}^{\mathrm{baseline}}$')
plt.xlim(xmin=lower_bound, xmax=0.95)
plt.ylim(ymin=0.2, ymax=2.8)

plt.legend(loc='upper left')
plt.savefig('threshold_ratio_{}.pdf'.format(strategy))
plt.legend(loc='upper left')
plt.savefig('threshold_ratio_{}.pdf'.format(strategy))
plt.close(figure)
66 changes: 54 additions & 12 deletions evaluate_event_performance.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,16 @@
#! /usr/bin/env python
'''
Event level performance evaluation quantified in terms of Asimov significance.
It compares the performance of three old strategies (mHmatch, pThigh, pTjb)
with that of the BDT. The BDT performance is evaluated after excluding events
in which the highest BDT score is < threshold. For many threshold values, the
performance can be computed in paralled.
Output:
plot + pickled dictionary
Run:
python evaluate_event_performance.py --strategy root_tmva \
--sample_names SM_bkg_photon_jet SM_hh X275 X300 (...) --intervals 21
'''
import cPickle
import glob
import logging
Expand All @@ -7,20 +19,27 @@
from joblib import Parallel, delayed
import numpy as np
import time
from tabulate import tabulate
from bbyy_jet_classifier import utils
from bbyy_jet_classifier.plotting import plot_asimov


def main(strategy, category, lower_bound, intervals):

logger = logging.getLogger("event_performance.main")
# -- test N(=intervals) various threshold values in the range [lower_bound, 1]
# where lower_bound = 0 for sklearn, = -1 for tmva
THRESHOLD = np.linspace(lower_bound, 1, intervals)

# -- print some info to the user to confirm the settings
logger.info("Strategy: {}".format(strategy))
logger.info("Category: {}".format(category))
logger.info("Threshold values: {}".format(THRESHOLD))

# -- find samples by looking at those contained in ./output/<category>/pickles/
base_directory = os.path.join("output", category, "pickles")
sample_names = os.listdir(base_directory)
bkg_sample_name = [ x for x in sample_names if "bkg" in x ][0]
logger.info("Processing data from {} samples...".format(len(sample_names)))

pickle_paths = sum(
[glob.glob(
os.path.join(
Expand All @@ -34,12 +53,13 @@ def main(strategy, category, lower_bound, intervals):
)
logger.info("Found {} datasets to load...".format(len(pickle_paths)))

# -- read in data from each pickle file & evaluate event-level performance
perf_dict = {}
for sample_name, path in zip(sample_names,pickle_paths):
start_time = time.time()
logger.info("Reading: {}...".format(path))
d = cPickle.load(open(path, "rb"))
perf_dict[sample_name] = eval_performance(
perf_dict[sample_name] = eval_performance( # performance evaluation
d["yhat_test_ev"],
d["yhat_mHmatch_test_ev"],
d["yhat_pThigh_test_ev"],
Expand All @@ -51,13 +71,15 @@ def main(strategy, category, lower_bound, intervals):
)
logger.info("Done in {:.2f} seconds".format(time.time()-start_time))

# TO-DO: it would be nicer if the dictionary was indexed by threshold, instead of containing a 2d list of thresholds and asimovs
headers = sorted([s for s in sample_names if s != bkg_sample_name])
if hasattr(THRESHOLD, "__iter__"):
asimov_dict = {
_sample_name: {
strategy: map(np.array, [THRESHOLD, [asimov(s, b) for s, b in zip(perf_dict[_sample_name][strategy], perf_dict[bkg_sample_name][strategy])]])
for strategy in ["BDT", "mHmatch", "pThigh", "pTjb"]
}
for _sample_name in [ x for x in sample_names if x != bkg_sample_name ]
for _sample_name in headers#[ x for x in sample_names if x != bkg_sample_name ]
}

else:
Expand All @@ -66,17 +88,33 @@ def main(strategy, category, lower_bound, intervals):
strategy: map(np.array, [[THRESHOLD], [asimov(s, b) for s, b in zip(perf_dict[_sample_name][strategy], perf_dict[bkg_sample_name][strategy])]])
for strategy in ["BDT", "mHmatch", "pThigh", "pTjb"]
}
for _sample_name in [ x for x in sample_names if x != bkg_sample_name ]
for _sample_name in headers#[ x for x in sample_names if x != bkg_sample_name ]
}

# Write output to disk
# -- Write dictionary of Asimov significances to disk
utils.ensure_directory(os.path.join("output", "pickles"))
with open(os.path.join("output", "pickles", "multi_proc_{}.pkl".format(strategy)), "wb") as f:
cPickle.dump(asimov_dict, f)

# Plot Z_BDT/Z_old for different threshold values
# -- Plot Z_BDT/Z_old for different threshold values
plot_asimov.bdt_old_ratio(asimov_dict, strategy, 'mHmatch', lower_bound)


# -- Print Asimov significance for different strategies and different samples in tabular form
# Each table corresponds to a different threshold value
for threshold in THRESHOLD:
#print '\nAsimov significance for threshold = {}:\n{}'.format(threshold, tabulate(
logger.info('\nAsimov significance for threshold = {}:\n{}'.format(threshold, tabulate(
[
[strategy] + [
asimov_dict[_class][strategy][1][np.isclose(asimov_dict[_class][strategy][0], threshold)]
for _class in headers
]
for strategy in ['BDT', 'mHmatch', 'pTjb', 'pThigh']
],
headers=[''] + headers,
floatfmt=".5f"
))
)

def asimov(s, b):
"""
Expand Down Expand Up @@ -114,7 +152,7 @@ def eval_performance(yhat_test_ev, yhat_mHmatch_test_ev, yhat_pThigh_test_ev, yh
logger.info("BDT: Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_test_ev, y_event)))
logger.info("mHmatch: Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_mHmatch_test_ev, y_event)))
logger.info("pThigh: Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_pThigh_test_ev, y_event)))
logger.info("pTjb: Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_pTjb_test_ev, y_event)))
logger.info("pTjb: Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_pTjb_test_ev, y_event)))
logger.info("Number of events without any correct pair = {}".format(sum([sum(y_event[ev]) == 0 for ev in xrange(len(y_event))])))

# check whether selected pair has m_jb in mass window for truly correct and truly incorrect pairs
Expand Down Expand Up @@ -212,9 +250,13 @@ def in_mjb_window(mjb_event, y_event, yhat_test_ev, w_test, correct_present_trut
utils.configure_logging()

parser = argparse.ArgumentParser(description="Check event level performance")
parser.add_argument("--strategy", type=str, help="strategy to evaluate. Options are: root_tmva, skl_BDT.", default="skl_BDT")
parser.add_argument("--category", type=str, help="which trained classifier to use. Examples are: low_mass, high_mass.", default="skl_BDT")
parser.add_argument("--intervals", type=int, help="number of threshold values to test", default=21)
#parser.add_argument("--sample_names", help="list of names of samples to evaluate", type=str, nargs="+", default=[])
parser.add_argument("--strategy", type=str, default="skl_BDT",
help="Strategy to evaluate. Options are: root_tmva, skl_BDT. Default: skl_BDT")
parser.add_argument("--category", type=str, default="low_mass",
help="Trained classifier to use for event-level evaluation. Examples are: low_mass, high_mass. Default: low_mass")
parser.add_argument("--intervals", type=int, default=21,
help="Number of threshold values to test. Default: 21")
args = parser.parse_args()
if args.strategy == 'skl_BDT':
lower_bound = 0
Expand Down
Loading

0 comments on commit a4b0605

Please sign in to comment.