Merge pull request jemrobinson#16 from mickypaganini/master

Updated README + tabular print out of asimov significances
mickypaganini · Oct 7, 2016 · a4b0605 · a4b0605
2 parents 50fc323 + 6ba3ead
commit a4b0605
Show file tree

Hide file tree

Showing 4 changed files with 216 additions and 69 deletions.
diff --git a/README.md b/README.md
@@ -1,34 +1,126 @@
 # bbyy_jet_classifier
-This package is used by the ATLAS HH->bbyy search.
-It defines a classifier to choose which non-b-tagged jet to pair with the b-tagged jet in 1-tag events.
-This code uses the ```skTMVA``` class from ```https://github.com/yuraic/koza4ok``` to convert ```scikit-learn``` output into ROOT TMVA xml input.
+This package is used by the ATLAS HH->bbyy search. <br/>
+It defines a classifier to choose which non-b-tagged jet to pair with the b-tagged jet in 1-tag events. <br/>
+This code uses the `skTMVA` class from https://github.com/yuraic/koza4ok to convert ```scikit-learn``` output into `ROOT TMVA` xml input.
 
-# Example: training on SM inputs
-./run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/SM_hh.root --exclude Delta_phi_jb --strategy RootTMVA sklBDT --max_events 1000
+## Dependencies
+This package relies on two different Machine Learning libraries to train BDTs, and the user can control which one to use. One is [`scikit-learn`](http://scikit-learn.org/stable/install.html "Scikit-learn Installation"); the other is [`TMVA`](http://tmva.sourceforge.net/), a CERN specific data science package included in [`ROOT`](https://root.cern.ch/). <br/>
+Both `ROOT` and `scikit-learn` are required even if the user decides to only use one of the two for the ML portion of the project. `ROOT` is required to open the `.root` input files and `scikit-learn` is used in different capacities during the data pre-processing stages.
+Other necessary libraries include:
+* [joblib](https://pythonhosted.org/joblib/installing.html)
+* [matplotlib](http://matplotlib.org/faq/installing_faq.html)
+* [numpy](http://docs.scipy.org/doc/numpy-1.10.0/user/install.html)
+* [rootpy](http://www.rootpy.org/install.html)
+* [root_numpy](https://rootpy.github.io/root_numpy/install.html)
 
-# Example: testing previously trained classifier on BSM inputs -- NB. be careful about double counting here
-./run_classifier.py --input inputs/X275_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+For updated requirements check the requirements.txt file.
 
-./run_classifier.py --input inputs/X300_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+## Inputs
+The most recent set of input TTrees are in:
+```
+/afs/cern.ch/user/j/jrobinso/work/public/ML_inputs
+```
 
-./run_classifier.py --input inputs/X325_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+## Structure
+There are two main script to execute: `run_classifier.py` and `evaluate_event_performance.py`. These need to be executed sequentially whenever the output of the former looks satisfactory. <br/>
 
-./run_classifier.py --input inputs/X350_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+### Usage of `run_classifier.py`:
+This is the main script. It is composed of different parts that handle data processing, plotting, training and/or testing. This script connects all the steps in the pipeline. <br/>
+There are multiple options and flags that can be passed to this script.
+```
+usage: run_classifier.py [-h] --input INPUT [INPUT ...] [--tree TREE]
+                         [--output OUTPUT]
+                         [--exclude VARIABLE_NAME [VARIABLE_NAME ...]]
+                         [--strategy STRATEGY [STRATEGY ...]] [--grid_search]
+                         [--ftrain FTRAIN] [--training_sample TRAINING_SAMPLE]
+                         [--max_events MAX_EVENTS]
 
-./run_classifier.py --input inputs/X400_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+Run ML algorithms over ROOT TTree input
 
-# Example: training and testing on all inputs
-./run_classifier.py --input inputs/*root --exclude Delta_phi_jb
+optional arguments:
+  -h, --help            show this help message and exit
+  --input INPUT [INPUT ...]
+                        List of input file names
+  --tree TREE           Name of the tree in the ntuples. Default: events_1tag
+  --output OUTPUT       Output directory. Default: output
+  --exclude VARIABLE_NAME [VARIABLE_NAME ...]
+                        List of variables that are present in the tree but
+                        should not be used by the classifier
+  --strategy STRATEGY [STRATEGY ...]
+                        Type of BDT to use. Options are: RootTMVA, sklBDT.
+                        Default: both
+  --grid_search         Pass this flag to run a grid search to determine BDT
+                        parameters
+  --ftrain FTRAIN       Fraction of events to use for training. Default: 0.6.
+                        Set to 0 for testing only.
+  --training_sample TRAINING_SAMPLE
+                        Directory with pre-trained BDT to be used for testing
+  --max_events MAX_EVENTS
+                        Maximum number of events to use (for debugging).
+                        Default: all
+```
+### Usage of `evaluate_event_performance.py`:
+Event level performance evaluation quantified in terms of Asimov significance. <br/>
+It compares the performance of three old strategies (`mHmatch`, `pThigh`, `pTjb`) with that of the BDT. The BDT performance is evaluated after excluding events in which the highest BDT score is < threshold. For many threshold values, the performance can be computed in paralled. <br/>
+It outputs a plot and a pickled dictionary.
+```
+usage: evaluate_event_performance.py [-h] [--strategy STRATEGY]
+                                     [--category CATEGORY]
+                                     [--intervals INTERVALS]
 
-# Example: training and testing on low-mass inputs
-./run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/X275_hh.root inputs/X300_hh.root inputs/X325_hh.root inputs/X350_hh.root --exclude Delta_phi_jb --output low_mass
+Check event level performance
 
-# Example: training and testing on high-mass inputs
-./run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/X400_hh.root inputs/SM_hh.root --exclude Delta_phi_jb --output high_mass
+optional arguments:
+  -h, --help            show this help message and exit
+  --strategy STRATEGY   Strategy to evaluate. Options are: root_tmva, skl_BDT.
+                        Default: skl_BDT
+  --category CATEGORY   Trained classifier to use for event-level evaluation.
+                        Examples are: low_mass, high_mass. Default: low_mass
+  --intervals INTERVALS
+                        Number of threshold values to test. Default: 21
+```
 
-# Example: evaluate event-level performance
-./evaluate_event_performance.py X275_hh X300_hh X325_hh X400_hh SM_bkg_photon_jet SM_hh
+# Examples: 
+### Training on SM inputs
+This will train two BDTs (one with `TMVA`, one with `scikit-learn`) on 1000 events (for debugging purposes) from the Standard Model input files, without usin the Delta_phi_jb variable.
+```
+python run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/SM_hh.root --exclude Delta_phi_jb --strategy RootTMVA sklBDT --max_events 1000
+```
 
-# Inputs
-The most recent set of input TTrees are in:
-/afs/cern.ch/user/j/jrobinso/work/public/ML_inputs
+### Testing previously trained classifier on different individual BSM inputs -- NB. be careful about double counting here
+Note: `--ftrain 0` indicates that 0% of the input samples should be used for training; therefore, this will only <i>test</i> the performance of a previously trained net (the location of which is indicated by `--training_sample`) on the input files specified after `--input`.
+```
+python run_classifier.py --input inputs/X275_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+
+python run_classifier.py --input inputs/X300_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+
+python run_classifier.py --input inputs/X325_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+
+python run_classifier.py --input inputs/X350_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+
+python run_classifier.py --input inputs/X400_hh.root --exclude Delta_phi_jb --strategy sklBDT --ftrain 0 --training_sample SM_merged
+```
+
+### Training and testing on all inputs
+By default, 60% of the input files will be used for training, 40% for testing.
+```
+python run_classifier.py --input inputs/*root --exclude Delta_phi_jb
+```
+
+### Training and testing on low-mass inputs
+The signal samples in the mass range between 275 and 350 GeV are commonly identified as the "low mass" samples. A classifier trained on those will then be identified by that tag `low_mass` and placed in an homonymous folder. 
+```
+python run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/X275_hh.root inputs/X300_hh.root inputs/X325_hh.root inputs/X350_hh.root --exclude Delta_phi_jb --output low_mass
+```
+
+### Training and testing on high-mass inputs
+Conversely, the Standard Model signal samples and the BSM sample with resonant mass of 400 GeV are commonly identified as the "high mass samples. A classifier trained on those will then be identified by that tag `high_mass` and placed in an homonymous folder. 
+```
+python run_classifier.py --input inputs/SM_bkg_photon_jet.root inputs/X400_hh.root inputs/SM_hh.root --exclude Delta_phi_jb --output high_mass
+```
+
+### Evaluate event-level performance
+`run_classifier.py` only handles the pipeline to the individual jet pair classification stage. A final step is to evaluate the event-level performance of the tagger, by selecting in each event the jet pair that has the highest BDT score and checking how often that corresponds to the correct pair. The event-level performance is then compared with that of other methods that were previosuly tested in the analysis.
+```
+python evaluate_event_performance.py --category low_mass --strategy root_tmva --intervals 21
+```
diff --git a/bbyy_jet_classifier/plotting/plot_asimov.py b/bbyy_jet_classifier/plotting/plot_asimov.py
@@ -2,36 +2,37 @@
 import matplotlib
 import cPickle
 import numpy as np
+import plot_atlas
 
 def bdt_old_ratio(data, strategy, baseline_strategy, lower_bound):
 
     plot_atlas.set_style()
-	figure = plt.figure(figsize=(6, 6), dpi=100)
-	axes = plt.axes()
+    figure = plt.figure(figsize=(6, 6), dpi=100)
+    axes = plt.axes()
 
-	# matplotlib.rcParams.update({'font.size': 16})
-	plt.hlines(1, lower_bound, 0.95, linestyles='dashed', linewidth=0.5)
-	color = iter(plt.cm.rainbow(np.linspace(0, 1, len(data.keys()))))
+    # matplotlib.rcParams.update({'font.size': 16})
+    plt.hlines(1, lower_bound, 0.95, linestyles='dashed', linewidth=0.5)
+    color = iter(plt.cm.rainbow(np.linspace(0, 1, len(data.keys()))))
 
-	for ss in sorted(data.keys()): # loop through the different signal samples
-		c = next(color)
-		if len(data[ss]['BDT'][1]) > 1:
-			_ = plt.plot(data[ss]['BDT'][0], data[ss]['BDT'][1] / data[ss][baseline_strategy][1], label=ss, color=c, linewidth=2)
-			maxidx = np.argmax((data[ss]['BDT'][1] / data[ss][baseline_strategy][1])[:-1])
-			_ = plt.scatter(
-				data[ss]['BDT'][0][maxidx],
-				(data[ss]['BDT'][1] / data[ss][baseline_strategy][1])[maxidx],
-				color=c
-				)
-		else:
-			raise ValueError("There are no data points to plot for class " + ss)
+    for ss in sorted(data.keys()): # loop through the different signal samples
+        c = next(color)
+        if len(data[ss]['BDT'][1]) > 1:
+            _ = plt.plot(data[ss]['BDT'][0], data[ss]['BDT'][1] / data[ss][baseline_strategy][1], label=ss, color=c, linewidth=2)
+            maxidx = np.argmax((data[ss]['BDT'][1] / data[ss][baseline_strategy][1])[:-1])
+            _ = plt.scatter(
+                data[ss]['BDT'][0][maxidx],
+                (data[ss]['BDT'][1] / data[ss][baseline_strategy][1])[maxidx],
+                color=c
+                )
+        else:
+            raise ValueError("There are no data points to plot for class " + ss)
 
-	plt.title('Asimov significance ratio wrt {} strategy'.format(baseline_strategy))
-	plt.xlabel('BDT Threshold Value')
-	plt.ylabel(r'$Z_{\mathrm{Asimov}}^{\mathrm{BDT}} / Z_{\mathrm{Asimov}}^{\mathrm{baseline}}$')
-	plt.xlim(xmin=lower_bound, xmax=0.95)
-	plt.ylim(ymin=0.2, ymax=2.8)
+    plt.title('Asimov significance ratio wrt {} strategy'.format(baseline_strategy))
+    plt.xlabel('BDT Threshold Value')
+    plt.ylabel(r'$Z_{\mathrm{Asimov}}^{\mathrm{BDT}} / Z_{\mathrm{Asimov}}^{\mathrm{baseline}}$')
+    plt.xlim(xmin=lower_bound, xmax=0.95)
+    plt.ylim(ymin=0.2, ymax=2.8)
 
-	plt.legend(loc='upper left')
-	plt.savefig('threshold_ratio_{}.pdf'.format(strategy))
+    plt.legend(loc='upper left')
+    plt.savefig('threshold_ratio_{}.pdf'.format(strategy))
     plt.close(figure)
diff --git a/evaluate_event_performance.py b/evaluate_event_performance.py
@@ -1,4 +1,16 @@
 #! /usr/bin/env python
+'''
+Event level performance evaluation quantified in terms of Asimov significance.
+It compares the performance of three old strategies (mHmatch, pThigh, pTjb)
+with that of the BDT. The BDT performance is evaluated after excluding events
+in which the highest BDT score is < threshold. For many threshold values, the
+performance can be computed in paralled. 
+Output: 
+    plot + pickled dictionary
+Run:
+    python evaluate_event_performance.py --strategy root_tmva \
+    --sample_names SM_bkg_photon_jet SM_hh X275 X300 (...) --intervals 21
+'''
 import cPickle
 import glob
 import logging
@@ -7,20 +19,27 @@
 from joblib import Parallel, delayed
 import numpy as np
 import time
+from tabulate import tabulate
 from bbyy_jet_classifier import utils
 from bbyy_jet_classifier.plotting import plot_asimov
 
-
 def main(strategy, category, lower_bound, intervals):
 
     logger = logging.getLogger("event_performance.main")
+    # -- test N(=intervals) various threshold values in the range [lower_bound, 1]
+    #    where lower_bound = 0 for sklearn, = -1 for tmva
     THRESHOLD = np.linspace(lower_bound, 1, intervals)
 
+    # -- print some info to the user to confirm the settings
+    logger.info("Strategy: {}".format(strategy))
+    logger.info("Category: {}".format(category))
+    logger.info("Threshold values: {}".format(THRESHOLD))
+
+    # -- find samples by looking at those contained in ./output/<category>/pickles/
     base_directory = os.path.join("output", category, "pickles")
     sample_names = os.listdir(base_directory)
     bkg_sample_name = [ x for x in sample_names if "bkg" in x ][0]
     logger.info("Processing data from {} samples...".format(len(sample_names)))
-
     pickle_paths = sum(
         [glob.glob(
                 os.path.join(
@@ -34,12 +53,13 @@ def main(strategy, category, lower_bound, intervals):
         )
     logger.info("Found {} datasets to load...".format(len(pickle_paths)))
 
+    # -- read in data from each pickle file & evaluate event-level performance
     perf_dict = {}
     for sample_name, path in zip(sample_names,pickle_paths):
         start_time = time.time()
         logger.info("Reading: {}...".format(path))
         d = cPickle.load(open(path, "rb"))
-        perf_dict[sample_name] = eval_performance(
+        perf_dict[sample_name] = eval_performance( # performance evaluation
             d["yhat_test_ev"],
             d["yhat_mHmatch_test_ev"],
             d["yhat_pThigh_test_ev"],
@@ -51,13 +71,15 @@ def main(strategy, category, lower_bound, intervals):
         )
         logger.info("Done in {:.2f} seconds".format(time.time()-start_time))
 
+    # TO-DO: it would be nicer if the dictionary was indexed by threshold, instead of containing a 2d list of thresholds and asimovs
+    headers = sorted([s for s in sample_names if s != bkg_sample_name])
     if hasattr(THRESHOLD, "__iter__"):
         asimov_dict = {
             _sample_name: {
                 strategy: map(np.array, [THRESHOLD, [asimov(s, b) for s, b in zip(perf_dict[_sample_name][strategy], perf_dict[bkg_sample_name][strategy])]])
                 for strategy in ["BDT", "mHmatch", "pThigh", "pTjb"]
             }
-            for _sample_name in [ x for x in sample_names if x != bkg_sample_name ]
+            for _sample_name in headers#[ x for x in sample_names if x != bkg_sample_name ]
         }
 
     else:
@@ -66,17 +88,33 @@ def main(strategy, category, lower_bound, intervals):
                 strategy: map(np.array, [[THRESHOLD], [asimov(s, b) for s, b in zip(perf_dict[_sample_name][strategy], perf_dict[bkg_sample_name][strategy])]])
                 for strategy in ["BDT", "mHmatch", "pThigh", "pTjb"]
             }
-            for _sample_name in [ x for x in sample_names if x != bkg_sample_name ]
+            for _sample_name in headers#[ x for x in sample_names if x != bkg_sample_name ]
         }
 
-    # Write output to disk
+    # -- Write dictionary of Asimov significances to disk
     utils.ensure_directory(os.path.join("output", "pickles"))
     with open(os.path.join("output", "pickles", "multi_proc_{}.pkl".format(strategy)), "wb") as f:
         cPickle.dump(asimov_dict, f)
 
-    # Plot Z_BDT/Z_old for different threshold values
+    # -- Plot Z_BDT/Z_old for different threshold values
     plot_asimov.bdt_old_ratio(asimov_dict, strategy, 'mHmatch', lower_bound)
-
+
+    # -- Print Asimov significance for different strategies and different samples in tabular form
+    #    Each table corresponds to a different threshold value
+    for threshold in THRESHOLD:
+        #print '\nAsimov significance for threshold = {}:\n{}'.format(threshold, tabulate(
+        logger.info('\nAsimov significance for threshold = {}:\n{}'.format(threshold, tabulate(
+            [
+                [strategy] + [
+                    asimov_dict[_class][strategy][1][np.isclose(asimov_dict[_class][strategy][0], threshold)] 
+                    for _class in headers
+                ] 
+                for strategy in ['BDT', 'mHmatch', 'pTjb', 'pThigh']
+            ],
+            headers=[''] + headers,
+            floatfmt=".5f"
+        ))
+        )
 
 def asimov(s, b):
     """
@@ -114,7 +152,7 @@ def eval_performance(yhat_test_ev, yhat_mHmatch_test_ev, yhat_pThigh_test_ev, yh
     logger.info("BDT:     Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_test_ev, y_event)))
     logger.info("mHmatch: Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_mHmatch_test_ev, y_event)))
     logger.info("pThigh:  Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_pThigh_test_ev, y_event)))
-    logger.info("pTjb:  Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_pTjb_test_ev, y_event)))
+    logger.info("pTjb:    Number of correctly classified events = {:5} out of {} events having a correct pair".format(*count_correct_total(yhat_pTjb_test_ev, y_event)))
     logger.info("Number of events without any correct pair = {}".format(sum([sum(y_event[ev]) == 0 for ev in xrange(len(y_event))])))
 
     # check whether selected pair has m_jb in mass window for truly correct and truly incorrect pairs
@@ -212,9 +250,13 @@ def in_mjb_window(mjb_event, y_event, yhat_test_ev, w_test, correct_present_trut
     utils.configure_logging()
 
     parser = argparse.ArgumentParser(description="Check event level performance")
-    parser.add_argument("--strategy", type=str, help="strategy to evaluate. Options are: root_tmva, skl_BDT.", default="skl_BDT")
-    parser.add_argument("--category", type=str, help="which trained classifier to use. Examples are: low_mass, high_mass.", default="skl_BDT")
-    parser.add_argument("--intervals", type=int, help="number of threshold values to test", default=21)
+    #parser.add_argument("--sample_names", help="list of names of samples to evaluate", type=str, nargs="+", default=[])
+    parser.add_argument("--strategy", type=str, default="skl_BDT",
+        help="Strategy to evaluate. Options are: root_tmva, skl_BDT. Default: skl_BDT")
+    parser.add_argument("--category", type=str, default="low_mass",
+        help="Trained classifier to use for event-level evaluation. Examples are: low_mass, high_mass. Default: low_mass")
+    parser.add_argument("--intervals", type=int, default=21,
+        help="Number of threshold values to test. Default: 21")
     args = parser.parse_args()
     if args.strategy == 'skl_BDT':
         lower_bound = 0