<img src="tmva_logo.gif" height="20%" width="20%">

# TMVA Higgs Classification Example in Python

In this example we again perform Higgs classification, but alongside the native TMVA methods we also use methods from Keras and scikit-learn.

In [1]:
import ROOT
from ROOT import TMVA

Welcome to JupyROOT 6.22/08


## Declare Factory

Create the Factory class. Later you can choose the methods
whose performance you'd like to investigate. 

The factory is the main TMVA object you interact with. It takes the following arguments:

 - The first argument is the base name of all the output weight files, which store the method parameters and are created in the weights/ directory

 - The second argument is the output file for the training results
  
 - The third argument is a string option defining some general configuration for the TMVA session. For example all TMVA output can be suppressed by removing the "!" (not) in front of the "Silent" argument in the option string
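
The option string follows a simple convention: entries are separated by ":", boolean flags are switched off with a leading "!", and other settings are written as key=value. As a minimal sketch, the hypothetical helper `build_options` (not part of TMVA) assembles such a string from a Python dict and reproduces the one passed to the Factory in the next cell.

```python
def build_options(settings):
    """Join settings into a TMVA-style option string.

    Booleans become "Name" (True) or "!Name" (False); anything else
    becomes "Name=value". Illustrative helper, not a TMVA API.
    """
    parts = []
    for name, value in settings.items():
        if isinstance(value, bool):
            parts.append(name if value else "!" + name)
        else:
            parts.append(f"{name}={value}")
    return ":".join(parts)


# Same flags as in the Factory declaration of this notebook.
factory_options = build_options({
    "V": False,
    "ROC": True,
    "Silent": False,
    "Color": True,
    "DrawProgressBar": False,
    "AnalysisType": "Classification",
})
print(factory_options)
```

Setting `"Silent": True` in the dict would drop the "!" and, as described above, suppress the TMVA output.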

In [2]:
ROOT.TMVA.Tools.Instance()
## For PYMVA methods
TMVA.PyMethodBase.PyInitialize();


outputFile = ROOT.TFile.Open("TestClassificationOutput.root", "RECREATE")

factory = ROOT.TMVA.Factory("Test_TMVA_Classification", outputFile,
                      "!V:ROC:!Silent:Color:!DrawProgressBar:AnalysisType=Classification" )

## Input Data

We now define the input data files and retrieve the ROOT TTree objects holding the signal and background input events.

In [3]:
inputData = ROOT.TFile.Open( "data_ntuple-Copy1.root" )
inputMC = ROOT.TFile.Open( "MC_ntuple-Copy1.root" )
# inputMC.ls()
# retrieve input trees
signalTree     = inputMC.Get("ntupleTree")
backgroundTree = inputData.Get("ntupleTree")

signalTree.Print()

******************************************************************************
*Tree    :ntupleTree: ntupleTree                                             *
*Entries :    33027 : Total =        84839065 bytes  File  Size =    8394135 *
*        :          : Tree compression factor =  10.13                       *
******************************************************************************
*Br    0 :hltNames  : vector<string>                                         *
*Entries :    33027 : Total  Size=   60024005 bytes  File Size  =    3025026 *
*Baskets :     1943 : Basket Size=      32000 bytes  Compression=  19.83     *
*............................................................................*
*Br    1 :hltResults : vector<bool>                                          *
*Entries :    33027 : Total  Size=    2851085 bytes  File Size  =     342595 *
*Baskets :       94 : Basket Size=      32000 bytes  Compression=   8.32     *
*...................................................

## Declare DataLoader(s)

The next step is to declare the DataLoader class, which deals with the input data and variables.

We first add the signal and background trees to the data loader and then
define the input variables to be used for the MVA training.
Note that you may also use variable expressions that can be parsed by TTree::Draw( "expression" ).

In [4]:
loader = ROOT.TMVA.DataLoader("testDataset")

### global event weights per tree (see below for setting event-wise weights)
signalWeight     = 1.0
backgroundWeight = 1.0
   
### You can add an arbitrary number of signal or background trees
loader.AddSignalTree    ( signalTree,     signalWeight     )
loader.AddBackgroundTree( backgroundTree, backgroundWeight )

## Define input variables 
loader.AddVariable("B_Eta")
# loader.AddVariable("Kstar_Mass ")

DataSetInfo              : [testDataset] : Added class "Signal"
                         : Add Tree ntupleTree of type Signal with 33027 events
DataSetInfo              : [testDataset] : Added class "Background"
                         : Add Tree ntupleTree of type Background with 943 events


## Setup Dataset(s)

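Set up the DataLoader by splitting the events into training and test samples.
Here we use a random split and a fixed number of training events. Note that the option string in the cell below requests all 33027 signal and 943 background events for training, so the test sample ends up empty, as the "Signal -- testing events: 0" line in the booking output shows.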
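
One way to keep some events for testing is to request only a fraction of each tree for training. The sketch below is a hedged illustration, not the notebook's own configuration: the helper `split_options` and the 80% training fraction are assumptions. It leaves out the test counts, relying on TMVA's documented default of nTest_Signal=0 and nTest_Background=0, which should assign all remaining events to the test sample.

```python
def split_options(n_signal, n_background, train_fraction=0.8):
    """Build a PrepareTrainingAndTestTree option string (illustrative helper).

    train_fraction is an assumed value chosen for this sketch.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    n_train_sig = int(n_signal * train_fraction)
    n_train_bkg = int(n_background * train_fraction)
    return (
        f"nTrain_Signal={n_train_sig}:nTrain_Background={n_train_bkg}:"
        "SplitMode=Random:NormMode=NumEvents:!V"
    )


# Event counts taken from the DataLoader output above.
split_string = split_options(33027, 943)
print(split_string)
```

The resulting string could be passed as the third argument of `loader.PrepareTrainingAndTestTree(mycuts, mycutb, split_string)`.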


In [5]:
## Apply additional cuts on the signal and background samples (can be different)
mycuts = ROOT.TCut("")   ## for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
mycutb = ROOT.TCut("")   ## for example: TCut mycutb = "abs(var1)<0.5";


loader.PrepareTrainingAndTestTree( mycuts, mycutb,
                                  "nTrain_Signal=33027:nTrain_Background=943:SplitMode=Random:"
                                   "NormMode=NumEvents:!V" )

## Booking Methods

Here we book the TMVA methods: a BDT and a standard MLP (shallow NN).

In [6]:
## Boosted Decision Trees
factory.BookMethod(loader,ROOT.TMVA.Types.kBDT, "BDT",
                   "!V:NTrees=200:MinNodeSize=2.5%:MaxDepth=2:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:"
                   "BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20" )

## Multi-Layer Perceptron (Neural Network)
factory.BookMethod(loader, ROOT.TMVA.Types.kMLP, "MLP",
                   "!H:!V:NeuronType=tanh:VarTransform=N:NCycles=100:HiddenLayers=N+5:TestRate=5:!UseRegulator" );

Factory                  : Booking method: BDT
                         : 
                         : Building event vectors for type 2 Signal
                         : Dataset[testDataset] :  create input formulas for tree ntupleTree
                         : Building event vectors for type 2 Background
                         : Dataset[testDataset] :  create input formulas for tree ntupleTree
DataSetFactory           : [testDataset] : Number of events in input trees
                         : 
                         : 
                         : Number of training and testing events
                         : ---------------------------------------------------------------------------
                         : Signal     -- training events            : 33027
                         : Signal     -- testing events             : 0
                         : Signal     -- training and testing events: 33027
                         : Background -- training events            

## Train Methods

In [7]:
factory.TrainAllMethods();

Factory                  : Train all methods
Factory                  : [testDataset] : Create Transformation "I" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'B_Eta' <---> Output : variable 'B_Eta'
TFHandler_Factory        : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :    B_Eta:  0.0032568     1.2551   [    -2.4299     2.4944 ]
                         : -----------------------------------------------------------
                         : Ranking input variables (method unspecific)...
IdTransformation         : Ranking result (top variable is best ranked)
                         : ------------------------------
                         : Rank : Variable  : Separation
                         : ------------------------------
    

## Test all methods

Here we test all methods using the test data set

In [8]:
factory.TestAllMethods();   

Factory                  : Test all methods
Factory                  : Test method: BDT for Classification performance
                         : 
BDT                      : [testDataset] : Evaluation of BDT on testing sample (0 events)
                         : Elapsed time for evaluation of 0 events: 5.96e-06 sec       
Factory                  : Test method: MLP for Classification performance
                         : 
MLP                      : [testDataset] : Evaluation of MLP on testing sample (0 events)
                         : Elapsed time for evaluation of 0 events: 3.1e-06 sec       


## Evaluate all methods

Here we evaluate all methods and compare their performance, computing efficiencies, ROC curves, etc., using both the training and testing data sets. Several histograms are produced, which can be examined with the TMVAGui or directly in the output file.

In [9]:
factory.EvaluateAllMethods();

runtime_error: void TMVA::Factory::EvaluateAllMethods() =>
    runtime_error: FATAL error

Factory                  : Evaluate all methods
Factory                  : Evaluate classifier: BDT
                         : 
BDT                      : [testDataset] : Loop over test events and fill histograms with classifier response...
                         : 
<FATAL>                         : Number of entries <= 0 (0 in histogram: MVA_BDT_S)
***> abort program execution


Error in <TMVA::Tools::Mean>: sum of weights <= 0 ?! that's a bit too much of negative event weights :) 
Error in <TMVA::Tools::Mean>: sum of weights <= 0 ?! that's a bit too much of negative event weights :) 


## Plot ROC Curve
We enable JavaScript visualisation for the plots

In [None]:
%jsroot on

In [None]:
c1 = factory.GetROCCurve(loader);
c1.Draw();
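
For intuition about what a ROC curve summarises, the sketch below computes signal efficiency against background rejection over a scan of thresholds, plus the area under the curve, in plain Python. The scores are made-up toy values, not output of the classifiers trained here, and the function names `roc_points` and `area_under_curve` are hypothetical.

```python
def roc_points(signal_scores, background_scores):
    """Return (signal efficiency, background rejection) pairs.

    Each distinct score is used as a threshold; an event is selected
    when its score is >= the threshold. Points are ordered by
    increasing threshold, i.e. decreasing signal efficiency.
    """
    thresholds = sorted(set(signal_scores) | set(background_scores))
    points = [(1.0, 0.0)]  # threshold below all scores: everything selected
    for t in thresholds:
        eff = sum(s >= t for s in signal_scores) / len(signal_scores)
        rej = sum(b < t for b in background_scores) / len(background_scores)
        points.append((eff, rej))
    points.append((0.0, 1.0))  # threshold above all scores: nothing selected
    return points


def area_under_curve(points):
    """Trapezoidal integral of background rejection over signal efficiency."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x0 - x1) * (y0 + y1) / 2.0
    return area


# Toy classifier outputs (invented for illustration only).
toy_signal = [0.9, 0.8, 0.7, 0.4]
toy_background = [0.6, 0.3, 0.2, 0.1]

points = roc_points(toy_signal, toy_background)
auc = area_under_curve(points)
print(f"AUC = {auc:.4f}")
```

An area close to 1 means signal and background scores are well separated, while 0.5 corresponds to random guessing.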


#### Close the output file to save all output information (evaluation results of the methods)

In [None]:
##outputFile.Close();