<img src="http://oproject.org/tiki-download_file.php?fileId=8&display&x=450&y=128">
<img src="http://files.oproject.org/tmvalogo.png" height="50%" width="50%">

# TMVA Classification Example Using a Convolutional Neural Network

## Declare Factory

Create the Factory class. Later you can choose the methods
whose performance you'd like to investigate. 

The factory is the main TMVA object you interact with. It takes the following arguments:

 - The first argument is the base name for all the output weight files, which store the method parameters and are created in the directory weight/

 - The second argument is the output file for the training results

 - The third argument is an option string defining the general configuration of the TMVA session. For example, all TMVA output can be suppressed by removing the "!" (not) in front of the "Silent" argument in the option string.
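
The option string is a colon-separated list in which a boolean flag is switched off by a leading "!". As a minimal sketch, the string passed to the factory in the cell below could be assembled from a list of pairs. `build_option_string` is a hypothetical helper written for this illustration; it is not part of TMVA.

```python
def build_option_string(options):
    """Join (name, value) pairs into a TMVA-style option string.

    Hypothetical helper for illustration only, not a TMVA function.
    True  -> "Name", False -> "!Name", anything else -> "Name=value".
    """
    parts = []
    for name, value in options:
        if value is True:
            parts.append(name)
        elif value is False:
            parts.append("!" + name)
        else:
            parts.append("{}={}".format(name, value))
    return ":".join(parts)


factory_options = build_option_string([
    ("V", False),
    ("ROC", True),
    ("Silent", False),
    ("Color", True),
    ("DrawProgressBar", False),
    ("AnalysisType", "Classification"),
    ("Correlations", False),
])
print(factory_options)
```

Changing `("Silent", False)` to `("Silent", True)` drops the "!" and would therefore suppress the TMVA output, as described above.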

In [1]:
import ROOT
from ROOT import TMVA
import os 


Welcome to JupyROOT 6.15/01


In [2]:
ROOT.TMVA.Tools.Instance()

## For PyMVA methods
TMVA.PyMethodBase.PyInitialize()


outputFile = ROOT.TFile.Open("CaloImages_ClassificationOutput.root", "RECREATE")

factory = ROOT.TMVA.Factory("TMVA_CaloImages_Classification", outputFile,
                      "!V:ROC:!Silent:Color:!DrawProgressBar:AnalysisType=Classification:!Correlations" )

## Declare DataLoader(s)

The next step is to declare the DataLoader class, which handles the input variables.

Define the input variables that will be used for the MVA training. Note that you may also use variable expressions, which are parsed in the same way as in TTree::Draw("expression").

In this case the input data consists of images of 32x32 pixels, that is 1024 values per event. Each pixel is a branch in a ROOT TTree.
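
To make the pixel-to-branch correspondence concrete, here is a small sketch that maps a pixel position to a branch name of the form `Energy<i>`, the naming visible in the tree printout below. The image side of 32 is taken from the `InputLayout=1|32|32` string used later in this notebook; the row-major ordering of the pixels is an assumption made for illustration, and both helper functions are hypothetical.

```python
IMAGE_SIDE = 32  # from the "InputLayout=1|32|32" string used later in this notebook


def branch_name(row, col, side=IMAGE_SIDE):
    """Return the branch name of pixel (row, col), ASSUMING row-major ordering."""
    if not (0 <= row < side and 0 <= col < side):
        raise ValueError("pixel ({}, {}) is outside the image".format(row, col))
    return "Energy{}".format(row * side + col)


def pixel_position(name, side=IMAGE_SIDE):
    """Inverse of branch_name: map "Energy<i>" back to (row, col)."""
    index = int(name[len("Energy"):])
    return divmod(index, side)


all_branches = [branch_name(r, c) for r in range(IMAGE_SIDE) for c in range(IMAGE_SIDE)]
print(len(all_branches), all_branches[0], all_branches[-1])
```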

In [3]:
inputFileName = "CaloImages_data.root"

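## To read a local copy instead: inputFile = ROOT.TFile.Open(inputFileName)
## Download the file once and read it from a cache in the current directory
ROOT.TFile.SetCacheFileDir(".")
inputWebFileName = "http://www.cern.ch/moneta/root/" + inputFileName
inputFile = ROOT.TFile.Open(inputWebFileName, "CACHEREAD")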
inputFile.ls()



TFile**		./moneta/root/CaloImages_data.root	
 TFile*		./moneta/root/CaloImages_data.root	
  KEY: TTree	sig_tree;1	Signal Tree
  KEY: TTree	bkg_tree;1	Background Tree


Info in <TFile::OpenFromCache>: using local cache copy of http://www.cern.ch/moneta/root/CaloImages_data.root [./moneta/root/CaloImages_data.root]


In [4]:
# retrieve input trees

signalTree     = inputFile.Get("sig_tree")
backgroundTree = inputFile.Get("bkg_tree")

signalTree.Print()

******************************************************************************
*Tree    :sig_tree  : Signal Tree                                            *
*Entries :    20000 : Total =        82781882 bytes  File  Size =    5752320 *
*        :          : Tree compression factor =  14.29                       *
******************************************************************************
*Br    0 :Energy0   : Energy0/F                                              *
*Entries :    20000 : Total  Size=      80829 bytes  File Size  =        484 *
*Baskets :        2 : Basket Size=      32000 bytes  Compression= 132.22     *
*............................................................................*
*Br    1 :Energy1   : Energy1/F                                              *
*Entries :    20000 : Total  Size=      80829 bytes  File Size  =        479 *
*Baskets :        2 : Basket Size=      32000 bytes  Compression= 133.60     *
*...................................................

In [5]:
loader = ROOT.TMVA.DataLoader("dataset")

### global event weights per tree (see below for setting event-wise weights)
signalWeight     = 1.0
backgroundWeight = 1.0
   
### You can add an arbitrary number of signal or background trees
loader.AddSignalTree    ( signalTree,     signalWeight     )
loader.AddBackgroundTree( backgroundTree, backgroundWeight )

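### One input variable per pixel: 32 * 32 = 1024, matching the layouts declared for the networks below
imgSize = 32 * 32
for i in range(imgSize):
    varName = "Energy" + str(i)
    loader.AddVariable(varName, 'F')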


DataSetInfo              : [dataset] : Added class "Signal"
                         : Add Tree sig_tree of type Signal with 20000 events
DataSetInfo              : [dataset] : Added class "Background"
                         : Add Tree bkg_tree of type Background with 20000 events


## Setup Dataset(s)

Apply optional selection cuts to the signal and background trees and split the events into training and test samples.
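
The split requested in the next cell can be checked with simple arithmetic. The sketch below parses the same option string and derives the number of test events per class, using the 20000 entries reported for each tree above. `parse_options` is a hypothetical helper, not a TMVA function, and it only handles flat option strings like this one.

```python
def parse_options(option_string):
    """Split a flat, colon-separated option string into a dict.

    Hypothetical helper for illustration. "Key=value" tokens keep the value
    as a string; bare flags map to True, or to False when prefixed by "!".
    """
    options = {}
    for token in option_string.split(":"):
        if not token:
            continue
        if "=" in token:
            key, value = token.split("=", 1)
            options[key] = value
        elif token.startswith("!"):
            options[token[1:]] = False
        else:
            options[token] = True
    return options


split_options = parse_options(
    "nTrain_Signal=5000:nTrain_Background=5000:SplitMode=Random:NormMode=NumEvents:!V"
)

events_per_tree = 20000  # entries in sig_tree and in bkg_tree, as printed above
n_train = int(split_options["nTrain_Signal"])
n_test = events_per_tree - n_train  # no test size is requested, so the rest is left for testing
print(n_train, n_test)
```

The resulting 5000 training and 15000 testing events per class agree with the table printed by `TrainAllMethods()` further down.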

In [6]:
## Apply additional cuts on the signal and background samples (can be different)
mycuts = ROOT.TCut("")   ## for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
mycutb = ROOT.TCut("")   ## for example: TCut mycutb = "abs(var1)<0.5";


loader.PrepareTrainingAndTestTree( mycuts, mycutb,
                                  "nTrain_Signal=5000:nTrain_Background=5000:SplitMode=Random:"
                                   "NormMode=NumEvents:!V" )

## Booking Methods

Here we book the TMVA methods: a dense DNN and a CNN from TMVA, and a CNN defined in Keras.

### Booking Deep Neural Network

Here we book the new TMVA DNN. If you are using the ROOT master version, you can use the new DL method (`kDL`), as done below.
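
To get a feeling for the size of the dense network booked in the next cell, the following sketch reads the layer widths from the `Layout` string and counts the trainable parameters. It assumes that every DENSE layer has a full weight matrix plus one bias per node, and that the network receives the 1024 inputs declared in `InputLayout=1|1|1024`. Both helper functions are hypothetical.

```python
def dense_layer_widths(layout_string):
    """Extract the widths of the DENSE layers from a "Layout=..." string."""
    spec = layout_string.split("=", 1)[1]
    widths = []
    for layer in spec.split(","):
        kind, width, _activation = layer.split("|")
        if kind != "DENSE":
            raise ValueError("this sketch only handles DENSE layers")
        widths.append(int(width))
    return widths


def count_parameters(n_inputs, widths):
    """Weights plus biases of a fully connected network (assumed structure)."""
    total = 0
    for width in widths:
        total += n_inputs * width + width
        n_inputs = width
    return total


layout = "Layout=DENSE|64|TANH,DENSE|64|TANH,DENSE|64|TANH,DENSE|64|TANH,DENSE|1|LINEAR"
widths = dense_layer_widths(layout)
n_parameters = count_parameters(1024, widths)
print(widths, n_parameters)
```

Under these assumptions the first layer alone accounts for 65600 of the 78145 parameters, because it connects all 1024 pixels to 64 nodes.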

In [7]:
inputLayoutString = "InputLayout=1|1|1024"
batchLayoutString = "BatchLayout=1|32|1024"
layoutString = "Layout=DENSE|64|TANH,DENSE|64|TANH,DENSE|64|TANH,DENSE|64|TANH,DENSE|1|LINEAR"

training1  = "Optimizer=ADAM,LearningRate=1e-3,Momentum=0.,Regularization=None,WeightDecay=1e-4,"
training1 += "DropConfig=0.+0.+0.+0.,MaxEpochs=30,ConvergenceSteps=10,BatchSize=32,TestRepetitions=1"
trainingStrategyString = "TrainingStrategy=" + training1


dnnOptions = "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=None:WeightInitialization=XAVIER:Architecture=CPU"

dnnOptions +=  ":" + inputLayoutString
dnnOptions +=  ":" + batchLayoutString
dnnOptions +=  ":" + layoutString
dnnOptions +=  ":" + trainingStrategyString

## We can now book the method
factory.BookMethod(loader, ROOT.TMVA.Types.kDL, "DL_DENSE", dnnOptions)


<ROOT.TMVA::MethodDL object ("DL_DENSE") at 0x9504350>

Factory                  : Booking method: DL_DENSE
                         : 
                         : Parsing option string: 
                         : ... "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=G:WeightInitialization=XAVIER::Architecture=CPU:InputLayout=1|1|1024:BatchLayout=1|32|1024:Layout=DENSE|64|TANH,DENSE|64|TANH,DENSE|64|TANH,DENSE|64|TANH,DENSE|1|LINEAR:TrainingStrategy=Optimizer=ADAM,LearningRate=1e-3,Momentum=0.,Regularization=None,WeightDecay=1e-4,DropConfig=0.+0.+0.+0.,MaxEpochs=30,ConvergenceSteps=10,BatchSize=32,TestRepetitions=1"
                         : The following options are set:
                         : - By User:
                         :     <none>
                         : - Default:
                         :     Boost_num: "0" [Number of times the classifier will be boosted]
                         : Parsing option string: 
                         : ... "!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=G:WeightInitialization=XAVIER::Ar

### Book Convolutional Neural Network in TMVA
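
Before booking the CNN it helps to follow the image size through the layers. The sketch below applies the standard output-size formula to the layout used in the next cell. It assumes that `CONV|10|3|3|1|1|1|1|RELU` reads as 10 filters of 3x3 with stride 1 and zero-padding 1, and that `MAXPOOL|2|2|1|1` reads as a 2x2 window with stride 1; this reading of the field order is an assumption, so check it against the TMVA documentation. For comparison it also evaluates a 2x2 pooling with stride 2, as used by the Keras model booked later in this notebook.

```python
def output_size(size, kernel, stride, padding=0):
    """Spatial size after a convolution or pooling step (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 32   # from "InputLayout=1|32|32"
depth = 10  # number of filters in each CONV layer

# Two CONV layers, read here as 3x3 filters with stride 1 and zero-padding 1
# (an assumption about the field order of the layout string).
for _ in range(2):
    side = output_size(side, kernel=3, stride=1, padding=1)
conv_side = side

# MAXPOOL|2|2|1|1, read here as a 2x2 window with stride 1 (same assumption).
tmva_pool_side = output_size(conv_side, kernel=2, stride=1)

# Keras MaxPooling2D(pool_size=(2, 2)) uses a stride equal to the pool size.
keras_pool_side = output_size(conv_side, kernel=2, stride=2)

flat_tmva = depth * tmva_pool_side ** 2
flat_keras = depth * keras_pool_side ** 2
print(conv_side, tmva_pool_side, keras_pool_side, flat_tmva, flat_keras)
```

Under this reading the TMVA network flattens to 9610 values before the dense layers, while the Keras model flattens to 2560, the value shown for `flatten_1` in its summary. The two CNNs are therefore similar but not identical.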

In [8]:
## Input layout
inputLayoutString = "InputLayout=1|32|32"

## Batch layout
batchLayoutString = "BatchLayout=32|1|1024"

layoutString = ("Layout=CONV|10|3|3|1|1|1|1|RELU,CONV|10|3|3|1|1|1|1|RELU,MAXPOOL|2|2|1|1,"
                "RESHAPE|FLAT,DENSE|64|TANH,DENSE|1|LINEAR")

## Training strategy
training1 = ("LearningRate=1e-3,Momentum=0.9,Repetitions=1,"
             "ConvergenceSteps=10,BatchSize=32,TestRepetitions=1,"
             "MaxEpochs=20,WeightDecay=1e-4,Regularization=None,"
             "Optimizer=ADAM,DropConfig=0.0+0.0+0.0+0.0")

trainingStrategyString = "TrainingStrategy=" + training1

## General options
cnnOptions = ("!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=None:"
              "WeightInitialization=XAVIERUNIFORM")

cnnOptions +=  ":" + inputLayoutString
cnnOptions +=  ":" + batchLayoutString
cnnOptions +=  ":" + layoutString
cnnOptions +=  ":" + trainingStrategyString
cnnOptions +=  ":Architecture=CPU"

## Book the CNN
factory.BookMethod(loader, ROOT.TMVA.Types.kDL, "DL_CNN", cnnOptions)


### Book Convolutional Neural Network in Keras using a generated model 
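
The parameter counts that `model.summary()` reports for the model built below can be reproduced by hand. The following sketch uses the usual formulas for convolutional and dense layers with a bias term; the helper functions are hypothetical and written only for this check.

```python
def conv2d_parameters(kernel_h, kernel_w, in_channels, filters):
    """Weights of all filters plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_parameters(n_inputs, n_outputs):
    """Full weight matrix plus one bias per output node."""
    return n_inputs * n_outputs + n_outputs


# Shapes follow the model definition: a 32x32x1 input, two 3x3 convolutions
# with 10 filters, a 2x2 max pooling (32 -> 16), then Dense(64) and Dense(2).
parameter_counts = {
    "conv2d_1": conv2d_parameters(3, 3, 1, 10),
    "conv2d_2": conv2d_parameters(3, 3, 10, 10),
    "dense_1": dense_parameters(16 * 16 * 10, 64),
    "dense_2": dense_parameters(64, 2),
}
for layer, count in parameter_counts.items():
    print(layer, count)
```

Almost all parameters sit in `dense_1`, which connects the 2560 flattened values to 64 nodes.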

In [9]:
## to use tensorflow backend
import os
os.environ["KERAS_BACKEND"] = "tensorflow"

In [10]:
from keras.models import Sequential
from keras.optimizers import Adam, SGD
#from keras.initializers import TruncatedNormal
#from keras import initializations
from keras.layers import Input, Dense, Dropout, Flatten, Conv2D, MaxPooling2D, Reshape
#from keras.callbacks import ReduceLROnPlateau

Using TensorFlow backend.


In [11]:
model = Sequential()
model.add(Reshape((32,32, 1), input_shape=(1024,)))
model.add(Conv2D(10, kernel_size=(3,3), kernel_initializer='glorot_normal', activation='relu', padding='same' ) )
model.add(Conv2D(10, kernel_size=(3,3), kernel_initializer='glorot_normal', activation='relu', padding='same' ) )
#stride for maxpool is equal to pool size
model.add(MaxPooling2D(pool_size=(2, 2) ))
#model.add(Conv2D(10, activation='relu', kernel_size=(3,3), padding='same', kernel_initializer='glorot_normal'))
#model.add(Conv2D(10, activation='relu', kernel_size=(3,3), padding='same', kernel_initializer='glorot_normal'))
#model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
#model.add(Dropout(0.2))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.001), metrics=['accuracy'])
model.save('model_cnn.h5')
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
reshape_1 (Reshape)          (None, 32, 32, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 10)        100       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 10)        910       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 10)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 2560)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                163904    
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 130       
Total para

2018-09-20 13:04:38.632711: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA


In [12]:
kerasOption = ("H:!V:VarTransform=None:FilenameModel=model_cnn.h5:"
               "FilenameTrainedModel=trained_model_cnn.h5:NumEpochs=20:BatchSize=32")
factory.BookMethod(loader, ROOT.TMVA.Types.kPyKeras,"PyKeras",kerasOption)

<ROOT.TMVA::MethodPyKeras object ("PyKeras") at 0xd4b7c00>

Factory                  : Booking method: PyKeras
                         : 
                         : Load model from file: model_cnn.h5


## Train Methods

In [None]:
## Retrieve the dataset first, with logging inhibited, to suppress the output produced during data retrieval
ROOT.TMVA.MsgLogger.InhibitOutput()
loader.GetDefaultDataSetInfo().GetDataSet()
ROOT.TMVA.MsgLogger.EnableOutput()

In [13]:
factory.TrainAllMethods()

Exception: void TMVA::Factory::TrainAllMethods() =>
    vector::_M_range_check: __n (which is 64) >= this->size() (which is 64) (C++ exception of type out_of_range)

Factory                  : Train all methods
DataSetFactory           : [dataset] : Number of events in input trees
                         : 
                         : 
                         : Number of training and testing events
                         : ---------------------------------------------------------------------------
                         : Signal     -- training events            : 5000
                         : Signal     -- testing events             : 15000
                         : Signal     -- training and testing events: 20000
                         : Background -- training events            : 5000
                         : Background -- testing events             : 15000
                         : Background -- training and testing events: 20000
                         : 
DataSetInfo              : Correlation matrix (Signal):
                         : ---------------------------------------------------------------------------------------

Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth only supported for histograms with >= 3 bins. Nbins = 2
Error in <TH1F::Smooth>: Smooth 

## Test and Evaluate Methods

In [None]:
factory.TestAllMethods()

In [None]:
ROOT.TMVA.MsgLogger.InhibitOutput()
factory.EvaluateAllMethods()
ROOT.TMVA.MsgLogger.EnableOutput()

## Plot ROC Curve
We enable JavaScript visualisation for the plots and draw the ROC curves of the booked methods.
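
As a reminder of what a ROC curve contains, here is a minimal pure-Python sketch that scans a threshold over classifier scores and integrates the resulting curve. The scores are toy values invented for this illustration, not results from this notebook, and the axis convention (signal efficiency versus background efficiency) may differ from the one used in the TMVA plot.

```python
def roc_points(signal_scores, background_scores):
    """Return (background efficiency, signal efficiency) pairs, one per threshold."""
    thresholds = sorted(set(signal_scores) | set(background_scores), reverse=True)
    points = [(0.0, 0.0)]
    for threshold in thresholds:
        sig_eff = sum(s >= threshold for s in signal_scores) / len(signal_scores)
        bkg_eff = sum(b >= threshold for b in background_scores) / len(background_scores)
        points.append((bkg_eff, sig_eff))
    return points


def area_under_curve(points):
    """Trapezoidal integral of signal efficiency over background efficiency."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores, invented for illustration only
toy_signal = [0.9, 0.8, 0.7, 0.4]
toy_background = [0.6, 0.3, 0.2, 0.1]

toy_points = roc_points(toy_signal, toy_background)
toy_auc = area_under_curve(toy_points)
print(toy_auc)
```

An area of 1 corresponds to perfect separation and 0.5 to random guessing; in the toy example 15 of the 16 signal-background pairs are ordered correctly, which gives 0.9375.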

In [None]:
%jsroot on

In [None]:
c1 = factory.GetROCCurve(loader)
c1.Draw()


In [None]:
## Close the output file so that its contents are saved
outputFile.Close()