# TMVA Cross Validation Regression
This macro provides an example of how to use TMVA for k-folds cross
evaluation.

The input data is a toy-MC sample consisting of two Gaussian
distributions.

The output file "TMVARegCv.root" can be analysed with dedicated
macros (simply run `root -l <macro.C>`), which can be conveniently
invoked through a GUI that appears at the end of the run of this macro.
Launch the GUI via the command:

```
root -l -e 'TMVA::TMVAGui("TMVARegCv.root")'
```

## Cross Evaluation
Cross evaluation is a special case of k-folds cross validation where the
splitting into k folds is computed deterministically. This ensures that a
given event will always end up in the same fold.

In addition all resulting classifiers are saved and can be applied to new
data using `MethodCrossValidation`. One requirement for this to work is a
splitting function that is evaluated for each event to determine into what
fold it goes (for training/evaluation) or to what classifier (for
application).
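
The routing of an event to a classifier at application time can be pictured with a small plain C++ sketch. It assumes that one trained model per fold is stored and that the fold index is computed from an event ID. The names `Model` and `applyCrossEvaluated` are hypothetical and do not come from TMVA; `MethodCrossValidation` handles this step internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a trained regression model: maps one input to a prediction.
using Model = std::function<double(double)>;

// Pick the model that belongs to the event's fold and evaluate it.
// The fold index is computed the same way for training and application,
// so a given event is always routed to the same model.
double applyCrossEvaluated(const std::vector<Model> &foldModels, double eventID, double var1) {
   assert(!foldModels.empty());
   const int numFolds = static_cast<int>(foldModels.size());
   const int fold = static_cast<int>(std::fabs(eventID)) % numFolds;
   return foldModels[static_cast<std::size_t>(fold)](var1);
}
```

With two stored models, an event with an even ID is always evaluated by the first model and an event with an odd ID by the second.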

## Split Expression
Cross evaluation partitions the data into folds using a deterministic
rule called the split expression. The expression can be any valid
`TFormula` as long as all the variables it uses are defined.

For each event the split expression is evaluated to a number and the event
is put in the fold corresponding to that number.

It is recommended to always use `%int([NumFolds])` at the end of the
expression, which keeps the result of a non-negative expression within the
range of valid fold numbers.

The split expression has access to all spectators and variables defined in
the dataloader. Additionally, the number of folds in the split can be
accessed with `NumFolds` (or `numFolds`).

### Example
 ```
 "int(fabs([eventID]))%int([NumFolds])"
 ```
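
To make the example concrete, the following sketch reproduces the arithmetic of this split expression in plain C++. The helpers `foldIndex` and `foldSizes` are hypothetical names used for illustration only; in TMVA the expression is evaluated as a `TFormula`, not through code like this.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mirrors "int(fabs([eventID]))%int([NumFolds])" for a single event.
// Hypothetical helper, not part of TMVA.
int foldIndex(double eventID, int numFolds) {
   assert(numFolds > 0);
   return static_cast<int>(std::fabs(eventID)) % numFolds;
}

// Count how many of the given event IDs land in each fold.
std::vector<int> foldSizes(const std::vector<double> &eventIDs, int numFolds) {
   std::vector<int> sizes(static_cast<std::size_t>(numFolds), 0);
   for (double id : eventIDs) {
      ++sizes[static_cast<std::size_t>(foldIndex(id, numFolds))];
   }
   return sizes;
}
```

For instance, `foldIndex(7.0, 2)` gives 1 and `foldIndex(-4.0, 2)` gives 0. Taking the absolute value first keeps the result non-negative, and the modulo keeps it below the number of folds.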

- Project   : TMVA - a ROOT-integrated toolkit for multivariate data analysis
- Package   : TMVA
- Root Macro: TMVACrossValidationRegression



**Author:** Kim Albertsson (adapted from code originally by Andreas Hoecker)  
<i><small>This notebook tutorial was automatically generated with <a href= "https://github.com/root-project/root/blob/master/documentation/doxygen/converttonotebook.py">ROOTBOOK-izer</a> from the macro found in the ROOT repository  on Thursday, August 29, 2019 at 03:48 AM.</small></i>

In [1]:
%%cpp -d
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>

#include "TChain.h"
#include "TFile.h"
#include "TTree.h"
#include "TString.h"
#include "TObjString.h"
#include "TSystem.h"
#include "TROOT.h"

#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Tools.h"
#include "TMVA/TMVAGui.h"
#include "TMVA/CrossValidation.h"

A helper function opens the input file, falling back to a download from the ROOT server if no local copy exists:

In [2]:
%%cpp -d
TFile * getDataFile(TString fname) {
   TFile *input = nullptr;

   if (!gSystem->AccessPathName(fname)) {
      // AccessPathName returns false when the path is accessible: open the local file
      input = TFile::Open(fname);
   } else {
      // otherwise download from the ROOT server and cache the file locally
      TFile::SetCacheFileDir(".");
      input = TFile::Open("http://root.cern.ch/files/tmva_reg_example.root", "CACHEREAD");
   }

   if (!input) {
      std::cout << "ERROR: could not open data file " << fname << std::endl;
      exit(1);
   }

   return input;
}

This loads the TMVA library.

In [3]:
TMVA::Tools::Instance();

--------------------------------------------------------------------------

Create a ROOT output file where TMVA will store ntuples, histograms, etc.

In [4]:
TString outfileName("TMVARegCv.root");
TFile * outputFile = TFile::Open(outfileName, "RECREATE");

TString infileName("./files/tmva_reg_example.root");
TFile * inputFile = getDataFile(infileName);

TMVA::DataLoader *dataloader=new TMVA::DataLoader("dataset");

dataloader->AddVariable("var1", "Variable 1", "units", 'F');
dataloader->AddVariable("var2", "Variable 2", "units", 'F');

Info in <TFile::OpenFromCache>: using local cache copy of http://root.cern.ch/files/tmva_reg_example.root [./files/tmva_reg_example.root]


Add the variable carrying the regression target

In [5]:
dataloader->AddTarget("fvalue");

TTree * regTree = (TTree*)inputFile->Get("TreeR");
dataloader->AddRegressionTree(regTree, 1.0);

DataSetInfo              : [dataset] : Added class "Regression"
                         : Add Tree TreeR of type Regression with 10000 events


Individual events can be weighted, for example with
 `dataloader->SetWeightExpression("weight", "Regression");`

In [6]:
std::cout << "--- TMVACrossValidationRegression: Using input file: " << inputFile->GetName() << std::endl;

--- TMVACrossValidationRegression: Using input file: ./files/tmva_reg_example.root


This bypasses the normal splitting mechanism, since cross validation uses
 its own system for this. Unfortunately, the old system is unhappy if the
 test set is left empty, so we ensure that it contains at least one event
 by placing the first event in it.
 With the selection cut you can place a global cut on the defined
 variables. Only events passing the cut will be used in training/testing.
 Example: `TCut selectionCut = "var1 < 1";`

In [7]:
TCut selectionCut = "";
dataloader->PrepareTrainingAndTestTree(selectionCut, "nTest_Regression=1"
                                                     ":SplitMode=Block"
                                                     ":NormMode=NumEvents"
                                                     ":!V");

                         : Dataset[dataset] : Class index : 0  name : Regression


--------------------------------------------------------------------------

 This sets up a CrossValidation class (which wraps a TMVA::Factory
 internally) for 2-fold cross validation. The data will be split into the
 two folds randomly if `splitExpr` is `""`.
 
 One can also give a deterministic split using spectator variables, for
 example `"int(fabs([spec1]))%int([NumFolds])"`.

In [8]:
UInt_t numFolds = 2;
TString analysisType = "Regression";
TString splitExpr = "";

TString cvOptions = Form("!V"
                         ":!Silent"
                         ":ModelPersistence"
                         ":!FoldFileOutput"
                         ":AnalysisType=%s"
                         ":NumFolds=%i"
                         ":SplitExpr=%s",
                         analysisType.Data(), numFolds, splitExpr.Data());

TMVA::CrossValidation cv{"TMVACrossValidationRegression", dataloader, outputFile, cvOptions};
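
To see what option string the `Form` call above expands to, the following plain C++ sketch builds the same colon-separated string with `std::snprintf` as a stand-in for ROOT's `Form`. The function name `buildCvOptions` and the buffer size are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Builds the colon-separated option string passed to the CrossValidation
// constructor. Hypothetical helper; std::snprintf replaces ROOT's Form here.
std::string buildCvOptions(const std::string &analysisType, unsigned numFolds,
                           const std::string &splitExpr) {
   assert(numFolds > 0);
   char buffer[512];
   const int written = std::snprintf(buffer, sizeof(buffer),
                                     "!V"
                                     ":!Silent"
                                     ":ModelPersistence"
                                     ":!FoldFileOutput"
                                     ":AnalysisType=%s"
                                     ":NumFolds=%u"
                                     ":SplitExpr=%s",
                                     analysisType.c_str(), numFolds, splitExpr.c_str());
   // Guard against formatting errors and truncation.
   assert(written >= 0 && static_cast<std::size_t>(written) < sizeof(buffer));
   return std::string(buffer);
}
```

Because the split expression is passed as a `%s` argument rather than being part of the format string, a `%` inside it (as in `%int([NumFolds])`) is copied through unchanged.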

--------------------------------------------------------------------------

 Book a method to use for evaluation.

In [9]:
cv.BookMethod(TMVA::Types::kBDT, "BDTG",
              "!H:!V:NTrees=500:BoostType=Grad:Shrinkage=0.1:"
              "UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=3");

--------------------------------------------------------------------------

 Train, test and evaluate the booked methods.
 Evaluates the booked methods once for each fold and aggregates the result
 in the specified output file.

In [10]:
cv.Evaluate();

                         : Evaluate method: BDTG
<HEADER> Factory                  : Booking method: BDTG_fold1
                         : 
                         : the option NegWeightTreatment=InverseBoostNegWeights does not exist for BoostType=Grad
                         : --> change to new default NegWeightTreatment=Pray
                         : Regression Loss Function: Huber
                         : Training 500 Decision Trees ... patience please
                         : Elapsed time for training with 4999 events: 1.39 sec         
                         : Dataset[dataset] : Create results for training
                         : Dataset[dataset] : Evaluation of BDTG_fold1 on training sample
                         : Dataset[dataset] : Elapsed time for evaluation of 4999 events: 0.218 sec       
                         : Create variable histograms
                         : Create regression target histograms
                         : Create regression average devia
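
The per-fold procedure behind `cv.Evaluate()` can be pictured with a toy sketch in plain C++: for each fold, a model is trained on all other folds and tested on the held-out fold, and the per-fold results are then combined. The "model" here is simply the mean training target, standing in for the BDTG, and the combined RMS deviation is one possible aggregate chosen for this illustration. `Event`, `trainMeanModel` and `crossValidatedRms` are hypothetical names and not TMVA API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Event {
   int fold;      // fold index assigned by the split
   double target; // regression target
};

// Toy stand-in for a regression method: predicts the mean target of all
// events that are NOT in the held-out fold.
double trainMeanModel(const std::vector<Event> &events, int heldOutFold) {
   double sum = 0.0;
   std::size_t n = 0;
   for (const Event &e : events) {
      if (e.fold != heldOutFold) {
         sum += e.target;
         ++n;
      }
   }
   assert(n > 0);
   return sum / static_cast<double>(n);
}

// For each fold: train on the other folds, test on the held-out fold,
// and accumulate the squared errors into a single RMS over all events.
double crossValidatedRms(const std::vector<Event> &events, int numFolds) {
   assert(!events.empty() && numFolds > 1);
   double sumSq = 0.0;
   for (int fold = 0; fold < numFolds; ++fold) {
      const double prediction = trainMeanModel(events, fold);
      for (const Event &e : events) {
         if (e.fold == fold) {
            const double diff = e.target - prediction;
            sumSq += diff * diff;
         }
      }
   }
   return std::sqrt(sumSq / static_cast<double>(events.size()));
}
```

Every event is tested exactly once, and always by a model that did not see it during training, which is the property that makes the aggregated result an estimate of performance on unseen data.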

--------------------------------------------------------------------------

 Save the output

In [11]:
outputFile->Close();

std::cout << "==> Wrote root file: " << outputFile->GetName() << std::endl;
std::cout << "==> TMVACrossValidationRegression is done!" << std::endl;

==> Wrote root file: TMVARegCv.root
==> TMVACrossValidationRegression is done!


--------------------------------------------------------------------------

 Launch the GUI for the root macros

In [12]:
if (!gROOT->IsBatch()) {
   TMVA::TMVAGui(outfileName);
}

return 0;