<img src="http://oproject.org/tiki-download_file.php?fileId=8&display&x=450&y=128">
<img src="http://files.oproject.org/tmvalogo.png" height="50%" width="50%">

# Regression Example

## Declare Factory
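Initialize the TMVA library, open the input data sample, and create a factory to run the regression. The code below expects `inputdata_regression.root` (the data sample from GitHub) in the working directory.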

In [None]:
TMVA::Tools::Instance();

auto inputFile = TFile::Open("inputdata_regression.root");
auto outputFile = TFile::Open("TMVA_RegressionOutput.root", "RECREATE");

TMVA::Factory factory("TMVARegression", outputFile,
                      "!V:!Silent:Color:DrawProgressBar:AnalysisType=Regression" ); 

## Declare DataLoader
Define the features and the target for the regression.

In [None]:
TMVA::DataLoader loader("dataset"); 

// Add the feature variables; the names refer to branches of the TTree in inputFile
loader.AddVariable("var1");
loader.AddVariable("var2");
loader.AddVariable("var3");
loader.AddVariable("var4");
loader.AddVariable("var5 := var1-var3"); // create new features
loader.AddVariable("var6 := var1+var2");

loader.AddTarget( "target := var2+var3" ); // define the target for the regression

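To make the expressions above concrete, here is a minimal standalone sketch (plain C++, independent of TMVA) of the per-event arithmetic they describe. The `Event` and `DerivedRow` structs and the `derive` function are hypothetical names introduced only for illustration; TMVA evaluates these formulas itself when it reads the tree.

```cpp
// Hypothetical plain struct mirroring the four input branches of the tree.
struct Event {
    double var1;
    double var2;
    double var3;
    double var4;
};

// Hypothetical container for the quantities defined by the ":=" expressions.
struct DerivedRow {
    double var5;    // var1 - var3
    double var6;    // var1 + var2
    double target;  // var2 + var3
};

// Evaluate the derived features and the regression target for one event.
DerivedRow derive(const Event& e) {
    return {e.var1 - e.var3, e.var1 + e.var2, e.var2 + e.var3};
}
```

Because the target is `var2+var3` and both variables are among the features, the target is an exact function of the inputs in this example.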

## Setup Dataset
Attach the input tree to the DataLoader and split it into training and test samples.

In [None]:
TTree *tree = nullptr;
inputFile->GetObject("Sig", tree);

TCut mycuts = ""; // e.g. TCut mycuts = "abs(var1)<0.5";

loader.AddRegressionTree(tree, 1.0);   // link the TTree to the loader with a global weight of 1.0
loader.PrepareTrainingAndTestTree(mycuts,
                                   "nTrain_Regression=1000:nTest_Regression=1000:SplitMode=Random:NormMode=NumEvents:!V" );

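The options `nTrain_Regression=1000:nTest_Regression=1000:SplitMode=Random` ask for 1000 training and 1000 test events chosen at random. A rough standalone sketch of that idea follows (plain C++, not TMVA's actual implementation). The function name `randomSplit` and the `seed` parameter are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// Shuffle the event indices and cut the shuffled list into two disjoint sets.
// Returns {training indices, test indices}.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
randomSplit(std::size_t nEvents, std::size_t nTrain, std::size_t nTest, unsigned seed) {
    if (nTrain + nTest > nEvents) {
        throw std::invalid_argument("not enough events for the requested split");
    }
    std::vector<std::size_t> idx(nEvents);
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., nEvents-1

    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);

    const auto trainEnd = idx.begin() + static_cast<std::ptrdiff_t>(nTrain);
    const auto testEnd = trainEnd + static_cast<std::ptrdiff_t>(nTest);
    std::vector<std::size_t> train(idx.begin(), trainEnd);
    std::vector<std::size_t> test(trainEnd, testEnd);
    return {train, test};
}
```

Calling `randomSplit(nEvents, 1000, 1000, seed)` yields two index lists with no event in common, so the test sample stays independent of the training sample.
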
## Book the Regression Method

Book the method for regression. Here we choose a boosted decision tree (BDT) model with gradient boosting, hence the method name BDTG and the option BoostType=Grad. The RegressionLossFunctionBDTG option set below applies to this boost type.

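Define the hyperparameters: the number of trees (NTrees), the boosting type (BoostType), the learning rate (Shrinkage), and the maximum tree depth (MaxDepth). Also choose the loss function with RegressionLossFunctionBDTG: 'AbsoluteDeviation', 'Huber', or 'LeastSquares'. nCuts sets the number of grid points scanned across each feature's range when searching for the best split. Larger values take more time but may give more accurate results.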

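For orientation, the three loss functions named above can be sketched in their textbook forms as functions of the residual r = prediction - truth. This is plain C++ written for illustration, not TMVA code: the function names are hypothetical, the `delta` threshold of the Huber loss is an assumed explicit parameter, and TMVA's own normalization and choice of threshold may differ.

```cpp
#include <cmath>

// Least squares: penalizes large residuals quadratically.
double leastSquaresLoss(double r) {
    return 0.5 * r * r;
}

// Absolute deviation: penalizes residuals linearly.
double absoluteDeviationLoss(double r) {
    return std::abs(r);
}

// Huber: quadratic for |r| <= delta, linear beyond it.
// The two branches agree at |r| == delta, so the loss is continuous.
double huberLoss(double r, double delta) {
    const double a = std::abs(r);
    return a <= delta ? 0.5 * r * r : delta * (a - 0.5 * delta);
}
```

The absolute deviation grows only linearly with the residual, which makes it less sensitive to outliers than least squares; Huber sits between the two.
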
In [None]:
// Boosted Decision Trees 
factory.BookMethod(&loader,TMVA::Types::kBDT, "BDTG",
                   TString("!H:!V:NTrees=64:BoostType=Grad:Shrinkage=0.3:nCuts=20:MaxDepth=4:")+
                   TString("RegressionLossFunctionBDTG=AbsoluteDeviation"));

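As an illustration of what nCuts controls, the sketch below builds a grid of candidate split values for one feature. It assumes the grid points are evenly spaced strictly inside the feature's range; that spacing, and the helper name `candidateCuts`, are assumptions for this example rather than a statement of TMVA's exact procedure.

```cpp
#include <vector>

// Return nCuts evenly spaced candidate cut values strictly between xmin and xmax.
std::vector<double> candidateCuts(double xmin, double xmax, int nCuts) {
    std::vector<double> cuts;
    if (nCuts <= 0) {
        return cuts;  // no grid requested
    }
    const double step = (xmax - xmin) / (nCuts + 1);
    for (int i = 1; i <= nCuts; ++i) {
        cuts.push_back(xmin + i * step);
    }
    return cuts;
}
```

With nCuts=20, each node split tests 20 thresholds per feature under this assumption, which is why a larger value costs more time while resolving the best threshold more finely.
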
## Train the Method

In [None]:
factory.TrainAllMethods();

## Test and Evaluate the Model

In [None]:
factory.TestAllMethods();    

In [None]:
factory.EvaluateAllMethods();

## Gather and Plot the Results
Let's plot the residuals for the BDTG predictions. First, close the output file so that its contents are written to disk and it can be reopened for reading. Then read the test-set results from the TestTree. Finally, plot the residuals, that is, the predicted value minus the true value.

In [None]:
%jsroot on
outputFile->Close();
auto resultsFile = TFile::Open("TMVA_RegressionOutput.root");
auto resultsTree = (TTree*)resultsFile->Get("dataset/TestTree"); // Get returns a TObject*, so cast to TTree*
TCanvas c;
resultsTree->Draw("BDTG-target"); // BDTG is the predicted value, target is the true value
c.Draw();
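
Besides the histogram, the residuals can be condensed into two numbers: their mean (the bias) and their root mean square (the spread). The sketch below is plain C++ operating on two vectors, assuming the predictions and true values have already been copied out of the tree. The names `ResidualSummary` and `summarizeResiduals` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Mean and root mean square of the residuals (predicted - truth).
struct ResidualSummary {
    double mean;
    double rms;
};

ResidualSummary summarizeResiduals(const std::vector<double>& predicted,
                                   const std::vector<double>& truth) {
    if (predicted.empty() || predicted.size() != truth.size()) {
        throw std::invalid_argument("inputs must be non-empty and of equal length");
    }
    double sum = 0.0;
    double sumSq = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double r = predicted[i] - truth[i];  // same sign convention as "BDTG-target"
        sum += r;
        sumSq += r * r;
    }
    const double n = static_cast<double>(predicted.size());
    return {sum / n, std::sqrt(sumSq / n)};
}
```

A mean close to zero indicates little systematic bias, while the RMS measures how far individual predictions typically fall from the true value.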