# Dilepton analysis - handling the MC background


### <font color='red'>Not a finished example!!</font>  
This example is only finished up to and including section 2. 

In [1]:
#include <iostream>
#include <fstream>
#include <map>
#include <string>
#include <vector>
#include <stdio.h>

In [2]:
%jsroot on

## 1. Reading the dataset

In [3]:
TChain *dataset = new TChain("mini"); 

A list of all the background samples and their IDs can be found in **Background_samples.txt**. We read that list, and add all the samples to the TChain. We also (for later convenience) make a vector containing the dataset IDs. 

In [4]:
TString sample; 
TString path; 
vector<Int_t> dataset_IDs;
Int_t DSID;

In [5]:
ifstream infile("Background_samples.txt");

In [6]:
infile.clear();
infile.seekg(0, ios::beg);  // Start at the beginning of the file
dataset->Reset(); 
dataset_IDs.clear();        // Avoid duplicate IDs if the cell is re-run
while(infile >> sample >> DSID){
    path = "DataSamples/MC/"+sample; // Specify path to the samples 
    dataset->Add(path);  
    dataset_IDs.push_back(DSID);
}
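The reading loop above relies on each line of **Background_samples.txt** holding a file name followed by its dataset ID, separated by whitespace. The plain C$++$ sketch below reproduces that parsing without ROOT, assuming exactly that format; the `SampleList` struct and `read_sample_list` function are hypothetical names, and in the usage a string stream stands in for the real file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>  // std::istringstream, handy for testing without a file
#include <string>
#include <vector>

// Assumed line format: "<file name> <dataset ID>", whitespace separated,
// which is what the `infile >> sample >> DSID` loop in the notebook expects.
struct SampleList {
    std::vector<std::string> paths;  // full paths that would be added to the chain
    std::vector<int> dataset_IDs;    // one DSID per sample, in the same order
};

SampleList read_sample_list(std::istream& in, const std::string& prefix) {
    SampleList list;
    std::string sample;
    int dsid = 0;
    // Reading both fields in the loop condition stops at end of input,
    // or at the first line that does not match the expected format.
    while (in >> sample >> dsid) {
        list.paths.push_back(prefix + sample);
        list.dataset_IDs.push_back(dsid);
    }
    assert(list.paths.size() == list.dataset_IDs.size());
    return list;
}
```

For example, `std::istringstream in("mc_sampleA.root 100001\n");` followed by `read_sample_list(in, "DataSamples/MC/")` yields the path `DataSamples/MC/mc_sampleA.root` with ID `100001` (both values are made-up placeholders).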

Define the variables you want to include in the analysis, and link them to branches in the TTree.

In [7]:
const int vs = 5; // Size of the lepton arrays: must be at least the largest lep_n in the dataset

Int_t lepton_n = -1, lepton_charge[vs], lepton_type[vs], ID; 

Float_t lepton_pt[vs], lepton_E[vs], lepton_phi[vs], lepton_eta[vs], MET;

In [8]:
dataset->SetBranchAddress("lep_n",      &lepton_n);
dataset->SetBranchAddress("lep_charge", lepton_charge); // Arrays decay to pointers, so no '&' is needed
dataset->SetBranchAddress("lep_type",   lepton_type);
dataset->SetBranchAddress("lep_pt",     lepton_pt);
dataset->SetBranchAddress("lep_eta",    lepton_eta);
dataset->SetBranchAddress("lep_phi",    lepton_phi);
dataset->SetBranchAddress("lep_E",      lepton_E);
dataset->SetBranchAddress("met_et",     &MET); 
dataset->SetBranchAddress("channelNumber", &ID);

## 2. Making (a lot of) histograms

Now that we have read our dataset, we want to start analyzing the data. To do so we need to fill histograms. Because each background sample will later be scaled by its own cross section (see section 3), we must make one histogram per dataset ID for each variable. (We have 31 background samples, so studying 10 variables means 310 histograms!) An elegant way of handling all these histograms is to use [maps](http://www.cplusplus.com/reference/map/map/) (roughly the C$++$ equivalent of Python dictionaries). Below we define one map for each variable, where the *keys* are the dataset IDs and the *mapped values* are the histograms.   

In [9]:
map<Int_t, TH1*> hist_mll; 
map<Int_t, TH1*> hist_lep_pt; 
map<Int_t, TH1*> hist_met;

In [10]:
for(const auto & i:dataset_IDs){
    hist_mll[i] = new TH1F(); 
    hist_lep_pt[i] = new TH1F(); 
    hist_met[i] = new TH1F();
}

In [11]:
for(const auto & i:dataset_IDs){
    // Give every histogram a unique name so they can be told apart (e.g. when written to a ROOT file)
    hist_mll[i]->SetNameTitle(Form("hist_mll_%d", i), "Invariant mass"); 
    hist_lep_pt[i]->SetNameTitle(Form("hist_lep_pt_%d", i), "Lepton pT"); 
    hist_met[i]->SetNameTitle(Form("hist_met_%d", i), "Missing ET");
    hist_mll[i]->SetBins(20,0,500); 
    hist_lep_pt[i]->SetBins(20,0,1000);
    hist_met[i]->SetBins(20,0,500); 
}
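The map-of-histograms bookkeeping does not depend on ROOT itself. As a hedged illustration, the sketch below uses a hypothetical `SimpleHist` struct as a stand-in for `TH1F` (it is not a ROOT class) and books one per dataset ID, with the same binning as `hist_mll` (20 bins from 0 to 500 GeV).

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for TH1F: equal-width bins on [xmin, xmax).
// Out-of-range values are counted separately, similar in spirit to
// ROOT's underflow and overflow bins.
struct SimpleHist {
    int nbins;
    double xmin, xmax;
    std::vector<double> counts;
    double underflow = 0.0, overflow = 0.0;

    SimpleHist(int n = 20, double lo = 0.0, double hi = 500.0)
        : nbins(n), xmin(lo), xmax(hi), counts(n, 0.0) {
        assert(n > 0 && hi > lo);
    }

    void Fill(double x, double w = 1.0) {
        if (x < xmin) { underflow += w; return; }
        if (x >= xmax) { overflow += w; return; }
        int bin = static_cast<int>((x - xmin) / (xmax - xmin) * nbins);
        if (bin >= nbins) bin = nbins - 1;  // guard against rounding at the upper edge
        counts[bin] += w;
    }
};

// One histogram per dataset ID, created up front from the list of IDs,
// mirroring map<Int_t, TH1*> hist_mll in the notebook.
std::map<int, SimpleHist> book_histograms(const std::vector<int>& dataset_IDs,
                                          int nbins, double lo, double hi) {
    std::map<int, SimpleHist> hists;
    for (int id : dataset_IDs) hists.emplace(id, SimpleHist(nbins, lo, hi));
    return hists;
}
```

Looking histograms up with `at()` rather than `operator[]` is a deliberate choice in this sketch: `at()` throws for an unknown DSID, whereas `operator[]` on a map of pointers such as `hist_mll` silently inserts a null pointer.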

In [12]:
TLorentzVector l1, l2, dileptons; 

In [39]:
for(const auto & i:dataset_IDs){
    hist_mll[i]->Reset(); 
    hist_lep_pt[i]->Reset(); 
    hist_met[i]->Reset();
}

### 2.1 Fill the histograms 
We can now loop over the events in our dataset, apply the desired cuts, and fill the histograms we created above. In this example we select only events containing exactly two same-flavour leptons with opposite charge (i.e. $e^+e^-$ or $\mu^+\mu^-$). To keep the run time short, the loop below processes only every 100th event.
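The three cuts described above can be written as a standalone predicate. This is a sketch outside ROOT with a hypothetical function name; its arguments follow the branches read in section 1 (`lep_n`, `lep_charge`, `lep_type`). In the usage, flavours are assumed to be encoded as PDG codes (11 for electrons, 13 for muons), but the function only compares the two values.

```cpp
#include <cassert>

// Returns true for events with exactly two leptons of opposite charge
// and the same flavour (e+e- or mu+mu-).
bool passes_dilepton_selection(int lepton_n, const int* lepton_charge,
                               const int* lepton_type) {
    assert(lepton_charge != nullptr && lepton_type != nullptr);
    if (lepton_n != 2) return false;                         // Cut #1: exactly two leptons
    if (lepton_charge[0] == lepton_charge[1]) return false;  // Cut #2: opposite charge
    if (lepton_type[0] != lepton_type[1]) return false;      // Cut #3: same flavour
    return true;
}
```

Writing the cuts as early returns keeps them flat and easy to extend, which is the same structure as using `continue` inside the event loop.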

In [14]:
int nentries = (Int_t)dataset->GetEntries();

In [15]:
cout << nentries << endl; 

44742556


In [40]:
for (int i = 0; i < nentries; i++){

    // Progress report; uncomment for long runs
    //if( i % 1000000 == 0 ){ cout << i << " events processed" << endl; }

    // For speed, only every 100th event is processed. Remove this line to run over
    // the full dataset. If you keep it, remember that the histograms then contain
    // only about 1% of the events when they are scaled later.
    if( i % 100 != 0 ) continue;

    dataset->GetEntry(i); // We "pull out" the i'th entry in the chain. The variables are now 
                          // available through the names we have given them. 

    // Cut #1: Require (exactly) 2 leptons
    if(lepton_n != 2) continue;

    // Cut #2: Require opposite charge
    if(lepton_charge[0] == lepton_charge[1]) continue;

    // Cut #3: Require same flavour (2 electrons or 2 muons)
    if(lepton_type[0] != lepton_type[1]) continue;

    // Variables are stored in the TTree with unit MeV, so we divide by 1000 
    // to get GeV, which is a more practical unit. 
    l1.SetPtEtaPhiE(lepton_pt[0]/1000., lepton_eta[0], lepton_phi[0], lepton_E[0]/1000.);
    l2.SetPtEtaPhiE(lepton_pt[1]/1000., lepton_eta[1], lepton_phi[1], lepton_E[1]/1000.);

    dileptons = l1 + l2;   

    hist_mll[ID]->Fill(dileptons.M());
    hist_lep_pt[ID]->Fill(l1.Pt());
    hist_lep_pt[ID]->Fill(l2.Pt()); 
    hist_met[ID]->Fill(MET/1000.);   
}
cout << "Loop finished!" << endl; 

Loop finished!
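For reference, the invariant mass that `TLorentzVector` computes in the loop is $m_{ll} = \sqrt{(E_1+E_2)^2 - |\vec p_1 + \vec p_2|^2}$, with $p_x = p_T\cos\phi$, $p_y = p_T\sin\phi$ and $p_z = p_T\sinh\eta$. The plain C$++$ sketch below spells this out, including the MeV to GeV conversion; `FourVec` and the function names are hypothetical, and clamping a slightly negative $m^2$ (from rounding) to zero is a simplification of this sketch.

```cpp
#include <cassert>
#include <cmath>

struct FourVec { double px, py, pz, E; };

// Build a four-momentum from (pT, eta, phi, E).
FourVec from_pt_eta_phi_e(double pt, double eta, double phi, double E) {
    return { pt * std::cos(phi), pt * std::sin(phi), pt * std::sinh(eta), E };
}

// Inputs in MeV (as stored in the TTree), result in GeV.
double dilepton_mass_gev(double pt1, double eta1, double phi1, double E1,
                         double pt2, double eta2, double phi2, double E2) {
    assert(E1 >= 0.0 && E2 >= 0.0);
    const double mev_to_gev = 1.0 / 1000.0;
    FourVec a = from_pt_eta_phi_e(pt1 * mev_to_gev, eta1, phi1, E1 * mev_to_gev);
    FourVec b = from_pt_eta_phi_e(pt2 * mev_to_gev, eta2, phi2, E2 * mev_to_gev);
    double E  = a.E + b.E;
    double px = a.px + b.px, py = a.py + b.py, pz = a.pz + b.pz;
    double m2 = E * E - (px * px + py * py + pz * pz);
    return m2 > 0.0 ? std::sqrt(m2) : 0.0;  // clamp tiny negative values from rounding
}
```

Two massless leptons of 45 GeV each, emitted back to back in the transverse plane, give $m_{ll} = 90$ GeV.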


We have now done the "heavy lifting" of the analysis, i.e. looping through all the events. In a typical analysis we would store the histograms made above in a new ROOT file, and then analyse the output in a separate program or script. The advantage is that the rest of the analysis can be done in another language, e.g. Python, since we are done with the part that requires the speed of C$++$. If you want to write ROOT files, see the [TFile](https://root.cern.ch/doc/master/classTFile.html) class reference. In this example, however, we carry on in C$++$. 

## 3. Scale and classify the histograms

Before we are ready to make plots we need to process the histograms further. Most importantly, each histogram must be scaled to the right cross section and integrated luminosity, and the histograms must be grouped into background categories.   

Useful class references for the stacking and plotting: [THStack](https://root.cern.ch/doc/master/classTHStack.html), [TColor](https://root.cern.ch/doc/master/classTColor.html).
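The normalisation mentioned above gives every event of a sample the weight $w = \sigma L / N$, with cross section $\sigma$ in pb, integrated luminosity $L$ in pb$^{-1}$ and $N$ events in the sample. This mirrors `sf = x_sec*L/n_events` in the cell below; the info file also lists `sum_w` (presumably a sum of event weights), which some analyses use in place of $N$, but that is not done here. A minimal sketch with a hypothetical function name:

```cpp
#include <cassert>

// Per-event weight that normalises a MC sample to the integrated luminosity.
// Returned as double: stored in an integer type it would truncate to 0 for
// most samples.
double scale_factor(double x_sec_pb, double lumi_inv_pb, double n_events) {
    assert(n_events > 0.0);
    return x_sec_pb * lumi_inv_pb / n_events;
}
```

For instance, $\sigma = 0.5$ pb, $L = 1000.6$ pb$^{-1}$ and $N = 10^6$ give $w \approx 5 \times 10^{-4}$, which an `Int_t` would store as 0. If the event loop keeps only every 100th event, the yields would also need to be multiplied by roughly 100.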

In [19]:
map<TString, TH1*> H_mll; 
map<TString, TH1*> H_lep_pt; 
map<TString, TH1*> H_met;

In [20]:
vector<TString> Backgrounds; 

In [48]:
Backgrounds = {"Higgs","Diboson", "Wjets", "DY", "singleTop", "ttbar", "Zjets"}; 

In [22]:
for(const auto i:Backgrounds){
    H_mll[i] = new TH1F(); 
    H_lep_pt[i] = new TH1F(); 
    H_met[i] = new TH1F(); 
}

In [52]:
for(const auto & i:Backgrounds){
    H_mll[i]->Reset(); 
    H_lep_pt[i]->Reset(); 
    H_met[i]->Reset();
}

In [53]:
for(const auto & i:Backgrounds){
    H_mll[i]->SetNameTitle(Form("H_mll_%s", i.Data()), "Invariant mass"); // Unique name per category
    H_lep_pt[i]->SetNameTitle(Form("H_lep_pt_%s", i.Data()), "Lepton pT"); 
    H_met[i]->SetNameTitle(Form("H_met_%s", i.Data()), "Missing ET");
    H_mll[i]->SetBins(20,0,500); 
    H_lep_pt[i]->SetBins(20,0,1000);
    H_met[i]->SetBins(20,0,500); 
}

In [25]:
ifstream info("Infofile.txt"); 
TString process; 
TString type; 
Int_t dsid; 
Int_t n_events; 
Double_t red_eff; 
Double_t sum_w; 
Double_t x_sec; 
Double_t L = 1000.6; // Integrated luminosity (pb^-1)
Double_t sf; // Scale factor; must be floating point, since an Int_t would truncate it

In [54]:
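info.clear();
info.seekg(0, ios::beg);  // Start at the beginning of the file
while(info >> process >> type >> dsid >> n_events >> red_eff >> sum_w >> x_sec){
    // Skip samples that were not read in, and categories we do not plot
    if(hist_mll.count(dsid) == 0 || H_mll.count(type) == 0) continue;

    sf = x_sec*L/n_events; // Cross section x luminosity / number of events
    // Scaling is not applied yet (work in progress):
    //hist_mll[dsid]->Scale(sf); 
    //hist_lep_pt[dsid]->Scale(sf); 
    //hist_met[dsid]->Scale(sf); 

    H_mll[type]->Add(hist_mll[dsid]); 
    H_lep_pt[type]->Add(hist_lep_pt[dsid]); 
    H_met[type]->Add(hist_met[dsid]); 
}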
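The scale-and-group step above can be sketched without ROOT: each per-sample histogram (keyed by DSID) is multiplied by its scale factor and summed into the histogram of its background category, as `H_mll[type]->Add(hist_mll[dsid])` does. Plain `std::vector<double>` bin contents stand in for the histograms, and `SampleInfo` is a hypothetical stand-in for one line of **Infofile.txt**.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Bins = std::vector<double>;

struct SampleInfo {
    std::string type;  // background category, e.g. "Zjets"
    double sf;         // scale factor for this sample
};

std::map<std::string, Bins> group_by_category(
        const std::map<int, Bins>& per_dsid,
        const std::map<int, SampleInfo>& info) {
    std::map<std::string, Bins> per_category;
    for (const auto& entry : per_dsid) {
        auto it = info.find(entry.first);
        if (it == info.end()) continue;  // no metadata for this DSID: skip it
        const Bins& bins = entry.second;
        Bins& total = per_category[it->second.type];
        if (total.empty()) total.assign(bins.size(), 0.0);
        assert(total.size() == bins.size());  // all histograms must share the binning
        for (std::size_t b = 0; b < bins.size(); ++b)
            total[b] += it->second.sf * bins[b];  // scale, then add
    }
    return per_category;
}
```

Scaling before adding matters: samples in the same category usually have different cross sections, so they cannot be summed first and scaled afterwards.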

## 4. Stacking and plotting 

In [28]:
map<TString, Int_t> colors; 

In [55]:
colors["Diboson"] = kGreen; 
colors["Zjets"] = kYellow; 
colors["ttbar"] = kRed;
colors["singleTop"] = kBlue-7; 
colors["Wjets"] = kBlue+3; 
colors["DY"] = kOrange+1; 
colors["Higgs"] = kMagenta; 

In [56]:
for(const auto h:Backgrounds){
    H_mll[h]->SetFillColor(colors[h]); 
    H_met[h]->SetFillColor(colors[h]);
    H_lep_pt[h]->SetFillColor(colors[h]);
    
    H_mll[h]->SetLineColor(colors[h]); 
    H_met[h]->SetLineColor(colors[h]);
    H_lep_pt[h]->SetLineColor(colors[h]);
}

In [31]:
THStack *stack_mll = new THStack("Invariant mass", "");

In [32]:
THStack *stack_met = new THStack("Missing ET", ""); 

In [57]:
for(const auto h:Backgrounds){
    stack_mll->RecursiveRemove(H_mll[h]); // Remove previously stacked histograms  
    stack_met->RecursiveRemove(H_met[h]);
    stack_mll->Add(H_mll[h]); 
    stack_met->Add(H_met[h]);
}    
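A stacked plot shows, in each bin, the cumulative sum of the histograms: the top edge of layer $k$ is the sum of layers $0, \dots, k$. As we understand `THStack`'s default stacked drawing, the histograms are stacked in the order they were added, so the first entry of `Backgrounds` ends up at the bottom. The sketch below (plain vectors, hypothetical function name) computes those cumulative heights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// tops[k][b] is the top edge of layer k in bin b: the sum of layers 0..k.
std::vector<std::vector<double>> stack_heights(
        const std::vector<std::vector<double>>& layers) {
    std::vector<std::vector<double>> tops;
    for (const auto& layer : layers) {
        std::vector<double> top = tops.empty() ? std::vector<double>(layer.size(), 0.0)
                                               : tops.back();
        assert(top.size() == layer.size());  // all layers must share the binning
        for (std::size_t b = 0; b < layer.size(); ++b) top[b] += layer[b];
        tops.push_back(top);
    }
    return tops;
}
```

With a logarithmic y-axis, as used below, small backgrounds stacked on top of large ones look thin even when they are not negligible, so the order of `Backgrounds` affects how readable the plot is.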

Now we make a legend with the different backgrounds, and plot the stacks. 

In [34]:
gStyle->SetLegendBorderSize(0); // Remove (default) border around legend 
TLegend *legend = new TLegend(0.65, 0.70, 0.85, 0.85); 

In [35]:
legend->Clear();
for(const auto i:Backgrounds){
    legend->AddEntry(H_mll[i], i, "f");  // Add your histograms to the legend
} 

In [31]:
TCanvas *C = new TCanvas("c", "c", 600, 600);

In [36]:
gPad->SetLogy(); // Set logarithmic y-axis

In [50]:
stack_mll->Draw(); 
stack_mll->GetYaxis()->SetTitle("# events");
stack_mll->GetYaxis()->SetTitleOffset(1.3); 
stack_mll->GetXaxis()->SetTitle("m_{ll} (GeV)");
stack_mll->GetXaxis()->SetTitleOffset(1.3);
legend->Draw();
C->Draw();

In [51]:
stack_met->Draw(); 
stack_met->GetYaxis()->SetTitle("# events");
stack_met->GetYaxis()->SetTitleOffset(1.3); 
stack_met->GetXaxis()->SetTitle("E_{T}^{miss} (GeV)");
stack_met->GetXaxis()->SetTitleOffset(1.3);
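stack_met->GetXaxis()->SetRangeUser(0,250); // Zoom in on 0-250 GeV (SetLimits would move the axis limits without moving the bins, mislabelling them)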
legend->Draw();
C->Draw();

## 5. Statistical analysis of results? 