# Compare event numbers in synchronization n-tuples

This notebook takes the synchronization ntuples of the selected teams for a specific model and channel and compares their event numbers. The result is shown as a matrix whose entries hold the number of event numbers that differ between each pair of teams.

**Contributing:** Before committing your code changes, please run `Cell/All Output/Clear` or `Kernel/Restart & Clear Output`. Otherwise, the cell outputs (e.g. the printed event numbers) are committed as well, which causes unnecessary changes to the notebook.
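If the menu entries are not at hand (e.g. in a pre-commit script), the outputs can also be cleared programmatically. A `.ipynb` file is JSON whose `cells` list contains code cells with `outputs` and `execution_count` fields. The minimal sketch below empties those fields using only the standard library. The function names are hypothetical, and the output formatting (`indent=1`) is an assumption that may not match Jupyter's own formatting byte for byte.

```python
import json

def clear_notebook_outputs(notebook):
    """Empty the outputs and execution counts of all code cells in a parsed .ipynb dict."""
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') == 'code':
            cell['outputs'] = []
            cell['execution_count'] = None
    return notebook

def clear_notebook_file(path):
    """Read the notebook at `path`, clear its outputs and write it back in place."""
    with open(path, encoding='utf-8') as f:
        notebook = json.load(f)
    clear_notebook_outputs(notebook)
    with open(path, 'w', encoding='utf-8') as f:
        # indent=1 is an assumption meant to keep the diff small; Jupyter's own formatting may differ
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write('\n')
```

For example, `clear_notebook_file('compare_event_numbers.ipynb')` (hypothetical filename) rewrites the notebook in place. Markdown cells are left untouched.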

## Setup

**NOTE:** You have to edit only this section to run the synchronization!

For each team, you need to place a `<TEAM>.yaml.txt` file in the `teams/` folder. For each model and channel, this file points to the corresponding synchronization ntuple, i.e. the ROOT file (key `file`) and the name of the tree inside it (key `tree`).
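The loader cells read each config as nested mappings: model, then channel, then the keys `file` and `tree`. The sketch below shows such a layout, once as YAML in a comment and once as the dictionary that `yaml.safe_load` would return for it. The paths and tree name are made up, and `lookup_ntuple` is a hypothetical helper that mirrors the checks performed when the trees are loaded.

```python
# Layout of a hypothetical teams/KIT.yaml.txt (paths and tree name are made up):
#
#   sm:
#     mt:
#       file: /path/to/KIT_sm_mt.root
#       tree: ntuple
#     et:
#       file: /path/to/KIT_sm_et.root
#       tree: ntuple
#
# Parsing this file with yaml.safe_load would yield the nested dictionary below.
example_config = {
    'sm': {
        'mt': {'file': '/path/to/KIT_sm_mt.root', 'tree': 'ntuple'},
        'et': {'file': '/path/to/KIT_sm_et.root', 'tree': 'ntuple'},
    },
}

def lookup_ntuple(config, model, channel):
    """Return (file, tree) for a model and channel, raising a readable KeyError if anything is missing."""
    if model not in config:
        raise KeyError('Model `{}` not found in config'.format(model))
    if channel not in config[model]:
        raise KeyError('Channel `{}` not found for model `{}`'.format(channel, model))
    entry = config[model][channel]
    for key in ('file', 'tree'):
        if key not in entry:
            raise KeyError('Key `{}` not found for model `{}` and channel `{}`'.format(key, model, channel))
    return entry['file'], entry['tree']
```

With this layout, `lookup_ntuple(example_config, 'sm', 'mt')` returns the path and tree name for the `mt` channel, while asking for a channel that is not listed raises a `KeyError` naming the missing entry.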

In [None]:
# Select the teams
teams = ['KIT', 'CERN', 'DESY']

# Select the model
model = 'sm'

# Select the channel for the selected model
channel = 'mt'

# Toggle verbosity of notebook
# NOTE: If there are a lot of differences, the notebook can create a lot of output
verbose = True

## Compare event numbers of given synchronization ntuples

The rest of the notebook should stay untouched unless you want to change its behaviour. Still, check out the results of the individual processing steps.

### Import modules

In [None]:
import yaml
import numpy as np
from warnings import warn
import os
from sys import stdout
import ROOT

# Enable JSROOT (interactive JavaScript graphics) for ROOT plots in this notebook
%jsroot on

### Get configs from files

In [None]:
# Get configs
# NOTE: It is assumed that the files are named `<TEAM>.yaml.txt`
configs = {}
for team in teams:
    filepath = 'teams/{}.yaml.txt'.format(team) # FIXME: Set an absolute path here!
    # Stop here instead of failing later with a less helpful error
    if not os.path.isfile(filepath):
        raise IOError('File not found for team {}: {}'.format(team, filepath))
    # safe_load parses plain YAML without constructing arbitrary Python objects
    with open(filepath) as file_:
        configs[team] = yaml.safe_load(file_)

### Load trees and get number of entries in trees

In [None]:
# Load files and trees from config files
files = {}
trees = {}
entries = {}

for team in teams:
    # Check validity of config file (stop with a clear message instead of a later KeyError)
    if model not in configs[team]:
        raise KeyError('Model `{}` is not found in config of team {}'.format(model, team))
    if channel not in configs[team][model]:
        raise KeyError('Channel `{}` is not found for model `{}` in config of team {}'.format(channel, model, team))
    for key in ['file', 'tree']:
        if key not in configs[team][model][channel]:
            raise KeyError('Key `{}` is not found for channel `{}` and model `{}` in config of team {}'.format(key, channel, model, team))
    fileName = configs[team][model][channel]['file']
    treeName = configs[team][model][channel]['tree']

    # Load ROOT file and tree
    # NOTE: TFile.Open returns a null object if the file cannot be opened
    files[team] = ROOT.TFile.Open(fileName)
    if not files[team] or files[team].IsZombie():
        raise IOError('Cannot open ROOT file with path `{}` for team {}'.format(fileName, team))
    trees[team] = files[team].Get(treeName)
    # NOTE: This rejects both a missing key (null object) and an object that is not a tree
    if not isinstance(trees[team], ROOT.TTree):
        raise IOError('Cannot open tree `{}` from ROOT file with path `{}` for team {}'.format(treeName, fileName, team))

    # Get number of events
    entries[team] = trees[team].GetEntries()
    
# Print results
print('Number of entries in trees:')
for team in teams:
    print('  {}\t: {}'.format(team, entries[team]))

### Load event numbers from trees into arrays

In [None]:
# Read the event numbers of each tree into a 1D array
events = {}
stdout.write('Processing team:\n')
stdout.flush()
for team in teams:
    stdout.write('  {}\n'.format(team))
    stdout.flush()
    events[team] = np.zeros(entries[team], dtype=np.int64)
    for iEvent in range(entries[team]):
        trees[team].GetEntry(iEvent)
        events[team][iEvent] = trees[team].evt # NOTE: The events are read from the `evt` branch

### Compare event numbers

The comparison algorithm works as follows:

First, the event numbers of each team are sorted in ascending order. Then, for every combination of two teams, both sorted lists are traversed in parallel and the event numbers at the current indices are compared.

If the two event numbers match, both indices advance. If they do not match, the smaller event number is missing in the other list, so only the index of the list containing the smaller number advances while the other index stays where it is. Each mismatch increments a counter, and (if `verbose` is set) both event numbers are printed so that these specific events can be examined later on. Once one list is exhausted, every remaining entry of the other list counts as a difference.
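The walk described above can be sketched in plain Python on two already sorted lists. `compare_sorted_events` is a hypothetical stand-alone helper, not part of the notebook: instead of printing, it returns the mismatching pairs (with `None` marking an exhausted list), so the number of differences is simply the length of the result.

```python
def compare_sorted_events(events1, events2):
    """Walk two sorted lists in parallel and return the mismatching (event1, event2) pairs.

    None marks a list that has already been exhausted.
    """
    mismatches = []
    index1 = 0
    index2 = 0
    while index1 < len(events1) or index2 < len(events2):
        event1 = events1[index1] if index1 < len(events1) else None
        event2 = events2[index2] if index2 < len(events2) else None
        if event1 == event2:
            # Same event number in both lists: advance both indices
            index1 += 1
            index2 += 1
            continue
        mismatches.append((event1, event2))
        # The smaller number is missing in the other list, so only its own index moves on
        if event2 is None or (event1 is not None and event1 < event2):
            index1 += 1
        else:
            index2 += 1
    return mismatches
```

For example, `compare_sorted_events([1, 3, 5, 7], [1, 2, 3, 7, 9])` returns `[(3, 2), (5, 7), (None, 9)]`. Every mismatch skips exactly one event number that occurs in only one of the lists, so for sorted input the count equals the size of the (multiset) symmetric difference of the two lists.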

In [None]:
# Sort event lists (flattened to 1D, so that each entry is a single event number)
for team in events:
    events[team] = np.sort(np.ravel(events[team]))

# Compare event numbers
if verbose:
    stdout.write('Differences:\n')
    stdout.write('  team1\t->\tteam2\t : event1\t->\tevent2\n\n')
    stdout.flush()

differences = np.zeros((len(teams), len(teams)), dtype=np.int64)
for i1, team1 in enumerate(teams):
    for i2, team2 in enumerate(teams):
        events1 = events[team1]
        events2 = events[team2]
        index1 = 0
        index2 = 0
        while index1 < len(events1) or index2 < len(events2):
            # `None` marks a list that is already exhausted
            event1 = int(events1[index1]) if index1 < len(events1) else None
            event2 = int(events2[index2]) if index2 < len(events2) else None
            if event1 == event2:
                # Same event number in both lists: advance in both
                index1 += 1
                index2 += 1
                continue
            differences[i1, i2] += 1
            if verbose:
                stdout.write('  {}\t->\t{} : {}\t->\t{}\n'.format(team1, team2, event1, event2))
                stdout.flush()
            # The smaller event number is missing in the other list: advance only in its own list
            if event2 is None or (event1 is not None and event1 < event2):
                index1 += 1
            else:
                index2 += 1
    if verbose:
        stdout.write('\n')
        stdout.flush()

## Draw difference matrix

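Each entry of the matrix holds the number of event numbers that appear in only one of the two teams' synchronization ntuples. The x axis corresponds to `team1` and the y axis to `team2`; the team indices are listed in the legend next to the plot.

**NOTE:** When the matrix is drawn as a 2D histogram with the `COLZ` option, bins with zero differences are not painted. This case is represented as a white area without any entry.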
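Because empty bins vanish in the `COLZ` plot, a plain-text version of the matrix can be handy for spotting zeros. The sketch below is an alternative, not the notebook's method: it swaps the two-pointer walk for `collections.Counter` multiset differences, which give the same count for each pair of teams, and prints the matrix with team names as row and column labels. The helper names and the toy event lists are hypothetical.

```python
from collections import Counter

def difference_matrix(events_by_team, teams):
    """Count, for every pair of teams, the event numbers found in only one of the two lists."""
    counts = {team: Counter(events_by_team[team]) for team in teams}
    matrix = []
    for team1 in teams:
        row = []
        for team2 in teams:
            # Counter subtraction keeps only positive counts, i.e. the surplus entries
            only1 = counts[team1] - counts[team2]
            only2 = counts[team2] - counts[team1]
            row.append(sum(only1.values()) + sum(only2.values()))
        matrix.append(row)
    return matrix

def format_matrix(matrix, teams):
    """Render the matrix as a plain-text table with team names as labels; zeros stay visible."""
    width = max(max(len(team) for team in teams),
                max(len(str(value)) for row in matrix for value in row)) + 2
    lines = [' ' * width + ''.join(team.rjust(width) for team in teams)]
    for team, row in zip(teams, matrix):
        lines.append(team.rjust(width) + ''.join(str(value).rjust(width) for value in row))
    return '\n'.join(lines)

# Toy event lists (hypothetical data, not from real ntuples); sorting is not required here
toy_teams = ['KIT', 'CERN', 'DESY']
toy_events = {'KIT': [1, 2, 3], 'CERN': [1, 2, 4], 'DESY': [3, 2, 1]}
print(format_matrix(difference_matrix(toy_events, toy_teams), toy_teams))
```

With the toy input, KIT and DESY agree completely, so their entries are 0, while each of them differs from CERN in two event numbers (3 and 4). To use it on the notebook's data, the NumPy arrays would first need to be turned into flat lists, e.g. `{team: np.ravel(events[team]).tolist() for team in teams}`.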

In [None]:
# Setup canvas and pads
c = ROOT.TCanvas('canvas', 'canvas', 800, 500)
padHist = ROOT.TPad('padHist', 'padHist', 0.0, 0.0, 0.8, 1.0)
padHist.Draw()
padLegend = ROOT.TPad('padLegend', 'padLegend', 0.8, 0.0, 1.0, 1.0)
padLegend.Draw()
padHist.cd()

# Fill integer 2D histogram
hist = ROOT.TH2I('hist', 'hist', len(teams), -0.5, len(teams)-0.5, len(teams), -0.5, len(teams)-0.5)
for i1 in range(len(teams)):
    for i2 in range(len(teams)):
        gbin = hist.GetBin(i1+1, i2+1)
        hist.AddBinContent(gbin, differences[i1, i2])
        
# Set histogram options and draw
hist.SetTitle('Differences in event numbers')
hist.SetStats(False)
hist.GetXaxis().SetNdivisions(len(teams))
hist.GetYaxis().SetNdivisions(len(teams))
hist.Draw('COLZ')

# Setup textbox for legend
padLegend.cd()
text = ROOT.TPaveText(0.1, 0.1, 0.9, 0.9)
text.SetLabel('Teams')
text.SetTextAlign(12)
for i, team in enumerate(teams):
    text.AddText('{}: {}'.format(i, team))
text.Draw()

# Draw canvas
c.Draw()