Welcome to an interactive tutorial for the `b2_plotter` module, which works in both Python scripts
and Jupyter notebooks, although it is primarily meant for notebooks.

Below is a sample analysis, with detailed comments, that uses the module to create plots.

Note that before you can use its functionality within a file, the `b2_plotter` module needs to be installed via:

`pip install b2_plotter`

in whichever environment you are working in.

Paul

In [3]:
# First we need to import libraries. Strictly speaking, the only ones this demo uses are pandas and root_pandas, but I have imported
# the others in case you need their extra functionality.

# These modules, with the exception of b2_plotter, will automatically be available to you after setting up
# the Belle II environment.

import numpy as np
import matplotlib.pyplot as plt 
import root_pandas as rp 
import pandas as pd 


# Import the module's Plotter() class
from b2_plotter.Plotter import Plotter

# Setup matplotlib environment
%matplotlib inline 


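Before running the cells in this tutorial, it can help to confirm that every package it imports is installed in the active environment. The following is a minimal sketch using only the standard library. The helper name `find_missing_modules` is hypothetical and is not part of `b2_plotter`.

```python
from importlib.util import find_spec


def find_missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


# The packages imported by this tutorial.
required = ['numpy', 'matplotlib', 'root_pandas', 'pandas', 'b2_plotter']

missing = find_missing_modules(required)
if missing:
    print('Missing modules:', ', '.join(missing))
else:
    print('All required modules are available.')
```

If anything is reported as missing, set up the Belle II environment and install `b2_plotter` before continuing.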

In [4]:
# Define the absolute path to the files you will use.
# Note that the module currently does not support data -- that will be coming soon.
# For this example, taupair/charged/mixed were excluded, but the functions will still work
# if you include them.

ccbar = '/belle2work/psgebeli/samples/gmc/mc15rib/xipipi/ccbar.root'
uubar = '/belle2work/psgebeli/samples/gmc/mc15rib/xipipi/uubar.root'
ddbar = '/belle2work/psgebeli/samples/gmc/mc15rib/xipipi/ddbar.root'
ssbar = '/belle2work/psgebeli/samples/gmc/mc15rib/xipipi/ssbar.root'

# Define columns to reduce runtime. You don't need to read in every variable from the ROOT files, just
# the ones that you think will be useful to plot. 
# However, you MUST read in the "isSignal" variable and the variable corresponding to your particle's mass.

#               REQUIRED
#       _______________________
#       |                     |
mycols= ['xic_M','xic_isSignal', 'xic_significanceOfDistance','xi_significanceOfDistance', 
         'lambda0_p_protonID', 'xi_M', 'xic_mcFlightTime', 'xic_chiProb']
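Since the mass and isSignal variables are required, a quick check on the column list can catch an omission before any files are read. This is only a sketch: the function `check_required_columns` is a hypothetical helper, not something provided by `b2_plotter`.

```python
def check_required_columns(columns, mass_var, is_signal_var):
    """Raise ValueError if a required column is absent (hypothetical helper)."""
    missing = [name for name in (mass_var, is_signal_var) if name not in columns]
    if missing:
        raise ValueError(f'Missing required columns: {missing}')
    return True


mycols = ['xic_M', 'xic_isSignal', 'xic_significanceOfDistance', 'xi_significanceOfDistance',
          'lambda0_p_protonID', 'xi_M', 'xic_mcFlightTime', 'xic_chiProb']

# Passes silently because both required columns are present.
check_required_columns(mycols, mass_var='xic_M', is_signal_var='xic_isSignal')
```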


In [5]:
# Create pandas dataframes 
df_ccbar = rp.read_root(ccbar, key='xic_tree', columns = mycols)
df_uubar = rp.read_root(uubar, key='xic_tree', columns = mycols)
df_ssbar = rp.read_root(ssbar, key='xic_tree', columns = mycols)
df_ddbar = rp.read_root(ddbar, key='xic_tree', columns = mycols)

# Define the dataframe that will be your Monte Carlo signal as well.
# Using ccbar is fine since we work in charm, but I recommend just using all of them
# via a concatenation:
df_mc = pd.concat([df_ccbar, df_uubar, df_ssbar, df_ddbar])


# Create a dictionary that the Plotter() class will use as input. A dictionary maps keys to values:
# {key1: value1, key2: value2, ...}. In this case, the keys are the labels that will be used in the plots.
mcdfs = {'ccbar' : df_ccbar, 'uubar' : df_uubar, 'ssbar' : df_ssbar, 'ddbar' : df_ddbar}


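For readers without access to the ROOT files, the concatenation and dictionary steps can be tried on toy data. The small dataframes below are invented purely for illustration. `ignore_index=True` is an optional extra that the cell above does not use; it gives the combined dataframe a fresh index instead of repeating each sample's row numbers.

```python
import pandas as pd

# Toy stand-ins for the dataframes read from the ROOT files (made-up values).
df_ccbar = pd.DataFrame({'xic_M': [2.47, 2.40], 'xic_isSignal': [1, 0]})
df_uubar = pd.DataFrame({'xic_M': [2.55], 'xic_isSignal': [0]})

# Keys are the labels for the plots, values are the dataframes.
mcdfs = {'ccbar': df_ccbar, 'uubar': df_uubar}

# Combine every sample into one dataframe with a fresh 0..N-1 index.
df_mc = pd.concat(list(mcdfs.values()), ignore_index=True)

for label, df in mcdfs.items():
    print(label, len(df))
print('combined', len(df_mc))
```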

In [9]:
# Create the plotter object. This calls the __init__ method, which you can see documentation for in README.md in the git repository.

# __init__(self, isSigvar: str, mcdfs: dict, signaldf: pd.DataFrame, datadf: pd.DataFrame = None, interactive: bool = True), where

# isSigvar is the name of your isSignal variable
# mcdfs is the dictionary we defined earlier 
# signaldf is whatever dataframe you want signal to be extracted from (ccbar or a concatenation)
# datadf is the dataframe corresponding to data, which is not supported yet, so ignore it until a future release
# interactive is a boolean that decides if the plots will be shown (jupyter notebook) or saved as pngs (python scripts)
        
plotter = Plotter(mcdfs = mcdfs, signaldf = df_mc, isSigvar = 'xic_isSignal', interactive = True)

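To make the role of `isSigvar` more concrete, here is a sketch of how truth-matched signal can be separated from background using an isSignal flag. It illustrates the general idea on toy data and is an assumption, not a description of what `Plotter` does internally.

```python
import pandas as pd

# Toy candidates (made-up values). NaN stands for a candidate with no MC match.
toy = pd.DataFrame({
    'xic_M': [2.468, 2.300, 2.470, 2.600],
    'xic_isSignal': [1.0, 0.0, 1.0, float('nan')],
})

is_sigvar = 'xic_isSignal'

# Signal: candidates whose flag equals 1.
signal = toy[toy[is_sigvar] == 1]

# Background: everything else, including NaN flags (NaN != 1 evaluates to True).
background = toy[toy[is_sigvar] != 1]

print(len(signal), 'signal candidates,', len(background), 'background candidates')
```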

In [10]:
# plot() function 

# Now you can define cuts and make plots using plot(). 

# plot(self, var, cuts, myrange = (), nbins = 100, isLog = False, xlabel = '', scale = 1, bgscale = 1), where 

# var is the name of the variable you want to plot (as a string)
# cuts are the cuts to be applied (as a string)
# myrange (optional) is a tuple of the range you wish to use. It defaults to (), so if no range is explicitly defined, it will be calculated dynamically
# nbins (optional) is the number of bins to use
# isLog (optional) is a boolean deciding if the plot will be on a logarithmic scale or not 
# xlabel (optional) is a string for labelling the x axis. If none is defined, the label will just be equal to the variable 'var'
# scale, bgscale (optional) are floats determining the amount by which to scale the signal and background, respectively; both default to 1 

xicmassrangeloose = '2.3 < xic_M < 2.65'

masscuts = xicmassrangeloose + ' and xi_M > 1.32 and xi_M < 1.325'
pidcuts = 'lambda0_p_protonID > 0.96'
flightcuts = 'xi_significanceOfDistance > 5'

mycuts = f'{masscuts} and {pidcuts} and {flightcuts}'

# Bare minimum
plotter.plot('xic_M', cuts = mycuts)

# More detailed 
plotter.plot('xic_M', cuts = mycuts, myrange = (2.3, 2.65), nbins = 100, isLog = False, xlabel = r'$\Xi_c^+$ Mass [GeV/$c^2$]', scale = 1, bgscale = 1)


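The cut strings in this cell follow the expression syntax understood by pandas' `DataFrame.query`. Assuming the module applies cuts that way (the tutorial does not say so explicitly), the toy example below shows how separate cut strings can be combined and applied. The helper `combine_cuts` and the toy values are hypothetical.

```python
import pandas as pd


def combine_cuts(*cuts):
    """Join cut strings with 'and', wrapping each in parentheses (hypothetical helper)."""
    return ' and '.join(f'({cut})' for cut in cuts)


# Toy candidates (made-up values); only the first row passes every cut.
toy = pd.DataFrame({
    'xic_M': [2.47, 2.20, 2.50, 2.468],
    'xi_M': [1.322, 1.321, 1.400, 1.323],
    'lambda0_p_protonID': [0.99, 0.99, 0.99, 0.50],
})

mycuts = combine_cuts('2.3 < xic_M < 2.65',
                      '1.32 < xi_M < 1.325',
                      'lambda0_p_protonID > 0.96')

selected = toy.query(mycuts)
print(mycuts)
print(len(selected), 'of', len(toy), 'candidates pass')
```

Wrapping each cut in parentheses keeps the combined expression unambiguous regardless of which operators the individual cuts use.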

In [11]:
# plotFom function 

# Used to create a figure of merit 

# plotFom(self, var, massvar, myrange, signalregion, isGreaterThan = True, nbins = 100, xlabel = '')

# var is the name of your variable to plot, as a string
# massvar is the name of the variable for your particle's mass, which is used to calculate signal yield 
# myrange (optional) is the same as the myrange parameter from plot()

# signalregion is a tuple representing the signal region for your particle. A decent way to define this is 
# to perform a fit on your mass and define the signal region as (mean - 3*sigma, mean + 3*sigma)

# isGreaterThan is a bool that tells the FOM to test all of the cuts like: var > testcut_value 
# If this is False, then it tests var < testcut_value instead
# You should have some intuition about which to use (e.g. isGreaterThan = True for protonID), but if not, plot both!

# nbins, xlabel are the same as for plot()

plotter.plotFom(var = 'xi_M', massvar = 'xic_M', myrange = (), signalregion = (2.46, 2.475), isGreaterThan = True, nbins = 100, xlabel = '')

# Note that this function also returns a tuple (optimal_cut, max_fom), so you don't have to eyeball the FOM peak.

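To illustrate what a figure-of-merit scan does, the sketch below tests a series of cut values and keeps the one that maximises S/sqrt(S+B), a common choice of FOM. That formula is an assumption, since the tutorial does not state which definition `plotFom` uses. The function `scan_fom` and the toy values are hypothetical, and for simplicity the counts are not restricted to a mass signal region.

```python
import math


def scan_fom(signal_vals, background_vals, test_cuts, is_greater_than=True):
    """Return (optimal_cut, max_fom) for FOM = S / sqrt(S + B) (assumed definition)."""
    best_cut, best_fom = None, 0.0
    for cut in test_cuts:
        if is_greater_than:
            # Keep candidates with var > cut.
            s = sum(v > cut for v in signal_vals)
            b = sum(v > cut for v in background_vals)
        else:
            # Keep candidates with var < cut.
            s = sum(v < cut for v in signal_vals)
            b = sum(v < cut for v in background_vals)
        fom = s / math.sqrt(s + b) if s + b > 0 else 0.0
        # Strict '>' keeps the loosest cut when several give the same FOM.
        if fom > best_fom:
            best_cut, best_fom = cut, fom
    return best_cut, best_fom


# Toy values of a PID-like variable (made up for illustration).
signal_vals = [0.97, 0.98, 0.99, 0.95, 0.60]
background_vals = [0.10, 0.30, 0.50, 0.70, 0.96]
test_cuts = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]

optimal_cut, max_fom = scan_fom(signal_vals, background_vals, test_cuts)
print(f'Optimal cut: var > {optimal_cut} with FOM = {max_fom:.3f}')
```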

In [12]:
# plotStep 

# A useful plot for when the signal is buried underneath background, or you just don't care to see the distributions for each of the MC types (ccbar, uubar,
# etc.) individually 


# plotStep(self, var, cuts, myrange, nbins = 100, xlabel = '')

# All of these params are the same as they are in plot()

plotter.plotStep('xic_M', cuts = xicmassrangeloose)



In [None]:
# get_purity and get_sigeff 

# These functions return the purity and signal efficiency of a cut (or series of cuts)

# get_sigeff(self, cuts, massvar, signalregion)
# get_purity(self, cuts, massvar, signalregion)

testcut = 'lambda0_p_protonID > 0.96'

sigeff = plotter.get_sigeff(cuts = testcut, massvar = 'xic_M', signalregion = (2.46, 2.475))
purity = plotter.get_purity(cuts = testcut, massvar = 'xic_M', signalregion = (2.46, 2.475))

# I just like to print something like this -- this helps to see if the cut is good or not.
print(f'Applying {testcut} yields a purity of {purity}% and has signal efficiency {sigeff}.')
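As a cross-check on what these two numbers mean, the sketch below computes them by hand on toy data. It assumes the common definitions: purity is S/(S+B) after the cut, and signal efficiency is the signal surviving the cut divided by the signal before it, both counted inside the signal region. The module's exact conventions (for example, fraction versus percent) are not stated here, and `purity_and_sigeff` is a hypothetical function, not part of `b2_plotter`.

```python
import pandas as pd


def purity_and_sigeff(df, cuts, massvar, signalregion, is_sigvar):
    """Return (purity, sigeff) as fractions, using assumed common definitions."""
    low, high = signalregion
    # Restrict to candidates whose mass lies inside the signal region.
    in_region = df[(df[massvar] > low) & (df[massvar] < high)]
    n_sig_before = (in_region[is_sigvar] == 1).sum()

    # Apply the cut string and count what survives.
    after = in_region.query(cuts)
    n_sig_after = (after[is_sigvar] == 1).sum()
    n_total_after = len(after)

    purity = n_sig_after / n_total_after if n_total_after else 0.0
    sigeff = n_sig_after / n_sig_before if n_sig_before else 0.0
    return float(purity), float(sigeff)


# Toy candidates (made-up values); the last row lies outside the signal region.
toy = pd.DataFrame({
    'xic_M': [2.468, 2.470, 2.465, 2.472, 2.300],
    'xic_isSignal': [1, 1, 0, 0, 1],
    'lambda0_p_protonID': [0.99, 0.50, 0.98, 0.10, 0.99],
})

purity, sigeff = purity_and_sigeff(toy, 'lambda0_p_protonID > 0.96', 'xic_M',
                                   (2.46, 2.475), 'xic_isSignal')
print(f'purity = {purity:.2f}, signal efficiency = {sigeff:.2f}')
```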