# PURSUE Python for HEP: Vectors and Histograms

* So far, we have been using a limited set of features of the Hist library. In this notebook, we take a closer look at this library and some of its more powerful features.
* Additionally, we will introduce the Vector library for working with 2D, 3D, and Lorentz vectors.

In [None]:
import skhep_testdata
import uproot
import numpy as np
import matplotlib.pyplot as plt
import awkward as ak
import hist
import vector

## Single Dimension Histogram

* NumPy has basic histogramming features. It includes a function, `np.histogram`, which bins an array of data and returns the bin counts and the bin edges.
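* As a minimal sketch of that return value, the following uses made-up numbers (not the dimuon data) to show that `np.histogram` returns one more edge than it does counts:

```python
import numpy as np

# Illustrative values only; these are not taken from the dataset
data = np.array([1.0, 2.0, 2.5, 3.0, 7.0, 8.5, 9.0])

# Two equal-width bins spanning the range of the data (1 to 9)
counts, edges = np.histogram(data, bins=2)

print(counts)  # number of entries in each bin
print(edges)   # bin boundaries; always len(counts) + 1 values
```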

In [None]:
tree = uproot.open(skhep_testdata.data_path("uproot-Zmumu.root"))["events"]
np.histogram(tree["M"].array(), 2)

* As we saw briefly, Matplotlib also has some plotting features for histograms.

In [None]:
plt.hist(tree["M"].array())

* However, for specialized uses like those in HEP, the features NumPy and Matplotlib provide are limited.
* The hist library offers advanced histogramming tools. It is built on top of boost-histogram, which is itself a very fast histogramming library.
* The two powerful features we will look at are multidimensional histograms with named axes and slicing of histograms.

In [None]:
import hist

h = hist.Hist(hist.axis.Regular(120, 60, 120, name="mass"))

h.fill(tree["M"].array())

h.plot()

In [None]:
# Slice by bin index: plain integers here are bin indices, NOT coordinate values
h[10:110].plot()

In [None]:
# To select by coordinate value
h[hist.loc(90):].plot()
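
* To make the difference between bin indices and coordinate values concrete, here is a rough NumPy sketch of the arithmetic for an axis with 120 bins between 60 and 120. The helper `index_of` is hypothetical and only illustrates the idea; it is not part of hist.

```python
import numpy as np

# Same binning as hist.axis.Regular(120, 60, 120): 120 bins between 60 and 120
n_bins, low, high = 120, 60.0, 120.0
edges = np.linspace(low, high, n_bins + 1)
width = (high - low) / n_bins  # 0.5 per bin

# An index slice [10:110] keeps bins 10..109, which spans this coordinate range
start_coord, stop_coord = edges[10], edges[110]

# Hypothetical helper: index of the bin that contains a coordinate value
def index_of(value):
    return int(np.floor((value - low) / width))

print(start_coord, stop_coord)  # 65.0 115.0
print(index_of(90.0))           # 60
```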

In [None]:
# Here is how you rebin by a factor
h[hist.loc(70):hist.loc(110):hist.rebin(3)].plot()
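
* Conceptually, rebinning by a factor merges neighbouring bins and adds their contents. The sketch below shows the idea with plain NumPy and made-up counts; it assumes the number of bins is an exact multiple of the factor and is not how hist implements it internally.

```python
import numpy as np

# Illustrative counts for 12 equal-width bins (made-up numbers)
counts = np.arange(1, 13)
edges = np.linspace(0.0, 12.0, 13)

factor = 3
# Group every `factor` neighbouring bins and add their counts
rebinned_counts = counts.reshape(-1, factor).sum(axis=1)
# Keep every `factor`-th edge so each new bin is three times as wide
rebinned_edges = edges[::factor]

print(rebinned_counts)  # four bins holding 6, 15, 24 and 33 entries
print(rebinned_edges)   # edges at 0, 3, 6, 9 and 12
```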

In [None]:
# Integrating over a range
h[hist.loc(80):hist.loc(100):sum]
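
* Integrating over a range amounts to adding up the contents of the bins that lie inside it. The sketch below does this by hand with NumPy on synthetic, normally distributed values standing in for the mass; the distribution parameters are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Synthetic stand-in for the mass values (not the real data)
masses = rng.normal(loc=91.0, scale=3.0, size=10_000)

# Same binning as before: 120 bins between 60 and 120
counts, edges = np.histogram(masses, bins=120, range=(60, 120))

# Indices of the bins whose edges sit at 80 and 100
lo_idx = int(np.searchsorted(edges, 80.0))
hi_idx = int(np.searchsorted(edges, 100.0))

# Sum of all bins covering the interval from 80 up to (not including) 100
integral = counts[lo_idx:hi_idx].sum()
print(integral)
```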

## Multiple Dimension Histograms

In [None]:
picodst = uproot.open(
    "https://pivarski-princeton.s3.amazonaws.com/pythia_ppZee_run17emb.picoDst.root:PicoDst"
)

vertex_data = picodst.arrays(filter_name="*mPrimaryVertex[XYZ]")

In [None]:
vertexhist = hist.Hist(
    hist.axis.Regular(600, -1, 1, label="x"),
    hist.axis.Regular(600, -1, 1, label="y"),
    hist.axis.Regular(40, -200, 200, label="z"),
)

vertexhist.fill(
    ak.flatten(vertex_data["Event.mPrimaryVertexX"]),
    ak.flatten(vertex_data["Event.mPrimaryVertexY"]),
    ak.flatten(vertex_data["Event.mPrimaryVertexZ"]),
)

# Select x and y between -0.25 and 0.25; z is integrated over
main_art, top_art, side_art = vertexhist[
    hist.loc(-0.25) : hist.loc(0.25), hist.loc(-0.25) : hist.loc(0.25), sum
].plot2d_full()  # plot2d_full adds marginal histograms along x and y
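
* Integrating over one axis of a multidimensional histogram is a sum along that axis of the array of counts. The sketch below illustrates this with `np.histogramdd` on synthetic vertex positions; the widths of the distributions and the coarser binning are assumptions made to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# Synthetic vertex positions, stand-ins for the real x, y, z values
x = rng.normal(0.0, 0.1, size=5_000)
y = rng.normal(0.0, 0.1, size=5_000)
z = rng.normal(0.0, 50.0, size=5_000)

# 3D histogram of (x, y, z)
counts3d, (x_edges, y_edges, z_edges) = np.histogramdd(
    np.column_stack([x, y, z]),
    bins=(20, 20, 40),
    range=[(-1, 1), (-1, 1), (-200, 200)],
)

# Integrate over z: sum along the third axis to get a 2D (x, y) histogram
counts_xy = counts3d.sum(axis=2)

# Marginal histograms, like the side panels of a 2D plot
counts_x = counts_xy.sum(axis=1)
counts_y = counts_xy.sum(axis=0)

print(counts3d.shape, counts_xy.shape, counts_x.shape)
```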

**Exercise**: Using the dimuon data we loaded from opendata, plot a 2D histogram of eta vs. phi. Explain what you are looking at in the histogram.

In [None]:
# Answer

## Vectors

* We are dealing with physics data, so wouldn't it be cool to have some kind of data structure specifically made for this sort of purpose? Well, there is!
* Suppose you want to compute $\Delta R$. We know that this is given by
    $$
        \Delta R = \sqrt{(\Delta \eta)^2 + (\Delta \phi)^2}
    $$
    Now suppose you only have $p_x$, $p_y$, and $p_z$. All the information needed to compute $\Delta R$ is there, but you first have to derive $\eta$ and $\phi$ yourself, which fills your code with intermediate steps and forces you to define helper functions that ought to be shared. This is where Vector can help! It computes these quantities automatically when you request them.
* With Vector, we can construct 2D, 3D, and Lorentz vectors.
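* To see the intermediate steps that Vector saves you from writing, here is a hand-written NumPy sketch of $\Delta R$ starting from Cartesian components. The function names `eta_phi` and `delta_r` are hypothetical helpers for this illustration, and the code assumes a non-zero transverse momentum.

```python
import numpy as np

def eta_phi(px, py, pz):
    """Pseudorapidity and azimuthal angle from Cartesian momentum components."""
    pt = np.hypot(px, py)      # transverse momentum, assumed non-zero
    eta = np.arcsinh(pz / pt)  # equivalent to -ln(tan(theta / 2))
    phi = np.arctan2(py, px)
    return eta, phi

def delta_r(p1, p2):
    """Angular separation of two momenta given as (px, py, pz) tuples."""
    eta1, phi1 = eta_phi(*p1)
    eta2, phi2 = eta_phi(*p2)
    # Wrap the azimuthal difference into [-pi, pi) before squaring
    dphi = (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi
    return np.sqrt((eta1 - eta2) ** 2 + dphi**2)

# Two example momenta chosen only for illustration
print(delta_r((1.0, 0.0, 0.0), (0.0, 1.0, 1.0)))  # about 1.80
```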


In [None]:
import vector

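# Build two 3D momentum vectors from their Cartesian components
one = vector.obj(px=1, py=0, pz=0)
two = vector.obj(px=0, py=1, pz=1)

# Component-wise sum
one + two

# Angular separation in the eta-phi plane
one.deltaR(two)

# Convert to cylindrical coordinates (rho, phi, eta)
one.to_rhophieta()
two.to_rhophieta()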

In [None]:
tree = uproot.open(skhep_testdata.data_path("uproot-Zmumu.root"))["events"]

one = ak.to_numpy(tree.arrays(filter_name=["E1", "p[xyz]1"]))
two = ak.to_numpy(tree.arrays(filter_name=["E2", "p[xyz]2"]))

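# Rename the fields to the names Vector expects
# (the assignment is positional, so the order must match the existing fields)
one.dtype.names = ("E", "px", "py", "pz")
two.dtype.names = ("E", "px", "py", "pz")

# Reinterpret the structured arrays as arrays of Lorentz momentum vectors
one = one.view(vector.MomentumNumpy4D)
two = two.view(vector.MomentumNumpy4D)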

one + two

one.deltaR(two)

one.to_rhophieta()
two.to_rhophieta()
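
* The renaming step relies on how NumPy structured arrays work: assigning to `dtype.names` relabels the fields by position without touching the data. A small sketch with made-up values (the numbers and the variable name `muons` are assumptions for illustration only):

```python
import numpy as np

# Made-up values standing in for one muon in each of two events
muons = np.array(
    [(50.0, 10.0, 20.0, 40.0), (60.0, -15.0, 5.0, 55.0)],
    dtype=[("E1", "f8"), ("px1", "f8"), ("py1", "f8"), ("pz1", "f8")],
)

# Renaming is positional: the i-th new name replaces the i-th old name
muons.dtype.names = ("E", "px", "py", "pz")

# The data is unchanged; only the labels used to access it are different
pt = np.hypot(muons["px"], muons["py"])

print(muons.dtype.names)
print(pt)
```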