# Scikit-HEP ecosystem updates

**Note:** although I'm presenting in Jupyter, this is a talk, rather than a tutorial. You don't have to follow along.

<br><br><br><br><br>

## State of the ecosystem

<table width="100%"><tr style="background: white">
    <td align="left" width="50%"><img src="img/shells-border.png" width="95%"></td>
    <td align="right" width="50%"><img src="img/shells-hep.png" width="95%"></td>
</tr></table>

<br><br><br><br><br>

<img src="img/pip-allos-scikithep-log.png" width="100%">

<br><br><br><br><br>

## There are more tools than I could reasonably tell you about

And that's good:

   * we're approaching the ideal in which each tool does one thing well
   * each tool is maintained by enthusiastic developers who get recognition for their work
   * and the developers know about each other and ensure that these tools work together

<br><br><br><br><br>

## Illustrative vertical slice: Uproot → Awkward Array → Vector → fastjet → hist

Why these five?

<br><br>

<img src="img/uproot-logo.png" width="200px">

<p style="font-size: 14pt">Reads ROOT data as <span style="background: yellow">arrays</span>.</p>

<br><br>

<img src="img/awkward-logo.png" width="200px">

<p style="font-size: 14pt">Manipulates <span style="background: yellow">arrays</span> of arbitrary data structures.</p>

<br><br>

<img src="img/vector-logo.png" width="200px">

<p style="font-size: 14pt">Manipulates <span style="background: yellow">arrays</span> of 2D, 3D, and Lorentz vectors.</p>

<br><br>

<img src="img/fastjet-logo-300px.png" width="200px">

<p style="font-size: 14pt">Finds jets in <span style="background: yellow">arrays</span> of Lorentz vectors.</p>

<br><br>

<img src="img/hist-logo.png" width="200px">

<p style="font-size: 14pt">Fills histograms with <span style="background: yellow">arrays</span>.</p>

<br><br><br><br><br>

## Major trend (back) toward arrays

<img src="img/chep-papers-paradigm.png" width="85%">

<br><br><br><br><br>

## Speedrun through the vertical slice

Get a TTree with Uproot (the file comes from a tutorial 2 years ago).

In [1]:
import uproot

In [2]:
tree = uproot.open("https://github.com/jpivarski-talks/2020-04-08-eic-jlab/raw/master/open_charm_18x275_10k.root:events/tree")
tree

<TTree 'tree' (52 branches) at 0x7f171e713c40>

Read some TBranches from it.

In [3]:
components = tree.arrays(["px", "py", "pz", "tot_e"])
components

<Array [{px: [-0.516, -0.246, ... 3.03]}] type='10000 * {"px": var * float64, "p...'>

Reformat them into an array of lists of four-vectors.

In [4]:
import awkward as ak
import vector
vector.register_awkward()

In [5]:
events = ak.zip(
    {"px": components.px, "py": components.py, "pz": components.pz, "E": components.tot_e},
    with_name="Momentum4D",
)
events

<MomentumArray4D [[{px: -0.516, ... E: 3.03}]] type='10000 * var * Momentum4D["p...'>

Note that the lists have different lengths: each event has its own number of particles.

In [6]:
ak.num(events)

<Array [51, 26, 27, 28, 30, ... 37, 42, 25, 11] type='10000 * int64'>

Run FastJet's anti-$k_T$ clustering algorithm on all events.

In [7]:
import fastjet

In [8]:
cluster_sequence = fastjet.ClusterSequence(
    events,
    # anti-kT algorithm with jet radius parameter R = 0.5
    fastjet.JetDefinition(fastjet.antikt_algorithm, 0.5),
)
cluster_sequence

#--------------------------------------------------------------------------
#                         FastJet release 3.3.4
#                 M. Cacciari, G.P. Salam and G. Soyez                  
#     A software package for jet finding and analysis at colliders      
#                           http://fastjet.fr                           
#	                                                                      
# Please cite EPJC72(2012)1896 [arXiv:1111.6097] if you use this package
# for scientific work and optionally PLB641(2006)57 [hep-ph/0512210].   
#                                                                       
# FastJet is provided without warranty under the GNU GPL v2 or higher.  
# It uses T. Chan's closest pair algorithm, S. Fortune's Voronoi code
# and 3rd party plugin jet algorithms. See COPYING file for details.
#--------------------------------------------------------------------------


<fastjet._pyjet.AwkwardClusterSequence at 0x7f16e2702b80>

In [9]:
clustered_events = cluster_sequence.inclusive_jets()
clustered_events

<MomentumArray4D [[{px: 0.056, ... E: 17.3}]] type='10000 * var * Momentum4D["px...'>

Histogram the number of particles per event and the number of jets per event.

In [10]:
import hist

In [11]:
hist.new.Reg(101, -0.5, 100.5).Double().fill(ak.num(events))

In [12]:
hist.new.Reg(101, -0.5, 100.5).Double().fill(ak.num(clustered_events))
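The binning `Reg(101, -0.5, 100.5)` gives every integer count from 0 to 100 its own unit-width bin. The sketch below checks that with `np.histogram` as a stand-in for hist, using the same number of bins and the same range; the input counts are invented for illustration.

```python
import numpy as np

# Toy per-event counts (invented), standing in for ak.num(events).
counts_per_event = np.array([0, 3, 3, 51, 100])

# Same binning as above: 101 regular bins from -0.5 to 100.5.
contents, edges = np.histogram(counts_per_event, bins=101, range=(-0.5, 100.5))

# Bin centers land on the integers 0, 1, ..., 100.
centers = (edges[:-1] + edges[1:]) / 2

print(contents[3])   # number of toy events with exactly 3 entries
print(centers[:5])
```

With edges on the half-integers, bin `n` counts exactly the events that have `n` entries, so no integer value falls on a bin boundary.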

See what happens when you change the jet radius parameter $R$ (the 0.5 passed to `JetDefinition`)!

<br><br><br><br><br>

## The Scikit-HEP way of doing things