# Columnar analysis with Awkward Array

## How this works as a hands-on tutorial

Even though I don't have formal exercises scattered throughout these notebooks, this session can still be interactive.

   * **You** should open each notebook in Binder (see [GitHub README](https://github.com/jpivarski/2020-06-08-uproot-awkward-columnar-hats)) and evaluate cells, following along with me.
   * **I** should pause frequently and stay open to questions. I'll be monitoring the videoconference chat.
   * **We** should feel free to step off the path and try to answer "What if?" questions in real time.

Not all digressions will lead to an answer—I often realize, "That's why it didn't work!" long after the tutorial is over—but tinkering is how we learn.

Consider this a tour, with me as your guide. The planned route is a suggestion to get things started, but your questions are more important.

(Also, I'm awful at writing formal exercises; they end up being too easy _and_ too hard.)

<br><br><br>

## Array-based programming

One of the first programming languages, **APL** ("A Programming Language"), was array-based. It started as a notation for _describing_ hand-written machine code and was later made interactive.

**Nial** was also theoretically motivated, and the two of these inspired a generation of direct descendants (green).

Meanwhile, the **S** language for statistics borrowed many of these ideas while being focused on a particular domain. Its descendant, **R**, is still widely used.

**IDL** was invented for the sciences and gained a lot of traction as an alternative to writing custom Fortran, again using vectorization as a first-class concept.

**MATLAB** similarly gained traction in the sciences as a commercial product.

**PDL** (Perl Data Language) and **NumPy** introduced the same concepts as libraries within established languages (Perl and Python, respectively). **Julia** has some vector-like interfaces, though its focus is on just-in-time compiling imperative code.

![](img/apl-timeline.png)

<br><br><br>

Common features of array-based languages:

   * Arrays are the central data type with most operations applying to arrays. (By contrast, C requires explicit iteration over the arrays: it's imperative.)
   * They are _all_ interactive languages. The array-at-a-time logic makes it possible to define precompiled routines that run in response to user commands.
   * They are primarily data analysis languages, highly targeted to the sciences and statistics.

In retrospect, it sounds like a perfect fit.
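To make the first bullet concrete, here is a minimal sketch contrasting the two styles in NumPy. The variable names (`px`, `py`, `pt_loop`, `pt_array`) and the values are made up for illustration; they don't come from any dataset in this tutorial.

```python
import math

import numpy as np

# Hypothetical momentum components for three particles (illustrative values).
px = np.array([3.0, 5.0, 8.0])
py = np.array([4.0, 12.0, 15.0])

# Imperative style: visit each element explicitly, as one would in C.
pt_loop = np.empty(len(px))
for i in range(len(px)):
    pt_loop[i] = math.sqrt(px[i] ** 2 + py[i] ** 2)

# Array-at-a-time style: one expression applies to every element,
# and the iteration happens inside NumPy's precompiled routines.
pt_array = np.sqrt(px**2 + py**2)

print(pt_loop)
print(pt_array)
```

Both give the same numbers. The difference is that the second form is a single command, which is what makes it comfortable to type at an interactive prompt.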

<br><br><br>

In this plot of the "astronomical" rise of Python, note that 2 of the 3 languages it's displacing are array languages.

![](img/mentions-of-programming-languages.png)

<br><br><br>

## Why not for particle physics?

Because **data structures**. Particle physicists have _always_ needed to deal with complex data structures, so much so that we invented packages to add them to Fortran.

The following is from [_Initiation to HYDRA_ by R.K. Böck (1976)](https://cds.cern.ch/record/864527?ln=en) as part of an explanation of what a "data structure" is, at a time before Fortran had `FOR` loops. (HYDRA was merged into ZEBRA, which became the basis for ROOT I/O.)

We would draw similar diagrams today.

![](img/hydra-2.png)

<br><br><br>

But the modify-compile-rerun cycle of C++ is too long for interactive data analysis. That's why ROOT invented CINT and then Cling.

But C++ is too complex a language for data-focused tasks. That's why I was thinking a lot about [extending query languages (like SQL) to data structures](https://stackoverflow.com/questions/38831961/what-declarative-language-is-good-at-analysis-of-tree-like-data).

But I was surprised by how useful the simple JaggedArray class in Uproot turned out to be. My conclusion was that you don't need a new language, just some data types and operations.

![](img/uproot-awkward-timeline.png)

<br><br><br>

<font size="15">That's what </font><img src="img/awkward-logo-300px.png" style="vertical-align:middle"><font size="15"> is.</font>


Just arrays, but with awkward shapes.

![](img/cartoon-schematic.png)
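One way to picture "arrays with awkward shapes" is a list of variable-length lists stored as two flat arrays: the values, and the positions where each list starts and stops. The sketch below uses only NumPy and is meant to illustrate the idea; it is not Awkward Array's actual implementation or API, and the names `content`, `offsets`, and `get_list` are chosen here for illustration.

```python
import numpy as np

# Three "events" with a variable number of values each:
#     [[1.0, 2.0, 3.0], [], [4.0, 5.0]]
# stored as one flat array of values plus the boundaries between lists.
content = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
offsets = np.array([0, 3, 3, 5])  # list i spans content[offsets[i]:offsets[i + 1]]


def get_list(i):
    """Return list number i as a slice (a view) of the flat content."""
    return content[offsets[i]:offsets[i + 1]]


# Number of values in each list, computed without a Python loop.
counts = np.diff(offsets)

# Sum of each list, also without a Python loop: take differences of a
# running total at the list boundaries. Empty lists correctly give 0.
running = np.concatenate(([0.0], np.cumsum(content)))
sums = running[offsets[1:]] - running[offsets[:-1]]

print(counts)
print(sums)
print(get_list(2))
```

The point of the layout is that per-list operations such as counting and summing become ordinary array operations on `content` and `offsets`, so they stay array-at-a-time even though the lists have different lengths.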

<br><br><br>