This repository contains three notebooks on columnar data analysis, presented at CoDaS-HEP at 12:30pm on August 3, 2022 by Jim Pivarski and Ioana Ifrim.
You don't need to install anything on your computer to participate; I encourage everyone to join on Binder.
Binder tips:
If your notebook becomes unresponsive, reconnect to the kernel or restart the kernel from the "Kernel" menu.
While working on exercises, keep a copy of your work-in-progress in a text editor, so that you don't lose it if the web page reloads. "Run → Run All Above Selected Cell" and "Kernel → Restart Kernel and Run up to Selected Cell" will rerun all of the code to get your Python session back to the state it was in before a page reload or kernel restart.
If you would rather run the notebooks on your own computer, we use the libraries and versions listed in environment.yml. You also need to install JupyterLab. You don't have to install the libraries with pip (you can use conda, for instance), and you don't need the exact versions that this tutorial is pinned to, but you should use at least those versions.
We won't spend any time in the tutorial session on installing libraries. If an installation on your computer doesn't work, switch to Binder by pressing the button above.
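If you do install locally, a quick check that your installed versions are at least the pinned ones can save time before the session. The following is a minimal sketch, not part of the tutorial materials: the helper names (`parse_version`, `meets_minimum`, `check`) and the package name in the example are hypothetical, and the real names and minimum versions should be copied by hand from environment.yml. It only compares the leading numeric parts of a version string, so pre-release suffixes are ignored.

```python
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components, e.g. '1.8.0rc1' -> (1, 8, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, such as 'dev3'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.8' and '1.8.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(minimums):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    # Hypothetical entry: replace with the names and pins from environment.yml.
    hypothetical_minimums = {"example-package": "1.0.0"}
    for name, status in check(hypothetical_minimums).items():
        print(f"{name}: {status}")
```

Anything reported as "missing" or "too old" is a sign to either upgrade or switch to Binder.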
If you want to see the notebooks online but don't want to execute them in Binder, read them in this order:
- part-1.ipynb: overview of programming paradigms, and array-oriented programming in particular
- part-2.ipynb: overview of tools for columnar data analysis of particle physics data
- project.ipynb: set-up and challenge problems—discovering the Higgs boson
The default notebooks are unevaluated. To see static outputs from a previous run, look in the evaluated directory.
At SciPy 2022, Jim presented array-oriented programming for a non-physics audience: materials (GitHub and Binder), video (YouTube).
The focus and the problems are different, so if you'd like more practice, take a look!