# Welcome to PyHEP 2020!

<br><br><br>

Before writing this tutorial, I took a look at the survey...

In [None]:
import pandas
df = pandas.read_csv("survey-results.csv")
df

In [None]:
df["Professional life: What best describes your occupation?"].value_counts(ascending=True).plot.barh();

In [None]:
df["Professional life: What best describes the stage of your professional career?"].value_counts(ascending=True).plot.barh();

In [None]:
languages = [
    "C or C++",
    "Python",
    "Matlab",
    "Javascript or other browser-based (e.g. TypeScript, CoffeeScript)",
    "Verilog, VHDL, or other hardware description language",
    "R",
    "Java or other JVM-based (e.g. Kotlin, Scala, Clojure)",
    "Perl",
    "PHP",
    "C#",
    "Julia",
    "Go",
    "Swift",
    "Rust",
    "Ruby",
    "Haskell",
    "Raw assembly or machine code",
    "Other, not listed above",
]
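def explode(responses):
    # Split a semicolon-separated multi-select answer into one 0/1 flag per language.
    responses = [response.strip() for response in responses.split(";")]
    return [1.0 if language in responses else 0.0 for language in languages]
# Series.apply works element-wise and avoids DataFrame.applymap, which is deprecated in recent pandas.
exploded = df["Computing and programming: Which of the following languages do you use regularly (i.e. more than 10% of your work)?"].fillna("").apply(explode)
indicator = pandas.DataFrame(exploded.tolist(), columns=languages)
# Give each respondent a total weight of 1, split equally among their selections, then sum per language.
indicator.div(indicator.sum(axis=1), axis=0).sum(axis=0).iloc[::-1].plot.barh(figsize=(5, 7));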
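
The weighting used for the language plot may be easier to see on a toy example. The following is a minimal pure-Python sketch of the same arithmetic, assuming three made-up answers and a shortened language list (`toy_languages`, `toy_answers`, and `toy_explode` are hypothetical names, not survey data): each respondent carries a total weight of 1, split equally among the languages they selected.

```python
# Hypothetical answers from three respondents; the last one selected nothing.
toy_languages = ["C or C++", "Python", "R"]
toy_answers = ["Python; C or C++", "Python", ""]

def toy_explode(responses):
    # Same idea as explode in the cell above: one 0/1 flag per language.
    selected = [response.strip() for response in responses.split(";")]
    return [1.0 if language in selected else 0.0 for language in toy_languages]

toy_totals = dict.fromkeys(toy_languages, 0.0)
for answer in toy_answers:
    flags = toy_explode(answer)
    n_selected = sum(flags)
    if n_selected == 0:
        # No selection: contributes nothing. In the pandas version this row
        # becomes NaN after the division, and sum() skips NaN by default.
        continue
    for language, flag in zip(toy_languages, flags):
        toy_totals[language] += flag / n_selected

print(toy_totals)
```

The first respondent adds 0.5 to both Python and C or C++, the second adds 1.0 to Python, and the third adds nothing, so the bars measure a share of respondents rather than a raw count of selections.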

In [None]:
df[[
    "Computing and programming: Do you *expect* to use Python more or less in the future (as a fraction of your programming time)?",
    "Computing and programming: Do you *want* to use Python more or less in the future (as a fraction of your programming time)?"
]].apply(pandas.Series.value_counts).loc[["Less", "About the same", "More", "Don't know"]].plot.bar(rot=0).legend(bbox_to_anchor=(1.2, 0.5));

In [None]:
cols = {x: x.split(":")[1].strip() for x in df.columns if x.startswith("Python ecosystem:") and "?" not in x}
order = ((df[list(cols)] == "Don't know what it is") | (df[list(cols)] == "Never")).sum(axis=0).sort_values(ascending=False).index.tolist()
pkgs = df[order].rename(columns=cols).apply(pandas.Series.value_counts).T[[
    "Don't know what it is", "Never", "Through dependencies only", "Regularly", "All the time"
]].fillna(0)
pkgs.insert(0, "No selection", pkgs.sum(axis=1).max() - pkgs.sum(axis=1))
pkgs.plot.barh(stacked=True, width=0.9, figsize=(20, 20), color=["#5e79e0", "#798bd1", "#992cc7", "#f5f518", "#ffa640", "#ff5a30"]).legend(bbox_to_anchor=(1.2, 0.5));

In [None]:
cols = {x: x.split(":")[1].strip() for x in df.columns if x.startswith("Particle physics ecosystem:") and "?" not in x}
order = ((df[list(cols)] == "Don't know what it is") | (df[list(cols)] == "Never")).sum(axis=0).sort_values(ascending=False).index.tolist()
pkgs = df[order].rename(columns=cols).apply(pandas.Series.value_counts).T[[
    "Don't know what it is", "Never", "Through dependencies only", "Regularly", "All the time"
]].fillna(0)
pkgs.insert(0, "No selection", pkgs.sum(axis=1).max() - pkgs.sum(axis=1))
pkgs.plot.barh(stacked=True, width=0.9, figsize=(20, 20), color=["#5e79e0", "#798bd1", "#992cc7", "#f5f518", "#ffa640", "#ff5a30"]).legend(bbox_to_anchor=(1.2, 0.5));

In [None]:
hopes = [
    "Particle physics analysis tools (other than ROOT)",
    "General-purpose data analysis toolkits",
    "Machine learning/deep learning toolkits",
    "Software engineering skills (beyond the fundamentals)",
    "ROOT and PyROOT",
    "Python fundamentals (how to program in Python)",
    "Collaboration-specific topics",
    "Other",
]
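def explode(responses):
    # Split a semicolon-separated multi-select answer into one 0/1 flag per topic.
    responses = [response.strip() for response in responses.split(";")]
    return [1.0 if hope in responses else 0.0 for hope in hopes]
# Series.apply works element-wise and avoids DataFrame.applymap, which is deprecated in recent pandas.
exploded = df["PyHEP feedback: What are you hoping to learn from this workshop?"].fillna("").apply(explode)
indicator = pandas.DataFrame(exploded.tolist(), columns=hopes)
# Give each respondent a total weight of 1, split equally among their selections, then sum per topic.
indicator.div(indicator.sum(axis=1), axis=0).sum(axis=0).iloc[::-1].plot.barh(figsize=(5, 7));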

<br><br><br><br><br>

## Conclusions:

   1. You are mostly grad students and postdocs engaged in physics research.
   2. You use Python and C++ about equally, but want to use Python more.
   3. You are familiar with the major libraries of the Python world: NumPy, Matplotlib, and machine learning toolkits.
   4. You are less familiar with Python libraries intended for physics analysis.
   5. But you want to learn.

So let's get started!

<br><br><br><br><br>

<img src="img/uproot-logo-300px.png" alt="Uproot" width="300px" style="margin-bottom: -50px; margin-right: 20px"><font size="5"> is a pure-Python implementation of ROOT I/O.</font>

<br><br>

<img src="img/abstraction-layers.png" width="900px">

<img src="img/awkward-logo-600px.png" alt="Awkward Array" width="350px" style="margin-bottom: -29px; margin-right: 20px"><font size="5"> is a generalization of NumPy to data structures (such as jagged arrays).</font>

<br><br>

<img src="img/cartoon-schematic.png" width="1000px">
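
If the term is new to you, a jagged array is a list of lists whose inner lists have different lengths, such as a different number of particles in each event. The following is a minimal pure-Python sketch of the idea, assuming a flat-values-plus-boundaries layout; the names `content` and `offsets` are illustrative only, and this is not Awkward's API.

```python
# Three "events" holding 3, 0, and 2 values: a jagged array.
jagged = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]

# Flat layout: all values back to back, plus the boundaries between events.
content = [value for event in jagged for value in event]
offsets = [0]
for event in jagged:
    offsets.append(offsets[-1] + len(event))

# Event i is the slice of content between offsets[i] and offsets[i + 1].
rebuilt = [content[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]

print(content)
print(offsets)
print(rebuilt == jagged)
```

Storing the values contiguously like this is what lets array-at-a-time operations run over every value without a Python loop over events.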

<br><br><br>

# Interesting times!

<font size="4">We happen to be in the middle of a transition from <b>Uproot 3.x → Uproot 4.x</b> and <b>Awkward 0.x → Awkward 1.x</b>.</font>

<img src="img/uproot-awkward-timeline.png" width="900px">

<font size="4">You can use both! Old and new versions are independently installable/importable.</font>

<table style="font-size: 1.5em; font-weight: bold; margin-left: 0px">
    <tr style="background: white"><td></td><td style="color: gray">Now</td><td style="color: gray">Later this year</td></tr>
    <tr style="background: white"><td style="color: gray">Old versions</td><td style="color: blue">uproot, awkward</td><td>uproot3, awkward0</td></tr>
    <tr style="background: white"><td style="color: gray">New versions</td><td>uproot4, awkward1</td><td style="color: blue">uproot, awkward</td></tr>
</table>

<img src="img/Raiders-of-the-Lost-Ark-Chamber.jpg" width="800px">

<br><br><br><br><br>

# What will this tutorial use?

New versions of both: **Uproot 4** and **Awkward 1**. This tutorial is bleeding edge.
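
Because the new versions are importable as `uproot4` and `awkward1` now and as `uproot` and `awkward` after the rename, a notebook can check which names are available before it starts. The following is a minimal sketch, assuming nothing beyond the package names in the table above; the helper `import_first_available` is hypothetical and not part of either library.

```python
import importlib

def import_first_available(*names):
    """Return the first module in `names` that can be imported, or None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

# Prefer the explicit new-version names; fall back to the plain names,
# which refer to the new versions only after the rename.
uproot_new = import_first_available("uproot4", "uproot")
awkward_new = import_first_available("awkward1", "awkward")

for label, module in [("Uproot", uproot_new), ("Awkward", awkward_new)]:
    if module is None:
        print(label, "is not installed")
    else:
        # Print the version so a fallback to an old release is easy to spot.
        print(label, module.__name__, getattr(module, "__version__", "unknown"))
```

Before the rename, the fallback to plain `uproot` or `awkward` picks up the old version, so check the printed version number: this tutorial expects Uproot 4 and Awkward 1.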