# Lesson 1: Python

<br><br><br>

Welcome to the HSF-India training event!

<br><br><br>

<br><br><br>

I'm Jim; I'll be presenting the first lessons today and tomorrow on Scientific Python.

<br><br><br>

By many measures, Python is now the most popular programming language.

<br><br><br>

<table width="75%">
    <tr style="background: white;">
        <td width="100%"><img src="img/python-rankings-tiobe-2022.png" width="100%"></td>
        <td width="100%"><img src="img/python-rankings-pypl-2022.png" width="100%"></td>
    </tr>
    <tr style="background: white;">
        <td width="100%"><img src="img/python-rankings-stackoverflow-2022.png" width="100%"></td>
        <td width="100%"><img src="img/python-rankings-githut-2022.png" width="100%"></td>
    </tr>
</table>

More importantly, it is the most widely used language for data analysis and machine learning.

<br>

<center>
<img src="img/analytics-by-language.svg" width="65%">
</center>

"Popularity" means more tools are available, more attention has been drawn to their shortcomings, and you can find more information about how to use them online.

<br><br><br>

It also means that Python skills are transferable skills.

<br><br><br>

This first lesson is a tour (or, for some of you, a review) of Python syntax, using examples from physics data analysis.

## Before we begin: navigation in Jupyter

<br><br>

You don't need to use this "slide mode" that I'm using.

<br>

In the ordinary notebook view,

 1. Click on a code cell to edit it.
 2. Control-enter to run it.
 3. Shift-enter to run it and move on to the next cell.

But be sure to evaluate all code cells so that you're not missing any data.

<br><br>

If you need to refresh your notebook's state, use the **Kernel → Restart Kernel and Run up to Selected Cell** menu item.

## Tour of Python syntax

### Using Python as a desk calculator

<br>

In [None]:
2 + 2

<br><br>

Defining variables

In [None]:
E = 68.1289790
px = -17.945541
py = 13.1652603
pz = 64.3908386

<br><br>

Now we can use `E`, `px`, `py`, `pz`:

In [None]:
px

<br><br><br>

Calculate ${p_x}^2 + {p_y}^2$:

In [None]:
px**2 + py**2

<br><br><br>

Now $\displaystyle \sqrt{{p_x}^2 + {p_y}^2 + {p_z}^2}$:

In [None]:
(px**2 + py**2 + pz**2)**(1/2)
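A note on the exponent `(1/2)`: in Python 3, `/` always performs true division, so `1/2` is `0.5`. C++ programmers may expect integer division, where `1/2` is `0`. Python spells floor division `//`. A quick sketch:

```python
# In Python 3, "/" is true division, even between two ints
print(1 / 2)    # 0.5
print(1 // 2)   # 0: "//" is floor division
print(-7 // 2)  # -4: floor division rounds toward negative infinity

# so x**(1/2) really is a square root, not x**0
print(2 ** (1 / 2))
```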

We'll be using these equations a lot:

<br><br>

$$p = \sqrt{{p_x}^2 + {p_y}^2 + {p_z}^2}$$

<br>

$$m = \sqrt{E^2 - p^2}$$

<br><br>

**Quizlet:** Fix the mistake!

In [None]:
m = (E**2 - px**2 + py**2 + pz**2)**(1/2)
m

### Functions

<br>

Define functions using `def`, an argument list, a colon (`:`), and an indented body of statements, usually ending with `return`.

In [None]:
def euclidean(x, y, z):
    return (x**2 + y**2 + z**2)**(1/2)

def minkowski(time, space):
    return (time**2 - space**2)**(1/2)

<br>

We can call them with arguments identified by position or by name.

In [None]:
euclidean(px, py, pz)

<br>

In [None]:
euclidean(z=pz, y=py, x=px)

<br>

Function calls can be nested: the result of one call can be passed directly as an argument to another.

In [None]:
minkowski(E, euclidean(px, py, pz))

A nested block only needs to be indented more deeply than the line that opens it, and the code after it must return to the previous level. Any consistent depth works, but 2 or 4 spaces per level are the usual conventions.

<br>

Beware: **tab** is not **space**! (Though both are invisible.)
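A minimal sketch of what goes wrong: compiling a snippet whose indentation mixes one tab on one line with eight spaces on the next. The two lines may look identical in an editor, but Python refuses to guess how wide a tab is and raises `TabError`.

```python
# Line 2 is indented with one tab; line 3 with eight spaces
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<example>", "exec")
except TabError as err:
    print("TabError:", err.msg)
```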

<br>

In [None]:
def mass(E, px, py, pz):
    def euclidean(x, y, z):
        return (x**2 + y**2 + z**2) ** (1 / 2)

    def minkowski(time, space):
        return (time**2 - space**2) ** (1 / 2)

    return minkowski(E, euclidean(px, py, pz))


mass(E, px, py, pz)

Note: functions can be assigned to variables, too. In Python, everything is an object.

<br>

In [None]:
mag3d = euclidean

<br>

In [None]:
mag3d(px, py, pz)
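Because a function is an ordinary object, it can also be passed as an argument to another function. A sketch, where `apply_to_all` is a hypothetical helper written only for illustration (it uses a list comprehension, a compact loop that builds a list):

```python
def euclidean(x, y, z):
    return (x**2 + y**2 + z**2) ** (1 / 2)

def apply_to_all(func, triples):
    """Call func(x, y, z) on each (x, y, z) triple; func is just another argument."""
    return [func(x, y, z) for x, y, z in triples]

mag3d = euclidean
print(mag3d is euclidean)  # True: two names for one function object
print(mag3d.__name__)      # euclidean: the object keeps the name it was defined with
print(apply_to_all(mag3d, [(3, 4, 0), (1, 2, 2)]))  # [5.0, 3.0]
```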

### Importing functionality into Python

<br>

The `import` statement loads libraries, which may be from Python's standard library or something installed with `pip` or `conda`.

<br>

In [None]:
import math

<br>

This introduced a new variable into the environment.

In [None]:
math

<br>

Objects inside the module can be accessed with a dot (`.`).

In [None]:
math.sqrt(E**2 - px**2 - py**2 - pz**2)

The dot-syntax prevents functions with the same names in different libraries from conflicting.

<br>

In [None]:
import numpy

<br>

In [None]:
numpy.sqrt

<br>

In [None]:
math.sqrt

<br>

In [None]:
numpy.sqrt is math.sqrt
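They are different objects because they do different things. One difference, sketched below: NumPy's `sqrt` applies element by element to a whole list or array, while `math.sqrt` accepts only a single number.

```python
import math
import numpy as np

values = [4.0, 9.0, 16.0]

# NumPy's sqrt works element by element on a whole list or array...
print(np.sqrt(values))  # [2. 3. 4.]

# ...while math.sqrt only accepts a single number
try:
    math.sqrt(values)
except TypeError as err:
    print("TypeError:", err)
```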

Some libraries have conventional "short names."

In [None]:
import numpy as np

<br>

In [None]:
np.sqrt(E**2 - px**2 - py**2 - pz**2)

Sometimes, you might prefer to import only specific objects from a library, using `from ... import ...`.

<br>

In [None]:
from hepunits import GeV
from particle import Particle

<br>

In [None]:
muon = Particle.from_name("mu+")
muon

<br>

In [None]:
muon.mass / GeV

In [None]:
?muon

### Data types

<br>

Python has data types, but unlike C++, type correctness is checked just before computation, not in a separate compilation phase.

<br>

In [None]:
1 + "2"
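Because the check happens only when an operation actually runs, a function containing a type error can still be defined, and even called, as long as the faulty line is never reached. A small sketch (`combine` is a made-up function for illustration):

```python
def combine(flag):
    if flag:
        return 1 + "2"  # a type error, but only detected if this line runs
    return 1 + 2

print(combine(False))  # 3: the bad line was never executed

try:
    combine(True)
except TypeError as err:
    print("TypeError:", err)
```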

<br>

Check their types:

In [None]:
type(1)

<br>

In [None]:
type("2")

_Therefore_, types are also objects that you can assign to variables and inspect at runtime, unlike C++.

<br>

In [None]:
t1 = type(1)
t1

<br>

In [None]:
t2 = type("2")
t2

<br>

Most type objects are also functions that create or convert data to that type.

<br>

In [None]:
int("2")

<br>

In [None]:
t1("2")
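A few more conversions with built-in type objects, sketched below. Note that converting a float to `int` truncates toward zero, and that `int` will not parse a string that looks like a float.

```python
print(float("2.5"))  # 2.5: parse a string as a float
print(str(42))       # 42 (now the text "42")
print(int(2.9))      # 2: float to int truncates toward zero
print(int(-2.9))     # -2

try:
    int("2.5")  # not a valid integer literal
except ValueError as err:
    print("ValueError:", err)
```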

**Quizlet:** before you run the following, what will it do?

<br>

In [None]:
type(type(1)("2"))

<br>

Here is some scratch space:

### Relationships among types

<br>

NumPy (`import numpy as np`) has some types that look like standard Python types, but they're not.

<br>

In [None]:
np_one = np.int32(1)
np_one

<br>

In [None]:
type(np_one)

<br>

`np.int32` is not `int`.

In [None]:
np.int32 == int

<br>

`np.int32` is also not `np.int64`.

In [None]:
type(np.int32(1)) == type(np.int64(1))

Use `isinstance` to check the type of something.

<br>

It doesn't ask, "Is this type object the same as that other one?"

It asks, "Is this value an instance of that type?"

<br>

In [None]:
isinstance(np_one, np.int32)

<br>

...because some types are _subtypes_ of others.

<br>

`np.int32` and `np.int64` are both subtypes of `np.integer`.

Any instance of `np.int32` or of `np.int64` is also an instance of `np.integer`.

In [None]:
isinstance(np.int32(1), np.integer)

<br>

In [None]:
isinstance(np.int64(1), np.integer)

In general, subtype relationships form a tree (or, when a type has more than one parent, a more general graph, as long as there aren't any cycles).

<br>

<center>
<img src="img/dtype-hierarchy.png" width="65%">
</center>

Using the `mro` method (short for "method resolution order") to list a type and all of its supertypes...

<br>

In [None]:
np.int32.mro()

<br>

...we see that it doesn't have any in common with Python's `int`.

<br>

In [None]:
int.mro()

<br>

(Except `object`, but everything in Python is an `object`.)
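The same kind of question can be asked about types directly with `issubclass`, and Python's built-in types have a hierarchy of their own: `bool`, for instance, is a subtype of `int`. A short sketch:

```python
import numpy as np

# Built-in types form a hierarchy too: bool is a subtype of int
print(bool.mro())             # [<class 'bool'>, <class 'int'>, <class 'object'>]
print(isinstance(True, int))  # True

# issubclass asks the isinstance question about types rather than values
print(issubclass(np.int32, np.integer))  # True
print(issubclass(np.int32, int))         # False
```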

There are also ways to ask general questions about types, such as "Is this at all integer-like?"

<br>

In [None]:
import numbers

<br>

In [None]:
isinstance(np.int32(1), numbers.Integral)

<br>

In [None]:
isinstance(1, numbers.Integral)

### Collection types

<br>

The two most basic collection types in Python are `list` and `dict`.

<br>

In [None]:
some_list = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]
some_list

In [None]:
type(some_list)

In [None]:
len(some_list)

<br>

In [None]:
some_dict = {"one": 1.1, "two": 2.2, "three": 3.3}
some_dict

In [None]:
type(some_dict)

In [None]:
len(some_dict)

You can pull data out of a collection with square brackets: `[` `]`.

<br>

In [None]:
some_list

<br>

In [None]:
some_list[3]

<br>

In [None]:
some_dict

<br>

In [None]:
some_dict["two"]
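Asking for a position or key that isn't there raises an error: `IndexError` for lists, `KeyError` for dicts. A small sketch, using new variable names so that `some_list` and `some_dict` are left untouched; it also shows that a negative list index counts from the end.

```python
short_list = [0.0, 1.1, 2.2]
lookup = {"one": 1.1, "two": 2.2, "three": 3.3}

print(short_list[-1])  # 2.2: a negative index counts from the end

try:
    short_list[10]  # there is no item at position 10
except IndexError as err:
    print("IndexError:", err)

try:
    lookup["four"]  # there is no key "four"
except KeyError as err:
    print("KeyError:", err)

print("four" in lookup)         # False: "in" checks for a key without raising
print(lookup.get("four", 0.0))  # 0.0: .get returns the default when the key is missing
```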

You can also change the data in a collection if the square brackets are on the left of an assignment (`=`).

<br>

In [None]:
some_list[3] = 33333

<br>

In [None]:
some_list

<br>

In [None]:
some_dict["two"] = 22222

<br>

In [None]:
some_dict

And you can extend them beyond their original length, as well as mix different data types in the same collection.

<br>

In [None]:
some_list.append("mixed types")

<br>

In [None]:
some_list

<br>

In [None]:
some_dict[123] = "mixed types"

<br>

In [None]:
some_dict

Ranges within a list can be "sliced" with a colon (`:`).

<br>

In [None]:
some_list

<br>

In [None]:
some_list[2:8]

<br>

**Quizlet:** Before you run it, what will this do?

In [None]:
some_list[2:8][3]

<br>

<br>

(We'll see a lot more about slices in the next lesson.)
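A slice `start:stop` includes the item at `start` but not the one at `stop`, so it has `stop - start` items; leaving out either bound means "from the beginning" or "to the end". A sketch on a fresh copy of the original list (named `sample` so that no existing variable is overwritten):

```python
sample = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]

chunk = sample[2:8]  # index 2 is included, index 8 is not
print(chunk)         # [2.2, 3.3, 4.4, 5.5, 6.6, 7.7]
print(len(chunk))    # 6, which is 8 - 2

print(sample[:3])    # [0.0, 1.1, 2.2]: omitted start means "from the beginning"
print(sample[7:])    # [7.7, 8.8, 9.9]: omitted stop means "to the end"
```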

### A little data analysis

<br>

In [None]:
particles = [
    {"type": "electron", "E": 171.848714, "px": 38.4242935, "py": -28.779644, "pz": 165.006927, "charge": 1,},
    {"type": "electron", "E": 138.501266, "px": -34.431419, "py": 24.6730384, "pz": 131.864776, "charge": -1,},
    {"type": "muon", "E": 68.1289790, "px": -17.945541, "py": 13.1652603, "pz": 64.3908386, "charge": 1,},
    {"type": "muon", "E": 18.8320473, "px": -8.1843795, "py": -7.6400470, "pz": 15.1420097, "charge": -1,},
]

<br>

In [None]:
def particle_decay(name, particle1, particle2):
    return {
        "type": name,
        "E": particle1["E"] + particle2["E"],
        "px": particle1["px"] + particle2["px"],
        "py": particle1["py"] + particle2["py"],
        "pz": particle1["pz"] + particle2["pz"],
        "charge": particle1["charge"] + particle2["charge"],
    }

Starting from the observed electrons and muons, we reconstruct unobserved particles by adding energy and momentum.

<br>

<center>
<img src="img/higgs-to-four-leptons-diagram.png" width="50%">
</center>

In [None]:
z1 = particle_decay("Z boson", particles[0], particles[1])
z1

<br>

In [None]:
z2 = particle_decay("Z boson", particles[2], particles[3])
z2

<br>

In [None]:
higgs = particle_decay("Higgs boson", z1, z2)
higgs

**Quizlet:** Define the `particle_mass` function and compute the mass of all `particles`, `z1`, `z2`, and `higgs`.

<br>

In [None]:
def particle_mass(particle):
    ...

<br>

|          | mass (GeV/$c^2$) |
|:---------|-----------------:|
| $e^+$    |   0.0174851 |
| $e^-$    |   0.0097893 |
| $\mu^+$  |   0.1056570 |
| $\mu^-$  |   0.1056493 |
| $Z_1$    |  90.2856289 |
| $Z_2$    |  22.8789293 |
| $H$      | 125.2341336 |

**Physics digression:** Are the measured masses wrong or is this okay?

<br>

In [None]:
Particle.from_name("e+").mass / GeV, particle_mass(particles[0])

In [None]:
Particle.from_name("e-").mass / GeV, particle_mass(particles[1])

In [None]:
Particle.from_name("mu+").mass / GeV, particle_mass(particles[2])

In [None]:
Particle.from_name("mu-").mass / GeV, particle_mass(particles[3])

In [None]:
Particle.from_name("Z0").mass / GeV, particle_mass(z1)

In [None]:
Particle.from_name("Z0").mass / GeV, particle_mass(z2)

In [None]:
Particle.from_name("H0").mass / GeV, particle_mass(higgs)

### `for` loops and `if` branches

<br><br><br>

Can you believe we got this far without `for` and `if`?

<br><br>

These are the fundamental building blocks of _imperative_ programming.

<br><br><br>

Python runs a program one statement at a time, and `for` tells it to repeat an indented block once for each item in a collection.

<br>

In [None]:
for particle in particles:
    print(particle["type"], particle["charge"])

`if` tells it whether it should enter an indented block or not, depending on whether an expression is `True` or `False`.

<br>

In [None]:
for particle in particles:
    if particle["type"] == "electron":
        print(particle)

<br>

It can switch between two indented blocks if an `else` clause is given.

<br>

In [None]:
for particle in particles:
    if particle["type"] == "electron":
        print(particle)
    else:
        print("not an electron")

`if` statements can be nested.

<br>

In [None]:
for particle in particles:
    if particle["type"] == "electron":
        if particle["charge"] > 0:
            print("e+")
        else:
            print("e-")
    else:
        if particle["charge"] > 0:
            print("mu+")
        else:
            print("mu-")

<br>

And `elif` works as a contraction of `else if` with less indenting.

<br>

In [None]:
for particle in particles:
    if particle["type"] == "electron" and particle["charge"] > 0:
        print("e+")
    elif particle["type"] == "electron" and particle["charge"] < 0:
        print("e-")
    elif particle["type"] == "muon" and particle["charge"] > 0:
        print("mu+")
    elif particle["type"] == "muon" and particle["charge"] < 0:
        print("mu-")

### From datum (singular) to data (plural)

<br><br><br><br><br>

_(Switch out of presentation view now.)_

<br><br><br><br><br>

In [None]:
import json

In [None]:
with open("data/SMHiggsToZZTo4L.json") as file:  # "with" closes the file afterward
    dataset = json.load(file)

In [None]:
type(dataset)

In [None]:
len(dataset)
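The `json` module turns JSON arrays into Python lists and JSON objects into dicts, so the whole dataset is nothing more than nested lists and dicts. A sketch with a tiny JSON string whose values are invented for illustration (not taken from the real file), using `json.loads`, which parses a string rather than an open file:

```python
import json

# A tiny, made-up JSON document shaped like one event
text = '[{"run": 1, "event": 42, "muon": [{"pt": 10.5, "charge": -1}], "electron": []}]'

events = json.loads(text)
print(type(events))                # <class 'list'>: a JSON array becomes a list
print(type(events[0]))             # <class 'dict'>: a JSON object becomes a dict
print(events[0]["muon"][0]["pt"])  # 10.5
```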

Show just the first 3 collision events using a slice, `0:3`.

In [None]:
dataset[0:3]

<br>

**Meaning of each field.** (We will only use a few of these.)

 * **run** (int): unique identifier for a data-taking period of the LHC. This is simulated data, so the run number is 1.
 * **luminosityBlock** (int): unique identifier for a period of relatively stable conditions within a run.
 * **event** (int): unique identifier for one crossing of LHC bunches.
 * **PV** (dict): primary vertex of the collision.
   - **x** (float): $x$-position in cm.
   - **y** (float): $y$-position in cm.
   - **z** (float): $z$-position (along the beamline) in cm.
 * **electron** (list of dict): list of electrons (may be empty).
   - **pt** (float): $p_T$ component of momentum transverse to the beamline in GeV/$c$.
   - **eta** (float): $\eta$ pseudorapidity (roughly, polar angle with respect to the beamline), unitless.
   - **phi** (float): $\phi$ azimuthal angle (in the plane that is perpendicular to the beamline), unitless.
   - **mass** (float): measured mass of the particle in GeV/$c^2$.
   - **charge** (int): either `+1` or `-1`, unitless.
   - **pfRelIso03_all** (float): quantity that specifies how isolated this electron is from the rest of the particles in the event, unitless.
   - **dxy** (float): distance of closest approach to the primary vertex in the plane that is perpendicular to the beamline, in cm.
   - **dxyErr** (float): uncertainty in the **dxy** measurement.
   - **dz** (float): distance of closest approach to the primary vertex in $z$, along the beamline, in cm.
   - **dzErr** (float): uncertainty in the **dz** measurement.
 * **muon** (list of dict): list of muons (may be empty) with the same dict fields as **electron**.
 * **MET** (dict): missing transverse energy (in the plane perpendicular to the beamline).
   - **pt** (float): $p_T$ magnitude, in GeV/$c$.
   - **phi** (float): $\phi$ azimuthal angle, unitless.

<br>

<br>

**Coordinate transformations:**

- $p_x = p_T \cos\phi$
- $p_y = p_T \sin\phi$
- $p_z = p_T \sinh\eta$
- $\displaystyle E = \sqrt{{p_x}^2 + {p_y}^2 + {p_z}^2 + m^2}$
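Written out by hand with the `math` module, the conversion uses $p_x = p_T\cos\phi$, $p_y = p_T\sin\phi$, $p_z = p_T\sinh\eta$, and $E = \sqrt{{p_x}^2 + {p_y}^2 + {p_z}^2 + m^2}$, so the total momentum comes out as $p_T\cosh\eta$. A sketch (`to_cartesian` is a hypothetical helper, and the input values are arbitrary):

```python
import math

def to_cartesian(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to (E, px, py, pz), in the same units as the inputs."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    E = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return E, px, py, pz

energy, mom_x, mom_y, mom_z = to_cartesian(pt=10.0, eta=0.5, phi=0.3, mass=0.105)

# Check: the total momentum should equal pt * cosh(eta)
p = math.sqrt(mom_x**2 + mom_y**2 + mom_z**2)
print(math.isclose(p, 10.0 * math.cosh(0.5)))  # True
```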

<br>

But there's a library for that.

In [None]:
import vector

In [None]:
def to_vector(particle):
    return vector.obj(
        pt=particle["pt"],
        eta=particle["eta"],
        phi=particle["phi"],
        mass=particle["mass"],
    )

In [None]:
for particle in dataset[0]["muon"]:
    v = to_vector(particle)
    print(v.E, v.px, v.py, v.pz)

### Mini-project: let's make an event display

There are lots, and lots, and lots of libraries for visualizing data in Python.

Matplotlib is the oldest and most popular.

In [None]:
import matplotlib.pyplot as plt  # conventional short name for Matplotlib
from mpl_toolkits.mplot3d import Axes3D  # registers the "3d" projection (needed on older Matplotlib versions)

In [None]:
%matplotlib widget

fig = plt.figure()

In [None]:
fig.clf()  # clear figure
ax = fig.add_subplot(111, projection="3d")

# 25 Gaussian-distributed (x, y, z) triplets
for x, y, z in np.random.normal(0, 1, (25, 3)):
    # make a black line from (0, 0, 0) to (x, y, z)
    ax.plot([0, x], [0, y], [0, z], c="black")

In [None]:
def draw_particle(ax, particle, color):
    v = to_vector(particle)
    ax.plot([0, v.px], [0, v.py], [0, v.pz], c=color)

In [None]:
def draw_event(ax, event):
    for particle in event["electron"]:
        draw_particle(ax, particle, "blue")
    for particle in event["muon"]:
        draw_particle(ax, particle, "green")

In [None]:
fig.clf()
ax = fig.add_subplot(111, projection="3d")

draw_event(ax, dataset[0])

In [None]:
fig.clf()
ax = fig.add_subplot(111, projection="3d")

for event in dataset[0:10]:
    draw_event(ax, event)

Add more to the event display, for context.

In [None]:
def beamline(ax):
    ax.plot([0, 0], [0, 0], [-100, 100], c="black", ls=":")

In [None]:
def cms_outline(ax):
    z = np.linspace(-100, 100, 50)
    theta = np.linspace(0, 2 * np.pi, 12)
    theta_grid, z_grid = np.meshgrid(theta, z)
    x_grid = 100 * np.cos(theta_grid)
    y_grid = 100 * np.sin(theta_grid)
    ax.plot_surface(x_grid, y_grid, z_grid, alpha=0.2, color="red")

In [None]:
fig.clf()
ax = fig.add_subplot(111, projection="3d")

beamline(ax)
cms_outline(ax)
draw_event(ax, dataset[6417])  # has lots of electrons and muons

ax.set_xlim(-100, 100)
ax.set_ylim(-100, 100)
ax.set_zlim(-100, 100)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")

In [None]:
def draw_position_and_momentum(ax, event, particle, color):
    # 1 unit is 1 cm
    x0 = event["PV"]["x"] - particle["dxy"] * np.cos(particle["phi"])
    y0 = event["PV"]["y"] - particle["dxy"] * np.sin(particle["phi"])
    z0 = event["PV"]["z"] - particle["dz"]

    # 1 unit is 1 GeV/c
    v = to_vector(particle)
    ax.plot([x0, x0 + v.px], [y0, y0 + v.py], [z0, z0 + v.pz], c=color)

In [None]:
fig.clf()
ax = fig.add_subplot(111, projection="3d")

beamline(ax)

event = dataset[6417]  # has lots of electrons and muons

for particle in event["electron"]:
    draw_position_and_momentum(ax, event, particle, "blue")
for particle in event["muon"]:
    draw_position_and_momentum(ax, event, particle, "green")

ax.set_xlim(-100, 100)
ax.set_ylim(-100, 100)
ax.set_zlim(-100, 100)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")

### Classes and object-oriented programming

So far, we have only been using a few Python types:

  * `int`
  * `float`
  * `str` (strings)
  * `list`
  * `dict`
  * whatever the libraries provide

In [None]:
type(muon)

In [None]:
type(fig)

In [None]:
type(ax)

In [None]:
type(v)

<br><br><br><br><br>

The electrons, muons, and events in our dataset really want to be classes.

<br><br>

A class is a new type with **attributes** (variables) and **methods** (functions) that can be accessed by the dot-syntax.

<br><br>

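The biggest difference between a **method** and a **function** is that a method always receives the object itself as its first argument, conventionally named `self`. Python fills it in automatically: `obj.method(x)` passes `obj` as `self` and `x` as the next argument.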

<br><br>

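Methods with double-underscore names like `__xyz__` give the object special abilities: for example, `__init__` is called when an object is created, and `__repr__` determines how it is displayed.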

<br><br><br><br><br>

In [None]:
class Electron:
    def __init__(self, E, px, py, pz):
        self.E = E
        self.px = px
        self.py = py
        self.pz = pz

    def __repr__(self):
        return f"<Electron E={self.E} px={self.px} py={self.py} pz={self.pz}>"

    def draw(self, ax):
        ax.plot([0, self.px], [0, self.py], [0, self.pz], c="blue")

In [None]:
event = dataset[96]  # a nice event with 3 electrons and 3 muons

electron_objects = []
for particle in event["electron"]:
    v = to_vector(particle)
    electron_objects.append(Electron(v.E, v.px, v.py, v.pz))

electron_objects

In [None]:
class Muon:
    def __init__(self, E, px, py, pz):
        self.E = E
        self.px = px
        self.py = py
        self.pz = pz

    def __repr__(self):
        return f"<Muon E={self.E} px={self.px} py={self.py} pz={self.pz}>"

    def draw(self, ax):
        ax.plot([0, self.px], [0, self.py], [0, self.pz], c="green")

In [None]:
event = dataset[96]  # a nice event with 3 electrons and 3 muons

muon_objects = []
for particle in event["muon"]:
    v = to_vector(particle)
    muon_objects.append(Muon(v.E, v.px, v.py, v.pz))

muon_objects

In [None]:
fig.clf()
ax = fig.add_subplot(111, projection="3d")

beamline(ax)
cms_outline(ax)

for electron in electron_objects:
    electron.draw(ax)

for muon in muon_objects:
    muon.draw(ax)

<br><br><br><br><br>

The above works, but there's some redundancy: the implementation of `Muon` is almost the same as that of `Electron`.

<br><br>

Classes also let us share code among similar types by defining some as _subclasses_ of others.

<br><br><br><br><br>

In [None]:
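# Note: this class replaces the Particle imported from the particle library earlier;
# re-run "from particle import Particle" if you need Particle.from_name again.
class Particle: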
    def __init__(self, E, px, py, pz):
        self.E = E
        self.px = px
        self.py = py
        self.pz = pz

    def __repr__(self):
        return (
            f"<{type(self).__name__} E={self.E} px={self.px} py={self.py} pz={self.pz}>"
        )

    def draw(self, ax):
        raise NotImplementedError(f"{type(self).__name__} has not been implemented yet")

In [None]:
class Electron(Particle):
    def draw(self, ax):
        ax.plot([0, self.px], [0, self.py], [0, self.pz], c="blue")

In [None]:
class Muon(Particle):
    def draw(self, ax):
        ax.plot([0, self.px], [0, self.py], [0, self.pz], c="green")
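Subclasses inherit everything they don't override, and `isinstance` and `mro` work on our own classes just as they did on NumPy's. A standalone sketch without any plotting; `Lepton`, `Tau`, and the `label` method are hypothetical names chosen so that nothing defined above is overwritten:

```python
class Lepton:
    def __init__(self, E, px, py, pz):
        self.E = E
        self.px = px
        self.py = py
        self.pz = pz

    def __repr__(self):
        # type(self) is the actual subclass, so each subclass prints its own name
        return f"<{type(self).__name__} E={self.E}>"

    def label(self):
        raise NotImplementedError(f"{type(self).__name__} has not been implemented yet")


class Tau(Lepton):
    # no __init__ or __repr__ here: both are inherited from Lepton
    def label(self):
        return "tau"


tau = Tau(10.0, 1.0, 2.0, 3.0)
print(tau)                      # <Tau E=10.0>
print(tau.label())              # tau
print(isinstance(tau, Lepton))  # True: every Tau is also a Lepton
print(Tau.mro())                # Tau, then Lepton, then object
```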

In [None]:
event = dataset[96]  # a nice event with 3 electrons and 3 muons

electron_objects = []
for particle in event["electron"]:
    v = to_vector(particle)
    electron_objects.append(Electron(v.E, v.px, v.py, v.pz))

muon_objects = []
for particle in event["muon"]:
    v = to_vector(particle)
    muon_objects.append(Muon(v.E, v.px, v.py, v.pz))

fig.clf()
ax = fig.add_subplot(111, projection="3d")

beamline(ax)
cms_outline(ax)

for electron in electron_objects:
    electron.draw(ax)

for muon in muon_objects:
    muon.draw(ax)
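Special methods go beyond `__init__` and `__repr__`. For example, defining `__add__` lets `+` combine two objects, which is the same energy-and-momentum bookkeeping that `particle_decay` did with dicts. A minimal sketch, where `FourVector` is a hypothetical class (not from any library); its `mass` method clamps tiny negative values of $E^2 - p^2$ from rounding to zero, which is a design choice rather than a requirement.

```python
import math

class FourVector:
    """Hypothetical minimal four-vector: just enough to show special methods."""

    def __init__(self, E, px, py, pz):
        self.E, self.px, self.py, self.pz = E, px, py, pz

    def __repr__(self):
        return f"FourVector(E={self.E}, px={self.px}, py={self.py}, pz={self.pz})"

    def __add__(self, other):
        # called for "self + other": add energy and momentum component by component
        return FourVector(self.E + other.E, self.px + other.px,
                          self.py + other.py, self.pz + other.pz)

    def mass(self):
        # m = sqrt(E^2 - p^2), clamped at zero to absorb rounding errors
        return math.sqrt(max(self.E**2 - self.px**2 - self.py**2 - self.pz**2, 0.0))


a = FourVector(5.0, 3.0, 0.0, 0.0)
b = FourVector(5.0, -3.0, 0.0, 0.0)
total = a + b  # Python calls a.__add__(b)

print(total)         # FourVector(E=10.0, px=0.0, py=0.0, pz=0.0)
print(a.mass())      # 4.0
print(total.mass())  # 10.0
```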

<br><br><br>

This is the end of the "Tour of Python syntax" section.

<br>

A few more Python features will be introduced as we go along, but they won't be the main focus anymore.

<br>

Go to [exercise-1.ipynb](exercise-1.ipynb) now.