<br><br><br><br><br>

# Accelerating Python

<br><br><br><br><br>

<br>

<p style="font-size: 1.25em">Apart from clever algorithms, the speed of a program is determined by how laden it is with work unrelated to its main task.</p>

<center><img src="img/swallows-coconut.jpg" width="50%"></center>

<p style="font-size: 1.25em">Python is slower than C because of dynamic type checking, garbage collection, everything-is-a-hashtable, pointer chasing, string equality checks...</p>

<br>

<p style="font-size: 1.25em">Compare Python to Java, which has garbage collection but not dynamic type checking, and to C, which has neither, on a variety of benchmark algorithms.</p>

<img src="img/benchmark-games.png" width="100%">

<br><br>

<p style="font-size: 1.25em">If you care about speed <i>more than</i> ease of development, use C or C++ (or Rust!).</p>

<br>

<p style="font-size: 1.25em">If you want to use Python for all of its conveniences—dynamic type checking, garbage collection, everything-is-a-hashtable, interactive development and debugging—and need to speed up a critical section, there are ways to do it.</p>

<br>

<p style="font-size: 1.25em">But you have to give up Python's dynamism <i>in that section.</i></p>

<br><br>

In [1]:
# Some wacky things you can do to types and objects in Python:
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
p = Point(1, 2)

# dynamically add an attribute to an instance (attributes are really a hashtable)
p.z = 3

# dynamically add a method to a class (class is a hashtable of functions)
Point.mag2 = lambda self: self.x**2 + self.y**2 + self.z**2
p.mag2()

# dynamically add a method to an instance (differs only in assigning "self")
p.mag = (lambda self: self.mag2()**0.5).__get__(p)
p.mag()

class Pointy(Point):
    def __repr__(self):
        return "Pointy({0}, {1})".format(self.x, self.y)

# dynamically change the class of an object (type is just an attribute)
p.__class__ = Pointy
p

Pointy(1, 2)

In [2]:
# That's because in Python, this:

print()
print(Point(1, 2))

# is not fundamentally different from this:

print()
print({"x": 1, "y": 2})

# as you can see by this:

print()
print(Point(1, 2).__dict__)


<__main__.Point object at 0x7859f8469710>

{'x': 1, 'y': 2}

{'x': 1, 'y': 2}
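
<p style="font-size: 1.25em">A small pure-Python illustration of that per-instance hashtable, and of what it looks like to give it up: declaring <tt>__slots__</tt> removes the instance <tt>__dict__</tt>, so attributes can no longer be added on the fly. The class name <tt>SlotPoint</tt> is made up for this sketch.</p>

```python
class SlotPoint:
    # fixed attribute layout: instances get no __dict__
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y

q = SlotPoint(1, 2)
has_dict = hasattr(q, "__dict__")

try:
    q.z = 3                      # dynamic attribute, as was done with Point
    added_z = True
except AttributeError:
    added_z = False

print(has_dict, added_z)
```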


<br><br>

<p style="font-size: 1.25em">Viewed as a statically compiled C program, the Python interpreter has only one data type, <tt>PyObject*</tt>, which carries a pointer to its runtime type, itself yet another <tt>PyObject*</tt>.</p>

<center><img src="img/pyobject.png" width="50%"></center>

<p style="font-size: 1.25em">That gives us the flexibility to do all the wacky things on the previous slide, but at a runtime cost of checking those types <i>every time they are used.</i></p>
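
<p style="font-size: 1.25em">One way to see this from inside Python, as a rough sketch: the bytecode for <tt>x + 1</tt> contains a single generic instruction, and the choice of which addition to run is made from the type of <tt>x</tt> on every call. The function name <tt>add_one</tt> is only for illustration.</p>

```python
import dis

def add_one(x):
    return x + 1

# The bytecode has one generic instruction for "+" (BINARY_ADD or BINARY_OP,
# depending on the Python version); it carries no type information.
opnames = [instruction.opname for instruction in dis.get_instructions(add_one)]

# The same function body serves ints, floats, and bools...
results = [add_one(1), add_one(1.5), add_one(True)]

# ...and the type mismatch for a string is only discovered when it runs.
try:
    add_one("hello")
    failed = False
except TypeError:
    failed = True

print(results, failed)
```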

<br><br>

In [3]:
# x is an integer
x = 0

for i in range(1000000):
    # in the millionth step, replace x with a string
    if i == 999999:
        x = "hello"
    # do an operation on x that only works for integers
    x += 1

TypeError: can only concatenate str (not "int") to str

In [4]:
import numba

@numba.jit(nopython=True)
def f():
    x = 0
    for i in range(1000000):
        # in the millionth step, replace x with a string
        if i == 999999:
            x = "hello"
        # do an operation on x that only works for integers
        x += 1
    return x

f()

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Cannot unify int64 and Literal[str](hello) for 'x', defined at <ipython-input-4-290cb8971149> (5)

File "<ipython-input-4-290cb8971149>", line 5:
def f():
    x = 0
    ^

[1] During: typing of assignment at <ipython-input-4-290cb8971149> (9)

File "<ipython-input-4-290cb8971149>", line 9:
def f():
    <source elided>
        if i == 999999:
            x = "hello"
            ^

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile

If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new


In [6]:
import ROOT

ROOT.gInterpreter.Declare("""
int f() {
    int x = 0;
    for (int i = 0; i < 1000000; i++) {
        // in the millionth step, replace x with a string
        if (i == 999999) {
            x = "hello";
        }
        // do an operation on x that only works for integers
        x += 1;
    }
    return x;
}""")

ROOT.f()

AttributeError: f

input_line_33:7:17: error: assigning to 'int' from incompatible type 'const char [6]'
            x = "hello";
                ^~~~~~~

<br><br><br>

<p style="font-size: 1.25em">The unsurprising, conventional answer to <b>"how do I make it fast?"</b></p>

<ol>
    <li style="font-size: 1.25em; margin-bottom: 0.75em">Determine all data types statically: i.e. one variable ↔ one type and all values in a list/array have the same type.
    <li style="font-size: 1.25em">Take advantage of the static types by converting the program into machine code: i.e. verify those types and generate code without runtime type-checks.
</ol>
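
<p style="font-size: 1.25em">The "all values have the same type" rule can be tried out without any compiler. This sketch uses the standard library's <tt>array</tt> module as a stand-in for a Numpy array; that substitution is an assumption of the example, not something the rest of this notebook relies on.</p>

```python
from array import array

mixed = [1, 2.5, "three"]          # a list holds any mix of types
typed = array("d", [1.0, 2.5])     # "d": every item is stored as a C double

typed.append(3)                    # an int is converted to a double on the way in
try:
    typed.append("three")          # a str cannot be stored as a double
    accepted = True
except TypeError:
    accepted = False

print(typed.itemsize, list(typed), accepted)
```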

<br>

<p style="font-size: 1.25em; text-align: center"><b>We usually call this "compiling."</b></p>

<br><br><br>

<br><br><br>

<p style="font-size: 1.25em"><b>But:</b> "compiling" does not necessarily mean "rewrite in C or C++."</p>

<ul>
    <li style="font-size: 1.25em; margin-bottom: 0.75em">Any language can be compiled.
    <li style="font-size: 1.25em; margin-bottom: 0.75em">Compilation does not need to be a separate phase from running the program.
    <li style="font-size: 1.25em; margin-bottom: 0.75em">The compiled section can be as little or as much as you want.
    <li style="font-size: 1.25em; margin-bottom: 0.75em">If you're using Python to organize your analysis, focus only on the part that needs to be fast. <i>Which part scales with the number of events?</i>
</ul>
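
<p style="font-size: 1.25em">To find the part that scales with the number of events, a profiler's call counts are often enough. Below is a minimal sketch with the standard library's <tt>cProfile</tt>; the functions <tt>load_config</tt>, <tt>process_event</tt>, and <tt>analysis</tt> are hypothetical stand-ins for a real analysis.</p>

```python
import cProfile
import pstats

def load_config():
    # runs once, no matter how many events there are
    return {"scale": 2.0}

def process_event(event, config):
    # runs once per event: the part that scales
    return config["scale"] * event

def analysis(num_events):
    config = load_config()
    return sum(process_event(event, config) for event in range(num_events))

def call_counts(num_events):
    profiler = cProfile.Profile()
    profiler.enable()
    analysis(num_events)
    profiler.disable()
    # stats maps (filename, line number, function name) to a tuple whose
    # second item is the total number of calls
    stats = pstats.Stats(profiler).stats
    return {key[2]: value[1] for key, value in stats.items()}

counts = call_counts(1000)
print(counts["load_config"], counts["process_event"])
```

<p style="font-size: 1.25em">Here only <tt>process_event</tt> would be a candidate for compiling; the setup code can stay as dynamic as it likes.</p>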

<br><br><br>

<br><br><br>

<center><img src="img/numba-logo.png" width="25%"></center>

<br>

<p style="font-size: 1.25em; text-align: center; margin-left: 0px; margin-right: 0px; padding-left: 0px; padding-right: 0px">Numba is a just-in-time compiler of Python code.</p>

<br><br><br>

In [7]:
import numpy

@numba.jit(nopython=True)
def mag(xarray, yarray):
    out = numpy.empty(min(len(xarray), len(yarray)))
    
    for i in range(len(out)):
        out[i] = numpy.sqrt(xarray[i]**2 + yarray[i]**2)
    
    return out

mag

CPUDispatcher(<function mag at 0x7859c0072ae8>)

In [8]:
a = numpy.arange(1000000, dtype=numpy.int32)
b = numpy.arange(1000000, dtype=numpy.int64)

gap = ",\n                        "
print(f"mag.overloads.keys() = [{gap.join(str(x) for x in mag.overloads.keys())}]")
print(f"\nmag(a, a).sum()      = {mag(a, a).sum()}")
print(f"mag.overloads.keys() = [{gap.join(str(x) for x in mag.overloads.keys())}]")
print(f"\nmag(a, b).sum()      = {mag(a, b).sum()}")
print(f"mag.overloads.keys() = [{gap.join(str(x) for x in mag.overloads.keys())}]")
print(f"\nmag(b, b).sum()      = {mag(b, b).sum()}")
print(f"mag.overloads.keys() = [{gap.join(str(x) for x in mag.overloads.keys())}]")

mag.overloads.keys() = []

mag(a, a).sum()      = 707106074079.7664
mag.overloads.keys() = [(array(int32, 1d, C), array(int32, 1d, C))]

mag(a, b).sum()      = 707106074079.7664
mag.overloads.keys() = [(array(int32, 1d, C), array(int32, 1d, C)),
                        (array(int32, 1d, C), array(int64, 1d, C))]

mag(b, b).sum()      = 707106074079.7664
mag.overloads.keys() = [(array(int32, 1d, C), array(int32, 1d, C)),
                        (array(int32, 1d, C), array(int64, 1d, C)),
                        (array(int64, 1d, C), array(int64, 1d, C))]


<br><br><br>

<p style="font-size: 1.25em">Numba compiles a <a href="https://numba.pydata.org/numba-doc/dev/reference/pysupported.html" target="_blank">subset of the Python language</a> and a <a href="https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html" target="_blank">subset of Numpy functions</a> to machine code (through LLVM).</p>

<br>

<p style="font-size: 1.25em">It also sets up conversions between Python/Numpy and the compiled code.</p>

<br>

<p style="font-size: 1.25em">A good way to use it: develop in Python because it's easy, eliminate dynamic features from the part that needs to be fast, and try <tt>@numba.jit</tt> until successful.</p>
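
<p style="font-size: 1.25em">A sketch of that workflow, under some assumptions: the function names are made up, and the <tt>try</tt>/<tt>except</tt> fallback exists only so the example still runs where Numba is not installed. The dynamic version is kept as a reference to check the static one against.</p>

```python
import numpy

try:
    import numba
    jit = numba.njit              # equivalent to numba.jit(nopython=True)
except ImportError:
    def jit(function):            # fallback: leave the function uncompiled
        return function

def total_mag2_dynamic(points):
    # convenient but dynamic: a list of dicts, one hashtable lookup per field
    return sum(p["x"]**2 + p["y"]**2 for p in points)

@jit
def total_mag2_static(xs, ys):
    # same calculation with one type per variable and typed arrays
    total = 0.0
    for i in range(len(xs)):
        total += xs[i]**2 + ys[i]**2
    return total

points = [{"x": float(i), "y": 2.0 * i} for i in range(1000)]
xs = numpy.array([p["x"] for p in points])
ys = numpy.array([p["y"] for p in points])

# the two versions should agree before the dynamic one is retired
assert numpy.isclose(total_mag2_dynamic(points), total_mag2_static(xs, ys))
```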

<br><br><br>

In [9]:
# Numba has a particular affinity for Numpy (and an unfortunate name).
#
# Most of the type declarations (int this, float that) come from Numpy dtypes.

# numba.vectorize lets you define a Numpy ufunc from a scalars → scalar function:
@numba.vectorize
def mag(x, y):
    return numpy.sqrt(x**2 + y**2)

mag(a, b)

array([0.00000000e+00, 1.41421356e+00, 2.82842712e+00, ...,
       1.41420932e+06, 1.41421073e+06, 1.41421215e+06])

In [10]:
%%cpp -d

//# Of course, you can also write the part that needs to be fast in C++.

void mag(int n, double* xarray, double* yarray, double* out) {
    for (int i = 0; i < n; i++) {
        out[i] = sqrt(xarray[i]*xarray[i] + yarray[i]*yarray[i]);
    }
}

In [11]:
xarray = numpy.arange(1000000, dtype=numpy.float64)
yarray = numpy.arange(1000000, dtype=numpy.float64)
out = numpy.empty(1000000, dtype=numpy.float64)

ROOT.mag(len(out), xarray, yarray, out)

out

array([0.00000000e+00, 1.41421356e+00, 2.82842712e+00, ...,
       1.41420932e+06, 1.41421073e+06, 1.41421215e+06])

<br><br><br>

<p style="font-size: 1.25em">After you <tt>import ROOT</tt>,</p>

<ul>
    <li style="font-size: 1.25em"><tt>%%cpp</tt> at the top of a Jupyter cell evaluates the cell as C++ in ROOT,
    <li style="font-size: 1.25em"><tt>%%cpp -d</tt> at the top of a Jupyter cell defines a C++ function,
    <li style="font-size: 1.25em"><tt>ROOT.gInterpreter.ProcessLine</tt> in Python evaluates a line of C++ in ROOT,
    <li style="font-size: 1.25em"><tt>ROOT.gInterpreter.Declare</tt> in Python defines a C++ function,
</ul>

<p style="font-size: 1.25em">and PyROOT lets you call C++ functions from Python.</p>

<br><br><br>

<br>

<img src="img/03-cheat-sheet.png" width="100%">

<br>

In [12]:
# You can't redefine C++ functions, which makes interactive work difficult.
# 
# Here's a way to make a redefinable function:

pyname = "cpp_func"
cppname = pyname + "_%d" % sum(1 if x.startswith(pyname) else 0 for x in dir(ROOT))
ROOT.gInterpreter.Declare(r"""

    double """ + cppname + r"""(double x) {
        return x*x;
    }

""")
globals()[pyname] = getattr(ROOT, cppname)

cpp_func(5)

25.0

In [19]:
# Also, you need to be very careful about data types because ROOT does not
# read them off of Numpy arrays.

if not hasattr(ROOT, "assumes_double"):
    ROOT.gInterpreter.Declare("""
void assumes_double(int n, double* xarray, double* out) {
    for (int i = 0; i < n; i++) {
        out[i] = xarray[i]*xarray[i];
    }
}
""")

xarray = numpy.arange(10, dtype=numpy.float64)     # change to int64
out = numpy.empty(10, dtype=numpy.float64)         # change to int64

ROOT.assumes_double(10, xarray, out)
out

array([ 0.,  1.,  4.,  9., 16., 25., 36., 49., 64., 81.])
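
<p style="font-size: 1.25em">One defensive habit, sketched here with plain Numpy and no ROOT: normalize the dtype and memory layout on the Python side before handing a pointer to C++. The helper name <tt>as_double_buffer</tt> is hypothetical.</p>

```python
import numpy

def as_double_buffer(values):
    """Return a C-contiguous float64 array, copying only if a conversion is needed."""
    return numpy.ascontiguousarray(values, dtype=numpy.float64)

ints = numpy.arange(10, dtype=numpy.int64)
doubles = as_double_buffer(ints)       # converted copy: safe to read as double*
same = as_double_buffer(doubles)       # already float64 and contiguous: no copy

print(doubles.dtype, numpy.shares_memory(same, doubles))
```

<p style="font-size: 1.25em">This only helps for input arrays. An output array should be allocated as <tt>float64</tt> from the start, because anything written into a converted copy never reaches the original.</p>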

In [20]:
# If you're sending arrays of data to and from ROOT, you should probably let
# ROOT control the type, size, and memory of those arrays as C++ std::vector.

# a_vector = std::vector<double>(1000000);
a_vector = ROOT.std.vector("double")(1000000)

# a_array only WRAPS it as a Numpy array.
a_array = numpy.asarray(a_vector)

# Assigning to this Numpy array changes the memory owned by the std::vector.
a_array[:] = numpy.arange(1000000, dtype=numpy.float64)

# See?
list(a_vector[:10])

[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

In [32]:
if not hasattr(ROOT, "safer"):
    ROOT.gInterpreter.Declare("""
// Take both vectors by reference: passing "out" by value would fill a copy.
void safer(const std::vector<double>& xarray, std::vector<double>& out) {
    for (size_t i = 0; i < xarray.size() && i < out.size(); i++) {
        out[i] = xarray[i]*xarray[i];
    }
}
""")

b_vector = ROOT.std.vector("double")(1000000)    # change to "int"
b_array = numpy.asarray(b_vector)                # wraps b_vector, the output

ROOT.safer(a_vector, b_vector)

b_array

array([0.00000e+00, 1.00000e+00, 4.00000e+00, ..., 9.99994e+11,
       9.99996e+11, 9.99998e+11])

<br><br>

<p style="font-size: 1.25em"><b>Reminder:</b> a Numpy array is a Python object that <i>points to</i> memory, maybe someone else's memory.</p>

<br>

<center><img src="img/numpy-memory-layout.png" width="75%"></center>
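
<p style="font-size: 1.25em">The same idea can be checked without ROOT. In this sketch a Python <tt>bytearray</tt> plays the role of "someone else's memory" and <tt>numpy.frombuffer</tt> wraps it without copying; the variable names are only for illustration.</p>

```python
import struct
import numpy

owner = bytearray(4 * 8)                              # 32 bytes owned by a Python bytearray
view = numpy.frombuffer(owner, dtype=numpy.float64)   # 4 doubles viewed in place, no copy

view[:] = [1.0, 2.0, 3.0, 4.0]                        # writes land in the bytearray

owns_memory = view.flags.owndata                      # False: the bytearray owns the memory
stored = struct.unpack("4d", owner)                   # read the doubles back without Numpy
independent = view.copy()                             # a copy owns its own memory

print(owns_memory, stored, independent.flags.owndata)
```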

<br><br>