<h1 style="color: teal">Lecture 2 - Performance</h1>

<strong style="color: #1B2A49">E. Margarita Palacios Vargas<br>
Fundación Universitaria Konrad Lorenz</strong>

---

<h2 style="color: teal">Performance</h2>

We already discussed that <strong>performance measurements are stochastic</strong> — i.e., repeated runs of the same program can produce slightly different execution times.

Recapping, we can measure:

- **Counts** — how often an event occurs
- **Duration** — the time taken for some interval or operation
- **Size** — the amount of data or memory used by a variable
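
As a minimal sketch of these three kinds of measurement, the snippet below uses only the standard library. The helper `collatz_steps` and the variable names are hypothetical and chosen purely for illustration; they are not taken from the course's `collatz` script.

```python
import sys
import time
from collections import Counter

def collatz_steps(n):
    """Return the number of Collatz steps needed to reach 1 from n."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# Duration: elapsed wall-clock time for an interval
start = time.perf_counter()
results = [collatz_steps(n) for n in range(1, 1001)]
elapsed = time.perf_counter() - start

# Counts: how often each step count occurs
step_counts = Counter(results)

# Size: memory used by the list object itself (not the integers it refers to)
size_bytes = sys.getsizeof(results)

print(f"duration: {elapsed:.6f} s")
print(f"most common step count: {step_counts.most_common(1)}")
print(f"list size: {size_bytes} bytes")
```

Running it several times should give slightly different durations, while the counts and the size stay the same.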

Next, we will look at how to measure performance in <strong>Python</strong>.

---

<h2 style="color: teal">2.1. Measuring CPU time</h2>

<h4>1. <code>timeit</code></h4>

The <code>timeit</code> module can be used to measure the execution time of small pieces of code.

If you open a terminal in the directory that contains your Python script (e.g., the "examples" folder where this document is), you can time the execution of a function defined in it:

```bash
python -m timeit -s "import collatz" "collatz.main()"
# Example output:
# 1 loop, best of 5: 174 msec per loop
```
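Here `-m` is an option of the `python` command itself (it runs a library module as a script), while `-s` is a `timeit` option. Some of the `timeit` options are:

- **`-s "SETUP"`** → Setup code, runs once before timing (e.g., imports, variable definitions).  
- **`-n N`** → Number of loops per trial (how many times to run the snippet each trial); chosen automatically if omitted.  
- **`-r R`** → Number of repeats (how many trials to do, default 5).  
- **`-p`** → Measure process (CPU) time with `time.process_time()` instead of the default wall-clock timer `time.perf_counter()`.  
- **`-u UNIT`** → Time unit for the output (`nsec`, `usec`, `msec`, or `sec`).  
- **`-v`** → Verbose output: print the raw timing results.  
- **`-h`** → Show help message with all options.

Older Python versions also offered `-t` (`time.time()`) and `-c` (`time.clock()`); current Python 3 releases no longer list them, and `time.perf_counter()` is the default timer on every platform.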
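
The same module can be used from Python code, and the choice of timer changes what is being measured. The following is a small sketch, assuming a hypothetical function `wait_a_little` that only sleeps: the default timer reports wall-clock time, whereas `time.process_time` reports the CPU time of the process.

```python
import time
import timeit

def wait_a_little():
    """Sleep briefly: this takes wall-clock time but almost no CPU time."""
    time.sleep(0.02)

# Default timer (time.perf_counter): wall-clock time, includes the sleep
wall = timeit.timeit(wait_a_little, number=5)

# time.process_time: CPU time of this process, excludes time spent sleeping
cpu = timeit.timeit(wait_a_little, number=5, timer=time.process_time)

print(f"wall-clock: {wall:.4f} s for 5 calls")
print(f"CPU time:   {cpu:.4f} s for 5 calls")
```

The wall-clock figure should be close to 0.1 s, while the CPU time should be much smaller, because a sleeping process does not use the processor.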

In a <strong>Jupyter Notebook</strong>, you can use the <strong>magic command</strong> <code>%timeit</code>:

```python
%timeit my_function()
```

In [None]:
import numpy as np

%timeit [x**4 for x in range(10000)]
%timeit np.arange(10000)**4 # As expected, the vectorized NumPy operation runs faster than the pure-Python list comprehension

**Notes:**
- *Magic commands* are called like that because they’re special shortcuts provided by IPython/Jupyter (using `%` or `%%`) that aren’t part of standard Python but add extra functionality like timing, running shell commands, or controlling the notebook.  
- *IPython* (Interactive Python) is an enhanced Python shell that powers Jupyter, offering features like tab completion, inline help, rich output, and magic commands on top of the standard interpreter.
- **`%timeit`** → times a single line, **`%%timeit`** → times the whole cell

As you can see, one way to deal with the stochastic nature of CPU time is to use statistics: the output of this magic command shows the mean and standard deviation of the execution time after running the code many times.
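
Outside of a notebook, a similar summary can be computed by hand. This is only a sketch of the idea, assuming 7 trials of 100 loops each; it is not how `%timeit` is implemented internally.

```python
import statistics
import timeit

LOOPS = 100  # loops per trial (assumed value for this sketch)

# Time the same statement in 7 independent trials
trials = timeit.repeat(
    stmt="[x**4 for x in range(1000)]",
    repeat=7,
    number=LOOPS,
)

# Convert each trial total into a per-loop time
per_loop = [t / LOOPS for t in trials]

mean = statistics.mean(per_loop)
std = statistics.stdev(per_loop)
print(f"{mean * 1e6:.1f} usec +- {std * 1e6:.1f} usec per loop "
      f"(mean +- std. dev. of {len(per_loop)} runs, {LOOPS} loops each)")
print(f"best: {min(per_loop) * 1e6:.1f} usec")
```

The minimum is often reported alongside the mean, because noise from other processes can only make a run slower, never faster.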

We can also measure the CPU time of our own functions. Note that the next cell only times the `def` statement itself, not a call to the function:

In [None]:
%%timeit # Now it applies to the entire cell
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

In [None]:
# The %%timeit cell above timed only the function definition and did not leave sum2d defined, so we define it here
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

a = np.ones((2048, 2048)) 
a.size == 2048 ** 2 # Elements

In [None]:
%timeit sum2d(a)

---

<h4>2. <code>njit from numba</code></h4>

The <code>njit</code> decorator from the <strong>Numba</strong> library performs <strong>Just-In-Time (JIT)</strong> compilation of Python functions to optimized machine code, often achieving speeds comparable to compiled languages like C or Fortran.

**Note:** `@njit` is equivalent to `@jit(nopython = True)` and is now the recommended usage. The `nopython = True` flag forces Numba to compile to pure machine code (fastest) instead of falling back to Python. The older `@jit` form is still supported, but its *object mode fallback* behavior has been deprecated. See the [Numba documentation](https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit).

In [None]:
from numba import njit

a = np.ones((2048, 2048))

In [None]:
@njit
def sum2dv3(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

In [None]:
%timeit sum2d(a)
%timeit sum2dv3(a) # njit's compiled code runs faster (the very first call also includes the compilation)

---

<h4>3. <code>numexpr</code></h4>

The `numexpr` library is designed to efficiently evaluate numerical expressions on arrays. It accelerates numerical computations, especially those involving large arrays, by optimizing memory usage, using multiple CPU cores, and avoiding the creation of large temporary arrays. It can often outperform NumPy. You can check its documentation [here](https://numexpr.readthedocs.io/en/latest/user_guide.html).  

In [None]:
import numexpr as ne

In [None]:
a = np.random.rand(100000)
b = np.random.rand(100000)

%timeit np.sin(a) + np.log(b)
%timeit ne.evaluate("sin(a) + log(b)")

In [None]:
%timeit 2*a + 3*b
%timeit ne.evaluate("2*a + 3*b")

---

<h2 style="color: teal">2.2. Measuring size</h2>

Inspecting attributes like the following reveals how arrays are stored in memory, which is useful for understanding memory usage and optimizing performance. Memory usage is not considered *as* important as it was several decades ago, because available memory has grown enormously, in line with Moore's law.

In [None]:
x = np.array([1.3, 2.4, 3.3])

In [None]:
x.data # Buffer object (memoryview) pointing to the start of the array's data

In [None]:
# 'data' = A 2-tuple whose first argument is a Python integer
# that points to the data-area storing the array contents.
x.__array_interface__

In [None]:
# Size (number of elements of the array)
x.size

In [None]:
# Memory size of one array element (in bytes)
x.itemsize

In [None]:
# Memory size of the full array (in bytes); equivalent to x.nbytes
x.itemsize * x.size
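
For comparison outside NumPy, the standard library offers similar information. The sketch below is an illustration with the built-in `array` module and `sys.getsizeof`, not part of the NumPy API; the variable names are hypothetical.

```python
import sys
from array import array

values = [1.3, 2.4, 3.3]
packed = array("d", values)  # contiguous C doubles, similar in spirit to a float64 array

address, length = packed.buffer_info()      # (memory address, number of elements)
data_bytes = packed.itemsize * len(packed)  # bytes used by the raw data only

print(f"address of the data buffer: {address:#x}")
print(f"elements: {length}, bytes per element: {packed.itemsize}")
print(f"raw data: {data_bytes} bytes")

# sys.getsizeof also counts the object's own bookkeeping overhead
print(f"array object: {sys.getsizeof(packed)} bytes")
print(f"list object:  {sys.getsizeof(values)} bytes (pointers only, not the floats)")
```

The raw data size follows the same rule as above (item size times number of elements), while `sys.getsizeof` reports a larger value because it includes the container's header.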

<h2 style="color: teal">2.3. Profiling</h2>

Profiling in Python means analyzing the performance of your code to identify bottlenecks and areas that can be optimized. Python provides several built-in tools for profiling. Here, we will start with the ones that are considered native (<i>i.e., they do not require additional software</i>) and then look at a few that need to be installed.

---

<h4>1. <code>cProfile</code></h4>

<strong>Syntax (on bash):</strong>
```bash
python -m cProfile collatz.py
python -m cProfile -o stats.out collatz.py # To output to a file
```

Run from the terminal, this prints many entries that are not relevant to us. We can also use it from Jupyter, where the output for a short snippet is much more compact:

In [None]:
import cProfile, pstats

# Run and show results
cProfile.run("print('Hello profiling!')")

What are we seeing?
- `ncalls`: This column shows the number of times each function was called during the execution of the program.
- `tottime`: This column indicates the total time (in seconds) spent in each function excluding time spent in its subfunctions. It's the "internal" time spent exclusively in the function itself.
- `percall`: This column shows the average time (in seconds) spent in each function call, calculated as tottime / ncalls.
- `cumtime`: This column represents the cumulative time (in seconds) spent in the function and all its subfunctions. It includes the time spent in the function itself and all the functions called from it.
- `percall` (cumtime): Average cumulative time (in seconds) per primitive call, calculated as cumtime / primitive calls. (If there’s no recursion, primitive calls ≈ ncalls, so the numbers will look the same.)
- `filename:lineno(function)`: This column provides information about the location of the function in your code, including the filename, line number, and function name.

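By default, the output is ordered by *standard name* (the text in the last column). Sorting by the cumtime column instead (e.g., with `-s cumtime` on the command line or `sort_stats("cumulative")` in `pstats`) helps you quickly identify the functions that consume the most overall time. These are potential candidates for optimization. You will want to look at functions with **high cumtime and ncalls values**.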
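
To choose the sort order explicitly and limit the report to the most expensive entries, the `pstats` module can be used. The following is a minimal sketch; the functions `square` and `slow_square_sum` are hypothetical examples written only to give the profiler something to measure.

```python
import cProfile
import io
import pstats

def square(x):
    return x * x

def slow_square_sum(n):
    """Sum of squares with one helper call per element (deliberately slow)."""
    return sum(square(i) for i in range(n))

# Profile only the call we are interested in
profiler = cProfile.Profile()
profiler.enable()
total = slow_square_sum(50_000)
profiler.disable()

# Sort by cumulative time and print only the 5 most expensive entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

In the report, `slow_square_sum` should appear near the top with a high cumtime, and `square` should show a high ncalls value (one call per element).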

In [None]:
# Save results to a file
cProfile.run("print('Hello profiling!')", "profiler")

In [None]:
# Load and inspect saved results
stats = pstats.Stats("profiler")
stats.print_stats()

`cProfile` can also be invoked as a module to profile a given script (or module). The syntax is as follows.
```bash
python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py)
python -m cProfile examples/factorial.py
```

In [None]:
import os # Module to interact with the operating system
if os.name == "nt":
    os.system("type examples\\Example0.py") # Windows
else:
    os.system("cat examples/Example0.py") # Mac/Linux

---

<h4>2. <code>profile</code></h4>

The `profile` module is another built-in profiler. It is written in pure Python and offers the same interface as `cProfile`, but it adds more overhead to the profiled program. It outputs the same information about function calls and their time consumption, and you can use it to profile specific parts of your code.

In [9]:
import profile

In [10]:
def main(): # This is Example2.py
	x = [1.0] * (2048 * 2048) 
	a = str(x[0]) 
	a += " is a one..." 
	del x			
	print(a)

profiler = profile.Profile()
profiler.runcall(main)
profiler.print_stats()

1.0 is a one...
         34 function calls in 0.031 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.031    0.031    0.031    0.031 2287880165.py:1(main)
        2    0.000    0.000    0.000    0.000 :0(__exit__)
        1    0.000    0.000    0.000    0.000 :0(append)
        2    0.000    0.000    0.000    0.000 :0(get)
        2    0.000    0.000    0.000    0.000 :0(getpid)
        1    0.000    0.000    0.000    0.000 :0(is_done)
        2    0.000    0.000    0.000    0.000 :0(isinstance)
        2    0.000    0.000    0.000    0.000 :0(items)
        2    0.000    0.000    0.000    0.000 :0(len)
        1    0.000    0.000    0.000    0.000 :0(print)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        2    0.000    0.000    0.000    0.000 :0(write)
        1    0.000    0.000    0.000    0.000 iostream.py:138(_event_pipe)
        1    0.000    0.000    0.000    0.000 iostream.py:259(sch

---

<h4>3. <code>line_profiler</code></h4>

We are still getting an output similar to that of `cProfile`, i.e., one row per function. To measure performance line by line, we need a different tool:
1. Install `line_profiler` using `pip/anaconda`.
```bash
pip install line-profiler
```
2. On the .py file that you want to analyze, put the decorator `@profile` above the function that you want to profile.
3. Use `kernprof.py` (found [here](https://github.com/pyutils/line_profiler/blob/main/kernprof.py), but also inside `examples`) on your .py file.
```bash
python kernprof.py -l Example1.py
```
4. Run the command that `kernprof` prints in the terminal to view the results (it will contain the full path to your own Python interpreter):
```bash
C:\Users\Margarita\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.13_qbz5n2kfra8p0\python.exe -m line_profiler -rmt "Example1.py.lprof"
```
The result looks something like this:
```console
Timer unit: 1e-06 s

Total time: 0.0112843 s
File: Example1.py
Function: main at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def main():
     3         1       6471.7   6471.7     57.4         x = [1.0]*(2048*2048) 
     4         1          9.4      9.4      0.1         a = str(x[0]) 
     5         1          1.7      1.7      0.0         a += " is a one..." 
     6         1       4610.5   4610.5     40.9         del x
     7         1        191.0    191.0      1.7         print(a)

  0.01 seconds - Example1.py:1 - main
```

Try to do this for the example files in `examples/`.

To do something similar from inside the notebook, we can use the `%prun` magic or `snakeviz`, which shows a nice interactive graph. `snakeviz` has to be installed first:
```bash
conda install snakeviz
```

In [11]:
def main(a, b, c):
	print("a = ", a)
	print("b = ", b)
	print(np.dot(a, b))
	print(a @ b)

a = np.array([[1, 2], [4, 3]])
b = np.array([[1, 2], [4, 3]])
c = np.arange(2) + 1

In [4]:
%prun main(a, b, c)

a =  [[1 2]
 [4 3]]
b =  [[1 2]
 [4 3]]
[[ 9  8]
 [16 17]]
[[ 9  8]
 [16 17]]
 

         863 function calls (827 primitive calls) in 0.003 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     28/4    0.000    0.000    0.000    0.000 arrayprint.py:829(recurser)
       12    0.000    0.000    0.000    0.000 iostream.py:655(write)
        8    0.000    0.000    0.000    0.000 socket.py:626(send)
        1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        8    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.002    0.002 2853867017.py:3(main)
        4    0.000    0.000    0.000    0.000 arrayprint.py:1274(__init__)
      2/0    0.000    0.000    0.000          {built-in method select.select}
       16    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        4    0.000    0.000    0.001    0.000 arrayprint.

In [7]:
%load_ext snakeviz
%snakeviz main(a, b, c)

a =  [[1 2]
 [4 3]]
b =  [[1 2]
 [4 3]]
[[ 9  8]
 [16 17]]
[[ 9  8]
 [16 17]]
 
*** Profile stats marshalled to file 'C:\\Users\\MARGAR~1\\AppData\\Local\\Temp\\tmpzc75zjo9'.
Embedding SnakeViz in this document...
<function display at 0x0000028867F596C0>


---