# Lecture 11. Numba

In [None]:
import numpy as np
import pandas as pd
import timeit
from time import sleep

* Python's dirty little secret is that it can be made to run pretty fast.  
  * The catch is that some patterns are slow in pure Python; e.g., nested loops are usually a bad idea. <br>
<br>
* But often you won't know where your code is slowing down just by looking at it, and trying to accelerate everything can be a waste of time. <br>
<br>
* The first step is always to find the bottlenecks in your code, i.e., to profile it by measuring the execution time of its parts.
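
As a minimal sketch of what "measuring the execution time of its parts" can look like with only the standard library, the snippet below wraps each stage of a toy workload in a small timer. The helper name `timed` and the two stages are hypothetical and chosen only for illustration; the profilers introduced in the following sections automate this bookkeeping.

```python
from contextlib import contextmanager
from time import perf_counter, sleep


@contextmanager
def timed(label, results):
    """Record the wall-clock time of the enclosed block under `label`."""
    start = perf_counter()
    try:
        yield
    finally:
        results[label] = perf_counter() - start


timings = {}

with timed("setup", timings):
    data = list(range(100_000))   # cheap stage

with timed("slow part", timings):
    sleep(0.1)                    # stand-in for an expensive stage

# Print the stages from slowest to fastest
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:>10}: {seconds:.4f}s")
```

Sorting the stages by elapsed time makes the bottleneck the first line of the output.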

### Motivation: Some bad code

Here's a bit of code guaranteed to perform poorly: after doing its actual work, it sleeps for a total of 1.5 seconds (0.5 s in `bad_call` and 1 s in `worse_call`)!

In [None]:
def bad_call(dude):
    sleep(.5)
    
def worse_call(dude):
    sleep(1)
    
def sumulate(foo):
    
    a = np.random.random((1000, 1000))
    a @ a
    
    ans = 0
    for i in range(foo):
        ans += i
        
    bad_call(ans)
    worse_call(ans)
        
    return ans

In [None]:
sumulate(150)

### Using `cProfile`

* [`cProfile`](https://docs.python.org/3.4/library/profile.html#module-cProfile) is the built-in profiler in Python.  <br>
<br>
* It provides a function-by-function report of execution time. 
  * First import the module; usage is then simply a call to `cProfile.run()` with your code passed as a string. It prints a list of all the functions that were called, with the number of calls and the time spent in each.

In [None]:
import cProfile

In [None]:
cProfile.run('sumulate(150)')

You can see here that when `sumulate()` executes, it spends almost all of its time in `time.sleep` (a bit over 1.5 seconds).
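
By default the report is ordered by standard name, which makes the slow functions hard to spot in longer programs. The sketch below, which uses only the standard library, collects the statistics with `cProfile.Profile` and sorts them by cumulative time with `pstats`. The functions `slow_step`, `fast_step` and `workload` are hypothetical stand-ins for `sumulate()` and its helpers.

```python
import cProfile
import io
import pstats
from time import sleep


def slow_step():
    sleep(0.2)                 # the deliberate bottleneck


def fast_step():
    return sum(range(1000))    # cheap work


def workload():
    fast_step()
    slow_step()


# Collect statistics only while workload() runs
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

For a quick check, `cProfile.run('sumulate(150)', sort='cumulative')` produces the same ordering without the extra setup.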

### Using `line_profiler`

`line_profiler` offers more granular information than `cProfile`: it will give timing information about each line of code in a profiled function.

First, install the package if necessary (`pip install line-profiler`), then load the `line_profiler` extension:

In [None]:
%load_ext line_profiler

In [None]:
%lprun -f bad_call -f worse_call sumulate(13)  # run sumulate(13), reporting line-by-line timings only for bad_call and worse_call

---

### Using `jit`

#### Array sum

The function below is a naive `sum` function that sums all the elements of a given array.

In [None]:
def sum_array(inp):
    J, I = inp.shape
    
    # looping over array elements in pure Python is a bad idea (slow)
    mysum = 0
    for j in range(J):
        for i in range(I):
            mysum += inp[j, i]
            
    return mysum

In [None]:
arr = np.random.random((300, 300))

In [None]:
sum_array(arr)

In [None]:
%timeit sum_array(arr)

Let's now use `numba.jit` to speed up the code.

In [None]:
from numba import jit

In [None]:
sum_array_numba = jit(sum_array)

In [None]:
sum_array_numba(arr)

In [None]:
%timeit sum_array_numba(arr)

#### (More commonly) As a decorator

In [None]:
@jit
def sum_array(inp):
    I, J = inp.shape
    
    mysum = 0
    for i in range(I):
        for j in range(J):
            mysum += inp[i, j]
            
    return mysum

#### When does `numba` compile things?

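Numba compiles lazily: compilation happens the first time you call the function with a given combination of argument types. That first call therefore includes the compilation time, while later calls reuse the compiled machine code.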

In [None]:
start = timeit.default_timer()

sum_array(arr)

print(f'First call (including compilation) took {(timeit.default_timer() - start):.4f}s.', flush=True)

In [None]:
sum_array(arr)
%timeit sum_array(arr)

#### How does this compare to NumPy?

In [None]:
%timeit arr.sum()

#### However, it is not always possible to vectorize the computation.

Consider a simple autoregressive (AR) model of order one:
$$
y_t = \rho y_{t-1} + \epsilon_t, \ \ \text{where } |\rho| < 1, \ \ \epsilon_t \sim \text{ iid } \mathcal{N} (0, \sigma^2) \text{ and } y_0 = 0. 
$$
In this example, the time-$t$ value of $y$ depends on its one-period lag, so the recursion has to be computed sequentially and we use a `for` loop to simulate this AR(1) process.

In [None]:
def simulate_ar1(Tsim, rho, sigma):
    Ysim = np.zeros((Tsim,))
    for i in range(1, Tsim):
        Ysim[i] = Ysim[i-1]*rho + np.random.normal(loc=0,scale=sigma)
    return Ysim

In [None]:
%timeit simulate_ar1(Tsim=12000, rho=0.1, sigma=1)

In [None]:
@jit
def simulate_ar1_jit(Tsim, rho, sigma):
    Ysim = np.zeros((Tsim,))
    for i in range(1, Tsim):
        Ysim[i] = Ysim[i-1]*rho + np.random.normal(loc=0,scale=sigma)
    return Ysim

In [None]:
Ysim = simulate_ar1_jit(Tsim=60, rho=0.1, sigma=1)   # run simulate_ar1_jit to compile the function

%timeit simulate_ar1_jit(Tsim=12000, rho=0.1, sigma=1)
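
Even when the recursion itself must stay in a loop, parts of the work can often still be vectorized. The sketch below is a hypothetical variant (the name `simulate_ar1_predrawn` and the `rng` argument are not part of the lecture code) that draws all shocks $\epsilon_t$ in one NumPy call and keeps only the recursion inside the loop. It returns the shocks as well, purely so that the recursion can be checked afterwards.

```python
import numpy as np


def simulate_ar1_predrawn(Tsim, rho, sigma, rng=None):
    """Simulate an AR(1) path with y_0 = 0, drawing all shocks up front."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(loc=0.0, scale=sigma, size=Tsim)   # one vectorized draw
    Ysim = np.zeros(Tsim)
    for t in range(1, Tsim):
        # Only the sequential recursion remains inside the loop
        Ysim[t] = rho * Ysim[t - 1] + eps[t]
    return Ysim, eps


Ysim, eps = simulate_ar1_predrawn(
    Tsim=12000, rho=0.1, sigma=1, rng=np.random.default_rng(0)
)

# Check that the path satisfies y_t = rho * y_{t-1} + eps_t
print(np.allclose(Ysim[1:], 0.1 * Ysim[:-1] + eps[1:]))
```

Whether this variant is faster than the original loop or the `jit` version is best checked with `%timeit` on your own machine.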

---

**Lesson**

* `numba.jit` is powerful and can speed up explicit `for` loops substantially.

* However, whenever vectorization is possible (e.g., with `numpy` functions), prefer it over writing your own compiled functions.

* `numba.jit` is particularly useful when the `for` loop cannot be avoided.

More details about Numba can be found at
https://numba.readthedocs.io/en/stable/.

---

## Final Exam

Examination time: 2:30 pm - 4:30 pm, Dec 13 2024.

This examination consists of seven questions worth 100 points in total. You are required to answer all questions. If you find a question unclear, please state the assumptions you make and answer the question based on them. Please do not leave any question blank.

Time Allowed: 120 minutes.

Candidates are permitted to use any online/electronic/printed/handwritten materials in the examination. Internet searching is allowed, but crowdsourcing from group messages, online forums or social media, etc. is strictly forbidden. 

* You are required to complete the final exam on the lab computers; that is, you are NOT allowed to use your own laptop/iPad/phone during the exam.

* Frankie will post the exam paper (a Jupyter Notebook file) on Moodle 10 minutes before the exam starts, so you will have sufficient time to download all the lecture materials from Moodle.

* Can you store your materials in a cloud account?
    * Yes, but I am not sure whether the Internet connection will be good enough.
    * Suggestion: Print all the materials as a backup plan in case you cannot access your cloud account during the exam.

* Can you use ChatGPT?
    * In principle, no.
    * In practice, the Internet connection is so poor that you will not be able to access it anyway.

* Short questions ($5 \times 10\% = 50\%$), e.g., 
    * the questions in Problem Set 1
    * Q1 in Problem Set 2
* Long questions ($2 \times 25\% = 50\%$), e.g., 
    *  Q2 in Problem Set 2
    *  the questions in Problem Set 3

### Sections that are *NOT* required:

* Lecture 3. Classes and Object-Oriented Programming
* Lecture 6. SQL
* Lecture 11. Parallel Computing (```joblib```) and ```numba```

### Suggestion

* Familiarize yourself with the lecture notes.

* Go through the three problem sets several times (exam questions will be similar to the problem set questions).

---

# END