# Profiling compute/speed

I came across 3 major ways of profiling:

    1. deterministic: every function call is included in the profile
    
    2. statistical: the process is sampled every X seconds, only those samples are included in the profile
    
    3. line profiling: each line of code is timed separately
    
This notebook demonstrates each option by profiling the `fast_end_to_end_experiment.py` script.

## Option 1: deterministic profiling

cProfile is the built-in tool for this.
It works well for very small scripts, but because every call is recorded the output quickly becomes very hard to analyse. An example:

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from IPython.display import SVG
import pstats

In [3]:
from pathlib import Path
path = Path("../../../anomaly_detection/scripts/config/fast_end_to_end_experiment.json")
absolute_path_config = path.absolute()

In [4]:
!python -W ignore -m cProfile -o profile.prof ../../../anomaly_detection/scripts/fast_end_to_end_experiment.py --config {absolute_path_config} --skip-mlflow prepare-dataset

UnboundLocalError: local variable 'child' referenced before assignment

In [None]:
p = pstats.Stats("profile.prof")
p.sort_stats("cumulative")
print(f"There are {len(p.stats)} function calls in the stats overview \n")
p.print_stats(10)

In [None]:
!snakeviz "profile.prof"
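cProfile can also be driven from within Python instead of via `python -m cProfile`. The block below is a minimal sketch of that, using a toy workload (`slow_sum` and `run_workload` are made-up stand-ins, not part of the experiment script) and writing the `pstats` report to a string so it can be inspected or filtered.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Toy workload: sum of squares in a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_workload():
    """Call the toy workload a few times so it shows up clearly in the profile."""
    return [slow_sum(50_000) for _ in range(20)]


# Record every function call made between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()

# Sort by cumulative time and keep only the top entries, as in the cell above.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Even for this tiny example the report lists every function that was called, which illustrates why the output grows quickly for a real script.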

## Option 2: statistical profiling

Statistical profiling improves interpretability because the profiler only samples the process every X seconds. Short calls that do not significantly influence the runtime are therefore mostly left out of the chart.
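To make the idea of sampling concrete, here is a toy in-process sampler. It is only an illustration of the principle and not how py-spy or Pyinstrument are implemented: a background thread looks at the profiled thread's current frame at a fixed interval and counts which function it finds. The names `sample_thread` and `busy` are made up for this sketch, and `sys._current_frames()` is a CPython-specific helper.

```python
import collections
import sys
import threading
import time


def sample_thread(thread_id, counts, interval, stop_event):
    """Every `interval` seconds, record which function `thread_id` is executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy(seconds):
    """Toy workload that burns CPU for roughly `seconds`."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


counts = collections.Counter()
stop_event = threading.Event()
sampler = threading.Thread(
    target=sample_thread,
    args=(threading.get_ident(), counts, 0.01, stop_event),  # ~100 samples per second
)
sampler.start()
busy(0.5)
stop_event.set()
sampler.join()

# Functions that ran only briefly between two samples may not appear at all.
print(counts.most_common(3))
```

A lower sampling rate gives a coarser but more readable picture, which is the same trade-off the `-r` flag of py-spy controls.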

Packages for this include, for example, Pyinstrument and py-spy.
However, Pyinstrument cannot handle multiple threads or code built on top of C, such as numpy.
So let's look into py-spy. You can set the sampling rate, in samples per second, with the `-r` flag:

#### 10 samples per second

In [None]:
!py-spy record -r 10 -o profile10.svg -- python -W ignore ../../../anomaly_detection/scripts/fast_end_to_end_experiment.py --config {absolute_path_config} --skip-mlflow prepare-dataset

In [None]:
SVG("profile10.svg")

#### 1 sample per second

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
!py-spy record -r 1 -o profile1.svg -- python -W ignore ../../../anomaly_detection/scripts/fast_end_to_end_experiment.py --config {absolute_path_config} --skip-mlflow prepare-dataset

In [None]:
SVG("profile1.svg")

### Combine py-spy with this nice profiling viz tool: https://www.speedscope.app/

In [None]:
!py-spy record --format speedscope -r 10 -o profile.speedscope.json -- python -W ignore ../../../anomaly_detection/scripts/fast_end_to_end_experiment.py --config {absolute_path_config} --skip-mlflow end-to-end

## Option 3: line profiler

My personal favourite is to profile the code line by line, using https://github.com/pyutils/line_profiler.

One downside is that you have to add the profiler decorator to the code, and remember to remove it again before committing.

Note that instead of line_profiler's built-in `@profile` decorator, a custom one is used here, because that allows profiling
functions from files other than \_\_main\_\_.

In [None]:
from profiling_utils import profile
import inspect
from IPython.display import display, Markdown

source_code = inspect.getsource(profile)
display(Markdown(f"```python\n{source_code}\n```"))
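The cell above prints the actual decorator from `profiling_utils`. As a rough idea of what such a decorator could look like, here is a hypothetical sketch (not the code from `profiling_utils`) built on `line_profiler.LineProfiler`. It assumes one shared profiler object, so functions from any module can be registered, and it falls back to a no-op when line_profiler is not installed. `print_profile` and `sum_of_squares` are made-up names for illustration.

```python
try:
    from line_profiler import LineProfiler  # third-party: pip install line_profiler
except ImportError:
    LineProfiler = None  # profiling is optional; the decorator becomes a no-op

# One shared profiler, so decorated functions from any module report together.
_profiler = LineProfiler() if LineProfiler is not None else None


def profile(func):
    """Hypothetical decorator: line-profile `func` if line_profiler is available."""
    if _profiler is None:
        return func  # leave the function untouched
    return _profiler(func)  # calling a LineProfiler instance wraps the function


def print_profile():
    """Print the per-line timings collected so far (no-op without line_profiler)."""
    if _profiler is not None:
        _profiler.print_stats()


@profile
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    sum_of_squares(100_000)
    print_profile()
```

Because the fallback leaves functions unchanged, a decorator along these lines would be harmless if it accidentally ended up in committed code, although removing it is still the cleaner option.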

In [None]:
# !python -W ignore  ../../../anomaly_detection/scripts/fast_end_to_end_experiment.py --config {absolute_path_config} --skip-mlflow end-to-end