# Profiling

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lukeconibear/swd6_hpp/blob/main/docs/01_profiling.ipynb)

In [None]:
# if you're using colab, then install the required modules
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    %pip install line_profiler snakeviz

[Profiling](https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html) analyses your code in terms of speed and/or memory.  

This can help identify where the bottlenecks are and how much potential there is for improvement.

_Side note_

[IPython magic commands](https://jakevdp.github.io/PythonDataScienceHandbook/01.03-magic-commands.html) are very useful for common problems in data analysis.  

- Line magics start with one `%` and apply to a single line.
- Cell magics start with two `%%` and apply to the whole cell.

## [timeit](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit)

`timeit` (an IPython magic command) measures the execution time of an expression.

It runs the expression many times, choosing the number of loops based on how long the expression takes, and reports the mean time per loop with its standard deviation across the repeated runs.

It is useful for benchmarking a small code snippet.

In [None]:
%timeit range(100)

In [None]:
%%timeit
for x in range(100):
    pass
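
The `%timeit` magic is only available in IPython. As a rough sketch of the same idea in a plain Python script, the standard-library `timeit` module can be used instead. The function name `loop_over_range` and the `repeat` and `number` values below are illustrative choices, not part of the original example.

```python
import timeit


def loop_over_range():
    # Same body as the %%timeit cell above
    for x in range(100):
        pass


number_of_calls = 10_000

# Run 5 timing runs, each calling the function `number_of_calls` times.
# Each entry in `run_times` is the total time (in seconds) for one run.
run_times = timeit.repeat(loop_over_range, repeat=5, number=number_of_calls)

# Convert the total time for each run into the time per call
per_call = [run_time / number_of_calls for run_time in run_times]

print(f"fastest run: {min(per_call) * 1e6:.2f} microseconds per call")
```

Unlike `%timeit`, the module does not pick the number of loops for you when called this way, so `number` may need adjusting for slower or faster snippets.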

## [line_profiler](https://github.com/pyutils/line_profiler)

The `line_profiler` module measures the time spent in each line of a function.

In [None]:
%load_ext line_profiler

In [None]:
def bottleneck(nums):
    for index, num in enumerate(nums):
        nums[index] = num ** 2
        
    return nums

def my_function():
    nums = [num for num in range(1_000)]
    nums = bottleneck(nums)
    nums.append(1_001)

In [None]:
%lprun -f my_function my_function()

## [SnakeViz](https://jiffyclub.github.io/snakeviz/)

`cProfile` is part of the Python standard library (like [`profile`](https://docs.python.org/3/library/profile.html#module-profile)).

You can profile longer functions and programs with it.

However, the output isn't very intuitive:

In [None]:
import cProfile

In [None]:
cProfile.run('my_function()')
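
One way to make the raw `cProfile` output easier to read, before reaching for a visualiser, is to sort it with the standard-library `pstats` module. The sketch below redefines `my_function` and `bottleneck` from earlier so that it is self-contained; the choice to sort by cumulative time and to show at most 10 rows is an assumption made for illustration.

```python
import cProfile
import io
import pstats


def bottleneck(nums):
    for index, num in enumerate(nums):
        nums[index] = num ** 2

    return nums


def my_function():
    nums = [num for num in range(1_000)]
    nums = bottleneck(nums)
    nums.append(1_001)


# Profile a single call to my_function
profiler = cProfile.Profile()
profiler.runcall(my_function)

# Collect the report in a string, sorted by cumulative time,
# limited to the first 10 rows
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)

report = stream.getvalue()
print(report)
```

Sorting by cumulative time puts the functions whose calls (including everything they call) take longest at the top, which is usually where to start looking.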

SnakeViz visualises this output from `cProfile` in nice interactive plots.

In [None]:
%load_ext snakeviz

In [None]:
%%snakeviz  
my_function()

Switch between the two styles using the "Style" dropdown.

### Style: Icicle

- Functions are represented by rectangles.
- The root function is the _top-most rectangle_, with functions it calls below it, then the functions those call below them, and so on.
- The amount of time spent inside a function is represented by the width of the rectangle.
    - A rectangle that stretches across most of the visualization represents a function that is taking up most of the time of its calling function, while a skinny rectangle represents a function that is using hardly any time at all.

### Style: Sunburst

- Functions are represented by arcs.
- The root function is the _centre circle_, with the functions it calls around it, then the functions those functions call, and so on.
- The amount of time spent inside a function is represented by the angular extent of the arc (how far around the circle it goes).
    - An arc that wraps most of the way around the circle represents a function that is taking up most of the time of its calling function, while a skinny arc represents a function that is using hardly any time at all.

## Exercise

...

## Further information

### Other options

- [VizTracer](https://github.com/gaogaotiantian/viztracer)
  - A low-overhead logging/debugging/profiling tool that can trace and visualize your Python code execution.
- [pyinstrument](https://pyinstrument.readthedocs.io/en/latest/)
  - A statistical profiling module of wall-clock time (recording the call stack every 1 ms), lowering the overhead compared to tracing profilers. It hides library frames, so you can focus on the slow parts of your code. The output shows *how* the function executes using a traffic light colour legend.
- [Profile parallel code with Dask](https://docs.dask.org/en/latest/diagnostics-local.html#example)  

#### Memory profiling
- [memory_profiler](https://github.com/pythonprofilers/memory_profiler)
  - Measures the memory used by a function, at its peak and the overall increment.
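
As a minimal sketch of measuring peak memory without installing anything, the standard-library `tracemalloc` module can be used. This is a different tool from `memory_profiler`: it only tracks memory allocated by Python itself. The function `build_squares` and the list size are hypothetical, chosen only for illustration.

```python
import tracemalloc


def build_squares(count):
    # Hypothetical example function that allocates a list of integers
    return [num ** 2 for num in range(count)]


# Start tracing Python memory allocations
tracemalloc.start()

squares = build_squares(100_000)

# Returns the current and peak traced memory (in bytes) since start()
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current_bytes / 1e6:.2f} MB, peak: {peak_bytes / 1e6:.2f} MB")
```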