## Profiling code

- Why and how of profiling
- Specific tools: simple and advanced


## Why profile?

- Profiling for time and memory
- Not all code takes significant time (or consumes significant memory)
- Where does one focus?
- What matters and what does not?
- Example: three functions and their share of total run time:
  - func1: 5%
  - func2: 40%
  - func3: 35%
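
To make the example concrete, here is a minimal sketch that estimates the best-case overall speedup from optimizing each function, using Amdahl's law. The shares come from the list above; the 10x per-function speedup is an assumed figure chosen only for illustration.

```python
# Share of total run time per function, taken from the example above.
shares = {"func1": 0.05, "func2": 0.40, "func3": 0.35}


def overall_speedup(share, local_speedup):
    """Amdahl's law: overall speedup when a fraction `share` of the
    run time is made `local_speedup` times faster."""
    return 1.0 / ((1.0 - share) + share / local_speedup)


# Assumption for illustration: each function could be made 10x faster.
for name, share in shares.items():
    print(f"{name}: {overall_speedup(share, 10.0):.2f}x overall")
```

Under that assumption, speeding up `func1` barely changes the total (about 1.05x), while the same effort on `func2` gives roughly 1.56x. This is why profiling comes before optimizing.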


## Approach

- Use a profiling tool
- Get information on relative performance
- This lets you focus effort where it matters


## Kinds of profilers

- CPU time, GPU time, and memory
- Function profilers: only function information
- Line profilers: line by line information
- Tracing (deterministic) vs Sampling
- Instrumentation vs no instrumentation
- Other things:
  - Support for threads, multiprocessing, C extensions, GPU
  - Overhead
  - Memory profiling
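
To illustrate what "tracing (deterministic)" means, here is a minimal sketch of a tracing profiler built on the standard `sys.setprofile` hook. It only counts calls and is not how any particular tool is implemented; the functions `helper` and `work` are hypothetical stand-ins for real code.

```python
import sys
from collections import Counter

call_counts = Counter()


def tracer(frame, event, arg):
    # The interpreter invokes this hook on every Python-level call and
    # return, which is where a tracing profiler's overhead comes from.
    if event == "call":
        call_counts[frame.f_code.co_name] += 1


def helper(n):
    return n * n


def work():
    return sum(helper(i) for i in range(100))


sys.setprofile(tracer)
try:
    result = work()
finally:
    # Always remove the hook so later code runs at full speed.
    sys.setprofile(None)

print("helper calls:", call_counts["helper"])
print("work calls:", call_counts["work"])
```

A sampling profiler takes the opposite approach: it inspects the call stack at regular intervals instead of hooking every call, trading exact counts for much lower overhead.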


## Python profiling tools

- Many, many tools
- stdlib: `profile`, `cProfile`
- `line_profiler` and `kernprof`: a line-by-line profiler
- pyinstrument, py-spy, Austin
- Scalene



## Using the standard profilers

- A quick demo

```
$ python -m cProfile -o result.prof simple_profile1.py
```

In [None]:
%cd ../code

In [None]:
import pstats
p = pstats.Stats('result.prof')
p.strip_dirs().sort_stats('cumulative').print_stats(20)
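
The same kind of report can be produced without a separate command-line step by driving `cProfile` from code. This is a minimal sketch; `slow_sum` and `run_workload` are hypothetical functions standing in for whatever `simple_profile1.py` contains.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive pure-Python loop so it shows up in the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_workload():
    return [slow_sum(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()

# Collect the report in a string so it can be printed or saved.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by `cumulative` puts the functions that account for the most total time, including time in their callees, at the top.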

## IPython integration

- Use `%run -p` or `%prun`
- `%prun -l nlines -s key code`


In [None]:
%run?

In [None]:
%run -p -l 20 -s cumulative simple_profile1.py

In [None]:
from simple_profile1 import test_me
%prun test_me()

## Scalene: features

- Sampling profiler: very fast (low overhead)
- Function and line-by-line
- No instrumentation needed
- Supports threads, multi-processing, system time/python time, GPU
- Support for memory profiling, copy volumes, memory leaks
- https://github.com/plasma-umass/scalene


## Scalene: usage examples

- Install with pip: `pip install scalene`

```
$ scalene prog.py

$ scalene --help

$ python -m scalene prog.py

$ scalene --reduced-profile prog.py

$ scalene --html --outfile prof.html prog.py

```


## Demo of features

- Function profiling
- Line-by-line
- Python vs Native
- Memory
- Copy
- Sparklines
- IPython support: `%load_ext scalene`, `%scrun`



## Takeaways

- Profile code before optimizing
- No point profiling something that is already quick
- Start at the function level
- Drill down to line by line
- Move execution time into native code rather than Python
- Also look at copy volume and memory usage
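
As a small illustration of moving work from Python into native code, the sketch below times a hand-written loop against the built-in `sum`, which runs in C in CPython. The function `python_sum` and the input size are assumptions for illustration, and the timings will vary by machine.

```python
import timeit


def python_sum(values):
    # Every iteration is executed by the Python interpreter.
    total = 0
    for v in values:
        total += v
    return total


values = list(range(100_000))

# Both approaches must agree before their timings are worth comparing.
assert python_sum(values) == sum(values)

loop_time = timeit.timeit(lambda: python_sum(values), number=20)
builtin_time = timeit.timeit(lambda: sum(values), number=20)
print(f"Python loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```

On CPython the built-in version is typically several times faster. A line-by-line profiler that separates Python time from native time shows this kind of difference directly.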

