# Profiling

Two of the main things I learnt from Ian's course:
- Profiling is the only way to find out what actually runs fast and what runs slow. Your intuition can often be very wrong.
- Profiling is pretty easy, if you know what tools to use!

In [1]:
%load_ext autoreload
%autoreload 2

In [6]:
import numpy as np
from hpp.demos import get_areas, monte_carlo_pi

In [10]:
radii = np.random.uniform(0, 1, size=1_000)
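
The `hpp.demos` module itself is not shown in this notebook. Judging from the source lines echoed in the profiler reports below, the two functions look roughly like the following sketch. The docstrings and the final `return a` are assumptions, since the reports stop at the line before it.

```python
import random

import numpy as np


def monte_carlo_pi(n_samples):
    """Estimate pi from the fraction of random points that land inside the unit quarter-circle."""
    acc = 0
    for i in range(n_samples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / n_samples


def get_areas(radii, n_samples):
    """Area of each circle, using a fresh Monte Carlo estimate of pi for every radius."""
    a = np.zeros_like(radii)
    for i, r in enumerate(radii):
        pi = monte_carlo_pi(n_samples)
        a[i] = pi * r ** 2
    return a  # assumption: this line is not visible in the profiler output
```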

The `%lprun` magic from `line_profiler` reports the time spent on each line of every function named with a `-f` flag:

In [12]:
%load_ext line_profiler
%lprun -f get_areas -f monte_carlo_pi  get_areas(radii, 100)

Output:

```
Timer unit: 1e-07 s

Total time: 0.246178 s
File: c:\users\jarya\dropbox\projects\python\higher-performance-python\hpp\demos.py
Function: monte_carlo_pi at line 6

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     6                                           def monte_carlo_pi(n_samples):
     7      1000       4319.0      4.3      0.2      acc = 0
     8    101000     370311.0      3.7     15.0      for i in range(n_samples):
     9    100000     481281.0      4.8     19.6          x = random.random()
    10    100000     450167.0      4.5     18.3          y = random.random()
    11    100000     794443.0      7.9     32.3          if (x ** 2 + y ** 2) < 1.0:
    12     78658     355722.0      4.5     14.4              acc += 1
    13      1000       5540.0      5.5      0.2      return 4.0 * acc / n_samples

Total time: 0.428766 s
File: c:\users\jarya\dropbox\projects\python\higher-performance-python\hpp\demos.py
Function: get_areas at line 16

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    16                                           def get_areas(radii, n_samples):
    17         1        375.0    375.0      0.0      a = np.zeros_like(radii)
    18      1001       4947.0      4.9      0.1      for i, r in enumerate(radii):
    19      1000    4272154.0   4272.2     99.6          pi = monte_carlo_pi(n_samples)
    20      1000      10185.0     10.2      0.2          a[i] = pi * r ** 2
```
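
The `Time` column is measured in ticks of the `Timer unit`, not in seconds. As a small sketch of how to read the report, the snippet below converts the `get_areas` numbers copied from the output above; the variable names are only for illustration.

```python
# Values copied from the get_areas report above.
TIMER_UNIT_S = 1e-07  # the "Timer unit" line: one tick is 0.1 microseconds

# Raw "Time" column per source line, in ticks.
get_areas_ticks = {17: 375.0, 18: 4947.0, 19: 4272154.0, 20: 10185.0}

total_ticks = sum(get_areas_ticks.values())
total_seconds = total_ticks * TIMER_UNIT_S

# Share of the function's time spent on each line, as in the "% Time" column.
percent_time = {
    line: 100 * ticks / total_ticks for line, ticks in get_areas_ticks.items()
}

print(f"get_areas total: {total_seconds:.6f} s")
print(f"line 19 share:   {percent_time[19]:.1f} %")
```

The results agree with the report's `Total time` and with the 99.6 in the `% Time` column for line 19: almost all of `get_areas` is spent inside the call to `monte_carlo_pi`.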

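We can also profile memory use line by line with `memory_profiler`. In the run below, usage stays flat at 98.6 MiB; the 98648.4 MiB in the `Increment` column appears to be that baseline summed over the 1000 calls rather than a real allocation.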

In [13]:
%load_ext memory_profiler
%mprun -f get_areas -f monte_carlo_pi  get_areas(radii, 100)




Output:

```
Filename: c:\users\jarya\dropbox\projects\python\higher-performance-python\hpp\demos.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    16     98.6 MiB     98.6 MiB           1   def get_areas(radii, n_samples):
    17     98.6 MiB      0.0 MiB           1       a = np.zeros_like(radii)
    18     98.6 MiB      0.0 MiB        1001       for i, r in enumerate(radii):
    19     98.6 MiB  98648.4 MiB        1000           pi = monte_carlo_pi(n_samples)
    20     98.6 MiB      0.0 MiB        1000           a[i] = pi * r ** 2


Filename: c:\users\jarya\dropbox\projects\python\higher-performance-python\hpp\demos.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     6     98.6 MiB  98648.4 MiB        1000   def monte_carlo_pi(n_samples):
     7     98.6 MiB      0.0 MiB        1000       acc = 0
     8     98.6 MiB      0.0 MiB      101000       for i in range(n_samples):
     9     98.6 MiB      0.0 MiB      100000           x = random.random()
    10     98.6 MiB      0.0 MiB      100000           y = random.random()
    11     98.6 MiB      0.0 MiB      100000           if (x ** 2 + y ** 2) < 1.0:
    12     98.6 MiB      0.0 MiB       78302               acc += 1
    13     98.6 MiB      0.0 MiB        1000       return 4.0 * acc / n_samples
```
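
`memory_profiler` reports the memory of the whole process. As a cross-check that needs only the standard library, `tracemalloc` tracks allocations made through Python's own allocator. The sketch below is not part of the original notebook: `get_areas_list` is a hypothetical pure-Python stand-in for `get_areas`, so that the example runs without `hpp` or NumPy.

```python
import random
import tracemalloc


def monte_carlo_pi(n_samples):
    """Same estimator as in the profiler output above."""
    acc = 0
    for _ in range(n_samples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / n_samples


def get_areas_list(radii, n_samples):
    """Hypothetical list-based variant of get_areas (an assumption, not the hpp code)."""
    return [monte_carlo_pi(n_samples) * r ** 2 for r in radii]


tracemalloc.start()
radii_list = [random.uniform(0, 1) for _ in range(1_000)]
areas = get_areas_list(radii_list, 100)
# Both values are in bytes and cover only memory allocated since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1024:.1f} KiB, peak: {peak / 1024:.1f} KiB")
```

Because `tracemalloc` only sees Python-level allocations, its numbers will be far smaller than the process-wide figures above and are not directly comparable to them.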

## VizTracer

VizTracer records the function calls your code makes as it runs and displays them on a timeline, so you can see which parts of your code are taking the most time.

In [22]:
import pandas as pd
from tqdm.auto import tqdm

In [18]:
df = pd.read_csv(
    "https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv"
)

In [14]:
from viztracer import VizTracer

In [15]:
%load_ext viztracer

In [27]:
%%viztracer
mean_stats_per_type = df[['Type 1', 'HP', 'Attack', 'Defense', 'Speed']].groupby('Type 1').mean()

Running the cell displays a `VizTracer Report` button. Clicking it opens a beautiful report, letting you see some of the underlying complexity of how pandas works!

 <img src="../images/viztrace.jpg" width="1000"> 

The width of a bar denotes the amount of time spent in a particular operation. This can give you hints on how to optimize your code.

In [33]:
def geom_mean(row):    
    return np.power(np.prod(row), 1./row.shape[0])
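
For reference, `geom_mean` computes the geometric mean of the $n$ entries $x_1, \dots, x_n$ of a row (the symbols are introduced here only for illustration):

$$
\operatorname{geom\_mean}(x) = \left( \prod_{i=1}^{n} x_i \right)^{1/n}
$$

For example, for the row $(1, 2, 3)$ this gives $(1 \cdot 2 \cdot 3)^{1/3} = 6^{1/3} \approx 1.817$.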

In [37]:
%%viztracer
df['geom_mean'] = df[['Attack', 'Defense', 'Speed']].apply(geom_mean, axis=1)


 <img src="../images/apply.jpg" width="1000"> 

In [35]:
from numba import njit

@njit
def geom_mean_numba(row):    
    return np.power(np.prod(row), 1./row.shape[0])

# This both tests and compiles our numba function (the first call triggers JIT compilation)
np.testing.assert_allclose(geom_mean(np.array([1.,2.,3])), geom_mean_numba(np.array([1.,2.,3])))  

In [38]:
%%viztracer
df['geom_mean'] = df[['Attack', 'Defense', 'Speed']].apply(geom_mean_numba, axis=1, raw=True)


 <img src="../images/apply-numba.jpg" width="1000"> 
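
The numba version relies on `raw=True`, which makes `apply` pass each row as a plain NumPy array rather than a `Series`. Taking that one step further, the whole computation can be done on the underlying array at once. This is a sketch of an alternative, not something the original notebook ran; `geom_mean_rows` is a hypothetical helper and the `stats` array is made-up data standing in for the `Attack`, `Defense` and `Speed` columns.

```python
import numpy as np


def geom_mean(row):
    """Row-wise version from the notebook."""
    return np.power(np.prod(row), 1.0 / row.shape[0])


def geom_mean_rows(values):
    """Geometric mean of every row of a 2-D array, without a Python-level loop."""
    return np.power(np.prod(values, axis=1), 1.0 / values.shape[1])


# Made-up numbers for illustration; floats avoid integer overflow in the product.
stats = np.array([[49.0, 49.0, 45.0], [62.0, 63.0, 60.0], [82.0, 83.0, 80.0]])

row_by_row = np.array([geom_mean(row) for row in stats])
vectorised = geom_mean_rows(stats)

# Both approaches should agree to floating-point precision.
np.testing.assert_allclose(vectorised, row_by_row)
```

With the DataFrame above, the input array could be obtained with `df[['Attack', 'Defense', 'Speed']].to_numpy(dtype=float)`. Whether this is actually faster than the numba version is something to check with the profilers rather than assume.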