# Python for Data Analysis III




**Agenda:**

* cProfile
* Cython
* sklearn

Writing programs is fun, but making them fast can be a pain. Python programs are no exception, but the basic profiling toolchain is not complicated to use. Here, I would like to show you how to quickly profile and analyze your Python code to find which parts are worth optimizing.

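You can profile code manually by timing sections of it yourself, but dedicated tools such as `cProfile` and `line_profiler` automate this and report far more detail.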
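Here is a minimal sketch of manual profiling that uses only the standard library. The `timed` context manager is a hypothetical helper that records wall-clock time with `time.perf_counter()`, and `pure_python_sum` is a made-up workload for illustration. Timing a block this way tells you *how long* it takes, but not *where* inside it the time goes. That missing detail is what `cProfile` and `line_profiler` add.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Hypothetical helper: store the wall-clock time of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises.
        results[label] = time.perf_counter() - start

def pure_python_sum(n):
    # Hypothetical workload used only for illustration.
    total = 0
    for i in range(n):
        total += i * i
    return total

timings = {}
with timed("loop", timings):
    loop_result = pure_python_sum(200_000)
with timed("builtin", timings):
    builtin_result = sum(i * i for i in range(200_000))

assert loop_result == builtin_result  # same work, different implementation
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1e3:.2f} ms")
```

This approach works for a quick check. It does not scale, though, because every region of interest has to be wrapped by hand.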

In [4]:
!pip3 install line_profiler



In [5]:
import numpy as np

In [6]:
%%writefile speedup.py

import random

class Matrix(list):
    @classmethod
    def zeros(cls, shape):
        n_rows, n_cols = shape
        return cls([[0] * n_cols for i in range(n_rows)])

    @classmethod
    def random(cls, shape):
        M, (n_rows, n_cols) = cls(), shape
        for _ in range(n_rows):
            M.append([random.randint(-255, 255) for _ in range(n_cols)])
        return M

    @property
    def shape(self):
        return ((0, 0) if not self else (len(self), len(self[0])))
    
    
def dot_product(X, Y):
    n_xrows, n_xcols = X.shape
    n_yrows, n_ycols = Y.shape
    Z = Matrix.zeros((n_xrows, n_ycols))
    for i in range(n_xrows):
        for j in range(n_xcols):
            for k in range(n_ycols):
                Z[i][k] += X[i][j] * Y[j][k]  # innermost line: runs n_xrows * n_xcols * n_ycols times
    return Z

def bench(shape=(64, 64), n_iter=16):
    X = Matrix.random(shape)
    Y = Matrix.random(shape)
    for _ in range(n_iter):  # "_" avoids shadowing the built-in iter()
        dot_product(X, Y)

if __name__ == "__main__":
    bench()

Overwriting speedup.py


In [7]:
%%timeit
a1 = np.random.rand(3,2)
a2 = np.random.rand(2,3)
a1.dot(a2)

3.85 µs ± 90.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
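The cell above times two tiny arrays (3×2 and 2×3) and also includes the time to generate them. It is therefore not a like-for-like comparison with `bench()` in `speedup.py`, which multiplies 64×64 matrices 16 times. Below is a sketch of a fairer comparison on a single 64×64 product. So that the block runs on its own, it re-implements the triple loop from `speedup.py` on plain lists of lists, assuming that is representative of the `Matrix` version. It also checks that both implementations agree before timing them.

```python
import random
import timeit

import numpy as np

def dot_product(X, Y):
    # Same triple loop as speedup.py, on plain lists of lists.
    n_xrows, n_xcols, n_ycols = len(X), len(X[0]), len(Y[0])
    Z = [[0] * n_ycols for _ in range(n_xrows)]
    for i in range(n_xrows):
        for j in range(n_xcols):
            for k in range(n_ycols):
                Z[i][k] += X[i][j] * Y[j][k]
    return Z

n = 64
X = [[random.randint(-255, 255) for _ in range(n)] for _ in range(n)]
Y = [[random.randint(-255, 255) for _ in range(n)] for _ in range(n)]
A, B = np.array(X), np.array(Y)

# Timings only mean something if both versions compute the same product.
assert np.array_equal(np.array(dot_product(X, Y)), A.dot(B))

# Best of 3 repeats; matrix creation is outside the timed statements.
t_python = min(timeit.repeat(lambda: dot_product(X, Y), number=1, repeat=3))
t_numpy = min(timeit.repeat(lambda: A.dot(B), number=10, repeat=3)) / 10
print(f"pure Python: {t_python * 1e3:.1f} ms, NumPy: {t_numpy * 1e6:.1f} µs")
```

Taking the minimum over repeats is a common way to reduce noise from other processes. The absolute numbers will depend on the machine.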


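The `cProfile` module profiles Python code at the granularity of function and method calls. For every function, it records the number of calls (`ncalls`), the time spent in the function itself (`tottime`), and the time including the functions it calls (`cumtime`). Here we run the source of `speedup.py` under it and sort the report by `tottime`: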

In [8]:
import cProfile

# Read the script (closing the file afterwards) and execute it under the profiler.
with open("speedup.py") as f:
    source = f.read()
cProfile.run(source, sort="tottime")

         41389 function calls in 2.188 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       16    2.166    0.135    2.166    0.135 <string>:22(dot_product)
     8192    0.007    0.000    0.014    0.000 random.py:170(randrange)
     8192    0.006    0.000    0.007    0.000 random.py:220(_randbelow)
     8192    0.003    0.000    0.018    0.000 random.py:214(randint)
      128    0.002    0.000    0.020    0.000 <string>:14(<listcomp>)
     8213    0.001    0.000    0.001    0.000 {method 'getrandbits' of '_random.Random' objects}
        1    0.001    0.001    2.187    2.187 <string>:32(bench)
        1    0.001    0.001    2.188    2.188 {built-in method builtins.exec}
     8192    0.001    0.000    0.001    0.000 {method 'bit_length' of 'int' objects}
       16    0.000    0.000    0.000    0.000 <string>:8(<listcomp>)
        2    0.000    0.000    0.020    0.010 <string>:10(random)
        1    0.000    0.000    2.187 
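In this report, `dot_product` accounts for about 2.17 s of the 2.19 s total, so the random-number generation is negligible by comparison. Even for this small script the table is long. Below is a sketch of collecting the same kind of profile programmatically with `cProfile.Profile` and printing only the most expensive entries through `pstats`. The functions are stand-ins for the ones in `speedup.py`, with a smaller hypothetical workload (`n=32`, `n_iter=4`) so the block runs quickly, and `random_matrix` is a hypothetical helper replacing `Matrix.random`.

```python
import cProfile
import io
import pstats
import random

def dot_product(X, Y):
    # Same triple loop as speedup.py, on plain lists of lists.
    n_xrows, n_xcols, n_ycols = len(X), len(X[0]), len(Y[0])
    Z = [[0] * n_ycols for _ in range(n_xrows)]
    for i in range(n_xrows):
        for j in range(n_xcols):
            for k in range(n_ycols):
                Z[i][k] += X[i][j] * Y[j][k]
    return Z

def random_matrix(n):
    # Hypothetical helper standing in for Matrix.random((n, n)).
    return [[random.randint(-255, 255) for _ in range(n)] for _ in range(n)]

def bench(n=32, n_iter=4):
    X, Y = random_matrix(n), random_matrix(n)
    for _ in range(n_iter):
        dot_product(X, Y)

profiler = cProfile.Profile()
profiler.enable()
bench()
profiler.disable()

# Sort by cumulative time and print only the 5 most expensive entries.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by `"cumulative"` surfaces the call chain that leads to the hot spot (`bench` → `dot_product`), while `"tottime"` (as used above) points directly at the function doing the work.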

In [9]:
%load_ext line_profiler

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler


In [10]:
from speedup import dot_product, bench
%lprun -f dot_product bench()
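`%lprun` reports, for each line of `dot_product`, how many times it ran (`Hits`), the time spent on it, and its share of the total (`% Time`). In a triple loop like this, the innermost statement executes `n_xrows * n_xcols * n_ycols` times, which is 64³ = 262,144 times per call for the default shape, so it is the natural place to look first. Below is a minimal sketch of one possible fix that assumes plain lists of lists instead of the `Matrix` class. The hypothetical `dot_product_hoisted` keeps the same loop order but looks up `X[i][j]`, `Y[j]` and `Z[i]` once per outer iteration instead of inside the innermost loop. This is a generic loop-hoisting optimization, not something prescribed by the profiler output.

```python
import random
import timeit

def dot_product(X, Y):
    # Original triple loop from speedup.py, on plain lists of lists.
    n_xrows, n_xcols, n_ycols = len(X), len(X[0]), len(Y[0])
    Z = [[0] * n_ycols for _ in range(n_xrows)]
    for i in range(n_xrows):
        for j in range(n_xcols):
            for k in range(n_ycols):
                Z[i][k] += X[i][j] * Y[j][k]
    return Z

def dot_product_hoisted(X, Y):
    # Hypothetical variant: same i-j-k order, but the row lookups and
    # X[i][j] are hoisted out of the innermost loop.
    n_ycols = len(Y[0])
    Z = []
    for x_row in X:
        z_row = [0] * n_ycols
        for x_ij, y_row in zip(x_row, Y):  # pairs X[i][j] with row Y[j]
            for k in range(n_ycols):
                z_row[k] += x_ij * y_row[k]
        Z.append(z_row)
    return Z

def random_matrix(n_rows, n_cols):
    return [[random.randint(-255, 255) for _ in range(n_cols)] for _ in range(n_rows)]

# Correctness first, on non-square shapes: (3 x 4) times (4 x 2).
A, B = random_matrix(3, 4), random_matrix(4, 2)
assert dot_product_hoisted(A, B) == dot_product(A, B)

X, Y = random_matrix(64, 64), random_matrix(64, 64)
for f in (dot_product, dot_product_hoisted):
    seconds = min(timeit.repeat(lambda: f(X, Y), number=1, repeat=3))
    print(f"{f.__name__}: {seconds * 1e3:.1f} ms")
```

Profiling the new version again with `%lprun` is the way to confirm whether the change actually pays off on your machine.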