#Where are the Bottlenecks?

##Dynamic Typing

E.g., when evaluating `a + b`, Python has to determine the types of the operands before it can carry out the operation.
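
To make this concrete, here is a minimal sketch showing that the same `a + b` expression resolves to different operations depending on the run-time types of its operands. The helper name `add` is hypothetical, introduced only for illustration.

```python
def add(a, b):
    # Python cannot know what "+" means until it inspects the types of
    # a and b at run time, so this one line does three different jobs.
    return a + b

print(add(1, 2))          # integer addition
print(add("foo", "bar"))  # string concatenation
print(add([1, 2], [3]))   # list concatenation

# "+" is first looked up on the type of the left operand.
print(type(1).__add__(1, 2))
```

This lookup is repeated on every pass through a loop, which is one reason tight numerical loops in pure Python are slow.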

##Data Access

###Summing in Pure Python

#Vectorization

Vectorization is about sending batches of related operations to native machine code, so that the per-element work runs in compiled code rather than in the Python interpreter.

##Operations on Arrays

In [1]:
import random
import numpy as np

In [4]:
%%timeit
n = 100000
sum = 0
for i in range(n):
    x = random.uniform(0, 1)
    sum += x**2

10 loops, best of 3: 74.5 ms per loop


In [5]:
%%timeit
n = 100000
x = np.random.uniform(0, 1, n)
np.sum(x**2)

100 loops, best of 3: 1.75 ms per loop


##Universal Functions

In [7]:
np.cos(1.0)

0.54030230586813965

In [8]:
np.cos(np.linspace(0, 1, 3))

array([ 1.        ,  0.87758256,  0.54030231])
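
The two calls above show that `np.cos` accepts both a scalar and an array. As a small check of what the array call does, the sketch below compares it with an explicit Python loop over `math.cos`. The variable names are illustrative only.

```python
import math
import numpy as np

points = np.linspace(0, 1, 3)   # the three points 0, 0.5 and 1

# Universal function: one call, applied element by element in compiled code.
vectorized = np.cos(points)

# Equivalent pure Python loop, one interpreter-level call per element.
looped = np.array([math.cos(p) for p in points])

print(np.allclose(vectorized, looped))  # True
```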

In [9]:
def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

In [10]:
grid = np.linspace(-3, 3, 1000)

In [11]:
-np.inf

-inf

In [14]:
%%timeit
m = -np.inf
for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z
print(m)

0.999981964109
1 loops, best of 3: 3.7 s per loop


In [16]:
%%timeit
x, y = np.meshgrid(grid, grid)
print(np.max(f(x, y)))

0.999981964109
10 loops, best of 3: 72.3 ms per loop
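
To see why the `np.meshgrid` version matches the double loop, it helps to look at a tiny grid. This sketch uses an illustrative three-point grid named `small` in place of the 1000-point `grid`. It suggests that `x` and `y` together enumerate every pair of grid points, so a single call `f(x, y)` evaluates `f` at all of them.

```python
import numpy as np

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

small = np.linspace(-3, 3, 3)      # three points: -3, 0, 3
x, y = np.meshgrid(small, small)   # two 3x3 arrays

print(x)  # each row is a copy of small
print(y)  # each column is a copy of small

# Entry [i, j] of f(x, y) corresponds to f(small[j], small[i]).
z = f(x, y)

# The maximum over the array agrees with the maximum from a double loop.
loop_max = max(f(a, b) for a in small for b in small)
print(np.isclose(np.max(z), loop_max))
```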


#Numba

Numba aims to compile functions to native machine code automatically, on the fly.

The process isn’t flawless: Numba needs to infer type information for every variable in order to generate pure machine instructions, and that inference is not always possible.
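
As a rough illustration of where type inference can break down, the sketch below compiles a function whose return type depends on a run-time value. It assumes a reasonably recent Numba release that provides `njit` and `numba.core.errors.TypingError`. The function name `mixed_return` is hypothetical, and the import guard is only there so the snippet still runs where Numba is not installed.

```python
try:
    from numba import njit
    from numba.core.errors import TypingError
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

def mixed_return(flag):
    # The return type depends on a run-time value, so no single
    # machine-level type can be inferred for the result.
    if flag:
        return 1.0
    return "one"

if HAVE_NUMBA:
    compiled = njit(mixed_return)   # compilation is deferred to the first call
    try:
        compiled(True)
        outcome = "compiled"
    except TypingError:
        outcome = "type inference failed"
else:
    outcome = "numba not installed"

print(outcome)
```

In plain Python the function runs without complaint. With Numba in nopython mode, the float and string return values cannot be unified into one type, so the first call is expected to fail with a typing error.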



##An Example

$x_{t+1} = 4x_t(1-x_t)$
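
As a quick worked example of this recurrence, starting from $x_0 = 0.1$ (the initial value used in the timing runs in this section), the first few iterates are:

$x_1 = 4(0.1)(1 - 0.1) = 0.36$

$x_2 = 4(0.36)(1 - 0.36) = 0.9216$

$x_3 = 4(0.9216)(1 - 0.9216) \approx 0.2890$

Each value depends on the previous one, so the loop cannot be replaced by a single array operation. That makes it a natural candidate for compilation.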

In [17]:
from numba import jit

In [24]:
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x

In [27]:
qm_numba = jit(qm)

In [25]:
%%timeit
qm(0.1, 100000)

10 loops, best of 3: 84.8 ms per loop


In [28]:
%%timeit
qm_numba(0.1, 100000)

The slowest run took 417.59 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 346 µs per loop


#Cython

The Cython language can be thought of as Python with type definitions.

##A First Example

$\displaystyle{\sum_{i=0}^n \alpha^i}$

In [1]:
%load_ext Cython

In [2]:
import numpy as np

In [3]:
%%cython
def geo_prog_cython(double alpha, int n):
    cdef double current = 1.0
    cdef double sum = current
    cdef int i
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum

LinkError: command 'C:\\MinGW\\bin\\gcc.exe' failed with exit status 1
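
Since the cell above failed to link on this machine, a pure Python sketch of the same loop may be useful as a reference for what `geo_prog_cython` is meant to compute. The name `geo_prog_python` is assumed for illustration. The result is checked against the closed form $(1 - \alpha^{n+1}) / (1 - \alpha)$, which holds for $\alpha \neq 1$.

```python
def geo_prog_python(alpha, n):
    # Same loop as the Cython cell, without the cdef type declarations.
    current = 1.0
    total = current
    for i in range(n):
        current = current * alpha
        total = total + current
    return total

alpha, n = 0.5, 10
closed_form = (1 - alpha**(n + 1)) / (1 - alpha)  # valid for alpha != 1
print(geo_prog_python(alpha, n), closed_form)
```

The only difference in the Cython version is the type declarations (`double alpha`, `int n`, `cdef double`, `cdef int`), which let the loop be compiled to C without per-iteration type checks.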

##Example 2: Cython with NumPy Arrays

In [11]:
%%cython
import numpy as np
def qm_cython1(double x0, int n):
    cdef int t
    x = np.zeros(n+1, float)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)

CompileError: command 'C:\\cygwin\\bin\\gcc.exe' failed with exit status 1