# Introduction to NumPy

## Why Bother 

Why bother with NumPy? Python is already a great language with built-in support for numbers, and the standard library even ships a `math` module with fancier functions. So why bother?

In this notebook we'll show just one reason: performance. NumPy is a fast library specialized for numerical operations, and we'll encounter many of those.

## Lists, NumPy and Performance 

A downside of Python is that it is dynamically typed: you don't declare the types of your variables while writing code, so the interpreter has to work them out at run time, and that costs performance. The `numpy` module counters this by giving us an API of Python objects whose operations are carried out by highly optimized C code, so we can still write Python for performance-sensitive numerical problems.
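As a rough illustration of the typing difference described above, the sketch below contrasts a Python list, whose elements can each be a different kind of object, with a NumPy array, which stores every element with one fixed type. The variable names are made up for this example.

```python
import numpy as np

# A Python list can hold objects of any type; the interpreter checks
# each element's type at run time.
mixed = [1, 2.5, "three"]
element_types = [type(x).__name__ for x in mixed]

# A NumPy array stores all of its elements with a single, fixed dtype.
arr = np.arange(5, dtype=np.int64)

print(element_types)  # ['int', 'float', 'str']
print(arr.dtype)      # int64
print(arr.itemsize)   # 8 (bytes per element)
print(arr.nbytes)     # 40 (5 elements * 8 bytes)
```

Because the element type is known up front, NumPy can hand the whole array to compiled code instead of inspecting each value individually.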

Let's run a small experiment to see if we can measure this. We can use the `%%timeit` notebook magic.

In [None]:
import math
import numpy as np

In [None]:
%%timeit 
sum(range(100000))

In [None]:
%%timeit 
sum([i for i in range(100000)])

In [None]:
%%timeit
np.sum(np.arange(100000))

In [None]:
%%timeit
np.sum(np.arange(0,100000))

In [None]:
%%timeit
np.sum([i for i in range(100000)])

In [None]:
%%timeit
sum(np.arange(100000))

In [None]:
%%timeit 
s = 0 
for i in range(100000):
    s += i

## First Observations

1. The code written in pure NumPy (only `np.sum` and `np.arange`) is much faster than its pure Python counterpart, often by an order of magnitude or more.
2. The advantage disappears once we mix base Python code with NumPy code, so it pays to keep numerical operations on the NumPy side. For example, calling the built-in `sum` function on a NumPy array is the worst case!
3. At this size every variant still finishes quickly, but the gap matters much more once we work with large datasets.
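Outside a notebook, a comparable measurement can be sketched with the standard library's `timeit` module. This is only a minimal sketch: the helper `time_ms` and the labels are names invented for this example, and unlike the cells above the array is created once, outside the timed code, so the absolute numbers will differ from the `%%timeit` results.

```python
import timeit

import numpy as np

N = 100_000
# Explicit 64-bit dtype so the total cannot overflow a 32-bit integer.
arr = np.arange(N, dtype=np.int64)


def time_ms(func, repeats=20):
    """Return the average time per call in milliseconds (hypothetical helper)."""
    return timeit.timeit(func, number=repeats) / repeats * 1000


timings = {
    "sum(range)": time_ms(lambda: sum(range(N))),     # pure Python
    "np.sum(array)": time_ms(lambda: np.sum(arr)),    # pure NumPy
    "sum(array)": time_ms(lambda: sum(arr)),          # Python loop over a NumPy array
}

for label, ms in timings.items():
    print(f"{label:>15}: {ms:.3f} ms")
```

The exact timings depend on the machine and the NumPy version, so no expected output is shown here.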

## Other Benchmarks

Similar conclusions hold for vectorized operations, as the following benchmarks show.

In [None]:
%%timeit 
[ 2 * i for i in range(100000) ]

In [None]:
%%timeit
np.arange(100000) * 2

In [None]:
%%timeit 
[ math.sqrt(i) for i in range(100000) ]

In [None]:
%%timeit 
np.sqrt( np.arange(100000) )

NumPy runs precompiled C code and low-level BLAS/Fortran routines. Combining it with element-by-element Python code is bound to add overhead, because values have to cross between NumPy's typed arrays and Python objects.
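The benchmarks above only compare speed. As a small sanity check, the sketch below confirms that the vectorized expressions compute the same values as the list comprehensions. It uses a small `n` and variable names chosen only for this example.

```python
import math

import numpy as np

n = 10

# Doubling: list comprehension versus a vectorized multiplication.
doubled_list = [2 * i for i in range(n)]
doubled_array = np.arange(n) * 2

# Square roots: math.sqrt per element versus np.sqrt on the whole array.
roots_list = [math.sqrt(i) for i in range(n)]
roots_array = np.sqrt(np.arange(n))

print(np.array_equal(doubled_list, doubled_array))  # True
print(np.allclose(roots_list, roots_array))         # True
```

`np.allclose` is used for the square roots because floating-point results are better compared with a tolerance than with exact equality.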

## Conclusion

It's not the only reason, but NumPy is popular because it is **fast**. The Python code that drives it is still bound by Python's GIL, but NumPy delegates the numerical work to very low-level code, which makes those operations very fast.