# 01. NumPy Intro

![numpy logo](https://upload.wikimedia.org/wikipedia/en/thumb/8/82/Logo_of_NumPy.svg/440px-Logo_of_NumPy.svg.png)

Website: http://www.numpy.org/

## A. What does NumPy Provide?

1. Efficient multi-dimensional array (ndarray)
2. Fast mathematical operations on ndarrays

We'll begin by importing the numpy package and printing the version number:

In [None]:
import numpy as np
print(f'numpy version: {np.__version__}')
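As a quick illustration of the two points above, here is a minimal sketch of an ndarray and a couple of vectorized operations on it. The array name `grid` and its values are made up for this example.

```python
import numpy as np

# A small 2-D ndarray (2 rows, 3 columns); the values are arbitrary.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)  # (2, 3)
print(grid.ndim)   # 2

# Arithmetic applies element-wise, with no explicit Python loop.
doubled = grid * 2
print(doubled.tolist())  # [[2, 4, 6], [8, 10, 12]]

# Reductions can run along a chosen axis; axis=1 sums each row.
row_sums = grid.sum(axis=1)
print(row_sums.tolist())  # [6, 15]
```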

## B. Why do we need NumPy?

Pure Python code can be slow when performing large amounts of numerical computation. Let's time a simple summation of all integers between zero and a million.

### Example 1: The basic sum

$$total = 0 + 1 + 2 + \dots + 1{,}000{,}000$$

or in shorter notation:

$$total = \sum_{i=0}^{1{,}000{,}000} i$$

In [None]:
data = list(range(1000001))
print(f'{data[:3]} ... {data[-3:]}')
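As a sanity check for the timings that follow, this sum has a well-known closed form, $n(n+1)/2$. The sketch below computes the expected value once so that any implementation can be compared against it; the names `n` and `expected_total` are introduced here only for illustration.

```python
# Closed form for 0 + 1 + ... + n, used as a reference value.
n = 1_000_000
expected_total = n * (n + 1) // 2
print(expected_total)  # 500000500000

# The straightforward summation agrees with the closed form.
assert sum(range(n + 1)) == expected_total
```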

#### Attempt 1

In [None]:
%%timeit
total = 0
for i in range(len(data)):
    total += data[i]

#### Attempt 2
What if we iterate over the elements directly instead of indexing into the list?

In [None]:
%%timeit
total = 0
for d in data:
    total += d

#### Attempt 3
Okay, that is a little better. But "Aha!" you say, there is a built-in function called sum() we should be using!

In [None]:
%%timeit
sum(data)

#### NumPy way
Now let's do the same thing in NumPy:

In [None]:
array = np.arange(1000001, dtype=np.int32)
print(f'array: {array}')

In [None]:
%%timeit
np.sum(array)

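On my Mac laptop, the NumPy version is roughly 100x faster than our first loop-based implementation, and roughly 10x faster than Python's built-in sum() function. The exact ratios will vary with your hardware and with your Python and NumPy versions.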
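Outside a notebook, where the `%%timeit` magic is not available, a similar comparison can be sketched with the standard library's `timeit` module. The helper `loop_sum`, the variable names, and the choice of 5 runs are assumptions made for this example, and the numbers it prints depend entirely on the machine it runs on.

```python
import timeit

import numpy as np

n = 1_000_001
values = list(range(n))
# int64 is requested explicitly so the sum cannot overflow on any platform.
values_array = np.arange(n, dtype=np.int64)


def loop_sum(items):
    """Sum the items with an explicit Python loop."""
    total = 0
    for item in items:
        total += item
    return total


runs = 5  # arbitrary; more runs give steadier averages
loop_time = timeit.timeit(lambda: loop_sum(values), number=runs) / runs
builtin_time = timeit.timeit(lambda: sum(values), number=runs) / runs
numpy_time = timeit.timeit(lambda: np.sum(values_array), number=runs) / runs

print(f'loop:    {loop_time * 1e3:.3f} ms per run')
print(f'builtin: {builtin_time * 1e3:.3f} ms per run')
print(f'numpy:   {numpy_time * 1e3:.3f} ms per run')
```

Averaging over a handful of runs is a deliberately simple choice; `%%timeit` picks the number of loops adaptively and reports the spread as well.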

### Example 2: The dot product

Let's try a slightly more complex operation that often appears in statistics and machine learning: the weighted sum, a.k.a. the dot product:

$$dot(w, x) = \sum_{i=0}^{1{,}000{,}000} w_{i}x_{i}$$
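To see what the formula computes, here is a tiny worked instance on three-element vectors. The names `weights_small` and `values_small` and their contents are hypothetical, chosen so the arithmetic is easy to follow by hand: $1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32$.

```python
# Hypothetical three-element inputs, small enough to check by hand.
weights_small = [1, 2, 3]
values_small = [4, 5, 6]

# Multiply matching elements, then add up the products.
products = [w * x for w, x in zip(weights_small, values_small)]
print(products)  # [4, 10, 18]

dot_small = sum(products)
print(dot_small)  # 32
```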

#### Exercise: Python implementation

Write code for this operation in the cell below. A sample implementation is provided in the hidden cell, but try it on your own before peeking!

In [None]:
weights = list(range(1000001))

In [None]:
%%timeit
# Sample implementation
dot_product = sum(w*x for w,x in zip(weights, data))

In [None]:
%%timeit
# Write your code below


#### Exercise: Numpy implementation

Now let's do it the NumPy way with the library function called dot().

dot documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html

In [None]:
# Each weights_array[i] will be multiplied by array[i] in the summation
array = np.arange(1000001, dtype=np.int32)
weights_array = np.arange(1000001, dtype=np.int32)

In [None]:
# Sample implementation
numpy_dot_product = np.dot(weights_array, array)
print(numpy_dot_product)

In [None]:
numpy_dot_product = None  # Put your code here
print(numpy_dot_product)

You probably just saw a negative result, which clearly isn't correct. The true value is far too large to fit in a 32-bit integer, so the computation overflowed and wrapped around.

This is one of the tradeoffs one makes when using NumPy. While it can detect things like underflow and overflow in some situations, it does not always do so, and one must be careful! Let's switch to 64-bit ints, which is the default on most platforms.
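To make the overflow concrete, the sketch below computes the same dot product with 32-bit and 64-bit arrays and compares both against Python's arbitrary-precision integers. The names `a32`, `a64`, `wrapped`, and `exact` are introduced only for this illustration, and the 64-bit dtype is requested explicitly instead of relying on the platform default.

```python
import numpy as np

n = 1_000_001

# Reference value using Python ints, which never overflow.
expected = sum(i * i for i in range(n))
print(expected)  # 333333833333500000

# 32-bit inputs: the result stays int32 and cannot hold the true value.
a32 = np.arange(n, dtype=np.int32)
wrapped = np.dot(a32, a32)
print(wrapped.dtype, int(wrapped) == expected)  # int32 False

# 64-bit inputs: the true value fits comfortably below 2**63.
a64 = np.arange(n, dtype=np.int64)
exact = np.dot(a64, a64)
print(exact.dtype, int(exact) == expected)  # int64 True
```

No warning is raised in the 32-bit case, so checking the dtype against the largest value you expect is the practical safeguard.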

In [None]:
# Weights, each weights_array[i] will be multiplied by array[i] in the summation
array = np.arange(1000001)
weights_array = np.arange(1000001)

numpy_dot = None  # Put your code here
python_dot = sum(w*x for w,x in zip(weights, data))

print(f'numpy array dtype: {array.dtype}')
print(f'python_dot: {python_dot}, numpy_dot: {numpy_dot}, equal? {python_dot == numpy_dot}')

In [None]:
%%timeit
numpy_dot = None  # Put your code here

Pretty fast!