# Numpy
NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays. To make NumPy available in the current session, we use the `import` statement.

In [1]:
import numpy as np

Note that above we imported the numpy package "as np". This is for convenience; it allows us to use `np` as a prefix instead of `numpy`. NumPy is in very widespread use, and the convention is to use the `np` abbreviation.


### Numpy Arrays

The core of NumPy is the `ndarray` (n-dimensional array) data structure. Just as we converted between lists, tuples, and other data types earlier, we can convert a list to a NumPy array using `np.array()`. A NumPy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

In [2]:
# Create a NumPy array from a list
arr = np.array([1, 2, 3, 4])
print(arr)
print(type(arr))
print(arr.shape)
print("------------------")
mat = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(mat)
print(type(mat))
print(mat.shape)

[1 2 3 4]
<class 'numpy.ndarray'>
(4,)
------------------
[[1 2 3]
 [4 5 6]
 [7 8 9]]
<class 'numpy.ndarray'>
(3, 3)
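
The rank, shape, and common element type described above can be read directly from array attributes. The following is a minimal sketch; the names `grid` and `mixed` are illustrative and not part of the original notebook.

```python
import numpy as np

# A 2 x 3 array built from a nested list (values are illustrative).
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(grid.ndim)   # number of dimensions (the rank): 2
print(grid.shape)  # size along each dimension: (2, 3)
print(grid.dtype)  # the single element type shared by all values

# Mixing ints and floats still yields one common type: the ints are upcast.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```

The integer dtype printed for `grid` depends on the platform, so no specific value is shown for it here.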


Values in an ndarray are accessed with square brackets. The cells below first define a new `mat` and then index into `arr` and `mat`.

In [3]:
mat = np.array([[10,20,30], [10,50,60]])
print(mat.shape)
print(mat.max())
print(mat.min())

(2, 3)
60
10


In [4]:
mat

array([[10, 20, 30],
       [10, 50, 60]])

In [5]:
print(arr[0])
arr[0] = 10
print(arr)
print(mat[1])
print(mat[1,1])
print(mat[1][1])

1
[10  2  3  4]
[10 50 60]
50
50
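
Both `mat[1,1]` and `mat[1][1]` return the same element, but they get there differently. A small sketch of the difference, using the same values as `mat` above (the names `idx` and `row` are illustrative):

```python
import numpy as np

mat = np.array([[10, 20, 30], [10, 50, 60]])

# mat[1, 1] passes the tuple (1, 1) as a single index.
idx = (1, 1)
print(mat[idx])   # 50
print(mat[1, 1])  # 50

# mat[1][1] first selects row 1, then takes element 1 of that row.
row = mat[1]
print(row)        # [10 50 60]
print(row[1])     # 50
```

The tuple form does the lookup in one step, so it is usually the preferred spelling for multi-dimensional arrays.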


NumPy arrays support a wide range of methods and operations, such as the reductions below.

In [6]:
mat

array([[10, 20, 30],
       [10, 50, 60]])

In [7]:
print(mat.max())
print(mat.min())
print(mat.sum())
print(mat.mean())
print(mat.std())

60
10
180
30.0
19.148542155126762


There are other ways to make ndarrays:

In [8]:
n = 10
np.zeros(n)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [9]:
np.ones(n)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [10]:
np.zeros_like(arr) # create an ndarray of zeros with the same shape and dtype as arr

array([0, 0, 0, 0])

In [11]:
np.eye(4) # creates a 4x4 identity matrix

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [12]:
np.random.random(size=(5,5)) # creates a 5x5 array of random values drawn uniformly from [0, 1)

array([[0.09063198, 0.18619867, 0.32152286, 0.66123424, 0.12126965],
       [0.98079443, 0.37748913, 0.99611816, 0.60810851, 0.76431639],
       [0.08364057, 0.34221035, 0.84732686, 0.94358561, 0.47481909],
       [0.50844952, 0.0175807 , 0.14737426, 0.77773729, 0.26954927],
       [0.18253752, 0.66755028, 0.71883999, 0.6000674 , 0.06594034]])

In [13]:
np.random.randn(20) # 20 samples drawn from the standard normal distribution

array([-0.02722489, -1.58288385,  0.01531619,  0.08855313,  0.8839647 ,
       -0.18143541, -2.48066214, -0.14795928, -0.54211757, -0.73239043,
        1.2080542 ,  1.27680808, -0.70083397,  1.77585119,  1.44593176,
        0.01383634,  2.75598138,  1.33142476,  0.77843285, -1.56463628])
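
The constructors above also accept a shape tuple, and a few related functions cover other common cases. This is a short sketch with illustrative names (`zeros_2d`, `sevens`, `evens`), not an exhaustive list:

```python
import numpy as np

# Shapes can be passed as tuples to build multi-dimensional arrays.
zeros_2d = np.zeros((2, 3))
sevens = np.full((2, 2), 7)   # every entry set to the given value
evens = np.arange(0, 10, 2)   # start, stop (exclusive), step

print(zeros_2d.shape)  # (2, 3)
print(sevens)
print(evens)           # [0 2 4 6 8]
```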

### Broadcasting
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [14]:
mat = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [15]:
mat

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [16]:
mat = np.vstack((mat, [10,11,12]))
print(mat)
vec = np.array([-1,0,1])
print(vec)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[-1  0  1]


In [17]:
result = np.zeros_like(mat)

for row in range(4):
    result[row] = mat[row] + vec
    
print(result)

[[ 0  2  4]
 [ 3  5  7]
 [ 6  8 10]
 [ 9 11 13]]


While this works, explicit loops in Python are often slow and should generally be avoided. Notice, however, that adding `vec` to each row of `mat` is the same as stacking 4 copies of `vec` vertically into a matrix and then adding that matrix to `mat` elementwise.

In [18]:
mat_vec = np.tile(vec, (4,1))
print(mat_vec)

[[-1  0  1]
 [-1  0  1]
 [-1  0  1]
 [-1  0  1]]


In [19]:
result = mat + mat_vec
print(result)

[[ 0  2  4]
 [ 3  5  7]
 [ 6  8 10]
 [ 9 11 13]]


Broadcasting allows us to perform this type of computation without making any copies.

In [20]:
mat + vec

array([[ 0,  2,  4],
       [ 3,  5,  7],
       [ 6,  8, 10],
       [ 9, 11, 13]])

In [21]:
vec.shape

(3,)

In [22]:
mat.shape

(4, 3)

The expression `mat + vec` works even though `mat.shape` is `(4, 3)` and `vec.shape` is `(3,)`, because of broadcasting.
Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
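
The rules above can be sketched as a small function that works on shape tuples. `broadcast_shape` is a hypothetical helper written for illustration, not a NumPy function; the arrays reuse the values of `mat` and `vec` from the example.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the rules above to two shape tuples."""
    # Rule 1: left-pad the shorter shape with 1s.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        # Rules 2 and 3: sizes must match, or one of them must be 1.
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
        # Rule 4: the result takes the size that is not 1.
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(result)

mat = np.arange(1, 13).reshape(4, 3)
vec = np.array([-1, 0, 1])

print(broadcast_shape(mat.shape, vec.shape))  # (4, 3)
print((mat + vec).shape)                      # (4, 3)

# Rule 5: vec behaves as if copied along the new axis, but no copy is made.
# np.broadcast_to returns a read-only view whose stride along that axis is 0.
view = np.broadcast_to(vec, mat.shape)
print(view.strides[0])  # 0
```

A stride of 0 means every row of `view` reads the same memory as `vec`, which is how broadcasting avoids materializing the stacked copies.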

The [documentation](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) contains additional examples and explanations. 

Functions that operate elementwise on arrays and support broadcasting are known as universal functions (ufuncs). You can find the list of all universal functions in the documentation.
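
As a brief illustration of universal functions, `np.add` and `np.maximum` are two that NumPy provides. The arrays `mat`, `vec`, and `floor` below are illustrative values chosen for this sketch.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])
vec = np.array([-1, 0, 1])

# np.add is the universal function behind the + operator on arrays.
print(isinstance(np.add, np.ufunc))                 # True
print(np.array_equal(np.add(mat, vec), mat + vec))  # True

# np.maximum is another ufunc; floor is broadcast across each row of mat.
floor = np.array([2, 2, 5])
print(np.maximum(mat, floor))
# [[2 2 5]
#  [4 5 6]]
```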

## NumPy arrays vs Python lists

You should use NumPy implementations whenever possible. NumPy stores an array in a homogeneous, contiguous block of memory, unlike regular Python lists, whose elements are separate objects that may be scattered across system memory. Spatial locality in memory access yields performance gains, notably through the CPU cache, and allows NumPy to take advantage of the vectorized instructions of modern CPUs. In addition, a large part of NumPy is written in C, so the performance boost when using NumPy will be significant and well worth your while. For example, run the following blocks of code:

In [23]:
arr1 = np.random.choice(10, size=10_000_000)
arr2 = np.random.choice(10, size=10_000_000)

In [24]:
%%time
naive_dot = 0
for i in range(10_000_000):
    naive_dot += arr1[i] * arr2[i]

Wall time: 4.32 s


In [25]:
%%time
numpy_dot = arr1.dot(arr2)

Wall time: 5.99 ms


In [26]:
numpy_dot == naive_dot

True
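
The loop above indexes NumPy arrays from Python. The same comparison can be sketched against actual Python lists, using `time.perf_counter` so that it runs outside a notebook. The array size, the seeded generator, and the variable names are assumptions made for this sketch, and the timings will vary by machine.

```python
import time
import numpy as np

n = 1_000_000  # smaller than the 10,000,000 used above, to keep the run short
rng = np.random.default_rng(0)  # seeded generator, assumed here for repeatability
arr1 = rng.integers(0, 10, size=n)
arr2 = rng.integers(0, 10, size=n)
list1, list2 = arr1.tolist(), arr2.tolist()  # plain Python lists of ints

# Dot product with pure Python lists.
start = time.perf_counter()
list_dot = sum(x * y for x, y in zip(list1, list2))
list_seconds = time.perf_counter() - start

# Dot product with NumPy.
start = time.perf_counter()
numpy_dot = int(arr1.dot(arr2))
numpy_seconds = time.perf_counter() - start

print(list_dot == numpy_dot)
print(f"list: {list_seconds:.4f} s, numpy: {numpy_seconds:.4f} s")
```

Both versions compute the same value; only the time taken differs.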