## Using NumPy to efficiently work with large, multi-dimensional data
This part of the course is a short introduction to NumPy, a widely used Python library that relies on compiled C code to process multi-dimensional arrays of data efficiently. You don't need to learn C to benefit from this: NumPy handles the low-level work for you.

This part of the course will cover:
 - What NumPy arrays are.
 - Creating NumPy arrays.
 - Basic vectorized operations.
 - Accessing NumPy array values.

### What are NumPy arrays?
While a Python list can contain different data types, all of the elements in a NumPy array must share a single data type. This homogeneity is what lets NumPy store and process the data efficiently.

An array is a grid of values, together with information about the raw data and how to locate an element. The elements are all of the same type, referred to as the array's dtype. Note that the dtype is a NumPy type such as int64 or float64, not one of the built-in Python types.
 
The rank of the array is the number of dimensions the array has. 
The shape of the array is a tuple of integers giving the size of the array along each dimension.
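
To make the homogeneity rule concrete, here is a minimal sketch (the variable names are just for illustration) comparing a Python list of mixed numbers with the array NumPy builds from it:

```python
import numpy as np

# A Python list can hold values of different types side by side.
mixed_list = [1, 2.5, 3]
print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'int']

# NumPy converts the same values to one common dtype (float64 here).
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)  # float64
print(mixed_array)        # [1.  2.5 3. ]
```

The integers are stored as floats because that is the one type able to represent every value in the list.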

### Creating NumPy arrays

Let's create some NumPy arrays and explore their structure:


In [2]:
# let's print the arrays we create, along with array metadata
def describe_np(array):
    # tuple of integers giving the size of the array along each dimension
    print(f'shape: {array.shape}')
    # rank (number of dimensions)
    print(f'rank: {array.ndim}')
    # type of data
    print(f'dtype: {array.dtype.name}')
    # size in bytes of each element in the array
    print(f'itemsize: {array.itemsize}')
    # total number of elements in the array
    print(f'size: {array.size}')
    print('-'*20)
    print(f'a:{array}')
    print('-'*20)
    

I've created a function to display notable information about a NumPy array: its shape, number of dimensions, element data type, size of each element in bytes, number of elements, and the array itself.

In [3]:
import numpy as np

# create a NumPy array with the numbers from 0 to 14
# of rank 2
# with the first dimension of size 3,
# and second dimension of size 5

a = np.arange(15).reshape(3, 5)
describe_np(a)



shape: (3, 5)
rank: 2
dtype: int64
itemsize: 8
size: 15
--------------------
a:[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
--------------------


Here we've created a NumPy array of 15 integers, arranged in 3 rows and 5 columns.
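
As a small sketch of how reshape behaves, assuming the same 15 values as above: any target shape works as long as the dimension sizes multiply to the number of elements, and -1 lets NumPy infer one dimension.

```python
import numpy as np

values = np.arange(15)  # 15 elements: 0, 1, ..., 14

# Any shape works as long as the sizes multiply to 15.
tall = values.reshape(5, 3)
print(tall.shape)  # (5, 3)

# -1 asks NumPy to infer that dimension from the total size.
inferred = values.reshape(3, -1)
print(inferred.shape)  # (3, 5)

# A shape that does not fit the data raises a ValueError.
try:
    values.reshape(4, 4)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print(f'reshape failed: {error}')
```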

In [4]:
# create a NumPy array from a Python list
a = np.array([2, 3, 4])
describe_np(a)

shape: (3,)
rank: 1
dtype: int64
itemsize: 8
size: 3
--------------------
a:[2 3 4]
--------------------


We can create a NumPy array from a Python sequence such as a list or tuple.

In [5]:
b = np.array([1.2, 3.5, 5.1])
# notice dtype is float64
describe_np(b)

shape: (3,)
rank: 1
dtype: float64
itemsize: 8
size: 3
--------------------
a:[1.2 3.5 5.1]
--------------------


NumPy will do its best to give your array a sensible type, in this case float64.

In [6]:
# NumPy will automatically convert a sequence of sequences to a 2-dimensional array
# the same applies to more deeply nested sequences in higher dimensions
c = np.array([(1.5, 2, 3), (4, 5, 6)])
describe_np(c)

shape: (2, 3)
rank: 2
dtype: float64
itemsize: 8
size: 6
--------------------
a:[[1.5 2.  3. ]
 [4.  5.  6. ]]
--------------------


We can also use nested sequences; NumPy will create an array with the appropriate shape. Because one of the values is a float, every element is stored as float64.
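
The same idea extends to deeper nesting. As a brief sketch (the name cube is only illustrative), three levels of nesting give a rank-3 array:

```python
import numpy as np

# Two blocks, each holding two rows of three values.
cube = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])
print(cube.shape)  # (2, 2, 3)
print(cube.ndim)   # 3
print(cube.size)   # 12
```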

In [8]:
# we can define the type when we create an array
d = np.array([1, 2], dtype=np.float64)
describe_np(d)

shape: (2,)
rank: 1
dtype: float64
itemsize: 8
size: 2
--------------------
a:[1. 2.]
--------------------


Here we've told NumPy to store this sequence of integers as floats.
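
The choice of dtype also determines how much memory each element takes. A minimal sketch, using two dtypes picked purely for illustration:

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int32)
large = np.array([1, 2, 3], dtype=np.float64)

# itemsize is the number of bytes used by a single element.
print(small.dtype.name, small.itemsize)  # int32 4
print(large.dtype.name, large.itemsize)  # float64 8

# Total bytes used by the elements: size * itemsize.
print(small.size * small.itemsize)  # 12
print(large.size * large.itemsize)  # 24
```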

In [9]:
# we can use zeros to create an array filled with zeros
# we have to pass the shape of the array to the zeros fn
e = np.zeros((2,3))
describe_np(e)

shape: (2, 3)
rank: 2
dtype: float64
itemsize: 8
size: 6
--------------------
a:[[0. 0. 0.]
 [0. 0. 0.]]
--------------------


NumPy provides the zeros function to initialise an array of a given shape with every value set to 0. The default dtype is float64.

In [10]:
# ones does the same
f = np.ones((2,3))
describe_np(f)

shape: (2, 3)
rank: 2
dtype: float64
itemsize: 8
size: 6
--------------------
a:[[1. 1. 1.]
 [1. 1. 1.]]
--------------------


NumPy also provides a similar ones function, which fills the array with 1 instead.
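
Both functions accept an optional dtype argument if float64 is not what we want. A short sketch, with illustrative variable names:

```python
import numpy as np

# A 2x3 array of integer zeros instead of the default floats.
counts = np.zeros((2, 3), dtype=np.int64)
print(counts.dtype.name)  # int64

# For a one-dimensional array the shape can be given as a plain integer.
flags = np.ones(4, dtype=bool)
print(flags)  # [ True  True  True  True]
```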

In [11]:
# we can use arange to create a sequence of integers - similar to range in Python
g = np.arange(4)
describe_np(g)

shape: (4,)
rank: 1
dtype: int64
itemsize: 8
size: 4
--------------------
a:[0 1 2 3]
--------------------


arange creates a sequence of integers using the same start, stop and step arguments as range in Python. Here we've created a NumPy array with the numbers 0-3.
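
To show the parallel with range, here is a small sketch using start, stop and step values chosen just for illustration:

```python
import numpy as np

# start and stop: the stop value itself is excluded, as with range
print(np.arange(2, 10))       # [2 3 4 5 6 7 8 9]

# start, stop and step
stepped = np.arange(2, 10, 3)
print(stepped)                # [2 5 8]

# a negative step counts down
print(np.arange(10, 0, -2))   # [10  8  6  4  2]

# the built-in range yields the same numbers
print(list(range(2, 10, 3)))  # [2, 5, 8]
```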

In [12]:
# use linspace to do the same for floating point sequences
h = np.linspace(0, 2, 9) # 9 numbers from 0 to 2
describe_np(h)

shape: (9,)
rank: 1
dtype: float64
itemsize: 8
size: 9
--------------------
a:[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
--------------------


If you'd like to create a sequence of floats, you generally get more predictable results with the linspace function, because you specify the number of elements directly rather than a step size.
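
A rough sketch of the difference, with values chosen for illustration. With linspace we state how many points we want and both endpoints are included. With arange and a float step, the number of elements is derived from the step and can be affected by floating-point rounding, so its length is printed here without assuming a value.

```python
import numpy as np

# linspace: ask for exactly 11 points, both endpoints included.
points = np.linspace(0, 1, 11)
print(len(points))            # 11
print(points[0], points[-1])  # 0.0 1.0

# retstep=True also returns the spacing that linspace used.
points, step = np.linspace(0, 1, 11, retstep=True)
print(step)                   # 0.1

# arange with a float step: the element count is computed from the step.
stepped = np.arange(0, 1, 0.1)
print(len(stepped))
```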

### Basic vectorized operations
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

In [14]:
a = np.array([20, 30, 40, 50])
a < 35

array([ True,  True, False, False])

We evaluate whether each item in a is less than 35, returning a new NumPy array with the boolean result of each comparison.

In [16]:
b = np.arange(4)  # [0, 1, 2, 3]
b * 2

array([0, 2, 4, 6])

We multiply each item in b by 2, returning a new NumPy array with the results.
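
Operators also work between two arrays of the same shape, pairing up elements by position. A minimal sketch, reusing the values from the cells above:

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)  # [0, 1, 2, 3]

print(a - b)   # [20 29 38 47]
print(a * b)   # [  0  30  80 150]
print(b ** 2)  # [0 1 4 9]

# The vectorized result matches an explicit Python loop over the pairs.
looped = [x - y for x, y in zip(a.tolist(), b.tolist())]
print(looped == (a - b).tolist())  # True
```

The loop and the vectorized expression give the same values; the vectorized form is shorter and runs the loop inside NumPy's compiled code.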

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

In [17]:
a = np.array([20, 30, 40, 50])
print(a.sum())
print(a.min())
print(a.max())

140
20
50


Here we get the sum, min and max of the a array, using methods built into every NumPy array.

In [18]:
# sum along one dimension
b = np.ones((2,3))
# sum all values across first dimension
describe_np(b.sum(axis = 0))

shape: (3,)
rank: 1
dtype: float64
itemsize: 8
size: 3
--------------------
a:[2. 2. 2.]
--------------------


We can use the axis parameter to specify which dimension a function is applied along. With axis=0 the values are summed down the first dimension (the rows), leaving one result per column.
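
An array of ones hides which values are being combined, so here is a short sketch with distinct values (chosen for illustration) that makes the effect of each axis easier to see:

```python
import numpy as np

b = np.arange(6).reshape(2, 3)
# b is:
# [[0 1 2]
#  [3 4 5]]

# axis=0 collapses the rows: one sum per column.
print(b.sum(axis=0))  # [3 5 7]

# axis=1 collapses the columns: one sum per row.
print(b.sum(axis=1))  # [ 3 12]

# Other methods such as max accept axis in the same way.
print(b.max(axis=1))  # [2 5]

# Without axis, every element is combined into a single value.
print(b.sum())        # 15
```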

### Accessing NumPy array values
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

In [19]:
a = np.arange(10)
describe_np(a[2])
print('')
describe_np(a[2:5])

shape: ()
rank: 0
dtype: int64
itemsize: 8
size: 1
--------------------
a:2
--------------------

shape: (3,)
rank: 1
dtype: int64
itemsize: 8
size: 3
--------------------
a:[2 3 4]
--------------------


One-dimensional slicing works like Python list slicing. Note that indexing a single element returns a rank-0 NumPy scalar rather than a built-in Python int.
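
The other list-style forms work too. A brief sketch of negative indices and slices with a step:

```python
import numpy as np

a = np.arange(10)

print(a[-1])    # 9 (last element)
print(a[:3])    # [0 1 2]
print(a[::2])   # [0 2 4 6 8] (every second element)
print(a[::-1])  # [9 8 7 6 5 4 3 2 1 0] (reversed)
```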

Multidimensional arrays can have one index per axis. These indices are given as a comma-separated tuple:

In [20]:
b = np.array([[1,2,3],[4,5,6]])
describe_np(b)
print('')
describe_np(b[0,0])
print('')
# all values in column 1
describe_np(b[:, 1])

shape: (2, 3)
rank: 2
dtype: int64
itemsize: 8
size: 6
--------------------
a:[[1 2 3]
 [4 5 6]]
--------------------

shape: ()
rank: 0
dtype: int64
itemsize: 8
size: 1
--------------------
a:1
--------------------

shape: (2,)
rank: 1
dtype: int64
itemsize: 8
size: 2
--------------------
a:[2 5]
--------------------


Here we grab a single value from b by specifying an index for each dimension.

We then grab a larger slice by specifying a slice in one dimension and an index in the other, which returns every value in column 1.
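
A few more combinations of indices and slices, sketched on the same array:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# A single index selects along the first axis, giving a whole row.
print(b[1])       # [4 5 6]

# The same row, written with an explicit slice for the columns.
print(b[1, :])    # [4 5 6]

# Slices on both axes return a smaller 2-dimensional array:
# all rows, first two columns.
block = b[:, 0:2]
print(block.shape)  # (2, 2)
print(block)

# Negative indices count from the end of each axis.
print(b[-1, -1])  # 6
```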

Iterating over multidimensional arrays is done with respect to the first axis:



In [21]:
b = np.array([[1,2,3],[4,5,6]])
for index, row in enumerate(b):
    print(f'row {index}: {row}')

row 0: [1 2 3]
row 1: [4 5 6]


Here we've iterated over all rows in b.

Alternatively, we can use the flat attribute to iterate over every element in an array:

In [22]:
b = np.array([[1,2,3],[4,5,6]])
for index, element in enumerate(b.flat):
    print(f'element {index}: {element}')

element 0: 1
element 1: 2
element 2: 3
element 3: 4
element 4: 5
element 5: 6


Here we've iterated over all values in b.
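
As a quick check of the order in which flat visits the elements, this sketch compares it with a nested loop over rows and then over the elements of each row:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# Nested loop: rows first, then the elements within each row.
nested = [int(element) for row in b for element in row]

# flat visits the elements in the same row-by-row order.
flat = [int(element) for element in b.flat]

print(nested)          # [1, 2, 3, 4, 5, 6]
print(nested == flat)  # True
```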