## NumPy for Python

Numerical Python (NumPy) is an open-source Python library for scientific computing. NumPy provides a host of features for working with high-performance, multidimensional arrays and matrices. NumPy arrays store their elements more compactly than Python lists (in one contiguous block of memory, all of the same type), and they allow mathematical operations to be vectorised: an operation is applied to the whole array in compiled code instead of in a Python loop. This typically makes them much faster than equivalent looping constructs in Python.

Read more here: https://tinyurl.com/yxz8qfjq
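To see what vectorisation means in practice, the sketch below squares the same 100,000 numbers twice: once with a Python loop over a list, and once with a single NumPy expression. The function names are illustrative only, and the printed timings will vary from machine to machine, although the vectorised version is usually much faster. The dtype is set to np.int64 explicitly so the squares cannot overflow on platforms whose default integer type is 32 bits.

```python
import timeit
import numpy as np

# The same data held as a Python list and as a NumPy array.
values_list = list(range(100_000))
values_array = np.arange(100_000, dtype=np.int64)

def square_with_loop(values):
    """Square every element using an explicit Python loop."""
    return [v * v for v in values]

def square_vectorised(values):
    """Square every element in one vectorised NumPy operation."""
    return values * values

# Both approaches produce the same numbers.
assert square_with_loop(values_list) == square_vectorised(values_array).tolist()

# Time each approach (5 runs each); exact timings depend on your machine.
loop_time = timeit.timeit(lambda: square_with_loop(values_list), number=5)
vec_time = timeit.timeit(lambda: square_vectorised(values_array), number=5)
print(f'Loop: {loop_time:.3f}s, vectorised: {vec_time:.3f}s')
```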

Let's start by importing the modules we will be using in this notebook.

In [3]:
# Import numpy and random modules.
import random
import numpy as np

## Creating numpy arrays
There are several ways to create NumPy arrays; some examples are provided below.

In [4]:
a1 = np.array([1, 2, 3, 4, 5])             # create an array from a list.
a2 = np.array(range(20))                   # create an array from a range.
a3 = np.arange(0, 100, 5)                  # 0 up to (but not including) 100, in steps of 5.
a4 = np.zeros(10, dtype=int)               # an array containing 10 zeros of type integer.
a5 = np.linspace(1, 100, 6)                # 6 equally spaced values from 1 to 100 inclusive.
a6 = [random.random() for _ in range(100)] # a Python list (not a NumPy array) of 100 random floats.

Print out the arrays below and check that each contains what you would expect. Change the values above and re-run the cells to see what impact this has.

In [5]:
print(f'The contents of a1 are {a1}')
print(f'The contents of a2 are {a2}')
print(f'The contents of a3 are {a3}')
print(f'The contents of a4 are {a4}')
print(f'The contents of a5 are {a5}')
print(f'The contents of a6 are {a6[:3]}') # this prints the first 3 values only.

The contents of a1 are [1 2 3 4 5]
The contents of a2 are [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
The contents of a3 are [ 0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95]
The contents of a4 are [0 0 0 0 0 0 0 0 0 0]
The contents of a5 are [  1.   20.8  40.6  60.4  80.2 100. ]
The contents of a6 are [0.6993859709181645, 0.9117622682490406, 0.7237876323924528]
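NumPy has more constructors than the ones above. The sketch below shows a few of them (np.ones, np.full, and np.zeros with a shape tuple) and creates the random values directly with NumPy's random generator instead of the random module, so the result is a real NumPy array rather than a list. The seed value 42 is an arbitrary choice that makes the random numbers repeatable.

```python
import numpy as np

# A few more constructors.
ones = np.ones(5)                    # five 1.0 values (float by default)
sevens = np.full(4, 7)               # four copies of 7
grid = np.zeros((2, 3), dtype=int)   # a 2 x 3 array of integer zeros

# Generate random floats in [0.0, 1.0) directly as a NumPy array.
rng = np.random.default_rng(42)
random_values = rng.random(100)

# linspace includes both end points, so 6 values from 1 to 100
# are (100 - 1) / 5 = 19.8 apart.
spaced = np.linspace(1, 100, 6)
step = spaced[1] - spaced[0]

print(ones, sevens, grid, random_values[:3], step, sep='\n')
```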


## Reshaping arrays

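It is often easier to create a one-dimensional NumPy array and then reshape it into the shape you need. The new shape must hold the same total number of elements: the 20 elements of a7 can become 10 × 2 or 4 × 5, but not 3 × 7. See below: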

In [6]:
a7 = np.arange(20)
a8 = a7.reshape(10,2)
a9 = a7.reshape(4,5)
print(f'Structure of a7\n{a7}')
print(f'Structure of a8 (10 rows, 2 columns)\n{a8}')
print(f'Structure of a9 (4 rows, 5 columns)\n{a9}')

Structure of a7
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
Structure of a8 (10 rows, 2 columns)
[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]
 [12 13]
 [14 15]
 [16 17]
 [18 19]]
Structure of a9 (4 rows, 5 columns)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
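Two related conveniences are worth knowing. Passing -1 for one dimension asks NumPy to infer it from the total number of elements, and asking for a shape that does not hold exactly the same number of elements raises a ValueError. The sketch below also uses flatten() to turn a two-dimensional array back into one dimension.

```python
import numpy as np

a = np.arange(20)

# -1 lets NumPy infer that dimension: 5 rows means 4 columns.
b = a.reshape(5, -1)

# The new shape must hold exactly 20 elements; 3 x 7 = 21 does not.
try:
    a.reshape(3, 7)
except ValueError as err:
    print(f'Cannot reshape: {err}')

# flatten() returns a one-dimensional copy of the array.
c = b.flatten()
print(b.shape, c.shape)
```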


## Checking the size of a numpy array

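Use an array's shape attribute (for example a8.shape) to report its dimensions. It returns a tuple with one entry per dimension, here (rows, columns); the total number of elements is available separately through the size attribute.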

In [7]:
print(f'a8 is a {a8.shape} array.')
print(f'a9 is a {a9.shape} array.')

a8 is a (10, 2) array.
a9 is a (4, 5) array.
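Because shape is an ordinary tuple, it can be unpacked into separate variables. The sketch below also shows three related attributes and functions: ndim (the number of dimensions), size (the total number of elements) and len() (the length of the first dimension only).

```python
import numpy as np

a9 = np.arange(20).reshape(4, 5)

rows, cols = a9.shape          # shape is a tuple, so it can be unpacked
print(f'{rows} rows, {cols} columns')
print(f'ndim = {a9.ndim}')     # number of dimensions
print(f'size = {a9.size}')     # total number of elements: rows * cols
print(f'len  = {len(a9)}')     # length of the first dimension (rows)
```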


## Operations on numpy arrays

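NumPy uses vectorisation to perform fast operations on arrays: an operation written once, such as multiplying by 3, is applied to every element in compiled code rather than in a Python loop. pandas, which is built on NumPy, supports the same style of operation on its Series and DataFrames.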

In [8]:
a10 = np.arange(20)

def double_it(x):
    return x * 2

# np.vectorize applies a Python function to every item in an array. It is a
# convenience wrapper around a loop: it offers convenience, not a speed-up.
a11 = np.vectorize(double_it)(a10)

a12 = a10 * 3 # arithmetic on the array itself is true vectorisation: simpler and much faster.

print(f'Contents of the initial array {a10}')
print(f'Contents of the array after doubling {a11}')
print(f'Contents of the array after trebling {a12}')

Contents of the initial array [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
Contents of the array after doubling [ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38]
Contents of the array after trebling [ 0  3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57]
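Vectorised arithmetic is not limited to multiplying by a single number. The sketch below adds two arrays element by element, applies a NumPy function (np.sqrt) to a whole array, and shows that double_it already works on an array without np.vectorize because it only uses arithmetic. A function containing an if/else would not work element-wise this way; np.where is the vectorised equivalent. The last lines show broadcasting, where a one-dimensional row is applied to every row of a two-dimensional array. The array names are illustrative.

```python
import numpy as np

a = np.arange(5)                     # [0 1 2 3 4]
b = np.array([10, 20, 30, 40, 50])

total = a + b                        # element by element
roots = np.sqrt(b)                   # NumPy functions work element-wise too

def double_it(x):
    return x * 2

# Plain arithmetic already works on a whole array.
doubled = double_it(a)

# np.where(condition, value_if_true, value_if_false):
# keep even numbers, replace odd ones with -1.
evens_only = np.where(a % 2 == 0, a, -1)

# Broadcasting: the row [100, 200, 300] is added to each row of grid.
grid = np.arange(6).reshape(2, 3)
shifted = grid + np.array([100, 200, 300])
print(total, roots, doubled, evens_only, shifted, sep='\n')
```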


### Filtering numpy arrays

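Comparison operators such as > test each element of an array against a condition and return a Boolean array of the same shape. That Boolean array can be used as an index (a mask) to select only the elements where it is True, and summing it counts the matches, because True counts as 1.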

In [14]:
# test each element of a10: is it greater than 12?
print(f'Elements of a10 that are > 12 {a10 > 12}')

# select only the elements of a10 that are greater than 12.
condition = a10 > 12
print(f'Elements greater than 12 are {a10[condition]}')

# how many elements are greater than 12?
print(f'There are {np.sum(a10 > 12)} elements greater than 12.')

Elements of a10 that are > 12 [False False False False False False False False False False False False
 False  True  True  True  True  True  True  True]
Elements greater than 12 are [13 14 15 16 17 18 19]
There are 7 elements greater than 12.
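Conditions can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because & and | bind more tightly than > and <. The sketch below also uses a mask on the left-hand side of an assignment to cap values in place (on a copy, so a10 is unchanged), and np.flatnonzero to get the positions where a condition is True rather than the values.

```python
import numpy as np

a10 = np.arange(20)

between = a10[(a10 > 5) & (a10 < 10)]     # both conditions true
outside = a10[(a10 < 2) | (a10 > 17)]     # either condition true
odd = a10[~(a10 % 2 == 0)]                # condition negated

# Masks can also select elements to overwrite.
capped = a10.copy()
capped[capped > 15] = 15                  # cap large values at 15

# Indices (positions) of the elements greater than 12.
positions = np.flatnonzero(a10 > 12)
print(between, outside, odd, capped, positions, sep='\n')
```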
