## Working with numpy

**numpy** provides fast array types and vectorised arithmetic on arrays, vectors and matrices. Many other packages are built on it, and it is widely used in machine learning.

Here, we create a numpy array containing random numbers sampled from a normal distribution. You can change the size, mean and standard deviation.
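As a minimal sketch of how a seed makes the sampled values reproducible: two generators built from the same seed yield identical arrays. The seed value `42` and the names `rng_a`, `rng_b` and `sd` are illustrative assumptions, not part of the exercise. `Generator.normal(loc, scale, size)` is used here as a shorthand that samples from the same distribution as shifting and scaling `standard_normal`.

```python
import numpy as np
from numpy.random import default_rng

# Illustrative parameters: number of values, mean and standard deviation
n, mean, sd = 1000, 10, 2

# Two independent generators created from the same (arbitrary) seed
rng_a = default_rng(42)
rng_b = default_rng(42)

# Sample n values from a normal distribution with the given mean and sd
vals_a = rng_a.normal(loc=mean, scale=sd, size=n)
vals_b = rng_b.normal(loc=mean, scale=sd, size=n)

# Same seed, same sequence of values
same = np.array_equal(vals_a, vals_b)
print(f'Identical arrays: {same}')
print(f'Shape: {vals_a.shape}')
```

Calling `default_rng()` with no argument, as the exercise code does, gives different values on each run.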

We can calculate some basic properties of the array to see how closely the values numpy generates match the requested distribution. In the code below we calculate the **mean** and **standard deviation** iteratively using loops, after first converting the numpy array into a standard Python list. Experiment with different values of `n`, `mean` and `mu` (in this code `mu` holds the standard deviation) and convince yourself that the code works.


In [None]:
# import the libraries we will use
import time
import numpy as np
from numpy.random import default_rng

# Create a random generator. We can use seeds if we want the values to always be the same
rng = default_rng()

# Parameters for a normal distribution of n values: the mean and the standard deviation (called mu here)
n = 1000
mean = 10
mu = 2

# Create a numpy array of random values pulled from a normal distribution with these properties
vals = mean + mu * rng.standard_normal(n)

#Now create a simple Python list containing the same values, so that we can see how fast an iterative solution is
valsList = vals.tolist()

# Calculate the mean and standard deviation of the values using iteration
tic = time.perf_counter()

# Calculate the mean (this overwrites the `mean` parameter set above)
mean = 0
for i in range(len(valsList)):
    mean = mean + valsList[i]
mean = mean/(len(valsList))

# Calculate the standard deviation (population form: divide by n)
total = 0
for i in range(len(valsList)):
    total = total + (valsList[i] - mean) ** 2
stdev = (total/len(valsList))**0.5

toc = time.perf_counter()

print(f'Mean: {mean} + standard deviation: {stdev}')
print(f'- time to iterate:   {toc - tic:0.4f} seconds')
iterateTime = toc - tic

tic = time.perf_counter()

# Write your vectorised code here, storing the results in `mean` and `sd` (the names used by the print statements below)

toc = time.perf_counter()

# Uncomment these lines to print out the performance of your vectorised code
# print()
# print(f'Mean: {mean} + standard deviation: {sd}')
# print(f'- time with vectorisation:   {toc - tic:0.4f} seconds')
# vectorTime = toc-tic
# print()
# print(f'Using vectorisation was {iterateTime/vectorTime:0.1f} times faster')


Calculate the mean and standard deviation using **vectorisation** on the numpy array. Work out how much faster this approach is. 

**Hint**: Using vectorisation means that you don't iterate, but carry out calculations directly on the numpy array. For example,

`c = np.subtract(a,b)` or `c = a - b`

will subtract array `b` from array `a`, element by element. If `b` is a constant, it is subtracted from every element of `a`; if `a` is a constant, every element of `b` is subtracted from it.
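A small sketch of the hint, using made-up arrays (the values and the names `c1`, `c2`, `d` and `e` are illustrative only):

```python
import numpy as np

a = np.array([10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])

# Element-wise subtraction: both forms give the same result
c1 = np.subtract(a, b)
c2 = a - b

# A constant on the right is subtracted from every element of a
d = a - 5

# A constant on the left has every element of a subtracted from it
e = 5 - a

print(c1)  # [ 9. 18. 27.]
print(d)   # [ 5. 15. 25.]
print(e)   # [ -5. -15. -25.]
```

No loop appears anywhere: numpy applies the operation to the whole array at once.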


In [None]:
# import the libraries we will use
import time
import numpy as np
from numpy.random import default_rng

# Create a random generator. We can use seeds if we want the values to always be the same

rng = default_rng()

# Parameters for a normal distribution of n values: the mean and the standard deviation (called mu here)

n = 5000
mean = 7
mu = 3

# Create a numpy array of random values pulled from a normal distribution with these properties

vals = mean + mu * rng.standard_normal(n)

#Now create a simple Python list containing the same values, so that we can see how fast an iterative solution is

valsList = vals.tolist()

# Calculate the mean and standard deviation of the values using iteration

tic = time.perf_counter()

# Calculate the mean (this overwrites the `mean` parameter set above)

mean = 0
for i in range(len(valsList)):
    mean = mean + valsList[i]
mean = mean/(len(valsList))

# Calculate the standard deviation (population form: divide by n)

total = 0
for i in range(len(valsList)):
    total = total + (valsList[i] - mean) ** 2
stdev = (total/len(valsList))**0.5

toc = time.perf_counter()

print(f'Mean: {mean} + standard deviation: {stdev}')
print(f'- time to iterate with a loop:   {toc - tic:0.4f} seconds')
iterateTime = toc - tic

tic = time.perf_counter()

# The vectorised version of calculating the mean and standard deviation with a numpy array looks like this …

mean_arr = np.mean(vals)
std_arr = np.std(vals)
toc = time.perf_counter()

# To calculate the performance gain of our vectorisation approach …

print()
print(f'Mean: {mean_arr} + standard deviation: {std_arr}')
print(f'- time with vectorisation:   {toc - tic:0.4f} seconds')

vectorTime = toc-tic

print()
print(f'Using vectorisation was {iterateTime/vectorTime:0.1f} times faster')
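The loop version divides by `n`, which is the population standard deviation. That matches the default behaviour of `np.std` (`ddof=0`), so the two results should agree up to floating-point rounding. The sketch below checks this with a hand-written vectorised calculation, and uses `timeit` to average over repeated runs, since a single `perf_counter` measurement can be noisy. The seed, the repeat count and the helper `loop_mean` are assumptions made for illustration.

```python
import timeit
import numpy as np
from numpy.random import default_rng

rng = default_rng(0)  # arbitrary seed, used only to make this sketch repeatable
vals = 7 + 3 * rng.standard_normal(5000)

# Vectorised "by hand": the same formulas as the loops, applied to the whole array
mean_vec = vals.sum() / vals.size
std_vec = (((vals - mean_vec) ** 2).sum() / vals.size) ** 0.5

# np.std divides by n by default (ddof=0); ddof=1 divides by n - 1 instead
std_population = np.std(vals)
std_sample = np.std(vals, ddof=1)

print(f'Hand-written matches np.std: {np.isclose(std_vec, std_population)}')
print(f'Sample std is larger: {std_sample > std_population}')

# Hypothetical helper: the loop-based mean, wrapped in a function so it can be timed
vals_list = vals.tolist()

def loop_mean(values):
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

# Average each approach over several runs
runs = 200
loop_time = timeit.timeit(lambda: loop_mean(vals_list), number=runs) / runs
vector_time = timeit.timeit(lambda: np.mean(vals), number=runs) / runs
print(f'Loop: {loop_time:0.6f} s, vectorised: {vector_time:0.6f} s per run')
```

The measured ratio depends on the machine and on `n`, so treat any single figure as indicative only.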