# ECS784: Part of Lab 2

# Introduction to the Numpy Library

In this lab we will look at Numpy, the core numerical library for Python.

## What is Numpy?

Numpy is a core Python library for scientific computing. It:

 * provides a powerful N-dimensional array object;
 * gives access to highly optimised linear algebra tools;
 * integrates closely with C/C++ and Fortran code;
 * is licensed under a BSD license (free to use).
 
 
## Importing Numpy

The conventional approach is:



In [1]:
import numpy as np   # just hit shift+enter to run a cell in notebook!

In the above statement, the Python keyword 'as' lets us use 'np' as a shorthand for the 'numpy' module.

In [2]:
# You will also frequently see the following statement

import matplotlib.pyplot as plt

# this lets the alias 'plt' access the plotting functions in matplotlib's pyplot module

## Generating a numpy array

There are several ways to generate a numpy array:
 * from a Python list containing numeric data;
 * using numpy's array-generating functions;
 * by reading numeric data from a file.
 
Numpy represents an N-dimensional array with the type 'numpy.ndarray'.

## Creating arrays from lists

In [3]:
data_list = [1,2,3,4,5]
array_1D = np.array(data_list)

print(array_1D)            # [1 2 3 4 5]
print(type(array_1D))      # <class 'numpy.ndarray'>

[1 2 3 4 5]
<class 'numpy.ndarray'>


In [4]:
my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(my_2d_array)
print(type(my_2d_array))

[[1 2 3]
 [4 5 6]
 [7 8 9]]
<class 'numpy.ndarray'>


## The ndarray object’s properties

The ndarray has various properties that we can access. For example, using my_2d_array from above:
 

In [5]:
print(my_2d_array.shape)   # the shape of a numpy array 

(3, 3)


In [6]:
print(my_2d_array.size)    # the size of a numpy array
                           # returns the total number of elements

9


In [7]:
print(my_2d_array.dtype)   # the type of data the array holds, inferred automatically from the values
                           # used to create it (e.g. int64, float64)

int64
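
As a small sketch of how these properties behave, the cell below also shows the `ndim` property and an explicit `dtype` argument. The variable names (`ints`, `floats`, `mixed`) are illustrative only; note that the default integer type can differ between platforms, so the `int64` shown above is not guaranteed everywhere.

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]])
floats = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)  # force float storage
mixed = np.array([1, 2.5, 3])                                # ints are promoted to float

print(ints.ndim, ints.shape, ints.size)   # 2 (2, 3) 6
print(floats.dtype)                       # float64
print(mixed.dtype)                        # float64
print(ints.astype(np.float32).dtype)      # float32 (astype returns a converted copy)
```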


## N-dimensional arrays

Note that numpy generalises arrays to N dimensions.


In [8]:
x2 = np.array([[1,2], [3, 4]])       # a matrix 
print('X2 is 2-D:\n{}\nwith shape {}'.format(x2, x2.shape))                      # (2, 2)
print('element (0, 1) has value {}\n'.format(x2[0][1]))

x3 = np.array([x2, x2])              # stacking two matrices
print('X3 is 3-D:\n{}\nwith shape {}'.format(x3, x3.shape))                 
print('element (0, 1, 0) has value {}\n'.format(x3[0][1][0]))


X2 is 2-D:
[[1 2]
 [3 4]]
with shape (2, 2)
element (0, 1) has value 2

X3 is 3-D:
[[[1 2]
  [3 4]]

 [[1 2]
  [3 4]]]
with shape (2, 2, 2)
element (0, 1, 0) has value 3



In [9]:
x4 = np.array([x3, x3, x3, x3, x3])  # stacking 5 3-D structures
print('x4 is 4-D:\n{}\nwith shape {}'.format(x4, x4.shape))               
print('element (0, 1, 0) has value {}\n'.format(x4[0][1][0]))  # x4 is 4-D, so three indices select a 1-D row, not a single value

x5 = np.array([x4, x4])              # stacking 2 4-D structures!

x4 is 4-D:
[[[[1 2]
   [3 4]]

  [[1 2]
   [3 4]]]


 [[[1 2]
   [3 4]]

  [[1 2]
   [3 4]]]


 [[[1 2]
   [3 4]]

  [[1 2]
   [3 4]]]


 [[[1 2]
   [3 4]]

  [[1 2]
   [3 4]]]


 [[[1 2]
   [3 4]]

  [[1 2]
   [3 4]]]]
with shape (5, 2, 2, 2)
element (0, 1, 0) has value [1 2]
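
The chained indexing used above (`x4[0][1][0]`) can usually be written with a single comma-separated index, which is the more common numpy idiom. The following is a minimal sketch that rebuilds an array with the same shape as x4 and compares the two forms.

```python
import numpy as np

x2 = np.array([[1, 2], [3, 4]])
x3 = np.array([x2, x2])
x4 = np.array([x3] * 5)            # shape (5, 2, 2, 2), as in the cell above

print(x4[0, 1, 0])                 # three indices on a 4-D array leave a 1-D row: [1 2]
print(x4[0, 1, 0, 1])              # four indices select a single element: 2
print(x4[0][1][0][1] == x4[0, 1, 0, 1])   # both forms reach the same element: True
```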



## Array generating functions

### 1. arange function

In [10]:
print(np.arange(10))     # generates an array containing 10 integers starting from 0

[0 1 2 3 4 5 6 7 8 9]


In [11]:
print(np.arange(10, 20))   # generates an array containing integers starting from 10 to less than 20 

[10 11 12 13 14 15 16 17 18 19]


In [12]:
print(np.arange(10, 20, 2))   # generates an array containing integers starting from 10 to less than 20, in steps of 2

[10 12 14 16 18]


### 2.  linspace

In [13]:
print(np.linspace(10, 20, 5))  # start, stop, n-points; i.e. generates 5 equally spaced values from 10 to 20 (both included)

[10.  12.5 15.  17.5 20. ]
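
The main difference between the two functions is how they treat the stop value: arange excludes it, while linspace includes it by default. A short sketch comparing them (the variable names are illustrative):

```python
import numpy as np

a = np.arange(10, 20, 2.5)                   # step given; stop (20) is excluded
b = np.linspace(10, 20, 5)                   # number of points given; stop (20) is included
c = np.linspace(10, 20, 5, endpoint=False)   # stop excluded, so the spacing becomes 2.0

print(a)   # [10.  12.5 15.  17.5]
print(b)   # [10.  12.5 15.  17.5 20. ]
print(c)   # [10. 12. 14. 16. 18.]
```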


### 3. ones and zeros functions

In [14]:
print(np.zeros((3, 3)))   # the argument is a tuple !

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [15]:
print(np.ones((3, 3)))   # the argument is a tuple !

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


### 4. diagonal matrix

In [16]:
print(np.diag((4,5,3)))  # generate diagonal matrix; the function infers the shape from the given elements

[[4 0 0]
 [0 5 0]
 [0 0 3]]
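
Two related helpers are worth knowing alongside zeros, ones and diag. The sketch below uses np.eye for an identity matrix and np.full for a constant-filled array, and shows that np.diag extracts the diagonal when it is given a 2-D array.

```python
import numpy as np

d = np.diag((4, 5, 3))      # 1-D input: builds a diagonal matrix
print(np.diag(d))           # 2-D input: extracts the diagonal -> [4 5 3]

identity = np.eye(3)        # 3x3 identity matrix (float by default)
filled = np.full((2, 3), 7) # 2x3 array where every element is 7

print(identity)
print(filled)
```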


### 5. Random numbers

#### Initialise an array with random numbers ranging from 0 to 1, sampled from a uniform distribution


In [17]:
print(np.random.rand(2,4)) # Create an array of the given shape and populate it with random samples 
                           # from a uniform distribution over [0, 1), i.e. 1 itself is never returned

[[0.52999812 0.32848098 0.53843141 0.93315819]
 [0.01883973 0.27342587 0.74203708 0.40660448]]


#### Initialise an array with random numbers from a normal distribution

In [18]:
print(np.random.standard_normal((2,4)))  # Draw samples from a standard normal distribution,
                                         # i.e. with mean=0 and stdev=1

[[ 1.37566281  1.0487077  -0.15576304 -2.43495059]
 [ 1.0698463   0.46617571 -0.1478796   0.04809644]]


#### Initialise an array with random integer numbers

In [19]:
x = np.random.randint(low=1, high=100, size=10)  # generate 10 random integers from 1 to 99 (high is exclusive)
print(x)                                             

[38 68 83 26 75 92 45 38 55 86]
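
The random values above change on every run. When reproducible results are needed, one option is to use a seeded generator. The sketch below uses numpy's `default_rng` interface rather than the `np.random.rand` style functions shown above; the seed value 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seed 42 is arbitrary; any integer works

uniform = rng.random((2, 4))                        # uniform samples in [0, 1)
normal = rng.standard_normal((2, 4))                # mean 0, standard deviation 1
integers = rng.integers(low=1, high=100, size=10)   # integers from 1 to 99 (high is exclusive)

# Re-creating the generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((2, 4))))   # True
```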


## Reading arrays from files

 * the genfromtxt function reads data from a text file;
 * the savetxt function writes data to a text file;
 * the save and load functions write and read numpy's binary .npy format;
 * the savez function writes several arrays to a single .npz archive, which load can also read.

### Reading a TXT file with comma-separated values

In [20]:
data = np.genfromtxt('Lab_2_testInput.txt', delimiter=',')
print(data)

[[1. 2. 3. 4. 5.]
 [6. 7. 8. 9. 1.]
 [3. 2. 4. 6. 7.]
 [7. 3. 6. 9. 1.]]


### Saving a CSV file in the current working directory

In [21]:
np.savetxt('output.csv', data, delimiter=',', fmt='%.5f') # fmt is the number format; here floats are written with 5 decimal places

### Saving in numpy .npy binary format

In [22]:
np.save('output1', data)   # just pass the output file name; the .npy extension is added automatically

### Loading the previously saved numpy .npy binary file

In [23]:
read_data = np.load('output1.npy')  # must supply file extension .npy
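
To check that nothing is lost when saving and loading, a round trip can be tested as in the sketch below. It writes to a temporary directory so that it does not depend on Lab_2_testInput.txt; the file names `roundtrip.csv` and `roundtrip.npy` are hypothetical.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'roundtrip.csv')   # hypothetical file name
    npy_path = os.path.join(tmp_dir, 'roundtrip.npy')   # hypothetical file name

    # Text round trip: values are rounded to 5 decimal places on the way out
    np.savetxt(csv_path, data, delimiter=',', fmt='%.5f')
    from_csv = np.genfromtxt(csv_path, delimiter=',')

    # Binary round trip: values and dtype are preserved exactly
    np.save(npy_path, data)
    from_npy = np.load(npy_path)

print(np.allclose(data, from_csv))      # True
print(np.array_equal(data, from_npy))   # True
```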

In [24]:
data2 = data  # data2 is a second reference to the same array, not a copy (a change made through either name is visible through both)
print(data2)

[[1. 2. 3. 4. 5.]
 [6. 7. 8. 9. 1.]
 [3. 2. 4. 6. 7.]
 [7. 3. 6. 9. 1.]]


### Saving numpy arrays in archive form with .npz extension

This allows multiple arrays to be saved to one file

In [25]:
np.savez('output2', data=data, data2=data2) # save two different arrays

### Loading .npz archive file

In [26]:
read_data = np.load('output2.npz') # read the previously saved file
print(type(read_data))

<class 'numpy.lib.npyio.NpzFile'>


In [27]:
print(type(read_data.files))
print(read_data.files)    # to view the files in zipped data

<class 'list'>
['data', 'data2']


In [28]:
print(read_data['data'])   # accessing the first array of read_data

[[1. 2. 3. 4. 5.]
 [6. 7. 8. 9. 1.]
 [3. 2. 4. 6. 7.]
 [7. 3. 6. 9. 1.]]


In [29]:
print(read_data['data2'])   # accessing the second array of read_data

[[1. 2. 3. 4. 5.]
 [6. 7. 8. 9. 1.]
 [3. 2. 4. 6. 7.]
 [7. 3. 6. 9. 1.]]
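
An .npz archive keeps its file open while it is in use, so it can be convenient to read it inside a `with` block. The following is a minimal sketch using a temporary directory; the archive name `arrays.npz` and the keys `first` and `second` are made up for the example.

```python
import os
import tempfile

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'arrays.npz')   # hypothetical file name
    np.savez(path, first=a, second=b)            # keyword names become the archive keys

    with np.load(path) as archive:               # the archive is closed when the block ends
        names = sorted(archive.files)
        first = archive['first']
        second = archive['second']

print(names)                                     # ['first', 'second']
print(np.array_equal(first, a), np.array_equal(second, b))   # True True
```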


## Array manipulation

In [30]:
x = np.array([1,2,3,4,5,6,7])
print (x[0])
print (x[2:5])
print (x[:4])
print (x[4:])

#Array indexing is similar to lists but generalised to N-dimensions

1
[3 4 5]
[1 2 3 4]
[5 6 7]


In [31]:
print (data2[1:3,1:5]) # return rows 1 and 2, and columns 1 to 4 (the stop index is excluded)
                   # remember indexing starts from 0!

[[7. 8. 9. 1.]
 [2. 4. 6. 7.]]
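
Slices also accept negative indices and a step, just as Python lists do. A short sketch on a small example matrix (the name `m` and its contents are illustrative, not the lab data):

```python
import numpy as np

m = np.arange(20).reshape(4, 5)   # 4 rows, 5 columns, values 0..19

print(m[1:3, 1:5])   # rows 1-2, columns 1-4
print(m[-1, :])      # last row: [15 16 17 18 19]
print(m[::2, ::2])   # every second row and every second column
```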


### Extracting a row or column vector from a matrix

In [32]:
data = np.genfromtxt('Lab_2_testInput.txt', delimiter=',') # read the data file into variable data
data

array([[1., 2., 3., 4., 5.],
       [6., 7., 8., 9., 1.],
       [3., 2., 4., 6., 7.],
       [7., 3., 6., 9., 1.]])

In [33]:
print (data[2,:])            # extract row 2 (can also write as data[2])
print (data[2,:].shape)      # shape of the row

[3. 2. 4. 6. 7.]
(5,)


In [34]:
print (data[:,2])            # extract column 2
print (data[:,2].shape)      # shape of the column

[3. 8. 4. 6.]
(4,)
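
Note that indexing with a single integer drops that dimension, which is why both results above are 1-D. If a 2-D row or column is needed, a slice or a list index keeps the dimension. A sketch on an example matrix (not the lab data file):

```python
import numpy as np

data = np.arange(20).reshape(4, 5)

col_1d = data[:, 2]      # integer index: dimension is dropped, shape (4,)
col_2d = data[:, 2:3]    # slice of width 1: dimension is kept, shape (4, 1)
row_2d = data[[2], :]    # list index: dimension is kept, shape (1, 5)

print(col_1d.shape, col_2d.shape, row_2d.shape)   # (4,) (4, 1) (1, 5)
```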


## Data processing

There are many data processing methods, such as min, max, sum, prod, mean and var.

In [35]:
x = np.array([1,2,3,4,5,6])

In [36]:
print(x.min())         # minimum in x, prints 1
print(x.max())         # maximum in x, prints 6
print(x.sum())         # sum of all elements in x 
print(x.prod())        # product of all elements in x 
print(x.mean())        # mean of x
print(x.var())         # variance of x

1
6
21
720
3.5
2.9166666666666665


These operations are also available as numpy functions that take the array as an argument:
 * np.max(array_name)
 * np.mean(array_name)

## Same operations apply to matrices

In [37]:
x = np.array([[1,2,3,4,0,6],[3,4,0,6,7,8]])     # 2d Matrix
print('Shape of x is {} and elements are:\n{}\n'.format(x.shape, x))

Shape of x is (2, 6) and elements are:
[[1 2 3 4 0 6]
 [3 4 0 6 7 8]]



**Note** - take care over what the axis argument controls: it specifies the dimension over which the operation is applied.
For example, sum(axis=0) means we are **summing over the row axis**, i.e. we get the column sums.

In [38]:
print('\nMinimum over whole matrix:')
print(x.min())           #returns 0

print('\nMinimum (over rows) for each column:')
print(x.min(axis=0))     #returns [1, 2, 0, 4, 0, 6] - the vector which contains the minimum value in each column
                         # axis=0 means we will do minimum across rows, i.e. will get minimum for each COLUMN

print('\nMinimum (over columns) for each row:')
print(x.min(axis=1))     #returns [0, 0] - the vector which contains the minimum value in each row
                         # axis=1 means we will do minimum across columns, i.e. will get minimum for each ROW
    
print('\nSum over whole matrix:')
print(x.sum())           # sum of all elements in x

print('\nSum (over rows) for each column:')
print(x.sum(axis=0))     # sum of elements over rows, i.e. sum for each column

print('\nMean over whole matrix:')
print(x.mean())                # mean of x
print('\nMean (over rows) for each column:')
print(x.mean(axis=0))          # mean of x over rows, i.e. mean of each column


Minimum over whole matrix:
0

Minimum (over rows) for each column:
[1 2 0 4 0 6]

Minimum (over columns) for each row:
[0 0]

Sum over whole matrix:
44

Sum (over rows) for each column:
[ 4  6  3 10  7 14]

Mean over whole matrix:
3.6666666666666665

Mean (over rows) for each column:
[2.  3.  1.5 5.  3.5 7. ]
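
One way to keep the axis argument straight is to look at the shape of the result: the axis that was reduced over disappears. The sketch below uses the same matrix as above, and also shows the function form (np.sum) and the `keepdims` option, which retains the reduced axis with length 1.

```python
import numpy as np

x = np.array([[1, 2, 3, 4, 0, 6], [3, 4, 0, 6, 7, 8]])   # shape (2, 6)

col_sums = x.sum(axis=0)                    # axis 0 (rows) removed -> shape (6,)
row_sums = np.sum(x, axis=1)                # axis 1 (columns) removed -> shape (2,)
row_sums_2d = x.sum(axis=1, keepdims=True)  # reduced axis kept -> shape (2, 1)

print(col_sums, col_sums.shape)   # [ 4  6  3 10  7 14] (6,)
print(row_sums, row_sums.shape)   # [16 28] (2,)
print(row_sums_2d.shape)          # (2, 1)
```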


## Reshaping and resizing

Converting a matrix to a vector, or vice versa.

In [39]:
M = np.array([1,2,3,4,5,6,7,8,9]).reshape(3,3) # create a 3x3 matrix
print('Becomes a 3x3 matrix:\n{}'.format(M))

Becomes a 3x3 matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [40]:
v = M.reshape(9) # reshape the 3x3 matrix into a one-dimensional array containing 9 elements
print('\nReshape back to a vector: {}'.format(v))


Reshape back to a vector: [1 2 3 4 5 6 7 8 9]


In [41]:
try:
    v = M.reshape(8) # raises an error ("cannot reshape array"): 9 elements cannot fill a shape with 8
except Exception as e:
    print('line failed with exception: {}'.format(e))
    

line failed with exception: cannot reshape array of size 9 into shape (8,)
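
Because the total number of elements must stay the same, reshape can infer one dimension for you: passing -1 for a dimension asks numpy to work out its length from the others. A brief sketch:

```python
import numpy as np

M = np.arange(1, 10).reshape(3, 3)

flat = M.reshape(-1)                  # flatten: shape (9,)
wide = M.reshape(1, -1)               # one row, as many columns as needed: shape (1, 9)
tall = np.arange(12).reshape(-1, 4)   # 4 columns, rows inferred: shape (3, 4)

print(flat.shape, wide.shape, tall.shape)   # (9,) (1, 9) (3, 4)
```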


## Adding a new dimension

np.newaxis is an alias for None; using it makes the intent of the indexing clearer.

In [42]:
v = np.array([1,2,3,4,5])
print('v is 1-D array with shape {} and elements {}\n'.format(v.shape, v))

v_row = v[np.newaxis, :]   # np.newaxis adds a new leading axis; : takes all the elements of v
print('v_row is 2-D array with shape {} and elements {}\n'.format(v_row.shape, v_row))

v_col = v[0:3, None]  # 0:3 takes just the first three elements of v; None adds a new trailing axis
print('v_col is 2-D array with shape {} and elements:\n{}\n'.format(v_col.shape, v_col))


v is 1-D array with shape (5,) and elements [1 2 3 4 5]

v_row is 2-D array with shape (1, 5) and elements [[1 2 3 4 5]]

v_col is 2-D array with shape (3, 1) and elements:
[[1]
 [2]
 [3]]
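
A common reason to add an axis is to combine a column vector with a row vector. The sketch below builds a multiplication table this way; it relies on numpy broadcasting, which stretches the length-1 axes to match, and compares the result with np.outer.

```python
import numpy as np

v = np.array([1, 2, 3])

print(np.newaxis is None)   # True: the two spellings are interchangeable

col = v[:, np.newaxis]      # shape (3, 1)
row = v[np.newaxis, :]      # shape (1, 3)
table = col * row           # broadcast to shape (3, 3)

print(table)
print(np.array_equal(table, np.outer(v, v)))   # True
```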



### Stacking arrays

Arrays can be stacked horizontally or vertically. Make sure the dimensions are compatible.


In [43]:
x = np.ones((2,3))
print('\nx has shape {} and elements:\n{}'.format(x.shape, x))
y = np.zeros((2,2))
print('\ny has shape {} and elements:\n{}'.format(y.shape, y))

z = np.hstack((x, y, x))       # Stacking arrays horizontally
                               # Arrays are passed as a tuple
print('\nSame numbers of rows so we can stack horizontally, shape {}, elements:\n{}'
      .format(z.shape, z))


x has shape (2, 3) and elements:
[[1. 1. 1.]
 [1. 1. 1.]]

y has shape (2, 2) and elements:
[[0. 0.]
 [0. 0.]]

Same numbers of rows so we can stack horizontally, shape (2, 8), elements:
[[1. 1. 1. 0. 0. 1. 1. 1.]
 [1. 1. 1. 0. 0. 1. 1. 1.]]


In [44]:
x = np.ones((2,2))
print('\nx has shape {} and elements:\n{}'.format(x.shape, x))

y = np.zeros((1,2))
print('\ny has shape {} and elements:\n{}'.format(y.shape, y))

z = np.vstack((x,y))     # Stacking the arrays vertically
print('\nSame numbers of columns so we can stack vertically, shape {}, elements:\n{}'
      .format(z.shape, z))


x has shape (2, 2) and elements:
[[1. 1.]
 [1. 1.]]

y has shape (1, 2) and elements:
[[0. 0.]]

Same numbers of columns so we can stack vertically, shape (3, 2), elements:
[[1. 1.]
 [1. 1.]
 [0. 0.]]
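
For 2-D arrays, hstack and vstack behave like np.concatenate with axis=1 and axis=0 respectively. The sketch below checks this, and shows what happens when the shapes are not compatible.

```python
import numpy as np

x = np.ones((2, 3))
y = np.zeros((2, 2))

h = np.concatenate((x, y), axis=1)     # join along columns, like hstack
print(np.array_equal(h, np.hstack((x, y))))   # True

try:
    np.vstack((x, y))                  # 3 columns vs 2 columns: cannot stack vertically
except ValueError as e:
    print('vstack failed: {}'.format(e))
```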


## Duplicating arrays using tile and repeat

**Tile** replicates the matrix as a whole

**Repeat** repeats elements within the matrix

In [45]:
x = np.array([[1,2],[3,4]])
print('\nx has shape {} and elements:\n{}'.format(x.shape, x))

y = np.tile(x, 3)          # repeat x three times (along the column axis)
print('\ny has shape {} and elements:\n{}'.format(y.shape, y))

y = np.tile(x, (4, 2))         # repeat x 4 times on row axis, 2 on col axis
print('\ny has shape {} and elements:\n{}'.format(y.shape, y))



x has shape (2, 2) and elements:
[[1 2]
 [3 4]]

y has shape (2, 6) and elements:
[[1 2 1 2 1 2]
 [3 4 3 4 3 4]]

y has shape (8, 4) and elements:
[[1 2 1 2]
 [3 4 3 4]
 [1 2 1 2]
 [3 4 3 4]
 [1 2 1 2]
 [3 4 3 4]
 [1 2 1 2]
 [3 4 3 4]]


In [46]:
x = np.array([[1,2],[3,4]])
print('\nx has shape {} and elements:\n{}'.format(x.shape, x))

a=np.repeat(x, 4)            # repeat each element 4 times. Does not preserve the shape/dimension
print('\na has shape {} (axis not specified so flattens array) and elements:\n{}'
      .format(a.shape, a))

b=np.repeat(x, 4, axis=1)    # repeat each element 4 times along axis 1 (within each row)
print('\nb has shape {} (repeats along col axis, row axis same) and elements:\n{}'
      .format(b.shape, b))

c=np.repeat(x, 4, axis=0)    # repeat each row 4 times (along axis 0)
print('\nc has shape {} (repeats along row axis, col axis same) and elements:\n{}'
      .format(c.shape, c))


x has shape (2, 2) and elements:
[[1 2]
 [3 4]]

a has shape (16,) (axis not specified so flattens array) and elements:
[1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4]

b has shape (2, 8) (repeats along col axis, row axis same) and elements:
[[1 1 1 1 2 2 2 2]
 [3 3 3 3 4 4 4 4]]

c has shape (8, 2) (repeats along row axis, col axis same) and elements:
[[1 2]
 [1 2]
 [1 2]
 [1 2]
 [3 4]
 [3 4]
 [3 4]
 [3 4]]
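
The difference between the two functions is easiest to see on a 1-D array. In this sketch, repeat is also given a list of counts, one per element:

```python
import numpy as np

v = np.array([1, 2, 3])

print(np.tile(v, 2))               # whole array repeated:  [1 2 3 1 2 3]
print(np.repeat(v, 2))             # each element repeated: [1 1 2 2 3 3]
print(np.repeat(v, [1, 2, 3]))     # per-element counts:    [1 2 2 3 3 3]
```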


## Copying in Python (this is very important!)

In Python, assignment binds a name to an existing object rather than copying it. This means that when you write B = A you are just copying a reference to the memory, rather than creating another instance of the data. Therefore any change you make through B will also be seen through A.


In [47]:
A = np.array([1,2,3,4,5,6])
B = A
B[0] = 10
print('We changed B: {}, but A also changed {}'.format(B, A))
print(' ... because memory address of B={} is same as address of A={}'.format(id(B), id(A)))

We changed B: [10  2  3  4  5  6], but A also changed [10  2  3  4  5  6]
 ... because memory address of B=4768800208 is same as address of A=4768800208


Note that this is also true for Python lists and other mutable objects in general.

In [48]:
A = [1,2,3,4]
B = A
B[0] = 10
print(A)

[10, 2, 3, 4]


## Deep copy using the copy() method

To actually copy the data stored in the array, you can use the numpy copy method.

In [49]:
A = np.array([1, 2, 3, 4])
B = A.copy()                     # can also write, B = np.copy(A)

B[0] = 10
print('We changed B: {}, but this time A did not change {}'.format(B, A))
print(' ... because memory address of B={} is not same as address of A={}'.format(id(B), id(A)))

We changed B: [10  2  3  4], but this time A did not change [1 2 3 4]
 ... because memory address of B=4768794256 is not same as address of A=4768800496


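To copy a Python list of basic types we can similarly use the list's copy() method (this is a shallow copy, which is sufficient when the elements are immutable values such as numbers):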

In [50]:
A = [1,2,3,4]
B = A.copy()
B[0] = 10
print(A, B)

[1, 2, 3, 4] [10, 2, 3, 4]
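
A related point that often surprises people: slicing a numpy array returns a view onto the same data, not a copy (unlike slicing a Python list). The sketch below demonstrates this and uses np.shares_memory to check whether two arrays overlap in memory.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6])

view = A[1:4]              # a view: shares memory with A
view[0] = 99
print(A)                   # [ 1 99  3  4  5  6]
print(np.shares_memory(A, view))          # True

independent = A[1:4].copy()               # an explicit copy: separate memory
independent[0] = -1
print(A[1])                # still 99
print(np.shares_memory(A, independent))   # False
```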


## Matrix operations

Numpy supports different array operations such as 
 * addition, subtraction, multiplication, division;
 * transpose, inverse etc

### Array addition and subtraction

In [51]:
X = np.array([[1, 2, 3], [4, 5, 6]])
Y = np.ones((2,3))

print(X+Y)                     # element-wise addition of X and Y

# note, scalar multiplication
print(X - 2 * Y)               # multiply each element of Y by 2, then subtract
                               # the result element-wise from X

print(X + np.array([[2,2,2],[2,2,2]]))  # perform addition element wise

[[2. 3. 4.]
 [5. 6. 7.]]
[[-1.  0.  1.]
 [ 2.  3.  4.]]
[[3 4 5]
 [6 7 8]]
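
The last line above builds a full array of 2s by hand. In practice this is rarely needed, because numpy broadcasts a scalar (or a compatible smaller array) across the larger one. A minimal sketch; the name `row` is illustrative:

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

print(X + 2)                       # the scalar is added to every element
print(np.array_equal(X + 2, X + np.array([[2, 2, 2], [2, 2, 2]])))   # True

row = np.array([10, 20, 30])       # shape (3,) matches the last axis of X
print(X + row)                     # row is added to each row of X
```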


## Matrix multiplication

### Element wise multiplication

In [52]:
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[3,3,3],[4,4,4]])

In [53]:
C = A * B           # element wise multiplication

In [54]:
print(C)

[[ 3  6  9]
 [16 20 24]]


### Standard matrix multiplication

In [55]:
D = np.dot(A, B.T)  # A = (2,3), B.T = (3,2)
                    # Performs standard matrix multiplication
                    # Number of columns in A must match number of rows in B.T

In [56]:
print(D)

[[18 24]
 [45 60]]
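
For 2-D arrays the `@` operator gives the same result as np.dot and is often easier to read. The sketch below also shows the error raised when the inner dimensions do not match.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[3, 3, 3], [4, 4, 4]])

D = A @ B.T                                  # (2,3) @ (3,2) -> (2,2)
print(D)
print(np.array_equal(D, np.dot(A, B.T)))     # True

try:
    A @ B                                    # (2,3) @ (2,3): inner dimensions 3 and 2 differ
except ValueError as e:
    print('matrix product failed: {}'.format(e))
```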


## Transpose

The transpose operation reverses the order of the array's dimensions.

In [57]:
A = np.array([[1,2,3],[4,5,6]])
print(A)
print('-----------')
print(A.T)        # Take transpose of A. Just use .T
                  # Rows of A now become columns

print(A.shape)     # prints (2,3)
print('-----------')
print(A.T.shape)   # (3,2)

v = np.array([1,2,3,4,5])
print(v)
print('-----------')
print(v.T)
print('-----------')
# Vectors only have one dimension. Transpose does nothing.
print(v.shape)     # prints (5,)
print('-----------')
print(v.T.shape)   # prints (5,)

[[1 2 3]
 [4 5 6]]
-----------
[[1 4]
 [2 5]
 [3 6]]
(2, 3)
-----------
(3, 2)
[1 2 3 4 5]
-----------
[1 2 3 4 5]
-----------
(5,)
-----------
(5,)


## Inverse of a matrix

The linalg submodule provides functions to compute matrix determinant and inverse.

In [58]:
A = np.array([[2,1],[3,2]])
print(A)
print('-----------')

det_A = np.linalg.det(A)
print(det_A)
print('-----------')

inv_A = np.linalg.inv(A)
print(inv_A)

[[2 1]
 [3 2]]
-----------
0.9999999999999998
-----------
[[ 2. -1.]
 [-3.  2.]]


In [59]:
print(np.dot(A,inv_A)) # Recovers the identity matrix (up to floating-point rounding)

[[1. 0.]
 [0. 1.]]
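
Results from linalg are floating-point values, which is why the determinant above prints as 0.9999999999999998 rather than exactly 1. Comparisons are therefore best made with a tolerance, for example using np.isclose or np.allclose. The sketch below also solves a linear system with np.linalg.solve, which avoids forming the inverse explicitly; the right-hand side `b` is made up for the example.

```python
import numpy as np

A = np.array([[2.0, 1.0], [3.0, 2.0]])
b = np.array([1.0, 2.0])            # hypothetical right-hand side

print(np.isclose(np.linalg.det(A), 1.0))                 # True
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))      # True

x = np.linalg.solve(A, b)           # solves A x = b
print(np.allclose(A @ x, b))        # True
```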


## Documentation and manual

 * Full documentation available here: http://docs.scipy.org/doc/numpy/