# Using the NumPy package

## Using packages and libraries in Python

Thus far we've worked in base Python.  But one of the strengths of the Python language is the large number of additional **Packages** and **Libraries** available that allow you to extend and enhance Python with new modules.  A **Module**, at its core, is a collection of related code stored in a **.py** file.  Modules may contain **functions** (that do things to your data, make calculations, etc.), but can also define data classes, types, or variables.  A **Package**, then, is a collection of modules.  The terms **Library** and **Package** are often used interchangeably (I'll definitely do so) to mean a collection of modules, although a library can also specifically refer to a collection of several packages.  In order to make use of these additional modules, we need to `import` them.  In the sections below, we'll go through some of the most common libraries or packages we'll use to add important functionality to Python.

In this notebook, we'll specifically look at one of the most commonly used libraries, `numpy`. We'll make use of the functionality of `numpy` throughout this course as it makes possible a lot of the types of calculations and data manipulations we'll need. 

Many packages and libraries will already be available to you if you set up your Python installation with Anaconda or Miniconda.  If a package isn't installed, however, you can usually install it yourself in one of the following ways, using either `pip` or `conda`.  `pip` installs Python packages, whereas `conda` can install packages across many languages into its own environment.

`%pip install package_name` where `package_name` is the name of the package you want to install. 

or

`%conda install package_name`

When run from a notebook, these commands install the package into the Python environment the notebook is running in, which in most cases will be the base Python on your system.  We'll talk about the benefits of creating virtual environments at another time, but for now this should get you up and running.

## Introduction to NumPy

One of the most commonly used packages in Python is [NumPy](https://numpy.org/), which adds modules for numerical computing tasks, including operating on multidimensional arrays of data, linear algebra, generating random numbers, and many, many more.

In Python, we have to import packages like NumPy into our Python workspace in order to use them.  To import NumPy, we can do the following:

In [1]:
import numpy as np

Here, we shorten the imported name to `np` for better readability of code using NumPy. This is a widely adopted convention that makes your code more readable for other Python users.  I recommend always using `import numpy as np`.  In theory, you could alternatively use a plain `import numpy` and call the NumPy functions as `numpy.function`, but the convention in Python has become to use `np.function`.

There are a lot of functions in the NumPy package.  You can see the documentation here: https://numpy.org/doc/stable/. When we introduce new functions in class, we'll discuss the options and input information necessary to use them successfully.  Here, we'll just look at a few examples of NumPy functions.

For instance, in the previous notebook we saw we could use base Python and simple notation to calculate the value of one number raised to a power of another.

In [2]:
2**2 # raise 2 to the power of 2

4

In base Python you can also use the `pow` function to do the same thing:

In [3]:
pow(2,2)

4

Finally, in NumPy we have a function that allows us to do the same thing.  We write the call to this function in NumPy in the following way:

In [None]:
np.power(2,2) # raise 2 to the power of 2 using NumPy

As you can see, and generally speaking, there are often multiple ways of doing the same thing across base Python and the associated packages, libraries, and additional modules.  Some might be faster or more intuitive than others, but all should (!) do the same thing.
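
As a quick check of that claim, the sketch below compares the three approaches side by side. The variable names are purely illustrative. One small difference worth noticing is that `np.power` hands back a NumPy number type rather than a plain Python `int`, even though the value is the same.

```python
import numpy as np

base_operator = 2 ** 2            # base Python exponent operator
base_function = pow(2, 2)         # base Python built-in function
numpy_function = np.power(2, 2)   # NumPy function

# All three give the same value
print(base_operator, base_function, numpy_function)

# ... but the NumPy result is a NumPy integer type, not a plain Python int
print(type(base_function), type(numpy_function))
```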

An important thing we can do is create (and operate on) **arrays** using NumPy functions.  An array from NumPy is similar to a list from base Python in some ways - both are written with square brackets, they are mutable, and you can index into their elements.  However, a list does not support elementwise mathematical operations (for a list, `+` joins two lists together and `*` repeats the list), whereas arrays _can_ be used for doing math.   Arrays are also generally faster to use than lists for numerical work, and it is easier to create multidimensional arrays.  Arrays need to be created using a specific command from `NumPy`, while lists can be simply formed with square brackets and an equals sign.  Lists can contain different data types (strings, numbers, etc.) and are often easier to modify as well.  Because arrays can be used for math, however, much of what we will do in this class will use arrays and other more complex data types or structures.
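
To make the difference concrete, here is a minimal sketch (with illustrative variable names) of what the `*` operator does to a list versus an array holding the same numbers:

```python
import numpy as np

my_list = [1, 2, 3]
my_arr = np.array([1, 2, 3])

doubled_list = my_list * 2   # repeats the list: [1, 2, 3, 1, 2, 3]
doubled_arr = my_arr * 2     # multiplies each element: [2 4 6]

print(doubled_list)
print(doubled_arr)
```

The list operation is not an error, which is exactly why it can be a silent source of bugs if you expected arithmetic.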

The combination of NumPy arrays and NumPy's ability to do linear algebra will form one of our core uses.  The NumPy array data type can be simply formed using the `np.array` function:

In [None]:
my_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
type(my_array) # you'll see that my_array is a NumPy ndarray data type

Let's go ahead and create another example array and look at some of NumPy's functionality.  In the code block below, we will use NumPy (`np`) to create a vector of the integers from 0 to 14 using `.arange`, then reshape (using `.reshape`) the resulting 15 numbers into a matrix with 3 rows and 5 columns. 

In [None]:
a = np.arange(0,15,1).reshape((3, 5)) # since start and step are optional in .arange, you could also write a = np.arange(15).reshape(3, 5)
a

Observe a few things about the commands above.  First, the object-oriented focus of Python allows us to string together several operations or methods in a row, instead of doing these on several lines.  Second, take a look at the function `.arange`.  This function takes an (optional) starting number, a stopping number, and an optional step size, and generates a set of evenly spaced values, separated by the given step, within that interval.  As we learned in the Introductory notebook, an odd behavior of `.arange` is that the interval does not include the stop value (!) (you can see this perhaps as 'starting with 0, give me 15 numbers', but it is still counter-intuitive to me).  See more about `.arange` here: https://numpy.org/doc/stable/reference/generated/numpy.arange.html.
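
The short sketch below (variable names are illustrative) shows the start, stop, and step arguments at work, and confirms that the stop value itself is left out:

```python
import numpy as np

first = np.arange(0, 15, 1)   # start=0, stop=15, step=1 -> 0, 1, ..., 14
second = np.arange(15)        # start and step omitted, same result
third = np.arange(2, 10, 3)   # 2, 5, 8 (the next value, 11, would pass the stop)

print(first)
print(second)
print(third)
```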

`.reshape` allows us to change the shape of a set of numbers to a given number of rows, columns, etc.  See here for more: https://numpy.org/doc/stable/reference/generated/numpy.reshape.html.  Notice that the shape of the new array (a matrix with 3 rows and 5 columns) is a _tuple_ indicated between parentheses (i.e. these values are related and immutable, since they together define the shape of the new array).

Once again, you can also see that the matrix that was created is a `numpy.ndarray` or ndarray, a data type created by NumPy:

In [None]:
type(a)

You can also ask for the `dtype` or data type of a variable (see more here: https://numpy.org/doc/stable/reference/arrays.dtypes.html):

In [None]:
a.dtype

On most systems this will show `int64`, indicating that the numbers in the ndarray `a` are 64-bit integers (the default integer size can differ depending on your platform and NumPy version).  The precision/format/dtype of the variable can sometimes determine what operations will be valid, so being able to see this can help you debug your code if you're getting unexpected errors.
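
As a small illustration of how the dtype can change depending on what you do, the sketch below (with illustrative variable names) converts an integer array to floating point with `.astype` and compares ordinary division with floor division:

```python
import numpy as np

ints = np.arange(6).reshape((2, 3))   # integer array: [[0 1 2], [3 4 5]]
floats = ints.astype(np.float64)      # same values, stored as 64-bit floats

halves = ints / 2          # true division always gives a float array
floor_halves = ints // 2   # floor division keeps the integer dtype

print(ints.dtype, floats.dtype)
print(halves.dtype, floor_halves.dtype)
print(floor_halves)
```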

Now, our array (or, since it is a 2 dimensional array, we can call it a matrix as well) also has other characteristics we can reveal using its attributes.  For instance, we can get the dimensions (how many rows, how many columns, etc.) of the matrix using `.shape`:

In [None]:
a.shape

Remember, as we discussed before, that information we get from the object `a` itself, as above, can often also be obtained by calling the corresponding NumPy function on `a` instead:

In [None]:
np.shape(a)

Other useful attributes of arrays include `.ndim` and `.size`:

In [None]:
print(a.ndim) # number of dimensions, in this case 2 for the row dimension and the column dimension
print(a.size) # the total number of elements

You can also add new dimensions to an existing array using `.newaxis` and `.expand_dims`:

In [None]:
print(a.shape) # a has 2 dimensions
a2 = a[np.newaxis, :] # add a new dimension to a as the leftmost dimension and assign this to variable a2
print(a2.shape) # a2 now has 3 dimensions

Alternatively, we could add a new dimension with `.expand_dims`:

In [None]:
print(a.shape) # a has 2 dimensions
a3 = np.expand_dims(a, axis=2) # add a new dimension to a as the rightmost dimension and assign this to variable a3
print(a3.shape) # a3 now has 3 dimensions
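
The two approaches are interchangeable: where you place `np.newaxis` in the square brackets plays the same role as the `axis=` argument of `np.expand_dims`. A minimal sketch, using illustrative variable names:

```python
import numpy as np

a = np.arange(15).reshape((3, 5))

# New leftmost dimension: shape goes from (3, 5) to (1, 3, 5)
front_newaxis = a[np.newaxis, :, :]
front_expand = np.expand_dims(a, axis=0)

# New rightmost dimension: shape goes from (3, 5) to (3, 5, 1)
back_newaxis = a[:, :, np.newaxis]
back_expand = np.expand_dims(a, axis=2)

print(front_newaxis.shape, front_expand.shape)
print(back_newaxis.shape, back_expand.shape)
```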

There are also special types of arrays (or special matrices) we will want to create later in the class, including matrices of a certain size consisting entirely of ones:

In [None]:
z = np.ones((3, 4))
z

We can also generate a linear sequence of numbers of a certain size using `.linspace`:

In [None]:
c = np.linspace(0, 2, 9) # or 'generate 9 numbers from 0 to 2, inclusive'
c
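
It can help to contrast `.linspace` with `.arange`: with `.linspace` you choose how many values you want and the stop value is included, while with `.arange` you choose the step and the stop value is excluded. A short sketch with illustrative variable names:

```python
import numpy as np

c = np.linspace(0, 2, 9)     # 9 values from 0 to 2, including 2
step = c[1] - c[0]           # spacing works out to 0.25

d = np.arange(0, 2, 0.25)    # same spacing, but stops before 2 (8 values)

print(c)
print(step)
print(d)
```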

## Basic operations using arrays

Arithmetic operators applied to arrays apply _elementwise_. A new array is created and filled with the result:

In [None]:
a = np.array([20, 30, 40, 50]) # 4 numbers, [20, 30, 40, 50]
b = np.arange(4) # 4 numbers from 0 to 3, [0, 1, 2, 3]
c = a - b # element-wise subtraction e.g. 20-0, 30-1, 40-2, 50-3
print(c)

We can also apply comparisons to an array and get a Boolean array in return:

In [None]:
a < 35 # returns True, True, False, False
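
A Boolean array like this can itself be used inside square brackets to pick out only the elements where the comparison is `True`. The sketch below uses illustrative variable names (`mask`, `small`, `count`):

```python
import numpy as np

a = np.array([20, 30, 40, 50])

mask = a < 35        # array([ True,  True, False, False])
small = a[mask]      # keep only the elements where mask is True
count = mask.sum()   # True counts as 1, so this counts the matches

print(small)
print(count)
```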

For multiplication, and unlike MATLAB for instance, the simple product (multiplication) is also element-wise, as opposed to matrix multiplication (which we'll learn about soon).  This means that as long as two arrays are the same size, you can readily multiply (or add, or subtract) them.

In [None]:
A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])
A * B # elementwise, or 1*2=2, 1*0=0, 0*3=0, 1*4=4
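
For comparison only, since matrix multiplication is covered later, the sketch below puts the elementwise product next to the matrix product, which NumPy writes with the `@` operator. The variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B   # multiply entries in matching positions
matrix_prod = A @ B   # rows of A times columns of B

print(elementwise)
print(matrix_prod)
```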

Addition is elementwise as well:

In [None]:
A + B # elementwise as well - the numbers in the same position in each matrix are added

NumPy arrays can be operated on along any of their dimensions.  The default behavior is to treat them as a simple list of numbers, even if you have an array with multiple dimensions.  Let's see how this works:

In [None]:
d = np.array([[ 0,  1,  2,  3], [ 4,  5,  6,  7], [ 8,  9, 10, 11]])
d

We created a 3 row by 4 column matrix above, but if we apply the `.sum` method without specifying a dimension to take the sum over, the default behavior is to treat the matrix as just a list of numbers:

In [None]:
d.sum() # sum of all the individual elements in d, as if they were a long list and not a matrix!

So, if you want to operate on your ndarray as if it were a matrix, you can specify the dimension you want to operate along by adding the `axis=` option.  The axis specifies the dimension (0th, 1st, 2nd, etc.) along which to apply the operation.  The following figure may be useful in understanding how Python thinks about these dimensions:

![image.png](attachment:image.png)

Let's take the sum down the rows (so, as if adding columns of numbers in a spreadsheet) on our 2 dimensional matrix `d`:

In [None]:
d.sum(axis=0) # sum down the columns, so you end up with 4 values, this is axis=0

In [None]:
d.sum(axis=1) # sum along the rows, so you end up with 3 values corresponding to each row, this is axis=1
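
The same `axis=` option works for other summarizing methods too. The sketch below, with illustrative variable names, rebuilds the same 3 by 4 matrix and collects the results so you can compare the shapes of the outputs: operating along `axis=0` leaves one value per column, and `axis=1` leaves one value per row.

```python
import numpy as np

d = np.arange(12).reshape((3, 4))   # same values as the matrix d above

total = d.sum()               # every element: 66
col_sums = d.sum(axis=0)      # one value per column (4 values)
row_sums = d.sum(axis=1)      # one value per row (3 values)
col_max = d.max(axis=0)       # largest value in each column
row_means = d.mean(axis=1)    # average of each row

print(total)
print(col_sums, row_sums)
print(col_max, row_means)
```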

Similar to lists, with arrays you can index into the elements or slice them out.  Let's first create a new array to work with:

In [None]:
a = np.arange(10)**3 # create an array of numbers from 0 to 9 and raise to the 3rd power
a

As with lists and tuples, in arrays you can use square brackets after the array variable name to look at the individual elements.  Keeping in mind the effect of zero indexing, here is the value of the element in position 2:

In [None]:
a[2] # square brackets for indexing, will return the value in the '2' position: 0th, 1st, 2nd ...

In [None]:
a[2:5]  # note that the output of indexing or slicing an array is also an array


You can also use indexing to alter the values of the array, since arrays are mutable.  So for instance, the code below says 'starting with position zero and going to position 6, make every 2nd position have the value of 1000':

In [None]:
a[0:6:2] = 1000 # starting with position 0 and going to position 6 by steps of 2, change those entries to 1000
a # values in positions 0, 2, and 4 are now 1000 - note it didn't change position 6!

As we showed with lists, with indexing you can also leave out some of the elements and just have the ':' ... for instance, the lines of code below would give the same result as above:

In [None]:
a = np.arange(10)**3
a[:6:2] = 1000 # when the zero index position is implied (as it is here) you can omit it for the same result
a

There are various ways to manipulate the shape of an array.  Let's start with a new array that has 3 rows and 3 columns:

In [None]:
my_array = np.array([(0, 1, 2),(3, 4, 5),(6, 7, 8)])
print(my_array)


You can flatten the array -- change it from a 3x3 matrix to a single one-dimensional sequence of 9 values -- using `.ravel`:

In [None]:
my_array.ravel() # the () calls the method; the notebook displays the result because it is the last line in the cell

In [None]:
my_array.shape # note that this didn't change the dimensions of my_array permanently! 

A word about whether methods change the original variable or not:  As you can see above, using `.ravel` doesn't alter the shape of the original `my_array`.  We can differentiate between methods (or functions) that return a new array, which would need to be assigned to a new variable to be kept, and methods that alter variables 'in place'.   This behavior may be inconsistent across packages and methods, so be wary of this.
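
Sorting is a handy illustration of this distinction, since NumPy offers both flavors: the function `np.sort` returns a new sorted array, while the array method `.sort` rearranges the array in place and returns nothing. A minimal sketch with illustrative variable names:

```python
import numpy as np

values = np.array([3, 1, 2])

sorted_copy = np.sort(values)   # returns a new array
before = values.copy()          # values is still [3, 1, 2] at this point

result = values.sort()          # sorts values in place and returns None

print(sorted_copy)
print(before)
print(values, result)
```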

We previously saw an application of the `.reshape` command above.  Here, we can specify the new number of rows, columns, etc.

In [None]:
my_array.reshape(9,1) 

Finally -- and this will be important later in the course -- we can transpose the array, which interchanges the rows and the columns (so, the first row becomes the first column, the second row becomes the second column, etc).  This is accomplished in NumPy with `.T`:

In [None]:
my_array.T

In the case of all three of the methods above, the original array is not modified:

In [None]:
my_array.ravel().shape

In [None]:
my_array.reshape(9,1).shape

In [None]:
print(my_array.T)
my_array.T.shape

In [None]:
print(my_array)
my_array.shape

On the other hand, using the `.resize` method will change the shape of the original matrix:

In [None]:
my_array.resize(9,1)
print(my_array)

It is worth thinking very carefully about how reshaping or otherwise manipulating the order of the values in a data array works.  For instance, let's take a look at another array:

In [None]:
a = np.arange(1,13,1).reshape((3, 4)) # the integers 1 through 12 arranged in 3 rows and 4 columns
print(a)

In [None]:
a.reshape(2,6)

By default, NumPy accomplishes this reshaping by operating row-wise.  In other words, specifying that the array above should be reshaped from 3 rows and 4 columns into 2 rows and 6 columns causes the first two entries of the 2nd row (5 and 6) to be appended to the right-hand side of the first row; the remaining values in the 2nd row (7 and 8) then have all the values in the 3rd row appended to them to form the new second row.   This gives you a 2x6 matrix assembled by giving primacy to the rows.

Why am I stating the obvious? Because if you come from another programming background (as I do, with MATLAB), it might not be done this way.  Indeed, THIS IS NOT HOW MATLAB WOULD DO IT.  In MATLAB, reshaping our matrix to 2 rows and 6 columns would be accomplished by operating column-wise, not row-wise as NumPy does above.  In MATLAB, the first column of the 2x6 reshaped array would be 1 and 5 (that is, the first two entries of the original first column).  **This is really important and I urge you to always check that the way reshaping and matrix manipulation is working is the way you think it is.**

You can reshape in the FORTRAN or MATLAB way by adding an `order='F'` to the reshape command - compare the results below with the one above:

In [None]:
# recreate the same 3 row by 4 column array of the integers 1 through 12
a = np.arange(1, 13, 1).reshape((3, 4))
print(a)
a.reshape(2,6,order='F')
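
One way to see what the two orderings are doing is to flatten the array with each of them, since a reshape reads the values out in that order and then fills the new shape in the same order. The sketch below uses illustrative variable names; `order='C'` is NumPy's default row-wise ordering.

```python
import numpy as np

a = np.arange(1, 13, 1).reshape((3, 4))

read_rowwise = a.ravel(order='C')   # reads across each row in turn
read_colwise = a.ravel(order='F')   # reads down each column in turn

rowwise_2x6 = a.reshape(2, 6)              # default, same as order='C'
colwise_2x6 = a.reshape(2, 6, order='F')   # column-wise, as in MATLAB

print(read_rowwise)
print(read_colwise)
print(rowwise_2x6)
print(colwise_2x6)
```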

We can also stack or concatenate arrays (or lists!) in NumPy.  Let's use the following example:

In [None]:
# we'll use .asarray here to go from 2 dimensional lists to 2-dimensional arrays, 
# BUT the stacking methods work on lists as well and return arrays
a = [(1,2),
     (3,4)]

a = np.asarray(a) # turn a from a list to an array

b = [(5,6),
     (7,8)]

b = np.asarray(b)

We can use the following commands to stack these arrays vertically or horizontally:

In [None]:
np.vstack((a,b)) # note the double parentheses

In [None]:
np.hstack((a,b))

Alternatively, you can use `.concatenate` and specify the axis (dimension) along which to stack:

In [None]:
np.concatenate((a, b), axis=0) # stack vertically, or 'down rows'


In [None]:
np.concatenate((a, b), axis=1) # stack horizontally, or 'across columns'
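
As a quick check that the two approaches agree, the sketch below (illustrative variable names) compares the stacking functions with `.concatenate`, and also passes plain lists directly to show that an array comes back either way:

```python
import numpy as np

a = np.asarray([(1, 2), (3, 4)])
b = np.asarray([(5, 6), (7, 8)])

tall = np.vstack((a, b))   # shape (4, 2), same as concatenate with axis=0
wide = np.hstack((a, b))   # shape (2, 4), same as concatenate with axis=1

same_tall = np.array_equal(tall, np.concatenate((a, b), axis=0))
same_wide = np.array_equal(wide, np.concatenate((a, b), axis=1))

from_lists = np.vstack(([1, 2], [3, 4]))   # lists in, array out

print(tall.shape, wide.shape)
print(same_tall, same_wide)
print(type(from_lists))
```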

You can also split arrays into smaller pieces of arrays using `hsplit`, `vsplit`, and/or `split`:

In [None]:
c = np.concatenate((a, b), axis=1) # creates a 2x4 matrix
print(c)
np.hsplit(c,2) # splits c into 2 arrays along the horizontal dimension

In [None]:
np.vsplit(c,2) # splits c into 2 arrays, each 1x4, along the vertical direction
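
The more general `np.split` takes an `axis=` option like `.concatenate` does, and can also split at specific positions rather than into equal pieces. A small sketch, with illustrative variable names:

```python
import numpy as np

c = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8]])   # the same 2x4 matrix as c above

# Two equal pieces along the columns, equivalent to np.hsplit(c, 2)
left, right = np.split(c, 2, axis=1)

# Split before column position 1, giving a 2x1 piece and a 2x3 piece
first_col, rest = np.split(c, [1], axis=1)

print(left)
print(right)
print(first_col.shape, rest.shape)
```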

There may be other instances where you need to reorder or flip an array.  NumPy has methods for this as well:

In [None]:
my_array = np.array([1, 2, 3, 4, 5, 6, 7, 8])
np.flip(my_array)


If you have a 2 dimensional array, you can specify the dimension along which to flip:

In [None]:
a = [(1,2),
     (3,4)]

a = np.asarray(a)

print(np.flip(a,axis=0)) # row-wise flip, or 'up-down'
print(np.flip(a,axis=1)) # column-wise flip, or 'left-right'
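
Flipping can also be written with the slicing notation from earlier, using a step of -1. The sketch below (illustrative variable names) checks that the two forms match, and shows that leaving out `axis=` flips along every dimension at once:

```python
import numpy as np

a = np.asarray([(1, 2), (3, 4)])

up_down = np.flip(a, axis=0)      # [[3, 4], [1, 2]]
left_right = np.flip(a, axis=1)   # [[2, 1], [4, 3]]
both = np.flip(a)                 # [[4, 3], [2, 1]]

same_up_down = np.array_equal(up_down, a[::-1])
same_left_right = np.array_equal(left_right, a[:, ::-1])

print(up_down)
print(left_right)
print(both)
print(same_up_down, same_left_right)
```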
