# ![](http://www.numpy.org/_static/numpy_logo.png) Introduction to NumPy
##### NumPy supports arrays, which are very useful for numerical computations
* Arrays are N-dimensional: 1D (vector), 2D (matrix), ..., ND
* Many packages use numpy arrays to store data


* Arrays can be used to make calculations in one command, without loops or list comprehensions.
  * This is known as *vectorisation*


* Using vectorisation will make your code faster, and easier to read.
  * Arrays are (generally) faster than lists.
  
See more in this guide: https://github.com/rougier/from-python-to-numpy

### What is vectorisation?

Say you want to calculate an array of values `array2`, based on your original data in `array1`. 

Vectorised example:
```python
array2 = array1 * k + c
```

Non-vectorised example, requires a loop:
```python
for i in range(len(array1)):
    array2[i] = array1[i] * k + c   
```

Without vectorisation, multi-dimensional arrays would need multiple nested loops.
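
As a sketch of that last point, the snippet below applies the same linear transformation to a small 2D array, once with nested loops and once as a single vectorised expression. The names `data`, `k` and `c`, and their values, are made up for this illustration.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])   # illustrative 2 x 3 array
k, c = 2.0, 1.0                      # illustrative constants

# Non-vectorised: one loop per dimension
looped = np.empty_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        looped[i, j] = data[i, j] * k + c

# Vectorised: the whole array in one expression
vectorised = data * k + c

print(np.array_equal(looped, vectorised))
```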

### Do we still need lists?
 
* Lists can have different objects as elements. Arrays are homogeneous.
```python
example_list = [number, string, cat, dog]
example_array = [cat1, cat2, cat3]
```
* Lists can be nested 
```python
nested_list = [[1, 2], ['a', 'b', 'qwerty'], [1]]
```
Arrays can also be nested but it negates some of the advantages of n-dimensional arrays
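
To illustrate what "homogeneous" means in practice, the sketch below builds arrays from lists with mixed contents and inspects the resulting data type. The variable names are illustrative; the point is that NumPy picks one common type for all elements.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])    # ints are upcast to float
mixed_types = np.array([1, 'a', 2.5])    # everything becomes a string

print(ints.dtype.kind)           # 'i' (integer)
print(mixed_numbers.dtype)       # float64
print(mixed_types.dtype.kind)    # 'U' (unicode string)
```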

## Let's get started ...

In [None]:
import numpy as np

### Looking for help?

* Documentation: http://docs.scipy.org/doc/numpy/reference/
* Use the `help` function (remember tab will show the options available)
```python
    help(np.mean)
```
* Interactive help: NumPy has a built-in search function, `np.lookfor` (removed in NumPy 2.0; on newer versions, search the online documentation instead)

In [None]:
np.lookfor('weighted average')

### Creating an array from a list

In [None]:
a1d = np.array([3, 4, 5, 6])
a1d

In [None]:
a2d = np.array([[10., 20,  30], [9, 8, 5]])
a2d

In [None]:
print( type( a1d[0] ) )
print( type( a2d[0,0] ) )

In [None]:
type(a1d)

The **core class** of NumPy is the `ndarray` (homogeneous n-dimensional array).

To find methods or attributes:

```
a1d.   ->tab
```
More on this below.

### Common mistakes

Say we forget to use the square brackets:

In [None]:
try:
    a = np.array(1,2,3,4)   # WRONG: the values must be passed as one sequence
except (TypeError, ValueError) as err:   # the exception type depends on the NumPy version
    print(type(err).__name__ + ': ', err)

# help(np.array)

In [None]:
a = np.array([1,2,3,4]) # RIGHT
print(a)

Arrays can be created using `np.ndarray`, but this works differently to `np.array`. Here the list in square brackets specifies the dimensions of the array, not its values.

In [None]:
np.ndarray([1,2,3,4]) #  This is not the recommended way to create arrays

Result: an "empty" 4-D array.

This is not the recommended method for creating arrays. `ndarray` is a class, with `array` the recommended function. 
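
A minimal sketch of the difference, using illustrative variable names: the same four numbers are interpreted as values by `np.array` and as a shape by `np.ndarray`.

```python
import numpy as np

from_values = np.array([1, 2, 3, 4])      # the list holds the *values*
from_shape = np.ndarray((1, 2, 3, 4))     # the tuple holds the *shape*

print(from_values.shape)   # (4,)
print(from_shape.shape)    # (1, 2, 3, 4)
print(from_shape.size)     # 24 uninitialised values
```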

Below are several other ways of creating arrays. 

### Functions for creating arrays

#### ``np.arange([start,] stop[, step,], dtype=None)``

evenly spaced, *defined by step*

In [None]:
np.arange(1, 9, 2)

In [None]:
# for integers, np.arange gives the same values as range, but returns an array instead of a range object
np.array( range(1,9,2) )

#### ``np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)``

evenly spaced, *defined by length*

In [None]:
np.linspace(0, 1, 10)   # start, end, num-points

Note that for ``np.linspace`` the upper value is included within the array. 
This is different to the result for ``range`` or ``np.arange``.  
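
The difference can be checked directly. The sketch below uses arbitrary illustrative limits and also shows two optional `np.linspace` arguments from the signature above: `endpoint=False` drops the upper value, and `retstep=True` returns the spacing as well.

```python
import numpy as np

inclusive = np.linspace(0, 10, 6)                  # 0, 2, ..., 10
exclusive = np.linspace(0, 10, 5, endpoint=False)  # 0, 2, ..., 8
stepped = np.arange(0, 10, 2)                      # 0, 2, ..., 8

# retstep=True also returns the spacing that linspace worked out
values, step = np.linspace(0, 1, 11, retstep=True)

print(inclusive)
print(exclusive)
print(stepped)
print(step)
```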

---

## Exercise 1

In [None]:
# a) Create array with units of seconds through the day, from 00:00 to 24:00, inclusive 
# b) Change the length of your array, to have either 1 second, 1 minute, or 1 hour intervals.
# (enter your code below)



---

### More functions for creating arrays


#### Empty array

In [None]:
np.empty((2,2))

As seen in the example with `ndarray` above, the array is not truly empty. The result is an array of *uninitialised* (arbitrary) values. 

This method should only be used if all values will be assigned at a later stage, i.e. use with caution. 

####  Array filled with zeros

In [None]:
np.zeros((2, 2))

By default, the dtype of the created array is float64 but other dtypes can be used:

In [None]:
np.zeros((2, 2), dtype=int)

#### Filled with ones

In [None]:
np.ones((2, 3))

#### Filled with random numbers

In [None]:
np.random.rand(4)       # uniform in [0, 1), i.e. 1 is excluded

In [None]:
np.random.normal(0, 10, size=4)      # Gaussian (mean, std dev, size/num samples)

In [None]:
np.random.gamma(1, 1, (2,2))      # Gamma (shape of distribution, scale, size/num samples)

As you can see, you can get multiple outputs in whatever distribution you'd like. 

The keyword for 'size' is optional. Note that in the third example, size is given as a tuple.
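
The functions above use NumPy's legacy global random state. Newer code often draws from a `Generator` object created with `np.random.default_rng` instead. The sketch below repeats the three draws with a generator; the seed value 42 is an arbitrary choice made here so that the numbers are reproducible.

```python
import numpy as np

rng = np.random.default_rng(42)     # seed is arbitrary, for reproducibility

uniform = rng.random(4)                 # uniform in [0, 1)
gaussian = rng.normal(0, 10, size=4)    # mean, std dev, number of samples
gamma = rng.gamma(1, 1, size=(2, 2))    # shape parameter, scale, output shape

print(uniform)
print(gaussian)
print(gamma)
```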

### Grid generation

* A common task is to generate a pair of 2D (or ND) arrays that represent data coordinates. 
  * Useful for interpolation or for mapping contours.

* When orthogonal 1D coordinate arrays already exist, NumPy's `meshgrid` function is very useful:

In [None]:
x = np.linspace(-5, 5, 3)
y = np.linspace(10, 40, 4)
print(x)
print(y)

In [None]:
x2d, y2d = np.meshgrid(x, y)
print(x2d)
print(y2d)
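
One reason these 2D coordinate arrays are useful is that a function of both coordinates can then be evaluated at every grid point in a single expression. The sketch below rebuilds the same grid and evaluates an arbitrary illustrative function, `z = x**2 + y`.

```python
import numpy as np

x = np.linspace(-5, 5, 3)      # 3 points along x
y = np.linspace(10, 40, 4)     # 4 points along y
x2d, y2d = np.meshgrid(x, y)

# Both grids have one row per y value and one column per x value
print(x2d.shape, y2d.shape)    # (4, 3) (4, 3)

# Evaluate an illustrative function at every grid point in one expression
z = x2d**2 + y2d
print(z)
```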

### Transpose arrays 
This can be very useful when dealing with grids; there are several ways:

In [None]:
print(y2d,'\n')
print(np.transpose(y2d),'\n') # using a numpy function
print(y2d.transpose(),'\n')   # using a method of y2d
print(y2d.T)                  # using a property of y2d (i.e. a specific version of general methods above)

---

---

## Exploring arrays: Array indexing

* Indices begin at 0, like other Python sequences and C/C++. 
  * Note that many languages, such as Matlab, R and Fortran, start with 1
  
* In 2D, the first dimension corresponds to rows, the second to columns. By default the values are stored row by row in memory, which is known as [row-major order](https://numpy.org/doc/stable/glossary.html#term-row-major). 

* For multi-dimensional arrays, the order of axes in python follows C-style indexing. 
    * *The fastest varying dimension is the last dimension.* 

As a simple example, consider an array with 2 rows and 4 columns. In a row-major (or C-style) index system, values will be read in the order (from 0-7): 
```
| 0 | 1 | 2 | 3 |
| 4 | 5 | 6 | 7 | 
```
In terms of nested loops (which we should avoid if possible in our code!), this array is being accessed using: 
```
for row in rows: 
  for column in columns: 
    ...
```    
This has an impact on access times, e.g. it is generally quicker to step through the items along the last axis (all the columns of one row), because they sit next to each other in memory. 

A nice explanation of this can be found [here](https://agilescientific.com/blog/2018/12/28/what-is-the-fastest-axis-of-an-array).
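
The 2 x 4 example above can be inspected directly. In this sketch the dtype is fixed to `int64` (a choice made here so that the byte counts are the same on every platform); `strides` reports how many bytes NumPy steps over to move by one index along each axis.

```python
import numpy as np

grid = np.arange(8, dtype=np.int64).reshape(2, 4)
print(grid)

# Flattening follows memory order: the last axis varies fastest
print(grid.ravel())                 # [0 1 2 3 4 5 6 7]

# Bytes to step along each axis: a whole row (4 x 8 bytes) vs. one element
print(grid.strides)                 # (32, 8)

print(grid.flags['C_CONTIGUOUS'])   # True
```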

### 1D Examples

For 1D arrays, indexing is exactly the same as discussed for lists [here](https://github.com/ueapy/pythoncourse2020-materials/blob/master/notebooks/07-Built-in-Data-Structures.ipynb)

In [None]:
a = np.arange(10, 100, 10)
a

In [None]:
a[0]

In [None]:
a[2:9:3] # [start:end:step]

Notice that the 'end' number, 9, actually lies out of bounds (try `a[9]` and see that it gives an error).  
The end index is never included.

In [None]:
a[:3] # last is not included

In [None]:
a[-2] # negative index counts from the end

Here we started counting 'down' from -1 (and not -0!).

### Indexing in practice: How to calculate x[ i ] - x[ i-1 ] without a loop?

In [None]:
x = np.random.rand(6) # create an array of random numbers (0-1)
x = np.sort(x)        # sort them in order of ascending value
print(x)

In [None]:
x[1:] - x[:-1]

In [None]:
# Note there is actually a function in NumPy that can do this for us
np.diff(x) 

---

## Exercise 2

Create a 2D NumPy array from the following list and assign it to the variable "a":

In [None]:
# [[2, 3.2, 5.5, -6.4, -2.2, 2.4],
#  [1, 22, 4, 0.1, 5.3, -9],
#  [3, 1, 2.1, 21, 1.1, -2]]

a) Can you guess what the following slices are equal to? Print them to check your understanding.

In [None]:
# a[:, 3]

In [None]:
# a[1:4, 0:4]

In [None]:
# a[1:, 2]

b) How would you extract: i) the last column; ii) the row before last?

In [None]:
# a[]

In [None]:
# a[]

---

### Fancy indexing

NumPy arrays can be indexed with slices, but also with boolean arrays (masks)
or integer arrays.

In [None]:
a = np.random.randint(1, 100, 6) # array of 6 random integers from 1 to 99 (the upper bound is excluded)
a

First, an example with an array of boolean values:

In [None]:
mask = ( a % 3 == 0 ) # Where divisible by 3 (% is the modulus operator).
mask

In [None]:
a[mask]

Now an example with an array of integers:

In [None]:
b = np.array([0,3,5])
a[b]
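
Because the array above is random, the output changes on every run. The sketch below repeats both kinds of fancy indexing on a fixed, illustrative array, and also shows that a mask can be used to assign values.

```python
import numpy as np

values = np.array([3, 7, 12, 20, 9, 4])   # fixed values so the result is repeatable

mask = (values % 3 == 0)
print(values[mask])            # [ 3 12  9]
print(values[[0, 3, 5]])       # [ 3 20  4]

# A mask can also appear on the left-hand side of an assignment
values[mask] = 0
print(values)                  # [ 0  7  0 20  0  4]
```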

---
---

## Array attributes

In [None]:
a2d = np.array([[10., 20,  30], [9, 8, 5]])
type(a2d)

#### ndarray.ndim
the number of dimensions (axes) of the array. In older NumPy documentation, the number of dimensions is also referred to as the rank.

In [None]:
a2d.ndim

#### ndarray.shape
the dimensions of the array

In [None]:
a2d.shape

This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m).  
The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

In [None]:
# Let's use those values

NLines,NCols = a2d.shape
print('NLines:', NLines,'NCols:',NCols)

#### ndarray.size
the total number of elements in the array

In [None]:
a2d.size

Note that for an ND array `size` is not equal to `len()`. The latter returns the length of just the *first* dimension.

In [None]:
len(a2d)

#### ndarray.dtype

type of data within the array

In [None]:
a2d.dtype
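
The attributes above can be compared side by side on an array with more than two dimensions. This is a small sketch with an arbitrary 3D shape; `astype` is included to show one way of obtaining a different dtype.

```python
import numpy as np

cube = np.zeros((2, 3, 4))     # illustrative 3D array

print(cube.ndim)     # 3
print(cube.shape)    # (2, 3, 4)
print(cube.size)     # 24
print(len(cube))     # 2, the length of the first axis only
print(cube.dtype)    # float64

# astype returns a converted copy; the original keeps its dtype
as_int = cube.astype(int)
print(as_int.dtype.kind)   # 'i'
```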

---

## Copies and Views (a warning)

In [None]:
original = np.array([99,98,97])

other = original
other[0] = 0

What do we think has happened to the two arrays? 

In [None]:
print('other is now ',other)
print('original is now ',original)

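Plain assignment (`other = original`) does not copy any data: both names refer to the same array. NumPy is similarly frugal elsewhere and will create a *view* by default, unless told to make a copy.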

In [None]:
original = np.array([99,98,97])
copy = original.copy()
copy[0] = 0

print('copy is now ',copy)
print('original is now ',original)

Be aware that views are also used when you create slices of arrays ... 

In [None]:
original = np.ones((4,3))

column = original[:,0]   # first column, i.e. the first item of every row
column[2:] = 10 

print('column is now ',column)
print('original is now \n',original)

... but not when you use "fancy indexing".  

In [None]:
original = np.ones((4,3))

fancy = original[0,[0,1,2]]
fancy[:] = 10 

print('fancy is now ',fancy)
print('original is now \n',original)

**To avoid this leading to errors propagating through your data, be sure to check whether a copy is needed.** 
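
One way to check, sketched below with illustrative variable names, is `np.shares_memory`, which reports whether two arrays overlap in memory. The `base` attribute of a view points back to the array that owns the data.

```python
import numpy as np

original = np.ones((4, 3))

alias = original                   # plain assignment: same object
sliced = original[:, 0]            # basic slicing: a view
fancy = original[0, [0, 1, 2]]     # fancy indexing: a copy
copied = original.copy()           # explicit copy

print(alias is original)                     # True
print(np.shares_memory(original, sliced))    # True
print(np.shares_memory(original, fancy))     # False
print(np.shares_memory(original, copied))    # False
print(sliced.base is original)               # True
```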

### Copies in functions vs. methods


From help(numpy):

```
Most of the functions in `numpy` return a copy of the array argument
(e.g., `np.sort`).  In-place versions of these functions are often
available as array methods, i.e. ``x = np.array([1,2,3]); x.sort()``.
Exceptions to this rule are documented.
```

In [None]:
original = np.array([99,98,97])

# Function np.sort
sortedCopy = np.sort(original)
print('original:',original,'returned:',sortedCopy)

# Method sort()
original.sort()
print('original:', original)


No new variable (or copy) is created with the method: `.sort()` acts in place on the array it is attached to.  
So using `.sort()` gives the same end result as: 
```
original = np.sort(original)
```

---

## NumPy Statistics 

NumPy has a large number of useful methods and functions, enabling you to perform statistical operations on arrays. 

A full list can be found in the [NumPy documentation](https://numpy.org/doc/stable/reference/index.html), but a selection of useful methods and functions are provided below. 

### Statistical methods of arrays

In [None]:
a1d=np.random.normal(0,10,5) 
print('array a1d                       :', a1d)
print('Minimum and maximum             :', a1d.min(), a1d.max())
print('Index of minimum and maximum    :', a1d.argmin(), a1d.argmax())
print('Sum and product of all elements :', a1d.sum(), a1d.prod())
print('Mean and standard deviation     :', a1d.mean(), a1d.std())


A full list of available methods can be found [here](https://numpy.org/doc/1.18/reference/generated/numpy.ndarray.html).

### Statistical functions

https://numpy.org/doc/stable/reference/routines.statistics.html    

In [None]:
print('Median and percentile           :', np.median(a1d), np.percentile(a1d,75))

### Operations over a given axis

In [None]:
print(a2d)
print('sum  :',a2d.sum())
print('sum  :',a2d.sum(axis=0))
print('sum  :',a2d.sum(axis=1))
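
The `axis` argument names the dimension that is collapsed by the operation. The sketch below repeats the sums on the same `a2d` values with the expected results written out, and adds the optional `keepdims` argument, which keeps the collapsed axis with length 1.

```python
import numpy as np

a2d = np.array([[10., 20, 30], [9, 8, 5]])

print(a2d.sum(axis=0))   # [19. 28. 35.]  one value per column
print(a2d.sum(axis=1))   # [60. 22.]      one value per row

# keepdims=True keeps the collapsed axis with length 1
print(a2d.sum(axis=1, keepdims=True).shape)   # (2, 1)
```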

### What about NaN values?

A series of functions is available that exclude NaN values (missing numbers) from your calculations.

In [None]:
a1d[2] = np.nan
print(a1d)

In [None]:
print('mean  :',np.mean(a1d))
print('nanmean  :',np.nanmean(a1d))
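
Since `a1d` is random, here is the same idea on a small fixed array (illustrative values). The sketch also uses `np.isnan` to locate the missing values and to select only the valid ones.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])   # illustrative data with one missing value

print(np.isnan(data))          # [False  True False]
print(np.mean(data))           # nan
print(np.nanmean(data))        # 2.0
print(np.nansum(data))         # 4.0
print(data[~np.isnan(data)])   # [1. 3.]
```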

### Vectorisation: operations on whole arrays

In [None]:
a=np.random.rand(4)
print(a)

result = np.exp(a/100.)/a

print(result)

In [None]:
# Non-vectorised
result=np.zeros(a.shape)   # create an array to hold the results

for i in range(a.size): 
    result[i] = np.exp(a[i]/100.)/a[i]
    
print(result)

Vectorisation is generally faster than using `for` loops. 

However, for more complicated algorithms it might not always be possible, or the most readable option.
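
A rough way to see the speed difference is to time both versions with the standard-library `timeit` module. This is only a sketch: the array length, the 0.1 offset and the number of repetitions are arbitrary choices made here, and the timings will vary from machine to machine.

```python
import timeit
import numpy as np

a = np.random.rand(10_000) + 0.1    # offset avoids values very close to zero

def vectorised(arr):
    return np.exp(arr / 100.) / arr

def looped(arr):
    result = np.zeros(arr.shape)
    for i in range(arr.size):
        result[i] = np.exp(arr[i] / 100.) / arr[i]
    return result

# Both approaches give the same numbers ...
print(np.allclose(vectorised(a), looped(a)))

# ... but take different amounts of time (values depend on your machine)
print('vectorised:', timeit.timeit(lambda: vectorised(a), number=10))
print('looped    :', timeit.timeit(lambda: looped(a), number=10))
```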


---

## Exercise 3

Consider a 4 x 5 2D array of negative integers:

In [None]:
a = np.arange(-100, 0, 5).reshape(4, 5)
a

Suppose you want to return an array `result`, which has the squared value when an element in array `a` is greater than `-90` and less than `-40`, and is 1 otherwise.

Using a `for` loop, the result would look like this:

In [None]:
result = np.zeros(a.shape, dtype=a.dtype)    # pre-allocate the result array

for i in range(a.shape[0]):                  # loop over rows
    for j in range(a.shape[1]):              # loop over columns
        if a[i, j] > -90 and a[i, j] < -40:  # only square the number if within the chosen limits
            result[i, j] = a[i, j]**2
        else:                                # set to 1 otherwise
            result[i, j] = 1
            
result

**Can you write a vectorised solution?**

Hint: use np.logical_and() to create a condition for indexing (information on all NumPy's logic functions can be found [here](https://numpy.org/doc/stable/reference/routines.logic.html)). 


In [None]:
# Your code here



---
---

## Masked arrays - how to handle (propagating) missing values

![](../figures/masked_array.png)

All operations related to masked arrays live in the `numpy.ma` submodule.

The simplest example of manual creation of a masked array:

In [None]:
a = np.ma.masked_array(data=[1, 2, 3],
                       mask=[True, True, False],
                       fill_value=-999)
a

Often, the task is to mask an array depending on a criterion.

In [None]:
a = np.linspace(1, 15, 15)

In [None]:
masked_a = np.ma.masked_greater_equal(a, 11)

In [None]:
masked_a

Note: In a masked array, unlike setting values to NaN, the raw data still exists, should you need it. 

In [None]:
masked_a.data
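
Masked values are skipped by the array's own methods, which is the practical benefit of masking. The sketch below rebuilds the same `masked_a` and compares a few results; the fill value of -999 is an arbitrary placeholder chosen for this example.

```python
import numpy as np

a = np.linspace(1, 15, 15)
masked_a = np.ma.masked_greater_equal(a, 11)

print(masked_a.count())        # 10 unmasked values
print(masked_a.mean())         # 5.5, the mean of 1..10
print(a.mean())                # 8.0, the mean of 1..15
print(masked_a.filled(-999))   # masked entries replaced by -999
```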

---

## Exercise 4

1. Create a "data" array of evenly spaced numbers, in the interval (-10, 20) spaced by 0.5
2. Calculate the (natural) logarithm of the data
3. Create a condition i.e. a True/False (boolean) array, that you can use to mask these results
    - The resulting array should be masked when either of the following conditions apply
        - larger than or equal to 10
        - larger than -1 and smaller than 1 
        - data is not a real number
4. Mask the array depending on these conditions


In [None]:
# Your code:
# 1. Hint: use `np.linspace` or `np.arange` functions


In [None]:
# 2.

In [None]:
# 3. Hint: use np.isfinite


In [None]:
# 4. Hint: use np.ma.masked_where(condition,arr)


---
---

## Shape manipulation

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
print('array: \n {}'.format(a))
print('shape: {}'.format(a.shape))

In [None]:
a.flatten()

In [None]:
a.repeat(4)

In [None]:
a.reshape((3, 2)) # NB. new shape must be consistent with number of values. 

In [None]:
print('Old shape: {}'.format(a.shape))
print('New shape: {}'.format(a.reshape((3, 2)).shape))
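
A few further details of these methods, as a sketch on the same array: `-1` lets NumPy infer one dimension, an inconsistent shape raises a `ValueError`, and `ravel` is a close relative of `flatten` that avoids copying when it can.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# -1 asks NumPy to work out that dimension from the number of values
print(a.reshape((-1, 2)).shape)    # (3, 2)

# A shape that does not match the number of values raises an error
try:
    a.reshape((4, 2))
except ValueError as err:
    print('ValueError:', err)

# flatten always returns a copy; ravel returns a view when it can
print(np.shares_memory(a, a.flatten()))   # False
print(np.shares_memory(a, a.ravel()))     # True
```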

---
## Exercise 5

Generate a 5x5 2D array. The first value is 0 and it grows left to right and top to bottom in increments of 0.1.

In [None]:
# Your code here


---
## Further axes manipulation


We saw earlier that it is possible to *transpose* an array i.e. flip the order of axes. 

The `transpose` function can also be used more generally to change the order of axes within an array. 


In [None]:
a3d = np.ones((4,3,2))
print(a3d.shape)

new_order = np.transpose(a3d,[1, 2, 0]) # numbers in list refer to axes
print(new_order.shape)

#### Dimensions can also be added

In [None]:
a3d[..., np.newaxis].shape
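
The position of `np.newaxis` in the index decides where the new axis goes. The sketch below shows a few placements on the same `a3d`, together with `np.expand_dims` (which does the same job through a function call) and `np.squeeze` (which removes axes of length 1).

```python
import numpy as np

a3d = np.ones((4, 3, 2))

print(a3d[..., np.newaxis].shape)               # (4, 3, 2, 1)
print(a3d[np.newaxis, ...].shape)               # (1, 4, 3, 2)
print(a3d[:, np.newaxis].shape)                 # (4, 1, 3, 2)
print(np.expand_dims(a3d, axis=1).shape)        # (4, 1, 3, 2)
print(np.squeeze(a3d[..., np.newaxis]).shape)   # (4, 3, 2)
```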

---

## Broadcasting

The fact that NumPy operates on an element-wise basis means that in principle arrays must always match one another's shape. However, NumPy will also helpfully "broadcast" dimensions when possible. 

**The Broadcasting Rule**  
In order to broadcast, the sizes of the trailing axes of the two arrays in an operation must either be equal, or one of them must be 1.

In [None]:
# Example

a1d = np.arange(0,5)
a2d = np.zeros((5,3))
print(a1d)
print(a2d)

# try:
# a1d + a2d

# add a trailing axis: 
# a1d[..., np.newaxis] + a2d
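
The two commented-out lines can be run safely as follows. This is a sketch of the same example, with the failing addition wrapped in `try`/`except` so that the error message is printed instead of stopping the cell; `column` and `total` are names introduced here.

```python
import numpy as np

a1d = np.arange(0, 5)       # shape (5,)
a2d = np.zeros((5, 3))      # shape (5, 3)

# Trailing axes are 5 and 3: neither equal nor 1, so this fails
try:
    a1d + a2d
except ValueError as err:
    print('ValueError:', err)

# With a trailing axis of length 1, shapes (5, 1) and (5, 3) are compatible
column = a1d[..., np.newaxis]
total = column + a2d
print(total.shape)          # (5, 3)
print(total)
```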

## References
* [NumPy docs](https://numpy.org/doc/stable/)
* [SciPy lectures](http://www.scipy-lectures.org/)

If you're moving from Matlab to Python, this link could be particularly useful: 
* [NumPy for Matlab users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html)
   * Contains a reference table of Matlab-NumPy equivalents.