# Python for Spatial Analysis
## Second part of the module of GG3209 Spatial Analysis with GIS.
### Notebook to learn and practice NumPy

---
Dr Fernando Benitez -  University of St Andrews - School of Geography and Sustainable Development - First Iteration 2023 v.1.0 

### Introduction 

In previous weeks you practised and learned the basic Python skills: **functions**, **flow control**, **loops**, **modules**, and **reading/writing** data from disk. This notebook introduces the NumPy library, the foundation of most geospatial libraries, so it is important to understand how it works.

**NumPy** allows for the efficient analysis and processing of data arrays with varying sizes, shapes, and number of dimensions while **Pandas** allows for reading in and working with data tables. In this section, we will focus on NumPy. 

After working through this notebook you will be able to:

### Content

1. Describe and use NumPy data types.  
2. Describe the data type, size, shape, and number of dimensions in an array. 
3. Create, reshape, and slice **NumPy arrays**.
4. Perform numeric and comparison operations on arrays.

Please go through every cell, read all the descriptions carefully, and run each code cell to see the examples.


# NumPy

The **NumPy** and **Pandas** libraries are central to data science in Python.

## Creating NumPy Arrays

The **NumPy** library allows for creating and working with **arrays**. It is very fast and memory efficient. As mentioned, arrays are similar to **lists** in that they store a series of values or elements. However, arrays can be expanded to include many dimensions.

For example, an image could be represented as an array with 3 dimensions: height, width, and channels. In short, array-based calculations and manipulations are essential to data science, so **NumPy** is an important library to learn if you work in the Python environment. 

The complete documentation for NumPy can be found [here](https://numpy.org/).

Before you can use NumPy, make sure it is installed in your Anaconda environment; our **py4sa environment** already includes this library. Once NumPy is installed, you need to import it to use it in your code. It is common to give NumPy the alias "np" to simplify your code.

In [18]:
import numpy as np

**Lists** can be converted to NumPy arrays using the *array()* function. Once the list is converted to an array, its type is *numpy.ndarray*, which identifies it as a NumPy array. Since this array has only one dimension, it is called a **vector**.

In [2]:
lst1 = [3, 6, 7, 8, 9]
arr1 = np.array(lst1)
print(type(lst1))
print(type(arr1))
print(arr1)

<class 'list'>
<class 'numpy.ndarray'>
[3 6 7 8 9]
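
One practical reason for converting a list to an array is that arithmetic behaves differently. The short sketch below uses illustrative variable names that do not appear elsewhere in this notebook: multiplying a list repeats it, while multiplying an array scales every element.

```python
import numpy as np

lst = [3, 6, 7, 8, 9]   # plain Python list
arr = np.array(lst)     # the same values as a NumPy vector

# Multiplying a list repeats it; multiplying an array scales each element.
doubled_list = lst * 2
doubled_arr = arr * 2

print(doubled_list)  # [3, 6, 7, 8, 9, 3, 6, 7, 8, 9]
print(doubled_arr)   # [ 6 12 14 16 18]
```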


A two-dimensional array is known as a **matrix**. In the example below, you are creating a matrix from a list of lists.

In [3]:
lst2 = [[3, 6, 7, 8, 9], [3, 6, 7, 8, 9], [3, 6, 7, 8, 9]]
arr2 = np.array(lst2)
print(arr2)

[[3 6 7 8 9]
 [3 6 7 8 9]
 [3 6 7 8 9]]


Again, one of the powerful advantages of NumPy arrays is the ability to store data in arrays with many dimensions. In the example below, you are creating a three dimensional array from a list of lists of lists. 

This would be similar to an image with dimensions image height, image width, and image channels (for example, red, green, and blue).

A four dimensional array could represent a time series (height, width, channels, and time) or a video containing multiple frames (frame height, frame width, channels, and frame number).


In [4]:
lst3 = [[[3, 6, 7, 8, 9], [3, 6, 7, 8, 9], [3, 6, 7, 8, 9]], 
         [[3, 6, 7, 8, 9], [3, 6, 7, 8, 9], [3, 6, 7, 8, 9]], 
         [[3, 6, 7, 8, 9], [3, 6, 7, 8, 9], [3, 6, 7, 8, 9]]]
arr3 = np.array(lst3)
print(arr3)

[[[3 6 7 8 9]
  [3 6 7 8 9]
  [3 6 7 8 9]]

 [[3 6 7 8 9]
  [3 6 7 8 9]
  [3 6 7 8 9]]

 [[3 6 7 8 9]
  [3 6 7 8 9]
  [3 6 7 8 9]]]
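
To make the image and time series analogy concrete, here is a minimal sketch. The sizes (4 rows, 5 columns, 3 channels, 2 time steps) are made up for illustration, and the arrays are filled with zeros rather than real imagery.

```python
import numpy as np

# Hypothetical sizes: 4 rows, 5 columns, 3 channels (RGB), 2 time steps.
image = np.zeros((4, 5, 3))      # one image: height, width, channels
series = np.zeros((4, 5, 3, 2))  # the same grid observed at 2 times

print(image.ndim, image.shape)    # 3 (4, 5, 3)
print(series.ndim, series.shape)  # 4 (4, 5, 3, 2)
```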


Other **methods** for generating arrays:

1. The **tile()** method creates repetitive arrays. It accepts:

    * *A*: a number, list, or array to repeat

    * *reps*: how many times to repeat it

2. The **repeat()** method also creates repetitive arrays. Unlike np.tile, np.repeat repeats each element consecutively rather than the entire array (or list).

3. The **arange()** method returns an array of evenly spaced values and accepts start, stop, step, and data type parameters. In the example, you create an array of evenly spaced values from 0 up to, but not including, 100 with a step size of 5. The data type is explicitly set to integer, but NumPy can infer a data type if it is not provided.

4. The **linspace()** method is similar to *arange()*; however, a number of samples is specified instead of a step size. In the example, since 5 samples are requested, 5 evenly spaced values from 0 to 100 (both included) are returned.

5. The **ones()** method returns an array of 1s. In the example, it generates a three-dimensional array where the first dimension has a length of 3, the second a length of 4, and the third a length of 4. The shape of the array is specified using a tuple. Similar to *ones()*, *zeros()* generates an array of zeros.

6. It is also possible to generate random values between 0 and 1 (**random.rand()**) and a specified number of random integers between two values (**random.randint()**).
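
The differences between these functions are easier to see side by side. The sketch below is illustrative only: the variable names are invented, and the seeded generator (`np.random.default_rng`) is an addition not used elsewhere in this notebook, included so that the random draw is reproducible.

```python
import numpy as np

values = [1, 2]

tiled = np.tile(values, 3)       # whole sequence repeated: [1 2 1 2 1 2]
repeated = np.repeat(values, 3)  # each element repeated:   [1 1 1 2 2 2]

# arange excludes the stop value; linspace includes it by default.
stepped = np.arange(0, 100, 25)   # [ 0 25 50 75]
sampled = np.linspace(0, 100, 5)  # [  0.  25.  50.  75. 100.]

# A seeded generator makes random draws reproducible (seed 42 is arbitrary).
rng = np.random.default_rng(42)
draws = rng.integers(1, 200, size=7)  # 7 integers from 1 to 199

print(tiled, repeated, stepped, sampled, draws)
```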

In [5]:
arr2 = np.tile(12, 7)
print(arr2)

arr3 = np.repeat([1, 2], 3)
print(arr3)

arr4 = np.arange(0, 100, 5, dtype="int")
print(arr4)

arr5 = np.linspace(0, 100, 5, dtype="int")
print(arr5)

arr6 = np.ones((3, 4, 4))
print(arr6)

arr7 = np.zeros((3, 4, 2))
print(arr7)

arr8 = np.random.rand(3, 4, 5)
print(arr8)

arr9 = np.random.randint(1, 200, 7)
print(arr9)

[12 12 12 12 12 12 12]
[1 1 1 2 2 2]
[ 0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95]
[  0  25  50  75 100]
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
[[[0. 0.]
  [0. 0.]
  [0. 0.]
  [0. 0.]]

 [[0. 0.]
  [0. 0.]
  [0. 0.]
  [0. 0.]]

 [[0. 0.]
  [0. 0.]
  [0. 0.]
  [0. 0.]]]
[[[0.44990505 0.1518331  0.39699194 0.05821834 0.24228859]
  [0.22341729 0.18354277 0.73037906 0.55518633 0.91299664]
  [0.9271362  0.20487093 0.51623381 0.05440539 0.61072077]
  [0.90391342 0.1686723  0.35298835 0.19290319 0.78841508]]

 [[0.71538242 0.87201218 0.66630802 0.69398758 0.22561166]
  [0.7232225  0.51774147 0.05754919 0.12162593 0.435862  ]
  [0.09201323 0.65162255 0.19256906 0.90433878 0.49631103]
  [0.93319527 0.80987813 0.88156399 0.90126271 0.02306528]]

 [[0.46704675 0.88416805 0.01893544 0.84317743 0.1162956 ]
  [0.45812775 0.1297

## NumPy Data Types

NumPy provides additional and more specific data types than base Python. Below is a brief explanation of commonly used data types.

* *bool_*: Boolean *True* or *False*
* *int8*: 8-bit signed integer (-128 to 127)
* *int16*: 16-bit signed integer (-32,768 to 32,767)
* *int32*: 32-bit signed integer (-2,147,483,648 to 2,147,483,647)
* *int64*: 64-bit signed integer (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
* *uint8*: 8-bit unsigned integer (0 to 255)
* *uint16*: 16-bit unsigned integer (0 to 65,535)
* *uint32*: 32-bit unsigned integer (0 to 4,294,967,295)
* *uint64*: 64-bit unsigned integer (0 to 18,446,744,073,709,551,615)
* *float16*: half precision float
* *float32*: single precision float
* *float64*: double precision float

**Signed integers** can differentiate positive and negative values while **unsigned integers** cannot. **Float** data can store decimal values while **integer** data cannot. There are also data types for complex numbers, which we will not discuss here. 
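
The ranges listed above do not need to be memorised. As a small sketch, `np.iinfo` can report them directly; the `ranges` dictionary is just an illustrative container.

```python
import numpy as np

# np.iinfo reports the smallest and largest value an integer type can hold.
ranges = {}
for dtype in (np.int8, np.uint8, np.int16, np.uint16):
    info = np.iinfo(dtype)
    ranges[dtype.__name__] = (int(info.min), int(info.max))
    print(dtype.__name__, info.min, info.max)
```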

The cell below demonstrates how to define the data type with the *dtype* parameter.

In all cases, *np.ones()* is used to create an array with three elements.

For both *int8* and *int16*, 1 as an integer value is returned. When the data are defined as *float16*, 1 as a float value is returned (1.).

Lastly, when the type is set to *bool_* Boolean *True* is returned since 1 indicates *True* and 0 indicates *False*. 

Note that the **data type will impact the amount of memory needed**. For example an *int8* will require less memory than an *int16*. 

Generally try to use the data type that can represent the data with the least amount of memory unless a specific data type is needed in an analysis. 

In [6]:
arr1 = np.ones((3), dtype="int8")
print(arr1)
print(arr1.dtype)

arr1 = np.ones((3), dtype="int16")
print(arr1)
print(arr1.dtype)

arr1 = np.ones((3), dtype="float16")
print(arr1)
print(arr1.dtype)

arr1 = np.ones((3), dtype="bool_")
print(arr1)
print(arr1.dtype)

[1 1 1]
int8
[1 1 1]
int16
[1. 1. 1.]
float16
[ True  True  True]
bool
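
To illustrate the point about memory, the following sketch compares arrays of one million elements stored with different data types. The array length and variable names are arbitrary choices for the example.

```python
import numpy as np

# One million ones stored with three different data types.
n = 1_000_000
small = np.ones(n, dtype="int8")
medium = np.ones(n, dtype="int16")
large = np.ones(n, dtype="float64")

# itemsize is bytes per element; nbytes is the total for the array data.
print(small.itemsize, small.nbytes)    # 1 1000000
print(medium.itemsize, medium.nbytes)  # 2 2000000
print(large.itemsize, large.nbytes)    # 8 8000000

# astype returns a converted copy, here shrinking the float array.
converted = large.astype("int8")
print(converted.dtype, converted.nbytes)  # int8 1000000
```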


## Understanding and Manipulating Array Shape and Dimensions

Let's spend some time discussing the dimensions and shape of an array. 

The **shape** of an array relates to the length of each dimension.

The *len()* function will return the length of the first dimension (in this case 3).

To obtain a tuple of the lengths for all dimensions, you must use the *shape* property.

So, the array generated has three dimensions with lengths of 3, 4, and 4, respectively. The number of dimensions is returned with the *ndim* property. The *size* property returns the number of elements in the array. There are 48 elements in the example array: 3 × 4 × 4 = 48. The *dtype* property provides the data type.

In [7]:
arr6 = np.ones((3, 4, 4))

print("Length of first dimension: " + str(len(arr6)))
print("Shape of array: " + str(arr6.shape))
print("Number of dimensions: " + str(arr6.ndim))
print("Size of array: " + str(arr6.size))
print("Data type of array: " + str(arr6.dtype))

Length of first dimension: 3
Shape of array: (3, 4, 4)
Number of dimensions: 3
Size of array: 48
Data type of array: float64


**NumPy** arrays have a built-in method for changing their shape, *.reshape()*. Note that the number of elements (the size of the array) must exactly fill the new shape. In the first example, the number of dimensions stays the same but the length of each dimension changes. In the next two examples, you convert the three-dimensional array to two-dimensional arrays. Lastly, you convert the array to a one-dimensional array, or vector, with a length of 48.

In [19]:
arr6b = arr6.reshape(4, 4, 3)
arr6c = arr6.reshape(4, 12)
arr6d = arr6.reshape(12, 4)
arr6e = arr6.reshape(48)
print(arr6b)
print(arr6c)
print(arr6d)
print(arr6e)

[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
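
As a quick check of the rule that the elements must exactly fill the new shape, this sketch tries a shape that does not fit. The target shape (5, 10) is chosen only because 50 is not equal to 48.

```python
import numpy as np

arr = np.ones((3, 4, 4))  # 48 elements

# 48 elements cannot fill a 5 x 10 array, so NumPy raises a ValueError.
try:
    arr.reshape(5, 10)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)

# Reshaping keeps the element count; only the shape changes.
flat = arr.reshape(48)
print(flat.size == arr.size)  # True
```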


When reshaping, it is possible to have NumPy determine the appropriate size of a single dimension to fill an array with the available elements. This is accomplished using -1 in the array dimension location when applying the *.reshape()* method. 

In [9]:
arr10 = np.random.randint(1, 1200, 1000)
arr10b = arr10.reshape(-1, 10, 10)
arr10c = arr10.reshape(10, -1, 10)
arr10d = arr10.reshape(10, 10, -1)
print(arr10.shape)
print(arr10b.shape)
print(arr10c.shape)
print(arr10d.shape)

(1000,)
(10, 10, 10)
(10, 10, 10)
(10, 10, 10)
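
A smaller sketch of the same idea, using 24 elements so the arithmetic is easy to follow. It also shows that only one dimension can be left for NumPy to work out.

```python
import numpy as np

arr = np.arange(24)

# -1 asks NumPy to compute the one missing length: 24 / (2 * 3) = 4.
inferred = arr.reshape(2, 3, -1)
print(inferred.shape)  # (2, 3, 4)

# A lone -1 flattens any array to one dimension.
flattened = inferred.reshape(-1)
print(flattened.shape)  # (24,)

# Only one dimension may be left unknown.
try:
    arr.reshape(-1, -1)
    two_unknowns_failed = False
except ValueError as err:
    two_unknowns_failed = True
    print("Error:", err)
```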


## NumPy Array Indexing

Similar to lists, NumPy arrays are indexed. 

So, **values from the array can be extracted or referenced using their associated index.**

Since arrays often have multiple dimensions, indexes must also extend into multiple dimensions. 

See the **comments** below for general array indexing rules.

Remember that indexing starts at **0**, the first index provided is included, and the last index provided is not included. 

Extracting portions of an array is known as **slicing**.

![image.png](attachment:image.png)

In [10]:
# Note: 50 samples from 0 to 50 have a non-integer step, so after integer
# conversion the values run 0..48 and then 50 (49 is skipped).
arr11 = np.linspace(0, 50, 50, dtype="int")
arr12 = arr11.reshape(2, 5, 5)
print("Original array")
print(arr12)
print("All values in first index of first dimension")
print(arr12[0])  # Values from the first index in the first dimension
print("All values in second index of first dimension")
print(arr12[1])  # Values from the second index in the first dimension
print("All values in first index of first dimension and first index of second dimension")
print(arr12[0][0])  # Values in the first index of both the first and second dimensions
print("A single value specified with three indexes, one for each dimension")
print(arr12[1, 3, 3])  # A single value located by an index in all three dimensions
print("Incorporating ranges")
print(arr12[1, 0:2, 0:2])  # Second index of first dimension, first two indexes of the second and third dimensions
print("Using colons")
print(arr12[:, 0:2, 0:2])  # A colon on its own selects all values in a dimension
print(arr12[:, 2:, 0:2])  # Colons can also select all values before or after an index

Original array
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]
  [20 21 22 23 24]]

 [[25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]
  [40 41 42 43 44]
  [45 46 47 48 50]]]
All values in first index of first dimension
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
All values in second index of first dimension
[[25 26 27 28 29]
 [30 31 32 33 34]
 [35 36 37 38 39]
 [40 41 42 43 44]
 [45 46 47 48 50]]
All values in first index of first dimension and first index of second dimension
[0 1 2 3 4]
A single value specified with three indexes, one for each dimension
43
Incorporating ranges
[[25 26]
 [30 31]]
Using colons
[[[ 0  1]
  [ 5  6]]

 [[25 26]
  [30 31]]]
[[[10 11]
  [15 16]
  [20 21]]

 [[35 36]
  [40 41]
  [45 46]]]
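
The indexing rules can be verified on a smaller, predictable array. This sketch uses `np.arange(50)` so the values run cleanly from 0 to 49; the variable names are illustrative.

```python
import numpy as np

# Values 0..49 arranged as 2 blocks of 5 rows and 5 columns.
arr = np.arange(50).reshape(2, 5, 5)

# Chained brackets and a single comma-separated index select the same row.
row_chained = arr[0][0]
row_tuple = arr[0, 0]
print(np.array_equal(row_chained, row_tuple))  # True

# Start is included, stop is excluded: rows 1 and 2, columns 3 and 4.
block = arr[1, 1:3, 3:5]
print(block)
# [[33 34]
#  [38 39]]
```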


Once values have been selected using index notation, they can be changed. In the example below, all values in the first index of the first dimension and the first index of the second dimension are set to 0.

In [11]:
arr12[0][0] = 0
print(arr12)

[[[ 0  0  0  0  0]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]
  [20 21 22 23 24]]

 [[25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]
  [40 41 42 43 44]
  [45 46 47 48 50]]]
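
Assignment also works with ranges. The sketch below, on a small made-up array, changes a block of values at once. It also shows one consequence worth knowing: a slice refers to the original data, so take a `.copy()` first if the original should stay untouched.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Assign one value to a whole slice: first two rows, last two columns.
arr[0:2, 2:4] = -1
print(arr)
# [[ 0  1 -1 -1]
#  [ 4  5 -1 -1]
#  [ 8  9 10 11]]

# A slice is a view onto the original data, so use .copy() to edit safely.
safe = arr[2].copy()
safe[:] = 0
print(arr[2])  # [ 8  9 10 11]  (unchanged)
```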


## Boolean Arrays

It is also possible to create arrays of Boolean values as demonstrated below. 

In [12]:
arr13 = np.array([True, False, True, False, True, False, True, False, False])
arr13b = arr13.reshape(3, 3)
print(arr13b)

[[ True False  True]
 [False  True False]
 [ True False False]]


**Comparison Operators** can be used to compare each value in an array to a value and return the Boolean result to the associated position in a new array.

In [13]:
arr10 = np.random.randint(1, 1200, 100)
arr10b = arr10.reshape(10, 10)
print(arr10b)
arr10bool = arr10b > 150
print(arr10bool)

[[ 867  233  903 1052  197 1135  479  411 1081   90]
 [ 525  934 1052  251 1028  158  936  818  742  298]
 [ 394 1154   99  374  546  316  340  429  815  764]
 [ 781   20  955  964  975  388  136  476  748  345]
 [ 501  907  597  891  360   47  374  284 1099  483]
 [ 918  127  250 1189  723  674  445  424  745  825]
 [ 771  833  178  752  486 1002  438   66  359  926]
 [ 998  204  562  800 1044  237 1014  347  879  571]
 [ 175  609  733  824  745  947   91 1085   36  920]
 [1180  221  425  260 1096   24  127  138  905   24]]
[[ True  True  True  True  True  True  True  True  True False]
 [ True  True  True  True  True  True  True  True  True  True]
 [ True  True False  True  True  True  True  True  True  True]
 [ True False  True  True  True  True False  True  True  True]
 [ True  True  True  True  True False  True  True  True  True]
 [ True False  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True False  True  True]
 [ True  True  True  True  T
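
A Boolean array produced by a comparison can be used as a mask. The sketch below uses a few made-up values rather than random ones so the results are predictable; it counts the matches, extracts them, and then overwrites the non-matching values.

```python
import numpy as np

values = np.array([[867, 233, 90], [20, 955, 136]])  # made-up sample values

mask = values > 150    # Boolean array with the same shape as values
n_matches = mask.sum() # True counts as 1, so this counts the matches
selected = values[mask]

print(n_matches)  # 3
print(selected)   # [867 233 955]

# Assignment through a mask changes only the selected elements.
values[~mask] = 0
print(values)
```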

## Array Arithmetic and Operations

It is generally easy to perform mathematical operations on arrays as demonstrated below. In all cases, the same operation is applied to all elements in the array. 

In [14]:
arr14 = np.random.randint(1, 1200, 25)
arr14b = arr14.reshape(5, 5)
print(arr14b)
print(arr14b+21)
print(arr14b-52)
print(arr14b*2)
print(arr14b/3)
print(arr14b**2)

[[ 944   81  566   23 1175]
 [ 404  569  318  998  924]
 [ 436  574 1080  955  743]
 [ 147  346 1008   37  196]
 [   6  568  451  494  923]]
[[ 965  102  587   44 1196]
 [ 425  590  339 1019  945]
 [ 457  595 1101  976  764]
 [ 168  367 1029   58  217]
 [  27  589  472  515  944]]
[[ 892   29  514  -29 1123]
 [ 352  517  266  946  872]
 [ 384  522 1028  903  691]
 [  95  294  956  -15  144]
 [ -46  516  399  442  871]]
[[1888  162 1132   46 2350]
 [ 808 1138  636 1996 1848]
 [ 872 1148 2160 1910 1486]
 [ 294  692 2016   74  392]
 [  12 1136  902  988 1846]]
[[314.66666667  27.         188.66666667   7.66666667 391.66666667]
 [134.66666667 189.66666667 106.         332.66666667 308.        ]
 [145.33333333 191.33333333 360.         318.33333333 247.66666667]
 [ 49.         115.33333333 336.          12.33333333  65.33333333]
 [  2.         189.33333333 150.33333333 164.66666667 307.66666667]]
[[ 891136    6561  320356     529 1380625]
 [ 163216  323761  101124  996004  853776]
 [ 190096

It is also possible to perform mathematical operations on sets of arrays as long as they have the same shape. In such cases, elements are matched based on having the same position within the array. 

In [15]:
arr14 = np.random.randint(1, 1200, 25)
arr14b = arr14.reshape(5, 5)
print(arr14b)
print(arr14b+arr14b)
print(arr14b-arr14b)

[[ 356  711  125  835   67]
 [ 949  238  916    4 1151]
 [ 935  560  270  760   52]
 [1174  158   57  122  841]
 [  92  214  231  760  966]]
[[ 712 1422  250 1670  134]
 [1898  476 1832    8 2302]
 [1870 1120  540 1520  104]
 [2348  316  114  244 1682]
 [ 184  428  462 1520 1932]]
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]


To summarize the results from above it is possible to:

1. Perform mathematical operations between an array with any shape and a scalar (i.e., single value)
2. Perform mathematical operations between arrays that have the same shape 

### Broadcasting

Broadcasting is a technique that describes how NumPy treats arrays with different shapes during arithmetic operations.

The following rules summarize when broadcasting can be used and how.

1. **If two arrays have a different number of dimensions**, the shape of the array with fewer dimensions is padded with ones on its leading (left) side. For example, to multiply an array of shape (3,) by an array of shape (3, 3), the first array is treated as having shape (1, 3).

2. **If the shape of the arrays does not match in a dimension**, the array with shape equal to 1 in that dimension is stretched to match the other shape.

3. **If in any dimension the sizes disagree and neither is equal to 1**, an error is raised.

In the example below, an array of shape (6, 6) is multiplied by an array of shape (6,). This requires that the second array be broadcast to a shape of (1, 6) and then stretched to (6, 6).

For more information visit: https://numpy.org/doc/stable/user/basics.broadcasting.html

![image.png](attachment:image.png)

In the simplest example of broadcasting, the scalar b is stretched to become an array of the same shape as a, so the shapes are compatible for element-by-element multiplication.

Another example
![image.png](attachment:image-2.png)

In [16]:
arr1 = np.random.randint(1, 100, 36)
arr1b = arr1.reshape(6, 6)

arr2 = np.ones((6))
arr2[:] = 2

print(arr1b)
print(arr2)
print(arr1b*arr2)

[[26 33 76 25 82 12]
 [86 64 59 50 40 78]
 [88 33 21 53 12 78]
 [51  8 19 72 41 59]
 [69 94 69  6 66 44]
 [90 21 50 45 76  5]]
[2. 2. 2. 2. 2. 2.]
[[ 52.  66. 152.  50. 164.  24.]
 [172. 128. 118. 100.  80. 156.]
 [176.  66.  42. 106.  24. 156.]
 [102.  16.  38. 144.  82. 118.]
 [138. 188. 138.  12. 132.  88.]
 [180.  42. 100.  90. 152.  10.]]
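
The three broadcasting rules can be tried out on small arrays. This is a minimal sketch with made-up values: a row and a column are each stretched to match a (2, 3) array, and an incompatible shape raises an error.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # shape (2, 3)
row = np.array([10, 20, 30])    # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])  # shape (2, 1)

by_row = a + row  # row is stretched down the 2 rows
by_col = a + col  # col is stretched across the 3 columns
print(by_row.shape, by_col.shape)  # (2, 3) (2, 3)

# Shapes (2, 3) and (2,) disagree in the last dimension and neither is 1.
try:
    a + np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("Broadcast error:", err)
```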


NumPy provides mathematical functions and methods for performing common tasks. The last block of code below provides some examples. 

In [17]:
arr14 = np.random.randint(1, 1200, 25)
arr14b = arr14.reshape(5, 5)

print(np.max(arr14b))
print(np.min(arr14b))
print(np.sqrt(arr14b))

1168
1
[[25.47547841 30.61045573  9.16515139 22.86919325 18.11077028]
 [24.77902339 26.90724809 28.74021573 32.80243893 15.32970972]
 [29.35983651 34.17601498 33.86738844 22.737634   30.96772513]
 [19.26136028  1.         24.8394847  31.89043744 10.81665383]
 [17.4642492   7.         11.         26.66458325 31.22498999]]
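
A few more of these functions are sketched below on a small array of perfect squares. The `axis` argument is not covered above; it is included here as an aside to show that summaries can be computed per column or per row as well as over the whole array.

```python
import numpy as np

arr = np.array([[1, 4, 9], [16, 25, 36]])

total = np.sum(arr)             # over every element
col_totals = np.sum(arr, axis=0)  # one total per column
row_means = np.mean(arr, axis=1)  # one mean per row
roots = np.sqrt(arr)            # element-wise square root

print(total)                 # 91
print(col_totals)            # [17 29 45]
print(row_means)
print(arr.max(), arr.min())  # many functions also exist as array methods
print(roots)
```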


## NumPy Cheat Sheet

Here is a general overview of the main functions, tasks, and methods you will probably need when working with NumPy.

![image.png](attachment:image.png)

## Conclusions 

As mentioned above, NumPy is central to using Python for analyzing data. So, an understanding of NumPy is important for data and geospatial data scientists. Additional libraries and modules make use of NumPy to expand Python's data science functionalities. In the next notebook, you will explore one of these libraries: **Pandas**.