# **Introduction to Python**
Source:
- https://training.digitalearthafrica.org/en/latest/python_basics/
- https://www.learnpython.org/

## **1 Introduction to Google Colab and Python basics**

This is a text cell.

You can run cells using Shift+Enter or by pressing the "play" button |&#9658;|.

In [None]:
# This is a code cell
# You can run it using Shift+Enter or by pressing the "play" button

print("Hello world!")

# Use the hash symbol # to comment lines of your code
# Use comments to make notes about what your code is doing

#### **Mathematical computation**
We can do mathematical computation using Python.

In [None]:
result = 999 + 1

# To display output, we don't need to use print when it is the last command in the cell
result

#### **Running cells**
The `[ ]:` symbol to the left of each code cell describes the state of the cell:

* `[ ]:` means that the cell has not been run yet.
* `[*]:` means that the cell is currently running.
* `[1]:` means that the cell has finished running and was the first cell run.

Sometimes the code in a cell takes a while to run because it is loading a large dataset or performing complex computations. Before a cell is run, you'll see the symbol `[ ]`, meaning the cell has not been executed yet. While a cell is running, it shows the |&#9658;| symbol with a circle spinning around it. Once it has finished, the circle stops spinning, or the symbol is replaced by a number giving the cell's position in the execution order, for example `[4]`. This allows you to keep track of which cells have been run and in what order.

Consider this Python program. Here, we have decided to call our variable `a`.

```python
a = 1
print(a)
a = a + 1
print(a)
```

We can break down this program and execute each line using separate cells.

> **Note:** Python is case sensitive. The variable `a` is not the same as `A`.

In [None]:
a = 1

In [None]:
print(a)

In [None]:
a = a + 1

In [None]:
print(a)

If you go back and run the first `print(a)` cell again, you'll see that it prints `2`, the updated value. This is called **global state**: once a variable is declared, it is accessible anywhere in the notebook, even in cells above the one where it was declared. This differs from traditional programs, which execute sequentially line by line from top to bottom. This can be confusing at first, but keep an eye on the number in the brackets `[ ]` to see the cell execution order.

This also means you can modify all the code in the cells and run them as many times as you want in any order. You can jump back and forth to update variables or re-run analysis.
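
The effect of re-running a cell can be imitated in a single cell. The sketch below is only an illustration: the loop stands in for pressing "play" three times on the `a = a + 1` cell, and each pass reads whatever value `a` currently holds.

```python
# Stand-in for running the cell `a = 1` once
a = 1

# Stand-in for running the cell `a = a + 1` three times in a row
for _ in range(3):
    a = a + 1  # each run reads the current global value of `a`

print(a)  # prints 4
```

This is why the same `print(a)` cell can show different values depending on which cells were run before it.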

### **Exercises 1**

### 1.1 Fill the asterisk line with your name and run the cell.

In [None]:
# Fill the ****** space with your name and run the cell.

message = "My name is ******"

message

### 1.2 You can add new cells to insert new code at any point in a notebook. Click on the `+` icon in the top menu to add a new cell below the current one. Add a new cell below the next cell, and use it to print the value of variable `a`.

In [None]:
a = 365*24

# Add a new cell just below this one. Use it to print the value of variable `a`

> **Question:** Now what happens if you scroll back up the notebook and execute a different cell containing `print(a)`?

### **Python basics**
Learn Python using the webpage: https://www.learnpython.org/

Go through the **Learn the Basics** section:

<img src='https://drive.google.com/uc?id=1pHXvCA17_wpPrcWM-O3aSj-0vq0NPjpW'>



## **2 Introduction to Numpy**

To use numpy, we need to import the numpy library using the keyword `import`. To avoid typing `numpy` every time we want to use one of its functions, we can provide an alias using the keyword `as`. We will nickname numpy as `np`:

In [None]:
import numpy as np

>**Note:** If we do not `import numpy`, we cannot use any of the numpy functions. If you forget to import a package, you will get an error such as `NameError: name 'np' is not defined`.

Now, we have access to all the functions available in `numpy` by typing `np.name_of_function`. For example, the equivalent of `1 + 1` in Python can be done in `numpy`:

In [None]:
np.add(1,1)

By default the result of a function or operation is shown underneath the cell containing the code. If we want to reuse this result for a later operation we can assign it to a variable. For instance, let us call the variable `a`:

In [None]:
a = np.add(2,3)

We have just declared a variable `a` that holds the result of the function. We can now use or display this variable at any point in this notebook. For example, we can show its contents by typing the variable name in a new cell:

In [None]:
a

One of numpy's core concepts is the `array`. Arrays can hold multi-dimensional data. To declare a numpy array explicitly we do:

In [None]:
arr_1 = np.array([1,2,3,4,5,6,7,8,9])
arr_1

>**Note:** The array defined above has only 1 dimension.

We can see the shape of an array using `.shape`:

In [None]:
arr_1.shape

This is a 2-dimensional array:

In [None]:
arr_2 = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape:", arr_2.shape)
arr_2

Most of the functions and operations defined in numpy can be applied to arrays. For example, with the previous `add` operation:

In [None]:
arr1 = np.array([1,2,3,4])
arr2 = np.array([3,4,5,6])

np.add(arr1, arr2)

We can also add arrays using the following convenient notation:

In [None]:
arr1 + arr2
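
Other arithmetic operators behave the same way, working element by element. The following is a minimal sketch; the variable names `diff`, `prod` and `scaled` are only illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

diff = arr2 - arr1   # same result as np.subtract(arr2, arr1)
prod = arr1 * arr2   # same result as np.multiply(arr1, arr2)
scaled = arr1 * 10   # a single number is applied to every element

print(diff)    # [2 2 2 2]
print(prod)    # [ 3  8 15 24]
print(scaled)  # [10 20 30 40]
```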

### **Moving in array**
Arrays can be sliced and diced. We can get subsets of the arrays using the indexing notation which is `[ start : end : stride ]`. Let's see what this means:

In [None]:
arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])

print(arr[5])
print(arr[5:])
print(arr[:5])
print(arr[::2])

Experiment playing with the indexes to understand the meaning of `start`, `end` and `stride`. What happens if you don't specify a start? What value does numpy use instead?

> **Note:** Numpy indexes start on `0`, the same convention used in Python lists.

Indexes can also be negative, meaning that you start counting from the end. For example, to select the last 2 elements in an array we can do:

In [None]:
arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])

arr[-2:]
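
To see `start`, `end` and `stride` working together, here is a short sketch on the same values. It assumes `np.arange(16)` as a shortcut for typing out the numbers `0` to `15`; the particular indexes are chosen only for illustration.

```python
import numpy as np

arr = np.arange(16)  # the values 0, 1, ..., 15

print(arr[2:10])      # start=2, end=10 (the end is excluded): [2 3 4 5 6 7 8 9]
print(arr[2:10:3])    # every 3rd element of that span: [2 5 8]
print(arr[-3])        # third element counting from the end: 13
print(arr[::-1][:3])  # a negative stride reverses the array: [15 14 13]
```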

Numpy arrays can have multiple dimensions. Dimensions are indicated using nested square brackets `[ ]`. The convention in numpy is that the outer `[ ]` represent the first dimension and the innermost `[ ]` contains the last dimension.

<img src="https://training.digitalearthafrica.org/en/latest/_images/numpy_array_t.png" alt="drawing" width="600" align="left"/>

### **2-dimensional array**
The following cell declares a 2-dimensional array with shape (1, 9).

> **Tip:** Notice the nested (double) square brackets `[[ ]]`. As there are two brackets, this indicates the array is 2-dimensional.

In [None]:
np.array([[1,2,3,4,5,6,7,8,9]])

To visualise the shape (dimensions) of a numpy array we can add the suffix `.shape` to an array expression or variable containing a numpy array.

In [None]:
arr1 = np.array([1,2,3,4,5,6,7,8,9])
arr2 = np.array([[1,2,3,4,5,6,7,8,9]])
arr3 = np.array([[1],[2],[3],[4],[5],[6],[7],[8],[9]])

arr1.shape, arr2.shape, arr3.shape, np.array([1,2,3]).shape
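
The same nine values can be rearranged into each of these shapes without retyping the brackets. This is a sketch using the `reshape` method and the `.ndim` attribute (the number of dimensions); the names `flat`, `row`, `column` and `grid` are made up for this example.

```python
import numpy as np

flat = np.arange(1, 10)      # values 1..9, shape (9,)
row = flat.reshape(1, 9)     # one row, nine columns
column = flat.reshape(9, 1)  # nine rows, one column
grid = flat.reshape(3, 3)    # three rows, three columns

print(flat.shape, row.shape, column.shape, grid.shape)
print(flat.ndim, row.ndim, column.ndim, grid.ndim)  # 1 2 2 2
```

The total number of elements must stay the same, so nine values can become `(3, 3)` but not `(2, 5)`.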

### **Numpy data types**
Numpy arrays can contain numerical values of different types. These types can be divided in these groups:

* Integers
    * Unsigned
        * 8 bits: `uint8`
        * 16 bits: `uint16`
        * 32 bits: `uint32`
        * 64 bits: `uint64`
    * Signed
        * 8 bits: `int8`
        * 16 bits: `int16`
        * 32 bits: `int32`
        * 64 bits: `int64`

* Floats
    * 32 bits: `float32`
    * 64 bits: `float64`
    
We can look up the type of an array by using the `.dtype` suffix.

In [None]:
arr = np.ones((10,10,10))

arr.dtype
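
A type can also be requested when an array is created, or changed afterwards. The sketch below assumes the `dtype` argument of `np.array`, the `astype` method and `np.iinfo` (which reports the range of an integer type); the variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.uint8)  # request 8-bit unsigned integers
floats = ints.astype(np.float32)            # converted copy holding 32-bit floats

print(ints.dtype, floats.dtype)  # uint8 float32

# Smallest and largest values each integer type can hold
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)    # -128 127
```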

Numpy arrays normally store numeric values but they can also contain boolean values, `'bool'`. Boolean is a data type that can have two possible values: `True`, or `False`. For example:

In [None]:
arr = np.array([True, False, True])

arr, arr.shape, arr.dtype

We can operate with boolean arrays using the numpy functions for performing logical operations such as `and` and `or`.

In [None]:
arr1 = np.array([True, True, False, False])
arr2 = np.array([True, False, True, False])

print(np.logical_and(arr1, arr2))
print(np.logical_or(arr1, arr2))

These operations are conveniently offered by numpy with the symbols `*` (`and`), and `+` (`or`).

> **Note:** Here the `*` and `+` symbols are not performing multiplication and addition as with numerical arrays. Numpy detects the type of the arrays involved in the operation and changes the behaviour of these operators.

In [None]:
print(arr1 * arr2)
print(arr1 + arr2)
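
Numpy boolean arrays also accept the operators `&` (`and`), `|` (`or`) and `~` (`not`), which some readers may find easier to tell apart from arithmetic. This is an alternative sketch, not a replacement for the notation used in this notebook.

```python
import numpy as np

arr1 = np.array([True, True, False, False])
arr2 = np.array([True, False, True, False])

print(arr1 & arr2)  # [ True False False False]
print(arr1 | arr2)  # [ True  True  True False]
print(~arr1)        # [False False  True  True]
```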

Boolean arrays are often the result of comparing a numerical array with certain values. This is useful for detecting values that are equal to, below or above a number in a numpy array. For example, if we want to know which values in an array are equal to 1, and which values are greater than 2, we can do:

In [None]:
arr = np.array([1, 3, 5, 1, 6, 3, 1, 5, 7, 1])

print(arr == 1)
print(arr > 2)

You can use a boolean array to mask out `False` values from a numeric array. The returned array only contains the numeric values which are at the same index as `True` values in the `mask` array.

In [None]:
arr = np.array([1,2,3,4,5,6,7,8,9])
mask = np.array([True,False,True,False,True,False,True,False,True])

arr[mask]
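
In practice the mask is usually produced by a comparison rather than typed by hand. The following sketch selects the even numbers; the condition `arr % 2 == 0` and the name `evens` are chosen only for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

mask = arr % 2 == 0  # True where the value is even
evens = arr[mask]    # keep only the values at True positions

print(mask.sum())  # number of True values: 4
print(evens)       # [2 4 6 8]
```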

### **Exercises 2**

### 2.1 Use the numpy `add` function to add the values `34` and `29` in the cell below.

In [None]:
# Use numpy add function to add 34 and 29



### 2.2 Declare a new array with contents [5,4,3,2,1] and slice it to select the last 3 items.

In [None]:
# Substitute the ? symbols by the correct expressions and values

# Declare the array

arr = ?

# Slice array for the last 3 items only

arr[?:?]

### 2.3 Select all the elements in the array below excluding the last one, `[15]`.

In [None]:
arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])

# Substitute the ? symbols by the correct expressions and values

arr[?]

### 2.4 Use `arr` as defined in 2.3. Exclude the last element from the list, but now only select every 3rd element. Remember the third index indicates `stride`, if used.
> **Hint:** The result should be `[0,3,6,9,12]`.

In [None]:
# Substitute the ? symbols by the correct expressions and values

arr[?:?:?]

### 2.5 You'll need to combine array comparisons and logical operators to solve this one. Find out the values in the following array that are greater than `3` AND less than `7`. The output should be a boolean array.
> **Hint:** If you are stuck, reread the section on boolean arrays.

In [None]:
arr = np.array([1, 3, 5, 1, 6, 3, 1, 5, 7, 1])

# Use array comparisons (<, >, etc.) and logical operators (*, +) to find where
# the values are greater than 3 and less than 7.

boolean_array = ?

### 2.6 Use your boolean array from 2.5 to mask the `False` values from `arr`.
> **Hint:** The result should be `[5, 6, 5]`.

In [None]:
# Use your resulting boolean_array array from 2.5
# to mask arr as defined in 2.5



## **3 Introduction to Matplotlib**

We will use a part of matplotlib called `pyplot`. We can import pyplot by specifying that it comes from matplotlib. We will abbreviate `pyplot` to `plt`.

In [None]:
import numpy as np
from matplotlib import pyplot as plt

### **Plot images**

Images are 2-dimensional arrays containing pixels. Therefore, we can use 2-dimensional arrays to represent image data and visualise with matplotlib.

In the example below, we will use the numpy `arange` function to generate a 1-dimensional array filled with elements from `0` to `99`, and then reshape it into a 2-dimensional array using `reshape`.

In [None]:
arr = np.arange(100).reshape(10,10)

print(arr)

plt.imshow(arr, cmap="gray")

In the numpy section, we learned to address regions of a numpy array using the square bracket `[ ]` index notation. For multi-dimensional arrays we can use a comma `,` to distinguish between axes.

```python
[ first dimension, second dimension, third dimension, etc. ]
```

As before, we use colons `:` to denote `[ start : end : stride ]`. We can do this for each dimension.

For example, we can update the values on the left part of this array to be equal to `1`.

In [None]:
arr = np.arange(100).reshape(10,10)
arr[:, :5] = 1
print(arr)

plt.imshow(arr, cmap="gray")

The indexes in the square brackets of `arr[:, :5]` can be broken down like this:

```python
[ 1st dimension start : 1st dimension end, 2nd dimension start : 2nd dimension end ]
```

Dimensions are separated by the comma `,`. Our first dimension is the vertical axis, and the second dimension is the horizontal axis. Their spans are marked by the colon `:`. Therefore:

```python
[ Vertical start : Vertical end, Horizontal start : Horizontal end ]
```

If there are no indexes entered, then the array will take all values. This means `[:, :5]` gives:

```python
[ Vertical start : Vertical end, Horizontal start : Horizontal start + 5 ]
```

Therefore the array index selected the first 5 pixels along the width, at all vertical values.
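
The same notation is easier to follow on an array small enough to read in full. This is a minimal sketch on a hypothetical 3 x 4 array; the numbers in the comments show what each selection returns.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(arr[1, 2])    # single value at row 1, column 2: 6
print(arr[:, :2])   # all rows, first 2 columns: [[0 1] [4 5] [8 9]]
print(arr[1:, 2:])  # rows 1 onwards, columns 2 onwards: [[6 7] [10 11]]
print(arr[:, :2].shape)  # (3, 2)
```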

We can also use stride, which is the third value `[ start : end : stride ]`.

In [None]:
arr = np.arange(100).reshape(10,10)
arr[:, ::2] = 1
print(arr)

plt.imshow(arr, cmap="gray")

Now let's see what that looks like on an actual image.

> **Tip**: Make sure you have uploaded the file `Guinea_Bissau.JPG` to the same folder as the tutorial notebook. We will be using this file in the next few steps and exercises. Download it from: https://training.digitalearthafrica.org/en/latest/_static/python_basics/Guinea_Bissau.JPG

We can use the pyplot library to load an image using the matplotlib function `imread`. `imread` reads in an image file as a 3-dimensional numpy array. This makes it easy to manipulate the array.

By convention, the first dimension corresponds to the vertical axis, the second to the horizontal axis and the third are the Red, Green and Blue channels of the image. Red-green-blue channels conventionally take on values from 0 to 255.

In [None]:
im = np.copy(plt.imread('Guinea_Bissau.JPG'))

# This file path (red text) indicates 'Guinea_Bissau.JPG' is in the
# same folder as the tutorial notebook. If you have moved or
# renamed the file, the file path must be edited to match.

im.shape

In [None]:
# another option to load image from google drive

#from google.colab import drive
#drive.mount('/content/drive/')
#%cd "/content/drive/My Drive/Colab Notebooks/Neuronove siete"
#im = np.copy(plt.imread('Guinea_Bissau.JPG'))
#im.shape

`Guinea_Bissau.JPG` is an image of Rio Baboque in Guinea-Bissau in 2018. It has been generated from Landsat 8 satellite data.

The results of the above cell show that the image is 590 pixels tall, 602 pixels wide, and has 3 channels. The three channels are red, green, and blue (in that order).

<img src="https://www.any-lamp.co.uk/media/wysiwyg/Blog/RGB_foto_1.webp" alt="drawing" width="200" align="right"/>

Let's display this image using the pyplot `imshow` function.

In [None]:
plt.imshow(im)
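
If the image file is not available, the `(height, width, channel)` layout can still be explored on a small synthetic image. The array `tiny` below is hypothetical and not part of the original tutorial; it is 4 pixels tall and 6 pixels wide, with the same `uint8` type that JPG images are normally read as.

```python
import numpy as np

# Hypothetical 4 x 6 pixel image, all channels 0 (black) to start with
tiny = np.zeros((4, 6, 3), dtype=np.uint8)

tiny[:, :3, 1] = 255  # left half: green channel (index 1) at maximum
tiny[:, 3:, 2] = 255  # right half: blue channel (index 2) at maximum

print(tiny.shape)  # (4, 6, 3)
print(tiny[0, 0])  # top-left pixel: [  0 255   0]
print(tiny[0, 5])  # top-right pixel: [  0   0 255]
```

Passing `tiny` to `plt.imshow` should show a green left half and a blue right half.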

### **Plot graphs**
We can plot graphs using the `plot` function.

In [None]:
xpoints = np.arange(0, 6)
ypoints = np.arange(5, 35, 5) # we can use stride in np.arange (third value)

print("x =",xpoints)
print("y =",ypoints)

plt.plot(xpoints, ypoints)

We will change one value in the y array and see how the graph changes.

In [None]:
xpoints = np.arange(0, 6)
ypoints = np.arange(5, 35, 5) # we can use stride in np.arange (third value)
ypoints[3] = 2

print("x =",xpoints)
print("y =",ypoints)

plt.plot(xpoints, ypoints)

We can draw points instead of a line by adding the `'o'` parameter to the `plot` function.

In [None]:
xpoints = np.arange(0, 6)
ypoints = np.arange(5, 35, 5)
ypoints[3] = 2

print("x =",xpoints)
print("y =",ypoints)

plt.plot(xpoints, ypoints, 'o')

We can change the point marker.

In [None]:
xpoints = np.arange(0, 6)
ypoints = np.arange(5, 35, 5)
ypoints[3] = 2

print("x =",xpoints)
print("y =",ypoints)

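# The format string sets the marker; combining 'o' with marker='*'
# would define the marker twice and trigger a warning
plt.plot(xpoints, ypoints, '*')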

We can add labels to the axes using `xlabel` and `ylabel`, and a title using `title`.

In [None]:
xpoints = np.arange(0, 6)
ypoints = np.arange(5, 35, 5)
ypoints[3] = 2

# '*-' draws star markers joined by a solid line
plt.plot(xpoints, ypoints, '*-')

plt.xlabel("X axis")
plt.ylabel("Y axis")
plt.title("My first graph")

print("x =",xpoints)
print("y =",ypoints)

### **Exercises 3**

### 3.1 Let's use the indexing functionality of numpy to select a portion of this image. Select the top-right corner of this image with shape `(200,200)`.
> **Hint:** Remember there are three dimensions in this image. Colons separate spans, and commas separate dimensions.

In [None]:
# We already defined im above, but if you have not,
# you can un-comment and run the next line

# im = np.copy(plt.imread('Guinea_Bissau.JPG'))

# Fill in the question marks with the correct indexes

topright = im[?,?,?]

# Plot your result using imshow

plt.imshow(topright)

If you have selected the correct corner, there should not be much water in it!

### 3.2 Let's have a look at one of the pixels in this image. We choose the top-left corner with position `(0,0)` and show the values of its RGB channels.

In [None]:
# Run this cell to see the colour channel values

im[0,0]

The first value corresponds to the red component, the second to the green and the third to the blue. `uint8` can contain values in the range `[0-255]`, so the pixel has a lot of red, some green, and not much blue. This pixel is an orange-yellow sandy colour.

Now let's modify the image.

### What happens if we set all the values representing the blue channel to the maximum value?

In [None]:
# Run this cell to set all blue channel values to 255
# We first make a copy to avoid modifying the original image

im2 = np.copy(im)

im2[:,:,2] = 255

plt.imshow(im2)

> The index notation `[:,:,2]` is selecting pixels at all heights and all widths, but only the 3rd colour channel.

### Can you modify the above code cell to set all red values to the maximum value of `255`?

### 3.3 Here is the graph from the **Plot graphs** section.

In [None]:
xpoints = np.arange(0, 6)
ypoints = np.arange(5, 35, 5)
ypoints[3] = 2

print("x =",xpoints)
print("y =",ypoints)

plt.plot(xpoints, ypoints, 'o')

Explore possible graph modifications on the webpage: https://www.w3schools.com/python/matplotlib_markers.asp


Modify the graph to show a green line and yellow triangles of size 15.

In [None]:
plt.plot(xpoints, ypoints, ???)

### Explore more graph modifications.

## **4 Python dictionaries and categorical data**

We will introduce a built-in Python structure called a **dictionary**.

A dictionary represents a mapping between **keys** and **values**. Values can be Python objects of any type; keys must be hashable, for example numbers, strings or tuples. We declare a dictionary using curly braces `{}`. Inside we specify the key then its associated value, with the key and value separated by a colon `:`. Commas `,` are used to separate elements in the dictionary.

```python
dictionary_name = {key1: value1, key2: value2, key3: value3}
```

For example:

In [None]:
d = {1: 'one',
     2: 'two',
     3: 'apple'}
d

In the above dictionary `d`, we have three **keys** `1`, `2`, `3`, and their respective **values** `'one'`, `'two'` and `'apple'`.

We can look up elements in a dictionary using the `[ key_name ]` to address the value stored under a key. The syntax looks like:

```python
dictionary_name[key_name]
```

In our example dictionary `d` above, we can call upon the value associated with the key name `1` like so:

```python
d[1]
```

In [None]:
print(d[1], " + ", d[2], " = ", d[3])

Elements in a dictionary can be modified or new elements added by using the `dictionary_name[key_name] = value` syntax.

In [None]:
d[3] = 'three'
d[4] = 'four'

print(d[1], " + ", d[2], " = ", d[3])

Dictionary keys don't have to be numbers.

In [None]:
d2 = {'apple': 'red',
     'banana': 'yellow',
     'pear'  : 'green'}
d2

In [None]:
d2['banana']
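
Looking up a key that does not exist with `[ ]` raises a `KeyError`. The sketch below shows a few standard ways of checking and walking through a dictionary: the `in` operator, the `get` method with a fallback value, and the `items` method. The name `fruit_colours` and the key `'kiwi'` are made up for this example.

```python
fruit_colours = {'apple': 'red', 'banana': 'yellow', 'pear': 'green'}

print('pear' in fruit_colours)  # True
print('kiwi' in fruit_colours)  # False

# get returns the fallback instead of raising KeyError
print(fruit_colours.get('kiwi', 'unknown'))  # unknown

# items yields each key together with its value
for fruit, colour in fruit_colours.items():
    print(fruit, "->", colour)
```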

### **Categorical values**

Dictionaries are useful for data analysis because they make it easy to assign **categorical values** to our dataset.

As an example, the following cells simulate a very simple image containing three different land cover types. Value `1` represents area covered with grass, `2` croplands and `3` city.

First, we import the libraries we want to use.

In [None]:
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import colors

We will now create a 2-dimensional 100 pixel x 100 pixel numpy array where every value is `1`. This is done using the `numpy.ones` function. Then, we use array indexing to assign part of the area to have the value `2`, and another part to have the value `3`.

In [None]:
# grass = 1
area = np.ones((100,100))

# crops = 2
area[10:60,20:50] = 2

# city = 3
area[70:90,60:80] = 3

area.shape, area.dtype

In [None]:
area

We now have a matrix filled with 1s, 2s and 3s. At this point, there is no association between the numbers and the different types of ground cover.

If we want to show what the area looks like according to the grass/crops/city designation, we might want to give each of the classifications a colour.

In [None]:
# We map the values to colours
index = {1: 'green', 2: 'yellow', 3: 'grey'}

# Create a discrete colour map
cmap = colors.ListedColormap(index.values())

# Plot
plt.imshow(area, cmap=cmap)
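
A dictionary can attach names to the categories as well as colours. As a sketch, the cell below rebuilds the same `area` array and counts the pixels in each land cover type with `np.unique`; the `labels` dictionary and the name `pixel_counts` are assumptions made for this example.

```python
import numpy as np

# Same layout as above: grass everywhere, then a crops block and a city block
area = np.ones((100, 100))
area[10:60, 20:50] = 2
area[70:90, 60:80] = 3

# Hypothetical mapping from category value to a readable name
labels = {1: 'grass', 2: 'crops', 3: 'city'}

# np.unique returns the distinct values and how often each one occurs
values, counts = np.unique(area, return_counts=True)
pixel_counts = {labels[int(v)]: int(c) for v, c in zip(values, counts)}

print(pixel_counts)  # {'grass': 8100, 'crops': 1500, 'city': 400}
```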

In the case above, every pixel had a value of either `1`, `2` or `3`. What happens if our dataset is incomplete and there is no data in some places?

This is a common problem in real-life datasets. Real datasets can be incomplete and may be missing data at certain times or places. To deal with this, we use the special value known as `NaN`, which stands for **Not a Number**.

`NaNs` are written in numpy as the constant `np.nan`. It is a floating-point value, so arrays that contain it have a float type.

In [None]:
arr = np.array([1,2,3,4,5,np.nan,7,8,9], dtype=np.float32)

arr

To compute statistics on arrays containing NaN values, numpy has special versions of common functions such as `mean`, standard deviation `std`, and `sum` that ignore the `NaN` values. For example, the next cell shows the difference between using the usual `mean` function and the `nanmean` function.

The `mean` function cannot handle `NaN` values so it will return `nan`. The `nanmean` function does not include `NaN` values in the calculation, and therefore returns a number value.

In [None]:
print(np.mean(arr))

print(np.nanmean(arr))

Note that `NaN` is generally not used as a key in dictionary key-value entries because there are different ways of expressing `NaN` in Python and they are not always equivalent. However, it is still possible to visualise data with `NaNs`; there will be gaps in the image where there is no data.
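
One reason `NaN` is awkward as a key is that it never compares equal to anything, including itself. The sketch below shows this, together with `np.isnan`, which is the usual way to find where data is missing; the name `missing` is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, np.nan, 7, 8, 9], dtype=np.float32)

print(np.nan == np.nan)  # False: NaN is not equal even to itself

missing = np.isnan(arr)  # boolean array, True where the value is NaN
print(missing.sum())     # number of missing values: 1
print(np.nansum(arr))    # sum of the remaining values: 39.0
```

Because `arr == np.nan` is `False` everywhere, use `np.isnan(arr)` rather than a comparison to build a mask of missing values.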

### **Exercises 4**

### 4.1 The harvesting season has arrived and our cropping lands have changed colour to brown. Can you:

#### 4.1.1 Modify the yellow area to contain the new value `4`?
#### 4.1.2 Add a new entry to the `index` dictionary mapping number `4` to the value `brown`.
#### 4.1.3 Plot the area.

In [None]:
# 4.1.1 Modify the yellow area to hold the value 4


In [None]:
# 4.1.2 Add a new key-value pair to index that maps 4 to 'brown'


In [None]:
# 4.1.3 Copy the cmap definition and re-run it to add the new colour

# Plot the area


> **Hint:** If you want to plot the new area, you have to redefine `cmap` so the new value is assigned a colour in the colour map. Copy and paste the `cmap = ...` line from the original plot.

### 4.2 Set `area[20:40, 80:95] = np.nan`. Plot the area now.

In [None]:
# Set the nan area


In [None]:
# Plot the entire area


### 4.3 Find the median of the `area` array from 4.2 using `np.nanmedian`. Does this match your visual interpretation? How does this compare to using `np.median`?

In [None]:
# Use np.nanmedian to find the median of the area
