# Lesson 2: Basic elements of Python


In this lesson we will learn how data can be stored in Python lists, some useful ways of using and modifying Python lists, and how to make different data types work together in Python.


---


## General information
>
>### Sources
>
>This lesson is inspired by the [Geo-python module at the University of Helsinki](https://geo-python-site.readthedocs.io/en/latest/course-info/course-info.html) which in turn acknowledges the [Programming in Python lessons](http://swcarpentry.github.io/python-novice-inflammation/) from the [Software Carpentry organization](http://software-carpentry.org). This version was adapted for Colab and a UK context by Ruth Hamilton.
>
>### About this document
>
>This is a [Google Colab Notebook](https://colab.research.google.com/?utm_source=scs-index). This particular notebook is designed to introduce you to a few of the basic concepts of programming in Python. Like other common notebook formats (e.g. [Jupyter](http://jupyterlab.readthedocs.io/en/stable/) ), the contents of this document are divided into cells, which can contain:
>
> *   Markdown-formatted text,
> *   Python code, or
> *   raw text
>
> You can execute a snippet of code in a cell by pressing **Shift-Enter** or by pressing the **Run Cell** button that appears when your cursor is on the cell.


> **Note**: There are some Python cells in this notebook that *already* contain code. You just need to press **Shift**-**Enter** to run those cells. We're trying to avoid having you race to keep up typing in basic things for the lesson so you can focus on the main points :D.



---



## Lists and indices

We saw a bit about variables and their values in the previous lesson, and we continue today with some variables related to [UK Met Office weather stations](https://www.metoffice.gov.uk/research/climate/maps-and-data/uk-synoptic-and-climate-stations) on the island of Jersey. Rather than having individual variables for each of those stations as we have previously, we can store many related values in a *collection*. The simplest type of collection in Python is a
[`list`](https://www.w3schools.com/python/python_lists.asp).


### Creating a list

Let’s first create a **list** of selected `station_name` values and print it to the screen.


In [None]:
station_names = ['Jersey Airport', 'Jersey Gorey Castle', 'Jersey st Helier', 'Jersey Trinity, States Farm']

In [None]:
#print the station_names list to the screen
print(station_names)

We can also check the type of the `station_names` list using the `type()` function.

In [None]:
type(station_names)

Here we have a list of 4 `station_name` values in a list called `station_names`. As you can see, the `type()` function recognizes this as a list.

---





>**Important note:**
>*Lists* are created using the square brackets `[` and `]`, with commas separating the values in the list.

---

### Index values

To access an individual value in the list we need to use an `index` value. An index value is a number that refers to a given position in the list.
>**TASK** Print out the first value in our list, `station_names[1]`:

In [None]:
station_names[1]

Wait, what? This is the second value in the list we’ve created, what is wrong? As it turns out, Python (and many other programming languages) starts numbering the values stored in collections at index `0`. Thus, to get the value for the first item in the list, we must use index `0`. Let's print out the value at index `0` of `station_names` below.

In [None]:

station_names[0]

OK, that makes sense, but it may take some getting used to...

---

>This is often referred to as **zero-based** indexing. If you want to get into the detail, have a look at [this article](https://albertkoz.com/why-does-array-start-with-index-0-65ffc07cbce8) or, for an even more detailed mathematical discussion, [this one](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html).

> **Note:** In contrast to Python, **R** is an example of a language that uses **one-based indexing**.

---

### Number of items in a list

We can find the *length* of a list using the `len()` function.
>**TASK** Use the `len()` function to check the length of the `station_names` list.

In [None]:
#Use the len() function to check the length of the station_names list.

Just as expected, there are 4 values in our list and `len(station_names)` returns a value of `4`.

### Working with index values and lists

If we know the length of the list, we can now use it to find the value of the last item in the list, right?
>**TASK** What happens if you print the value from the `station_names` list at index `4`, the value of the length of the list?

In [None]:
station_names[4]

An `IndexError`? That’s right, since our list starts with index `0` and has 4 values, the index of the last item in the list is `len(station_names) - 1` or `3` for our `station_names` list. That isn’t ideal, but fortunately there’s a nice trick in Python to find the last item in a list.

>**TASK** Print the `station_names` list to remind us of the values that are in it.

In [None]:
print(station_names)

To find the value at the end of the list, we can print the value at index `-1`. To go further up the list in reverse, we can simply use larger negative numbers, such as index `-4`.
>**TASK** Print out the values at the index values `-1` and `-2` below.

In [None]:
station_names[-1]

In [None]:
station_names[-2]

Yes, in Python you can go backwards through lists by using negative index values. Index `-1` gives the last value in the list and index `-len(station_names)` would give the first. Of course, you still need to keep the index values within their ranges.
>**TASK** What happens if you check the value of the `station_names` list at index `-5`?

In [None]:
#What is the value of station_names at index -5?
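
The relationship between positive and negative index values can be sketched with a small example. The list `letters` below is a hypothetical stand-in, not part of the lesson data; the point is only that a negative index `i` refers to the same item as index `len(letters) + i`.

```python
# Hypothetical example list (not part of the lesson data)
letters = ['a', 'b', 'c', 'd']

# A negative index i refers to the same item as len(letters) + i
last_item = letters[-1]                # same item as letters[len(letters) - 1]
first_item = letters[-len(letters)]    # same item as letters[0]

print(last_item, first_item)
print(letters[-2] == letters[len(letters) - 2])
```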


### Modifying list values

Another nice feature of lists is that they are *mutable*, meaning that the values in a list that has already been defined can be modified. Here, we define a list of the observation station *type*, corresponding to the station names in the `station_names` list.

In [None]:
station_types = ['Automatic', 'Manual', 'Manual', 'Manual']
print(station_types)

>**TASK** Change the value for `station_types[2]` to be `'Automatic'` and print out the `station_types` list again.

In [None]:
station_types[2]='Automatic'
print(station_types)


### Data types in lists

Lists can also store more than one type of data. Suppose that, in addition to having a list of station names, we would like to have a list of all of the variables for a particular station, ‘Jersey Airport’. Before we create this list we need to define a few variables related to the Jersey Airport station.

> Variables: `Station number, Station name, Country, Latitude, Longitude, Station type`

> Values: `401, Jersey Airport, Channel Islands, 49.432, -2.093, Automatic`

In [None]:
station_name = 'Jersey Airport'

In [None]:
station_id = 401

In [None]:
station_lat = 49.432

In [None]:
station_lon = -2.093

In [None]:
station_type = 'Automatic'

Now that we have defined some of the Jersey Airport variables we can create the Jersey Airport list.

In [None]:
station_jer_airport = [station_id, station_name,  station_lat, station_lon, station_type]
print(station_jer_airport)

Here we have one list with 3 different types of data in it. We can confirm this using the `type()` function. Let's check the type of `station_jer_airport`, then the types of the values at indices `0-2` in the cells below.

In [None]:
type(station_jer_airport)

In [None]:
type(station_jer_airport[0])

In [None]:
type(station_jer_airport[1])

In [None]:
type(station_jer_airport[2])

This shows us that the *list* `station_jer_airport` contains integer, text and float data types.

### Adding and removing values from lists

Finally, we can add and remove values from lists to change their lengths. Let’s consider that we no longer want to include the first value in the `station_names` list. Since we haven't seen that list in a bit, let's first print it to the screen.

In [None]:
print(station_names)

`del` allows values in lists to be removed. It can also be used to delete values from memory in Python. To remove the first value from the `station_names` list, we can simply type `del station_names[0]`. If you then print out the `station_names` list, you should see the first value has been removed.

In [None]:
del station_names[0]

In [None]:
print(station_names)



---

**Note** Deleting the first item in the list changes the indexing of the rest of the list, so the item `Jersey Gorey Castle` is now in `station_names[0]`.

---


If we would instead like to add a few values to the `station_names` list, we can type `station_names.append('List item to add')`, where `'List item to add'` is the text to be added to the list. Let's add two more island weather stations to our list in the cells below: `'Guernsey Airport'` and `'Scilly St Marys Airport'` (Guernsey is in the Channel Islands; the Isles of Scilly lie off the coast of Cornwall). After doing this, let's check the list contents by printing to the screen.

In [None]:
station_names.append("Guernsey Airport")
station_names.append("Scilly St Marys Airport")

In [None]:
print(station_names)

As you can see, we add values one at a time using `station_names.append()`.

---

**Important note:**

`.append()` is called a **method** in Python. A **method** is a function that is available for a given object (in this case our list `station_names`) because of the object's type (in this case a *list*). We’ll see some other examples of useful list methods below.

---

### Appending to an integer? Not so fast...

Let’s consider our list `station_names`. As we know, we already have data in the list `station_names`, and we can modify that data using built-in methods such as `station_names.append()`. In this case, the method `append()` is something that exists for lists, but not for other data types. It is intuitive that you might like to add (or append) things to a list, but perhaps it does not make sense to append to other data types. Below, let's create a variable `station_name_length` that we can use to store the length of the list `station_names`. We can then print the value of `station_name_length` to confirm the length is correct.

In [None]:
station_name_length=len(station_names)

In [None]:
print(station_name_length)

If we check the data type of `station_name_length`, we can see it is an integer value, as expected (do that below). What happens if you try to append the value `1` to `station_name_length`?

In [None]:
type(station_name_length)

In [None]:
station_name_length.append(1)

Here we get an `AttributeError` because there is no method built in to the `int` data type to append to `int` data. While `append()` makes sense for `list` data, it is not sensible for `int` data, which is the reason no such method exists for `int` data.
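
As a minimal sketch of the same idea, the built-in `hasattr()` function can report whether an object has a given method before you try to call it. The variable names below are hypothetical, and the `try`/`except` statement is not covered in this lesson; it is used here only to catch the error so the cell keeps running.

```python
values = [1, 2, 3]     # hypothetical list
count = len(values)    # an int

# hasattr() reports whether an object has a given attribute or method
print(hasattr(values, 'append'))   # lists have .append()
print(hasattr(count, 'append'))    # ints do not

# Catch the error instead of letting it stop the cell
try:
    count.append(1)
except AttributeError as err:
    message = str(err)
    print(message)
```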

### Appending a list to a list

What happens if we want to add a *list* to the end of another list? Try it below:

In [None]:
new_stations=['Heathrow airport', 'Wick Airport']
station_names.append(new_stations)

What is the last element in `station_names` now?
>**TASK** In the cell below use your knowledge of list indexing to print it to the screen

In [None]:
#print the value of the last element in station_names
station_names[-1]

Did you see what you were expecting to see? Hopefully not! What the `.append()` method has done is add the `new_stations` list as a single element of `station_names`. This means that when we look at the last element, we see a *list* rather than the value `'Wick Airport'`.

If we want to add the elements of one list to the end of another list, we need to use the method `.extend(<list>)` instead of `.append()`. Try this below:

In [None]:
station_names.extend(new_stations)
print(station_names)
print(station_names[-1])

The list has now been *extended* and has `'Heathrow airport'` and `'Wick Airport'` added to the end, but it still has the `['Heathrow airport', 'Wick Airport']` list as an element. We need to remove it.
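
The difference between the two methods can be seen side by side in a short sketch. The lists below are hypothetical, and `.copy()` is used only so that both methods start from the same contents.

```python
# Hypothetical lists (not the lesson's station data)
base = ['a', 'b']
extra = ['c', 'd']

appended = base.copy()      # work on copies so `base` is unchanged
appended.append(extra)      # adds the whole list as ONE new element

extended = base.copy()
extended.extend(extra)      # adds each element of `extra` separately

print(appended)   # ['a', 'b', ['c', 'd']]
print(extended)   # ['a', 'b', 'c', 'd']
print(len(appended), len(extended))
```

Note that the two results have different lengths: `.append()` always increases the length by exactly one.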

>**TASK** Remove the `['Heathrow airport', 'Wick Airport']` element from the `station_names` list. **Tip:** We used a command [earlier](#scrollTo=1vkm8g8odqmO&line=1&uniqifier=1) to remove elements from a list - and remember to take into account 0-based indexing...

In [None]:
#Remove the ['Heathrow airport', 'Wick Airport'] element from the station_names list

Your list **should** read:
```
['Jersey Gorey Castle', 'Jersey st Helier', 'Jersey Trinity, States Farm', 'Guernsey Airport', 'Scilly St Marys Airport', 'Heathrow airport', 'Wick Airport']
```
If it doesn't, you can copy and paste the list above to re-define the `station_names` list in the code cell below.

In [None]:
#use the list above to re-define the station_names variable

### Some other useful list methods

With lists we can do a number of useful things, such as count the number of times a value occurs in a list or where it occurs. The `list.count()` method can be used to find the number of instances of an item in a list. For instance, we can check to see how many times `'Jersey st Helier'` occurs in our list `station_names` by typing `station_names.count('Jersey st Helier')`.

In [None]:
# The count method counts the number of occurrences of a value
station_names.count("Jersey st Helier")

Similarly, we can use the `list.index()` method to find the index value of a given item in a list. Let's use the cell below to find the index of `'Jersey st Helier'` in the `station_names` list.

In [None]:
# The index method gives the index value of an item in a list
station_names.index("Jersey st Helier")

The good news here is that our selected station name is only in the list once. Should we need to modify it for some reason, we also now know where it is in the list (index `1`).
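
Two details of these methods are worth a quick sketch: `.index()` reports only the *first* match, and it raises a `ValueError` if the value is not in the list at all. The list below is a hypothetical one with a deliberate duplicate, and the `if`/`else` check is just one possible way to avoid the error.

```python
# Hypothetical list with a duplicated value
names = ['Jersey Airport', 'Guernsey Airport', 'Jersey Airport']

print(names.count('Jersey Airport'))   # number of occurrences
print(names.index('Jersey Airport'))   # index of the FIRST occurrence only
print(names.count('Wick Airport'))     # 0 when the value is absent

# .index() raises ValueError for a missing value, so check membership first
target = 'Wick Airport'
if target in names:
    position = names.index(target)
else:
    position = None
print(position)
```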

The next method we want to use is `.insert()`. Remember that we *removed* the first element of the `station_names` list using the `del` command? We can use `.insert()` to add it back in.

The `.insert()` method takes two *arguments*: the first is the *index* at which to insert, and the second is the *value* to insert.

So the line:
```
station_names.insert(1,'An Airport')
```
will insert the value 'An Airport' as the *second* element of the list (remember we are using 0-indexing); all the following members of the list will be moved along by 1.

>**TASK** Use the `.insert()` method to add the value `'Jersey Airport'` back in as the *first* element of the list.

In [None]:
# use the .insert(index,value) method to add the text 'Jersey Airport' as the first element of the list, station_names


There are two other common methods for lists that we need to see.

### Reversing a list

First, there is the `list.reverse()` method, used to reverse the order of items in a list. Let's reverse our `station_names` list below and then print the results.

In [None]:
station_names.reverse()

In [None]:
print(station_names)


Yay, it works!
>**TASK** What is the *first* element of the list now? Is it what you expected?

In [None]:
station_names[0]



---


**Caution**: A common mistake when reversing lists is to do something like `station_names = station_names.reverse()`. **Do not do this!** The `.reverse()` method reverses the list in place and returns the value `None` (this is why there is no screen output when running `station_names.reverse()`). If you then assign the output of `station_names.reverse()` to `station_names`, you will reverse the list but then overwrite its contents with the returned value `None`. This means you’ve lost the contents of your list (!).


---



### Sorting a list

The `list.sort()` method works the same way. Let's sort our `station_names` list and print its contents below.

In [None]:
station_names.sort()  # Notice no output here...

In [None]:
print(station_names)

As you can see, the list has been sorted alphabetically using the `list.sort()` method, but there is no screen output when this occurs. Again, if you were to assign that output to `station_names` the list would get sorted, but the contents would then be assigned `None`.
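
The `None` return value can be checked directly, as in the sketch below. It also shows the built-in functions `sorted()` and `reversed()`, which are not covered in this lesson: they return a new result and leave the original list untouched, so they are safe to use in an assignment. The list `letters` is hypothetical.

```python
letters = ['b', 'c', 'a']      # hypothetical list

result = letters.reverse()     # reverses the list in place...
print(result)                  # ...and returns None
print(letters)                 # the list itself is now ['a', 'c', 'b']

# sorted() and reversed() build new objects and keep the original as it is
sorted_copy = sorted(letters)
reversed_copy = list(reversed(letters))
print(sorted_copy, reversed_copy, letters)
```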

---

**Note 1**: As you may have noticed, `Jersey Trinity` comes before `Jersey st Helier` in the sorted list. This is because alphabetical sorting in Python places capital letters before lowercase letters.


---
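
If you would rather ignore the case of the letters when sorting, one option is the `key` argument of `.sort()`. The sketch below passes `str.lower`, so each string is compared by its lowercase version; the list `names` is hypothetical and this is only one of several possible approaches.

```python
# Hypothetical names mixing upper- and lowercase initial letters
names = ['banana', 'Cherry', 'apple']

names.sort()
default_order = names.copy()   # capitals first: ['Cherry', 'apple', 'banana']
print(default_order)

names.sort(key=str.lower)      # compare the lowercase version of each string
print(names)                   # ['apple', 'banana', 'Cherry']
```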



**Note 2**: To sort in *reverse* order, you need to use the keyword argument `reverse=True`:


```
station_names.sort(reverse=True)         #note `True` is case sensitive
```



---

---

**Useful Tip**:

Here is a summary of a few of the *list* methods, applied to a list `s`:

Method | Effect
--- | ---
`s.pop(i)` | Return the value at index `i` and delete it from the list
`s.append(x)` | Put `x` at the end of the list
`s.insert(i, x)` | Insert `x` at index `i` in the list
`s.remove(x)` | Remove the first occurrence of `x` from the list
`s.index(x)` | Return the index of the first occurrence of `x` in the list
`s.reverse()` | Reverse the order of items in the list
`s.sort()` | Sort the items in the list


---
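
The `.pop()` and `.remove()` methods from the summary table have not been demonstrated in this lesson, so here is a minimal sketch using a hypothetical list `s`. Note that `.pop()` hands back the value it removes, whereas `.remove()` does not.

```python
s = ['a', 'b', 'c', 'b']   # hypothetical list

removed = s.pop(0)   # returns 'a' and deletes it from the list
print(removed, s)    # a ['b', 'c', 'b']

s.remove('b')        # deletes only the FIRST 'b'
print(s)             # ['c', 'b']

last = s.pop()       # with no index given, pop() removes the last item
print(last, s)       # b ['c']
```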

### Slicing a list

Slicing is a way of extracting a *subset* of characters or elements from any *sequential* data type, including lists (a sequential data type is any structure that can be accessed using an *index*). Generally, a slice specifies the starting index and the ending index separated by a colon `:`. **But note that the start index is included in the slice but the end index is *excluded* from the slice.** For example, `station_names[2:4]` would select the *third* and *fourth* elements of the list `station_names`, i.e. `station_names[2]` and `station_names[3]`, only.



In [None]:
#see how slicing works here
print(station_names)
print(station_names[2:4])

You can create a new list using *assignment* and *slicing*...

In [None]:
#creating a new list using a _slice_
jersey_stations=station_names[2:6]  #assumes the sorted 8-item list from the previous section
print(jersey_stations)

Here is a summary of how *slicing* can extract elements from a list `s`:

Syntax | Effect
--- | ---
`s[start:stop]` | Items `start` through `stop-1`
`s[start:]` | Items `start` through the rest of the list
`s[:stop]` | Items from the beginning through `stop-1`
`s[:]` | A copy of the whole list
`s[-1]` | Last item in the list
`s[-2:]` | Last two items in the list
`s[:-2]` | Everything except the last two items


---
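
A few of the slicing forms above can be tried on a small hypothetical list `s`. The sketch also illustrates that a slice is a *new* list, so changing it leaves the original unchanged.

```python
s = [10, 20, 30, 40, 50]   # hypothetical list

print(s[1:3])    # [20, 30]  (index 3 is excluded)
print(s[:2])     # [10, 20]
print(s[3:])     # [40, 50]

# A slice is a new list: changing the copy leaves the original alone
copy_of_s = s[:]
copy_of_s[0] = 99
print(copy_of_s[0], s[0])    # 99 10

# When both indices are within range, s[start:stop] has stop - start items
print(len(s[1:4]))           # 3
```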

>**QUESTION** What is the difference between using the index `station_names[-2]` and the slice `station_names[-2:]`? Use the code box below if you want to test it.

In [None]:
#What is the difference between using the slice station_names[-2] and station_names[-2:]

---

## Tuples

A **tuple** is another data structure used by Python. It is often described as a *read-only list*. In other words, **tuples** are *immutable*, which means that they cannot be changed once created. In most other ways, tuples and lists behave the same.

A **tuple** is defined using **curved** brackets, `(` and `)`:


```
date_of_birth_tuple=('April',3,2007) #this is a tuple

date_of_birth_list=['April',3,2007] #this is a list
```

Run the two bits of code below to illustrate the difference.

In [None]:
#creating date of birth using a list
date_of_birth_list=['April',3,2007]
print(date_of_birth_list)
print(type(date_of_birth_list))  #print() is needed to show output from the middle of a cell
date_of_birth_list[2]=2008  #change the third element to 2008
print(date_of_birth_list)

In [None]:
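#creating date of birth using a tuple
date_of_birth_tuple=('April',3,2007)
print(date_of_birth_tuple)
print(type(date_of_birth_tuple))  #print() is needed to show output from the middle of a cell
date_of_birth_tuple[2]=2008 #try to change the third element to 2008 (raises a TypeError)
print(date_of_birth_tuple)  #not reached, because the line above stops the cell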

Note how we were able to change the information in the *list* but got a `TypeError` when we tried to change the tuple.

Tuples are useful when you have data that you know won't change (like dates of birth). But you also can't *append* or *delete* information from a tuple so they are less flexible than lists.
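
If you do need to change a value held in a tuple, one common workaround is to convert it to a list, edit the list, and convert back. The sketch below assumes a hypothetical tuple `station_location` holding the latitude and longitude values used earlier in this lesson, and uses `try`/`except` (not covered here) only so that the error does not stop the cell.

```python
# (latitude, longitude) values from this lesson, stored in a hypothetical tuple
station_location = (49.432, -2.093)

# Indexing and len() work just as they do for lists
print(station_location[0], len(station_location))

# Assigning to an item fails with a TypeError
try:
    station_location[0] = 50.0
except TypeError as err:
    error_message = str(err)
    print(error_message)

# list() makes a mutable copy; tuple() converts it back
as_list = list(station_location)
as_list[0] = 50.0
updated_location = tuple(as_list)
print(updated_location, station_location)
```

The original tuple is still unchanged at the end; `updated_location` is a separate, new tuple.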


---

## Making different data types work together

In the previous lesson we learned how to determine a variable's *data type*  and also saw some examples of which data types are compatible with one another. We also looked at how to enable data of different types to work together.

The next section reviews working with different data types and introduces **f-string formatting**.

### Reminder: Data types and their compatibility

We can explore the different types of data stored in variables using the `type()` function.
Let's use the cells below to check the data types of the variables `station_name`, `station_id`, and `station_lat`.

In [None]:
type(station_name)

In [None]:
type(station_id)

In [None]:
type(station_lat)

As expected, we see that the `station_name` is a character string, the
`station_id` is an integer, and the `station_lat` is a floating point number.

---

**Hint**: Remember, the data types are important because some are not compatible with one another.

What happens when you try to add the variables `station_name` and `station_id` in the cell below?

In [None]:
station_name + station_id

Here we get a `TypeError` because Python does not know how to combine a string of characters (`station_name`) with an integer value (`station_id`).

### Converting data from one type to another

It is not the case that things like the `station_name` and `station_id` cannot be combined at all, but in order to combine a character string with a number we need to perform a *data type conversion* to make them compatible. Let's convert `station_id` to a character string using the `str()` function. We can store the converted variable as `station_id_str`.

In [None]:
station_id_str = str(station_id)

We can confirm the type has changed by checking the type of `station_id_str`, or by checking the output when you type the name of the variable into a cell and run it.

In [None]:
type(station_id_str)

In [None]:
station_id_str

Notice the number is now enclosed in quotation marks. As you can see, `str()` converts a numerical value into a character string with the same numbers as before.

---

**Note**: Similar to using `str()` to convert numbers to character strings, `int()` can be used to convert strings or floating point numbers to integers and `float()` can be used to convert strings or integers to floating point numbers.

---
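
A short sketch of these conversions follows; the variable names are hypothetical. Two details are easy to miss: `int()` drops the fractional part of a float rather than rounding it, and it cannot directly convert a string that contains a decimal point.

```python
id_number = int("401")         # string of digits -> 401
truncated = int(49.932)        # float -> 49 (the fraction is dropped, not rounded)
latitude = float("49.432")     # string -> 49.432
id_as_float = float(401)       # int -> 401.0
print(id_number, truncated, latitude, id_as_float)

# A string containing a decimal point cannot be passed straight to int()
try:
    int("49.432")
except ValueError:
    print("int('49.432') raises a ValueError")

# Converting via float() first does work
print(int(float("49.432")))    # 49
```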


### Combining text and numbers

Although most mathematical operations operate on numerical values, a common way to combine character strings is using the addition operator `+`. Let's create a text string in the variable `station_name_and_id` that is the combination of the `station_name` and `station_id` variables. Once we define `station_name_and_id`, we can print it to the screen to see the result.

In [None]:
station_name_and_id = station_name + ": " + str(station_id)

In [None]:
print(station_name_and_id)

Note that here we are converting `station_id` to a character string using the `str()` function within the assignment to the variable `station_name_and_id`. Alternatively, we could have simply added `station_name` and `station_id_str`.

## Working with text (and numbers)

The previous example demonstrated how it is possible to combine character strings and numbers (converted to character strings) together using the `+` operator. Although this approach works, it can become quite laborious and error-prone when working with more complicated sets of textual and/or numerical components. In addition, it is sometimes desirable to format the numerical output to change the number of decimal places for floating point values, for example. Hence, next we show a few useful techniques that make manipulating strings easier and more efficient.

There are three approaches that can be used to format strings in Python:

1. f-strings
2. the `.format()` method
3. using the `%` operator

The f-string approach is recommended and the most modern, introduced in Python 3.6. However, since you are likely to find examples of the older approaches we also show how they work.

### f-String formatting

Here, we show how we can combine the `station_name` text, `station_id` integer number and another floating point number `temp` using Python's f-string formatting approach. In addition, we will simultaneously round the floating point number (`temp`) to two decimal places.

In [None]:
# Temperature with many decimals
temp = 18.56789876

In [None]:
# 1. The f-string approach (recommended)
info_text = f"The temperature at {station_name} station (ID: {station_id}) is {temp:.2f} Celsius."

In [None]:
print(info_text)

So, here we have managed to combine three different data types and format the floating point value in a single line. Let's break the f-string down a bit to understand how it works.

---

**F-string formatting explained**

*Adapted from the draft text of the [Introduction to Python for Geographic Data Analysis textbook by Tenkanen et al.](https://python-gis-book.readthedocs.io/en/develop/part1/chapter-02/nb/00-python-basics.html#working-with-text-and-numbers).*

The key components here are:

- The text that you want to create and/or modify is enclosed within quotes preceded by the letter `f`.
- You can include any existing variable in the text template by placing the name of the variable inside a set of curly braces `{}`.
    - Using string formatting, it is also possible to insert numbers (such as `station_id` and `temp`) into the body of text without needing first to convert the data type to a string. This is because the f-string functionality does the data type conversion for us.
- It is possible to round numbers on the fly to a specific precision, such as two decimal places in our example, by adding a format specifier (`:.2f`) after the variable that we want to format.
    - The format specifier works by first adding a colon (`:`) after the variable name
    - The decimal precision can be specified by adding a dot (`.`) followed by a number that indicates the number of decimal places (two in our case)
    - The final character `f` in the format specifier defines the type of the conversion that will be conducted
        - `f` will convert the value to a decimal number
        - `e` will make the number appear in scientific notation
        - `%` will convert the value to a percentage

Of the above, the most important thing is to remember to include the `f` at the start of your f-strings :).

---
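
The three conversion types listed above can be compared in a short sketch. The numbers and variable names below are hypothetical and chosen only for illustration.

```python
value = 0.256789          # hypothetical number
big_value = 12345.6789    # hypothetical number

fixed = f"{value:.2f}"            # decimal number with two decimal places
scientific = f"{big_value:.2e}"   # scientific notation
percent = f"{value:.1%}"          # multiplied by 100 and followed by a % sign

print(fixed)        # 0.26
print(scientific)   # 1.23e+04
print(percent)      # 25.7%
```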

### Other approaches for string formatting (not recommended)

As mentioned above, there are other approaches that can be used to format text and combine different data types. The first one is the `.format()` method. For example:

In [None]:
# 2. .format() approach (not recommended anymore)
info_text2 = "The temperature at {0} station (ID: {1}) is {2:.2f} Celsius.".format(station_name, station_id, temp)
print(info_text2)

As you can see, here we get the same result as with f-strings using the `.format()` method, which is placed after the quotes. Placeholders are inserted inside curly braces where the numbers refer to the order of the variables listed in the `.format()` method. There are other ways to use this same approach, but the example above is typical.

The last (historical) string formatting approach is to use the `%` operator. In this approach, the placeholder `%` is added within the quotes, and the variables that are inserted into the body of text are placed inside parentheses after another `%` operator, like this:

In [None]:
# 3. %-operator approach (not recommended anymore)
info_text3 = "The temperature at %s station (ID: %s) is %.2f Celsius." % (station_name, station_id, temp)
print(info_text3)

The order of the variables within the parentheses specifies which `%` placeholder will receive what information, and the number of variables should be exactly the same as the number of `%` placeholders within the text template.

## Why is string formatting relevant to data science?

String formatting allows you to combine an existing string with other values to create a new string. This process is important in many programming languages, including Python, as it enables you to create dynamic strings that can change based on different inputs.

As a data scientist, you might use it for inserting a title in a graph, to show a message or an error, or to pass a statement to a function. It can also be useful for keeping track of key variables as your work progresses.
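
As a minimal sketch of the graph-title use case, an f-string can build a title from the values you are working with. The variable names and the number of observations below are hypothetical; the station name and temperature are the ones used earlier in this lesson.

```python
station = "Jersey Airport"
n_obs = 24                  # hypothetical number of observations
mean_temp = 18.56789876     # temperature value used earlier in this lesson

# Build a title that updates automatically if any of the values change
title = f"{station}: mean of {n_obs} observations = {mean_temp:.1f} Celsius"
print(title)
```

A string like `title` could then be passed to a plotting function or written to a log as your analysis progresses.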


### More information about formatting text and numbers

Of course, there is much more that can be done to format and interact with character strings and numbers. For more information, please have a look at the sites linked below.

- [Common string manipulation techniques from *Introduction to Python for Geographic Data Analysis*](https://python-gis-book.readthedocs.io/en/develop/part1/chapter-02/nb/00-python-basics.html#common-string-manipulation-techniques)
- [Python documentation: PEP 498 - Literal string interpolation](https://www.python.org/dev/peps/pep-0498/)