# Introduction to Coding for AI

## 3. Data Structures and Handling Errors

### 3.1. Methods for data types

You may remember that in Notebook 1 we saw the basic data types: string (`str`) for text, integer (`int`) and float (`float`) for numbers, and boolean (`bool`) for logical values. Strings are a particularly flexible data type that can be converted into other types, so the next step is to learn in more detail the **methods** that we can use with them. If you need a refresher, you can find an introduction to methods in our first notebook.

#### Strings

Let’s revisit our first data type: strings. There are many operations that you can do with them, so we’ll start by introducing some of the most common methods. First, we’ll check out methods that don’t require arguments, then those that do, and finally methods that involve lists. Let’s go!

#### String methods that don’t require arguments 

We’ll start by looking at the string methods that don’t need arguments, meaning no extra information is needed except the variable itself. These methods are:

- `.lower()`: Converts a string into lower case.
- `.upper()`: Converts a string into upper case.
- `.title()`: Converts the first character of each word to upper case.
- `.strip()`: Trims white spaces at both ends of a string.

Below is an example that will show you how each of these methods works:

In [1]:
string_variable = "   Apples and oranges   "

print(f"string_variable:         {string_variable}")
print(f"string_variable.lower(): {string_variable.lower()}")
print(f"string_variable.upper(): {string_variable.upper()}")
print(f"string_variable.strip(): {string_variable.strip()}")

print(f"string_variable.strip().upper(): {string_variable.strip().upper()}")

string_variable:            Apples and oranges   
string_variable.lower():    apples and oranges   
string_variable.upper():    APPLES AND ORANGES   
string_variable.strip(): Apples and oranges
string_variable.strip().upper(): APPLES AND ORANGES


Tip: you can chain methods like this: `string_variable.strip().upper()`, just as we did for the last code line in the example above. 😉
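One detail worth keeping in mind when chaining: string methods do not change the original string, they return a new one. The short sketch below illustrates this; the variable names `original` and `cleaned` are only illustrative and do not come from the notebook.

```python
original = "   Apples and oranges   "

# .strip() and .upper() each return a NEW string; `original` is left untouched.
cleaned = original.strip().upper()

print(f"original: '{original}'")
print(f"cleaned:  '{cleaned}'")
```

If you want to keep the result, you need to store it in a variable, as done with `cleaned` here.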

Quite straightforward, isn’t it? Now, have a go at it yourself in the exercise below.

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Similarly to the way we are testing the method `.upper()`, add a line to test the method `.title()`.
3. Run the cell to print the results.

#### String methods that require arguments

Now let’s take a look at string methods that require arguments, meaning external information is needed:

- `.count()`: Returns the number of times a specified value occurs in a string.
- `.index()`: Searches the string for a specified value and returns the index of the **first** position where it was found.
- `.replace()`: Returns a string where a specified value is replaced with another specified value.
- `.zfill()`: Fills the string with zeroes on the left to reach the specified total length. This is useful to standardize file names. For example, sometimes it helps to rename a list of files from (`image_9.png`, `image_10.png`, ..., `image_123.png`), to (`image_009.png`, `image_010.png`, ..., `image_123.png`).

Remember that Python is case-sensitive, so all methods treat `a` and `A` differently.
That’s what we’ll be checking out in the example below: 

In [2]:
characters_string = "Apples and oranges"

print(f"\n characters_string: \n {characters_string}")
print(f"\n characters_string.count('a'): \n {characters_string.count('a')}")
print(f"\n characters_string.index('a'): \n {characters_string.index('a')}")
print(f"\n characters_string.replace('a', 'A'): \n {characters_string.replace('a', 'A')}")

numbers_string = "123"

print(f"\n numbers_string: \n {numbers_string}")
print(f'\n numbers_string.zfill(5): \n {numbers_string.zfill(5)}')


 characters_string: 
 Apples and oranges

 characters_string.count('a'): 
 2

 characters_string.index('a'): 
 7

 characters_string.replace('a', 'A'): 
 Apples And orAnges

 numbers_string: 
 123

 numbers_string.zfill(5): 
 00123
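
To connect `.zfill()` with the file-name use case mentioned in the list above, here is a minimal sketch. The image numbers and the `image_` prefix are hypothetical values chosen for illustration.

```python
# Hypothetical image numbers stored as strings, used only for illustration.
image_numbers = ["9", "10", "123"]

# Pad each number to three digits, then build the file name.
name_9 = "image_" + image_numbers[0].zfill(3) + ".png"
name_10 = "image_" + image_numbers[1].zfill(3) + ".png"
name_123 = "image_" + image_numbers[2].zfill(3) + ".png"

print(name_9)    # image_009.png
print(name_10)   # image_010.png
print(name_123)  # image_123.png
```

With the padded names, alphabetical order matches numerical order, which is usually what you want when listing files.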


#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace the value of `characters_string` with a sentence that includes the name of a number multiple times. For example, `five`.
3. Replace the argument `'a'` with the number name that appears multiple times in your sentence.
4. Replace the argument `'A'` with the digit version of the number name that you used in step 2. For example, `5`.
5. Replace the value of `numbers_string` with three letters and use `.zfill()` to add three zeros on their left.
6. Run the cell to print the results.

#### String methods that involve *lists*

Lastly, there are a couple of methods for strings that either produce a list of strings or require a list of strings as input. These are:

- `.split()`: Splits the string at the specified separator, and returns a list. By default, it splits on whitespace.
- `.join()`: Uses the string object to join the elements of the iterable used as input.

Let's see an example of each method.

In [3]:
print(f'\n"A basket with mangos".split(): \n{"A basket with mangos".split()}')
print(f'\n" ".join(["A", "basket", "with", "mangos"]): \n{" ".join(["A", "basket", "with", "mangos"])}')


"A basket with mangos".split(): 
['A', 'basket', 'with', 'mangos']

" ".join(["A", "basket", "with", "mangos"]): 
A basket with mangos
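
A small aside on the separator: calling `.split()` without an argument and calling `.split(" ")` give the same result for the sentence above, but they can differ when a text contains repeated spaces. The sketch below uses a made-up sentence with extra spaces to show the difference.

```python
text = "A  basket   with mangos"  # note the repeated spaces

# No argument: splits on any run of whitespace.
words_default = text.split()
print(words_default)

# Explicit " ": splits at every single space, which leaves empty strings behind.
words_single_space = text.split(" ")
print(words_single_space)
```

For text typed by people, the version without an argument is often the more forgiving choice.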


#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace `"A basket with mangos"` with a comma-separated list of fruits and split it on the commas instead of the spaces.
3. Join again the new list of fruits, but with underscores (`_`) between each word.
4. Run the cell to print the results.

#### Escape characters

You may have noticed that the examples above introduced a new funny-looking sequence: `\n`.
Sequences that start with a backslash `\` are called **escape characters**, and this one, in particular, is called *new line*. In text documents, `\n` tells the computer that it should break the text line right there, and start a new line below. Here are other common escape characters:

- `\'` and `\"`: Single and double quotes. This is helpful when you need to add quotes inside quotes. For example, to print the text *Let's write "quotes"!* you can use the following string to avoid confusing Python: `print("Let's write \"quotes\"!")`.
- `\\`: Backslash. Useful when you actually want to print a backslash.
- `\n`: New line. Creates a new line in the text.
- `\t`: Tab. Moves the text to the next tab stop; in most terminals and notebooks, tab stops are placed every eight characters. It can help to make your prints prettier when multiple lines have different lengths, like tables.

In [4]:
print("No tab.")
print("1\t Tab with 1 leading character.")
print("123\t Tab with 3 leading characters.")
print("12345\t Tab with 5 leading characters.")
print("1234567\t Tab with 7 leading characters.")
print("12345678\t Tab with 8 leading characters.")

print("\nPretty table:")
print("| 1\t| 12\t| 123\t|")
print("| 12\t| 123\t| 1234\t|")
print("| 123\t| 1234\t| 12345\t|")

No tab.
1	 Tab with 1 leading character.
123	 Tab with 3 leading characters.
12345	 Tab with 5 leading characters.
1234567	 Tab with 7 leading characters.
12345678	 Tab with 8 leading characters.

Pretty table:
| 1	| 12	| 123	|
| 12	| 123	| 1234	|
| 123	| 1234	| 12345	|
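
The other escape characters from the list can be tried out in the same way. This is a minimal sketch; the text `folder\file.txt` is a hypothetical example, not a real path used in this course.

```python
quoted = "Let's write \"quotes\"!"
path_like = "folder\\file.txt"   # hypothetical text containing one backslash
two_lines = "first line\nsecond line"

print(quoted)      # Let's write "quotes"!
print(path_like)   # folder\file.txt
print(two_lines)   # printed on two separate lines

# Each escape character counts as ONE character in the string.
print(len(path_like))  # 15
```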


#### Built-in functions for multiple data types

#### len()

Another built-in function that you will be using often is `len()`, as you'll frequently want to know how long a string is or how many elements a list has.
It simply returns the length (the number of items) of the object that you pass as an argument, so it works with strings, but also with lists, tuples, and other objects that you will see later on.

In [5]:
list_variable = ["Apples", "and", "oranges"]
print(f"\n Length of: \n {list_variable} \n is: {len(list_variable)}")

string_variable = "Apples and oranges"
print(f"\n Length of: \n {string_variable} \n is: {len(string_variable)}")


 Length of: 
 ['Apples', 'and', 'oranges'] 
 is: 3

 Length of: 
 Apples and oranges 
 is: 18


#### Casting

Data is not just numbers. Data can be words, dates, or even numbers that are stored as text or strings.
Can you see the issue here? You can't perform arithmetic operations with text! But no worries, with Python you can solve the problem by **casting** each value into a number.

Let's see how in the example below:

In [6]:
integer_variable = int("-5")
float_variable = float("3.14")
boolean_variable = bool("True")

print(f"Value: {integer_variable}, Type: {type(integer_variable)}")
print(f"Value: {float_variable}, Type: {type(float_variable)}")
print(f"Value: {boolean_variable}, Type: {type(boolean_variable)}")

Value: -5, Type: <class 'int'>
Value: 3.14, Type: <class 'float'>
Value: True, Type: <class 'bool'>
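
A word of caution about `bool()`: unlike `int()` and `float()`, it does not read the text. It returns `True` for any non-empty string, so `bool("False")` is also `True`. The sketch below shows this, together with one possible way to read a boolean from text. That last part is an assumption for illustration, not a method from this notebook.

```python
print(bool("True"))   # True
print(bool("False"))  # True, because the string is not empty
print(bool(""))       # False, because the string is empty

# One possible workaround (an illustrative assumption):
# compare the cleaned-up text with "true".
text_value = "False"
parsed_value = text_value.strip().lower() == "true"
print(parsed_value)   # False
```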


Actually, you can perform two arithmetic operations with strings, but they behave differently than they do with numbers.
You can add, or concatenate, **two strings** with a `+`, meaning that Python places one after the other,
and you can multiply **a string** and **an integer** with a `*`, meaning that `"A" * 3` returns `AAA`.

In [7]:
number_1 = "1"
number_2 = "2"

print(number_1 + number_2)

12
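
The multiplication case described above can be checked in the same way. This is a minimal sketch with illustrative variable names.

```python
letter = "A"

repeated = letter * 3      # a string times an integer repeats the string
print(repeated)            # AAA

separator = "-" * 10       # handy for printing simple dividing lines
print(separator)           # ----------

# Mixing a string and a number with + does not work.
# Uncommenting the next line would raise a TypeError:
# print("1" + 2)
```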


So, to treat our variables as numbers we have to cast them as follows:

In [8]:
number_1 = int("1")
number_2 = int("2")

print(number_1 + number_2)
print(number_1 / number_2)

number_1 = float("1")
number_2 = float("2")

print(number_1 + number_2)
print(number_1 / number_2)

3
0.5
3.0
0.5


Notice that when you perform addition, subtraction or multiplication with integers, the result is an integer, but when you perform division, the result is automatically converted to a float. Otherwise, `1 / 2` would return `0` instead of `0.5`, because an integer cannot store the fractional part of a number. Convenient, isn’t it?
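
You can confirm the resulting types with `type()`. The sketch below also shows the `//` operator (floor division), which is not used elsewhere in this notebook and is included only as an aside: it divides and drops the fractional part.

```python
sum_result = 1 + 2
division_result = 4 / 2
floor_result = 7 // 2

print(type(sum_result))       # <class 'int'>
print(type(division_result))  # <class 'float'>
print(division_result)        # 2.0, a float even though the division is exact
print(floor_result)           # 3
```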

#### Exercise:

1. Copy the code of the cell above into the cell below.
2. Cast the result of the four arithmetic operations into integers.
3. Run the cell to print and compare the results with the cell above.

#### Numbers

Being able to easily manipulate numbers is very useful, especially with larger datasets. Python comes with a bunch of handy built-in functions for handling numbers. Let’s see them in more detail:

- `max()`: Returns the largest number in an iterable.
- `min()`: Returns the smallest number in an iterable.
- `sum()`: Sums all the numbers in an iterable.
- `abs()`: Returns the absolute value of a single number (meaning that it ignores negative signs, so `abs(-3.14)` returns `3.14`).
- `round(number, ndigits=None)`: Rounds a single number. It's worth noting that you can pass two arguments to this function, `number` and `ndigits`. The value of `number` is rounded to the closest multiple of 10 to the power minus `ndigits`. Additionally, if two multiples are equally close to `number`, the rounding is done toward the even choice. So, for example, as `ndigits` is `None` by default, `round(1.5)` rounds up and returns `2`, and `round(2.5)` rounds down and returns `2` as well!

Take a look at the examples.

In [9]:
data = [-10, -5.5, 0, 5.5, 10]
print(f"data = {data}\n")

print(f"max(   data    ) returns: {max(data)}")
print(f"min(   data    ) returns: {min(data)}")
print(f"sum(   data    ) returns: {sum(data)}")
print(f"abs(   data[1] ) returns: {abs(data[1])}")
print(f"round( data[1] ) returns: {round(data[1])}")

data = [-10, -5.5, 0, 5.5, 10]

max(   data    ) returns: 10
min(   data    ) returns: -10
sum(   data    ) returns: 0.0
abs(   data[1] ) returns: 5.5
round( data[1] ) returns: -6
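
The second argument of `round()` and the round-half-to-even behaviour described in the list above can be seen in a few lines. The numbers are arbitrary examples.

```python
two_decimals = round(3.14159, 2)   # keep two digits after the decimal point
hundreds = round(1234, -2)         # a negative value rounds to tens, hundreds, ...

print(two_decimals)  # 3.14
print(hundreds)      # 1200

# Ties are rounded toward the even choice:
print(round(0.5))    # 0
print(round(1.5))    # 2
print(round(2.5))    # 2
```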


#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace the list of numbers with a list of ten negative floats.
3. In the functions `abs()` and `round()` we are passing the second element of the list `data`. Replace this input with the last element in the list `data`.
4. Run the cell to print the results.

### 3.2. Methods for data structures

The next topics that we’ll revisit in more detail are sequences and mappings; particularly lists and dictionaries.


#### Lists

We have seen that lists are sequences of strings, numbers, or a mixture of various data types.
Now you will learn how to manipulate them.
These are some of the methods that you will be using most often with lists.

#### List methods that return a value

- `.index()`: Returns the index of the **first** element with the specified value.
- `.count()`: Returns the number of elements with the specified value.

Here are a couple of examples:

In [10]:
cities_to_visit = ["Berlin", "Madrid", "Paris", "Rome", "Berlin"]
print(f"cities_to_visit = {cities_to_visit}\n")

print(f"cities_to_visit.index('Berlin') returns: {cities_to_visit.index('Berlin')}")
print(f"cities_to_visit.count('Berlin') returns: {cities_to_visit.count('Berlin')}")

cities_to_visit = ['Berlin', 'Madrid', 'Paris', 'Rome', 'Berlin']

cities_to_visit.index('Berlin') returns: 0
cities_to_visit.count('Berlin') returns: 2


#### List methods that work **in-place**
(we'll explain in a moment)

- `.sort()`: Sorts the list.
- `.reverse()`: Reverses the order of the list.
- `.insert()`: Adds an element at the specified position.
- `.remove()`: Removes the **first** element with the specified value.
- `.extend()`: Adds the elements of a list (or any iterable) to the end of the current list.
- `.append()`: Adds an element at the end of the list.

Methods that work *in-place* modify the list itself and return `None`, so if you pass them as an argument to a `print()` function, you will always print `None`. If you want to see their results, execute them in one line and print the variable in the following line. Some examples:

In [11]:
cities_to_visit = ["Berlin", "Madrid", "Paris", "Berlin"]
print(f"cities_to_visit = {cities_to_visit}\n")

cities_to_visit.sort()
print(f"cities_to_visit.sort()              returns: {cities_to_visit}")

cities_to_visit.reverse()
print(f"cities_to_visit.reverse()           returns: {cities_to_visit}")

cities_to_visit.insert(1, 'Athene')
print(f"cities_to_visit.insert(1, 'Athene') returns: {cities_to_visit}")

cities_to_visit.remove('Berlin')
print(f"cities_to_visit.remove('Berlin')    returns: {cities_to_visit}")

cities_to_visit.extend(['Rome'])
print(f"cities_to_visit.extend(['Rome'])    returns: {cities_to_visit}")

cities_to_visit.append(['Rome'])
print(f"cities_to_visit.append(['Rome'])    returns: {cities_to_visit}")

cities_to_visit = ['Berlin', 'Madrid', 'Paris', 'Berlin']

cities_to_visit.sort()              returns: ['Berlin', 'Berlin', 'Madrid', 'Paris']
cities_to_visit.reverse()           returns: ['Paris', 'Madrid', 'Berlin', 'Berlin']
cities_to_visit.insert(1, 'Athene') returns: ['Paris', 'Athene', 'Madrid', 'Berlin', 'Berlin']
cities_to_visit.remove('Berlin')    returns: ['Paris', 'Athene', 'Madrid', 'Berlin']
cities_to_visit.extend(['Rome'])    returns: ['Paris', 'Athene', 'Madrid', 'Berlin', 'Rome']
cities_to_visit.append(['Rome'])    returns: ['Paris', 'Athene', 'Madrid', 'Berlin', 'Rome', ['Rome']]
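
A common slip with in-place methods is to store their return value. The sketch below shows what happens, and contrasts it with the built-in function `sorted()`, which returns a new list and leaves the original untouched. The lists are illustrative.

```python
cities = ["Paris", "Berlin", "Madrid"]

result = cities.sort()   # sorts `cities` itself and returns None
print(result)            # None
print(cities)            # ['Berlin', 'Madrid', 'Paris']

# sorted() returns a NEW sorted list instead of working in-place.
numbers = [3, 1, 2]
ordered = sorted(numbers)
print(numbers)           # [3, 1, 2]
print(ordered)           # [1, 2, 3]
```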


Notice that `data.extend(['d'])` adds each element of the list `['d']` to the end of `data`, while `data.append(['d'])` adds the list `['d']` itself as a single element, so you end up with a list that has another list inside.
If you want to add a single string, pass it directly to `.append()`:
`data.extend(['d'])` and `data.append('d')` produce the same result.
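
One caveat is worth checking yourself: a string is also an iterable (of characters), so passing a string directly to `.extend()` adds its characters one by one. A minimal sketch, with illustrative variable names:

```python
data_append = ["a", "b"]
data_append.append("Rome")          # adds the string as ONE element
print(data_append)                  # ['a', 'b', 'Rome']

data_extend_list = ["a", "b"]
data_extend_list.extend(["Rome"])   # adds each element of the list
print(data_extend_list)             # ['a', 'b', 'Rome']

data_extend_string = ["a", "b"]
data_extend_string.extend("Rome")   # adds each CHARACTER of the string
print(data_extend_string)           # ['a', 'b', 'R', 'o', 'm', 'e']
```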

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace the list of cities with the names of famous artists.
3. Adapt the code so that all methods run without producing errors.
4. Run the cell to print the results.

#### Indexing

Elements inside lists are ordered, so when you want to retrieve the value of an element, you use its index.
The index is an **integer** passed to the variable as an argument, like with functions, but instead of using parentheses, we use a pair of brackets `[]`.
Remember that Python has **zero-based** indexing, so `data[0]` returns the first element of data.
If you use negative integers, you start counting from the end to the beginning, so `data[-1]` returns the last element.

In [12]:
cities_to_visit = ["Madrid", "Paris", "Berlin", "Rome", "Athene"]
print(f"cities_to_visit = {cities_to_visit}\n")

print(f"cities_to_visit[0]  gives us the FIRST       element: {cities_to_visit[0]}")
print(f"cities_to_visit[1]  gives us the SECOND      element: {cities_to_visit[1]}")
print(f"cities_to_visit[-2] gives us the SECOND LAST element: {cities_to_visit[-2]}")
print(f"cities_to_visit[-1] gives us the LAST        element: {cities_to_visit[-1]}")

cities_to_visit = ['Madrid', 'Paris', 'Berlin', 'Rome', 'Athene']

cities_to_visit[0]  gives us the FIRST       element: Madrid
cities_to_visit[1]  gives us the SECOND      element: Paris
cities_to_visit[-2] gives us the SECOND LAST element: Rome
cities_to_visit[-1] gives us the LAST        element: Athene


You can also use indexing to modify the value of an element:

In [13]:
cities_to_visit = ["Madrid", "Paris", "Berlin", "Rome", "Athene"]
print(cities_to_visit)

cities_to_visit[2] = "Lisbon"
print(cities_to_visit)

['Madrid', 'Paris', 'Berlin', 'Rome', 'Athene']
['Madrid', 'Paris', 'Lisbon', 'Rome', 'Athene']


#### Slicing

With slicing, instead of indicating a single value with an index, you indicate a range of values with a `start` and an `end` **integer** separated by a **colon** (`:`), such as `data[start:end]`.
The returned elements go from `start` up to, **but not including**, `end`.
See the examples below:

In [14]:
cities_to_visit = ["Madrid", "Paris", "Berlin", "Rome", "Athene"]

print(f"cities_to_visit = {cities_to_visit}\n")
print(f"cities_to_visit[0:2]   returns: {cities_to_visit[0:2]}")
print(f"cities_to_visit[:2]    returns: {cities_to_visit[:2]}")
print(f"cities_to_visit[1:3]   returns: {cities_to_visit[1:3]}")
print(f"cities_to_visit[-3:-1] returns: {cities_to_visit[-3:-1]}")
print(f"cities_to_visit[-3:]   returns: {cities_to_visit[-3:]}")
print(f"cities_to_visit[:]     returns: {cities_to_visit[:]}")

cities_to_visit = ['Madrid', 'Paris', 'Berlin', 'Rome', 'Athene']

cities_to_visit[0:2]   returns: ['Madrid', 'Paris']
cities_to_visit[:2]    returns: ['Madrid', 'Paris']
cities_to_visit[1:3]   returns: ['Paris', 'Berlin']
cities_to_visit[-3:-1] returns: ['Berlin', 'Rome']
cities_to_visit[-3:]   returns: ['Berlin', 'Rome', 'Athene']
cities_to_visit[:]     returns: ['Madrid', 'Paris', 'Berlin', 'Rome', 'Athene']


As we mentioned at the beginning, **tuples** have basically the same functionality as lists, so *indexing* and *slicing* work the same on tuples as on lists.

As a reminder, the difference between them is that once you define a tuple you cannot modify it, a characteristic called **immutability**.
It's good that you know tuples, as you will see them every now and then, but you will only be using their most basic functionality, so for now, just think of tuples as immutable lists.
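
To see this in practice, here is a minimal sketch with a tuple of cities. The line that would fail is left commented out, so the cell runs without stopping.

```python
cities_tuple = ("Madrid", "Paris", "Berlin")

first_city = cities_tuple[0]
first_two = cities_tuple[0:2]

print(first_city)          # Madrid
print(cities_tuple[-1])    # Berlin
print(first_two)           # ('Madrid', 'Paris')

# Uncommenting the next line would raise a TypeError, because tuples are immutable:
# cities_tuple[2] = "Lisbon"
```

Note that slicing a tuple returns another tuple, not a list.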

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace the list of cities with names of famous sports people.
3. Change all of the indices to compare the outputs.
4. Run the cell to print the results.

#### Dictionaries

Now it’s time for us to discover more about dictionaries!
As we’ve seen in Notebook 1, dictionaries are a mapping data type.
These are data structures that give a name to each value, so you can indicate the value that you want with a string instead of an index.
The names are called **keys** and the values, you guessed, are called **values**.
So, you could build an English language dictionary by using the words as keys, and the definitions as the values.
For our purposes, you will only be using strings to define keys, and you can use any other object to define values.
Something very important to remember is that keys are unique, while the same value can be used for multiple keys.
For example, `"improve": "to become better"` and `"advance": "to become better"`, have the same value (`to become better`), but unique keys (`improve` and `advance`).

How do we define a dictionary in Python?
To define a dictionary, we use a pair of curly braces (`{}`), and inside we include **key-value pairs**, separating each pair with a comma.
Then, instead of defining the value of a key with an equal sign (`=`), we use a colon (`:`).
For example:
```
dictionary_of_words = {"improve": "to become better", "advance": "to become better"}
```

Another note on syntax: white spaces between enclosing symbols are ignored by Python.
So, as long as you are writing inside a pair of parentheses `()`, square brackets `[]`, or curly braces `{}`, you can split a line of code into multiple lines and Python will still understand your instruction.
As an example, you can also split the previous definition

```
dictionary_of_words = {"improve": "to become better", "advance": "to become better"}
```
into multiple lines to make the code easier to read:
```
dictionary_of_words = {
    "improve": "to become better",
    "advance": "to become better"
    }
```

Let's see some more examples.

In [15]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

print(dictionary_variable)
print(dictionary_variable.keys())
print(dictionary_variable.values())

{'key_1': 'value_1', 'key_2': 'value_2'}
dict_keys(['key_1', 'key_2'])
dict_values(['value_1', 'value_2'])


Notice that you can get the keys with the `.keys()` method, and the values with the `.values()` method. Both return *view* objects (`dict_keys` and `dict_values`), which you can convert into lists with `list()`.
To get the value of a key you use a notation similar to lists:

In [16]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

print(dictionary_variable["key_1"])

value_1
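
As a side note on `.keys()` and `.values()`: their results are printed as `dict_keys([...])` and `dict_values([...])`. If you need a regular list, for example to use indexing, one option is to wrap them in `list()`, as in this short sketch:

```python
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

keys_list = list(dictionary_variable.keys())
values_list = list(dictionary_variable.values())

print(keys_list)                  # ['key_1', 'key_2']
print(values_list[0])             # value_1
print(len(dictionary_variable))   # 2, the number of key-value pairs
```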


To create a new key-value pair or to update the value of a key you use the same notation:

In [17]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

# Update a value:
dictionary_variable["key_1"] = 123

# Create new key-value pair:
dictionary_variable["key_3"] = 456

print(dictionary_variable)

{'key_1': 123, 'key_2': 'value_2', 'key_3': 456}


#### Exercise:
1. Create a new dictionary with three keys, each with the name of a country.
2. Assign a number to each key to represent the population of the country.
3. In a new line, update the population of a country with a new number.
4. In a new line, add a fourth country and its corresponding population.
5. Run the cell to print the results.

- **Going further**: Google how to delete elements of Python dictionaries. In a new line, delete the country with the largest population.

Be careful to pass an existing key to the dictionary when you read it, such as `print(dictionary_variable["key_1"])` in the example above; otherwise, you will get an error and **crash** your program.
For example, `print(dictionary_variable["key-1"])` would raise an error as `"key-1"` doesn't exist in the dictionary.
You can try it out! Uncomment the last line in the following cell and run it to see the error.
Once you read it, comment out the last line again so that we can run all cells without the notebook stopping in this line (this is only necessary to help the tutors to evaluate your notebooks faster).

In [18]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

# Uncomment, run the cell, and comment out the following line again:
#print(dictionary_variable["key-1"])
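
If you are not sure whether a key exists, you can check before reading. The sketch below shows two options that are part of standard Python: the `in` operator and the dictionary method `.get()`. The default text `"not found"` is just an illustrative choice.

```python
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

# `in` tells you whether a key exists.
print("key_1" in dictionary_variable)   # True
print("key-1" in dictionary_variable)   # False

# .get() returns None (or a default that you choose) instead of raising an error.
missing_value = dictionary_variable.get("key-1")
fallback_value = dictionary_variable.get("key-1", "not found")
print(missing_value)    # None
print(fallback_value)   # not found
```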

### 3.3. Handling errors

This brings us to the following topic: **error handling**.
Errors can be frustrating when coding, but if you take a breath and learn to handle them, you'll see that error messages are not your enemies 😉. In fact, their purpose is to help you fix errors in your code.
Python has a tool called **try-except** that allows you to continue running your program even if something unexpected goes wrong. *Try-except* statements can have the following elements:

- `try`: Lets you test a block of code for errors.
- `except`: Lets you handle the error.
- `else`: Lets you execute code when there is no error.
- `finally`: Lets you execute code, regardless of the result of the try-except blocks.

Let's take a look at a simple case using the try-except statement in the example below.

In [19]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2",
    "key_3": "value_3",
}

try:
    print(dictionary_variable["key-1"])
except Exception as e:
    print(f"An error occurred: {e}")

An error occurred: 'key-1'


In particular, take a look at the line `except Exception as e:`. The word `Exception` tells Python which kind of error to catch (`Exception` covers almost every error that your code can raise), and `as e` stores the error that was raised in the variable `e`, so that you can print or inspect it.
Earlier we mentioned that variable names should be explicit. In this case, we use the name `e` because it is a convention among Python developers to name this *Exception* variable *e*. So, when you find it in code later on, you'll know what it stands for.
Now that we've seen how it works, let's elaborate further and include additional blocks:

In [20]:
try:
    dictionary_variable = {
        "key_1": "value_1",
        "key_2": "value_2",
        "key_3": "value_3",
    }
    key_name = "key-1"
    print(dictionary_variable[key_name])
except Exception as e:
    print(f"An error occurred: {e}")
else:
    print("Code run correctly")
finally:
    print("The try-except has concluded")

An error occurred: 'key-1'
The try-except has concluded
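
The examples above catch the general `Exception`. You can also name a specific error type after `except`, so that only that kind of error is handled. The sketch below does this for `ValueError`, the error that `int()` raises when a text cannot be cast to a number. The input `"abc"` and the variable names are illustrative assumptions.

```python
text_value = "abc"   # hypothetical input that cannot be cast to an integer

try:
    number = int(text_value)
except ValueError as e:
    # Only ValueError is handled here; any other error would still stop the program.
    error_name = type(e).__name__
    print(f"{error_name}: {e}")
    number = None
else:
    print(f"Cast worked: {number}")
finally:
    print("The try-except has concluded")
```

Catching a specific error type makes it clearer which problem you expect, and avoids hiding unrelated mistakes in your code.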


Now you’ve seen how the try-except statement can help you handle errors in Python. Ready to give it a try yourself in the exercise below?

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Fix the typo in the `key_name`.
3. Run the cell to see the results.

- **Going further**: Change the data type of the dictionary **keys** to a number, a tuple, and a list and see which one raises an error. Try to understand the error message and how it tries to point you to the fault in the code.

And with this, you’ve arrived at the end of Notebook 3. You’ve been learning a lot, congratulations! In the next notebook, we’ll look at how to create basic class and module structures. The juicy part is coming!