# Introduction to Coding for AI

## 1. Getting Started 

### 1.6. Run your first code

Your first task is to write `print("This is code")` in the cell below, and then click on the **Run** button with the arrow to execute the code. Make sure that the drop-down menu in the icons bar says `Code`. The text `This is code` will then be printed below the cell. You can also execute code by selecting a cell and pressing `Shift + Enter`.

In [701]:
print("This is code")

This is code


Congratulations! You have just run the first Python code in the course 🙂

Next, add a new cell below by clicking on the menu `Insert` → `Insert Cell Below`.
Alternatively, click outside the cell so that the line surrounding it changes from green to blue, and then press `b` on the keyboard to add a new cell *below*, or `a` to add a new cell *above*.
The colors only indicate whether you are currently editing a cell (green) or are *outside* the cell (blue) and can move between cells using the keyboard arrows.
You can also move cells *up* and *down* using the corresponding arrows in the icons bar.

Go to the new cell you added below, click on the drop-down menu in the icons bar and select `Markdown`.
Then type `### **This** is *markdown*` and run the cell. Do you see the difference? Unlike Code cells, Markdown cells display their output in place of the markdown code instead of below it.

Now that you know how Jupyter notebooks work, you are ready to start diving into Python!

### 1.7. Python elements

To use Python efficiently, there are a few elements you need to know and understand, just like you need to know verbs, nouns, and pronouns to form a sentence in English.
Let's start by learning that Python is an **object-oriented** programming language.
This means that almost everything in Python is an object with **properties** and **methods**.
Properties are characteristics or values that belong to an object, and methods are functions that execute actions.
You can think of an object as a microwave. It has *properties* like "Heating Time" that you can set to "1 Minute" or "30 Seconds", and it also has *methods* like "Start" and "Stop". We will elaborate more on methods later on.
To start, the first elements of Python that we will show you are **variables**, **data types**, **functions**, and **modules**. Like everything in Python, all of them are objects.
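
The microwave analogy can be sketched in code. The `Microwave` class below is hypothetical and only illustrates the idea; you do not need to understand the `class` syntax yet. The names `heating_time` and `is_running` play the role of properties, and `start()` and `stop()` are methods.

```python
# Hypothetical example: a microwave modelled as a Python object.
class Microwave:
    def __init__(self):
        self.heating_time = "1 Minute"  # property: a value that belongs to the object
        self.is_running = False         # property: the current state of the object

    def start(self):                    # method: a function that executes an action
        self.is_running = True

    def stop(self):                     # method: another action
        self.is_running = False


my_microwave = Microwave()
my_microwave.heating_time = "30 Seconds"  # set a property
my_microwave.start()                      # call a method

print(my_microwave.heating_time)
print(my_microwave.is_running)

# Built-in objects work the same way: strings, for example, have methods too.
print("hello".upper())
```

Notice the dot (`.`) between the object and its property or method. You will see this notation throughout the course.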

- **Variables** are names that indicate the places in the memory of your computer where data is stored.
They are *case-sensitive*, meaning `firstname`, `Firstname` and `firstName` are all different variables.

There are some rules and conventions for naming variables. Here are some *DOs*:
- Variable names should be descriptive and meaningful. Use `last_name` instead of `ln`.
- Explicit names make code more maintainable.
- Use lower-case letters and underscores to make them more readable.

Here are some *DON'Ts*:
- Variable names can't start with a number.
- Variable names can't contain spaces. If you need multiple words, use an underscore instead: `_`
- Variable names can't be reserved words, such as `True`, `False` or `import`. These words are used by the programming language itself, so Python raises an error if you try to use them as variable names.

Below you can see the full list of reserved keywords:

In [702]:
import keyword
print(keyword.kwlist)

['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']


Otherwise, you can name your variables however you like!
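
If you are unsure whether a name is allowed, you can ask Python itself. The following is a minimal sketch that combines the string method `isidentifier()` with `keyword.iskeyword()` from the standard library; the candidate names are made up for illustration.

```python
import keyword

# Made-up candidate names, some valid and some not
candidate_names = ["last_name", "2nd_place", "first name", "import", "True"]

name_is_valid = {}
for name in candidate_names:
    # isidentifier() checks the spelling rules (no leading digit, no spaces),
    # and keyword.iskeyword() checks whether the name is a reserved word.
    name_is_valid[name] = name.isidentifier() and not keyword.iskeyword(name)

print(name_is_valid)
```

Only `last_name` passes both checks. The other names break one of the *DON'Ts* listed above.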

- **Data types** are the kinds of values that data can have. For example, data can be numbers, text, or booleans (True or False).

In the following cell you can see an example of variables being assigned different data types:

In [703]:
numeric_variable = 123
text_variable = "456"

print("numeric_variable:", numeric_variable)
print("text_variable:", text_variable)

numeric_variable: 123
text_variable: 456


Both data types look the same when printed in the cell, so how can you differentiate them?

- **Functions** are reusable pieces of code that carry out tasks. You have now seen the function `print()` a couple of times. Notice that **functions** have names, like variables, but are followed by a pair of parentheses `()` when you call them.
Sometimes they take in *arguments*, as in the case of `print()`, which takes a string or a variable with the value that you want to display.

The second **built-in** function that you will see is `type()`. Built-in means that this function is part of Python itself and you can run it without having to take it from other modules.

- **Modules**, or **libraries**, are collections of code that you can reuse to create a program.
Sometimes they are readily available, like the modules that you can import from the Python **Standard library**, and sometimes you need to install external libraries to be able to import them, like the libraries you will use for writing Machine Learning code.

**Import** means that you load the code into the memory of your computer, and it is done with the command `import`.
You can import modules that include multiple functions, or import single functions directly.
Importing single functions instead of entire modules keeps your code short and makes it clear which parts of a module you actually use. Note that Python still loads the whole module behind the scenes, so this does not save memory.
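
As a small illustration, here are the two import styles side by side, using the `math` module from the Python Standard Library. Both forms give you access to the same function.

```python
# Import a whole module and reach its functions with the dot notation
import math

print(math.sqrt(16))

# Import a single function directly and call it by its own name
from math import sqrt

print(sqrt(16))
```

Both calls print `4.0`.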

In [704]:
type_of_numeric_variable = type(numeric_variable)
type_of_text_variable = type(text_variable)

print("Type of numeric_variable:", type_of_numeric_variable )
print("Type of text_variable:",type_of_text_variable )

Type of numeric_variable: <class 'int'>
Type of text_variable: <class 'str'>


### 1.8. Data types

The basic data types that we are interested in for this course are:

- Text:	`str`

The data type of text is called **string**, and it is defined by using single or double quotes around text (like `'Text string'` or `"Text string"`). You can also use three double-quotes (`"""`) to define string blocks as seen in the example below.

- Numeric:	`int`, `float`

The data type of numbers can be **integers** (`0`, `27`, `-3`) or **floats** when numbers have a decimal point (`0.0`, `27.5`, `-3.1416`).

- Boolean:	`bool`

There are only two Boolean values: `True` and `False`. We will talk more about **booleans** a bit later as they require a more detailed explanation.

- None:	`NoneType`

Finally, we have the **None** type. You can think of None as a placeholder for missing values.

You can define most data types with a long or a short notation:

In [705]:
string_variable = "Hello World"
string_variable = str("Hello World")
string_variable = """
Hello
World
"""

integer_variable = 20
integer_variable = int(20)

float_variable = 20.5
float_variable = float(20.5)

boolean_variable = True
boolean_variable = bool(1)

none_variable = None

#### Exercise:
1. Copy the code in the cell above into the cell below.
2. Change the name of the variables.
3. Change the values of the variables, but keep the same corresponding data types.
4. Print the data type of each variable using the `type()` function.

In [706]:
string_einhorn = "Hello World"
string_einhorn = str("Hello World")
string_einhorn = """
Hello
World
"""

integer_twenty = 20
integer_twenty = int(20)

float_boat = 20.5
float_boat = float(20.5)

boolean_variable = True
boolean_variable = bool(1)

none_variable = None

string_einhorn_type_abra= type(string_einhorn)
print(string_einhorn_type_abra)
print(string_einhorn)

integer_twenty_type= type(integer_twenty)
print(integer_twenty_type)

boolean_variable_type= type(boolean_variable)
print(boolean_variable_type)

none_variable_type=type(none_variable)
print(none_variable_type)



<class 'str'>

Hello
World

<class 'int'>
<class 'bool'>
<class 'NoneType'>


In addition to the data types above, there are other types that are used to hold multiple values. Think of it like having boxes with numbers, strings or combinations of various other data types. Some examples are:

- Sequences:	`list`, `tuple`, `range`

A **list** is a sequence of objects (strings, numbers, other lists, etc.). For example `mixed_list = [1, -3.1, None, True, "cherry"]`

A **tuple** is like a list, but we can't modify its elements. This property is called immutability, and you will learn later on when it is useful.

A **range** is a sequence of numbers. For example `range(5)` iterates over the sequence of numbers `0, 1, 2, 3, 4`.

- Mappings:	`dict`

A **dictionary** is a collection of **keys** and their associated **values**, all assigned to a single variable. Dictionaries map keys to values in the same way that English-language dictionaries map **words** to their **definitions**. For example, `{"name": "John", "age": 42}` is a dictionary with the keys `name` and `age`, and the corresponding values `John` and `42`.


- Collections:	`set`

A **set** is a collection of unique elements. For example, the set of the elements in the list `[1, 2, 2, 3, 3, 3]` is `{1, 2, 3}`.
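
The short sketch below shows each of these types in action. The variable names are chosen only for illustration, and the `try`/`except` construct is used here just to catch the error that Python raises when you try to modify a tuple.

```python
# A list can be modified after it is created
mixed_list = [1, -3.1, None, True, "cherry"]
mixed_list[0] = 100
first_item = mixed_list[0]

# A tuple cannot be modified: Python raises a TypeError
fruit_tuple = ("apple", "banana", "cherry")
try:
    fruit_tuple[0] = "kiwi"
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# A range produces a sequence of numbers; list() makes them visible
numbers_in_range = list(range(5))

# A dictionary returns the value that belongs to a key
person = {"name": "John", "age": 42}
person_age = person["age"]

# A set keeps only the unique elements
unique_numbers = set([1, 2, 2, 3, 3, 3])

print(first_item)
print(tuple_was_modified)
print(numbers_in_range)
print(person_age)
print(unique_numbers)
```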

These data types can also be defined in multiple ways:

In [707]:
list_variable = ["apple", "banana", "cherry"]
list_variable = list(("apple", "banana", "cherry"))

tuple_variable = ("apple", "banana", "cherry")
tuple_variable = tuple(("apple", "banana", "cherry"))

range_variable = range(5)

dictionary_variable = {"name": "John", "age": 42}
dictionary_variable = dict(name="John", age=42)

set_variable = {"apple", "banana", "cherry"}
set_variable = set(("apple", "banana", "cherry"))

#### Exercise:
1. Copy the code in the cell above into the cell below.
2. Change the name of the variables.
3. Change the values of the variables, but keep the same corresponding data types.
4. Print the data type of each variable using the `type()` function.

In [708]:
list_fruit = ["apple", "banana", "cherry"]
list_fruit = list(("apple", "banana", "cherry"))
list_fruit_type= type(list_fruit)
print(list_fruit_type)

tuple_fruit = ("apple", "banana", "cherry")
tuple_fruit = tuple(("apple", "banana", "cherry"))
tuple_fruit_type = type(tuple_fruit)
print(tuple_fruit_type)

range_row = range(5)

dictionary_person = {"name": "John", "age": 42}
dictionary_person = dict(name="John", age=42)
dictionary_person_type = type(dictionary_person)
print(dictionary_person_type)

set_fruit = {"apple", "banana", "cherry"}
set_fruit = set(("apple", "banana", "cherry"))
set_fruit_type = type(set_fruit)
print(set_fruit_type)

<class 'list'>
<class 'tuple'>
<class 'dict'>
<class 'set'>


Let's come back to Booleans, the values `True` and `False`.
Notice that they start with a capital letter and don't use quotes, so `true` and `"True"` are not Boolean values.
Values of other data types also have a *truthiness*, meaning that Python treats them as `True` or `False` when a Boolean is expected. For example, the numbers `0` and `0.0` are the only numbers considered *falsy*; all other numbers are considered *truthy*. Think of it as absence vs presence, nothing vs something, `0` vs `1`, or an empty object vs a non-empty object.

Other *falsy* values are:
- Empty strings `""`
- `None`
- Empty lists: `[]`
- Empty tuples: `()`
- Empty ranges: `range(0)`
- Empty dictionaries: `{}`
- Empty sets: `set()`
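
As a quick check, the sketch below passes every falsy value from the list above to `bool()`, and does the same with a few non-empty counterparts chosen for illustration. The `[... for ... in ...]` notation simply applies `bool()` to each value in turn.

```python
falsy_values = [0, 0.0, "", None, [], (), range(0), {}, set()]
truthy_values = [1, -1, 0.5, "0", " ", [0], (None,), range(1), {"a": 0}, {0}]

# Apply bool() to every value of each list
falsy_results = [bool(value) for value in falsy_values]
truthy_results = [bool(value) for value in truthy_values]

print(falsy_results)
print(truthy_results)
```

Note that the string `"0"` and the list `[0]` are truthy: they are not empty, even though the value they contain is falsy.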



In [709]:
boolean_variable = False
print(boolean_variable)

boolean_variable = bool(0)
print(boolean_variable)

print(type(boolean_variable))

False
False
<class 'bool'>


#### Exercise:
1. Create 5 variables with different falsy values in the cell below.
2. Print the value of each variable.

In [710]:
a= ""
b= 0
c= []
d= {}
e= None

print (bool(a))
print (bool(b))
print (bool(c))
print (bool(d))
print (bool(e))

a= "3"
b= -1
c= [3]
d= {"name":"Erica", "age": 30}
e= 5

print (bool(a))
print (bool(b))
print (bool(c))
print (d["name"], d["age"])
print (bool(e))





False
False
False
False
False
True
True
True
Erica 30
True


### 1.9. Operators

Now that you know data types, you may wonder what to do with them. All the manipulations that you can apply to values are done through **operators**. Some of the most common operators are:
- Arithmetic: As the name says, they allow you to do arithmetic operations, such as `+` or `-`.
- Assignment: These operators assign values to variables, like the equal sign (`=`).
- Comparison: They compare two values, for example whether they are *equal* (`==`) or *not equal* (`!=`), and return `True` or `False` accordingly.
- Identity: They also compare two values using `is` or `is not`, but are not exactly the same as `==` or `!=`. It is good to know them, but to avoid potential errors, use comparison operators instead.
- Logical: These operators are used to connect multiple logical evaluations. For example, if you want to evaluate if a fruit is both round and orange, you use the operator `and`.
- Membership: They check if a value or a sequence of objects is present in another object. For example, `"ap" in "apple"` returns `True` as the characters `"ap"` are present in the string of characters `"apple"`.

Let's go one by one with some examples.

#### Arithmetic

- `+`: Addition (`x + y`)
- `-`: Subtraction (`x - y`)
- `*`: Multiplication (`x * y`)
- `/`: Division (`x / y`)
- `%`: Modulus, returns the remainder of a division (`x % y`). For example, if you divide 3 cookies between 3 children the remainder is 0 (`3 % 3`), but if you divide them between 2 children the remainder is 1 (`3 % 2`). This operation helps, for example, when you want to know if a number is even or odd.
- `//`: Floor division, rounds the result of the division down to the nearest whole number (`x // y`)
- `**`: Exponentiation (`x ** y`)
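
Floor division and modulus are easiest to understand together. Here is a small sketch based on the cookie example; the numbers are chosen only for illustration.

```python
cookies = 7
children = 2

cookies_per_child = cookies // children  # floor division: whole cookies per child
cookies_left_over = cookies % children   # modulus: cookies that remain

print(cookies_per_child, cookies_left_over)

# The two results always fit back together into the original number
print(cookies_per_child * children + cookies_left_over == cookies)

# A number is even when the remainder of dividing it by 2 is zero
number = 10
number_is_even = number % 2 == 0
print(number_is_even)

# With negative numbers, floor division still rounds *down*, not towards zero
negative_floor = -7 // 2
negative_remainder = -7 % 2
print(negative_floor, negative_remainder)
```

The last line prints `-4 1`, because `-3.5` rounded down is `-4`, and `-4 * 2 + 1` gives back `-7`.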

Some examples are:

In [711]:
x = 5
y = 2

print("x +  y = ", x +  y)
print("x -  y = ", x -  y)
print("x *  y = ", x *  y)
print("x /  y = ", x /  y)
print("x %  y = ", x %  y)
print("x // y = ", x // y)
print("x ** y = ", x ** y)

x +  y =  7
x -  y =  3
x *  y =  10
x /  y =  2.5
x %  y =  1
x // y =  2
x ** y =  25


In [712]:
x = 4
y = 3

print("x +  y = ", x +  y)
print("x -  y = ", x -  y)
print("x *  y = ", x *  y)
print("x /  y = ", x /  y)
print("x %  y = ", x %  y)
print("x // y = ", x // y)
print("x ** y = ", x ** y)

x +  y =  7
x -  y =  1
x *  y =  12
x /  y =  1.3333333333333333
x %  y =  1
x // y =  1
x ** y =  64


#### Exercise:
1. Copy the code in the cell above into the cell below.
2. Change the values of the variables.
3. Print new results.

- **Going further**: Let’s explore the operators and try some new things out. Add a cell below and find out which operations can also be used with two strings, and which can be used with one string and one number.

In [713]:
x = "hello"
y = "h"

print("x +  y = ", x +  y)
#print("x -  y = ", x -  y)
#print("x *  y = ", x *  y)
#print("x /  y = ", x /  y)
#print("x %  y = ", x %  y)
#print("x // y = ", x // y)
#print("x ** y = ", x ** y)

x = "hello"
y = 4

#print("x +  y = ", x +  y)
#print("x -  y = ", x -  y)
print("x *  y = ", x *  y)
#print("x /  y = ", x /  y)
#print("x %  y = ", x %  y)
#print("x // y = ", x // y)
#print("x ** y = ", x ** y)

x +  y =  helloh
x *  y =  hellohellohellohello


#### Assignment

- `=`: `x = 10`
- `+=`: `x += 5` is the same as `x = x + 5`
- `-=`: `x -= 5` is the same as `x = x - 5`
- `*=`: `x *= 5` is the same as `x = x * 5`
- `/=`: `x /= 5` is the same as `x = x / 5`

Some examples are:

In [714]:
x = 10
x += 5
print("x += 5 = ", x)

x = 10
x = x + 5
print("x = x + 5 = ", x)

x += 5 =  15
x = x + 5 =  15


#### Exercise:
1. Copy the code in the cell above into the cell below.
2. Perform the same two computations for each of the assignment operators (`+=`, `-=`, `*=`, `/=`).
3. Print new results.

In [715]:
x = 1
x += 5
print("x += 5 = ", x)

x = 2
x -= 5
print("x -= 5 = ", x)

x = 3
x *= 6
print("x *= 6 = ", x)

x = 2
x /= 5
print("x /= 5 = ", x)


x += 5 =  6
x -= 5 =  -3
x *= 6 =  18
x /= 5 =  0.4


#### Comparison

- `==`: Equal (`x == y`)
- `!=`: Not equal (`x != y`)
- `>`: Greater than (`x > y`)
- `<`: Less than (`x < y`)
- `>=`: Greater than or equal to (`x >= y`)
- `<=`: Less than or equal to (`x <= y`)

Some examples are:

In [716]:
x = 5
y = 2

print("x == y = ", x == y)
print("x != y = ", x != y)
print("x >  y = ", x >  y)
print("x <  y = ", x <  y)
print("x >= y = ", x >= y)
print("x <= y = ", x <= y)

x == y =  False
x != y =  True
x >  y =  True
x <  y =  False
x >= y =  True
x <= y =  False


#### Exercise:
1. Copy the code in the cell above into the cell below.
2. Change the values of the variables.
3. Print new results.

- **Going further**: Add a cell below and find out which operations can also be used with two strings.

In [717]:
x = 3
y = 4

print("x == y = ", x == y)
print("x != y = ", x != y)
print("x >  y = ", x >  y)
print("x <  y = ", x <  y)
print("x >= y = ", x >= y)
print("x <= y = ", x <= y)

x = "hello"
y = 4

print("x == y = ", x == y)
print("x != y = ", x != y)
#print("x >  y = ", x >  y)
#print("x <  y = ", x <  y)
#print("x >= y = ", x >= y)
#print("x <= y = ", x <= y)

x = "hello"
y = "h"

print("x == y = ", x == y)
print("x != y = ", x != y)
print("x >  y = ", x >  y)
print("x <  y = ", x <  y)
print("x >= y = ", x >= y)
print("x <= y = ", x <= y)

x == y =  False
x != y =  True
x >  y =  False
x <  y =  True
x >= y =  False
x <= y =  True
x == y =  False
x != y =  True
x == y =  False
x != y =  True
x >  y =  True
x <  y =  False
x >= y =  True
x <= y =  False


#### Identity

- `is`: Returns `True` if both variables are the same object (`x is y`)
- `is not`: Returns `True` if both variables are not the same object (`x is not y`)

Some examples are:

In [718]:
a = [1,2]
b = [1,2]
c=a
print(a is b)
print(c is a)

False
True


In [719]:
x = 5
y = 2

print("x is y = ", x is y)
print("x is not y = ", x is not y)

x is y =  False
x is not y =  True


#### Exercise:
1. Copy the code in the cell above into the cell below.
2. Change the values of the variables to
  - Task 1: `x = 5`, `y = 5` and print the results.
  - Task 2: `x = 5`, `y = 5.0` and print the results.
3. Tasks 3 and 4: For a little experiment, let's make a small change to our original code from step 1:
  - Instead of `x is y`, use `x == y`
  - Instead of `x is not y`, use `x != y`
  
Once you've done that, repeat step 2.

You will see different results when you compare side-by-side the four tasks.
The reason behind this peculiar situation is that the `==` operator compares the value of two objects, whereas the `is` operator checks whether two variables point to the same object in memory.
It is good to know that identity operators exist, but they can cause bugs if we don't use them carefully. The moral of the lesson: as a beginner, it is better to use *comparison* operators and avoid *identity* operators.
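
The difference between the two kinds of operators is easiest to see with lists. The sketch below uses illustrative variable names: two lists with equal values are still two different objects, while assigning a list to a second name does not create a copy.

```python
first_list = [1, 2]
second_list = [1, 2]      # same values, but a separate object
same_list = first_list    # a second name for the *same* object

values_are_equal = first_list == second_list     # compares the values
objects_are_same = first_list is second_list     # compares the identity
names_share_object = same_list is first_list

print(values_are_equal, objects_are_same, names_share_object)

# Because both names point to one object, a change is visible through both
same_list.append(3)
print(first_list)

# == also treats an integer and a float with the same value as equal
print(5 == 5.0)

# One place where you will commonly see `is` is the check for None
nothing = None
print(nothing is None)
```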

In [720]:
x = 5
y = 5

print("x is y = ", x is y)
print("x is not y = ", x is not y)

x = 5
y = 5.0

print("x is y = ", x is y)
print("x is not y = ", x is not y)

x = 5
y = 2

print("x == y = ", x == y)
print("x != y = ", x != y)

x is y =  True
x is not y =  False
x is y =  False
x is not y =  True
x == y =  False
x != y =  True


#### Logical

- `and`: Returns `True` if both statements are true (`x > 5 and x < 10`)
- `or`: Returns `True` if one of the statements is true (`x < 5 or x > 10`)
- `not`: Reverses the result. Returns `False` if the result is true (`not(x < 5 or x > 10)`)
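
To see every possible combination at once, the following sketch builds a small truth table for `and`, `or` and `not`. The loops and the name `truth_table` are only used for illustration. The last lines show that Python also accepts a chained comparison as a shorter way of writing a range check with `and`.

```python
truth_table = []
for left in (True, False):
    for right in (True, False):
        # Each row holds: left, right, left and right, left or right, not left
        truth_table.append((left, right, left and right, left or right, not left))

for row in truth_table:
    print(row)

# A chained comparison gives the same result as connecting two comparisons with `and`
a = 5
x = 8
b = 10
chained_result = a < x < b
explicit_result = x > a and x < b
print(chained_result, explicit_result)
```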

Some examples are:

In [721]:
a = 5
x = 8
b = 10

print("    x > a and x < b  = ",     x > a and x < b)
print("    x < a or  x > b  = ",     x < a or  x > b)
print("not(x < a or  x > b) = ", not(x < a or  x > b))

    x > a and x < b  =  True
    x < a or  x > b  =  False
not(x < a or  x > b) =  True


#### Exercise:
1. Copy the code in the cell above into the cell below.
2. Change the values of the variables.
3. Print new results.

In [722]:
a = 1
x = 2
b = 3

print("    x > a and x < b  = ",     x > a and x < b)
print("    x < a or  x > b  = ",     x < a or  x > b)
print("not(x < a or  x > b) = ", not(x < a or  x > b))

    x > a and x < b  =  True
    x < a or  x > b  =  False
not(x < a or  x > b) =  True


#### Membership

- `in`: Returns `True` if a sequence with the specified value is present in another object (`x in y`)
- `not in`: Returns `True` if a sequence with the specified value is not present in another object (`x not in y`)
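
Membership works with several data types, not only with lists. The sketch below uses made-up values to show a few cases. Note that for dictionaries the `in` operator looks at the keys, not at the values.

```python
# Strings: the characters must appear together and in the same order
ap_in_apple = "ap" in "apple"
pa_in_apple = "pa" in "apple"
print(ap_in_apple, pa_in_apple)

# Lists: the value must be one of the elements
five_in_list = 5 in [-10, -5, 0, 5, 10]
print(five_in_list)

# Dictionaries: `in` checks the keys
person = {"name": "John", "age": 42}
name_is_key = "name" in person
john_is_key = "John" in person
john_is_value = "John" in person.values()
print(name_is_key, john_is_key, john_is_value)
```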

Some examples are:

In [723]:
x = 5
y = [-10, -5, 0, 5, 10]

print(x in y)
print(x not in y)

True
False


#### Exercise:
1. Copy the code in the cell above into the cell below.
2. Change the values of the variables.
3. Print new results.

- **Going further**: Add a cell below and repeat the operations with strings.

In [724]:
x = 2
y = [-4, -2, 1, 3, 7]

print(x in y)
print(x not in y)

False
True


This concludes the first milestone! Great job reaching this point. In the following notebook, you will learn about creating your own functions and flows. 😊



## 2. Flow and Functions

When you write with pen on paper in your notebook, your words flow through consecutive lines. Similarly, in Jupyter Notebooks, the code is written in consecutive cells, and sometimes the code in one cell depends on the code executed in a cell above. That’s why, every time you open a notebook, you need to run every cell in the order in which they appear. Once you run a cell, you can go back to it, modify the code and run it again.

This brings us to another critical aspect of Jupyter notebooks. If you use variables with the same name in different cells, the variable will always have the last value that you assigned to it. For example, you may have a notebook with the code `price = 100; discount = 10` in cell 1, the code `final_price = price - discount` in cell 2, and the code `price = "expensive"` in cell 10. If you execute cell 1 and cell 2, you will get a correct result, but if you then execute cell 10 and then cell 2 again, you will get an error because you are telling Python to subtract a number from a string. It would be even more dangerous if the code in cell 10 were `price = 200` instead of `price = "expensive"`. Because the value is a number in this case, you wouldn’t get an error but a wrong result, most probably without noticing!

Here are some tips to avoid these problems when you create your own notebooks:
- Run all cells from top to bottom; don’t start running cells in the middle.
- Don’t give different variables the same name.
- Only re-run a cell when you are sure that it won’t assign a variable a value that causes errors, or wrong results, in cells that you run afterward.
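
The price example can be replayed in a single cell. In the sketch below, the comments mark which notebook cell each piece of code stands for; the `try`/`except` construct is only there to catch the error so that the rest of the code keeps running.

```python
# --- "Cell 1" ---
price = 100
discount = 10

# --- "Cell 2" ---
final_price = price - discount
first_result = final_price
print(first_result)

# --- "Cell 10", numeric version ---
price = 200

# --- "Cell 2", run again: no error, but a different result ---
final_price = price - discount
second_result = final_price
print(second_result)

# --- "Cell 10", text version ---
price = "expensive"

# --- "Cell 2", run again: Python cannot subtract a number from a string ---
try:
    final_price = price - discount
    error_message = None
except TypeError as error:
    error_message = str(error)
print(error_message)
```

The same line of code in "Cell 2" produces `90`, then `190`, and finally an error, depending only on which cells were run before it.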


### 2.1. Python syntax

The set of rules by which we arrange words in a sentence, whether in English or German, is called syntax. Programming languages are no different, and each has its own. Let’s check out the syntax of Python and learn the set of rules by which a Python program will be written.

#### Comments

First, let's talk about **comments**. They are text that is only meant to be read by people, and not for Python to interpret. Using comments can be very useful to clarify what you are doing. The rule is to always put the hashtag symbol (`#`) before the comment text. This tells Python to ignore everything written on that line after the `#`.

Comments can also be used to temporarily stop code from being executed, which comes in handy when developing software. You can place comments in many places: at the beginning of a line of code, inside a code block, or after a code instruction.

Use comments to add information that helps other people -and yourself- to understand your code, or to temporarily ignore code when you are prototyping.
Importantly, don't use comments to describe the operations that are already being shown by the code itself, but use them to describe higher-level information, such as the overall intention of an approach to solve a problem, or some precautions other people should take if they modify the code.
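
The following sketch contrasts these uses. The prices and the summer sale are made up for illustration.

```python
price_in_cents = 1999

# Helpful comment: it explains *why* the code does something.
# The discount only applies during the (hypothetical) summer sale.
price_in_cents = price_in_cents - 500

# Redundant comment: it only repeats what the code already says.
price_in_cents = price_in_cents + 0  # add 0 to price_in_cents

# The next line is "commented out", so Python ignores it:
# price_in_cents = 0

print(price_in_cents)  # a comment can also follow a code instruction
```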

In [725]:
# I am a comment

##########
# ME TOO #
##########

**Key insight:** Always keep in mind that programming is a collaborative activity, meaning that your code is meant to be understood by others, and by yourself when you haven't read it for a while.
Tackling a problem with code is only part of the solution; another crucial aspect is producing clean code that can be easily understood and maintained by other people.

### 2.2. Conditional statements

As we go through our day, we make all sorts of decisions, such as "if the weather is nice, I'll put my clothes outside to dry, else I'll use the drying machine".
When we want to control the flow of code, we use conditional statements to tell Python to make this sort of decision.

#### Indentation

The next language aspect we’ll consider is **indentation**. It refers to the spaces at the beginning of some lines of code.

In other programming languages, indentation only helps with readability, but in Python indentation indicates a block of code, also called a scope. Blocks are necessary when you want a sequence of operations to be executed together, for example only when a condition is met. Don’t worry, you will see some examples in a moment.

To indicate a block of code, you have to add the same number of indentation spaces before each line of the block; otherwise, Python will give you an error. The number of spaces is up to you, but by convention most people add **four spaces** before each line of a code block. You'll see examples below.

For example, in the following code, the first `print()` is only executed if the comparison operator evaluates to `True`. Otherwise, only the second `print()` is executed.

In [726]:
###################
# Syntax Exercise #
###################

global_variable = 10

# The following is called "IF statement":
if global_variable > 5:
    print("Print this if global_variable is greater than 5")
    # <--- Notice the four spaces before the hash

print("Print this always")

Print this if global_variable is greater than 5
Print this always


In addition to the white spaces defining the code block, the code above also has a couple of new things.
The first is the hashtag symbol `#` used to write **comments**, and the second is the `if` statement used to **conditionally** execute code. Let's explain them in more detail.

#### Exercise:
1. Copy the code of the Syntax Exercise into the cell below.
2. Remove the existing comments, and add new comments explaining the idea behind the operation.
3. Run the cell to print the results.

In [727]:

global_variable = 6

# This is an if-condition that prints the message in case the condition is true
if global_variable > 5:
    print("Print this if global_variable is greater than 5")
    # in case it is not true it will not be printed

print("Print this always")

Print this if global_variable is greater than 5
Print this always


#### The IF statement

Conditional flow statements check for the logical evaluation of one or more comparison operations, and then decide whether to execute a code block or not.
At the end of flow statements, such as the **if** statement, you use a colon (`:`), and the following line must be indented to indicate the code block.
So, in the example of the Syntax Exercise, the *if* statement has only a print function and a comment inside its corresponding code block.

```
global_variable = 10

if global_variable > 5:
    print("Print this if global_variable is greater than 5")
    # <--- Notice the four spaces before the hash

print("Print this always")
```

#### Exercise:
1. Copy the code of the Syntax Exercise into the cell below.
2. Change from 4 to 2 the number of spaces defining the code block inside the *if* statement.
3. Change the if statement to print "Please insert more coins to get a drink." if the `global_variable` is less than or equal to 2.5.
4. Run the cell to print the results.

- **Going further**: Let's learn what error messages look like. Remove all the spaces inside the code block and read the error message. Is the message clearly showing what is wrong in the code and how to fix it? Add the spaces back and re-run the cell to remove the error message.

In [728]:
global_variable = 10

if global_variable <= 2.5:
  print("Please insert more coins to get a drink.")
  # <--- Notice the two spaces before the hash

print("Print this always")

Print this always


#### Conditional statements and logical operators

Other than the comparison operators, such as `==`, other common elements used together with conditional flow statements are the logical operators, such as `and` and `or`. Take a look at the following example:

In [729]:
first_name = "John"
last_name = "Smith"
age = 20
grade = 8
driving_licence_approved = False

MIN_AGE = 21
MIN_GRADE = 6  # Grades go from 0 to 10

if age >= MIN_AGE:
    if grade >= MIN_GRADE:
        driving_licence_approved = True

# You can also replace the nested IF statements
# with the AND logical operator to simplify your code:
if age >= MIN_AGE and grade >= MIN_GRADE:
    driving_licence_approved = True

print(f"License approved: {driving_licence_approved}")

License approved: False


Indeed there are a few new things in this code. Let’s check them together.
Firstly, you may have noticed that some variables are written in UPPER-case letters. This indicates that they are **constants**, or values that almost never change.
This is only a naming convention among developers; to Python, it’s just a variable.

Next, we have *if* statements placed inside other *if* statements.
Remember? That’s when a code line has a four-space indentation compared to the previous one.
These are called **nested** *if* statements.
When you nest statements, it's critical to add four spaces for each additional *if* to define its scope.
If you write a line of code with four spaces less than the innermost *if*, this indicates to Python that this line belongs to the previous *if* statement. For example, both versions in the following code print the same messages whenever their conditions are met, only in a different order:

```
# Version 1
if height > 100:
    print("It's tall")
    if width > 100:
        print("It's wide")
        if depth > 100:
            print("It's deep")

# Version 2
if height > 100:
    if width > 100:
        if depth > 100:
            print("It's deep")
        print("It's wide")
    print("It's tall")
```
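
To see exactly what each version prints, the sketch below runs both with the same example values and collects the messages in lists instead of printing them directly. The values and the list names are assumptions made for illustration.

```python
height = 150
width = 150
depth = 50

# Version 1: each message comes before the next nested check
version_1_messages = []
if height > 100:
    version_1_messages.append("It's tall")
    if width > 100:
        version_1_messages.append("It's wide")
        if depth > 100:
            version_1_messages.append("It's deep")

# Version 2: each message comes after the nested check
version_2_messages = []
if height > 100:
    if width > 100:
        if depth > 100:
            version_2_messages.append("It's deep")
        version_2_messages.append("It's wide")
    version_2_messages.append("It's tall")

print(version_1_messages)
print(version_2_messages)
```

With these values, Version 1 gives `["It's tall", "It's wide"]` and Version 2 gives `["It's wide", "It's tall"]`. The indentation level of each line decides which `if` statement it belongs to.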

Lastly, we added a new way of editing the text that we want to print: `print(f"License approved: {driving_licence_approved}")`.
This is called **string formatting** and you will gradually learn more about it.
What is important for now, is that you know that you can insert the value of variables inside strings of text.
One way of doing this is by adding the letter `f` before the string quotes and then by adding a pair of *curly braces* `{}` inside the string where you want to insert the variable value.
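
Here is a short sketch of string formatting with the variables from the driving licence example. The curly braces can hold a variable or a small calculation.

```python
first_name = "John"
age = 20
MIN_AGE = 21

# Insert the values of variables into the text
message = f"{first_name} is {age} years old."
print(message)

# The curly braces can also contain a calculation
years_to_wait = f"Years until {first_name} can drive: {MIN_AGE - age}"
print(years_to_wait)
```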

Now let's add a few more components to our flow exercise. Once again, look out for new things:

In [730]:
# Version 1
height = 101
width = 50
depth = 50
if height > 100:
    print("It's tall")
    if width > 100:
        print("It's wide")
        if depth > 100:
            print("It's deep")

It's tall


In [731]:
height = 101
width = 50
depth = 50
# Version 2
if height > 100:
    if width > 100:
        if depth > 100:
            print("It's deep")
        print("It's wide")
    print("It's tall")

It's tall


In [732]:
first_name = "John"
last_name = "Smith"
age = 20
grade = 8
driving_licence_approved = False

MIN_AGE = 21
MIN_GRADE = 6  # Grades go from 0 to 10

old_enough = age >= MIN_AGE
passed_test = grade >= MIN_GRADE

if old_enough and passed_test:
    driving_licence_approved = True
elif old_enough and not passed_test:
    print("Please try again the driving exam.")
else:  # Execute this block if none of the above blocks is executed
    print("Come back once you have the minimum age for driving.")

print(f"License approved: {driving_licence_approved}")

Come back once you have the minimum age for driving.
License approved: False


Did you spot them? Our new flow components are `elif` and `else`.
You may be thinking “Wait, what? What does elif even mean? 😳”. Let’s look at the example in more detail.

1. When the `if` statement evaluates to `True` and is executed, all the remaining `elif` and `else` elements are ignored. So, `if old_enough and passed_test`, then the licence is approved. When the code inside the `if` statement is not executed, the next flow evaluation is checked.

2. In this case, the next evaluation is the `elif`, which stands for *else if*. Same as before, when the code of an `elif` statement is executed, the remaining elements are ignored. So, `elif old_enough and not passed_test`, then we should ask John to "try again the driving exam".

3. Finally, *if none* of the `if` and `elif` statements are executed, then the `else` statement will *always* be executed. So, `else`, we should ask John to "come back once he has the minimum age for driving".

A chain can have only one `if` statement and at most one `else` statement, but you can add as many `elif` statements as you want in between.
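
To illustrate a chain with more than one `elif`, here is a minimal sketch. The thresholds and labels are invented for this illustration. Notice that only the first block whose condition is true gets executed, even though `grade >= 6` is also true for a grade of 7:

```python
grade = 7  # Hypothetical grade on the 0 to 10 scale

if grade >= 9:
    label = "excellent"
elif grade >= 7:
    label = "good"  # This block runs, and the rest of the chain is skipped
elif grade >= 6:
    label = "pass"
else:
    label = "fail"

print(label)
```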

Another change we made was to **extract** the comparison operations into variables with explicit names. More concretely, we changed this format:

`if age >= MIN_AGE and grade >= MIN_GRADE:`

to this format:

```
old_enough = age >= MIN_AGE
passed_test = grade >= MIN_GRADE

if old_enough and passed_test:
```

The second format has more lines of code, but it is clearer.
Do you see how naming your variables intuitively can make your code read almost like the English language? You are starting to like the Python syntax, aren't you? 😏

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Help John to get his driving license by updating his age.
3. Include one additional `elif` statement that prints "Please include First Name and Last Name" if either the first name `or` the last name are empty strings. Checking that all inputs to your program are correct is called **validation**.

4. Run the cell to print the results.

- **Going further**: Extract the validation of the name strings into two additional variables. The new variables should have meaningful names that make the new `elif` statement easy to read. For example: `elif no_first_name or no_last_name:`

In [733]:
name = ""
print(len(name) == 0)

True


In [734]:
def get_length(my_input):
    result = 0

    for letter in my_input:
        result+=1

    return result

def greet_me():
    print('Hello World for Erica')


###################

apple_length = get_length('apple')
pear_length = get_length('pear')

print(apple_length)
print(pear_length)
greet_me()

5
4
Hello World for Erica


In [735]:
first_name = "John"
last_name = "Smith"
age = 21
grade = 8
driving_licence_approved = False
no_first_name = len(first_name)==0
no_last_name = len(last_name)==0

MIN_AGE = 21
MIN_GRADE = 6  # Grades go from 0 to 10

old_enough = age >= MIN_AGE
passed_test = grade >= MIN_GRADE

if old_enough and passed_test:
    driving_licence_approved = True
elif old_enough and not passed_test:
    print("Please try again the driving exam.")
elif no_first_name or no_last_name:
    print("Please include First Name and Last Name")
else:  # Execute this block if none of the above blocks is executed
    print("Come back once you have the minimum age for driving.")

print(f"License approved: {driving_licence_approved}")

License approved: True


### 2.3. Loops

Sometimes you want to repeat the same operations a number of times before moving on to the rest of the code. This is what loops are for.
Imagine that you have an online shopping cart with 5 items and you want to know the total price. The website could use a loop to iterate over your items and add the price of each one to get the total price.
There are two types of loops that you will be using: `while` loops and `for` loops.
`while` loops are used when you want to execute code as long as a condition is true.
`for` loops are used when you want to execute code for a specific number of times.
Let's see some examples.
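
To make the shopping cart idea concrete, here is a minimal sketch. The prices are invented for this illustration, and it uses a `for` loop, which is explained later in this section:

```python
# Hypothetical prices of the 5 items in the cart
cart_prices = [10.0, 5.5, 3.0, 20.0, 1.5]

total_price = 0
for price in cart_prices:
    total_price += price  # Add each price to the running total

print(total_price)
```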

#### While loops

`while` loops have a similar structure to `if` statements. They start with the keyword `while`, followed by a condition. If the condition is true, the code block below is executed and the condition is then checked again. As soon as the condition is false, the loop stops and the program continues with the instructions after the code block.

In [736]:
counter = 0

while counter != 5:
    counter += 1

print(counter)

5


In [737]:
counter = 0
while counter !=3:
    counter +=1
    print(counter)

print(counter)

1
2
3
3


Be careful not to create **infinite loops**.
Yes, you could create infinite loops that could crash your computer 😱
This would happen in the first example above if you started the counter with the value `6` or higher: the counter would keep growing indefinitely (`6, 7, 8, ..., 1000, ...`), and none of those numbers is ever equal to `5`, so the condition `counter != 5` would always be true.
In this case, you could avoid the problem by changing the comparison from `counter != 5` to `counter < 5`, as this comparison returns `False` whenever the counter is greater than or equal to `5`.
Infinite loops are sometimes used on purpose, but if not used carefully they can run longer than expected and crash your computer once it runs out of memory (`OOM`).
You will see `while` loops from time to time, so it is good that you know them, but we recommend that you use `for` loops whenever you can.

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Change the comparison operator from *not equal* to *less than*.
3. Run the cell to print the results.

In [738]:
counter = 6

while counter < 5:
    counter += 1

print(counter)

6
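
Besides choosing a safer comparison operator, another common safeguard is to limit the number of iterations. This is an addition to the example above rather than part of the exercise, and the names `iterations` and `MAX_ITERATIONS` are hypothetical:

```python
counter = 6           # A start value that will never be equal to 5
iterations = 0
MAX_ITERATIONS = 100  # Hypothetical safety limit

# The loop stops when the counter reaches 5 OR when the limit is reached
while counter != 5 and iterations < MAX_ITERATIONS:
    counter += 1
    iterations += 1

print(counter, iterations)
```

Here the loop ends after 100 iterations, even though `counter` never equals `5`.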


#### For loops

Unlike `while` loops, `for` loops are executed only a pre-defined number of times, so they are *safer*. More concretely, you can use `for` loops to process the elements of **iterable** objects. These objects can be lists, numeric arrays or rows in tables. Let's start with a simple case:

In [739]:
fruits_basket = ["banana", "apple", "orange"]

for fruit in fruits_basket:
    print(fruit)

banana
apple
orange


In the code above, you **iterate** over the **elements** of a list.
In each loop, you assign the value of an element in the list to the **temporary** variable `fruit`.
Notice that the iteration keeps the order in which the elements in the list are defined.
You can see this order more explicitly with the built-in function `enumerate()`.
This function iterates over your list (or any other iterable object) and returns an index and the value of the element with that index. An example will clarify this:

In [740]:
fruits_basket = ["banana", "apple", "orange"]

for index, value in enumerate(fruits_basket):
    print(index, value)

0 banana
1 apple
2 orange


Python has **zero-based indexing**, so every time it counts or *enumerates*, it starts with `0`.
Another way of using for loops is with ranges. For example, `range(3)` will return three numbers, 0, 1 and 2.

In [741]:
for number in range(3):
    print(number)

0
1
2
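
`range()` also accepts a start value and a step size. Here is a short sketch of the three ways of calling it, where each range is wrapped in `list()` only so that its numbers can be printed together:

```python
# range(stop): counts from 0 up to, but not including, stop
first_three = list(range(3))
print(first_three)   # [0, 1, 2]

# range(start, stop): counts from start up to, but not including, stop
from_two = list(range(2, 5))
print(from_two)      # [2, 3, 4]

# range(start, stop, step): counts in steps of the given size
every_third = list(range(0, 10, 3))
print(every_third)   # [0, 3, 6, 9]
```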


Additionally, you can interrupt loops with the statements `break` and `continue`, which are usually placed inside *if* statements within the loop.
`break` stops the loop completely, and `continue` immediately starts the next iteration, skipping any remaining lines in the code block.
You can observe both behaviors in the following example.

In [742]:
for number in range(10):
    even_number = (number % 2) == 0
    if even_number:
        continue
    elif number > 5:
        break
    else:
        print(number)

1
3
5


Let's unpack the code above. First, `even_number` evaluates to `True` when `number` is an even number. We define the check for an even number as `(number % 2) == 0`, meaning that the number divided by 2 has a remainder of 0. In this case, the parentheses are not a function call, as there is no name written before them; they only group the operation, which makes the code more explicit and easier for people to read.

**Remember:** Writing shorter code makes it more elegant, but clear communication is more important than elegance.
Remember our golden rule: always aim to write code that is easy to read by other people.

Next, the *if* statement checks whether the number is even, and skips the rest of the code block if this is true.
However, it doesn't stop the *for* loop, but only jumps to the following iteration.
On the other hand, the *for* loop stops when the `elif` evaluates to `True`, as its code block has the `break` instruction.
Finally, if none of the previous evaluations is true, the `else` code block is executed.
So, you can describe the code above in plain English as follows:
- Going through the integers from 0 to 9, print the odd ones and stop as soon as you reach an odd integer greater than 5 😊

#### Exercise:
1. In the cell below, write a for loop that:
  - Prints the even integers between -3 and 3.
  - Prints "Odd" instead of the odd numbers.
2. Run the cell to see the results.

In [743]:
counter = -3

while counter <3:
    if counter %2==0:
        print (counter)
    else:
        print ("Odd")
    counter +=1


Odd
-2
Odd
0
Odd
2


### 2.4. Functions

You have already used some functions, like `print()` or `range()`, but what about creating your own? In the following example, we have a function that can take one or two numbers. When it takes one number, it multiplies it by 2. When it takes two numbers, it multiplies them together. In both cases, it returns the result together with the multiplier that was used. Below the function definition, the function is called twice, with different values each time, and then the results are printed. Analyze the differences in the printed results before you continue.

In [744]:
def multiply(number, multiplier=2):
    result = number * multiplier
    return result, multiplier

result, multiplier = multiply(3)
print(f"Result: {result}, Multiplier: {multiplier}")

result, multiplier = multiply(3, 4)
print(f"Result: {result}, Multiplier: {multiplier}")

Result: 6, Multiplier: 2
Result: 12, Multiplier: 4


Here you have a recipe to create your own functions:

1. The keyword that tells Python that you are creating, or **defining**, a function is `def`.
2. Then you include an empty space followed by the name that you want to give to your function (follow the same rules we learned for naming variables), and then a pair of parentheses `()`. In our example, these elements are `def multiply()`.
3. The parentheses can be empty, or they can contain one or multiple **arguments** if the function needs external information to do its job. For example `def multiply(number)`.
4. Furthermore, you can assign **default** values to the arguments, which will be used unless you override them when you call the function. For example `def multiply(multiplier=2)`.
5. Then, like in conditionals and loops, conclude the line by adding a colon (`:`).
6. Below, add four spaces before each line to indicate the function scope, and write down the computations that you want the function to perform. In our example, these elements are:
```
def multiply(number, multiplier=2):
    result = number * multiplier
```
7. Finally, you can return one or multiple values with the keyword `return`, but this is optional. If you want to return multiple values, write them after the keyword `return`, separating each value with a comma. The last line of the function in our example, `return result, multiplier`, returns a tuple with two values, `result` and `multiplier`.

As a side note, strictly speaking the variables that you list when you define a function are called **parameters** (`def multiply(number, multiplier):`), and the values that you pass when you call it are called **arguments** (`multiply(3, 4)`). In practice, both terms are often used interchangeably, as we do in this course.
Now you can **call** your function in the same way we have called other functions like `range()`, and store its result in a variable.
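
As a minimal sketch of calling a function, the example below repeats the `multiply` function from above and calls it in two ways. Passing values by name (keyword arguments) is an addition to the recipe, and the variable names `by_position` and `by_keyword` are made up for this illustration:

```python
def multiply(number, multiplier=2):
    result = number * multiplier
    return result, multiplier

# Positional call: values are matched to the names by their order
by_position = multiply(3, 4)

# Keyword call: values are matched by name, so the intent is explicit
by_keyword = multiply(number=3, multiplier=4)

# The returned tuple can be kept whole or unpacked into two variables
result, multiplier = by_keyword

print(by_position, by_keyword, result, multiplier)
```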

#### Exercise:
1. Write a function that:
  - Takes three numbers as arguments.
  - Adds the first two numbers and assigns this value to an intermediate variable.
  - Then multiplies the intermediate variable by the third argument, and assigns this value to a final variable.
  - Finally, returns both the intermediate and the final variables.
2. Call the function and assign the returned values to two variables.
3. Print the value of both variables.
4. Run the cell to see the results.

In [745]:
def abrakadabra(number_one, number_two, number_three):
    temp = number_one + number_two
    last_one = temp * number_three

    return temp, last_one


result_one, result_two = abrakadabra(1, 2, 3)
print(result_one)
print(result_two)




3
9



## 3. Data Structures and Handling Errors

### 3.1. Methods for data types

You may remember that in Notebook 1, we’ve already seen the basic data types string (`str`) for text, integer (`int`) and float (`float`) for numbers, and boolean (`bool`) for logical values. Strings are a particularly flexible data type that can be converted into other types, so the next step is to learn in more detail the **methods** that we can use with them. If you need a refresher, you can find an introduction to methods in our first notebook.

#### Strings

Let’s revisit our first data type: strings. There are many operations that you can do with them, so we’ll start by introducing some of the most common methods. First, we’ll check out methods that don’t require arguments, then those that do, and finally methods that involve lists. Let’s go!

#### String methods that don’t require arguments

We’ll start by looking at the string methods that don’t need arguments, meaning that no extra information is needed except the string itself. These methods are:

- `.lower()`: Converts a string into lower case.
- `.upper()`: Converts a string into upper case.
- `.title()`: Converts the first character of each word to upper case.
- `.strip()`: Trims whitespace at both ends of a string.

Below is an example that shows how some of these methods work:

In [746]:
string_variable = "   Apples and oranges   "

print(f"string_variable:         {string_variable}")
print(f"string_variable.lower(): {string_variable.lower()}")
print(f"string_variable.upper(): {string_variable.upper()}")
print(f"string_variable.strip(): {string_variable.strip()}")

print(f"string_variable.strip().upper(): {string_variable.strip().upper()}")

string_variable:            Apples and oranges   
string_variable.lower():    apples and oranges   
string_variable.upper():    APPLES AND ORANGES   
string_variable.strip(): Apples and oranges
string_variable.strip().upper(): APPLES AND ORANGES


In [747]:
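name = "Erica"
# Concatenation with + inserts the value of the variable
print('Hello ' + name + " how are u?")
# Without the f before the quotes, the curly braces are printed literally
print('Hello {name} how are u?')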


Hello Erica how are u?
Hello {name} how are u?


Tip: you can chain methods like this: `string_variable.strip().upper()`, just as we did for the last code line in the example above. 😉

Quite straightforward, isn’t it? Now, have a go at it yourself in the exercise below.

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Similarly to the way we are testing the method `.upper()`, add a line to test the method `.title()`.
3. Run the cell to print the results.

In [748]:
string_variable = "   Apples and oranges   "

print(f"string_variable:         {string_variable}")
print(f"string_variable.lower(): {string_variable.lower()}")
print(f"string_variable.upper(): {string_variable.upper()}")
print(f"string_variable.title(): {string_variable.title()}")
print(f"string_variable.strip(): {string_variable.strip()}")

print(f"string_variable.strip().upper(): {string_variable.strip().upper()}")

string_variable:            Apples and oranges   
string_variable.lower():    apples and oranges   
string_variable.upper():    APPLES AND ORANGES   
string_variable.title():    Apples And Oranges   
string_variable.strip(): Apples and oranges
string_variable.strip().upper(): APPLES AND ORANGES


#### String methods that require arguments

Now let’s take a look at string methods that require arguments, meaning external information is needed:

- `.count()`: Returns the number of times a specified value occurs in a string.
- `.index()`: Searches the string for a specified value and returns the index of the **first** position where it was found.
- `.replace()`: Returns a string where a specified value is replaced with another specified value.
- `.zfill()`: Fills the string with zeroes on the left to reach the specified total length. This is useful to standardize file names. For example, sometimes it helps to rename a list of files from (`image_9.png`, `image_10.png`, ..., `image_123.png`), to (`image_009.png`, `image_010.png`, ..., `image_123.png`).
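
The file-renaming idea can be sketched as follows. The file numbers are invented for this illustration, `str()` casts each integer into a string, and `.append()` (covered later in this notebook) collects the new names in a list:

```python
# Hypothetical file numbers, used only for this illustration
file_numbers = [9, 10, 123]

file_names = []
for number in file_numbers:
    # str() casts the integer into a string so that .zfill() can be used
    padded_number = str(number).zfill(3)
    file_names.append(f"image_{padded_number}.png")

print(file_names)
```

With the same length for every number, the file names also sort correctly in alphabetical order.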

Remember that Python is case-sensitive, so all these methods treat `a` and `A` differently.
You can see this in the example below:

In [749]:
characters_string = "Apples and oranges"

print(f"\n characters_string: \n {characters_string}")
print(f"\n characters_string.count('a'): \n {characters_string.count('a')}")
print(f"\n characters_string.index('a'): \n {characters_string.index('a')}")
print(f"\n characters_string.replace('a', 'A'): \n {characters_string.replace('a', 'A')}")

numbers_string = "123"

print(f"\n numbers_string: \n {numbers_string}")
print(f'\n numbers_string.zfill(5): \n {numbers_string.zfill(5)}')


 characters_string: 
 Apples and oranges

 characters_string.count('a'): 
 2

 characters_string.index('a'): 
 7

 characters_string.replace('a', 'A'): 
 Apples And orAnges

 numbers_string: 
 123

 numbers_string.zfill(5): 
 00123


#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace the value of `characters_string` with a sentence that includes the name of a number multiple times. For example `five`.
3. Replace the parameter `'a'` with the number name that appears multiple times in your sentence.
4. Replace the parameter `'A'` with the digit version of the number name that you used in point 2. For example `5`.
5. Replace the value of `numbers_string` with three letters and use `.zfill()` to add three zeros on their left.
6. Run the cell to print the results.

In [750]:
characters_string = "five"*10

print(f"\n characters_string: \n {characters_string}")
print(f"\n characters_string.count('e'): \n {characters_string.count('e')}")
print(f"\n characters_string.index('e'): \n {characters_string.index('e')}")
print(f"\n characters_string.replace('e', '5'): \n {characters_string.replace('e', '5')}")

numbers_string = "abc"

print(f"\n numbers_string: \n {numbers_string}")
print(f'\n numbers_string.zfill(6): \n {numbers_string.zfill(6)}')


 characters_string: 
 fivefivefivefivefivefivefivefivefivefive

 characters_string.count('e'): 
 10

 characters_string.index('e'): 
 3

 characters_string.replace('e', '5'): 
 fiv5fiv5fiv5fiv5fiv5fiv5fiv5fiv5fiv5fiv5

 numbers_string: 
 abc

 numbers_string.zfill(6): 
 000abc


#### String methods that involve *lists*

Lastly, there are a couple of methods for strings that either produce a list of strings or require a list of strings as input. These are:

- `.split()`: Splits the string at the specified separator, and returns a list. By default, the string is split at whitespace.
- `.join()`: Uses the string object as the separator to join the elements of the iterable passed as input.

Let's see an example of each method.

In [751]:
print(f'\n"A basket with mangos".split(): \n{"A basket with mangos".split()}')
print(f'\n" ".join(["A", "basket", "with", "mangos"]): \n{" ".join(["A", "basket", "with", "mangos"])}')


"A basket with mangos".split(): 
['A', 'basket', 'with', 'mangos']

" ".join(["A", "basket", "with", "mangos"]): 
A basket with mangos
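
One detail worth knowing is that calling `.split()` without an argument is not exactly the same as calling `.split(" ")`. A minimal sketch with a made-up sentence that contains repeated spaces:

```python
text = "A  basket   with mangos"  # Note the repeated spaces

# Without an argument, any run of whitespace counts as one separator
default_split = text.split()

# With " " as the separator, every single space splits the string,
# which produces empty strings between consecutive spaces
space_split = text.split(" ")

print(default_split)
print(space_split)
```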


#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace `"A basket with mangos"` with a comma-separated list of fruits and split it on the commas instead of the spaces.
3. Join again the new list of fruits, but with underscores (`_`) between each word.
4. Run the cell to print the results.

In [752]:
print(f'\n"Mango,Apple,Kiwi".split(","): {"Mango,Apple,Kiwi".split(",")}')
print(f'\n"_".join(["Mango", "Apple", "Kiwi"]): {"_".join(["Mango", "Apple", "Kiwi"])}')


"Mango,Apple,Kiwi".split(","): ['Mango', 'Apple', 'Kiwi']

"_".join(["Mango", "Apple", "Kiwi"]): Mango_Apple_Kiwi


#### Escape characters

You may have noticed that we introduced a new funny-looking element: `\n`.
Sequences that start with a backslash `\` are called **escape characters**, and this one, in particular, is called *new line*. In text documents, `\n` tells the computer that it should break the text line right there and start a new line below. Here are other common escape characters:

- `\'` and `\"`: Single and double quotes. This is helpful when you need to add quotes inside quotes. For example, to print the text *Let's write "quotes"!* you can use the following string to avoid confusing Python: `print("Let's write \"quotes\"!")`.
- `\\`: Backslash. Useful when you actually want to print a backslash.
- `\n`: New line. Creates a new line in the text.
- `\t`: Tab. Moves the following text to the next tab stop, which is usually placed every eight characters. It can help to make your prints prettier when multiple lines have different lengths, like in tables.

In [753]:
print("No tab.")
print("1\t Tab with 1 leading character.")
print("123\t Tab with 3 leading characters.")
print("12345\t Tab with 5 leading characters.")
print("1234567\t Tab with 7 leading characters.")
print("12345678\t Tab with 8 leading characters.")

print("\nPretty table:")
print("| 1\t| 12\t| 123\t|")
print("| 12\t| 123\t| 1234\t|")
print("| 123\t| 1234\t| 12345\t|")

No tab.
1	 Tab with 1 leading character.
123	 Tab with 3 leading characters.
12345	 Tab with 5 leading characters.
1234567	 Tab with 7 leading characters.
12345678	 Tab with 8 leading characters.

Pretty table:
| 1	| 12	| 123	|
| 12	| 123	| 1234	|
| 123	| 1234	| 12345	|
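
The other escape characters from the list above can be tried in the same way. This is a small sketch with made-up strings. Notice that each escape character counts as a single character, even though you type two:

```python
quoted = "Let's write \"quotes\"!"
backslash = "A single backslash: \\"
two_lines = "First line\nSecond line"

print(quoted)
print(backslash)
print(two_lines)

# Each escape character is ONE character for Python
lengths = (len("\n"), len("\\"))
print(lengths)  # (1, 1)
```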


#### Built-in functions for multiple data types

#### len()

Another built-in function that you will be using often is `len()`, as you'll frequently want to know how long a string is or how many elements you have in a list.
It simply returns the number of elements of the object that you pass as an argument, so it works with strings, but also with lists, tuples, and other objects that you will see later on.

In [754]:
list_variable = ["Apples", "and", "oranges"]
print(f"\n Length of: \n {list_variable} \n is: {len(list_variable)}")

string_variable = "Apples and oranges"
print(f"\n Length of: \n {string_variable} \n is: {len(string_variable)}")


 Length of: 
 ['Apples', 'and', 'oranges'] 
 is: 3

 Length of: 
 Apples and oranges 
 is: 18


#### Casting

Data is not just numbers. Data can be words, dates, or even numbers that are stored as text or strings.
Can you see the issue here? You can't perform arithmetic operations on numbers that are stored as text! But no worries, with Python you can solve the problem by **casting** each value into a number.

Let's see how in the example below:

In [755]:
integer_variable = int("-5")
float_variable = float("3.14")
boolean_variable = bool("True")

print(f"Value: {integer_variable}, Type: {type(integer_variable)}")
print(f"Value: {float_variable}, Type: {type(float_variable)}")
print(f"Value: {boolean_variable}, Type: {type(boolean_variable)}")

Value: -5, Type: <class 'int'>
Value: 3.14, Type: <class 'float'>
Value: True, Type: <class 'bool'>
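
A word of caution about `bool()`: it does not read the text inside the string. Any non-empty string is cast to `True`, and only the empty string is cast to `False`. The sketch below shows this, together with one possible workaround. The function name `text_to_bool` is hypothetical, not a Python built-in:

```python
# bool() of ANY non-empty string is True, even if the text says "False"
print(bool("True"))   # True
print(bool("False"))  # True
print(bool(""))       # False, because the string is empty


# One possible way to read a boolean from text: compare the text itself
def text_to_bool(text):
    return text.strip().lower() == "true"


print(text_to_bool("False"))   # False
print(text_to_bool(" TRUE "))  # True
```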


Actually, you can use two arithmetic operators with strings, but they behave differently than with numbers.
You can add, or concatenate, **two strings** with a `+`, meaning that Python places one after the other,
and you can multiply **a string** and **an integer** with a `*`, meaning that `"A" * 3` returns `AAA`.

In [756]:
number_1 = "1"
number_2 = "2"

print(number_1 + number_2)

12


So, to treat our variables as numbers we have to cast them as follows:

In [757]:
number_1 = int("1")
number_2 = int("2")

print(number_1 + number_2)
print(number_1 / number_2)

number_1 = float("1")
number_2 = float("2")

print(number_1 + number_2)
print(number_1 / number_2)

3
0.5
3.0
0.5


Notice that when you perform addition, subtraction or multiplication with integers, the result is an integer, but when you perform division, the result is automatically converted to a float. Otherwise, `1 / 2` would return `0` instead of `0.5`, because integers cannot store the decimal part of a number. Convenient, isn’t it?

#### Exercise:

1. Copy the code of the cell above into the cell below.
2. Cast the result of the four arithmetic operations into integers.
3. Run the cell to print and compare the results with the cell above.

In [758]:
number_1 = int("1")
number_2 = int("2")

result1 = number_1 + number_2
print(result1)


result2 = number_1 / number_2
result2 = int(result2)
print(result2)

number_1 = float("1")
number_2 = float("2")

result3 = number_1 + number_2
result3= int(result3)
print(result3)

result4 = number_1 / number_2
print(int(result4))

3
0
3
0
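
Note that `int()` does not round: it simply drops the decimal part, which is why `0.5` became `0` in the exercise above. Here is a small sketch with arbitrary example values that also checks the type returned by each operator:

```python
print(type(1 + 2))  # <class 'int'>
print(type(1 / 2))  # <class 'float'>

# int() drops the decimal part, for positive and negative numbers alike
truncated_positive = int(0.9)
truncated_negative = int(-0.9)
print(truncated_positive, truncated_negative)  # 0 0
```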


#### Numbers

Being able to easily manipulate numbers is very useful, especially with larger datasets. Python comes with a bunch of handy built-in functions for handling numbers. Let’s see them in more detail:

- `max()`: Returns the largest number in an iterable.
- `min()`: Returns the smallest number in an iterable.
- `sum()`: Sums all the numbers in an iterable.
- `abs()`: Returns the absolute value of a single number (meaning that it ignores negative signs, so `abs(-3.14)` returns `3.14`).
- `round(number, ndigits=None)`: Rounds a single number. It's worth noting that you can pass two arguments to this function, `number` and `ndigits`. The value of `number` is rounded to the closest multiple of 10 to the power minus `ndigits`. Additionally, if two multiples are equally close to `number`, the rounding is done toward the even choice. So, for example, as `ndigits` is `None` by default, `round(1.5)` rounds up and returns `2`, and `round(2.5)` rounds down and returns `2` as well!
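
A few calls make the rounding rules above easier to see. In this sketch the second argument of `round()` is passed by position, and the input values are arbitrary examples:

```python
half_up = round(1.5)              # The tie goes to the even choice: 2
half_down = round(2.5)            # The tie goes to the even choice: 2
two_decimals = round(3.14159, 2)  # Closest multiple of 0.01
hundreds = round(1234, -2)        # Closest multiple of 100

print(half_up, half_down, two_decimals, hundreds)
```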

Take a look at the examples.

In [759]:
data = [-10, -5.5, 0, 5.5, 10]
print(f"data = {data}\n")

print(f"max(   data    ) returns: {max(data)}")
print(f"min(   data    ) returns: {min(data)}")
print(f"sum(   data    ) returns: {sum(data)}")
print(f"abs(   data[1] ) returns: {abs(data[1])}")
print(f"round( data[1] ) returns: {round(data[1])}")

data = [-10, -5.5, 0, 5.5, 10]

max(   data    ) returns: 10
min(   data    ) returns: -10
sum(   data    ) returns: 0.0
abs(   data[1] ) returns: 5.5
round( data[1] ) returns: -6


#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace the list of numbers with a list of ten negative floats.
3. In the functions `abs()` and `round()` we are passing the second element of the list `data`. Replace this input with the last element of the list `data`.
4. Run the cell to print the results.

In [760]:
data = [-12, -3, -4, -5.5,-6,-7,-8,-9,-11,-14]
print(f"data = {data}\n")

print(f"max(   data    ) returns: {max(data)}")
print(f"min(   data    ) returns: {min(data)}")
print(f"sum(   data    ) returns: {sum(data)}")
print(f"abs(   data[9] ) returns: {abs(data[9])}")
print(f"round( data[9] ) returns: {round(data[9])}")

data = [-12, -3, -4, -5.5, -6, -7, -8, -9, -11, -14]

max(   data    ) returns: -3
min(   data    ) returns: -14
sum(   data    ) returns: -79.5
abs(   data[9] ) returns: 14
round( data[9] ) returns: -14


### 3.2. Methods for data structures

The next topics that we’ll revisit in more detail are sequences and mappings; particularly lists and dictionaries.


#### Lists

We have seen that lists are sequences of strings, numbers, or a mixture of various data types.
Now you will learn how to manipulate them.
These are some of the methods that you will be using most often with lists.

#### List methods that return a value

- `.index()`: Returns the index of the **first** element with the specified value.
- `.count()`: Returns the number of elements with the specified value.

Here are a couple of examples:

In [761]:
cities_to_visit = ["Berlin", "Madrid", "Paris", "Rome", "Berlin"]
print(f"cities_to_visit = {cities_to_visit}\n")

print(f"cities_to_visit.index('Berlin') returns: {cities_to_visit.index('Berlin')}")
print(f"cities_to_visit.count('Berlin') returns: {cities_to_visit.count('Berlin')}")

cities_to_visit = ['Berlin', 'Madrid', 'Paris', 'Rome', 'Berlin']

cities_to_visit.index('Berlin') returns: 0
cities_to_visit.count('Berlin') returns: 2
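
The two methods behave differently when the value is not in the list: `.count()` returns `0`, while `.index()` raises an error (`ValueError`). A minimal sketch of one way to avoid the error is to check first with the `in` operator. The city `"Lisbon"` and the variable `position` are made up for this illustration:

```python
cities_to_visit = ["Berlin", "Madrid", "Paris", "Rome", "Berlin"]

city = "Lisbon"  # Hypothetical value that is not in the list

# .count() simply returns 0 for a missing value
print(cities_to_visit.count(city))

# .index() would raise a ValueError for a missing value,
# so this sketch checks membership with `in` first
if city in cities_to_visit:
    position = cities_to_visit.index(city)
else:
    position = None

print(position)
```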


#### List methods that work **in-place**
(we'll explain in a moment)

- `.sort()`: Sorts the list.
- `.reverse()`: Reverses the order of the list.
- `.insert()`: Adds an element at the specified position.
- `.remove()`: Removes the **first** element with the specified value.
- `.extend()`: Adds the elements of a list (or any iterable) to the end of the current list.
- `.append()`: Adds an element at the end of the list.

Methods that work *in-place* modify the list itself and return `None`, so if you pass them as an argument to a `print()` function, you will always print `None`. If you want to see their results, execute them in one line and print the variable in the following line. Some examples:

In [762]:
cities_to_visit = ["Berlin", "Madrid", "Paris", "Berlin"]
print(f"cities_to_visit = {cities_to_visit}\n")

cities_to_visit.sort()
print(f"cities_to_visit.sort()              returns: {cities_to_visit}")

cities_to_visit.reverse()
print(f"cities_to_visit.reverse()           returns: {cities_to_visit}")

cities_to_visit.insert(1, 'Athene')
print(f"cities_to_visit.insert(1, 'Athene') returns: {cities_to_visit}")

cities_to_visit.remove('Berlin')
print(f"cities_to_visit.remove('Berlin')    returns: {cities_to_visit}")

cities_to_visit.extend(['Rome'])
print(f"cities_to_visit.extend(['Rome'])    returns: {cities_to_visit}")

cities_to_visit.append(['Rome'])
print(f"cities_to_visit.append(['Rome'])    returns: {cities_to_visit}")

cities_to_visit = ['Berlin', 'Madrid', 'Paris', 'Berlin']

cities_to_visit.sort()              returns: ['Berlin', 'Berlin', 'Madrid', 'Paris']
cities_to_visit.reverse()           returns: ['Paris', 'Madrid', 'Berlin', 'Berlin']
cities_to_visit.insert(1, 'Athene') returns: ['Paris', 'Athene', 'Madrid', 'Berlin', 'Berlin']
cities_to_visit.remove('Berlin')    returns: ['Paris', 'Athene', 'Madrid', 'Berlin']
cities_to_visit.extend(['Rome'])    returns: ['Paris', 'Athene', 'Madrid', 'Berlin', 'Rome']
cities_to_visit.append(['Rome'])    returns: ['Paris', 'Athene', 'Madrid', 'Berlin', 'Rome', ['Rome']]


Notice that `data.extend(['d'])` merges both lists, whereas `data.append(['d'])` adds the whole list as a single element, so you end up with a list inside another list.
If you pass a string to `.append()` instead of a list, the string itself is added as a single element, so
`data.extend(['d'])` and `data.append('d')` produce the same result.

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace the list of cities with the names of famous artists.
3. Adapt the code so that all methods run without producing errors.
4. Run the cell to print the results.

In [763]:
artists_to_listen = ["Pink", "Shakira", "Lady Gaga", "Bowie"]
print(f"artists_to_listen = {artists_to_listen}\n")

artists_to_listen.sort()
print(f"artists_to_listen.sort()              returns: {artists_to_listen}")

artists_to_listen.reverse()
print(f"artists_to_listen.reverse()           returns: {artists_to_listen}")

artists_to_listen.insert(0,'Pitbull')
print(f"artists_to_listen.insert(0, 'Pitbull') returns: {artists_to_listen}")

artists_to_listen.remove('Pink')
print(f"artists_to_listen.remove('Pink')    returns: {artists_to_listen}")

artists_to_listen.extend(['Dua Lipa'])
print(f"artists_to_listen.extend(['Dua Lipa'])    returns: {artists_to_listen}")

artists_to_listen.append(['Christina'])
print(f"artists_to_listen.append(['Christina'])    returns: {artists_to_listen}")

artists_to_listen = ['Pink', 'Shakira', 'Lady Gaga', 'Bowie']

artists_to_listen.sort()              returns: ['Bowie', 'Lady Gaga', 'Pink', 'Shakira']
artists_to_listen.reverse()           returns: ['Shakira', 'Pink', 'Lady Gaga', 'Bowie']
artists_to_listen.insert(0, 'Pitbull') returns: ['Pitbull', 'Shakira', 'Pink', 'Lady Gaga', 'Bowie']
artists_to_listen.remove('Pink')    returns: ['Pitbull', 'Shakira', 'Lady Gaga', 'Bowie']
artists_to_listen.extend(['Dua Lipa'])    returns: ['Pitbull', 'Shakira', 'Lady Gaga', 'Bowie', 'Dua Lipa']
artists_to_listen.append(['Christina'])    returns: ['Pitbull', 'Shakira', 'Lady Gaga', 'Bowie', 'Dua Lipa', ['Christina']]
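
Two pitfalls of in-place methods are worth a quick sketch with made-up lists. First, storing the result of an in-place method gives you `None`, not the modified list. Second, passing a string directly to `.extend()` adds each character as a separate element, because a string is an iterable of characters:

```python
cities = ["Paris", "Berlin"]

# In-place methods change the list itself and return None
returned_value = cities.sort()
print(returned_value)  # None
print(cities)          # ['Berlin', 'Paris']

# .append() adds its argument as ONE element
cities.append("Rome")
print(cities)          # ['Berlin', 'Paris', 'Rome']

# .extend() iterates over its argument, so every letter of the
# string becomes a separate element
letters = ["a"]
letters.extend("bc")
print(letters)         # ['a', 'b', 'c']
```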


#### Indexing

Elements inside lists are ordered, so when you want to retrieve the value of an element, you use its index.
The index is an **integer** passed to the variable as an argument, like with functions, but instead of using parentheses, we use a pair of brackets `[]`.
Remember that Python has **zero-based** indexing, so `data[0]` returns the first element of data.
If you use negative integers, you start counting from the end to the beginning, so `data[-1]` returns the last element.

In [764]:
cities_to_visit = ["Madrid", "Paris", "Berlin", "Rome", "Athene"]
print(f"cities_to_visit = {cities_to_visit}\n")

print(f"cities_to_visit[0]  gives us the FIRST       element: {cities_to_visit[0]}")
print(f"cities_to_visit[1]  gives us the SECOND      element: {cities_to_visit[1]}")
print(f"cities_to_visit[-2] gives us the SECOND LAST element: {cities_to_visit[-2]}")
print(f"cities_to_visit[-1] gives us the LAST        element: {cities_to_visit[-1]}")

cities_to_visit = ['Madrid', 'Paris', 'Berlin', 'Rome', 'Athene']

cities_to_visit[0]  gives us the FIRST       element: Madrid
cities_to_visit[1]  gives us the SECOND      element: Paris
cities_to_visit[-2] gives us the SECOND LAST element: Rome
cities_to_visit[-1] gives us the LAST        element: Athene


You can also use indexing to modify the value of an element:

In [765]:
cities_to_visit = ["Madrid", "Paris", "Berlin", "Rome", "Athene"]
print(cities_to_visit)

cities_to_visit[0] = "Lisbon"
print(cities_to_visit)

['Madrid', 'Paris', 'Berlin', 'Rome', 'Athene']
['Lisbon', 'Paris', 'Berlin', 'Rome', 'Athene']


#### Slicing

With slicing, instead of indicating a single value with an index, you indicate a range of values with a `start` and an `end` **integer** separated by a **colon** (`:`), such as `data[start:end]`.
The returned elements go from `start` up to, **but not including**, `end`.
See the examples below:

In [766]:
cities_to_visit = ["Madrid", "Paris", "Berlin", "Rome", "Athene"]

print(f"cities_to_visit = {cities_to_visit}\n")
print(f"cities_to_visit[0:2]   returns: {cities_to_visit[0:2]}")
print(f"cities_to_visit[:2]    returns: {cities_to_visit[:2]}")
print(f"cities_to_visit[1:3]   returns: {cities_to_visit[1:3]}")
print(f"cities_to_visit[-3:-1] returns: {cities_to_visit[-3:-1]}")
print(f"cities_to_visit[-3:]   returns: {cities_to_visit[-3:]}")
print(f"cities_to_visit[:]     returns: {cities_to_visit[:]}")

cities_to_visit = ['Madrid', 'Paris', 'Berlin', 'Rome', 'Athene']

cities_to_visit[0:2]   returns: ['Madrid', 'Paris']
cities_to_visit[:2]    returns: ['Madrid', 'Paris']
cities_to_visit[1:3]   returns: ['Paris', 'Berlin']
cities_to_visit[-3:-1] returns: ['Berlin', 'Rome']
cities_to_visit[-3:]   returns: ['Berlin', 'Rome', 'Athene']
cities_to_visit[:]     returns: ['Madrid', 'Paris', 'Berlin', 'Rome', 'Athene']


As we mentioned at the beginning, **tuples** have basically the same functionality as lists, so *indexing* and *slicing* work the same on tuples as on lists.

As a reminder, the difference between them is that once you define a tuple you cannot modify it, a characteristic called **immutability**.
It's good that you know tuples, as you will see them every now and then, but you will only be using their most basic functionality, so for now, just think of tuples as immutable lists.
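
As a small sketch of what this means in practice, the example below indexes and slices a tuple, and then tries to modify it. The name `cities_tuple` is only illustrative, and the `try`-`except` statement used to show the error without stopping the cell is explained in section 3.3.

```python
cities_tuple = ("Madrid", "Paris", "Berlin", "Rome", "Athene")

# Indexing and slicing work exactly as with lists:
print(f"cities_tuple[0]   returns: {cities_tuple[0]}")
print(f"cities_tuple[1:3] returns: {cities_tuple[1:3]}")

# Modifying an element is not allowed and raises a TypeError:
try:
    cities_tuple[0] = "Lisbon"
except TypeError as e:
    print(f"An error occurred: {e}")

# The tuple is unchanged:
print(cities_tuple)
```

Note that slicing a tuple returns another tuple, not a list.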

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace the list of cities with names of famous sports people.
3. Change all of the indices to compare the outputs.
4. Run the cell to print the results.

In [767]:
sports_people = ["Federer", "Williams", "Ronaldo", "Jordan", "Nadal"]

print(f"sports_people = {sports_people}\n")
print(f"sports_people[0:2]   returns: {sports_people[0:2]}")
print(f"sports_people[:2]    returns: {sports_people[:2]}")
print(f"sports_people[1:3]   returns: {sports_people[1:3]}")
print(f"sports_people[-3:-1] returns: {sports_people[-3:-1]}")
print(f"sports_people[-3:]   returns: {sports_people[-3:]}")
print(f"sports_people[:]     returns: {sports_people[:]}")

sports_people = ['Federer', 'Williams', 'Ronaldo', 'Jordan', 'Nadal']

sports_people[0:2]   returns: ['Federer', 'Williams']
sports_people[:2]    returns: ['Federer', 'Williams']
sports_people[1:3]   returns: ['Williams', 'Ronaldo']
sports_people[-3:-1] returns: ['Ronaldo', 'Jordan']
sports_people[-3:]   returns: ['Ronaldo', 'Jordan', 'Nadal']
sports_people[:]     returns: ['Federer', 'Williams', 'Ronaldo', 'Jordan', 'Nadal']


#### Dictionaries

Now it’s time for us to discover more about dictionaries!
As we’ve seen in Notebook 1, dictionaries are a mapping data type.
These are data structures that give a name to each value, so you can indicate the value that you want with a string instead of an index.
The names are called **keys** and the values, you guessed it, are called **values**.
So, you could build an English language dictionary by using the words as keys, and the definitions as the values.
For our purposes, you will only be using strings to define keys, and you can use any other object to define values.
Something very important to remember is that keys are unique, while the same value can be used for multiple keys.
For example, `"improve": "to become better"` and `"advance": "to become better"`, have the same value (`to become better`), but unique keys (`improve` and `advance`).

How do we define a dictionary in Python?
The syntax to define a dictionary is by using a pair of curly braces (`{}`), and inside we can include **key-value pairs** separating each with a comma.
Then, instead of defining the value of a key with an equal sign (`=`), we use a colon (`:`).
For example:
```
dictionary_of_words = {"improve": "to become better", "advance": "to become better"}
```

Another note on syntax: whitespace and line breaks inside enclosing symbols are ignored by Python.
So, as long as you are writing inside a pair of parentheses `()`, square brackets `[]`, or curly braces `{}`, you can split a line of code into multiple lines and Python will still understand your instruction.
As an example, you can also split the previous definition

```
dictionary_of_words = {"improve": "to become better", "advance": "to become better"}
```
into multiple lines to make the code easier to read:
```
dictionary_of_words = {
    "improve": "to become better",
    "advance": "to become better"
    }
```

Let's see some more examples.

In [768]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

def findKeyByValue(object_to_search, value):
    # Return the first key whose value matches, or None if there is no match
    for key in object_to_search.keys():
        if object_to_search[key] == value:
            return key


print(dictionary_variable)
print(dictionary_variable.keys())
print(dictionary_variable.values())
print(dictionary_variable["key_2"])

key_we_want = findKeyByValue(dictionary_variable, "value_2")
print(key_we_want)



{'key_1': 'value_1', 'key_2': 'value_2'}
dict_keys(['key_1', 'key_2'])
dict_values(['value_1', 'value_2'])
value_2
key_2


Notice that you can get the keys with the `.keys()` method, and the values with the `.values()` method. Both return special *view* objects (`dict_keys` and `dict_values`) that you can turn into lists with `list()`.
To get the value of a key you use a notation similar to lists:

In [769]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

print(dictionary_variable["key_1"])

value_1


To create a new key-value pair or to update the value of a key you use the same notation:

In [770]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

# Update a value:
dictionary_variable["key_1"] = 123

# Create new key-value pair:
dictionary_variable["key_3"] = 456

print(dictionary_variable)

{'key_1': 123, 'key_2': 'value_2', 'key_3': 456}


#### Exercise:
1. Create a new dictionary with three keys, each with the name of a country.
2. Assign a number to each key to represent the population of the country.
3. In a new line, update the population of a country with a new number.
4. In a new line, add a fourth country and its corresponding population.
5. Run the cell to print the results.

- **Going further**: Google how to delete elements of Python dictionaries. In a new line, delete the country with the largest population.

In [771]:
# Populations in millions (rounded)
country_dictionary = {
    "Albania": 3,
    "Austria": 9,
    "Germany": 83,
}

# Update the value:
country_dictionary["Albania"] = 2.8
# Add a new country:
country_dictionary["America"] = 331
print(country_dictionary)

{'Albania': 2.8, 'Austria': 9, 'Germany': 83, 'America': 331}
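
For the **Going further** task, one possible approach is sketched below. It assumes the populations are stored as numbers, and it uses the `del` statement together with the built-in `max()` function. The dictionary name and the rounded values are only illustrative.

```python
# Rough, illustrative populations in millions
populations_in_millions = {
    "Albania": 2.8,
    "Austria": 9,
    "Germany": 83,
}

# max() goes through the keys and, thanks to key=..., compares their values
largest_country = max(populations_in_millions, key=populations_in_millions.get)

# del removes the key-value pair from the dictionary
del populations_in_millions[largest_country]

print(f"Deleted: {largest_country}")
print(populations_in_millions)
```

If the populations were stored as strings (like `"9"`), `max()` would compare them alphabetically instead of numerically, which is one more reason to store numbers as numbers.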


Be careful to pass an existing key to the dictionary when you read it, such as `print(dictionary_variable["key_1"])` in the example above; otherwise, you will get an error and **crash** your program.
For example, `print(dictionary_variable["key-1"])` would raise an error as `"key-1"` doesn't exist in the dictionary.
You can try it out! Uncomment the last line in the following cell and run it to see the error.
Once you read it, comment out the last line again so that we can run all cells without the notebook stopping in this line (this is only necessary to help the tutors to evaluate your notebooks faster).

In [772]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

# Uncomment, run the cell, and comment out the following line again:
# print(dictionary_variable["key-1"])

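
If you are not sure whether a key exists, you can check before reading it. The sketch below shows two common options: the `in` operator and the dictionary method `.get()`, which returns a fallback value instead of raising an error. The variable names `key_name` and `fallback_value` are only illustrative.

```python
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2"
}

key_name = "key-1"  # This key does not exist (hyphen instead of underscore)

# Option 1: check with `in` before reading the value
if key_name in dictionary_variable:
    print(dictionary_variable[key_name])
else:
    print(f"{key_name} is not in the dictionary")

# Option 2: .get() returns the second argument when the key is missing
fallback_value = dictionary_variable.get(key_name, "not found")
print(fallback_value)
```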


### 3.3. Handling errors

This brings us to the following topic: **error handling**.
Errors can be frustrating when coding, but if you take a breath and learn to handle them, you'll see that error messages are not your enemies 😉. In fact, their purpose is to help you fix errors in your code.
Python has a tool called **try-except** that allows you to continue running your program even if something unexpected goes wrong. *Try-except* statements can have the following elements:

- `try`: Lets you test a block of code for errors.
- `except`: Lets you handle the error.
- `else`: Lets you execute code when there is no error.
- `finally`: Lets you execute code, regardless of the result of the try-except blocks.

Let's take a look at a simple case using the try-except statement in the example below.

In [773]:
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2",
    "key_3": "value_3",
}

try:
    print(dictionary_variable["key-1"])
except Exception as e:
    print(f"An error occurred: {e}")

An error occurred: 'key-1'


In particular, take a look at the line `except Exception as e:`. The word `Exception` tells Python to catch (almost) any error raised inside the `try` block, and `as e` stores that error in the variable `e`.
Earlier we mentioned that variable names should be explicit. In this case, we use the name `e` because it is a convention among Python developers to give the *Exception* variable this name. So, when you find it in code later on, you'll know what it stands for.
Now that we've seen how it works, let's elaborate further and include additional blocks:

In [774]:
try:
    dictionary_variable = {
        "key_1": "value_1",
        "key_2": "value_2",
        "key_3": "value_3",
    }
    key_name = "key-1"
    print(dictionary_variable[key_name])
except Exception as e:
    print(f"An error occurred: {e}")
else:
    print("Code run correctly")
finally:
    print("The try-except has concluded")

An error occurred: 'key-1'
The try-except has concluded
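
The message above only shows `'key-1'`, which does not say much about what went wrong. As a small variation, you can catch the specific built-in error type `KeyError` and print its name as well. This is only a sketch; the variable `error_type` is an illustrative name.

```python
dictionary_variable = {
    "key_1": "value_1",
    "key_2": "value_2",
    "key_3": "value_3",
}

try:
    print(dictionary_variable["key-1"])
except KeyError as e:
    # type(e).__name__ gives the name of the error class as a string
    error_type = type(e).__name__
    print(f"{error_type}: the key {e} does not exist")
```

Catching a specific error type instead of the general `Exception` means that other, unexpected errors are not hidden by accident.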


Now you’ve seen how the try-except statement can help you handle errors in Python. Ready to give it a try yourself in the exercise below?

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Fix the typo in the `key_name`.
3. Run the cell to see the results.

- **Going further**: Change the data type of the dictionary **keys** to a number, a tuple, and a list and see which one raises an error. Try to understand the error message and how it tries to point you to the fault in the code.

In [775]:
try:
    dictionary_variable = {
        "key_1": 2,
        "key_2": (3,5),
        "key_3": [1,2,3,4],
    }
    key_name = "key_1"
    print(dictionary_variable[key_name])
except Exception as e:
    print(f"An error occurred: {e}")
else:
    print("Code run correctly")
finally:
    print("The try-except has concluded")

2
Code run correctly
The try-except has concluded
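
Note that the cell above changes the data type of the dictionary **values**, which never raises an error. For the **Going further** task, a possible sketch that changes the **keys** instead is shown below. The names `key_candidates` and `results` are only illustrative.

```python
key_candidates = [1, (3, 5), [1, 2, 3, 4]]  # a number, a tuple, and a list
results = {}

for candidate in key_candidates:
    try:
        # Try to build a dictionary that uses the candidate as a key
        test_dictionary = {candidate: "some value"}
    except TypeError as e:
        results[repr(candidate)] = f"error: {e}"
    else:
        results[repr(candidate)] = "works as a key"

for candidate_text in results:
    print(f"{candidate_text}: {results[candidate_text]}")
```

Numbers and tuples work as keys, while the list raises a `TypeError`: lists can be modified after they are created, so Python does not allow them as dictionary keys.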


And with this, you have arrived at the end of notebook 3. You’ve been learning a lot, congratulations! In the next notebook, we’ll look at how to create basic class and module structures. The juicy part is coming!

## 4. Custom classes and modules

### 4.1. Classes

Earlier we mentioned that Python is an object-oriented programming language, meaning that almost everything in Python is an object with its own properties and methods.
A **class** is like a *blueprint* for creating objects of a certain kind, defining the properties and methods they will have.
The **properties are the variables** inside the class, and the **methods are the functions** inside the class.
The basic class structure that you will use in Machine Learning (ML) has the following components:

In [776]:
class Person():

    def __init__(self, name, age):
        self.name = name
        self.age = age
        print("Class initialized.")

    def set_name(self, name):
        self.name = name
        self.print_new_name()

    def get_name(self):
        return self.name

    def print_new_name(self):
        print(f"New name set: {self.name}")

# Create an instance of the class Person():
person_instance = Person("John", 30)

# Print its property name:
print(person_instance.name)
print(person_instance.get_name())

# Change its property name:
person_instance.name = "Peter"
person_instance.set_name("Peter")

# Print its property name:
print(person_instance.name)

Class initialized.
John
John
New name set: Peter
Peter


In [777]:
class Car():
    def __init__(self, name):
        self.name = name

    def open(self):
        print(f"Car {self.name} is open")

car_instance = Car("bmw")

car_instance.open()

Car bmw is open


It may look a bit complicated but don’t worry, we’ll unpack it, bit by bit. Let’s start by looking at the three main components of the class code:

**class Name()**

- Similarly to functions, classes are defined with a keyword. In this case it's `class`.
- Afterwards, you write the class name, preferably using *CamelCase* notation. This means that, unlike with functions, you capitalize the first letter of each word and you don't use underscores to separate words. For example, `Person` and `PersonData` are correct according to the [Python style guide](https://peps.python.org/pep-0008). You can find out more about popular notation styles in this [article](https://betterprogramming.pub/string-case-styles-camel-pascal-snake-and-kebab-case-981407998841).
- Finally, you add a pair of parentheses (`()`) and a colon (`:`) to end the line and begin a code block below.

**\_\_init\_\_( )** to initialize your class

- Functions inside classes are called methods. In this example, you see that a method named `__init__()` is being defined inside the class, and it stands for *initialize*.
```
    def __init__(self, name, age):
        self.name = name
        self.age = age
        print("Class initialized.")
```
- This function is executed every time an **instance** of the class is created (also known as *instantiated*) and it can take arguments if desired. Use this place to assign values to properties of the class, or to execute other operations that are necessary every time an instance is created.
- Notice that, unlike with functions, you don't indicate the arguments inside the parentheses next to the class name. Instead, indicate the arguments inside the parentheses of `__init__()`. You pass these arguments when you create an instance of the class.

**self**

If you read the code at the beginning, you’ve seen that the word `self` appears in many places. What is it? `self` is a reference to the class instances you create and is used to access properties and methods inside the class. The important things to remember are:
- When working inside of a class, always add `self.` before the name of properties. Do this when you **define properties** (like in `self.name = name`), and when you **read properties** (like in `return self.name`).
- Always use `self` as the *first* argument when you **define methods** inside a class (like in `def set_name(self, name)`).
- Never include `self` in the arguments when you **call methods** of a class (like in `self.print_new_name()`).
- When interacting with an **instance** of a class, never use `self` to access its properties (like in `print(person_instance.name)`) or its methods (like in `print(person_instance.get_name())`).

There is plenty of new information here. Let’s do some exercises to get the feeling of it. 🤓

#### Exercise:

In the cell below you’ll find code to play with and modify for this exercise.

1. Add methods to set the value (also called a *setter*) and to get the value (also called a *getter*) of `age`.
2. Add a new property called `last_name` and also pass this value as an argument next time you instantiate the class.
3. After you create an instance of the class, change the value of `last_name`.
4. Print name and last_name together in one line.
5. Run the cell to see the results.

- **Going further**: Create a new method in the class called `is_the_same()`. This method should take as arguments *name*, *last_name* and *age*, and print "Same person." if all the values are the same as the corresponding properties of the class. Otherwise, the method should print "Different person.".

In [778]:
class Person():

    def __init__(self, name, age):
        self.name = name
        self.age = age
        print("Class initialized.")

    def set_name(self, name):
        self.name = name
        self.print_new_name()


    def set_age(self, age):
        self.age = age

    def get_age(self):
        return self.age


    def get_name(self):
        return self.name

    def print_new_name(self):
        print(f"New name set: {self.name}")

# Create an instance of the class Person():
person_instance = Person("John", 30)

# Print its property name:
print(person_instance.name)
print(person_instance.get_name())

# Change its property name:
person_instance.name = "Peter"
person_instance.set_name("Peter")

# Print its property name:
print(person_instance.name)

Class initialized.
John
John
New name set: Peter
Peter


In [779]:
class Person():

    def __init__(self, name, age):
        self.name = name
        self.age = age
        print("Class initialized.")

    def set_name(self, name):
        self.name = name
        self.print_new_name()


    def set_age(self, age):
        self.age = age

    def get_age(self):
        return self.age


    def get_name(self):
        return self.name

    def print_new_name(self):
        print(f"New name set: {self.name}")


person = Person("Ben", 3)
print(person.get_name(), person.get_age())

person.set_age(20)
print(person.get_name(), person.get_age())

Class initialized.
Ben 3
Ben 20
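
The cells above cover the getter and setter for `age`. A possible sketch for the remaining steps (`last_name` and the **Going further** method `is_the_same()`) is shown below. The methods `set_last_name()` and `get_full_name()` are illustrative names, and returning `True` or `False` from `is_the_same()` is an addition to the exercise that makes the result easier to reuse.

```python
class Person():

    def __init__(self, name, last_name, age):
        self.name = name
        self.last_name = last_name
        self.age = age

    def set_last_name(self, last_name):
        self.last_name = last_name

    def get_full_name(self):
        return f"{self.name} {self.last_name}"

    def is_the_same(self, name, last_name, age):
        # Compare every argument with the corresponding property
        same = (
            name == self.name
            and last_name == self.last_name
            and age == self.age
        )
        if same:
            print("Same person.")
        else:
            print("Different person.")
        return same


person_instance = Person("John", "Smith", 30)

# Change the value of last_name after creating the instance:
person_instance.set_last_name("Doe")

# Print name and last_name together in one line:
print(person_instance.get_full_name())

person_instance.is_the_same("John", "Doe", 30)
person_instance.is_the_same("Peter", "Doe", 30)
```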


### 4.2. Custom modules

Now it’s time for some magic. Ready? First, think of how the code below differs from the previous ones. Now, run the cell below.

In [780]:
from library import Person

# Create an instance of the class Person():
person_instance = Person("John", 30)

# Print its property name:
print(person_instance.name)
print(person_instance.get_name())

# Change its property name:
person_instance.name = "Peter"
person_instance.set_name("Peter")

# Print its property name:
print(person_instance.name)

Class initialized.
John
John
New name set: Peter
Peter


Ta daaa! You were able to create an instance of the class that you just wrote without having to add all the code again. How can this be? The reason is that in the same directory where this Jupyter Notebook is located, there is also a file called `library.py` and inside this file, there is the code of the class `Person()` you created.

In Python terminology, a file such as `library.py` is called a **module**, and `Person` is a **class** defined inside it (the word *library* usually refers to a collection of modules). This explains the syntax `from library import Person`: you are telling Python to import the class called `Person` from the module called `library`.

You can see what is inside `library.py` with the code below:

In [781]:
with open("library.py") as file:
    print(file.read())


class Person():
    
    def __init__(self, name, age):
        self.name = name
        self.age = age
        print("Class initialized.")
    
    def set_name(self, name):
        self.name = name
        self.print_new_name()
    
    def get_name(self):
        return self.name

    def print_new_name(self):
        print(f"New name set: {self.name}")

def multiply_function(number, multiplier=2):
    result = number * multiplier
    return result, multiplier

numeric_variable = 123

text_variable = "456"

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def make_heatmap(searcher, n_values, p_values, t_delta):

    results = pd.DataFrame.from_dict(searcher.cv_results_)
    
    results["params_str"] = results.params.apply(str)
    
    scores_matrix = results.sort_values("iter").pivot_table(
        index="param_n_neighbors",
        columns="param_p",
        values="mean_test_score",
        aggfunc="last",
    )

    fig, ax = plt.subplots(figsize=(5, 5))


As you can see, a module file like `library.py` can contain classes, functions, and values, and the syntax for importing them is the same. Take a look at the code below:

In [782]:
from library import multiply_function, numeric_variable, text_variable

print(multiply_function(3, 4))
print(numeric_variable)
print(text_variable)

(12, 4)
123
456


Now that you know what a class is, how to create one, and how to import it from a separate file, have a go at the exercise below. 🤓

#### Exercise:
1. Create a copy of `library.py` and name it `person.py`.
2. Replace the class `Person()` in `person.py` with the augmented version that you wrote in the previous exercise.
3. Go back to the previous exercise, copy the solution code and paste it into the cell below.
4. Delete the code defining the class, but keep all the operations below it.
5. Import the class `Person` from `person.py`.
6. Run the cell to see the results.

Congratulations! You have created your first software library!

In [783]:
from person import multiply_function, numeric_variable, text_variable

print(multiply_function(3, 4))
print(numeric_variable)
print(text_variable)

(12, 4)
123
456


## 5. Data processing

So far we have been working with very small data snippets, but of course, datasets are normally stored in files or databases. In our challenge, we’ll focus on tabular data, which is a very common and flexible format that allows you to process text and numerical values. Think of the data you normally find in spreadsheets.

Let’s take a look at some of the common data formats for handling tabular data in real-world situations: TXT, CSV, and JSON.

### 5.1. Directory tree

All the data in your computer has a directory tree structure. For example, you can see the content of our challenge directory in your file explorer:

<img src="../data/content/directory_tree.png" width="90%"/>

You can think of the directory tree that represents these files as follows:

```
Introduction to Coding for AI/
│
├─ data/
│  │
│  ├─ content/
│  │  ├─ image_1.png
│  │  ├─ image_2.png
│  │  ├─ image_3.png
│  │  ├─ ...
│  │
│  ├─ datasets/
│     ├─ dataset_1.csv
│     ├─ dataset_2.json
│     ├─ dataset_3.xlsx
│     ├─ ...
│
├─ notebooks/
   ├─ notebook_1.ipynb
   ├─ notebook_2.ipynb
   ├─ notebook_3.ipynb
   ├─ ...
```

The first step to load data from your computer is to tell Python where to search for it. Here we will import a function called `glob` from the standard library module also called `glob`. *glob* finds all the files and folders that match the pattern you specify, for example everything inside a **folder** (also called **directory**), and returns a list with their paths.

In [797]:
from glob import glob

all_file_paths_here = glob("./*")
python_file_paths_here = glob("./*.py")
all_file_paths_above = glob("../*")

print(f"all_file_paths_here:\n{all_file_paths_here}\n")
print(f"python_file_paths_here:\n{python_file_paths_here}\n")
print(f"all_file_paths_above:\n{all_file_paths_above}\n")

all_file_paths_here:
['./lesson_2.ipynb', './person.py', './library.py', './lesson_4.ipynb', './lesson_3.ipynb', './lesson_1.ipynb', './__pycache__', './Coding_Journey.ipynb', './Introduction to Coding for AI']

python_file_paths_here:
['./person.py', './library.py']

all_file_paths_above:
['../demo', '../screenshots', '../python assignments', '../ericafirst', '../memoryMap', '../lesson3.ipynb', '../username']



The string we passed to `glob` starts with one dot and a forward slash (`./`). The dot (`.`) means **here**, the current directory where this Jupyter Notebook is located in your directory tree, and the forward slash (`/`) indicates that there is a directory there. Similarly, you use two dots (`../`) to tell Python that it should go one directory level **above**, and from there start following the rest of the path. You can repeat these two dots as many times as you want to go up in your directory tree, so `../../../` would go up three levels for example.

Next we see a star (`*`) and a star followed by the file extension of Python files (`.py`). The star alone (`*`) means **everything**, so it tells `glob` to add all files and folders to the list that it will return. However, when the star is written next to other characters, its meaning changes. So, `*.py` tells `glob` to add only the files ending with `.py` to the list that it will return.

To recap, to open files located in the same directory as the notebook, use one dot only, so `./file.text` searches for `file.text` in the current directory, and `./folder/file.text` searches for `file.text` in a folder called `folder` located in the same place as the Jupyter Notebook.
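
Because the result of `glob` depends on what is stored on your computer, here is a self-contained sketch that first creates a temporary folder with three empty files and then searches it. The modules `os` and `tempfile` are part of the Python standard library but are not used elsewhere in this notebook, and the file names are made up for the example.

```python
import os
import tempfile
from glob import glob

# Create a temporary folder that is deleted again at the end of the block
with tempfile.TemporaryDirectory() as temporary_folder:

    # Create three empty files inside the temporary folder
    for file_name in ["notes.txt", "library.py", "person.py"]:
        file_path = os.path.join(temporary_folder, file_name)
        with open(file_path, "w") as file_handle:
            file_handle.write("")

    # "*" matches everything, "*.py" only matches paths ending with ".py"
    all_paths = glob(os.path.join(temporary_folder, "*"))
    python_paths = glob(os.path.join(temporary_folder, "*.py"))

    # Keep only the file names and sort them, as glob gives no fixed order
    all_names = sorted(os.path.basename(path) for path in all_paths)
    python_names = sorted(os.path.basename(path) for path in python_paths)

print(f"all_names:    {all_names}")
print(f"python_names: {python_names}")
```

`os.path.join()` builds the path with the separator used by your operating system, so the same code works on Windows, macOS, and Linux.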

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Add a command that gets the list of all Jupyter notebook files (`.ipynb`) in the current directory.
3. Store this list in a variable with a suitable name and print it.
4. Run the cell to show the results.

In [799]:
from glob import glob

all_file_paths_here = glob("./*")
python_file_paths_here = glob("./*.py")
all_file_paths_above = glob("../*")
jupyter_file_paths_here = glob("./*.ipynb")

print(f"all_file_paths_here:\n{all_file_paths_here}\n")
print(f"python_file_paths_here:\n{python_file_paths_here}\n")
print(f"all_file_paths_above:\n{all_file_paths_above}\n")
print(f"jupyter_file_paths_here:\n{jupyter_file_paths_here}\n")

all_file_paths_here:
['./lesson_2.ipynb', './person.py', './library.py', './lesson_4.ipynb', './lesson_3.ipynb', './lesson_1.ipynb', './__pycache__', './Coding_Journey.ipynb', './Introduction to Coding for AI']

python_file_paths_here:
['./person.py', './library.py']

all_file_paths_above:
['../demo', '../screenshots', '../python assignments', '../ericafirst', '../memoryMap', '../lesson3.ipynb', '../username']

jupyter_file_paths_here:
['./lesson_2.ipynb', './lesson_4.ipynb', './lesson_3.ipynb', './lesson_1.ipynb', './Coding_Journey.ipynb']



### 5.2. Text Data

In the following sections, we'll see how to **read** data from files, **transform** it to be ready for analysis, and **write** it back to your hard drive. Let's begin with an example of text data.

#### 5.2.1. Read

In the project folder that contains the notebooks, there is also a directory with datasets. We’ll start with a dataset called `spam.txt` that contains a collection of SMS messages. This dataset is a simplified version of the original SMS Spam Collection Dataset found on [Kaggle](https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset).
In the dataset, each line contains the SMS text as well as the category (class) spam or ham. In case you don't know it, ham means that the SMS is ok and not spam.
Emails already have a spam filter, but wouldn't it be great if phone companies could do the same? That's what we'll try to help them with.
Let's start by reading the data. Below is a way to read text files line by line; check it out:

In [800]:
with open("../data/datasets/spam.txt", "r") as file_handle:
    first_line = file_handle.readline()
    print(first_line)  # String formatted by print()
    print(repr(first_line))  # Raw, printable representation of the string

FileNotFoundError: [Errno 2] No such file or directory: '../data/datasets/spam.txt'

Let’s unpack this code step by step:

**with ... as ...**

The statement `with <object> as <handle>:` means that the code you place after `with` has to return an object, and that this object is assigned to the variable indicated in `<handle>`. In our case, the built-in function `open()` returns an object associated with the file you are opening. When Python opens a file, it must close it once it has finished working with it; otherwise, the file could become corrupted. One way of doing this is by calling the method `<handle>.close()`, but the `with` statement simplifies our code: it creates a code block in the line after the colon, and once your program finishes working inside this code block and leaves it, `with` automatically closes the file for you.
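
You can verify this behavior yourself with the `.closed` property of the file object. The sketch below writes to a temporary file so that it does not depend on the dataset; the modules `os` and `tempfile` and the file name `example.txt` are assumptions made only for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temporary_folder:
    file_path = os.path.join(temporary_folder, "example.txt")

    with open(file_path, "w") as file_handle:
        file_handle.write("Hello\n")
        # Inside the code block the file is still open
        closed_inside_block = file_handle.closed

    # After leaving the code block, `with` has closed the file
    closed_after_block = file_handle.closed

print(f"Closed inside the with block: {closed_inside_block}")
print(f"Closed after the with block:  {closed_after_block}")
```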

**open( )**

The statement `open("<path/to/filename.extension>", "mode")` opens a file with the **mode** that you indicate. The modes that we are interested in are `r`, `w` and `a`.
  - `r`: Only reads the file. This is a safe option to avoid messing up the data.
  - `w`: Creates a file to write. Be careful: if the file already exists, its content is deleted first. Better to avoid this unless you are sure!
  - `a`: Opens a file to write, but instead of overwriting its content, everything you write is appended at the end of the file.
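
The difference between the three modes can be sketched with a temporary file, so that no real data is at risk. The modules `os` and `tempfile`, the file name, and the variable names are assumptions made only for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temporary_folder:
    file_path = os.path.join(temporary_folder, "example.txt")

    # "w" creates the file and writes the first line
    with open(file_path, "w") as file_handle:
        file_handle.write("first line\n")

    # "a" keeps the existing content and adds to the end
    with open(file_path, "a") as file_handle:
        file_handle.write("second line\n")

    # "r" only reads
    with open(file_path, "r") as file_handle:
        content_after_append = file_handle.read()

    # "w" again: the previous content is deleted before writing
    with open(file_path, "w") as file_handle:
        file_handle.write("only line\n")

    with open(file_path, "r") as file_handle:
        content_after_overwrite = file_handle.read()

print(repr(content_after_append))
print(repr(content_after_overwrite))
```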

**.readline( )**

Next we see that `first_line = file_handle.readline()` reads one line from the `file_handle` and stores it in the variable `first_line`. In the example, we only execute `.readline()` once, so we only read the first line in the file. What if you want to read multiple lines? In that case, you can read the file inside a `for` loop (we’ll see the syntax in the following example). There is also a method called `.readlines()`, the plural, that reads all the lines in the file at once, but it’s not advised when you work with very large files, as it loads everything into memory! It’s good for you to know this method exists, but to stay out of trouble, let’s only use the first approach and always read one line at a time. Slow and steady wins the race 😉

**repr( )**

Even if the `print()` function doesn't show it, there is a `\n` at the end of each line in text files.
In the example, we use the function `repr()` to show it.
We'll explain a bit what `repr()` does now, but the main point for you to remember is that at the end of each line of text in a file, there is a *hidden* character `\n`.

Notice how the first print command (`print(first_line)`) formats the string in `first_line`, so instead of showing `\n` at the end of the line, it **adds** a new line. Conversely, the second print command (`print(repr(first_line))`) uses `repr()` to get the *printable representation* of the string. This means that instead of interpreting the escape characters inside the string, `print()` shows all the characters in it, that is, its *raw* content.
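
Since the effect does not depend on the file, you can also try it with a string that you type yourself. In this small sketch, `example_line` is a made-up string that ends with `\n`, just like a line read from a file.

```python
example_line = "Hello world\n"

print(example_line)        # print() turns "\n" into an actual new line
print(repr(example_line))  # repr() shows the "\n" as written in the string

# The hidden character counts as one character of the string:
number_of_characters = len(example_line)
print(number_of_characters)
```

The string has 11 visible characters, but `len()` returns 12 because the `\n` at the end is a character too.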

#### 5.2.2. Transform

Now let's start doing some processing of the text in this file. As an exercise, let's create a list with the SMS texts and a list with the corresponding *classes* (*ham* or *spam*). **Attention:** in the context of *Python programming language*, a *class* is a code structure. In the context of *Machine Learning*, a *class* is the category of something. In the latter case, the SMS text may belong to the class *ham* or to the class *spam*.

**String parsing**

The first step is to know how to process, or **parse**, each line of text. In the previous cell we saw the first line in the file, so assuming all lines have the same structure, we can create a program that:

1. Splits the SMS text from the SMS class using the string `". The class of this SMS is: "` as a separator.
2. The SMS text is ready, so now we just need to remove the `\n` trailing the class name `ham`.
3. Then store the SMS in one list and its class in another.

In [786]:
# Important:
# Declare the lists before you use them,
# otherwise you'll get an error:
sms_texts = []
sms_classes = []

with open("../data/datasets/spam.txt", "r") as file_handle:

    counter = 0
    for line in file_handle:

        sms_text, sms_class = line.split(". The class of this SMS is: ")

        sms_class = sms_class[:-1]  # Remove "\n"

        sms_texts.append(sms_text)
        sms_classes.append(sms_class)

        counter += 1

print(f"Total number of instances: {counter}")
print(f"First SMS:\n{sms_texts[0]}")
print(f"First class:\n{repr(sms_classes[0])}")


All the elements in the program above are already known to us. Let's go briefly over them to make sure everything is clear.

- First we initialize our data structures, which are the two lists where we'll store our clean data after processing it. We must always initialize data structures before modifying them, otherwise, Python will produce an error.
- Then we open the file and start reading its text, line by line. The variable `file_handle` is an **iterator**, so when we place it in a `for` loop it returns item after item until no more elements are left.
- Next, we split each text line with a string **pattern**. In our case, we know that the same text is always repeated between each SMS and its class: `". The class of this SMS is: "`, including a space at the end of the string.
- Afterwards, we remove the last character `\n` from the string `ham\n` and store the *cleaned* string `ham` back in the variable `sms_class`. For this we *slice* the string; remember when we saw *slicing*? We use the command `sms_class[:-1]`, which is the same as `sms_class[0:-1]`, so the start index is `0` and the end index is `-1`. In other words, we keep all the characters in the string, from index `0` and up to, *but not including*, the last index to remove the last character `\n`.
- Finally, we append the string `sms_text` in the list `sms_texts`, and the string `sms_class` in the list `sms_classes`.
- Notice that before the `for` loop, we initialize a `counter` and then increase its value by `1` in each iteration of the loop. In this way, we can count how many lines are in our text file.
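One caveat about the slicing step: `sms_class[:-1]` always drops the last character, whatever it is. The sketch below, using two made-up lines, compares it with the string method `.rstrip("\n")`, which removes the newline only if it is present. This matters when the final line of a file has no trailing `\n`.

```python
# Made-up lines; the last one has no trailing "\n",
# as can happen with the final line of a file.
lines = [
    "Free entry now. The class of this SMS is: spam\n",
    "See you at home. The class of this SMS is: ham",
]
separator = ". The class of this SMS is: "

sliced = []
stripped = []
for line in lines:
    sms_text, sms_class = line.split(separator)
    sliced.append(sms_class[:-1])            # always drops the last character
    stripped.append(sms_class.rstrip("\n"))  # drops "\n" only if it is there

print(sliced)    # ['spam', 'ha']
print(stripped)  # ['spam', 'ham']
```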

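Alternatively, `enumerate()` can keep track of the line position for us and replace the manual `counter`: `for index, line in enumerate(file_handle):`. Since `index` starts at `0`, the total number of lines is the last `index` plus `1`.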

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Replace the two lists `sms_texts` and `sms_classes` with four new lists:
  - A list for storing the text of ham messages.
  - A list for storing the text of spam messages.
  - A list for storing the class of ham messages.
  - A list for storing the class of spam messages.
3. Count how many *ham* and how many *spam* elements are in the dataset and print the answers.
4. Run the cell to show the results.

- **Going further**: Instead of having four separate lists, create a dictionary with four keys. Each key should be a string with the name of the variable, its corresponding value should be the list with data. For example, `sms_texts = []` and `sms_classes = []` would become `data_dictionary = {"sms_texts": [], "sms_classes": []}`. Therefore, `sms_texts.append(sms_text)` would become `data_dictionary["sms_texts"].append(sms_text)`.

#### 5.2.3. Write

The last step of our text data processing is to save the data.
Below you can see that we repeat the same processing we did before, and afterward we create two new files to save the SMS texts and the SMS classes separately.

Also, notice that when we write each line, we append a new line character `\n` at the very end, so that the next line we write starts on a new line below. There are two ways of adding `\n` at the end of the SMS text: inserting the SMS text into a formatted string that ends with `\n`, or appending `\n` to the SMS text.
To insert it, we can use the formatting method that we have used before: `f"{sms_text}\n"`, and to append it we can use the `+` sign, as it concatenates the two strings on its left and right sides: `sms_text + "\n"`.
Both methods produce the same result, so we use the `+` method, as in this case it is the simplest.
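A quick check of both options, using a made-up SMS text. The second half goes slightly beyond the text above and is only a side note: the two options stop being interchangeable when the value is not a string.

```python
sms_text = "See you at home"  # made-up SMS text

with_fstring = f"{sms_text}\n"  # insert the text into a formatted string
with_plus = sms_text + "\n"     # concatenate two strings

print(repr(with_fstring))
print(repr(with_plus))
print(with_fstring == with_plus)

# Side note: the two approaches differ when the value is not a string.
counter = 42
line_from_number = f"{counter}\n"  # the f-string converts the number to text
try:
    counter + "\n"                 # adding an int and a str is not allowed
    plus_worked = True
except TypeError:
    plus_worked = False
print(repr(line_from_number), plus_worked)
```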

In [787]:
# Preprocess data:
sms_texts = []
sms_classes = []
with open("../data/datasets/spam.txt", "r") as file_handle:
    for line in file_handle:
        sms_text, sms_class = line.split(". The class of this SMS is: ")
        sms_class = sms_class[:-1]
        sms_texts.append(sms_text)
        sms_classes.append(sms_class)

# Save data:
filename = "../data/datasets/sms_texts.txt"
data_list = sms_texts
with open(filename, "w") as file_handle:
    for item in data_list:
        line = item + "\n"
        file_handle.write(line)

# Save data:
filename = "../data/datasets/spam_classes.txt"
data_list = sms_classes
with open(filename, "w") as file_handle:
    for item in data_list:
        line = item + "\n"
        file_handle.write(line)


You can also see that instead of typing the file path inside the `open()` function, we define it as a variable above it. Also, instead of using a different variable name for the list inside the `for` loop, we pass it the same variable name `data_list`.
This optimization of code to make it more reusable is called code **refactoring**, and replacing values inside pieces of code with variables is called **extracting** parameters.
More concretely, we can reuse the code for writing a file if we specify the `filename` and the `data_list`.

#### Exercise:
1. Copy the code of the cell above into the cell below.
2. Below the code, create a *function* with the arguments `filename` and `data_list`.
3. Below the function definition, call the function two times. The first time, provide it with the arguments to save the SMS texts, and the second time, provide it with the arguments to save the SMS classes. For example, if you name the function `save_data()`, you should call it as follows: `save_data(filename, data_list)`.
4. Run the cell to execute the code.

- **Going further**: If you feel more adventurous, create a class called `DataCenter`. The class should have two methods:
  1. `preprocess_sms()` reusing the code we wrote for processing our SMS data. It should take as parameters the `filename` to open, and the `cut_pattern` used to split the *text* from the *class*. Finally, it should return two lists: `sms_texts` and `sms_classes`.
  2. `save_data()` reusing the code we wrote for saving text data.

Then create an instance of the class, call the `preprocess_sms()` method once, and call the `save_data()` method twice to save the SMS texts and the SMS classes.

### 5.3. CSV Data

Reading and writing took lots of steps in the code we used above. Considering that all these steps are repeated often, it would be a good idea to standardize the read-and-write procedures in a single module. What about standardizing also the *cut pattern* that we use to separate the different features in our data? This is where Comma-Separated Values or **CSV** files come in handy. Furthermore, to simplify the read-and-write procedures, we'll use **Pandas**, an external library that specializes in tabular data. Normally, you have to install external libraries manually, but in our case, Anaconda comes with all the Data Science libraries we need, including Pandas.

So, let's get into coding and import this useful Pandas library.

#### 5.3.1. Read

Let's go straight to an example.

In [788]:
import pandas as pd

data = pd.read_csv("../data/datasets/spam.csv")

print(f"Type of data: {type(data)}")
data.head()


Done! Great, isn't it? 😊

When we write `import pandas as pd`, the `pd` is just a way to assign a shorter name to the module. This helps to make the code look cleaner and saves time when writing lots of code. For example, when calling the method to read CSV files, now we can write `pd.read_csv()` instead of the longer `pandas.read_csv()`.

Finally, when you call the method `.head()` of the object `data`, it displays the top five rows of your data in a pretty-looking table that makes it easier to inspect the data. By the way, the data type of the object returned by `pd.read_csv()` is called `DataFrame`. You can inspect it with the function `type()`, as we did in the cell above.

#### 5.3.2. Transform

Below you can see how `spam.txt` differs from `spam.csv`, the CSV version of our dataset. We print two lines of the CSV file because the first line contains the name of each feature (`sms` and `class`). In the second line you can see that the cut pattern `. The class of this SMS is: ` is gone. This pattern is no longer needed, as it has been replaced by other markers: the features (texts and classes) are placed between double quotes (`" "`), and each feature is separated from the next with a comma (hence the name CSV). This means that we no longer need to transform our data; this is already done when we use CSV conventions.
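Before looking at the real files, here is a rough sketch of why the double quotes matter. It uses Python's built-in `csv` module, which is not used elsewhere in this notebook (Pandas does the equivalent work for us), and two made-up rows. The quotes let a text contain commas without being split into extra columns.

```python
import csv
import io

# Two made-up rows in CSV form; the header names follow the text above.
csv_text = (
    "sms,class\n"
    '"Hello, are you home?",ham\n'
    '"WIN a prize, call now",spam\n'
)

# io.StringIO lets us treat the string as if it were an open file.
reader = csv.reader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    print(row)
```

Each row comes back as a list with exactly two items, even though the SMS texts contain commas of their own.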

In [789]:
with open("../data/datasets/spam.txt", "r") as file_handle:
    line = repr(file_handle.readline())
    print(line, "\n")

with open("../data/datasets/spam.csv", "r") as file_handle:
    line = repr(file_handle.readline())
    print(line)
    line = repr(file_handle.readline())
    print(line)


#### 5.3.3. Write

Finally, we can also save our features (texts and classes) as individual files. For this, we select the respective column in our dataframe and use Pandas to save it.

#### Note:
We are importing pandas again and reading the data again, even though we already did this in the previous cell and the data is already in memory. We did the same when we defined our custom functions and we’ll keep repeating these loads and definitions. Why? Simply to make the cells in the notebook stand-alone, so you can run them independently and without having to run all the cells above first.

In [790]:
import pandas as pd

data = pd.read_csv("../data/datasets/spam.csv")

data.to_csv("../data/datasets/sms_texts.csv", columns=["sms"], index=False)
data.to_csv("../data/datasets/sms_classes.csv", columns=["class"], index=False)


Two lines of code, amazing. The `.to_csv()` method can take additional parameters, like `columns` to indicate a list with the names of the columns that you want to save, or `index` to indicate if you want to add index numbers in your CSV file or not. How can we know all the parameters that are possible to pass to this method? The best way to understand a library is through its documentation. It may look a bit complex, but after a few minutes, you will become comfortable reading it.

For example, [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html) you can find the documentation for `.to_csv()`. At the top, you can see the definition of the method, and the default value for its arguments. Below you will find a detailed description of each parameter and at the bottom some examples.
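The definition shown at the top of a documentation page mirrors information that Python itself stores about a function. As a small sketch, the snippet below defines a hypothetical function `save_lines()` (invented for this illustration only) and uses the standard-library module `inspect` to list its parameters and default values.

```python
import inspect


# Hypothetical function, defined only for this illustration.
def save_lines(filename, data_list, add_newline=True, encoding="utf-8"):
    """Write each item of data_list to filename."""
    with open(filename, "w", encoding=encoding) as file_handle:
        for item in data_list:
            file_handle.write(item + "\n" if add_newline else item)


# The signature lists every parameter and its default value, if any.
signature = inspect.signature(save_lines)
print(signature)

for name, parameter in signature.parameters.items():
    if parameter.default is inspect.Parameter.empty:
        print(f"{name}: required")
    else:
        print(f"{name}: default = {parameter.default!r}")
```

Parameters without a default value must always be provided, while the others are optional. The built-in function `help()` prints similar information together with the function's description.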

Part of being a software developer is reading library documentation and searching for answers in technical forums. Ask any software developer, and they will tell you they do this on a daily basis! It takes a bit of time to get used to these kinds of documents and navigate them confidently, but once you come to grips with their common structure, it becomes much easier and more enjoyable. Seriously, it's a promise 😉

So let's get some practice and do the exercise below.

#### Exercise:
1. Google "pandas read excel" and "pandas write dataframe to excel", and find the pages in the official Pandas documentation that describe these two methods.
    - **Tip:** The main page of the official Pandas documentation is https://pandas.pydata.org/pandas-docs/stable/reference/
2. At the top of the documentation pages, you can find the **signature** of the methods. The signature shows you the parameters that you can pass to the methods. Read the arguments that each function has, and notice the default values of each.
3. At the bottom of the documentation pages you can find examples of how to use the methods.
4. In the cell below, load the file `"../data/datasets/spam.xlsx"` and
    - save the column `"sms"` in a file called `"../data/datasets/sms_texts.xlsx"`
    - save the column `"class"` in a file called `"../data/datasets/sms_classes.xlsx"`

### 5.4. JSON Data

Finally, we'll see another popular format used to transfer data between processes, databases, and to communicate between different programming languages: JSON (JavaScript Object Notation).
More concretely, JSON is a standard file and data-interchange format that uses human-readable text to store and transmit data objects consisting of **attribute–value pairs** (like **key-value pairs** in Python dictionaries) and **arrays** (like **lists** in Python).
The good news is that the JSON format is very similar to Python dictionaries, so you are already familiar with its structure.
When the top level of a JSON document is a single object, Python converts it into a dictionary, and when it is an array of objects, Python converts it into a list of dictionaries. These are examples of JSON objects:

```
single_user = {
    "name": "John",
    "last_name": "Smith",
    "age": 24,
    "hobby": {
        "reading" : true,
        "gaming" : false,
        "sport" : "football"
    },
    "children" : ["Peter", "Laura"]
}

multiple_users = [
    {"name": "John", ...},
    {"name": "Jessica", ...},
    {"name": "Peter", ...}
]
```

Let's go over some subtle differences between JSON and Python:
- What in Python is a dictionary (`dict`) is called an `Object` in JSON.
- A `list` in Python is called an `Array` in JSON.
- The value `None` in Python is called `null` in JSON.
- In Python, `True` and `False` have the first letter capitalized, but in JSON they are all lowercase (`true` and `false`).

| Python |  JSON  |
|:------:|:------:|
|  dict  | Object |
|  list  |  Array |
|  True  |  true  |
|  False |  false |
|  None  |  null  |
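The conversions in the table can be observed with the standard-library module `json`, which can decode a text string with `json.loads()` and encode a Python object with `json.dumps()`. The JSON text below is a shortened version of the user example above; the key `"car"` is made up to show a `null` value.

```python
import json

# JSON text: note the lowercase true/false and the null value.
json_text = """
{
    "name": "John",
    "age": 24,
    "hobby": {"reading": true, "gaming": false},
    "children": ["Peter", "Laura"],
    "car": null
}
"""

user = json.loads(json_text)  # decode JSON text into Python objects

print(type(user))              # Object -> dict
print(type(user["children"]))  # Array  -> list
print(user["hobby"]["reading"], user["hobby"]["gaming"], user["car"])

# The conversion also works in the other direction.
back_to_json = json.dumps({"reading": True, "car": None})
print(back_to_json)  # {"reading": true, "car": null}
```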

#### 5.4.1. Read

To load JSON data into a Python dictionary (or into a list of dictionaries when the file contains multiple objects), you need to import the module `json`.
You can use this module to read a file or to decode information in a text string.
To start, let's read the same dataset we used in the CSV example (the SMS ham/spam one 😉), but in JSON format.
For this, you can use the function `json.load()`.

In [791]:
import json

# Read a FILE:
with open("../data/datasets/spam.json", "r") as file_handle:
    data = json.load(file_handle)
    print(f"The type of data is: \n {type(data)}")
    print(f"The number of elements in the data is: \n {len(data)}")
    print(f"The first element in the data is: \n {data[0]}")


Wow, the first element in the data looks crammed and is not an easy read! But as you should know by now, in programming languages, there’s always a solution 😉. If you would like to improve the way dictionaries are printed, you can import the class `PrettyPrinter` from the module `pprint`. Once you import this class:
- Create an instance of it in the same way you created instances of classes before. For example, `pp = PrettyPrinter()` creates an instance called `pp`.
- Then, use the instance object to call the method `pprint()`. For example, you can execute `pp.pprint(dictionary)` to print more *prettily* the contents of the dictionary.

Take a look at the following example:

In [792]:
import json

from pprint import PrettyPrinter
pp = PrettyPrinter()

# Read a FILE:
with open("../data/datasets/spam.json", "r") as file_handle:
    data = json.load(file_handle)
    print("The first element in the data is:")
    pp.pprint(data[0])


Much better, isn’t it? You can now instantly see the `class` in the first line, and separately the `sms` text on the second.

#### 5.4.2. Transform

Great, the data is in a dictionary and you can read it, so why don’t we look at how to process it? For example, let’s add a feature called `binary_class` with the value `0` when the text is `spam`, and with the value `1` when the text is `ham`. For this, we can iterate over our list of dictionaries with a *for* loop, or use Pandas.

Frequently, data has missing values, so it could be the case that some texts have no `class` value.
It is important to catch these cases and decide what to do with them, otherwise your program could crash, for example when trying to add a number to a `None`.
There are multiple alternatives to solve this issue. You could remove the rows that have any missing value (although you may end up with no data!), or you could replace them with educated guesses, for example with the average value in the column, or the average between the two adjacent rows. This process of filling in missing values is called **imputation**.
Observe in both of the methods below the alternative ways to verify that your data is complete.

In [793]:
# Iterating over a list of dictionaries with a for loop

import json

from pprint import PrettyPrinter
pp = PrettyPrinter()


# LOAD the data:
with open("../data/datasets/spam.json", "r") as file_handle:
    data = json.load(file_handle)

# INSPECT the data:
print(f'The type of "data" is:\n  {type(data)}')
print(f'The type of each item in "data" is:\n  {type(data[0])}')
print("\nThe first element in the ORIGINAL data is:")
pp.pprint(data[0])

# TRANSFORM the data:
for index, item in enumerate(data):
    if item["class"] == "spam":
        data[index]["binary_class"] = 0
    elif item["class"] == "ham":
        data[index]["binary_class"] = 1
    else:
        # Warn if there are missing values:
        print(f'The class must be "spam" or "ham". The class of item {index} is: {item["class"]}')

# VERIFY the data:
print("\nThe first element in the TRANSFORMED data is:")
pp.pprint(data[0])


When we create our new `binary_class` feature, we want to make it explicit that we are modifying the original `data` object, rather than the loop variable `item`.
To do this, we use the function `enumerate()` that we introduced in notebook *2. Flow and Functions*.
This function helps us because it returns the `index` and the `item` of the list in each iteration, so we evaluate our logical conditions on the `item`, and then we modify the corresponding item in `data` by pointing to it with its `index`. (Because each `item` is a dictionary, which is a mutable object, writing `item["binary_class"] = 0` would update the very same dictionary; indexing into `data` simply makes the target explicit.)
Notice that you can use consecutive pairs of brackets `[]` to go deeper in the data structure. For example:
- `data` returns a list.
- `data[index]` returns a dictionary.
- `data[index]["binary_class"]` returns a number.
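The sketch below repeats the loop on three made-up records. It checks that `item` and `data[index]` refer to the same dictionary, so a change made through either name is visible in the list. It also uses the dictionary method `.get()`, which is not part of the cell above and is an assumption about how you might handle a record that has no `class` key at all.

```python
# Made-up records; the last one has no "class" key.
data = [
    {"sms": "WIN a prize now", "class": "spam"},
    {"sms": "See you at home", "class": "ham"},
    {"sms": "Call me later"},
]

same_object = []
for index, item in enumerate(data):
    # item and data[index] are two names for the same dictionary.
    same_object.append(item is data[index])

    sms_class = item.get("class")  # returns None instead of raising KeyError
    if sms_class == "spam":
        item["binary_class"] = 0
    elif sms_class == "ham":
        item["binary_class"] = 1
    else:
        print(f"Item {index} has no valid class: {sms_class!r}")

print(same_object)
print(data[0])
print(data[2])
```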

Now let's see how to perform the same procedure in a simplified way by using Pandas.

In [794]:
# Applying a function to a Pandas data frame

import pandas as pd


# LOAD the data:
data = pd.read_json("../data/datasets/spam.json")

# INSPECT the data:
print(data.head(), "\n")

# TRANSFORM the data:
data["binary_class"] = data.apply(
    lambda row: 0 if row["class"] == "spam" else 1,
    axis=1
)

# VERIFY the data:
print(data.head(), "\n")

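# Warn if there are missing values in the original column
# (the lambda above assigns 1 to anything that is not "spam",
# so "binary_class" itself can never be missing):
nan_values = data["class"].isnull().sum()
print(f"Number of missing values: {nan_values}")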


Let’s unpack what we’ve done. Here we use the method `.read_json()` to load the JSON data directly into a Pandas data frame named `data`.
Afterward, in `# TRANSFORM the data` we create a new column with the same notation used to create a new *key-value* pair in a dictionary: `dictionary["new_key"] = new_value` or `pandas_dataframe["new_column"] = new_series_of_values`.

To create the new series of values that will be assigned to our new column `binary_class`, we use the method `.apply()` on `data`.
This method applies a **function** iteratively over the **columns** or the **rows** of a Pandas data frame.

In the first parameter, we pass the function to `.apply()` with a syntax called **lambda function**.
Lambda functions are the same as regular functions, but have a shortened syntax that makes our code shorter and easier to read.
The syntax of *lambda* functions is: `lambda input: computation`. Notice that you start with the keyword `lambda` to define the function, and the result you get for each row is the value returned by the `computation`.
In our case, the computation is `0 if row["class"] == "spam" else 1`, so our lambda function returns `0` when the row `class` is `spam`, and returns `1` otherwise, which in our dataset means the row `class` is `ham`.
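To make the equivalence concrete, the sketch below writes the same computation as a regular function and as a lambda function. Plain dictionaries stand in for the rows of a data frame here, which is a simplification for illustration.

```python
# Regular function
def to_binary(row):
    return 0 if row["class"] == "spam" else 1


# Same computation written as a lambda function
to_binary_lambda = lambda row: 0 if row["class"] == "spam" else 1

# Made-up rows; the last one has a missing class.
rows = [{"class": "spam"}, {"class": "ham"}, {"class": None}]

from_function = [to_binary(row) for row in rows]
from_lambda = [to_binary_lambda(row) for row in rows]

print(from_function)  # [0, 1, 1]
print(from_lambda)    # [0, 1, 1]
```

Notice that the row with a missing class also gets the value `1`, because the `else` branch catches everything that is not `spam`.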

In the second parameter passed to `.apply()`, we indicate that we want to iterate over rows.
`.apply()` iterates over columns when its parameter `axis=0`, and iterates over rows when its parameter `axis=1`.
As we want to check the value of `class` in each row, we set `axis=1`.

And there you go, you now know how to process data with JSON, and scan any dataset for missing values. 😉
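The two strategies for missing values mentioned earlier in this section, removing incomplete rows or imputing them with an average, can be sketched in plain Python. The numeric feature below (SMS lengths) is made up, since our dataset only contains text and classes.

```python
# Made-up numeric feature with missing values (None).
sms_lengths = [120, None, 80, 100, None]

# Option 1: remove the missing values.
dropped = [value for value in sms_lengths if value is not None]

# Option 2: impute them with the average of the known values.
average = sum(dropped) / len(dropped)
imputed = [average if value is None else value for value in sms_lengths]

print(dropped)  # [120, 80, 100]
print(imputed)  # [120, 100.0, 80, 100, 100.0]
```

Removing rows shrinks the dataset, while imputation keeps its size at the cost of introducing guessed values.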

#### 5.4.3. Write

Finally, our new data is ready, and the next step is to save it back as a JSON. These are the methods for storing dictionaries and Pandas data frames:

In [795]:
# Save a dictionary as a JSON

import json


with open("../data/datasets/spam.json", "r") as file_handle:
    data = json.load(file_handle)

for index, item in enumerate(data):
    if item["class"] == "spam":
        data[index]["binary_class"] = 0
    elif item["class"] == "ham":
        data[index]["binary_class"] = 1
    else:
        pass

# SAVE the data:
with open("../data/datasets/spam_dictionary.json", "w") as file_handle:
    json.dump(data, file_handle, indent=2)


To keep this example about saving data clear, we skip the step of checking for missing values, but you should always perform it. Flow control elements (`for`, `if`, `else`, etc.) can’t be left empty; they must always contain an instruction. Therefore, we write `pass` to tell Python that we don’t want to do anything and simply continue to the next step in the code.
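A tiny sketch of `pass` in action, with a made-up list of classes. Without an instruction in the `else` block Python would refuse to run the code, and `pass` fills that spot without doing anything.

```python
classes = ["spam", "ham", None]  # made-up values, one of them missing

binary_classes = []
for sms_class in classes:
    if sms_class == "spam":
        binary_classes.append(0)
    elif sms_class == "ham":
        binary_classes.append(1)
    else:
        pass  # deliberately do nothing for unknown classes

print(binary_classes)  # [0, 1]
```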

In [796]:
# Save a Pandas data frame as a JSON

import pandas as pd


data = pd.read_json("../data/datasets/spam.json")

data["binary_class"] = data.apply(
    lambda row: 0 if row["class"] == "spam" else 1,
    axis=1
)

# SAVE the data:
data.to_json("../data/datasets/spam_dataframe.json", indent=2, orient="records")


The last line shows the method to save a Pandas data frame as a JSON, and it has two additional parameters.
`indent=2` adds two indentation spaces to make the JSON file easier to read, and `orient="records"` tells Pandas to store the data as a list of records, with one JSON object per row.
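The effect of `indent` can be previewed without writing any file by using `json.dumps()`, which returns the JSON text as a string. The two records below are made up, and they are arranged as a list with one dictionary per row.

```python
import json

# Made-up records: a list with one dictionary per row.
records = [
    {"sms": "WIN a prize now", "class": "spam", "binary_class": 0},
    {"sms": "See you at home", "class": "ham", "binary_class": 1},
]

compact = json.dumps(records)             # everything on a single line
indented = json.dumps(records, indent=2)  # one key per line, 2-space indent

print(compact)
print(indented)
```

Both strings contain exactly the same data; the indentation only changes how easy the text is for a human to read.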

JSON is a universal translator between programming languages and you'll encounter it often when receiving and sending data. Now you are ready to start communicating with databases around the world! 😎

#### Exercise:
1. Open the JSON file in a simple text editor. Don't use Excel, but instead use the default text editors **TextEdit** on Mac, or **Notepad** on Windows.
2. Open the [official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html) for `.to_json()` and find the alternative values that you can pass to the parameter `orient`.
3. Copy the code of the cell above into the cell below.
4. Save the data with all the different `orient` values (one line of code per value) and compare the output format in your text editor.
5. Run the cell to execute the code.