# Introduction to Python

## Python and R

Source: 2024 Stack Overflow Developer Survey

### How popular are Python and R?

<img height="600px" src="https://drive.usercontent.google.com/download?id=1Nheb2DJzQU9fAFi8aNCAE4sVRTd3AKUH" />

### Which one is "better" according to developers?

<img height="200px" src="https://drive.usercontent.google.com/download?id=1YmfBS2AMN-hndfTjHXlK-3s0MGKbfsx-" /> <img height="200px" src="https://drive.usercontent.google.com/download?id=13bs8OlOze-6tdtonzGBCS3KWepo7fass" />

### Which one is associated with higher salaries?

<img height="400px" src="https://drive.usercontent.google.com/download?id=1kdU-HQ9KLaTLwcCZljx4nc9nSJIFnFvJ" />

# Dealing with numbers

## Using Python as a calculator

All the common mathematical operations are available.
Remarks:

* Exponentiation is performed with the `**` operator. I.e., $a^b$ is `a ** b`. Contrast this with languages such as R, which use the `^` operator; in Python, `^` is the bitwise XOR operator.
* The exponent need not be an integer. For example, raising to the power of `0.5` gives you the square root.

In [None]:
3 + 2

5

In [None]:
3 * 4

12

In [None]:
3 / 2

1.5

In [None]:
3 ** 2

9

In [None]:
3 ** 0.5

1.7320508075688772

You will rarely use numbers (or other types of data) directly.
Rather, you will likely store them in variables and perform operations using these variables.

Remark: you can use underscores as thousands separators to improve readability when dealing with large number literals.

In [None]:
revenue = 100_000
costs = 40_000
profit = revenue - costs

Remark: the notebook will print the result of the last expression in a cell.
Writing the name of a variable by itself is a valid expression.
Therefore, it will be evaluated and its result (the value held by the variable) will be printed.

In [None]:
profit

60000

In [None]:
tax_rate = 0.15
net_profit = profit * (1 - tax_rate)

In [None]:
net_profit

51000.0

## Integer vs. floating-point numbers

In Python there are two main number types.
The first is integer numbers (`int`), and the second is floating-point numbers (`float`).

Variables `revenue` and `costs` are integers because we defined them as such.
Variable `profit` is also an integer because it is the difference of two integer variables.
Variable `tax_rate` is a float by definition.
Finally, variable `net_profit` is a float (although it contains a whole number) because it is the product of an integer and a floating-point number.

In [None]:
type(profit)

int

In [None]:
type(net_profit)

float
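The rule "an operation involving a float gives a float" has one notable extension: the `/` operator returns a float even when both operands are integers. The following is a small sketch of these cases; the floor-division operator `//` is not covered in the text above and is shown here only for comparison.

```python
# True division (/) always produces a float, even when the result is whole.
quotient = 6 / 3
print(quotient)        # 2.0

# Floor division (//) between two ints stays an int.
floor_quotient = 7 // 2
print(floor_quotient)  # 3

# As soon as one operand is a float, the result is a float.
mixed = 2 * 1.5
print(mixed)           # 3.0
```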

If you want `net_profit` as an integer number, you have to explicitly convert it using `int()`.

In [None]:
net_profit, int(net_profit)

(51000.0, 51000)

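Remark: `int()` does not round to the nearest integer! It discards the fractional part (truncation towards zero), which for positive numbers is the same as rounding down.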

In [None]:
int(1.1), int(1.9), int(1.5)

(1, 1, 1)
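For positive numbers, dropping the fractional part coincides with rounding down, but for negative numbers it does not. The sketch below compares `int()` with two alternatives that are not discussed above: `math.floor()` from the standard library and the built-in `round()`.

```python
import math

# int() drops the fractional part, i.e., it truncates towards zero.
truncated_positive = int(1.9)
truncated_negative = int(-1.9)
print(truncated_positive, truncated_negative)  # 1 -1

# math.floor() always rounds down (towards minus infinity).
floored_negative = math.floor(-1.9)
print(floored_negative)                        # -2

# round() goes to the nearest integer.
rounded_positive = round(1.9)
rounded_negative = round(-1.9)
print(rounded_positive, rounded_negative)      # 2 -2
```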

# Dealing with text

## Strings

There are three main ways of writing strings in Python: single-quoted strings, double-quoted strings, and f-strings.

As the names suggest, single- and double-quoted strings differ by being delimited by single quotes (`'`) and double quotes (`"`).

A single-quoted string can contain double quotes inside it, but any single quotes must be escaped with a backslash.
Similarly, a double-quoted string can contain single quotes inside it, but any double quotes must be escaped with a backslash.

For example, `'Is this "real" food?!'` is a valid single-quoted string; `"Is this 'real' food?!"` is a valid double-quoted string; `"Is this \"real\" food?!"` is also a valid double-quoted string.
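The choice of delimiter only affects how you type the string, not the string itself. A quick sketch to check this (the variable names are made up for illustration):

```python
# The same text written with the two kinds of delimiters.
single = 'Is this "real" food?!'
double = "Is this \"real\" food?!"
print(single == double)  # True

# The same holds for an apostrophe: escaped inside single quotes,
# written as-is inside double quotes.
apostrophe_escaped = 'It\'s real food.'
apostrophe_plain = "It's real food."
print(apostrophe_escaped == apostrophe_plain)  # True
```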

Finally, f-strings are so called because they are preceded by a literal `f` before the opening (single or double) quote.
They are very useful for interpolating variable values within strings.
The variables (or, better said, the expressions) are enclosed within curly braces.
The expressions are evaluated and the result is converted to a string and inserted in the f-string.

In [None]:
name = "Jean"
surname = 'Valjean'

print(f"Hello, {name} {surname}.")

Hello, Jean Valjean.


In [None]:
print(f"Hello, {name.upper()} {surname.upper()}.")

Hello, JEAN VALJEAN.


## Formatting numbers within f-strings

You can interpolate numbers within f-strings.

In [None]:
print(f"The net profit is: {net_profit}")

The net profit is: 51000.0


Furthermore, you can specify [various format specifiers](https://docs.python.org/3/library/string.html#formatspec) to improve how these numbers are shown to the user.

In [None]:
# Use zero decimal digits for floating point numbers
print(f"The net profit is: {net_profit:.0f}")

The net profit is: 51000


In [None]:
# Additionally, use the thousands separator
print(f"The net profit is: {net_profit:,.0f}")

The net profit is: 51,000


In [None]:
print(f"The tax rate is: {tax_rate}")

The tax rate is: 0.15


In [None]:
# Format a decimal as a percentage
print(f"The tax rate is: {tax_rate:%}")

The tax rate is: 15.000000%


In [None]:
# Format as a percentage with zero decimal digits
print(f"The tax rate is: {tax_rate:.0%}")

The tax rate is: 15%


# Logical expressions and operators

A logical expression is a statement that can be true or false. Logical operators (such as "and" and "or" in plain English) combine logical expressions into new ones.

In [None]:
net_profit

51000.0

Remark: the most common numerical comparison operators are `>` (strictly greater than), `>=` (greater than or equal to), `<` (strictly less than), `<=` (less than or equal to), `==` (equal to), and `!=` (not equal to).

In [None]:
# Logical expression: is net_profit strictly larger than 10,000?
net_profit > 10_000

True

In [None]:
net_profit <= 12_000

False

In [None]:
net_profit == 51_000

True

Python allows chaining comparison operators.

In [None]:
10_000 <= net_profit <= 40_000

False

Remark: the most common logical operators are `and`, `or`, and `not`.

In [None]:
net_profit > 40_000 and tax_rate < 0.25

True

In [None]:
net_profit <= 10_000 or tax_rate >= 0.3

False
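The `not` operator is the only one of the three that is not shown in the cells above. Here is a minimal sketch; the two variables are redefined with the same values used earlier so that the snippet runs on its own.

```python
net_profit = 51_000.0
tax_rate = 0.15

# `not` flips a logical expression: True becomes False and vice versa.
is_small = net_profit <= 10_000
print(is_small)      # False
print(not is_small)  # True

# Parentheses make it explicit what `not` applies to.
negated = not (net_profit > 40_000 and tax_rate < 0.25)
print(negated)       # False
```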

# The first data structure: lists

Lists are sequences of elements accessed by index.
The indices start from 0 (indicating the first element of the list), and proceed sequentially with 1 (the second element), 2 (the third element), etc.

In [None]:
surnames = ['Valjean', 'Javert', 'Myriel']
ages = [40, 38, 62]

In [None]:
surnames

['Valjean', 'Javert', 'Myriel']

In [None]:
ages

[40, 38, 62]
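Accessing an element by its index uses square brackets. The sketch below redefines `surnames` with the same values so that it runs on its own; the negative index and `len()` are small extras that are not discussed in the text above.

```python
surnames = ['Valjean', 'Javert', 'Myriel']

# Index 0 is the first element, index 2 is the third one.
first = surnames[0]
third = surnames[2]
print(first, third)  # Valjean Myriel

# len() gives the number of elements; the last valid index is len() - 1.
count = len(surnames)
print(count)         # 3

# A negative index counts from the end: -1 is the last element.
last = surnames[-1]
print(last)          # Myriel
```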

You can add new elements at the end of a list with the `append` method.

In [None]:
surnames.append('Pontmercy')

In [None]:
surnames

['Valjean', 'Javert', 'Myriel', 'Pontmercy']

In [None]:
ages.append(26)

In [None]:
ages

[40, 38, 62, 26]

The elements of a list need not all be of the same type. Python allows mixed lists.

In [None]:
mixed_list = ['Barcelona', 24, True, 5.6]

In [None]:
mixed_list

['Barcelona', 24, True, 5.6]

To check whether a list contains an element you can use the `in` operator.

In [None]:
'Valjean' in surnames

True

In [None]:
'Hugo' in surnames

False

A common operation on numeric lists is computing their sum. This can be achieved with the `sum()` function.

In [None]:
sum(ages)

166

## List comprehension: filtering

List comprehension is a way to build a new list from an existing one by applying some transformation to it.
The first transformation we see is *filtering*, i.e., keeping the elements that satisfy a condition and discarding the others.

Remark: you can check if a string contains a character (or a substring) using the `in` operator, e.g., `'a' in 'abc'` will evaluate to `True`, and `'a' in 'def'` will evaluate to `False`.

In [None]:
[surname for surname in surnames if 'J' in surname]

['Javert']

In [None]:
[surname for surname in surnames if 'j' in surname]

['Valjean']

Exercise: filter surnames that contain either *j* or *J*.

In [None]:
solution_1 = [surname for surname in surnames if 'j' in surname or 'J' in surname]
solution_1

['Valjean', 'Javert']

In [None]:
solution_2 = [surname for surname in surnames if 'j' in surname.lower()]
solution_2

['Valjean', 'Javert']

In [None]:
# Bonus. There really are MANY ways of accomplishing the same thing in Python.
list(filter(lambda s: 'j' in s.lower(), surnames))
# Etc. etc.

['Valjean', 'Javert']

## List comprehension: transforming each element

The second task often accomplished via list comprehension is transforming, i.e., applying a transformation to each element of an existing list and obtaining a new list with the transformed elements.

In [None]:
# Square each element of the ages list
[age ** 2 for age in ages]

[1600, 1444, 3844, 676]

In [None]:
# Put each surname in lower case
[surname.lower() for surname in surnames]

['valjean', 'javert', 'myriel', 'pontmercy']

## List comprehension: filtering and transforming

We can also combine filtering and transforming in one single list comprehension that does both.

In [None]:
[age ** 2 for age in ages if age < 50]

[1600, 1444, 676]

## Iterating over a list: a first introduction to for loops

The general form of a `for` loop is:

```
for <name> in <iterable>:
    expressions using <name>
```

We use a `for` loop to go through each element of a list and then do something with it.
In a sense, a `for` loop is similar to list comprehension. However, unlike using list comprehension, the "result" of a for loop applied to a list is not necessarily another list.
In the following example, we use a `for` loop to simply print the elements of a list in a fancy way.
We can also loop over things that are not lists; that's why, in the above pseudo-code, I used the more general name "iterable".
For the moment, we only see how to loop over a list.

Note that we need a way to refer to each element of the list when we want to perform some operation on it.
Therefore, we define a name (`<name>` in the pseudocode) to refer to a generic list element.

Remark: Python uses indentation to specify where the "main body" of the `for` loop (i.e., the part that actually does something with the list elements) starts and ends.

In [None]:
for surname in surnames:
    print(f"Hello, Mr. {surname}!")

Hello, Mr. Valjean!
Hello, Mr. Javert!
Hello, Mr. Myriel!
Hello, Mr. Pontmercy!
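A `for` loop does not have to produce a list, or print anything, at each step. As a sketch, the loop below accumulates a single number and reproduces by hand what `sum()` does; the variable `total` is introduced here only for illustration.

```python
ages = [40, 38, 62, 26]

# Start from zero and add each age in turn.
total = 0
for age in ages:
    total = total + age

print(total)               # 166
print(total == sum(ages))  # True
```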


## Enumerating a list

If, in a `for` loop, we want to access both the elements of the list and their indices, we can use the `enumerate()` function.

In [None]:
for idx, surname in enumerate(surnames):
    print(f"The surname at index {idx} is {surname}.")

The surname at index 0 is Valjean.
The surname at index 1 is Javert.
The surname at index 2 is Myriel.
The surname at index 3 is Pontmercy.


## Zipping lists

Zipping two (or more!) lists refers to simultaneously accessing the elements of the lists that have the same index.
For example, given the `surnames` and `ages` lists, we might imagine that there is a correspondence between them.
In other words, the age at index `0` in the `ages` list refers to the person whose surname is at index `0` in the `surnames` list, etc.

In [None]:
for surname, age in zip(surnames, ages):
    print(f"Mr. {surname} is {age} years old.")

Mr. Valjean is 40 years old.
Mr. Javert is 38 years old.
Mr. Myriel is 62 years old.
Mr. Pontmercy is 26 years old.
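The correspondence by index only makes sense if the lists have the same length. The sketch below shows what `zip()` does when they do not: it stops silently at the end of the shortest list. The list `short_ages` is a hypothetical example with one age deliberately missing.

```python
surnames = ['Valjean', 'Javert', 'Myriel', 'Pontmercy']
short_ages = [40, 38, 62]  # hypothetical: the age of Pontmercy is missing

# zip() pairs elements index by index and stops at the shortest list.
pairs = list(zip(surnames, short_ages))
print(pairs)       # [('Valjean', 40), ('Javert', 38), ('Myriel', 62)]
print(len(pairs))  # 3
```

Because no error is raised, it is worth checking the lengths yourself when the lists come from different sources.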


## Iterating over a sorted list

We can obtain a sorted copy of a list with the `sorted()` function.
This function does not change the list in place; it returns a new list containing the same elements in sorted order.

In [None]:
for age in sorted(ages):
    print(age, end=' ')

26 38 40 62 
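To check that `sorted()` leaves the original list untouched, we can look at the list before and after. For contrast, the sketch also uses the list method `.sort()`, which is not covered in the text above and which does modify the list it is called on.

```python
ages = [40, 38, 62, 26]

# sorted() returns a new list; `ages` keeps its original order.
ages_sorted = sorted(ages)
print(ages_sorted)  # [26, 38, 40, 62]
print(ages)         # [40, 38, 62, 26]

# .sort() sorts the list it is called on and returns None.
# We apply it to a copy so that `ages` itself stays unchanged.
ages_copy = list(ages)
ages_copy.sort()
print(ages_copy)    # [26, 38, 40, 62]
```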

By default, `sorted()` sorts in ascending order.
If we want to sort in descending order we have two options.

First, we can combine the `sorted()` function with the `reversed()` function.
By reversing the ascending order we obtain the descending one.

Second, we can pass the named parameter `reverse=True` to the `sorted()` function.

In [None]:
for age in reversed(sorted(ages)):
    print(age, end=' ')

62 40 38 26 

In [None]:
for age in sorted(ages, reverse=True):
    print(age, end=' ')

62 40 38 26 

Exercise: iterate over the zipped `surnames`-`ages` lists, but sorted by age.

In [None]:
for age, surname in sorted(zip(ages, surnames)):
    print(f"Mr. {surname} is {age} years old.")

Mr. Pontmercy is 26 years old.
Mr. Javert is 38 years old.
Mr. Valjean is 40 years old.
Mr. Myriel is 62 years old.


# A new data structure: a dictionary

We can see dictionaries as a generalisation of lists in which the indices associated with the values do not necessarily form an increasing sequence of integers starting from zero.
Rather, in a dictionary, values can be associated with many other types of indices, including strings.
The indices of a dictionary are called keys.
Therefore, in a dictionary, each key has a corresponding associated value.

In [None]:
character_age = {
    'Valjean': 40,
    'Javert': 38,
    'Myriel': 62
}

In [None]:
character_age

{'Valjean': 40, 'Javert': 38, 'Myriel': 62}

We access dictionary values using the same square-bracket syntax that we use to access list values.

In [None]:
character_age['Valjean']

40

To add a new key-value entry to a dictionary, we can simply assign using the `=` operator.
If the key we use is not already in the dictionary, we will create a new entry.
Remark, however, that if the key already is in the dictionary, then we will update the existing value: there cannot be duplicate keys.

To summarise:
* Lists allow duplicate values. E.g., `[1, 0, 1, 1, 2, 4]` is a valid list.
* Dictionaries also allow duplicate values. E.g., `{0: 2, 1: 2, 4: 3, 5: 2}` is a dictionary in which the value `2` appears three times.
* However, dictionaries do not allow duplicate keys. `{0: 2, 1: 2, 4: 3, 0: 80}` is not a semantically valid dictionary because the key `0` appears twice. (If you write this code, Python will not raise an error, but it will give you a dictionary in which the key `0` is associated with the value that appears last; in our case, `80`.)
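The behaviour described in the last point can be verified directly. In this sketch, `duplicated` and `example_age` are throwaway names used only for illustration.

```python
# The key 0 is written twice: the last value (80) wins.
duplicated = {0: 2, 1: 2, 4: 3, 0: 80}
print(duplicated)       # {0: 80, 1: 2, 4: 3}
print(len(duplicated))  # 3

# Assigning to an existing key updates its value instead of adding an entry.
example_age = {'Valjean': 40}
example_age['Valjean'] = 41
print(example_age)      # {'Valjean': 41}
```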

In [None]:
character_age['Pontmercy'] = 26

In [None]:
character_age

{'Valjean': 40, 'Javert': 38, 'Myriel': 62, 'Pontmercy': 26}

## Looping through a dictionary

Using a `for` loop similar to the one we used for lists will only loop through the dictionary's keys!

In [None]:
for surname in character_age:
    print(surname)

Valjean
Javert
Myriel
Pontmercy


Most of the time we want both keys and values.
We could use the keys to access the values, but this is more verbose and not Pythonic.

In [None]:
# Not great:

for surname in character_age:
    print(f"Mr. {surname} is {character_age[surname]} years old.")

Mr. Valjean is 40 years old.
Mr. Javert is 38 years old.
Mr. Myriel is 62 years old.
Mr. Pontmercy is 26 years old.


Instead, it is recommended to use the `.items()` method of the dictionary, which gives, at each iteration, both key and value.

Remark that, in this case, we need two `<name>`s in our `for` loop: one to hold the key and one to hold the value.

In [None]:
# Better:
for surname, age in character_age.items():
    print(f"Mr. {surname} is {age} years old.")

Mr. Valjean is 40 years old.
Mr. Javert is 38 years old.
Mr. Myriel is 62 years old.
Mr. Pontmercy is 26 years old.


## Dictionary comprehension

You can use dictionary comprehension in much the same way as list comprehension.

In [None]:
# Filter by value
{surname: age for surname, age in character_age.items() if age > 30}

{'Valjean': 40, 'Javert': 38, 'Myriel': 62}

In [None]:
# Filter by key
{surname: age for surname, age in character_age.items() if 'j' in surname.lower()}

{'Valjean': 40, 'Javert': 38}

In [None]:
# Transform the values
{surname: age * 2 for surname, age in character_age.items()}

{'Valjean': 80, 'Javert': 76, 'Myriel': 124, 'Pontmercy': 52}

In [None]:
# Transform both keys and values
{surname.upper(): age / 2 for surname, age in character_age.items()}

{'VALJEAN': 20.0, 'JAVERT': 19.0, 'MYRIEL': 31.0, 'PONTMERCY': 13.0}
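Dictionary comprehension also works on iterables other than a dictionary's items. As a sketch, the code below rebuilds the surname-to-age dictionary from the two separate lists by combining a comprehension with `zip()`; passing the zipped pairs to the built-in `dict()` is a shorter way to get the same result.

```python
surnames = ['Valjean', 'Javert', 'Myriel', 'Pontmercy']
ages = [40, 38, 62, 26]

# Each (surname, age) pair produced by zip() becomes a key-value entry.
from_comprehension = {surname: age for surname, age in zip(surnames, ages)}
print(from_comprehension)
# {'Valjean': 40, 'Javert': 38, 'Myriel': 62, 'Pontmercy': 26}

# dict() accepts an iterable of key-value pairs directly.
from_constructor = dict(zip(surnames, ages))
print(from_comprehension == from_constructor)  # True
```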

# Functions

A function is a named block of code that (optionally) takes some inputs, performs operations, and (optionally) returns some outputs.

In Python, you define a function as follows:

In [None]:
def after_taxes(profit, tax_rate):
    return profit * (1 - tax_rate)

And you use it like this:

In [None]:
after_taxes(100_000, 0.125)

87500.0

However, these are not the best ways to define and use a function.

I recommend that you:
* Use type annotations to give a hint of the types you expect as parameters.
* Use type annotations to give a hint of the type that your function will return.
* Use default parameter values if this is appropriate in your case.
* Call functions using named parameters to avoid any confusion about which parameter does what.

In [None]:
def after_taxes(profit: int, tax_rate: float = 0.15) -> float:
    return profit * (1 - tax_rate)

In [None]:
after_taxes(profit=100_000, tax_rate=0.125)

87500.0

In [None]:
after_taxes(profit=25_000)

21250.0

The above code gives you a good way of defining a function and using it.
Remark, however, that this function is a bit strange for its real-life use case.
If your company has lost money (negative profit) you should not get a "reverse" taxation applied to it.

In [None]:
after_taxes(profit=-20_000)

-17000.0

The tax rate should only be applied to positive profits; otherwise, you pay zero taxes on losses.
We can easily change our function to handle the negative profit case using an `if` statement.

In [None]:
def after_taxes(profit: int, tax_rate: float = 0.15) -> float:
    if profit > 0:
        return profit * (1 - tax_rate)
    return float(profit)

In [None]:
after_taxes(profit=-20_000)

-20000.0

There are still weird things that our function can do.
For example, a distracted user might pass a *negative* tax rate.
We can warn the user of invalid input values by *raising an exception* in case we encounter such invalid values.

In [None]:
# Oooooops... you are giving yourself free money
after_taxes(profit=10_000, tax_rate=-0.5)

# Let's fix this...

15000.0

In [None]:
def after_taxes(profit: int, tax_rate: float = 0.15) -> float:
    if tax_rate < 0 or tax_rate > 1:
        raise ValueError(f"The tax rate must be between 0 and 1. Value {tax_rate} invalid.")

    if profit > 0:
        return profit * (1 - tax_rate)
    return float(profit)

In [None]:
after_taxes(profit=10_000, tax_rate=-0.5)

ValueError: The tax rate must be between 0 and 1. Value -0.5 invalid.
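An exception that nobody handles stops the program. If the caller prefers to react to the invalid input instead, the call can be wrapped in a `try`/`except` block. The following is a minimal sketch of one possible way to do this, not part of the function itself; the function is repeated so that the snippet runs on its own, and the names `result` and `message` are introduced only for illustration.

```python
def after_taxes(profit: int, tax_rate: float = 0.15) -> float:
    if tax_rate < 0 or tax_rate > 1:
        raise ValueError(f"The tax rate must be between 0 and 1. Value {tax_rate} invalid.")

    if profit > 0:
        return profit * (1 - tax_rate)
    return float(profit)


try:
    # This call raises ValueError, so the assignment never happens...
    result = after_taxes(profit=10_000, tax_rate=-0.5)
except ValueError as error:
    # ...and execution continues here instead of stopping the program.
    result = None
    message = str(error)
    print(f"Could not compute the net profit: {message}")
```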