# Python pills (for Data Analysis in Subnuclear Physics)

---
### Contacts
 * Lucio Anderlini: Lucio.Anderlini@fi.infn.it
 * Piergiulio Lenzi: Piergiulio.Lenzi@fi.infn.it
---

Python is a general-purpose programming language designed to be easily readable and quick to write. These features, along with several technological innovations compared to compiled languages, have made it one of the gold standards for data science and data analysis.

In this notebook, we will discuss the basic syntax of Python and introduce some fundamental concepts of Object-Oriented Programming (OOP), a programming paradigm widely adopted by the libraries we will explore in future lectures.

## Introduction

As with any respectable programming course, let's start with the classic Hello World.

In [None]:
print ("Hello world")

Hello world


The instruction `print(<string>)` is used to display the value of the `string` in the output cell. In fact, `print` itself is a program that is executed when followed by parentheses; it is called a *function*.

The instruction that invokes the execution of a function (for example, `print('hello')`) is called a **function call**.

The string can be a **variable**: it does not need to be a string literal typed between the parentheses. The values a function accepts as *inputs* are called *arguments*. The `print` function can accept multiple arguments.

For example,


In [None]:
## Define a string variable and print it
a_string_variable = "Hello world"
print (a_string_variable)

## Define an integer variable and print it
an_integer_variable = 123
print (an_integer_variable)

## Define a floating point variable and print it
a_floating_point_variable = 123.
print (a_floating_point_variable)

## Print all the variables defined above in a single print statement
print (a_string_variable, an_integer_variable, a_floating_point_variable)

Hello world
123
123.0
Hello world 123 123.0


# Conditions: the `if` statement

As in the vast majority of programming languages, the logical flow of the program can be branched based on the values taken by variables.

Branching is defined through the **`if`** statement using the following syntax:

```python
if <condition 1>:
  # statements executed when condition 1 is satisfied
elif <condition 2>:
  # statements executed when condition 2 is satisfied
elif <condition 3>:
  # statements executed when:
  #  - condition 1 is not satisfied
  #  - condition 2 is not satisfied
  #  - condition 3 is satisfied [...]
else:
  # statements executed when none of the above conditions are satisfied
```

For example,

In [None]:
a = 12345

if a == 123:
  print ("a is 123!")
elif a == 1234:
  print ("a is 1234")
else:
  print ("a is neither 123 nor 1234")

a is neither 123 nor 1234


In Python, indentation (*i.e.*, the number of white spaces between the left margin and the first character of a line of code) is used to delimit code blocks.

Lines with the same indentation level belong to the same code block and are subject to the same conditions.

For example,


In [None]:
a = 'hello'
if a == 'goodmorning':
  print ("Line 1 to print in case of goodmorning")
  print ("Line 2 to print in case of goodmorning")
else:
  print ("Line 1 to print in case it is not a goodmorning")
print ("Line to print in every case")

Line 1 to print in case it is not a goodmorning
Line to print in every case


Organizing code in blocks makes it possible to write rather complex logic with nested conditions.

For example,


In [None]:
symbol = 'mu'
charge = '-'

if symbol == 'e' or symbol == 'mu' or symbol == 'tau' or symbol == 'nu' or symbol == 'nu_bar':
  if charge == '0':
    if symbol == 'nu':
      print ("matter")
    else:
      print ('antimatter')
  elif charge == '+' and symbol != 'nu' and symbol != 'nu_bar':
    print ("antimatter")
  elif charge == '-' and symbol != 'nu' and symbol != 'nu_bar':
    print ("matter")
  else:
    print ("Unexpected lepton")
else:
  print ("Not a lepton")


matter
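
Deeply nested conditions can often be flattened. The sketch below is one possible rewrite of the lepton example, not the only one: it assumes the `in` operator to test whether a value belongs to a tuple, and the names `charged_leptons`, `neutrinos`, and `kind` are chosen only for illustration. Unlike the nested version, it reports a charged lepton with charge `'0'` as unexpected.

```python
symbol = 'mu'
charge = '-'

# Hypothetical groupings of the symbols used in the example above
charged_leptons = ('e', 'mu', 'tau')
neutrinos = ('nu', 'nu_bar')

if symbol in charged_leptons and charge == '-':
    kind = "matter"
elif symbol in charged_leptons and charge == '+':
    kind = "antimatter"
elif symbol == 'nu' and charge == '0':
    kind = "matter"
elif symbol == 'nu_bar' and charge == '0':
    kind = "antimatter"
elif symbol in charged_leptons or symbol in neutrinos:
    # A known lepton symbol with an inconsistent charge
    kind = "Unexpected lepton"
else:
    kind = "Not a lepton"

print(kind)
```

Each `elif` is evaluated only if all the previous conditions failed, so the order of the branches matters.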


## Ternary operator
In Python, a value can be selected based on a condition in a single expression that combines the condition with the two alternative values: one used when the condition is met, the other when it is not.
Since this operator takes three *inputs*, it is called a ternary operator. It is available in several languages, although the order of the three inputs in Python is rather unusual.

The syntax is as follows:

```python
<value if condition is satisfied> if <condition> else <value if condition is not satisfied>
```


It's more difficult to explain it than to see it in action!


In [None]:
charge = '+'
print (charge, "positive" if charge == "+" else "negative")
charge = '-'
print (charge, "positive" if charge == "+" else "negative")

+ positive
- negative
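
To make the correspondence explicit, the following sketch compares a ternary expression with the equivalent `if`/`else` block. The variable names (`label_if`, `label_ternary`, `label`) are illustrative, and the chained form at the end is just one way to handle a third case.

```python
charge = '-'

# Explicit if/else version
if charge == '+':
    label_if = "positive"
else:
    label_if = "negative"

# Equivalent ternary expression
label_ternary = "positive" if charge == '+' else "negative"
print(label_if, label_ternary)

# Ternary expressions can be chained to handle a third case
charge = '0'
label = "positive" if charge == '+' else ("negative" if charge == '-' else "neutral")
print(label)
```

Chained ternary expressions quickly become hard to read; beyond two or three cases, an `if`/`elif`/`else` block is usually clearer.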


# Boolean variables

The conditions that define the behavior of conditional statements can define a variable, called a **boolean**. A boolean variable can take the value `True` or `False` depending on whether the condition is satisfied or not.

For example,


In [None]:
## Boolean variable initialized to True
successful_condition = True

## Boolean variable initialized to False
failed_condition = False

## A boolean variable initialized with the result of a condition
successful_condition = (1 > 0)

## Using a boolean variable in an if statement
if successful_condition:
  print ("Successful")
else:
  print ("Failed")

Successful


## Logical Operators

As we briefly mentioned earlier, different boolean variables and conditions can be combined using logical operators:
 * and
 * or
 * not

For example,


In [None]:
print ("False:", True and False )
print ("True: ", True or False)
print ("False:", (not True) or False )
print ("False:", (1 > 0) and (1 < 0))

False: False
True:  True
False: False
False: False


Logical operators act on the truth value (the "boolean" version) of their operands. For example, the integer `1` is translated to `True`, while `0` is translated to `False`. Similarly, an empty string is translated to `False`, whereas any string with at least one character is translated to `True`.

We can use the syntax
```python
bool(<variable>)
```
to obtain the translation of a variable into a logical value.

For example,


In [None]:
## Logic value of strings
print ("False: ", bool(""))
print ("True:  ", bool(" "))

## Logic value of integers and non-boolean variables in the if statement
a = 1
b = 0
print ("Successful" if a else "Failed")
print ("Successful" if b else "Failed")

False:  False
True:   True
Successful
Failed
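
A few more cases may help fix the rule. The sketch below lists some values with their truth value and shows that `and` and `or` return one of their operands rather than a strict boolean; the variable names `name` and `value` are illustrative.

```python
# Zero, the empty string and None translate to False
print(bool(0), bool(0.0), bool(""), bool(None))

# Non-zero numbers and non-empty strings translate to True (even "0" and " ")
print(bool(1), bool(-3.5), bool("0"), bool(" "))

# "" is falsy, so `or` returns its second operand
name = "" or "unknown"

# 5 is truthy, so `and` returns its second operand
value = 5 and "five"

print(name, value)
```

The `or` form is a common way to provide a fallback for an empty value.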


# Loops
It is often useful to repeat a set of operations several times.
The logical flow of the program then contains a **loop**.


### The **`while`** Loop
The simplest loop is the **while** loop, which allows a portion of code to be executed as long as (*while*, indeed) a condition is valid.

For example,


In [None]:
## Initialize the month counter to 0: it is incremented to 1 (the first month of the year) at the start of the first iteration
month = 0
season = "unknown"

## Check the season of each month and stop at the first month labeled as "summer"
while season != "summer":
  ## increment month
  month += 1

  ## Compute the season
  if month <= 2 or month >= 12:
    season = "winter"
  elif month >= 3 and month <= 5:
    season = "spring"
  elif month >= 6 and month <= 8:
    season = "summer"
  elif month >= 9 and month <= 11:
    season = "fall"

print ("The first month of summer is", month)

The first month of summer is 6


Alternatively, we can break the loop with the `break` statement.


In [None]:
month = 1
season = "unknown"

while month <= 12:
  if month >= 6 and month <= 8:
    break
  month += 1

print ("The first month of summer is", month)

The first month of summer is 6
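
A `while` loop is the natural choice when the number of iterations is not known in advance. As a sketch, with made-up numbers (`remaining` and `threshold` are hypothetical values), we can count how many half-lives it takes for a radioactive sample to drop below a threshold:

```python
remaining = 1000.0   # hypothetical initial number of nuclei
threshold = 1.0      # stop when fewer than this many are left
n_half_lives = 0

while remaining >= threshold:
    remaining /= 2       # each half-life halves the sample
    n_half_lives += 1

print("Half-lives needed:", n_half_lives)
```

The condition is checked before every iteration, so the loop ends as soon as `remaining` falls below `threshold`.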


### The **`for`** Loop

Another loop, not based on a condition, but that executes code on a sequence of elements, is the **`for`** loop.

For example:


In [None]:
for particle in "e", "mu", "tau":
  for charge in "+", "-":
    print ("Particle", particle, charge, "is a lepton")
for particle in "nu", "nu_bar":
  print ("Particle", particle, "is a lepton")
for particle in "udsctb":
  print ("Particle", particle, "is a quark")

Particle e + is a lepton
Particle e - is a lepton
Particle mu + is a lepton
Particle mu - is a lepton
Particle tau + is a lepton
Particle tau - is a lepton
Particle nu is a lepton
Particle nu_bar is a lepton
Particle u is a quark
Particle d is a quark
Particle s is a quark
Particle c is a quark
Particle t is a quark
Particle b is a quark


The **`continue`** statement can be used to skip an iteration of a loop (whether a `for` loop or a `while` loop) without breaking the loop.

For example, if we want to exclude positive muons from the previous loop, we can write:


In [None]:
for particle in "e", "mu", "tau":
  for charge in "+", "-":
    if particle == "mu" and charge == "+":
      continue

    print ("Particle", particle, charge, "is a lepton (but certainly not a mu+)")

Particle e + is a lepton (but certainly not a mu+)
Particle e - is a lepton (but certainly not a mu+)
Particle mu - is a lepton (but certainly not a mu+)
Particle tau + is a lepton (but certainly not a mu+)
Particle tau - is a lepton (but certainly not a mu+)


### Range and other extensions of the `for` loop
The most common extension of the `for` loop is the use of the built-in `range` function, which allows loops over sequences of consecutive (or evenly spaced) integers.
Using `range`, we can define sequences from a `start` number to a `stop` number (exclusive) in steps of `step` with the following syntax:

```python
for <variable> in range (<stop>):
  ...

for <variable> in range (<start>, <stop>):
  ...

for <variable> in range (<start>, <stop>, <step>):
  ...
```

When `start` is omitted, it is assumed to be zero; when `step` is omitted, it is assumed to be one.

Let's see three examples of this syntax:


In [None]:
# from 0 to 5 (excluded) with step 1
for i in range(5):
  print (i)

0
1
2
3
4


In [None]:
# from 3 to 5 (excluded) with step 1
for i in range(3,5):
  print (i)

3
4


In [None]:
# from 0 to 10 (excluded), only even numbers
for i in range(0, 10, 2):
  print (i)

0
2
4
6
8
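
Two further cases are worth a quick look: a negative `step`, which counts downwards, and a `start` that is not smaller than `stop`, which produces an empty sequence. In this sketch the ranges are converted to tuples only to inspect their content; the names `countdown` and `empty` are illustrative.

```python
# Counting down: from 10 to 0 (excluded) in steps of -3
for i in range(10, 0, -3):
    print(i)

# A range can be converted to a tuple to inspect all of its values at once
countdown = tuple(range(10, 0, -3))
print(countdown)

# With a positive step and start >= stop the range is empty
empty = tuple(range(5, 3))
print(empty)
```

A loop over an empty range simply never executes its body.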


#### Zip
The `for` loop can be extended to iterate simultaneously over multiple iterables using the `zip` function.

For example,


In [None]:
even = 0, 2, 4, 6, 8
odd = 1, 3, 5, 7, 9

for a, b in zip (even, odd):
  print ("even:", a, "; odd:", b)

even: 0 ; odd: 1
even: 2 ; odd: 3
even: 4 ; odd: 5
even: 6 ; odd: 7
even: 8 ; odd: 9
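
A typical use of `zip` is to pair names with the corresponding values. The sketch below uses approximate lepton masses, rounded for illustration, and also shows that `zip` stops at the end of the shortest sequence; the variable names are hypothetical.

```python
names = "e", "mu", "tau"
masses_mev = 0.511, 105.66, 1776.86   # approximate masses in MeV/c^2

for name, mass in zip(names, masses_mev):
    print(name, "has a mass of about", mass, "MeV")

# zip stops at the end of the shortest sequence: "tau" is silently dropped
pairs = tuple(zip(names, (1, 2)))
print(pairs)
```

Because the truncation is silent, it is worth checking that the sequences have the same length when that is what the analysis assumes.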


#### Enumerate

Another useful extension is based on `enumerate`, which adds an index to the iteration.
For example:

In [None]:
for iChar, char in enumerate("hello world"):
  print ("Char number", iChar, "has value:", char )

Char number 0 has value: h
Char number 1 has value: e
Char number 2 has value: l
Char number 3 has value: l
Char number 4 has value: o
Char number 5 has value:  
Char number 6 has value: w
Char number 7 has value: o
Char number 8 has value: r
Char number 9 has value: l
Char number 10 has value: d
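
By default the index starts from zero, but `enumerate` accepts an optional `start` argument. A minimal sketch, with an illustrative tuple named `generations`:

```python
generations = "first", "second", "third"

# Start counting from 1 instead of 0
for number, name in enumerate(generations, start=1):
    print("Generation", number, "is the", name)

numbered = tuple(enumerate(generations, start=1))
print(numbered)
```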


#### Sorted

Another interesting trick is the use of the `sorted` function, which sorts the elements we iterate over in ascending order:


In [None]:
values = 4, 3, 6, 2
print ("Unsorted:")
for a in values:
  print (a)

print ("---")

print ("Sorted:")
for a in sorted(values):
  print (a)

Unsorted:
4
3
6
2
---
Sorted:
2
3
4
6
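
The `sorted` function returns a new list and leaves the original sequence untouched. It also accepts a `reverse` argument to sort in descending order, as in this short sketch (the name `descending` is illustrative):

```python
values = 4, 3, 6, 2

# Sort from the largest to the smallest value
descending = sorted(values, reverse=True)
print(descending)

# The original sequence has not been modified
print(values)
```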


# Functions
Functions allow you to define sequences of operations that represent a logical unit of the program, making them a fundamental building block in **sequential** programming.

We have already seen an example of a function (`print`) with its call (`print(<args>)`). In Python, the definition of new functions is extremely concise and follows this syntax:
```python
def function_name ( [ arg1[, arg2] ]):
    # your statements

    return <something>
```

The code that creates a function is called the **definition** of the function and represents the other side of the coin of the **call** (which we have already seen above).

For example, let's create a function that adds one to the argument and returns the incremented value as the **return** value.


In [None]:
## Function definition
def add_one (some_value):
  return some_value + 1

## Function call
incremented_value = add_one ( 42 )
print ("Incremented value:", incremented_value)

Incremented value: 43


Function arguments can be associated with a default value: that is, the value the argument takes if one is not defined at the time of the call.

Arguments with default values (called **keyword arguments** or **`kwargs`**) must always be placed after arguments without default values (also known as **positional arguments**).

For example:


In [None]:
def a_function(its_first_argument, its_second_argument, a_keyword_argument='hello', another_k='world'):
  print ("Argument 1:", its_first_argument)
  print ("Argument 2:", its_second_argument)
  print ("Keyword Arg (kwarg) 1:", a_keyword_argument)
  print ("-----------------------------")


### Rely on defaults
a_function("arg 1", "arg 2")

### Passing arguments, positionally
a_function("arg 1", "arg 2", "arg 3")

### Passing arguments with keys
a_function(
    its_first_argument = "arg 1",
    its_second_argument = "arg 2",
    a_keyword_argument = 'arg 3',
    another_k = "arg 4"
)

## Passing arguments mixing positional and keyword arguments (in that order!)
a_function("arg 1", "arg 2", another_k = "arg 4")


Argument 1: arg 1
Argument 2: arg 2
Keyword Arg (kwarg) 1: hello
-----------------------------
Argument 1: arg 1
Argument 2: arg 2
Keyword Arg (kwarg) 1: arg 3
-----------------------------
Argument 1: arg 1
Argument 2: arg 2
Keyword Arg (kwarg) 1: arg 3
-----------------------------
Argument 1: arg 1
Argument 2: arg 2
Keyword Arg (kwarg) 1: hello
-----------------------------
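
Default values are convenient when an argument has a natural "usual" value. As an illustration, the hypothetical function `invariant_mass` below computes the mass from energy and momentum in natural units (c = 1), assuming a particle at rest when the momentum is not given.

```python
import math

def invariant_mass(energy, momentum=0.0):
    # m = sqrt(E^2 - p^2) in natural units (c = 1)
    return math.sqrt(energy**2 - momentum**2)

# Particle at rest: the default momentum is used, mass equals energy
m_rest = invariant_mass(5.0)

# Momentum passed positionally
m_positional = invariant_mass(5.0, 3.0)

# Momentum passed by keyword
m_keyword = invariant_mass(5.0, momentum=4.0)

print(m_rest, m_positional, m_keyword)
```

Passing the optional argument by keyword makes the call self-documenting, which helps when a function has several defaults.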


## *Packing* and *unpacking* arguments
Sometimes it is useful to treat a sequence of arguments as a single variable, for example, to handle sequences of an arbitrary number of parameters.

In these cases, the concept of *packing* (and *unpacking*) arguments can be useful. Packing and unpacking are defined with the `*args` operator for positional arguments, while they are defined with `**kwargs` for keyword arguments.

Let's set aside keyword arguments for now and focus on positional arguments. For example, consider a function that sums all the arguments and returns the result,


In [None]:
## Definition of a function with packed arguments
def compute_sum (*packed_arguments):
  result = 0.
  ## Here we loop over the packed arguments
  for arg in packed_arguments:
    result += arg

  return result

## Plain call of the example function
print ("Plain sum:", compute_sum(1,2,3,4))

## Same call, unpacking a tuple of arguments
my_args = 1,2,3,4
print ("Packed-argument sum:", compute_sum(*my_args))

Plain sum: 10.0
Packed-argument sum: 10.0


The functioning of packing and unpacking for keyword arguments is quite similar. Let's see an example.

First, let's define two functions, `f_unpacked` and `f_packed`. The first one simply prints the arguments `arg1` and `arg2`, while the second one prints all the keyword arguments it receives and then forwards them to the first.

In [None]:
def f_unpacked (arg1, arg2):
  print ("from f_unpacked --- arg1", arg1)
  print ("from f_unpacked --- arg2", arg2)

def f_packed (**args):
  print ("--- from f_packed ---", args)
  f_unpacked (**args)

The `f_packed` function accepts any set of keyword arguments, which it receives packed in a single dictionary. To pass them to the `f_unpacked` function, the arguments need to be "unpacked".


In [None]:
f_packed (arg1 = "first_packed_argument", arg2 = "second_packed_argument")


--- from f_packed --- {'arg1': 'first_packed_argument', 'arg2': 'second_packed_argument'}
from f_unpacked --- arg1 first_packed_argument
from f_unpacked --- arg2 second_packed_argument
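
The same unpacking works with a dictionary built by hand, which is handy when the arguments of a call are prepared elsewhere in the program. A minimal sketch, in which `describe_particle` and `properties` are hypothetical names:

```python
def describe_particle(name, charge):
    return name + " with charge " + charge

# Keyword arguments stored in a dictionary...
properties = {"name": "mu", "charge": "-"}

# ...are unpacked into name="mu", charge="-" at the time of the call
description = describe_particle(**properties)
print(description)
```

The keys of the dictionary must match the argument names of the function, otherwise Python raises a `TypeError`.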


## Functions that return multiple values
With a mechanism similar to *packing* and *unpacking*, it is possible to define functions that return more than one value.

For example, let's see how we could write a function that returns the minimum and maximum values among all the arguments passed to it.


In [None]:
def min_and_max (*values):
  m = values[0] # Initialize to the first value
  M = values[0]

  # Loops over all the positional arguments
  for value in values:
    if value < m:   # for each value, if it is found smaller than the minimum,
      m = value     #   then the minimum is updated
    if value > M:   # if the value is larger than the maximum,
      M = value     #   then it updates the value

  return m, M       # Finally, it returns minimum and maximum

The return value of a function can therefore be "unpacked" immediately in the function call statement,


In [None]:
m, M = min_and_max (3, 2, 6, 2, 5)
print ("Min:", m)
print ("Max:", M)

Min: 2
Max: 6


*or* later, by temporarily saving it in a single variable:


In [None]:
mM = min_and_max (3, 2, 6)
print ("Min and Max", mM)
m, M = mM
print ("Min:", m)
print ("Max:", M)

Min and Max (2, 6)
Min: 2
Max: 6


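The same mechanism works in a plain assignment: in `one, two, three = 1, 2, 3`, the values 1, 2, and 3 to the right of the equal sign are packed into a tuple, which is then unpacked and assigned to the variables `one`, `two`, and `three`.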
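
Tuple packing and unpacking also work in a plain assignment, outside of any function call. A minimal sketch, with variable names chosen only for illustration:

```python
# Packing: the three values are collected into a single tuple
numbers = 1, 2, 3
print(numbers)

# Unpacking: the tuple is split into three separate variables
one, two, three = numbers
print(one, two, three)

# Packing and unpacking in a single statement swaps two variables
a, b = "left", "right"
a, b = b, a
print(a, b)
```

The number of variables on the left must match the number of values in the tuple, otherwise Python raises a `ValueError`.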


## Functions of functions and decorators
Unlike many other programming languages, Python allows functions to be treated as variables. This has two very important implications:
 1. Functions can be passed as arguments to other functions;
 2. Functions can be defined within other functions.


#### Functions of functions

Let's start with the first point and see a simple example that shows how to pass a function as an argument.
Let's revisit the example of summing arguments, but let's try to generalize it to any operation that is done iteratively with a scheme like this:

```
Iteration    Operation

 1.          result = initial_value
 2.          result = f(result, x_1)
 3.          result = f(result, x_2)
 ...
 N.          result = f(result, x_N)
```


We will need to pass as input to this function:
 * the `initial_value` to start the computation
 * the function `f`
 * the sequence of values `x_1` ... `x_N` on which the function `f` will operate

Having understood what we want to achieve, we are ready to write the code:


In [None]:
def associatively_apply (initial_value, operation, *values):
  result = initial_value
  for value in values:
    result = operation (result, value)

  return result

The function thus defined iteratively applies a function called `operation` to the input values. To use this function, we must first define our `operation`.
As we mentioned earlier, `operation` is a generic function that takes two arguments as input and returns the result of some operation on those two input values.

For example:


In [None]:
## Sum operation
def op_sum (x1, x2):
  return x1 + x2

## Product operation
def op_prod (x1, x2):
  return x1 * x2

Now we have all the elements we need to test our associative reduction function.


In [None]:
print ("Sum of 2, 3, 4:", associatively_apply(0, op_sum, 2, 3, 4))
print ("Product of 2, 3, 4:", associatively_apply(1, op_prod, 2, 3, 4))

Sum of 2, 3, 4: 9
Product of 2, 3, 4: 24
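
This pattern is common enough that the Python standard library provides it as `functools.reduce`. The sketch below repeats the two computations with it; `op_sum` and `op_prod` are redefined so that the block is self-contained, and the names `total` and `product` are illustrative.

```python
from functools import reduce

def op_sum(x1, x2):
    return x1 + x2

def op_prod(x1, x2):
    return x1 * x2

values = 2, 3, 4

# reduce(function, iterable, initial) applies the function cumulatively,
# starting from the initial value
total = reduce(op_sum, values, 0)
product = reduce(op_prod, values, 1)

print("Sum of 2, 3, 4:", total)
print("Product of 2, 3, 4:", product)
```

Note the different argument order: `reduce` takes the function first and the initial value last.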


This way of writing functions is extremely powerful, but it risks making the code quickly unreadable.

Among the precautions that can be taken to mitigate the problem is the suggestion to never mix *packed* and *unpacked* positional arguments in the same function. In this case, we could have written,


In [None]:
def readable_associative_apply (*values, initial_value=0, operation=op_sum):
  result = initial_value
  for value in values:
    result = operation (result, value)

  return result

In this way, if we do not explicitly declare the operation, our function assumes that we want to perform a sum starting from zero,


In [None]:
print ("Sum:", readable_associative_apply(2,3,4))

Sum: 9


but we can still modify the default values of the operation and the initial value to apply other operations.


In [None]:
print ("Product:", readable_associative_apply(2,3,4, initial_value=1, operation=op_prod))

Product: 24


### Functions that return functions

It can be useful to define functions within other functions to define their behavior without having to pass arguments at each subsequent call.

For example, revisiting the example discussed above of `operation` functions, let's build a function that returns a certain operation depending on the input passed as an argument:


In [None]:
def op_factory (op_name):
  if op_name == 'sum':
    def op (x1, x2):
      return x1 + x2
  elif op_name == 'prod':
    def op (x1, x2):
      return x1 * x2
  else:
    def op (x1, x2):
      return 0.

  ## We return here the function
  return op

At the time of the call, we must remember that the returned value is a function, and therefore it must be called to obtain the result.


In [None]:
## Calls the op_factory function
op = op_factory ("sum")

## Calls the operation
result = op(1, 2)

## Print the result
print ("1 + 2 =", result)

1 + 2 = 3


A function B defined within a function A can automatically access (read) the variables defined in A, including the arguments passed at the time of the call.

For example, we can build a function that adds a constant to its single argument:


In [None]:
## Define the factory function
def make_adder (addendum):
  ## Define an inner function (accessing the arguments of the factory)
  def ret (x):
    return x + addendum
  ## Return the function
  return ret

## Calls the factory function to define an "adder"
add3 = make_adder (3)

## Calls the defined function
print ("3 + 4 =", add3(4))

3 + 4 = 7
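
Each call to the factory creates a new inner function that remembers its own copy of the argument. The following sketch, using a hypothetical `make_multiplier` factory, builds two independent functions from the same definition:

```python
def make_multiplier(factor):
    # The inner function reads `factor` from the enclosing function
    def multiply(x):
        return x * factor
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)

# The two functions do not interfere with each other
print(double(5), triple(5))
```

A function that retains access to the variables of the function in which it was defined is commonly called a *closure*.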


### Decorators

At this point, we have all the elements to build functions that accept other functions as input, modify them, and return them.

For example, we can define a function that swaps the order of the arguments of one of the operators we discussed earlier.
Of course, this function won't be particularly useful for a sum, but it could be for a subtraction or division, or for any operation where the commutative property does not hold.

For example,


In [None]:
## Define the op_diff function
def op_diff (x1, x2):
  return x1 - x2

## Then we make the factory function
def swap_args (operator):
  def new_operator (x1, x2):
    return operator (x2, x1)

  return new_operator

## Then we call the factory function
swapped_diff = swap_args (op_diff)

## And then we call both the op_diff...
print ("4 - 3 =", op_diff(4, 3))
## ... and its version with swapped args
print ("3 - 4 =", swapped_diff(4, 3))

4 - 3 = 1
3 - 4 = -1


A function that takes a function as an argument and returns a modified version of that function is called a **decorator**.
*Decorators* can be used to add (or modify) the characteristics of a function at the time of its definition.
Again, it may not seem very useful for a simple operation like swapping arguments, but in general, it can be a very powerful tool.

For example, let's see how to apply the decorator that swaps arguments at the time of defining a function.


In [None]:
## Define the difference operator, swapping the arguments at the moment of the definition
@swap_args
def swapped_diff (x1, x2):
  return x1-x2

## Now, when calling swapped_diff, the arguments are automatically swapped:
print ("4 - 3 =", swapped_diff(3, 4))

4 - 3 = 1
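
A decorator does not need to know how many arguments the decorated function takes: by combining packing and unpacking, it can forward whatever it receives. The sketch below defines a hypothetical `announce` decorator that prints a message before each call; it is an illustration of the pattern, not a utility from any library.

```python
def announce(function):
    # The wrapper accepts any positional and keyword arguments...
    def wrapper(*args, **kwargs):
        print("Calling", function.__name__, "with", args, kwargs)
        # ...and forwards them unchanged to the original function
        return function(*args, **kwargs)
    return wrapper

@announce
def op_diff(x1, x2):
    return x1 - x2

result = op_diff(4, 3)
print("Result:", result)
```

The `@announce` line is equivalent to writing `op_diff = announce(op_diff)` right after the definition.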


In Python, there are many libraries that also include decorators that allow you to extend the functionality of functions in an extremely concise way.

We will see some examples of decorators in the definition of classes, a little further down.


## Lambda functions
In the case of operators, we have seen examples of very short functions that have the sole purpose of transforming inputs into an output that is returned via the `return` statement.

These short functions can be defined in an even more concise way in Python, using `lambda` functions:


In [None]:
op_sum = lambda x1, x2: x1 + x2

print ("1 + 2 =", op_sum (1, 2))


1 + 2 = 3


Lambda functions are particularly useful when used in combination with the *factory* functions we saw earlier.

Let's revisit the example of associative reduction:

In [None]:
## Definition using a lambda function to define a default value
def associatively_reduce (*values, operator=lambda x1, x2: x1+x2, initial_value=0):
  ret = initial_value
  for value in values:
    ret = operator (ret, value)
  return ret

## Call of the function with its default values
print ("1 + 2 + 3 + 4 =", associatively_reduce(1,2,3,4))

1 + 2 + 3 + 4 = 10


We can use a lambda function to define the operator *inline*, that is, within a single line of code.

For example:


In [None]:
print ("2 * 5 =", associatively_reduce(2, 5, operator=lambda a,b: a*b, initial_value=1))

2 * 5 = 10
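
Lambda functions are also frequently passed to built-in functions that accept a function as an argument. For instance, `sorted` takes a `key` function that computes the value used for the comparison. The sketch below sorts a few (name, mass) pairs by mass; the masses are approximate and the variable names are illustrative.

```python
particles = ("tau", 1776.86), ("e", 0.511), ("mu", 105.66)

# Sort by the second element of each pair (approximate mass in MeV/c^2)
by_mass = sorted(particles, key=lambda particle: particle[1])

for name, mass in by_mass:
    print(name, mass)
```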


## Documentation with **docstring**
Since Python does not define the data type of each variable, it is crucial to document functions, explaining what the input arguments represent and what the function returns.

It is good practice to document functions with *docstrings*: text strings placed as the first statement of the function body, immediately after the `def` line, with the following syntax:


In [None]:
def some_function (arg1, arg2, arg3):
  "Some function is a function that returns the sum of its three arguments"
  return arg1 + arg2 + arg3

The documentation of a function can then be retrieved and printed using the `help` function.

In [None]:
help(some_function)

Help on function some_function in module __main__:

some_function(arg1, arg2, arg3)
    Some function is a function that returns the sum of its three arguments



In this way, the documentation remains attached to the function even when using library functions, which are therefore not directly displayed in the notebook or Python script.
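
Longer docstrings are usually written between triple quotes, so that they can span several lines, and can also be read from the `__doc__` attribute of the function. The layout below is only one possible convention, and `transverse_momentum` is a hypothetical example function.

```python
def transverse_momentum(px, py):
    """Return the transverse momentum of a particle.

    Arguments:
      px, py: momentum components in the plane transverse to the beam

    Returns:
      The square root of px**2 + py**2, in the same units as the inputs.
    """
    return (px**2 + py**2) ** 0.5

# The docstring is stored in the __doc__ attribute of the function
print(transverse_momentum.__doc__)
print(transverse_momentum(3.0, 4.0))
```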


# Classes
Just as functions allow for the representation of elementary units in sequential programming, classes represent the fundamental unit of object-oriented programming (*Object Oriented Programming, OOP*).


Object-oriented programming is a programming paradigm that involves the existence of abstract objects which include data and the functions to access that data.

These "*objects*" allow for the definition of logical units of a program that can be distributed and reused in other programs much more effectively than what can be done with functions.

Each object is characterized by a **definition** and an **instance**.
The *definition* of a class indicates which variables each instance should contain and the functions that should access those variables to modify the data "contained" in each instance.

It is important to note that a single definition (class) can correspond to multiple instances. Each instance will have the same interface, but the data contained can be different.

Let's see an example:


In [None]:
## Definition of the class (with no data)
class Object:
  pass

## Two instances: same class, different data
instance1 = Object()
instance2 = Object()

## Constructor and Destructor

When a class is instantiated, a function called the *constructor* is executed with the task of setting up the data included in each instance (the memory itself is allocated by Python before this function runs).
In Python, the constructor is defined by the function `__init__`.
Similarly, there is a function that is called when each instance is removed from memory to cleanly remove the data from memory. In Python, the destructor function is called `__del__` and is used very, very rarely, because Python keeps track of the used memory and automatically cleans up unnecessary memory with a mechanism called *garbage collection* (which we will discuss later).

Let's see an example of a class with a constructor and destructor.


In [None]:
class Object:
  def __init__ (self):
    print ("Class Object was instantiated")

  def __del__ (self):
    print ("Removed Object from memory")

We observe that both the constructor (`__init__`) and the destructor (`__del__`) are defined with an argument called `self`.

The first argument of a class function is always a *reference* to the class instance.
This will become clearer later. The word `self` is not a Python keyword reserved for this variable, so it could be called something else. However, the use of `self` has become such a widespread convention that it is now considered a rule of the language.

Normally, the `__init__` function is used to initialize the variables contained in an object.

To access the variables of an object, the `.` (dot) operator is used,

```
<instance>.<variable>
```

This syntax is also valid within the class functions, where it becomes
```
self.<variable>
```

Let's see an example:


In [None]:
## Define a class named Object
class Object:
  ## Define the constructor
  def __init__ (self):
    ## initializes and defines the variable `some_variable`
    self.some_variable = 5

## Create an instance of Object, named obj5
obj5 = Object()

## Create another instance, named obj4
obj4 = Object()

## Modify the value of "some_variable" in obj4
obj4.some_variable = 4

## the variable `some_variable` of the instance obj5 has
## not been modified and it is still set to its default value
print (obj5.some_variable, "= 5")

## the variable `some_variable` of the instance obj4 has been
## set to 4
print (obj4.some_variable, "= 4")

5 = 5
4 = 4


We can define additional arguments for the constructor, besides the reference to the instance (`self`).
The exact same rules apply as for other functions.
We can define positional and keyword arguments, and we can use packing and unpacking operators. Just like with "normal" functions.


In [None]:
class Object:
  def __init__ (self, variable):
    self.variable = variable

obj1 = Object(1)          # define `variable` as a positional argument
obj2 = Object(variable=2) # define `variable` as a keyword argument

print (obj1.variable, "= 1")
print (obj2.variable, "= 2")

1 = 1
2 = 2
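
Default values work in constructors exactly as in ordinary functions. As a sketch, the hypothetical `Particle` class below requires a name and assumes a neutral particle unless a charge is given:

```python
class Particle:
    def __init__(self, name, charge=0):
        # Store the constructor arguments in the instance
        self.name = name
        self.charge = charge

muon = Particle("mu", charge=-1)   # charge passed by keyword
neutrino = Particle("nu")          # charge left to its default value

print(muon.name, muon.charge)
print(neutrino.name, neutrino.charge)
```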


## *Garbage collection*

Before concluding the introduction to constructors and destructors, let's see the destructor in action. Even though the destructor is rarely defined in Python, adding a `print` statement to the destructor can be useful to understand when Python removes variables (in this case, instances) from memory.

Consider the following example:

In [None]:
# Define a class with both constructor and destructor
class Object:
  def __init__ (self):
    print ("init")

  def __del__ (self):
    print ("del")

# Then we define a function that creates that object
def func_create ():
  obj = Object()

# And another function that creates and *returns* the object
def func_return ():
  obj = Object()
  return obj

Now, if we call the function `func_create`, the object is created and immediately destroyed because it is not accessible outside the function, so Python removes it from memory. And when it is removed, the destructor is called.


In [None]:
func_create()

init
del


The block (or blocks) of code in which a variable is visible is called the ***scope***. When exiting a *scope*, Python checks which variables are still accessible, and if it finds variables that no longer have "references" and are therefore no longer usable, it eliminates them.

This mechanism is called *garbage collection*.

Let's see what happens if, instead of just creating the object, we create it and return it.


In [None]:
a = func_return()

init


In this case, the destructor is not called because the object is still accessible through its "reference" `a`.

However, we can remove the "reference" `a`, foregoing access to the instance of Object created in `func_return` later in the program:


In [None]:
del a

del


Once the reference `a` is removed, there is no longer any reference to the instance of `Object` that we created just above, and indeed Python invokes the destructor to free the memory of an object that is no longer needed.
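
What matters is the number of references, not the `del` statement itself. The sketch below creates two references to the same instance and records the order of events in a list (the class `Tracked` and the list `events` are hypothetical names). It assumes CPython, the standard interpreter, which destroys an object as soon as its last reference disappears; other implementations may call the destructor later.

```python
events = []   # records what happens, in order

class Tracked:
    def __init__(self):
        events.append("init")

    def __del__(self):
        events.append("del")

a = Tracked()
b = a          # second reference to the same instance

del a          # one reference is left: the destructor is not called
events.append("after del a")

del b          # the last reference is gone: the destructor is called
events.append("after del b")

print(events)
```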

The *garbage collection* mechanism is rather robust, and normally one can write arbitrarily complicated programs while completely ignoring how Python manages memory.
However, when using libraries with `C++` implementations that manage memory without *garbage collection*, memory management must be considered carefully, because certain resources may be accessed through pointers that Python is not aware of. Once Python no longer holds references to an object, it removes it from memory, even if the `C++` library is still using it.
In subnuclear physics, this problem is encountered often, because the ROOT libraries developed by CERN for the data analysis of the large LHC experiments are written in C++ and have only recently been interfaced with Python.



## Methods (or *member functions*)

The functions `__init__` and `__del__` have special names that indicate to Python that they should be called automatically in certain circumstances. With the same syntax, functions with arbitrary names can be defined as members of the class. Clearly, these functions are not called automatically by Python, but only when explicitly called.

In [None]:
## Define the class
class MinMaxer:
  ## Define its constructor, with packed arguments
  def __init__ (self, *args):
    ## Store the packed arguments in a member variable (where ctor is the short for constructor)
    self.ctor_args = args

  ## Define a method to access the minimum
  def get_min (self):
    min = self.ctor_args[0]
    for arg in self.ctor_args:
      if arg < min:
        min = arg

    return min

  ## Define a method to access the maximum
  def get_max (self):
    max = self.ctor_args[0]
    for arg in self.ctor_args:
      if arg > max:
        max = arg

    return max


After that, we instantiate the class into a couple of variables and call the methods `get_min` and `get_max`.


In [None]:
mM16 = MinMaxer (1, 2, 3, 4, 5, 6)
mM79 = MinMaxer (9, 9, 8, 9, 7, 8)

print ("1 =", mM16.get_min())
print ("6 =", mM16.get_max())
print ("7 =", mM79.get_min())
print ("9 =", mM79.get_max())

1 = 1
6 = 6
7 = 7
9 = 9


This is the first example of a class with all the elements that made object-oriented programming a revolution in the world of computer science. Let's focus a bit more on these elements.

1. `MinMaxer` defines a constructor, where resources are acquired. In this very simple case, the resources are represented by a list of integers passed as "packed" arguments. The data is stored within the class in a member variable.
2. `MinMaxer` defines an interface (the methods `get_min` and `get_max`) that allows access to the data.

## Operators
Besides `__init__` and `__del__`, there are many other special method names in Python that allow defining methods that are called automatically in certain circumstances. These methods are typically referred to as **operators** because the "circumstances" that determine their call are often indicated by symbols that appear around the instance, which in mathematics represent operators.

Easier done than said, so let's see an example.
We define a "vector" class that takes three coordinates as arguments (limiting ourselves to a three-dimensional space). For example, we want to be able to define our vectors with the syntax

```python
vecX = Vector3d(1, 0, 0)
vecY = Vector3d(0, 1, 0)
vecZ = Vector3d(0, 0, 1)
```

At this point, we want to implement the dot product between these vectors, so that
```python
vecX * vecX
```
returns one, while
```python
vecX * vecY
```
returns zero.

So,



In [None]:
class Vector3d:
  def __init__ (self, x, y, z):
    self.x = x
    self.y = y
    self.z = z

  ## Defines the operator multiplication.
  def __mul__ (self, other):
    return self.x * other.x + self.y * other.y + self.z * other.z

vecX = Vector3d(1, 0, 0)
vecY = Vector3d(0, 1, 0)

print (vecX * vecX, "= 1")
print (vecX * vecY, "= 0")


1 = 1
0 = 0


The `*` operator placed between `vecX` and `vecY` causes the `__mul__` function to be called, even without an explicit call.

The `__mul__` function takes two arguments representing the references to the two factors involved in the multiplication. Note that the first of the two is always an instance of the class itself (`self`, the left-hand factor), while the second can be an arbitrary object, such as a real number.
That is, we expect that
```python
vecX * 3
```
would return a vector (3, 0, 0), but in this simple example an error would be generated because we have implemented the `__mul__` function with the assumption that the second factor is a vector.

Let's see an example of `Vector3d` where, instead of defining the `vector*vector` product, we define the `vector*real` product.


In [None]:
class Vector3d:
  def __init__ (self, x, y, z):
    self.x, self.y, self.z = x, y, z

  def __mul__ (self, other):
    "Multiplication with a scalar"
    return Vector3d (self.x * other, self.y * other, self.z * other)

vecX = Vector3d (1, 0, 0)
vec3X = vecX * 3
print ("3 0 0 =", vec3X.x, vec3X.y, vec3X.z)

3 0 0 = 3 0 0


Unfortunately, Python cannot know if our multiplication enjoys the commutative property or not. So, if instead of writing `vecX * 3`, we wrote `3 * vecX`, we would generate an error.
In fact, the multiplication `other * self` does not invoke the same function `__mul__` invoked by `self * other`!

 * `self * other` -> calls `__mul__`
 * `other * self` -> calls `__rmul__`

A table that reports the names of functions called depending on the syntactic context in which the reference to an instance is found can be consulted [here](https://docs.python.org/3/library/operator.html#mapping-operators-to-functions).
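
As a minimal sketch of how `3 * vecX` can be made to work, the scalar version of `Vector3d` shown above can be extended with an `__rmul__` method. Delegating to `__mul__` is a choice made here under the assumption that the product with a scalar is commutative; it would not be appropriate for a non-commutative product.

```python
class Vector3d:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __mul__(self, other):
        "vector * scalar"
        return Vector3d(self.x * other, self.y * other, self.z * other)

    def __rmul__(self, other):
        "scalar * vector: reuse __mul__, since this product commutes"
        return self * other

vecX = Vector3d(1, 0, 0)
vec3X = 3 * vecX            # int does not know Vector3d, so __rmul__ is called
print("3 0 0 =", vec3X.x, vec3X.y, vec3X.z)
```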


## Object-Oriented Programming (Introduction)

Structuring code into objects, instead of as a sequence of instructions, offers great advantages in code reuse. Indeed, a sufficiently generic object can be reused as-is in many different contexts.
Consider, for example, a class that defines a vector or a matrix, with an interface that allows calculations.

The reuse of code has led to the proliferation of libraries that address more or less specific problems, presenting an interface to the "client" application that can adapt to different contexts without needing to understand the logic used to solve various problems.

This approach to programming is called "object-oriented" and has historically represented a true revolution in the way software is conceived and distributed.
Along with the concept of an object, techniques for interacting with these objects have also been developed to address specific issues that may arise in the distribution of object-based libraries.

Among these, we will briefly discuss three important concepts and see how they have been adopted in Python. In particular, we will see:
 * the concept of *encapsulation*, which involves exposing an interface that is as generic and independent of the implementation as possible. Encapsulating the *implementation* means allowing the freedom to modify it later without having to change the interface that the object exposes to client applications. *Encapsulation* is thus a fundamental technique for ensuring *maintainability* of the code;
 * the concept of *inheritance* of a class, which involves extending the behavior of a class by adding new functionalities (or in some measure modifying existing ones). The inheritance technique takes the concept of code reuse to the extreme because it allows the construction of "inheritance trees" where all functionalities shared among even very different objects are grouped in classes progressively closer to the root, and are differentiated by functionalities described closer to the leaves.
 * the concept of *polymorphism* which allows using an object by considering only parts of its interface, even mixing different objects that still present compatible interfaces.

These concepts are not unique to Python and, in fact, were developed in compiled languages (such as C++) where they are used much more widely than in Python.


### Inheritance

Let's start by discussing the concept of inheritance with an example.

We define a class `Base` that, when instantiated, stores an integer, which we will call `datum`, and returns it through the `get_value` function. We also add a function that simply prints the original data, which we will call `dump`.


In [None]:
class Base:
  def __init__ (self, value):
    self.datum = value

  def get_value (self):
    return self.datum

  def dump (self):
    print (self.datum)

From this class, we derive two subclasses that, instead of returning the data initialized by the constructor, return it multiplied and divided by a constant, respectively, which is also defined in the constructor.

Here is the first class (which multiplies):


In [None]:
class Multiplier (Base):  # The parentheses indicate that the Multiplier class
                          # inherits from the `Base` class.
  def __init__ (self, value, multiplier):
    Base.__init__ (self, value)  # Fall back on the original constructor
    self.multiplier = multiplier

  def get_value (self):   # Override of the function `get_value`
    return self.datum * self.multiplier

Let's go through it line by line.
```
class Multiplier (Base):
```
First, we define a new class called `Multiplier` that **inherits** from the `Base` class.
The inheritance relationship means that every instance of `Multiplier` is also an instance of `Base`. Removing functionalities from a `Base` class, while possible, often results in code full of bugs and very difficult to read and maintain. The rule is that as we inherit from a class, we add functionalities so that every instance of a derived class *is also* an instance of the base class.

```
def __init__ (self, value, multiplier):
```
Here we define the constructor. We take as input a value that initializes the `Base` class, plus the multiplier.

```
  Base.__init__ (self, value)
```
With this instruction, we are "reusing" the constructor of the `Base` class, explicitly indicating it.
Note that `Base`

In [None]:
class Divider (Base):
  def __init__ (self, value, divider):
    Base.__init__ (self, value)
    self.divider = divider

  def get_value (self):
    return self.datum / self.divider

Let's proceed to instantiate all three of them:

In [None]:
base = Base(3)
mult = Multiplier(4, 2)
divi = Divider(6, 2)

## As a Base instance, base will return the argument as it is
print ("3 =", base.get_value())
## As a Multiplier instance, mult will return the product of the two arguments
print ("8 =", mult.get_value())
## As a Divider instance, divi will return the division of the first and second argument
print ("3 =", divi.get_value())


3 = 3
8 = 8
3 = 3.0


Since we did not override the `dump` function,
we can call the original version defined in `Base`, which ignores multiplication and division and simply prints the value.


In [None]:
base.dump() ## -> 3
mult.dump() ## -> 4
divi.dump() ## -> 6

3
4
6
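
The statement that every instance of a derived class *is also* an instance of the base class can be checked with the built-in functions `isinstance` and `issubclass`. The sketch below repeats a reduced `Base` to be self-contained and introduces a hypothetical subclass `Adder`, which is not part of the examples above. It also uses `super().__init__(value)`, an alternative spelling of the explicit call `Base.__init__(self, value)`.

```python
class Base:
    def __init__(self, value):
        self.datum = value

    def get_value(self):
        return self.datum

class Adder(Base):                 # hypothetical subclass, for illustration
    def __init__(self, value, offset):
        super().__init__(value)    # same effect as Base.__init__(self, value)
        self.offset = offset

    def get_value(self):           # override of get_value
        return self.datum + self.offset

adder = Adder(4, 2)
print(adder.get_value())           # sum of the two arguments
print(isinstance(adder, Adder))    # an Adder is an Adder...
print(isinstance(adder, Base))     # ...and also a Base
print(issubclass(Adder, Base))
```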


### Polymorphism and *Duck Typing*
Continuing with this example, let's try to show a case of polymorphism.
We can, for example, loop through different objects, trusting that they all expose at least a part of the interface (the one we use) in common.
In this case, we will use the `get_value` function. They are different objects, they have different implementations, but the `<instance>.get_value()` interface is common to all, so the loop works:


In [None]:
for instance in base, mult, divi:
  print (instance.get_value())
print ('---')
for instance in base, mult, divi:
  instance.dump()

3
8
3.0
---
3
4
6


In Python, as in most object-oriented languages, this behavior can be achieved with inheritance, as shown above, so that a common `Base` ensures a common interface.

In Python, however, since the existence of a method is checked only at run time, when it is called, the same effect can be achieved simply by defining the same interface in objects that share nothing.

This technique of achieving polymorphism without inheritance is called ***duck typing***.

For example,


In [None]:
class Three:
  def get(self):
    return 3

class Four:
  def get(self):
    return 4


i3, i4 = Three(), Four()
for instance in i3, i4:
  print (instance.get())

3
4
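
The price of *duck typing* is that a missing method is discovered only when the call is attempted. A minimal sketch, where `NoGetter` is a hypothetical class without a `get` method and the fallback value `None` is an arbitrary choice made for this illustration:

```python
class Three:
    def get(self):
        return 3

class NoGetter:          # hypothetical class that lacks the `get` method
    pass

results = []
for instance in Three(), NoGetter():
    try:
        results.append(instance.get())
    except AttributeError:
        # Raised at run time when the object has no `get` attribute
        results.append(None)

print(results)
```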


### Encapsulation and Properties

As previously discussed, encapsulation is a technique that allows modifying the implementation while maintaining the same interface.

Let's try to understand why it is useful and important with an example *not to follow*.

We define a new class MinMaxer that directly calculates the minimum and maximum value in the constructor and saves them in two variables `min` and `max` which are exposed to client applications to obtain, precisely, the minimum and maximum value among the arguments.


In [None]:
class MinMaxer:
  def __init__ (self, *values):
    self.min = values[0]
    self.max = values[0]
    for value in values:
      if value > self.max:
        self.max = value
      if value < self.min:
        self.min = value


Let's imagine a very complex application that uses our MinMaxer.


In [None]:
class VeryComplicatedClientApplication:
  def at_some_point (self, *args):
    maxer = MinMaxer(*args)
    print ("Max is", maxer.max)

app = VeryComplicatedClientApplication()
app.at_some_point(2, 3, 4, 5, 6)

Max is 6


At this point, we decide that we want to add the ability to define two thresholds to `MinMaxer`, so that the maximum and minimum values can never be above or below these thresholds.

With the class defined as it is, we would be forced to recalculate the minimum and maximum values every time the thresholds are updated, which can be very inconvenient in terms of computational cost.

Let's look at this alternative example:


In [None]:
class MinMaxer:
  def __init__ (self, *values, th_min=-10000, th_max=10000):
    self._min = values[0]
    self._max = values[0]
    self.th_min = th_min
    self.th_max = th_max
    for value in values:
      if value > self._max:
        self._max = value
      if value < self._min:
        self._min = value

  def get_min(self):
    if self._min > self.th_min:
      return self._min
    else:
      return self.th_min

  def get_max(self):
    if self._max < self.th_max:
      return self._max
    else:
      return self.th_max


By exposing functions as interfaces instead of member variables, we retain the ability to modify them, for example, to add new functionalities to the class.

This principle is at the core of the concept of *encapsulation*.

We note that the convention in Python is that variables not intended to be accessed from outside the class (*private*) are indicated by prefixing the variable name with an underscore (`_`).
So, looking at the defined class above, we quickly understand that:
 * `_min` and `_max` should not be read (or written) directly but are internal variables of the class.
 * `th_min` and `th_max` are public variables that we can read and modify.

Python allows implementing *encapsulation* for variables as well, through the concept of properties.

This way, we can have the advantages of *encapsulation* discussed above while maintaining the same exact interface that we have used *so extensively* in our `VeryComplicatedClientApplication`.


In [None]:
## MinMaxer
class MinMaxer:
  def __init__ (self, *values, th_min=-10000, th_max=10000):
    self._min = values[0]
    self._max = values[0]
    self.th_min = th_min
    self.th_max = th_max
    for value in values:
      if value > self._max:
        self._max = value
      if value < self._min:
        self._min = value

  @property
  def min(self):
    if self._min > self.th_min:
      return self._min
    else:
      return self.th_min

  @property
  def max(self):
    if self._max < self.th_max:
      return self._max
    else:
      return self.th_max


## Client Application
class VeryComplicatedClientApplication:
  def at_some_point (self, *args):
    maxer = MinMaxer(*args)
    print ("Max is", maxer.max)

app = VeryComplicatedClientApplication()
app.at_some_point(2, 3, 4, 5, 6)

Max is 6


Properties also allow making some variables read-only. In fact,
```
minmaxer.min = 3
```
would generate an error because we have not defined a *setter*, that is, a function that assigns a value to the property.
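
A minimal sketch of this read-only behavior, using a hypothetical class `ReadOnlyExample` introduced only for illustration. Assigning to a property without a setter raises an `AttributeError`, which is caught here so that the cell runs to the end.

```python
class ReadOnlyExample:       # hypothetical class, for illustration only
    def __init__(self, value):
        self._value = value

    @property
    def value(self):         # getter only: no setter is defined
        return self._value

obj = ReadOnlyExample(3)
try:
    obj.value = 5
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print("Assignment failed:", assignment_failed)
print("Value is still:", obj.value)
```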

As a concluding example on encapsulation with properties, let's see an example of a class that defines a circle and allows defining the radius and area, and calculating one given the other.


In [None]:
class Circle:
  def __init__ (self, radius):
    self.radius = radius

  @property
  def radius (self):
    return self._radius

  @radius.setter
  def radius (self, value):
    self._radius = value if value > 0 else 0

  @property
  def area (self):
    return 3.141 * self.radius * self.radius

  @area.setter
  def area (self, new_area):
    new_area = new_area if new_area > 0 else 0
    self._radius = (new_area/3.141)**0.5

print ("--- Circle with radius: 3 ---")
c = Circle (3)
print ("R:", c.radius)
print ("A:", c.area)
print ("--- Circle with radius: 1 ---")
c.radius = 1
print ("R:", c.radius)
print ("A:", c.area)
print ("--- Circle with area: 3 ---")
c.area = 3
print ("R:", c.radius)
print ("A:", c.area)

--- Circle with radius: 3 ---
R: 3
A: 28.269
--- Circle with radius: 1 ---
R: 1
A: 3.141
--- Circle with area: 3 ---
R: 0.9772972104898937
A: 3.0


## Static Methods and Class Methods

Besides properties, there are two other *special methods* that allow the use of methods in a non-standard way:
 * *static methods*, which do not access the instance of the class and can be called even without instantiating the class.
 * *class methods*, which access the class but not the instance.

In [None]:
class MyClass:
  def __init__ (self, name):
    self.name = name

  def standard_method (self):
    print ("instance: ", self.name)

  @classmethod
  def class_method (cls):
    print ("inherited?", cls != MyClass)

  @staticmethod
  def static_method ():
    print ("No self, no class reference")

class DerivedClass (MyClass):
  pass

## Let's instantiate the two classes
instance = MyClass("an instance")
another_instance = DerivedClass("another_instance")

## All functions can be called from instances
instance.standard_method()
instance.class_method()
instance.static_method()

## The class method has access to the class of the instance, so for example
## it can be used to check if the instance is of a certain class
another_instance.class_method()

## Static and class methods can be called without an instance, from the class
MyClass.class_method()
DerivedClass.class_method()

MyClass.static_method()

instance:  an instance
inherited? False
No self, no class reference
inherited? True
inherited? False
inherited? True
No self, no class reference
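
A common use of class methods is to provide alternative constructors, while static methods are often used for helper functions that logically belong to the class. The following is only a sketch: the method names `from_string` and `dot`, and the comma-separated input format, are assumptions made for this illustration.

```python
class Vector3d:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    @classmethod
    def from_string(cls, text):
        "Alternative constructor: build a vector from a string like '1, 0, 0'"
        x, y, z = (float(v) for v in text.split(","))
        return cls(x, y, z)      # cls is the class the method was called on

    @staticmethod
    def dot(a, b):
        "Dot product of two vectors; needs neither self nor cls"
        return a.x * b.x + a.y * b.y + a.z * b.z

v = Vector3d.from_string("1, 0, 0")
w = Vector3d.from_string("0, 2, 0")
print(Vector3d.dot(v, v), Vector3d.dot(v, w))
```

Because `from_string` builds the object through `cls`, a class derived from `Vector3d` would get instances of the derived class from the same method.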


# Built-in Objects in Python

Having discussed how objects are implemented in Python, let's look at some examples of objects in Python. These objects are an integral part of the language and are available without loading external libraries.

Here we limit ourselves to a brief summary useful for the course.
The complete documentation is available here: https://docs.python.org/3.8/library/stdtypes.html#text-sequence-type-str

## Mutable and Immutable Data Types
Data types in Python are classified into two categories: **mutable** and **immutable**.

Immutable data types are:
 * Integers (`int`)
 * Floating-point numbers (`float`)
 * Boolean variables (`bool`)
 * Text strings (`str`)
 * Tuple sequences (`tuple`)

While mutable data types are:
 * Lists (`list`)
 * Dictionaries or hash tables (`dict`)
 * Sets of unique or non-repeated variables (`set`)
 * Instances of classes (see above)

Before considering each of these types in detail, let's focus on the difference between mutable and immutable data types.

Let's create a function `add` that takes two input arguments and adds the second argument to the first one, without returning anything.


In [None]:
def add (arg1, arg2):
  arg1 += arg2

When calling this function with immutable data, such as integers or strings, the variables passed as arguments are not modified.


In [None]:
# Example with integers
a, b = 1, 2
add (a, b)
print (a)

# Example with strings
a = "hello"
b = "world"
add (a, b)
print (a)

1
hello


With mutable data types, however, the variables used
as arguments are modified. Even outside the function.

Let's see an example with lists.


In [None]:
a = list(['hello'])
b = list(['world'])
add (a, b)          ## Notice: for two lists, the += operator used in `add` appends the second list to the first
print (a)

['hello', 'world']


In this second case, the variable `a` gets modified!

**Why is the behavior different?**
The reason is that immutable data types are considered small objects that can be copied every time they are modified. For example, when we write
```
a = 42
a += 1
```
an object holding the value 42 is allocated and linked to the symbol `a`, then a second object is allocated holding the value 43, and finally this new object is "linked" to the symbol `a`. If there are no more active references to the original object (42), it is removed from memory according to the *garbage collection* mechanism discussed earlier.
In the case discussed above, a reference to the original variable still exists after it has been incremented (outside the `add` function), so it is not deallocated, nor is it modified.

In Python, the `id` function returns the identifier of the memory area where an object is stored. Using it, we can verify that the variable `a` refers to two different memory locations before and after its increment (the actual numbers change from one session to another):


In [None]:
a = 42
print (id(a))
a += 1
print (id(a))

10752168
10752200
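
The same check can be repeated side by side for a mutable and an immutable type. This is a small sketch with hypothetical variable names; only the comparison of the identifiers matters, not their numerical values.

```python
# Mutable: += modifies the list in place, so the identifier is unchanged
a_list = [1, 2]
list_id_before = id(a_list)
a_list += [3]
list_same_object = (id(a_list) == list_id_before)

# Immutable: += builds a new integer and rebinds the name to it
an_int = 1000
int_id_before = id(an_int)
an_int += 1
int_same_object = (id(an_int) == int_id_before)

print("list is the same object:", list_same_object)
print("int is the same object: ", int_same_object)
```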


These differences between **mutable** and **immutable** variables should be taken into consideration especially when working with functions and classes, because passing a mutable variable as an argument implicitly authorizes that class or function to modify that object, and this can have unexpected implications.
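
One way to avoid such side effects, sketched here with two hypothetical functions, is to build and return a new object instead of modifying the argument in place:

```python
def add_in_place(arg1, arg2):
    arg1 += arg2           # for lists, this modifies the caller's object

def add_copy(arg1, arg2):
    return arg1 + arg2     # builds a new list; arg1 is left untouched

original = ["hello"]
result = add_copy(original, ["world"])
print(original)            # unchanged
print(result)

add_in_place(original, ["world"])
print(original)            # now modified
```

With `add_copy`, the caller decides explicitly whether to overwrite the original variable with the returned value.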

The difference is so important that Python has defined two data types that are practically identical except for the fact that one is mutable and the other is not: *lists* (mutable) and *tuples* (immutable).


## Strings
Text strings, or simply strings, represent fragments of text. Working with strings is fundamental for interacting with human beings.


In Python, strings can be defined indifferently
 * `with 'single quotes'`, or
 * `with "double quotes"`

When defined this way, the text strings must be closed on the same line in which they are opened. That is,
```
"this string is valid"
"this string
 is not valid"
```

**You can** define multi-line strings with triple quotes, for example:
```
"""This is
a valid string"""

''' and so is
this one'''

""" and this one, even though it is on a single line"""
```

Being able to define strings with an arbitrary choice of the four opening and closing symbols allows you to use the other quote character within the string. For example, we can write a string like:
```
'In English, the word "Monday" translates to "Monday".'
```
while
```
"In English "Tuesday" translates to "Tuesday""
```
would result in an error because the string would be closed by the double quote between "In English" and "Tuesday".

By using the four symbols, you can create strings that push the limits, like:
```
"""Writing in Python: '''The translation of "the word 'Wednesday' translates to 'Wednesday'" is "the word 'Wednesday' translates to 'Wednesday'."""
```
Although not particularly suitable for interaction with humans, this nesting of strings is very useful when generating snippets of Python code, for example, to interact with databases.

Here are some examples of assigning strings to a variable:


In [None]:
a_string = "This is a string"
another_string = 'This is a string'
a_third_string = '''This is a different string'''
a_fourth_string = 'This is a string and its type is "str"'

### Characters and Substrings (*Indexing* and *Slicing*)
As in other programming languages, square brackets can be used to access portions of a string, for example:


In [None]:
a = "s1234"
print ("s =", a[0])
print ("4 =", a[4])

s = s
4 = 4


However, since strings are immutable, you cannot replace a character in a string, and the code
```
a[3] = "4"
```
would cause an error (`'str' object does not support item assignment`).


Python also supports the use of negative indices, which are interpreted as *starting from the last character*. For example, `a[-1]` is the last character of the string `a`, `a[-2]` is the second to last, `a[-3]` is the third to last, and so on.

In [None]:
a = "hello world"
print (a[-1], a[-2], a[-3])

d l r


Square brackets can also be used to identify portions of a string through an operation called ***slicing***, to distinguish them from single character selection operations typically referred to as ***indexing***.
To select the substring that starts at index `start` and ends just before index `stop`, in steps of `step`, use the expression
`string[start:stop:step]`. If not explicitly indicated (and for a positive `step`), `start` defaults to the beginning of the string, `stop` to its end, so that the last character is included, and `step` to 1.

Let's see some examples:

In [None]:
a = "Large Hadron Collider"
print (a[:5])
print (a[6:-9])
print (a[-8:])

Large
Hadron
Collider
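
The examples above leave `step` at its default. A short sketch of what happens when it is specified, including the common idiom of a negative step to reverse a string:

```python
a = "Large Hadron Collider"

every_other = a[0:5:2]    # characters 0, 2 and 4 of "Large"
reversed_a = a[::-1]      # negative step: walk the string backwards
full_copy = a[:]          # all defaults: the whole string

print(every_other)
print(reversed_a)
print(full_copy == a)
```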


### String Comparison

In Python, you can use the operators `==`, `>` and `<` to compare strings. The meaning of the `==` operator is probably obvious, while the *greater than* and *less than* operators refer to the "ASCII alphabetical order". As in the traditional alphabetical order, you start from the first letter and sort so that words that begin with letters that appear earlier in the alphabet come first in the sorted list. However, unlike the standard alphabetical order, in Python, uppercase letters always come before lowercase letters.

**To learn more, see:** [ASCII Character Order](http://support.ecisolutions.com/doc-ddms/help/reportsmenu/ascii_sort_order_chart.htm)


In [None]:
print ("best" > "art")  ## "b" appears after in the alphabet, so it is "larger" than "a"
print ("Best" < "art")  ## but "B", being an upper case letter appears before any lower case letter, including "a"
print ("Best" > "Art")  ## Standard alphabetical order is restored when both are uppercase
print ("Best" > "Beast") ## If the first letter is equal, ordering continues with the second one and so on

print ("Best" == "Best") ## Similarly, comparison with '==' is case sensitive
print ("Best" != "best")

True
True
True
True
True
True
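
The ordering used in these comparisons can be inspected with the built-in function `ord`, which returns the numerical code of a single character. A brief sketch:

```python
code_B = ord("B")
code_a = ord("a")
print(code_B, code_a)          # uppercase letters have smaller codes
print(code_B < code_a)

# sorted() applied to a string sorts its characters by their codes
sorted_chars = sorted("bBaA")
print(sorted_chars)
```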


Thanks to the definition of the operators `<` and `>`, we can use the `sorted` function that we have already seen as an extension of the `for` loop to sort alphabetically (ASCII).

For example,


In [None]:
authors = "Duck, Donald", "Mouse, Mickey", "Goofy"

for author in sorted(authors):
  print (author)

Duck, Donald
Goofy
Mouse, Mickey


#### Uppercase and Lowercase

The `upper` and `lower` functions allow converting a string to its uppercase and lowercase counterpart, respectively.


In [None]:
print ("Hello world!".lower())
print ("Hello world!".upper())

hello world!
HELLO WORLD!


In [None]:
words = ["art", "beauty", "Charm"]

print ("Ordering with ASCII alphabet:")
print (", ".join (sorted(words)))
print ()
print ("Ordering with upper ASCII alphabet:")
print (", ".join (sorted(words, key=str.upper)))

Ordering with ASCII alphabet:
Charm, art, beauty

Ordering with upper ASCII alphabet:
art, beauty, Charm


### Concatenating and Splitting Strings
Strings in Python can be concatenated using the operators `+` and `+=`. For example, we can write,


In [None]:
name = "Mickey"
surname = "Mouse"
full_name = name + " " + surname
print ("Full Name:", full_name)

Full Name: Mickey Mouse


or

In [None]:
full_name = name
full_name += " "
full_name += surname
print ("Full Name:", full_name)

Full Name: Mickey Mouse


However, for very long strings, or an arbitrary number of strings, it may not be very convenient.
We can use the `join` method of the `str` class to concatenate a sequence of strings, separated by a delimiter.
For example,


In [None]:
name_and_surname = 'Donald', 'Duck'
print (' '.join(name_and_surname))  ## Here, the separator is a "white space"

surname_and_name = 'Duck', 'Donald'
print (", ".join(surname_and_name)) ## Here the separator is ", "

Donald Duck
Duck, Donald


The `join` method can be used, for example, to concatenate a directory to a filename:
```
separator = "\\" if windows else "/"   # the backslash must be escaped inside a string
path_and_file = path_to_dir, filename
path = separator.join(path_and_file)
```

Or in combination with the newline character indicated as `\n`:
```
lines = ["first line", "second line", "third line"]
multiline_string = "\n".join(lines)
```


In [None]:
crucial_apps = "chrome", "firefox", "vim", "openssh", "python3.8+"
print ("\n".join(crucial_apps))

chrome
firefox
vim
openssh
python3.8+
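
For file paths specifically, the standard library already provides a function that picks the right separator for the operating system, `os.path.join`. The directory and file names in this sketch are hypothetical.

```python
import os

path_to_dir = "data"          # hypothetical directory name
filename = "run.root"         # hypothetical file name

# os.path.join inserts the separator appropriate for the current platform
path = os.path.join(path_to_dir, filename)
print(path)

# The same result, built by hand with str.join and os.sep
manual_path = os.sep.join((path_to_dir, filename))
print(path == manual_path)
```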


The reverse operation of `join`, which concatenates strings, is `split`, which allows splitting a string into multiple substrings based on a delimiter.
For example,


In [None]:
surname, name = "Mouse, Mickey".split(', ')
print (name[0] + ". " + surname)

M. Mouse


#### String Formatting
String formatting allows inserting numerical variables into a string according to various conversion rules (e.g., base, or number of significant digits).

Starting from Python 3.6, strings can be formatted with the following syntax:
```
f"<string> {<variable>:<format>} <string>"
```
where `f` before the double quotes indicates that in this string, curly braces denote the presence of variables to be formatted in the string, `<string>` indicates generic text strings that compose the string,
`<variable>` is the variable we want to appear in the string with the format `<format>`. Once again, easier done than said.

The format can be omitted, making our `f`*-string* particularly readable.
Let's see an example,
```
name = "Alice"
age = 30
greeting = f"Hello, {name}! You are {age} years old."
print(greeting)
```


In [None]:
v = 16
## Base
print (f"{v}") ## plain, decimal formatting
print (f"{v:d}") ## explicit decimal base
print (f"{v:x}") ## hex formatting
print (f"{v:o}") ## octal formatting
print (f"{v:b}") ## binary formatting

16
16
10
20
10000


In [None]:
## As a floating point (see later for more details)
print (f"{v:f}")

16.000000


In [None]:
## padding
print (f"{v:03d}") ## at least three digits, padding the rest with zeros
print (f"{v:05d}") ## at least five digits
print (f"{v:5d}")  ## five chars, padding with white spaces at left
print (f"{v:<5d}.") ## five chars, padding with spaces at right
print (f"{v:^5d}") ## five chars, trying to center the number in the field

016
00016
   16
16   .
 16  


For real numbers, the formatting rules are the same, but we can also define how many decimal places should appear and whether we want to use scientific notation (useful for large numbers).


In [None]:
 # a decent approximation for pi

pi = 3.141592653589793238462643383279

print (f"""Formatting pi:
  - bare format:              {pi}
  - explicit floating point:  {pi:f}
  - with fixed precision:     {pi:.2f}
  - with scientific notation: {pi:.2e}
  - with scientific notation: {pi:.2E}
  - general format:           {pi:.3g}  {pi*1e5:.3g} (note that here .3 indicates the number of significant, rather than decimal, digits)
  """)

Formatting pi:
 - bare format:              3.141592653589793
 - explicit floating point:  3.141593
 - with fixed precision:     3.14
 - with scientific notation: 3.14e+00
 - with scientific notation: 3.14E+00
 - general format:           3.14  3.14e+05 (note that here .3 indicates the number of significant, rather than decimal, digits)
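
Width, alignment and precision can be combined in the same format, which is handy for printing aligned tables. The sketch below is only an illustration: the tuple `masses` and the column widths are arbitrary choices, and the masses (in MeV/c2) are approximate values quoted just to have some numbers to format.

```python
# Approximate masses in MeV/c2, used here only to illustrate the formatting
masses = (("electron", 0.51099895), ("muon", 105.6583755), ("pi+", 139.57039))

# name: left-aligned in 10 chars; mass: right-aligned, fixed and scientific notation
rows = tuple(f"{name:<10s}|{mass:>10.3f}|{mass:>12.3e}" for name, mass in masses)

for row in rows:
  print(row)
```

Every row has the same total length, so the columns line up regardless of the magnitude of the numbers.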
 


#### Solved Exercise
Given the list of characters:
```
Mickey Mouse, Donald Duck, Minnie Mouse
```
write a program that constructs a string
```
"""
1) M. Mouse
2) D. Duck
3) M. Mouse
"""
```
and then prints it on the screen.

**Solution**


In [None]:
characters = "Mickey Mouse, Donald Duck, Minnie Mouse"
characters = characters.split (", ")

string = ""

for iChar, char in enumerate(characters):
  name, surname = char.split(" ")
  string += f"{iChar+1}) {name[0]}. {surname}\n"

print (string)

1) M. Mouse
2) D. Duck
3) M. Mouse



## Tuple
Tuples represent non-modifiable (immutable) sequences of values.

The *Packing* and *Unpacking* of arguments, discussed in the context of functions, works by constructing and reading `tuples` composed of those arguments.

In fact,


In [None]:
def a_function (*args):
  print (type(args))

a_function("Hello", "World")

<class 'tuple'>


or simply

In [None]:
a_sequence = "Hello", "World"
print(type(a_sequence))

<class 'tuple'>


Tuples can also be defined within the same line using parentheses, for example, in the call to a function with a single argument we can write:


In [None]:
def a_function (single_argument):
  print (type(single_argument))

a_function ( ("Hello", "World") )

<class 'tuple'>


The built-in `tuple` type can be called like a function to convert other *iterables* into `tuples`. For example, we have seen that `range` can be used in combination with the `for` loop to iterate over ordered sequences of integers. We can convert those sequences into tuples by writing:


In [None]:
tuple(range(10))

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

similarly, a string can be converted into a tuple


In [None]:
tuple ("Hello world")

('H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd')

Of course, a tuple can also be converted into another tuple, and although it is unnecessary, it is often done to improve code readability. For example,


In [None]:
a = tuple((1,2,3,4))

first creates a tuple `(1,2,3,4)` and then explicitly converts it into a `tuple`.
This allows us to immediately see that `a` is a tuple (and therefore immutable).
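
To see what "immutable" means in practice, the following sketch tries to overwrite an element of a tuple. The `try`/`except` construct is used here only to catch the error and keep the cell running; the name `outcome` is illustrative.

```python
a = tuple((1, 2, 3, 4))

try:
  a[0] = 10                       # item assignment is not allowed for tuples
except TypeError as error:
  outcome = f"TypeError: {error}"
else:
  outcome = "the tuple was modified"

print(outcome)
print(a)                          # the tuple is unchanged
```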


#### Simple operations with `tuples`
We can count the number of elements that make up a `tuple` with the `len` function


In [None]:
len((1,"Mickey Mouse", 2.3))

3

We can sum the values it contains using `sum`


In [None]:
one_to_four = (1,2,3,4)
print (type(one_to_four), ",", sum(one_to_four))

<class 'tuple'> , 10


And of course, we can define a loop over the elements of a tuple


In [None]:
one_to_four = tuple((1,2,3,4))
for a in one_to_four:
  print (a)

1
2
3
4


#### The `in` operator
The `in` operator can be used to check whether an element is present in a tuple or not.
For example:


In [None]:
deny_list = ("Mickey Mouse", "Donald Duck")
allow_list = ("Minnie Mouse",)  ## the trailing comma is required: ("Minnie Mouse") would be a plain string

for char in "Mickey Mouse", "Minnie Mouse", "Donald Duck", "Goofy":
  if char in deny_list:
    print(f"Access explicitly denied to {char}")
  elif char in allow_list:
    print(f"Access granted to {char}")
  else:
    print(f"Unknown user {char}. Please proceed to registration to use our service.")


Access explicitly denied to Mickey Mouse
Access granted to Minnie Mouse
Access explicitly denied to Donald Duck
Unknown user Goofy. Please proceed to registration to use our service.
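
A common pitfall with `in` concerns tuples with a single element: parentheses alone do not create a tuple, the comma does. The sketch below (with illustrative variable names) shows how the meaning of `in` changes between a string and a one-element tuple.

```python
not_a_tuple = ("Minnie Mouse")     # parentheses only: this is a plain string
a_tuple = ("Minnie Mouse",)        # the trailing comma makes it a tuple

print(type(not_a_tuple), type(a_tuple))

# With a string, `in` looks for substrings; with a tuple, for whole elements
print("Minnie" in not_a_tuple)     # True: "Minnie" is a substring
print("Minnie" in a_tuple)         # False: no element equals "Minnie"
```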


#### Nested Tuples
Since objects in Python can be treated regardless of their type, we can build tuples of complex objects as desired, including other tuples.
Building tuples of tuples is known as **nesting** (*nested tuples*).

In the example below, we build a nested tuple and then loop through the outermost tuple, unpacking the elements of each of the inner tuples in the same (`for`) statement:


In [None]:
characters = (
    ("Mickey", "Mouse"),
    ("Donald", "Duck"),
    ("Minnie", "Mouse")
)

for name, surname in characters:
  print (name, surname)

Mickey Mouse
Donald Duck
Minnie Mouse


#### *Indexing* and *Slicing*
Regarding *indexing* and *slicing*, the same rules as for strings apply.

Example of *indexing*:


In [None]:
doubles = (0, 2, 4, 6, 8, 10, 12)
for i in 3, 5:
  print (f"Doubling {i}, I got {doubles[i]}")

Doubling 3, I got 6
Doubling 5, I got 10


Similarly, slicing works just like it does for strings and allows selecting elements from a tuple by writing


In [None]:
one_to_ten = tuple(range(1,11))
print (f"The original tuple: {one_to_ten}\n")

print (f"First: {one_to_ten[0]}")
print (f"Second and third: {one_to_ten[1:3]}")
print (f"All but last one: {one_to_ten[:-1]}")
print (f"The last two: {one_to_ten[-2:]}")
print (f"Odd only: {one_to_ten[::2]}")
print (f"Even only: {one_to_ten[1::2]}")

The original tuple: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

First: 1
Second and third: (2, 3)
All but last one: (1, 2, 3, 4, 5, 6, 7, 8, 9)
The last two: (9, 10)
Odd only: (1, 3, 5, 7, 9)
Even only: (2, 4, 6, 8, 10)
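
The third number in a slice is the step, and it may also be negative. As a small additional sketch (the names `reversed_tuple` and `every_third` are illustrative):

```python
one_to_ten = tuple(range(1, 11))

reversed_tuple = one_to_ten[::-1]      # negative step: walk the tuple backwards
every_third = one_to_ten[::3]          # step of 3, starting from the first element

print(reversed_tuple)
print(every_third)
```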


#### *Tuple comprehension*
A tuple can be built from another iterable, transforming its elements on the fly, in a single expression that we will call *tuple comprehension*.


In [None]:
original = 1,2,3,4,5,6,7,8
doubles = tuple(x * 2 for x in original)
print ("Doubles", doubles)

Doubles (2, 4, 6, 8, 10, 12, 14, 16)


Here is a second example where we use comprehension to define a nested tuple from two *simple* tuples


In [None]:
l1 = 1,2,3,4
l2 = 2,4,6,8
l12 = tuple((l1[i], l2[i]) for i in range(len(l1)))
for a, b in l12:
  print (a,b )

1 2
2 4
3 6
4 8


Actually, the *pythonic* way to do this operation is by using `zip` as we saw in the case of for loops:


In [None]:
for a, b in tuple (zip(l1, l2)):
  print (a,b)

1 2
2 4
3 6
4 8


The syntax of *tuple comprehension* also includes the use of conditions, in the form
```
tuple ( <operation> for <variable[s]> in <iterator> if <condition> )
```

so for example, we can write


In [None]:
input_tuple = 1,5,2,6,2,5,7,8,3,4,7,8,9,3
print ("Elements less than 5", tuple (x for x in input_tuple if x < 5))
print ("and their double", tuple (2*x for x in input_tuple if x < 5))

Elements less than 5 (1, 2, 2, 3, 4, 3)
and their double (2, 4, 4, 6, 8, 6)
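
Strictly speaking, the expression between the parentheses is a *generator expression*, and it is the call to `tuple` that consumes it and stores the results. The sketch below separates the two steps to make this visible; the names `lazy` and `doubled` are illustrative.

```python
input_tuple = 1, 5, 2, 6, 2, 5, 7, 8, 3, 4, 7, 8, 9, 3

lazy = (2 * x for x in input_tuple if x < 5)   # nothing is computed yet
print(type(lazy).__name__)

doubled = tuple(lazy)                          # the values are computed here
print(doubled)

print(tuple(lazy))                             # a generator can be consumed only once
```

The last line prints an empty tuple: once a generator has been read to the end, it has nothing left to yield.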


#### An important application of tuple comprehension (the histogram)
An application that we will often use, in one form or another, is the one that allows us to loop over numerical ranges given a list of thresholds.

For example, we may be interested in classifying the numerical values in a certain tuple according to the interval in which they fall.

Let's imagine we want to count how many values fall between 0 and 2, how many between 2 and 5, and how many between 5 and 10 in the tuple
(1, 1, 1, 4, 3, 3, 1, 7, 8)

We can set up the problem like this:


In [None]:
input_tuple = (1, 1, 1, 4, 3, 3, 1, 7, 8)
boundaries = (0, 2, 5, 10)
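## `low` and `high` are used instead of `min` and `max` to avoid hiding the built-in functions with the same names
bins = tuple((low, high) for low, high in zip(boundaries[:-1], boundaries[1:]))
for low, high in bins:
  print (f"Entries in interval {low}--{high}: {len(tuple(x for x in input_tuple if x >= low and x < high))}")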

Entries in interval 0--2: 4
Entries in interval 2--5: 3
Entries in interval 5--10: 2


which can be further condensed into


In [None]:
input_tuple = (1, 1, 1, 4, 3, 3, 1, 7, 8)
boundaries = (0, 2, 5, 10)

for low, high in zip (boundaries[0:-1], boundaries[1:]):
  print (f"Entries in interval {low}-{high}: {len(tuple(x for x in input_tuple if x >= low and x < high))}")

Entries in interval 0-2: 4
Entries in interval 2-5: 3
Entries in interval 5-10: 2
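
Since this pattern will come back often, it can be convenient to wrap it in a function. The following is only a sketch: the function name `histogram` and its arguments are our own choice, not part of any library.

```python
def histogram(values, boundaries):
  "Count the values falling in each half-open interval [low, high)"
  return tuple(
      sum(1 for x in values if low <= x < high)
      for low, high in zip(boundaries[:-1], boundaries[1:])
  )

counts = histogram((1, 1, 1, 4, 3, 3, 1, 7, 8), (0, 2, 5, 10))
print(counts)
```

Here `sum(1 for ...)` counts the matching entries without building an intermediate tuple, and the chained comparison `low <= x < high` is equivalent to `x >= low and x < high`.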


#### Worked Exercise: the Sieve of Eratosthenes

The Sieve of Eratosthenes is an algorithm introduced around 2300 years ago for calculating prime numbers ([Wikipedia](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)).

A simplified version of its idea, discarding every number that can be written as a product of two integers greater than 1, can be written in Python in a single line


In [None]:
print (tuple(x for x in range(2,1001) if x not in tuple(x*y for x in range(2,1001) for y in range(2,100))))

(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997)


***Deep Dive*** *(a bit challenging)*: If you are interested in exploring further, try to understand, by searching the Internet, why removing the word "tuple" from the inner expression makes the algorithm much more efficient.

*Hint*: look up what a generator is


In [None]:
print (tuple(x for x in range(2,1001) if x not in (x*y for x in range(2,1001) for y in range(2,100))))

(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997)
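
One way to investigate the difference is to count how many products are actually computed before `in` gives its answer. The sketch below is an illustration under our own assumptions: `products` is a hypothetical helper written as a generator function (note the `yield`), and a one-element list is used as a counter that the function can modify.

```python
def products(limit, counter):
  "Yield x*y for 2 <= x, y < limit, counting how many values are produced"
  for x in range(2, limit):
    for y in range(2, limit):
      counter[0] += 1
      yield x * y

lazy_counter = [0]
found_lazy = 4 in products(100, lazy_counter)            # stops at the first match

eager_counter = [0]
found_eager = 4 in tuple(products(100, eager_counter))   # builds everything first

print(found_lazy, lazy_counter[0])
print(found_eager, eager_counter[0])
```

Both searches find the value, but the first one stops as soon as a match is produced, while the second one computes and stores every product before the search even starts.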


## Lists
Lists are data types very similar to tuples, but they are modifiable. As we have seen, this exposes them to the risk of being unexpectedly modified by some functions, but it also makes them considerably more versatile tools.


Just like with `tuples`, let's start with the definition. Lists are defined by square brackets, for example:


In [None]:
my_list = [1,2,3,4]
print (type(my_list))

<class 'list'>


The built-in function used to convert other sequences, for example tuples, into lists is `list`


In [None]:
t = (1,2,3)
l = list(t)
print (type(t), "->", type(l))
print (list("hello"))

<class 'tuple'> -> <class 'list'>
['h', 'e', 'l', 'l', 'o']


Everything discussed for tuples also applies to lists.
For example, we can sum a list generated with *list comprehension* as follows


In [None]:
sum([2**n for n in range(0,3)])   ### Note: in Python the operator ** means "power"

7

A small difference is that *list comprehension* does not require calling `list`, because the square brackets make it unambiguous that we are defining a list. The syntax seen for `tuples`, with `list` in place of `tuple`, is still valid:


In [None]:
my_list = list(x for x in range(10) if x > 2)
print (my_list)

[3, 4, 5, 6, 7, 8, 9]


As with tuples, lists can also be unpacked.


In [None]:
name, surname = ["Mickey", "Mouse"]
print (surname)

Mouse


#### Modifying a List
Unlike tuples, lists can be modified in place. For example, we can concatenate two lists into a new one using the `+` operator, or extend an existing list using `+=`.


In [None]:
l1 = list(2**n for n in range(3))
print (l1)
l1 += list(2**n for n in range(3,10))
print (l1)
print (l1 + l1)

[1, 2, 4]
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
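
The difference between the two operators becomes visible when the same list is known under two names. In this sketch (with illustrative names `original` and `alias`), `+=` modifies the existing list, while `+` builds a new one.

```python
original = [1, 2, 4]
alias = original                # a second name for the same list object

original += [8]                 # in place: the alias sees the change
print(alias)

original = original + [16]      # builds a new list and rebinds the name
print(original)
print(alias)                    # still the old object, without 16
```

This is the same mechanism by which a function can unexpectedly modify a list passed as an argument.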


Similarly, an element can be added to the end of the list using `append` in a frequently used construct.


In [None]:
def a_very_complex_calculation (input):
  return (0.1 + input) ** -1.2 - input*3 + 2

results = []
for i in range(5):
  res = a_very_complex_calculation(i)
  results.append (res)

print (results)

[17.848931924611133, -0.10807409479430508, -3.589477612191418, -6.742743730617972, -9.816067119617623]


#### Removing Elements from a List
To remove elements from a list, use the syntax:
```
del <list>[<index>]
```
For example,


In [None]:
l = list (i * 2 for i in range(10))
print (l)
del l[2]  ## drops the element at index 2 (the third element of the list)
print (l)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
[0, 2, 6, 8, 10, 12, 14, 16, 18]


To remove a value from a list, you can obtain its position using the `index` method and then immediately remove the value at that position.

Of course, since `index` returns only the first position where a certain value appears, this method removes only the first occurrence of that value:


In [None]:
mylist = list(range(5)) + list(range(5))
print ("Full list:", mylist)
del mylist [ mylist.index (3) ]
print ("Drop the first occurrence of '3':", mylist)

while 4 in mylist:
  del mylist [mylist.index(4)]
print ("Drop all occurrences of '4':", mylist)

Full list: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
Drop the first occurrence of '3': [0, 1, 2, 4, 0, 1, 2, 3, 4]
Drop all occurrences of '4': [0, 1, 2, 0, 1, 2, 3]


Almost always, *list comprehension* is preferred in these cases (and also has the advantage of working with tuples as well).


In [None]:
mylist = list(range(5))*2   ## Note the usage of the multiplication operator. When applied to a list and an integer n, it repeats the list content n times.
print ("Full list", mylist)
print ("Drop all occurrences of 4:", [value for value in mylist if value != 4])
print ("Drop the first occurrence of 3:", [value for index, value in enumerate(mylist) if index != mylist.index(3)])

Full list [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
Drop all occurrences of 4: [0, 1, 2, 3, 0, 1, 2, 3]
Drop the first occurrence of 3: [0, 1, 2, 4, 0, 1, 2, 3, 4]
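
For completeness, lists also provide the methods `remove` and `pop`, which modify the list in place. A minimal sketch (variable names are illustrative):

```python
mylist = list(range(5)) * 2

mylist.remove(3)                # drops the first occurrence of the value 3
print(mylist)

last = mylist.pop()             # drops and returns the last element
third = mylist.pop(2)           # drops and returns the element at index 2
print(last, third, mylist)
```

Unlike comprehension, these methods are not available for tuples, because they change the sequence they are applied to.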


## Dictionaries
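*Dictionaries* are Python's implementation of **hash tables**: data types consisting of a collection of key-value pairs in which each value is looked up by its key rather than by its position. Since Python 3.7, dictionaries preserve the order in which the keys were inserted.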


To define a dictionary, use the syntax:
```
d = {<key1>: <value1>, <key2>: <value2>, ... }
```
or, when the keys are strings that are also valid Python names,
```
d = dict(key1 = value1, key2 = value2)
```

For example,


In [None]:
d = {"Mickey": "mouse", "Minnie": "mouse", "Goofy": "dog", "Donald": 'duck' }

print (d)

{'Mickey': 'mouse', 'Minnie': 'mouse', 'Goofy': 'dog', 'Donald': 'duck'}


You can access a key-value pair by key using the syntax:
```
<dictionary>[<key>]
```
for example


In [None]:
print (f"Mickey is a {d['Mickey']}")

Mickey is a mouse


As with lists and tuples, you can use the `in` operator to check whether a certain key is present in the dictionary (or a certain value, by searching in `d.values()`).


In [None]:
print ("Is Mickey present in d?", 'Mickey' in d)
print ("Is Clarabelle present in d?", 'Clarabelle' in d)
print ("Is there any mouse in d?", 'mouse' in d.values())
print ("Is there any cat in d?", 'cat' in d.values())

Is Mickey present in d? True
Is Clarabelle present in d? False
Is there any mouse in d? True
Is there any cat in d? False
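
Checking with `in` matters because accessing a missing key with square brackets raises a `KeyError`. As a sketch of a common alternative, the `get` method returns a default value instead:

```python
d = {"Mickey": "mouse", "Minnie": "mouse", "Goofy": "dog", "Donald": "duck"}

try:
  d["Clarabelle"]                          # missing key: raises KeyError
except KeyError:
  print("Clarabelle is not in the dictionary")

print(d.get("Clarabelle"))                 # None when the key is missing
print(d.get("Clarabelle", "unknown"))      # or a default of our choice
print(d.get("Goofy", "unknown"))           # existing keys behave as usual
```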


#### Iterating over Dictionaries
Once defined, we can iterate over the elements of a dictionary. Since Python 3.7 the elements are visited in insertion order; in older versions the order is not guaranteed!

We can iterate over the keys:


In [None]:
for character in d.keys():
  print (character)

Mickey
Minnie
Goofy
Donald


or over the values


In [None]:
for specie in d.values():
  print (specie)

mouse
mouse
dog
duck


or over the key-value pairs


In [None]:
for char, specie in d.items():
  print (f'Character "{char}" is a {specie}')

Character "Mickey" is a mouse
Character "Minnie" is a mouse
Character "Goofy" is a dog
Character "Donald" is a duck


Similarly, dictionaries can appear in the comprehensions discussed above:


In [None]:
print (", ".join(char for char in d.keys()))

Mickey, Minnie, Goofy, Donald


#### Modifying Dictionaries
The most common way to modify an existing dictionary is by adding (or overwriting) a key-value pair. For example,


In [None]:
d['Clarabelle'] = 'cow'
print (d)
d['Clarabelle'] = 'bull'
print (d)

{'Mickey': 'mouse', 'Minnie': 'mouse', 'Goofy': 'dog', 'Donald': 'duck', 'Clarabelle': 'cow'}
{'Mickey': 'mouse', 'Minnie': 'mouse', 'Goofy': 'dog', 'Donald': 'duck', 'Clarabelle': 'bull'}


Two dictionaries `d1` and `d2` can be combined with `d1.update(d2)`, which gives priority to `d2` if a key appears in both dictionaries.


In [None]:
french_french = {1: 'un', 2: 'deux', 3: 'trois', 80: 'quatrevingt', 90: 'quatrevingt-dix'}
swiss_french  = {1: 'un', 4: 'quatre', 80: 'huitante', 90: 'nonante'}

print ("Use a French, French word if that is not defined in Swiss French dict")
french_french.update(swiss_french)

print (french_french)


Use a French, French word if that is not defined in Swiss French dict
{1: 'un', 2: 'deux', 3: 'trois', 80: 'huitante', 90: 'nonante', 4: 'quatre'}


As an important note, `update` modifies the original dictionary! To avoid this, you can define a new dictionary, for example, using dictionary comprehension (discussed below).
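
Another option, shown here as a sketch with shortened versions of the dictionaries above, is to unpack both dictionaries into a new one with the `**` syntax. This is a different technique from `update`: it leaves both inputs untouched.

```python
french_french = {1: 'un', 2: 'deux', 80: 'quatrevingt'}
swiss_french  = {1: 'un', 80: 'huitante', 90: 'nonante'}

# Unpack both dictionaries into a new one; on repeated keys the last one wins
merged = {**french_french, **swiss_french}

print(merged)
print(french_french)     # the original dictionary is left untouched
```

From Python 3.9 on, the same result can be obtained with `french_french | swiss_french`.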


#### Removing Key-Value Pairs

To remove a key (and its value) from a dictionary, use the keyword `del`. For example,
```
del <dictionary>[<key>]
```


In [None]:
mydict = {"Mickey": "mouse", "Minnie": "mouse", "Goofy": "dog", "Donald": 'duck', "Clarabelle": 'cow' }
print (mydict)
del mydict['Clarabelle']
print (mydict)

{'Mickey': 'mouse', 'Minnie': 'mouse', 'Goofy': 'dog', 'Donald': 'duck', 'Clarabelle': 'cow'}
{'Mickey': 'mouse', 'Minnie': 'mouse', 'Goofy': 'dog', 'Donald': 'duck'}


Multiple elements can also be removed iteratively from a dictionary, but not while iterating over it. For example, you can save them in a list and remove them later:


In [None]:
mydict = {"Mickey": "mouse", "Minnie": "mouse", "Goofy": "dog", "Donald": 'duck', "Clarabelle": 'cow' }
to_be_dropped = list()

for key, value in mydict.items():
  if value == 'mouse':
    to_be_dropped.append (key)

for key in to_be_dropped:
  del mydict[key]

print (mydict)

{'Goofy': 'dog', 'Donald': 'duck', 'Clarabelle': 'cow'}
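
To see why the removal cannot happen during the loop, the sketch below first tries it and catches the resulting error, then repeats the loop over a snapshot of the keys obtained with `list`. The name `error_message` is illustrative.

```python
mydict = {"Mickey": "mouse", "Minnie": "mouse", "Goofy": "dog"}
error_message = None

try:
  for key in mydict:
    if mydict[key] == "mouse":
      del mydict[key]                 # changes the size during the iteration
except RuntimeError as error:
  error_message = str(error)
print(error_message)

mydict = {"Mickey": "mouse", "Minnie": "mouse", "Goofy": "dog"}
for key in list(mydict):              # list(...) takes a snapshot of the keys
  if mydict[key] == "mouse":
    del mydict[key]
print(mydict)
```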


#### Dictionary Comprehension
Just like tuples and lists, dictionaries can also be defined using *comprehension*.

For example, we can repeat the example of removing characters of type "mouse" from the dictionary by defining a new dictionary in which those characters do not appear:


In [None]:
mydict = {"Mickey": "mouse", "Minnie": "mouse", "Goofy": "dog", "Donald": 'duck', "Clarabelle": 'cow' }
mouseless = {key: value for key, value in mydict.items() if value != 'mouse'}
print (mouseless)

{'Goofy': 'dog', 'Donald': 'duck', 'Clarabelle': 'cow'}


This mechanism is usually preferred over the explicit removal of elements from the dictionary (and from a list, equivalently).
Thanks to Python's *garbage collection*, you just need to assign the new dictionary to the same name as the original: Python will automatically remove the old dictionary from memory, along with all the keys and values that are no longer referenced anywhere.


In [None]:
mydict = {"Mickey": "mouse", "Minnie": "mouse", "Goofy": "dog", "Donald": 'duck', "Clarabelle": 'cow' }
mydict = {k:v for k, v in mydict.items() if v != 'mouse'}  ## Drops from memory Mickey and Minnie, exactly as obtained with del. But it's much cleaner!

## Sets
Sets are unordered collections of unique values. They offer an interface with various set operations (such as unions, intersections, and differences) that are implemented more efficiently than for lists and similar data types.


Sets are identified with curly braces, without the colons to separate key and value as in the case of *dictionaries*.
For historical reasons, `{}` is interpreted as an empty `dict`, not as a `set`.

To define an empty set, use `set()`.

Let's see some examples:


In [None]:
baryons = {'neutron', 'proton'}
mesons = {'pion', 'kaon'}
charged_leptons = set(('electron', 'muon', 'tauon'))
neutrinos = {'neutrino'}

#### Logical Operations on Sets
The following operations on sets are defined:
 * union (`union`)
 * intersection (`intersection`)
 * difference (`difference`)


In [None]:
## Union
hadrons = baryons.union(mesons)
print (hadrons)

## Intersection
charged = charged_leptons.union({'proton', 'pion', 'kaon'})
charged_hadrons = hadrons.intersection (charged)
print (charged_hadrons)

## Difference
neutral_hadrons = hadrons.difference(charged_hadrons)
print (neutral_hadrons)

{'proton', 'kaon', 'pion', 'neutron'}
{'proton', 'kaon', 'pion'}
{'neutron'}
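
The same operations are also available as operators, which some find more readable. A minimal sketch, reusing the simplified particle classification above (`either_not_both` is an illustrative name):

```python
baryons = {'neutron', 'proton'}
mesons = {'pion', 'kaon'}
charged = {'electron', 'muon', 'tauon', 'proton', 'pion', 'kaon'}

hadrons = baryons | mesons                  # union
charged_hadrons = hadrons & charged         # intersection
neutral_hadrons = hadrons - charged         # difference
either_not_both = hadrons ^ charged         # symmetric difference

print(sorted(neutral_hadrons))
print(sorted(either_not_both))
```

Since sets are unordered, `sorted` is used here to print the elements in a reproducible order.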


#### Sets for Identifying Unique Elements
Because sets contain only unique elements, they are often used to count how many different elements appear in a sequence.
For example, if we wonder how many different letters are needed to write "Hello, world!" we can write:


In [None]:
string = "Hello, world!"
letters = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")

print (f"Processing string: '{string}'")
print (f'Number of unique letters: {len(set(string).intersection(letters))}')
print (f'Total number of letters: {len(tuple(char for char in string if char in letters))}')
print (f'Total length of the string: {len(string)}')

Processing string: 'Hello, world!'
Number of unique letters: 7
Total number of letters: 10
Total length of the string: 13


# *Python Standard Libraries* and the *Python Package Index*

Building on Python's basic types, and extending the language with programs written in other languages, an ecosystem of Python packages has grown over the years, providing ready-made solutions to a wide range of problems.


The primary way to access these packages is the `import` statement, which can be used with the following syntax:
```
import <package_name> [as <local_name>]
from <package_name> import <module_name> [as <local_name>]
```

For example, the package [`itertools`](https://docs.python.org/3/library/itertools.html) contains functions useful for constructing loops in a more intelligent way.

To use one of the functions defined in this package, we first need to import the package.


In [None]:
import itertools

We can then access the functions defined in `itertools` with the same dot syntax used to access the functions of a class (*methods*).

For example:


In [None]:
itertools.product

itertools.product

These functions can be used within the program as if they were defined in the program itself.

For example, using the `product` function, we can suggest to Walt Disney's heirs some characters they may not have thought of yet.


In [None]:
species = 'mouse', 'duck', 'dog'
genders = 'male', 'female'
ages = 'child', 'young', 'adult', 'aged'

print ("Ideas for new Disney characters")
for specie, gender, age in itertools.product(species, genders, ages):
  article = 'An' if age[0].lower() in 'aeiou' else 'A'
  print (f" - {article} {age} {gender} {specie}")

Ideas for new Disney characters
 - A child male mouse
 - A young male mouse
 - An adult male mouse
 - An aged male mouse
 - A child female mouse
 - A young female mouse
 - An adult female mouse
 - An aged female mouse
 - A child male duck
 - A young male duck
 - An adult male duck
 - An aged male duck
 - A child female duck
 - A young female duck
 - An adult female duck
 - An aged female duck
 - A child male dog
 - A young male dog
 - An adult male dog
 - An aged male dog
 - A child female dog
 - A young female dog
 - An adult female dog
 - An aged female dog
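
The other forms of the `import` statement listed above work in the same way. The sketch below shows them side by side; the local names `it` and `comb` are arbitrary choices.

```python
import itertools as it                      # whole package, with a shorter local name
from itertools import product               # a single function, under its own name
from itertools import combinations as comb  # a single function, renamed locally

print(product is it.product)                # both names refer to the same object

pairs = tuple(comb(('mouse', 'duck', 'dog'), 2))
print(pairs)
```

Here `combinations` lists all the unordered pairs of species, without repetitions.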


## *Python Standard Libraries*
The Python standard libraries are automatically available in (almost) every Python installation and are maintained by the Python development team.

https://docs.python.org/3/library/index.html

Some particularly important libraries that any Python developer should know about are:
 * `os` — Interface to the operating system
 * `os.path` — Interface to the file system
 * `sys` — System-specific parameters and functions
 * `glob` — File system exploration with wildcards
 * `itertools` — Tools for constructing loops and iterators
 * `time` — Time-related functions
 * `datetime` — Time-related data types (e.g., dates)
 * `pickle` — Functions for saving Python objects to disk (serialization)

At a much higher level of complexity, but impossible not to mention in a data analysis course:
 * `sqlite3` for SQLite Database management, allowing the use of SQL to describe data access
 * `re` for regular expressions

These last two libraries introduce two more languages into Python for specific applications, significantly extending the scope of this language.
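
As a small taste of what "a language within the language" means, here is a minimal sketch using `re`. The pattern `[a-z]+` is written in the regular-expression language and matches runs of lowercase letters; the sample text and variable names are illustrative.

```python
import re

text = "Particle physics (also known as high energy physics)"

# findall returns all the non-overlapping matches of the pattern, in order
words = re.findall(r"[a-z]+", text.lower())
print(words)
```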


## Python Package Index

In addition to the packages offered by the standard libraries, thousands of developers have shared their software packages with the Python developer community.

The *repository* where these packages are uploaded by developers is called the *Python Package Index* (PyPI) and it is a fundamental resource for Python programmers.

Don't think that the packages released on PyPI are developed solely by enthusiasts and *nerds* looking for visibility. Companies like Google, Facebook, and other tech giants contribute to very important packages released through PyPI.

To install packages from PyPI in Jupyter, you can write
```
!pip install <package_name>
```


In [None]:
!pip install particle



and then you can import the package exactly as if it were a standard Python package.

For example, we can use the package [`scikit-hep/particle`](https://github.com/scikit-hep/particle) to investigate the properties of the charged pion.


In [None]:
from particle import Particle
pion = Particle.from_name("pi+")
print (f"Mass of the charged pion: {pion.mass} MeV/c2")

Mass of the charged pion: 139.57039 MeV/c2


# Exercises
The exercises proposed below can all be solved without using external libraries or modules. That is, without ever needing to write `import`.


### Some Statistics on Word Length
Consider the following text (taken from Wikipedia):
```
Particle physics (also known as high energy physics) is a branch of physics
that studies the nature of the particles that constitute matter and radiation.
Although the word particle can refer to various types of very small objects
(such as protons, gas particles, or even household dust), particle physics
usually investigates the irreducibly smallest detectable particles and the
fundamental interactions necessary to explain their behaviour. In current
understanding, these elementary particles are excitations of the quantum fields
that also govern their interactions.
```
and determine the longest word.

Additionally, build a histogram of the word lengths, counting how many distinct words fall in each interval of number of letters.
If a word appears more than once in the text, it should be counted only once.


In [None]:
input_string = """
Particle physics (also known as high energy physics) is a branch of physics
that studies the nature of the particles that constitute matter and radiation.
Although the word particle can refer to various types of very small objects
(such as protons, gas particles, or even household dust), particle physics
usually investigates the irreducibly smallest detectable particles and the
fundamental interactions necessary to explain their behaviour. In current
understanding, these elementary particles are excitations of the quantum fields
that also govern their interactions.
"""

## 1. Clean the non-letter characters (line breaks are turned into spaces,
##    so that the last word of a line is not merged with the first of the next)
upper_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lower_letters = "abcdefghijklmnopqrstuvwxyz"
admitted_chars = lower_letters + upper_letters + " "
clean_string = ''
for char in input_string:
  if char == "\n":
    clean_string += " "
  elif char in admitted_chars:
    clean_string += char

## 2. Transform all upper case letters into lower case
lower_string = ''
for char in clean_string:
  if char in upper_letters:
    ## get the index of the letter
    idx = upper_letters.index(char)
    lower_string += lower_letters[idx]
  else:
    lower_string += char

## 3. Split the string on whitespace and remove double countings
words = set(lower_string.split())

## 4. Print the longest word
longest_word = ''
for word in words:
  if len (word) > len(longest_word):
    longest_word = word

print (f"The longest word is '{longest_word}'.")

## 5. Create a histogram for the length of these words
boundaries = 0, 1, 3, 5, 6, 10, 1000
counts = [
          len([word for word in words if len(word) > low and len(word) <= high])
              for low, high in zip(boundaries[:-1], boundaries[1:])
]

for low, high, count in zip (boundaries[:-1], boundaries[1:], counts):
  print (f"Number of words with more than {low:2d} characters, but no more than {high:4d}: {count:3d}")


The longest word is 'understanding'.
Number of words with more than  0 characters, but no more than    1:   1
Number of words with more than  1 characters, but no more than    3:  11
Number of words with more than  3 characters, but no more than    5:  14
Number of words with more than  5 characters, but no more than    6:   6
Number of words with more than  6 characters, but no more than   10:  20
Number of words with more than 10 characters, but no more than 1000:   6


### The Fibonacci Sequence
Write a function that determines whether its input belongs to the Fibonacci sequence.


In [None]:
def is_in_fibonacci (input):
  "Return True if input is in the Fibonacci sequence"
  last_two = [1, 1]
  while input >= last_two[-1]:
    if input in last_two:
      return True
    else:
      last_two.append(sum (last_two))
      del last_two[0]
  return False

print (is_in_fibonacci(610))
print (is_in_fibonacci(32))

True
False


### Mean, Median, and Standard Deviation
Given a sequence of real numbers, calculate the mean, median, and standard deviation (using only built-in types, i.e., without ever writing `import`).


In [None]:
input_seq = 3.4, 5.2, 3.3, 76.2, 3.1, 2.2, 3.3, 6.0

mu = sum(input_seq)/len(input_seq)
sigma = (sum((x-mu)**2 for x in input_seq)/(len(input_seq) - 1))**0.5
median = sorted(input_seq)[(len(input_seq) - 1)//2]  ## central value; for an even number of entries, the lower of the two central values

print (f"Average:  {mu:.1f}")
print (f"Std. Dev: {sigma:.1f}")
print (f"Median:   {median:.1f}")

Average:  12.8
Std. Dev: 25.6
Median:   3.3


### Removing Duplicates from a List

We've seen that `set` is a data type that represents sequences of unique, unordered elements, while lists are ordered sequences of potentially repeated elements.

Write a function that removes duplicates from a list while preserving the order. In the case of duplicates, the only occurrence to remain should be the first one.

### Counting Vowels in a Text
Given a string, count the number of vowels, without distinguishing between uppercase and lowercase.

### Read-Only Dictionary
Write a class that emulates the behavior of a dictionary but prevents overwriting a key-value pair where the key is already defined.

### At Least One Match
Write a function that returns `True` if there is at least one element in common between two input sequences.

### Odd Indices
Write a function that removes all characters from a string that have an odd index.

### A Poor Man's Encryption
Write a program that encrypts and decrypts a text string with an integer key between 1 and 100.
The key will be used to shift the letters in the alphabet by a number of letters equal to the value of the key.

For example, the string `Ciao` with key `1` should become `Djbp`.
