# Basics I - Syntax and types

**Authors**: 
- [Gabriele Pompa](https://www.linkedin.com/in/gabrielepompa/): gabriele.pompa@unisi.com
- [Giuseppe Trapani](https://www.linkedin.com/in/giuseppe-trapani-73889471/): giuseppe.trapani@unisi.com - trpgsp@gmail.com

# Table of contents

[Executive Summary](#summary)
1. [Names and Values](#names_and_values)
2. [Expressions and Operators](#expressions_and_operators)
3. [Built-in Data Types and Operators](#data_types)\
    3.1 [Booleans and flow control](#bool)\
    A. [`if` statement](#if)\
    B. [`while` loop](#while)
    

    3.2 [Integers](#int)\
    3.3 [Floats](#float)\
    3.4 [Strings](#str)


### **Resources**: 

- [_Python for Finance (2nd ed.)_](http://shop.oreilly.com/product/0636920117728.do): Sec. 3.Basic Data Types (Section 3.Excursus: Regular Expression is optional)
- [_The Python Tutorial_](https://docs.python.org/3.7/tutorial/): Sec. [3.1.1](https://docs.python.org/3.7/tutorial/introduction.html#numbers) (Numbers), [3.1.2](https://docs.python.org/3.7/tutorial/introduction.html#strings) (Strings), [3.2](https://docs.python.org/3.7/tutorial/introduction.html#first-steps-towards-programming) (First Steps Toward Programming), [4.1](https://docs.python.org/3.7/tutorial/controlflow.html#if-statements) (if Statements)

# Executive Summary <a name="summary"></a>

In this lecture we will informally present the main actors of Python, that is 

* statements
* expressions 
* values
* names
* operators

We will see in practice the act of binding _values_ to _names_, otherwise known as "variable assignment", and then move on to describe the basic _types_. Informally, the _type_ of a variable is related to the number of bits that are reserved in memory to store it but, far more importantly, the type of a variable constrains the operations you can perform on its values and what said operations return.

# 1. Names and Values <a name="names_and_values"></a>

All data represented in digital form is stored in the computer's [Random Access Memory](https://en.wikipedia.org/wiki/Random-access_memory), the RAM. In order for your applications to retrieve it, RAM locations are assigned an address. 

High level languages like Python allow you to assign a _name_ to such addresses: this way you can immediately understand what you are storing.

**Statements** are instructions that the Python interpreter can execute.\
**Expressions** are combinations of values, operators and (why not?) other expressions that can be _evaluated_. Evaluating an expression returns a _value_.

Assigning a _value_ to a _name_ is a statement and it can be done using the ```=``` operator.

In [4]:
one_hundred = 100
two_hundred = 200
skill = 1337

In [108]:
name_of_my_variable = 1000

In [110]:
name_for_example = 12341

The main message here: think of the _name_ as a simple address of a storage location. The _value_ is what you place into that storage.

In [357]:
a, b = 2, 3
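A minimal sketch of what binding looks like in practice (the names `x` and `y` are illustrative and not part of the original cells). It shows that binding a name again does not touch other names, and that the `a, b = 2, 3` statement above binds two names at once:

```python
x = 100      # bind the name x to the value 100
y = x        # bind the name y to the same value
x = 200      # bind x to a new value; y is not affected

print(x)     # 200
print(y)     # 100

a, b = 2, 3  # two names bound in a single statement
print(a, b)  # 2 3
```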

# 2. Expressions and Operators <a name="expressions_and_operators"></a>

As said, expressions are combinations of values and **operators**.\
Operators combine expressions once they are evaluated, but what they actually do depends on their specific **implementation**, which in turn depends on the actual **type** of value on which they are applied.

Let's see some abstract examples:

- Adding two ```real``` numbers returns another ```real``` number
- Adding two ```characters``` returns a ```word```
- Dividing ```integer``` numbers returns a ```rational``` number 
- Dividing ```integer``` numbers returns **two** ```integer``` numbers (and they are...)
- Comparing two ```integer``` numbers with the ```<``` operator returns 1 if the first is less than the second and 0 otherwise
- Comparing two ```words``` with the ```>``` returns 1 if the first is longer (i.e. is made of more characters) than the second and 0 otherwise

In [16]:
100 + 100

200

In [17]:
'a' + 'b'

'ab'

In [18]:
name1 = 10
name2 = 100
name1 + name2

110

The strongest power granted by languages like Python is that of **implementing your own types and what happens when operators are applied to them or between other types and your types**. This is called **object-oriented programming** and we will see the main benefits in the last part of the course.

# 3. Built-in Data Types and Operators <a name="data_types"></a>

From now on let's use the word _variable_ when referring to _names_.

The Python interpreter infers at run-time the type of a variable: **you don't need to declare it in advance**.

We say that Python is a _dynamically typed_ language. This is in contrast with other - compiled - languages, like C or C++, where the type of a variable has to be declared along with the variable identifier (that is, its _name_ ). These languages are _statically typed_.
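A small sketch of dynamic typing, assuming the built-in function `type()` (not introduced so far in this lecture), which returns the type of the value currently bound to a name. The name `value` is purely illustrative:

```python
value = 10
print(type(value))    # <class 'int'>

value = 'ten'         # the same name can later be bound to a value of another type
print(type(value))    # <class 'str'>
```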

What's the point of "knowing" the type of the variable for the interpreter? Recall that the type of the variable 
- (IMPORTANT) constrains what the operators will actually do to the value
- (IMPORTANT BUT WE DON'T CARE) might require more or less memory to store that value

Every programming language comes with a set of built-in types and operators. Here you will find the [official reference](https://docs.python.org/3/library/stdtypes.html), which you can always go back to when in doubt, and [another one](https://www.w3schools.com/python/python_operators.asp) that is shorter and straight to the point.

Let's start by reviewing some of the built-in scalar types.

## 3.1 Booleans and flow control <a name="bool"></a>

Logical states like `True` and `False` are represented in Python by the `bool` data type.

The output of a _comparison_ operator

- `<` (less than), 
- `>` (greater than), 
- `<=` (less than or equal to), 
- `>=` (greater than or equal to), 
- `==` (equal to), 
- `!=` (not equal to)

is a boolean value. We will review them later once we get some interesting types to compare!

In [114]:
a = True
b = False

_Logical operators_ 
- `and` (logic and), 
- `or` (logic or), 
- `not` (logic not)

are used to combine boolean values and return boolean values as output. You can find more information [here](https://en.wikipedia.org/wiki/Boolean_algebra#Basic_operations).

In [247]:
a = False
b = True
c = True
d = False 
e = True

In [248]:
a and b

False

The table below is called a "truth table" and gives you the result of evaluating the expression 
```python
a and b
```
depending on the input values of `a` and `b`

| a | b | a and b
| -------- | --------|  ----
| False | False | False
| False | True | False
| True | False | False
| True | True | True

If you think about it, it's like "plotting" the function `y = and(a, b)` on a graph =).

Try to reproduce the same outputs for the other two operators, `or` and `not`.

In [249]:
a or b

True

In [250]:
not a

True

In [251]:
not b

False

### Operator precedence

A short story on **operator precedence**: it's well known from mathematics that when you chain several operations the result may depend on the order in which they are performed.

Python expressions are evaluated from left to right and operators are bound to some precedence rules. For example:

```python
a = 10
b = 5

a + b * a   # 60
```
You get 60 because the product operator `*` has precedence over the addition operator `+`: Python performs `b * a` first, obtaining 50, and then sums `a + 50`, giving 60.


Now you can study operator precedence [here](https://www.mathcs.emory.edu/~valerie/courses/fall10/155/resources/op_precedence.html) _or_ you can use parentheses just like in math: **expressions inside parentheses are grouped together and evaluated before the rest of the expression**.

```python
a = 10
b = 5

(a + b) * a   # 150
```
Now you get 150 because the expression inside the parentheses is evaluated before the rest of the expression!

So the message is: **use parentheses in longer expressions: the code is clearer and you avoid surprises**.
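The same reasoning applies to logical operators. As a hedged illustration (the values bound to `a`, `b` and `c` are chosen only for this example): in Python `not` is applied before `and`, which in turn is applied before `or`, so the two expressions below give different results.

```python
a = True
b = False
c = False

# `and` has precedence over `or`, so this is read as a or (b and c)
print(a or b and c)      # True

# parentheses force the `or` to be evaluated first
print((a or b) and c)    # False
```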

Now that you know about precedence, try to reproduce the Truth table for the following expressions:

```python
a and (b or c) and (e or b) and b

(a and b) or (c and e) or (b and b)
```

### Flow control

A couple of words on **flow control**: the flow of a computer program is the "direction" of the execution of the statements. In Python the flow runs from top to bottom, one statement at a time (well, one day we'll talk about [concurrency](https://en.wikipedia.org/wiki/Concurrency_(computer_science))).

Some constructs of the language allow you to **skip** or **repeat** a group of statements. The most important thing is that you can skip or repeat these blocks of code based on the truth value of some boolean variables, also called **conditions**.

### A. ```if``` statement <a name="if"></a> - to skip blocks

An [`if` statement](https://docs.python.org/3.7/tutorial/controlflow.html#if-statements) in Python is declared as follows:
```python
if condition:
    statement_1
    statement_2
    statement_3
elif alternative_condition_1:
    statement_4
    statement_5
elif alternative_condition_2:
    statement_6
    statement_7
else:
    statement_else
    
statement_after_the_if
```
Here is how the flow goes, from top to bottom:

The Python interpreter evaluates the logical `condition`, that can be any complex combination of booleans (as we saw before) that returns either `True` or `False`.

If `condition` is `True`, then `statement_1`, `statement_2`, `statement_3` are executed in that order; all the following conditions will **never** be evaluated and the statements in their blocks will be ignored (that is `statement_4`, `statement_5`, `statement_6`, `statement_7`, `statement_else`) and the program goes directly to `statement_after_the_if`.

If `condition` is `False` then its block is ignored (that is `statement_1`, `statement_2`, `statement_3` will never be executed) and the program can go on to the next condition: the Python interpreter evaluates the logical `alternative_condition_1`. Now it behaves exactly as before: if `alternative_condition_1` is `True` then `statement_4` and `statement_5` are executed in that order and the following blocks ignored (that is `statement_6`, `statement_7`, `statement_else` will never be executed) and the program goes directly to `statement_after_the_if`.

If also `alternative_condition_1` is `False` then the program goes on to `alternative_condition_2` and again it behaves as before.

Finally if `alternative_condition_2` is `False` the guard `else` is triggered and only `statement_else` will be executed before going to `statement_after_the_if`.

Summing up, these are the rules:

- The if-elif-else structure starts with `if` followed by an expression that returns a boolean.
- After the first `if` you can add as many `elif` as you want, each one guarding an expression that returns a boolean.
- If one of the conditions guarded by `if` or `elif` evaluates to `True`, the set of statements indented after that guard is executed **and all the others are ignored** (only the first condition that evaluates to `True` counts).
- If none of the conditions guarded by `if` or `elif` evaluates to `True`, the statements indented after `else` are executed.

Notice that:
- each guard ends with a `:`, after which you have to go to the next line.
- blocks belonging to one of the guards are **indented** (that is, shifted to the right with respect to the guard) by 4 spaces, which is the standard convention.
- `elif` statement is optional and there can be more than one.
- `else` statement is optional.

In [220]:
a = True
b = False
c = 0

In [221]:
if (a and b):   ## evaluates (a and b) -> FALSE
    c = 100      # no execution
    name = 1823
    name1 = 7777
elif b:         ## evaluates b -> FALSE
    c = 45       # no execution
    name2 = 1123
elif a:         ## evaluates a -> TRUE
    c = 1337     # yayyyy execution
    name3 = 8888
else:            # NOPE
    c = 99

### B. ```while``` loop <a name="while"></a> - to repeat blocks 

A [`while` loop](https://docs.python.org/3.7/tutorial/introduction.html#first-steps-towards-programming) in Python is declared as follows:
```python
while condition:
    statement_1
    statement_2
    statement_3
    
statement_after_while
```
The Python interpreter evaluates the logical `condition` that again can be any combination of boolean expressions as long as it can be evaluated to either `True` or `False`.

If it is `True` then `statement_1`, `statement_2`, `statement_3` are executed in that order. Then `condition` **is evaluated again** and, if `True`, then `statement_1`, `statement_2`, `statement_3` are **executed in that order again**. 

The loop keeps going like this and ends, finally evaluating `statement_after_while`, when (and if) `condition` becomes `False`.

**Warning**: if `condition` never becomes `False`, you end up with an infinite loop, which will execute forever. Here is the **golden rule**: inside the loop there must be some statement that affects `condition`. 

In [227]:
a = True                
number = 10

In [229]:
while a:            # here a is evaluated -> TRUE, the loop can begin
    number = 1      # statements are executed
    a = False       # golden rule: something must affect the loop condition

In [232]:
a = True
b = True
c = True

number_1 = 10
number_2 = 1

In [234]:
while (a and b) or (b and c):         # condition is evaluated -> TRUE, the loop can begin
    number_2 = number_1 + number_2    # statements are executed
    b = False                         # golden rule: something must affect the loop condition
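Both loops above stop after a single iteration. As a complementary sketch (the names `counter` and `total` are illustrative, not from the original cells), here is a loop that repeats its block several times before the condition becomes `False`:

```python
counter = 0
total = 0

while counter < 5:             # condition is evaluated again before every iteration
    total = total + counter    # statements are executed
    counter = counter + 1      # golden rule: this statement affects the loop condition

print(counter)   # 5
print(total)     # 10, that is 0 + 1 + 2 + 3 + 4
```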

## 3.2 Integers <a name="int"></a>

Integers are whole numbers and are represented in Python by the `int` data type.

| Operator | On integers
| -------- | --------|
| `+` | Sum 
| `-` | Subtraction 
| `*` | Product
| `**` | Exponentiation
| `/` | Floating division
| `//`| Integer division
| `%` | Remainder of the integer division

In [296]:
x = 19
y = 8

In [297]:
x // y

2

In [298]:
x - y

11

In [299]:
2 ** 2

4

In [90]:
2.5 // 2.1

1.0

In [302]:
4 % 3

1
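To tie the three division-related operators together, here is a short sketch using the same `x` and `y` as above. It also uses the built-in function `divmod`, which is not mentioned in the original notes and returns the quotient and the remainder in one go:

```python
x = 19
y = 8

quotient = x // y     # integer division
remainder = x % y     # remainder of the integer division

print(x / y)                            # 2.375
print(quotient, remainder)              # 2 3
print(quotient * y + remainder == x)    # True

# divmod returns both numbers at once
print(divmod(x, y))                     # (2, 3)
```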

Let's introduce our first function that we are going to use on numerical types: 

```python
abs(x)
```

The function returns the absolute value of its argument, that is, the value without its sign (so the result is never negative).

In [304]:
negative_one = -1
abs(negative_one)

1

### Extra trivia on integers

The number of bits needed to represent an `int` in binary depends on its value. For `n=10`, 4 bits are needed. We can check this using the `.bit_length()` method of `int` values.

In [82]:
n = 10

In [83]:
n.bit_length()

4

Indeed, it's simple to see that (check this [decimal-to-binary converter](https://www.rapidtables.com/convert/number/decimal-to-binary.html))

$$
10 = (1 \times 2^3) + (0 \times 2^2) + (1 \times 2^1) + (0 \times 2^0) = 8 + 0 + 2 + 0
$$

therefore the 4 binary digits (i.e. 0/1 bits) `1010` are sufficient to represent the integer number 10.
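If you prefer to check this without leaving Python, the following sketch uses two built-in functions that are not covered in the notes above: `bin()`, which returns the binary digits of an integer as a string prefixed by `0b`, and `int()` with base 2, which goes the other way.

```python
n = 10

print(bin(n))            # 0b1010
print(n.bit_length())    # 4
print(int('1010', 2))    # 10
```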

Python can represent integers of arbitrary size, limited only by the available memory, so even pretty big integers like $10^{100}$ (named [Googol](https://en.wikipedia.org/wiki/Googol)) are not a problem:

In [3]:
googol = 10**100
print(googol)

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000


In [4]:
googol.bit_length()

333

## 3.3 Floats <a name="float"></a>

Non-integer numbers, those with a fractional part, are represented in Python by the `float` data type.

Now, the short story is that `float` values are "representations" of rational and real numbers, and they are not always exact. This is due to how the computer internally represents such numbers. Such differences lead to some common errors that can be easily avoided, and we will see how.

If you want the long story, not really necessary but really interesting, you can go straight to the _Extra trivia on floating point numbers_ section below.

| Operator | On floats
| -------- | --------|
| `+` | Sum 
| `-` | Subtraction 
| `*` | Product
| `**` | Exponentiation
| `/` | Floating division
| `//`| Integer division
| `%` | Remainder of the division

In [91]:
x = 10.0
y = 2.0
q = 1/4

In [92]:
x // y

5.0

In [93]:
2.3 % 2.1

0.19999999999999973

### Floating point arithmetic

Computers cannot represent **exactly** some rational and real numbers. Those interested should read this short article directly from Python's [official page](https://docs.python.org/3/tutorial/floatingpoint.html). In extreme summary, it's a problem of space: some rational numbers have an infinite number of digits after the decimal point, for example $1/3 = 0.33333333333...$. Others, like $0.1$, have a finite number of decimal digits but an infinite number of binary digits. Some others cannot even be represented as fractions, for example $\pi, e, \sqrt{2}$. This means you could end up needing an infinite amount of memory to store them correctly and this is clearly not possible.

We are interested in the consequences of this, namely the infamous [round-off error](https://en.wikipedia.org/wiki/Round-off_error). 

In [259]:
q = 0.25 + 0.1
q

0.35

In [269]:
q = 0.35 + 0.1  #should be 0.45
q

0.44999999999999996

As you can see, there is a tiny error there due to the approximation of floating point numbers. Let's see some common situations and their solutions.

#### Comparisons

Say you wrote a condition for an `if` statement that is `True` when two `float` values are identical:

In [310]:
target_value = 0.45
result = 0.35 + 0.1

income = 0

if result == target_value:
    income = 1000
else:
    income = 0

income

0

This is bad practice: due to rounding errors accumulated during computations, you can never really be sure that a `float` is equal to some number.

What you can do is to check if their absolute difference is smaller than a **predetermined tolerance**. To compute the absolute value of a number remember the function 
```python
abs(x)
```

In [312]:
tolerance = 0.0000001   # predetermined tolerance

target_value = 0.45
result = 0.35 + 0.1
income = 0

difference = result - target_value
absolute_difference = abs(difference)

if absolute_difference <= tolerance:
    income = 1000
else:
    income = 0
    
income

1000

The same can be said when checking if a number is exactly equal to zero: the accumulation of errors may return a number that is very small but not zero. 

In [315]:
computation = 0.1 + 0.1 + 0.1 - 0.3
income = 0

if computation == 0:
    income = 1000
else:
    income = 0

income

0

See again how `computation` is a very small number but not actually zero, therefore your comparison fails. Let's set up a tolerance again and perform a safe comparison:

In [317]:
tolerance = 0.0000001   # predetermined tolerance

computation = 0.1 + 0.1 + 0.1 - 0.3
income = 0

if abs(computation) <= tolerance:
    income = 1000
else:
    income = 0
income

1000
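As a possible alternative to writing the tolerance check by hand, the standard library offers `math.isclose`. The sketch below is not part of the original notes: it relies on the `import` statement, not covered so far, and on the fact that `math.isclose` uses a relative tolerance by default, so an absolute tolerance (`abs_tol`) has to be passed explicitly when comparing against zero.

```python
import math

result = 0.35 + 0.1
print(result == 0.45)                # False
print(math.isclose(result, 0.45))    # True

computation = 0.1 + 0.1 + 0.1 - 0.3
# near zero a relative tolerance alone is not enough
print(math.isclose(computation, 0.0))                 # False
print(math.isclose(computation, 0.0, abs_tol=1e-7))   # True
```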

### Extra trivia on floating point numbers

See the following example: $\frac{1}{4}$ is represented _exactly_ as the `float` 0.25. This is because 0.25 has an exact (and obvious) binary representation in terms of fractions (negative powers of $2$):

$$
\frac{1}{4} = (0 \times 2^{0}) + (0 \times 2^{-1}) + (1 \times 2^{-2}) = \left(0 \times 1 \right) + \left(0 \times \frac{1}{2} \right) + \left(1 \times \frac{1}{4} \right) = 0.25
$$

where the 0/1 bits associated with smaller powers of 2 ($2^{-3}, 2^{-4}, \dots$) are all zero.

Therefore, in a [_fixed-point_](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) binary representation (that is a binary representation using a fixed number of bits after the decimal point '.', as the one above), the decimal number 0.25 can be represented as the binary number `0.01` (check this [decimal-to-binary converter](https://www.rapidtables.com/convert/number/decimal-to-binary.html)), that is using only the first two left-most bits after the '.' (which are the most significant).

Binary representation of `float` numbers is not always _perfect_. That is, it's not always true that a decimal number $0 < q < 1$ can be represented exactly as the series

$$
q = \sum_{i=1}^{k} b_i \times 2^{-i}
$$

where $b_i = 0/1$ is the $i$-th bit. In particular it can be that:

- the series is infinite ($k = \infty$);

- the series requires more bits than those at disposal. That is, given a finite number of bits at disposal - say $k_{MAX}$ - it can be that $k > k_{MAX}$.

In this last case, the best we can do is a _truncation_ of the series. That is, $q$ will be approximately represented as 

$$
q \approx \sum_{i=1}^{k_{MAX}} b_i \times 2^{-i}
$$
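You can look at the fraction that Python actually stores with the `.as_integer_ratio()` method of `float` values (a method not mentioned elsewhere in these notes). In this sketch, 0.25 is stored exactly as $1/4$, while 0.1 is stored as a nearby fraction whose denominator is a power of 2:

```python
print((0.25).as_integer_ratio())   # (1, 4)

numerator, denominator = (0.1).as_integer_ratio()
print(numerator, denominator)      # 3602879701896397 36028797018963968
print(denominator == 2 ** 55)      # True
print(numerator * 10 == denominator)   # False: the stored value is not exactly 1/10
```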


In real life things are more complicated. In particular, the IEEE 754 [double-precision](https://en.wikipedia.org/wiki/Double-precision_floating-point_format) standard - currently adopted by modern 64-bit machines - reserves 64 bits to represent a decimal number, but bits are not simply associated with negative and decreasing powers of the base 2 ($2^{-1}, 2^{-2}, \dots$) as in the [fixed-point](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) binary representation that we considered before. The IEEE 754 standard prescribes a [_floating-point_](https://en.wikipedia.org/wiki/Double-precision_floating-point_format) format, where the meaning and role of the bits in the binary representation change depending on their position. In particular, for your information (more details on [Wikipedia](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)):
- 1 bit (the $1$st one) represents the _sign;_ 
- 11 bits (from the $2$nd to the $12$th) represent an _exponent;_ 
- 52 bits (from the $13$th to the last one) represent the _fractional part_.

This representation makes it possible to represent a greater range of decimal numbers, given the same number of bits at disposal (64). This increase in the range of representable numbers comes at the cost of precision. In the IEEE 754 double-precision standard, the relative accuracy is about 15 significant decimal digits.
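These figures can be inspected from Python through `sys.float_info`, a standard library object not covered in the notes above. The values in the comments are what you get on a machine using IEEE 754 double precision, which is an assumption about your platform (it holds on essentially all common ones):

```python
import sys

print(sys.float_info.mant_dig)   # 53: the 52 stored fraction bits plus one implicit leading bit
print(sys.float_info.dig)        # 15: decimal digits that can always be represented faithfully
print(sys.float_info.epsilon)    # 2.220446049250313e-16: gap between 1.0 and the next float
```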

## 3.4 Strings <a name="str"></a>

| Operator | On strings
| -------- | --------|
| `+` | Concatenation
| `-` | NOT IMPLEMENTED
| `*` | Repetition (only between a string and an integer)
| `**` | NOT IMPLEMENTED
| `/` | NOT IMPLEMENTED
| `//`| NOT IMPLEMENTED
| `%` | Formatting (printf-style), not a remainder

One or more text characters are represented in Python by the [`str` data type](https://docs.python.org/3.7/tutorial/introduction.html#strings). Use single quotes `'this is a string'` or double quotes `"this is another string"` to define a string. There is no preferred way: choose the one that suits you and be consistent with it.

In [107]:
a_string = 'peppi is teaching IT for business and finance' 

In [210]:
my_name = 'Giuseppe'
my_surname = 'Trapani'

In [213]:
teacher_name = my_name + ' ' + my_surname

In [215]:
some_letter = 'c'
some_integer = 10

In [216]:
some_letter * some_integer

'cccccccccc'

The type `str` is a bit more interesting than the previous ones because it implements many useful **methods**. Methods are simply functions (we will dive a bit later into functions) that can use the internal representation of the type we are considering.

Everything "coming out" of the internal representation of a type is accessed by the ```.``` operator.

See for example the method ```split()```:

In [320]:
a_string.split()

['peppi', 'is', 'teaching', 'IT', 'for', 'business', 'and', 'finance']

Throughout the course we will see that methods are everywhere. Let's look up some documentation using the function
```python
help()
```

In [328]:
help(a_string.split)

Help on built-in function split:

split(sep=None, maxsplit=-1) method of builtins.str instance
    Return a list of the words in the string, using sep as the delimiter string.
    
    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.



The documentation describes briefly the output of a method as well as the extra parameters (called _arguments_) that can be passed to it in order to modify the results.

We see from the previous description that the method `split()` of a variable of type `str` returns a list of the words in the string and can accept an extra argument called `sep`, which is the delimiter on which you want to perform the split. If you call the method without specifying `sep`, the string is split on whitespace.

In [329]:
date_string = '2020-10-02'

In [331]:
date_string.split()  # call the method without specifying sep; it will separate the string into substrings at each whitespace.

['2020-10-02']

In [330]:
date_string.split(sep='-')  # now we specify sep as '-'

['2020', '10', '02']

In [336]:
date_string.split(sep='10')  # notice how the sep argument can be more than one character long

['2020-', '-02']
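The documentation printed by `help` also mentions a second argument, `maxsplit`. A short sketch of how it behaves, reusing the same `date_string` as above:

```python
date_string = '2020-10-02'

# maxsplit=1 stops after the first split and leaves the rest of the string untouched
print(date_string.split(sep='-', maxsplit=1))   # ['2020', '10-02']

# arguments can also be passed by position instead of by name
print(date_string.split('-'))                   # ['2020', '10', '02']
```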

## Functions

We already saw methods. One step further we have functions. Functions are simply blocks of code that get executed by the interpreter when needed (that is, when the function is "called"). Notice that this is a quite general definition and **it has pretty much nothing to do with mathematical functions**. Let's see the pieces:

```python
def function_name(argument_1, argument_2):
    statement_1
    statement_2
    # maybe do something with the arguments and assign them to a variable called output, for example summing them
    output = argument_1 + argument_2
    return output
```
The main rules are the following:

- The definition begins with the statement `def` (stands for _definition_)
- You place the arguments' **internal names** in the (compulsory) parentheses, separated by commas
- The line ends with `:`, after which you go to the next line
- The block of code is indented by 4 spaces

Notice the following:

- `function_name` must be a valid name as we saw for variables (no spaces, no initial digits)
- if you already have a variable with the same name as `function_name`, it will be overwritten
- if you declare another variable with the same name as `function_name`, the function will be overwritten
- arguments are optional but the parentheses are not
- every variable declared **inside** the function will not be accessible from outside
- every variable declared **before** the function will be accessible inside the function but doing it is very bad practice
- the final `return` statement is optional and is used to "pull" variables outside of the function
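A minimal sketch of the last few rules; the function and variable names are hypothetical and chosen only for illustration. A variable created inside a function is not visible outside, `return` is what hands a value back to the caller, and a function without `return` gives back the special value `None`:

```python
def add_and_store(argument_1, argument_2):
    inner_result = argument_1 + argument_2   # exists only inside the function
    return inner_result                      # "pulls" the value outside

def no_return(argument_1):
    doubled = argument_1 * 2                 # computed but never returned

outer_result = add_and_store(1, 2)
print(outer_result)     # 3
print(no_return(5))     # None

# Using inner_result or doubled here, outside their functions,
# would raise a NameError because those names do not exist at this level.
```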

Let's see some examples:

In [343]:
def my_addition(addend_1, addend_2):
    return addend_1 + addend_2

In [344]:
my_addition(10, 20)

30

In [345]:
def my_unsafe_division(numerator, denominator):
    return numerator / denominator

In [346]:
my_unsafe_division(10, 0)

ZeroDivisionError: division by zero

In [347]:
def my_safe_division(numerator, denominator):
    if denominator == 0:
        return 0
    else:
        return numerator / denominator

In [352]:
my_safe_division(10.0, 0.0)

0

In [353]:
my_safe_division(10, 0.1+0.1+0.1-0.3)

1.8014398509481984e+17

In [355]:
tolerance = 0.0000000001
def my_even_safer_division(numerator, denominator):
    if abs(denominator) < tolerance:
        return 0
    else:
        return numerator / denominator

In [356]:
my_even_safer_division(10, 0.1 + 0.1 + 0.1 - 0.3)

0
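Note that `my_even_safer_division` reads `tolerance` from outside the function. One possible variation, sketched here under the assumption that you are comfortable with arguments having default values (as `sep=None` in the `split` documentation above), is to pass the tolerance as an argument. The function name is hypothetical:

```python
def my_division_with_tolerance(numerator, denominator, tolerance=1e-10):
    # tolerance is an argument with a default value, so the function
    # does not depend on any variable declared outside of it
    if abs(denominator) < tolerance:
        return 0
    else:
        return numerator / denominator

print(my_division_with_tolerance(10, 0.1 + 0.1 + 0.1 - 0.3))   # 0
print(my_division_with_tolerance(10, 4))                       # 2.5
```

Returning 0 when the denominator is (almost) zero follows the choice made in the cells above; whether that is the right fallback depends on your application.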

The take-home message here is pretty straightforward: functions are used to encapsulate parts of code that you often end up repeating along the flow of your application. 