# Python fundamentals

### The Zen of Python

The 20 principles that influence the design of Python, only 19 of which have been written down.

Written in 1999, they were later included as the 20th entry of the Python Enhancement Proposals (a.k.a. [PEP 20](https://www.python.org/dev/peps/pep-0020/)).

In [1]:
# Easter egg
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


*In December 1989, Van Rossum had been looking for a "'hobby' programming project that would keep [him] occupied during the week around Christmas" as his office was closed when he decided to write an interpreter for a "new scripting language [he] had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers". He attributes choosing the name "Python" to "being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus)".* https://en.wikipedia.org/wiki/Guido_van_Rossum

## Getting Started

If you have installed Python on your own machine, the simplest way to get started is to type commands directly in the **interactive** IPython shell:

In [2]:
print("Are we having fun yet?")

Are we having fun yet?


If you are using **Google Colab**, you can bring up an interactive shell by using !bash and then asking it to run Python 3 (type python3 into the input field once bash loads, and then execute commands):

In [3]:
!bash


This is great for testing purposes, or for exploratory analysis, but it is very inefficient if you want to reuse the same program multiple times. An alternative is to write your instructions in a Jupyter Notebook, in which you can also explain your code to the reader using text blocks, and which can be loaded by Google Colab or Jupyter Notebook in Anaconda.

A more traditional alternative when you are working on your own machine is to write your instructions in a `*.py` **text file** and to ask the Python interpreter to execute this file by passing its name to the `python` command.

1. In your current working directory (`%pwd`), or somewhere else (e.g., `/stuff`), create a text file called `fun.py`

2. Open this file with a text editor

3. Copy the following code, save and close the file 

```python 
print("Are we having fun yet? (file version)")
```

Launch the Python interpreter by typing the following command in the shell:

```bash 
python fun.py
```

In [4]:
# shell commands can be launched from IPython as well
!python stuff/fun.py

Are we having fun yet? (file version)


In [5]:
# the same results can be obtained using the %run magic, with a slightly different syntax
%run stuff/fun.py

Are we having fun yet? (file version)


More magic? Use the `%quickref` magic to list them all:

In [6]:
# IPython - Quick Reference Card
%quickref


IPython -- An enhanced Interactive Python - Quick Reference Card

obj?, obj??      : Get help, or more help for object (also works as
                   ?obj, ??obj).
?foo.*abc*       : List names in 'foo' containing 'abc' in them.
%magic           : Information about IPython's 'magic' % functions.

Magic functions are prefixed by % or %%, and typically take their arguments
without parentheses, quotes or even commas for convenience.  Line magics take a
single % and cell magics are prefixed with two %%.

Example magic function calls:

%alias d ls -F   : 'd' is now an alias for 'ls -F'
alias d ls -F    : Works if 'alias' not a python name
alist = %alias   : Get list of aliases to 'alist'
cd /usr/share    : Obvious. cd -<tab> to choose from visited dirs.
%cd??            : See help AND source for magic %cd
%timeit x=10     : time the 'x=10' statement with high precision.
%%timeit x=2**100
x**100           : time 'x**100' with a setup of 'x=2**100'; setup code is not
                   co

Although writing programs with a simple text editor is perfectly possible, life is much easier when you use an IDE (Integrated Development Environment) or a text editor on steroids. Desirable features of these environments include syntax highlighting, inspection capabilities, code profiling features, autocomplete, support for interactive shells, package managers and so forth.

If you are not participating in the everlasting [editor war](https://en.wikipedia.org/wiki/Editor_war), or if you don't have experience with any IDE, you could consider using [Spyder](https://pythonhosted.org/spyder/), the IDE that comes with Anaconda [[Tutorial](http://www.southampton.ac.uk/~fangohr/blog/spyder-the-python-ide.html)].

Some customizable text editors you might also like are [Sublime Text](https://www.sublimetext.com/) or [Atom](https://atom.io/) (Linux, macOS, Windows).

## Variables

Variables are names that refer to objects stored in **memory locations** used by a computer program.

Each variable is associated with an **identifier** (i.e. a name). Python restricts the naming possibilities of a variable:

- identifier characters may be letters, digits or underscores...

- ... but the first character cannot be a digit

- Python keywords cannot be used as identifiers

In [7]:
# list of Python keywords
import keyword
print(keyword.kwlist)

['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']


In [8]:
# Nope.
None = 10

SyntaxError: cannot assign to None (2683696864.py, line 2)

In [9]:
# Works, but please don't.
none = 10

In [10]:
# What's wrong with you?!
_None = 10
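The naming rules above can also be checked programmatically. The following is a minimal sketch, not part of the original notebook: it combines the string method `isidentifier()` with `keyword.iskeyword()`, and the helper name `is_valid_name` is hypothetical.

```python
import keyword


def is_valid_name(name):
    """Return True if `name` can be used as a variable name."""
    # isidentifier() checks the character rules,
    # iskeyword() rules out the reserved words
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["a_number", "_None", "2fast", "my-var", "None", "class"]:
    print(candidate, is_valid_name(candidate))
```

Only `a_number` and `_None` pass both checks: `2fast` starts with a digit, `my-var` contains a character that is not allowed, and `None` and `class` are keywords.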

##### Python does not require variables to be declared explicitly

Variables are created when you first assign a value to them.

The symbol `=` is used to assign values to variables.

In [11]:
# let's associate the value 55 to the variable named "a_number"
a_number = 55

In [12]:
# the type of value of the variable can be inspected with the function type()
type(a_number)

int

In [13]:
# the function id() returns the identity of the object (in CPython, its memory address)
id(a_number)

140013034046336

##### Variable types can change during the execution of a program

In [14]:
# Let's convert our numerical variable into a string of digits (i.e. into text)
a_number = str(a_number)
a_number
type(a_number)

str

## Built-in data types

Python natively supports the following basic types:

- **Boolean**: *bool*

- **None**: *NoneType*

- **Numerical**: *int*, *float*, *complex*

- **Sequence**: *str*, *list*, *tuple*

- **Set**: *set*, *frozenset*

- **Dictionary**: *dict*


#### Mutable vs. Immutable objects

Python data types can be organized by distinguishing those types whose objects can change after their creation (**Mutable**) from those whose objects cannot (**Immutable**).

| Immutables|   Mutables|
|:---------:|:---------:|
|  Numerical|          -|
|     String|          -|
|      Tuple|       List|
|  Frozenset|        Set|
|          -| Dictionary|
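As a small illustration of the table (a sketch with made-up variable names, using two types that are introduced in detail later on), a list accepts a change to one of its items, while the same operation on a tuple raises a `TypeError`:

```python
mutable_list = [1, 2, 3]
immutable_tuple = (1, 2, 3)

# a list can be changed after its creation
mutable_list[0] = 99
print(mutable_list)

# a tuple cannot: item assignment raises a TypeError
try:
    immutable_tuple[0] = 99
except TypeError as error:
    print("TypeError:", error)
```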

### NoneType

It has one sole value: `None`. It is used to represent the absence of a value.


### bool

Boolean logical values: either `True` or `False`.

They are used to represent the truth or falsity of some condition.

#### Boolean Operators

- **and**: conjunction
- **or**: inclusive disjunction
- **not**: negation

In [15]:
True and False or True

True
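The result above depends on operator precedence: `not` binds more tightly than `and`, which in turn binds more tightly than `or`. The sketch below (variable names are illustrative) shows how parentheses make the grouping explicit, and how a different grouping can change the result:

```python
# `and` is evaluated before `or`
without_parentheses = False and False or True
and_first = (False and False) or True
or_first = False and (False or True)
print(without_parentheses, and_first, or_first)

# `not` is evaluated before both `and` and `or`
not_first = not True or True
not_last = not (True or True)
print(not_first, not_last)
```

When in doubt, adding parentheses costs nothing and follows the "explicit is better than implicit" principle.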

### Numerical Types

- **int**: integers of unlimited length, e.g. `92` (in Python 3 this type also covers the *long* integers of Python 2, and the `15L` notation is no longer valid)
- **float**: floating-point numbers, e.g. `8.75638`
- **complex**: complex numbers, e.g. `1.23+4.56j`




#### Using Python as a Calculator

The simplest way to perform calculations with Python is by using the interactive shell as a fancy calculator. 

Some of the operations supported by all the numeric types are:

In [16]:
# addition
3 + 5

8

In [17]:
# difference
9 - 5

4

In [18]:
# product
9 * 50

450

In [19]:
# quotient
9 / 2

4.5

> Note: In Python 2.x (but not in Python 3.x) integer division truncates the remainder and returns an integer (i.e. it calculates the so-called "floored" quotient).

In [20]:
# Floor division gives us the default behavior of integer division in Python 2
9 // 2

4

In [21]:
# the remainder of the floored quotient
9 % 2

1
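Floor division and the remainder are two halves of the same operation. The following sketch (not in the original notebook) uses the built-in `divmod()` to get both at once, and shows that "floored" means rounding towards negative infinity, which matters for negative operands:

```python
# divmod() returns the floored quotient and the remainder in one call
quotient, remainder = divmod(9, 2)
print(quotient, remainder)

# floor division rounds towards negative infinity, not towards zero
negative_quotient = -9 // 2
negative_remainder = -9 % 2
print(negative_quotient, negative_remainder)

# the two results always satisfy x == (x // y) * y + (x % y)
x, y = -9, 2
identity_holds = x == (x // y) * y + (x % y)
print(identity_holds)
```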

In [22]:
# x to the power of y
3 ** 2

9

In [23]:
# Round a number to a given precision in decimal digits (default 0 digits)
round(1.765432, 3)

1.765

---

### Quiz

Calculate the number of seconds we're going to spend in this classroom together.

In [25]:
# your code here
60 * 105 * 2 * 16

201600

---

While the previous operators return a new value and leave their operands untouched, the following *augmented assignment* operators perform the operation **in-place**.

That is, the result of the operation is assigned back to the variable itself.

In [26]:
# our variable
a_number = 8

In [27]:
# IN PLACE addition
a_number += 3
a_number

11

In [28]:
# IN PLACE subtraction
a_number -= 3
a_number

8

In [29]:
# IN PLACE multiplication
a_number *= 3
a_number

24

In [30]:
# IN PLACE division
a_number /= 3
a_number

8.0

In [31]:
# IN PLACE modulus
a_number %= 3
a_number

2.0
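What "in place" means in practice depends on the type. The sketch below is an illustration with hypothetical variable names (lists are introduced later in this notebook): with an immutable integer, `+=` binds the name to a new object, so a second name that referred to the old value is unaffected; with a mutable list, `+=` modifies the existing object, so every name that refers to it sees the change.

```python
# integers are immutable: += binds the name to a new object
number = 8
number_alias = number
number += 3
print(number, number_alias)

# lists are mutable: += modifies the existing object
numbers = [8]
list_alias = numbers
numbers += [3]
print(numbers, list_alias)
```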

### Relational Operators

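Equality and identity comparisons are supported by all objects, while ordering comparisons (`<`, `<=`, `>`, `>=`) are supported only by types that define an order. The main comparison operators are: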

|   Operator|                Semantics|
|:---------:|:-----------------------:|
|         ==|                    equal|
|         !=|                not equal|
|          <|                less-than|
|         <=|    less-than or equal to|
|          >|             greater-than|
|         >=| greater-than or equal to|
|         is| object identity|
|         is not| negated object identity|

Relational operators are used to **test conditions**, and the output is a boolean value

In [32]:
# is 7 bigger than 9?
7 > 9

False

NOTE: `is` checks if two things **are the same object**, not just if they are equal

In [33]:
# the value 0 (of any numerical type) is considered to be False, but 0 is not False
print(False == 0)
print(False is 0)
# PS if you get a SyntaxWarning, this was introduced in 3.8 to help avoid mistaking == for is 

True
False




---

### Quiz

Use the function `id()` to explain the behavior of `is`.

In [34]:
# your code here

---

## Sequences

We will focus on three types of sequences: **strings**, **lists** and **tuples** (see the [documentation](https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange) for the full list). 

Most sequence types support the following operations (where `s` and `t` are sequences, `x` is an arbitrary object, and `n`, `i`, `j` and `k` are integers):

|   Operation|                Result|
|:----------:|:-----------------------:|
|      x in s|  True if an item of s is equal to x, else False|
|  x not in s|  False if an item of s is equal to x, else True|
|       s + t| concatenation of s and t|
|       s * n|   s added to itself n times (a negative n is treated as 0)|
|        s[i]| ith item of s, origin 0|
|      s[i:j]|   slice of s from i to j|
|    s[i:j:k]| slice of s from i to j with step k|
|      len(s)| length of s|
|      min(s)| smallest item of s|
|      max(s)| largest item of s|
|  s.index(x)| index of the first occurrence of x in s|
|  s.count(x)| total number of occurrences of x in s|
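The operations in the table can be tried on any sequence. Here is a quick sketch on two short strings (the names `s` and `t` simply mirror the table):

```python
s = "abc"
t = "de"

print("b" in s, "z" not in s)     # membership tests
print(s + t)                      # concatenation
print(s * 2)                      # repetition
print(s * -1 == "")               # a negative n gives an empty sequence
print(s[0], s[1:3], s[::2])       # indexing and slicing
print(len(s), min(s), max(s))     # length, smallest and largest item
print(s.index("c"), s.count("a")) # position and number of occurrences
```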

### Lists


Lists are **ordered** **mutable** sequences of **heterogeneous** elements.

In the Python language, lists are defined by square brackets `[]` and their elements are separated by commas.

In [65]:
# lists can contain different types of objects (even another list like [1,2,3])
demo_list = ["text", 23, 92, "another_text", [1,2,3]]
print(demo_list)

['text', 23, 92, 'another_text', [1, 2, 3]]


In [66]:
# len() returns the length of the list
len(demo_list)

5

In [67]:
# membership verification (return a boolean value)
23 in demo_list

True

In [68]:
# concatenation
new_demo_list = demo_list + demo_list
new_demo_list

['text',
 23,
 92,
 'another_text',
 [1, 2, 3],
 'text',
 23,
 92,
 'another_text',
 [1, 2, 3]]

In [69]:
# repetition
new_demo_list = demo_list * 3
new_demo_list

['text',
 23,
 92,
 'another_text',
 [1, 2, 3],
 'text',
 23,
 92,
 'another_text',
 [1, 2, 3],
 'text',
 23,
 92,
 'another_text',
 [1, 2, 3]]

#### Lists are ordered

In [70]:
# lists are ordered, elements can be recalled by using their index (remember that the index of the first element is 0)
demo_list[0]

'text'

In [71]:
# the index "-1" is associated with the last element
demo_list[-1]

[1, 2, 3]

In [72]:
# check the position of the element "23"
demo_list.index(23)

1

#### Slicing


Slicing is a **computationally fast** way to extract a portion of a sequence in order to create a new sequence. 

Slicing Notation works in the following way:


```python
sequence[start:stop:step]
```

##### Slicing rules

- The slice of the sequence `s` from `start` to `stop` is defined as the sequence of items with index `k` such that `start` <= `k` < `stop`. 


- If `start` or `stop` is greater than len(`s`), use len(`s`). 


- If `start` is omitted or `None`, use 0. 


- If `stop` is omitted or `None`, use len(`s`). 


- If `start` or `stop` is negative, the index is relative to the end of sequence.

![alt text](https://github.com/bloemj/AUC_TMCI_2022/blob/main/notebooks/images/list-slicing.png?raw=1)
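The rules above can be tried on a small list. The sketch below uses a hypothetical list called `letters`; it also shows a negative step, which is not listed above and walks the sequence backwards:

```python
letters = ["a", "b", "c", "d", "e"]

print(letters[1:100])  # a stop beyond the end is replaced by len(letters)
print(letters[10:])    # a start beyond the end gives an empty list
print(letters[:])      # both omitted: a copy of the whole list
print(letters[-2:])    # a negative start counts from the end
print(letters[::-1])   # a negative step walks the list backwards
```

Note that, unlike indexing with a single out-of-range index, slicing never raises an `IndexError`.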

In [73]:
print(demo_list)

['text', 23, 92, 'another_text', [1, 2, 3]]


In [74]:
# slicing with positive indices 
demo_list[1:3]

[23, 92]

In [75]:
# slicing works with negative indices as well 
demo_list[-3:-1]

[92, 'another_text']

In [76]:
# slicing works with a mixture of positive and negative indices
demo_list[1:-2]

[23, 92]

In [77]:
# if an index is omitted, Python reaches the first or the last element of the list
demo_list[2:]

[92, 'another_text', [1, 2, 3]]

In [78]:
# the same as above, with the first index omitted
demo_list[:3]

['text', 23, 92]

In [79]:
# slicing with steps
demo_list[1::2]

[23, 'another_text']

---

### Quiz

The following code creates a list whose length is a priori unknown. Extract:

- the last element of the list by using a positive index


- the first element of the list by using a negative index

In [80]:
import random  # standard library module

random_number_list = []
for i in range(15):
    random_number = random.randint(1, 99)
    if random_number > 32:
        random_number_list.append(random_number)

random_number_list

[61, 42, 69, 94, 67, 65, 36, 70, 87, 98, 45, 76]

In [81]:
# your code here
print(random_number_list[len(random_number_list) - 1])
print(random_number_list[-len(random_number_list)])

76
61


---

#### Lists are mutable

List methods allow you to manipulate the elements stored in the list quickly and effectively.

In [82]:
# the method .append() allows you to add an element at the end of the list
# Equivalent to `demo_list[len(demo_list):] = ["appended"]`
demo_list.append("appended")
print(demo_list)

['text', 23, 92, 'another_text', [1, 2, 3], 'appended']


In [83]:
# the method .extend() allows you to add all the elements of a second list
demo_list.extend(["elements","of","another","list"])
print(demo_list)

['text', 23, 92, 'another_text', [1, 2, 3], 'appended', 'elements', 'of', 'another', 'list']


In [84]:
# the method .remove() allows you to remove a given element from a list
demo_list.remove("appended")
print(demo_list)

['text', 23, 92, 'another_text', [1, 2, 3], 'elements', 'of', 'another', 'list']


**Bonus question: What happens if the list contained duplicates of the removed element?**

Answer: Only the first occurrence is removed.
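This behavior is easy to verify. The sketch below uses a made-up list called `colors`; it also shows what happens when the element is missing, and one possible way (among others) to remove every occurrence, using a `while` loop that is covered in the flow control part:

```python
colors = ["red", "blue", "red", "green", "red"]

# only the first "red" is removed
colors.remove("red")
after_first_removal = list(colors)
print(after_first_removal)

# removing a missing element raises a ValueError
try:
    colors.remove("purple")
except ValueError as error:
    print("ValueError:", error)

# keep removing as long as the element is still in the list
while "red" in colors:
    colors.remove("red")
print(colors)
```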

In [90]:
# the method .reverse() reverses a list IN PLACE (i.e. the original list is modified)
demo_list.reverse()
print(demo_list)

['list', 'new_text', 7, 38, 'elements', 38, 7, 'new_text', 'text']


In [91]:
# slicing can also be used to replace elements in a list (the replacement may have a different length)
demo_list[1:3] = ["new_text", 7, 38]
print(demo_list)

['list', 'new_text', 7, 38, 38, 'elements', 38, 7, 'new_text', 'text']


In [92]:
# slicing can be used to delete elements
demo_list[4:6] = []
print(demo_list)

['list', 'new_text', 7, 38, 38, 7, 'new_text', 'text']


In [93]:
# the method .sort() orders a list IN PLACE (NOTE: the list elements must be comparable with each other, e.g. all numbers or all strings)
homogeneous_list = [1,56,33,8,220,9]
homogeneous_list.sort()
homogeneous_list

[1, 8, 9, 33, 56, 220]

In [94]:
# lists of strings can be sorted as well
homogeneous_list = ["canary", "hippo", "kangaroo", "narwhal", "elephant", "raccoon", "yak", "ant"]
homogeneous_list.sort()
homogeneous_list

['ant', 'canary', 'elephant', 'hippo', 'kangaroo', 'narwhal', 'raccoon', 'yak']

---

### Quiz

Explain the sorting of the following list:

In [95]:
homogeneous_list =["1", "56", "33", "8", "220", "9"]
homogeneous_list.sort()
homogeneous_list

['1', '220', '33', '56', '8', '9']

Strings are compared lexicographically: the list is sorted by the value of the first character first, then the second, then the third... That is why `'220'` comes before `'33'`.
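If the numeric order is what you want, one option is to tell the sorting function how to compare the elements. The sketch below uses the built-in `sorted()` with its `key` parameter (the list name is illustrative); unlike `.sort()`, `sorted()` returns a new list and leaves the original untouched:

```python
number_strings = ["1", "56", "33", "8", "220", "9"]

# default: lexicographic comparison, character by character
lexicographic = sorted(number_strings)
print(lexicographic)

# key=int compares the numeric values, but the elements remain strings
numeric = sorted(number_strings, key=int)
print(numeric)
```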

---

#### Lists of numbers

Lists of numbers can be manipulated by additional functions, among which **max()**, **min()** and **sum()**.

In [96]:
some_numbers = [4, 8, 2, 6, 2, 9]
print(max(some_numbers))
print(min(some_numbers))
print(sum(some_numbers))

9
2
31


---

### Quiz

Compute the average of `some_numbers`:

In [97]:
# your code here
print(sum(some_numbers) / len(some_numbers))

5.166666666666667


---

### Tuples

Tuples are the **immutable** counterpart of the lists. 

They are defined by round brackets `()` and they mainly differ from lists in that they do not provide the methods that modify their elements.


In [98]:
# tuples can contain different types of objects (even a list like [1,2,3])
demo_tuple = ("text", 23, 92, "another_text", [1,2,3])
print(demo_tuple)

('text', 23, 92, 'another_text', [1, 2, 3])


In [99]:
# elements can be accessed on the basis of their index...
demo_tuple[1:3]

(23, 92)

In [100]:
# ...but they cannot be replaced
try:
    demo_tuple[1] = 2
except TypeError as e:
    print(e)

'tuple' object does not support item assignment


### Strings

Strings are **immutable** sequences of characters (so, they are **homogeneous**).

In the Python language, strings are delimited by single `'` or double `"` quotes, and their elements are contiguous.

In [101]:
demo_string = "Does Cersei have any friends?"
print(demo_string)

Does Cersei have any friends?


In [102]:
# using single or double quotes makes no difference
demo_string_single = 'Does Cersei have any friends?'
demo_string == demo_string_single

True

In [103]:
# Remember: Digits =/= numbers! (frequent source of debugging frustration)
print(type("1979") == type(1979))
print('"1979" type is: ' + str(type("1979")))
print('1979 type is:' + str(type(1979)))

False
"1979" type is: <class 'str'>
1979 type is:<class 'int'>


Being immutable sequences, strings support all the common sequence operations, but they do not support item assignment:

In [104]:
# you can check the length (in characters, whitespaces count!!!) of a string
len(demo_string)

29

In [105]:
# strings can be looked for in other strings
"Cersei" in demo_string

True

In [106]:
# how many 'a's do we have in our string?
demo_string.count("a")

2

In [107]:
# concatenation is possible
demo_string += "\nNoway!"
demo_string

'Does Cersei have any friends?\nNoway!'

The escape sequence `\n` indicates the end of the line. 

Python escape sequences are introduced by the escape character `\`, whose goal is to signal the interpreter that the following character has an "unusual" interpretation. Here is a partial list:

| Escape Sequence|         Meaning|
|:--------------:|:---------------:|
|              \\\\|  backslash|
|              \\'|  single quote|
|              \\b|  backspace|
|              \\n|  new line|
|              \\t|  horizontal tab|


In [108]:
# nicely print our string!
print(demo_string)

Does Cersei have any friends?
Noway!


In [109]:
# what does a backspace do?
print(demo_string + "\b")

Does Cersei have any friends?
Noway


In [110]:
# the escape character may be useful when you need single quotes inside a single quote-marked string...
'can any of you pronounce \'s-Hertogenbosch?'

"can any of you pronounce 's-Hertogenbosch?"

In [111]:
# but I prefer this solution (when possible)
"can any of you pronounce 's-Hertogenbosch?"

"can any of you pronounce 's-Hertogenbosch?"

Strings that span multiple lines can be written in a readable form by using the sequence `"""` as a delimiter

In [114]:
print("""Unsealed, on a porch a letter sat
Then you said, "I wanna leave it again"
Once I saw her on a beach of weathered sand
And on the sand I wanna leave her again""")

Unsealed, on a porch a letter sat
Then you said, "I wanna leave it again"
Once I saw her on a beach of weathered sand
And on the sand I wanna leave her again


**Question:**

In [115]:
# where do all those '\n's come from???
""""On a weekend I wanna wish it all away, yeah
And they called and I said that I'll go
And I said that I'll call out again
And the reason I ought ta leave her calm, I know
I said, "I don't know whether I'm the boxer or the bag"""""

'"On a weekend I wanna wish it all away, yeah\nAnd they called and I said that I\'ll go\nAnd I said that I\'ll call out again\nAnd the reason I ought ta leave her calm, I know\nI said, "I don\'t know whether I\'m the boxer or the bag'

#### String slicing is a thing:

![alt text](https://github.com/bloemj/AUC_TMCI_2022/blob/main/notebooks/images/string-slicing.png?raw=1)


Source: [Bird et al. (2009)](http://www.nltk.org/book/ch03.html)

In [116]:
# let's try it
"Monty Python"[-12:-7]

'Monty'

In [117]:
# ...but single characters cannot be replaced
try:
    demo_string[5:11] = "Melisandre"
except TypeError as e:
    print(e)

'str' object does not support item assignment


#### String Methods

Strings have a bunch of dedicated methods (see the [documentation](https://docs.python.org/2/library/stdtypes.html#string-methods) for a complete list) that allow them to be inspected or manipulated (the string itself is never modified: a **new object** is returned instead). The following are those I use the most:

In [118]:
# is the string composed SOLELY of 1. digits 2. alphabetic characters 3. both?
print('100'.isdigit())
print('cat'.isalpha())
print('my cat is 100'.isalnum())

True
True
False


In [119]:
print(demo_string)

Does Cersei have any friends?
Noway!


In [120]:
# does the string start or end with a given sequence of characters?
print(demo_string.startswith("d"))  # it is case sensitive !!!
print(demo_string.endswith("Noway!"))

False
True


In [121]:
# change case to all the characters of a string
print(demo_string_single.upper())
print(demo_string_single.lower())

DOES CERSEI HAVE ANY FRIENDS?
does cersei have any friends?


In [122]:
# remove the given characters (default is any whitespace) from the beginning and the end of a string
print("Twice minus: - before and after -".strip("-"))
print("  \t  Too much space?".strip())

Twice minus: - before and after 
Too much space?


In [123]:
# replace a given sequence of characters with another 
print(demo_string.replace("Cersei", "Melisandre"))

Does Melisandre have any friends?
Noway!


A string can be transformed into a list of strings by splitting it on a given separator:

In [124]:
# a whitespace
demo_string_single.split(" ")

['Does', 'Cersei', 'have', 'any', 'friends?']

In [125]:
# the default separator is any run of whitespace, newlines included (that's convenient)
demo_string.split()

['Does', 'Cersei', 'have', 'any', 'friends?', 'Noway!']

In [126]:
# the maximum number of splits can be specified
demo_string_single.split(" ", 2)

['Does', 'Cersei', 'have any friends?']

The inverse operation is also possible: a list of strings can be joined by a separator string.

In [127]:
# a whitespace
" ".join(["One","Two","Three"])

'One Two Three'

In [128]:
# a hyphen
"-".join(["One","Two","Three"])

'One-Two-Three'

In [129]:
# no characters at all
"".join(["Super", "cali", "fragilistic", "expiali", "docious"] )

'Supercalifragilisticexpialidocious'

In [130]:
# BEWARE: non-string elements in the original list are not admitted
try:
    " ".join(["Today", "I", "am", 45])
except TypeError as e:
    print(e)

sequence item 3: expected str instance, int found
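A common workaround, sketched here with illustrative variable names, is to convert every element to a string with `str()` before joining:

```python
words = ["Today", "I", "am", 45]

# convert every element to a string, then join the results
sentence = " ".join(str(word) for word in words)
print(sentence)
```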


---

### Quiz

Italian nobles tend to have an awful lot of names.

For instance, "Vittorio Emanuele di Savoia" has the 12 names listed in `full_name`. 

Can you find a pythonic way to eliminate the least used names from this string?

In [131]:
full_name = "Vittorio Emanuele Alberto Carlo Teodoro Umberto Bonifacio Amedeo Damiano Bernardino Gennaro Maria di Savoia"

In [137]:
# your code here
full_name_list = full_name.split(" ")
short_name = " ".join(full_name_list[:2] + full_name_list[-2:])
print(short_name)

Vittorio Emanuele di Savoia


---

#### Unicode

##### Unicode in brief

Characters, the smallest textual units, are abstractions that the computer represents as code points. Code points are integer values, usually denoted in base 16.

*The Unicode Standard describes how code points map to characters and vice versa.* (see the [Unicode character table](https://unicode-table.com/en)).


##### Encodings

Sequences of code points are represented in the computer memory as a set of bytes. 

*The rules for translating a Unicode string into a sequence of bytes are called an Encoding.*

The principal character encodings are `ASCII`, `Latin-1` (`ISO-8859-1`, `iso88591` …), and `UTF-8` (`utf8`, `UTF_8` …). They are partly incompatible, with the following exception: a document stored in `ASCII` can be read using `Latin-1` or `UTF-8`, because both are supersets of `ASCII` (i.e. they encode the code points 0-127 in exactly the same way).

As a general rule, if you don't know which encoding to choose for your text, you should use `UTF-8`, mainly because it can handle any Unicode code point and because, for text that is mostly ASCII, it is more compact than comparable encodings like `UTF-16` and `UTF-32`.
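The differences between encodings become visible as soon as a string contains a non-ASCII character. The following sketch (the example word and the variable names are arbitrary choices) encodes the same text with several codecs and compares the number of bytes; note that Python's `utf-16` and `utf-32` codecs prepend a byte order mark, which is included in the count:

```python
text = "café"

# number of bytes needed by each encoding
sizes = {}
for encoding in ["utf-8", "latin-1", "utf-16", "utf-32"]:
    encoded = text.encode(encoding)
    sizes[encoding] = len(encoded)
    print(encoding, len(encoded), encoded)

# ASCII has no code for "é", so the encoding fails
try:
    text.encode("ascii")
except UnicodeEncodeError as error:
    print("UnicodeEncodeError:", error)

# pure ASCII text gives the same bytes in ASCII, Latin-1 and UTF-8
same_bytes = "cafe".encode("ascii") == "cafe".encode("latin-1") == "cafe".encode("utf-8")
print(same_bytes)
```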

#### Unicode in Python 3.x

Strings are unicode by default in Python 3.

In [138]:
# in Python 3 a plain string literal is already a unicode string (str)
string_1 = "Does Cersei have any friends?"
type(string_1)

str

In [139]:
# the u prefix is still accepted for backward compatibility, but the type is the same
string_2 = u"Does Cersei™ have any friends?"
type(string_2)

str

##### From bytes to Unicode (and back)

`encode()` converts a unicode string into a byte string, and `decode()` converts a byte string back into a unicode string.

In [140]:
# default Py3 unicode string
print(type(string_2))

# unicode string to byte string
string_2 = string_2.encode('utf8')
print(type(string_2))

# byte string back to unicode string
string_2 = string_2.decode("utf-8")
print(type(string_2))

<class 'str'>
<class 'bytes'>
<class 'str'>


#### Unicode Decoding and Encoding

![alt text](https://github.com/bloemj/AUC_TMCI_2022/blob/main/notebooks/images/unicode.png?raw=1)


Source: [Bird et al. (2009)](http://www.nltk.org/book/ch03.html#what-is-unicode)

---

### Sets

Sets are **unordered** collections of **distinct** objects.

They are commonly used to test membership, to remove duplicates or to compute mathematical operations such as intersection, union, difference, and symmetric difference. Being unordered collections, they do not support indexing, slicing and any other sequence-like behavior.

In Python, sets can be created either by using the syntax `set([])` or by using curly braces `{}` around at least one element (empty braces create a dictionary, not a set).

In [141]:
# when you create a set, repetitions are removed
demo_set = set([1,2,1,2,1,2,3,5,6,4,2,3,5,2])
another_demo_set = {1,6,6,3,9}
print(demo_set)
print(another_demo_set)

{1, 2, 3, 4, 5, 6}
{1, 3, 6, 9}
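One pitfall of the curly brace syntax is worth a quick check (a sketch with illustrative names): empty braces create a dictionary, so an empty set has to be created with `set()`. The same function accepts any iterable, not only lists:

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces))
print(type(empty_set))

# set() accepts any iterable, for example a string
letters = set("banana")
print(letters == {"a", "b", "n"})
```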


In [142]:
# elements may be added with the method add()
demo_set.add(7)
demo_set

{1, 2, 3, 4, 5, 6, 7}

In [143]:
# elements may be removed with the method remove()
demo_set.remove(2)
demo_set

{1, 3, 4, 5, 6, 7}

In [144]:
# length of a set
len(demo_set)

6

In [145]:
# membership test
5 in demo_set

True

In [146]:
# the union of the two sets
demo_set.union(another_demo_set)

{1, 3, 4, 5, 6, 7, 9}

In [147]:
# intersection of the two sets
demo_set.intersection(another_demo_set)

{1, 3, 6}

In [148]:
# elements that are in the demo_set but not in another_demo_set
demo_set.difference(another_demo_set)

{4, 5, 7}

In [149]:
# is every element of another_demo_set in demo_set?
print(demo_set.issuperset(another_demo_set))
print(another_demo_set.issubset(demo_set))

False
False
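The set methods shown above also have operator equivalents. The sketch below recreates the two sets under the hypothetical names `first` and `second`, and adds the symmetric difference, which was mentioned in the introduction but not demonstrated:

```python
first = {1, 3, 4, 5, 6, 7}
second = {1, 3, 6, 9}

print(first | second)   # union
print(first & second)   # intersection
print(first - second)   # difference
print(first ^ second)   # symmetric difference: elements in exactly one of the two sets
print({1, 3} <= first)  # subset test
```

The operators require both operands to be sets, while the methods accept any iterable as their argument.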


---

### Quiz

The following list contains 100 random extractions (with replacement) of numbers between 1 and 15. 

Find the number that has never been extracted

In [150]:
random_numbers = [1, 2, 1, 1, 9, 13, 15, 5, 9, 8, 12, 14, 3, 2, 8, 10, 3, 12, 15, 13, 5, 3, 7, 5, 2, 13, 12, 8, 10, 5, 15, 8, 2, 8, 5, 12, 9, 2, 3, 5, 1, 4, 5, 9, 13, 2, 12, 5, 10, 8, 1, 15, 15, 6, 12, 3, 1, 3, 7, 14, 15, 10, 15, 7, 10, 12, 1, 2, 13, 7, 9, 6, 6, 7, 4, 12, 10, 8, 8, 3, 8, 4, 6, 14, 10, 5, 2, 3, 15, 4, 9, 3, 7, 7, 2, 4, 4, 1, 7, 15]

In [156]:
# your code here
full_set = set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
extracted_set = set(random_numbers)
full_set.difference(extracted_set)

{11}

---

### Dictionaries

Dictionaries are **associative arrays** mapping keys of an **immutable** (more precisely, hashable) type (strings, numbers, tuples...) to arbitrary objects of any kind (variables, functions, modules...). Intuitively, they can be thought of as collections of objects that we can recall by means of a unique key.

To visualize a Python dictionary you can think of a telephone book, in which people's names are the unique keys that you use to retrieve different kinds of information (phone numbers, street address, mail address...). The same telephone number, street address or other information can be present in the entries of several people, but a label cannot be associated with more than one entry.

In Python, dictionaries are defined by curly brackets `{}`, in which key-value pairs are separated by commas and joined by colons.

In [157]:
# an English-Dutch dictionary of colors
demo_dictionary = {"black" : "zwart",
                  "white" : "wit",
                  "red" : "rood",
                  "yellow" : "geel"}
print(demo_dictionary)

{'black': 'zwart', 'white': 'wit', 'red': 'rood', 'yellow': 'geel'}


In [158]:
# values can be recalled by their keys
demo_dictionary["white"]

'wit'

In [159]:
# we can change a value associated with a key
demo_dictionary["white"] = "wit"
print(demo_dictionary)

{'black': 'zwart', 'white': 'wit', 'red': 'rood', 'yellow': 'geel'}


In [160]:
# if the key is missing a new key : value pair is added
demo_dictionary["blue"] = "blauw"
print(demo_dictionary)

{'black': 'zwart', 'white': 'wit', 'red': 'rood', 'yellow': 'geel', 'blue': 'blauw'}


In [161]:
# key-value pairs can be deleted with the "del" statement
del demo_dictionary["blue"]
print(demo_dictionary)

{'black': 'zwart', 'white': 'wit', 'red': 'rood', 'yellow': 'geel'}


In [162]:
# check if a dictionary has a given key
"blue" in demo_dictionary

False

In [163]:
# count the number of entries in a dictionary
len(demo_dictionary)

4
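Two details of dictionary keys are worth trying out. The sketch below uses small made-up dictionaries (`colors` and `grid`): it shows what happens when a key is missing, how the `.get()` method avoids the error, and why lists cannot be used as keys while tuples can.

```python
colors = {"black": "zwart", "white": "wit"}

# with square brackets, a missing key raises a KeyError...
try:
    colors["blue"]
except KeyError as error:
    print("KeyError:", error)

# ...while .get() returns None, or a default value of your choice
print(colors.get("blue"))
print(colors.get("blue", "unknown"))

# tuples can be used as keys, lists cannot
grid = {(0, 0): "origin"}
try:
    grid[[0, 1]] = "not allowed"
except TypeError as error:
    print("TypeError:", error)
```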

#### Iterating over a Dictionary

In [164]:
# iterate over dictionary keys:
print(list(demo_dictionary))
print(list(demo_dictionary.keys()))

['black', 'white', 'red', 'yellow']
['black', 'white', 'red', 'yellow']


In [165]:
# iterate over dictionary values:
print(list(demo_dictionary.values()))

['zwart', 'wit', 'rood', 'geel']


In [166]:
# iterate over dictionary key-value pairs:
print(list(demo_dictionary.items()))

[('black', 'zwart'), ('white', 'wit'), ('red', 'rood'), ('yellow', 'geel')]


---

### Quiz

How many characters does the longest Dutch colour name in our dictionary have?

In [170]:
# your code here
max_len = 0
for v in demo_dictionary.values():
    if len(v) > max_len:
        max_len = len(v)
print(max_len)

5


If you solved this using what we have learned so far, without flow control, it was probably a bit tricky or you had to hard-code some indices. In the next part of this notebook, we will learn a more efficient way of doing this. After that, you can try to solve this quiz in a more efficient way.

---

### Type casting

Sometimes, we may need to change the type of a variable. 

For instance, we may want to delete all the repeated elements of a list. A quick way to do so is to convert the list into a set.

Other examples involve the `join()` method and string concatenation with `+` (often used inside `print()`), neither of which accepts numbers. Conversion functions can be used to quickly turn numbers into strings.

Note that not every type can be converted into every other type. In what follows we report some common conversions.

In [171]:
# from number to string
str(3.123424235454)

'3.123424235454'

In [172]:
# sometimes you want to round the number
str(round(3.123424235454, 2))

'3.12'

In [173]:
# from string to integer
int("3")

3

In [174]:
# from string to floating-point number
float("3")

3.0

In [175]:
# from list to tuple (lists cannot be used as dictionary keys, tuples can)
tuple(demo_list)

('list', 'new_text', 7, 38, 38, 7, 'new_text', 'text')

In [176]:
# from list to set
set([1,2,3,1,2,3,3])

{1, 2, 3}

In [177]:
# from string to list
print(list("this is a sentence"))

['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 'e', 'n', 't', 'e', 'n', 'c', 'e']
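
The need for `str()` when combining numbers with text can be sketched as follows. The variable `page` and the values passed to `join()` are made up for this example.

```python
page = 3  # hypothetical example value

# "page " + page would raise a TypeError: a str and an int cannot be concatenated
label = "page " + str(page)
print(label)

# join() also expects strings, so each number is converted first
joined = "-".join([str(1), str(2), str(3)])
print(joined)
```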


---

### Quiz

How many different characters did Jonathan Coe use in the titles of his books?

In [178]:
coe_bibliography = ["The Accidental Woman", "A Touch of Love", "The Dwarves of Death", "What a Carve Up! or The Winshaw Legacy Viking", "The House of Sleep", "The Rotters' Club", "The Closed Circle", "The Rain Before It Falls", "The Terrible Privacy of Maxwell Sim", "Expo 58", "Number 11"]

In [182]:
# your code here
chars = []
for title in coe_bibliography:
    for char in title:
        if char not in chars:
            chars.append(char)
print(len(chars))

47
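
Since this quiz follows the type-casting section, one possible alternative is to let a `set` remove the duplicates. This is a sketch rather than the reference solution; the names `all_titles` and `unique_chars` are introduced here for illustration. Like the loop above, it counts spaces, punctuation and upper/lower case as distinct characters.

```python
coe_bibliography = ["The Accidental Woman", "A Touch of Love", "The Dwarves of Death", "What a Carve Up! or The Winshaw Legacy Viking", "The House of Sleep", "The Rotters' Club", "The Closed Circle", "The Rain Before It Falls", "The Terrible Privacy of Maxwell Sim", "Expo 58", "Number 11"]

# glue all titles into one long string...
all_titles = "".join(coe_bibliography)

# ...then let set() drop the repeated characters
unique_chars = set(all_titles)
print(len(unique_chars))
```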


---

## Python Syntax

### The Significant Whitespace

Most programming languages use characters (e.g. `{...}`) or keywords (e.g. `begin ... end`) to delimit blocks of code.

#### When writing Python code, you rely on INDENTATION to structure your programs. 

All programming languages allow you to indent (and you should!), but in Python you **have to.**

Otherwise, you'll receive an `IndentationError` and your code won't work.

#### How Indentation Works

- All statements at the same distance from the left border belong to the same block of code.


- Sub-blocks are indented further, and a block ends at the first line that is indented less.


- You should use **4 spaces** per indentation level.


- When a statement is too long (it's good practice to avoid lines of code longer than 80 characters), it can be split with a backslash `\`


- **Never mix** spaces and tabs in a single source file
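
As a sketch of the line-splitting rule above: the backslash joins two physical lines into one statement. PEP 8 also describes a second option, wrapping the expression in parentheses, which is shown here for comparison. The variable names are made up for this example.

```python
# splitting a long statement with a backslash
total = 1 + 2 + 3 + \
        4 + 5 + 6
print(total)

# the same statement wrapped in parentheses, no backslash needed
total_parenthesised = (1 + 2 + 3 +
                       4 + 5 + 6)
print(total_parenthesised)
```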


> #### Recommended Reading:
>
> [PEP 8 - Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)

##### The code is way more readable:


```python
# input() reads from standard input (e.g. keyboard) and returns a string
n_string = input('enter a number, please')

if not n_string.isdigit():
    print("this isn't a number...")
else:
    n = int(n_string)
    if n == 0:
        print("zero? why zero?")
    elif n % 2 == 0:
        print("even")
    else:
        print("odd")
```

##### The structure is transparent:

<div style="float:left;margin:-12px 0 0 0" markdown="1">
    <img src="https://github.com/bloemj/AUC_TMCI_2022/blob/main/notebooks/images/Indent.png?raw=1" width="505">
</div>

### Conditional Statements

A lot of programming has to do with executing a block of code only if a certain condition is true.

In Python, the `if-then-else` construct has the form:

```python
if condition1:
    statements
elif condition2:
    statements
elif condition3:
    statements
else:
    statements
```

Note that the `elif` and `else` clauses are optional. A conditional statement can contain a single `if` block, and nothing else.

In [183]:
import random
n = random.randrange(0,99)  # random integer from 0 to 98 (the stop value 99 is excluded)
if n == 0:
    print("zero? why zero?")
elif n % 2 == 0:
    print("even")
else:
    print("odd")

odd


## Flow control

### For Loops

Programming is of little use if we cannot repeat an instruction for an intended number of times. 

The `for` statement allows us to define **iterations** (i.e. taking items one by one from an iterable) by following this template:

```python
for variable in sequence:
    statement
else:
    statement
```

The code in the optional `else` clause is executed if and only if the loop terminates normally (i.e. without a **`break`**).

In [184]:
# let's iterate over our demo_list
for el in demo_list:
    print(el)

list
new_text
7
38
38
7
new_text
text


#### The enumerate function

`enumerate()` is probably the most used among the functions that support iterating over an iterable. This function returns the current item plus **its index** in the iteration process.

In [185]:
# use enumerate in the iteration over our demo_list
for i, el in enumerate(demo_list):
    print (i, "-->", el)

0 --> list
1 --> new_text
2 --> 7
3 --> 38
4 --> 38
5 --> 7
6 --> new_text
7 --> text
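
`enumerate()` also accepts an optional `start` argument, which is handy when a count should begin at 1. A small sketch, using a made-up `colours` list rather than `demo_list`:

```python
colours = ["zwart", "wit", "rood", "geel"]  # example data for this sketch

# start=1 makes the counter begin at 1 instead of 0
for position, colour in enumerate(colours, start=1):
    print(position, "-->", colour)

# enumerate yields (index, item) tuples, which list() makes visible
pairs = list(enumerate(colours, start=1))
print(pairs)
```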


#### The range construct

The `range()` construct can be used to control the iteration. In Python 3 it returns a lazy `range` object (not a list) that produces a sequence of integers on the basis of the following three arguments:

- `start`: the first integer of the sequence (default is 0)
- `stop`: the value at which the sequence stops; `stop` itself is never included
- `step`: the increment between consecutive integers (default is 1)

In [186]:
# let's play with range
print(range(0,10))
print(range(10))
print(range(1,10,2))

range(0, 10)
range(0, 10)
range(1, 10, 2)


In [187]:
# let's use range in a for loop
for el in range(1, 10, 2):
    print(el)

1
3
5
7
9
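
Printing a `range` only shows its arguments. To inspect the numbers it produces, one option is to wrap it in `list()`, as in this short sketch (the name `countdown` is illustrative):

```python
# wrap range() in list() to see the numbers it produces
print(list(range(10)))
print(list(range(1, 10, 2)))

# a negative step counts down; the stop value is still excluded
countdown = list(range(5, 0, -1))
print(countdown)
```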


### While Loops

The `while` statement allows us to control a loop on the basis of a condition. 

A `while` loop runs as long as its condition is true.

It has the following general form:

```python
while condition:
    statement
else:
    statement
```

The code in the optional `else` clause is executed if and only if the loop terminates normally (i.e., without a **`break`**).

In [188]:
n = 1
while n % 2 != 0:
    n = random.randrange(0,99)
    print(n)

42
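
Because the example above depends on random numbers, its output changes from run to run. A deterministic sketch of the same idea: keep halving a number while it is even, and count the steps. The starting value 48 is arbitrary.

```python
n = 48       # arbitrary starting value
steps = 0

# the loop body runs as long as n is even
while n % 2 == 0:
    n = n // 2
    steps += 1

print(n, steps)
```

Here the loop visits 24, 12, 6 and 3, and stops at 3 because it is odd.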


### Break and Continue

`break` and `continue` are two statements that allow for a more flexible control of a loop. Intuitively:

- `continue` is used to pass to the next iteration of the loop
- `break` is used to interrupt the loop abruptly

In [189]:
# when we encounter 7 we skip to the next step
for el in range(1, 10, 2):
    if el == 7:
        continue
    print(el)

1
3
5
9


In [190]:
# when we encounter 7 we stop our loop 
for el in range(1, 10, 2):
    if el == 7:
        break
    print(el)

1
3
5


`break` influences the execution of the loop in yet another way: when a loop terminates due to a `break` statement, the code in the optional `else` clause is skipped.

In [191]:
# the continue statement does not influence the execution of the else block
for el in range(1, 10, 2):
    if el == 7:
        print ("(let's ignore the " + str(el) + ")")
        continue
    print(el)
else:
    print (">>> the iteration ended with the number " + str(el))

1
3
5
(let's ignore the 7)
9
>>> the iteration ended with the number 9


In [192]:
# what if we replace continue with break
for el in range(1, 10, 2):
    if el == 7:
        print ("(we encountered the number " + str(el) + ", let's break the loop)")
        break
    print(el)
else:
    print (">>> the iteration ended with the number " + str(el))

1
3
5
(we encountered the number 7, let's break the loop)


### The Pass Statement

Because indentation is significant in Python, a block cannot be left empty. Sometimes we need a placeholder that lets us write down the condition of an `if-then-else` construct or of a `while` loop without writing any statement yet (maybe just a comment). This is where the `pass` statement comes in handy.

In what follows, **nothing happens**:

```python
if condition1:
    pass
else:
    pass
```
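
A slightly more concrete sketch: `pass` keeps a branch syntactically valid while its body is still to be written. The names `numbers` and `odd_numbers` are illustrative.

```python
numbers = [1, 2, 3, 4, 5]  # example data
odd_numbers = []

for n in numbers:
    if n % 2 == 0:
        # placeholder: nothing is done with even numbers (yet)
        pass
    else:
        odd_numbers.append(n)

print(odd_numbers)
```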


### List Comprehensions

A list comprehension is a syntactic construct that allows us to create a list by applying an expression to each element of another list, in just **one line** of code.

Every list comprehension can be rewritten, more verbosely, as a loop, although the reverse isn't always true. We will exploit this family resemblance to introduce the construct.

In what follows, we start with a list of numbers and we want to square all of its elements and save our final values in a new list.

In [193]:
# our source list
source_list = [1,2,3,4,5,6,7,8,9]

In [194]:
# we can solve this problem with a for loop...
final_list = []
for el in source_list:
    final_list.append(el ** 2)
print(final_list)

[1, 4, 9, 16, 25, 36, 49, 64, 81]


In [195]:
# ... or by using list comprehension
final_list = [el ** 2 for el in source_list]
print(final_list)

[1, 4, 9, 16, 25, 36, 49, 64, 81]


**Conditional statements may be implemented**

In what follows we want to ignore all the odd numbers.

In [196]:
# we can solve this problem with a for loop...
final_list = []
for el in source_list:
    if el % 2 == 0:
        final_list.append(el ** 2)
print(final_list)

[4, 16, 36, 64]


In [197]:
# ... or by using list comprehension
final_list = [el ** 2 for el in source_list if el % 2 == 0]
print(final_list)

[4, 16, 36, 64]


**If you want to implement an else clause the syntax changes slightly**

In what follows we want to leave the odd numbers unchanged

In [198]:
# we can solve this problem with a for loop...
final_list = []
for el in source_list:
    if el % 2 == 0:
        final_list.append(el ** 2)
    else:
        final_list.append(el)
print(final_list)

[1, 4, 3, 16, 5, 36, 7, 64, 9]


In [199]:
# ... or by using list comprehension
final_list = [el ** 2 if el % 2 == 0 else el for el in source_list]
print(final_list)

[1, 4, 3, 16, 5, 36, 7, 64, 9]


---

### Quiz

The list `random_numbers` contains 20 randomly generated integers. 

Create a new list containing only the positive numbers from the list.

In [200]:
random_numbers = [81, -36, 15, -96, 14, 51, 70, 40, -15, -64, -25, 82, -88, 7, -13, -30, 12, 32, 96, -55]

In [201]:
# your code here
positive_numbers = []
for n in random_numbers:
    if n > 0:
        positive_numbers.append(n)
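
The same filter can also be sketched as a list comprehension with a condition, following the pattern introduced in the previous section:

```python
random_numbers = [81, -36, 15, -96, 14, 51, 70, 40, -15, -64, -25, 82, -88, 7, -13, -30, 12, 32, 96, -55]

# keep an element only if it is greater than zero
positive_numbers = [n for n in random_numbers if n > 0]
print(positive_numbers)
```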


---

**Here's something you might find useful:**

[Python Cheat Sheet](https://www.cheatography.com/davechild/cheat-sheets/python/)

Collects much of the syntax we used today.

---

### Exercise 1.

The code in the next cell creates a variable called `zen_text`, and assigns it a nicely formatted version of the textual elements of the Zen of Python.

**Count the number of non-empty lines** of this manifesto.

In [202]:
import this
zen_text = ''.join(this.d.get(el, el) for el in this.s)

In [207]:
# your code here
non_empty_lines = len([l for l in zen_text.split('\n') if l.strip()])
print(non_empty_lines)

20


### Exercise 2.

The dictionary in the following cell reports, for each Scrubs character, a nested dictionary containing the name of the actor, the age of the character and their credentials.

Write code to answer the following questions:

- what are the **names of the actors** of the cast?


- what is the **average age** of the characters?


- how **many M.D.s** are there in the main cast?

In [None]:
scrubs_main = {
    "Bob Kelso": {"actor": "Ken Jenkins", "age": 70, "credentials": "M.D."},
    "Carla Espinosa-Turk": {"actor": "Judy Reyes", "age": 36, "credentials": "RN"},
    "Christopher Turk": {"actor": "Donald Faison", "age": 31, "credentials": "M.D."},
    "Elliot Reid": {"actor": "Sarah Chalke", "age": 29, "credentials": "M.D."},
    "J.D.": {"actor": "Zach Braff", "age": 31, "credentials": "M.D."},
    "Janitor": {"actor": "Neil Flynn", "age": 40, "credentials": None},
    "Perry Cox": {"actor": "John C. McGinley", "age": 45, "credentials": "M.D."},
}

In [None]:
# your code here
actors = []
ages = []
md_count = 0

for char, info in scrubs_main.items():
    actors.append(info["actor"])
    ages.append(info["age"])
    if info["credentials"] == "M.D.":
        md_count += 1

print("Actors: " + str(actors))
print("Average age: " + str(sum(ages) / len(ages)))
print("Number of M.D.s: " + str(md_count))


Actors: ['Ken Jenkins', 'Judy Reyes', 'Donald Faison', 'Sarah Chalke', 'Zach Braff', 'Neil Flynn', 'John C. McGinley']
Average age: 40.285714285714285
Number of M.D.s: 5
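
As an alternative sketch, the three questions can also be answered with list comprehensions over `scrubs_main.values()`, since the character names themselves are not needed. The variable `average_age` is introduced here for readability.

```python
scrubs_main = {
    "Bob Kelso": {"actor": "Ken Jenkins", "age": 70, "credentials": "M.D."},
    "Carla Espinosa-Turk": {"actor": "Judy Reyes", "age": 36, "credentials": "RN"},
    "Christopher Turk": {"actor": "Donald Faison", "age": 31, "credentials": "M.D."},
    "Elliot Reid": {"actor": "Sarah Chalke", "age": 29, "credentials": "M.D."},
    "J.D.": {"actor": "Zach Braff", "age": 31, "credentials": "M.D."},
    "Janitor": {"actor": "Neil Flynn", "age": 40, "credentials": None},
    "Perry Cox": {"actor": "John C. McGinley", "age": 45, "credentials": "M.D."},
}

# one comprehension per question, each iterating over the nested dictionaries
actors = [info["actor"] for info in scrubs_main.values()]
ages = [info["age"] for info in scrubs_main.values()]
average_age = sum(ages) / len(ages)
md_count = len([info for info in scrubs_main.values() if info["credentials"] == "M.D."])

print("Actors: " + str(actors))
print("Average age: " + str(round(average_age, 2)))
print("Number of M.D.s: " + str(md_count))
```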


---