# Introduction to Python and NumPy
## Part One - Python
#### By Jonathan L. Moran (jo6155mo-s@student.lu.se)
From the EDAN95 - Applied Machine Learning course given at Lunds Tekniska Högskola (LTH) | Ht2 2019.

## Objectives
In this set of notebooks you will
*  Get a hands-on introduction to Python
*  Refresh your knowledge on linear algebra
*  Know the main functions of NumPy

## Implementation task

_1. Run all the code in the chapter: A Tour of Python (P. Nugues)._

#### 1.2 The Read, Evaluate and Print Loop

In [1]:
### From P. Nugues' Ch. 1 - A Tour of Python (Language Processing with Python, 2019)

In [2]:
a = 1            # We create variable a and assign it with 1
b = 2            # We create b and assign it with 2
b + 1            # We add 1 to b
# And Python returns the result...

3

In [3]:
c = a / (b + 1)  # We carry out a computation and assign it to c
c                # We print c

0.3333333333333333

In [4]:
text = 'Result:' # We create text and assign it with a string
print(text, c)   # And we print both text and c

Result: 0.3333333333333333


#### 1.3 Introductory Programs

_The program below uses a loop to print the numbers of a list. The loop starts with the `for` and `in` statements ended with a colon. After this statement, we add an indentation of four spaces to define the body of the loop: The statements executed by this loop. We remove the indentation when the block has ended:_

In [5]:
for i in [1, 2, 3, 4, 5, 6]:
    print(i)
print('Done')

1
2
3
4
5
6
Done


_The next program introduces a condition with the `if` and `else` statements, also ended with a colon, and the modulo operator, `%`, to print the odd and even numbers:_

In [6]:
for i in [1, 2, 3, 4, 5, 6]:
    if i % 2 == 0:
        print('Even:', i)
    else:
        print('Odd:', i)
print('Done')

Odd: 1
Even: 2
Odd: 3
Even: 4
Odd: 5
Even: 6
Done


#### 1.4 Strings

In [7]:
iliad = """Sing, O goddess, the anger of Achilles son of
Peleus, that brought countless ills upon the Achaeans."""
iliad

'Sing, O goddess, the anger of Achilles son of\nPeleus, that brought countless ills upon the Achaeans.'

_In the example above, the string includes a new line delimiter, `\n`, between "of" and "Peleus" to break the line._

_If, instead, we want to keep the white spaces and just wrap the line so that it fits our text editor, we will use the backslash continuation character, `\`, as in:_

In [8]:
iliad2 = 'Sing, O goddess, the anger of Achilles son of \
Peleus, that brought countless ills upon the Achaeans.'
iliad2

'Sing, O goddess, the anger of Achilles son of Peleus, that brought countless ills upon the Achaeans.'

##### 1.4.1 String Index

_We access the characters in a string using their index enclosed in square brackets, starting at 0:_

In [9]:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
alphabet[0]     # ’a’
alphabet[1]     # ’b’
alphabet[25]    # ’z’

'z'

_We can use negative indices, that start from the end of the string:_

In [10]:
alphabet[-1]    # the last character of a string: ’z’
alphabet[-2]    # the second last: ’y’
alphabet[-26]   # ’a’

'a'

_The length of a string is given by the `len()` function:_

In [11]:
len(alphabet)   # 26

26

##### 1.4.2 String Operations and Functions

Strings come with a set of built-in operators and functions. We concatenate and repeat strings using `+` and `*` as in:

In [12]:
'abc' + 'def'   # ’abcdef’
'abc' * 3       # ’abcabcabc’

'abcabcabc'

_The `join()` function is an alternative to `+`. It is called by a string with a list as argument: `str.join(list)`. It concatenates the elements of the list with the calling string, possibly empty, placed in-between:_

In [13]:
''.join(['abc', 'def', 'ghi'])    # equivalent to a +:
                                  # ’abcdefghi’
' '.join(['abc', 'def', 'ghi'])   # places a space between the
                                  # elements: ’abc def ghi’
', '.join(['abc', 'def', 'ghi'])  # ’abc, def, ghi’

'abc, def, ghi'

_We set a string in uppercase letters with `str.upper()` and in lowercase with `str.lower()`:_

In [14]:
accented_e = 'eéèêë'
accented_e.upper()                # ’EÉÈÊË’

'EÉÈÊË'

In [15]:
accented_E = 'EÉÈÊË'
accented_E.lower()                # ’eéèêë’

'eéèêë'

_We search and replace substrings in strings using `str.find()` and `str.replace()`. `str.find()` returns the index of the first occurrence of the substring or `-1`, if not found, while `replace()` replaces all the occurrences of the substring and returns a new string:_

In [16]:
alphabet.find('def')              # 3

3

In [17]:
alphabet.find('é')                # -1

-1

In [18]:
alphabet.replace('abc', 'αβγ')    # ’αβγdefghijklmnopqrstuvwxyz’

'αβγdefghijklmnopqrstuvwxyz'

_We can iterate over the characters of a string using a `for` `in` loop, and for instance extract all its vowels as in:_

In [19]:
text_vowels = ''
for i in iliad:
    if i in 'aeiou':
        text_vowels = text_vowels + i
print(text_vowels)                # ’ioeeaeoieooeeuaououeiuoeaea’

ioeeaeoieooeeuaououeiuoeaea


_We can abridge the statement:_

In [20]:
text_vowels = text_vowels + i

_into_

In [21]:
text_vowels += i

_as well as for all the arithmetic operators: `-=`, `*=`, `/=`, `**=`, and `%=`._

##### 1.4.3 Slices

_We can extract substrings of a string using **slices**: A range defined by a start and an end index, `[start:end]`, where the slice will include all the characters from index start up to index `end - 1`:_

In [22]:
alphabet[0:3]     # the three first letters of alphabet: ’abc’
alphabet[:3]      # equivalent to alphabet[0:3]
alphabet[3:6]     # substring from index 3 to index 5: ’def’
alphabet[-3:]     # the three last letters of alphabet: ’xyz’
alphabet[10:-10]  # ’klmnop’
alphabet[:]       # all the letters: ’a...z’

'abcdefghijklmnopqrstuvwxyz'

_As the end index is excluded from the slice,_
```
alphabet[:i] + alphabet[i:]
```
_is always equal to the original string, whatever the value of `i`._
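
_The identity above can be checked with a short sketch. The range of test values is an arbitrary choice made for illustration; it deliberately includes negative and out-of-range indices, which slices accept without raising an error:_

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

# Rebuild the string from its two halves for every split point i,
# including negative values and values beyond the string length.
checks = [alphabet[:i] + alphabet[i:] == alphabet for i in range(-30, 31)]
print(all(checks))   # True
```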

In addition to the start and the end, we can add a step using the syntax `[start:end:step]`.
With a step of 2, we extract every second letter:

In [23]:
alphabet[0::2]        # acegikmoqsuwy

'acegikmoqsuwy'

##### 1.4.4 Special Characters

_The characters in the strings are interpreted literally by Python, except the quotes and backslashes. To create strings containing these two characters, Python defines two escape sequences: `\'` to represent a quote and `\\` to represent a backslash as in:_

In [24]:
'Python\'s strings'    # "Python’s strings"

"Python's strings"

_The `\N{name}` name and `\uxxxx` and `\Uxxxxxxxx` sequences enable us to designate any character, like Ö and Œ, by its Unicode name, respectively, `\N{LATIN CAPITAL LETTER O WITH DIAERESIS}` and `\N{LATIN CAPITAL LIGATURE OE}`, or its code point, `\u00D6` and `\u0152`._

In [25]:
'\N{COMMERCIAL AT}'    # ’@’
'\x40'                 # ’@’
'\100'                 # ’@’
'\u0152'               # ’Œ’

'Œ'

_If we want to treat backslashes as normal characters, we add the `r` prefix (raw) to the string as in:_

In [26]:
r'\N{COMMERCIAL AT}'   # ’\\N{COMMERCIAL AT}’
r'\x40'                # ’\\x40’
r'\100'                # ’\\100’
r'\u0152'              # ’\\u0152’

'\\u0152'

##### 1.4.5 Formatting Strings

_Python can interpolate variables inside strings. This process is called formatting and uses the `str.format()` function. The positions of the variables in the string are given by curly braces: `{}` that will be replaced by the arguments in `format()` in the same order as in:_

In [27]:
begin = 'my'
'{} string {}'.format(begin, 'is empty')
# ’my string is empty’

'my string is empty'

In [28]:
begin = 'my'
'{0} string {1}'.format(begin, 'is empty')
# ’my string is empty’

'my string is empty'

_If the input string contains braces, we escape them by doubling them: `{{` for a literal { and `}}` for }._
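
_A minimal sketch of brace escaping with `str.format()`. The example strings are made up for illustration; note that braces inside an argument are inserted as they are and are not interpreted again:_

```python
# Doubled braces produce literal braces around the two fields.
pair = 'The set {{{}, {}}} has {} elements'.format(1, 2, 2)
print(pair)      # The set {1, 2} has 2 elements

# Braces inside an argument need no escaping.
literal = 'Use {{}} as a placeholder, as in {}'.format("'{} string {}'")
print(literal)   # Use {} as a placeholder, as in '{} string {}'
```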

#### 1.5 Data Types

_We return the type of a value with the `type()` function:_

In [29]:
type(alphabet)     # <class ’str’>
type(12)           # <class ’int’>
type('12')         # <class ’str’>
type(12.0)         # <class ’float’>
type(True)         # <class ’bool’>
type(1 < 2)        # <class ’bool’>
type(None)         # <class ’NoneType’>

NoneType

_Python supports the conversion of types using a function named after the target type, such as `int()` or `str()`. When the conversion is not possible, Python raises an error:_

In [30]:
int('12')         # 12
str(12)           # ’12’
int('12.0')       # ValueError
int(alphabet)     # ValueError
int(True)         # 1
int(False)        # 0
bool(7)           # True
bool(0)           # False
bool(None)        # False

ValueError: invalid literal for int() with base 10: '12.0'

#### 1.6 Data Structures

##### 1.6.1 Lists
_Lists in Python are data structures that can hold any number of elements of any type. Like in strings, each element has a position, where we can read data using the position index. We can also write data to a specific index and a list grows or shrinks automatically when elements are appended, inserted, or deleted. Python manages the memory without any intervention from the programmer._

In [31]:
list1 = []       # An empty list
list1 = list()   # Another way to create an empty list
list2 = [1,2,3]  # List containing 1, 2, and 3
list2[1]         # 2
list2[1] = 8
list2            # [1, 8, 3]
list2[4]         # Index error

IndexError: list index out of range

_Lists can contain elements of different types:_

In [32]:
var1 = 3.14
var2 = 'my string'
list3 = [1, var1, 'Prolog', var2]
list3            # [1, 3.14, ’Prolog’, ’my string’]

[1, 3.14, 'Prolog', 'my string']

In [33]:
list3[1:3]       # [3.14, ’Prolog’]
list3[1:3] = [2.72, 'Perl', 'Python']
list3            # [1, 2.72, ’Perl’, ’Python’, ’my string’]

[1, 2.72, 'Perl', 'Python', 'my string']

_We can create lists of lists:_

In [34]:
list4 = [list2, list3]
list4
# [[1, 8, 3], [1, 2.72, ’Perl’, ’Python’, ’my string’]]

[[1, 8, 3], [1, 2.72, 'Perl', 'Python', 'my string']]

_where we access the elements of the inner lists with a sequence of indices between square brackets:_

In [35]:
list4[0][1]       # 8
list4[1][3]       # ’Python’

'Python'

_We can also assign a complete list to a variable and a list to a list of variables as in:_

In [36]:
list5 = list2
[v1,v2,v3] = list5

_where `list5` refers to the same list object as `list2` (the assignment does not copy the elements), and `v1`, `v2`, `v3` contain, respectively, 1, 8, and 3._
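
_A short sketch of what the plain assignment does: it binds a second name to the same list, so a change made through one name is visible through the other. The name `list5_copy` is introduced here for illustration only; `list.copy()` creates an independent (shallow) copy:_

```python
list2 = [1, 8, 3]
list5 = list2               # a second name for the same list object
list5_copy = list2.copy()   # an independent shallow copy

list2[0] = 100              # modify the list through its first name
print(list5)                # [100, 8, 3]
print(list5_copy)           # [1, 8, 3]
print(list5 is list2)       # True
print(list5_copy is list2)  # False

v1, v2, v3 = list5_copy     # unpacking: v1 = 1, v2 = 8, v3 = 3
```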

##### 1.6.2 Built-in List Operations and Functions
_Lists have built-in operators and functions. Like for strings, we can use the `+` and `*` operators to concatenate and repeat lists:_

In [37]:
list2                         # [1, 8, 3]
list3[:-1]                    # [1, 2.72, ’Perl’, ’Python’]
[1, 2, 3] + ['a', 'b']        # [1, 2, 3, ’a’, ’b’]
list2[:2] + list3[2:-1]       # [1, 8, ’Perl’, ’Python’]
list2 * 2                     # [1, 8, 3, 1, 8, 3]
[0.0] * 4                     # Initializes a list of four 0.0s
# [0.0, 0.0, 0.0, 0.0]

[0.0, 0.0, 0.0, 0.0]

_In addition to operators, lists have functions that include:_
* `list.extend(elements)` that extends the list with the elements of elements passed as argument;
* `list.append(element)` that appends element to the end of the list;
* `list.insert(idx, element)` that inserts element at index idx;
* `list.remove(value)` that removes the first occurrence of value;
* `list.pop(i)`, that removes the element at index i and returns its value; If there is no index, list.pop() takes the last element in the list;
* `del list[i]`, a statement that also removes the element at index i. In addition, `del` can remove slices, clear the whole list, or delete the list variable;
* `len()`, a function that returns the length of list;
* `list.sort()` that sorts the list;
* `sorted()` a function that returns a sorted list.

In [38]:
list2                          # [1, 8, 3]
list2[1] = 2                   # [1, 2, 3]
len(list2)                     # 3
list2.extend([4, 5])           # [1, 2, 3, 4, 5]
list2.append(6)                # [1, 2, 3, 4, 5, 6]
list2.append([7, 8])           # [1, 2, 3, 4, 5, 6, [7, 8]]
list2.pop(-1)                  # [1, 2, 3, 4, 5, 6]
list2.remove(1)                # [2, 3, 4, 5, 6]
list2.insert(0, 'a')           # [’a’,2,3,4,5,6]
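
_The cell above does not exercise `list.sort()`, `sorted()`, or `del`. A minimal sketch with a made-up list, `letters`, shows the difference between sorting in place and returning a new list:_

```python
letters = ['d', 'a', 'c', 'b']

ordered = sorted(letters)   # returns a new list; letters is unchanged
print(ordered)              # ['a', 'b', 'c', 'd']
print(letters)              # ['d', 'a', 'c', 'b']

result = letters.sort()     # sorts in place and returns None
print(result)               # None
print(letters)              # ['a', 'b', 'c', 'd']

del letters[0]              # removes the element at index 0
del letters[1:]             # removes a slice
print(letters)              # ['b']
```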

_To know all the functions associated with a type, we can use `dir()`, as in:_

In [39]:
dir(list)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

_or_

In [40]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


_To have help on a specific type or function, we can use help as in:_

In [41]:
help(list)

Help on class list in module builtins:

class list(object)
 |  list(iterable=(), /)
 |  
 |  Built-in mutable sequence.
 |  
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self))

_and_

In [42]:
help(list.append)

Help on method_descriptor:

append(self, object, /)
    Append object to the end of the list.



##### 1.6.3 Tuples
_Tuples are sequences enclosed in parentheses. They are very similar to lists, except that they are immutable. Once created, we access the elements of a tuple, including slices, using the same notation as with the lists._

In [43]:
tuple1 = ()               # An empty tuple
tuple1 = tuple()          # Another way to create an empty tuple
tuple2 = (1, 2, 3, 4)
tuple2[3]                 # 4
tuple2[1:4]               # (2, 3, 4)

(2, 3, 4)

_Parentheses enclosing a single item are ambiguous: `(1)`, for example, already denotes an arithmetic expression. That is why tuples of one item require a trailing comma:_

In [44]:
type((1))       # <class ’int’>
                # Arithmetic expression corresponding to integer 1

int

In [45]:
type((1,))      # <class ’tuple’>
                # A tuple consisting of one item: integer 1

tuple

_We can convert lists to tuples and tuples to lists:_

In [46]:
list6 = ['a', 'b', 'c']
tuple3 = tuple(list6)       # conversion to a tuple: (’a’, ’b’, ’c’)
type(tuple3)                # <class ’tuple’>
list7 = list(tuple2)        # [1, 2, 3, 4]
tuple([1])                  # (1,)
                            # conversion to a tuple of one item
list((1,))                  # [1]
                            # conversion to a list of one item

[1]

_Tuples can include elements of different types. If an inner element is mutable, we can change its value as in:_

In [47]:
tuple4 = (tuple2, list6)    # ((1, 2, 3, 4), [’a’, ’b’, ’c’])
tuple4[0]                   # (1, 2, 3, 4)
tuple4[1]                   # [’a’, ’b’, ’c’]
tuple4[0][2]                # 3
tuple4[1][1]                # ’b’
tuple4[1][1] = 'β'          # ((1, 2, 3, 4), [’a’, ’β’, ’c’])
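
_To make the immutability concrete, the sketch below tries to replace an element of the tuple itself, which raises a `TypeError`, and then modifies the inner list, which is allowed. The variable `message` is introduced for illustration:_

```python
tuple2 = (1, 2, 3, 4)
list6 = ['a', 'b', 'c']
tuple4 = (tuple2, list6)

try:
    tuple4[1] = ['x']            # rebinding a tuple element fails
except TypeError as error:
    message = str(error)
    print('TypeError:', message)

tuple4[1].append('d')            # the inner list is still mutable
print(tuple4)                    # ((1, 2, 3, 4), ['a', 'b', 'c', 'd'])
```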

##### 1.6.4 Sets
_Sets are collections that have no duplicates. We create a set with a sequence enclosed in curly braces or an empty set with the `set()` function. We can then add and remove elements with the `add()` and `remove()` functions:_

In [48]:
set1 = set()                        # An empty set
set2 = {'a', 'b', 'c', 'c', 'b'}    # {’a’, ’b’, ’c’}
set2.add('d')                       # {’a’, ’b’, ’c’, ’d’}
set2.remove('a')                    # {’b’, ’c’, ’d’}

_Sets are useful to extract the unique elements of lists or strings as in:_

In [49]:
list8 = ['a', 'b', 'c', 'c', 'b']
set3 = set(list8)                   # {’a’, ’b’, ’c’}
iliad_chars = set(iliad.lower())
# The set of unique characters of the iliad string

_Sets are unordered. We can create a sorted list of them using `sorted()` as in:_

In [50]:
sorted(iliad_chars)

['\n',
 ' ',
 ',',
 '.',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'l',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u']

##### 1.6.5 Built-in Set Functions
_The `set` type includes the classical set operations:_
* `set1.intersection(set2, ...)`
* `set1.union(set2, ...)`
* `set1.difference(set2, ...)`
* `set1.symmetric_difference(set2)`
* `set1.issuperset(set2)`
* `set1.issubset(set2)`

_A few examples:_

In [51]:
set2.intersection(set3)                  # {'c', 'b'}
set2.union(set3)                         # {'d', 'b', 'a', 'c'}
set2.symmetric_difference(set3)          # {'a', 'd'}
set2.issubset(set3)                      # False
iliad_chars.intersection(set(alphabet))
# characters of the iliad string that are letters:
# {’a’, ’s’, ’g’, ’p’, ’u’, ’h’, ’c’, ’l’, ’i’,
#  ’d’, ’o’, ’e’, ’b’, ’t’, ’f’, ’r’, ’n’}

{'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'l',
 'n',
 'o',
 'p',
 'r',
 's',
 't',
 'u'}
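
_The same operations are also available as operators. The sketch below rebuilds `set2` and `set3` with the values they hold at this point of the notebook and applies `&`, `|`, `-`, `^`, and `<=`:_

```python
set2 = {'b', 'c', 'd'}
set3 = {'a', 'b', 'c'}

common = set2 & set3        # intersection
either = set2 | set3        # union
only_in_set2 = set2 - set3  # difference
exclusive = set2 ^ set3     # symmetric difference
is_subset = set2 <= set3    # subset test

print(sorted(common))        # ['b', 'c']
print(sorted(either))        # ['a', 'b', 'c', 'd']
print(sorted(only_in_set2))  # ['d']
print(sorted(exclusive))     # ['a', 'd']
print(is_subset)             # False
```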

##### 1.6.6 Dictionaries
_Dictionaries are collections, where the values are indexed by keys instead of ordered positions, like in lists or tuples. Counting the words of a text is a very frequent operation in natural language processing, as we will see in the rest of this book. Dictionaries are the most appropriate data structures to carry this out, where we use the keys to store the words and the values to store the counts._

_We create a dictionary by assigning it a set of initial key-value pairs, possibly empty, where keys and values are separated by a colon, and then adding keys and values using the same syntax as with the lists. The statements:_

In [52]:
wordcount = {}                # We create an empty dictionary
wordcount = dict()            # Another way to create a dictionary
wordcount['a'] = 21           # The key ’a’ has value 21
wordcount['And'] = 10         # ’And’ has value 10
wordcount['the'] = 18
wordcount

{'a': 21, 'And': 10, 'the': 18}

_create the dictionary `wordcount` and add three keys: `a`, `And`, `the`, whose values are 21, 10, and 18._

_A dictionary entry is created when a value is assigned to it. Its existence can be tested using the `in` Boolean operator:_

In [53]:
'And' in wordcount            # True

True

In [54]:
'is' in wordcount             # False

False

_To access a key in a dictionary without risking an error, we can use the `get()` function that has a default value if the key is undefined:_
* `get('And')` returns the value of the key or `None` if undefined;
* `get('is', val)` returns the value of the key or val if undefined.

In [55]:
wordcount.get('And')          # 10
wordcount.get('is', 0)        # 0
wordcount.get('is')           # None
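
_The default value of `get()` is convenient for counting. A minimal sketch, assuming a short made-up word list, increments a count without first testing whether the key exists:_

```python
words = 'the anger of achilles son of peleus'.split()

wordcount = {}
for word in words:
    # get() returns 0 for a word we have not seen yet
    wordcount[word] = wordcount.get(word, 0) + 1

print(wordcount['of'])            # 2
print(wordcount.get('goddess'))   # None
```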

##### 1.6.7 Built-in Dictionary Functions
_Dictionaries have a set of built-in functions. The most useful ones are:_
* `keys()` returns the keys of a dictionary;
* `values()` returns the values of a dictionary;
* `items()` returns the key-value pairs of a dictionary.

_A few examples:_

In [56]:
wordcount.keys()               # dict_keys(['a', 'And', 'the'])
wordcount.values()             # dict_values([21, 10, 18])
wordcount.items()              # dict_items([('a', 21), ('And', 10),
                               # ('the', 18)])

dict_items([('a', 21), ('And', 10), ('the', 18)])
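
_These views are typically consumed in a loop or passed to another function. A short sketch, rebuilding the `wordcount` dictionary from above (the names `total` and `keys` are introduced for illustration):_

```python
wordcount = {'a': 21, 'And': 10, 'the': 18}

# items() yields (key, value) pairs that we can unpack in the loop header
for word, count in wordcount.items():
    print(word, count)

total = sum(wordcount.values())   # total number of occurrences
keys = list(wordcount.keys())     # a view converted to a list
print(total)                      # 49
print(keys)                       # ['a', 'And', 'the']
```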

##### 1.6.8 Counting the Letters of a Text

_We use the `for` `in` statement to scan the `iliad` text, set in lowercase letters; we increment the frequency of the current letter if it is in the dictionary or we set it to 1, if we have not seen it before. The complete program is:_

In [57]:
letter_count = {}
for letter in iliad.lower():
    if letter in alphabet:
        if letter in letter_count:
            letter_count[letter] += 1
        else:
            letter_count[letter] = 1
letter_count

{'s': 10,
 'i': 3,
 'n': 6,
 'g': 4,
 'o': 8,
 'd': 2,
 'e': 9,
 't': 6,
 'h': 6,
 'a': 6,
 'r': 2,
 'f': 2,
 'c': 3,
 'l': 6,
 'p': 2,
 'u': 4,
 'b': 1}

_To print the result in alphabetical order, we extract the keys; we sort them; and we print the key-value pairs. We do all this with this loop:_

In [58]:
for letter in sorted(letter_count.keys()):
    print(letter, letter_count[letter])

a 6
b 1
c 3
d 2
e 9
f 2
g 4
h 6
i 3
l 6
n 6
o 8
p 2
r 2
s 10
t 6
u 4


_By default, `sorted()` sorts the elements alphabetically. If we want to sort the letters by frequency, we can use the `key` argument of `sorted()`. `key` specifies a function whose result is used to compare the elements. In our case, we want to compare the frequencies, that is the values of the dictionary. We saw that we extract these values with the `get` method, here `letter_count.get`, and we hence assign it to `key`._

_Using `get`, the letters will be sorted from the least frequent to the most frequent. If we want to reverse this order, we use the third argument, `reverse`, a Boolean value, that we set to `True`._

In [59]:
for letter in sorted(letter_count.keys(), 
                     key=letter_count.get, reverse=True):
    print(letter, letter_count[letter])

s 10
e 9
o 8
n 6
t 6
h 6
a 6
l 6
g 4
u 4
i 3
c 3
d 2
r 2
f 2
p 2
b 1
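
_As an alternative that the chapter text does not use, the standard library offers `collections.Counter`, a dictionary subclass built for counting. The sketch below is a swapped-in technique, not the book's method; it counts the same letters and returns the most frequent ones with `most_common()`:_

```python
from collections import Counter

iliad = """Sing, O goddess, the anger of Achilles son of
Peleus, that brought countless ills upon the Achaeans."""
alphabet = 'abcdefghijklmnopqrstuvwxyz'

# Count only the characters that are letters of the alphabet
letter_count = Counter(letter for letter in iliad.lower()
                       if letter in alphabet)
top_three = letter_count.most_common(3)
print(top_three)   # [('s', 10), ('e', 9), ('o', 8)]
```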


#### 1.7 Control Structures
_In Python, the control flow statements include conditionals, loops, exceptions, and functions. These statements consist of two parts, the header and the suite. The header starts with a keyword like `if`, `for`, or `while` and ends with a colon. The suite consists of the statement sequence controlled by the header; we have seen that the statements in the suite must be indented, by convention with four spaces._

##### 1.7.1 Conditionals
_Python expresses conditions with the `if`, `elif`, and `else` statements as in:_

In [60]:
digits = '0123456789'
punctuation = '.,;:?!'
char = '.'

if char in alphabet:
    print('Letter')
elif char in digits:
    print('Number')
elif char in punctuation:
    print('Punctuation')
else:
    print('Other')

Punctuation


##### 1.7.2 The `for` Loop
_A `for` `in` loop in Python iterates over the elements of a sequence such as a string or a list. This differs from languages like Perl, C or Java, where the typical `for` iteration is over numbers. If we need to create such loops, Python has the `range(start, stop, step)` function that returns a sequence of numbers. Only one argument is required: `stop`. The parameters `start` and `step` default to 0 and 1._

_The next program generates the integers from 0 to 99 and computes their sum:_

In [61]:
total = 0
for i in range(100):
    total += i
print(total)
# Sum of integers from 0 to 99: 4950
# Using the built-in sum() function,
# sum(range(100)) would produce the same result.
# We name the variable total to avoid hiding that built-in.

4950
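
_A few calls illustrate the three parameters of `range()`. The values are arbitrary examples; `list()` is used only to display the numbers that each range produces:_

```python
print(list(range(5)))           # [0, 1, 2, 3, 4]
print(list(range(2, 11, 2)))    # [2, 4, 6, 8, 10]
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]

# stop is excluded, so we need 101 to include 100 in the sum
one_to_hundred = sum(range(1, 101))
print(one_to_hundred)           # 5050
```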


_We have seen how to iterate over a list and over indices using `range()`. Should we want to iterate over both, we can use the `enumerate()` function. `enumerate()` takes a sequence as argument and returns a sequence of `(index, element)` pairs, where element is an element of the sequence and index, its index.
We can use `enumerate()` to get the letters of the alphabet and their index with the program:_

In [62]:
for idx, letter in enumerate(alphabet):
    print(idx, letter)

0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j
10 k
11 l
12 m
13 n
14 o
15 p
16 q
17 r
18 s
19 t
20 u
21 v
22 w
23 x
24 y
25 z


##### 1.7.3 The `while` Loop
_The `while` loop is an alternative to `for`, although less frequent in Python programs. This loop executes a block of statements as long as a condition is true. We can reformulate the counting `for` loop in Sect 1.7.2 using `while`:_

In [63]:
sum, i = 0, 0
while i < 100:
    sum += i
    i += 1
print(sum)

4950


_Another possible structure is to use an infinite loop and a `break` statement to exit the loop:_

In [64]:
sum, i = 0, 0    # Start again from zero
while True:
    sum += i
    i += 1
    if i >= 100:
        break
print(sum)

4950


_Note that the assignment statement, `=`, cannot appear in the condition of a `while` statement. Since Python 3.8, the assignment expression operator, `:=`, can be used there instead._

##### 1.7.4 Exceptions
_Python has a mechanism to handle errors so that they do not stop a program. It uses the `try` and `except` keywords. We saw in Sect. 1.5 that the conversion of the alphabet and `12.0` strings into integers prints an error and exits the program. We can handle it safely with the `try/except` construct:_

In [65]:
try:
    int(alphabet)
    int('12.0')
except:
    pass
print('Cleared the exception')

Cleared the exception


_where `pass` is an empty statement serving as a placeholder for the `except` block._

_It is also possible, and better, to tell `except` to catch specific exceptions as in:_

In [66]:
try:
    int(alphabet)
    int('12.0')
except ValueError:
    print('Caught a value error')
except TypeError:
    print('Caught a type error!')
# Caught a value error

Caught a value error
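
_The same construct can be wrapped in a small helper. The function `to_int()` and its `default` parameter are hypothetical names chosen for this sketch; the `as` clause gives access to the exception object and its message:_

```python
def to_int(text, default=None):
    """Convert text to an integer, or return default on failure."""
    try:
        return int(text)
    except ValueError as error:
        # error carries the message that Python would have printed
        print('Could not convert:', error)
        return default

print(to_int('12'))       # 12
print(to_int('12.0'))     # None, after printing the error message
print(to_int('abc', 0))   # 0, after printing the error message
```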


#### 1.8 Functions
_We define a function in Python with the `def` keyword and we use `return` to return the results. In Sect. 1.6.8, we wrote a small program to count the letters of a text. Let us create a function from it that accepts any text instead of `iliad`. We also add a Boolean, `lc`, to set the text in lowercase:_

In [67]:
def count_letters(text, lc=True):
    letter_count = {}
    if lc:
        text = text.lower()
    for letter in text:
        if letter.lower() in alphabet:
            if letter in letter_count:
                letter_count[letter] += 1
            else:
                letter_count[letter] = 1
    return letter_count

_We call the function with the two arguments:_

In [68]:
count_letters(iliad, True)

{'s': 10,
 'i': 3,
 'n': 6,
 'g': 4,
 'o': 8,
 'd': 2,
 'e': 9,
 't': 6,
 'h': 6,
 'a': 6,
 'r': 2,
 'f': 2,
 'c': 3,
 'l': 6,
 'p': 2,
 'u': 4,
 'b': 1}
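
_Because `lc` has a default value, it can be omitted or passed by name. The sketch below repeats the function definition so that it runs on its own and uses a one-word text chosen for illustration:_

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def count_letters(text, lc=True):
    letter_count = {}
    if lc:
        text = text.lower()
    for letter in text:
        if letter.lower() in alphabet:
            letter_count[letter] = letter_count.get(letter, 0) + 1
    return letter_count

lowered = count_letters('Achilles')            # lc defaults to True
as_typed = count_letters('Achilles', lc=False) # keeps the capital A
print(lowered)
print(as_typed)
```

_With `lc=False`, the uppercase and lowercase forms of a letter are counted under separate keys._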

#### 1.9 Comprehensions and Generators

##### 1.9.1 Comprehensions
_Instead of loops, the comprehensions are an alternative, concise syntactic notation to create lists, sets, or dictionaries._

_Given an input word, we can generate all the one-character deletions in two steps: First, we split the word into two parts; then we delete the first letter of the second part. We can write this operation in two comprehensions, whose syntax is close to the set comprehension in set theory. First, we generate the splits where we iterate over the sequence of character indices and we create pairs consisting of a prefix and a rest._

_If the input word is acress, the resulting list in splits is:_

In [69]:
word = 'acress'
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
splits

[('', 'acress'),
 ('a', 'cress'),
 ('ac', 'ress'),
 ('acr', 'ess'),
 ('acre', 'ss'),
 ('acres', 's'),
 ('acress', '')]

_which is equivalent to_

In [70]:
splits = []
for i in range(len(word) + 1):
    splits.append((word[:i], word[i:]))
splits

[('', 'acress'),
 ('a', 'cress'),
 ('ac', 'ress'),
 ('acr', 'ess'),
 ('acre', 'ss'),
 ('acres', 's'),
 ('acress', '')]

_Then, we apply the deletions, where we concatenate the prefix and the rest deprived of its first character. We check that the rest is not an empty string:_

In [71]:
deletes = [a + b[1:] for a, b in splits if b]
deletes

['cress', 'aress', 'acess', 'acrss', 'acres', 'acres']

_which is equivalent to_

In [72]:
deletes = []
for a, b in splits:
    if b:
        deletes.append(a + b[1:])
deletes

['cress', 'aress', 'acess', 'acrss', 'acres', 'acres']
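
_The examples above build lists only. A minimal sketch of the set and dictionary forms, using the same word, replaces the square brackets with curly braces; the names `unique_letters` and `letter_freq` are introduced for illustration:_

```python
word = 'acress'

# Set comprehension: the duplicates collapse
unique_letters = {letter for letter in word}

# Dictionary comprehension: key: value pairs before the for
letter_freq = {letter: word.count(letter) for letter in unique_letters}

print(sorted(unique_letters))   # ['a', 'c', 'e', 'r', 's']
print(letter_freq['s'])         # 2
```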

##### 1.9.2 Generators
_List comprehensions are stored in memory. If the list is large, it can exceed the computer capacity. Generators generate the elements on demand instead and can handle much longer sequences._

_Generators have a syntax that is identical to the list comprehensions except that we replace the square brackets with parentheses:_

In [73]:
splits_generator = ((word[:i], word[i:]) for i in range(len(word) + 1))
for i in splits_generator: print(i)

('', 'acress')
('a', 'cress')
('ac', 'ress')
('acr', 'ess')
('acre', 'ss')
('acres', 's')
('acress', '')


_We can iterate over this generator exactly as with a list, as the loop in the previous cell did. However, this iteration can only be done once: running the same statement again prints nothing, because the generator is exhausted._

In [74]:
for i in splits_generator: print(i)

_We need to create the generator again to retraverse the sequence._

_Finally, we can also use functions to create generators. We replace the `return` keyword with `yield` to do this, as in the function:_

In [75]:
def splits_generator_function():
    for i in range(len(word) + 1):
        yield (word[:i], word[i:])

_that returns a generator identical to the previous one:_

In [76]:
splits_generator = splits_generator_function()
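
_The one-pass behavior can be observed directly. The sketch below repeats the generator function so that it is self-contained, traverses the generator twice, and then uses the built-in `next()` on a fresh one; the names `first_pass`, `second_pass`, and `first` are introduced for illustration:_

```python
word = 'acress'

def splits_generator_function():
    for i in range(len(word) + 1):
        yield (word[:i], word[i:])

splits_generator = splits_generator_function()
first_pass = list(splits_generator)    # consumes all the pairs
second_pass = list(splits_generator)   # nothing is left
print(len(first_pass))                 # 7
print(second_pass)                     # []

splits_generator = splits_generator_function()  # create it again
first = next(splits_generator)         # fetch one element on demand
print(first)                           # ('', 'acress')
```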

##### 1.9.3 Iterators
_We just saw that we can iterate only once over a generator. Objects with this property in Python are called iterators. Iterators are very efficient devices and, at the same time, probably less intuitive than lists for beginners._

_Let us give some examples with a useful iterator: `zip()`. Let us first create three strings with the Latin, Greek, and Russian Cyrillic alphabets:_

In [77]:
latin_alphabet = 'abcdefghijklmnopqrstuvwxyz'
len(latin_alphabet)                                           # 26
greek_alphabet = 'αβγδεζηθικλμνξοπρστυφχψω'
len(greek_alphabet)                                           # 24
cyrillic_alphabet = 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя'
len(cyrillic_alphabet)                                        # 33

33

_`zip()` weaves strings, lists, or tuples and creates an iterator of tuples, where each tuple contains the elements with the same index: `latin_alphabet[0]` and `greek_alphabet[0]`, `latin_alphabet[1]` and `greek_alphabet[1]`, and so on. If the strings are of different sizes, `zip()` will stop at the shortest._

_The following code applies `zip()` to the three first letters of our alphabets:_

In [78]:
la_gr = zip(latin_alphabet[:3], greek_alphabet[:3])

In [79]:
la_gr_cy = zip(latin_alphabet[:3], greek_alphabet[:3],
                  cyrillic_alphabet[0:3])

_and creates two iterators with the tuples:_

In [80]:
la_gr          # (’a’, ’α’), (’b’, ’β’), (’c’, ’γ’)

<zip at 0x7fd63239fec0>

In [81]:
la_gr_cy       # (’a’, ’α’, ’а’), (’b’, ’β’, ’б’), (’c’, ’γ’, ’в’)

<zip at 0x7fd6323a21c0>

_Once created, we access the elements of an iterator with `__next__()` as in:_

In [82]:
la_gr.__next__()   # (’a’, ’α’)
la_gr.__next__()   # (’b’, ’β’)
la_gr.__next__()   # (’c’, ’γ’)

('c', 'γ')

_When we reach the end and there are no more elements, Python raises an exception:_

In [83]:
la_gr.__next__()

StopIteration: 

_If we want to use this iterator again, we have to recreate it._

_Another way to traverse this sequence multiple times is to convert the iterator to a list as in:_

In [84]:
la_gr_cy_list = list(la_gr_cy)
la_gr_cy_list

[('a', 'α', 'а'), ('b', 'β', 'б'), ('c', 'γ', 'в')]

In [85]:
la_gr_cy_list = list(la_gr_cy)
la_gr_cy_list

[]

_To restore the original sequences of letters, we can use `zip()` with the `*` unpacking operator, `zip(*)`, as an inverse function:_

In [86]:
zip(*la_gr_cy_list)
# ('a', 'b', 'c'), ('α', 'β', 'γ'), ('а', 'б', 'в')

<zip at 0x7fd630767980>
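Since `zip(*...)` also returns an iterator, its content only becomes visible once it is consumed. The sketch below writes the list of triples out literally (in the cells above, the second `list(la_gr_cy)` call left `la_gr_cy_list` empty) and unpacks the result into three tuples; the names `triples`, `latin`, `greek`, and `cyrillic` are chosen for illustration.

```python
triples = [('a', 'α', 'а'), ('b', 'β', 'б'), ('c', 'γ', 'в')]

# The * operator passes each triple as a separate argument to zip(),
# which then regroups the elements by position
latin, greek, cyrillic = zip(*triples)

print(latin)
print(greek)
print(cyrillic)
```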

_Finally, we can convert lists to iterators using `iter()`._
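As a small illustration of `iter()`, the sketch below creates an iterator over a list and shows that it is consumed once, while the underlying list stays intact:

```python
letters = ['a', 'b', 'c']
letter_iter = iter(letters)       # an iterator over the list

head = next(letter_iter)          # first element
tail = list(letter_iter)          # remaining elements
second_pass = list(letter_iter)   # the iterator is now exhausted

print(head, tail, second_pass)
print(letters)                    # the original list is unchanged
```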

#### 1.10 Modules
_Python comes with a very large set of libraries called modules, for example the `math` module, which contains a set of mathematical functions. We load a module with the `import` keyword and we use its functions with the module name as a prefix followed by a dot:_

In [87]:
import math
math.sqrt(2)                  # 1.4142135623730951
math.sin(math.pi/2)           # 1.0
math.log(8, 2)

3.0

_We can create an alias name for a module with the `as` keyword:_

In [88]:
import statistics as stats
stats.mean([1, 2, 3, 4, 5])   # 3.0
stats.stdev([1, 2, 3, 4, 5])  # 1.5811388300841898

1.5811388300841898

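_Modules are just files, whose names are the module names with the `.py` suffix. To import a module, Python first checks the built-in modules and then searches the folders listed in `sys.path`: typically the folder of the running script (or the current folder in an interactive session), the folders in `PYTHONPATH`, and then the standard library and the installed packages._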
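One way to see where Python looks for modules on a given machine is to print `sys.path` and the `__file__` attribute of an imported module. This is a minimal sketch; the paths printed depend entirely on the local installation.

```python
import sys
import statistics

# Folders (and possibly zip archives) searched, in order, by import
for folder in sys.path:
    print(folder)

# A module written in Python records the file it was loaded from
module_file = statistics.__file__
print(module_file)
```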

_When Python imports a module, it executes its statements just as when we run:_

```
python module.py
```

_If we want to have a different execution when we run the program from the command line and when we import it, we need to include this condition:_
```
if __name__ == '__main__':
    print("Running the program")
    # Other statements
else:
    print("Importing the program")
    # Other statements
```

#### 1.11 Installing Modules
_Python comes with a standard library of modules like `math`. Although comprehensive, we will use external libraries in the next chapters that are not part of the standard release, e.g. `regex`._

_We can use `pip`, the Python package manager, to install the modules we need. `pip` will retrieve them from the Python Package Index (PyPI) and install them for us._


_To install `regex`, we just run the command:_
```
pip install regex
```
_or_
```
python -m pip install regex
```

_and if we want to upgrade an already installed module, we run:_
```
python -m pip install --upgrade regex
```

_Another option is to use a Python distribution with pre-installed packages like Anaconda (https://www.continuum.io/downloads). Nonetheless, even if Anaconda has many packages, it does not include `regex` and we will have to install it._
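Before installing, it can be handy to check from Python whether a module is already available. The sketch below uses `importlib.util.find_spec()` from the standard library; the helper name `is_installed()` is hypothetical and chosen for this example only.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed('math'))    # part of the standard library
print(is_installed('regex'))   # depends on the local environment
```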

#### 1.12 Basic File Input/Output
_Python has built-in support for reading and writing files: the `open()` function and the `read()`, `write()`, and `close()` methods of the file objects it returns._

_The next lines open and read the `iliad.txt` file, count the characters, and write the results in the `iliad_stats.txt` file:_
```
f_iliad = open('iliad.txt', 'r')         # open a file
iliad_txt = f_iliad.read()               # read all the file
f_iliad.close()                          # close the file
iliad_stats = count_letters(iliad_txt)   # count the letters
with open('iliad_stats.txt', 'w') as f:
    f.write(str(iliad_stats))
    # we automatically close the file
```

_where `open()` opens a file in the read-only mode, `r`, and returns a file object; `read()` reads the entire content of the file and returns a string; `close()` closes the file object; `count_letters()` counts the letters; and finally the `with` statement is a shorthand that closes the file automatically after the block, even if an exception occurs: `open()` creates a new file (or overwrites an existing one) using the write mode, `w`, and `write()` writes the results as a string._

_In addition to these base functions, Python has modules to read and write a large variety of file formats._
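As one example of such a module, the sketch below stores a dictionary of letter counts in JSON with the standard `json` module and reads it back. The counts and the file name `stats.json` are made up for the illustration, and the file is written to a temporary folder.

```python
import json
import os
import tempfile

# Hypothetical letter counts, standing in for the output of count_letters()
letter_counts = {'a': 9, 'e': 12, 't': 8}

path = os.path.join(tempfile.mkdtemp(), 'stats.json')

# json.dump() serializes the dictionary to the file
with open(path, 'w', encoding='utf-8') as f:
    json.dump(letter_counts, f)

# json.load() parses the file back into a dictionary
with open(path, encoding='utf-8') as f:
    restored = json.load(f)

print(restored)
```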

#### 1.13 Memo Functions and Decorators

##### 1.13.1 Memo Functions
_Memo functions are functions that remember a result instead of recomputing it. This process is also called memoization. The Fibonacci series is a case where memo functions provide a dramatic execution speedup._

_The Fibonacci sequence is defined by the relation:_

$$ F(n) = F(n-1) + F(n-2) $$

with $F(1) = F(2) = 1$.

_A naïve implementation in Python is straightforward:_

In [89]:
def fibonacci(n):
    if n == 1: return 1
    elif n == 2: return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

_However, this function has an expensive double recursion that we can drastically improve by storing the results in a dictionary. This store, `f_numbers`, will save an exponential number of recalculations:_

In [90]:
f_numbers = {}

def fibonacci2(n):
    if n == 1: return 1
    elif n == 2: return 1
    elif n in f_numbers:
        return f_numbers[n]
    else:
        f_numbers[n] = fibonacci2(n - 1) + fibonacci2(n - 2)
        return f_numbers[n]
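To make the saving concrete, the sketch below instruments both variants with a call counter. The names `fib_naive`, `fib_memo`, `store`, and `calls` are introduced here for illustration and are not part of the original code.

```python
calls = {'naive': 0, 'memo': 0}   # number of function calls per variant


def fib_naive(n):
    calls['naive'] += 1
    if n <= 2:
        return 1
    return fib_naive(n - 1) + fib_naive(n - 2)


store = {}   # results already computed by fib_memo


def fib_memo(n):
    calls['memo'] += 1
    if n <= 2:
        return 1
    if n not in store:
        store[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return store[n]


naive_result = fib_naive(20)
memo_result = fib_memo(20)
print(naive_result, memo_result)
print(calls)   # the memoized version needs far fewer calls
```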

##### 1.13.2 Decorators
_Python decorators are syntactic notations to simplify the writing of memo functions (they can be used for other purposes too)._

_Decorators need a generic memo function to cache the results already computed._

_Let us define it:_

In [91]:
def memo_function(f):
    cache = {}    
    def memo(x):
        if x in cache:
            return cache[x]
        else:
            cache[x] = f(x)
            return cache[x]
    return memo

_Using this memo function, we can redefine `fibonacci()` with the statement:_

In [92]:
fibonacci = memo_function(fibonacci)

_that results in `memo()` being assigned to the `fibonacci` name. When we call `fibonacci()`, we in fact call `memo()`, which will either look up the cache or call the original `fibonacci()` function._


_One detail may be puzzling: How does the new function know of the `cache` variable and its initialization as well as the value of the `f` argument, the original `fibonacci()` function? This is because Python implements a closure mechanism that gives the inner functions access to the local variables of their enclosing function._
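The closure can be inspected directly. The sketch below repeats a compact version of `memo_function()` and applies it to a hypothetical `square()` function; it relies on the CPython attributes `__code__.co_freevars` and `__closure__`, which expose the captured variable names and their cells.

```python
def memo_function(f):
    cache = {}

    def memo(x):
        if x not in cache:
            cache[x] = f(x)
        return cache[x]
    return memo


def square(x):   # hypothetical function used for the demonstration
    return x * x


fast_square = memo_function(square)
fast_square(4)

# Names of the variables captured from the enclosing function
free_vars = fast_square.__code__.co_freevars
# Pair each captured name with the cell that holds its value
cells = dict(zip(free_vars, fast_square.__closure__))
cached = cells['cache'].cell_contents

print(free_vars)
print(cached)
```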

_Now the decorators: Python provides a short notation for memo functions; instead of writing:_
```
fibonacci = memo_function(fibonacci)
```
_we just decorate `fibonacci()` with the `@memo_function` line before it:_
```
@memo_function
def fibonacci(n):
...
```
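Put together, a complete decorated definition could look like the sketch below. Two additions are not in the original text and are labeled as such: `functools.wraps()`, which keeps the name and docstring of the decorated function, and `functools.lru_cache()`, the standard library's ready-made memoization decorator, shown for comparison under the hypothetical name `fibonacci_std()`.

```python
import functools


def memo_function(f):
    cache = {}

    @functools.wraps(f)   # addition: preserve f's name and docstring
    def memo(x):
        if x not in cache:
            cache[x] = f(x)
        return cache[x]
    return memo


@memo_function
def fibonacci(n):
    """Return the n-th Fibonacci number, with F(1) = F(2) = 1."""
    if n <= 2:
        return 1
    # The recursive calls go through the memoized name
    return fibonacci(n - 1) + fibonacci(n - 2)


@functools.lru_cache(maxsize=None)   # standard-library alternative
def fibonacci_std(n):
    if n <= 2:
        return 1
    return fibonacci_std(n - 1) + fibonacci_std(n - 2)


print(fibonacci(50), fibonacci_std(50))
```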

#### 1.14 Object-Oriented Programming
_Although not obvious at first sight, Python is an object-oriented language, where all the language entities are objects inheriting from a class: The `str` class for the strings, for instance. Each class has a set of methods that we call with the `object.method()` notation._

##### 1.14.1 Classes and Objects
_We encapsulate a function by inserting it as a block inside the class. Among the methods, one of them, the constructor, is called at the creation of an object. It has the `__init__()` name. This notation in Python is, unfortunately, not as intuitive as the rest of the language, and we need to add an extra `self` parameter to the methods and use it as a prefix for the instance variables. This `self` parameter denotes the object itself; the name is a convention rather than a keyword. We use `__init__()` to assign an initial value to the `content`, `length`, and `letter_counts` variables._

_Finally, we have the class:_

In [93]:
class Text:
    """Text class to hold and process text"""
    
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    
    def __init__(self, text=None):
        """The constructor called when an object is created"""
        self.content = text if text is not None else ''
        self.length = len(self.content)
        self.letter_counts = {}
    
    def count_letters(self, lc=True):
        """Function to count the letters of a text"""
        letter_counts = {}
        if lc:
            text = self.content.lower()
        else:
            text = self.content
        for letter in text:
            if letter.lower() in self.alphabet:
                if letter in letter_counts:
                    letter_counts[letter] += 1
                else:
                    letter_counts[letter] = 1
        self.letter_counts = letter_counts
        return letter_counts

_We create new objects using the `Text(init_value)` syntax:_

In [94]:
txt = Text("""Tell me, O Muse, of that many-sided hero who
traveled far and wide after he had sacked the famous town
of Troy.""")

_An object has the type of its class:_

In [95]:
type(txt)          # <class '__main__.Text'>

__main__.Text

_We access the instance variables using this notation:_

In [96]:
txt.length        # 111

111

_We create and assign new instance variables the same way:_
```
txt.my_var = 'a'    # a new instance variable with value 'a'
txt.content = open('iliad.txt', 'r').read()
                    # txt.content is now the content of the file
```

_and we call methods with the same notation:_

In [97]:
txt.count_letters() # return the letter counts of txt.content

{'t': 8,
 'e': 12,
 'l': 3,
 'm': 4,
 'o': 8,
 'u': 2,
 's': 4,
 'f': 5,
 'h': 6,
 'a': 9,
 'n': 3,
 'y': 2,
 'i': 2,
 'd': 7,
 'r': 5,
 'w': 3,
 'v': 1,
 'c': 1,
 'k': 1}
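As a cross-check of these counts, the same tally can be obtained with `collections.Counter` from the standard library. This is a different technique from the `count_letters()` method above, shown only as a sketch for comparison.

```python
import string
from collections import Counter

text = """Tell me, O Muse, of that many-sided hero who
traveled far and wide after he had sacked the famous town
of Troy."""

# Count the lowercased characters that belong to the Latin alphabet
letter_counts = Counter(
    c for c in text.lower() if c in string.ascii_lowercase)

print(letter_counts['e'], letter_counts['t'])
```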

_Finally, we added short descriptions of the class and its methods in the form of docstrings: Strings being the first statement of the class, method, or function. Docstrings are very useful to document a program. We access them using the `.__doc__` variable as in:_

In [98]:
Text.__doc__    # 'Text class to hold and process text'
Text.count_letters.__doc__
                # 'Function to count the letters of a text'

'Function to count the letters of a text'

_or with the `help()` function._

##### 1.14.2 Subclassing
_Using classes, we can build a hierarchy, where the subclasses will inherit methods from their superclass parents._

_Let us create a Word class that we define as a subclass of Text._

In [99]:
class Word(Text):
    def __init__(self, word=None):
        super().__init__(word)
        self.part_of_speech = None
    def annotate(self, part_of_speech):
        self.part_of_speech = part_of_speech

_where the `super().__init__(word)` function will call the constructor of `Text`._ 

_We can then create a new word:_

In [100]:
word = Word('Muse')

_that inherits the Text instance variables:_

In [101]:
word.length     # 4

4

_and methods:_

In [102]:
word.count_letters(lc=False)
# {'M': 1, 'u': 1, 's': 1, 'e': 1}

{'M': 1, 'u': 1, 's': 1, 'e': 1}

_We can also call the method specific to `Word` as:_

In [103]:
word.annotate('Noun')

_and have:_

In [104]:
word.part_of_speech # Noun

'Noun'
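The class hierarchy itself can be queried with the built-in `isinstance()` and `issubclass()` functions. The sketch below uses stripped-down versions of `Text` and `Word` (constructors only, with an empty-string default as a simplification) so that it runs on its own.

```python
class Text:
    def __init__(self, text=''):
        self.content = text
        self.length = len(text)


class Word(Text):
    def __init__(self, word=''):
        super().__init__(word)
        self.part_of_speech = None


word = Word('Muse')

print(isinstance(word, Word))   # an instance of its own class
print(isinstance(word, Text))   # and of the superclass
print(issubclass(Word, Text))
print([cls.__name__ for cls in Word.__mro__])   # method resolution order
```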

#### 1.15 Functional Programming
_Python provides some functional programming mechanisms with the `map()`, `reduce()`, and `filter()` functions._

##### 1.15.1 `map()`
_`map()` enables us to apply a function to all the elements of an iterable, a list for instance. The first argument of `map()` is the function to apply and the second one, the iterable. `map()` returns an iterator._

_Let us use `map()` to compute the length of a sequence of texts, in our case, the first sentences of the Iliad and the Odyssey. We apply `len()` to the list of strings and we convert the resulting iterator to a list to print it._

In [105]:
odyssey = """Tell me, O Muse, of that many-sided hero who
   traveled far and wide after he had sacked the famous town
   of Troy."""
text_lengths = map(len, [iliad, odyssey])
list(text_lengths)     # [100, 117]: 111 plus the 6 spaces indenting lines 2 and 3

[100, 117]
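For comparison, a list comprehension gives the same result as `map()` followed by `list()`. The sketch uses two short strings, chosen for illustration, instead of the full sentences.

```python
texts = ['Sing, O goddess', 'Tell me, O Muse']

lengths_map = list(map(len, texts))          # map() returns an iterator
lengths_comp = [len(text) for text in texts]  # equivalent comprehension

print(lengths_map, lengths_comp)
```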

##### 1.15.2 Lambda Expressions
_Let us now suppose that we have a list of files instead of strings, here `iliad.txt` and `odyssey.txt`. To deal with this list, we can replace `len()` in `map()` with a function that reads a file and computes its length:_

In [106]:
def file_length(file):
    return len(open(file).read())

_For such a short function, a lambda expression can do the job more compactly._ 

_A lambda is an anonymous function, denoted with the `lambda` keyword, followed by the function parameters, a colon, and the returned expression. To compute the length of a file, we write the lambda:_
```
lambda file: len(open(file).read())
```
_and we apply it to our list of files:_
```
files = ['iliad.txt', 'odyssey.txt']
text_lengths = map(lambda x: len(open(x).read()), files)
list(text_lengths)                  # [809768, 611742]
```
_We can return multiple values using tuples. If we want to both keep the text and its length in the form of a pair: `(text,length)`, we just write:_
```
text_lengths = (
    map(lambda x: (open(x).read(), len(open(x).read())),
        files))
text_lengths = list(text_lengths)
[text_lengths[0][1], text_lengths[1][1]]  # [809768, 611742]
```

_In the previous piece of code, we had to read the text twice: In the first element of the pair and in the second one. We can use two `map()` calls instead: One to read the files and a second to compute the lengths. This results in:_
```
text_lengths = (
    map(lambda x: (x, len(x)),
        map(lambda x: open(x).read(), files)))
text_lengths = list(text_lengths)
[text_lengths[0][1], text_lengths[1][1]]  # [809768, 611742]
```
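A lambda cannot contain a `with` statement, so the lambdas above leave the files to be closed by the interpreter. A small named function avoids this. The sketch below is one possible variant: the helper `read_text()` and the two sample files, created in a temporary folder, are assumptions made so that the example is self-contained.

```python
import os
import tempfile


def read_text(path):
    """Read a whole file and close it, even if an error occurs."""
    with open(path, encoding='utf-8') as f:
        return f.read()


# Create two small sample files standing in for iliad.txt and odyssey.txt
folder = tempfile.mkdtemp()
samples = {'a.txt': 'Sing, O goddess', 'b.txt': 'Tell me'}
files = []
for name, content in samples.items():
    path = os.path.join(folder, name)
    with open(path, 'w', encoding='utf-8') as f:
        f.write(content)
    files.append(path)

# First map() reads the files, second map() builds (text, length) pairs
text_lengths = list(
    map(lambda text: (text, len(text)),
        map(read_text, files)))
lengths = [length for _, length in text_lengths]
print(lengths)
```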

##### 1.15.3 `reduce()`
_`reduce()` is a complement to `map()` that applies a two-argument operation cumulatively to the elements of a sequence and reduces them to a single value. We can use `reduce()` and the addition to compute the total number of characters of our set of files. We formulate it as a lambda expression:_
```
lambda x, y: x[1] + y[1]
```
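_to sum the consecutive elements, where the length of each file is the second element in the pair; the first one being the text. This lambda is sufficient for our two files; with three or more, the first application would return an integer and the next `x[1]` would fail._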

_`reduce()` is part of the functools module and we have to import it. The resulting code is:_
```
import functools
char_count = functools.reduce(
    lambda x, y: x[1] + y[1],
    map(lambda x: (x, len(x)),
        map(lambda x: open(x).read(), files)))
char_count     # 1421510
```
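A variant that works for any number of pairs keeps an integer accumulator and passes `0` as the initial value, the optional third argument of `functools.reduce()`. The sketch below uses made-up `(text, length)` pairs in place of the files.

```python
import functools

# Hypothetical (text, length) pairs standing in for the files
pairs = [('abc', 3), ('de', 2), ('fghi', 4)]

# acc is always an integer; pair is the next (text, length) tuple
total = functools.reduce(lambda acc, pair: acc + pair[1], pairs, 0)

# The built-in sum() expresses the same computation more directly
total_sum = sum(length for _, length in pairs)

print(total, total_sum)
```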

##### 1.15.4 `filter()`
_`filter()` is a third function that we can use to keep the elements of an iterable that satisfy a condition. `filter()` has two arguments: A function, possibly a lambda, and an iterable. It returns the elements of the iterable for which the function is true._

_As an example of the `filter()` function, let us write a piece of code to extract and count the lowercase vowels of a text._

_We first need a lambda that returns `True` if a character `x` is a lowercase vowel:_
```
lambda x : x in 'aeiou'
```
_that we apply to the iliad string to obtain all its vowels:_
```
''.join(filter(lambda x : x in 'aeiou', iliad))
    # ioeeaeoieooeeuaououeiuoeaea
```
_We can apply the same code to a whole file:_
```
''.join(filter(lambda x: x in 'aeiou',
              open('iliad.txt').read()))
```
_and easily extend the extraction to a list of files using `map()`:_
```
map(lambda y:
    ''.join(filter(lambda x: x in 'aeiou',
                    open(y).read())),
   files)
```

_We finally count the vowels in the two files using `len()` that we apply with a second `map()`:_
```
list(map(len,
         map(lambda y:
             ''.join(filter(lambda x: x in 'aeiou',
                            open(y).read())),
             files)))  
# [231874, 176190]
```
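When only the count is needed, the intermediate string is not required: a generator expression with `sum()` gives the same number. The sketch below checks this on the `iliad` string from Section 1.4, repeated here so that the example runs on its own.

```python
iliad = """Sing, O goddess, the anger of Achilles son of
Peleus, that brought countless ills upon the Achaeans."""

# filter() keeps the lowercase vowels; join() builds a string from them
vowels = ''.join(filter(lambda x: x in 'aeiou', iliad))
count_filter = len(vowels)

# Counting directly, without building the intermediate string
count_sum = sum(1 for c in iliad if c in 'aeiou')

print(vowels, count_filter, count_sum)
```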

## Credits
This assignment was prepared by P. Nugues et al., HT2019 (link [here](https://web.archive.org/web/20200111075034/http://cs.lth.se/edan95/lab-programming-assignments/lab-session-1/)).

Exercises and code are from Ch 1. A Tour of Python in _Language Processing with Python_ (P. Nugues, 2019).