# Lecture 2

## A few more words on lists (and strings)

We briefly encountered lists in the previous lecture: they are ordered sequences of data. A list is written as a sequence of elements separated by commas and enclosed in square brackets. Each element can be accessed through its index.

An empty list is created by assigning `[]` or `list()` to a variable.

In [17]:
a = []

In [18]:
a == list()

True

One can directly populate a list by writing down its elements

In [20]:
fruits = ["apples", "oranges", "bananas", "kiwis", "pineapples"]

Unlike MATLAB, Python indexes lists starting from $0$. Thus the list `fruits` has "apples" at index 0, "oranges" at index 1, "bananas" at index 2, and so on.

To directly access an element of a list, append `[index]` to the list name.

In [21]:
fruits[0]

'apples'

In [22]:
fruits[3]

'kiwis'

Negative indices read the list starting from the end: `-1` is the last element, `-2` the second to last, and so on.

In [23]:
fruits[-1]

'pineapples'

In [24]:
fruits[-2] + " " + fruits[2]

'kiwis bananas'
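As a quick check of the rule above, a sketch that re-creates the `fruits` list: index `-k` refers to the same element as `len(fruits) - k`, and an index outside the list raises an `IndexError` (the names `last` and `same_last` are only illustrative).

```python
fruits = ["apples", "oranges", "bananas", "kiwis", "pineapples"]

# -1 is the last element, so it matches index len(fruits) - 1
last = fruits[-1]
same_last = fruits[len(fruits) - 1]
print(last == same_last)  # True

# valid indices go from 0 to len(fruits) - 1 (or from -len(fruits) to -1)
try:
    fruits[5]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```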

Of course lists can be nested

In [27]:
l1 = [1,2,3]
l2 = [7,8,9]

l3 = [l1, l2]

print("l1 =", l1, "\nl2 =", l2, "\nl3 =", l3)

l1 = [1, 2, 3] 
l2 = [7, 8, 9] 
l3 = [[1, 2, 3], [7, 8, 9]]


In [28]:
l3[0]

[1, 2, 3]

In [29]:
l1[2]

3

In [30]:
l3[0][2] # = l1[2] = 3

3

You can use the indexing to modify specific elements

In [64]:
l2[0] = 15 

print(l2)
print(l3) # l3 changed as well: it holds references to l1 and l2, not copies. Be careful with lists

[15, 8, 9]
[[1, 2, 3], [15, 8, 9]]
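The reason is that `l3` does not store copies of `l1` and `l2`, but references to the very same list objects. A small sketch with fresh, purely illustrative names (`row1`, `row2`, `refs`, `copies`) contrasts nesting references with nesting copies made by `list()`:

```python
row1 = [1, 2, 3]
row2 = [7, 8, 9]

refs = [row1, row2]                # references to the same list objects
copies = [list(row1), list(row2)]  # new lists with the same elements

row2[0] = 15
print(refs)    # [[1, 2, 3], [15, 8, 9]]  sees the change
print(copies)  # [[1, 2, 3], [7, 8, 9]]   unaffected

# `is` checks whether two names refer to the same object
print(refs[1] is row2, copies[1] is row2)  # True False
```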


### Slices


Indexing gives access to a single element; slicing goes further and gives access to a contiguous sequence of elements inside the list, i.e. a "slice" of it.

A slice is written as `parentlist[a:b]`: it contains the elements from index `a` up to, **but not including**, index `b`. If `a` is omitted, the slice starts from the first element; if `b` is omitted, it runs up to and including the last element.

In [69]:
num = [0,1,2,3,4,5,6,7,8,9]

In [33]:
print(num[0:4])
print(num[4:])

[0, 1, 2, 3]
[4, 5, 6, 7, 8, 9]


You can also choose a step, writing `parentlist[a:b:step]`:

In [36]:
print(num[5:8:2])
print(num[:9:3])

[5, 7]
[0, 3, 6]
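A few more checks of the slicing rules, as a sketch on the same `num` list: since the stop index is excluded, `num[a:b]` has `b - a` elements (for indices inside the list), `num[:k]` and `num[k:]` split the list without overlap, and a negative step walks the list backwards.

```python
num = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# the element at the stop index is excluded
print(num[2:5])        # [2, 3, 4]
print(len(num[2:5]))   # 3, i.e. 5 - 2

# num[:k] and num[k:] together give back the whole list
k = 4
print(num[:k] + num[k:] == num)  # True

# a negative step reads the list backwards
print(num[::-1])       # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(num[8:2:-2])     # [8, 6, 4]
```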


The function `len()` returns the length of a list, i.e. the number of its elements, while `min()` and `max()` return respectively its minimum and maximum value. Adding two lists with `+` concatenates them.

In [37]:
l1 + l2

[1, 2, 3, 15, 8, 9]

In [38]:
len(num)

10

In [39]:
len(l3)

2

In [40]:
min(l1+l2)

1

In [41]:
max(num)

9

To check if a value is present in a list you can use `in` as follows

In [42]:
print(4 in num)
print(4 in l1+l2)
print("oranges" in fruits)
print("pears" in fruits)

True
False
True
False


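You can quickly generate sequences of integers with the function `range(start, stop, step)`: as with slicing, `stop` is excluded, `start` defaults to 0 and `step` to 1. Wrapping the result in `list()` turns it into a list: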

In [65]:
print("range(6): ", list(range(6)))
print("range(3, 9): ", list(range(3,9)))
print("range(2, 9, 5): ", list(range(2,9,5)))

range(6):  [0, 1, 2, 3, 4, 5]
range(3, 9):  [3, 4, 5, 6, 7, 8]
range(2, 9, 5):  [2, 7]
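In Python 3, `range` does not build a list straight away: it returns a `range` object that produces the numbers on demand, which is why the cell above wraps it in `list()` to display them. A short sketch of how such an object behaves, including a negative step:

```python
r = range(2, 9, 5)
print(r)          # range(2, 9, 5)
print(len(r))     # 2
print(7 in r)     # True, checked without building a list

# a negative step counts down; the stop value is still excluded
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]
```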


Slices can be assigned to, just like single indices, and `[:]` creates a copy of a list (a *shallow* copy: nested lists inside it are still shared):

In [70]:
oldnum = num[:]
print(num[3:5])
num[3:5] = [11,12]
print(num)
print(oldnum)

[3, 4]
[0, 1, 2, 11, 12, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
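The right-hand side of a slice assignment does not need to have the same length as the slice, so the list can grow or shrink; `del` removes a slice entirely. A sketch on a fresh list (the name `letters` is only illustrative):

```python
letters = ["a", "b", "c", "d", "e"]

letters[1:3] = ["X"]        # two elements replaced by one
print(letters)              # ['a', 'X', 'd', 'e']

letters[1:1] = ["y", "z"]   # empty slice: pure insertion at index 1
print(letters)              # ['a', 'y', 'z', 'X', 'd', 'e']

del letters[::2]            # delete the elements at indices 0, 2, 4
print(letters)              # ['y', 'X', 'e']
```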


**Exercise 1**: programmatically obtain a list of even numbers from 2 to 78

You can add an element to the end of a list with the `append()` method, but lists have many other methods as well.

In [71]:
num.append(45)
print(num)

[0, 1, 2, 11, 12, 5, 6, 7, 8, 9, 45]
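A sketch of some other common list methods, on an illustrative list `basket`. Note that `extend` adds the elements of another list one by one, whereas `append` would add the whole list as a single nested element, and that `sort` modifies the list in place and returns `None`.

```python
basket = ["kiwis", "apples"]

basket.append("pears")            # add one element at the end
basket.extend(["figs", "plums"])  # add every element of another list
basket.insert(0, "limes")         # add an element at a given index
print(basket)  # ['limes', 'kiwis', 'apples', 'pears', 'figs', 'plums']

last = basket.pop()               # remove and return the last element
basket.remove("kiwis")            # remove the first occurrence of a value
print(last, basket)  # plums ['limes', 'apples', 'pears', 'figs']

basket.sort()                     # sort in place
print(basket)        # ['apples', 'figs', 'limes', 'pears']
print(basket.index("limes"), basket.count("figs"))  # 2 1
```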


**Exercise 2**: use the `help` function to find out what functions are available for lists

In [47]:
help(list)

Help on class list in module builtins:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  ...

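Strings behave in many ways like lists of characters; for instance, `list()` turns a string into the list of its characters.

In [73]: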
hello = "Hello"
list(hello)

['H', 'e', 'l', 'l', 'o']

You can use slicing, indexing, len, in and other list features on strings as well!

In [74]:
hello[2]

'l'

In [59]:
hello[3:6]

'lo'

But not all of them: strings are *immutable*, so their characters cannot be changed in place.

In [76]:
hello[2] = 'r'

TypeError: 'str' object does not support item assignment
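Since a string cannot be modified, the usual approach is to build a new string, for instance with slicing and `+`, or with the `str.replace` method. A sketch (the name `herlo` is only illustrative):

```python
hello = "Hello"

# slicing and concatenation create a brand new string
herlo = hello[:2] + "r" + hello[3:]
print(herlo)                        # Herlo

# replace returns a new string; the third argument limits it to 1 match
print(hello.replace("l", "r", 1))   # Herlo

print(hello)                        # Hello (the original is unchanged)
```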

In [77]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  ...

## Control flow

### `if-elif-else`

The `if` statement allows you to modify the flow of your program depending on some conditions.
The simplest form is

    if some_condition:
        do something
        do something
        ...
        do something

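Note that the indentation here is **very** important: Python uses it, rather than braces or `end` keywords, to decide which statements belong to the `if` block.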

In [6]:
x = 3

if x > 0:
    print("x is positive")

x is positive


In [8]:
if x <= 0:
    print("x is non positive")

It is possible to merge those two mutually-exclusive `if`s by using the `if-else` construct.

    if some_condition:
        do something
        ...
        do something
    else:
        do something else
        ...
        do something else


In [11]:
x = -7
y = 0

if x > 0:
    y = x * 3
    print("x is positive")
else:
    y = -x - 1
    print("x is not positive")
    
print("y is", y)

x is not positive
y is 6


If you have more branching tests, you can use the `if-elif-else` construct.


    if some_condition:
        do something
    elif some_other_condition:
        do something
    else:
        do something

In [12]:
x = 10
y = 12

if x > y:
    print("x > y")
elif x < y:
    print("x < y")
else:
    print("x = y")

x < y
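The conditions are tested from top to bottom and only the first true one runs, so later branches can rely on the earlier conditions having failed. The sketch below also uses a chained comparison, `0 <= t < 15`, which Python reads as `0 <= t and t < 15`; the variable `temperature` and the thresholds are made up for illustration.

```python
temperature = 17

# only the first branch whose condition is True is executed
if temperature < 0:
    label = "freezing"
elif 0 <= temperature < 15:   # chained comparison
    label = "cold"
elif temperature < 25:        # here we already know temperature >= 15
    label = "mild"
else:
    label = "hot"

print(label)  # mild
```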


Of course these statements can be nested

In [14]:
if x > y:
    print("x > y")
elif x < y:
    if x == 10:  # here x < y and x == 10, hence y > 10
        print("y > 10")
    elif y != 2:
        print("x < y and y is not 2")
    else:
        print("x < y")
else:
    print("x = y")

y > 10


## Loops

### for loops

One of the most useful things you can do with lists is to iterate through them, i.e. to go through each element one at a time. To do this in Python, we use the `for` statement

    for item in some_list:
        do something with item
        ...
        

In [80]:
days_of_the_week = ["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]

for day in days_of_the_week:
    print(day, end=", ")

Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, 
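The trailing `, ` after `Saturday` appears because a separator is printed after every element. Two built-ins are handy here: `str.join` puts the separator only *between* elements, and `enumerate` yields each element together with its index (it is also used later in this lecture, in `build_dict`). A sketch:

```python
days_of_the_week = ["Sunday", "Monday", "Tuesday", "Wednesday",
                    "Thursday", "Friday", "Saturday"]

# join inserts ", " only between consecutive elements
print(", ".join(days_of_the_week))

# enumerate produces (index, element) pairs
for i, day in enumerate(days_of_the_week[:3]):
    print(i, day)
# 0 Sunday
# 1 Monday
# 2 Tuesday
```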

In [81]:
for i in range(20):
    print("The square of", i, "is", i*i)

The square of 0 is 0
The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16
The square of 5 is 25
The square of 6 is 36
The square of 7 is 49
The square of 8 is 64
The square of 9 is 81
The square of 10 is 100
The square of 11 is 121
The square of 12 is 144
The square of 13 is 169
The square of 14 is 196
The square of 15 is 225
The square of 16 is 256
The square of 17 is 289
The square of 18 is 324
The square of 19 is 361


In [91]:
for letter in "How are you?":
    if letter.islower():
        print(letter.upper(), end="")
    elif not letter.isalpha():
        print("*", end="")
    else:
        print(letter.lower(), end="")

hOW*ARE*YOU*

In [95]:
even_squared = []
for i in range(10):
    square = i*i
    
    if i%2 == 0:
        even_squared.append(square)

print(even_squared)

[0, 4, 16, 36, 64]


### List Comprehensions

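List comprehensions combine a loop, and optionally a condition, into a single expression that builds a list. They are more concise than the equivalent `for` loop and resemble the mathematical (set-builder) way of defining a set: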

In [96]:
[i*i for i in range(10) if i%2 == 0]

[0, 4, 16, 36, 64]
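The comprehension reads almost like $\{i^2 : i \in \{0, \dots, 9\},\ i \text{ even}\}$ and builds the same list as the `for` loop in the previous section. A sketch comparing the loop, the comprehension with a filter, and a comprehension without a filter in which `range` itself skips the odd numbers:

```python
# loop version, as in the previous section
even_squared = []
for i in range(10):
    if i % 2 == 0:
        even_squared.append(i * i)

# comprehension with a filter
with_filter = [i * i for i in range(10) if i % 2 == 0]

# comprehension without a filter: range(0, 10, 2) yields only even numbers
with_step = [i * i for i in range(0, 10, 2)]

print(even_squared == with_filter == with_step)  # True
```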

### The Fibonacci sequence

The Fibonacci sequence is defined as follows
$$
\begin{cases}
F_0 = F_1 = 1 \\
F_n = F_{n-1} + F_{n-2}
\end{cases}
$$

Thus, the sequence goes like 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

A very common exercise in programming books is to compute the first n terms of the Fibonacci sequence.

In [108]:
## First we set the value for n - the length of the sequence
n = 10

## The first two values of the sequence are predefined,
## so we are going to include them immediately. Here we define the
## fibonacci sequence as a list called fibonacci
fibonacci = [1,1]

## n is a length, so it must be non-negative. We check this first and let the
## user know if it is not
if n < 0:
    print("n must be a non-negative integer")
    
## if n < 2 the answer is a slice of the first two values: [] or [1]
elif n < 2:
    print(fibonacci[:n])

## otherwise we compute the first n Fibonacci numbers
else:
    ## we compute the missing terms one by one, up to index n-1
    for i in range(2,n): 
        ## at each step we compute a new value and we append it to the 
        ## sequence
        ith_fibonacci = fibonacci[i-1] + fibonacci[i-2]
        fibonacci.append(ith_fibonacci)
        
    ## finally we print the result
    print(fibonacci)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
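Since each new term only needs the previous two, the same numbers can also be produced without indexing into the list, by keeping the last two values in two variables and updating them with a simultaneous assignment. A sketch under the same convention $F_0 = F_1 = 1$; the names `prev`, `curr` and `fib_terms` are illustrative.

```python
n = 10

fib_terms = []
prev, curr = 1, 1   # F_0 and F_1
for _ in range(n):
    fib_terms.append(prev)
    # the right-hand side is evaluated first, using the old prev and curr
    prev, curr = curr, prev + curr

print(fib_terms)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```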


## Functions

Instead of going back to the cell to modify `n`, or copying and pasting its content into a different cell, it would be nice to have a way to reuse the code.

We do this with the `def` statement in Python:

In [1]:
def add(x, y):
    print("x is {} and y is {}".format(x, y))
    return x + y  # Return values with a return statement

print(add(5, 6))

# Another way to call functions is with keyword arguments
print(add(y=6, x=5))  # Keyword arguments can arrive in any order.

x is 5 and y is 6
11
x is 5 and y is 6
11
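Parameters can also have default values, used when the caller does not pass that argument; the Markov chain example at the end of this lecture relies on this (`eos=EOS`, `and_text=False`). A small sketch with a hypothetical function `power`:

```python
def power(base, exponent=2):
    """Return base ** exponent; exponent defaults to 2."""
    return base ** exponent

print(power(5))                   # 25, uses the default exponent
print(power(2, 10))               # 1024
print(power(exponent=3, base=2))  # 8, keyword arguments in any order
```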


In [112]:
def fibonacci(sequence_length):
    """Return the Fibonacci sequence of length `sequence_length`"""
    # local list, named differently from the function to avoid confusion
    sequence = [0, 1]
    if sequence_length < 0:
        print("Fibonacci sequence only defined for non-negative length")
        return
    if 0 <= sequence_length < 3:
        return sequence[:sequence_length]
    for i in range(2, sequence_length):
        sequence.append(sequence[i-1] + sequence[i-2])
    return sequence

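The `return` statement ends the execution of the function and hands the value written after it back to the caller; a bare `return`, as in the negative-length case, returns `None`. Note also that this function starts the sequence from 0, 1 (the convention $F_0 = 0$, $F_1 = 1$), so its output begins with an extra 0 compared with the definition given above.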

Note that the first statement in the body of the function is a string. It is called a `docstring`: a documentation string that Python stores with the function and that is displayed when calling `help`:

In [113]:
help(fibonacci)

Help on function fibonacci in module __main__:

fibonacci(sequence_length)
    Return the Fibonacci sequence of length `sequence_length`



If you define a docstring for all of your functions, it makes it easier for other people to use them, since they can get help on the arguments and return values of the function.

In [116]:
fibonacci(-3)

Fibonacci sequence only defined for non-negative length


In [121]:
fibonacci(18)

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]

In [118]:
fibonacci(0)

[]

**Exercise 3**: write a function `fact` that computes the factorial of a number $n$: 
$n! = n\cdot(n-1)\cdot(n-2)\cdots2\cdot1$.

Note that you will not need to write many functions yourselves. The Python standard library is huge: you can `import` the modules you need into your code and use the functions they contain, as follows:

In [2]:
import math
print("13! =", math.factorial(13))
print("sqrt(4) is", math.sqrt(4))

13! = 6227020800
sqrt(4) is 2.0


In [138]:
import random as rnd
print("This is a sequence of uniformly distributed random numbers:", 
      [rnd.random() for _ in range(5)])

This is a sequence of uniformly distributed random numbers: [0.5059046337280433, 0.9204464493990102, 0.4820081694856416, 0.8865577876262553, 0.7914589572807124]
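These numbers change at every run. When a result must be reproducible (for debugging, or to share an example), the generator can be seeded with `random.seed`: the same seed gives the same sequence. A sketch using the same `rnd` alias as above:

```python
import random as rnd

rnd.seed(42)
first_run = [rnd.random() for _ in range(5)]

rnd.seed(42)
second_run = [rnd.random() for _ in range(5)]

print(first_run == second_run)  # True
# randint includes both end points, like a die roll from 1 to 6
print(rnd.randint(1, 6))
```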


In [137]:
from math import pi, sin

print("Sin of", pi/2, "is", sin(pi/2))

Sin of 1.5707963267948966 is 1.0
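`sin(pi/2)` happens to print exactly `1.0`, but `pi` is only a floating-point approximation of π, so results are not always exact: `sin(pi)` is a tiny number rather than `0`. A sketch showing why float comparisons are safer with `math.isclose`:

```python
import math

print(math.sin(math.pi))        # a tiny number close to, but not equal to, 0
print(math.sin(math.pi) == 0)   # False

# abs_tol is needed when comparing with 0, since the default is a relative tolerance
print(math.isclose(math.sin(math.pi), 0, abs_tol=1e-12))  # True
print(0.1 + 0.2 == 0.3, math.isclose(0.1 + 0.2, 0.3))     # False True
```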


In [3]:
dir(math)

['__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'pi',
 'pow',
 'radians',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'trunc']

## A more complicated example

Let's see what a small script that reads a text, builds a Markov chain out of its content and uses it to generate "random" sentences can look like.

We are going to use the most naive approach, so that the code stays simple.

In [12]:
from random import choice
from collections import defaultdict

EOS = ['.', '?', '!']

def build_dict(words):
    """
    Build a dictionary from the words list.
    It maps every ordered pair of consecutive words
    to the list of all the words that follow that pair in the text.
    
    # key: tuple; value: list
    (word1, word2) => [w1, w2, ...]
    """
    d = defaultdict(list)
    for i, word in enumerate(words):
        
        # this is how you deal with possible errors in python
        try:
            first, second, third = words[i], words[i+1], words[i+2]
        except IndexError:
            # break means 'break out from this loop'
            break
        
        key = (first, second)
        
        # this is done by defaultdict behind the scene
        # if key not in d:
        #     d[key] = []
        
        d[key].append(third)
 
    return d
 

def generate_sentence(d, eos=EOS):
    """Generate a random sentence from a dictionary `d` of the form
    
    (word1, word2) => [w1, w2, ...]
    
    For every two words, their successor is picked up at random
    from the associated list.
    
    `eos` is an optional list of symbols that identify the end of a sentence.
    It defaults to ['.', '?', '!'].
    """
    
    # we take a list of all the tuples among the dictionary keys
    # whose first element starts with a capital letter 
    # and then choose one element at random
    starters = [key for key in d.keys() if key[0][0].isupper()]
    key = choice(starters)
     
    # let's start building the sentence
    # an equivalent way is: sentence = list(key)
    first, second = key
    sentence = [first, second]
    
    # This is a while loop, it keeps looping until you break out or
    # the condition becomes false.
    while True:
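        try:
            # we pick up a random word from the possible successors, if any.
            # d.get(key, []) does not add missing keys to the defaultdict,
            # and choice() raises IndexError when the list is empty
            third = choice(d.get(key, []))
        except IndexError:
            # no known successor: we break the loop
            break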
        
        # add the word to the sentence
        sentence.append(third)
        
        # if we find a stop symbol we break the loop
        if third[-1] in eos:
            break
            
        # else we generate a new key using the new word
        # and refill the variables. This is actually equivalent
        # to first, second = key = (second, third)
        key = (second, third)
        first, second = key
 
    # We return a string obtained by joining all the elements in `sentence`
    # separated by a space
    return ' '.join(sentence)
 
def generate_dictionary_from_file(filename, and_text=False):
    """Return the word dictionary generated from the content
    of `filename`. If the optional parameter `and_text` is True,
    also return the full text, as a tuple of the form
    (dictionary, fulltext)."""
    
    # this is an interesting Python construct called a context manager
    # (the `with` statement); **very roughly**, it is like
    #
    # f = open(filename, "rt")
    # text = f.read()
    # f.close()
    #
    # where f.close() is executed even if an error has happened
    
    with open(filename, "rt") as f:
        # text now contains the whole content of the file
        text = f.read()
    
    # split() with no arguments splits on any whitespace (spaces, tabs
    # and newlines). This is very naive: punctuation stays attached to
    # the words, which is what lets us detect the end of a sentence
    words = text.split()
    words_dict = build_dict(words)
    
    # return the full text only if the caller asked for it
    if and_text:
        return (words_dict, text)
    else:
        return words_dict
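The index arithmetic in `build_dict`, together with the `try`/`except IndexError` used to stop at the end of the list, can also be expressed with `zip`, which walks several sequences in parallel and stops at the shortest one. Zipping the word list with itself shifted by one and by two positions yields every triple of consecutive words. A sketch of this alternative, under the hypothetical name `build_dict_zip`, intended to build the same mapping as `build_dict`:

```python
from collections import defaultdict

def build_dict_zip(words):
    """Map each pair of consecutive words to the list of words that follow it."""
    d = defaultdict(list)
    # words[1:] and words[2:] are shorter, so zip stops at the last full triple
    for first, second, third in zip(words, words[1:], words[2:]):
        d[(first, second)].append(third)
    return d

words = "the cat sat on the cat mat".split()
pairs = build_dict_zip(words)
print(pairs[("the", "cat")])  # ['sat', 'mat']
print(pairs[("cat", "sat")])  # ['on']
```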


In [18]:
from pprint import pprint

sentence = "This planet has - or rather had - a problem, \
which was this: most of the people living on it were unhappy \
for pretty much all of the time. Many solutions were suggested \
for this problem, but most of these were largely concerned with \
the movement of small green pieces of paper, which was odd because \
on the whole it wasn't the small green pieces of paper that were unhappy."

splitted_sentence = sentence.split()
pprint(splitted_sentence[0:7])

d = build_dict(splitted_sentence)
pprint(d)

['This', 'planet', 'has', '-', 'or', 'rather', 'had']
defaultdict(<class 'list'>,
            {('-', 'a'): ['problem,'],
             ('-', 'or'): ['rather'],
             ('Many', 'solutions'): ['were'],
             ('This', 'planet'): ['has'],
             ('a', 'problem,'): ['which'],
             ('all', 'of'): ['the'],
             ('because', 'on'): ['the'],
             ('but', 'most'): ['of'],
             ('concerned', 'with'): ['the'],
             ('for', 'pretty'): ['much'],
             ('for', 'this'): ['problem,'],
             ('green', 'pieces'): ['of', 'of'],
             ('had', '-'): ['a'],
             ('has', '-'): ['or'],
             ('it', "wasn't"): ['the'],
             ('it', 'were'): ['unhappy'],
             ('largely', 'concerned'): ['with'],
             ('living', 'on'): ['it'],
             ('most', 'of'): ['the', 'these'],
             ('movement', 'of'): ['small'],
             ('much', 'all'): ['of'],
             ('odd', 'because'): ['on'],
            ...

In [31]:
print(generate_sentence(d))

This planet has - or rather had - a problem, which was odd because on the whole it wasn't the small green pieces of paper that were unhappy.


In [33]:
print(generate_sentence(d))

This planet has - or rather had - a problem, which was this: most of these were largely concerned with the movement of small green pieces of paper that were unhappy.


In [28]:
print(generate_sentence(d))

Many solutions were suggested for this problem, but most of the time.


In [11]:
# alltext.txt contains the Project Gutenberg versions of
# The Enchiridion - Epictetus
# The Brothers Grimm Fairy Tales
# Huckleberry Finn
# Adventures of Sherlock Holmes

words_dict, text = generate_dictionary_from_file("alltext.txt", and_text=True)

for _ in range(5):
    sent = generate_sentence(words_dict)
    print(sent, end="\n\n")
    if sent in text:
        print('# existing sentence :(')

Holder, that you have no business to talk us over." "Well, this is too late--forever too late!

This one kept pointing the pistol clinked upon the wall, and says: "About what, Sid?" "Why, about the case, and there, having charming manners, he was a big room in his favour from the extraordinary story of the little man said, 'Let him come in.' Soon afterwards there was a picture of a country walk on Thursday and Friday evening, which is opposite, and drove to the forest.

PINK There was a little of my own little office, and no decent body would have thought was in the dead man says is absolutely unique, and its relation to crime.

Bow Street cells, does he?" He slipped an emerald snake ring from his bed to lie down in the daylight, for he said he done it; jis' stood dah, kiner smilin' up.

They cussed Jim considerble, though, and Jim used to say that.



**Exercise 4** (advanced): modify the code above to generate a dictionary using n-grams instead of 3 words, i.e. associating each sequence of (n-1) words to the possible nth words that follow it.

## Conclusion

There is much more to cover, but for a very fast introduction you now have all the tools needed to start writing your first programs and make something useful out of them. Remember to read error messages carefully, document your code and use the help, either inline or from [docs.python.org](https://docs.python.org/3.4/).

Finally, a small easter egg: Tim Peters, one of the earliest and most prolific Python contributors, wrote the "Zen of Python", which can be accessed via the `import this` command

In [4]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
