# Exercises for the weekend - solutions

**Exercise 1**: Write a function, named `sum_all`, that takes as input a sequence `a` (a list or a tuple) and returns the sum of all numbers in `a` and in its subsequences. *Example*: with input `a = ( 4, 'python', [2, (1, 'str')], 9 )`, the function must return `16`.

As in the deep-cloning problem, the function must recursively explore all subsequences. For this reason it is worth starting from the solution of that problem and then adapting it.


```python
def deep_clone( x ):
    if type(x) == list:
        b = []
        for e in x:
            b.append( deep_clone( e ) )
        return b
    else:
        # x is a scalar, tuple, string
        return x
```

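In this case, `b` is a number, so the `append` becomes a sum. Tuples must be explored as well as lists, and any item that is neither a sequence nor a number contributes `0`.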

In [2]:
def sum_all( a ):
    if type(a) in (list, tuple):
        b = 0
        for x in a:
            b += sum_all( x )
        return b
    elif type(a) in (int, float):
        return a
    else:
        return 0
    
a = ( 4, 'python', [2, (1, 'str')], 9 )
print(sum_all(a))

16


**Exercise 2**: Modify the `all_strings_of_size` function so that it accepts as input, in addition to `n`, a list `s` of symbols and prints all the strings of length `n` that can be obtained with symbols in `s`.

This is the original function:

```python
def all_strings_of_size(n):
    def f(a, i):
        '''
        a[0],..., a[i-1] are already defined
        fills a[i], a[i+1],... in all possible ways
        '''
        if i == len(a):
            b = ''
            for x in a:
                b += x
            print(b)
        else: # i < len(a)
            a[i] = '0'
            # fills all position from i+1
            f(a, i+1)                
            a[i] = '1'
            # fills all position from i+1
            f(a, i+1)
            
    a = [None]*n
    f(a, 0)
```

We have to modify the `else` branch: in the original version there are two symbols (`0` and `1`) and they are hardcoded in the function; in the new version the symbols are `s[0]`, `s[1]`, ...

In [4]:
def all_strings_of_size(n, s):
    def f(a, i):
        '''
        a[0],..., a[i-1] are already defined
        fills a[i], a[i+1],... in all possible ways
        '''
        if i == len(a):
            b = ''
            for x in a:
                b += x
            print(b)
        else: # i < len(a)
            for c in s:
                a[i] = c
                # fills all position from i+1
                f(a, i+1)                
            
    a = [None]*n
    f(a, 0)


all_strings_of_size(3, '+^')


+++
++^
+^+
+^^
^++
^+^
^^+
^^^


In [5]:
def all_strings_of_size(n, s):
    def f(a, i):
        if i == len(a):
            b = ''
            for x in a:
                b += x
            print(b)
        else: # i < len(a)
            for x in s:
                a[i] = x
                f(a, i+1)
            
    a = [None]*n
    f(a, 0)
    
all_strings_of_size(3, ['a', 'b', 'c'])

aaa
aab
aac
aba
abb
abc
aca
acb
acc
baa
bab
bac
bba
bbb
bbc
bca
bcb
bcc
caa
cab
cac
cba
cbb
cbc
cca
ccb
ccc
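
As a cross-check of the output above, the same strings can be generated with `itertools.product` from the standard library. This is only a sketch for comparison, not the recursive technique used in the solution; the helper name `all_strings_of_size_list` is hypothetical, and it returns the strings instead of printing them.

```python
from itertools import product

def all_strings_of_size_list(n, s):
    # Hypothetical helper: builds every string of length n over the symbols in s.
    # product(s, repeat=n) yields all n-tuples of symbols, rightmost position varying fastest.
    return [''.join(t) for t in product(s, repeat=n)]

strings = all_strings_of_size_list(3, '+^')
print(strings)
# With len(s) symbols there are len(s)**n strings of length n
print(len(strings) == 2**3)
```

The number of strings grows as `len(s)**n`, which is why the output for three symbols is already 27 lines long.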


**Exercise 3**: Write a function, named `power_set`, that takes as input a list `s` representing a set of characters `I` and prints all possible subsets of `I`.

We know how to list all the binary sequences of a given size (see the `all_strings_of_size` function). There is a one-to-one correspondence between the binary sequences of size `n` and the subsets of a set `s` of size `n`. Let `a` be one of these binary sequences; it describes a subset of `s` in the following way: `s[j]` belongs to the subset if and only if `a[j] == 1`.

This solution modifies the base case of the recursive function `f` in `all_strings_of_size`: here the list `a` is 'translated' into a subset `b` of `s` following the above considerations.

In [6]:
def power_set( s ):
    '''
    Parameters
    ----------
    s: a list that defines a set

    Prints all the subsets of s
    '''
    def f(a, i):
        if i == len(a):
            # translate a into a subset of s
            b = []
            for j in range(len(s)):
                if a[j] == 1:
                    b.append(s[j])
            print(b)
        else:
            a[i] = 0
            f(a, i+1)
            a[i] = 1
            f(a, i+1)
        
    a = [None]*len(s)
    f(a, 0)
    
power_set([0,1,2])

[]
[2]
[1]
[1, 2]
[0]
[0, 2]
[0, 1]
[0, 1, 2]
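
The correspondence between binary sequences and subsets can also be sketched without recursion, by writing the integers `0 .. 2**n - 1` in binary. This is an alternative illustration, not the solution above; the name `subsets_from_bits` is hypothetical and the function returns the subsets instead of printing them.

```python
def subsets_from_bits(s):
    # Hypothetical helper: one subset for each binary sequence of size len(s)
    n = len(s)
    result = []
    for k in range(2**n):
        # binary digits of k, padded with zeros on the left to length n
        bits = format(k, 'b').zfill(n)
        # s[j] is in the subset if and only if the j-th digit is '1'
        result.append([s[j] for j in range(n) if bits[j] == '1'])
    return result

subsets = subsets_from_bits([0, 1, 2])
print(subsets)
print(len(subsets) == 2**3)
```

A set of size `n` has `2**n` subsets, one for each binary sequence of size `n`.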


### Non-numeric sorting 

In [16]:
b = ['zero', 'one', 'two', 'three', 'four', 'five']
bubble_sort(b)
print(b)

['five', 'four', 'one', 'three', 'two', 'zero']


The strings are sorted in lexicographic order, that is, according to the semantics of the `>` operator on strings.

Now consider a list of tuples

In [17]:
c = [ (1, 2), (0,1), (0, 2, 3) ]
bubble_sort(c)
print(c)

[(0, 1), (0, 2, 3), (1, 2)]


If `t0` and `t1` are two tuples, the value of `t0 > t1` is the value of `t0[i] > t1[i]`, where `i` is the first position in which `t0[i] != t1[i]`. Similarly, the value of `t0 < t1` is the value of `t0[i] < t1[i]`, where `i` is the first position in which `t0[i] != t1[i]`. If `t0` is a proper prefix of `t1`, then `t0 < t1`. Finally, `t0 == t1` if they have the same length and equal items in every position. The same rules apply to lists.

For example

In [1]:
print(  (1, 0) < (0, 10, 0)   )

print(   (0, 0) < (0, 10, 0)    )

print(   (0, 10) > (0, 10, 0)    )

print(   (0, 10) == (0, 10)    )

print(   (0, 'one') < (0, 'zero')    )

#print(   (0, 10) < (0, 'zero')    )   ERROR: requires comparing an int with a str

print(   (1, 10) < (0, 'zero')    )

a = [0, (0, 1)]
b = [0, (0, 2)]

print(a < b)

False
True
False
True
True
False
True
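
The comparison rule for tuples can be written out by hand. The following is a minimal sketch (the name `tuple_less` is hypothetical) that mimics `t0 < t1`; Python's built-in comparison should be preferred in real code.

```python
def tuple_less(t0, t1):
    # Hypothetical helper that mimics t0 < t1 for tuples (or lists)
    for x, y in zip(t0, t1):
        if x != y:
            # first position where the items differ decides the result
            return x < y
    # no difference in the common positions: the shorter one (a proper prefix) is smaller
    return len(t0) < len(t1)

pairs = [((1, 0), (0, 10, 0)),
         ((0, 0), (0, 10, 0)),
         ((0, 10), (0, 10, 0)),
         ((0, 10), (0, 10)),
         ((0, 'one'), (0, 'zero'))]

for t0, t1 in pairs:
    # the two columns should always agree
    print(tuple_less(t0, t1), t0 < t1)
```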


If we want to sort the strings from shortest to longest, we have to compare their lengths in the `if` condition.

In [20]:
def bubble_sort( a ):
    n = len(a)
    c = 0
    
    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(n-1-c):
            if len(a[i]) > len(a[i+1]):
                is_sorted = False
                a[i], a[i+1] = a[i+1], a[i]
        c += 1


b = ['zero', 'one', 'two', 'three', 'four', 'five']
bubble_sort(b)
print(b)

['one', 'two', 'zero', 'four', 'five', 'three']


If we want to sort numbers ignoring their sign...

In [21]:
def bubble_sort( a ):
    n = len(a)
    c = 0
    
    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(n-1-c):
            if abs(a[i]) > abs(a[i+1]):
                is_sorted = False
                a[i], a[i+1] = a[i+1], a[i]
        c += 1

a = [0 , 9, -2, 8, 4, -5, 6, -7]
bubble_sort(a)
print(a)

[0, -2, 4, -5, 6, -7, 8, 9]


In general, if we want to sort the list according to another criterion, we have to use a different function in the `if` condition. This function can be part of the input.

In [23]:
def bubble_sort( a, key ):
    n = len(a)
    c = 0
    
    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(n-1-c):
            if key(a[i]) > key(a[i+1]):
                is_sorted = False
                a[i], a[i+1] = a[i+1], a[i]
        c += 1

a = [0 , 9, -2, 8, 4, -5, 6, -7]
b = ['zero', 'one', 'two', 'three', 'four', 'five']

bubble_sort(a, abs)
print(a)

bubble_sort(b, len)
print(b)

bubble_sort(a, float)
print(a)

bubble_sort(b, str) 
print(b)

[0, -2, 4, -5, 6, -7, 8, 9]
['one', 'two', 'zero', 'four', 'five', 'three']
[-7, -5, -2, 0, 4, 6, 8, 9]
['five', 'four', 'one', 'three', 'two', 'zero']


It is also important that the 'natural' ordering remains available and that it is the default one.

In [24]:
def identity(x):
    return x

def bubble_sort( a, key=identity ):
    n = len(a)
    c = 0
    
    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(n-1-c):
            if key(a[i]) > key(a[i+1]):
                is_sorted = False
                a[i], a[i+1] = a[i+1], a[i]
        c += 1
        
        
a = [0 , 9, -2, 8, 4, -5, 6, -7]

bubble_sort(a)
print(a)

[-7, -5, -2, 0, 4, 6, 8, 9]


In this way, by default, the sort function uses the original input values.

The next version avoids having two separate functions: here the function `identity` is local to `bubble_sort`.

In [4]:
def bubble_sort( a, key=None ):
    def identity(x):
        return x
    n = len(a)
    c = 0
    
    if key is None:
        key = identity
        
    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(n-1-c):
            if key(a[i]) > key(a[i+1]):
                is_sorted = False
                a[i], a[i+1] = a[i+1], a[i]
        c += 1

a = [0 , 9, -2, 8, 4, -5, 6, -7]

bubble_sort(a)
print(a)

[-7, -5, -2, 0, 4, 6, 8, 9]
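
Python's built-in `sorted` function and the `list.sort` method accept a `key` parameter with the same meaning. A short sketch for comparison with the `bubble_sort` results above (the variable names are only illustrative):

```python
a = [0, 9, -2, 8, 4, -5, 6, -7]
b = ['zero', 'one', 'two', 'three', 'four', 'five']

by_abs = sorted(a, key=abs)    # new list, sorted ignoring the sign
by_len = sorted(b, key=len)    # new list, from shortest to longest
natural = sorted(a)            # no key: the natural ordering

print(by_abs)
print(by_len)
print(natural)

a.sort(key=abs)                # list.sort sorts in place, like bubble_sort
print(a == by_abs)
```

Both `sorted` and `bubble_sort` (which swaps only when the left key is strictly greater) keep items with equal keys in their original relative order, so the results coincide.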


**Problem**: Given a list of numbers (`int` or `float`), write a program that sorts the numbers according to the number of zeros in them (that is, the number of `0` digits in their printed form), from fewest to most.

In [4]:
a = [10, 121, 0.0001, 100]

# sort a by the number of zeros (from fewest to most)

def f(x):
    '''
    Parameter: x, an int or float
    Returns the number of '0' characters in str(x)
    '''
    c = 0
    for ch in str(x):    # ch avoids shadowing the parameter x
        if ch == '0':
            c += 1
    return c

bubble_sort(a, key=f)

print(a)


[121, 10, 100, 0.0001]


**Problem**: The list `a` contains numbers and strings. By using the `bubble_sort` function, write a program that moves the numbers to the left of `a` and the strings to the right.

In [5]:
a = [ 4, 'zero', 'ten', 0, -1.7, 'five' ]

'''
output
a = [ 4, 0, -1.7, 'zero', 'ten', 'five' ]
'''

def f(x):
    if type(x) in (int, float):
        return 0
    else:
        return 1
    
bubble_sort(a, key=f)
print(a)

[4, 0, -1.7, 'zero', 'ten', 'five']


**Problem**: Like the previous one, but the numbers must be sorted from smallest to largest and the strings in lexicographic order.

*Solution*. The `key` function returns a tuple: its first element is the same as in the previous version; the second is used when comparing numbers with numbers or strings with strings. This second element is the item itself.

In [6]:
a = [ 4, 'zero', 'ten', 0, -1.7, 'five' ]

'''
output
a = [ -1.7, 0, 4, 'five', 'ten', 'zero' ]
'''

def f(x):
    if type(x) in (int, float):
        return 0, x
    else:
        return 1, x
    
bubble_sort(a, key=f)
print(a)

[-1.7, 0, 4, 'five', 'ten', 'zero']
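
The first element of the tuple matters because Python cannot order an `int` and a `str`. The sketch below shows the failure without a key and the result with the tuple key; it uses the built-in `sorted` instead of `bubble_sort` only to stay self-contained.

```python
a = [4, 'zero', 'ten', 0, -1.7, 'five']

try:
    sorted(a)                 # needs to compare a number with a string
    failed = False
except TypeError:
    failed = True
print(failed)

def f(x):
    # numbers get 0 as first element, strings get 1
    if type(x) in (int, float):
        return 0, x
    else:
        return 1, x

# (0, number) vs (1, string): the comparison is decided by the first
# element, so a number is never compared with a string
result = sorted(a, key=f)
print(result)
```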


### `lambda` functions

They are inline anonymous functions that can be defined and used on the fly.

In [3]:
f = lambda x: x+1

print(f(2))

print( (lambda x: x)(10)    )

3
10


The general syntax is

```python
lambda parameters : expression_of_parameters  
```

The identity function can be easily described by a `lambda` function: `lambda x: x`. A `lambda` is also convenient for defining a `key` on the fly:

In [5]:
a = [ (3, 1), (1, 3), (0, -9), (10, 0) ]

# sort according to the second item of each tuple

bubble_sort(a, key=lambda t: t[1])
print(a)

[(0, -9), (10, 0), (3, 1), (1, 3)]
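
With a `lambda`, the identity default of `bubble_sort` no longer needs a separate `identity` function. A minimal sketch of this variant (the lists `nums` and `pairs` are only illustrative data):

```python
def bubble_sort(a, key=lambda x: x):
    # the default key is the identity, written as a lambda
    n = len(a)
    c = 0

    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(n-1-c):
            if key(a[i]) > key(a[i+1]):
                is_sorted = False
                a[i], a[i+1] = a[i+1], a[i]
        c += 1

nums = [0, 9, -2, 8, 4, -5, 6, -7]
bubble_sort(nums)                                # natural ordering
print(nums)

pairs = [(3, 1), (1, 3), (0, -9), (10, 0)]
bubble_sort(pairs, key=lambda t: t[0] + t[1])    # sort by the sum of the two items
print(pairs)
```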


# Computational cost

Algorithms and programs must be correct and _efficient_: they should use as few resources as possible, such as

* time
* memory
* network traffic
* ...

We will consider _time_ and _memory_.

We start with time. How long does it take to run the next function?

In [29]:
def sum_all( a ):
    c = 0           # 1
    for x in a:     # if n = len(a), repeated n times
        c += x     # 2
    return c       # 1

print( sum_all( [1,2,3] ) )

6


One can run the program on some input and measure the time, but the result depends on:

* the speed of the computer
* the efficiency of the Python interpreter
* other programs running in the computer that slow down our program
* the input

We introduce an abstraction that removes all the hardware and (external) software dependencies. Instead of measuring nanoseconds we count the number of _elementary operations_ or **steps**. We assume that one step takes one unit of time. We consider as elementary the operations that involve *scalar types* (`int`, `float`, `bool`, `None`):

* assignment
* arithmetic, logic, relational operations that involve _scalars_
* indexing
* ...

If `n` is the size of the list `a`, `sum_all(a)` takes

    1 + 2n + 1 = 2n + 2
    
steps. The first 1 is the cost of the assignment `c = 0`, `2n` is the cost of the `for` loop (2 steps for each of the `n` iterations), and the last 1 is the cost of the `return`.

Additive constants can be removed because they become irrelevant as `n` grows. So, the function `sum_all` takes `2n` time.
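
The count `2n + 2` can be checked by instrumenting the function with a step counter. This is only a sketch: the name `sum_all_steps` is hypothetical, and charging 2 steps to `c += x` (one addition, one assignment) is the accounting assumed in the comments of the function above.

```python
def sum_all_steps(a):
    # Hypothetical instrumented version: returns the sum and the number of steps
    steps = 0
    c = 0
    steps += 1          # the assignment c = 0
    for x in a:
        c += x
        steps += 2      # assumed cost of c += x: one addition, one assignment
    steps += 1          # the return
    return c, steps

print(sum_all_steps([1, 2, 3]))

for n in (0, 10, 100, 1000):
    total, steps = sum_all_steps(list(range(n)))
    print(n, steps, steps == 2*n + 2)
```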

Multiplicative constants are important. If we have a solution `S1` whose cost is `5n` time and a solution `S2` whose cost is `2n` time, the first one takes 2.5 times as long as the second one to complete. The second one is 2.5 times faster than the first one!

We say that `S2` is a *fine* optimization of `S1`: we are saving a constant factor of time with respect to the first solution. But if `S1` takes `3n**2` time (that is, $3n^2$), then `S2` is *asymptotically* faster than `S1`, and in this case we can also ignore the multiplicative constants. Optimizations of this second type are more important than fine optimizations; for this reason the multiplicative constants are also removed from the computational cost. If we do so, only the *order of magnitude* of the number of steps remains in the time cost. From an expression like this

    3n**2 + 5n + 9
    
since we ignore additive and multiplicative constants, what remains is

    n**2 + n
    
For `n >= 1` this last expression lies between `n**2` and `2n**2`. Since we are ignoring multiplicative constants, `n**2 + n` can be considered equal to `n**2`. That is, only the term of highest degree remains from the original expression. We conclude that the time cost of `S1` is of the **order** of `n**2`, in symbols `O(n**2)` (Big-O notation).
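
A quick numeric check of this reasoning: as `n` grows, the ratio between `3n**2 + 5n + 9` and `n**2` settles towards the constant 3, so the lower-order terms stop mattering. The helper `cost` is hypothetical and simply evaluates the expression.

```python
def cost(n):
    # Hypothetical step count of a solution: 3n**2 + 5n + 9
    return 3*n**2 + 5*n + 9

ratios = [cost(n) / n**2 for n in (10, 100, 1000, 10000)]
for r in ratios:
    print(r)        # the values get closer and closer to 3

# n**2 + n is squeezed between n**2 and 2n**2 for every n >= 1
print(all(n**2 <= n**2 + n <= 2*n**2 for n in range(1, 1000)))
```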

Often we refer to the time cost as **time complexity** or, more generically, as **computational complexity**.

Consider the example

In [29]:
def find_item(a, k):
    '''
    Parameters
    ----------
    a : a list
    k : an item
    Returns
    -------
    i : int the position of the first k in a, None if
        k is not in a
    '''
    for i, x in enumerate(a):
        if x == k:
            return i
    return None
    
a = [3, 1, 3, 5, 3,  6, 9, 5]
print(find_item(a, 5))

3


In this case the number of steps depends on the value of `i` returned by the function. There are two extreme cases:

* **Best case**. `i` is zero (`k` is the first item): the number of steps is constant (one test `x == k` and the `return`). In big-O notation a constant cost is written `O(1)`.
* **Worst case**. `k` is not in `a` and the function returns `None`: the test `x == k` is repeated `len(a)` times, so the number of steps is `O(n)`.

There is also the **average case**, which occurs when `i` is more or less `n/2`. Even in this case the number of steps is `O(n)`.

The most common measure is the more pessimistic worst-case analysis, because it provides an upper bound on the cost that covers all cases.
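
The best and worst cases can be observed by counting how many times the test `x == k` is executed. A small sketch, where `find_item_count` is a hypothetical instrumented copy of `find_item`:

```python
def find_item_count(a, k):
    # Hypothetical variant of find_item: also returns the number of tests x == k
    comparisons = 0
    for i, x in enumerate(a):
        comparisons += 1
        if x == k:
            return i, comparisons
    return None, comparisons

a = [3, 1, 3, 5, 3, 6, 9, 5]
print(find_item_count(a, 3))   # best case: k is the first item
print(find_item_count(a, 5))   # k found at position 3
print(find_item_count(a, 7))   # worst case: k is not in a
```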

## The complexity of the bubble-sort algorithm

Let us consider this version of the algorithm where the elements of the list are scalars. This implies that the comparison operation is elementary.

```python
def bubble_sort( a ):
    n = len(a)
    c = 0    
    
    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(n-1-c):
            if a[i] > a[i+1]:
                is_sorted = False
                a[i], a[i+1] = a[i+1], a[i]
        c += 1
```

The `for` loop is repeated at most `n-1` times and the cost of each iteration is constant, so the cost of the `for` loop is `O(n)`. The `while` loop is repeated at most `n` times (in the worst case). This implies that the total cost, in the worst case, is `O(n**2)`. There is also a best case, the one in which the `while` loop is executed only once because the list is already sorted. In this case the cost of the function is `O(n)`.
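
The two cases can be observed by counting the comparisons `a[i] > a[i+1]`. A sketch, where `bubble_sort_count` is a hypothetical instrumented copy of the function above:

```python
def bubble_sort_count(a):
    # Hypothetical variant of bubble_sort: sorts a in place and
    # returns the number of comparisons a[i] > a[i+1]
    n = len(a)
    c = 0
    comparisons = 0

    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(n-1-c):
            comparisons += 1
            if a[i] > a[i+1]:
                is_sorted = False
                a[i], a[i+1] = a[i+1], a[i]
        c += 1
    return comparisons

n = 8
best = bubble_sort_count(list(range(n)))           # already sorted input
worst = bubble_sort_count(list(range(n, 0, -1)))   # input in decreasing order
print(best, n - 1)
print(worst, n*(n-1)//2)
```

On a list sorted in decreasing order the passes perform `n-1`, `n-2`, ..., `1` comparisons, whose sum is `n(n-1)/2`, that is `O(n**2)`.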

## The computational cost of operators between non-scalar types

Let `a` be a list of `n` lists, each of size `m`. What is the time complexity of the following operation?

```python
bubble_sort(a)
```

In the worst case the time complexity of

```python
a[i] > a[i+1]
```

is `O(m)`, because it may require comparing every item of `a[i]` with the item in the same position of `a[i+1]` (for example, when the two lists differ only in the last position).

Then, in this case, the time complexity of the bubble sort algorithm is `O(m * n**2)`.

------

Let `a` be a list of `n` numbers. We are interested in the time complexity of aliasing and cloning.

```python
b = a
```

It just introduces a new name for the list referred to by `a`, so its time complexity is `O(1)`.

```python
c = a[:]
```

This operation creates a copy of `a` and requires `n` copy operations of scalars (one for each item in `a`), so its complexity is `O(n)`.
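
The difference between the two operations can be seen with the `is` operator, which tests whether two names refer to the same object. A minimal sketch:

```python
a = [1, 2, 3]

b = a        # aliasing: O(1), b is just another name for the same list
c = a[:]     # cloning: O(n), c is a new list with the same items

print(b is a)    # True: same object
print(c is a)    # False: a different object...
print(c == a)    # True: ...with equal content

b.append(4)      # a change made through b is visible through a, not through c
print(a)
print(c)
```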

# Binary search algorithm

Consider the problem of searching for a key value `k` in a list `a` of sorted items. If `k` is in `a` we want its position, otherwise `None`.

*Observation*. If `k != a[i]` for some position `i`, then, because `a` is sorted, either `k < a[i]` and `k` can only be on the left of `i`, or `k > a[i]` and `k` can only be on the right of `i` (in both cases `k` may not be in the list at all). The next program considers a search interval described by two indices `lx <= rx`, corresponding to the slice `a[lx:rx]`. At each iteration, if `k` is not in the middle of the interval, the search continues in one of the two halves of the interval. The process ends when `k` is found or when the interval is empty.

In [10]:
a = [0, 0, 3, 3, 6, 8, 8, 10, 12, 16, 18, 18, 20]
k = 8

lx, rx = 0, len(a)

# the current subsequence is a[lx:rx]

while lx < rx:
    cx = (lx+rx)//2
    if k == a[cx]:
        # return cx
        print('a[',cx,'] = ', k)
        break
    elif k < a[cx]:
        rx = cx
    else: # k > a[cx]
        lx = cx+1

if lx >= rx:
    print(k, 'not in ', a)

a[ 6 ] =  8
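
The same loop can be packaged as a function that returns the position or `None`, as required by the problem statement. This is a sketch; the name `binary_search` is an assumption, not a function defined earlier in these notes.

```python
def binary_search(a, k):
    '''
    a : a sorted list
    k : the key to search for
    Returns a position i such that a[i] == k, None if k is not in a
    '''
    lx, rx = 0, len(a)
    # the current search interval is a[lx:rx]
    while lx < rx:
        cx = (lx + rx) // 2
        if k == a[cx]:
            return cx
        elif k < a[cx]:
            rx = cx          # continue in the left half
        else:
            lx = cx + 1      # continue in the right half
    return None              # the interval is empty: k is not in a

a = [0, 0, 3, 3, 6, 8, 8, 10, 12, 16, 18, 18, 20]
print(binary_search(a, 8))
print(binary_search(a, 7))
print(binary_search(a, 20))
```

When `k` occurs more than once, the position returned is not necessarily the first one. Since each iteration halves the interval, the loop is repeated about `log2(n)` times in the worst case, instead of the `n` iterations needed by `find_item`.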


## Exercises for the weekend


1. Consider a list `a` containing strings and numeric tuples like

    ```
    [ 'zero', (3, 9.5, 11), 'one', 'three', (5, 1, 3.14, 8), 'six' ]
    ```

    Solve the problem of sorting the list according to the following criteria:

    - strings must precede tuples
    - the strings must be sorted from longest to shortest 
    - the tuples must be sorted by increasing values of the sums of the numerical values they contain

    for the example above the output should be

    ```
    [ 'three', 'zero', 'one', 'six', (5, 1, 3.14, 8), (3, 9.5, 11) ]
    ```

    or

    ```
    [ 'three', 'zero', 'six', 'one', (5, 1, 3.14, 8), (3, 9.5, 11) ]
    ```
    