# Lab 4: Practice Python Fundamentals
___

## Part I: Simple functions

### $\S$ Exercise 1

As we know, adding two lists in Python with `+` concatenates them. Write a function `elementwise_add` that adds two lists element by element. Consider the following cases:
* The two lists are of the same size.
* The two lists are of different sizes.
* The elements are strings.

In [174]:
l1 = [1,2,3]
l2 = [2,4,5]
l1 + l2

[1, 2, 3, 2, 4, 5]
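
Before writing `elementwise_add`, it may help to see how Python pairs up elements when the two lists differ in length. The snippet below is only a sketch of two standard-library options, not a solution; the lists `short` and `longer` are made-up examples, and padding with `0` is an assumption that suits numbers but not strings (where `""` would be needed).

```python
from itertools import zip_longest

short = [1, 2, 3]
longer = [10, 20, 30, 40]

# zip stops at the end of the shorter list, so the trailing 40 is dropped.
pairs_truncated = list(zip(short, longer))

# zip_longest pads the shorter list with fillvalue instead of stopping.
pairs_padded = list(zip_longest(short, longer, fillvalue=0))

print(pairs_truncated)  # [(1, 10), (2, 20), (3, 30)]
print(pairs_padded)     # [(1, 10), (2, 20), (3, 30), (0, 40)]
```

Which behavior is appropriate for unequal lengths is a design decision the exercise leaves open.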

In [151]:
def elementwise_add(l1, l2):
    pass

In [152]:
a = [1,2,4]
b = [2,5,8]
elementwise_add(a, b)

[3, 7, 12]

In [153]:
elementwise_add(["red", "blue", "orange"], ["yellow", "pink", "magenta"])

['redyellow', 'bluepink', 'orangemagenta']

### $\S$ Exercise 2

The module `random` contains several functions for generating random numbers, which are very useful in data simulations.

Write a function that generates $n$ random numbers following a Beta distribution, then compute the mean and variance of the sample.

In [155]:
import random
def rbeta(n, a, b):
    pass

In [156]:
x = rbeta(100, 2, 5)
sum(x)/len(x)

0.27052909104526957

In [158]:
2.0/(2+5)

0.2857142857142857
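
The value `2.0/(2+5)` above is the theoretical mean of a Beta(2, 5) distribution, which the sample mean should approach. As a rough sketch, the theoretical variance can be computed the same way for comparison. The helper name `beta_moments` is hypothetical (it is not part of the lab), while `random.betavariate` is the standard-library call for a single Beta draw.

```python
import random

def beta_moments(a, b):
    """Return the theoretical (mean, variance) of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance

mean, variance = beta_moments(2, 5)
print(round(mean, 4), round(variance, 4))  # 0.2857 0.0255

# One Beta(2, 5) draw from the standard library; the value differs on every run.
draw = random.betavariate(2, 5)
```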

### $\S$ Exercise 3

Entropy and information gain are two essential metrics in information theory. Entropy measures the impurity of a group:
$$
\textrm{Entropy} = \sum_{i \in c} -p_i \log_2 p_i
$$
while information gain measures the gain in purity after splitting on a continuous or discrete feature. It is often used in __decision trees__ and __random forests__.
$$
\mbox{information gain} = \mbox{Entropy} - \mbox{weighted sum Entropy}
$$

Here is an example. Before splitting by gender, the class counts for a sample of 320 individuals are:
$$
N(0) = 153, N(1) = 167
$$
For the 160 males,
$$
N(0) = 139, N(1) = 21
$$
while for the 160 females,
$$
N(0) = 14, N(1) = 146
$$

Therefore, the entropy before splitting is:
$$
-\frac{153}{320} \times \log_2 \frac{153}{320} - \frac{167}{320} \times \log_2 \frac{167}{320} = 0.999
$$
After the splitting:
$$
\frac{160}{320} \times (-\frac{139}{160} \times \log_2 \frac{139}{160} - \frac{21}{160} \times \log_2 \frac{21}{160}) + \frac{160}{320} \times (-\frac{14}{160} \times \log_2 \frac{14}{160} - \frac{146}{160} \times \log_2 \frac{146}{160}) = 0.494
$$
Thus, the information gain is $0.999 - 0.494 = 0.505$.
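
The arithmetic in this example can be checked numerically. The sketch below works from class counts rather than label lists, so it is not the `info_gain` function asked for later; the helper name `entropy_from_counts` is an assumption introduced only for this check.

```python
from math import log2

def entropy_from_counts(counts):
    """Base-2 entropy of a group from its class counts; 0 * log2(0) is taken as 0."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Entropy of the whole sample before splitting.
before = entropy_from_counts([153, 167])

# Weighted sum of the male and female entropies after splitting.
after = ((160 / 320) * entropy_from_counts([139, 21])
         + (160 / 320) * entropy_from_counts([14, 146]))

gain = before - after
print(round(before, 3))  # 0.999
print(round(after, 3))   # 0.494
```

Subtracting the unrounded entropies gives a gain of roughly 0.504; the 0.505 in the text comes from subtracting the two rounded values.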

In Python, the labels for a group can be stored as a list:

In [169]:
male = [1,1,1,0]; female = [0, 0, 0]

Write a function `info_gain` that computes the information gain. Its input is a list with one element per group, and each element is itself a list containing the labels of the individuals in that group.

Note that we need to set $0 \times \log_2 0 = 0$.

In [177]:
def info_gain(labels):
    """Compute the information gain for splitting by a feature.
    
    >>> male = [1,1,1,0]; female = [0,0,0]
    >>> round(info_gain([male, female]), 3)
    0.522
    """
    pass

## Part II: Recursion

### $\S$ Exercise 1

Write a function `count_ways` that counts the number of ways to reach a given total by summing numbers from a given list, where each number may be used more than once. The available numbers are passed as an argument to the function.

#### <font color="red">Hints</font>

For each element in the list, there are two cases:
   > (1) Element is included in the summation, once or more
   
   > (2) Element is not included in the summation
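
The two cases in the hint can be written as a recurrence. This is a sketch rather than a full solution: $W$ is a name introduced here for the number of ways, and the base cases are an assumption consistent with the example outputs in this lab.
$$
W(t, [n_1, n_2, \dots, n_k]) = W(t - n_1, [n_1, n_2, \dots, n_k]) + W(t, [n_2, \dots, n_k])
$$
with $W(0, \cdot) = 1$, $W(t, \cdot) = 0$ for $t < 0$, and $W(t, [\,]) = 0$ for $t > 0$. The first term covers case (1), where $n_1$ is used at least once, and the second covers case (2), where it is not used at all.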


In [16]:
def count_ways(total, numbers):
    pass

In [17]:
count_ways(6, [1, 5])

2

In [18]:
count_ways(10, [1,5])

3

In [19]:
count_ways(100, [1, 5, 10, 25, 50])

292

#### <font color="red">Questions</font>
* <font color="red">Does the order of the argument `numbers` affect the efficiency of the above function? If so, which order gives the highest efficiency? Use `%timeit` to check, as shown below.</font>

In [20]:
%timeit count_ways(100, [1, 5, 10, 25, 50])

100 loops, best of 3: 9.48 ms per loop


In [21]:
%timeit count_ways(100, [50, 25, 10, 5, 1])

100 loops, best of 3: 3.62 ms per loop


### $\S$ Exercise 2

Here is a function `flatten_dict` that flattens a nested dictionary by joining the keys with the `.` character.

In [13]:
def flatten_dict(nested, flatten={}):
    """ Flattens a nested dict
    
        >>> flatten_dict({'a': 1, 'b': {'x': 2, 'y': 3}, 'c': 4})
        {'a': 1, 'b.x': 2, 'b.y': 3, 'c': 4}
    """
    pass

In [14]:
flatten_dict({'a': 1, 'b':2, 'c': {'d':3, 'e': {'f': 4, 'g': 5}}, 'h': 6})

{'a': 1, 'b': 2, 'c.d': 3, 'c.e.f': 4, 'c.e.g': 5, 'h': 6}

Now write a function `flatten_list` that flattens a nested list.

In [None]:
def flatten_list(nested, flatten=[]):
    """ Flatten a nested list.
    
        >>> flatten_list([1, 2, 3, [4, 5]])
        [1, 2, 3, 4, 5]    
    """
    pass
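
One thing to watch for with an accumulator parameter such as `flatten`: a mutable default value is created once, when the function is defined, and is then shared by every call that relies on the default. The sketch below illustrates this with two hypothetical helpers, `collect_shared` and `collect_fresh`, which are not part of the lab.

```python
def collect_shared(item, bucket=[]):
    # The default list is built once and reused by every call.
    bucket.append(item)
    return bucket

def collect_fresh(item, bucket=None):
    # A None sentinel lets each call start from a brand-new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(collect_shared(1))  # [1]
print(collect_shared(2))  # [1, 2]: the item from the first call is still there
print(collect_fresh(1))   # [1]
print(collect_fresh(2))   # [2]
```

If the flattening functions are called more than once in a session, it is worth checking that results from an earlier call do not leak into a later one.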

### $\S$ Exercise 3

Implement a function `dirtree` that takes a directory as an argument and recursively prints all the files in that directory as a tree, as shown below.
```python
>>> dirtree("foo/")
```
```
foo/
|-- a.txt
|-- b.txt
|-- bar/
|   |-- .ipynb_checkpoints/
|   |-- bar2/
|   |   |-- .ipynb_checkpoints/
|   |   `-- p.txt
|   |-- p.txt
|   `-- q.txt
`-- c.txt
```
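
The building blocks for this exercise are in the `os` module. The sketch below only lists one level of a directory and marks sub-directories with a trailing `/`; it does not recurse or draw the tree. The temporary directory and the names `a.txt` and `bar` are made up so that the snippet is self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny throwaway layout: one file and one sub-directory.
    os.mkdir(os.path.join(root, "bar"))
    with open(os.path.join(root, "a.txt"), "w"):
        pass

    labels = []
    for name in sorted(os.listdir(root)):
        full_path = os.path.join(root, name)
        # os.path.isdir needs the full path, not just the entry name.
        labels.append(name + "/" if os.path.isdir(full_path) else name)

print(labels)  # ['a.txt', 'bar/']
```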

In [172]:
import os
def dirtree(path, indent=0):
    pass

In [173]:
dirtree("foo/")

foo/
|-- a.txt
|-- b.txt
|-- bar/
|   |-- .ipynb_checkpoints/
|   |-- bar2/
|   |   |-- .ipynb_checkpoints/
|   |   `-- p.txt
|   |-- p.txt
|   `-- q.txt
`-- c.txt


### $\S$ Exercise 4

Write a function `permute` that returns all possible permutations of a given list.

In [92]:
def permute(li):
    pass

In [93]:
permute([1,2,3,4])

[[1, 2, 3, 4],
 [1, 2, 4, 3],
 [1, 3, 2, 4],
 [1, 3, 4, 2],
 [1, 4, 2, 3],
 [1, 4, 3, 2],
 [2, 1, 3, 4],
 [2, 1, 4, 3],
 [2, 3, 1, 4],
 [2, 3, 4, 1],
 [2, 4, 1, 3],
 [2, 4, 3, 1],
 [3, 1, 2, 4],
 [3, 1, 4, 2],
 [3, 2, 1, 4],
 [3, 2, 4, 1],
 [3, 4, 1, 2],
 [3, 4, 2, 1],
 [4, 1, 2, 3],
 [4, 1, 3, 2],
 [4, 2, 1, 3],
 [4, 2, 3, 1],
 [4, 3, 1, 2],
 [4, 3, 2, 1]]
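
A recursive `permute` can be cross-checked against the standard library. The sketch below uses `itertools.permutations`, which yields tuples, and converts them to lists so they match the output format above; the name `expected` is just a local variable for the check.

```python
from itertools import permutations

# Reference result to compare a hand-written permute() against.
expected = [list(p) for p in permutations([1, 2, 3])]

print(expected)
# [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
```

This is meant as a testing aid; the point of the exercise is still to write the recursion by hand.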

## Part III: Decorators

A decorator is a higher-order function: it takes the function to be modified as its argument and returns a new function that wraps it and handles the arguments on its behalf.
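
As a minimal sketch of this idea, the decorator below prints a message before each call and then forwards all arguments unchanged. The names `announce` and `add` are hypothetical examples; `functools.wraps` is a standard-library helper that copies the original function's name and docstring onto the wrapper.

```python
import functools

def announce(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Runs before every call to the decorated function.
        print("calling", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@announce
def add(x, y):
    return x + y

result = add(2, 3)  # prints "calling add"
print(result)       # 5
```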

### Closure: global and local variables

In [141]:
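a = 10
def x():
    # global a
    # print("Global variables: ", globals())
    a = 100  # local to x(); the global a is left unchanged
    print("Local variables (x): ", locals())
    def y():
        # a is not defined in y(), so the one from x() is used
        print("In y(), a = ", a)
        print("Local variables (y): ", locals())
    y()
    print("In x(), a = ", a)
x()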

Local variables (x):  {'a': 100}
In y(), a =  100
Local variables (y):  {'a': 100}
In x(), a =  100


#### How does a function find a variable?
* For function `x`, if variable `a` is defined in its own scope, the local variable `a` is used; otherwise the global variable `a` is used.
* For function `y`, if variable `a` is defined in its own scope, the local variable `a` is used; otherwise the local variable `a` defined in the outer function `x` is used; failing that, the global variable `a` is used.
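
The lookup order described above can be observed directly. In this sketch all function names (`outer`, `inner_reads`, `inner_shadows`, `no_local`) are hypothetical, and the string values only label which scope supplied the variable.

```python
a = "global"

def outer():
    a = "enclosing"

    def inner_reads():
        # No local a here, so the a from outer() is found.
        return a

    def inner_shadows():
        a = "local"  # a local a hides the enclosing one
        return a

    return inner_reads(), inner_shadows()

def no_local():
    # No local or enclosing a, so the lookup falls through to the global a.
    return a

print(outer())     # ('enclosing', 'local')
print(no_local())  # global
```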

Here is an example of a decorator that traces the whole recursion process when computing the Fibonacci series:

In [123]:
def trace(func):
    func.indent = 0  # current recursion depth, stored on the function object
    def g(x):
        print('|  ' * func.indent + '|--', func.__name__, x)
        func.indent += 1
        value = func(x)
        print('|  ' * func.indent + '|--', 'return', repr(value))
        func.indent -= 1
        return value
    return g

def fib(n):
    # Compare by value with ==; `is` tests object identity
    if n == 0 or n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

In [124]:
fib = trace(fib)
fib(3)

|-- fib 3
|  |-- fib 2
|  |  |-- fib 1
|  |  |  |-- return 1
|  |  |-- fib 0
|  |  |  |-- return 1
|  |  |-- return 2
|  |-- fib 1
|  |  |-- return 1
|  |-- return 3


3

The same decoration can be applied with the `@` syntax:

In [125]:
@trace
def fib(n):
    if n == 0 or n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

In [127]:
fib(5)

|-- fib 5
|  |-- fib 4
|  |  |-- fib 3
|  |  |  |-- fib 2
|  |  |  |  |-- fib 1
|  |  |  |  |  |-- return 1
|  |  |  |  |-- fib 0
|  |  |  |  |  |-- return 1
|  |  |  |  |-- return 2
|  |  |  |-- fib 1
|  |  |  |  |-- return 1
|  |  |  |-- return 3
|  |  |-- fib 2
|  |  |  |-- fib 1
|  |  |  |  |-- return 1
|  |  |  |-- fib 0
|  |  |  |  |-- return 1
|  |  |  |-- return 2
|  |  |-- return 5
|  |-- fib 3
|  |  |-- fib 2
|  |  |  |-- fib 1
|  |  |  |  |-- return 1
|  |  |  |-- fib 0
|  |  |  |  |-- return 1
|  |  |  |-- return 2
|  |  |-- fib 1
|  |  |  |-- return 1
|  |  |-- return 3
|  |-- return 8


8

### $\S$ Exercise 1

Write a decorator function for the function `count_ways` from Part II, so that it prints out the whole counting process.

In [103]:
def trace(func):
    pass

@trace
def count_ways(total, numbers):
    pass

In [109]:
count_ways(3, numbers=[1, 2])

|-- count_ways (3,) {'numbers': [1, 2]}
|  |-- count_ways (3, [2]) {}
|  |  |-- count_ways (3, []) {}
|  |  |  |-- return 0
|  |  |-- count_ways (1, [2]) {}
|  |  |  |-- count_ways (1, []) {}
|  |  |  |  |-- return 0
|  |  |  |-- count_ways (-1, [2]) {}
|  |  |  |  |-- return 0
|  |  |  |-- return 0
|  |  |-- return 0
|  |-- count_ways (2, [1, 2]) {}
|  |  |-- count_ways (2, [2]) {}
|  |  |  |-- count_ways (2, []) {}
|  |  |  |  |-- return 0
|  |  |  |-- count_ways (0, [2]) {}
|  |  |  |  |-- return 1
|  |  |  |-- return 1
|  |  |-- count_ways (1, [1, 2]) {}
|  |  |  |-- count_ways (1, [2]) {}
|  |  |  |  |-- count_ways (1, []) {}
|  |  |  |  |  |-- return 0
|  |  |  |  |-- count_ways (-1, [2]) {}
|  |  |  |  |  |-- return 0
|  |  |  |  |-- return 0
|  |  |  |-- count_ways (0, [1, 2]) {}
|  |  |  |  |-- return 1
|  |  |  |-- return 1
|  |  |-- return 2
|  |-- return 2


2
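
Each call line in the trace above shows a tuple followed by a dictionary, which is how positional and keyword arguments look when a wrapper collects them with `*args` and `**kwargs`. The helper `show_call` below is a hypothetical name used only to illustrate that formatting; it is not the decorator itself.

```python
def show_call(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword ones.
    return "{} {}".format(args, kwargs)

print(show_call(3, numbers=[1, 2]))  # (3,) {'numbers': [1, 2]}
print(show_call(3, [2]))             # (3, [2]) {}
```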

### $\S$ Exercise 2

Write a decorator function that prints out the running time of a given function.

In [128]:
import time
def profile(fun):
    pass

@profile
def fib(n):
    if n==0 or n==1:
        return 1
    else:
        return fib(n-2) + fib(n-1)     

In [146]:
fib(30)

'11.9591488838 seconds elapsed.'
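
The elapsed time itself can be measured by reading a clock before and after the work. The sketch below times a plain block of code rather than a decorated function, and it assumes Python 3, where `time.perf_counter` is available; the summation is just a stand-in workload.

```python
import time

start = time.perf_counter()
total = sum(range(1000000))  # stand-in for the work being timed
elapsed = time.perf_counter() - start

print("{:.6f} seconds elapsed.".format(elapsed))
```

The printed value depends on the machine and varies from run to run.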