# Fundamentals of Computer Science 30398
## Lecture 5 - Python specific features

### Homework exercises:

### Exercise 1. Binary search

Write a function `binary_search` that takes two arguments as input: a list `arr` of elements of type `int`, which can be assumed to be in increasing order, and `target` of type `int`.

The function should return the position of `target` in the list `arr` (for simplicity you can assume that `target` is indeed present in the list `arr`), in time $O(\log n)$, using the binary search algorithm. You can find a short visualization of how the algorithm works in the programming lecture 4 slides.

Hint: Create two variables, `left` and `right` denoting positions of the left, and the right endpoint of the current range where the algorithm knows the element `target` needs to be. Update those positions accordingly in the `while` loop.

Bonus points: if `target` is not an element of the list `arr`, your function should return `None`.

Example:
```python
def binary_search(arr, target):
	...
	
binary_search([1,5,7,12, 33], 5)
Out: 1
binary_search([1,5,7,12,33], 33)
Out: 4
binary_search([1,5,7,12,33], 4)
Out: None
```


In [280]:
def binary_search(arr, target):
    # Invariant: if target is in arr, its index lies in [left, right].
    left = 0
    right = len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] > target:
            right = mid - 1
        else:
            left = mid + 1
    # The range is empty, so target is not in arr (this also covers arr == []).
    return None

In [281]:
binary_search([1,4,5,7,12,22], 23)

## Python indexing and slicing

In [282]:
a = [1,2,3,5,7]

In [283]:
a[3] 

5

We can use a negative index to get the $k$-th position in the list counting from the end. This:

In [284]:
a[-2]

5

Is just a shorter way of writing:

In [285]:
a[len(a) - 2]

5

**Example** We can write a short function calculating the first $n$ Fibonacci numbers for $n \geq 2$.

In [286]:
def fibonacci(n):
    fib = [1,1]
    for i in range(n-2):
        fib.append(fib[-1] + fib[-2])
    return fib

In [287]:
fibonacci(10)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

### Slicing

In [288]:
fib = fibonacci(20)

In [289]:
fib

[1,
 1,
 2,
 3,
 5,
 8,
 13,
 21,
 34,
 55,
 89,
 144,
 233,
 377,
 610,
 987,
 1597,
 2584,
 4181,
 6765]

If `l` is a list, and `a, b` are integers, we can use syntax `l[a:b]` to get a new list containing all elements from `l` in the range from `a` (inclusive) to `b` (not inclusive). That is, the following code:

In [290]:
fib[2:5]

[2, 3, 5]

Has the same functionality as:

In [291]:
tmp = []
for i in range(2, 5):
    tmp.append(fib[i])
tmp

[2, 3, 5]

but is significantly shorter.

If we omit the right bound of the range, we get all the elements from the left bound to the end of the list (i.e. by default the right bound is just `len(l)`). For example:

In [292]:
fib[10:]

[89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

Similarly, if we omit the left bound, we get all the elements from the beginning of the list to the right bound (i.e. by default, the left bound is zero). For example:

In [293]:
fib[:10]

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

Note that this operation creates a copy of the selected part of the list: internally it has to loop over all the copied elements. If the slice is very long, this can be rather costly.
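As a small sketch of what "creates a copy" means in practice, the snippet below changes a slice and checks that the original list is untouched. The variable names `numbers` and `head` are only illustrative and do not appear elsewhere in the lecture.

```python
# Illustrative names; any list behaves the same way.
numbers = [1, 1, 2, 3, 5, 8]

head = numbers[:3]      # copies the first three elements into a new list
head[0] = 100           # modifies only the copy

print(head)             # [100, 1, 2]
print(numbers)          # [1, 1, 2, 3, 5, 8]

# Negative bounds count from the end, just like negative indices.
print(numbers[-2:])     # [5, 8]
```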

### Recursive binary search

**WARNING** The following code is extremely slow, and even though it returns the right result, it completely misses the idea of binary search!

We can try to use what we just learned together with recursion to write a (potentially) more natural implementation of binary search (assuming that `target` is in the list `lst`):

In [294]:
def binary_search_rec(lst, target):
    if len(lst) == 1:
        return 0
        
    mid = len(lst) // 2
    if lst[mid] == target:
        return mid
    if lst[mid] > target:
        return binary_search_rec(lst[:mid], target)
    else:
        return mid + 1 + binary_search_rec(lst[mid+1:], target)

We can check that it returns correct answers when `target` is present in the list:

In [295]:
binary_search_rec([1,2,6,7, 22, 45], 22)

4

In [296]:
binary_search_rec([1,2,6,7, 22, 45], 6)

2

But this implementation completely misses the point of binary search! Note that in the code we are using list slicing, namely the operations `lst[:mid]` and `lst[mid+1:]`. Those operations create a copy of the entire left/right half of the list, which internally takes a long loop.

The entire point of binary search is that we can find the position of an element in a sorted list in time $O(\log n)$. The implementation above instead runs in time $O(n)$ because of the slices `lst[:mid]` and `lst[mid+1:]`: the first call copies about $n/2$ elements, the next one about $n/4$, and so on, for a total of roughly $n$ copied elements. To get complexity $O(n)$ we did not have to bother with binary search: we could have just scanned the entire list with a simple loop!
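One possible way to keep the recursive formulation without copying is to pass the endpoints of the current range instead of a sliced list. The sketch below is an illustration only; the names `search_range` and `binary_search_rec_idx` are hypothetical helpers, not part of the lecture code.

```python
def search_range(lst, target, left, right):
    """Search for target among lst[left], ..., lst[right] without copying."""
    if left > right:            # empty range: target is not present
        return None
    mid = (left + right) // 2
    if lst[mid] == target:
        return mid
    if lst[mid] > target:
        return search_range(lst, target, left, mid - 1)
    return search_range(lst, target, mid + 1, right)


def binary_search_rec_idx(lst, target):
    """Recursive binary search; each call does O(1) work besides recursing."""
    return search_range(lst, target, 0, len(lst) - 1)


data = [1, 2, 6, 7, 22, 45]
print(binary_search_rec_idx(data, 22))   # 4
print(binary_search_rec_idx(data, 6))    # 2
print(binary_search_rec_idx(data, 5))    # None
```

Because the indices always refer to the original list, no offset such as `mid + 1 + ...` has to be added to the result of the recursive call.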

Let's see in practice that it is in fact much slower than the actual binary search:

In [297]:
lst = []
for i in range(50_000_000):
    lst.append(i*i)

In [298]:
binary_search_rec(lst, 123_456*123_456)

123456

In [299]:
binary_search(lst, 123_456*123_456)

123456

### Back to homework exercises
### Exercise 2: listing all subsets

Write a function `all_subsets` that takes a number $n$ and returns a list of length $2^n$. The elements of this list should themselves be lists, corresponding to all subsets of the set $\{0, 1, 2, \ldots, n-1\}$.

Example:

```python
def all_subsets(n):
	...

all_subsets(2)
Out: [[], [1], [0], [0, 1]]

all_subsets(3)
Out: [[], [2], [1], [1, 2], [0], [0, 2], [0, 1], [0, 1, 2]]
```
The subsets can appear in any order in the output list. The elements in each subset should be given in increasing order.

In [300]:
def all_subsets(n):
    if n== 0:
        return [[]]
        
    previous = all_subsets(n-1)
    result = []
    for x in previous:
        result.append(x)
        y = x.copy()
        y.append(n-1)
        result.append(y)
    return result

In [301]:
all_subsets(2)

[[], [1], [0], [0, 1]]

In [302]:
all_subsets(3)

[[], [2], [1], [1, 2], [0], [0, 2], [0, 1], [0, 1, 2]]
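The requirements of the exercise (exactly $2^n$ subsets, no repetitions, elements in increasing order) can be checked mechanically. The following is a minimal sketch of such a check; the function name `looks_correct` is hypothetical, and `all_subsets` is repeated so that the snippet runs on its own.

```python
def all_subsets(n):
    # Same recursive idea as the solution above.
    if n == 0:
        return [[]]
    result = []
    for x in all_subsets(n - 1):
        result.append(x)
        y = x.copy()
        y.append(n - 1)
        result.append(y)
    return result


def looks_correct(n):
    subsets = all_subsets(n)
    if len(subsets) != 2 ** n:
        return False
    for s in subsets:
        for i in range(len(s)):
            # Every element must come from {0, ..., n-1}.
            if s[i] < 0 or s[i] >= n:
                return False
            # Elements must be strictly increasing.
            if i > 0 and s[i - 1] >= s[i]:
                return False
        # No subset may be listed twice.
        if subsets.count(s) != 1:
            return False
    return True


for n in range(6):
    print(n, looks_correct(n))
```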

### List concatenation

We can use the operator `+` on two lists to create a new list containing first the content of the left list, and then the content of the right one. For example:

In [303]:
[1,2,3] + [7,8,9]

[1, 2, 3, 7, 8, 9]

This is often very convenient, but keep in mind that this is potentially a slow operation, as it copies the contents of both lists into a new one (so it has to internally loop over them). Consider the difference between:

In [304]:
a = [1,2,3,4,5]
a.append(2)

and

In [305]:
a = a + [2]

The first code just appends `2` at the end of the existing list `a` (modifying it). This is very fast and takes (amortized) time $O(1)$. The second code creates a new list, copies the entire content of the list `a` there, appends the element `2`, and changes the reference `a` to point to this new list. If the list `a` is long, and we do this repeatedly, this can be very slow.
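The difference between modifying a list and rebinding a name can be made visible with a second name for the same list. In this sketch the name `alias` is an illustrative assumption; `is` tests whether two names refer to the same object.

```python
a = [1, 2, 3, 4, 5]
alias = a            # a second name for the same list object

a.append(2)          # modifies the existing list in place
print(alias is a)    # True
print(alias)         # [1, 2, 3, 4, 5, 2]

a = a + [2]          # builds a new list and rebinds the name a
print(alias is a)    # False
print(alias)         # [1, 2, 3, 4, 5, 2]
print(a)             # [1, 2, 3, 4, 5, 2, 2]
```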

**Example**
Let's say that we try to create a list of all integers from $0$ to $n-1$. This would be a terrible way to do it:

In [306]:
def all_integers_up_to(n):
    result = []
    for i in range(n):
        result = result + [i]
    return result

Indeed, if we try it with even a moderately large $n$, this takes a long time, because iteration $i$ copies $i$ elements, for a total of about $n^2/2$ copied elements:

In [307]:
res = all_integers_up_to(100_000)

In contrast, the usual way of doing this keeps appending the number to the same list:

In [308]:
def correct_all_integers_up_to(n):
    result = []
    for i in range(n):
        result.append(i)
    return result

In [309]:
res = correct_all_integers_up_to(100_000)
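To see the gap between the two versions, one can time them with `time.perf_counter` from the standard library. This is only a rough sketch: the function names and the size `n = 10_000` are assumptions chosen so that the slow version still finishes quickly, and the measured times depend on the machine.

```python
import time


def concat_version(n):
    result = []
    for i in range(n):
        result = result + [i]   # copies the whole list in every iteration
    return result


def append_version(n):
    result = []
    for i in range(n):
        result.append(i)        # adds one element to the existing list
    return result


n = 10_000  # assumed size; increase it to make the difference more visible

start = time.perf_counter()
slow = concat_version(n)
slow_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = append_version(n)
fast_seconds = time.perf_counter() - start

print(slow == fast)  # both versions build the same list
print(f"concatenation: {slow_seconds:.4f} s, append: {fast_seconds:.4f} s")
```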

Going back to our `all_subsets` function, this is exactly what we wanted though: copy the list `x` and add an element at the end of it. We can use this operation to simplify our solution a bit, and make it cleaner:

In [310]:
def all_subsets(n):
    if n== 0:
        return [[]]
        
    previous = all_subsets(n-1)
    result = []
    for x in previous:
        result.append(x)
        result.append(x + [n-1])
    return result

In [311]:
all_subsets(3)

[[], [2], [1], [1, 2], [0], [0, 2], [0, 1], [0, 1, 2]]

## New exercises

### Exercise 3: square root

Write a function `sqrt(x, precision)` that takes a positive argument `x` of type `float`, and `precision` of type `float`. It should return $\sqrt{x}$ up to error `precision`. That is, it should return value $y$, s.t. $|y - \sqrt{x}| < \mathrm{precision}$. Use binary search to solve this problem.

In [312]:
def sqrt(x, precision):
    left = 0
    right = x+1
    while right - left > precision:
        mid = (right + left) / 2
        if mid*mid < x:
            left = mid
        else:
            right = mid
    return (right + left)/2

In [313]:
sqrt(3.0, 0.00000001)

1.732050810009241

A much slower linear-search attempt at computing the square root is a bad idea, since it needs about $\sqrt{x} / \mathrm{precision}$ iterations:

In [314]:
def sqrt_slow(x, precision):
    left = 0
    while left * left < x:
        left = left + precision
    return left

In [315]:
sqrt_slow(3.0, 0.00000001)

1.7320508078408496
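To compare the two approaches without waiting for the slow one, we can count loop iterations. The sketch below uses a hypothetical helper `bisection_steps` that repeats the loop of `sqrt` but returns only the number of iterations. Since each iteration halves a range of initial length $x + 1$, we expect about $\log_2((x+1)/\mathrm{precision})$ iterations.

```python
import math


def bisection_steps(x, precision):
    """Run the same loop as sqrt, but return the number of iterations."""
    left = 0
    right = x + 1
    steps = 0
    while right - left > precision:
        mid = (right + left) / 2
        if mid * mid < x:
            left = mid
        else:
            right = mid
        steps += 1
    return steps


x = 3.0
precision = 0.00000001

print(bisection_steps(x, precision))                  # 29
print(math.ceil(math.log2((x + 1) / precision)))      # 29

# Estimated iterations of the linear search: about 1.7e8 for these inputs.
print(round(math.sqrt(x) / precision))
```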

## Python features: keyword arguments, default values of arguments

In [316]:
sqrt(3.0, 0.0001)

1.732025146484375

When calling a function in Python, we can explicitly state which arguments have which values with the following syntax:

In [317]:
sqrt(3.0, precision=0.001)

1.73193359375

In the case of named arguments, we do not need to pass them in the same order as in function definition.

In [318]:
sqrt(precision=0.001, x = 3.0)

1.73193359375

This is quite useful, depending on the context. For example, when you read code that calls the square root function after a while, it may be unclear to you why the function takes two parameters. When you encounter a call such as `sqrt(3.0)` somewhere in the code, it is easy to figure out that it computes the square root of $3$. When you see just `sqrt(3.0, 0.0001)`, the meaning of the second parameter might be less obvious, and you will have to check the documentation. In this case, it is better to explicitly name the second parameter when calling the function:

In [319]:
sqrt(3.0, precision=0.0001)

1.732025146484375

This call is unambiguous.

### Function parameters can have default value

In [320]:
def sqrt(x, precision = 0.0000001):
    left = 0
    right = x+1
    while right - left > precision:
        mid = (right + left) / 2
        if mid*mid < x:
            left = mid
        else:
            right = mid
    return (right + left)/2

In this case the parameter `precision` is optional. If we do not specify it when calling `sqrt`, its value is set to `0.0000001`, but we can still change it if we wish:

In [321]:
sqrt(3.0)

1.7320508062839508

In [322]:
sqrt(3.0, 0.001)

1.73193359375

In [323]:
sqrt(3.0, precision=0.1)

1.71875

### Warning: default values should always be immutable!

**Example:**
If we create a function with a default value that is a list, this list is created only once, when the function is defined, and all calls refer to the same list. This is almost never the intended behavior.

In [324]:
def func(arg = []):
    arg.append(3)
    return arg

In [325]:
func([1,5,6,7])

[1, 5, 6, 7, 3]

In [326]:
func()

[3]

In [327]:
func()

[3, 3]

In [328]:
func()

[3, 3, 3]
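One way to see where the shared list lives is to inspect the function's `__defaults__` attribute, which stores the default values as a tuple. This sketch repeats the definition of `func` so that it runs on its own; the names `first` and `second` are illustrative.

```python
def func(arg=[]):
    arg.append(3)
    return arg


print(func.__defaults__)   # ([],)

func()
func()
print(func.__defaults__)   # ([3, 3],)

# Every call without an argument returns the very same list object.
first = func()
second = func()
print(first is second)     # True
```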

A simple way to avoid problems like this is to never use a mutable object as a default value for an argument. If we want an optional argument that is a list, and that list should be a new empty list when not specified explicitly, we should set the default value of the argument to `None` and create the list in the body of the function.

In [329]:
def func_good(arg = None):
    # Compare with `is`: None is a single, unique object.
    if arg is None:
        arg = []  # a fresh list on every call
    arg.append(3)
    return arg

In [330]:
func_good([1,5,6,7])

[1, 5, 6, 7, 3]

In [331]:
func_good()

[3]

In [332]:
func_good()

[3]

In [333]:
func_good()

[3]

## Libraries

Another reason for the existence of functions: probably someone has already written the function you need. Functions that other people wrote and that you can use in your code are usually bundled together in **libraries**: a library is just a collection of functions that you might want to use for a specific purpose. In your code (conventionally at the beginning of the file) you can import the libraries you want, and then call functions from those libraries:

In [334]:
import math

In [335]:
math.sqrt(6)

2.449489742783178

To read how to use a specific function that you haven't used before (or one that you have used, but don't remember exactly), you can look up its documentation using the `help` function:

In [336]:
help(math.sqrt)

Help on built-in function sqrt in module math:

sqrt(x, /)
    Return the square root of x.



For example, you might invoke `help` on the function `print`, which we have used for a couple of lectures now, to learn that it takes optional keyword arguments, among them `sep` and `end`. This might be useful:

In [337]:
help(print)

Help on built-in function print in module builtins:

print(*args, sep=' ', end='\n', file=None, flush=False)
    Prints the values to a stream, or to sys.stdout by default.

    sep
      string inserted between values, default a space.
    end
      string appended after the last value, default a newline.
    file
      a file-like object (stream); defaults to the current sys.stdout.
    flush
      whether to forcibly flush the stream.



In [338]:
print("abc", "cde", 12)
print("Another Print")

abc cde 12
Another Print


In [339]:
print("abc", "cde", 12, sep="|")

abc|cde|12


In [340]:
print("abc", "cde", 12, end="")
print("Another Print!")

abc cde 12Another Print!


To specify the help string (docstring) for our own function, we put a string in triple double-quotation marks on the line immediately following the `def` line.

In [341]:
def sqrt(x, precision = 0.0000001):
    """ This function returns square root of x, up to precision specified by argument
    precision.
    """
    left = 0
    right = x+1
    while right - left > precision:
        mid = (right + left) / 2
        if mid*mid < x:
            left = mid
        else:
            right = mid
    return (right + left)/2

In [342]:
help(sqrt)

Help on function sqrt in module __main__:

sqrt(x, precision=1e-07)
    This function returns square root of x, up to precision specified by argument
    precision.



### Speeding up our code to find all primes in a given range:

Our original code for listing all primes in range was this

In [343]:
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

In [344]:
def find_all_primes(n):
    result = []
    for i in range(n):
        if is_prime(i):
            result.append(i)
    return result

This code, as it turns out, runs in time $O(n^2 / \log n)$. If we try to count the number of primes in even a moderately large range, it is already slow:

In [345]:
len(find_all_primes(50_000))

5133

Here is a simple idea to speed this up. Note that any number $n > 1$ which is not prime has a divisor greater than $1$ but at most $\sqrt{n}$.

Indeed, if $a$ divides $n$, then $n/a$ also divides $n$, and since $a \cdot (n/a) = n$, at least one of those two numbers is at most $\sqrt{n}$. Let's try to use this idea to improve the running time of the `is_prime` function. `int(math.sqrt(n))` gives us the largest integer that is at most $\sqrt{n}$.
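The pairing of divisors can be checked on a concrete number. The sketch below lists all pairs $(a, n/a)$ for the illustrative choice $n = 36$ and confirms that the smaller member of each pair never exceeds the square root. It uses `math.isqrt` (available since Python 3.8), which computes the integer square root exactly; this is an alternative to `int(math.sqrt(n))`, not what the lecture code uses.

```python
import math

n = 36                 # illustrative choice
root = math.isqrt(n)   # largest integer r with r * r <= n
print(root)            # 6

for a in range(1, n + 1):
    if n % a == 0:
        b = n // a
        # The smaller member of each pair (a, b) is at most the square root.
        print(a, b, min(a, b) <= root)
```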

The first try to improve the `is_prime` function may look like this:

**WARNING:** This is wrong

In [346]:
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(math.sqrt(n))):
        if n % i == 0:
            return False
    return True

In [347]:
is_prime(5)

True

In [348]:
is_prime(6)

True

In [349]:
is_prime(9)

True

**Exercise** Can you spot the mistake?

The issue is that `for i in range(2, b):` iterates over `i` from `2` up to `b-1`, not up to `b`: the range does not include its right endpoint. When `n` is the square of a prime number, like 9, we have

In [350]:
int(math.sqrt(9))

3

so the loop `for i in range(2,3):` will try only `i = 2`. This is not a divisor of $9$, so the function falls through to `return True`. These kinds of off-by-one errors are very common, and with some practice you will be able to spot them faster and faster (and potentially make them less and less frequently). Let's see the corrected version:

In [351]:
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(math.sqrt(n))+1):
        if n % i == 0:
            return False
    return True

In [352]:
is_prime(9)

False

Now we can find all primes up to $n$ in time roughly $O(n^{3/2} / \log n)$, which is much faster than previously. Let's see how fast this works in practice:

In [353]:
len(find_all_primes(50_000))

5133
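After an optimization like this, it is worth confirming that the fast version still agrees with the slow one. A minimal sketch of such a check follows; the names `is_prime_slow` and `is_prime_fast` are introduced here only to keep both versions side by side.

```python
import math


def is_prime_slow(n):
    if n <= 1:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


def is_prime_fast(n):
    if n <= 1:
        return False
    # The +1 makes the range include int(math.sqrt(n)) itself.
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True


mismatches = []
for n in range(2_000):
    if is_prime_slow(n) != is_prime_fast(n):
        mismatches.append(n)

print(mismatches)   # []
```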

### List comprehension: filter, map, map+filter

Let's write a function that outputs a list containing all odd numbers from the input list `lst`.

In [354]:
def all_odds(lst):
    result = []
    for x in lst:
        if x % 2 == 1:
            result.append(x)
    return result

In [355]:
all_odds([2,3,4,6,9,11,22])

[3, 9, 11]

In [356]:
lst = [2,3,4,6,9,11,22]

This type of pattern (creating a list of all elements from the input list that satisfy a specific condition) is something we have already done a couple of times, and Python has specific syntax that makes writing it much less verbose. Instead of the function above we can write:

In [357]:
[ x for x in lst if x % 2 == 1]

[3, 9, 11]
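A side remark on the condition `x % 2 == 1`: in Python the result of `%` with a positive right operand is never negative, so the same test also recognizes negative odd numbers. A short sketch with illustrative values:

```python
values = [-3, -2, -1, 0, 1, 2, 3]

odds = [x for x in values if x % 2 == 1]
print(odds)    # [-3, -1, 1, 3]

evens = [x for x in values if x % 2 == 0]
print(evens)   # [-2, 0, 2]
```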

The function `find_all_primes` now can be rewritten with this new trick to a simple one-liner:

In [358]:
all_primes_in_range = [ x for x in range(2, 50_000) if is_prime(x)]

In [359]:
len(all_primes_in_range)

5133

Another common pattern that we have seen a couple of times before, is trying to write a function that produces a new list, obtained by applying the same operation to all elements of some previous list. For example, let's try to write a function that squares all elements in the input list:

In [360]:
def square_all_elements(lst):
    result = []
    for x in lst:
        result.append(x*x)
    return result

In [361]:
square_all_elements([1,2,3,10, 23])

[1, 4, 9, 100, 529]

Shorthand for this in Python:

In [362]:
[ x*x for x in lst ]

[4, 9, 16, 36, 81, 121, 484]

We can combine those two constructions. Here is code that produces the squares of all prime numbers in the range up to $999$.

In [363]:
squares_of_primes = [ x * x for x in range(1000) if is_prime(x) ]
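To make the combined form less mysterious, the sketch below writes the same computation as an explicit loop and compares the two. The bound `limit = 20` is an assumption chosen to keep the output short, and `is_prime` is repeated so that the snippet runs on its own.

```python
import math


def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True


limit = 20  # assumed small bound, only to keep the output short

# Explicit loop: the `if` part filters, the appended expression maps.
squares = []
for x in range(limit):
    if is_prime(x):
        squares.append(x * x)

print(squares)   # [4, 9, 25, 49, 121, 169, 289, 361]
print(squares == [x * x for x in range(limit) if is_prime(x)])   # True
```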

In [364]:
squares_of_primes

[4,
 9,
 25,
 49,
 121,
 169,
 289,
 361,
 529,
 841,
 961,
 1369,
 1681,
 1849,
 2209,
 2809,
 3481,
 3721,
 4489,
 5041,
 5329,
 6241,
 6889,
 7921,
 9409,
 10201,
 10609,
 11449,
 11881,
 12769,
 16129,
 17161,
 18769,
 19321,
 22201,
 22801,
 24649,
 26569,
 27889,
 29929,
 32041,
 32761,
 36481,
 37249,
 38809,
 39601,
 44521,
 49729,
 51529,
 52441,
 54289,
 57121,
 58081,
 63001,
 66049,
 69169,
 72361,
 73441,
 76729,
 78961,
 80089,
 85849,
 94249,
 96721,
 97969,
 100489,
 109561,
 113569,
 120409,
 121801,
 124609,
 128881,
 134689,
 139129,
 143641,
 146689,
 151321,
 157609,
 160801,
 167281,
 175561,
 177241,
 185761,
 187489,
 192721,
 196249,
 201601,
 208849,
 212521,
 214369,
 218089,
 229441,
 237169,
 241081,
 249001,
 253009,
 259081,
 271441,
 273529,
 292681,
 299209,
 310249,
 316969,
 323761,
 326041,
 332929,
 344569,
 351649,
 358801,
 361201,
 368449,
 375769,
 380689,
 383161,
 398161,
 410881,
 413449,
 418609,
 426409,
 434281,
 436921,
 452929,
 458329,
 ...]


### Tuples

Tuples are another type in Python that behaves similarly to lists in many respects. We create a tuple using round brackets instead of square brackets:

In [365]:
a = (12, 34,56,"String", 2.3)

In [366]:
a

(12, 34, 56, 'String', 2.3)

As with a list, we can iterate over the elements of a tuple:

In [367]:
for x in a:
    print(x)

12
34
56
String
2.3


Or access the $k$-th element of the tuple.

In [368]:
a[2]

56

The main difference is that tuples are immutable, in contrast to lists. We cannot change what the $k$-th element of the tuple is referencing:

In [369]:
a[2] = 4

TypeError: 'tuple' object does not support item assignment
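Note that immutability concerns only what each position of the tuple refers to. If an element is itself a mutable object such as a list, that object can still be modified. A small sketch with an illustrative tuple `point`:

```python
point = (1, [2, 3])

# Rebinding a position of the tuple is not allowed.
try:
    point[1] = [2, 3, 4]
except TypeError as error:
    print("TypeError:", error)

# The list stored inside the tuple is still an ordinary, mutable list.
point[1].append(4)
print(point)   # (1, [2, 3, 4])
```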

Tuples are useful, for example, when we want to return more than a single value from a function: we still return a single value, but this value is in fact of type `tuple` and contains two other values. For example:

### Exercise 4

Write a function `mean_and_std_dev(x)` which takes a list `x` of floats as input, and outputs a pair $(\mu, \sigma)$, where
$$
\mu = \frac{1}{n} \sum_{1\leq i\leq n} x_i,
$$
and
$$
\sigma = \sqrt{\frac{1}{n} \sum_{1 \leq i \leq n} (x_i - \mu)^2}.
$$

In [None]:
def mean(lst):
    result = 0
    for x in lst:
        result = result + x
    return result/len(lst)

In [None]:
def mean_and_std_dev(lst):
    avg = mean(lst)
    sq_diff = [ (x - avg)*(x - avg) for x in lst ]
    std_dev = math.sqrt(mean(sq_diff))
    return (avg, std_dev)

Now we can assign the first value of the returned tuple to one variable, and the second to another:

In [None]:
x, y = mean_and_std_dev([1,2,3,4,5])

In [None]:
x

In [None]:
y
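As a sanity check, the result can be compared with the standard library: `statistics.mean` computes the mean and `statistics.pstdev` computes the population standard deviation, which divides by $n$ just like the formula for $\sigma$ above. The sketch repeats the two functions so that it runs on its own; the names `data`, `mu` and `sigma` are illustrative.

```python
import math
import statistics


def mean(lst):
    result = 0
    for x in lst:
        result = result + x
    return result / len(lst)


def mean_and_std_dev(lst):
    avg = mean(lst)
    sq_diff = [(x - avg) * (x - avg) for x in lst]
    std_dev = math.sqrt(mean(sq_diff))
    return (avg, std_dev)


data = [1, 2, 3, 4, 5]
mu, sigma = mean_and_std_dev(data)
print(mu, sigma)   # 3.0 1.4142135623730951

# Compare against the standard library, allowing for rounding error.
print(math.isclose(mu, statistics.mean(data)))       # True
print(math.isclose(sigma, statistics.pstdev(data)))  # True
```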