<div class="frontmatter text-center">
<h1>Introduction to Data Science and Programming</h1>
<h2>Exercise 9: Python Crash Course - Binary search and conda</h2>
<h3>IT University of Copenhagen, Fall 2023</h3>
</div>

* Task 1: `conda`
* Task 2: Implementing a binary search algorithm
* Task 3: Improving the efficiency of a recursive Fibonacci function with "caching"

# Task 1: Creating the `websoup` environment

In this task, we will: 
* learn how to use conda to manage environments; 
* create an environment `websoup`; 
* and install the package `beautifulsoup4` (which we need for lecture & exercise 10).

Detailed instructions are [here](https://github.com/anastassiavybornova/pythoncrashcourse/blob/main/exercise09_conda.md). 

> **Troubleshooting**: If you get lost within the detailed instructions, have no more time left, and really just want to install the `beautifulsoup4` package, then head [here](https://github.com/anastassiavybornova/pythoncrashcourse/blob/main/exercise09_justsoup.md) for instructions. (This is the minimum needed for you to be able to code along during lecture & exercise 10).

# Task 2: `binary_search`

Write a function `binary_search` that:
* takes two input arguments: an **already sorted** list of numbers; and a **number**
* returns `True` or `False`, depending on whether the number is on the list or not
* implements a **binary search algorithm** ("cutting the search space in half at each step") to find out whether the number is on the list
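To see why "cutting the search space in half" is so effective, here is a small sketch that counts how many times a search space of `n` numbers can be halved before it is empty. This is the worst-case number of "middle" checks a binary search makes on a sorted list of length `n`. The helper name `max_probes` is hypothetical and not part of the exercise.

```python
def max_probes(n):
    """Count how many halvings shrink a search space of n items to nothing.

    This equals the worst-case number of "middle" comparisons binary search
    makes on a sorted list of length n (hypothetical helper, for illustration).
    """
    steps = 0
    while n > 0:
        n //= 2      # after one check, at most half of the items remain
        steps += 1
    return steps

for size in [10, 1_000, 1_000_000]:
    print(size, "numbers ->", max_probes(size), "checks at most")
```

A list of a million numbers needs at most 20 checks, whereas checking every number one by one could take a million.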

If you need some inspiration on how to approach this task, check out **Chapter 8 - Friday: Writing a Binary Search** from the book [**Python Projects for Beginners**](https://learnit.itu.dk/pluginfile.php/356837/mod_page/content/8/PythonProjectsForBeginners.pdf) (the PDF is available for download on the [Self-study resources page in learnIT](https://learnit.itu.dk/mod/page/view.php?id=185265)). 

You can use the `numbers.csv` file (provided with this notebook, in the `files` folder) to test your function. Once the numbers have been read into `my_list` (see below):
* `binary_search(my_list, 4403)` should return `True` (4403 is on the list)
* `binary_search(my_list, 52301)` should return `False` (52301 is not on the list)

In [None]:
def binary_search(my_sorted_list, my_item):
    '''
    your docstring here
    '''

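    # define the "search limits": left_limit = 0 (first index),
    # right_limit = len(my_sorted_list) - 1 (last index)

    # while the search space still contains at least one number
    # (i.e. while left_limit <= right_limit):

        # find the "middle index" between left_limit and right_limit
        # (// is integer division, which always rounds down)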
        
        # if the item at middle_index position is the one we're looking for,
        # we're done! we found the item on the list - so return True
        
        # if at the "middle" position we have something SMALLER than my_item,
        # my_item can only be in the right half of the search space,
        # so we move the left limit just past the middle (middle_index + 1):

        # if at the "middle" position we have something BIGGER than my_item,
        # then my_item can only be in the left half of the search space,
        # so we move the right limit just before the middle (middle_index - 1):

    # the while loop ends when left_limit becomes LARGER than right_limit,
    # which means that we have searched the entire space without finding a match;
    # in that case our number is not on the list,
    # so return False:
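If you get stuck, here is one possible way to fill in the skeleton above (try your own version first). It is a sketch that follows the comments step by step and assumes the list is sorted in ascending order.

```python
def binary_search(my_sorted_list, my_item):
    '''
    Return True if my_item is in my_sorted_list (sorted in ascending order),
    otherwise False, by halving the search space at each step.
    '''
    # define the "search limits": first and last index still in play
    left_limit = 0
    right_limit = len(my_sorted_list) - 1

    # keep going while the search space contains at least one number
    while left_limit <= right_limit:
        # middle index; // is integer division (rounds down)
        middle_index = (left_limit + right_limit) // 2
        middle_item = my_sorted_list[middle_index]

        if middle_item == my_item:
            return True                      # found it
        elif middle_item < my_item:
            left_limit = middle_index + 1    # continue in the right half
        else:
            right_limit = middle_index - 1   # continue in the left half

    # left_limit > right_limit: the search space is empty, no match
    return False

print(binary_search([1, 3, 5, 7, 9], 7))   # True
print(binary_search([1, 3, 5, 7, 9], 4))   # False
```

With the data loaded as described below, `binary_search(my_list, 4403)` should then return `True`.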

**Read in the data from `numbers.csv` to a list**

In [None]:
# import pandas
import pandas as pd
# read in numbers.csv into dataframe called "df"; 
# header = None means that in the csv, the first line is not a header row, but already contains data
df = pd.read_csv("files/numbers.csv", header = None)
# the numbers are in a column labelled 0 (the integer 0, not the string "0"),
# because no header was provided; use list() to convert the column into a list
my_list = list(df[0])
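Binary search only gives correct answers on sorted input, so it can be worth checking that `my_list` really is sorted. The sketch below does that, and also shows a pandas-free way to load a one-column CSV with the standard-library `csv` module. It assumes (as `header = None` above suggests) one whole number per line and no header row. The helper names `read_numbers` and `is_sorted` are hypothetical, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import csv
import io

def read_numbers(file_obj):
    """Read a one-column, header-less CSV of whole numbers into a list of ints."""
    # csv.reader yields an empty list for blank lines, so skip those
    return [int(row[0]) for row in csv.reader(file_obj) if row]

def is_sorted(numbers):
    """True if every element is <= the one after it (ascending order)."""
    return all(a <= b for a, b in zip(numbers, numbers[1:]))

# stand-in for the real file, so that this example is self-contained
fake_file = io.StringIO("3\n17\n4403\n52000\n")
numbers = read_numbers(fake_file)
print(numbers)             # [3, 17, 4403, 52000]
print(is_sorted(numbers))  # True
```

With the real file, you would write `with open("files/numbers.csv", newline="") as f: my_list = read_numbers(f)`, and then check `is_sorted(my_list)`.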

In [None]:
# check if your binary_search function works as expected

# Task 3: Recursive Fibonacci with memory caching

> ADVANCED material!

In the lecture, we got familiar with the recursive [Fibonacci sequence](https://en.wikipedia.org/wiki/Fibonacci_sequence) (where each number is the sum of the preceding two numbers). Below, we have already implemented a recursive function, `fib(n)`, that returns the n-th element of the Fibonacci sequence. 
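As a reminder of the definition (fib(0) = 0, fib(1) = 1, and every later number is the sum of the two before it), here is a non-recursive sketch that builds the first few numbers bottom-up. The name `fib_sequence` is hypothetical, and this iterative approach is a different technique from the recursive `fib` below. It is shown only to make the sequence itself concrete.

```python
def fib_sequence(count):
    """Return the first `count` Fibonacci numbers, starting from fib(0) = 0."""
    sequence = []
    a, b = 0, 1          # fib(0), fib(1)
    for _ in range(count):
        sequence.append(a)
        a, b = b, a + b  # next number = sum of the previous two
    return sequence

print(fib_sequence(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```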

Try to compute `fib(42)`; you will notice that it takes a long time, because the recursive function recomputes the same values `fib(n)` over and over again, so the number of calls grows exponentially with `n`.

Try to improve the performance of this function by implementing a new function, `fib_cache(n)`, which returns exactly the same values as `fib(n)` but also uses a "memory cache". In our case, the cache is a simple dictionary, `memo_dict`, defined OUTSIDE the function, which stores every already computed value as a key-value pair (key: `n`; value: `fib(n)`). At each call, `fib_cache(n)` FIRST looks up `n` in the memory cache (the dictionary) and only computes `fib(n)` if it is not there yet.

Then, try to compute the Fibonacci number for n = 42 again by running `fib_cache(42)`. Did the performance improve? Why? Do `fib(42)` and `fib_cache(42)` return the same result? And what is now inside the `memo_dict` dictionary?
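To answer "did the performance improve?" with numbers rather than a feeling, you can time each call. Below is a minimal sketch using the standard-library `time.perf_counter`; the helper name `time_call` is hypothetical, and the example times a built-in function so that it runs on its own.

```python
import time

def time_call(func, *args):
    """Call func(*args) once; return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

# example with a built-in; in the notebook you could pass fib or fib_cache instead
result, seconds = time_call(sum, range(1_000_000))
print(result, f"{seconds:.4f} s")
```

For example, `time_call(fib, 30)` versus `time_call(fib_cache, 30)`. Keep in mind that `memo_dict` stays filled between calls, so running `fib_cache(42)` a second time is just a dictionary lookup. In a Jupyter notebook, the `%time` magic (e.g. `%time fib(30)`) gives a similar quick measurement.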

If you need some inspiration on how to approach this task, check out **Chapter 8 - Thursday, subchapters "Understanding memoization" and "Using memoization"** from the book [**Python Projects for Beginners**](https://learnit.itu.dk/pluginfile.php/356837/mod_page/content/8/PythonProjectsForBeginners.pdf) (the PDF is available for download on the [Self-study resources page in learnIT](https://learnit.itu.dk/mod/page/view.php?id=185265)). 
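For comparison only: Python's standard library can add the same kind of memoization with the `functools.lru_cache` decorator. This is a different, built-in tool, not the hand-written dictionary cache this task asks you to implement, and the name `fib_lru` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None: unbounded cache, like a plain dict
def fib_lru(n):
    """Recursive Fibonacci; lru_cache stores each fib_lru(n) after its first computation."""
    if n == 1:
        return 1
    elif n == 0:
        return 0
    else:
        return fib_lru(n - 1) + fib_lru(n - 2)

print(fib_lru(42))            # 267914296
print(fib_lru.cache_info())   # hits, misses and current size of the cache
```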


In [None]:
# run this cell to define the simple fib(n) function (WITHOUT memory cache)
def fib(n):
    # base cases: fib(1) = 1 and fib(0) = 0
    if n == 1:
        return 1
    elif n == 0:
        return 0
    else:
        return fib(n-1) + fib(n-2)

In [None]:
# try to compute fib(42)

In [None]:
# implement a function WITH memory cache

# initialize an empty dictionary, will be used for "caching" the already computed fib(n) values
memo_dict = {}

# the function below considers two options:

# option 1: we are calling fib_cache(n) for the first time for a given n (0, 1, 2, 3, ...)
# then we need to: compute fib(n) for this n (recursively, via fib_cache(n-1) + fib_cache(n-2),
# so that the inner calls also use the cache); save it in memo_dict with key n; and then return it.
# (we are using the variable "result" for this)

# option 2: we are calling fib_cache(n) for an n whose fibonacci number has already been computed;
# then we need to just look it up in the dictionary

def fib_cache(n):
    '''
    your docstring here
    '''
    
    # "option 2" from above: if the fibonacci number for this n 
    # has already been calculated,
    # look no further, just return its value from the memo_dict
    
    # "option 1" from above:
    # else, if n is not in memo_dict,
    # it means we haven't computed fib(n) yet, so let's 
    # do it here; start with the base cases n == 0 and n == 1,
    # otherwise add up fib_cache(n-1) and fib_cache(n-2)
    
    # save the fibonacci number computed at this step in memo_dict (key: n)
    
    # return the result variable (which now contains fib(n) for the given n)
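One possible way to fill in the skeleton above (try your own version first), following the two options described in the comments:

```python
memo_dict = {}  # cache: keys are n, values are fib(n)

def fib_cache(n):
    '''
    Return the n-th Fibonacci number, storing every computed value in
    memo_dict so that each fib_cache(n) is computed only once.
    '''
    # option 2: already computed, so just look it up
    if n in memo_dict:
        return memo_dict[n]

    # option 1: not computed yet; handle the base cases, otherwise recurse
    if n == 0:
        result = 0
    elif n == 1:
        result = 1
    else:
        # recurse through fib_cache (not fib!) so the inner calls use the cache too
        result = fib_cache(n - 1) + fib_cache(n - 2)

    # remember the result for next time
    memo_dict[n] = result
    return result

print(fib_cache(42))   # 267914296
print(len(memo_dict))  # 43 (keys 0 through 42)
```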


In [None]:
# compute fib_cache(42); did performance improve?


In [None]:
# do fib(42) and fib_cache(42) return the same result?

In [None]:
# what is inside the memo_dict dictionary?