# **Computação II** <br/>
**Bachelor's Degree Programs in Data Science and Information Systems**<br/>
**NOVA IMS**<br/>

**NOTE:** Adapted from Prof. Dr. Illya Bakurov's class materials.

## References
1. [Python ``timeit`` module, official documentation](https://docs.python.org/3/library/timeit.html)

# 1. Search for a specific value in a sequence.

Consider two approaches that find a given value in an array (vector).

<center><img src="https://www.w3resource.com/w3r_images/numpy-basic-image-exercise-15.png" width=400/></center>

Imports the ``time`` and ``timeit`` modules to keep track of the runtime. Also imports ``numpy`` to generate random values and ``matplotlib`` to draw plots.

In [1]:
import time 
import timeit
import numpy as np
import matplotlib.pyplot as plt

Creates a vector of integer values using ``numpy``.

In [2]:
vec = np.arange(30, 50, 2) 
n = len(vec)
print("Vector: {} of length {}".format(vec, n))

Vector: [30 32 34 36 38 40 42 44 46 48] of length 10


**The purpose of this exercise is to manually implement an efficient and effective function to return the position (index) of a user-specified value in a vector.** For the sake of simplicity, if the value is repeated, return the index of the first-found value.

## 1.1. Linear search
Find the index of a user-specified value $val$:
1. Iterate over every single position $i$ in the vector $vec$, starting from $i=0$
    1. If the value at $i$ equals $val$, return $i$
    2. Otherwise, move to $i+1$
2. If the value was not found, return -1

The time complexity of this algorithm is $O(n)$: in the worst case (the value sits at the last position or is absent), all $n$ positions are visited.

In [7]:
def search_linear(vec, val):    
    for i, val_i in enumerate(vec):
        if val_i==val:
            return i
    return -1 

Tests ``search_linear()``.

In [8]:
print("Vector: {} of length {}".format(vec, n))
vals = [29, 30, 33, 34, 42, 48, 50]
for val in vals:
    idx = search_linear(vec, val)
    print("The value {} can be found at index {}".format(val, idx))

Vector: [30 32 34 36 38 40 42 44 46 48] of length 10
The value 29 can be found at index -1
The value 30 can be found at index 0
The value 33 can be found at index -1
The value 34 can be found at index 2
The value 42 can be found at index 6
The value 48 can be found at index 9
The value 50 can be found at index -1


## 1.2. A more efficient approach: binary search
Binary search applies when the vector is sorted: it uses the ordering to discard roughly half of the remaining values at every step. Consider the following visualization:

<center><img src="https://jojozhuang.github.io/assets/images/algorithm/1211/binarysearch.png" width=400/></center>

**Given a sorted vector** $vec$, to find the index $i$ of a user-specified value $val$:
1. Compare $val$ with the value at the index representing one-half of $vec$ (i.e., the middle value)
2. If $val$ equals the middle value, return its index $i$
3. If $val$ is smaller than the middle value, then your problem gets logically reduced to finding $val$ in the first half of $vec$
4. If $val$ is larger than the middle value, then your problem gets logically reduced to finding $val$ in the second half of $vec$

The time complexity of this algorithm is:
- below $O(n)$, since even in the worst case not all the values in $vec$ are compared to $val$
- above $O(1)$, since more than one basic operation has to be performed and their number grows with $n$
- therefore, between $O(1)$ and $O(n)$: $O(\log n)$. In the next class, we will see precisely why $\log n$ is being used.
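
As an informal check of the $O(\log n)$ claim (a sketch, not a proof), the code below counts how many times a search space of size $n$ can be halved before it becomes empty, and compares that count with $\lfloor \log_2 n \rfloor + 1$. The helper name ``count_halvings`` is hypothetical and is not part of the class materials.

```python
import math

def count_halvings(n):
    """Count how many times n can be halved (integer division) until it reaches 0."""
    steps = 0
    while n > 0:
        n //= 2      # each comparison leaves at most half of the values
        steps += 1
    return steps

for n in [10, 1_000, 1_000_000]:
    bound = math.floor(math.log2(n)) + 1
    print("n = {:>9}: {} halvings, floor(log2(n)) + 1 = {}".format(n, count_halvings(n), bound))
```

For these sizes the two numbers coincide, which is consistent with the number of comparisons growing logarithmically with $n$.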

First attempt (blindly following the pseudo-code) using recursion. Recall that:

"*A recurrence relation defines a function by means of an expression that includes one or more (smaller) instances of itself.*"

From the mechanics of binary search and the pseudo-code, every recursive call reduces the search space by a factor of (roughly) two. The parameters ``idx_start`` and ``idx_end`` are therefore needed to track the first and last indices of the portion of $vec$ still under consideration throughout the recursive calls.

In [10]:
def search_binary(vec, val, idx_start, idx_end):
    # Steps 1 and 2: compare val with the middle value
    idx_half = (idx_start + idx_end) // 2
    if val == vec[idx_half]:
        return idx_half
    # val is larger than the middle value: search the second half
    elif val > vec[idx_half]:
        return search_binary(vec, val, idx_half + 1, idx_end)
    # val is smaller than the middle value: search the first half
    else:
        return search_binary(vec, val, idx_start, idx_half - 1)

Tests ``search_binary()`` using some values that are present in the vector.

In [11]:
print("Vector: {} of length {}".format(vec, n))
vals = [30, 34, 42, 48]
for val in vals:
    idx = search_binary(vec, val, 0, len(vec)-1)
    print("The value {} can be found at index {}".format(val, idx))

Vector: [30 32 34 36 38 40 42 44 46 48] of length 10
The value 30 can be found at index 0
The value 34 can be found at index 2
The value 42 can be found at index 6
The value 48 can be found at index 9


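Tests ``search_binary()`` with a value that is not in the vector. The recursion never reaches a stopping condition in this case, so Python eventually raises a ``RecursionError``. **Note that you might need to restart your kernel and re-run the code cells above after running the code cell below.**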

In [None]:
val = 33
idx = search_binary(vec, val, 0, len(vec)-1)
print("The value {} can be found at index {}".format(val, idx))
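
To observe this failure without interrupting the notebook, the call can be wrapped in a ``try``/``except`` block. The sketch below is self-contained: it copies the first attempt under the hypothetical name ``search_binary_naive`` and uses a plain Python list ``vec_demo`` with the same values as ``vec`` (both names are assumptions made for this illustration).

```python
def search_binary_naive(vec, val, idx_start, idx_end):
    # Copy of the first attempt: there is no stopping condition
    # for a value that is missing from the vector.
    idx_half = (idx_start + idx_end) // 2
    if val == vec[idx_half]:
        return idx_half
    elif val > vec[idx_half]:
        return search_binary_naive(vec, val, idx_half + 1, idx_end)
    else:
        return search_binary_naive(vec, val, idx_start, idx_half - 1)

vec_demo = list(range(30, 50, 2))  # same values as vec, stored in a plain list

try:
    idx = search_binary_naive(vec_demo, 33, 0, len(vec_demo) - 1)
    outcome = "returned {}".format(idx)
except RecursionError:
    # The bounds stop changing at (2, 1), so the function keeps calling itself
    outcome = "RecursionError"

print("Searching for 33:", outcome)
```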

The function ``search_binary()`` does not take into account the case when the value $val$ is not present in the vector. This can be formalized as a pair of additional termination conditions (in bold):

1. **If $vec$ has only one value:**
    1. **return the index of the value in $vec$, if it is equal to $val$**
    2. **otherwise, return -1**
2. **If $vec$ has only two values:**
    1. **return the index of the first value in $vec$, if it is equal to $val$**
    2. **return the index of the second value in $vec$, if it is equal to $val$**
    3. **otherwise, return -1**
3. Compare $val$ with the value at the index representing one-half of $vec$ (i.e., the middle value)
4. If $val$ equals the middle value, return its index $i$
5. If $val$ is smaller than the middle value, then your problem gets logically reduced to finding $val$ in the first half of $vec$
6. If $val$ is larger than the middle value, then your problem gets logically reduced to finding $val$ in the second half of $vec$

Consider a second (debugged) attempt.

In [4]:
def search_binary(vec, val, idx_start, idx_end):
    # Stopping condition 1: case when only one value is in the vector
    if idx_end == idx_start:
        if vec[idx_end] == val:
            return idx_end
        else:
            return -1
    # Stopping condition 2: case when only two values are in the vector
    if idx_end - idx_start == 1:
        if vec[idx_start] == val:
            return idx_start
        elif vec[idx_end] == val:
            return idx_end
        else:
            return -1
    else:
        idx_half = (idx_start + idx_end) // 2
        # Stopping condition 3: case when the value is in the middle position
        if val == vec[idx_half]:
            return idx_half
        # Recursive call: the value might be in the second half
        elif val > vec[idx_half]:
            return search_binary(vec, val, idx_half + 1, idx_end)
        # Recursive call: the value might be in the first half
        else:
            return search_binary(vec, val, idx_start, idx_half - 1)

Tests ``search_binary()``.

In [5]:
print("Vector: {} of length {}".format(vec, n))
vals = [1, 29, 30, 31, 32, 33, 34, 47, 48, 49, 52, 100]
for val in vals:
    idx = search_binary(vec, val, 0, len(vec)-1)
    print("The value {} can be found at index {}".format(val, idx))

Vector: [30 32 34 36 38 40 42 44 46 48] of length 10
The value 1 can be found at index -1
The value 29 can be found at index -1
The value 30 can be found at index 0
The value 31 can be found at index -1
The value 32 can be found at index 1
The value 33 can be found at index -1
The value 34 can be found at index 2
The value 47 can be found at index -1
The value 48 can be found at index 9
The value 49 can be found at index -1
The value 52 can be found at index -1
The value 100 can be found at index -1


However, the function definition provided above can be simplified. In particular, the two termination conditions that handle a value missing from the vector can be replaced by a single check: ``idx_start > idx_end``. This works because:
- a recursive call into the second half increases ``idx_start``. From the code: ``idx_start = idx_half + 1``
- a recursive call into the first half decreases ``idx_end``. From the code: ``idx_end = idx_half - 1``
- the search can therefore stop as soon as the indices no longer describe a valid portion of $vec$: either the starting index has increased until it is larger than the ending index, or the ending index has decreased until it is smaller than the starting index
- in either *invalid* case, the vector $vec$ does not contain $val$: **the algorithm has exhausted the smaller instances of the vector $vec$**
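
The points above can be checked by recording the pair ``(idx_start, idx_end)`` at every recursive call. The sketch below assumes ``idx_start > idx_end`` as the only termination condition for a missing value; the names ``search_binary_traced``, ``trace`` and ``vec_demo`` are hypothetical and introduced only for this illustration.

```python
def search_binary_traced(vec, val, idx_start, idx_end, trace):
    # Record the bounds of the current (smaller) instance of the vector
    trace.append((idx_start, idx_end))
    if idx_start > idx_end:
        return -1  # the bounds have crossed: val is not in vec
    idx_half = (idx_start + idx_end) // 2
    if val == vec[idx_half]:
        return idx_half
    elif val > vec[idx_half]:
        return search_binary_traced(vec, val, idx_half + 1, idx_end, trace)
    else:
        return search_binary_traced(vec, val, idx_start, idx_half - 1, trace)

vec_demo = list(range(30, 50, 2))  # same values as vec, stored in a plain list
trace = []
idx = search_binary_traced(vec_demo, 33, 0, len(vec_demo) - 1, trace)
print("Index:", idx)                  # -1
print("Bounds per call:", trace)      # [(0, 9), (0, 3), (2, 3), (2, 1)]
```

The last recorded pair, ``(2, 1)``, has a starting index larger than the ending index, which is exactly the situation in which the search stops.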

In [6]:
def search_binary(vec, val, idx_start, idx_end):
    if idx_start > idx_end:
        return -1
    else:
        idx_half = (idx_start + idx_end) // 2
        if val == vec[idx_half]:
            return idx_half
        elif val > vec[idx_half]:
            return search_binary(vec, val, idx_start=idx_half + 1, idx_end=idx_end)
        else:
            return search_binary(vec, val, idx_start=idx_start, idx_end=idx_half - 1)

Tests ``search_binary()``.

In [7]:
print("Vector: {} of length {}".format(vec, n))
vals = [0, 1, 29, 30, 31, 34, 42, 48, 49, 100]
for val in vals:
    idx = search_binary(vec, val, 0, len(vec)-1)
    print("The user-specified value {} can be found at index {}".format(val, idx))

Vector: [30 32 34 36 38 40 42 44 46 48] of length 10
The user-specified value 0 can be found at index -1
The user-specified value 1 can be found at index -1
The user-specified value 29 can be found at index -1
The user-specified value 30 can be found at index 0
The user-specified value 31 can be found at index -1
The user-specified value 34 can be found at index 2
The user-specified value 42 can be found at index 6
The user-specified value 48 can be found at index 9
The user-specified value 49 can be found at index -1
The user-specified value 100 can be found at index -1


## 1.3. Binary search without recursion
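
One possible way to remove the recursion is to keep the two bounds in local variables and update them inside a ``while`` loop. The following is a minimal sketch under that assumption, not necessarily the version intended for this class; the function name ``search_binary_iterative`` and the list ``vec_demo`` are hypothetical.

```python
def search_binary_iterative(vec, val):
    """Return the index of val in the sorted sequence vec, or -1 if it is absent."""
    idx_start, idx_end = 0, len(vec) - 1
    # Same termination condition as the recursive version: stop when the bounds cross
    while idx_start <= idx_end:
        idx_half = (idx_start + idx_end) // 2
        if val == vec[idx_half]:
            return idx_half
        elif val > vec[idx_half]:
            idx_start = idx_half + 1   # continue in the second half
        else:
            idx_end = idx_half - 1     # continue in the first half
    return -1

vec_demo = list(range(30, 50, 2))  # same values as vec, stored in a plain list
for val in [29, 30, 33, 34, 42, 48, 50]:
    idx = search_binary_iterative(vec_demo, val)
    print("The value {} can be found at index {}".format(val, idx))
```

Because the bounds are handled internally, the caller no longer passes ``idx_start`` and ``idx_end``, and the search depth is not limited by Python's recursion limit.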

# 2. The ``timeit`` module
Assess the runtime of both algorithms using the ``timeit`` module.
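
A minimal sketch of such a comparison is given below. It relies on ``timeit.timeit()``, which accepts a callable and a ``number`` of repetitions and returns the total elapsed time in seconds. The vector size, the searched value (the last one, which is the worst case for linear search among values that are present) and the number of runs are arbitrary choices made for this illustration; the two search functions are repeated here only to keep the block self-contained.

```python
import timeit

def search_linear(vec, val):
    for i, val_i in enumerate(vec):
        if val_i == val:
            return i
    return -1

def search_binary(vec, val, idx_start, idx_end):
    if idx_start > idx_end:
        return -1
    idx_half = (idx_start + idx_end) // 2
    if val == vec[idx_half]:
        return idx_half
    elif val > vec[idx_half]:
        return search_binary(vec, val, idx_half + 1, idx_end)
    else:
        return search_binary(vec, val, idx_start, idx_half - 1)

vec_large = list(range(0, 200_000, 2))  # 100,000 sorted even numbers (arbitrary size)
val_last = vec_large[-1]                # value stored at the last position
n_runs = 20                             # number of repetitions per measurement

# timeit.timeit() returns the total time, in seconds, of n_runs calls
t_linear = timeit.timeit(lambda: search_linear(vec_large, val_last), number=n_runs)
t_binary = timeit.timeit(
    lambda: search_binary(vec_large, val_last, 0, len(vec_large) - 1), number=n_runs
)

print("Linear search: {:.6f} s for {} runs".format(t_linear, n_runs))
print("Binary search: {:.6f} s for {} runs".format(t_binary, n_runs))
```

The absolute timings depend on the machine, so no reference output is shown; the point of the comparison is how the two timings evolve as the size of the vector grows.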