# Day 7: Searching Algorithms
--------------------------------------------

## Reflections from Last Day

- Data Science: _Japa!!_
```python
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/naijacoderorg/lectures/main/lectures2024/datascience/migrations.csv')
emigrations_total = (df[['origin_country', '1960', '1970', '1980', '1990', '2000']]  # select the relevant columns
                     .groupby('origin_country')  # group rows by country of origin
                     .sum()                      # aggregate: total for each year column
                     .reset_index()              # turn origin_country back into a regular column
                    )
```
- Growth of Functions
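    - **Big O Notation (O)**: Asymptotic upper bound, often used to describe the worst case, e.g. $O(n^2)$
    - **Omega Notation ($\Omega$)**: Asymptotic lower bound, e.g. $\Omega(n)$
    - **Theta Notation ($\Theta$)**: Tight bound, both upper and lower, e.g. $\Theta(n)$
    - **Small o Notation (o)**: Strict upper bound (the function grows strictly slower), e.g. $o(n)$
    - **Small omega Notation ($\omega$)**: Strict lower bound (the function grows strictly faster), e.g. $\omega(n^2)$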
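
One informal way to build intuition for these notations is to watch the ratio $f(n)/g(n)$ as $n$ grows. A ratio that settles to a positive constant is consistent with $\Theta$, one that shrinks toward 0 is consistent with small o, and one that grows without bound is consistent with small omega. The sketch below uses a hypothetical helper, `growth_ratios`, and a few sample values of $n$; it gives a numerical hint, not a proof.

```python
import math

def growth_ratios(f, g, ns):
    """Return f(n) / g(n) for each n in ns: an informal intuition check, not a proof."""
    return [f(n) / g(n) for n in ns]

ns = [10, 100, 1000, 10000]

# 3n + 5 vs n: the ratio settles near the constant 3, consistent with 3n + 5 = Theta(n)
theta_like = growth_ratios(lambda n: 3 * n + 5, lambda n: n, ns)

# log2(n) vs n: the ratio shrinks toward 0, consistent with log n = o(n)
small_o_like = growth_ratios(lambda n: math.log2(n), lambda n: n, ns)

# n*log2(n) vs n: the ratio (which is just log2 n) keeps growing, consistent with n log n = omega(n)
small_omega_like = growth_ratios(lambda n: n * math.log2(n), lambda n: n, ns)

for name, ratios in [("Theta", theta_like), ("small o", small_o_like), ("small omega", small_omega_like)]:
    print(name, [round(r, 4) for r in ratios])
```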

## Exercises from Last Day

Determine whether the following statements are true (T) or false (F):

1. $n^2 = O(n^3)$
2. $n^2 = o(n^3)$
3. $n^2 = \Theta(n^2)$
4. $n^2 = \omega(n)$
5. $n^2 = \Omega(n^2)$

## Agenda for Today

- Searching Algorithms
    - Linear Search
    - Binary Search 

## Searching Algorithms

Searching algorithms locate elements within a data structure such as an array, list, tree, or graph.

For example, the position (index) of the number 10 in the list `[0, 20, -2, 10, 5]` is `3`, since Python indices start at 0.

They are essential whenever a program needs to find items that meet some criterion, and choosing the right algorithm can make that lookup much faster.
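
Python lists already come with a built-in search: `list.index` returns the position of the first matching element by scanning the list from the front. A minimal sketch using the example list above:

```python
numbers = [0, 20, -2, 10, 5]

# list.index scans from the front and returns the first position holding the value
position = numbers.index(10)
print(position)  # 3

# list.index raises ValueError when the value is absent, so check membership first
# (the `in` operator on a list is also a front-to-back scan)
if 7 in numbers:
    print(numbers.index(7))
else:
    print("7 is not in the list")
```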


### Linear Search

Each element is checked in order: check the first element, then the second, and so on, until the target is found or the end of the list is reached.

#### Example of Linear Search in Python:

```python
def linear_search(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i  # Return index if found
    return -1  # Return -1 if not found

# Example usage
my_list = [10, 30, 20, 5, 15]
target_value = 20
result = linear_search(my_list, target_value)
if result != -1:
    print(f"Element found at index {result}")
else:
    print("Element not found")
```

```
Element found at index 2
```


#### Linear Search Time Complexity

Let's count how many iterations the loop performs before it stops.

```python
def linear_search(arr, target):
    index = -1
    for i in range(len(arr)):
        if arr[i] == target:
            index = i
            break
    if index == -1:
        print("Element not found")
    else:
        print(f"{target} found at index {index:2d}. {i+1:2d} Iterations")
    return index  # Return -1 if not found

# Data has a length of 20
data = [13, -4, 2, 6, 9, 16, -2, 4, -16, 8, 19, 14, 10, 22, -3, 12, 18, -18, -14, -10]

linear_search(data, 6)
# 20 operations is the worst case
linear_search(data, -10)
```

```
6 found at index  3.  4 Iterations
-10 found at index 19. 20 Iterations
```


**Time Complexity**: $O(n)$ - Linear time complexity, where $n$ is the number of elements in the list. In the worst case (the target is the last element or is missing), all $n$ elements are checked.
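
To see the linear growth directly, the sketch below uses a hypothetical helper, `count_comparisons`, which performs the same linear search but returns how many elements it compared instead of an index. Searching for a value that is never present forces the worst case, so the count should match the list length.

```python
def count_comparisons(arr, target):
    """Linear search variant that returns how many elements it compared before stopping."""
    comparisons = 0
    for value in arr:
        comparisons += 1
        if value == target:
            break
    return comparisons

for n in [10, 100, 1000, 10000]:
    data = list(range(n))
    # -1 is never in data, so every element is compared: the worst case
    print(f"n = {n:5d}: {count_comparisons(data, -1):5d} comparisons")
```

Ten times more data means ten times more comparisons, which is exactly what $O(n)$ predicts.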

### Binary Search

If the list is sorted first, we can find an element much faster: comparing the target with a single element tells us which side of the list it must be on.

Once the list is sorted:

- Repeatedly divide the search interval in half.
- If the target matches the middle element, we have found it!!
- Otherwise, continue in the left half if the target is smaller than the middle element, or in the right half if it is bigger.

#### Example of Binary Search in Python:

```python
def binary_search(arr, target):
    low = 0
    high = len(arr) - 1
    operation_count = 0
    index = -1
    while low <= high:
        mid = (low + high) // 2  # divide and round down
        operation_count += 1
        if arr[mid] == target:
            index = mid
            break
        elif arr[mid] < target:
            low = mid + 1  # Search in the right half
        else:
            high = mid - 1  # Search in the left half
    if index == -1:
        print("Element not found")
    else:
        print(f"{target} found at index {index:2d}. {operation_count:2d} Iterations")
    return index  # Return -1 if not found

# Example usage
sorted_list = [5, 10, 15, 20, 30]
target_value = 20
binary_search(sorted_list, target_value)

sorted_list = list(range(1000))
# Linear search
linear_search(sorted_list, 500)
# Binary search
binary_search(sorted_list, 500)  # Just 9 iterations!!
```

```
20 found at index  3.  2 Iterations
500 found at index 500. 501 Iterations
500 found at index 500.  9 Iterations
```


In the worst case, we keep dividing $N$ by 2 until only 1 element is left.

For example, with $N = 1000$:

```python
N = 1000
count = 0
while N > 1:
    N = N // 2
    count = count + 1
    print(f"N = {N:3d} after {count} iterations")
```

```
N = 500 after 1 iterations
N = 250 after 2 iterations
N = 125 after 3 iterations
N =  62 after 4 iterations
N =  31 after 5 iterations
N =  15 after 6 iterations
N =   7 after 7 iterations
N =   3 after 8 iterations
N =   1 after 9 iterations
```


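Mathematically, let $x$ be the number of halvings needed to shrink $N$ elements down to 1 element, so that $\frac{N}{2^x} = 1$, i.e.
$$2^x = N$$

Take $\log_2$ of both sides:

$$\log_2{2^x} = \log_2{N}$$
$$x \cdot \log_2{2} = \log_2{N}$$

Since $\log_2{2} = 1$,
$$x = \log_2{N}$$

For $N = 1000$, $\log_2{1000} \approx 9.97$. Because integer division rounds down, the loop above stops after $\lfloor \log_2{1000} \rfloor = 9$ halvings.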
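
To check the derivation numerically, the sketch below wraps the halving loop in a hypothetical helper, `halvings`, and compares its count against `math.log2`. For any positive integer $N$ the count equals $\lfloor \log_2 N \rfloor$, because each `//` rounds down.

```python
import math

def halvings(n):
    """Count how many times n can be halved (rounding down) before it reaches 1."""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count

for n in [8, 1000, 1_000_000]:
    # Compare the loop count against the exact value of log2(n)
    print(f"N = {n:9d}: {halvings(n):2d} halvings, log2(N) = {math.log2(n):.2f}")
```

Going from a thousand to a million elements (1000 times more data) only raises the count from 9 to 19.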


**Time Complexity**: $O(\log n)$ - Logarithmic time complexity on a sorted list, where $n$ is the number of elements in the sorted list. This does not include the cost of sorting the list first.
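
Python's standard library already provides binary search on sorted lists through the `bisect` module. `bisect.bisect_left` returns an insertion point rather than a found/not-found answer, so the sketch below wraps it in a hypothetical helper, `bisect_search`, that follows the same "return -1 if not found" convention as `binary_search`.

```python
import bisect

def bisect_search(arr, target):
    """Return an index of target in the sorted list arr, or -1 if it is absent."""
    # bisect_left finds the leftmost position where target could be inserted
    # while keeping arr sorted, using O(log n) comparisons
    i = bisect.bisect_left(arr, target)
    if i < len(arr) and arr[i] == target:
        return i
    return -1

sorted_list = [5, 10, 15, 20, 30]
print(bisect_search(sorted_list, 20))  # 3
print(bisect_search(sorted_list, 12))  # -1
```

Because `bisect_left` returns the leftmost insertion point, this wrapper also returns the first occurrence when the list contains duplicates.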

### Key Considerations

- **Data Structure**: The choice of searching algorithm often depends on the data structure being used (e.g., lists, trees, graphs).
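- **Performance**: Binary search is significantly faster than linear search on large datasets, but it only works when the data is sorted.
    - Sorting costs extra time up front (typically $O(n \log n)$), so binary search pays off most when the same list is searched many times
    - You incur the sorting penalty just once, and every later search is $O(\log n)$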
- **Edge Cases**: Consider edge cases such as empty lists or arrays with duplicate values when implementing and testing searching algorithms.
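
A minimal sketch of checking these edge cases, assuming a hypothetical quiet variant `binary_search_index` that uses the same logic as `binary_search` above but returns the index without printing:

```python
def binary_search_index(arr, target):
    """Quiet binary search (no printing): an index of target in sorted arr, or -1."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

# Edge case 1: empty list. high starts at -1, so the loop body never runs.
print(binary_search_index([], 3))  # -1

# Edge case 2: duplicates. Some matching index is returned, not necessarily the first.
print(binary_search_index([2, 2, 2, 2, 2], 2))  # 2, not 0

# Edge case 3: target smaller or larger than every element.
print(binary_search_index([1, 3, 5], 0))  # -1
print(binary_search_index([1, 3, 5], 9))  # -1
```

If you need the first occurrence among duplicates, the search has to keep going left after a match instead of returning immediately.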


### Linear Search Exercises

1. **Exercise 1: Finding an Element**
   - **Problem**: Implement a function `linear_search(arr, target)` that returns the index of `target` in the list `arr` using linear search. If `target` is not present, return `-1`.
   - **Example**:
     ```python
     arr = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
     target = 5
     # Output: 4 (index of the first occurrence of 5)
     ```
   - **Hint**: Use a `while` loop!

2. **Exercise 2: Counting Occurrences**
   - **Problem**: Modify the previous function to return the number of times `target` appears in `arr`.
   - **Example**:
     ```python
     arr = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
     target = 5
     # Output: 2 (number of times 5 appears in arr)
     ```

3. **Exercise 3: Sum of Elements**
   - **Problem**: Write a function `sum_linear_search(arr)` that computes the sum of all elements in the list `arr` by visiting each element once, the same way linear search scans the list.
   - **Example**:
     ```python
     arr = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
     # Output: 39 (sum of all elements in arr)
     ```

### Binary Search Exercises

1. **Exercise 4: Basic Binary Search**
   - **Problem**: Implement a function `binary_search(arr, target)` that performs binary search on a sorted list `arr` to find the index of `target`. If `target` is not present, return `-1`.
   - **Example**:
     ```python
     arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
     target = 6
     # Output: 5 (index of 6 in arr)
     ```
   - Implement it using a `for` loop
   - Implement it again using recursion

2. **Exercise 5: Finding Smallest Element**
   - **Problem**: Modify the binary search function to find the smallest element in a rotated sorted array `rotated_arr` (a sorted array that has been rotated around some pivot, such as the one below).
   - **Example**:
     ```python
     rotated_arr = [4, 5, 6, 7, 0, 1, 2]
     # Output: 0 (smallest element in rotated_arr)
     ```

3. **Exercise 6: Counting Elements**
   - **Problem**: Write a function `count_binary_search(arr, target)` that counts the number of occurrences of `target` in a sorted list `arr` using binary search.
   - **Example**:
     ```python
     arr = [1, 2, 2, 2, 3, 4, 5, 5, 6]
     target = 2
     # Output: 3 (number of times 2 appears in arr)
     ```