# Data Structures and Algorithms

Credits: [freeCodeCamp Data Structures and Algorithms in Python](https://www.youtube.com/watch?v=pkYVOmU3MgA)

## Why You Should Learn Data Structures and Algorithms

Whether you're pursuing a career in software development or data science, it's almost certain that you'll be asked to solve programming problems like *reversing a linked list* or *balancing a binary tree* in a technical interview or coding assessment.

It's well known, however, that you will almost never face these problems in your job as a software developer. So it's reasonable to wonder why such problems are asked in interviews and coding assessments. Solving programming problems demonstrates the following traits:

1. You can **think about a problem systematically** and solve it step-by-step.
2. You can **envision different inputs, outputs, and edge cases** for programs you write.
3. You can **communicate your ideas clearly** to co-workers and incorporate their suggestions.
4. Most importantly, you can **convert your thoughts and ideas into working code** that's also readable.

It's not your knowledge of specific data structures or algorithms that's tested in an interview, but your approach towards the problem. You may fail to solve the problem and still clear the interview or vice versa. In this course, you will learn the skills to both solve problems and clear interviews successfully.

## Problem 

**QUESTION 1:** Alice has some cards with numbers written on them. She arranges the cards in decreasing order, and lays them out face down in a sequence on a table. She challenges Bob to pick out the card containing a given number by turning over as few cards as possible. Write a function to help Bob locate the card.

<img src="https://i.imgur.com/mazym6s.png" width="480">

## Solution

### 1. Problem (P) statement, the input (I/) & the output (/O) formats, and signature of function.

**P**
> Find the position of a given number in a list of cards sorted in decreasing order, while turning over (accessing) as few cards as possible.

**I/**
1. `cards`: list of cards sorted in decreasing order. E.g. `[5,4,3,2,1]`
2. `query`: card number whose position needs to be determined. E.g. `3`

**/O**
1. `position`: the position of `query` in the list `cards`. E.g. `2` in the above case (counting from `0`)

In [1]:
def get_card_position(cards, query):
    pass

### 2. Example inputs & outputs with all edge cases.

In [14]:
tests = []

# somewhere in middle
tests.append({
    'input': {'cards': [13, 11, 10, 7, 4, 3, 1, 0], 'query': 7}, 'output': 3
})

# query is the first element
tests.append({
    'input': {'cards': [4, 2, 1, -1], 'query': 4}, 'output': 0
})

# query is the last element
tests.append({
    'input': {'cards': [3, -1, -9, -127], 'query': -127}, 'output': 3
})

# cards contains just one element, query
tests.append({
    'input': {'cards': [6], 'query': 6}, 'output': 0
})

# cards does not contain query 
tests.append({
    'input': {'cards': [9, 7, 5, 2, -9], 'query': 4}, 'output': -1
})

# cards is empty
tests.append({
    'input': {'cards': [],'query': 7}, 'output': -1
})

# numbers can repeat in cards
tests.append({
    'input': {'cards': [8, 8, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0], 'query': 3}, 'output': 7
})

# query occurs multiple times
tests.append({
    'input': {'cards': [8, 8, 6, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0], 'query': 6}, 'output': 2
})

In [63]:
# testing
def test_func(tests, search_func):
    """Run search_func on every test case and report the ones that fail."""
    for i, test in enumerate(tests, start=1):
        result = search_func(**test['input'])  # call once and reuse the result
        if result != test['output']:
            print('Failed test:', i)
            print(test)
            print('Search func output:', result)
        # else:
        #     print('Passes test:', i)

### 3. Come up with a correct solution for the problem. State it in plain English.

1. Start with `position` set to `0`, the index of the first card.
2. Check whether the number at index `position` in `cards` equals `query`.
3. If it does, `position` is the answer and can be returned from the function.
4. If not, increment the value of `position` by 1, and repeat steps 2 to 4 until the last position has been checked.
5. If the number was not found, return `-1`.

> **Linear Search Algorithm**: searching through a list in a linear fashion i.e. element after element.

###  4. Implement the solution and test it using example inputs. Fix bugs, if any.

In [64]:
def linear_search(cards, query):
    for position in range(len(cards)):
        if cards[position] == query:
            return position
    return -1

test_func(tests, linear_search)

### 5. Analyze the algorithm's complexity and identify inefficiencies, if any.

Linear search complexity: Time $O(N)$, Space $O(1)$
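The $O(N)$ bound can be checked empirically. The sketch below uses a hypothetical variant, `linear_search_counting` (not part of the original notebook), that also reports how many cards were turned over. It assumes the worst case, where the query is the last card in the list.

```python
def linear_search_counting(cards, query):
    """Linear search that also reports how many cards were turned over."""
    turned = 0
    for position in range(len(cards)):
        turned += 1  # one more card has been looked at
        if cards[position] == query:
            return position, turned
    return -1, turned

# Worst case: the query is the last card, so every card is turned over.
for n in (10, 100, 1000):
    cards = list(range(n, 0, -1))  # n, n-1, ..., 1 (decreasing order)
    position, turned = linear_search_counting(cards, 1)
    print(f"N = {n}: found at position {position} after turning {turned} cards")
```

The number of cards turned over grows in direct proportion to `N`, which is what $O(N)$ expresses.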

### 6. Apply the right technique to overcome the inefficiency. Repeat steps 3 to 6.

#### **Step 3.**

The cards are sorted, so we can exploit that ordering with **binary search**.

Here's how binary search can be applied to our problem:

1. Find the middle element of the list.
2. If it matches the queried number, return the middle position as the answer.
3. If it is less than the queried number, search the first half of the list (the cards are in decreasing order, so larger numbers lie to the left).
4. If it is greater than the queried number, search the second half of the list.
5. If no more elements remain, return `-1`.

#### **Step 4.**

In [145]:
# helper function for case with repeated entries
def test_location(cards, query, mid):
    mid_num = cards[mid]
    if mid_num == query:
        if mid-1 >=0 and cards[mid-1] == query:
            return 'left'
        else:
            return 'found'
    elif mid_num < query:
        return 'left'
    else:
        return 'right'

def binary_search(cards, query):
    low, high = 0, len(cards)-1

    while low <= high:
        mid = (low+high) // 2
        move = test_location(cards, query, mid)

        if move == 'found':
            return mid
        elif move == 'left':
            high = mid-1
        elif move == 'right':
            low = mid+1

    return -1 

test_func(tests, binary_search)
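To see why `test_location` also inspects the card to the left of a match, it helps to compare with a plain binary search that returns as soon as it sees `query`. The sketch below uses a hypothetical `naive_binary_search` (not part of the original notebook) on the "query occurs multiple times" test case.

```python
def naive_binary_search(cards, query):
    """Binary search on a decreasing list; returns ANY matching index."""
    low, high = 0, len(cards) - 1
    while low <= high:
        mid = (low + high) // 2
        if cards[mid] == query:
            return mid           # stops at the first match it happens to hit
        elif cards[mid] < query:
            high = mid - 1       # larger numbers lie to the left
        else:
            low = mid + 1        # smaller numbers lie to the right
    return -1

cards = [8, 8, 6, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0]
print(naive_binary_search(cards, 6))  # prints 7, but the first 6 is at index 2
```

The very first middle index is 7, which already holds a `6`, so the naive version stops there. Returning `'left'` when `cards[mid-1] == query` keeps the search moving until it reaches the first occurrence.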

#### **Step 5.**

Count the number of iterations in the algorithm. If we start out with an array of N elements, then each time the size of the array reduces to half for the next iteration, until we are left with just 1 element.

Initial length - `N`

Iteration 1 - `N/2`

Iteration 2 - `N/4` i.e. `N/2^2`

Iteration 3 - `N/8` i.e. `N/2^3`

...

Iteration k - `N/2^k`


Since the final length of the array is 1, we can find the value of `k` as follows:

`N/2^k = 1`

Rearranging the terms, we get

`N = 2^k`

Taking the logarithm

`k = log N`

Where `log` refers to log to the base 2. Therefore, our algorithm has the time complexity **O(log N)**. This fact is often stated as: binary search _runs_ in logarithmic time. You can verify that the space complexity of binary search is **O(1)**.

Binary search complexity: Time $O(\log_2 N + 1)$ or simply $O(\log_2 N)$, Space $O(1)$
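The iteration count derived above can be checked by instrumenting the loop. This is a minimal sketch using a hypothetical helper, `count_iterations` (not part of the original notebook). It assumes the worst case in which the query is smaller than every card, so the search keeps moving right until no elements remain.

```python
import math

def count_iterations(n):
    """Worst-case loop iterations of binary search over n cards (query absent)."""
    cards = list(range(n, 0, -1))  # n, n-1, ..., 1 (decreasing order)
    query = 0                      # smaller than every card in the list
    low, high = 0, n - 1
    iterations = 0
    while low <= high:
        iterations += 1
        mid = (low + high) // 2
        if cards[mid] == query:
            break
        elif cards[mid] < query:
            high = mid - 1
        else:
            low = mid + 1
    return iterations

for n in (10, 1000, 1_000_000):
    predicted = math.floor(math.log2(n)) + 1
    print(f"N = {n}: {count_iterations(n)} iterations, floor(log2 N) + 1 = {predicted}")
```

The measured count matches `floor(log2 N) + 1`, which is where the `+ 1` in $O(\log_2 N + 1)$ comes from.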

#### Linear vs Binary search

In [147]:
max_num = 10000000
query = 2
large_test = [{
    'input': {'cards': list(range(max_num, 0, -1)), 'query': query}, 
    'output': max_num - query
}]

In [148]:
%%time
test_func(large_test, linear_search)

CPU times: user 548 ms, sys: 0 ns, total: 548 ms
Wall time: 546 ms


In [149]:
%%time
test_func(large_test, binary_search)

CPU times: user 27 µs, sys: 0 ns, total: 27 µs
Wall time: 30.3 µs


As the size of the input grows larger, the difference only gets bigger. For a list 10 times the size, linear search would run for 10 times longer, whereas binary search would only require about 3 additional operations! (Can you verify this?) That's the real difference between the complexities **O(N)** and **O(log N)**.
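One way to verify the "3 additional operations" claim is to compare `log2(10 * N)` with `log2(N)`. The helper name `extra_halvings` below is hypothetical and used only for illustration.

```python
import math

def extra_halvings(n, factor=10):
    """Additional halving steps binary search needs when n grows by `factor`."""
    return math.log2(factor * n) - math.log2(n)

for n in (10**3, 10**6, 10**9):
    print(f"N = {n:,}: about {extra_halvings(n):.2f} extra steps for a list 10 times larger")
```

The difference is always `log2(10)`, roughly 3.32, regardless of `N`, because `log2(10 * N) = log2(10) + log2(N)`.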

Another way to look at it is that binary search runs `c * N / log N` times faster than linear search, for some fixed constant `c`. Since `log N` grows very slowly compared to `N`, the difference gets larger with the size of the input. Here's a graph comparing common functions used to describe the running time of algorithms:

<img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NR3M1nw8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/z4bbf8o1ly77wmkjdgge.png" width="480">

Do you see now why we ignore constants and lower order terms while expressing the complexity using the Big O notation?