## Data Structures & Searching Algorithms from `geeksforgeeks.org`

Check webpages:

* Data Structures: https://www.geeksforgeeks.org/python-data-structures/

* Searching Algorithms: https://www.geeksforgeeks.org/searching-algorithms/?ref=lbp

### Searching

Searching is the process of locating a specific element or item within a collection of data. This collection of data can take various forms, such as arrays, lists, trees or other structured representations. 

The primary objective of searching is to determine whether the desired element exists within the data, and if so, to identify its precise location or retrieve it.

#### Linear Search

Linear search is a *sequential* search algorithm that starts at one end of a list and checks each element in turn until the desired element is found or the end of the data set is reached.

![](https://media.geeksforgeeks.org/wp-content/cdn-uploads/Linear-Search.png)

**Time Complexity**:

* Best case: $O(1)$ if element is at the first index.

* Worst case: $O(N)$ if element is at the last index or is not present at all.

* Average case: $O(N)$

In [1]:
# Linear search
def linear_search(arr, N, x):

    for i in range(N):

        if arr[i] == x:
            return i
        
    return -1

In [2]:
# Example
arr = [2, 3, 4, 10, 40]
x = 10
N = len(arr)

result = linear_search(arr, N, x)

if result == -1:
    print('Element is not present in array')
else:
    print(f'Element is present at index {result}')

Element is present at index 3
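
For Python lists, the built-in `in` operator and the `list.index` method perform the same kind of sequential scan, so they can stand in for a hand-written loop. A minimal sketch follows; the helper name `find_index` is made up for illustration, and it returns `-1` on a miss only to mirror `linear_search` above.

```python
def find_index(arr, x):
    # list.index scans from the left and raises ValueError when x is absent
    try:
        return arr.index(x)
    except ValueError:
        return -1


arr = [2, 3, 4, 10, 40]
print(10 in arr)            # membership test, also a linear scan on lists
print(find_index(arr, 10))  # index of the first match
print(find_index(arr, 7))   # -1, because 7 is not in the list
```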


#### Binary Search

Binary search is a searching algorithm for a *sorted array* that works by **repeatedly dividing the search interval in half**. It exploits the fact that the array is sorted to reduce the time complexity to $O(log\ N)$.

![](https://media.geeksforgeeks.org/wp-content/uploads/20220309171621/BinarySearch.png)

* **Conditions**:

    * The data structure must be sorted

    * Access to any element of the data structure takes constant time

* **Algorithm**:

    * Divide the search space into two halves by *finding the middle index*:

        $$
        \text{middle index} = \text{low} + \left\lfloor \frac{\text{high} - \text{low}}{2} \right\rfloor
        $$

    * If the key is found at the middle element, then finish.

    * If the key is smaller than the middle element, then use the left side for next search

    * If the key is larger than the middle element, then use the right side for next search

    * This process continues until the key is found or the total search space is exhausted

* **Implementation**:

    * Iterative

    * Recursive

* **Iterative Binary Search**:

    - **Time Complexity**: $O(log\ N)$

    - **Auxiliary Space**: $O(1)$

In [7]:
# Iterative Binary Search
def iter_binary_search(arr, low, high, x):

    while low <= high:

        # Find middle index
        mid = low + (high - low)//2

        # Check if x is at mid
        if arr[mid] == x:
            
            return mid
        
        elif arr[mid] < x: # if x is greater
            
            low = mid + 1

        else: 
            high = mid - 1 # if x is smaller

    return -1

In [10]:
# Example
arr = [2, 3, 4, 10, 40]
x = 10

result = iter_binary_search(arr, 0, len(arr)-1, x)

if result != -1:
    print('Element is present at index', result)
else:
    print('Element is not in array')

Element is present at index 3


* **Recursive Binary Search**:

    - **Time Complexity**:

        * Best case: $O(1)$

        * Average case: $O(log\ N)$

        * Worst case: $O(log\ N)$

    - **Auxiliary Space**: $O(1)$. If the recursive call stack is considered, then the auxiliary space will be $O(log\ N)$

In [9]:
# Recursive Binary Search
def recurs_binary_search(arr, low, high, x):

    # Check base case:
    if high >= low:

        mid = low + (high - low)//2

        # Check if x is at the mid
        if arr[mid] == x:
            
            return mid
        
        elif arr[mid] > x: # if x is smaller

            return recurs_binary_search(arr, low, mid-1, x)
        
        else: # if x is greater

            return recurs_binary_search(arr, mid+1, high, x)
    
    else:
        return -1

In [11]:
# Example
arr = [2, 3, 4, 10, 40]
x = 10

result = recurs_binary_search(arr, 0, len(arr)-1, x)

if result != -1:
    print('Element is at index', result)
else:
    print('Element is not in array')

Element is at index 3
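
Python's standard library also ships a binary search in the `bisect` module. The sketch below wraps `bisect_left` so that it behaves like the two functions above; the wrapper name `bisect_search` and its `-1` return value are assumptions made here for comparison, not part of the module.

```python
from bisect import bisect_left


def bisect_search(arr, x):
    # bisect_left returns the leftmost position where x could be inserted
    # while keeping arr sorted
    pos = bisect_left(arr, x)

    # x is present only if that position holds an equal element
    if pos < len(arr) and arr[pos] == x:
        return pos
    return -1


arr = [2, 3, 4, 10, 40]
print(bisect_search(arr, 10))
print(bisect_search(arr, 5))
```

As with the hand-written versions, the input must already be sorted.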


### Sorting

Sorting refers to rearranging an array or list of elements according to a comparison operator on the elements. The comparison operator decides the new order of the elements in the data structure. Sorting means **reordering** all the elements in either ascending or descending order.

* *In-place Sorting*: An in-place sorting algorithm **uses constant space** for producing the output (modifies the given array only). It sorts the list only by modifying the order of the elements within the list.

* *Internal Sorting*: Internal sorting is when all **the data is placed in the main memory or internal memory**. Internal sorting therefore cannot handle an input larger than the available memory.

* *External Sorting*: External sorting is when all **the data that needs to be sorted cannot be placed in memory at a time**. External sorting is used for massive amounts of data.

* *Stability*: A sorting algorithm is said to be stable if the relative order of equal elements is preserved after sorting. This is important in certain applications where the original order of equal elements must be maintained.
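
Stability is easiest to see on records that share a sort key. The sketch below relies on Python's built-in `sorted`, which is documented as stable; the `(name, score)` records are invented for illustration.

```python
# (name, score) records; Ann and Cid share a score, as do Bob and Dan
records = [('Ann', 90), ('Bob', 85), ('Cid', 90), ('Dan', 85)]

# Sort by score only; ties keep their original relative order
by_score = sorted(records, key=lambda rec: rec[1])
print(by_score)
```

Bob stays ahead of Dan and Ann stays ahead of Cid, exactly as in the input.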

**Types of sorting algorithms**:

Sorting algorithms can be broadly classified into the following two types:

* Comparison-based

* Non-comparison-based

![](https://media.geeksforgeeks.org/wp-content/uploads/20220916131621/SortingTYPE.png)

**Comparison of complexity analysis**:

| Name               | Best case | Average case | Worst case | Memory   | Stable | Method used  |
|--------------------|-----------|--------------|------------|----------|--------|--------------|
| Quick sort         | $nlog\ n$ | $nlog\ n$    | $n^2$      | $log\ n$ | No     | Partitioning |
| Merge sort         | $nlog\ n$ | $nlog\ n$    | $nlog\ n$  | $n$      | Yes    | Merging      |
| Heap sort          | $nlog\ n$ | $nlog\ n$    | $nlog\ n$  | $1$      | No     | Selection    |
| Insertion sort     | $n$       | $n^2$        | $n^2$      | $1$      | Yes    | Insertion    |
| Selection sort     | $n^2$     | $n^2$        | $n^2$      | $1$      | No     | Selection    |
| Bubble sort        | $n$       | $n^2$        | $n^2$      | $1$      | Yes    | Exchanging   |
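
The linear best case listed for bubble sort assumes the common variant that stops as soon as a full pass makes no swap. The sketch below counts comparisons to make the gap between best and worst case visible; the helper name `bubble_sort_count` is hypothetical and the function works on a copy of its input.

```python
def bubble_sort_count(arr):
    a = list(arr)          # work on a copy
    comparisons = 0

    for end in range(len(a) - 1, 0, -1):
        swapped = False

        for j in range(end):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True

        # No swap in a full pass means the array is already sorted
        if not swapped:
            break

    return a, comparisons


print(bubble_sort_count([1, 2, 3, 4, 5, 6]))   # already sorted input
print(bubble_sort_count([6, 5, 4, 3, 2, 1]))   # reversed input
```

For six elements, the sorted input needs 5 comparisons (one pass), while the reversed input needs 5 + 4 + 3 + 2 + 1 = 15.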


#### QuickSort

QuickSort is a sorting algorithm based on the *Divide and Conquer* approach: it picks an element as a pivot and partitions the given array around it, placing the pivot in its correct position in the sorted array.

![](https://www.geeksforgeeks.org/wp-content/uploads/gq/2014/01/QuickSort2.png)

The partition step places the pivot at its correct position in the sorted array, with all smaller elements to the left of the pivot and all greater elements to the right of the pivot.

**Choice of Pivot**:

* Always pick the first element as a pivot

* Always pick the last element as a pivot

* Pick a random element as a pivot

* Pick the middle element as a pivot
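
As an illustration of the random-pivot option, the sketch below swaps a randomly chosen element into the last position and then partitions around the last element, the scheme used in this section. The function names `randomized_partition` and `randomized_quick_sort` are assumptions for this example, not names from the original page.

```python
import random


def randomized_partition(arr, low, high):
    # Move a randomly chosen element to the end, then partition around it
    rand_ind = random.randint(low, high)   # both ends inclusive
    arr[rand_ind], arr[high] = arr[high], arr[rand_ind]

    pivot = arr[high]
    i = low - 1   # index of the last element known to be <= pivot

    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]

    # Put the pivot right after the last smaller-or-equal element
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


def randomized_quick_sort(arr, low, high):
    if low < high:
        p = randomized_partition(arr, low, high)
        randomized_quick_sort(arr, low, p - 1)
        randomized_quick_sort(arr, p + 1, high)


array = [10, 7, 8, 9, 1, 5]
randomized_quick_sort(array, 0, len(array) - 1)
print('Sorted array:', array)
```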

**Partition algorithm**:

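* Start from the leftmost element and keep track of the index of the last element smaller than or equal to the pivot as $i$.

* While traversing, if we find an element smaller than or equal to the pivot, we increment $i$ and swap the current element with `arr[i]`. Otherwise, we ignore the current element.

* After the traversal, swap the pivot with `arr[i+1]` so that it lands in its final position.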

In [1]:
# Partition the array
def partition(arr, low, high):

    # Choose the rightmost element as pivot
    pivot = arr[high]

    # Index of the last element known to be <= pivot
    i = low - 1

    # Traverse the array and compare with pivot
    for j in range(low, high):
        
        # If element is smaller than or equal to the pivot
        if arr[j] <= pivot:
            i += 1

            # Swap
            arr[i], arr[j] = arr[j], arr[i]

    # Place the pivot right after the last smaller-or-equal element
    arr[i+1], arr[high] = arr[high], arr[i+1]

    return i + 1

In [3]:
# QuickSort
def quick_sort(arr, low, high):

    if low < high:
        
        # Find pivot index
        pivot_ind = partition(arr, low, high)

        # Recursive call on left part of pivot
        quick_sort(arr, low, pivot_ind - 1)

        # Recursive call on right part of pivot
        quick_sort(arr, pivot_ind + 1, high)


In [5]:
# Example
array = [10, 7, 8, 9, 1, 5]
n = len(array)

quick_sort(array, 0, n - 1)

print('Sorted array:', array)

Sorted array: [1, 5, 7, 8, 9, 10]


#### MergeSort

MergeSort is a sorting algorithm that follows the *Divide and Conquer* approach. It works by recursively dividing the input array into smaller subarrays, sorting those subarrays, and then merging them back together to obtain the final sorted array.

![](https://media.geeksforgeeks.org/wp-content/uploads/20230706153706/Merge-Sort-Algorithm-(1).png)

**Algorithm**:

* Divide the list/array recursively into two halves until it cannot be divided more

* Sort each subarray individually using the merge-sort algorithm

* Merge back together the sorted arrays in sorted order. The process continues until all elements from both subarrays have been merged.
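
The merge step in the last bullet can be tried in isolation with the standard library's `heapq.merge`, which combines already-sorted inputs into one ascending sequence. This is only a sketch of the merging idea, not the notebook's own implementation; the two lists are the sorted halves of the example array `[12, 11, 13, 5, 6, 7]`.

```python
from heapq import merge as heap_merge

left = [11, 12, 13]   # sorted first half
right = [5, 6, 7]     # sorted second half

# heapq.merge lazily yields the elements of the sorted inputs in ascending order
merged = list(heap_merge(left, right))
print(merged)
```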

In [6]:
# Merge two arrays
def merge(arr1, arr2):

    # Initialise
    i = 0
    j = 0
    result = []

    while i < len(arr1) and j < len(arr2):

        # Take the smaller front element first; on ties take from arr1 to keep the sort stable
        if arr1[i] <= arr2[j]:

            result.append(arr1[i])
            i += 1
        else:
            result.append(arr2[j])
            j += 1
    
    while i < len(arr1):

        result.append(arr1[i])
        i += 1

    while j < len(arr2):
        result.append(arr2[j])
        j += 1

    return result

In [7]:
# MergeSort
def merge_sort(arr):

    if len(arr) <= 1:
        return arr

    mid = len(arr)//2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])

    return merge(left, right)

In [10]:
# Example
arr = [12, 11, 13, 5, 6, 7]

sorted_arr = merge_sort(arr)

print('Sorted array:', sorted_arr)

Sorted array: [5, 6, 7, 11, 12, 13]


### Exercises

#### Find the missing and repeating number

Given an unsorted array of size $n$ whose elements are in the range 1 to $n$, one number from the set $\{1, 2, \dots, n\}$ is missing and one number occurs twice in the array. Find these two numbers.

**Approach #1**: Use count array

1. Create a temp array `temp[]` of size $n$ with all initial values as 0

2. Traverse the input array `arr[]`, and do the following for each `arr[i]`

    * if `temp[arr[i]-1] == 0`, set it to 1

    * if `temp[arr[i]-1] == 1`, the number is repeating, so output `arr[i]`

3. Traverse `temp[]` and output `i + 1` for the index `i` whose entry is still 0; this is the missing number.

**Time Complexity**: $O(n)$

**Auxiliary Space**: $O(n)$

In [12]:
def find_two_elems(arr):

    n = len(arr)
    temp = [0] * n
    repeated_num = -1
    missing_num = -1

    for i in range(n):
        temp[arr[i]-1] += 1

        if temp[arr[i]-1] > 1:
            repeated_num = arr[i]

    for i in range(n):
        if temp[i] == 0:
            missing_num = i + 1
            break

    print('The repeated number is:', repeated_num)
    print('The missing number is:', missing_num)

In [13]:
# Example
arr = [7, 3, 4, 5, 5, 6, 2]
find_two_elems(arr)

The repeated number is: 5
The missing number is: 1


**Approach #2**: Use elements as index and mark visited ones

1. Traverse the array and use the absolute value of every element, minus 1, as an index

2. Make the value at this index negative to mark it as visited

    * If the value at this index is already negative, then the current element is the repeated one

    * When traversing the array again, the index `i` that still holds a positive value gives the missing element `i + 1`

**Time Complexity**: $O(n)$

**Auxiliary Space**: $O(1)$

In [14]:
def find_two_elems_marking(arr, size):

    for i in range(size):
        if arr[abs(arr[i])-1] > 0:
            arr[abs(arr[i])-1] = - arr[abs(arr[i])-1]
        else:
            print('The repeated number is:', abs(arr[i]))
    
    for i in range(size):
        if arr[i] > 0:
            print('The missing element is:', i + 1)

    print(arr)

In [16]:
# Example
arr = [7, 3, 4, 5, 5, 6, 2]
n = len(arr)
find_two_elems_marking(arr, n)

The repeated number is: 5
The missing element is: 1
[7, -3, -4, -5, -5, -6, -2]
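
As the printed array shows, the constant auxiliary space of this approach is paid for by overwriting the input. The sketch below is one possible variant that returns the two numbers instead of printing them and restores the original values before returning. The name `find_two_elems_restoring` is hypothetical, and the code assumes all inputs are positive integers in the range 1 to $n$, as the exercise states.

```python
def find_two_elems_restoring(arr):
    repeated_num = -1
    missing_num = -1

    # Mark each visited value by negating the entry at index value - 1
    for i in range(len(arr)):
        ind = abs(arr[i]) - 1
        if arr[ind] > 0:
            arr[ind] = -arr[ind]
        else:
            # Already marked, so this value has been seen before
            repeated_num = abs(arr[i])

    # The only index left positive belongs to the missing number
    for i in range(len(arr)):
        if arr[i] > 0:
            missing_num = i + 1

    # Undo the marking so the caller gets the input back unchanged
    for i in range(len(arr)):
        arr[i] = abs(arr[i])

    return repeated_num, missing_num


arr = [7, 3, 4, 5, 5, 6, 2]
print(find_two_elems_restoring(arr))
print(arr)
```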


#### Two elements whose sum is closest to zero

An array of integers is given (both negative and positive). You need to find the two elements such that their sum is closest to zero.

* **Method #1**: Simple

For each element, compute its sum with every other element in the array and compare the sums. Finally, return the pair whose sum has the smallest absolute value.

**Time Complexity**: $O(n^2)$

**Auxiliary Space**: $O(1)$

In [19]:
def min_sum_pairs(arr, size):

    # Initialise with the first pair
    min_ind_1 = 0
    min_ind_2 = 1
    min_sum = arr[0] + arr[1]

    for ind_1 in range(size-1):
        for ind_2 in range(ind_1+1, size):

            temp_sum = arr[ind_1] + arr[ind_2]

            # Record the pair whenever its sum is closer to zero
            if abs(min_sum) > abs(temp_sum):
                min_sum = temp_sum
                min_ind_1 = ind_1
                min_ind_2 = ind_2
        
    print('The two elements whose sum is closest to zero are:')
    print(arr[min_ind_1], 'and', arr[min_ind_2])
    print('The sum is:', min_sum)


In [20]:
# Example
arr = [1, 60, -10, 70, -80, 85]

min_sum_pairs(arr, len(arr))

The two elements whose sum is closest to zero are:
-80 and 85
The sum is: 5
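
The same brute-force idea can be written more compactly with `itertools.combinations`, which yields every unordered pair exactly once. This is a sketch rather than the original page's method; the name `closest_to_zero_pair` is made up here, and the function assumes the array holds at least two elements.

```python
from itertools import combinations


def closest_to_zero_pair(arr):
    # Pick the pair whose sum has the smallest absolute value
    return min(combinations(arr, 2), key=lambda pair: abs(pair[0] + pair[1]))


pair = closest_to_zero_pair([1, 60, -10, 70, -80, 85])
print(pair, 'sum:', sum(pair))
```

The running time is still $O(n^2)$, since all pairs are examined.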


* **Method #2**: Use sorting

1. Sort all the elements in the input array

2. Traverse the array from left and right ends using two index variables: left = $0$ and right = $n - 1$

3. `sum = arr[left] + arr[right]`

    * If sum is negative, then increase left

    * If sum is positive, then reduce right

    * Keep track of the absolute min sum

4. Repeat step 3 while left < right

**Time Complexity**: 

* Sorting: $O(nlog\ n)$ 

* Finding optimum pair: $O(n)$

* Total: $O(nlog\ n)$

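**Auxiliary Space**: $O(1)$. If the recursive call stack of the quick sort is considered, then the auxiliary space will be $O(log\ n)$ on average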

In [24]:
# Partition for sorting
def partition(arr, start_ind, end_ind):

    x = arr[end_ind]
    i = start_ind - 1

    for j in range(start_ind, end_ind):

        if arr[j] <= x:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]

    arr[i+1], arr[end_ind] = arr[end_ind], arr[i+1]

    return i + 1

# Apply quick sort algorithm
def quick_sort(arr, start_ind, end_ind):

    # Initialise partition index
    part_ind = 0

    if start_ind < end_ind:

        part_ind = partition(arr, start_ind, end_ind)
        quick_sort(arr, start_ind, part_ind-1)
        quick_sort(arr, part_ind+1, end_ind)

# Find minimum sum
def min_sum_sorting(arr, size):

    # Initialise
    temp_sum, min_sum = 0, float('inf')  # inf, so the first pair always becomes the minimum

    left = 0
    right = size - 1

    # Keep track of min left and min right
    min_left = left
    min_right = size - 1

    # Sort array
    quick_sort(arr, left, right)

    while left < right:

        temp_sum = arr[left] + arr[right]

        # Check if abs sum is < min_sum
        if abs(temp_sum) < abs(min_sum):
            min_sum = temp_sum
            min_left = left
            min_right = right
        
        # Update indexes accordingly
        if temp_sum < 0:
            left += 1
        else:
            right -= 1

    print('The two elements whose sum is closest to zero are:')
    print(arr[min_left], 'and', arr[min_right])
    print('The sum is:', arr[min_left] + arr[min_right])

In [25]:
# Example
arr = [1, 60, -10, 70, -80, 85]
n = len(arr)

min_sum_sorting(arr, n)

The two elements whose sum is closest to zero are:
-80 and 85
The sum is: 5
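
If mutating the input is not desired, the hand-written quick sort can be swapped for Python's built-in `sorted`, at the cost of an $O(n)$ copy. The sketch below returns the pair instead of printing it; the name `closest_to_zero_sorted` is an assumption for this example, and the array is assumed to hold at least two elements.

```python
def closest_to_zero_sorted(arr):
    # sorted() returns a new list, so the caller's array is left untouched
    values = sorted(arr)
    left, right = 0, len(values) - 1
    best = (values[left], values[right])

    while left < right:
        current = values[left] + values[right]

        # Keep the pair whose sum is closest to zero so far
        if abs(current) < abs(best[0] + best[1]):
            best = (values[left], values[right])

        if current < 0:
            left += 1       # need a larger sum
        elif current > 0:
            right -= 1      # need a smaller sum
        else:
            break           # a sum of exactly zero cannot be improved

    return best


arr = [1, 60, -10, 70, -80, 85]
print(closest_to_zero_sorted(arr))
```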
