## Sorting boot camp

It's important to know how to use the sort functionality provided by your language's library effectively. Suppose we are given a student class that implements a compare method that compares students by name. Then, by default, the library sort routine will sort an array of students by name. To sort the array by GPA instead, we have to pass an explicit key (or compare) function to the sort routine. 

In [5]:
class Student(object):
    def __init__(self, name: str, grade_point_average: float) -> None:
        self.name = name
        self.grade_point_average = grade_point_average
        
    def __lt__(self, other: 'Student') -> bool:
        return self.name < other.name

In [6]:
students = [
    Student('A', 4.0),
    Student('B', 3.0),
    Student('C', 2.0),
    Student('D', 3.2)
]

In [7]:
# Sort according to __lt__ defined in Student; students itself remains unchanged.
students_sort_by_name = sorted(students)

In [8]:
print(students_sort_by_name)

[<__main__.Student object at 0x10bc8f780>, <__main__.Student object at 0x10bc8f5f8>, <__main__.Student object at 0x10bc8f630>, <__main__.Student object at 0x10bc8f668>]


In [10]:
for i in students_sort_by_name:
    print(i.name, i.grade_point_average)

A 4.0
B 3.0
C 2.0
D 3.2


In [11]:
# Sort students in-place by grade_point_average
students.sort(key = lambda student: student.grade_point_average)

In [12]:
for i in students:
    print(i.name, i.grade_point_average)

C 2.0
B 3.0
D 3.2
A 4.0


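The time complexity of any reasonable library sort is O(n log n) for an array with n entries. Many library sorting functions are based on quicksort, which sorts in place and needs only O(log n) additional space for its recursion stack. Python's sorted and list.sort use Timsort instead, a stable merge-sort hybrid that may need O(n) additional space. 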
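
Because Python's built-in sort is stable, a multi-key ordering can be written either as a single tuple key or as two successive sorts. The following is a minimal sketch of both; the `(name, GPA)` tuples and the extra student `E` are illustrative assumptions, not part of the example above.

```python
from operator import itemgetter

# Hypothetical (name, GPA) records; 'E' ties with 'B' on GPA.
records = [('A', 4.0), ('E', 3.0), ('B', 3.0), ('C', 2.0), ('D', 3.2)]

# Highest GPA first, ties broken alphabetically by name.
by_gpa_desc = sorted(records, key=lambda r: (-r[1], r[0]))

# Same ordering via two stable sorts: secondary key first, then primary key.
two_pass = sorted(sorted(records, key=itemgetter(0)),
                  key=itemgetter(1), reverse=True)

print(by_gpa_desc)
print(by_gpa_desc == two_pass)
```

The tuple key is usually the shorter option; the two-pass form is handy when the keys need different sort directions and cannot simply be negated (strings, for example).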

## 13.1 Compute the intersection of two sorted arrays 

Write a program which takes as input two sorted arrays, and returns a new array containing elements that are present in both of the input arrays. The input arrays may have duplicate entries, but the returned array should be free of duplicates. For example, if the input is <2,3,3,5,5,6,7,7,8,12> and <5,5,6,8,8,9,10,10>, your output should be <5,6,8>.

**Sol:** The brute-force algorithm is a "loop join", i.e., traversing through all the elements of one array and comparing them to the elements of the other array. Let m and n be the lengths of the two input arrays. The brute-force algorithm has O(mn) time complexity. 

In [13]:
def intersect_two_sorted_arrays(A : list, B : list) -> list:
    return [a for i, a in enumerate(A) if (i == 0 or a!= A[i-1]) and a in B]

In [14]:
A = [2,3,3,5,5,6,7,7,8,12]
B = [5,5,6,8,8,9,10,10]
intersect_two_sorted_arrays(A, B)

[5, 6, 8]

In [16]:
C = [a for a in A if a in B]

In [17]:
print(C)

[5, 5, 6, 8]


Since both arrays are sorted, we can make some optimizations. First, we can iterate through the first array and use binary search to test whether each element is present in the second array. The time complexity is O(m log n), where m is the length of the array being iterated over and n is the length of the array being searched. We can further improve the run time by choosing the shorter array for the outer loop: if n is much smaller than m, then n log(m) is much smaller than m log(n). 

In [20]:
import bisect

In [30]:
def intersect_two_sorted_arrays_2(A: list, B: list) -> list:
    def is_present(k):
        i = bisect.bisect_left(B,k)
        return i < len(B) and B[i] == k
    
    return [
        a for i, a in enumerate(A)
        if (i == 0 or a!= A[i-1]) and is_present(a)
    ]

In [31]:
intersect_two_sorted_arrays_2(A, B)

[5, 6, 8]
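
The function above always iterates over `A`. A minimal sketch of the shorter-array optimization swaps the arguments first, so that the outer loop runs over the shorter array and the binary search runs over the longer one. The name `intersect_shorter_outer` is illustrative.

```python
import bisect

def intersect_shorter_outer(A: list, B: list) -> list:
    # Iterate over the shorter array and binary-search the longer one,
    # so the cost is O(min(m, n) * log(max(m, n))).
    if len(A) > len(B):
        A, B = B, A

    def is_present(k) -> bool:
        i = bisect.bisect_left(B, k)
        return i < len(B) and B[i] == k

    return [
        a for i, a in enumerate(A)
        if (i == 0 or a != A[i - 1]) and is_present(a)
    ]

print(intersect_shorter_outer([2, 3, 3, 5, 5, 6, 7, 7, 8, 12],
                              [5, 5, 6, 8, 8, 9, 10, 10]))  # [5, 6, 8]
```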

In [23]:
bisect.bisect_left(B,4)

0

In [24]:
bisect.bisect_left(B,5)

0

In [25]:
print(B)

[5, 5, 6, 8, 8, 9, 10, 10]


In [26]:
bisect.bisect_left(B,6)

2

In [33]:
bisect.bisect_left(B,11)

8

In [34]:
len(B)

8

This is the best solution if one array is much smaller than the other. However, it is not the best when the array lengths are similar, because we are not exploiting the fact that both arrays are sorted. We can achieve linear run time by simultaneously advancing through the two input arrays in increasing order. At each iteration, if the array elements differ, the smaller one can be eliminated. If they are equal, we add that value to the intersection and advance both. (We handle duplicates by comparing the current element with the previous one.)

In [35]:
def intersect_two_sorted_arrays_3(A : list, B : list) -> list:
    i, j, intersection_A_B = 0, 0, []
    while i < len(A) and j < len(B):
        if A[i] == B[j]:
            if i == 0 or A[i] != A[i-1]:
                intersection_A_B.append(A[i])
            i,j = i+1, j+1
        elif A[i] < B[j]:
            i += 1
        else: # A[i] > B[j]
            j += 1
    return intersection_A_B

In [36]:
intersect_two_sorted_arrays_3(A, B)

[5, 6, 8]

Since we spend O(1) time per input array element, the time complexity for the entire algorithm is O(m+n). 

## 13.2 Merge two sorted arrays

Write a program which takes as input two sorted arrays of integers, and updates the first to the combined entries of the two arrays in sorted order. Assume the first array has enough empty entries at its end to hold the result. 

In [45]:
def merge_two_sorted_array(A: list, m: int, B: list, n: int) -> None:
    a, b, write_idx = m-1, n-1, m+n -1
    while a >= 0 and b>= 0:
        if A[a] > B[b]: 
            A[write_idx] = A[a]
            write_idx -= 1
            a -= 1
        else:
            A[write_idx] = B[b]
            write_idx -= 1
            b -= 1
    while b >= 0: # copy any remaining entries of B into A
        A[write_idx] = B[b]
        write_idx -= 1
        b -= 1

In [39]:
A = [3,13,17,None,None,None,None,None]

In [40]:
print(A)

[3, 13, 17, None, None, None, None, None]


In [41]:
B = [3,7,11,19]

In [43]:
merge_two_sorted_array(A,3,B,4)

In [44]:
print(A)

[3, 3, 7, 11, 13, 17, 19, None]


The time complexity is O(m + n) and the additional space complexity is O(1). 
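
Filling from the back is what makes the in-place merge safe: the write index always equals a + b + 1, so it stays ahead of the unread entries of the first array. The sketch below is a compact variant with that invariant asserted. The name `merge_from_back` is illustrative, and the assertion is there for exposition only.

```python
def merge_from_back(A: list, m: int, B: list, n: int) -> None:
    a, b, write_idx = m - 1, n - 1, m + n - 1
    # Once B is exhausted, the remaining entries of A are already in place.
    while b >= 0:
        # The write position never overwrites an unread entry of A.
        assert write_idx == a + b + 1 and write_idx > a
        if a >= 0 and A[a] > B[b]:
            A[write_idx] = A[a]
            a -= 1
        else:
            A[write_idx] = B[b]
            b -= 1
        write_idx -= 1

A = [3, 13, 17, None, None, None, None, None]
merge_from_back(A, 3, [3, 7, 11, 19], 4)
print(A)  # [3, 3, 7, 11, 13, 17, 19, None]
```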

## 13.3 Computing the h-index

Given an array of positive integers, find the largest h such that there are at least h entries in the array that are greater than or equal to h. 

In [46]:
def h_index(citations: list) -> int:
    citations.sort()
    n = len(citations)
    for i, c in enumerate(citations):
        if c >= n -i:
            return n- i
    return 0

In [47]:
citations = [1,4,1,4,2,1,3,5,6]
h_index(citations)

4

In [48]:
citations.sort()

In [49]:
citations

[1, 1, 1, 2, 3, 4, 4, 5, 6]

In [50]:
n = len(citations)

In [51]:
n

9

In [52]:
for i, c in enumerate(citations):
    print(i,c)
    print(n-i)
    print(c >= n-i)

0 1
9
False
1 1
8
False
2 1
7
False
3 2
6
False
4 3
5
False
5 4
4
True
6 4
3
True
7 5
2
True
8 6
1
True


The time complexity is O(n log n) for sorting and the space complexity is O(1).
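
As a cross-check, the definition of the h-index can be translated directly into code. This is a brute-force sketch for validating the sorted solution on small inputs, not a replacement for it; the name `h_index_brute_force` is illustrative.

```python
def h_index_brute_force(citations: list) -> int:
    # Try candidates from largest to smallest and return the first h
    # for which at least h entries are >= h. Runs in O(n^2) time.
    for h in range(len(citations), 0, -1):
        if sum(1 for c in citations if c >= h) >= h:
            return h
    return 0

print(h_index_brute_force([1, 4, 1, 4, 2, 1, 3, 5, 6]))  # 4
```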

## 13.4 Remove first-name duplicates

Design an efficient algorithm for removing all first-name duplicates from an array. For example, if the input is <(Ian,Botham), (David,Gower), (Ian,Bell), (Ian,Chappell)>, one result could be <(Ian,Bell), (David,Gower)>; <(David,Gower), (Ian,Botham)> would also be acceptable. 

**Hint:** Bring equal items close together. 

In [53]:
class Name:
    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name, self.last_name = first_name, last_name
        
    def __eq__(self, other) -> bool:
        return self.first_name == other.first_name
    
    def __lt__(self, other) -> bool:
        return (self.first_name < other.first_name
               if self.first_name != other.first_name else
               self.last_name < other.last_name)
    
def eliminate_duplicate(A: list) -> None:
    A.sort() # Makes identical elements become neighbors 
    write_idx = 1
    for cand in A[1:]:
        if cand != A[write_idx - 1]:
            A[write_idx] = cand
            write_idx += 1
    del A[write_idx:]

In [55]:
A = Name('Ian', 'Botham')
B = Name('David', 'Gower')
C = Name('Ian', 'Bell')
D = Name('Ian', 'Chappell')
E = Name('David', 'Brown')
F = Name('Hui', 'Zhong')
G = Name('Bao', 'Xiao')
Name_list = [A,B,C,D,E,F,G]

In [56]:
eliminate_duplicate(Name_list)

In [57]:
for name in Name_list:
    print(name.first_name, name.last_name)

Bao Xiao
David Brown
Hui Zhong
Ian Bell


The time complexity is O(n log n). It uses O(1) additional space. 

## 13.5 Smallest nonconstructible value 

Given a set of coins, there are some amounts of change that you may not be able to make with them, e.g., you cannot create a change amount greater than the sum of your coins. For example, if your coins are 1,1,1,1,1,5,10,25, then the smallest value of change which cannot be made is 21.

Write a program which takes an array of positive integers and returns the smallest number which is not the sum of a subset of elements of the array. 

**Sol:** Suppose a collection of numbers can produce every value up to and including V, but not V+1. Consider the effect of adding a new element u to the collection. 
* If u <= V+1, we can still produce every value up to and including V+u, and we cannot produce V+u+1;
* On the other hand, if u > V+1, then even by adding u to the collection we cannot produce V+1. 

Sorting the array allows us to stop when we reach a value that is too large to help, since all subsequent values are at least as large as that value. Specifically, let M[i-1] be the maximum constructible amount from the first i elements of the sorted array. If the next array element x is greater than M[i-1]+1, M[i-1] is still the maximum constructible amount, so we stop and return M[i-1]+1 as the result. Otherwise, we set M[i] = M[i-1]+x and continue with element (i+1). 

In [70]:
def smallest_nonconstructible_value(A: list) -> int:
    max_constructible_value = 0
    for a in sorted(A):
        if a > max_constructible_value + 1:
            # a is too large to help, and so is every later element
            break
        max_constructible_value += a
    return max_constructible_value + 1

In [71]:
A = [12,2,1,15,2,4]
smallest_nonconstructible_value(A)

10

In [62]:
A.sort()

In [63]:
print(A)

[1, 2, 2, 4, 12, 15]


The time complexity as a function of n, the length of the array, is O(n log n) to sort and O(n) to iterate, i.e., O(n log n). 
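
The greedy argument can be sanity-checked on small inputs by enumerating every subset sum. The sketch below is a brute-force check only: it can take exponential time and space, and the name `smallest_nonconstructible_brute_force` is illustrative.

```python
def smallest_nonconstructible_brute_force(A: list) -> int:
    # Build the set of all subset sums, one element at a time.
    sums = {0}
    for a in A:
        sums |= {s + a for s in sums}
    # Return the smallest positive integer that is not a subset sum.
    value = 1
    while value in sums:
        value += 1
    return value

# The coin example from the problem statement.
print(smallest_nonconstructible_brute_force([1, 1, 1, 1, 1, 5, 10, 25]))  # 21
```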

## 13.6 Render a calendar 

Consider the problem of designing an online calendaring application. One component of the design is to render the calendar, i.e., display it visually. 

Write a program that takes a set of events, and determines the maximum number of events that take place concurrently. 

**Hint:** Focus on endpoints. 

As we proceed through the endpoints in sorted order, we can incrementally track the number of events taking place at each endpoint using a counter. For each endpoint that is the start of an interval, we increment the counter by 1, and for each endpoint that is the end of an interval, we decrement the counter by 1. The maximum value attained by the counter is the maximum number of overlapping intervals. 

In [72]:
import collections

In [75]:
# Event is a tuple (start_time, end_time)
Event = collections.namedtuple('Event', ('start', 'finish'))

# Endpoint is a tuple (time, is_start): is_start is True for the start of an event and False for its end
Endpoint = collections.namedtuple('Endpoint', ('time', 'is_start'))

def find_max_simultaneous_events(A: list) -> int:
    # Builds an array of all endpoints
    E = [
        p for event in A for p in (Endpoint(event.start, True),
                                   Endpoint(event.finish, False))
    ]
    # Sorts the endpoint array according to the time, breaking ties by putting start times before end times
    E.sort(key = lambda e: (e.time, not e.is_start))
    
    # Track the number of simultaneous events, record the maximum number of simultaneous events.
    max_num_simultaneous_events, num_simultaneous_events = 0, 0 
    for e in E:
        if e.is_start:
            num_simultaneous_events += 1
            max_num_simultaneous_events = max(max_num_simultaneous_events,
                                             num_simultaneous_events)
        else:
            num_simultaneous_events -= 1
    return max_num_simultaneous_events
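
As a usage sketch, the endpoint sweep can be exercised on a small hand-made schedule. The version below is a compact restatement using plain `(start, finish)` tuples so that it runs on its own; the function name `max_concurrent` and the sample events are illustrative assumptions, not taken from the text.

```python
def max_concurrent(events: list) -> int:
    # Tag starts with 0 and finishes with 1 so that, at equal times,
    # a start sorts before a finish (events that touch count as overlapping).
    endpoints = sorted(
        [(start, 0) for start, _ in events] +
        [(finish, 1) for _, finish in events]
    )
    best = current = 0
    for _, is_finish in endpoints:
        if is_finish:
            current -= 1
        else:
            current += 1
            best = max(best, current)
    return best

# Hypothetical events: three of them are in progress at time 5.
events = [(1, 5), (2, 7), (5, 9), (10, 12)]
print(max_concurrent(events))  # 3
```

Sorting the 2n endpoints dominates the cost, so the sweep runs in O(n log n) time for n events.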