## Sorting boot camp

It's important to know how to use the sort functionality provided by your language's library effectively. Suppose we are given a Student class that implements a comparison method (`__lt__` in Python) that compares students by name. Then by default, the library sort routine sorts by name. To sort an array of students by GPA, we have to explicitly pass a key (or comparison) function to the sort routine. 

In [5]:
class Student(object):
    def __init__(self, name: str, grade_point_average: float) -> None:
        self.name = name
        self.grade_point_average = grade_point_average
        
    def __lt__(self, other: 'Student') -> bool:
        return self.name < other.name

In [6]:
students = [
    Student('A', 4.0),
    Student('B', 3.0),
    Student('C', 2.0),
    Student('D', 3.2)
]

In [7]:
# Sort according to __lt__ defined in Student; students itself remains unchanged.
students_sort_by_name = sorted(students)

In [8]:
print(students_sort_by_name)

[<__main__.Student object at 0x10bc8f780>, <__main__.Student object at 0x10bc8f5f8>, <__main__.Student object at 0x10bc8f630>, <__main__.Student object at 0x10bc8f668>]


In [10]:
for i in students_sort_by_name:
    print(i.name, i.grade_point_average)

A 4.0
B 3.0
C 2.0
D 3.2


In [11]:
# Sort students in-place by grade_point_average
students.sort(key = lambda student: student.grade_point_average)

In [12]:
for i in students:
    print(i.name, i.grade_point_average)

C 2.0
B 3.0
D 3.2
A 4.0


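The time complexity of any reasonable library sort is O(n log n) for an array with n entries. Many library sorting functions are based on quicksort, which sorts in place apart from its recursion stack. Python's `sorted` and `list.sort` instead use Timsort, a stable mergesort variant that may need O(n) additional space. 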
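
As a small illustration of the ways to specify a sort order mentioned above, the sketch below sorts by a key function, by a composite key, and by an explicit comparison function wrapped with `functools.cmp_to_key`. The `Record` type and its sample data are hypothetical and exist only for this example.

```python
import collections
import functools
import operator

# Hypothetical record type used only for this illustration.
Record = collections.namedtuple('Record', ('name', 'gpa'))

records = [Record('B', 3.0), Record('A', 4.0), Record('D', 3.0), Record('C', 2.0)]

# Key function: sort by GPA. Python's sort is stable, so B stays before D.
by_gpa = sorted(records, key=operator.attrgetter('gpa'))

# Composite key: GPA descending, then name ascending.
by_gpa_desc_then_name = sorted(records, key=lambda r: (-r.gpa, r.name))

# Explicit comparison function, adapted to a key with functools.cmp_to_key.
def compare_by_name(x: Record, y: Record) -> int:
    return (x.name > y.name) - (x.name < y.name)

by_name = sorted(records, key=functools.cmp_to_key(compare_by_name))

print([r.name for r in by_gpa])                 # ['C', 'B', 'D', 'A']
print([r.name for r in by_gpa_desc_then_name])  # ['A', 'B', 'D', 'C']
print([r.name for r in by_name])                # ['A', 'B', 'C', 'D']
```

A key function is usually the simpler choice in Python; a comparison function is mainly useful when porting code written for a compare-based API.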

## 13.1 Compute the intersection of two sorted arrays 

Write a program which takes as input two sorted arrays, and returns a new array containing elements that are present in both of the input arrays. The input arrays may have duplicate entries, but the returned array should be free of duplicates. For example, if the input is <2,3,3,5,5,6,7,7,8,12> and <5,5,6,8,8,9,10,10>, your output should be <5,6,8>.

**Sol:** The brute-force algorithm is a "loop join", i.e., traversing through all the elements of one array and comparing them to the elements of the other array. Let m and n be the lengths of the two input arrays. The brute-force algorithm has O(mn) time complexity. 

In [13]:
def intersect_two_sorted_arrays(A : list, B : list) -> list:
    return [a for i, a in enumerate(A) if (i == 0 or a!= A[i-1]) and a in B]

In [14]:
A = [2,3,3,5,5,6,7,7,8,12]
B = [5,5,6,8,8,9,10,10]
intersect_two_sorted_arrays(A, B)

[5, 6, 8]

In [16]:
C = [a for a in A if a in B]

In [17]:
print(C)

[5, 5, 6, 8]


Since both arrays are sorted, we can make some optimizations. First, we can iterate through the first array and use binary search to test whether each element is present in the second array. The time complexity is O(m log n), where m is the length of the array being iterated over and n is the length of the array being searched. We can further improve the run time by iterating over the shorter array: if n is much smaller than m, then n log m is much smaller than m log n. 

In [20]:
import bisect

In [30]:
def intersect_two_sorted_arrays_2(A: list, B: list) -> list:
    def is_present(k):
        i = bisect.bisect_left(B,k)
        return i < len(B) and B[i] == k
    
    return [
        a for i, a in enumerate(A)
        if (i == 0 or a!= A[i-1]) and is_present(a)
    ]

In [31]:
intersect_two_sorted_arrays_2(A, B)

[5, 6, 8]

In [23]:
bisect.bisect_left(B,4)

0

In [24]:
bisect.bisect_left(B,5)

0

In [25]:
print(B)

[5, 5, 6, 8, 8, 9, 10, 10]


In [26]:
bisect.bisect_left(B,6)

2

In [33]:
bisect.bisect_left(B,11)

8

In [34]:
len(B)

8

This is the best solution if one array is much smaller than the other. However, it is not the best when the array lengths are similar, because it does not fully exploit the fact that both arrays are sorted. We can achieve linear runtime by simultaneously advancing through the two input arrays in increasing order. At each iteration, if the array elements differ, the smaller one can be eliminated. If they are equal, we add that value to the intersection and advance both. (We handle duplicates by comparing the current element with the previous one.)

In [35]:
def intersect_two_sorted_arrays_3(A : list, B : list) -> list:
    i, j, intersection_A_B = 0, 0, []
    while i < len(A) and j < len(B):
        if A[i] == B[j]:
            if i == 0 or A[i] != A[i-1]:
                intersection_A_B.append(A[i])
            i,j = i+1, j+1
        elif A[i] < B[j]:
            i += 1
        else: # A[i] > B[j]
            j += 1
    return intersection_A_B

In [36]:
intersect_two_sorted_arrays_3(A, B)

[5, 6, 8]

Since we spend O(1) time per input array element, the time complexity for the entire algorithm is O(m+n). 
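
One way to gain confidence in a two-pointer scan like this is to compare it against a set-based reference on random sorted inputs. The sketch below restates the scan under a hypothetical name, `intersect_sorted`, so that it runs on its own; the random test harness is an addition for illustration and is not part of the original solution.

```python
import random

def intersect_sorted(A: list, B: list) -> list:
    # Two-pointer scan over two sorted lists; emits each common value once.
    i, j, result = 0, 0, []
    while i < len(A) and j < len(B):
        if A[i] == B[j]:
            if not result or result[-1] != A[i]:
                result.append(A[i])
            i, j = i + 1, j + 1
        elif A[i] < B[j]:
            i += 1
        else:  # A[i] > B[j]
            j += 1
    return result

random.seed(0)
for _ in range(1000):
    A = sorted(random.choices(range(20), k=random.randint(0, 15)))
    B = sorted(random.choices(range(20), k=random.randint(0, 15)))
    # Reference answer: sorted set intersection.
    assert intersect_sorted(A, B) == sorted(set(A) & set(B))
```

The set-based reference uses O(m + n) extra space, which is why it serves here only as a correctness check.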

## 13.2 Merge two sorted arrays

Write a program which takes as input two sorted arrays of integers, and updates the first to the combined entries of the two arrays in sorted order. Assume the first array has enough empty entries at its end to hold the result. 

In [45]:
def merge_two_sorted_array(A: list, m: int, B: list, n: int) -> None:
    a, b, write_idx = m-1, n-1, m+n -1
    while a >= 0 and b>= 0:
        if A[a] > B[b]: 
            A[write_idx] = A[a]
            write_idx -= 1
            a -= 1
        else:
            A[write_idx] = B[b]
            write_idx -= 1
            b -= 1
    while b >= 0: # copy the remaining entries of B into A
        A[write_idx] = B[b]
        write_idx -= 1
        b -= 1

In [39]:
A = [3,13,17,None,None,None,None,None]

In [40]:
print(A)

[3, 13, 17, None, None, None, None, None]


In [41]:
B = [3,7,11,19]

In [43]:
merge_two_sorted_array(A,3,B,4)

In [44]:
print(A)

[3, 3, 7, 11, 13, 17, 19, None]


The time complexity is O(m + n) and the additional space complexity is O(1). 

## 13.3 Computing the h-index

Given an array of positive integers, find the largest h such that there are at least h entries in the array that are greater than or equal to h. 

In [46]:
def h_index(citations: list) -> int:
    citations.sort()
    n = len(citations)
    for i, c in enumerate(citations):
        if c >= n -i:
            return n- i
    return 0

In [47]:
citations = [1,4,1,4,2,1,3,5,6]
h_index(citations)

4

In [48]:
citations.sort()

In [49]:
citations

[1, 1, 1, 2, 3, 4, 4, 5, 6]

In [50]:
n = len(citations)

In [51]:
n

9

In [52]:
for i, c in enumerate(citations):
    print(i,c)
    print(n-i)
    print(c >= n-i)

0 1
9
False
1 1
8
False
2 1
7
False
3 2
6
False
4 3
5
False
5 4
4
True
6 4
3
True
7 5
2
True
8 6
1
True


The time complexity is O(n log n) for sorting and the space complexity is O(1).

## 13.4 Remove first-name duplicates

Design an efficient algorithm for removing all first-name duplicates from an array. For example, if the input is <(Ian,Botham), (David,Gower), (Ian,Bell), (Ian,Chappell)>, one result could be <(Ian,Bell), (David,Gower)>; <(David,Gower), (Ian,Botham)> would also be acceptable. 

**Hint:** Bring equal items close together. 

In [53]:
class Name:
    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name, self.last_name = first_name, last_name
        
    def __eq__(self, other) -> bool:
        return self.first_name == other.first_name
    
    def __lt__(self, other) -> bool:
        return (self.first_name < other.first_name
               if self.first_name != other.first_name else
               self.last_name < other.last_name)
    
def eliminate_duplicate(A: list) -> None:
    A.sort() # Makes identical elements become neighbors 
    write_idx = 1
    for cand in A[1:]:
        if cand != A[write_idx - 1]:
            A[write_idx] = cand
            write_idx += 1
    del A[write_idx:]

In [55]:
A = Name('Ian', 'Botham')
B = Name('David', 'Gower')
C = Name('Ian', 'Bell')
D = Name('Ian', 'Chappell')
E = Name('David', 'Brown')
F = Name('Hui', 'Zhong')
G = Name('Bao', 'Xiao')
Name_list = [A,B,C,D,E,F,G]

In [56]:
eliminate_duplicate(Name_list)

In [57]:
for name in Name_list:
    print(name.first_name, name.last_name)

Bao Xiao
David Brown
Hui Zhong
Ian Bell


The time complexity is O(n log n). It uses O(1) additional space. 
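
For comparison, a hash-based alternative is sketched below. This is a swapped-in technique, not the sorting approach used above: it keeps the first name seen for each first name in a dictionary, trading O(k) extra space (k distinct first names) for O(n) expected time. The function name `eliminate_duplicate_first_names` and the use of plain tuples instead of the `Name` class are assumptions made to keep the example self-contained.

```python
def eliminate_duplicate_first_names(names: list) -> list:
    # names is a list of (first_name, last_name) tuples.
    seen = {}
    for first, last in names:
        # setdefault stores the entry only if this first name is new.
        seen.setdefault(first, (first, last))
    # Dictionaries preserve insertion order (Python 3.7+).
    return list(seen.values())

names = [('Ian', 'Botham'), ('David', 'Gower'), ('Ian', 'Bell'), ('Ian', 'Chappell')]
print(eliminate_duplicate_first_names(names))
# [('Ian', 'Botham'), ('David', 'Gower')]
```

Unlike the sort-based version, this sketch returns a new list rather than updating the input in place.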

## 13.5 Smallest nonconstructible value 

Given a set of coins, there are some amounts of change that you may not be able to make with them, e.g., you cannot create a change amount greater than the sum of your coins. For example, if your coins are 1,1,1,1,1,5,10,25, then the smallest value of change which cannot be made is 21.

Write a program which takes an array of positive integers and returns the smallest number which is not the sum of a subset of elements of the array. 

**Sol:** Suppose a collection of numbers can produce every value up to and including V, but not V+1. Consider the effect of adding a new element u to the collection. 
* If u <= V+1, we can still produce every value up to and including V+u, and we cannot produce V+u+1;
* On the other hand, if u > V+1, then even by adding u to the collection we cannot produce V+1. 

Sorting the array allows us to stop when we reach a value that is too large to help, since all subsequent values are at least as large as that value. Specifically, let M[i-1] be the maximum constructible amount from the first i elements of the sorted array. If the next array element x is greater than M[i-1]+1, then M[i-1] is still the maximum constructible amount, so we stop and return M[i-1]+1 as the result. Otherwise, we set M[i] = M[i-1]+x and continue with element (i+1). 

In [70]:
def smallest_nonconstructible_value(A: list)-> int:
    max_constructible_value = 0
    for a in sorted(A):
        # a is too large to help: max_constructible_value + 1 cannot be made.
        if a > max_constructible_value + 1:
            break
        max_constructible_value += a
        print(a)
        print(max_constructible_value)
    return max_constructible_value + 1

In [71]:
A = [12,2,1,15,2,4]
smallest_nonconstructible_value(A)

1
1
2
3
2
5
4
9


10

In [62]:
A.sort()

In [63]:
print(A)

[1, 2, 2, 4, 12, 15]


The time complexity as a function of n, the length of the array, is O(n log n) to sort and O(n) to iterate, i.e., O(n log n). 
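
The greedy rule described above (stop at the first sorted element that exceeds V+1) can be cross-checked against an exhaustive enumeration of subset sums on small inputs. The sketch below is self-contained; the function names `smallest_nonconstructible_greedy` and `smallest_nonconstructible_brute_force` are hypothetical, and the brute-force version is exponential, so it is meant only as a test oracle.

```python
import itertools

def smallest_nonconstructible_greedy(A: list) -> int:
    max_constructible_value = 0
    for a in sorted(A):
        if a > max_constructible_value + 1:
            break
        max_constructible_value += a
    return max_constructible_value + 1

def smallest_nonconstructible_brute_force(A: list) -> int:
    # Enumerates every subset sum; exponential, so only for tiny inputs.
    sums = {sum(c) for r in range(len(A) + 1)
            for c in itertools.combinations(A, r)}
    v = 1
    while v in sums:
        v += 1
    return v

for coins in ([1, 1, 1, 1, 1, 5, 10, 25], [12, 2, 1, 15, 2, 4], [2, 3], []):
    assert (smallest_nonconstructible_greedy(coins)
            == smallest_nonconstructible_brute_force(coins))

print(smallest_nonconstructible_greedy([1, 1, 1, 1, 1, 5, 10, 25]))  # 21
print(smallest_nonconstructible_greedy([12, 2, 1, 15, 2, 4]))        # 10
```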

## 13.6 Render a calendar 

Consider the problem of designing an online calendaring application. One component of the design is to render the calendar, i.e., display it visually. 

Write a program that takes a set of events, and determines the maximum number of events that take place concurrently. 

**Hint:** Focus on endpoints. 

As we proceed through the endpoints in sorted order, we can incrementally track the number of events taking place at each endpoint using a counter. For each endpoint that is the start of an interval, we increment the counter by 1, and for each endpoint that is the end of an interval, we decrement the counter by 1. The maximum value attained by the counter is the maximum number of overlapping intervals. 

In [72]:
import collections

In [75]:
# Event is a tuple (start_time, end_time)
Event = collections.namedtuple('Event', ('start', 'finish'))

# Endpoint is a tuple (time, is_start); is_start is True for the start of an
# event and False for its end.
Endpoint = collections.namedtuple('Endpoint', ('time', 'is_start'))

def find_max_simultaneous_events(A: list) -> int:
    # Builds an array of all endpoints
    E = [
        p for event in A for p in (Endpoint(event.start, True),
                                   Endpoint(event.finish, False))
    ]
    # Sorts the endpoint array according to the time, breaking ties by putting start times before end times
    E.sort(key = lambda e:(e.time, not e.is_start))
    
    # Track the number of simultaneous events, record the maximum number of simultaneous events.
    max_num_simultaneous_events, num_simultaneous_events = 0, 0 
    for e in E:
        if e.is_start:
            num_simultaneous_events += 1
            max_num_simultaneous_events = max(max_num_simultaneous_events,
                                             num_simultaneous_events)
        else:
            num_simultaneous_events -= 1
    return max_num_simultaneous_events
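
The endpoint-counting idea can also be sketched in a compact, self-contained form and run on a small calendar. The function name `max_simultaneous`, the use of plain `(start, finish)` tuples, and the sample events are assumptions made for this illustration.

```python
def max_simultaneous(events: list) -> int:
    # Each event is a (start, finish) pair with closed endpoints.
    # Starts are tagged 0 and finishes 1 so that, at equal times, a start
    # sorts before a finish and the two events count as overlapping.
    endpoints = sorted(
        [(start, 0) for start, _ in events] +
        [(finish, 1) for _, finish in events])
    best = current = 0
    for _, kind in endpoints:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best

# Hypothetical sample calendar.
events = [(1, 5), (2, 7), (4, 5), (6, 10), (8, 9),
          (9, 17), (11, 13), (12, 15), (14, 15)]
print(max_simultaneous(events))  # 3
```

Sorting the 2n endpoints dominates the running time, so the approach takes O(n log n) time for n events.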

## 13.7 Merging Intervals 

Write a program which takes as input an array of disjoint closed intervals with integer endpoints, sorted by increasing order of left endpoint, and an interval to be added, and returns the union of the intervals in the array and the added interval. Your result should be expressed as a union of disjoint intervals sorted by left endpoint. 

The brute-force approach examines values that are not endpoints, which is wasteful, since if an integer point p is not an endpoint, it must lie in the same interval as p-1 does. A better approach is to focus on endpoints, and use the sorted property to quickly process intervals in the array. 

Specifically, processing the intervals in the array takes place in three stages:

1. First, we iterate through intervals which appear completely before the interval to be added. All these intervals are added directly to the result. 
2. As soon as we encounter an interval that intersects the interval to be added, we compute its union with the interval to be added. This union is itself an interval. We iterate through subsequent intervals, as long as they intersect with the union we are forming. This single union is added to the result. 
3. Finally, we iterate through the remaining intervals. Because the array was originally sorted, none of these can intersect with the interval to be added, so we add these intervals to the result. 

In [1]:
import collections

In [14]:
Interval = collections.namedtuple('Interval', ('left', 'right'))

def add_interval(disjoint_intervals: list,
                 new_interval: Interval) -> list:
    i, result = 0, []
    
    # Processes intervals in disjoint_intervals which come before new_interval.
    while(i < len(disjoint_intervals)
          and new_interval.left > disjoint_intervals[i].right):
        result.append(disjoint_intervals[i])
        i += 1
    
    # Processes intervals in disjoint_intervals which overlap with new_interval
    while(i < len(disjoint_intervals)
          and  new_interval.right >= disjoint_intervals[i].left):
        new_interval = (Interval(
            min(new_interval.left, disjoint_intervals[i].left),
            max(new_interval.right, disjoint_intervals[i].right)))
        i += 1
    
    # Processes intervals in disjoint_intervals which come after new_interval
    return result + [new_interval] + disjoint_intervals[i:]

In [15]:
I1 = Interval(-4,-1)

In [16]:
I2 = Interval(0,2)
I3 = Interval(3,6)
I4 = Interval(7,9)
I5 = Interval(11,12)
I6 = Interval(14,17)
new_interval = Interval(1,8)

In [17]:
disjoint_intervals = [I1,I2,I3,I4,I5,I6]
add_interval(disjoint_intervals, new_interval)

[Interval(left=-4, right=-1),
 Interval(left=0, right=9),
 Interval(left=11, right=12),
 Interval(left=14, right=17)]

Since the program spends O(1) time per entry, its time complexity is O(n). 

## 13.8 Compute the union of intervals 

In this problem we consider sets of intervals with integer endpoints; the intervals may be open or closed at either end. We want to compute the union of the intervals in such sets.

Design an algorithm that takes as input a set of intervals, and outputs their union expressed as a set of disjoint intervals. 

A fast approach is to process the intervals in sorted order, so that we can limit our attention to a subset of intervals as we proceed. Specifically, we begin by sorting the intervals on their left endpoints. The idea is that this allows us to avoid revisiting intervals which are entirely to the left of the interval currently being processed. 

As we iterate through the sorted array of intervals, we have the following cases:
* The interval most recently added to the result does not intersect the current interval, nor does its right endpoint equal the left endpoint of the current interval. In this case, we simply add the current interval to the end of the result array as a new interval. 
* The interval most recently added to the result intersects the current interval. In this case, we update the most recently added interval to the union of it with the current interval.
* The interval most recently added to the result has its right endpoint equal to the left endpoint of the current interval, and one (or both) of these endpoints is closed. In this case too, we update the most recently added interval to the union of it with the current interval. 


When sorting, if two intervals have the same left endpoint, we put intervals which are left-closed first. Remaining ties are broken arbitrarily. 

In [18]:
Endpoint = collections.namedtuple('Endpoint', ('is_closed', 'val'))

Interval = collections.namedtuple('Interval', ('left', 'right'))

In [57]:
def union_of_intervals(intervals: list) -> list:
    # Empty input
    if not intervals:
        return []
    
    # Sort intervals by left endpoint value. Among intervals with the same
    # left value, those with a closed left endpoint come first.
    intervals.sort(key = lambda i: (i.left.val, not i.left.is_closed))
    result = [intervals[0]]
    for i in intervals[1:]:
        if (i.left.val < result[-1].right.val or 
        (i.left.val == result[-1].right.val and (i.left.is_closed or result[-1].right.is_closed))):
            # i overlaps or touches the last interval in result; extend it
            # if i reaches further to the right.
            if (i.right.val > result[-1].right.val or
               (i.right.val == result[-1].right.val and i.right.is_closed)):
                result[-1] = Interval(result[-1].left, i.right)
        else:
            # i starts to the right of the last interval, or meets it at a
            # shared endpoint that is open on both sides.
            result.append(i)
    return result 

In [28]:
End11 = Endpoint(False, 0)
End12 = Endpoint(False, 3)
I1 = Interval(End11,End12)

In [29]:
print(I1)

Interval(left=Endpoint(is_closed=False, val=0), right=Endpoint(is_closed=False, val=3))


In [30]:
for is_closed, val in I1:  # each element of an Interval is an Endpoint
    print(is_closed, val)

False 0
False 3


In [31]:
I1.left.val

0

In [32]:
I1.left.is_closed

False

In [34]:
End21 = Endpoint(True,3)
End22= Endpoint(False,4)
I2 = Interval(End21, End22)

In [35]:
End31 = Endpoint(True,2)
End32= Endpoint(True,4)
I3 = Interval(End31, End32)

In [36]:
End41 = Endpoint(True,5)
End42= Endpoint(False,7)
I4 = Interval(End41, End42)

In [37]:
End51 = Endpoint(True,7)
End52= Endpoint(False,8)
I5 = Interval(End51, End52)

In [38]:
End61 = Endpoint(True,8)
End62= Endpoint(False,11)
I6 = Interval(End61, End62)

In [39]:
End71 = Endpoint(False,9)
End72= Endpoint(True,11)
I7 = Interval(End71, End72)

In [40]:
End81 = Endpoint(True,12)
End82= Endpoint(True,14)
I8 = Interval(End81, End82)

In [41]:
End91 = Endpoint(False,12)
End92= Endpoint(True,16)
I9 = Interval(End91, End92)

In [42]:
End101 = Endpoint(False,13)
End102= Endpoint(False,15)
I10 = Interval(End101, End102)

In [43]:
End111 = Endpoint(False,16)
End112= Endpoint(False,17)
I11 = Interval(End111, End112)

In [45]:
intervals = [I1,I2, I3, I4, I5, I6, I7, I8, I9, I10, I11]

In [58]:
union_of_intervals(intervals) 
# Among equal left values, closed left endpoints are encountered first; the sort in the next cell shows this order.

[Interval(left=Endpoint(is_closed=False, val=0), right=Endpoint(is_closed=True, val=4)),
 Interval(left=Endpoint(is_closed=True, val=5), right=Endpoint(is_closed=True, val=11)),
 Interval(left=Endpoint(is_closed=True, val=12), right=Endpoint(is_closed=False, val=17))]

In [53]:
intervals.sort(key = lambda i: (i.left.val, not i.left.is_closed))

In [54]:
for i in intervals:
    print(i)

Interval(left=Endpoint(is_closed=False, val=0), right=Endpoint(is_closed=False, val=3))
Interval(left=Endpoint(is_closed=True, val=2), right=Endpoint(is_closed=True, val=4))
Interval(left=Endpoint(is_closed=True, val=3), right=Endpoint(is_closed=False, val=4))
Interval(left=Endpoint(is_closed=True, val=5), right=Endpoint(is_closed=False, val=7))
Interval(left=Endpoint(is_closed=True, val=7), right=Endpoint(is_closed=False, val=8))
Interval(left=Endpoint(is_closed=True, val=8), right=Endpoint(is_closed=False, val=11))
Interval(left=Endpoint(is_closed=False, val=9), right=Endpoint(is_closed=True, val=11))
Interval(left=Endpoint(is_closed=True, val=12), right=Endpoint(is_closed=True, val=14))
Interval(left=Endpoint(is_closed=False, val=12), right=Endpoint(is_closed=True, val=16))
Interval(left=Endpoint(is_closed=False, val=13), right=Endpoint(is_closed=False, val=15))
Interval(left=Endpoint(is_closed=False, val=16), right=Endpoint(is_closed=False, val=17))


The time complexity for sorting is O(n log n) and for merging is O(n) as in 13.7. The total time complexity is O(n log n). 

## 13.9 Partitioning and sorting an array with many repeated entries 

Suppose you need to reorder the elements of a very large array so that equal elements appear together. For example, if the array is <b,a,c,b,d,a,b,d>, then <a,a,b,b,b,c,d,d> is an acceptable reordering, as is <d,d,c,a,a,b,b,b>. 

You are given an array of student objects. Each student has an integer-valued age field that is to be treated as a key. Rearrange the elements of the array so that students of equal age appear together. The order in which different ages appear is not important. How would your solution change if ages have to appear in sorted order? 

**Sol:** The brute-force solution is to sort the array, comparing on age. If the array length is n, the time complexity is O(n log n) and space complexity is O(1). The inefficiency in this approach stems from the fact that it does more than is required: the specification simply asks for students of equal age to be adjacent. 

We can iterate through the array and record the number of students of each age in a hash table. Specifically, keys are ages, and values are the corresponding counts. For the example array used below, <(Greg,14), (John,12), (Andy,11), (Jim,13), (Phil,12), (Bob,13), (Chip,13), (Tim,14)>, there are two students of age 14, two of age 12, one of age 11, and three of age 13. If we had a new array to write to, we could write the two students of age 14 starting at index 0, the two students of age 12 starting at index 0+2 = 2, the one student of age 11 at index 2+2 = 4, and the three students of age 13 starting at index 4+1 = 5. We would iterate through the original array and write each entry into the new array according to these offsets. For example, after the first four iterations, the new array would be <(Greg,14), , (John,12), , (Andy,11), (Jim,13), , >. 

The time complexity of this approach is O(n), but it entails O(n) additional space for the result array.

In [59]:
Person = collections.namedtuple('Person', ('age', 'name'))

In [95]:
def group_by_age(people: list) -> list:
    age_to_count = collections.Counter((person.age for person in people))
    # Starting offset of each age group in the result array.
    age_to_offset, offset = {}, 0
    for age, count in age_to_count.items():
        age_to_offset[age] = offset
        offset += count
    people_new = [None]*len(people)
    for person in people:
        # Write the person to the next free slot of their age group.
        people_new[age_to_offset[person.age]] = person
        age_to_offset[person.age] += 1
    return people_new

In [115]:
P1 = Person(14, 'Greg')
P2 = Person(12, 'John')
P3 = Person(11, 'Andy')
P4 = Person(13, 'Jim')
P5 = Person(12, 'Phil')
P6 = Person(13, 'Bob')
P7 = Person(13, 'Chip')
P8 = Person(14, 'Tim')
people = [P1,P2,P3,P4,P5,P6,P7,P8]

In [116]:
age_to_count = collections.Counter((person.age for person in people))

In [117]:
print(age_to_count)

Counter({13: 3, 14: 2, 12: 2, 11: 1})


In [71]:
age_to_count[13]

3

In [96]:
group_by_age(people)

[Person(age=14, name='Greg'),
 Person(age=14, name='Tim'),
 Person(age=12, name='John'),
 Person(age=12, name='Phil'),
 Person(age=11, name='Andy'),
 Person(age=13, name='Jim'),
 Person(age=13, name='Bob'),
 Person(age=13, name='Chip')]

In [76]:
len(people)

8

In [77]:
people_new = [None]*8

In [78]:
people[0]

Person(age=14, name='Greg')

In [80]:
people_new[2]

We can avoid having to allocate a new array by performing the updates in-place. The idea is to maintain a subarray for each of the different types of elements. Each subarray marks out entries which have not yet been assigned elements of that type. We swap elements across these subarrays to move them to their correct position. 
In the program below, we use two hash tables to track the subarrays. One stores the starting offset of each subarray, the other its size. As soon as a subarray becomes empty, we remove it. 

In [97]:
def group_by_age_2(people: list) -> None:
    age_to_count = collections.Counter((person.age for person in people))
    age_to_offset, offset = {}, 0
    
    for age, count in age_to_count.items():
        age_to_offset[age] = offset
        offset += count 
    
    while age_to_offset:
        from_age = next(iter(age_to_offset))
        from_idx = age_to_offset[from_age]
        to_age = people[from_idx].age
        to_idx = age_to_offset[people[from_idx].age]
        people[from_idx], people[to_idx] = people[to_idx], people[from_idx]
        # Use age_to_count to see when we are finished with a particular age 
        age_to_count[to_age] -= 1
        if age_to_count[to_age]:
            age_to_offset[to_age] = to_idx + 1
        else:
            del age_to_offset[to_age]

In [98]:
group_by_age_2(people)

In [100]:
print(people)

[Person(age=14, name='Greg'), Person(age=14, name='Tim'), Person(age=12, name='John'), Person(age=12, name='Phil'), Person(age=11, name='Andy'), Person(age=13, name='Jim'), Person(age=13, name='Bob'), Person(age=13, name='Chip')]


In [118]:
age_to_offset, offset = {}, 0
for age, count in age_to_count.items():
    age_to_offset[age] = offset
    offset += count

In [119]:
print(age_to_offset)

{14: 0, 12: 2, 11: 4, 13: 5}


In [120]:
while age_to_offset:
    from_age = next(iter(age_to_offset))
    from_idx = age_to_offset[from_age]
    to_age = people[from_idx].age
    to_idx = age_to_offset[people[from_idx].age]
    print(from_age, from_idx)
    print(to_age, to_idx)
    people[from_idx], people[to_idx] = people[to_idx], people[from_idx]
    print(people)
    age_to_count[to_age] -= 1
    if age_to_count[to_age]:
        age_to_offset[to_age] = to_idx + 1
    else:
        del age_to_offset[to_age]

14 0
14 0
[Person(age=14, name='Greg'), Person(age=12, name='John'), Person(age=11, name='Andy'), Person(age=13, name='Jim'), Person(age=12, name='Phil'), Person(age=13, name='Bob'), Person(age=13, name='Chip'), Person(age=14, name='Tim')]
14 1
12 2
[Person(age=14, name='Greg'), Person(age=11, name='Andy'), Person(age=12, name='John'), Person(age=13, name='Jim'), Person(age=12, name='Phil'), Person(age=13, name='Bob'), Person(age=13, name='Chip'), Person(age=14, name='Tim')]
14 1
11 4
[Person(age=14, name='Greg'), Person(age=12, name='Phil'), Person(age=12, name='John'), Person(age=13, name='Jim'), Person(age=11, name='Andy'), Person(age=13, name='Bob'), Person(age=13, name='Chip'), Person(age=14, name='Tim')]
14 1
12 3
[Person(age=14, name='Greg'), Person(age=13, name='Jim'), Person(age=12, name='John'), Person(age=12, name='Phil'), Person(age=11, name='Andy'), Person(age=13, name='Bob'), Person(age=13, name='Chip'), Person(age=14, name='Tim')]
14 1
13 5
[Person(age=14, name='Greg'), 

The time complexity is O(n), since the first pass entails n hash table inserts, and the second pass spends O(1) time to move one element to its proper location. The additional space complexity is dictated by the hash tables, i.e., O(m), where m is the number of distinct ages. 
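
For the follow-up question about ages appearing in sorted order, one possible variant is sketched below: the offsets are assigned in increasing age order instead of hash-table order. This is an illustrative sketch under stated assumptions, not the book's solution; the name `group_by_age_sorted` is hypothetical, and for simplicity the sketch writes into a new array rather than working in-place.

```python
import collections

Person = collections.namedtuple('Person', ('age', 'name'))

def group_by_age_sorted(people: list) -> list:
    age_to_count = collections.Counter(p.age for p in people)
    # Assign offsets in increasing age order instead of hash-table order.
    age_to_offset, offset = {}, 0
    for age in sorted(age_to_count):
        age_to_offset[age] = offset
        offset += age_to_count[age]
    result = [None] * len(people)
    for p in people:
        # Place each person in the next free slot of their age group.
        result[age_to_offset[p.age]] = p
        age_to_offset[p.age] += 1
    return result

people = [Person(14, 'Greg'), Person(12, 'John'), Person(11, 'Andy'),
          Person(13, 'Jim'), Person(12, 'Phil'), Person(13, 'Bob'),
          Person(13, 'Chip'), Person(14, 'Tim')]
print([p.name for p in group_by_age_sorted(people)])
# ['Andy', 'John', 'Phil', 'Jim', 'Bob', 'Chip', 'Greg', 'Tim']
```

Sorting only the m distinct ages costs O(m log m), so the total time is O(n + m log m), which beats a full O(n log n) sort when m is much smaller than n.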

## 13.10 Team photo day - 1 

You are a photographer for a soccer meet. You will be taking pictures of pairs of opposing teams. All teams have the same number of players. A player in the back row must be taller than the player in front of him. All players in a row must be from the same team. 

Design an algorithm that takes as input two teams and the heights of the players in the teams and checks if it is possible to place players to take the photo subject to the placement constraints. 

In [133]:
class Team:
    Player = collections.namedtuple('Player', ('height',))
    
    def __init__(self, height: list) -> None:
        self._players = [Team.Player(h) for h in height]
    
    # check if team0 can be placed in front of team1.
    @staticmethod
    def valid_placement_exists(team0: 'Team', team1: 'Team') -> bool:
        return all(
            a<b
            for a,b in zip(sorted(team0._players), sorted(team1._players)))
        

In [122]:
h1 = [4,5,29,3,49,20,384,90,1,10,11]
h2 = [1,2,3,4,5,6,7,8,8,9,10]

In [123]:
team0 = Team(h1)

In [124]:
team1 = Team(h2)

In [134]:
Team.valid_placement_exists(team0, team1)

False

In [135]:
all(a<b for a,b in zip(sorted(h1), sorted(h2)))

False

In [136]:
h1.sort()

In [137]:
h1

[1, 3, 4, 5, 10, 11, 20, 29, 49, 90, 384]

In [138]:
h2.sort()

In [139]:
h2

[1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10]

In [141]:
all(b<=a for a,b in zip(sorted(h1), sorted(h2)))

True

In [142]:
for a,b in zip(sorted(h1), sorted(h2)):
    print(a,b)

1 1
3 2
4 3
5 4
10 5
11 6
20 7
29 8
49 8
90 9
384 10


The time complexity is that of sorting, i.e. O(n log n). 

## 13.11 Implement a fast sorting algorithm for lists 

Implement a routine which sorts lists efficiently. It should be a stable sort, i.e., the relative positions of equal elements must remain unchanged. 

**Hint:** In what respects are lists superior to arrays? 

**Sol:** The brute-force approach is to repeatedly delete the smallest element in the list and add it to the end of a new list. The time complexity is O(n^2) and the additional space complexity is O(n), where n is the number of nodes in the list. We can refine the simple algorithm to run in O(1) space by reordering the nodes, instead of creating new ones. 

In [143]:
class ListNode:
    def __init__(self, data = 0, next = None):
        self.data = data
        self.next = next 

In [145]:
def print_ListNode(L:ListNode) -> None:
    while L:
        print(L.data)
        L = L.next

In [151]:
def insertion_sort(L: ListNode) -> ListNode:
    dummy_head = ListNode(0,L)
    # The sublist consisting of nodes up to and including L is sorted in 
    # increasing order. We need to ensure that after we move to L.next this 
    # property continues to hold. We do this by moving L.next back past its 
    # predecessors until it is in the right place.
    while L and L.next:
        if L.data > L.next.data:
            target, pre = L.next, dummy_head
            # "<=" inserts target after equal elements, keeping the sort stable.
            while pre.next.data <= target.data:
                pre = pre.next
            temp, pre.next, L.next = pre.next, target, target.next
            target.next = temp
        else:
            L = L.next
        print('list node after every iteration')
        print_ListNode(dummy_head)
        
    return dummy_head.next

In [170]:
node1 = ListNode(1)
node2 = ListNode(33)
node3 = ListNode(2734)
node4 = ListNode(5)
node5 = ListNode(7)
node6 = ListNode(10)
node7 = ListNode(37)
node8 = ListNode(99)
node9 = ListNode(32)
node10 = ListNode(45)

In [171]:
node1.next = node2
node2.next = node3
node3.next = node4
node4.next = node5
node5.next = node6
node6.next = node7
node7.next = node8
node8.next = node9
node9.next = node10

In [154]:
insertion_sort(node1)

list node after every iteration
0
1
33
2734
5
7
10
37
99
32
45
list node after every iteration
0
1
33
2734
5
7
10
37
99
32
45
list node after every iteration
0
1
5
33
2734
7
10
37
99
32
45
list node after every iteration
0
1
5
7
33
2734
10
37
99
32
45
list node after every iteration
0
1
5
7
10
33
2734
37
99
32
45
list node after every iteration
0
1
5
7
10
33
37
2734
99
32
45
list node after every iteration
0
1
5
7
10
33
37
99
2734
32
45
list node after every iteration
0
1
5
7
10
32
33
37
99
2734
45
list node after every iteration
0
1
5
7
10
32
33
37
45
99
2734


<__main__.ListNode at 0x107f56438>

The time complexity is O(n^2), which corresponds to the case where the list is reverse-sorted to begin with. The space complexity is O(1). 

To improve on runtime, we can gain intuition from considering arrays. Quicksort is the best all-round sorting algorithm for arrays: it runs in O(n log n) time on average and is in-place. However, it is not stable. Mergesort applied to arrays is a stable O(n log n) algorithm. However, it is not in-place, since there is no simple way to merge two sorted halves of an array in-place in linear time.

Unlike arrays, lists can be merged in-place. Conceptually, this is because insertion into the middle of a list is an O(1) operation. The following program implements a mergesort on lists. We decompose the list into two equal-sized sublists around the node in the middle of the list. We find this node by advancing two iterators through the list, one twice as fast as the other. When the fast iterator reaches the end of the list, the slow iterator is at the middle of the list. We recurse on the sublists, and use Solution 7.1 (merge two sorted lists) to combine the sorted sublists. 

In [156]:
def merge_two_sorted_lists(L1: ListNode,
                           L2: ListNode) -> ListNode:
    # creates a placeholder for the result.
    dummy_head = tail = ListNode()
    
    while L1 and L2:
        # "<=" takes from L1 on ties, which keeps the merge stable.
        if L1.data <= L2.data:
            tail.next, L1 = L1, L1.next
        else:
            tail.next, L2 = L2, L2.next
        tail = tail.next
    # Appends the remaining nodes of L1 or L2
    tail.next = L1 or L2
    return dummy_head.next 

In [167]:
def stable_sort_list(L: ListNode) -> ListNode:
    # Base cases: L is empty or a single node, nothing to do 
    if L is None or L.next is None:
        return L
    # Find the midpoint of L using a slow and a fast pointer
    pre_slow, slow, fast = None, L, L
    while fast and fast.next:
        pre_slow = slow
        fast, slow = fast.next.next, slow.next
        
    if pre_slow:
        pre_slow.next = None # Splits the list into two halves whose lengths differ by at most one
        
    return merge_two_sorted_lists(stable_sort_list(L), stable_sort_list(slow))

In [168]:
L = stable_sort_list(node1)

In [164]:
L

<__main__.ListNode at 0x107f56630>

In [172]:
print_ListNode(node1)

1
33
2734
5
7
10
37
99
32
45
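
The list mergesort described above can also be sketched in a self-contained form that makes stability visible. In this sketch each node carries a key that is compared and a tag that is not; the `Node` class, the helper names, and the sample data are hypothetical and exist only for this illustration.

```python
class Node:
    # Hypothetical node type for this sketch: key is compared, tag is not.
    def __init__(self, key, tag, next=None):
        self.key, self.tag, self.next = key, tag, next

def merge(L1, L2):
    dummy_head = tail = Node(None, None)
    while L1 and L2:
        # "<=" takes from the left list on ties, which preserves stability.
        if L1.key <= L2.key:
            tail.next, L1 = L1, L1.next
        else:
            tail.next, L2 = L2, L2.next
        tail = tail.next
    tail.next = L1 or L2  # append whatever remains
    return dummy_head.next

def merge_sort(L):
    if L is None or L.next is None:
        return L
    # Find the middle with a slow and a fast pointer.
    pre_slow, slow, fast = None, L, L
    while fast and fast.next:
        pre_slow = slow
        fast, slow = fast.next.next, slow.next
    pre_slow.next = None  # split into two halves
    return merge(merge_sort(L), merge_sort(slow))

def from_pairs(pairs):
    head = None
    for key, tag in reversed(pairs):
        head = Node(key, tag, head)
    return head

def to_pairs(L):
    out = []
    while L:
        out.append((L.key, L.tag))
        L = L.next
    return out

pairs = [(3, 'a'), (1, 'b'), (3, 'c'), (2, 'd'), (1, 'e')]
print(to_pairs(merge_sort(from_pairs(pairs))))
# [(1, 'b'), (1, 'e'), (2, 'd'), (3, 'a'), (3, 'c')]
```

Equal keys keep their original relative order (b before e, a before c), which is the stability property the problem asks for.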


## 13.12 Compute a salary threshold

Design an algorithm for computing the salary cap, given existing salaries and the target payroll. 

In [173]:
def find_salary_cap(target_payroll: int, current_salaries: list) -> float:
    current_salaries.sort()
    unadjusted_salary_sum = 0.0
    for i, current_salary in enumerate(current_salaries):
        adjusted_people = len(current_salaries) - i
        adjusted_salary_sum = current_salary * adjusted_people 
        if unadjusted_salary_sum + adjusted_salary_sum >= target_payroll:
            return (target_payroll - unadjusted_salary_sum) /adjusted_people
        unadjusted_salary_sum += current_salary
        
    # No solution, since target_payroll > existing payroll.
    return -1.0

In [174]:
target_payroll = 210
current_salaries = [20, 30, 100, 40, 20]
find_salary_cap(target_payroll, current_salaries)

100.0

The most expensive operation for this entire solution is sorting the salary array A, hence the run time is O(n log n). Once we have A sorted, we simply iterate through its entries looking for the first entry which implies a payroll that exceeds the target, and then solve for the cap using an arithmetical exper