## Greedy algorithms boot camp

For US currency, wherein coins take values 1, 5, 10, 25, 50, and 100 cents, the greedy algorithm for making change results in the minimum number of coins.

In [2]:
def change_making(cents: int) -> int:
    coins = [100, 50, 25, 10, 5, 1]
    num_coins = 0
    for coin in coins:
        num_coins += cents // coin
        cents %= coin 
    return num_coins

In [3]:
cents = 57

In [4]:
cents// 50

1

In [5]:
cents

57

In [6]:
cents % 50

7

In [7]:
cents 

57

In [8]:
change_making(57)

4

In [9]:
change_making(4)

4

In [10]:
change_making(10)

1

In [11]:
change_making(29)

5

We perform 6 iterations and each iteration does a constant amount of computation, so the time complexity is O(1). 
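The optimality of the greedy choice depends on the coin system. As a minimal sketch, the following compares the greedy count with a dynamic-programming optimum. The helper names `greedy_change` and `optimal_change` and the coin set `[4, 3, 1]` are illustrative assumptions, not part of the original notes.

```python
def greedy_change(cents: int, coins: list) -> int:
    """Greedy coin count; assumes coins contains 1 so change always exists."""
    num_coins = 0
    for coin in sorted(coins, reverse=True):
        num_coins += cents // coin
        cents %= coin
    return num_coins


def optimal_change(cents: int, coins: list) -> int:
    """Minimum coin count via dynamic programming over amounts 0..cents."""
    best = [0] + [float('inf')] * cents
    for amount in range(1, cents + 1):
        for coin in coins:
            if coin <= amount:
                best[amount] = min(best[amount], best[amount - coin] + 1)
    return best[cents]


us_coins = [100, 50, 25, 10, 5, 1]
# With US coins, greedy matches the optimum for every amount up to 2 dollars.
assert all(greedy_change(c, us_coins) == optimal_change(c, us_coins)
           for c in range(201))

# Hypothetical coin system where greedy is not optimal:
# greedy picks 4 + 1 + 1 (3 coins), while 3 + 3 needs only 2.
print(greedy_change(6, [4, 3, 1]), optimal_change(6, [4, 3, 1]))
```

This prints `3 2`, which shows that the greedy result above is a property of the US denominations rather than of change-making in general.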

## 17.1 Compute an optimum assignment of tasks 

We consider the problem of assigning tasks to workers. Each worker must be assigned exactly two tasks. Each task takes a fixed amount of time. Tasks are independent, i.e., there are no constraints of the form "Task 4 cannot start before Task 3 is completed." Any task can be assigned to any worker. An optimum assignment minimizes the time by which all tasks are completed, i.e., the largest sum of the two durations given to any one worker.

**Sol:** In summary, we sort the set of task durations, and pair the shortest, second shortest, third shortest, etc. tasks with the longest, second longest, third longest, etc. tasks.

In [12]:
import collections

In [15]:
PairedTasks = collections.namedtuple('PairedTasks', ('task_1', 'task_2'))

def optimum_task_assignment(task_durations: list) -> list:
    task_durations.sort()
    return [
        PairedTasks(task_durations[i], task_durations[~i])
        for i in range(len(task_durations)//2)
    ]

In [16]:
task_durations = [5,2,1,6,4,4]
optimum_task_assignment(task_durations)

[PairedTasks(task_1=1, task_2=6),
 PairedTasks(task_1=2, task_2=5),
 PairedTasks(task_1=4, task_2=4)]

In [17]:
task_durations = [4,9,2,39,4,1,0,2,8,41]
optimum_task_assignment(task_durations)

[PairedTasks(task_1=0, task_2=41),
 PairedTasks(task_1=1, task_2=39),
 PairedTasks(task_1=2, task_2=9),
 PairedTasks(task_1=2, task_2=8),
 PairedTasks(task_1=4, task_2=4)]

The time complexity is dominated by the time to sort, i.e., O(n log n). 
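To see what the pairing achieves, the sketch below measures the largest pair sum (the time at which the slowest worker finishes) and compares it with an exhaustive search on a small input. The names `max_pair_time`, `greedy_pairs`, and `brute_force_best` are hypothetical helpers introduced here. The brute force is exponential and only meant for tiny inputs.

```python
import itertools


def max_pair_time(pairs) -> int:
    """Time at which the slowest worker finishes both tasks."""
    return max(a + b for a, b in pairs)


def greedy_pairs(task_durations: list) -> list:
    """Pair shortest with longest, without mutating the input."""
    ordered = sorted(task_durations)
    return [(ordered[i], ordered[~i]) for i in range(len(ordered) // 2)]


def brute_force_best(task_durations: list) -> int:
    """Try every ordering and pair adjacent entries; small inputs only."""
    return min(
        max_pair_time(zip(perm[::2], perm[1::2]))
        for perm in itertools.permutations(task_durations)
    )


durations = [5, 2, 1, 6, 4, 4]
print(max_pair_time(greedy_pairs(durations)), brute_force_best(durations))
```

Both values are 8 for this input, so no other pairing finishes sooner.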

## 17.2 Schedule to minimize waiting time 

A database has to respond to a set of client SQL queries. The service time required for each query is known in advance. For this application, the queries must be processed by the database one at a time, but can be done in any order. The time a query waits before its turn comes is called its waiting time.

Given service times for a set of queries, compute a schedule for processing the queries that minimizes the total waiting time. Return the minimum waiting time. 

In [19]:
def minimum_total_waiting_time(service_times: list) -> int:
    # Sort the service times in increasing order
    service_times.sort()
    total_waiting_time = 0
    for i, service_time in enumerate(service_times):
        num_remaining_queries = len(service_times) - (i+1)
        total_waiting_time += service_time * num_remaining_queries
    return total_waiting_time

In [24]:
service_times = [5,3,2,1]

In [25]:
minimum_total_waiting_time(service_times)

10

In [26]:
service_times = [6,4,6,2,1]

In [27]:
minimum_total_waiting_time(service_times)

24

The time complexity is dominated by the time to sort, i.e., O(n log n).
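As a sanity check on the shortest-first rule, the sketch below computes the total waiting time for an arbitrary order and compares the sorted order against every permutation. `total_waiting_time` and `brute_force_minimum` are hypothetical helper names, and the exhaustive search is only feasible for a handful of queries.

```python
import itertools


def total_waiting_time(order) -> int:
    """Sum of waiting times when queries run in the given order."""
    total, elapsed = 0, 0
    for service_time in order:
        total += elapsed          # this query waited for everything before it
        elapsed += service_time
    return total


def brute_force_minimum(service_times: list) -> int:
    """Minimum total waiting time over all processing orders."""
    return min(total_waiting_time(p)
               for p in itertools.permutations(service_times))


times = [5, 3, 2, 1]
print(total_waiting_time(sorted(times)), brute_force_minimum(times))
print(total_waiting_time(sorted(times, reverse=True)))
```

The first line prints `10 10`, while processing the longest queries first gives 23.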

## 17.3 The interval covering problem 

Consider a foreman responsible for a number of tasks on the factory floor. Each task starts at a fixed time and ends at a fixed time. The foreman wants to visit the floor to check on the tasks. Your job is to help him minimize the number of visits he makes. In each visit, he can check on all the tasks taking place at the time of visit.

You are given a set of closed intervals. Design an efficient algorithm for finding a minimum sized set of numbers that covers all the intervals.

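**Sol:** The interval with the smallest right endpoint must be covered by some visit, and a visit at that right endpoint covers every interval that any other point of that interval would cover. This observation leads to the following algorithm. Sort all the intervals, comparing on right endpoints. Select the first interval's right endpoint. Iterate through the intervals, looking for the first one not covered by this right endpoint. As soon as such an interval is found, select its right endpoint and continue the iteration.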

In [31]:
import operator

In [37]:
Interval = collections.namedtuple('Interval', ('left', 'right'))

def find_minimum_visits(intervals: list) -> tuple:
    # Sort intervals based on the right endpoints (note: sorts the input in place)
    intervals.sort(key = operator.attrgetter('right'))
    last_visit_time, num_visits = float('-inf'), 0
    time_to_visit = []
    for interval in intervals:
        if interval.left > last_visit_time:
            # The previous visit, at last_visit_time, does not cover this interval,
            # so schedule a new visit at this interval's right endpoint
            last_visit_time = interval.right 
            time_to_visit.append(last_visit_time)
            num_visits += 1
    # Returns the number of visits together with the visit times
    return num_visits, time_to_visit

In [38]:
it1 = Interval(1,2)
it2 = Interval(2,3)
it3 = Interval(3,4)
it4 = Interval(2,3)
it5 = Interval(3,4)
it6 = Interval(4,5)
intervals = [it1,it2,it3,it4,it5,it6]

In [39]:
find_minimum_visits(intervals)

(2, [2, 4])

We spend O(1) time per interval, so the time complexity after the initial sort is O(n), where n is the number of intervals. Therefore, the time taken is dominated by the initial sort, i.e., O(n log n).
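A quick way to gain confidence in a set of visit times is to check that every interval contains at least one of them. The sketch below works on plain `(left, right)` tuples and does not mutate its input. `greedy_visits` and `covers_all` are hypothetical names introduced for illustration.

```python
def greedy_visits(intervals: list) -> list:
    """Visit times chosen by scanning (left, right) pairs sorted by right endpoint."""
    visits = []
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if not visits or left > visits[-1]:
            # The latest visit does not reach this interval; add a new one.
            visits.append(right)
    return visits


def covers_all(intervals: list, visits: list) -> bool:
    """True if every closed interval contains at least one visit time."""
    return all(any(left <= v <= right for v in visits)
               for left, right in intervals)


sample = [(1, 2), (2, 3), (3, 4), (2, 3), (3, 4), (4, 5)]
visits = greedy_visits(sample)
print(visits, covers_all(sample, visits))
```

For the sample this prints `[2, 4] True`. A single visit at time 2 would miss the intervals starting at 3 and 4.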

## Invariants boot camp

Suppose you were asked to write a program that takes as input a sorted array and a given target value and determines if there are two entries in the array (not necessarily distinct) that add up to that value. 

A brute-force algorithm for this problem consists of a pair of nested for loops. Its complexity is O(n^2), where n is the length of the input array. A faster approach is to add each element of the array to a hash table, and test for each element e if K-e, where K is the target value, is present in the hash table. While reducing time complexity to O(n), this approach requires O(n) additional storage for the hash table.
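The hash-table approach described above can be sketched in a few lines. This is a minimal illustration rather than the notes' own solution. The name `has_two_sum_hashed` is an assumption, and a Python `set` stands in for the hash table. Because the two entries need not be distinct, the whole array is inserted before any lookup.

```python
def has_two_sum_hashed(A: list, t: int) -> bool:
    """Hash-table variant; an entry may be paired with itself."""
    seen = set(A)                          # O(n) additional storage
    return any(t - e in seen for e in A)   # O(1) expected time per lookup


A = [-2, 1, 2, 4, 7, 11]
print([has_two_sum_hashed(A, t) for t in (9, 10, 0, 4)])
```

This prints `[True, False, True, True]`. Unlike approaches that exploit ordering, it does not require the input to be sorted.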

The most efficient approach uses invariants: maintain a subarray that is guaranteed to hold a solution, if it exists. This subarray is initialized to the entire array, and iteratively shrunk from one side or the other. The shrinking makes use of the sortedness of the array. Specifically, if the sum of the leftmost and the rightmost elements is less than the target, then the leftmost element can never be combined with any element to obtain the target. A similar observation holds for the rightmost element.

In [40]:
def has_two_sum(A: list, t: int) -> bool:
    i, j = 0, len(A) -1
    
    while i<= j:
        if A[i] + A[j] == t:
            return True
        elif A[i] + A[j] < t:
            i += 1
        elif A[i] + A[j] > t:
            j -= 1
    return False 

In [41]:
A = [-2,1,2,4,7,11]
t = 9
has_two_sum(A,t)

True

In [42]:
t = 10
has_two_sum(A,t)

False

In [43]:
t = 0 
has_two_sum(A,t)

True

In [44]:
t = 4
has_two_sum(A,t)

True

The time complexity is O(n), where n is the length of the array. The space complexity is O(1), since the subarray can be represented by two variables.

## 17.4 The 3-sum problem

Design an algorithm that takes as input an array and a number, and determines if there are three entries in the array (not necessarily distinct) which add up to the specified number.

In [47]:
def has_three_sum(A:list, t:int) -> bool:
    A.sort()
    # Finds if the sum of two numbers in A equals to t-a
    return any(has_two_sum(A, t-a) for a in A)

In [48]:
A = [2,3,5,7,11]
t = 21
has_three_sum(A,t)

True

In [49]:
t = 22
has_three_sum(A,t)

False

The additional space needed is O(1). The time complexity is the time taken to sort, O(n log n), plus n runs of the O(n) algorithm that finds a pair in a sorted array summing to a specified value, which is O(n^2) overall.
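For small inputs, the result can be cross-checked against an exhaustive search over all triples. The sketch below uses `itertools.combinations_with_replacement`, so an entry may be reused, which matches the "not necessarily distinct" wording. The function name `has_three_sum_brute_force` is an assumption introduced here.

```python
import itertools


def has_three_sum_brute_force(A: list, t: int) -> bool:
    """O(n^3) check over all triples, allowing an entry to be reused."""
    return any(sum(triple) == t
               for triple in itertools.combinations_with_replacement(A, 3))


A = [2, 3, 5, 7, 11]
print(has_three_sum_brute_force(A, 21), has_three_sum_brute_force(A, 22))
```

This prints `True False`: 21 is reachable (for example 3 + 7 + 11), while no triple sums to 22.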

## 17.5 Find the majority element 

You are reading a sequence of strings. You know a priori that more than half the strings are repetitions of a single string (the "majority element") but the positions where the majority element occurs are unknown. Write a program that makes a single pass over the sequence and identifies the majority element. For example, if the input is <b,a,c,a,a,b,a,a,c,a>, then a is the majority element. 

In [53]:
def majority_search(stream) -> str:
    candidate_count = 0
    for it in stream:
        if candidate_count == 0:
            candidate, candidate_count = it, candidate_count + 1
        elif candidate == it:
            candidate_count += 1
        else:
            candidate_count -= 1
    return candidate

In [58]:
stream = ['a','b','c','c','d','a','a','c','c','d','a','a','d','c','c','a']

In [59]:
majority_search(stream)

'c'

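Since we spend O(1) time per entry, the time complexity is O(n). The additional space complexity is O(1). Note that the result is only meaningful when a majority element exists: in the stream above, 'a' and 'c' each occur 6 times out of 16 entries, so there is no majority and 'c' is simply the last surviving candidate.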
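When the existence of a majority element is not guaranteed, one option is a second counting pass over the data. The sketch below assumes the input is a sequence that can be traversed twice (not a one-shot stream), and the name `verified_majority` is hypothetical.

```python
from typing import Optional, Sequence


def verified_majority(items: Sequence[str]) -> Optional[str]:
    """Run the candidate-elimination pass, then confirm with a second pass."""
    candidate, count = None, 0
    for item in items:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    # Second pass: accept the candidate only if it fills more than half the input.
    if candidate is not None and 2 * items.count(candidate) > len(items):
        return candidate
    return None


print(verified_majority(list('bacaabaaca')))    # example from the problem statement
print(verified_majority(['a', 'b', 'c', 'c']))  # no majority element
```

The first call prints `a` and the second prints `None`. The extra pass keeps the time at O(n) and the additional space at O(1), at the cost of reading the input twice.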

## 17.6 The gasup problem

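In the gasup problem, a number of cities are arranged on a circular road. You need to visit all the cities and come back to the starting city. A certain amount of gas is available at each city, and the total amount of gas over all cities equals the amount needed to go around the road once. Your gas tank has unlimited capacity. Call a city ample if you can begin at that city with an empty tank, refill at it, then travel through all the remaining cities, refilling at each, and return to the ample city without running out of gas at any point.

Given an instance of the gasup problem, how would you efficiently compute an ample city? You can assume that there exists an ample city. The input is given in the form of two arrays: one for the amount of gas at each city, the other for the distance to the next city.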

**Hint:** Think about starting with more than enough gas to complete the circuit without gassing up. Track the amount of gas as you perform the circuit, gassing up at each city. 

**Sol:** Consider a city where the amount of gas in the tank is minimum when we enter that city. Observe that this city does not depend on where we begin: the graphs of tank level against city for different starting points are the same up to translation and cyclic shifting, so a city that is a minimum for one graph is a minimum for all graphs. Let z be a city where the amount of gas in the tank before we refuel at the city is minimum. Now we pick z as the starting point, with the gas present at z. Since we never have less gas than we started with at z, and when we return to z we have 0 gas (since it's given that the total amount of gas is just enough to complete the traversal), we can complete the journey without running out of gas. Note that the reasoning given above assumes that there always exists an ample city.

The computation to determine z can be easily performed with a single pass over all the cities simulating the changes to amount of gas as we advance. 

In [63]:
MPG = 20

# gallons[i] is the amount of gas in city i, and distances[i] is the distance from city i to the next city
def find_ample_city(gallons: list, distances: list) -> int:
    remaining_gallons = 0
    CityAndRemainingGas = collections.namedtuple('CityAndRemainingGas',
                                                ('city', 'remaining_gallons'))
    city_remaining_gallons_pair = CityAndRemainingGas(0, 0)
    num_cities = len(gallons)
    for i in range(1, num_cities):
        # Accumulate the gas left on arrival at city i, starting from city 0 with an empty tank
        remaining_gallons += gallons[i-1] - distances[i-1]//MPG
        if remaining_gallons < city_remaining_gallons_pair.remaining_gallons:
            city_remaining_gallons_pair = CityAndRemainingGas(
                i, remaining_gallons)
    # The city entered with the least gas is an ample city
    return city_remaining_gallons_pair.city

In [64]:
gallons = [50,20,5,30,25,10,10]
distances = [900,600,200,400,600,200,100]

In [65]:
find_ample_city(gallons, distances)

3

In [67]:
remaining_gallons = []
for i in range(1,len(gallons)):
    remaining_gallons.append(gallons[i-1] - distances[i-1]//MPG)
    

In [68]:
remaining_gallons

[5, -10, -5, 10, -5, 0]

The time complexity is O(n), and the space complexity is O(1). 
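Whether a given city is ample can be checked directly by simulating one full circuit. The sketch below is such a checker, with the hypothetical name `is_ample` and the fuel efficiency passed as a parameter. It tracks the tank in miles of range so that all arithmetic stays in integers.

```python
def is_ample(start: int, gallons: list, distances: list, mpg: int = 20) -> bool:
    """Simulate one full circuit from `start`, refuelling at each city."""
    range_miles = 0
    n = len(gallons)
    for step in range(n):
        city = (start + step) % n
        range_miles += gallons[city] * mpg   # refuel at the current city
        range_miles -= distances[city]       # drive to the next city
        if range_miles < 0:
            return False                     # ran out of gas on this leg
    return True


gallons = [50, 20, 5, 30, 25, 10, 10]
distances = [900, 600, 200, 400, 600, 200, 100]
print([city for city in range(len(gallons))
       if is_ample(city, gallons, distances)])
```

For this instance the only ample city is city 3, so the list printed is `[3]`. Running the checker for every city takes O(n^2) time, which is why the single-pass approach is preferred.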

## 17.7 Compute the maximum water trapped by a pair of vertical lines 

An array of integers naturally defines a set of lines parallel to the Y-axis, starting from x = 0 as illustrated in Figure 14.7(a). The goal of this problem is to find the pair of lines that together with the X-axis "trap" the most water.

Write a program which takes as input an integer array and returns the pair of entries that trap the maximum amount of water. 

**Hint:** Start with 0 and n-1 and work your way in. 

In [82]:
def get_max_trapped_water(heights: list) -> int:
    i, j, max_water = 0, len(heights)-1, 0
    while i<j:
        if heights[i] < heights[j]:
            max_water = max(max_water, heights[i]*(j-i))
            i += 1
        else:
            max_water = max(max_water, heights[j]*(j-i))
            j -= 1
    return max_water

In [83]:
heights = [1,2,1,3,4,4,5,6,2,1,3,1,3,2,1,2,4,1]
get_max_trapped_water(heights)

48

We iteratively eliminate one line at a time, and we spend O(1) time per iteration, so the time complexity is O(n). 
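The problem statement asks for the pair of entries, whereas a function that returns only the amount of water leaves the pair implicit. A minimal variant that also reports the indices might look like the following. The name `max_trapped_water_pair` and the `(i, j, water)` return shape are assumptions made for illustration.

```python
def max_trapped_water_pair(heights: list) -> tuple:
    """Return (i, j, water) for a pair of lines trapping the most water."""
    i, j = 0, len(heights) - 1
    best = (i, j, 0)
    while i < j:
        water = min(heights[i], heights[j]) * (j - i)
        if water > best[2]:
            best = (i, j, water)
        # Discard the shorter line: any closer partner traps less water with it.
        if heights[i] < heights[j]:
            i += 1
        else:
            j -= 1
    return best


heights = [1, 2, 1, 3, 4, 4, 5, 6, 2, 1, 3, 1, 3, 2, 1, 2, 4, 1]
i, j, water = max_trapped_water_pair(heights)
print(i, j, water)
```

For these heights the reported amount is 48, and the returned indices satisfy `min(heights[i], heights[j]) * (j - i) == 48`.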

## 17.8 Compute the largest rectangle under the skyline 

You are given a sequence of adjacent buildings. Each has unit width and an integer height. These buildings form the skyline of a city. An architect wants to know the area of a largest rectangle contained in this skyline.

Let A be an array representing the heights of adjacent buildings of unit width. Design an algorithm to compute the area of the largest rectangle contained in this skyline. 

In [96]:
def calculate_largest_rectangle(heights: list) -> int:
    pillar_indices = []
    max_rectangle_area = 0
    # By appending [0] to heights, we can uniformly handle the computation for rectangle area here. 
    for i, h in enumerate(heights + [0]):
        while pillar_indices and heights[pillar_indices[-1]] >= h:
            # When the current building is no taller than the one on top of the stack, pop that building and compute the rectangle it bounds
            height = heights[pillar_indices.pop()]
            print(height)
            width = i if not pillar_indices else i - pillar_indices[-1] -1
            print(width)
            max_rectangle_area = max(max_rectangle_area, height * width)
            print(max_rectangle_area)
        pillar_indices.append(i)
        print(pillar_indices)
    return max_rectangle_area

In [97]:
heights = [1,4,2,5,6,3,2,6,6,5,2,1,3]
calculate_largest_rectangle(heights)

[0]
[0, 1]
4
1
4
[0, 2]
[0, 2, 3]
[0, 2, 3, 4]
6
1
6
5
2
10
[0, 2, 5]
3
3
10
2
5
10
[0, 6]
[0, 6, 7]
6
1
10
[0, 6, 8]
6
2
12
[0, 6, 9]
5
3
15
2
9
18
[0, 10]
2
10
20
1
11
20
[11]
[11, 12]
3
1
20
1
13
20
[13]


20

The time complexity is O(n). When advancing through buildings, the time spent on a building is proportional to the number of pushes and pops performed when processing it. Although for some buildings we may perform multiple pops, in total we perform at most n pushes and at most n pops. This is because in the advancing phase, an entry i is added at most once to the stack and cannot be popped more than once. The time complexity of processing remaining stack elements after the advancing is complete is also O(n) since there are at most n elements in the stack, and the time to process each one is O(1). Thus, the overall time complexity is O(n). The space complexity is O(n), which is the largest the stack can grow to, e.g., if buildings appear in ascending order.
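A quadratic reference implementation is useful for checking the stack-based result on small inputs. The sketch below fixes each left edge, extends the right edge one building at a time, and tracks the minimum height seen so far. The name `largest_rectangle_brute_force` is an assumption introduced here.

```python
def largest_rectangle_brute_force(heights: list) -> int:
    """O(n^2) reference: extend right from each left edge, tracking the minimum height."""
    best = 0
    for left in range(len(heights)):
        min_height = heights[left]
        for right in range(left, len(heights)):
            min_height = min(min_height, heights[right])
            best = max(best, min_height * (right - left + 1))
    return best


print(largest_rectangle_brute_force([1, 4, 2, 5, 6, 3, 2, 6, 6, 5, 2, 1, 3]))
```

This prints `20`, which corresponds to the rectangle of height 2 spanning the ten buildings from index 1 to index 10.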