# Lab 10
## Data Structures & Algorithms
### Thursday, 25 April 2024

## Today

* [Greedy Algorithms](#greedy)
* [Examples](#examples)
* [Exercises](#exercises)

## Greedy Algorithms <a class="anchor" id="greedy"></a>

* There is no simple definition, but roughly: algorithms that, at each step, optimise some local measure in order to build up a solution.
* Sometimes it can be proven that a greedy algorithm produces an optimal solution; at other times the solution is only locally optimal and approximates a globally optimal one.

## Examples <a class="anchor" id="examples"></a>

### Interval Scheduling

Given a number of requests, schedule them in a way so that (a) none overlap in time and (b) you have scheduled the largest possible subset of the original set (rejecting the ones that don't fit).

General idea: select a first request and reject all requests that are not compatible with it. Then, out of the remaining compatible requests, select the next one according to some simple rule (which needs to be chosen), and repeat.

Algorithm: Sort the requests by finishing time. Select the request with the earliest finishing time first, so that the resource becomes free again as soon as possible. Then move along the list of remaining requests and, each time, select the first request that is compatible with the ones selected so far.

<div>
   <img src="images/screenshot_greedy_interval_scheduling2.png" width="500px">
</div>

It can be proven (as you saw in the lecture) that the set $A$ returned by this algorithm is **compatible** and also **optimal** (i.e. it has maximum size). It can also be shown that it has running time $O(n \log n)$: $O(n \log n)$ for sorting by finishing time and then $O(n)$ to go through the list and select the next compatible request.

In [26]:
def interval_scheduling(start_times, finish_times):
    """
    Finds a maximum-size set of non-overlapping intervals based on their start and finish times.
    
    Parameters:
    start_times (list): List of start times for each interval.
    finish_times (list): List of finish times for each interval.
    
    Returns:
    list: A list containing the indices of the selected non-overlapping intervals, in order of finishing time.
    """
    # Create a list of indices for the intervals
    n = len(start_times)
    index = list(range(n))
    
    # Sort the indices based on the finish times of the intervals
    index.sort(key=lambda i: finish_times[i])
 
    # Initialize an empty list to store the indices of the maximal set of intervals
    maximal_set = []
    
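    # Initialize the previous finish time (minus infinity, so that the first
    # interval is always accepted, even if its start time is negative)
    prev_finish_time = float('-inf')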
    
    # Iterate through the sorted indices
    for i in index:
        # Check if the current interval can be added to the maximal set
        if start_times[i] >= prev_finish_time:
            # Add the index of the current interval to the maximal set
            maximal_set.append(i)
            # Update the previous finish time
            prev_finish_time = finish_times[i]
 
    return maximal_set

In [27]:
start_times = [2, 3, 1, 8, 5]
finish_times = [4, 6, 3, 9, 7]
print(interval_scheduling(start_times=start_times, finish_times=finish_times))

[2, 1, 3]
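
To make the optimality claim a bit more tangible, here is a minimal sketch that compares the greedy result with a brute-force search over all subsets on the small example above. The helper names (`greedy_indices`, `is_compatible`, `brute_force_max_size`) are made up for this illustration, and the brute force is exponential, so it is only meant for tiny inputs.

```python
from itertools import combinations


def greedy_indices(start_times, finish_times):
    """Compact copy of the earliest-finish-time rule from the cell above."""
    order = sorted(range(len(start_times)), key=lambda i: finish_times[i])
    chosen, prev_finish = [], float("-inf")
    for i in order:
        if start_times[i] >= prev_finish:
            chosen.append(i)
            prev_finish = finish_times[i]
    return chosen


def is_compatible(indices, start_times, finish_times):
    """True if no two of the given intervals overlap (touching endpoints are allowed)."""
    return all(
        finish_times[a] <= start_times[b] or finish_times[b] <= start_times[a]
        for a, b in combinations(indices, 2)
    )


def brute_force_max_size(start_times, finish_times):
    """Size of the largest compatible subset, found by trying subsets from large to small."""
    n = len(start_times)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if is_compatible(subset, start_times, finish_times):
                return size
    return 0


start_times = [2, 3, 1, 8, 5]
finish_times = [4, 6, 3, 9, 7]
greedy = greedy_indices(start_times, finish_times)
print(len(greedy), brute_force_max_size(start_times, finish_times))  # 3 3
```

Both numbers agree here, which is what the proof from the lecture predicts. A check like this is not a proof, of course; it only confirms the claim on one input.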


### Interval Partitioning

Given a number of requests and many identical resources, schedule **all** requests using the smallest possible number of resources (e.g. scheduling lectures in classrooms).

Algorithm: Sort the requests by starting time. For each request $j$, label it with a number (out of a set of $d$ numbers where $d$ is the depth of the set of intervals) that has not been used previously for any other request $i$ that overlaps with $j$.

<div>
   <img src="images/screenshot_greedy_interval_partitioning.png" width="500px">
</div>

It can be proven that this greedy algorithm schedules all requests on one of $d$ resources, where $d$ is the depth of the set of requests, and that it runs in $O(n \log n)$.
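
The depth $d$ is the largest number of requests that overlap at any single point in time. As a rough sketch (the function name `interval_depth` and the example intervals are assumptions for illustration), it can be computed by sweeping over the start and end points in sorted order. As in the interval scheduling code, a request that starts exactly when another one ends is treated as compatible with it.

```python
def interval_depth(intervals):
    """Return the maximum number of intervals that overlap at any point in time."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # an interval opens
        events.append((end, -1))    # an interval closes
    # Tuple sorting puts (t, -1) before (t, 1), so an interval ending at t
    # is closed before one starting at t is opened (touching is not overlap).
    events.sort()
    depth = current = 0
    for _, delta in events:
        current += delta
        depth = max(depth, current)
    return depth


print(interval_depth([(1, 3), (2, 4), (3, 5), (6, 8)]))   # 2
print(interval_depth([(1, 5), (2, 6), (4, 8), (7, 10)]))  # 3
```

This only computes the number of resources needed; it does not assign requests to resources.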

### Dijkstra's Algorithm

Find the shortest paths from a starting node $s$ in a graph to all other nodes. Note that, unlike BFS and DFS, Dijkstra's algorithm can be used on **weighted** graphs.

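Algorithm: Kind of like a continuous version of breadth-first search! Starting from $s$, repeatedly pick the unexplored node with the smallest known distance from $s$, treat that distance as final, and update the distances of its neighbours.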

<div>
   <img src="images/screenshot_dijkstra.png" width="400px">
</div>

Using a priority queue, this can be implemented in an overall $O(m \log n)$ running time, where $n$ is the number of nodes and $m$ is the number of edges.

Let's take a look at an implementation using the `heapq` library in Python. First, here's a short introduction on how to use heapq (as found [here](https://www.geeksforgeeks.org/heap-queue-or-heapq-in-python/)).

In [3]:
import heapq
 
# initializing list
li = [5, 7, 9, 1, 3]
print(li)
 
# using heapify to convert list into heap
heapq.heapify(li)
print(li)
 
# using heappush() to push elements into heap
heapq.heappush(li, 4)
print(li)
 
# using heappop() to pop the smallest element (heapq implements a min-heap)
el = heapq.heappop(li)
print(el)
print(li)

[5, 7, 9, 1, 3]
[1, 3, 9, 7, 5]
[1, 3, 4, 7, 5, 9]
1
[3, 5, 4, 7, 9]
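
The heap can also hold tuples, which are compared element by element. The following small sketch (the priorities and labels are made up) shows that `(priority, label)` pairs come out ordered by priority first, with ties broken by the label.

```python
import heapq

# (priority, label) pairs; the values are made up for illustration
pq = []
heapq.heappush(pq, (5, "W"))
heapq.heappush(pq, (1, "X"))
heapq.heappush(pq, (2, "V"))
heapq.heappush(pq, (1, "U"))

# Pop everything: tuples come out ordered by priority, ties broken by label
order = []
while pq:
    order.append(heapq.heappop(pq))
print(order)  # [(1, 'U'), (1, 'X'), (2, 'V'), (5, 'W')]
```

Storing `(distance, vertex)` pairs in this way lets a heap act as a priority queue keyed on distance.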


The following implementation is borrowed from this [website](https://bradfieldcs.com/algos/graphs/dijkstras-algorithm/), which is also a really useful resource.

In [4]:
def dijkstra_algorithm(graph, starting_vertex):
    """
    Apply Dijkstra's algorithm to find the shortest paths from a starting vertex to all other vertices.

    Parameters:
    - graph (dict): A dictionary representing the graph where keys are vertices and values are dictionaries 
                    containing neighbors as keys and edge weights as values.
    - starting_vertex: The vertex from which to start the search.

    Returns:
    - distances (dict): A dictionary containing the shortest distances from the starting_vertex to all other vertices.
                       Keys are vertices, and values are the shortest distances from the starting_vertex to each vertex.
    """
    # using infinity at initialisation ensures that any initially discovered distance will be considered smaller
    distances = {vertex: float('infinity') for vertex in graph}
    distances[starting_vertex] = 0

    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        # Pop the vertex with the shortest distance from the priority queue
        current_distance, current_vertex = heapq.heappop(pq)

        # Nodes can get added to the priority queue multiple times. We only
        # process a vertex the first time we remove it from the priority queue.
        if current_distance > distances[current_vertex]:
            continue

        # Explore neighbors of the current vertex
        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight

            # Update the distance if a shorter path is found
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))

    return distances

In [5]:
graph = {
    'U': {'V': 2, 'W': 5, 'X': 1},
    'V': {'U': 2, 'X': 2, 'W': 3},
    'W': {'V': 3, 'U': 5, 'X': 3, 'Y': 1, 'Z': 5},
    'X': {'U': 1, 'V': 2, 'W': 3, 'Y': 1},
    'Y': {'X': 1, 'W': 1, 'Z': 1},
    'Z': {'W': 5, 'Y': 1},
}
print(dijkstra_algorithm(graph, 'X'))

{'U': 1, 'V': 2, 'W': 2, 'X': 0, 'Y': 1, 'Z': 2}
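
One way to see the connection to breadth-first search: if every edge has weight 1, the shortest distance to a node is simply the number of edges on the way there, which is what BFS computes. The sketch below checks this on a small unit-weight graph; `dijkstra` is a condensed copy of the function above, while `bfs_hops` and `unit_graph` are assumptions introduced for this illustration.

```python
import heapq
from collections import deque


def dijkstra(graph, source):
    """Condensed version of the Dijkstra implementation above."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist


def bfs_hops(graph, source):
    """Number of edges on a shortest path from source, ignoring weights."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


# Hypothetical connected graph in which every edge has weight 1
unit_graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 1},
    "D": {"B": 1, "C": 1, "E": 1},
    "E": {"D": 1},
}
same = dijkstra(unit_graph, "A") == bfs_hops(unit_graph, "A")
print(same)  # True
```

With unequal weights the two generally differ, because BFS counts edges rather than summing their weights.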


## Exercises <a class="anchor" id="exercises"></a>

### Exercise 1

Rewrite the interval scheduling algorithm with the following changes:

* the function should now take only one argument called `intervals`, containing a list of tuples, where each interval is represented as a tuple of the start and end time (i.e. `intervals = [(start_0, end_0), (start_1, end_1), ..., (start_n, end_n)]`).
* update the part where the algorithm sorts according to finishing time so that it works with the new input
* the function should return a list of those tuples that have been selected by the algorithm as the non-overlapping ones (rather than indices as in the example).

In [6]:
def interval_scheduling2(intervals):
    """
    Solves the interval scheduling problem by greedily selecting non-overlapping intervals in order of earliest end time.
    
    Parameters:
    intervals (list of tuples): List of intervals, where each interval is represented as a tuple (start, end).
    
    Returns:
    list of tuples: List of selected non-overlapping intervals.
    """
    # Implement me

In [7]:
intervals = [(2, 4), (3, 6), (1, 3), (8, 9), (5, 7)]
print(interval_scheduling2(intervals))  

[(1, 3), (3, 6), (8, 9)]


### Exercise 2

Modify Dijkstra's algorithm to return not only the shortest distances but also **the paths themselves** from the starting vertex to all other vertices. For the format of this new output, look at the prepared docstring (the text in triple quotes just underneath the function name). Hint: Copy the code from Dijkstra's algorithm above and change it accordingly!

In [8]:
def dijkstra_algorithm_with_paths(graph, starting_vertex):
    """
    Apply Dijkstra's algorithm to find the shortest paths from a starting vertex to all other vertices.

    Parameters:
    - graph (dict): A dictionary representing the graph where keys are vertices and values are dictionaries 
                    containing neighbors as keys and edge weights as values.
    - starting_vertex: The vertex from which to start the search.

    Returns:
    - distances (dict): A dictionary containing the shortest distances from the starting_vertex to all other vertices.
                       Keys are vertices, and values are the shortest distances from the starting_vertex to each vertex.
    - paths (dict): A dictionary containing the shortest paths from the starting_vertex to all other vertices.
                    Keys are vertices, and values are lists representing the shortest path from the starting_vertex
                    to each vertex.
    """
    # Implement me

In [9]:
graph_example1 = {
    'A': {'B': 2, 'C': 5},
    'B': {'C': 1, 'D': 2},
    'C': {'D': 3},
    'D': {}
}

print(dijkstra_algorithm_with_paths(graph_example1, 'A'))

graph_example2 = {
    'A': {'B': 4, 'C': 2},
    'B': {'C': 5, 'D': 10},
    'C': {'D': 3},
    'D': {}
}

print(dijkstra_algorithm_with_paths(graph_example2, 'A'))

({'A': 0, 'B': 2, 'C': 3, 'D': 4}, {'A': ['A'], 'B': ['A', 'B'], 'C': ['A', 'B', 'C'], 'D': ['A', 'B', 'D']})
({'A': 0, 'B': 4, 'C': 2, 'D': 5}, {'A': ['A'], 'B': ['A', 'B'], 'C': ['A', 'C'], 'D': ['A', 'C', 'D']})
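
To sanity-check your own output, one option is to add up the edge weights along each returned path and compare the total with the reported distance. The helper `path_length` below is a hypothetical name for such a check, not part of the exercise; the data is copied from the first expected output above.

```python
def path_length(graph, path):
    """Sum the edge weights along a path given as a list of vertices."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


graph_example1 = {
    'A': {'B': 2, 'C': 5},
    'B': {'C': 1, 'D': 2},
    'C': {'D': 3},
    'D': {}
}
# Expected output for graph_example1, copied from above
distances = {'A': 0, 'B': 2, 'C': 3, 'D': 4}
paths = {'A': ['A'], 'B': ['A', 'B'], 'C': ['A', 'B', 'C'], 'D': ['A', 'B', 'D']}

consistent = all(path_length(graph_example1, paths[v]) == distances[v] for v in distances)
print(consistent)  # True
```

This only confirms that the paths and distances agree with each other; it does not prove that the distances are the shortest possible.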


### Exercise 3

Implement a function to partition a set of intervals into the minimum number of subsets such that no two intervals in the same subset overlap.

Hint: You can work through the following steps separately and then put them together in the `interval_partitioning` function at the end:

1. **Sort intervals:** sort the input list of intervals based on their start times.
2. **Initialise partitions:** create an empty list of partitions (each partition represents a 'classroom' in the example above and from the lecture, and each 'classroom' will hold a list of intervals, so this is a list of lists). Initialise the first partition with the first interval (so to start with, you're only adding the first lecture to the first classroom).
3. **Place intervals:** write a for loop that goes through all remaining intervals (other than the first one, which you already added to the first partition in `partitions`) and checks whether each one can be added to any of the partitions ('classrooms') by using a nested loop over the existing partitions. Check with an if clause whether the interval fits into the current partition (i.e. the end time of the last interval in the partition is not later than the start time of the interval you are checking). If the interval fits, append it to the current partition, record in a flag variable (e.g. `placed`) that the interval has been placed, and break out of the inner loop.
4. **Create new partitions:** if the interval hasn't been placed after looping through all the lists in `partitions`, create a new partition by appending a new list (consisting only of the current interval) to the `partitions` list of lists.

In [10]:
def interval_partitioning(intervals):
    """
    Implement the interval partitioning algorithm to partition a set of intervals into the minimum number of subsets 
    such that no two intervals in the same subset overlap.
    
    Parameters:
    - intervals (list of tuples): A list of intervals represented as tuples (start, end).
    
    Returns:
    - partitions (list of lists): A list of partitions, where each partition is a list of intervals.
    """
    # Implement me

In [11]:
intervals1 = [(1, 3), (2, 4), (3, 5), (6, 8)]
print(interval_partitioning(intervals1))

intervals2 = [(1, 5), (2, 6), (4, 8), (7, 10)]
print(interval_partitioning(intervals2))

intervals3 = [(1, 11), (2, 6), (4, 8), (7, 10)]
print(interval_partitioning(intervals3))

[[(1, 3), (3, 5), (6, 8)], [(2, 4)]]
[[(1, 5), (7, 10)], [(2, 6)], [(4, 8)]]
[[(1, 11)], [(2, 6), (7, 10)], [(4, 8)]]
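
While working on this exercise, a small checker can help to test intermediate results. The sketch below (the name `is_valid_partitioning` is made up) verifies that every interval appears exactly once and that no partition contains overlapping intervals. It does not check that the number of partitions is minimal.

```python
def is_valid_partitioning(intervals, partitions):
    """Check that every interval is used exactly once and no partition has overlaps."""
    # Every input interval must appear exactly once across all partitions
    flattened = [interval for part in partitions for interval in part]
    if sorted(flattened) != sorted(intervals):
        return False
    # Within a partition, each interval must start no earlier than the previous one ends
    for part in partitions:
        ordered = sorted(part)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
            if next_start < prev_end:
                return False
    return True


intervals1 = [(1, 3), (2, 4), (3, 5), (6, 8)]
print(is_valid_partitioning(intervals1, [[(1, 3), (3, 5), (6, 8)], [(2, 4)]]))  # True
print(is_valid_partitioning(intervals1, [[(1, 3), (2, 4)], [(3, 5), (6, 8)]]))  # False
```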


### Exercise 4

Modify the interval partitioning algorithm to return not only the partitions but also the maximum number of intervals in any partition.

In [12]:
def interval_partitioning_with_max_intervals(intervals):
    """
    Implement the interval partitioning algorithm to partition a set of intervals into the minimum number of subsets 
    such that no two intervals in the same subset overlap.

    Parameters:
    - intervals (list of tuples): A list of intervals represented as tuples (start, end).

    Returns:
    - partitions (list of lists): A list of partitions, where each partition is a list of intervals.
    - max_intervals (int): The maximum number of intervals in any partition.
    """
    # Implement me

In [13]:
intervals1 = [(1, 3), (2, 4), (3, 5), (6, 8)]
print(interval_partitioning_with_max_intervals(intervals1))

intervals2 = [(1, 5), (2, 6), (4, 8), (7, 10)]
print(interval_partitioning_with_max_intervals(intervals2))

intervals3 = [(1, 11), (2, 6), (4, 8), (7, 10)]
print(interval_partitioning_with_max_intervals(intervals3))

([[(1, 3), (3, 5), (6, 8)], [(2, 4)]], 3)
([[(1, 5), (7, 10)], [(2, 6)], [(4, 8)]], 2)
([[(1, 11)], [(2, 6), (7, 10)], [(4, 8)]], 2)


### Exercise 5

Implement a function to find the minimum spanning tree (MST) of a connected, undirected graph using Prim's algorithm, as you saw in the lectures. As a refresher, the algorithm works as follows: 

1. Choose an arbitrary vertex as the starting point, and include it as the first vertex in the MST.
2. Compare the edges going out from the MST. Choose the edge with the lowest weight that connects a vertex among the MST vertices to a vertex outside the MST.
3. Add that edge and vertex to the MST.
4. Repeat steps 2 and 3 until all vertices belong to the MST.

Hint: You can work through the following steps separately and then put them together in the `prim_algorithm` function at the end:

1. **Initialise minimum spanning tree:** Create an empty list to store the edges forming the minimum spanning tree.
2. **Initialise visited set:** Create an empty set to keep track of visited vertices (an empty set is initialised by `set()`). Then add the starting vertex to the visited set to mark it as visited.
3. **Initialise priority queue:** Create a priority queue (heap, using the `heapq` package) to store edges incident to visited vertices. Initialise the priority queue with edges incident to the starting vertex. In practice, do this by creating a list of tuples (`pq = [(weight1, starting_vertex, neighbour1), (weight2, starting_vertex, neighbour2), ...]`) first, and then use `heapq.heapify(pq)` to turn it into a priority queue.
4. **Explore edges:** Use a while loop to iterate until the priority queue is empty. Inside the loop, pop the edge with the minimum weight from the priority queue (using `heapq.heappop(pq)`).
5. **Check visited status:** Check if the destination vertex of the popped edge is already visited. If not visited:
   - Mark the destination vertex as visited.
   - Add the edge to the minimum spanning tree.
   - Explore edges incident to the destination vertex.
   - Add unvisited incident edges to the priority queue.
6. **Return minimum spanning tree:** After visiting all vertices, return the list of edges forming the minimum spanning tree.

In [14]:
def prim_algorithm(graph, starting_vertex):
    """
    Implement Prim's algorithm to find the minimum spanning tree of a connected, undirected graph.

    Parameters:
    - graph (dict): A dictionary representing the graph where keys are vertices and values are dictionaries 
                    containing neighbors as keys and edge weights as values.
    - starting_vertex: The vertex from which to start the search.

    Returns:
    - min_spanning_tree (list of tuples): A list of edges forming the minimum spanning tree.
    """
    # Implement me

In [15]:
graph1 = {
    'A': {'B': 2, 'C': 1},
    'B': {'A': 2, 'C': 2, 'D': 1},
    'C': {'A': 1, 'B': 2, 'D': 2},
    'D': {'B': 1, 'C': 2}
}
print(prim_algorithm(graph1, 'A'))

graph2 = {
    'A': {'B': 4, 'C': 8},
    'B': {'A': 4, 'C': 2, 'D': 5},
    'C': {'A': 8, 'B': 2, 'D': 5},
    'D': {'B': 5, 'C': 5}
}
print(prim_algorithm(graph2, 'A'))

[('A', 'C', 1), ('A', 'B', 2), ('B', 'D', 1)]
[('A', 'B', 4), ('B', 'C', 2), ('B', 'D', 5)]
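
A graph can have several minimum spanning trees (in `graph1`, for example, several edges share the weight 2), so your edge list may differ from the one printed above while still being correct. A more robust check is to compare total weights. The sketch below, with the made-up name `spanning_tree_weight`, assumes an undirected graph in the dictionary format used here and edges in the form `(source, destination, weight)`. It checks that the edges form a spanning tree, not that the tree is minimal.

```python
def spanning_tree_weight(graph, edges):
    """Return the total weight if `edges` form a spanning tree of `graph`, else None."""
    # Each edge must exist in the graph with the stated weight
    for u, v, w in edges:
        if graph.get(u, {}).get(v) != w:
            return None
    # A spanning tree on n vertices has exactly n - 1 edges
    if len(edges) != len(graph) - 1:
        return None
    # Walk the tree edges from an arbitrary vertex to check connectivity
    adjacency = {vertex: [] for vertex in graph}
    for u, v, _ in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    if len(seen) != len(graph):
        return None
    return sum(w for _, _, w in edges)


graph1 = {
    'A': {'B': 2, 'C': 1},
    'B': {'A': 2, 'C': 2, 'D': 1},
    'C': {'A': 1, 'B': 2, 'D': 2},
    'D': {'B': 1, 'C': 2}
}
print(spanning_tree_weight(graph1, [('A', 'C', 1), ('A', 'B', 2), ('B', 'D', 1)]))  # 4
print(spanning_tree_weight(graph1, [('A', 'C', 1), ('C', 'D', 2), ('B', 'D', 1)]))  # 4
```

Both edge lists are spanning trees of total weight 4, which illustrates why comparing weights is safer than comparing edge lists.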
