# Lab 10
## Data Structures & Algorithms

## Today

* [Refresher on Greedy Algorithms](#greedy)
* [1. Interval Scheduling](#1-interval-scheduling)
* [2. Interval Partitioning](#2-interval-partitioning)
* [3. Dijkstra's Algorithm](#3-dijkstras-algorithm)
* [Exercises](#exercises)

## Greedy Algorithms <a class="anchor" id="greedy"></a>

There is no single, formal definition of a greedy algorithm, but generally, it refers to an approach that:

- **Makes the locally optimal choice at each step** with the hope of finding a globally optimal solution.
- **Optimizes a specific measure step-by-step** rather than considering all possible future consequences.

#### When Do Greedy Algorithms Work?
- In some problems, **greedy choices lead to an optimal solution**, and we can mathematically prove their correctness.
- In other cases, greedy algorithms **only find an approximate solution**, as they optimize locally but may miss the globally optimal path.


# Examples <a class="anchor" id="examples"></a>

## 1. Interval Scheduling

**Problem Statement**

Given a set of time-based requests, schedule them in a way that:
1. **No two requests overlap** in time.
2. **As many requests as possible are scheduled**, rejecting the ones that do not fit.

**General Idea**
1. Start by selecting the **first request** to schedule.
2. Reject all requests that are **incompatible** with this selection.
3. Among the remaining compatible requests, select the next one based on a **simple rule** (which must be carefully chosen).
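
To see why the rule "must be carefully chosen", here is a minimal sketch that runs the same greedy loop under three candidate rules. The helper `greedy_schedule`, the rule functions, and both test instances are hypothetical and chosen only to expose the weaker rules.

```python
def greedy_schedule(intervals, rule):
    """Greedily pick compatible intervals, considering them in the order given by `rule`."""
    selected = []
    for start, finish in sorted(intervals, key=rule):
        # Keep the interval only if it overlaps none of those already chosen
        if all(start >= f or finish <= s for s, f in selected):
            selected.append((start, finish))
    return selected


def earliest_start(interval):
    return interval[0]


def shortest_first(interval):
    return interval[1] - interval[0]


def earliest_finish(interval):
    return interval[1]


# Hypothetical instances, each built to break one of the rules
long_first = [(0, 10), (1, 2), (3, 4), (5, 6)]
short_in_middle = [(0, 5), (4, 6), (5, 10)]

print(len(greedy_schedule(long_first, earliest_start)))        # 1
print(len(greedy_schedule(long_first, earliest_finish)))       # 3
print(len(greedy_schedule(short_in_middle, shortest_first)))   # 1
print(len(greedy_schedule(short_in_middle, earliest_finish)))  # 2
```

On these two instances, only the earliest-finish rule reaches the best possible count; the other two rules get stuck after a single request.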

**Algorithm**
1. **Sort the requests** in ascending order of their **finishing times**.
2. Select the first request that finishes the earliest—this ensures that the resource is freed as soon as possible.
3. Iterate through the sorted list and **select the first compatible request** as the next one.
4. Repeat until no more requests can be scheduled.

<div>
   <img src="images/screenshot_greedy_interval_scheduling2.png" width="500px">
</div>

**Correctness & Complexity**
- As shown in the lecture, it can be **proven** that the set of scheduled intervals $ A $ returned by this algorithm is:
  - **Compatible** (no two selected intervals overlap).
  - **Optimal** (it selects the maximum possible number of non-overlapping intervals).
- **Time Complexity:**  
  - **Sorting the requests** by finishing time takes **$ O(n \log n) $**.
  - **Iterating** through the list to select compatible requests takes **$ O(n) $**.
  - **Total runtime: $ O(n \log n) $**.
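
The optimality claim can also be checked empirically on small inputs. The sketch below is not a proof: it compares the earliest-finish-time greedy against an exhaustive search on random instances. All names (`earliest_finish_first`, `is_compatible`, `brute_force_optimum`) and the random instance sizes are assumptions made for illustration.

```python
import random
from itertools import combinations


def earliest_finish_first(intervals):
    """Greedy interval scheduling on a list of (start, finish) tuples."""
    selected, prev_finish = [], float("-inf")
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= prev_finish:
            selected.append((start, finish))
            prev_finish = finish
    return selected


def is_compatible(subset):
    """True if no two intervals overlap (touching endpoints are allowed)."""
    ordered = sorted(subset)
    return all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))


def brute_force_optimum(intervals):
    """Size of the largest compatible subset, found by trying every subset."""
    for size in range(len(intervals), 0, -1):
        if any(is_compatible(c) for c in combinations(intervals, size)):
            return size
    return 0


random.seed(0)
for _trial in range(200):
    intervals = []
    for _k in range(random.randint(0, 7)):
        start = random.randint(0, 20)
        intervals.append((start, start + random.randint(1, 6)))
    greedy = earliest_finish_first(intervals)
    assert is_compatible(greedy)
    assert len(greedy) == brute_force_optimum(intervals)
print("greedy matched brute force on all random instances")
```

The brute-force search is exponential in the number of intervals, which is why the instances are kept to at most seven requests.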

In [48]:
def interval_scheduling(start_times, finish_times):
    """
    Finds a maximum-size set of non-overlapping intervals using the earliest-finish-time rule.
    
    Parameters:
    start_times (list): List of start times for each interval.
    finish_times (list): List of finish times for each interval.
    
    Returns:
    list: Indices of the selected non-overlapping intervals, in order of finish time.
    """
    # Create a list of indices for the intervals
    index = list(range(len(start_times)))
    
    # Sort the indices based on the finish times of the intervals
    index.sort(key=lambda i: finish_times[i])
 
    # Initialize an empty list to store the indices of the maximal set of intervals
    maximal_set = []
    
    # Initialize the previous finish time; -infinity lets any first interval qualify,
    # including one with a negative start time
    prev_finish_time = float('-inf')
    
    # Iterate through the sorted indices
    for i in index:
        # Check if the current interval can be added to the maximal set
        if start_times[i] >= prev_finish_time:
            # Add the index of the current interval to the maximal set
            maximal_set.append(i)
            # Update the previous finish time
            prev_finish_time = finish_times[i]
 
    return maximal_set

In [49]:
start_times = [2, 3, 1, 8, 5]
finish_times = [4, 6, 3, 9, 7]
print(interval_scheduling(start_times=start_times, finish_times=finish_times))

[2, 1, 3]


## 2. Interval Partitioning

**Problem Statement**

Given a set of time-based requests and multiple identical resources, schedule **all** requests using the **smallest possible number of resources**.  

A common example is scheduling **lectures in classrooms**, where we want to **minimize** the number of rooms needed while ensuring that no two overlapping lectures share the same room.

**Algorithm**
1. **Sort the requests** in ascending order of their **starting times**.
2. For each request $ j $:
   - Assign it a **label (resource number)**.
   - Ensure that this label has **not been used** by any other request $ i $ that **overlaps** with $ j $.
3. Continue assigning resources while keeping the total number of resources used as low as possible.

<div>
   <img src="images/screenshot_greedy_interval_partitioning.png" width="500px">
</div>

**Correctness & Complexity**
- It can be proven that this greedy algorithm schedules all requests using at most **$ d $ resources**, where:
  - $ d $ is the **depth of the set of intervals** (i.e., the maximum number of overlapping requests at any point in time).
- **Time Complexity:**  
  - **Sorting the requests** by start time takes **$ O(n \log n) $**.
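  - **Assigning labels** takes **$ O(n \log n) $** in total if the resources are kept in a priority queue keyed by the finish time of their last request (a naive scan over all resources for each request takes $ O(n \cdot d) $ instead).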
  - **Total runtime: $ O(n \log n) $**.
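
As a rough illustration of these two claims, the sketch below labels requests using a min-heap of rooms keyed by finish time, and computes the depth $ d $ with a sweep over start and end events. The function names `partition_with_heap` and `depth`, and the sample lectures, are assumptions made for this example.

```python
import heapq


def partition_with_heap(intervals):
    """Return (labels, rooms_used); labels[i] is the room assigned to intervals[i]."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    labels = [None] * len(intervals)
    rooms = []       # min-heap of (finish time of the room's last request, room)
    rooms_used = 0
    for i in order:
        start, finish = intervals[i]
        if rooms and rooms[0][0] <= start:
            # The room that frees up earliest is already free: reuse it
            _, room = heapq.heappop(rooms)
        else:
            # Every room is still occupied: open a new one
            room = rooms_used
            rooms_used += 1
        labels[i] = room
        heapq.heappush(rooms, (finish, room))
    return labels, rooms_used


def depth(intervals):
    """Maximum number of intervals that overlap at any point in time."""
    events = sorted([(s, 1) for s, _ in intervals] + [(f, -1) for _, f in intervals])
    best = current = 0
    for _, delta in events:  # at equal times, ends (-1) sort before starts (+1)
        current += delta
        best = max(best, current)
    return best


lectures = [(1, 5), (2, 6), (4, 8), (7, 10)]
labels, rooms_used = partition_with_heap(lectures)
print(labels)                       # [0, 1, 2, 0]
print(rooms_used, depth(lectures))  # 3 3
```

Each request causes at most one pop and one push on the heap, which is where the $ O(n \log n) $ bound for the labelling step comes from.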

## 3. Dijkstra's Algorithm

**Problem Statement**

Find the **shortest paths** from a given **starting node** $ s $ to **all other nodes** in a weighted graph with **non-negative edge weights**.

**Key Idea**
- Dijkstra's algorithm is essentially a **continuous version of breadth-first search (BFS)**: instead of exploring in layers, it expands nodes in the order of **their shortest known distance from $ s $**.
- It maintains a **priority queue** to always expand the **closest** unvisited node next.
- Intuition: 
   1. Maintain a **set of explored nodes** for which the shortest-path distance from the source is already known.
   2. Repeatedly expand that set by one node: the unexplored node with the smallest tentative distance, i.e. the smallest sum of (distance to an explored node) + (length of an edge from that node into the unexplored part). Note that this is not simply the shortest edge crossing the boundary. How?
      - This is the key role of the priority queue: it holds the discovered but unexplored nodes, keyed by tentative distance, and at each step we pop the smallest entry of that ever-changing set.
   3. Repeat, i.e. keep adding the next closest node.
- In this sense it is a continuous BFS: where BFS adds a whole level of nodes at a time, here each "level" is typically a single node, because distances from $ s $ are generally distinct.

<div>
   <img src="images/screenshot_dijkstra.png" width="400px">
</div>


**Time Complexity**
- Using a **binary heap (priority queue)**, the algorithm runs in **$ O(m \log n) $**, where:
  - $ n $ is the number of **nodes**.
  - $ m $ is the number of **edges**.
  - The $ \log n $ factor comes from efficiently updating the priority queue.

---

### A Detailed Look at Dijkstra's Algorithm

**1. Initialization**
- $ v $ denotes an arbitrary node of the graph.
- $ s $ denotes the source node.
- We maintain a **distance function** $ \pi[v] $, where:
  - $ \pi[v] = \infty $ for all $ v \neq s $ (indicating that the shortest path from $ s $ to $ v $ is unknown initially).
  - $ \pi[s] = 0 $, since the shortest distance from $ s $ to itself is always 0.
- $ S $ is the set of all explored nodes.

**2. Using the Priority Queue**
- Store all **unexplored nodes** $ v \notin S $ in a **priority queue** (min-heap).
- Priority for each node is given by its current shortest known distance $ \pi[v] $, meaning:
  - the **node at the front of the heap** is the one with:
    - the **shortest known** distance to the source $ s $.
    - it **has been discovered** but **not yet fully explored**.
- The **priority queue always selects the next "best" node** for processing.

💡 **Key Insight:**  
The priority queue helps ensure that **we always expand the most promising node first**—the one with the smallest known distance.

**3. Updating $ \pi[v] $ (Relaxation Step)**
- when a **new node $ u $** is added to $ S $ (i.e., it has been fully explored):
  - we check **all outgoing edges** $ e = (u, v) $ (edges connecting $ u $ to unexplored nodes).
  - for each such edge, we calculate:
    $$
    \text{New candidate distance: } \pi[u] + l_{uv}
    $$
  - if this new path offers a **shorter distance** than our current best estimate $ \pi[v] $, we update:
    $$
    \pi[v] = \pi[u] + l_{uv}
    $$
  - the **priority queue is updated** via the `Decrease-key` operation to reflect this new, lower distance.
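
A single relaxation can be traced by hand. The sketch below uses a hypothetical helper `relax` and a three-node example with made-up edge lengths; the priority-queue update is left out so that only the update of $ \pi[v] $ is shown.

```python
import math


def relax(pi, u, v, length_uv):
    """Try to improve pi[v] via the edge (u, v); return True if it improved."""
    candidate = pi[u] + length_uv
    if candidate < pi[v]:
        pi[v] = candidate
        return True
    return False


# Hypothetical distances: only the source s is known at the start
pi = {"s": 0, "a": math.inf, "b": math.inf}

print(relax(pi, "s", "a", 4))  # True: inf -> 4
print(relax(pi, "s", "b", 1))  # True: inf -> 1
print(relax(pi, "b", "a", 2))  # True: 1 + 2 = 3 beats 4
print(relax(pi, "s", "a", 4))  # False: 4 does not beat 3
print(pi)                      # {'s': 0, 'a': 3, 'b': 1}
```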

💡 **Why does this work?**  
- Since all edge weights are **non-negative** and we **always explore the smallest $ \pi[v] $ first**, no path that detours through a farther node can be shorter, so once a node $ v $ is processed we have found the true shortest path to it.
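
The role of non-negative weights can be illustrated with a small counterexample. The sketch below is a strict "settle once" variant written for this illustration (the name `dijkstra_settle_once` and the graph are assumptions): a node that has joined $ S $ is never updated again, exactly as in the argument above. With one negative edge, the distance it reports for `a` is too large.

```python
import heapq
import math


def dijkstra_settle_once(graph, source):
    """Dijkstra variant that never revisits a node once it has joined S."""
    pi = {v: math.inf for v in graph}
    pi[source] = 0
    explored = set()  # the set S
    pq = [(0, source)]
    while pq:
        dist_u, u = heapq.heappop(pq)
        if u in explored:
            continue  # outdated heap entry
        explored.add(u)
        for v, length in graph[u].items():
            if v not in explored and dist_u + length < pi[v]:
                pi[v] = dist_u + length
                heapq.heappush(pq, (pi[v], v))
    return pi


# Hypothetical graph with one negative edge b -> a
graph_negative = {"s": {"a": 2, "b": 5}, "a": {}, "b": {"a": -4}}
print(dijkstra_settle_once(graph_negative, "s"))  # {'s': 0, 'a': 2, 'b': 5}
# The true shortest distance to a is 5 + (-4) = 1, via b.
```

Node `a` is settled at distance 2 before `b` is explored, so the cheaper route through the negative edge is never considered.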

**In sum**
1. **Start with the source $ s $**, initializing $ \pi[s] = 0 $ and all others as $ \infty $.
2. **Use a priority queue** to always select the next closest node.
3. **For each newly processed node** $ u $, update $ \pi[v] $ for its neighbors and push updated distances into the priority queue.
4. **Repeat until all reachable nodes are processed**, ensuring that each node gets the shortest path from $ s $.


---

### Python Implementation of Dijkstra’s Algorithm
To efficiently implement Dijkstra’s algorithm, we use the `heapq` module from the Python standard library, which provides a **min-heap** (priority queue) on top of a plain list.  

What the **min-heap property** means in practice: 
- The smallest element always sits at the front of the list (index 0).
- The heap is not necessarily a fully sorted list; only the position of the smallest element is guaranteed.

Before diving into the full implementation, here’s a short introduction on how to use `heapq`, as described in this [guide](https://www.geeksforgeeks.org/heap-queue-or-heapq-in-python/).


In [64]:
import heapq
 
# initializing list
li = [5, 7, 9, 1, 3]
print(li)
 
# using heapify to convert list into heap
heapq.heapify(li)
print(li)
 
# using heappush() to push elements into heap
heapq.heappush(li, 4)
print(li)
 
# using heappop() to remove and return the smallest (highest-priority) element
el = heapq.heappop(li)
print(el)
print(li)

[5, 7, 9, 1, 3]
[1, 3, 9, 7, 5]
[1, 3, 4, 7, 5, 9]
1
[3, 5, 4, 7, 9]
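
Heap entries do not have to be plain numbers. A common pattern, sketched below with made-up values, is to push `(priority, item)` tuples: Python compares tuples element by element, so the entry with the smallest first component is popped first and ties are broken by the second component.

```python
import heapq

pq = []
heapq.heappush(pq, (5, "W"))
heapq.heappush(pq, (1, "X"))
heapq.heappush(pq, (2, "V"))
heapq.heappush(pq, (1, "U"))

# Pop everything: entries come out ordered by priority, then by label
order = [heapq.heappop(pq) for _ in range(len(pq))]
print(order)  # [(1, 'U'), (1, 'X'), (2, 'V'), (5, 'W')]
```

One caveat: if two entries tie on priority and their second components cannot be compared with `<`, the comparison raises a `TypeError`.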


The following implementation is borrowed from this [website](https://bradfieldcs.com/algos/graphs/dijkstras-algorithm/), which is also a really useful resource.

In [65]:
def dijkstra_algorithm(graph, starting_vertex):
    """
    Apply Dijkstra's algorithm to find the shortest paths from a starting vertex to all other vertices.

    Parameters:
    - graph (dict): A dictionary representing the graph where keys are vertices and values are dictionaries 
                    containing neighbors as keys and edge weights as values.
    - starting_vertex: The vertex from which to start the search.

    Returns:
    - distances (dict): A dictionary containing the shortest distances from the starting_vertex to all other vertices.
                       Keys are vertices, and values are the shortest distances from the starting_vertex to each vertex.
    """
    # using infinity at initialisation ensures that any initially discovered distance will be considered smaller
    distances = {vertex: float('infinity') for vertex in graph} # a dict w/ all distances as infinity
    distances[starting_vertex] = 0

    pq = [(0, starting_vertex)] # the priority queue: a list of (distance, vertex) tuples, starting with the source
    while len(pq) > 0:
        # Pop the vertex with the shortest distance from the priority queue
        current_distance, current_vertex = heapq.heappop(pq) 
        # If this entry is still up to date, current_distance is the shortest distance to current_vertex.

        # Some nodes may be added to the heap multiple times (if we later find a shorter path to them).
        # If current_distance is larger than the best-known distance in distances[current_vertex], ignore the outdated entry.
        if current_distance > distances[current_vertex]:
            continue

        # Explore neighbors of the current vertex:
        # Loop through all neighboring nodes of current_vertex, retrieving corresponding edge weights
        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight

            # Update the distance if a shorter path is found
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))

    return distances

In [66]:
graph = {
    'U': {'V': 2, 'W': 5, 'X': 1},
    'V': {'U': 2, 'X': 2, 'W': 3},
    'W': {'V': 3, 'U': 5, 'X': 3, 'Y': 1, 'Z': 5},
    'X': {'U': 1, 'V': 2, 'W': 3, 'Y': 1},
    'Y': {'X': 1, 'W': 1, 'Z': 1},
    'Z': {'W': 5, 'Y': 1},
}
print(dijkstra_algorithm(graph, 'X'))

{'U': 1, 'V': 2, 'W': 2, 'X': 0, 'Y': 1, 'Z': 2}


# Exercises <a class="anchor" id="exercises"></a>

### Exercise 1

Rewrite the interval scheduling algorithm with the following changes:

* the function should now take only one argument called `intervals`, containing a list of tuples, where each interval is represented as a tuple of its start and end time (i.e. `intervals = [(start_0, end_0), (start_1, end_1), ..., (start_n, end_n)]`).
* update the part where the algorithm sorts according to finishing time so that it works with the new input
* the function should return a list of those tuples that have been selected by the algorithm as the non-overlapping ones (rather than indices as in the example).

### Solution 1

In [67]:
def interval_scheduling2(intervals):
    """
    Solves the interval scheduling problem by greedily selecting non-overlapping intervals in order of earliest end time.
    
    Parameters:
    intervals (list of tuples): List of intervals, where each interval is represented as a tuple (start, end).
    
    Returns:
    list of tuples: List of selected non-overlapping intervals.
    """
    
    # Sort the intervals by end time (sorted() leaves the caller's list unchanged)
    intervals = sorted(intervals, key=lambda x: x[1])
    
    # Initialize an empty list to store the selected intervals
    maximal_set = []
    
    # Initialize the previous finish time; -infinity lets any first interval qualify
    prev_finish_time = float('-inf')
    
    # Iterate through the sorted intervals
    for interval in intervals:
        # Check if the current interval's start time is after the end time of the last selected interval
        if interval[0] >= prev_finish_time:
            # If so, add the current interval to the result list
            maximal_set.append(interval)
            # Update the end time of the last selected interval
            prev_finish_time = interval[1]
    
    return maximal_set

In [68]:
intervals = [(2, 4), (3, 6), (1, 3), (8, 9), (5, 7)]
print(interval_scheduling2(intervals))  # Output should be: [(1, 3), (3, 6), (8, 9)]

[(1, 3), (3, 6), (8, 9)]


### Exercise 2

Modify Dijkstra's algorithm to return not only the shortest distances but also **the paths themselves** from the starting vertex to all other vertices. For the format of this new output, look at the prepared docstring (the commented text just underneath the function name). Hint: Copy the code from the Dijkstra implementation above, and change it accordingly!

### Solution 2

In [69]:
def dijkstra_algorithm_with_paths(graph, starting_vertex):
    """
    Apply Dijkstra's algorithm to find the shortest paths from a starting vertex to all other vertices.

    Parameters:
    - graph (dict): A dictionary representing the graph where keys are vertices and values are dictionaries 
                    containing neighbors as keys and edge weights as values.
    - starting_vertex: The vertex from which to start the search.

    Returns:
    - distances (dict): A dictionary containing the shortest distances from the starting_vertex to all other vertices.
                       Keys are vertices, and values are the shortest distances from the starting_vertex to each vertex.
    - paths (dict): A dictionary containing the shortest paths from the starting_vertex to all other vertices.
                    Keys are vertices, and values are lists representing the shortest path from the starting_vertex
                    to each vertex.
    """
    distances = {vertex: float('infinity') for vertex in graph}
    distances[starting_vertex] = 0

    # Initialize a dictionary to store the paths
    paths = {vertex: [] for vertex in graph}
    paths[starting_vertex] = [starting_vertex]

    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        current_distance, current_vertex = heapq.heappop(pq)

        if current_distance > distances[current_vertex]:
            continue

        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight

            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))

                # Update the path to the neighbor
                paths[neighbor] = paths[current_vertex] + [neighbor]

    return distances, paths

In [70]:
graph_example1 = {
    'A': {'B': 2, 'C': 5},
    'B': {'C': 1, 'D': 2},
    'C': {'D': 3},
    'D': {}
}

print(dijkstra_algorithm_with_paths(graph_example1, 'A'))

graph_example2 = {
    'A': {'B': 4, 'C': 2},
    'B': {'C': 5, 'D': 10},
    'C': {'D': 3},
    'D': {}
}

print(dijkstra_algorithm_with_paths(graph_example2, 'A'))

({'A': 0, 'B': 2, 'C': 3, 'D': 4}, {'A': ['A'], 'B': ['A', 'B'], 'C': ['A', 'B', 'C'], 'D': ['A', 'B', 'D']})
({'A': 0, 'B': 4, 'C': 2, 'D': 5}, {'A': ['A'], 'B': ['A', 'B'], 'C': ['A', 'C'], 'D': ['A', 'C', 'D']})
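
An alternative design, sketched below under assumed names (`dijkstra_with_predecessors`, `reconstruct_path`), stores only each vertex's predecessor on its best known path and rebuilds a path on demand. This avoids copying a whole list every time a distance improves, at the cost of a separate reconstruction step.

```python
import heapq


def dijkstra_with_predecessors(graph, source):
    """Return (distances, previous), where previous[v] is the vertex before v on a shortest path."""
    distances = {v: float("inf") for v in graph}
    distances[source] = 0
    previous = {source: None}
    pq = [(0, source)]
    while pq:
        dist_u, u = heapq.heappop(pq)
        if dist_u > distances[u]:
            continue  # outdated heap entry
        for v, weight in graph[u].items():
            if dist_u + weight < distances[v]:
                distances[v] = dist_u + weight
                previous[v] = u
                heapq.heappush(pq, (distances[v], v))
    return distances, previous


def reconstruct_path(previous, target):
    """Walk the predecessor links back from target; empty list if target is unreachable."""
    if target not in previous:
        return []
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


example = {"A": {"B": 2, "C": 5}, "B": {"C": 1, "D": 2}, "C": {"D": 3}, "D": {}}
distances, previous = dijkstra_with_predecessors(example, "A")
print(reconstruct_path(previous, "C"))  # ['A', 'B', 'C']
print(reconstruct_path(previous, "D"))  # ['A', 'B', 'D']
```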


### Exercise 3

Implement a function to partition a set of intervals into the minimum number of subsets such that no two intervals in the same subset overlap.

Hint: You can follow the following steps separately and then put them together in the `interval_partitioning` function in the end:

1. **Sort intervals:** sort the input list of intervals based on their start times.
2. **Initialise partitions:** create an empty list of lists of partitions (each partition represents a 'classroom' in the example above and from the lecture and each 'classroom' will have a list of intervals - therefore a list of lists). Initialise the first element of the first partition with the first interval (so to start with, you're only adding the first lecture to the first classroom)
3. **Place intervals:** write a for loop that goes through all remaining intervals (other than the first one, which you already added to the first partition in `partitions`) and checks whether each can be added to any of the partitions ('classrooms') by means of a nested loop over the existing partitions. Check with an if clause whether the interval fits into the current partition (i.e. the end time of the last interval in the partition is not later than the start time of the interval you are checking). If the interval fits, append it to the current partition, `break` out of the inner loop, and record with a flag variable (e.g. `placed`) that the interval has been placed.
4. **Create new partitions:** if the interval hasn't been placed after looping through all the lists in `partitions`, create a new partition by appending a new list (only consisting of the current interval) to the `partitions` list of lists. 

### Solution 3

In [71]:
def interval_partitioning(intervals):
    """
    Implement the interval partitioning algorithm to partition a set of intervals into the minimum number of subsets 
    such that no two intervals in the same subset overlap.
    
    Parameters:
    - intervals (list of tuples): A list of intervals represented as tuples (start, end).
    
    Returns:
    - partitions (list of lists): A list of partitions, where each partition is a list of intervals.
    """
    if not intervals:  # Nothing to schedule
        return []
    intervals = sorted(intervals, key=lambda x: x[0])  # Sort by start time without mutating the input
    partitions = [[intervals[0]]]  # Initialize first partition with the first interval
    
    for interval in intervals[1:]:
        placed = False
        for partition in partitions:
            if interval[0] >= partition[-1][1]:  # If interval can be placed in the current partition
                partition.append(interval)
                placed = True
                break
        if not placed:  # If interval cannot be placed in any existing partition, create a new partition
            partitions.append([interval])
    
    return partitions


In [90]:
intervals1 = [(1, 3), (2, 4), (3, 5), (6, 8)]
print(interval_partitioning(intervals1))

intervals2 = [(1, 5), (2, 6), (4, 8), (7, 10)]
print(interval_partitioning(intervals2))

intervals3 = [(1, 11), (2, 6), (4, 8), (7, 10)]
print(interval_partitioning(intervals3))

[[(1, 3), (3, 5), (6, 8)], [(2, 4)]]
[[(1, 5), (7, 10)], [(2, 6)], [(4, 8)]]
[[(1, 11)], [(2, 6), (7, 10)], [(4, 8)]]


### Exercise 4

Modify the interval partitioning algorithm to return not only the partitions but also the maximum number of intervals in any partition.

### Solution 4

In [91]:
def interval_partitioning_with_max_intervals(intervals):
    """
    Implement the interval partitioning algorithm to partition a set of intervals into the minimum number of subsets 
    such that no two intervals in the same subset overlap.

    Parameters:
    - intervals (list of tuples): A list of intervals represented as tuples (start, end).

    Returns:
    - partitions (list of lists): A list of partitions, where each partition is a list of intervals.
    - max_intervals (int): The maximum number of intervals in any partition.
    """
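    if not intervals:  # Nothing to schedule
        return [], 0
    intervals = sorted(intervals, key=lambda x: x[0])  # Sort by start time without mutating the input
    partitions = [[intervals[0]]]  # Initialize first partition with the first interval
    max_intervals = 1  # The first partition already holds one interval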

    for interval in intervals[1:]:
        placed = False
        for partition in partitions:
            if interval[0] >= partition[-1][1]:  # If interval can be placed in the current partition
                partition.append(interval)
                placed = True
                max_intervals = max(max_intervals, len(partition))  # Update max_intervals if necessary
                break
        if not placed:  # If interval cannot be placed in any existing partition, create a new partition
            partitions.append([interval])  # A one-interval partition cannot raise max_intervals, which is at least 1

    return partitions, max_intervals

In [93]:
intervals1 = [(1, 3), (2, 4), (3, 5), (6, 8)]
print(interval_partitioning_with_max_intervals(intervals1))

intervals2 = [(1, 5), (2, 6), (4, 8), (7, 10)]
print(interval_partitioning_with_max_intervals(intervals2))

intervals3 = [(1, 11), (2, 6), (4, 8), (7, 10)]
print(interval_partitioning_with_max_intervals(intervals3))

([[(1, 3), (3, 5), (6, 8)], [(2, 4)]], 3)
([[(1, 5), (7, 10)], [(2, 6)], [(4, 8)]], 2)
([[(1, 11)], [(2, 6), (7, 10)], [(4, 8)]], 2)


### Exercise 5

Implement a function to find the minimum spanning tree (MST) of a connected, undirected graph using Prim's algorithm, as you saw in the lectures. As a refresher, the algorithm works as follows: 

1. Choose an arbitrary vertex as the starting point, and include it as the first vertex in the MST.
2. Compare the edges going out from the MST. Choose the edge with the lowest weight that connects a vertex among the MST vertices to a vertex outside the MST.
3. Add that edge and vertex to the MST.
4. Keep doing step 2 and 3 until all vertices belong to the MST.

Hint: You can follow the following steps separately and then put them together in the `prim_algorithm` function in the end:

1. **Initialise minimum spanning tree:** Create an empty list to store the edges forming the minimum spanning tree.
2. **Initialise visited set:** Create an empty set to keep track of visited vertices (an empty set is initialised by `set()`). Then add the starting vertex to the visited set to mark it as visited.
3. **Initialise priority queue:** Create a priority queue (heap, using the `heapq` package) to store edges incident to visited vertices. Initialise the priority queue with edges incident to the starting vertex. In practice, do this by first creating a list of tuples (`pq = [(weight1, starting_vertex, neighbour1),(weight2, starting_vertex, neighbour2), ...]`), and then use `heapq.heapify(pq)` to turn it into a priority queue.
4. **Explore edges:** Use a while loop to iterate until the priority queue is empty. Inside the loop, pop the edge with the minimum weight from the priority queue (using `heapq.heappop(pq)`)
5. **Check visited status:** Check if the destination vertex of the popped edge is already visited. If not visited:
   - Mark the destination vertex as visited.
   - Add the edge to the minimum spanning tree.
   - Explore edges incident to the destination vertex.
   - Add unvisited incident edges to the priority queue.
6. **Return minimum spanning tree:** After visiting all vertices, return the list of edges forming the minimum spanning tree.

### Solution 5

In [81]:
def prim_algorithm(graph, starting_vertex):
    """
    Implement Prim's algorithm to find the minimum spanning tree of a connected, undirected graph.

    Parameters:
    - graph (dict): A dictionary representing the graph where keys are vertices and values are dictionaries 
                    containing neighbors as keys and edge weights as values.
    - starting_vertex: The vertex from which to start the search.

    Returns:
    - min_spanning_tree (list of tuples): A list of edges forming the minimum spanning tree.
    """
    min_spanning_tree = [] 
    visited = set()  
    visited.add(starting_vertex)  
    pq = [(weight, starting_vertex, neighbor) for neighbor, weight in graph[starting_vertex].items()] 
    heapq.heapify(pq)
    
    while pq:
        weight, u, v = heapq.heappop(pq)  # Pop the edge with the minimum weight
        if v not in visited:  # If the destination vertex is not visited
            visited.add(v)  # Mark the destination vertex as visited
            min_spanning_tree.append((u, v, weight))  # Add the edge to the minimum spanning tree
            for neighbor, edge_weight in graph[v].items():  # Push edges from the new vertex to its unvisited neighbors
                if neighbor not in visited:
                    heapq.heappush(pq, (edge_weight, v, neighbor))
    
    return min_spanning_tree

In [82]:
graph1 = {
    'A': {'B': 2, 'C': 1},
    'B': {'A': 2, 'C': 2, 'D': 1},
    'C': {'A': 1, 'B': 2, 'D': 2},
    'D': {'B': 1, 'C': 2}
}
print(prim_algorithm(graph1, 'A'))

graph2 = {
    'A': {'B': 4, 'C': 8},
    'B': {'A': 4, 'C': 2, 'D': 5},
    'C': {'A': 8, 'B': 2, 'D': 5},
    'D': {'B': 5, 'C': 5}
}
print(prim_algorithm(graph2, 'A'))

[('A', 'C', 1), ('A', 'B', 2), ('B', 'D', 1)]
[('A', 'B', 4), ('B', 'C', 2), ('B', 'D', 5)]
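
A quick sanity check on such a result is sketched below. The helper `check_spanning_tree` is hypothetical: it verifies that a list of `(u, v, weight)` edges forms a spanning tree of the graph and returns its total weight. It does not prove that the weight is minimal.

```python
def check_spanning_tree(graph, edges):
    """Return the total weight if `edges` is a spanning tree of `graph`, else raise ValueError."""
    vertices = set(graph)
    if len(edges) != len(vertices) - 1:
        raise ValueError("a spanning tree has exactly n - 1 edges")
    adjacency = {v: set() for v in vertices}
    for u, v, weight in edges:
        if graph[u].get(v) != weight:
            raise ValueError(f"edge {(u, v, weight)} is not in the graph")
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Depth-first search over the tree edges from an arbitrary vertex
    reached, frontier = set(), [next(iter(vertices))]
    while frontier:
        node = frontier.pop()
        if node not in reached:
            reached.add(node)
            frontier.extend(adjacency[node] - reached)
    if reached != vertices:
        raise ValueError("edges do not connect all vertices")
    return sum(weight for _, _, weight in edges)


graph1 = {
    'A': {'B': 2, 'C': 1},
    'B': {'A': 2, 'C': 2, 'D': 1},
    'C': {'A': 1, 'B': 2, 'D': 2},
    'D': {'B': 1, 'C': 2}
}
tree1 = [('A', 'C', 1), ('A', 'B', 2), ('B', 'D', 1)]
print(check_spanning_tree(graph1, tree1))  # 4
```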
