<a href="https://colab.research.google.com/github/karen-wang/class-scheduling/blob/main/Using_Topological_Sort_to_Schedule_Courses_Efficiently.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Topological Sort to Schedule Courses Efficiently
*Author: [Karen Wang](https://github.com/karen-wang)*<br/>
*Last Edited: April 11th, 2023*<br/>

As a student, you probably have a list of courses that you need to take in order to graduate. Many of those courses also likely have prerequisites. Since you can't take a course before completing its prerequisites, how do you figure out a valid sequence of courses to take? One approach uses an algorithm called **topological sorting**. In this tutorial, we will explain how to use topological sorting to write a course-scheduling Python function.

## Overview

### Audience

This tutorial is intended for computer science students who are familiar with the following topics:
- Complexity analysis
- Data structures, particularly graphs
- Programming

### What this tutorial covers
- Explanation of what topological sorting is and why it's useful
- Application of topological sorting to a course-scheduling problem

### What this tutorial doesn't cover
- Detailed implementation of topological sorting

## Input

Our input will be a Python dictionary representing a catalog of courses required to graduate. 
The catalog maps each course to a set of its prerequisites, if any.

In [1]:
course_catalog = {
    "Intro to Programming": set(),  # set() is an empty set; {} would be an empty dict.
    "Computer Systems": {"Intro to Programming"},
    "Discrete Math": set(),
    "Probability": {"Discrete Math"},
    "Algorithms": {"Computer Systems", "Discrete Math"},
}

## Output

Our output will be a course sequence. A student who follows the output sequence will be able to fulfill prerequisites in the correct order and graduate. There might be many valid sequences, but returning one of them is enough. For the input above, one valid sequence is shown below.

```
1. Intro to Programming
2. Discrete Math
3. Computer Systems
4. Probability
5. Algorithms
```

If no valid sequence exists (for example, because the prerequisites form a cycle), then we will report an error.

## Inefficient solution

Let's first explore a naive solution that does not use topological sorting. The approach is to generate all possible course sequences, and return the first one that fulfills prerequisites in the correct order.

Here is some pseudocode for this solution.


```
S <- Set of all possible course sequences.

for each sequence L in S do
  T <- Empty set that will contain courses already taken.
  F <- Flag that will keep track of whether or not we've
        found a valid sequence. Initially set to true.
 
  for each course C in L do
    P <- Set of prerequisites for course C.
    if P is a subset of T then
      insert C into T
      continue
    else
      F <- false
      break
  if F is true then
    return L  # A valid course sequence.

return error  # No valid course sequence could be found.
```
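
The pseudocode above can be sketched in Python roughly as follows. This is only an illustration of the brute-force idea: the function name `find_course_sequence_naive` is our own, and we assume every prerequisite also appears as a key in the catalog.

```python
from itertools import permutations

def find_course_sequence_naive(course_catalog):
    """Try every ordering of the courses and return the first valid one."""
    for sequence in permutations(course_catalog):
        taken = set()  # Courses already taken in this candidate sequence.
        for course in sequence:
            # set(...) lets the catalog store prerequisites in any collection.
            if set(course_catalog[course]) <= taken:
                taken.add(course)
            else:
                break  # A prerequisite is missing, so this ordering is invalid.
        else:
            # The inner loop finished without hitting `break`.
            return list(sequence)
    raise ValueError("No valid course sequence could be found.")

course_catalog = {
    "Intro to Programming": set(),
    "Computer Systems": {"Intro to Programming"},
    "Discrete Math": set(),
    "Probability": {"Discrete Math"},
    "Algorithms": {"Computer Systems", "Discrete Math"},
}
print(find_course_sequence_naive(course_catalog))
```

For this small catalog the very first permutation tried happens to be valid, so the function returns quickly. In the worst case, though, it has to try every permutation before giving up.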

The issue with this naive approach is that it is not particularly efficient. Let's discuss why.

Let `n` represent the total number of courses. The inner loop visits at most `n` courses, so it runs in `O(n)` time if we treat each prerequisite check as a constant-time operation.

There are `n!` possible permutations of `n` courses, and each permutation corresponds to a unique sequence of courses. This means the outer loop may need to iterate over as many as `n!` sequences in the worst case. So the time complexity of our naive algorithm is `O(n * n!)`. Let's see if we can do better than this.
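
To get a feel for how quickly `n * n!` grows, here is a small sketch. The helper name `naive_step_bound` is hypothetical, and the value it returns is only the worst-case bound discussed above, not a measured running time.

```python
import math

def naive_step_bound(n: int) -> int:
    """Worst-case bound n * n! on the steps of the brute-force search."""
    return n * math.factorial(n)

for n in (5, 10, 20):
    print(f"n = {n:2d}: {naive_step_bound(n):,} steps")
```

With 5 courses the bound is 600 steps, but with 10 courses it is already over 36 million.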

## What is topological sorting?

Let's explore a different approach. First, let's try drawing a diagram showing the prerequisite relationships between courses in our catalog. If `Course B` requires `Course A`, we'll draw an arrow like this: `Course A -> Course B`. So for example, since `Computer Systems` requires `Intro to Programming`, we'll draw `Intro to Programming -> Computer Systems`.

<img src="https://drive.google.com/uc?id=1Rtx4a4yZlDJ9DLsJ0mTIH3c8iOMAIMcf" width="30%" alt="Diagram of courses and their prerequisites."/>


### Directed acyclic graphs (DAGs)

This diagram can also be interpreted as a **graph**. Recall that computer scientists define a graph as a set of **nodes** and a set of **edges** between those nodes. In a directed graph, like the one above, edges are directional and represented as arrows.
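
As a rough illustration, the nodes and edges of this graph can be read directly off the course catalog. The variable names `nodes` and `edges` are our own, and each edge is written as a `(prerequisite, course)` pair to match the arrow direction used in the diagram.

```python
course_catalog = {
    "Intro to Programming": set(),
    "Computer Systems": {"Intro to Programming"},
    "Discrete Math": set(),
    "Probability": {"Discrete Math"},
    "Algorithms": {"Computer Systems", "Discrete Math"},
}

# Every course is a node.
nodes = set(course_catalog)

# Every (prerequisite, course) pair is a directed edge: prerequisite -> course.
edges = {
    (prereq, course)
    for course, prereqs in course_catalog.items()
    for prereq in prereqs
}

print(f"{len(nodes)} nodes, {len(edges)} edges")
```

For the example catalog this gives 5 nodes and 4 edges.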

In fact, we can be even more specific about describing the graph above. It is actually a **directed acyclic graph**, or **DAG** for short. A DAG is a special type of directed graph that does not contain any cycles. In other words, there are no closed loops within the graph.

Why is this important? Imagine that the graph did contain a closed loop.

<img src="https://drive.google.com/uc?id=1tSNFbhr2uD92krwqDiksaxmeplUMT5Zg" width="30%" alt="Graph of courses and their prerequisites, where the edges form a cycle.">


This graph is directed but contains a cycle. It is not a DAG. According to this graph, `Course C` requires `Course B`, and `Course B` requires `Course A`. But `Course A` itself requires `Course C`! This is impossible. If your courses look like this, you should probably complain to your advising department.
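
One way to check a catalog for this kind of loop is a depth-first search that follows prerequisites and watches for a course it is still in the middle of exploring. The sketch below is one possible approach, not the method used later in this tutorial. The function name `has_cycle` is our own, and because it is recursive it is only suitable for catalogs of modest size.

```python
def has_cycle(course_catalog):
    """Return True if the prerequisite relationships contain a cycle."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}  # Courses not in this dict yet count as UNVISITED.

    def visit(course):
        state[course] = IN_PROGRESS
        for prereq in course_catalog.get(course, ()):
            status = state.get(prereq, UNVISITED)
            if status == IN_PROGRESS:
                return True  # We walked back into a course we are still exploring.
            if status == UNVISITED and visit(prereq):
                return True
        state[course] = DONE
        return False

    return any(
        state.get(course, UNVISITED) == UNVISITED and visit(course)
        for course in course_catalog
    )

# The cyclic example from the diagram above.
cyclic_catalog = {
    "Course A": {"Course C"},
    "Course B": {"Course A"},
    "Course C": {"Course B"},
}
print(has_cycle(cyclic_catalog))
```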

### Topological ordering

We've established that the relationship between courses and their prerequisites can be represented as a DAG. An important property of a DAG is that it always has at least one **topological ordering**. A topological ordering of a graph is a list of its nodes in which, for every arrow `A -> B`, node `A` appears before node `B`. In other words, visiting the nodes in order "follows" every arrow in the right direction. For example, a topological ordering of the graph below is:

```
1. Intro to Programming
2. Discrete Math
3. Computer Systems
4. Probability
5. Algorithms
```

<img src="https://drive.google.com/uc?id=1Rtx4a4yZlDJ9DLsJ0mTIH3c8iOMAIMcf" width="30%" alt="Diagram of courses and their prerequisites."/>


Does this list look familiar to you? In fact, this is the example of a valid course sequence from earlier! It turns out that, in order to find a valid course sequence, we simply need to find a topological ordering of the graph.
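
To make the definition concrete, here is a small sketch that checks whether a given list of courses is a topological ordering of the catalog's graph. The function name `is_valid_sequence` is our own and is not part of any library.

```python
def is_valid_sequence(course_catalog, sequence):
    """Return True if `sequence` lists every course once, prerequisites first."""
    position = {course: i for i, course in enumerate(sequence)}
    # Each course must appear exactly once.
    if len(position) != len(sequence) or set(position) != set(course_catalog):
        return False
    # Every prerequisite must come earlier in the list than its course.
    for course, prereqs in course_catalog.items():
        for prereq in prereqs:
            if prereq not in position or position[prereq] >= position[course]:
                return False
    return True

course_catalog = {
    "Intro to Programming": set(),
    "Computer Systems": {"Intro to Programming"},
    "Discrete Math": set(),
    "Probability": {"Discrete Math"},
    "Algorithms": {"Computer Systems", "Discrete Math"},
}
sequence = [
    "Intro to Programming",
    "Discrete Math",
    "Computer Systems",
    "Probability",
    "Algorithms",
]
print(is_valid_sequence(course_catalog, sequence))
```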

### Implementation overview

To find a topological ordering of a DAG, we need to perform a **topological sort**. A topological sort organizes the nodes of a graph into an ordered list. In this case, the order is determined by the following rule: if `Course A` is a prerequisite for `Course B`, then `Course A` must come before `Course B` in the list.

There are a number of different ways to implement topological sorting. The Python graph library [`networkx`](https://networkx.org) uses an approach called Kahn's algorithm. Here is the pseudocode for the algorithm.

```
L <- Empty list that will contain the sorted nodes.
S <- Set of all nodes with no incoming edges.

while S is not empty do
    remove a node N from S
    add N to L
    for each node M with an edge E from N to M do
        remove edge E from the graph
        if M has no other incoming edges then
            insert M into S

if graph has edges then
    return error  # Graph has at least one cycle.
else 
    return L  # A topologically sorted order.
```

The time complexity of Kahn's algorithm is `O(n + e)` where `n` is the number of nodes and `e` is the number of edges. This is because the algorithm explores each node and edge exactly once. This is a huge improvement over the `O(n * n!)` time complexity of the naive approach!
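
For readers who want to see the pseudocode in action, here is a minimal Python sketch of Kahn's algorithm that works directly on the catalog dictionary. It is not the `networkx` implementation. The function name `kahn_topological_sort` is our own, and instead of physically removing edges it keeps a count of each course's remaining incoming edges, which has the same effect.

```python
from collections import deque

def kahn_topological_sort(course_catalog):
    """Return one topological ordering, or raise ValueError on a cycle."""
    in_degree = {course: 0 for course in course_catalog}
    dependents = {course: [] for course in course_catalog}
    for course, prereqs in course_catalog.items():
        for prereq in prereqs:
            # Tolerate prerequisites that are not listed as catalog keys.
            in_degree.setdefault(prereq, 0)
            dependents.setdefault(prereq, []).append(course)
            in_degree[course] += 1

    # S in the pseudocode: all nodes with no incoming edges.
    ready = deque(course for course, degree in in_degree.items() if degree == 0)
    order = []  # L in the pseudocode.
    while ready:
        course = ready.popleft()
        order.append(course)
        for dependent in dependents[course]:
            in_degree[dependent] -= 1  # "Remove" the edge course -> dependent.
            if in_degree[dependent] == 0:
                ready.append(dependent)

    # Any course left out still had incoming edges, so there is a cycle.
    if len(order) != len(in_degree):
        raise ValueError("The prerequisites contain a cycle.")
    return order

course_catalog = {
    "Intro to Programming": set(),
    "Computer Systems": {"Intro to Programming"},
    "Discrete Math": set(),
    "Probability": {"Discrete Math"},
    "Algorithms": {"Computer Systems", "Discrete Math"},
}
print(kahn_topological_sort(course_catalog))
```

Each course is appended to `order` once and each prerequisite edge is decremented once, which is where the `O(n + e)` bound comes from.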

### Final solution

Now that we understand the principles behind topological sorting, we can use it to write an efficient course-scheduling function. First, we need to build a DAG from the given course catalog. Next, we need to find a topological ordering of the DAG. Luckily, `networkx` already provides most of the functionality we need. The solution code is below.

In [6]:
import networkx
from collections.abc import Collection, Mapping

def build_graph(course_catalog: Mapping[str, Collection[str]]) -> networkx.DiGraph:
  G = networkx.DiGraph()  # Create an empty directed graph.
  for course, prereqs in course_catalog.items():
    G.add_node(course)
    for prereq in prereqs:
      G.add_edge(prereq, course)
  return G

def find_course_sequence(course_catalog: Mapping[str, Collection[str]]) -> list[str]:
  """Finds a valid course sequence.

  Raises networkx.NetworkXUnfeasible if the prerequisites contain a cycle.
  """
  G = build_graph(course_catalog)
  return list(networkx.topological_sort(G))

In [10]:
def display(course_sequence: list[str]) -> None:
  for i, course in enumerate(course_sequence):
    print(f"{i + 1}. {course}")

display(find_course_sequence(course_catalog))

1. Intro to Programming
2. Discrete Math
3. Computer Systems
4. Probability
5. Algorithms
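
If the catalog contains a cycle, `networkx.topological_sort` raises `networkx.NetworkXUnfeasible`. The sketch below shows one way a caller might turn that into a friendlier error. The wrapper name `find_course_sequence_checked` and the choice of `ValueError` are our own assumptions, not part of `networkx`.

```python
import networkx

def find_course_sequence_checked(course_catalog):
    """Like the solution above, but reports cycles with a clearer message."""
    G = networkx.DiGraph()
    for course, prereqs in course_catalog.items():
        G.add_node(course)
        for prereq in prereqs:
            G.add_edge(prereq, course)
    try:
        return list(networkx.topological_sort(G))
    except networkx.NetworkXUnfeasible as err:
        raise ValueError("The prerequisites contain a cycle.") from err

cyclic_catalog = {
    "Course A": {"Course C"},
    "Course B": {"Course A"},
    "Course C": {"Course B"},
}
try:
    find_course_sequence_checked(cyclic_catalog)
except ValueError as err:
    print(f"Error: {err}")
```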


## Conclusion

In summary, we have taken our course-scheduling problem and reframed it as finding a topological ordering over a DAG. We've shown how to use the topological sort algorithm to write an efficient and elegant solution, and compared its time complexity with that of a naive approach.

In addition to course-scheduling, topological sorting can be applied to a wide variety of problems that involve process flows. Can you think of any other applications? Try it out for yourself!

### Further reading
For more details about the implementation of Kahn's algorithm, please refer to the official [`networkx` guide](https://networkx.org/nx-guides/content/algorithms/dag/index.html).