
# Using Topological Sort to Schedule Classes Efficiently
*by Karen Y. Wang on April 6th, 2023*

As a student, you probably have a list of classes that you need to take in order to graduate, and many of those classes likely have prerequisites. Since you can't take a class before completing its prerequisites, how do you figure out a valid sequence of classes to take? One approach uses an algorithm called **topological sorting**. In this tutorial, we will explain how to use topological sorting to write a class-scheduling function in Python.

## Overview

### Audience

This tutorial is intended for computer science students who are familiar with the following topics:
- Combinatorics
- Complexity analysis
- Data structures, particularly graphs
- Programming
- Set theory

### What this tutorial covers
- Explanation of what topological sorting is and why it's useful
- Application of topological sorting to a class-scheduling problem

### What this tutorial doesn't cover
- Detailed implementation of topological sorting

## Input

Our input will be a Python dictionary representing a directory of classes required to graduate. Each key is a class name, and its value is the list of that class's prerequisites (an empty list if there are none).

```python
class_directory = {
    "Intro to Programming": [],
    "Computer Systems": ["Intro to Programming"],
    "Discrete Math": [],
    "Probability": ["Discrete Math"],
    "Algorithms": ["Computer Systems", "Discrete Math"],
}
```
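
The rest of this tutorial quietly assumes that every prerequisite is itself listed as a class in the directory. As a minimal sketch of how we might check that assumption, the helper below (the name `find_unlisted_prerequisites` is ours, not part of any library) collects prerequisites that never appear as keys.

```python
from typing import Collection, Mapping, Set


def find_unlisted_prerequisites(
    class_directory: Mapping[str, Collection[str]],
) -> Set[str]:
  """Returns prerequisites that are not themselves listed as classes."""
  listed = set(class_directory)
  return {
      prereq
      for prereqs in class_directory.values()
      for prereq in prereqs
      if prereq not in listed
  }


class_directory = {
    "Intro to Programming": [],
    "Computer Systems": ["Intro to Programming"],
    "Discrete Math": [],
    "Probability": ["Discrete Math"],
    "Algorithms": ["Computer Systems", "Discrete Math"],
}

# An empty set means every prerequisite is also a key in the directory.
print(find_unlisted_prerequisites(class_directory))
```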

## Output

Our output will be a class sequence. A student who follows the given class sequence should be able to fulfill all prerequisites and graduate. There might be many valid sequences, and returning any one of them is enough. For the input above, one valid sequence is shown below.

```
1. Intro to Programming
2. Discrete Math
3. Computer Systems
4. Probability
5. Algorithms
```
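
To make "valid" concrete, here is a small sketch of a checker. The function name `is_valid_sequence` is our own choice for illustration; it simply verifies that each class appears exactly once and only after all of its prerequisites.

```python
from typing import Collection, Mapping, Sequence


def is_valid_sequence(
    class_directory: Mapping[str, Collection[str]],
    sequence: Sequence[str],
) -> bool:
  """Returns True if every class appears once, after all its prerequisites."""
  if sorted(sequence) != sorted(class_directory):
    return False  # A class is missing, duplicated, or unknown.
  taken = set()
  for classname in sequence:
    if not set(class_directory[classname]) <= taken:
      return False  # At least one prerequisite has not been taken yet.
    taken.add(classname)
  return True


class_directory = {
    "Intro to Programming": [],
    "Computer Systems": ["Intro to Programming"],
    "Discrete Math": [],
    "Probability": ["Discrete Math"],
    "Algorithms": ["Computer Systems", "Discrete Math"],
}

good = ["Intro to Programming", "Discrete Math", "Computer Systems",
        "Probability", "Algorithms"]
bad = ["Algorithms", "Intro to Programming", "Discrete Math",
       "Computer Systems", "Probability"]

print(is_valid_sequence(class_directory, good))  # True
print(is_valid_sequence(class_directory, bad))   # False
```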

## Inefficient solution

Let's first explore a naive solution that does not use topological sorting and is relatively inefficient. The approach is to generate all possible class sequences and return the first one that fulfills all prerequisites.

Here is some pseudocode for this solution.


```
S <- Set of all possible class sequences

for each sequence L in S do
  T <- Empty set that will contain classes already taken
  F <- Flag that will keep track of whether we've found
       a valid sequence. Initially set to true
 
  for each class C in L do
    P <- Set of prerequisites for class C
    if P is a subset of T then
      insert C into T
      continue
    else
      F <- false
      break
  if F is true then
    return L  # A valid class sequence

return error  # No valid class sequence could be found
```
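
One way the pseudocode above might translate into Python is sketched below. It uses `itertools.permutations` from the standard library to enumerate every ordering; the function name `find_class_sequence_naive` is hypothetical, and returning `None` stands in for the pseudocode's `error`.

```python
from itertools import permutations
from typing import Collection, List, Mapping, Optional


def find_class_sequence_naive(
    class_directory: Mapping[str, Collection[str]],
) -> Optional[List[str]]:
  """Tries every ordering of the classes and returns the first valid one."""
  for sequence in permutations(class_directory):  # S: all possible sequences.
    taken = set()  # T: classes already taken.
    for classname in sequence:
      if not set(class_directory[classname]) <= taken:
        break  # A prerequisite is missing, so this sequence is invalid.
      taken.add(classname)
    else:
      # The inner loop finished without `break`, so the sequence is valid.
      return list(sequence)
  return None  # No valid class sequence could be found.


class_directory = {
    "Intro to Programming": [],
    "Computer Systems": ["Intro to Programming"],
    "Discrete Math": [],
    "Probability": ["Discrete Math"],
    "Algorithms": ["Computer Systems", "Discrete Math"],
}

print(find_class_sequence_naive(class_directory))
```

The `for ... else` construct plays the role of the flag `F`: the `else` branch runs only when the inner loop was never interrupted.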

The issue with this naive approach is that it is not particularly efficient. Let's discuss why.

Let `n` represent the total number of classes. The inner loop visits each class in a sequence once, so it runs in `O(n)` time if we treat each prerequisite check as a constant-time operation.

There are `n!` possible permutations of `n` classes, and each permutation corresponds to a unique sequence of classes. In the worst case, the outer loop therefore iterates over `n!` sequences, so the time complexity of our naive algorithm is `O(n * n!)`. Let's see if we can do better than this.
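
To get a feel for how quickly `n * n!` grows, we can tabulate it for a few values of `n`. This is only an illustration of the bound, not a benchmark; the helper name `naive_step_bound` is made up for this sketch.

```python
import math


def naive_step_bound(n: int) -> int:
  """Returns n * n!, the worst-case step bound for the naive approach."""
  return n * math.factorial(n)


for n in (5, 10, 15, 20):
  print(f"n = {n:2d}: about {naive_step_bound(n):,} steps")
```

For the five classes in our example the bound is only 600 steps, but at 20 classes it is already on the order of `10^19`.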

## What is topological sorting?

In order to think of a different approach, let's try drawing a diagram showing the relationship between classes and their prerequisites. If `Class B` has `Class A` as a prerequisite, we'll draw an arrow like this: `Class A -> Class B`.

<img src="https://drive.google.com/uc?id=1Rtx4a4yZlDJ9DLsJ0mTIH3c8iOMAIMcf" width="30%" alt="Diagram of classes and their prerequisites."/>


### Directed acyclic graphs (DAGs)

This diagram can also be interpreted as a **graph**. Recall that computer scientists define a graph as a set of **nodes** and a set of **edges** between those nodes. In a directed graph, like the one above, edges are directional and are represented as arrows.

In fact, we can be even more specific about describing the graph above. It is actually a **directed acyclic graph**, or **DAG** for short. A DAG is a special type of directed graph that does not contain any cycles. In other words, there is no way to start at a node, follow the arrows, and end up back at the same node.

Why is this important? Imagine if there were a cycle in a graph of classes and their prerequisites.

<img src="https://drive.google.com/uc?id=1FLZZnfFB2tkeKtimh5L56DQOdEJvObS0" width="30%" alt="Graph of classes and their prerequisites, where the edges form a cycle.">


This graph is directed but contains a cycle, so it is not a DAG. According to this graph, you would need to take `Class A` before `Class B`, and `Class B` before `Class C`. But you would also need to take `Class C` before `Class A`! This is impossible. If your prerequisites look like this, you should probably complain to your advising department.
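
If we wanted to detect this situation in code, one possible sketch is a depth-first search that follows prerequisites and reports a cycle when it walks back into a class it is still exploring. The function name `has_cycle` and the `cyclic_directory` example are our own; the sketch also assumes the same dictionary format as `class_directory`.

```python
from typing import Collection, Dict, Mapping

UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def has_cycle(class_directory: Mapping[str, Collection[str]]) -> bool:
  """Returns True if the prerequisite relationships contain a cycle."""
  state: Dict[str, int] = {}

  def visit(classname: str) -> bool:
    status = state.get(classname, UNVISITED)
    if status == IN_PROGRESS:
      return True  # We walked back into a class we are still exploring.
    if status == DONE:
      return False  # Already fully explored; no cycle through here.
    state[classname] = IN_PROGRESS
    for prereq in class_directory.get(classname, ()):
      if visit(prereq):
        return True
    state[classname] = DONE
    return False

  return any(visit(classname) for classname in class_directory)


# Class A -> Class B -> Class C -> Class A, as in the diagram above.
cyclic_directory = {
    "Class A": ["Class C"],
    "Class B": ["Class A"],
    "Class C": ["Class B"],
}

print(has_cycle(cyclic_directory))  # True
```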

### Topological ordering

We've established that the relationship between classes and their prerequisites can be represented as a DAG. An important property of a DAG is that it always has at least one **topological ordering**. A topological ordering of a graph is a list of all of its nodes in which every arrow points from an earlier node to a later one, so visiting the nodes in order "follows" every arrow in the right direction. For example, a topological ordering of the graph below is:

```
1. Intro to Programming
2. Discrete Math
3. Computer Systems
4. Probability
5. Algorithms
```

<img src="https://drive.google.com/uc?id=1Rtx4a4yZlDJ9DLsJ0mTIH3c8iOMAIMcf" width="30%" alt="Diagram of classes and their prerequisites."/>


Does this list look familiar to you? In fact, this is the class sequence from earlier! It turns out that, in order to find a valid class sequence, we simply need to find a topological ordering of the graph of classes and their prerequisites.
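
A DAG can have more than one topological ordering, which matches our earlier remark that there might be many valid class sequences. As a quick sketch, the `networkx` function `all_topological_sorts` enumerates them; the edge list below is copied by hand from the diagram above.

```python
import networkx

G = networkx.DiGraph()
# Each edge points from a prerequisite to the class that requires it.
G.add_edges_from([
    ("Intro to Programming", "Computer Systems"),
    ("Discrete Math", "Probability"),
    ("Computer Systems", "Algorithms"),
    ("Discrete Math", "Algorithms"),
])

orderings = list(networkx.all_topological_sorts(G))
print(len(orderings))  # 9

for ordering in orderings[:3]:
  print(" -> ".join(ordering))
```

Enumerating every ordering is only practical for small graphs like this one, since the number of orderings can grow very quickly.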

### Implementation overview

To find a topological ordering of a DAG, we need to perform a **topological sort**. A topological sort organizes the nodes of a graph into an ordered list. In this case, the order is determined by the following rule: if `Class A` is a prerequisite for `Class B`, then `Class A` must come before `Class B` in the list.

There are a number of different ways to implement topological sorting. The Python graph library [`networkx`](https://networkx.org) uses an approach called Kahn's algorithm. Here is the pseudocode for the algorithm.

```
L <- Empty list that will contain the sorted nodes
S <- Set of all nodes with no incoming edges

while S is not empty do
    remove a node N from S
    add N to L
    for each node M with an edge E from N to M do
        remove edge E from the graph
        if M has no other incoming edges then
            insert M into S

if graph has edges then
    return error  # Graph has at least one cycle
else 
    return L  # A topologically sorted order
```

The time complexity of Kahn's algorithm is `O(n + e)`, where `n` is the number of nodes and `e` is the number of edges. This is because the algorithm processes each node once and each edge once. This is a substantial improvement over the `O(n * n!)` time complexity of the naive approach.
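
For readers who want to see the pseudocode in runnable form, here is a minimal pure-Python sketch of Kahn's algorithm applied to our dictionary format. It is not the `networkx` implementation; the names `kahn_topological_sort`, `indegree`, `dependents`, and `ready` are our own, and the sketch assumes every prerequisite also appears as a key in the directory. Instead of deleting edges, it decrements a count of incoming edges, which has the same effect.

```python
from collections import deque
from typing import Collection, Dict, List, Mapping


def kahn_topological_sort(
    class_directory: Mapping[str, Collection[str]],
) -> List[str]:
  """Sorts classes so that every class comes after its prerequisites."""
  # Count incoming edges and record outgoing edges (prereq -> dependent).
  indegree: Dict[str, int] = {c: 0 for c in class_directory}
  dependents: Dict[str, List[str]] = {c: [] for c in class_directory}
  for classname, prereqs in class_directory.items():
    for prereq in prereqs:
      indegree[classname] += 1
      dependents[prereq].append(classname)

  # S: all nodes with no incoming edges.
  ready = deque(c for c, count in indegree.items() if count == 0)
  ordered = []  # L: the sorted nodes.
  while ready:
    current = ready.popleft()
    ordered.append(current)
    for dependent in dependents[current]:
      indegree[dependent] -= 1  # "Remove" the edge current -> dependent.
      if indegree[dependent] == 0:
        ready.append(dependent)

  # Any class left unsorted still has incoming edges, so a cycle exists.
  if len(ordered) != len(class_directory):
    raise ValueError("The prerequisites contain a cycle.")
  return ordered


class_directory = {
    "Intro to Programming": [],
    "Computer Systems": ["Intro to Programming"],
    "Discrete Math": [],
    "Probability": ["Discrete Math"],
    "Algorithms": ["Computer Systems", "Discrete Math"],
}

print(kahn_topological_sort(class_directory))
```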

### Final solution

Now that we understand the principles behind topological sorting, we can use it to write an efficient class-scheduling function. We'll use the `networkx` library to build a DAG from the given class directory and sort it topologically.

```python
import networkx
from typing import Collection, List, Mapping

def build_graph(class_directory: Mapping[str, Collection[str]]) -> networkx.DiGraph:
  G = networkx.DiGraph()  # Create an empty directed graph.
  for classname, prereqs in class_directory.items():
    G.add_node(classname)
    for prereq in prereqs:
      G.add_edge(prereq, classname)
  return G

def find_class_sequence(class_directory: Mapping[str, Collection[str]]) -> List[str]:
  """Finds a valid class sequence."""
  G = build_graph(class_directory)
  return list(networkx.topological_sort(G))
```

```python
def display(class_sequence: List[str]) -> None:
  for i, classname in enumerate(class_sequence):
    print("{0}. {1}".format(i + 1, classname))

display(find_class_sequence(class_directory))
```

```
1. Intro to Programming
2. Discrete Math
3. Computer Systems
4. Probability
5. Algorithms
```
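
What happens if the directory contains a cycle? In that case `networkx.topological_sort` raises `networkx.NetworkXUnfeasible` once the result is consumed. One possible way to surface this more clearly is sketched below; the wrapper name `find_class_sequence_checked` and the error message are our own choices, and the graph-building code is repeated so that the block runs on its own.

```python
import networkx
from typing import Collection, List, Mapping


def find_class_sequence_checked(
    class_directory: Mapping[str, Collection[str]],
) -> List[str]:
  """Finds a valid class sequence, or raises ValueError if none exists."""
  G = networkx.DiGraph()
  for classname, prereqs in class_directory.items():
    G.add_node(classname)
    for prereq in prereqs:
      G.add_edge(prereq, classname)
  try:
    # list() consumes the generator, which is when a cycle is detected.
    return list(networkx.topological_sort(G))
  except networkx.NetworkXUnfeasible as error:
    raise ValueError("The prerequisites contain a cycle.") from error


cyclic_directory = {
    "Class A": ["Class C"],
    "Class B": ["Class A"],
    "Class C": ["Class B"],
}

try:
  find_class_sequence_checked(cyclic_directory)
except ValueError as error:
  print(error)  # The prerequisites contain a cycle.
```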


## Conclusion

In summary, we have taken our original class-scheduling problem and reframed it as finding a topological ordering over a DAG. We've shown how to use the topological sort algorithm to write an efficient and elegant solution, and compared its performance to a naive approach.

In addition to scheduling classes, topological sorting can be applied to a wide variety of problems that involve process flows. Can you think of any other applications? Try it out for yourself!

### Further reading
For more details about the implementation of Kahn's algorithm, please refer to the official [`networkx` guide](https://networkx.org/nx-guides/content/algorithms/dag/index.html).