# Technique - Topological Sorting

A topological ordering of the nodes in a directed graph is an ordering where, for every pair of nodes `N`, `M`, if a directed edge exists from `N` to `M`, then `N` precedes `M` in the ordering. Topological orderings are only possible for directed acyclic graphs (DAGs), and every DAG has at least one.

Suppose we had the following graph, where directed edges only exist from parent to child (it is a DAG rather than a strict tree, since node 10 has two parents):
```
              1
            /   \
           /     \
          2       3
         / \     / \
        4   5   6   7
       / \   \ /     \
      8   9   10     11
```
Some valid topological orderings for this graph are:
- 1, 2, 5, 4, 9, 8, 3, 7, 11, 6, 10
- 1, 3, 7, 11, 6, 2, 5, 10, 4, 9, 8
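
To make the definition concrete, here is a small checker. It is a sketch rather than part of the original notebook: `is_topological_order` and the `edges` list are illustrative names, and the edges are transcribed from the diagram above.

```python
from typing import List, Tuple

def is_topological_order(order: List[int], edges: List[Tuple[int, int]]) -> bool:
    """Return True if every edge (n, m) has n placed before m in `order`."""
    position = {node: i for i, node in enumerate(order)}
    # A node listed twice would make the ordering ill-defined
    if len(position) != len(order):
        return False
    return all(
        n in position and m in position and position[n] < position[m]
        for n, m in edges
    )

# Parent -> child edges from the diagram
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7),
         (4, 8), (4, 9), (5, 10), (6, 10), (7, 11)]

print(is_topological_order([1, 2, 5, 4, 9, 8, 3, 7, 11, 6, 10], edges))  # True
print(is_topological_order([1, 3, 7, 11, 6, 2, 5, 10, 4, 9, 8], edges))  # True
print(is_topological_order([1, 2, 5, 10, 4, 9, 8, 3, 7, 11, 6], edges))  # False: 10 precedes 6
```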

In [1]:
from typing import List, Optional

class GraphNode:
    """A graph node holding a value and a list of child nodes."""
    def __init__(self, value: Optional[int] = None, children: Optional[List["GraphNode"]] = None):
        self.children = children if children else []
        self.value = value


nodes = {val: GraphNode(value=val) for val in range(1,12)}
nodes[1].children += [nodes[2], nodes[3]]
nodes[2].children += [nodes[4], nodes[5]]
nodes[3].children += [nodes[6], nodes[7]]
nodes[4].children += [nodes[8], nodes[9]]
nodes[5].children += [nodes[10]]
nodes[6].children += [nodes[10]]
nodes[7].children += [nodes[11]]
root = nodes[1]


Topological orderings can be generated via Kahn's algorithm or using DFS; both approaches run in `O(|V|+|E|)` time for a DAG with `|V|` vertices and `|E|` edges.
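
Kahn's algorithm is only mentioned in these notes, so here is a minimal sketch of it for comparison with the DFS approach. It assumes the graph is given as an adjacency list in which every node appears as a key; `kahn_top_sort` and `graph` are illustrative names.

```python
from collections import deque
from typing import Dict, List

def kahn_top_sort(successors: Dict[int, List[int]]) -> List[int]:
    """Kahn's algorithm: repeatedly emit a node whose in-degree is zero.

    Assumes every node is a key of `successors`, even if it has no children.
    """
    in_degree = {node: 0 for node in successors}
    for children in successors.values():
        for child in children:
            in_degree[child] += 1

    ready = deque(node for node, degree in in_degree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in successors[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    # Nodes on a cycle never reach in-degree zero, so they are never emitted
    if len(order) != len(successors):
        raise ValueError("Cycle detected, topological sorting impossible.")
    return order

graph = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [8, 9], 5: [10],
         6: [10], 7: [11], 8: [], 9: [], 10: [], 11: []}
print(kahn_top_sort(graph))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Unlike the DFS version, this variant needs no recursion and detects cycles by counting emitted nodes instead of tracking ancestors.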

Some important considerations before implementing a topological sorting function:
- Is the graph certain to be directed and acyclic? What do we do if we detect a cycle?
- Will we be given a node/collection of nodes from which every other node is certain to be reachable? (if not, a traversal from those nodes alone will miss part of the graph and cannot produce a complete ordering)
- How are edges and nodes represented? How is directionality represented?
- How many children can a node have? How are those children represented (does the node have a list of children, .left and .right attributes, or something else)?
- Can we modify the input graph?

When in doubt:
- Assume nodes may have unbounded children
- Assume cycles may exist
- If performance isn't totally critical, don't modify the input graph; generate an adjacency list or collection of nodes and modify that.
- Rule out the possibility of unreachable nodes (either make sure you have a reference to every node, or that all nodes will be reachable from whatever subset you're given).
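
One way to follow the "don't modify the input graph" advice is to copy the reachable part of the graph into an adjacency list first. The sketch below assumes nodes expose `.value` and `.children` and that values are unique; `Node` and `build_adjacency` are illustrative names, not part of the original notebook.

```python
from typing import Dict, List, Set

class Node:
    """Stand-in for a graph node with a value and a list of children (assumed shape)."""
    def __init__(self, value: int):
        self.value = value
        self.children: List["Node"] = []

def build_adjacency(roots: List[Node]) -> Dict[int, Set[int]]:
    """Copy everything reachable from `roots` into value -> set of child values.

    The input nodes are only read, never modified.
    """
    adjacency: Dict[int, Set[int]] = {}
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node.value in adjacency:
            continue  # already copied (also keeps this safe on cyclic input)
        adjacency[node.value] = {child.value for child in node.children}
        stack.extend(node.children)
    return adjacency

a, b, c = Node(1), Node(2), Node(3)
a.children = [b, c]
b.children = [c]
adjacency = build_adjacency([a])
print(adjacency[1], adjacency[2], adjacency[3])
```

The copy can then be mutated freely (for example, deleting entries as they are processed) without touching the caller's graph.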


The basic DFS implementation for topological ordering requires two functions:
```
top_sort(root or list of roots):
    t_ordering = []
    visited = set()
    for each root in list of roots: # only need a single dfs if one root is given
        ancestors = set()
        dfs(root, visited, ancestors, t_ordering)
    return t_ordering

dfs(node, visited, ancestors, t_ordering):
    if node in visited: return
    if node in ancestors: raise exception("cycle exists")
    ancestors.add(node)
    for each child of node:
        dfs(child, visited, ancestors, t_ordering)
    ancestors.remove(node)
    visited.add(node)
    t_ordering.prepend(node) # can use a deque for this, or just reverse the t_ordering at the end
    return 
    
```
Some common twists are:
- The graph may have cycles; return False or throw an exception if so
- The graph may have multiple roots; you need to be able to access them all to formulate the t order
- You don't get a root at the beginning but some other representation of a graph; you will need to find roots first and sometimes construct an adjacency list. 
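
For the last twist, a sketch of turning a raw edge list into an adjacency list plus a set of roots might look like this. It assumes edges are `(parent, child)` pairs; `adjacency_and_roots` is an illustrative name.

```python
from typing import Dict, Iterable, List, Set, Tuple

def adjacency_and_roots(
    nodes: Iterable[int], edges: List[Tuple[int, int]]
) -> Tuple[Dict[int, Set[int]], Set[int]]:
    """Build node -> successors and collect the nodes that have no parent."""
    successors: Dict[int, Set[int]] = {node: set() for node in nodes}
    has_parent: Set[int] = set()
    for parent, child in edges:
        # setdefault covers endpoints that were missing from `nodes`
        successors.setdefault(parent, set()).add(child)
        successors.setdefault(child, set())
        has_parent.add(child)
    roots = set(successors) - has_parent
    return successors, roots

successors, roots = adjacency_and_roots(range(4), [(0, 1), (1, 2)])
print(roots)  # {0, 3}: node 3 is isolated, so it is a root too

_, cyclic_roots = adjacency_and_roots(range(2), [(0, 1), (1, 0)])
print(cyclic_roots)  # set(): no roots in a nonempty graph implies a cycle
```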

The code below implements the DFS-based template on the above graph. In it, we assume that the given node is the root node, and that every other node is reachable from it:

In [2]:
from collections import defaultdict
from typing import Union, List

def visit(node, visited, ancestors, top_order):
    if node in visited:
        return
    if node in ancestors:
        raise RuntimeError("Cycle detected, topological sorting impossible.")
    
    ancestors.add(node)
    for child in node.children:
        visit(child, visited, ancestors, top_order)
    
    ancestors.remove(node)
    visited.add(node)
    top_order.append(node.value)

        
def top_sort(root):
    top_order = []
    visit(root, set(), set(), top_order)    
    return top_order[::-1]

print(top_sort(root))


[1, 3, 7, 11, 6, 2, 5, 10, 4, 9, 8]


## Example: [Course Schedule](https://leetcode.com/problems/course-schedule/)

In this question, we are given a list of edges (i.e. prereqs) and asked only to determine whether a topological ordering is possible (not to actually generate one); this is equivalent to asking whether the graph is acyclic. Since the graph may contain cycles but we have access to every node, we can run the DFS method from every course that has no prerequisites and confirm that no cycles exist. By the end, we should have visited every node; any node left unvisited is unreachable from every starting node, which means it lies on or downstream of a cycle. Note that pairs are given as `(course, prereq)`.

In [3]:
from collections import defaultdict
from typing import List

def visit(node, child_nodes, ancestors):
    if node not in child_nodes: return True
    if node in ancestors: return False
    ancestors.add(node)
    valid = True
    for child in child_nodes[node]:
        valid &= visit(child, child_nodes, ancestors)
    ancestors.remove(node)
    del child_nodes[node]
    return valid 

class Solution:
    def canFinish(self, num_courses: int, prerequisites: List[List[int]]) -> bool:
        starting_nodes = set(range(num_courses))
        child_nodes = defaultdict(set)
        for course, prereq in prerequisites:
            child_nodes[prereq].add(course)
            # A course with a prerequisite cannot be a starting node
            starting_nodes.discard(course)
        
        while starting_nodes:
            ancestors = set()
            current = starting_nodes.pop()
            if not visit(current, child_nodes, ancestors):
                return False
        return not (starting_nodes or child_nodes)
        


s = Solution()
s.canFinish(8, [[1,0],[2,6],[1,7],[5,1],[6,4],[7,0],[0,5]])
s.canFinish(3, [[1,0],[1,2],[0,1]])

False

## Example: [Course Schedule II](https://leetcode.com/problems/course-schedule-ii/)

This problem is a more typical topological sort problem; we're actually expected to produce the ordering. We can do the same operation with a minor change to produce the actual order.


In [4]:
from collections import defaultdict
from typing import List

class CycleExistsError(Exception):
    pass

def visit(course, next_courses, visited, ancestors, top_order):
    if course in visited: return
    if course in ancestors: raise CycleExistsError
    
    ancestors.add(course)
    for child in next_courses[course]:
        visit(child, next_courses, visited, ancestors, top_order)
    ancestors.remove(course)
    visited.add(course)
    top_order.append(course)

class Solution:
    def findOrder(self, num_courses: int, prerequisites: List[List[int]]) -> List[int]:
        starting_courses = set(range(num_courses))
        next_courses = defaultdict(set)
        for course, prereq in prerequisites:
            next_courses[prereq].add(course)
            # A course with a prerequisite cannot be a starting course
            starting_courses.discard(course)
        if not starting_courses:
            return []
        
        top_order = []
        visited = set()
        while starting_courses:
            ancestors = set()
            curr_course = starting_courses.pop()
            try:
                visit(curr_course, next_courses, visited, ancestors, top_order)
            except CycleExistsError:
                return []
        return top_order[::-1] if len(visited) == num_courses else []

s = Solution()
cases = [
    (2, [[1,0]]),
    (2, [[1,0], [0,1]]),
    (3, [[1,0], [0,1]]),
    (4, [[1,0], [2,1]])

]
for n_courses, prereqs in cases:
    print(s.findOrder(n_courses, prereqs))

[0, 1]
[]
[]
[3, 0, 1, 2]


## Example: [Graph Valid Tree](https://leetcode.com/problems/graph-valid-tree/)

## Example: [Alien Dictionary](https://leetcode.com/problems/alien-dictionary/)

#### Topological sorting is the easy part
If we can construct a DAG describing observed letter orders in the wordlist, we can use a topological sort to get a global letter ordering (assuming the DAG is valid). The hard part for this problem is creating the DAG based on the given wordlist and finding the roots to start the tsort from. In our DAG, edge `a->b` should be added if we find `a` precedes `b` from the wordlist; we use an adjacency list to represent this.

#### What does the wordlist tell us about the lex order?
- Word order tells us the order of their first letters (unless the letters match); seeing `["add", "bob", "car"]` implies `a < b < c`.
- For any group of words with the same first k letters, their ordering tells us the order of their (k+1)th letters; seeing `["aaa", "aab", "aac"]` implies `a < b < c`.
- To correctly execute a tsort, we only need to know the immediate predecessor and successor of a letter; in seeing `["aaa", "aab", "aac"]`, we only need to store `a < b, b < c`. We don't need to store that `a < c`.

#### How to form the DAG 
We can go word-by-word and examine the ordering of the letters. For two adjacent words `word1` and `word2`, 
if `word1[0] != word2[0]`, we can add an edge `word1[0] -> word2[0]`. If `word1[0] == word2[0]`, we must then compare `word1[1]` with `word2[1]`, and so on, until we find a mismatch or exhaust a word. If we exhaust a word without finding a mismatch, we don't store an edge; e.g. `[abc, abcd]` is in the correct order, but we learn nothing about `d`. Note that the wordlist is invalid if we exhaust the second word while the first still has letters left; `[abcd, abc]` is invalid because a word can never precede its own proper prefix.

This technique can be used to solve [Verifying An Alien Dictionary](https://leetcode.com/problems/verifying-an-alien-dictionary/). 
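
As a rough illustration of that claim, the same first-mismatch comparison can check a wordlist against a known letter order. This is a sketch under the assumption that `order` lists every letter used in `words` exactly once; `is_alien_sorted` is an illustrative name.

```python
from typing import List

def is_alien_sorted(words: List[str], order: str) -> bool:
    """Check each adjacent pair of words using the first-mismatch rule."""
    rank = {letter: i for i, letter in enumerate(order)}
    for prev, word in zip(words, words[1:]):
        for a, b in zip(prev, word):
            if a != b:
                # The first mismatch alone decides the order of this pair
                if rank[a] > rank[b]:
                    return False
                break
        else:
            # No mismatch: valid only if prev is not longer than word
            if len(prev) > len(word):
                return False
    return True

print(is_alien_sorted(["hello", "leetcode"], "hlabcdefgijkmnopqrstuvwxyz"))  # True
print(is_alien_sorted(["apple", "app"], "abcdefghijklmnopqrstuvwxyz"))       # False
```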

#### How to get the roots
Once we have a DAG, we need to start from a node that has no parents; this is a letter that isn't found in any successor list. First, we should make sure every letter in every word is present in the DAG, even if it has no children or parents. Then, we can take the set difference between the keys of the adjacency list (i.e. every node in the DAG) and the union of all the successor sets in the DAG. This leaves us with only the letters that have no parents, i.e. roots.

#### Edge cases
- Single word in wordlist. We aren't told what to do here. With no adjacent pair to compare there are no ordering constraints, so the code below simply returns the word's distinct letters in an arbitrary order.
- Letters with no obvious order. If we get `["abc", "qmk"]`, we only know that `a` precedes `q`. I'm going to assume the remaining four letters can come in any order. 

#### A note about constraints
The constraints of the problem are small; we have at most 100 words, each at most 100 characters long (so at most 10,000 characters in total), and there are only 26 possible characters. Any step bounded by the number of possible letters is effectively `O(1)`, so we could even use a quadratic algorithm on the set of letters. Linear approaches that look at every given character once will also easily finish in under a second; approaches quadratic in the total character count (on the order of 10^8 steps) are much riskier.



In [5]:
from typing import List

def create_letter_dag(words):
    """
    Create DAG by doing letter-by-letter
    comparisons on each word as described above.
    """
    # Every letter in given input should be added to the DAG
    all_letters = set()
    for word in words:
        all_letters |= set(word)
    successors = {letter: set() for letter in all_letters}

    # Word-by-word comparison to establish ordering
    prev = words[0]
    for word in words[1:]:
        i = 0
        shorter_len = min(len(prev), len(word)) 
        while i < shorter_len and prev[i] == word[i]:
            i+=1
        # Invalid: matched up until end of one, but shorter comes after
        if i >= shorter_len and len(prev) > len(word):
            return {}
        # Record an edge only if a mismatch was found before either word ran out
        if i < shorter_len:
            successors[prev[i]].add(word[i])
        prev = word
    return successors
        
def visit(letter, dag, visited, ancestors, t_order):
    """
    Standard DFS for topological sort 
    """
    if letter in visited: return True
    if letter in ancestors or not dag: return False
    ancestors.add(letter)
    for child in dag[letter]:
        if not visit(child, dag, visited, ancestors, t_order):
            return False
    ancestors.remove(letter)
    visited.add(letter)
    t_order.append(letter)
    return True

class Solution:
    def alienOrder(self, words: List[str]) -> str:
        """
        Create DAG from letters, then launch DFS and
        only return top ordering if we visit every 
        node without hitting a cycle.
        """
        # Instead of building the union of all successor
        # sets and subtracting it from the set of all
        # letters, we subtract each successor set from
        # the set of all letters one at a time.
        dag = create_letter_dag(words)
        roots = set(dag.keys())
        for successors in dag.values():
            roots -= successors
            
        visited = set()
        t_order = []
        while roots:
            letter = roots.pop()
            if not visit(letter, dag, visited, set(), t_order):
                return ""
        return ''.join(t_order[::-1]) if len(t_order) == len(dag) else ""
            
            
s = Solution()
cases = [
    (["wrt","wrf","er","ett","rftt"], "wertf"),
    (["z","x"], "zx"),
    (["z","x","z"], ""),
    (["abc","ab"], ""),
    (["z","z"], "z"),
    (["dvpzu","bq","lwp","akiljwjdu","vnkauhh","ogjgdsfk","tnkmxnj","uvwa","zfe","dvgghw","yeyruhev","xymbbvo","m","n"], "")
]
for word_list, expected in cases:
    actual = s.alienOrder(word_list)
    assert actual == expected, f"{word_list}: {expected} != {actual}"