<a href="https://colab.research.google.com/github/mahbubcsedu/interviewcoding/blob/main/Toposort.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Topological sorting
## How to figure out that a problem can be solved by topological sorting

To determine whether a problem involves a Directed Acyclic Graph (DAG), you can look for certain clues and characteristics in the problem statement. Here are some tips:

1. **Direction and Cycles:**
   - A DAG has directed edges and no cycles. If the problem specifies directed relationships and emphasizes that there are no cycles, it likely involves a DAG.

2. **Topological Sorting:**
   - If the problem requires you to order or schedule tasks in a sequence such that each task follows certain prerequisites, this is a strong indicator of a DAG. Topological sorting is a common algorithm used on DAGs.

3. **Dependencies:**
   - Look for problems that mention dependencies between elements. If elements or tasks depend on one another and must be completed in a specific order, a DAG might be involved.

4. **Graph Terminology:**
   - Pay attention to the use of terms like "directed," "acyclic," "topological order," or "dependencies." These terms are often associated with DAGs.

5. **Common Scenarios:**
   - Some common scenarios where DAGs are used include:
     - Task scheduling with dependencies (e.g., course prerequisites, project planning).
     - Representing hierarchical structures without cycles (e.g., version control, build systems).
     - Finding the longest or shortest paths in directed graphs without cycles.
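
The last scenario above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `longest_path_dag` and the `(u, v, weight)` edge format are assumptions made for the example, and weights are assumed to be non-negative.

```python
from collections import defaultdict, deque

def longest_path_dag(n, edges):
    """Length of the longest weighted path in a DAG with nodes 0..n-1.

    edges is a list of (u, v, weight) triples (an assumed input format).
    Raises ValueError if the graph contains a cycle.
    """
    graph = defaultdict(list)
    indegree = [0] * n
    for u, v, w in edges:
        graph[u].append((v, w))
        indegree[v] += 1

    # Kahn's algorithm: process a node only after all of its predecessors
    queue = deque(i for i in range(n) if indegree[i] == 0)
    best = [0] * n  # best[v] = length of the longest path ending at v
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v, w in graph[u]:
            best[v] = max(best[v], best[u] + w)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    if processed != n:
        raise ValueError("graph has a cycle")
    return max(best, default=0)

print(longest_path_dag(4, [(0, 1, 2), (1, 2, 3), (0, 2, 1), (2, 3, 4)]))  # 9
```

Processing nodes in topological order guarantees that `best[u]` is final before any edge leaving `u` is relaxed, which is why a single pass is enough.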

### Example Indicators

- **Task Scheduling:** If the problem is about scheduling tasks where certain tasks must be completed before others, it's likely involving a DAG.
- **Prerequisites:** If the problem involves courses with prerequisites, it implies a DAG structure to maintain the order of completion.
- **Order of Operations:** If the problem requires you to determine an order of operations or dependencies, a DAG might be at play.

By keeping an eye out for these clues and characteristics, you can identify when a problem involves a Directed Acyclic Graph.
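
Once a problem has been recognized as a dependency-ordering problem, Python's standard library (3.9 and later) can already produce an order through `graphlib.TopologicalSorter`. The sketch below assumes the input is a mapping from each task to the set of tasks it depends on; the helper name `schedule` is made up for this example.

```python
from graphlib import TopologicalSorter, CycleError

def schedule(prerequisites):
    """Return one valid order for a mapping task -> set of tasks it depends on,
    or None if the dependencies contain a cycle."""
    try:
        # static_order() yields every node after all of its predecessors
        return list(TopologicalSorter(prerequisites).static_order())
    except CycleError:
        return None

print(schedule({"b": {"a"}, "c": {"b"}}))  # ['a', 'b', 'c']
print(schedule({"a": {"b"}, "b": {"a"}}))  # None
```

The hand-written solutions in the rest of this notebook do the same work explicitly, which is what interviews usually expect.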

[LC-207 Course Schedule](https://leetcode.com/problems/course-schedule)

[LC-210 Course Schedule II](https://leetcode.com/problems/course-schedule-ii/)

In [None]:
# Can all courses be finished?

# One method is to look for a cycle in the graph using DFS.
# That needs a graph where the prerequisites are the neighbors of each course;
# a hash map `pre` can keep track of them.

# As a topological-sort problem, though, it is best solved with Kahn's algorithm (BFS on in-degrees).

import collections

def canFinish(numCourse, prerequisites):
  indegree = {c: 0 for c in range(numCourse)}  # in-degree of every course starts at 0

  g = collections.defaultdict(list)
  q = collections.deque()

  for crs, pre in prerequisites:
    g[pre].append(crs)
    indegree[crs] +=1

  # The graph is ready; enqueue every course with in-degree 0
  for i in range(numCourse):
    if indegree[i] ==0:
      q.append(i)

  order = []
  while q:
    node = q.popleft()
    order.append(node)

    for nei in g[node]:
      indegree[nei] -=1
      if indegree[nei] ==0:
        q.append(nei)
  return len(order) == numCourse
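
The DFS alternative mentioned in the comments above can be sketched like this. It is only one possible version: the name `can_finish_dfs` and the three-state marking are choices made for this example, and the recursion may hit Python's recursion limit on very deep prerequisite chains.

```python
from collections import defaultdict

def can_finish_dfs(num_courses, prerequisites):
    """Return True if the [course, prerequisite] pairs contain no cycle."""
    pre = defaultdict(list)  # course -> list of its prerequisites
    for crs, p in prerequisites:
        pre[crs].append(p)

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = [UNVISITED] * num_courses

    def has_cycle(course):
        if state[course] == IN_PROGRESS:  # course is already on the current DFS path
            return True
        if state[course] == DONE:         # fully explored earlier, known to be safe
            return False
        state[course] = IN_PROGRESS
        for p in pre[course]:
            if has_cycle(p):
                return True
        state[course] = DONE
        return False

    return not any(has_cycle(c) for c in range(num_courses))

print(can_finish_dfs(2, [[1, 0]]))          # True
print(can_finish_dfs(2, [[1, 0], [0, 1]]))  # False
```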

In [None]:
import collections
from typing import List

class Solution:
    def findOrder(self, numCourses: int, prerequisites: List[List[int]]) -> List[int]:
        indegree = {c: 0 for c in range(numCourses)}

        g = collections.defaultdict(list)

        for nxt, pre in prerequisites:
            g[pre].append(nxt)
            indegree[nxt] +=1

        order=[]
        q = collections.deque()

        for i in range(numCourses):
            if indegree[i]==0:
                q.append(i)

        while q:
            node = q.popleft()
            order.append(node)

            for neighbor in g[node]:
                indegree[neighbor] -=1
                if indegree[neighbor]==0:
                    q.append(neighbor)

        return order if len(order)==numCourses else []

In [None]:
# Minimum height trees: the roots are the centers of the tree, found by peeling leaves layer by layer.
# The graph is undirected, so each edge is stored in both directions and counts toward
# the degree of both endpoints; removing a leaf must therefore update both sides.
# Each round drains the queue into `leaves` and drops each leaf's own degree,
# then walks g[leaf] to decrement the neighbor's degree, so the edge is removed from both endpoints.
# A neighbor whose degree falls to 1 becomes a leaf for the next round.
# The leaves of the last round are the answer: one node if the longest path has an even
# number of edges, two nodes if it has an odd number.
import collections
from typing import List

class Solution:
    def findMinHeightTrees(self, n: int, edges: List[List[int]]) -> List[int]:

        if n <=2:
            return [i for i in range(n)]

        indegre = {c:0 for c in range(n)}
        g=collections.defaultdict(list)

        q=collections.deque()

        for u, v in edges:
            g[u].append(v)
            g[v].append(u)
            indegre[u] +=1
            indegre[v] +=1

        for i in range(n):
            if indegre[i]==1:
                q.append(i)
        # remaining_nodes = n
        leaves=[]
        while q:
            # node = q.popleft()
            leaves=[]
            while q:
                leaves.append(q.popleft())

            for leaf in leaves:
                indegre[leaf] -=1

                for neighbor in g[leaf]:
                    indegre[neighbor] -=1

                    if indegre[neighbor]==1:
                        q.append(neighbor)

        return leaves


In [None]:
# Alien dictionary
import collections
from typing import List

class Solution:
    def alienOrder(self, words: List[str]) -> str:
        degree = {key: 0 for key in set(''.join(words))}
        graph = collections.defaultdict(list)

        # build graph for topological sort
        for w1, w2 in zip(words, words[1:]):

            # Python's for-else: the else branch runs only when the loop ends without a break,
            # i.e. every compared pair matched, so one word is a prefix of the other
            for c1, c2 in zip(w1,w2):
                if c1 != c2:
                    degree[c2] += 1
                    graph[c1].append(c2)
                    break
            else:
                if len(w1) > len(w2): return ""  # e.g. "abc" before "ab" is invalid: a word cannot precede its own prefix

        lst_no_dep = [ x for x in degree.keys() if degree[x] == 0]

        # Topological sort
        stk = []
        while lst_no_dep:
            ch = lst_no_dep.pop()
            stk.append(ch)
            for ch_greater in graph[ch]:
                degree[ch_greater] -= 1
                if degree[ch_greater] == 0:
                    lst_no_dep.append(ch_greater)

        return ''.join(stk) if len(stk) == len(degree) else ''
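
The edge-extraction step is the subtle part of the solution above, so here it is in isolation. This is a small sketch with a made-up helper name, `ordering_edge`; it raises an error for the invalid prefix case instead of returning an empty string.

```python
def ordering_edge(w1, w2):
    """For two adjacent words in the alien dictionary, return the (c1, c2) pair
    that proves c1 comes before c2, or None if the words give no information."""
    for c1, c2 in zip(w1, w2):
        if c1 != c2:
            edge = (c1, c2)  # only the first difference carries ordering information
            break
    else:
        # No break: one word is a prefix of the other
        if len(w1) > len(w2):
            raise ValueError(f"invalid order: {w1!r} listed before its prefix {w2!r}")
        edge = None
    return edge

print(ordering_edge("wrt", "wrf"))  # ('t', 'f')
print(ordering_edge("ab", "abc"))   # None
```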


The **Sequence Reconstruction** problem on LeetCode (Problem 444) asks if there is a unique way to reconstruct an original sequence from a list of subsequences. This can be solved using **topological sorting**.

### Problem Summary

Given:
- An `org` array representing the target sequence.
- A list of subsequences, `seqs`, that must allow us to reconstruct the exact order in `org`.

The task is to determine whether there exists a **unique sequence** that can be reconstructed from `seqs` and is identical to `org`.

### Approach Using Topological Sorting

1. **Graph Construction**:
   - Construct a **directed graph** from the subsequences in `seqs`, where each element in a subsequence implies an ordering between consecutive elements.
   - Create an **in-degree** array to count incoming edges for each node.

2. **Topological Sort with Unique Order Check**:
   - Perform a topological sort on the graph:
     - At each step, if there is more than one node with `in-degree` zero, then there is no unique way to reconstruct the sequence, and the result should be `False`.
     - If we can uniquely determine the next element each time, we proceed.
   - If the topological order of the sorted nodes matches `org`, then we have a unique reconstruction that matches the target sequence.

3. **Edge Cases**:
   - If `seqs` is empty or has no valid information about all elements in `org`, return `False`.
   - If any element in `org` is missing from `seqs`, reconstruction is impossible.

### Implementation

Here's how to implement this approach in Python:

```python
from collections import defaultdict, deque

def sequenceReconstruction(org, seqs):
    # Edge case: seqs is empty or contains only empty lists
    if not seqs or not any(seqs):
        return False

    # Every element of org must appear in seqs, and seqs must not contain
    # anything outside org (otherwise the in_degree lookup below would fail)
    nodes_in_seqs = {num for seq in seqs for num in seq}
    if set(org) != nodes_in_seqs:
        return False

    # Step 1: Build the graph and in-degree map
    graph = defaultdict(list)
    in_degree = {num: 0 for num in org}

    for seq in seqs:
        for i in range(len(seq) - 1):
            u, v = seq[i], seq[i + 1]
            if v not in graph[u]:  # Avoid duplicate edges
                graph[u].append(v)
                in_degree[v] += 1

    # Step 2: Perform topological sort
    queue = deque([node for node in org if in_degree[node] == 0])
    reconstructed_sequence = []

    while queue:
        # If there's more than one node with zero in-degree, not a unique sequence
        if len(queue) > 1:
            return False

        # Pop the only node with zero in-degree
        current = queue.popleft()
        reconstructed_sequence.append(current)

        # Reduce in-degree of neighbors
        for neighbor in graph[current]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    # Step 3: Check if reconstructed sequence matches the original and if we used all nodes
    return reconstructed_sequence == org

# Example usage
org = [1, 2, 3]
seqs = [[1, 2], [1, 3], [2, 3]]
print(sequenceReconstruction(org, seqs))  # Output: True
```

### Explanation of the Code

1. **Graph Construction**:
   - We create `graph`, a dictionary of lists, where each key is a node and each value is the list of nodes it points to (its successors).
   - `in_degree` keeps track of how many edges are pointing to each node.

2. **Topological Sort Using Queue**:
   - We initialize `queue` with nodes that have zero in-degree (i.e., no dependencies).
   - For each node:
     - If there’s more than one node in `queue`, return `False`, as this means there is more than one possible order.
     - Otherwise, we add the node to the `reconstructed_sequence` and decrease the in-degrees of its neighbors.
     - If any neighbor’s in-degree becomes zero, we add it to the queue.

3. **Final Check**:
   - After constructing the sequence, we check if `reconstructed_sequence` matches `org`. If it does, the unique order requirement is met, and we return `True`.
   - If the two sequences don’t match, return `False`.

### Complexity Analysis
- **Time Complexity**: \(O(V + E)\), where \(V\) is the number of nodes in `org` and \(E\) is the number of edges in `seqs`.
- **Space Complexity**: \(O(V + E)\) for storing the graph and in-degree array.

This approach ensures we can check for a unique topological sort that exactly matches `org`, providing an efficient solution to the problem.
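
As a cross-check, the same answer can be obtained without running a topological sort. The sketch below is an alternative formulation, not the method described above: it assumes `org` is non-empty and has no repeated elements, and it relies on the fact that a topological order is unique exactly when every pair of consecutive elements in it is joined by an edge. The function name is made up for this example.

```python
def sequence_reconstruction_by_pairs(org, seqs):
    """Return True if org is the only sequence consistent with seqs."""
    pos = {num: i for i, num in enumerate(org)}
    seen = set()     # elements of org that appear somewhere in seqs
    covered = set()  # indices i where (org[i], org[i+1]) is adjacent in some seq

    for seq in seqs:
        for i, num in enumerate(seq):
            if num not in pos:            # element outside org
                return False
            seen.add(num)
            if i > 0:
                prev = seq[i - 1]
                if pos[prev] >= pos[num]:  # edge contradicts the order in org
                    return False
                if pos[prev] + 1 == pos[num]:
                    covered.add(pos[prev])

    return len(seen) == len(org) and len(covered) == len(org) - 1

print(sequence_reconstruction_by_pairs([1, 2, 3], [[1, 2], [1, 3], [2, 3]]))  # True
print(sequence_reconstruction_by_pairs([1, 2, 3], [[1, 2], [1, 3]]))          # False
```

In the second call nothing fixes the relative order of 2 and 3, so both `[1, 2, 3]` and `[1, 3, 2]` are possible and the result is `False`.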

## Amazon question from discussion
a) You have a package repository with build dependencies between packages: for example, package A has to be built before package B. Given the dependencies between the packages and a package name x, find the build order for x.

Example (each package maps to the set of packages it depends on):
- A → {B, C}
- B → {E}
- C → {D, E, F}
- D → {}
- F → {}
- G → {C}

For package A, one build order is E B F D C A (it may not be unique).

Given a function `Set getDependencies(Package packageName)` that returns the set of dependencies for a given package name, write a method `List getBuildOrder(Package packageName)` that returns the build order.

b) How would you handle cyclic dependencies? (Algorithm only.)

This problem can be solved with topological sorting, provided the dependencies form a Directed Acyclic Graph (DAG). Topological sorting yields a valid build order for the packages. One way to implement it:

### Steps for Topological Sort:
1. **Create a graph representation of the dependencies.**
2. **Compute in-degrees for each node** (number of incoming edges).
3. **Initialize a queue with nodes having an in-degree of 0** (no dependencies).
4. **Process the queue:** Remove nodes from the queue, add them to the build order, and decrement the in-degrees of their dependent nodes. If any dependent node's in-degree becomes 0, add it to the queue.
5. **Check for cycles:** If the build order contains all nodes, it means there are no cycles. Otherwise, a cycle exists.

### Implementation

Here's a Python implementation of the topological sort approach:

```python
from collections import defaultdict, deque
from typing import List, Dict, Set

class PackageRepository:
    def __init__(self):
        self.dependencies = defaultdict(set)
    
    def addPackage(self, package, deps):
        self.dependencies[package] = deps
    
    def getDependencies(self, package):
        return self.dependencies[package]

class Solution:
    def getBuildOrder(self, packageName, repo):
        # Collect packageName and everything it transitively depends on, so that
        # unrelated packages (such as G in the example) are left out and packages
        # that appear only as dependencies (such as E) are still included.
        deps_of = {}
        to_visit = [packageName]
        while to_visit:
            package = to_visit.pop()
            if package in deps_of:
                continue
            deps_of[package] = set(repo.getDependencies(package))
            to_visit.extend(deps_of[package])

        # Build the graph (dependency -> dependents) and compute in-degrees
        graph = defaultdict(set)
        in_degrees = {package: len(deps) for package, deps in deps_of.items()}
        for package, deps in deps_of.items():
            for dep in deps:
                graph[dep].add(package)

        # Initialize the queue with nodes having in-degree 0
        queue = deque(node for node in deps_of if in_degrees[node] == 0)
        build_order = []

        while queue:
            node = queue.popleft()
            build_order.append(node)
            for neighbor in graph[node]:
                in_degrees[neighbor] -= 1
                if in_degrees[neighbor] == 0:
                    queue.append(neighbor)

        # If build_order doesn't contain all required packages, there is a cycle
        if len(build_order) != len(deps_of):
            raise ValueError("Cyclic dependency detected")

        # The target package is always last, since everything else is one of its dependencies
        return build_order

# Example usage:
repo = PackageRepository()
repo.addPackage('A', {'B', 'C'})
repo.addPackage('B', {'E'})
repo.addPackage('C', {'D', 'E', 'F'})
repo.addPackage('D', set())
repo.addPackage('F', set())
repo.addPackage('G', {'C'})

sol = Solution()
print(sol.getBuildOrder('A', repo))  # One possible output: ['E', 'D', 'F', 'B', 'C', 'A'] (order may vary)
```

### Explanation:
1. **PackageRepository Class:** Manages the dependencies using a dictionary of sets.
2. **Solution Class:** Implements the `getBuildOrder` method, which performs topological sorting using Kahn's algorithm.
3. **Graph and In-Degrees:**
   - Builds the graph representation and computes in-degrees for each package.
4. **Topological Sort:**
   - Initializes the queue with packages having in-degree 0.
   - Processes the queue to determine the build order.
   - Checks for cycles by comparing the length of the build order with the number of packages.
5. **Result:**
   - Returns the build order for the given packageName.

This approach finds a valid build order if no cycles exist. If a cycle is detected, an error is raised.

### Solution

#### a) Finding the Build Order

To determine the build order for a given package and its dependencies, we can use Depth-First Search (DFS) to perform a topological sort. This ensures that we visit all dependencies of a package before visiting the package itself. We'll also use a stack to store the build order.

Here's the Python implementation:

```python
from collections import defaultdict, deque

class PackageRepository:
    def __init__(self):
        self.dependencies = defaultdict(set)
    
    def addPackage(self, package, deps):
        self.dependencies[package] = deps
    
    def getDependencies(self, package):
        return self.dependencies[package]

class Solution:
    def getBuildOrder(self, packageName, repo):
        visited = set()
        stack = []
        cyclic = set()

        def dfs(package):
            if package in cyclic:
                raise ValueError("Cyclic dependency detected")
            if package in visited:
                return
            cyclic.add(package)
            for dep in repo.getDependencies(package):
                dfs(dep)
            cyclic.remove(package)
            visited.add(package)
            stack.append(package)
        
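        # Let the ValueError propagate so callers can tell a cycle from a valid order
        dfs(packageName)

        # Post-order already lists dependencies before their dependents,
        # so the stack is the build order as-is (no reversal needed)
        return stack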

# Example usage:
repo = PackageRepository()
repo.addPackage('A', {'B', 'C'})
repo.addPackage('B', {'E'})
repo.addPackage('C', {'D', 'E', 'F'})
repo.addPackage('D', set())
repo.addPackage('F', set())
repo.addPackage('G', {'C'})

sol = Solution()
print(sol.getBuildOrder('A', repo))  # Output: ['E', 'B', 'F', 'D', 'C', 'A'] (order may vary)
```

### Explanation:
1. **PackageRepository Class:** Manages the dependencies using a dictionary of sets.
2. **Solution Class:** Implements the `getBuildOrder` method, which performs a DFS to determine the build order.
3. **DFS Function:** Visits each package and its dependencies, adds them to the visited set and stack in the correct order, and detects cyclic dependencies.
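
For deep dependency chains the recursive DFS can run into Python's recursion limit. The sketch below shows the same post-order idea with an explicit stack. It is an illustration under assumptions: `get_dependencies` is a hypothetical callable standing in for the `getDependencies` function from the question, and dependencies are visited in sorted order only to make the output deterministic.

```python
def get_build_order_iterative(package_name, get_dependencies):
    """Post-order DFS with an explicit stack; raises ValueError on a cycle."""
    order = []                  # finished packages, dependencies first
    done = set()                # packages already placed in order
    on_path = {package_name}    # packages on the current DFS path
    stack = [(package_name, iter(sorted(get_dependencies(package_name))))]

    while stack:
        package, deps = stack[-1]
        for dep in deps:        # the iterator resumes where it stopped last time
            if dep in on_path:
                raise ValueError("Cyclic dependency detected")
            if dep not in done:
                on_path.add(dep)
                stack.append((dep, iter(sorted(get_dependencies(dep)))))
                break
        else:
            # All dependencies are finished, so this package can be built now
            stack.pop()
            on_path.remove(package)
            done.add(package)
            order.append(package)
    return order

deps = {'A': {'B', 'C'}, 'B': {'E'}, 'C': {'D', 'E', 'F'}, 'G': {'C'}}
print(get_build_order_iterative('A', lambda p: deps.get(p, set())))
# ['E', 'B', 'D', 'F', 'C', 'A']
```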

#### b) Handling Cyclic Dependencies

To handle cyclic dependencies, the algorithm needs to detect cycles during the DFS traversal. This can be achieved by maintaining a set of packages currently in the recursion stack (i.e., the `cyclic` set in the code). If a package is revisited while it's still in the recursion stack, it indicates a cyclic dependency.

### Algorithm:
1. **DFS Traversal:** Perform DFS on each package.
2. **Track Recursion Stack:** Maintain a set of packages in the current DFS path (recursion stack).
3. **Cycle Detection:** If a package is revisited while still in the recursion stack, raise a cyclic dependency error.
4. **Return Order:** If no cycle is detected, return the topological order.
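
When a cycle exists, it is often more useful to report which packages form it than to fail with a generic message. The following sketch extends the recursion-stack idea from the steps above; `find_cycle` and the `get_dependencies` callable are hypothetical names introduced for this example.

```python
def find_cycle(start, get_dependencies):
    """Return one dependency cycle reachable from start, e.g. ['A', 'B', 'A'],
    or None if no cycle is reachable."""
    path = []        # packages on the current DFS path, in visiting order
    on_path = set()  # same packages as a set, for fast membership tests
    done = set()     # packages fully explored without finding a cycle

    def dfs(package):
        if package in on_path:
            # The cycle is the part of the path from the first visit to now
            return path[path.index(package):] + [package]
        if package in done:
            return None
        on_path.add(package)
        path.append(package)
        for dep in get_dependencies(package):
            cycle = dfs(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(package)
        done.add(package)
        return None

    return dfs(start)

deps = {'A': {'B'}, 'B': {'C'}, 'C': {'A'}}
print(find_cycle('A', lambda p: deps.get(p, set())))  # ['A', 'B', 'C', 'A']
```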

This approach ensures that cyclic dependencies are detected, and the build order is only provided if the dependencies form a valid DAG (Directed Acyclic Graph).