# CSPB 3104 Assignment 8: Problem Set
## Instructions

> This assignment is to be completed and uploaded to Moodle as a Python 3 notebook.

> Submission deadlines are posted on Moodle.

> The questions below will ask you either to write code or to write answers in Markdown.

> The Markdown syntax guide is [here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).

> Using Markdown, you can typeset formulae with LaTeX.

> This way you can write readable answers with formulae such as:

>> The algorithm runs in time $\Theta\left(n^{2.1\log_2(\log_2( n \log^*(n)))}\right)$, 
wherein $\log^*(n)$ is the iterated logarithm.

__Double click anywhere on this box to find out how your instructor typeset it. Press Shift+Enter to go back.__


----

## Question 1: Shortest Cycle Involving a Given Node

You are given a directed graph $G = (V, E)$ in adjacency list representation and a vertex (node) $u$ of the graph.
Write algorithms for the following tasks:

__1(A)__ Write an algorithm that decides (true/false) whether the vertex $u$ belongs to a cycle.

What is the complexity of your algorithm in terms of the number of vertices $|V|$ and the number of edges $|E|$?

Note: Throughout this assignment you may describe your algorithms in words and freely use algorithms that you have already learned in class. A brief description will do.


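```python
def isNodeInCycle(graph, u):
    """Return True if vertex u lies on a directed cycle.

    u is on a cycle exactly when u is reachable from itself, i.e. some vertex
    reachable from u has an edge back to u. A back edge to any *other* vertex
    only shows that u reaches some cycle, not that u is on it.
    """
    visited = {u}
    stack = [u]  # stack-based DFS: avoids Python's recursion limit on long paths
    while stack:
        v = stack.pop()
        for neighbor in graph[v]:
            if neighbor == u:
                return True  # path u ~> v plus edge v -> u is a cycle through u
            if neighbor not in visited:
                visited.add(neighbor)
                stack.append(neighbor)
    return False

# test
graph_example = {
    0: [1],
    1: [2],
    2: [3, 4],
    3: [0],
    4: []
}
# Check if node 1 is in a cycle
is_node_in_cycle = isNodeInCycle(graph_example, 1)
is_node_in_cycle
```

```
True
```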

### Initialization: 
Mark every vertex as unvisited. The key observation is that $u$ lies on a cycle exactly when some path leads from $u$ back to $u$.

### Depth-First Search (DFS): 
Perform a DFS starting from the given vertex $u$. For each vertex $v$ reached during the DFS, examine each neighbor $w$ of $v$:
- If $w = u$, the path from $u$ to $v$ plus the edge $(v, u)$ is a cycle through $u$. Return true.
- Otherwise, if $w$ is not yet visited, mark it as visited and continue the DFS from $w$.

### Cycle Detection: 
If the DFS finishes without finding an edge back to $u$, then $u$ is not reachable from itself and does not belong to any cycle. A back edge to some *other* vertex is not sufficient: it only shows that a cycle is reachable from $u$. For example, in the graph $0 \to 1 \to 2 \to 1$, vertex $0$ reaches the cycle $1 \to 2 \to 1$ but does not lie on it.

### Complexity Analysis

### Time Complexity: 
The algorithm visits each vertex at most once and examines each edge at most once, so the time complexity is $O(|V|+|E|)$, where $|V|$ is the number of vertices and $|E|$ is the number of edges in the graph.

### Space Complexity: 
The space complexity is $O(|V|)$ for the visited set and the DFS stack.
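A BFS version of the same reachability test is sketched below as an assumed alternative to DFS (the name `on_cycle_bfs` is hypothetical, and the graph is assumed to be a dict mapping every vertex to its list of out-neighbors). It is run on a graph where vertex 0 reaches a cycle without lying on it, which is exactly the case a plain back-edge check gets wrong.

```python
from collections import deque

def on_cycle_bfs(graph, u):
    """Return True iff u lies on a directed cycle, i.e. u is reachable from itself.

    BFS from u; the first edge that points back at u closes a cycle through u.
    """
    seen = {u}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w == u:
                return True  # some path u ~> v plus the edge v -> u
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

# Vertex 0 reaches the cycle 1 -> 2 -> 1 but is not on it.
g = {0: [1], 1: [2], 2: [1]}
print(on_cycle_bfs(g, 0))  # False
print(on_cycle_bfs(g, 1))  # True
```

BFS and DFS give the same $O(|V|+|E|)$ bound here; the choice only changes the order in which vertices are explored.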

__1(B)__ Write an algorithm that prints the smallest-length cycle involving the vertex $u$.

What is the complexity of your algorithm in terms of the number of vertices $|V|$ and the number of edges $|E|$?


```python
from collections import deque

def printSmallestCycleLength(graph, u):
    def bfs(v):
        """Return the BFS distance from v to u, or infinity if u is unreachable."""
        if v == u:
            return 0  # self-loop u -> u
        # Initialize distances as infinity, except for the starting vertex v
        distances = {vertex: float('inf') for vertex in graph}
        distances[v] = 0
        queue = deque([v])

        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                # BFS dequeues vertices in order of distance, so the first
                # edge into u found this way ends a shortest v ~> u path
                if neighbor == u:
                    return distances[current] + 1
                if distances[neighbor] == float('inf'):
                    distances[neighbor] = distances[current] + 1
                    queue.append(neighbor)
        return float('inf')

    min_cycle_length = float('inf')
    for neighbor in graph[u]:  # each cycle through u is u -> neighbor ~> u
        # +1 for the edge u -> neighbor itself
        cycle_length = bfs(neighbor) + 1
        min_cycle_length = min(min_cycle_length, cycle_length)

    if min_cycle_length == float('inf'):
        print(f"No cycle involves vertex {u}")
    else:
        print(min_cycle_length)  # length of the smallest cycle through u

# test
graph_example = {
    0: [1],
    1: [2],
    2: [0, 3],
    3: [1]
}

# Find and print the smallest length cycle involving node 1
printSmallestCycleLength(graph_example, 1)
```

```
3
```


### Initialization: 
For each neighbor $v$ of $u$, do the following steps. The goal is to find the shortest path from $v$ back to $u$, which, together with the edge $(u, v)$, forms a cycle through $u$.

### Breadth-First Search (BFS):
- For each neighbor $v$ of $u$, run a BFS from $v$ that tracks path lengths.
- Maintain a queue for the BFS frontier and a dictionary (or array) of distances from $v$.
- Initialize every distance to "infinity", except for $v$, which is set to 0.
- As you explore the graph from $v$, set the distance of each newly discovered vertex to its predecessor's distance plus one.
- If the BFS from $v$ reaches $u$, you have found a cycle. Its length is the distance from $v$ to $u$ plus one, the extra edge being $(u, v)$. A self-loop ($v = u$) gives a cycle of length 1.

### Finding the Shortest Cycle:
Repeat the BFS for every neighbor $v$ of $u$ and keep track of the minimum cycle length found. After the BFS from every neighbor of $u$ has finished, this minimum is the length of the shortest cycle involving $u$. Print it, or report that no cycle exists.

### Complexity Analysis

### Time Complexity: 
For each neighbor of $u$, a BFS is performed, which can explore all vertices and edges in the worst case. If $u$ has $d$ out-neighbors, the time complexity is $O(d \cdot (|V|+|E|))$. Since $d$ can be $\Theta(|V|)$ in the worst case, this is $O(|V| \cdot (|V|+|E|))$.

### Space Complexity: 
The space complexity is $O(|V|)$ for the BFS queue and the distance table, since only one BFS is active at a time.
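1(B) asks to print the cycle itself, and the per-neighbor BFS repeats work. As an alternative, here is a minimal sketch assuming the same dict-of-lists representation (the name `shortest_cycle_through` is hypothetical). It runs a single BFS from $u$ with parent pointers, finds a shortest cycle through $u$, and reconstructs its vertices in $O(|V|+|E|)$ time.

```python
from collections import deque

def shortest_cycle_through(graph, u):
    """Return the vertices of a shortest directed cycle through u, or None.

    One BFS from u gives shortest paths u ~> v. A cycle through u is such a
    path plus an edge v -> u, and BFS dequeues vertices in non-decreasing
    distance order, so the first edge back to u closes a shortest cycle.
    """
    parent = {u: None}  # BFS tree; also serves as the visited set
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w == u:
                path, node = [], v
                while node is not None:  # follow parent pointers back to u
                    path.append(node)
                    node = parent[node]
                return path[::-1]  # u, ..., v; the edge v -> u closes the cycle
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None

graph_example = {0: [1], 1: [2], 2: [0, 3], 3: [1]}
cycle = shortest_cycle_through(graph_example, 1)
if cycle is None:
    print("No cycle involves vertex 1")
else:
    print(f"length {len(cycle)}:", " -> ".join(map(str, cycle + [cycle[0]])))
# prints: length 3: 1 -> 2 -> 0 -> 1
```

The single BFS works because distances are measured from $u$ rather than from each neighbor. The cycle length is $\text{dist}(u, v) + 1$ for the first vertex $v$ found with an edge back to $u$.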

----

## Question 2: Tracing an Epidemic

An email with a malicious attachment has evaded the antivirus software of company X.
We know that the CEO's computer was infected during a business trip last month. Since then, investigators have 
been trying to determine whose mailboxes could be infected. For an employee's mailbox to be infected, he or she must have received
and read an email sent by an already infected employee. 

Starting from the time $0$ denoting when the CEO's mailbox was first infected, investigators have "metadata" for all
the emails from all employees in the form

$(P_i, P_j, t_k, t_l)$ meaning that employee $P_i$ sent an email at time $t_k$ to employee $P_j$, and $P_j$ opened the email at
time $t_l > t_k$.  We assume that $P_j$'s mailbox is infected instantaneously at time $t_l$ if $P_i$'s mailbox was infected before time $t_k$. 

You are given a collection of email records in the form given above, and you know that person $P_0$ is the CEO, who was infected at time $t = 0$.

We ask whether a given person of interest $P_j$ could have been infected by a given time of interest $t = T$.

__2(A)__ Write an algorithm that, given a person $P_j$ and time $T$, determines whether $P_j$'s mailbox was infected at or before time $T$. What is the worst-case complexity of your algorithm in terms of the number of persons $|P|$ and the number of emails sent $|E|$?

**Hint:** You first need to build a graph that represents the possible flow of the "infection" through emails. It is easier to build a complicated graph (in this case, one where each vertex represents more than just a person) and then run a simple graph algorithm (one of the vanilla algorithms we learned this week, i.e., BFS, DFS, or topological sort) than to build a simple graph and run a complicated ad hoc algorithm on it. (If your algorithm requires table lookups or passing along metadata specific to the problem at hand, it is probably too complicated.)

```python
from collections import defaultdict, deque

def can_infect(email_records, ceo, target, time_T):
    # Graph construction: vertices are (person, time) states
    graph = defaultdict(list)
    times = defaultdict(set)  # person -> times at which that person has a state
    times[ceo].add(0)         # the CEO is infected at time 0

    # Email edges: (sender, send_time) -> (receiver, open_time)
    for sender, receiver, send_time, open_time in email_records:
        if open_time > send_time:
            graph[(sender, send_time)].append((receiver, open_time))
            times[sender].add(send_time)
            times[receiver].add(open_time)

    # Time-progression edges: an infected mailbox stays infected, so link each
    # person's states in increasing time order. A receive and a send at the same
    # time share one vertex, i.e. such a tie is assumed to count as "before".
    for person, ts in times.items():
        ordered = sorted(ts)
        for earlier, later in zip(ordered, ordered[1:]):
            graph[(person, earlier)].append((person, later))

    # BFS from the CEO's initial state; mark states when they are enqueued
    start = (ceo, 0)
    infected = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for neighbor in graph[state]:
            if neighbor not in infected:
                infected.add(neighbor)
                queue.append(neighbor)

    return any(person == target and time <= time_T for person, time in infected)

# test
email_records = [
    # (Sender, Receiver, Send Time, Open Time)
    ('CEO', 'A', 1, 2),
    ('A', 'B', 3, 4),
    ('B', 'Target', 5, 6)
]
ceo = 'CEO'
target = 'Target'
time_T = 6

can_infect(email_records, ceo, target, time_T)
```

```
True
```

### Graph Construction:
Create vertices for each state, represented as a tuple $(P_i,t)$, where $P_i$ is an employee, and $t$ is the time at which their mailbox could potentially become infected. Include a vertex for the CEO at time $t=0$.

For each email record $(P_i,P_j,t_k,t_l)$, create an edge from $(P_i,t_k)$ to $(P_j,t_l)$ if $t_l>t_k$. This represents the potential transmission of the infection.

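Also add edges that represent the passage of time for each individual's mailbox: for each person, sort the times at which that person has a vertex and add an edge from each vertex to the next later one. These edges let an infection persist, so a mailbox infected at time $t_l$ can pass the infection on through emails its owner sends later.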
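The time-progression edges are the easy part to forget when building this graph. Below is a minimal sketch of just that step. It assumes the graph is a `defaultdict(list)` keyed by `(person, time)` tuples and that each person's vertex times have already been collected into a set; the helper name `add_time_edges` is hypothetical.

```python
from collections import defaultdict

def add_time_edges(graph, times):
    """Link each person's (person, time) states in increasing time order.

    `graph` maps a state to a list of successor states; `times` maps a person
    to the set of times at which that person has a state. Once a mailbox is
    infected it stays infected, so state (p, t) leads to p's next later state.
    """
    for person, ts in times.items():
        ordered = sorted(ts)
        for earlier, later in zip(ordered, ordered[1:]):
            graph[(person, earlier)].append((person, later))
    return graph

graph = defaultdict(list)
add_time_edges(graph, {'A': {3, 2}, 'B': {4}})
print(dict(graph))  # {('A', 2): [('A', 3)]}
```

Only consecutive states are linked. Reachability then covers every later state, and the number of added edges stays below the number of vertices. The sorting makes this step cost $O(|E|\log|E|)$ in total.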

### Graph Traversal (BFS or DFS):
Perform a Breadth-First Search (BFS) or Depth-First Search (DFS) starting from the vertex representing the CEO's infection at time $t=0$.

During the traversal, mark each visited vertex as infected.

### Infection Check:
After the traversal, check whether there exists a vertex $(P_j, T')$ such that $T' \le T$. If such a vertex is marked as infected, then $P_j$'s mailbox was infected at or before time $T$.

### Complexity Analysis

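### Time Complexity: 
Let $|P|$ be the number of persons and $|E|$ the number of emails sent. The state graph has at most $2|E| + 1$ vertices (a send state and a receive state per email, plus the CEO's initial state) and at most $3|E|$ edges (one per email, plus at most one time-progression edge per vertex).

Building the time-progression edges requires sorting each person's vertex times, which costs $O(|E|\log|E|)$ in total. BFS/DFS on the state graph costs $O(|V|+|E|) = O(|E|)$. The worst-case time complexity is therefore $O(|E|\log|E|)$; $|P|$ only adds an $O(|P|)$ term if per-person storage is allocated up front.

### Space Complexity: 
The space complexity is $O(|E|)$ for the graph, the visited set, and the BFS queue.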

__2(B)__ Write an algorithm that prints out each infected person in increasing order of the time at which they
were first infected.


```python
from collections import defaultdict, deque

def print_infected_in_order(email_records, ceo):
    # Build the (person, time) state graph: email edges plus time-progression edges
    graph = defaultdict(list)
    times = defaultdict(set)
    times[ceo].add(0)
    for sender, receiver, send_time, open_time in email_records:
        if open_time > send_time:
            graph[(sender, send_time)].append((receiver, open_time))
            times[sender].add(send_time)
            times[receiver].add(open_time)
    for person, ts in times.items():
        ordered = sorted(ts)
        for earlier, later in zip(ordered, ordered[1:]):
            graph[(person, earlier)].append((person, later))

    # BFS over states, recording each person's earliest infected state
    start = (ceo, 0)
    visited = {start}
    infection_time = {ceo: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for neighbor in graph[state]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
                person, time = neighbor
                if person not in infection_time or time < infection_time[person]:
                    infection_time[person] = time  # keep the earliest infection time

    for person, time in sorted(infection_time.items(), key=lambda x: x[1]):
        print(f"{person} was first infected at time {time}")

# Test
email_records = [
    ('CEO', 'A', 1, 2),
    ('A', 'B', 3, 4),
    ('B', 'C', 5, 6),
    ('C', 'D', 7, 8)
]
ceo = 'CEO'

print_infected_in_order(email_records, ceo)
```

```
CEO was first infected at time 0
A was first infected at time 2
B was first infected at time 4
C was first infected at time 6
D was first infected at time 8
```


### Graph Construction: 
Build the same $(P_i, t)$ state graph as in 2(A): one edge per email from $(P_i, t_k)$ to $(P_j, t_l)$, plus time-progression edges linking each person's vertices in increasing time order.

### Graph Traversal: 
Perform a BFS starting from the CEO's initial infection at time $t=0$, marking each visited state as infected.

### Tracking Infection Times: 
Maintain a dictionary that maps each person to the earliest time among that person's infected states, and update it whenever the BFS reaches a new state.

### Sorting and Printing: 
After the traversal, sort the infected individuals by their first infection time and print them in order. Building the graph costs $O(|E|\log|E|)$, the BFS costs $O(|E|)$, and the final sort adds $O(|P|\log|P|)$.
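For comparison, here is a sketch of a different technique, not the BFS on the state graph described above: a single sweep over the emails sorted by send time. It answers both 2(A) (look up the target's time and compare it with $T$) and 2(B). The cost is $O(|E|\log|E|)$ for the sort, plus $O(|P|\log|P|)$ to print in order. The function name `first_infection_times` is hypothetical, and the sketch assumes that a mailbox infected exactly at the send time counts as infected.

```python
def first_infection_times(email_records, ceo):
    """Return {person: earliest infection time}, sweeping emails by send time.

    An email that infects the sender by its send time t_send must itself have
    been opened by t_send, hence sent strictly before t_send (open > send), so
    it has already been processed when the sweep reaches this email.
    Assumption (not in the original): a mailbox infected exactly at t_send
    counts as infected when sending at t_send.
    """
    infected_at = {ceo: 0}
    for sender, receiver, t_send, t_open in sorted(email_records, key=lambda rec: rec[2]):
        if infected_at.get(sender, float('inf')) <= t_send:
            if t_open < infected_at.get(receiver, float('inf')):
                infected_at[receiver] = t_open
    return infected_at

records = [('CEO', 'A', 1, 2), ('A', 'B', 3, 4), ('B', 'C', 5, 6), ('C', 'D', 7, 8),
           ('D', 'E', 1, 9)]  # D mailed E before D was infected, so E stays clean
times = first_infection_times(records, 'CEO')
for person, t in sorted(times.items(), key=lambda item: item[1]):
    print(f"{person} was first infected at time {t}")
print(times.get('D', float('inf')) <= 8)  # 2(A)-style query for D at T = 8: True
```

The sweep relies on the same time-ordering argument as the state graph: infection only flows forward in time. It trades the explicit graph for a sort, which makes the result easy to check by hand.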

----

## Question 3: Testing Moth Age Expert

A person claims to have spent his life studying the emperor gum moth *Opodiphthera eucalypti*. 
Given two moth samples, he claims he can tell us which one is older. Of course, 
we ourselves are no experts, and the moths all look the same to us.


We test the person as follows: (a) collect a large number $n$ of moth specimens; (b) randomly
select $m$ different pairs from our collection and have the person tell us which one of each pair is older; 
(c) record the answers and analyze them to see whether they are _consistent_.

Write an algorithm to detect whether the "expert" opinions are _consistent_. 


**Hint:** We have refrained from discussing what consistency means in this case, but we can provide an example as a hint.

__Example__ 

Suppose $n= 4$ and the expert says that

Specimen \# $1$ is older than $2$, $3$ is older than $4$, $4$ is older than $2$ and $2$ is older
than $3$.

The expert's opinion is clearly *inconsistent*.

Suppose $n=4$ and the expert says that

Specimen \# $1$ is older than $2$, $3$ is older than $4$ and $4$ is older than $1$. The
expert's answer is *consistent*.



----
Initialize a directed graph with one node for each of the $n$ moth specimens.

For each pair compared by the expert, add a directed edge from the older moth to the younger one.

Check the graph for cycles. A cycle such as $a \to b \to c \to a$ would imply that $a$ is older than itself, so any cycle means the expert's opinions are inconsistent. If there is no cycle, a topological order of the graph gives an age ordering that agrees with every answer, so the opinions are consistent.

Cycles can be detected efficiently with a graph traversal such as Depth-First Search (DFS). For each node in the graph:

- If the node has not been visited, perform a DFS from that node.
- If the DFS reaches a node that is still on the current DFS path (i.e., it is in the recursion stack), a cycle has been found, indicating inconsistency.

If no cycle is found after running DFS from every node, the expert's opinions are consistent. This takes $O(n + m)$ time, where $m$ is the number of comparisons.

```python
from collections import defaultdict

# Specimens are assumed to be numbered 0..n-1. The recursion can go n levels
# deep, so very long chains of comparisons may hit Python's recursion limit.

def dfs(graph, node, visited, recStack):
    visited[node] = True
    recStack[node] = True

    # For every neighbour of the current node
    for neighbour in graph[node]:
        # If the neighbour is not visited, then recurse on it
        if not visited[neighbour]:
            if dfs(graph, neighbour, visited, recStack):
                return True
        # If the neighbour is visited and in the recursion stack, then we have a cycle
        elif recStack[neighbour]:
            return True

    # Remove the node from recursion stack before returning
    recStack[node] = False
    return False

def check_consistency(n, comparisons):
    graph = defaultdict(list)
    for older, younger in comparisons:
        graph[older].append(younger)

    # Initialize visited and recursion stack arrays
    visited = [False] * n
    recStack = [False] * n

    # Perform DFS from each unvisited node to detect cycles
    for node in range(n):
        if not visited[node]:
            if dfs(graph, node, visited, recStack):
                return False  # Inconsistent (cycle detected)

    return True  # Consistent (no cycles)

# Test
n = 4  # Number of nodes
comparisons = [(0, 1), (1, 2), (2, 3)]
print(check_consistency(n, comparisons))
```

```
True
```
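Topological sort is the other standard way to run this check. Below is a sketch using Kahn's algorithm as a swapped-in alternative to the DFS above, run on the two examples from the problem statement. It assumes specimens are numbered $1, \dots, n$ as in those examples and that `(a, b)` means "$a$ is older than $b$"; the name `is_consistent` is hypothetical. It also runs in $O(n + m)$ time.

```python
from collections import deque

def is_consistent(n, comparisons):
    """Kahn's topological sort: opinions are consistent iff every specimen can
    be placed in an order where each older one precedes each younger one.

    Specimens are assumed to be numbered 1..n; (a, b) means a is older than b.
    """
    succ = {v: [] for v in range(1, n + 1)}
    indegree = {v: 0 for v in range(1, n + 1)}
    for older, younger in comparisons:
        succ[older].append(younger)
        indegree[younger] += 1

    queue = deque(v for v in succ if indegree[v] == 0)
    placed = 0
    while queue:
        v = queue.popleft()
        placed += 1
        for w in succ[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return placed == n  # a cycle leaves some specimens with positive in-degree

print(is_consistent(4, [(1, 2), (3, 4), (4, 2), (2, 3)]))  # False: 2 -> 3 -> 4 -> 2
print(is_consistent(4, [(1, 2), (3, 4), (4, 1)]))          # True: order 3, 4, 1, 2
```

A useful side effect of Kahn's algorithm is that, when the answers are consistent, the order in which specimens leave the queue is itself an age ordering that agrees with every comparison.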


----

## Question 4: Testing if an undirected graph is acyclic

You are given a connected, undirected graph $G$ with $n$ vertices as an adjacency list. Write an algorithm that checks whether $G$ has a cycle and runs in time $\Theta(n)$.

*Hint* A connected, undirected acyclic graph is a tree. Since you are already given that $G$ is connected, you are just checking if $G$ is a tree. How many edges would a tree have?


----
To test whether a connected, undirected graph $G$ with $n$ vertices is acyclic (i.e., is a tree), we can use a simple fact: a connected graph on $n$ vertices has at least $n-1$ edges, and it is a tree exactly when it has $n-1$ edges. Any edge beyond those $n-1$ joins two vertices that are already connected, which creates a cycle.

Since $G$ is given to be connected, no separate cycle search is needed. Counting the edges and checking that there are exactly $n-1$ of them is enough to decide whether $G$ is a tree.

### Algorithm

Count the total number of edges in the graph. In an undirected adjacency list each edge appears twice (once in each endpoint's list), so the total length of the lists must be divided by 2.

If the number of edges equals $n-1$, the graph is a tree. Otherwise it has more than $n-1$ edges and therefore contains a cycle. Summing the list lengths takes $\Theta(n)$ time when each list's length is available in constant time, as with Python lists. If the lists must instead be walked entry by entry, the scan can stop as soon as more than $2(n-1)$ entries have been seen, which keeps it $O(n)$.

```python
def is_tree(adj_list):
    # Calculate the number of vertices
    n = len(adj_list)

    # Count edges
    edge_count = sum(len(neighbors) for neighbors in adj_list) // 2

    # A tree with n vertices has exactly n-1 edges
    return edge_count == n - 1

# Test
adj_list = [[1, 2], [0, 2], [0, 1]]  # Example adjacency list for a graph with 3 vertices and 3 edges
print(is_tree(adj_list))  # This would return False since a tree with 3 vertices should have exactly 2 edges
```

```
False
```
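If the adjacency lists have to be walked entry by entry (for example, linked lists without a stored length), a full count costs $\Theta(n + m)$, which can be $\Theta(n^2)$ for a dense graph. Below is a minimal sketch of an early-exit scan that stays $O(n)$ in that setting. It assumes a simple connected graph (no self-loops or parallel edges); the name `has_cycle_connected` is hypothetical.

```python
def has_cycle_connected(adj_list):
    """For a simple, connected, undirected graph: True iff it contains a cycle.

    Walks the adjacency entries one at a time and stops as soon as the count
    exceeds 2(n - 1), so at most 2n - 1 entries are inspected: O(n) time
    even for dense graphs.
    """
    n = len(adj_list)
    limit = 2 * (n - 1)  # a tree's adjacency lists hold exactly 2(n - 1) entries
    seen = 0
    for neighbors in adj_list:
        for _ in neighbors:
            seen += 1
            if seen > limit:
                return True  # more than n - 1 edges in a connected graph: a cycle
    return False  # exactly n - 1 edges and connected: a tree

print(has_cycle_connected([[1, 2], [0, 2], [0, 1]]))  # True (triangle)
print(has_cycle_connected([[1], [0, 2], [1]]))        # False (path 0-1-2)
```

The outer loop touches each of the $n$ lists once, and the inner loop runs at most $2n - 1$ times in total before returning, which is what keeps the bound at $\Theta(n)$ independent of the number of edges.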
