# Shortest Path revisited and NP complete problems

## Single source shortest path

- directed graph $G = (V, E)$
- given a source vertex $s$ in $V$, compute the length of a shortest s-v path for every destination $v$ in $V$

Dijkstra
- $O(m \log n)$ with heaps
- $m$ = number of edges
- $n$ = number of nodes
- edge costs must be non-negative
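
A minimal heap-based sketch of Dijkstra, assuming the graph is given as an adjacency dict `graph[u] = [(v, cost), ...]`. The function name `dijkstra`, the input format and the `example` graph are illustrative choices, not part of the notes.

```python
import heapq


def dijkstra(graph, s):
    """Return a dict of shortest-path lengths from s (assumes non-negative edge costs)."""
    dist = {s: 0}
    heap = [(0, s)]  # entries are (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: u was already reached with a smaller distance
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# hypothetical example graph
example = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 3)], "t": []}
print(dijkstra(example, "s"))
```

Vertices that cannot be reached from `s` are simply absent from the returned dict.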

Bellman-Ford
- either compute the length of a shortest s-v path for every $v$, or report a negative cycle
- for every $v$ in $V$ and $i = 1 \dots n$, let $P$ = shortest s-v path with at most $i$ edges
- case 1: if $P$ has $\le (i-1)$ edges, it is a shortest s-v path with $\le (i-1)$ edges
- case 2: if $P$ has $i$ edges with the last hop $(w,v)$, then $P^{'}$ ($P$ without its last hop) is a shortest s-w path with $\le (i-1)$ edges

Recurrence
- let $L_{i,v}$ = minimum length of an s-v path with $\le i$ edges (with cycles allowed)
- for every $v$ in $V$, $i = 1 \dots n$

$L_{i,v} = \min\left[L_{i-1,v}, \min_{(w,v) \in E}\left(L_{i-1,w} + c_{wv}\right)\right]$

- If $G$ has no negative cycles
    - shortest paths do not contain cycles
    - so they have $\le (n-1)$ edges
    - compute $L_{i,v}$ for all $i = 1 \dots n-1$ and all $v$ in $V$
    
Pseudo code
- let $A$ = 2-D array (index $i$ and $v$)
- base case: $A[0,s] = 0$ and $A[0,v] = +\infty$ for all $v \ne s$
- for i = 1 to n-1
    - for each $v$ in $V$
        - $A[i,v] = \min\left[A[i-1,v], \min_{(w,v) \in E}\left(A[i-1,w] + c_{wv}\right)\right]$
- if $G$ has no negative cycle, then answer is $A[n-1,v]$

Stopping early
- suppose for some $j < n-1$, $A[j,v] = A[j-1,v]$ for all vertices $v$
- then for all $v$, all future $A[i,v]$'s will be the same
- can safely halt

Negative cycle detection
- $G$ has no negative cycle iff $A[n-1,v] = A[n,v]$ for all $v$ in $V$
- can check this by performing one extra iteration with $i = n$

Space optimization
- only need $A[i-1,v]$'s to compute $A[i,v]$'s
- thus, only need $O(n)$ space to remember the current and previous rounds of subproblems
- compute a second table B where $B[i,v]$ = 2nd-to-last vertex on a shortest s-v path with $\le i $ edges (or NULL if no such path exists)
- assume input graph $G$ has no negative cycles and we correctly compute $B[i,v]$'s
- then, tracing back predecessor pointers $B[n-1,v]$'s from $v$ to $s$ yields a shortest s-v path
- base case: $B[0,v]$ = NULL for all $v$ in $V$
- to compute $B[i,v]$ with $i > 0$
    - case 1 (the minimum is $A[i-1,v]$): $B[i,v] = B[i-1,v]$
    - case 2 (the minimum uses a last hop $(w,v)$): $B[i,v]$ = the vertex $w$ achieving the minimum
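
The pieces above (recurrence, early stopping, the extra round for negative-cycle detection, two-row storage and predecessor pointers) can be combined into one sketch. It assumes vertices are labelled `0..n-1` and edges are given as `(w, v, cost)` triples; the names `bellman_ford` and `path_to` are hypothetical, and a single predecessor array stands in for the table $B$.

```python
def bellman_ford(n, edges, s):
    """
    n: number of vertices (0..n-1); edges: list of (w, v, cost); s: source.
    Returns (dist, pred), or None if a negative cycle is reachable from s.
    """
    INF = float("inf")
    prev = [INF] * n                 # plays the role of A[i-1, .]
    prev[s] = 0
    pred = [None] * n                # second-to-last vertex on the best s-v path so far
    for i in range(1, n + 1):        # rounds 1..n-1, plus the extra round i = n
        cur = list(prev)             # case 1: inherit A[i-1, v]
        for w, v, cost in edges:     # case 2: try every last hop (w, v)
            if prev[w] + cost < cur[v]:
                cur[v] = prev[w] + cost
                pred[v] = w
        if cur == prev:
            return cur, pred         # nothing changed in this round: safe to halt
        prev = cur
    return None                      # values still changed in round n: negative cycle


def path_to(pred, s, v):
    """Trace predecessor pointers back from v to s (assumes v is reachable from s)."""
    path = [v]
    while path[-1] != s:
        path.append(pred[path[-1]])
    return path[::-1]
```

Only two rows (`prev` and `cur`) are alive at any time, which is the $O(n)$ space bound mentioned above.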

## All pairs shortest path

- no distinguished source vertex
- compute the length of a shortest u-v path for all pairs of vertices $u,v$ in $V$, or report that $G$ contains a negative cycle

Floyd-Warshall
- order vertices $V = \{1 \dots n\}$ arbitrarily
- let $V^{(k)} = \{1 \dots k\}$
- suppose $G$ has no negative cycle
- fix source $i$ in $V$, destination $j$ in $V$, and $k$ in $\{1 \dots n\}$
- let $p$ = shortest (cycle free) i-j path with all internal nodes in $V^{(k)}$

Then
- case 1: if $k$ is not internal to $p$, then $p$ is a shortest (cycle free) i-j path with all internal nodes in $V^{(k-1)}$
- case 2: if $k$ is internal to $p$, then
    - $p_{1}$ = shortest (cycle free) i-k path with all internal nodes in $V^{(k-1)}$
    - $p_{2}$ = shortest (cycle free) k-j path with all internal nodes in $V^{(k-1)}$
    
Pseudo code
- let $A$ = 3-D array (indexed by $i$, $j$, $k$)
- $A[i,j,k]$ = length of a shortest i-j path with all internal nodes in $\{1 \dots k\}$
- base case: for all $i,j$ in $V$
    - $A[i,j,0]$ = $0$ if $i = j$, $c_{ij}$ if $(i,j)$ in $E$, $+\infty$ if $i \ne j$ and $(i,j)$ not in $E$
- for k = 1 to n
    - for i = 1 to n
        - for j = 1 to n
            - $A[i,j,k] = \min\left[A[i,j,k-1], A[i,k,k-1] + A[k,j,k-1]\right]$
- negative cycle? 
    - will have $A[i,i,n] < 0$ for at least one $i$ in $V$ at the end of algorithm
- reconstruct a shortest i-j path? 
    - in addition to $A$, have Floyd-Warshall compute $B[i,j]$ = max level of an internal node on a shortest i-j path for all $i,j$ in $V$ 
    - reset $B[i,j] = k$ if 2nd case of recurrence used to compute $A[i,j,k]$
    - can use $B[i,j]$'s to recursively reconstruct shortest paths
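
A sketch of the reconstruction idea, assuming vertices are labelled `0..n-1` and edge costs come as a dict `{(i, j): c_ij}`. The names `floyd_warshall` and `reconstruct` are hypothetical, and the table is updated in place instead of keeping every layer $k$.

```python
def floyd_warshall(n, cost):
    """
    Returns (A, B), or None if the graph has a negative cycle.
    B[i][j] = highest-indexed internal vertex on a shortest i-j path (None if there is none).
    """
    INF = float("inf")
    A = [[0 if i == j else cost.get((i, j), INF) for j in range(n)] for i in range(n)]
    B = [[None] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if A[i][k] + A[k][j] < A[i][j]:   # case 2 of the recurrence wins
                    A[i][j] = A[i][k] + A[k][j]
                    B[i][j] = k
    if any(A[i][i] < 0 for i in range(n)):
        return None
    return A, B


def reconstruct(B, i, j):
    """Expand the i-j path around its highest internal vertex (assumes i != j and j reachable)."""
    k = B[i][j]
    if k is None:
        return [i, j]                 # the path is the single edge (i, j)
    return reconstruct(B, i, k)[:-1] + reconstruct(B, k, j)
```

The strict `<` matters: `B[i][j]` is only reset when the second case strictly improves the path, so the recursion always descends to lower-indexed internal vertices.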
    
Johnson's algorithm
- reduces all-pairs shortest paths (with possibly negative edge lengths) to
    - 1 invocation of Bellman-Ford
    - n invocations of Dijkstra
- reweighting using vertex weights $\{p_{v}\}$ adds the same amount (namely, $p_{s} - p_{t}$) to every s-t path
- thus reweighting never changes which s-t path is shortest

Example
- define vertex weight $p_{v}$ = length of shortest s-v path
- for every edge $e=(u,v)$, define $c^{'}_{e}$ = $c_{e}$ + $p_{u}$ - $p_{v}$
- after reweighting, all edge lengths are non-negative
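
A short derivation of both claims, assuming a path $a = v_{0} \to v_{1} \to \dots \to v_{k} = b$ (the $v_{i}$, $a$, $b$ notation is introduced here only for illustration). The vertex weights telescope along the path, so every a-b path changes by the same amount $p_{a} - p_{b}$:

$\sum_{i=1}^{k} c^{'}_{(v_{i-1},v_{i})} = \sum_{i=1}^{k} \left(c_{(v_{i-1},v_{i})} + p_{v_{i-1}} - p_{v_{i}}\right) = \sum_{i=1}^{k} c_{(v_{i-1},v_{i})} + p_{a} - p_{b}$

Non-negativity holds because a shortest s-v path is no longer than a shortest s-u path followed by the edge $e=(u,v)$:

$p_{v} \le p_{u} + c_{e} \implies c^{'}_{e} = c_{e} + p_{u} - p_{v} \ge 0$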

In summary
1. form $G^{'}$ by adding a new vertex $s$ and a new edge $(s,v)$ with length 0 for each $v$ in $G$
2. run Bellman-Ford on $G^{'}$ with source vertex s
3. for each $v$ in $G$, define $p_{v}$ = length of a shortest s-v path in $G^{'}$. For each edge $e=(u,v)$ in $G$, define $c^{'}_{e}$ = $c_{e}$ + $p_{u}$ - $p_{v}$
4. for each vertex $u$ of $G$, run Dijkstra in $G$, with edge lengths $\{c^{'}_{e}\}$, with source vertex $u$, to compute the shortest path distance $d^{'}(u,v)$ for each $v$ in $G$
5. for each pair $u,v$ in $G$, return the shortest path distance $d(u,v)$ = $d^{'}(u,v)$ - $p_{u}$ + $p_{v}$
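
The five steps can be sketched as follows, assuming vertices `0..n-1` and edges as `(u, v, cost)` triples. The name `johnson` is hypothetical, the artificial source is given index `n`, and Bellman-Ford is written in its in-place edge-relaxation form to keep the block short.

```python
import heapq


def johnson(n, edges):
    """Returns an n x n matrix of shortest-path distances, or None on a negative cycle."""
    INF = float("inf")
    # Steps 1-2: Bellman-Ford from a new source s = n with a 0-length edge to every vertex.
    aug = edges + [(n, v, 0) for v in range(n)]
    p = [INF] * (n + 1)
    p[n] = 0
    for _ in range(n):                       # G' has n + 1 vertices, so n rounds suffice
        for u, v, c in aug:
            if p[u] + c < p[v]:
                p[v] = p[u] + c
    if any(p[u] + c < p[v] for u, v, c in aug):
        return None                          # one more round would still improve: negative cycle
    # Step 3: reweight every edge; all new lengths are non-negative.
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        adj[u].append((v, c + p[u] - p[v]))
    # Steps 4-5: Dijkstra from every vertex, then undo the reweighting.
    d = [[INF] * n for _ in range(n)]
    for src in range(n):
        dist = [INF] * n
        dist[src] = 0
        heap = [(0, src)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > dist[u]:
                continue                     # stale heap entry
            for v, c in adj[u]:
                if du + c < dist[v]:
                    dist[v] = du + c
                    heapq.heappush(heap, (dist[v], v))
        for v in range(n):
            if dist[v] < INF:
                d[src][v] = dist[v] - p[src] + p[v]
    return d
```

Unreachable pairs keep the value `float("inf")` in the returned matrix.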

In [None]:
import numpy as np


def open_file(file_path):
    """
    Read-in a file containing rows of data

    Args:
    file_path -- location of file to read

    Returns:
    (data_dict, num_vertices, num_edges) -- a tuple with a dictionary representing a graph (keys are "tail-head" strings, values are edge weights), the number of vertices, and the number of edges
    """

    data_dict = {}

    with open(file_path, 'r') as f:
        lines = f.read().splitlines()
    # first line is metadata: "<num_vertices> <num_edges>"
    num_vertices, num_edges = (int(x) for x in lines[0].split()[:2])
    for item in lines[1:]:
        if not item.strip():
            continue  # skip blank lines, e.g. the one left by a trailing newline
        node1, node2, weight = item.split()[:3]
        data_dict[node1+"-"+node2] = int(weight)
    return (data_dict, num_vertices, num_edges)


tuple_obj = open_file("data/all-pairs-shortest-path1.txt")
# tuple_obj = open_file("data/all-pairs-shortest-path2.txt")
# tuple_obj = open_file("data/all-pairs-shortest-path3.txt")
# tuple_obj = open_file("data/all-pairs-shortest-path-test1.txt")
# tuple_obj = open_file("data/all-pairs-shortest-path-test2.txt")
# tuple_obj = open_file("data/all-pairs-shortest-path-test3.txt")
# tuple_obj = open_file("data/all-pairs-shortest-path-test4.txt")
data_dict = tuple_obj[0]
num_vertices = tuple_obj[1]
num_edges = tuple_obj[2]
print("num_vertices: " + str(num_vertices))
print("num_edges: " + str(num_edges))

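INF = float("inf")  # stands for "no path known yet"

# only two layers are kept: A[i][j][0] is round k-1, A[i][j][1] is round k
A = np.zeros((num_vertices, num_vertices, 2))

# base case: 0 on the diagonal, the edge cost if (i,j) is an edge, +infinity otherwise
for i in range(0, num_vertices):
    for j in range(0, num_vertices):
        index = str(i+1) + "-" + str(j+1)  # vertices in the file are 1-indexed
        if i == j:
            # a negative self-loop is itself a negative cycle; otherwise the empty path wins
            A[i][j][0] = min(0, data_dict.get(index, 0))
        elif index in data_dict:
            A[i][j][0] = data_dict[index]
        else:
            A[i][j][0] = INF


smallest = INF
# k has to range over every vertex, including index 0
for k in range(0, num_vertices):
    for i in range(0, num_vertices):
        for j in range(0, num_vertices):
            A[i][j][1] = min(A[i][j][0], A[i][k][0] + A[k][j][0])
            if A[i][j][1] < smallest:
                smallest = A[i][j][1]
            A[i][j][0] = A[i][j][1]


print("smallest: " + str(smallest))


# a negative entry on the diagonal means some cycle through i has negative length
for i in range(0, num_vertices):
    if A[i][i][1] < 0:
        print(str(i+1)+"-"+str(i+1)+" => Negative cycle! " + str(A[i][i][1]))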
            
# -41
# -89
# negative cycle
# -2

## NP complete

- a problem is polynomial-time solvable if some algorithm solves it in $O(n^{k})$ time for a constant $k$ ($n$ = input size)
- P = the set of polynomial-time solvable problems
- let $C$ = a set of problems. A problem $\pi$ is $C$-complete if
    - $\pi$ in $C$
    - everything in $C$ reduces to $\pi$
- that is, $\pi$ is the hardest problem in all of $C$
- a problem is in NP if
    - solutions always have length polynomial in the input size
    - solutions can be verified in polynomial time
- every problem in NP can be solved by brute-force search in exponential time
- a polynomial-time algorithm for one NP-complete problem would solve every problem in NP in polynomial time (that is, P = NP)
- P != NP is widely conjectured but unproven
- to prove NP-completeness of $\pi$
    - find a known NP-complete problem $\pi^{'}$
    - prove that $\pi^{'}$ reduces to $\pi$

Vertex cover problem
- given an undirected graph $G=(V,E)$
- compute a minimum-cardinality subset $S \subseteq V$ that contains at least one endpoint of each edge of $G$
- given a positive integer $k$ as input, we want to check whether or not there is a vertex cover with size $\le k$
- consider graph $G$, edge $(u,v)$ in $G$, integer $k \ge 1$
- let $G_{u} = G$ with $u$ and its incident edges deleted
- let $G_{v} = G$ with $v$ and its incident edges deleted
- then, $G$ has a vertex cover of size $k$ iff $G_{u}$ or $G_{v}$ or both have a vertex cover of size $(k-1)$

Recurrence
- ignore base case
- pick an arbitrary edge $(u,v)$ in $E$
- recursively search for a vertex cover $S$ of size $(k-1)$ in $G_{u}$. If found, return $S$ plus $u$
- recursively search for a vertex cover $S$ of size $(k-1)$ in $G_{v}$. If found, return $S$ plus $v$
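
A sketch of this recursive search, assuming the graph is given as a list of edges `(u, v)`. The function name `vertex_cover` is hypothetical, and the base cases (which the notes skip) are filled in here as an assumption: no edges left means the empty set works, and edges left with $k = 0$ means failure.

```python
def vertex_cover(edges, k):
    """Return a vertex cover of size <= k as a set, or None if none exists."""
    if not edges:
        return set()                  # assumed base case: nothing left to cover
    if k == 0:
        return None                   # assumed base case: edges remain but no budget
    u, v = edges[0]                   # pick an arbitrary edge (u, v)
    for x in (u, v):
        rest = [e for e in edges if x not in e]   # G_x: delete x and its incident edges
        cover = vertex_cover(rest, k - 1)
        if cover is not None:
            return cover | {x}
    return None
```

Each call branches at most twice and the recursion depth is at most $k$, so there are at most $2^{k}$ leaves, which is far fewer than trying all subsets of size $k$ when $k$ is small.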

Traveling salesman problem
- given an undirected graph with non-negative edge costs, find a minimum-cost tour that visits every vertex exactly once
- to enforce the constraint that each vertex is visited exactly once, need to remember the "identities" of vertices visited in a sub-problem
- sub-problem: for every destination $j$ in $\{1 \dots n \}$ and every subset $S \subseteq \{1 \dots n \}$ that contains $1$ and $j$, let $L_{S,j}$ = minimum length of a path from $1$ to $j$ that visits precisely the vertices of $S$ (exactly once each)
- let $p$ be a shortest path from $1$ to $j$ that visits the vertices of $S$ (exactly once each). If the last hop of $p$ is $(k,j)$, then $p^{'}$ ($p$ without its last hop) is a shortest path from $1$ to $k$ that visits every vertex of $S - \{j\}$ exactly once

Recurrence
- $L_{S,j} = \min_{k \in S, k \ne j}\left[L_{S-\{j\},k} + C_{kj}\right]$

Pseudo code
- let $A$ = 2-D array, indexed by subsets $S \subseteq \{1 \dots n\}$ that contain $1$ and destinations $j$ in $\{1 \dots n\}$
- base case: $A[S,1]$ = $0$ if $S = \{1\}$, $+\infty$ otherwise
- for m = 2,3,4,...,n [m = sub-problem size]
    - for each set $S \subseteq \{1 \dots n\}$ of size m that contains 1
        - for each $j$ in $S$, $j \ne 1$
            - $A[S,j] = \min_{k \in S, k \ne j}\left[A[S-\{j\},k] + C_{kj}\right]$
- return $\min_{j = 2 \dots n}\left[A[\{1,2,3,...,n\}, j] + C_{j1}\right]$
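
A sketch of this dynamic program, assuming the costs come as an $n \times n$ matrix `C` with $n \ge 2$ and that vertex `0` plays the role of vertex $1$ in the notes. The name `tsp` and the use of `frozenset` keys are illustrative choices.

```python
from itertools import combinations


def tsp(C):
    """Return the minimum cost of a tour visiting every vertex exactly once (assumes n >= 2)."""
    n = len(C)
    INF = float("inf")
    # A[(S, j)] = min length of a path from 0 to j visiting exactly the vertices of S
    A = {(frozenset([0]), 0): 0}      # base case; every other A[S, 0] is treated as +infinity
    for m in range(2, n + 1):         # m = sub-problem size
        for rest in combinations(range(1, n), m - 1):
            S = frozenset(rest) | {0}
            for j in rest:            # every j in S except the start vertex
                A[(S, j)] = min(
                    A.get((S - {j}, k), INF) + C[k][j] for k in S if k != j
                )
    full = frozenset(range(n))
    return min(A[(full, j)] + C[j][0] for j in range(1, n))
```

The table has roughly $n 2^{n}$ entries and each takes $O(n)$ work, so this is practical only for small $n$.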

## Approximation algorithms for NP complete problems (heuristics)

Greedy algorithm (knapsack)
- ideally pick items with large value but small size
- sort and reindex items so that
    - $\dfrac{v_{1}}{w_{1}} \ge \dfrac{v_{2}}{w_{2}} \ge \dots \ge \dfrac{v_{n}}{w_{n}}$
- pack items in this order until one doesn't fit
- return either the solution from above or the single most valuable item, whichever is better
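
A sketch of the greedy heuristic, assuming positive weights and parallel lists `values` and `weights`. The name `greedy_knapsack` is hypothetical, and only the total value is returned, not the chosen items.

```python
def greedy_knapsack(values, weights, W):
    """Return the better of: the greedy prefix by value/weight ratio, or the best single item."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    total_value, remaining = 0, W
    for i in order:
        if weights[i] > remaining:
            break                     # stop at the first item that does not fit
        total_value += values[i]
        remaining -= weights[i]
    # the single most valuable item that fits on its own
    best_single = max((values[i] for i in range(len(values)) if weights[i] <= W), default=0)
    return max(total_value, best_single)
```

The second candidate matters: with `values = [2, 1000]`, `weights = [1, 1000]` and `W = 1000`, the ratio order packs only the first item, while the single best item is worth 1000.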

Dynamic programming
- for a user-specified parameter $\epsilon \gt 0$, guarantee a $(1-\epsilon)$ approximation
- running time will increase as $\epsilon$ decreases
- If $w_{i}$'s and $W$ are integers, can solve knapsack problem via dynamic programming in $O(nW)$ time
- If $v_{i}$'s are integers, can solve knapsack problem via dynamic programming in $O(n^{2}v_{max})$ time
1. round each $v_{i}$ down to the nearest multiple of $m$ (where $m$ depends on $\epsilon$)
2. divide the result by $m$ to get $\hat{v_{i}}$'s integers ($\hat{v_{i}}$ = floor($\dfrac{v_{i}}{m}$))
3. solve using dynamic programming with values $\hat{v_{1}} \dots \hat{v_{n}}$, weights $w_{1} \dots w_{n}$, and maximum weight limit $W$

Sub-problem
- for i = 0 to n and x = 0 to $nv_{max}$, define $S_{i,x}$ = minimum total size needed to achieve value $\ge x$ while using only the first $i$ items (or $+\infty$ if impossible)

Recurrence
- $S_{i,x} = min\left[S_{(i-1),x}, w_{i} + S_{(i-1),(x-v_{i})}\right]$ where $S_{(i-1),(x-v_{i})} = 0$ if $v_{i} \ge x$

Pseudo code
- let $A$ = 2-D array ($i$ = $0$ to $n$, $x$ = $0$ to $nv_{max}$)
- base case: $A[0,x]$ = $0$ if $x = 0$, $+\infty$ otherwise
- for i = 1 to n
    - for x = 0 to $nv_{max}$
        - $A[i,x]$ = $min\left[A[i-1,x], w_{i} + A[i-1,x-v_{i}]\right]$ where $A[i-1,x-v_{i}] = 0$ if $v_{i} \ge x$
- return the largest $x$ such that $A[n,x] \le W$
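
A sketch of the value-indexed dynamic program together with the rounding step. The notes only say that $m$ depends on $\epsilon$; the choice $m = \epsilon v_{max} / n$ used below is an assumption (a common one), and the names `knapsack_by_value` and `knapsack_approx` are hypothetical.

```python
def knapsack_by_value(values, weights, W):
    """Exact DP for integer values: largest total value achievable with total weight <= W."""
    V = sum(values)                   # upper bound on any achievable value (at most n * v_max)
    INF = float("inf")
    A = [0] + [INF] * V               # base case: A[0, 0] = 0 and A[0, x] = +infinity for x > 0
    for v_i, w_i in zip(values, weights):
        new = list(A)                 # first option of the recurrence: skip item i
        for x in range(V + 1):
            rest = max(0, x - v_i)    # needing value <= 0 costs nothing
            new[x] = min(A[x], w_i + A[rest])
        A = new                       # only the previous row is needed
    return max(x for x in range(V + 1) if A[x] <= W)


def knapsack_approx(values, weights, W, eps):
    """Round values down to multiples of m, solve exactly, and scale the answer back."""
    m = eps * max(values) / len(values)          # assumed choice of m
    rounded = [int(v // m) for v in values]
    # rounded optimum times m is a lower bound on the true value of the chosen items
    return knapsack_by_value(rounded, weights, W) * m
```

Smaller `eps` means smaller `m`, larger rounded values, and therefore a bigger table, which is the accuracy versus running time trade-off described above.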

## Maximum cut problem

- given $G = (V,E)$, find a cut $(A,B)$ that maximizes the number of crossing edges
- for a cut $(A,B)$ and a vertex $v$, define
    - $c_{v}(A,B)$ = number of edges incident on $v$ that cross $(A,B)$
    - $d_{v}(A,B)$ = number of edges incident on $v$ that don't cross $(A,B)$
    
## Local search algorithm

- let $(A,B)$ be an arbitrary cut of $G$
- while there is a vertex $v$ with $d_{v}(A,B) \gt c_{v}(A,B)$
    - move $v$ to other side of the cut (increase number of crossing edges by $d_{v}(A,B) - c_{v}(A,B) \gt 0$)
- return final cut $(A,B)$
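
A sketch of this local search for maximum cut, assuming a list of vertices, a list of edges `(a, b)` without self-loops, and the arbitrary starting cut that puts every vertex in $A$. The name `local_search_max_cut` is hypothetical.

```python
def local_search_max_cut(vertices, edges):
    """Return (A, B) such that no single vertex move increases the number of crossing edges."""
    side = {v: 0 for v in vertices}               # arbitrary initial cut: everything in A
    improved = True
    while improved:
        improved = False
        for v in vertices:
            incident = [(a, b) for a, b in edges if v in (a, b)]
            crossing = sum(1 for a, b in incident if side[a] != side[b])   # c_v(A, B)
            if len(incident) - crossing > crossing:                        # d_v(A, B) > c_v(A, B)
                side[v] = 1 - side[v]             # move v to the other side
                improved = True
    A = {v for v in vertices if side[v] == 0}
    B = {v for v in vertices if side[v] == 1}
    return A, B
```

Every move strictly increases the number of crossing edges, so the loop stops after at most $|E|$ moves. At the final cut every vertex has $c_{v} \ge d_{v}$, which implies that at least half of all edges cross.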

## Neighbourhoods

- let $X$ = set of candidate solutions to a problem
- for each $x$ in $X$, specify which $y$ in $X$ are its neighbours
- max cut: $x,y$ are neighbouring cuts iff they differ by moving one vertex
- satisfiability: $x,y$ are neighbouring variable assignments iff they differ in the value of a single variable
- TSP: $x,y$ are neighbouring tours iff they differ by 2 edges

## Generic local search algorithm

- let $x$ = some initial solution
- while the current solution $x$ has a superior neighbouring solution $y$: set $x = y$
- return the final (locally optimal) solution $x$
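
The generic loop can be sketched as a small higher-order function. The parameters `neighbours` and `cost` are hypothetical callables supplied by the caller, lower cost is assumed to be better, and picking the biggest improvement is just one possible rule.

```python
def local_search(x, neighbours, cost):
    """Move to a strictly better neighbour until none exists; returns a locally optimal x."""
    while True:
        better = [y for y in neighbours(x) if cost(y) < cost(x)]
        if not better:
            return x                   # no superior neighbour: x is locally optimal
        x = min(better, key=cost)      # one possible rule: take the biggest improvement


# hypothetical toy problem: minimize (x - 3)^2 over the integers, neighbours are x - 1 and x + 1
print(local_search(10, lambda x: [x - 1, x + 1], lambda x: (x - 3) ** 2))
```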

FAQ
- how to pick initial solution $x$? random / best heuristic
- if there are several superior neighbouring $y$, which to choose? random / biggest improvement
- how to define neighbourhoods? find the "sweet spot" between solution quality and efficient searchability
- is local search guaranteed to terminate? if $X$ is finite and every local step improves some objective function, then yes
- is local search guaranteed to converge quickly? usually not
- are locally optimal solutions generally good approximations to globally optimal ones? no (but you can run many times and pick the best)

## 2-SAT

- given $n$ Boolean variables $x_{1} \dots x_{n}$ (True or False) and $m$ clauses of 2 literals each (a literal is $x_{i}$ or $\neg x_{i}$)
- return "yes" if there is an assignment that simultaneously satisfies every clause, "no" otherwise
- Ex. "yes" when $x_{1} = x_{3}$ = TRUE and $x_{2} = x_{4}$ = FALSE 

Can be solved in polynomial time!
- reduction to computing strongly connected components
- "backtracking" works in polynomial time
- randomized local search

## 3-SAT

- NP complete

## Papadimitriou's 2-SAT algorithm

Repeat $\log_{2} n$ times ($n$ = number of variables)
- choose random initial assignment
- repeat $2n^{2}$ times
    - if current assignment satisfies all clauses, halt and report this
    - else, pick an arbitrary unsatisfied clause and flip the value of one of its variables (choose between the two uniformly at random)
- report "unsatisfiable"

Advantages
- runs in polynomial time
- always correct on unsatisfiable instances
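
A sketch of the algorithm, assuming variables are numbered `1..n` and each clause is a pair of non-zero integers where `+i` stands for $x_{i}$ and `-i` for its negation. This encoding and the name `papadimitriou_2sat` are assumptions made for illustration, and the outer loop runs at least once so that $n = 1$ is handled.

```python
import math
import random


def papadimitriou_2sat(n, clauses, rng=random):
    """Return a satisfying assignment as a dict, or None to report "unsatisfiable"."""
    def satisfied(lit, assign):
        return assign[abs(lit)] == (lit > 0)

    for _ in range(max(1, math.ceil(math.log2(n)))):
        # random initial assignment
        assign = {i: rng.random() < 0.5 for i in range(1, n + 1)}
        for _ in range(2 * n * n):
            bad = [c for c in clauses if not any(satisfied(l, assign) for l in c)]
            if not bad:
                return assign                    # every clause is satisfied
            var = abs(rng.choice(bad[0]))        # one of the clause's two variables, at random
            assign[var] = not assign[var]
    return None
```

A returned assignment is always genuinely satisfying, so the only possible error is answering `None` on a satisfiable instance.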

## Random walks

- at each time step, your position goes up or down by 1, with 50/50 chance (except at position 0, where you move to position 1 with 100% chance)
- for an integer $n \ge 0$, let $T_{n}$ = number of steps until random walk reaches position n
- $E[T_{n}] = n^2$ (corollary, by Markov's inequality: $Pr[T_{n} > 2n^{2}] \le \dfrac{1}{2}$)
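
The corollary is one application of Markov's inequality to the non-negative random variable $T_{n}$:

$Pr[T_{n} > 2n^{2}] \le \dfrac{E[T_{n}]}{2n^{2}} = \dfrac{n^{2}}{2n^{2}} = \dfrac{1}{2}$

Read against Papadimitriou's algorithm (a sketch of the connection, assuming the usual analysis in which the number of variables agreeing with a fixed satisfying assignment behaves at least as well as this walk): one inner loop of $2n^{2}$ flips fails on a satisfiable instance with probability at most $\dfrac{1}{2}$, so all $\log_{2} n$ independent trials fail with probability at most

$\left(\dfrac{1}{2}\right)^{\log_{2} n} = \dfrac{1}{n}$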