# Kliment Mamykin, UNI 2770
## Algorithms for Data Science, Homework 2

### Problem 1

Let $D_y$ be the number of shortest paths from $v$ to $y$ in graph $G = (V, E), v \in V, y \in V$. We need to find $D_w$.

**Claim**: Given a BFS tree of graph $G$ rooted at $v$ with level sets $\{L_0, L_1, \dots\}$, for any node $y$ reachable from $v$, the number of shortest paths from $v$ to $y$ equals the sum of the numbers of shortest paths from $v$ to each neighbor of $y$ that lies on the previous BFS level.

$$
\begin{equation}
D_y = \left\{
\begin{array}{ l l }
1 & y = v \ (i = 0) \\
\sum\limits_{(x,y) \in E,\ x \in L_{i-1}} D_x & y \in L_i,\ i \ge 1
\end{array}
\right.
\end{equation}
$$

**Proof by induction**

**Basis**: 

The claim holds for $L_0$: the only node at level $L_0$ is $v$ itself, and there is exactly one shortest path from $v$ to $v$, the trivial path $\{v\}$, so $D_v = 1$.

**Hypothesis**:

Suppose every node $x$ in BFS level $L_i$ is at shortest distance $i$ from $v$, and that $D_x$ equals the number of shortest paths from $v$ (the root of the BFS tree) to $x$.

**Step**:

Consider a node $y$ in level $L_{i+1}$. The levels of two adjacent nodes differ by at most 1, so there are three cases for each node $x$ adjacent to $y$:

1. $x$ was discovered by BFS at the previous level $L_i$
2. $x$ was discovered by BFS at the same level $L_{i+1}$
3. $x$ was discovered by BFS at the next level $L_{i+2}$

In cases 2 and 3, $x$ is at least as far from $v$ as $y$ is, so no shortest path from $v$ to $y$ passes through $x$, and such nodes do not affect the number of shortest paths from $v$ to $y$.

In case 1, every shortest path from $v$ to $y$ must enter $y$ through some node $x_k \in L_i$ adjacent to $y$. Each of the $D_{x_k}$ shortest paths from $v$ to $x_k$ extends by the edge $(x_k, y)$ to a distinct shortest path from $v$ to $y$. Therefore the number of shortest paths is $D_y = \sum_{x_k} D_{x_k}$.
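
As a small illustration of the recurrence (a hypothetical graph, not part of the problem statement), take nodes $\{v, a, b, w\}$ with edges $(v,a), (v,b), (a,b), (a,w), (b,w)$. BFS from $v$ gives $L_0 = \{v\}$, $L_1 = \{a, b\}$, $L_2 = \{w\}$, and the counts are:

$$
\begin{array}{l}
D_v = 1 \\
D_a = D_v = 1, \quad D_b = D_v = 1 \\
D_w = D_a + D_b = 2
\end{array}
$$

The edge $(a,b)$ joins two nodes on the same level (case 2), so it contributes to neither $D_a$ nor $D_b$.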

**Algorithm**

We use a modified BFS algorithm to traverse graph $G$ and keep track of the number of shortest paths $D[node]$ to each node. At the end we return $D[w]$ as the final answer. BFS runs in $O(n+m)$ time, where $n = |V|$ and $m = |E|$, and since the modification adds only a constant amount of work per edge examined, the modified algorithm is still $O(n+m)$.


```
Number_of_Shortest_Paths(G, start_node, end_node )
    array discovered[V] initialized to 0
    array dist[V] initialized to ∞
    array parent[V] initialized to NIL
    array D[V] initialized to 0
    queue q
    discovered[start_node] = 1
    dist[start_node] = 0
    parent[start_node] = NIL
    D[start_node] = 1 // added
    enqueue(q, start_node)
    while size(q) > 0 do
        u = dequeue(q)
        for (u, v) ∈ E do
            if discovered[v] == 0 then
                discovered[v] = 1
                dist[v] = dist[u] + 1
                parent[v] = u
                enqueue(q, v)
            end if
            // added, if discovered and on previous level
            if discovered[v] == 1 and dist[v] < dist[u] then
                D[u] = D[u] + D[v]
            end if
        end for
    end while
    return D[end_node] // added
```
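
The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `count_shortest_paths` and the adjacency-dictionary input format are assumptions made for illustration, and the graph is assumed to be undirected and unweighted.

```python
from collections import deque


def count_shortest_paths(adj, start, end):
    """Count shortest paths from start to end.

    adj is assumed to be a dict mapping each node to a list of its
    neighbors (undirected, unweighted graph).
    """
    dist = {start: 0}   # BFS level of each discovered node
    paths = {start: 1}  # D[node]: number of shortest paths from start
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                # First time v is seen: it sits one level below u.
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            elif dist[v] < dist[u]:
                # v is on the previous level, so its count is final
                # and every shortest path to v extends to u.
                paths[u] += paths[v]
    # Nodes never reached have no shortest path at all.
    return paths.get(end, 0)
```

As in the pseudocode, the count for a node is accumulated when that node is dequeued, at which point every node on the previous level has already been processed.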

### Problem 3

Let $C_{ij}$ be the cost (penalty) of traveling from the hotel at mile post $a_i$ to the hotel at mile post $a_j$. Since the trip starts at the very beginning of the road, we denote that location as $a_0$ (mile post 0). We can pre-calculate the cost matrix $C$ for travel from $i$ to $j$, $(0 \le i \lt j, 1 \le j \le n)$, using the formula $C_{ij} = (200 - (a_j-a_i))^2$. $C$ is an upper triangular matrix that we can calculate in $\Theta(n^2)$ time.

Using a dynamic programming approach, let the optimal cost $OPT(j)$ of traveling from the beginning to hotel $a_j$ be

$$
\begin{equation}
OPT(j) = \left\{ 
\begin{array}{ l l }
0 & j = 0 \\
\min\limits_{0 \le i \lt j} (OPT(i) + C_{ij}) & 1 \le j \le n
\end{array}
\right.
\end{equation}
$$

$OPT(n)$ is then the optimal cost of traveling from the beginning to the last stop, hotel $a_n$.
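
As a worked illustration (hypothetical mile posts, not taken from the problem statement), let $a_0 = 0$ and place hotels at $a_1 = 150$, $a_2 = 300$, $a_3 = 400$:

$$
\begin{array}{l}
OPT(1) = C_{01} = (200 - 150)^2 = 2500 \\
OPT(2) = \min(C_{02},\ OPT(1) + C_{12}) = \min(10000,\ 2500 + 2500) = 5000 \\
OPT(3) = \min(C_{03},\ OPT(1) + C_{13},\ OPT(2) + C_{23}) = \min(40000,\ 5000,\ 15000) = 5000
\end{array}
$$

In this example the cheapest trip stops at $a_1$ and then goes straight to $a_3$.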

Proof by strong induction:

**Base case**: for $j = 0$ there is no need to travel, and the optimal cost is $OPT(0) = 0$. For $j = 1$ the only option is a single hop from $a_0$ to $a_1$, so $OPT(1) = OPT(0) + C_{01} = C_{01}$.

**Hypothesis**: for some $j \ge 1$ assume $\{OPT(0), OPT(1), \dots, OPT(j)\}$ are the optimal travel costs to every stop up to and including $j$.

**Induction step**: for stop $j + 1$, consider every travel option whose next-to-last stop is some point $i$, $0 \le i \le j$. Since the next-to-last stop must come before $j+1$, the hypothesis gives the optimal cost for each such stop. For a fixed $i$, the best possible cost is the optimal cost of traveling to point $i$ plus the cost of the hop from $i$ to $j+1$, that is $OPT(i) + C_{i,j+1}$. Every trip to $j+1$ has exactly one next-to-last stop, so taking the minimum over all $i$ yields the minimal travel cost to point $j+1$.


```
Optimal_Trip_Cost(A)
    // input A - array 1..n of hotel mile posts, in increasing order
    // returns the minimal cost/penalty to travel to the last hotel
    let n = |A|
    let A[0] = 0 // starting point a_0
    let C[0..n-1][1..n] be array initialized to infinity values
    // calculate the cost matrix of each hop
    for j = 1..n
        for i = 0..j-1
            C[i,j] = (200 - (A[j] - A[i]))**2

    let OPT[0..n] be array initialized to infinity values
    OPT[0] = 0 // base case: no travel needed
    for j = 1..n
        for i = 0..j-1
            // find the min value across all next to last stop options
            OPT[j] = min(OPT[j], OPT[i] + C[i, j])
    return OPT[n]
```
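
The recurrence for $OPT(j)$ could be sketched in Python along these lines. The function name `optimal_trip_cost` and the `target` parameter (the ideal daily distance, 200 in the formula above) are assumptions introduced for illustration. Unlike the pseudocode, this sketch computes each hop cost inline instead of storing the full matrix $C$; the running time is still $\Theta(n^2)$.

```python
def optimal_trip_cost(mileposts, target=200):
    """Minimal total penalty to reach the last hotel.

    mileposts: hotel mile posts a_1..a_n in increasing order.
    target: ideal distance per hop (assumed parameter, default 200).
    """
    posts = [0] + list(mileposts)     # posts[0] is the start a_0 = 0
    n = len(posts) - 1
    opt = [0] + [float("inf")] * n    # opt[0] = 0: no travel needed
    for j in range(1, n + 1):
        for i in range(j):
            # Penalty of a single hop from stop i to stop j.
            hop = (target - (posts[j] - posts[i])) ** 2
            # Best trip to j whose next-to-last stop is i.
            opt[j] = min(opt[j], opt[i] + hop)
    return opt[n]
```

Prepending the start location as index 0 keeps the indices aligned with the recurrence, so `opt[j]` corresponds directly to $OPT(j)$.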