<h1 align="center">Dynamic Programming</h1>

- **Dynamic programming**, like the divide-and-conquer method, **solves problems recursively by combining the solutions to subproblems**.


- The **main difference** is that dynamic programming applies when the **subproblems overlap**, i.e. when **subproblems share subsubproblems**.


- **Dynamic programming** is typically applied to **optimization problems**.

  In other words, a problem has **many possible solutions**, and we wish to **find a solution** with the **optimal (minimum or maximum) value**.
  

- Since there may be several solutions that achieve the optimal value, we call such a solution **an optimal solution**, as opposed to **the optimal solution**.


- When developing a **dynamic-programming algorithm**, we follow a sequence of **four steps**:
  1. **Characterize** the structure of an optimal solution.
  2. **Define** recursively  the value of an optimal solution.
  3. **Compute** the value of an optimal solution (typically in a bottom-up fashion).
  4. **Construct** an optimal solution from computed information.
  

- **Note**: If we **need only the value** of an optimal solution, and not the solution itself, then we **can omit step 4**.

<h2 align="center">Matrix-Chain Multiplication</h2>

- Our **next example** of **dynamic programming** is an algorithm that solves the problem of **matrix-chain multiplication**.


- **Problem Statement**: We are given a **sequence** (**chain**) $\left \langle A_1, A_2, ..., A_n  \right \rangle $ of $n$ **matrices** to be multiplied, and **we wish to compute the product**:

  $$A_1  A_2   A_3   \cdots  A_n.$$

 
 
- For example, if the chain of matrices is $\left \langle A_1, A_2, A_3, A_4\right \rangle $, then we can **fully parenthesize** the product $A_1  A_2   A_3   A_4$ in **five distinct ways**:

  $$(A_1  (A_2  (A_3  A_4))).$$
  $$(A_1  ((A_2  A_3)  A_4)).$$
  $$((A_1  A_2)  (A_3  A_4)).$$
  $$((A_1  (A_2  A_3))  A_4).$$
  $$(((A_1  A_2) A_3)  A_4).$$
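
The five parenthesizations above can be reproduced mechanically by trying every split point and recursing on both sides. The following is a minimal sketch; the function name `parenthesizations` and the string labels are illustrative choices, not part of the original notes.

```python
def parenthesizations(names):
    """Return every full parenthesization of the chain `names` as strings."""
    if len(names) == 1:
        # A single matrix is already fully parenthesized.
        return [names[0]]
    results = []
    # Split the chain into names[:k] and names[k:] for every possible k.
    for k in range(1, len(names)):
        for left in parenthesizations(names[:k]):
            for right in parenthesizations(names[k:]):
                results.append(f"({left} {right})")
    return results

chain = ["A1", "A2", "A3", "A4"]
ways = parenthesizations(chain)
for w in ways:
    print(w)
print(len(ways))  # 5
```

This enumeration is only practical for very short chains, since the number of results grows exponentially with the chain length.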


- The **standard algorithm** for multiplying two matrices is given by the following procedure, which generalizes the `squareMatrixMultiplication` procedure to rectangular matrices:

In [7]:
import numpy as np

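def matrixMultiply(A, B):
    """Multiply A (p x q) by B (q x r) with the standard triple loop."""
    nrowsA, ncolsA = A.shape
    nrowsB, ncolsB = B.shape
    if ncolsA != nrowsB:
        raise ValueError("Incompatible dimensions!")
    # C starts at zero so that products can be accumulated in place.
    C = np.zeros((nrowsA, ncolsB))
    count = 0  # number of scalar multiplications performed
    for i in range(nrowsA):
        for j in range(ncolsB):
            for k in range(ncolsA):
                # One scalar multiplication per innermost iteration.
                C[i, j] += A[i, k] * B[k, j]
                count += 1
    print(count)
    return C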

A = np.array([[1,2],[3,4]])
B = np.array([[1,2],[3,4]])
C = matrixMultiply(A,B)
C

8


array([[ 7., 10.],
       [15., 22.]])

- If $A$ is a $p \times q$ matrix and $B$ is a $q \times r$ matrix, the resulting matrix $C$ is a $p \times r$ matrix. 


- The **time to compute** $C$ is dominated by the **number of scalar multiplications** performed in the innermost loop of `matrixMultiply`, which is $p \cdot q \cdot r$.


- For example, suppose that the dimensions of the matrices are $10 \times 100$, $100 \times 5$, and $5 \times 50$, respectively.

  If we multiply according to the parenthesization $((A_1  A_2) A_3 )$, we perform $10 \cdot 100 \cdot 5 = 5000$ scalar multiplications to compute the $10 \times 5$ matrix product $A_1  A_2$, plus another $10 \cdot 5  \cdot 50 =  2500$ scalar multiplications to multiply this matrix by $A_3$, for a total of $7500$ scalar multiplications. 
  
  If instead we multiply according to the parenthesization $(A_1  (A_2  A_3 ))$, we perform $100 \cdot 5 \cdot 50 = 25{,}000$ scalar multiplications to compute the $100\times 50$ matrix product $A_2  A_3$, plus another $10 \cdot 100 \cdot 50 = 50{,}000$ scalar multiplications to multiply $A_1$ by this matrix, for a total of $75{,}000$ scalar multiplications. 
  
  Thus, **computing the product** according to the **first parenthesization** is **$10$ times faster**.
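
The arithmetic in this example can be checked with a few lines of Python. This is a small sketch; the helper name `multiply_cost` is an illustrative assumption and simply encodes the $p \cdot q \cdot r$ cost stated above.

```python
def multiply_cost(p, q, r):
    """Scalar multiplications for a (p x q) times (q x r) product."""
    return p * q * r

# Dimensions from the example: A1 is 10x100, A2 is 100x5, A3 is 5x50.
# ((A1 A2) A3): first A1 A2 (a 10x5 result), then multiply it by A3.
cost_left = multiply_cost(10, 100, 5) + multiply_cost(10, 5, 50)
# (A1 (A2 A3)): first A2 A3 (a 100x50 result), then multiply A1 by it.
cost_right = multiply_cost(100, 5, 50) + multiply_cost(10, 100, 50)

print(cost_left)                # 7500
print(cost_right)               # 75000
print(cost_right // cost_left)  # 10
```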

<h3 align="center">Using Dynamic Programming for Matrix-Chain Multiplication</h3>

- We can state the **matrix-chain multiplication problem** as follows:

  Given a **chain** $\left \langle A_1, A_2, ..., A_n  \right \rangle $ of $n$ matrices, where for $i = 1, 2, ..., n$ matrix $A_i$ has dimension $p_{i-1} \times p_i$, **fully parenthesize** the product $A_1  A_2  \cdots  A_n$ in a way that **minimizes** the **number of scalar multiplications**.


- Let's **count** the **number of parenthesizations**:

  **Denote** the **number of alternative parenthesizations** of a sequence of $n$ matrices by $P(n)$.
  
  When $n = 1$, we have just **one matrix** and therefore **only one way** to fully parenthesize the matrix product.
  
  When $n \geq 2$, a **fully parenthesized** matrix product **is** the **product** of **two fully parenthesized matrix subproducts**, and the split between the two subproducts may occur between the $k$-th and $(k+1)$-st matrices for any $k = 1, 2, ..., n-1$.
  
  Thus, we have:
  
  $$P(n) = 
\left\{\begin{matrix}
1 & \text{ if } n=1, \\ 
\sum_{k=1}^{n-1} P(k)P(n-k) & \text{ if } n \geq 2. 
\end{matrix}\right.$$

  The solution of this recurrence is $\Omega(2^n)$.
  
  
- The **number of solutions** is thus **exponential in** $n$, and the **brute-force method** of exhaustive search **makes for a poor strategy** when determining how to optimally parenthesize a matrix chain.
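
To get a feeling for how quickly $P(n)$ grows, the recurrence can be evaluated directly. The sketch below uses `functools.lru_cache` to avoid recomputing values; the function name `num_parenthesizations` is an illustrative assumption.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_parenthesizations(n):
    """Evaluate P(n) from the recurrence, caching already computed values."""
    if n == 1:
        return 1
    # Sum over every split position k = 1, ..., n-1.
    return sum(num_parenthesizations(k) * num_parenthesizations(n - k)
               for k in range(1, n))

for n in range(1, 11):
    print(n, num_parenthesizations(n))
```

For $n = 4$ this gives $5$, matching the five parenthesizations listed earlier, and for $n = 10$ it already gives $4862$. These values are the Catalan numbers, shifted by one index.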


- Let's apply dynamic programming:

  In so doing, we shall follow the four-step sequence that we stated at the beginning of this lecture:
  
  1. **Characterize** the structure of an optimal solution.
  2. **Define** recursively  the value of an optimal solution.
  3. **Compute** the value of an optimal solution (typically in a bottom-up fashion).
  4. **Construct** an optimal solution from computed information.
  
  
1. **The structure of an optimal parenthesization**.

   For convenience, let us adopt the **notation** $A_{i..j}$, where $i \leq j$, for the matrix that results from evaluating the product $A_i A_{i+1} \cdots A_j$.
   
   The optimal substructure of this problem is as follows.
   
   Suppose that to optimally parenthesize $A_{i..j}$, we split the product between $A_k$ and $A_{k+1}$.
   
   Then the way we parenthesize the **prefix** subchain $A_{i..k}$ within this optimal parenthesization of $A_{i..j}$ must be an optimal parenthesization of $A_{i..k}$. 
   
   **Why**? If there were a less costly way to parenthesize $A_{i..k}$, then we could substitute that parenthesization in the optimal parenthesization of $A_{i..j}$ to produce another way to parenthesize $A_{i..j}$ whose cost was lower than the optimum: a contradiction. 
   
   A similar observation holds for how we parenthesize the **suffix** subchain $A_{k+1..j}$ in the optimal parenthesization of $A_{i..j}$: it must be an optimal parenthesization of $A_{k+1..j}$.
   


2. **A recursive solution**.

   Let $m[i,j]$ be the minimum number of scalar multiplications needed to compute the matrix $A_{i..j}$.
   
   We can define $m[i,j]$ recursively as follows.
   
   If $i = j$, the problem is trivial: the chain consists of just **one matrix** $A_{i..i} = A_i$, thus $m[i,i] = 0$ for $i = 1, 2, ..., n$.
   
   To compute $m[i, j]$ when $i < j,$ we **take advantage** of the structure of an **optimal solution** from **step 1**.
   
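   Let us assume that to optimally parenthesize, we split the product $A_{i..j}$ between $A_k$ and $A_{k+1}$, where $i \leq k < j$. 
   
   Then, $m[i,j]$ **equals** the minimum **cost** for **computing the subproducts** $A_{i..k}$ and $A_{k+1..j}$, **plus** the **cost** of **multiplying these two matrices** together. Since $A_{i..k}$ is a $p_{i-1} \times p_k$ matrix and $A_{k+1..j}$ is a $p_k \times p_j$ matrix, this last multiplication takes $p_{i-1} p_k p_j$ scalar multiplications: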
   
   $$m[i,j] = m[i,k] + m[k+1, j] + p_{i-1} p_k p_j.$$
   
   Thus, our recursive definition for the minimum cost of parenthesizing the product $A_{i..j}$ becomes:
   
   $$m[i,j] = 
\left\{\begin{matrix}
0 & \text{ if } i = j,\\ 
\min_{i \leq k < j} \{m[i,k] + m[k+1, j] + p_{i-1} p_k p_j\}& \text{ if } i < j. 
\end{matrix}\right.$$ 
 
   The $m[i,j]$ values give the costs of optimal solutions to subproblems, but they do not provide all the information we need to construct an optimal solution. 
  
   To help us do so, we define $s[i,j]$ to be a value of $k$ at which we split the product $A_{i..j}$ in an optimal parenthesization. 
   
   That is, $s[i,j]$ equals a value $k$ such that $m[i,j] = m[i,k] + m[k+1, j] + p_{i-1} p_k p_j$.
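
The recurrence for $m[i,j]$ translates almost literally into a recursive function. The sketch below is a top-down (memoized) variant, shown only to illustrate the recurrence; the name `min_multiplications` is an illustrative assumption. Caching with `functools.lru_cache` ensures that each subproblem $(i, j)$ is solved only once, even though many different splits share it.

```python
from functools import lru_cache

def min_multiplications(p):
    """Minimum scalar multiplications for a chain with dimensions p[0..n]."""
    n = len(p) - 1  # number of matrices; A_i has dimension p[i-1] x p[i]

    @lru_cache(maxsize=None)
    def m(i, j):
        if i == j:
            return 0  # a single matrix needs no multiplication
        # Try every split A_i..A_k | A_{k+1}..A_j and keep the cheapest.
        return min(m(i, k) + m(k + 1, j) + p[i - 1] * p[k] * p[j]
                   for k in range(i, j))

    return m(1, n)

# The three-matrix example from above: 10x100, 100x5, 5x50.
print(min_multiplications([10, 100, 5, 50]))  # 7500
```

The result agrees with the cheaper of the two parenthesizations computed by hand for this chain.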
   
   
3. **Computing the optimal costs**.

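   Instead of evaluating the recurrence for $m[i,j]$ recursively, we **compute** the **optimal costs** with a **bottom-up approach**: a direct recursive evaluation would solve the same subproblems over and over, whereas the tabular method solves each of the $\Theta(n^2)$ subproblems once, in order of increasing chain length.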
   
   As an example, we will generate matrices of the following sizes:
   
| Matrix    |      $A_1$     |      $A_2$     |     $A_3$     |     $A_4$     |      $A_5$     |      $A_6$     |
|-----------|:--------------:|:--------------:|:-------------:|:-------------:|:--------------:|:--------------:|
| Dimension | $30 \times 35$ | $35 \times 15$ | $15 \times 5$ | $5 \times 10$ | $10 \times 20$ | $20 \times 25$ |

In [8]:
import numpy as np

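def matrixChainOrder(p):
    """Return the cost table m and the split table s for dimensions p[0..n]."""
    n = len(p) - 1                   # number of matrices in the chain
    m = np.zeros((n+1, n+1))         # m[i,j]: minimum cost of A_i..A_j (1-based)
    s = np.zeros((n+1, n+1))         # s[i,j]: optimal split point k
    for l in range(2, n+1):          # l is the chain length
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i,j] = np.inf          # np.infty was removed in NumPy 2.0
            for k in range(i, j):    # try every split A_i..A_k | A_{k+1}..A_j
                q = m[i,k] + m[k+1,j] + p[i-1] * p[k] * p[j]
                if q < m[i,j]:
                    m[i,j] = q
                    s[i,j] = k
    return m, s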

In [9]:
A1 = np.random.rand(30, 35)
A2 = np.random.rand(35, 15)
A3 = np.random.rand(15,  5)
A4 = np.random.rand( 5, 10)
A5 = np.random.rand(10, 20)
A6 = np.random.rand(20, 25)

B = [A1, A2, A3, A4, A5, A6]

# p[0] is the row count of A1; p[i] is the column count of A_i.
p = [B[0].shape[0]] + [M.shape[1] for M in B]
    
m, s = matrixChainOrder(p)
print(m)
print(s)

[[    0.     0.     0.     0.     0.     0.     0.]
 [    0.     0. 15750.  7875.  9375. 11875. 15125.]
 [    0.     0.     0.  2625.  4375.  7125. 10500.]
 [    0.     0.     0.     0.   750.  2500.  5375.]
 [    0.     0.     0.     0.     0.  1000.  3500.]
 [    0.     0.     0.     0.     0.     0.  5000.]
 [    0.     0.     0.     0.     0.     0.     0.]]
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 3. 3. 3.]
 [0. 0. 0. 2. 3. 3. 3.]
 [0. 0. 0. 0. 3. 3. 3.]
 [0. 0. 0. 0. 0. 4. 5.]
 [0. 0. 0. 0. 0. 0. 5.]
 [0. 0. 0. 0. 0. 0. 0.]]


<center><img src="images/S6_Tables.png" width="900" alt="Example" /></center>
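
As a worked check of one table entry, consider $m[2,5]$ for the dimensions above, where $p = \langle 30, 35, 15, 5, 10, 20, 25 \rangle$. The recurrence considers the three splits $k = 2, 3, 4$ and uses entries already present in the printed table:

$$m[2,5] = \min
\left\{\begin{matrix}
m[2,2] + m[3,5] + p_1 p_2 p_5 = 0 + 2500 + 35 \cdot 15 \cdot 20 = 13000, \\
m[2,3] + m[4,5] + p_1 p_3 p_5 = 2625 + 1000 + 35 \cdot 5 \cdot 20 = 7125, \\
m[2,4] + m[5,5] + p_1 p_4 p_5 = 4375 + 0 + 35 \cdot 10 \cdot 20 = 11375
\end{matrix}\right\} = 7125.$$

The minimum is attained at $k = 3$, which matches the entries $m[2,5] = 7125$ and $s[2,5] = 3$ in the output.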

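4. **Constructing an optimal solution**.

   Although the `matrixChainOrder` procedure determines the optimal number of scalar multiplications needed to compute a matrix-chain product, it does not directly show how to multiply the matrices. The table $s$ holds the information we need: each entry $s[i,j]$ records the split point $k$ of an optimal parenthesization of $A_{i..j}$.
   
   The recursive procedure `printOptimalParenthesis` reads the splits from $s$; the initial call `printOptimalParenthesis(s, 1, n)` prints an optimal parenthesization of $\left \langle A_1, A_2, ..., A_n  \right \rangle $: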

In [10]:
s

array([[0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 1., 3., 3., 3.],
       [0., 0., 0., 2., 3., 3., 3.],
       [0., 0., 0., 0., 3., 3., 3.],
       [0., 0., 0., 0., 0., 4., 5.],
       [0., 0., 0., 0., 0., 0., 5.],
       [0., 0., 0., 0., 0., 0., 0.]])

In [11]:
import numpy as np

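def printOptimalParenthesis(s, i, j):
    """Print an optimal parenthesization of A_i..A_j using the split table s."""
    if i == j:
        print(f"A{i}", end=' ')
    else:
        k = int(s[i,j])  # optimal split point; s stores floats, so cast to int
        print("(", end=' ')
        printOptimalParenthesis(s, i, k)
        printOptimalParenthesis(s, k + 1, j)
        print(")", end=' ')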

printOptimalParenthesis(s, 1, 6)

( ( A1 ( A2 A3 ) ) ( ( A4 A5 ) A6 ) ) 
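
The split table can also be used to actually carry out the multiplication in the optimal order. The following self-contained sketch assumes the same dimensions as the example; the names `matrix_chain_order` and `multiply_chain` are illustrative, and the tables are stored as plain Python lists of integers rather than NumPy arrays. The result is compared against a plain left-to-right product.

```python
import numpy as np

def matrix_chain_order(p):
    """Bottom-up computation of the cost table m and split table s."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

def multiply_chain(matrices, s, i, j):
    """Multiply A_i ... A_j (1-based) following the split table s."""
    if i == j:
        return matrices[i - 1]
    k = s[i][j]
    return multiply_chain(matrices, s, i, k) @ multiply_chain(matrices, s, k + 1, j)

rng = np.random.default_rng(0)
dims = [30, 35, 15, 5, 10, 20, 25]
matrices = [rng.random((dims[t], dims[t + 1])) for t in range(len(dims) - 1)]

m, s = matrix_chain_order(dims)
product = multiply_chain(matrices, s, 1, len(matrices))

# Reference result: plain left-to-right multiplication.
reference = matrices[0]
for M in matrices[1:]:
    reference = reference @ M

print(product.shape)                    # (30, 25)
print(np.allclose(product, reference))  # True, up to floating-point rounding
print(m[1][len(matrices)])              # 15125
```

Matrix multiplication is associative, so both orders give the same product up to rounding; only the number of scalar multiplications differs.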

<h1 align="center">End of Seminar</h1>