# [Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming - Week4](https://www.coursera.org/learn/algorithms-greedy/home/week/4)

- The Knapsack Problem
- Sequence Alignment

## XXVI. The Knapsack Problem

### [The Knapsack Problem](https://www.coursera.org/learn/algorithms-greedy/lecture/LIgLJ/the-knapsack-problem)

#### Problem Definition

![](https://i.imgur.com/PNIZ1vG.png)
- (video unwatched)
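- **Q: Is "the sizes $w_i$ and the capacity $W$ are integers" a necessary condition for the algorithm to work?**
    - A: Yes; the reason becomes clear later on.
- We can derive a DP solution to this problem the same way we did for last week's Weighted Independent Set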

#### Developing a Dynamic Programming Algorithm

![](https://i.imgur.com/9OSxgsX.png)
- (video unwatched)
- (The following seems to be a general-purpose recipe? meta-knowledge)
- Spoiler:
    - Step 1: Formulate a recurrence [optimal solution as a function of solutions to "smaller subproblems"] based on the structure of an optimal solution
    - Step 2: Identify the subproblems
    - Step 3: Use the recurrence from Step 1 to systematically solve all subproblems.
- Let's start with Step 1.
- Let $S$ = an optimal solution (of an arbitrary instance?)
- Then the optimal solution $S$ falls into one of two cases
- Case 1: item $n\notin S$
    - Then $S$ must also be optimal for the first $n-1$ items (with capacity $W$)
        - Because if some solution $S^*$ for the first $n-1$ items had higher value than $S$, then $S^*$ would also be feasible for all $n$ items and would beat $S$ [contradiction]
- Case 2: item $n\in S$
    - Then?

#### Optimal Substructure

![](https://i.imgur.com/g1LEqCr.png)
- (video unwatched)
- [ ] is an optimal solution with respect to the 1st $n-1$ items and capacity $W$
- [ ] is an optimal solution with respect to the 1st $n-1$ items and capacity $W-v_n$
- [ ] is an optimal solution with respect to the 1st $n-1$ items and capacity $W-w_n$
- [ ] might not be feasible for capacity $W-w_n$

Ans: <font style="opacity:.05">$S-\{n\}$ is an optimal solution with respect to the 1st $n-1$ items and capacity $W-w_n$</font>

Proof: If some $S^*$ has higher value than $S-\{n\}$ and total size $\le W-w_n$, then $S^*\cup\{n\}$ has size $\le W$ and value more than $S$ [contradiction]

### [A Dynamic Programming Problem](https://www.coursera.org/learn/algorithms-greedy/lecture/0n68L/a-dynamic-programming-algorithm)

#### Recurrence from Last Time

![](https://i.imgur.com/0GJeF7F.png)
- (video unwatched)
- Compared with the earlier WIS DP, we have one extra variable: the total size
- Edge case: if $w_i$ is already larger than the size constraint $x$, then $V_{i,x}$ must equal $V_{(i-1),x}$
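
For reference, the recurrence described above can be written out as follows. This is a sketch under the assumption (the slide itself is an image) that $V_{i,x}$ denotes the best value achievable using only the first $i$ items with residual capacity $x$, and that $V_{0,x}=0$ for every $x$:

$$
V_{i,x}=
\begin{cases}
V_{i-1,x} & \text{if } w_i>x\\
\max\{\,V_{i-1,x},\ v_i+V_{i-1,x-w_i}\,\} & \text{otherwise}
\end{cases}
$$

The first branch is the edge case from the notes; the two arguments of the $\max$ correspond to Case 1 (item $i$ excluded) and Case 2 (item $i$ included).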

#### The Subproblems

![](https://i.imgur.com/4j2wabr.png)
- (video unwatched)
- Now for Step 2: Identify the subproblems.
    - Let's list all the distinct subproblems
    - All possible prefixes of items: $\{1,2,...,i\}$
        - **Q: Shouldn't this be $1,2,...,n$?**
    - All possible residual capacities $x\in\{0,1,2,...,W\}$
        - **Note: this only works because the capacity and the sizes in the problem are integers!**
- Finally, Step 3: Use the recurrence from Step 1 to systematically solve all subproblems.
    - Use the conclusions of Steps 1 and 2 to write the pseudocode!
    - Note: in the case $w_i>x$ mentioned earlier, the index $x-w_i$ would be negative and cause an error, so it presumably needs separate handling
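
The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the lecture's exact pseudocode: the function name `knapsack`, its 0-based item indices, and the sample instance are assumptions made here, and the sizes and capacity are assumed to be non-negative integers.

```python
def knapsack(values, sizes, capacity):
    """Return (optimal value, sorted 0-based indices of one optimal item set).

    Assumes integer sizes and an integer capacity, as the notes require.
    """
    n = len(values)
    # A[i][x] = best value using only the first i items with residual capacity x.
    A = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], sizes[i - 1]
        for x in range(capacity + 1):
            A[i][x] = A[i - 1][x]  # Case 1: item i is excluded.
            if w <= x:  # Guard: skip Case 2 when x - w would be negative.
                A[i][x] = max(A[i][x], v + A[i - 1][x - w])  # Case 2: included.

    # Walk back through the table to recover one optimal item set.
    chosen, x = [], capacity
    for i in range(n, 0, -1):
        if A[i][x] != A[i - 1][x]:  # The value changed, so item i was taken.
            chosen.append(i - 1)
            x -= sizes[i - 1]
    return A[n][capacity], sorted(chosen)


best_value, items = knapsack([3, 2, 4, 4], [4, 3, 2, 3], 6)
print(best_value, items)
```

The table has $(n+1)(W+1)$ entries and each is filled in constant time, which matches the $\Theta(nW)$ bound in the quiz below.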

#### Running Time
Question: What is the running time of this algorithm?
- (video unwatched)
- [ ] $\Theta(n^2)$
- [ ] $\Theta(nW)$
- [ ] $\Theta(n^2W)$
- [ ] $\Theta(2^n)$

Ans: <font style="opacity:.05">$\Theta(nW)$ ($\Theta(nW)$ subproblems, solve each in $\Theta(1)$ time)</font>

Correctness: proof by induction [use the Step 1 argument to justify the inductive step]
<!-- ![](https://i.imgur.com/005I4TB.png) -->

### [Example [Review - Optional]](https://www.coursera.org/learn/algorithms-greedy/lecture/LADQc/example-review-optional)

- watch later

## XXVII. Sequence Alignment

### [Optimal Substructure](https://www.coursera.org/learn/algorithms-greedy/lecture/QJkyp/optimal-substructure)

#### Problem Definition

![](https://i.imgur.com/UyzCFwC.png)
- (video unwatched)
- A quick review of the earlier sequence alignment problem
- Input: $X$ of length $m$, $Y$ of length $n$
- We want to minimize the total penalty
    - Note that different kinds of mismatch can carry different penalties
        - $\alpha_{ab}$ is the penalty for matching $a$ with $b$. If $a=b$ then $\alpha_{ab}=0$


#### A Dynamic Programming Approach

<!-- ![](https://i.imgur.com/aRDTHlV.png) -->
![](https://i.imgur.com/wmZYxvX.png)
- (video unwatched)

##### Question

![](https://i.imgur.com/5WMIueK.png)
- Suppose we have already found an optimal solution. How many possibilities are there for the final position of the two sequences?
- [ ] $2$
- [ ] $3$
- [ ] $4$
- [ ] $mn$

Ans: <font style="opacity:.1">3 cases: $(x_m,y_n)$, $(x_m,\textrm{gap})$, $(\textrm{gap},y_n)$</font>

#### Optimal Substructure

![](https://i.imgur.com/0ghCTbC.png)
- (video unwatched)
- Let $X'$ be $X$ with $x_m$ removed, and let $Y'$ be $Y$ with $y_n$ removed
- Then the optimal substructure likewise splits into 3 cases, one per possible final position

#### Optimal Substructure (Proof)

![](https://i.imgur.com/2Si3Tb7.png)
- (video unwatched)
- The proof is fairly intuitive, and the same argument applies to all 3 cases

### [A Dynamic Programming Problem](https://www.coursera.org/learn/algorithms-greedy/lecture/tNmae/a-dynamic-programming-algorithm)

#### The Subproblems

![](https://i.imgur.com/y8bzUtu.png)
- (video unwatched)
- So every subproblem can be written in the form $(X_i,Y_j)$, because we form subproblems by chopping letters off the right end one at a time!
    - $X_i$ is the first $i$ letters of $X$
    - $Y_j$ is the first $j$ letters of $Y$


#### The Recurrence

![](https://i.imgur.com/IE4KJ0s.png)
- (video unwatched)
- Notation: $P_{ij}$ is the penalty of an optimal alignment of $X_i$ and $Y_j$
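
Written out, the recurrence implied by the three final-position cases is likely the following. This is a sketch that assumes a single gap penalty $\alpha_{gap}$ (the symbol used in the base-case quiz) and $i,j\ge 1$:

$$
P_{ij}=\min
\begin{cases}
\alpha_{x_iy_j}+P_{i-1,j-1} & \text{($x_i$ matched with $y_j$)}\\
\alpha_{gap}+P_{i-1,j} & \text{($x_i$ matched with a gap)}\\
\alpha_{gap}+P_{i,j-1} & \text{($y_j$ matched with a gap)}
\end{cases}
$$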

#### Quiz: Base Cases

Question: What are the values of $P_{i,0}$ and $P_{0,i}$?
- (video unwatched)
- [ ] $0$
- [ ] $i\cdot \alpha_{gap}$
- [ ] $+\infty$
- [ ] undefined

#### The Algorithm

![](https://i.imgur.com/UcwAC8M.png)
- (video unwatched)
- Yep, intuitive
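
As a rough illustration of the table-filling algorithm, here is a Python sketch. The function name `alignment_table`, the callable `alpha` standing in for $\alpha_{ab}$, and the sample penalties (gap 1, mismatch 2) are assumptions made for this example and are not taken from the slide.

```python
def alignment_table(X, Y, gap, alpha):
    """Fill the DP table for sequence alignment.

    A[i][j] is the minimum penalty of aligning the first i letters of X
    with the first j letters of Y. `alpha(a, b)` is the penalty for matching
    letter a with letter b; `gap` is the penalty for one gap.
    """
    m, n = len(X), len(Y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    # Base cases: aligning a prefix against the empty string needs only gaps.
    for i in range(m + 1):
        A[i][0] = i * gap
    for j in range(n + 1):
        A[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = min(
                alpha(X[i - 1], Y[j - 1]) + A[i - 1][j - 1],  # x_i with y_j
                gap + A[i - 1][j],                            # x_i with a gap
                gap + A[i][j - 1],                            # y_j with a gap
            )
    return A


def mismatch(a, b):
    # Hypothetical penalty function: 0 for equal letters, 2 otherwise.
    return 0 if a == b else 2


table = alignment_table("AGGGCT", "AGGCA", 1, mismatch)
print(table[-1][-1])  # penalty of an optimal alignment of the full strings
```

There are $(m+1)(n+1)$ entries and each takes constant time, so filling the table costs $O(mn)$.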

#### Reconstructing a Solution

![](https://i.imgur.com/1h1PaFB.png)
- (video unwatched)
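- As before, we want to reconstruct an optimal solution from the optimal value
- Starting from $A[m,n]$, use the table to check which case produced each entry, then emit the corresponding match or gap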
- running time: $O(m+n)$
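
The traceback can be sketched like this. The block is self-contained, so it refills the table first; the name `align`, the `-` gap character, and the sample penalties are assumptions for illustration. Integer penalties are assumed so that the equality checks in the traceback are exact.

```python
def align(X, Y, gap, alpha):
    """Return (penalty, aligned X, aligned Y) for one optimal alignment."""
    m, n = len(X), len(Y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = i * gap
    for j in range(n + 1):
        A[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = min(alpha(X[i - 1], Y[j - 1]) + A[i - 1][j - 1],
                          gap + A[i - 1][j],
                          gap + A[i][j - 1])

    # Trace back from A[m][n], checking which case produced each entry.
    top, bottom = [], []
    i, j = m, n
    while i > 0 and j > 0:
        if A[i][j] == alpha(X[i - 1], Y[j - 1]) + A[i - 1][j - 1]:
            top.append(X[i - 1]); bottom.append(Y[j - 1])
            i -= 1; j -= 1
        elif A[i][j] == gap + A[i - 1][j]:
            top.append(X[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(Y[j - 1])
            j -= 1
    # One string is exhausted: pad the rest of the other with gaps.
    while i > 0:
        top.append(X[i - 1]); bottom.append("-")
        i -= 1
    while j > 0:
        top.append("-"); bottom.append(Y[j - 1])
        j -= 1
    return A[m][n], "".join(reversed(top)), "".join(reversed(bottom))


penalty, row_x, row_y = align("AGGGCT", "AGGCA", 1,
                              lambda a, b: 0 if a == b else 2)
print(penalty)
print(row_x)
print(row_y)
```

Each traceback step decreases $i$ or $j$ (or both), which is where the $O(m+n)$ bound comes from. When several cases tie, this sketch prefers the match case, so other optimal alignments may exist.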

## XXVIII. Optimal Search Trees

### [Problem Definition](https://www.coursera.org/learn/algorithms-greedy/lecture/GKCeN/problem-definition)

#### Multiplicity of Search Trees

![](https://i.imgur.com/GbW0REG.png)
- (video unwatched)
- Generally speaking, a good search tree is a balanced search tree

#### Exploiting Non-Uniformity

<!-- ![](https://i.imgur.com/Q5nKFoq.png) -->
![](https://i.imgur.com/SnJwI2I.png)
- (video unwatched)
- But what if the keys' access frequencies are non-uniform?

Ans: <font style="opacity:.1">1.9 and 1.3</font>

#### Problem Definition

![](https://i.imgur.com/sDpGtyX.png)
- (video unwatched)
- Here we assume the items' keys are $1,2,...,n$ (I think?)
- objective function: $C(T)=\sum_{\textrm{items }i}p_i\cdot[1+\textrm{depth of }i\textrm{ in }T]$
- Example: if $T$ is a red-black tree, then $C(T)=O(\log n)$
    - Reasoning: because <font style="opacity:.2">the $p_i$ sum to 1 and each search time is at most $c\log n$, the summation is $\le\sum_{\textrm{items }i}p_i\cdot c\log n=c\log n$</font>
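
To make the objective concrete, here is a small Python sketch that evaluates $C(T)$ for a given tree. The nested-tuple tree encoding `(key, left, right)`, the function name `weighted_search_cost`, and the frequencies 0.8, 0.1, 0.1 are assumptions chosen for illustration; with these frequencies the two trees below give costs of about 1.9 and 1.3.

```python
def weighted_search_cost(tree, freq, depth=0):
    """C(T) = sum over keys of freq[key] * (1 + depth of key in T).

    `tree` is None (empty) or a tuple (key, left_subtree, right_subtree).
    """
    if tree is None:
        return 0.0
    key, left, right = tree
    return (freq[key] * (depth + 1)
            + weighted_search_cost(left, freq, depth + 1)
            + weighted_search_cost(right, freq, depth + 1))


# Hypothetical frequencies for keys 1, 2, 3.
freq = {1: 0.8, 2: 0.1, 3: 0.1}
balanced = (2, (1, None, None), (3, None, None))   # root 2
chain = (1, None, (2, None, (3, None, None)))      # root 1, then 2, then 3

print(weighted_search_cost(balanced, freq))
print(weighted_search_cost(chain, freq))
```

The unbalanced chain is cheaper here because the heavily accessed key sits at the root, which is the point of exploiting non-uniform frequencies.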

#### Comparison with Huffman Codes

![](https://i.imgur.com/n0ZWAXz.png)
- (video unwatched)
- Similarities
    - Both output a binary tree
    - Both objectives minimize an average depth (the average code length, in the Huffman case)
- Differences
    - The constraints differ:
        - Huffman codes: symbols appear only at the leaves
        - Optimal BST: the tree must satisfy the search tree property

### [Optimal Substructure](https://www.coursera.org/learn/algorithms-greedy/lecture/rUDLu/optimal-substructure)

#### Problem Definition

![](https://i.imgur.com/nIUFJ6p.png)
- (video unwatched)
- just review

#### Greedy Doesn't Work

<!-- ![](https://i.imgur.com/GfYnYER.png) -->
![](https://i.imgur.com/IXPD31v.png)
- (video unwatched)
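- Intuition: the closer a key is to the root, the more frequently it should be accessed
    - **Note: the search tree property must be maintained at the same time**
- Purple numbers are keys; green numbers are frequencies
- Neither bottom-up nor top-down is guaranteed to produce an optimal solution
    - slide typo: the frequency of node 4 in the upper tree is 1(%), not 2(%)
    - bottom-up approach (top left): places the lowest-frequency keys at the bottom first => does not work
        - So what we did for Huffman codes cannot be applied directly here
    - top-down approach (bottom left): places the highest-frequency key at the top first => does not work
- (No known greedy algorithm solves the optimal BST problem)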

#### Choosing the Root

![](https://i.imgur.com/SU1O2hw.png)
- (video unwatched)
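- In short, different choices of root lead to different, hard-to-predict consequences
- This stymies both the greedy and the divide & conquer approaches
- But suppose we knew the optimal root; then we could recursively compute the left and right subtrees
    - Sound familiar? ~~No~~
    - Use DP to try all the possibilities?
- Note: repercussion = a (bad) effect; an adverse consequence
- Note: stymie = to stop; to hinder; to thwart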

#### Optimal Substructure

![](https://i.imgur.com/IrxTrpK.png)
- (video unwatched)
- **meta-knowledge: once again we ask "what properties does an optimal solution have?" => for DP this means asking "how does an optimal solution relate to the optimal solutions of its subproblems?"**
- [ ] Neither $T_1$ nor $T_2$ need be optimal for the items it contains.
- [ ] At least one of $T_1,T_2$ is optimal for the items it contains.
- [ ] Each of $T_1, T_2$ is optimal for the items it contains.
- [ ] $T_1$ is optimal for the keys $\{1,2,...,r-1\}$ and $T_2$ for the keys $\{r+1,r+2,...,n\}$

Ans: <font style="opacity:.05">4 (3 is also correct but 4 is stronger statement)</font>

### [Proof of Optimal Substructure](https://www.coursera.org/learn/algorithms-greedy/lecture/0qjbs/proof-of-optimal-substructure)

#### Proof of Optimal Substructure

![](https://i.imgur.com/5GPctMW.png)
- (video unwatched)
- The proof idea feels fairly intuitive (proof by contradiction): assume some subtree of the optimal tree is not optimal for the keys it contains, then derive that the original tree is not optimal either [contradiction]

#### Proof of Optimal Substructure (cont'd)

![](https://i.imgur.com/RaOlrXX.png)
- (video unwatched)
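- Pay attention to the rewriting step: every key of $T_1$ and $T_2$ sits one level deeper in $T$ than in its own subtree, so each search time grows by 1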
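
The rewriting step can be spelled out as a short calculation. This is a sketch that assumes $T$ has root $r$, left subtree $T_1$ on keys $\{1,...,r-1\}$ and right subtree $T_2$ on keys $\{r+1,...,n\}$, and writes $d_T(i)$ (a symbol introduced here) for the depth of key $i$ in tree $T$:

$$
\begin{aligned}
C(T)&=p_r\cdot 1+\sum_{i=1}^{r-1}p_i\,[1+d_T(i)]+\sum_{i=r+1}^{n}p_i\,[1+d_T(i)]\\
&=p_r+\sum_{i=1}^{r-1}p_i\,[2+d_{T_1}(i)]+\sum_{i=r+1}^{n}p_i\,[2+d_{T_2}(i)]\\
&=\sum_{i=1}^{n}p_i+C(T_1)+C(T_2)
\end{aligned}
$$

Since $\sum_i p_i$ does not depend on the shape of the subtrees, replacing $T_1$ (or $T_2$) with a cheaper tree on the same keys would lower $C(T)$, which is the contradiction the proof relies on.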

### [A Dynamic Programming Algorithm I](https://www.coursera.org/learn/algorithms-greedy/lecture/3wrTN/a-dynamic-programming-algorithm-i)

#### Optimal Substructure

![](https://i.imgur.com/SlPO4xT.png)
- (video unwatched)
- From the earlier derivation we obtain the Optimal Substructure Lemma
    - If $T$ is an optimal BST, then its left and right subtrees are also optimal BSTs for the keys they contain

#### Relevant Subproblems

![](https://i.imgur.com/7xSpNnC.png)
- (video unwatched)
- [ ] Prefixes ($S=\{1,2,...,i\}$ for every $i$)
- [ ] Prefixes and suffixes ($S=\{1,...,i\}$ and $\{i,...,n\}$ for every $i$)
- [ ] Contiguous intervals ($S=\{i,i+1,...,j-1,j\}$ for every $i\le j$)
- [ ] All subsets $S$

Ans: <font style="opacity:.05">Contiguous intervals</font>

#### The Recurrence

![](https://i.imgur.com/LjdXYmc.png)
- (video unwatched)
- Recalling the formula for $C(T)$ should be enough to write down the recurrence for $C_{ij}$
- Q: Not sure yet what the "Correctness" remark is trying to say; skipping it for now
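
For reference, the recurrence that follows from the formula for $C(T)$ and the optimal substructure lemma is likely the following. This sketch assumes $C_{ij}$ denotes the cost of an optimal BST on the contiguous keys $\{i,...,j\}$ and that $C_{xy}=0$ whenever $x>y$ (an empty tree):

$$
C_{ij}=\min_{r=i,...,j}\left\{\sum_{k=i}^{j}p_k+C_{i,r-1}+C_{r+1,j}\right\}
$$

The minimum tries every key $r$ in the interval as the root; both remaining subproblems are again contiguous intervals.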

### [A Dynamic Programming Algorithm II](https://www.coursera.org/learn/algorithms-greedy/lecture/5ERYG/a-dynamic-programming-algorithm-ii)

#### The Algorithm

<!-- ![](https://i.imgur.com/MSudnKl.png) -->
![](https://i.imgur.com/OGLksAI.png)
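- $A[i,j]$ is the value of an optimal BST for items $i$ through $j$
- The outer loop index $s$ is the subproblem size (more precisely, $s=j-i$)
- The inner loop index $i$ is the first item of the subproblem
- **If the pseudocode is hard to follow, try picturing it first** (meta-knowledge)
- Pictorially (a visual way to look at this solution)
    - $i$ is the horizontal axis; $j$ is the vertical axis
    - When $i=j$, fill in $p_i$ directly
    - The lower-right region can be filled with $0$, because there $i>j$
    - Looking back at the pseudocode, the loops sweep the table diagonal by diagonal
        - The first outer-loop iteration solves all the $A[i,i]$.
        - The second outer-loop iteration solves all the $A[i,i+1]$, and so on
        - **Q: Seen this way, the pseudocode's inner loop seems to run more than necessary? It looks like `For i = 1 to (n-s)` should be enough**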
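
The diagonal sweep can be sketched in Python as follows. This is an illustration under assumptions, not a transcription of the slide: the function name `optimal_bst_cost` is made up here, the inner loop uses the tighter bound `i = 1 .. n-s`, and prefix sums are used to get each $\sum_{k=i}^{j}p_k$ in constant time.

```python
def optimal_bst_cost(p):
    """Return the minimum weighted search cost over all BSTs.

    p[k-1] is the frequency of key k, for keys 1..n in sorted order.
    """
    n = len(p)
    # prefix[k] = p_1 + ... + p_k, so sum(p_i..p_j) = prefix[j] - prefix[i-1].
    prefix = [0.0] * (n + 1)
    for k in range(n):
        prefix[k + 1] = prefix[k] + p[k]

    # A[i][j] uses 1-based i, j. Entries with i > j stay 0 (empty tree);
    # the extra row and column make A[i][i-1] and A[j+1][j] valid lookups.
    A = [[0.0] * (n + 2) for _ in range(n + 2)]
    for s in range(n):                 # s = j - i, one diagonal at a time
        for i in range(1, n - s + 1):  # only intervals that fit inside 1..n
            j = i + s
            total = prefix[j] - prefix[i - 1]
            # Try every key r in i..j as the root of the subtree.
            A[i][j] = total + min(A[i][r - 1] + A[r + 1][j]
                                  for r in range(i, j + 1))
    return A[1][n]


print(optimal_bst_cost([0.8, 0.1, 0.1]))
```

Even with the sums precomputed, the `min` over roots still takes time proportional to the interval length, so this sketch runs in $\Theta(n^3)$ time overall.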

#### Running Time

![](https://i.imgur.com/5nGArh9.png)
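- As it stands, the time complexity is $\Theta(n^3)$
- But it can actually be improved to $\Theta(n^2)$
    - **Q: I feel it would be enough to precompute $\sum_{k=i}^{i+s}p_k$ for every combination of $i,s$ in $\Theta(n^2)$ time, so why does the video keep mentioning BFS?**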