In today's notes, we are going to focus on the following: 

1. Strassen's Matrix Multiplication algorithm. 
2. Optimizing the implementation of the algorithm in Julia. 
3. Parallelizing the algorithm in Julia while keeping the efficiency. 
4. Benchmarking to show that the algorithm's complexity is indeed below $O(N^3)$, assuming the matrices are $N \times N$. 
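
As a rough guide to what the benchmark in item 4 should show, here is the standard cost recurrence, assuming $N$ is a power of 2 and writing $c$ for an unspecified constant covering the block additions. Each recursive step of the algorithm described in the next section performs 7 half-size multiplications plus $O(N^2)$ work for additions:

$$
T(N) = 7\,T(N/2) + cN^2 
\quad\Longrightarrow\quad
T(N) = O\left(N^{\log_2 7}\right) \approx O\left(N^{2.81}\right)
$$

The plain block product would instead need 8 half-size multiplications, giving $T(N) = 8\,T(N/2) + cN^2 = O(N^3)$.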

---
### Strassen's Matrix Multiplication Algorithm

The objective of the algorithm is to compute the product $C = AB$ of two matrices $A$ and $B$. The algorithm is recursive: at each step it partitions the matrices into four equally sized blocks: 

$$
A = \begin{bmatrix}
A_{1, 1} & A_{1, 2} \\
A_{2, 1} & A_{2, 2} 
\end{bmatrix}
\quad 
B = \begin{bmatrix}
B_{1, 1} & B_{1, 2}\\
B_{2, 1} & B_{2, 2}
\end{bmatrix}
\quad 
C = \begin{bmatrix}
C_{1, 1} & C_{1, 2}\\
C_{2, 1} & C_{2, 2}
\end{bmatrix}
$$

Then we use these submatrices to compute seven intermediate matrices:
$$M_1 = (A_{1, 1}+ A_{2,2})(B_{1, 1} + B_{2,2})$$
$$M_2 = (A_{2, 1} + A_{2,2})B_{1,1}$$
$$M_3 = A_{1,1}(B_{1,2} - B_{2,2})$$
$$M_4 = A_{2,2}(B_{2,1} - B_{1,1})$$
$$M_5 = (A_{1,1} + A_{1,2})B_{2,2}$$
$$M_6 = (A_{2,1} - A_{1,1})(B_{1,1} + B_{1,2})$$
$$M_7 = (A_{1,2} - A_{2,2})(B_{2,1} + B_{2,2})$$

Each intermediate matrix costs a single block multiplication, so there are only 7 multiplications in total instead of the 8 needed by the plain block product. From these 7 matrices we get the sub-matrices of $C$:

$$
C_{1, 1} = M_1 + M_4 - M_5 + M_7
$$
$$
C_{1, 2} = M_3 + M_5
$$
$$
C_{2, 1} = M_2 + M_4
$$
$$
C_{2,2} = M_1 - M_2 + M_3 + M_6
$$
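
The recursion above can be sketched in Julia as follows. This is a minimal, unoptimized illustration and not the tuned implementation these notes aim for: the function name `strassen` and the keyword `cutoff` are assumptions made for this example, the blocks are copied rather than viewed, and the function simply falls back to the built-in product when the size is odd or small.

```julia
using LinearAlgebra

# Sketch of the recursion for n×n inputs, intended for n a power of two.
# `cutoff` is a hypothetical tuning knob: at or below this size (or when the
# size is odd) we fall back to the built-in product.
function strassen(A::AbstractMatrix, B::AbstractMatrix; cutoff::Int = 64)
    n = size(A, 1)
    @assert size(A) == (n, n) && size(B) == (n, n) "expected two n×n matrices"
    if n <= cutoff || isodd(n)
        return A * B
    end

    h = n ÷ 2
    lo, hi = 1:h, (h + 1):n
    # Copy the four blocks of each matrix (simple, but allocates).
    A11, A12, A21, A22 = A[lo, lo], A[lo, hi], A[hi, lo], A[hi, hi]
    B11, B12, B21, B22 = B[lo, lo], B[lo, hi], B[hi, lo], B[hi, hi]

    rec(X, Y) = strassen(X, Y; cutoff = cutoff)

    # The seven block multiplications M1..M7.
    M1 = rec(A11 + A22, B11 + B22)
    M2 = rec(A21 + A22, B11)
    M3 = rec(A11, B12 - B22)
    M4 = rec(A22, B21 - B11)
    M5 = rec(A11 + A12, B22)
    M6 = rec(A21 - A11, B11 + B12)
    M7 = rec(A12 - A22, B21 + B22)

    # Combine them into the four blocks of C.
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return [C11 C12; C21 C22]
end
```

Calling `strassen(A, B; cutoff = 1)` forces the recursion all the way down to $1 \times 1$ blocks, which is useful for checking correctness against `A * B`; a larger cutoff is likely to be faster in practice.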

#### Complications: 
The matrices need to be equally partitioned at every level of the recursion, meaning that their size must be a power of two, $2^k \times 2^k$. Is it possible to bypass this? 

#### Considering Remainder of Partitioning

Suppose the matrices $A$ and $B$ have sizes $M\times N$ and $N\times K$, and assume that none of the dimensions $M, N, K$ is divisible by 2. Then partitioning leaves over the last row and the last column of both matrices, and we do not yet know how to deal with them. Collect these leftovers into the following two matrices: 

$$
\widehat{A} = \begin{bmatrix} 
0 & \cdots & 0 & a_{1, N} \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & a_{M-1, N} \\
a_{M, 1} & \cdots & a_{M, N-1} & a_{M, N}
\end{bmatrix}
\quad 
\widehat{B} = \begin{bmatrix} 
0 & \cdots & 0 & b_{1, K} \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & b_{N-1, K} \\
b_{N, 1} & \cdots & b_{N, K-1} & b_{N, K}
\end{bmatrix}
$$

The product $\widehat{A}\widehat{B}$ is just the outer product of the last column of the $A$ matrix, $A_{:, end}$, and the last row of the $B$ matrix, $B_{end, :}$, but with the last element replaced by the dot product of the last row of the $A$ matrix and the last column of the $B$ matrix. 

Let $T$ denote the outer product of the last column of matrix $A$ and the last row of matrix $B$, and let $G$ denote the inner product of the last row of $A$ and the last column of $B$. Then the product $\widehat{A} \widehat{B}$ can be written as: 

$$
\widehat{A}\widehat{B} = 
\begin{bmatrix}
T_{1, 1} & T_{1, 2} & \cdots & T_{1, K} \\ 
T_{2, 1} & T_{2, 2} & \cdots & T_{2, K} \\
\vdots & \vdots & \ddots & \vdots \\ 
T_{M, 1} & T_{M, 2} & \cdots & G
\end{bmatrix}
$$
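
The identity above is easy to check numerically. The sketch below uses two helper names invented for this example, `border` and `border_product`: the first builds $\widehat{A}$ from $A$ by zeroing everything except the last row and last column, the second evaluates $\widehat{A}\widehat{B}$ through $T$ and $G$ without any matrix-matrix product.

```julia
using LinearAlgebra

# Keep only the last row and last column of X; zero everything else.
function border(X::AbstractMatrix)
    Xh = zero(X)
    Xh[end, :] = X[end, :]
    Xh[:, end] = X[:, end]
    return Xh
end

# Product border(A) * border(B) computed from the outer product T and the
# inner product G only.
function border_product(A::AbstractMatrix, B::AbstractMatrix)
    # T: outer product of the last column of A and the last row of B (M×K).
    P = A[:, end] * transpose(B[end, :])
    # G: inner product of the last row of A and the last column of B.
    P[end, end] = sum(A[end, :] .* B[:, end])
    return P
end
```

For an $M \times N$ matrix `A` and an $N \times K$ matrix `B`, `border(A) * border(B)` and `border_product(A, B)` should agree, while the second costs only $O(MK + N)$ operations.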

---
#### Take notes
Notice that there are several cases in which the submatrices cannot be evenly partitioned, and we are going to make the argument simpler by considering block-structured matrices. 

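The block structure of the matrices $\widehat{A}$ and $\widehat{B}$ can be written as follows, where $b$ is a column vector of length $M - 1$, $a$ and $c$ are column vectors of length $N - 1$, $d$ is a column vector of length $K - 1$, and $x$, $y$ are scalars: 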

$$
\widehat{A}\widehat{B} =
\begin{bmatrix}
\mathbb{0} & b \\ 
a^T & x
\end{bmatrix}
\begin{bmatrix}
\mathbb{0} & c \\ 
d^T & y
\end{bmatrix}
=\begin{bmatrix}
bd^T & by \\
xd^T & a^Tc + xy
\end{bmatrix}
$$
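
A dimension check pins down which vectors pair up in this product: $b$ (length $M-1$) can only form an outer product with $d$ (length $K-1$) to fill the $(M-1)\times(K-1)$ corner, and $a$ can only be dotted with $c$ (both of length $N-1$). The following sketch assembles the result block by block; the function name `border_block_product` is an assumption made for this example, and a piece that is absent can be passed as zeros.

```julia
using LinearAlgebra

# Assemble [0 b; aᵀ x] * [0 c; dᵀ y] from its pieces without forming the
# zero blocks. Lengths: b is M-1, a and c are N-1, d is K-1; x, y are scalars.
function border_block_product(b, a, x, c, d, y)
    M, K = length(b) + 1, length(d) + 1
    T = promote_type(eltype(b), eltype(a), typeof(x),
                     eltype(c), eltype(d), typeof(y))
    P = Matrix{T}(undef, M, K)
    P[1:M-1, 1:K-1] = b * transpose(d)   # top-left:     b dᵀ
    P[1:M-1, K]     = b * y              # top-right:    b y
    P[M, 1:K-1]     = x * d              # bottom-left:  x dᵀ
    P[M, K]         = sum(a .* c) + x * y  # bottom-right: aᵀc + xy
    return P
end
```

Only vector operations are involved, so this correction is cheap compared with the recursive multiplications.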


This block product enters our algorithm whenever the matrices cannot be evenly partitioned. We set the vectors and scalars to the values we want, or to zero if the corresponding piece is absent; we then use the result above to compute the product and add the value back. 

Note: this fix for uneven partitioning has to be applied at each recursive step of the algorithm. 





