# Homework 1
**Note 1:** Running this code requires the `matrix.py` file that comes with the notebook; the original version of that file does not implement Gauss multiplication.

**Note 2:** The solution of each exercise is given in the markdown text and also in the code comments; comments are handier for pointing out important choices in the code.

## Ex 1
The implementation of Strassen's algorithm for matrix multiplication follows the steps seen in the lessons. The first case considers $2^n \times 2^n$ matrices. The implementation includes a little additional code that checks whether the matrices satisfy this shape condition.

In [1]:
from matrix import *
from random import random
#implementation of Base strassen matrix multiplication
def isPwr2(x): 
    #uses the fact that a power of 2 in binary has one 1 and the remaining digits are 0
    #so 16 = 10000, 32 = 100000 and so on 
    #and eg. 16 - 1 = 15 = 01111 this holds for every 2**n
    #so bitwise 16 & 15 = 10000 & 01111 = 00000
    #so not(16 & 15) returns True
    return not(x & (x - 1))

def strassen_matrix_mult(A: Matrix, B: Matrix) -> Matrix:
    if A.num_of_cols != B.num_of_rows:
        raise ValueError("Wrong matrix shape: number of columns of A is %d, number of rows of B is %d"
                         %(A.num_of_cols, B.num_of_rows) )
        
    #reject anything that is not a pair of square matrices whose size is a power of 2
    if A.num_of_cols != A.num_of_rows or B.num_of_cols != B.num_of_rows or not isPwr2(A.num_of_cols):
        raise NotImplementedError("This implementation deals only with products of SQUARE matrices whose size is a power of 2; use GEN_strassen_matrix_mult instead")
    
    #Base case
    if A.num_of_cols < 32:
        return gauss_matrix_mult(A,B)
    
    #quadrant subdivision
    n_half = A.num_of_cols//2
    
    A11 = A.submatrix(0, n_half, 0, n_half)
    A21 = A.submatrix(n_half, n_half, 0, n_half)
    A12 = A.submatrix(0, n_half, n_half, n_half)
    A22 = A.submatrix(n_half, n_half, n_half, n_half)
    
    B11 = B.submatrix(0, n_half, 0, n_half)
    B21 = B.submatrix(n_half, n_half, 0, n_half)
    B12 = B.submatrix(0, n_half, n_half, n_half)
    B22 = B.submatrix(n_half, n_half, n_half, n_half)
        
    S1 = B12 - B22
    S2 = A11 + A12
    S3 = A21 + A22
    S4 = B21 - B11
    S5 = A11 + A22
    S6 = B11 + B22
    S7 = A12 - A22
    S8 = B21 + B22
    S9 = A11 - A21
    S10 = B11 + B12
    
    P1 = strassen_matrix_mult(A11,S1)
    P2 = strassen_matrix_mult(S2,B22)
    P3 = strassen_matrix_mult(S3,B11)
    P4 = strassen_matrix_mult(A22,S4)
    P5 = strassen_matrix_mult(S5,S6)
    P6 = strassen_matrix_mult(S7,S8)
    P7 = strassen_matrix_mult(S9,S10)
    
    C11 = P5 + P4 - P2 + P6
    C12 = P1 + P2
    C21 = P3 + P4
    C22 = P5 + P1 - P3 - P7
    
    C = Matrix([[0 for j in range(B.num_of_cols)] for i in range(A.num_of_rows)])
    
    C.assign_submatrix(0,0,C11)
    C.assign_submatrix(n_half,0,C21)
    C.assign_submatrix(0,n_half,C12)
    C.assign_submatrix(n_half,n_half,C22)
    
    return C
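
One detail of the bit trick used by `isPwr2`: it also accepts `x = 0`, because `0 & -1` is `0`. The sketch below shows a stricter variant; the name `is_power_of_two` is hypothetical and is not part of `matrix.py` or of the notebook.

```python
def is_power_of_two(x: int) -> bool:
    """Return True only for 1, 2, 4, 8, ... (hypothetical stricter variant of isPwr2)."""
    # x & (x - 1) clears the lowest set bit, so it is 0 only when x has a single set bit.
    # The extra x > 0 test rejects 0 and negative numbers.
    return x > 0 and (x & (x - 1)) == 0


print([n for n in range(20) if is_power_of_two(n)])  # prints [1, 2, 4, 8, 16]
```

For the matrix sizes used in this notebook the two versions agree, since a matrix never has 0 columns here.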



In [2]:
"""
Checking for the correctness of the result obtained with Strassen's alg.
"""
nc = 128
nr = 128

A = Matrix([[random() for j in range(nc)] for i in range(nr)])
B = Matrix([[random() for j in range(nc)] for i in range(nr)])

c1 = gauss_matrix_mult(A,B)
c2 = strassen_matrix_mult(A,B)

diff = c1 - c2

one_L = Matrix([[1 for j in range(diff.num_of_rows)]])
one_R = Matrix([[1] for i in range(diff.num_of_cols)])

#total sum of the errors over the matrix
#little trick to check on average how much the matrices differ
#it is not 100% reliable (errors of opposite sign cancel) but it gives a quick glance at the correctness of the algorithm
#a more accurate check would verify that every element differs by less than a certain threshold.
gauss_matrix_mult(one_L,gauss_matrix_mult(diff,one_R))

[1.637801005927031e-12]
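
The comments above mention that an element-wise threshold check would be more accurate than the signed sum. A minimal sketch of that idea follows; it works on plain nested lists rather than on the `Matrix` class, and the helper names `max_abs_diff` and `signed_sum_diff` are hypothetical.

```python
def signed_sum_diff(X, Y):
    """Sum of all element-wise differences (what the one_L / one_R trick computes)."""
    return sum(x - y for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))


def max_abs_diff(X, Y):
    """Largest absolute element-wise difference between two equally shaped nested lists."""
    return max(abs(x - y) for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))


X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[1.5, 1.5], [3.0, 4.0]]

print(signed_sum_diff(X, Y))  # prints 0.0: the errors -0.5 and +0.5 cancel
print(max_abs_diff(X, Y))     # prints 0.5: the largest single error is still visible
```

A correctness check could then compare `max_abs_diff` against a tolerance such as `1e-9`. The tolerance value is an assumption and depends on the matrix size and on the magnitude of the entries.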

In [3]:
from time import perf_counter
for n in range(2,10):
    nc = 2**n
    nr = 2**n
    print("n = %d" % 2**n)
    A = Matrix([[random() for j in range(nc)] for i in range(nr)])
    B = Matrix([[random() for j in range(nc)] for i in range(nr)])
    
    t0 = perf_counter()
    c = strassen_matrix_mult(A,B)
    t1 = perf_counter()
    
    print("Strassen alg elapsed time: %.4f s" % (t1-t0))
    
    t0 = perf_counter()
    c = gauss_matrix_mult(A,B)
    t1 = perf_counter()
    
    print("Gauss alg elapsed: %.4f s" % (t1-t0))
    print("-------")
    
   



n = 4
strassen elapsed: 0.0001
gauss elapsed: 0.0000
-------
n = 8
strassen elapsed: 0.0003
gauss elapsed: 0.0003
-------
n = 16
strassen elapsed: 0.0020
gauss elapsed: 0.0020
-------
n = 32
strassen elapsed: 0.0213
gauss elapsed: 0.0191
-------
n = 64
strassen elapsed: 0.1067
gauss elapsed: 0.1016
-------
n = 128
strassen elapsed: 0.7264
gauss elapsed: 0.7993
-------
n = 256
strassen elapsed: 5.1891
gauss elapsed: 6.5072
-------
n = 512
strassen elapsed: 37.2672
gauss elapsed: 61.6406
-------


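The timings show that Strassen's algorithm outperforms the naive Gauss approach from $n = 128$ onwards (37.3 s against 61.6 s at $n = 512$), while the two are comparable up to $n = 64$. I also noticed that if the base-case threshold is too low (namely less than 16), the Strassen implementation suffers from its inherent overhead.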

## Ex 2
Strassen's algorithm relies on sums such as B12 + B22, but if matrix B has an odd number of rows or columns its quadrants have incompatible shapes and the sum cannot be computed. The same kind of sum is also performed on the quadrants of A.

The algorithm works as long as every dimension is even. The idea is therefore to pad the matrix with zeros whenever a dimension is odd, that is, to add a row or a column (or both) of zeros.
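
The padding step can be sketched on plain nested lists as follows. This is only an illustration: the helper name `pad_to_even` is hypothetical, and the notebook implementation uses `Matrix.assign_submatrix` instead.

```python
def pad_to_even(M):
    """Return a copy of nested list M with a zero row/column appended where a dimension is odd."""
    n_rows, n_cols = len(M), len(M[0])
    new_cols = n_cols + n_cols % 2          # add one column only if n_cols is odd
    padded = [row + [0] * (new_cols - n_cols) for row in M]
    if n_rows % 2:                          # add one row only if n_rows is odd
        padded.append([0] * new_cols)
    return padded


M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

for row in pad_to_even(M):
    print(row)
```

The added zeros contribute nothing to any row-by-column product, so the top-left block of the padded product equals the product of the original matrices.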

Given $A \in \mathbb{R}^{i \times j}$ and $B \in \mathbb{R}^{j \times k}$, the computational complexity of the Gauss algorithm is $\Theta(i j k) \subseteq O(n^3)$, where $n = \max(i,j,k)$.

Using this strategy we can compute the complexity of the proposed Strassen approach.
Define $T_{SR}(i,j,k)$ as the time needed to execute Strassen's algorithm on two general matrices.

Then $T_{SR}(i,j,k)$ is equal to:
- $1$ if $i = j = k = 1$
- $O(n^2)$ if one of the three dimensions is equal to 1
- $O(n^{\log_2 7})$ if $i = j = k$ and they are a power of 2
- $O(n^2) + 7\ T_{SR}(i/2,j/2,k/2)$ if $i,j,k$ are all even
- if at least one of $i,j,k$ is odd, a row or column of zeros is added, so in the worst case the cost is $O(n^2) + 7\ T_{SR}((i + 1)/2,(j + 1)/2,(k + 1)/2)$

This recursion tree behaves in the same way as the one used to compute the cost of Strassen's algorithm: padding adds at most one to each dimension at every level, so the depth of the tree is still $O(\log n)$.

So, defining $n = \max(i,j,k)$, we get $T_{SR}(i,j,k) \in O(n^{\log_2 7})$.
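
As a sanity check of this bound, the recurrence can be evaluated numerically. The sketch below counts only scalar multiplications and, as a simplifying assumption, recurses until one dimension reaches 1 (the notebook code stops earlier, at 32). The function name `strassen_mults` is hypothetical.

```python
from math import log2


def strassen_mults(i: int, j: int, k: int) -> int:
    """Count scalar multiplications of the padded recursion for an (i x j) by (j x k) product."""
    if min(i, j, k) == 1:
        return i * j * k  # base case: plain row-by-column products
    # (d + 1) // 2 is the half size obtained after padding an odd dimension d with zeros
    return 7 * strassen_mults((i + 1) // 2, (j + 1) // 2, (k + 1) // 2)


for n in (64, 100, 129, 500):
    count = strassen_mults(n, n, n)
    print(n, count, round(count / n ** log2(7), 3))
```

For square inputs the printed ratio stays below 7, because padding at every level costs no more than rounding $n$ up to the next power of 2.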



In [4]:

def GEN_strassen_matrix_mult(A: Matrix, B: Matrix) -> Matrix:
    if A.num_of_cols != B.num_of_rows:
        raise ValueError("Wrong matrix shape: number of columns of A is %d, number of rows of B is %d"
                         %(A.num_of_cols, B.num_of_rows) )
    if (A.num_of_cols == A.num_of_rows and B.num_of_cols == B.num_of_rows) and (isPwr2(A.num_of_cols)) :
        #if the matrices satisfy classical strassen use it
        return   strassen_matrix_mult(A,B)  
    #Base case
    if min(A.num_of_cols,A.num_of_rows,B.num_of_cols)< 32:
        return gauss_matrix_mult(A,B)
    
    #padding
    nr_Ap = A.num_of_rows + A.num_of_rows % 2
    nc_Ap = A.num_of_cols + A.num_of_cols % 2
    
    nr_Bp = B.num_of_rows + B.num_of_rows % 2
    nc_Bp = B.num_of_cols + B.num_of_cols % 2
    
    Ap = Matrix([[0 for j in range(nc_Ap)] for i in range(nr_Ap)])
    Bp = Matrix([[0 for j in range(nc_Bp)] for i in range(nr_Bp)])
    
    Ap.assign_submatrix(0,0,A)
    Bp.assign_submatrix(0,0,B)
    
    
    #quadrant subdivision, a little bit more elaborated
    nr_Ap_half = nr_Ap//2
    nc_Ap_half = nc_Ap//2
    nr_Bp_half = nr_Bp//2
    nc_Bp_half = nc_Bp//2
    
    A11 = Ap.submatrix(0, nr_Ap_half, 0, nc_Ap_half)
    A21 = Ap.submatrix(nr_Ap_half, nr_Ap_half, 0, nc_Ap_half)
    A12 = Ap.submatrix(0, nr_Ap_half, nc_Ap_half, nc_Ap_half)
    A22 = Ap.submatrix(nr_Ap_half, nr_Ap_half, nc_Ap_half, nc_Ap_half)
    
    B11 = Bp.submatrix(0, nr_Bp_half, 0, nc_Bp_half)
    B21 = Bp.submatrix(nr_Bp_half, nr_Bp_half, 0, nc_Bp_half)
    B12 = Bp.submatrix(0, nr_Bp_half, nc_Bp_half, nc_Bp_half)
    B22 = Bp.submatrix(nr_Bp_half, nr_Bp_half, nc_Bp_half, nc_Bp_half)
        
    S1 = B12 - B22
    S2 = A11 + A12
    S3 = A21 + A22
    S4 = B21 - B11
    S5 = A11 + A22
    S6 = B11 + B22
    S7 = A12 - A22
    S8 = B21 + B22
    S9 = A11 - A21
    S10 = B11 + B12
    
    #recursive calls: each one switches to classical strassen or to gauss as soon as the shapes allow it
    P1 = GEN_strassen_matrix_mult(A11,S1)
    P2 = GEN_strassen_matrix_mult(S2,B22)
    P3 = GEN_strassen_matrix_mult(S3,B11)
    P4 = GEN_strassen_matrix_mult(A22,S4)
    P5 = GEN_strassen_matrix_mult(S5,S6)
    P6 = GEN_strassen_matrix_mult(S7,S8)
    P7 = GEN_strassen_matrix_mult(S9,S10)
    
    C11 = P5 + P4 - P2 + P6
    C12 = P1 + P2
    C21 = P3 + P4
    C22 = P5 + P1 - P3 - P7
    
    C = Matrix([[0 for j in range(nc_Bp)] for i in range(nr_Ap)])
    
    C.assign_submatrix(0,0,C11)
    C.assign_submatrix(nr_Ap_half,0,C21)
    C.assign_submatrix(0,nc_Bp_half,C12)
    C.assign_submatrix(nr_Ap_half,nc_Bp_half,C22)
    
    #cut out the result
    return C.submatrix(0,A.num_of_rows,0,B.num_of_cols)




In [5]:
nrC = 11
ncC = 10
nrD = ncC
ncD = 10

C = Matrix([[random() for j in range(ncC)] for i in range(nrC)])
D = Matrix([[random() for j in range(ncD)] for i in range(nrD)])


c3 = gauss_matrix_mult(C,D)
c4 = GEN_strassen_matrix_mult(C,D)
diff = c3 - c4

one_L = Matrix([[1 for j in range(diff.num_of_rows)]])
one_R = Matrix([[1] for i in range(diff.num_of_cols)])

#total sums of the errors on the matrix

#gauss_matrix_mult(one_L,gauss_matrix_mult(diff,one_R))

print("c1[0]",c3[0],"\n")
print("c2[0]",c4[0],"\n")

c1[0] [4.088909340167472, 4.158444551565834, 2.5855212076468024, 2.279749505512873, 2.2474032502691363, 1.962782538915654, 2.234552188866843, 3.1172981041796297, 2.938342557196078, 3.052443311912871] 

c2[0] [4.088909340167472, 4.158444551565834, 2.5855212076468024, 2.279749505512873, 2.2474032502691363, 1.962782538915654, 2.234552188866843, 3.1172981041796297, 2.938342557196078, 3.052443311912871] 



In [9]:
"""
now to compare timings we want to use rectangular matrices
the idea is to avoid using nr << nc (or vice versa) to see whether
Strassen's alg is viable or not
"""
for n in range(2,10):    
    
    nrC = n**3 + 1
    ncC = n**2
    nrD = ncC
    ncD = n**3 + 3
    print("nrA = %d (x) ncA = %d (x) ncB = %d  \n" % (nrC, ncC, ncD))

    C = Matrix([[random() for j in range(ncC)] for i in range(nrC)])
    D = Matrix([[random() for j in range(ncD)] for i in range(nrD)])
    
    t0 = perf_counter()
    c = GEN_strassen_matrix_mult(C,D)
    t1 = perf_counter()
    
    print("GEN Strassen alg elapsed time: %.4f s" % (t1-t0))
    
    t0 = perf_counter()
    c = gauss_matrix_mult(C,D)
    t1 = perf_counter()
    
    print("Gauss alg elapsed time: %.4f s" % (t1-t0))
    print("-------")

nrA = 9 (x) ncA = 4 (x) ncB = 11  

GEN Strassen alg elapsed time: 0.0085 s
Gauss alg elapsed time: 0.0008 s
-------
nrA = 28 (x) ncA = 9 (x) ncB = 30  

GEN Strassen alg elapsed time: 0.0076 s
Gauss alg elapsed time: 0.0062 s
-------
nrA = 65 (x) ncA = 16 (x) ncB = 67  

GEN Strassen alg elapsed time: 0.0420 s
Gauss alg elapsed time: 0.0322 s
-------
nrA = 126 (x) ncA = 25 (x) ncB = 128  

GEN Strassen alg elapsed time: 0.1697 s
Gauss alg elapsed time: 0.1743 s
-------
nrA = 217 (x) ncA = 36 (x) ncB = 219  

GEN Strassen alg elapsed time: 0.6862 s
Gauss alg elapsed time: 0.6660 s
-------
nrA = 344 (x) ncA = 49 (x) ncB = 346  

GEN Strassen alg elapsed time: 2.5066 s
Gauss alg elapsed time: 2.4081 s
-------
nrA = 513 (x) ncA = 64 (x) ncB = 515  

GEN Strassen alg elapsed time: 6.1713 s
Gauss alg elapsed time: 6.5330 s
-------
nrA = 730 (x) ncA = 81 (x) ncB = 732  

GEN Strassen alg elapsed time: 20.2258 s
Gauss alg elapsed time: 29.1103 s
-------


Note that if the base case of the Strassen recursion is too low, the overhead of the algorithm becomes huge and a comparison between the methods becomes very hard: Strassen's algorithm will eventually be faster, but only for very large matrices. The same thing happens in somewhat pathological cases, such as matrices with one dimension significantly smaller than the others.

## Ex 3

## Ex 4 
Only the case in which the matrices have dimension $2^m \times 2^m$ is considered; below, $n = 2^m$ denotes the matrix size.
The discussion is made in terms of "floating-point number required space", so 1 denotes the space required to store a single- or double-precision number.
Gauss multiplication can always be performed in place, with no auxiliary matrices besides the result, so the required space $S(n)$ belongs to $O(n^2)$.

For Strassen's algorithm we have that:
- For $n = 1$ we need 1 "cell" of storage
- For $n > 1$ we need $n^2$ cells for the result, plus 10 auxiliary matrices of $(n/2)^2$ cells each, plus 7 times the space required to multiply $n/2 \times n/2$ matrices: $S_{strassen}(n) = n^2 + 10\,(n/2)^2 + 7\ S_{strassen}(n/2)$

This is again the recursion tree used in the proof of the complexity of Strassen's algorithm, so the overall space required is $S_{strassen}(n) \in O(n^{\log_2 7})$ and, since $\log_2 7 > 2$, $O(n^2) \subset O(n^{\log_2 7})$.

The additional space required to use Strassen's algorithm is therefore $O(n^{\log_2 7})$.
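
The space recurrence can also be evaluated directly. The sketch below assumes, as the recurrence does, that the space of all seven recursive calls is counted and never reused; the function name `strassen_space` is hypothetical.

```python
from math import log2


def strassen_space(n: int) -> int:
    """Evaluate S(n) = n^2 + 10 (n/2)^2 + 7 S(n/2) with S(1) = 1, for n a power of 2."""
    if n == 1:
        return 1
    half = n // 2
    return n * n + 10 * half * half + 7 * strassen_space(half)


for m in range(1, 11):
    n = 2 ** m
    cells = strassen_space(n)
    print(n, cells, round(cells / n ** log2(7), 3))
```

Solving the recurrence by hand gives $S_{strassen}(n) = \frac{17}{3} n^{\log_2 7} - \frac{14}{3} n^2$, so the printed ratio approaches $17/3 \approx 5.67$ from below, in agreement with the $O(n^{\log_2 7})$ bound.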