In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)

2.1.0


# Homework 7 - Implementation

## Reading

1. Implementation
Motivation: understand xNN implementation
https://github.com/arthurredfern/UT-Dallas-CS-6301-CNNs/blob/master/Lectures/xNNs_070_Implementation.pdf

__Completed__

2. Efficient processing of deep neural networks: a tutorial and survey
Motivation: an alternative presentation of xNN hardware circa 2017
https://arxiv.org/abs/1703.09039

__Completed__

3. A new golden age for computer architecture
Motivation: a very nice talk from the Turing award winners on the past and future of
hardware and software, available in text and video form
https://cacm.acm.org/magazines/2019/2/234352-a-new-golden-age-for-computer-architecture/fulltext
https://www.youtube.com/watch?v=3LVeEjsn8Ts

__Completed__


## Theory

4. Write out all of the terms of Strassen-based matrix-matrix multiplication for C = A B with BLAS dimensions M = N = K = 4 by applying the Strassen decomposition twice (an initial decomposition, then a recursive decomposition). Use the following notation for the scalars in the 3 matrices:


<img src="img/hw7_p1.png">

Apologies for the inelegant matrix formatting; the above is meant to represent C = A B.
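For reference, one level of the standard Strassen decomposition replaces the 8 block products of a 2×2 block multiplication with 7 products of block sums and differences, and applying it twice to a 4×4 problem uses 7 × 7 = 49 scalar multiplications instead of 64. The following is a minimal numerical sketch of that two-level recursion, not part of the original solution; `strassen_numeric`, `A_num`, and `B_num` are hypothetical names chosen here. It checks the result against an ordinary matrix product.

```python
import numpy as np

def strassen_numeric(A, B):
    """Multiply square matrices (size a power of two) with the Strassen recursion."""
    n = A.shape[0]
    if n == 1:
        return A * B  # 1x1 base case: a single scalar multiplication
    h = n // 2
    A00, A01, A10, A11 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B00, B01, B10, B11 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # The seven Strassen products (note the differences in M3, M4, M6, M7).
    M1 = strassen_numeric(A00 + A11, B00 + B11)
    M2 = strassen_numeric(A10 + A11, B00)
    M3 = strassen_numeric(A00, B01 - B11)
    M4 = strassen_numeric(A11, B10 - B00)
    M5 = strassen_numeric(A00 + A01, B11)
    M6 = strassen_numeric(A10 - A00, B00 + B01)
    M7 = strassen_numeric(A01 - A11, B10 + B11)

    # Recombine the products into the four blocks of C.
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

# Hypothetical test inputs: two fixed 4x4 integer matrices.
A_num = np.arange(16).reshape(4, 4)
B_num = np.arange(16, 32).reshape(4, 4)
assert np.array_equal(strassen_numeric(A_num, B_num), A_num @ B_num)
```

With 4×4 inputs the function recurses exactly twice (4×4 to 2×2 to 1×1), which mirrors the initial and recursive decompositions the problem asks for.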

In [54]:
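def add_matr(A, B=None):
    """Symbolic element-wise sum of two square matrices of term strings."""
    # A single operand is passed through unchanged.
    if B is None:
        return A
    dim = len(A)
    C = [[0 for j in range(dim)] for i in range(dim)]

    for i in range(dim):
        for j in range(dim):
            # Build the sum as text, e.g. "A00" and "A22" -> "A00+A22".
            C[i][j] = str(A[i][j]) + "+" + str(B[i][j])
    return np.array(C)


def add(a, b=None):
    """Symbolic sum of two 1x1 arrays of term strings, returned as a plain str."""
    # ndarray.item() replaces np.asscalar, which is deprecated in NumPy.
    if b is None:
        return a.item()
    else:
        return a.item() + "+" + b.item()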
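The helpers above work on matrices of symbol names (strings) rather than numbers. The cell that defines the input matrices is not shown, so the following is an assumed reconstruction based on the `A00` ... `A33` and `B00` ... `B33` labels that appear in the printed terms; `symbol_matrix` is a hypothetical helper name.

```python
import numpy as np

def symbol_matrix(name, dim=4):
    """Return a dim x dim array whose (i, j) entry is the label '<name><i><j>'."""
    return np.array([[f"{name}{i}{j}" for j in range(dim)] for i in range(dim)])

# Assumed inputs: 4x4 matrices of scalar labels for A and B.
A = symbol_matrix("A")
B = symbol_matrix("B")
```

Using NumPy arrays (rather than nested lists) matters here because the block slicing `mat[:half, :half]` relies on two-dimensional indexing.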

In [59]:
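def strassen(mat_A, mat_B):
    """Print the operand sums of the 7 Strassen products, recursing down to scalars."""
    # Split each matrix into four half-size blocks.
    half = len(mat_A) // 2
    A_00 = mat_A[:half, :half]
    A_10 = mat_A[half:, :half]
    A_01 = mat_A[:half, half:]
    A_11 = mat_A[half:, half:]
    B_00 = mat_B[:half, :half]
    B_10 = mat_B[half:, :half]
    B_01 = mat_B[:half, half:]
    B_11 = mat_B[half:, half:]

    # Blocks are summed symbolically element-wise; at the last level the
    # 1x1 blocks are joined into plain strings instead.
    op = add_matr if half > 1 else add

    # Operand pairs of the seven products.
    # NOTE: the helpers only join terms with "+". In the standard Strassen
    # formulas the second term is subtracted in B_01 - B_11 (product 3),
    # B_10 - B_00 (product 4), A_10 - A_00 (product 6) and A_01 - A_11
    # (product 7), so those signs are not reflected in the printed terms.
    partials = []
    partials.append([op(A_00, A_11), op(B_00, B_11)])
    partials.append([op(A_10, A_11), op(B_00)])
    partials.append([op(A_00), op(B_01, B_11)])
    partials.append([op(A_11), op(B_10, B_00)])
    partials.append([op(A_00, A_01), op(B_11)])
    partials.append([op(A_10, A_00), op(B_00, B_01)])
    partials.append([op(A_01, A_11), op(B_10, B_11)])

    if half > 1:
        # First level: for each block product S1..S7, show how the seven
        # second-level products combine, then expand that block product.
        for i in range(7):
            print("S" + str(i + 1))
            print(np.array([["M1+M4-M5+M7", "M3+M5"], ["M2+M4", "M1-M2+M3+M6"]]))
            strassen(partials[i][0], partials[i][1])
    else:
        # Second level: print each scalar product. Labels run M0..M6 here and
        # correspond to M1..M7 in the combination matrix printed above.
        for i in range(7):
            print("\tM" + str(i) + " = (" + partials[i][0] + ")*(" + partials[i][1] + ")")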

In [60]:
strassen(A,B)

S1
[['M1+M4-M5+M7' 'M3+M5']
 ['M2+M4' 'M1-M2+M3+M6']]
	M0 = (A00+A22+A11+A33)*(B00+B22+B11+B33)
	M1 = (A10+A32+A11+A33)*(B00+B22)
	M2 = (A00+A22)*(B01+B23+B11+B33)
	M3 = (A11+A33)*(B10+B32+B00+B22)
	M4 = (A00+A22+A01+A23)*(B11+B33)
	M5 = (A10+A32+A00+A22)*(B00+B22+B01+B23)
	M6 = (A01+A23+A11+A33)*(B10+B32+B11+B33)
S2
[['M1+M4-M5+M7' 'M3+M5']
 ['M2+M4' 'M1-M2+M3+M6']]
	M0 = (A20+A22+A31+A33)*(B00+B11)
	M1 = (A30+A32+A31+A33)*(B00)
	M2 = (A20+A22)*(B01+B11)
	M3 = (A31+A33)*(B10+B00)
	M4 = (A20+A22+A21+A23)*(B11)
	M5 = (A30+A32+A20+A22)*(B00+B01)
	M6 = (A21+A23+A31+A33)*(B10+B11)
S3
[['M1+M4-M5+M7' 'M3+M5']
 ['M2+M4' 'M1-M2+M3+M6']]
	M0 = (A00+A11)*(B02+B22+B13+B33)
	M1 = (A10+A11)*(B02+B22)
	M2 = (A00)*(B03+B23+B13+B33)
	M3 = (A11)*(B12+B32+B02+B22)
	M4 = (A00+A01)*(B13+B33)
	M5 = (A10+A00)*(B02+B22+B03+B23)
	M6 = (A01+A11)*(B12+B32+B13+B33)
S4
[['M1+M4-M5+M7' 'M3+M5']
 ['M2+M4' 'M1-M2+M3+M6']]
	M0 = (A22+A33)*(B20+B00+B31+B11)
	M1 = (A32+A33)*(B20+B00)
	M2 = (A22)*(B21+B01+B31+B11)
	M3 

## Practice

5. We’ve seen the importance of quantization in reducing memory, reducing data movement, and increasing compute (with the same resources). To gain more experience with quantization, read the following introduction to post-training quantization and its implementation in TensorFlow Lite, and work through the following examples (though remember that quantization during training is better):

• https://www.tensorflow.org/lite/performance/post_training_quantization

• https://www.tensorflow.org/lite/performance/post_training_quant

• https://www.tensorflow.org/lite/performance/post_training_integer_quant

• https://www.tensorflow.org/lite/performance/post_training_float16_quant
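As a rough illustration of what post-training quantization does to an already-trained weight tensor, the sketch below applies affine (scale and zero-point) quantization from float32 to int8 and back. It is a plain NumPy sketch under simplifying assumptions (one scale for the whole tensor, random stand-in weights), not the TensorFlow Lite implementation; `quantize_int8`, `dequantize`, and `weights` are hypothetical names.

```python
import numpy as np

def quantize_int8(x):
    """Affine quantization of a float array to int8 with a single scale."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    # Map [x_min, x_max] onto the 256 int8 levels in [-128, 127].
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(round(-128 - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float32 values."""
    return (scale * (q.astype(np.float32) - zero_point)).astype(np.float32)

# Stand-in "trained" weights (an assumption for illustration only).
weights = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, scale, zero_point = quantize_int8(weights)
recovered = dequantize(q, scale, zero_point)

# Storage drops by 4x; the round-trip error stays within about one step.
max_err = float(np.abs(weights - recovered).max())
print(q.nbytes, weights.nbytes, max_err, scale)
```

The int8 tensor takes a quarter of the float32 storage, which is the memory and data-movement saving the problem statement refers to; the price is a rounding error on the order of one quantization step per value.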


6. Creating tools to automate work is a common activity when working with xNNs. For this problem, you will build a tool to predict performance using a simplified network specification for a simplified hardware architecture specification. 

The network specification can come from scraping the graph of the network created by TensorFlow (preferred), or it can be hand specified as follows. Specify a network as a text file using the following to describe each layer (ID is a unique identifier like a unique number)