#### About the Notebook
This notebook builds a custom Linear layer whose weights are created lazily, the first time the layer is called on an input. Its main purpose is to demonstrate tf.GradientTape, which records the operations executed inside its context so that gradients can be computed afterwards. Those gradients can then be applied to the weights manually or through an optimizer.

#### Import the Modules and Check the versions

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import models,layers
import numpy as np
print(tf.__version__)

2.0.0-alpha0


#### Create the Custom Layer

In [2]:
from tensorflow.keras.layers import Layer

In [3]:
class Linear(Layer):
    def __init__(self,units=32):
        super(Linear,self).__init__()
        self.units=units
    ### build() creates the weights. Keras calls it automatically on the first call, once the input shape is known
    def build(self,input_shape):
        ### Create each weight with self.add_weight, giving its shape and initializer and
        ### setting trainable=True so that it shows up in trainable_weights
        self.w = self.add_weight(shape=(input_shape[-1],self.units),initializer='random_normal',trainable=True)
        self.b = self.add_weight(shape=(self.units,),initializer='random_normal',trainable=True)

    ### call() performs the layer's forward computation
    def call(self,inputs):
        return tf.matmul(inputs,self.w)+self.b
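
To make the deferred weight creation visible, the following sketch repeats the layer definition so that it runs on its own and inspects the weights before and after the first call. The input shape `(2, 3)` and the unit count `4` are arbitrary values chosen for illustration, not values used elsewhere in this notebook.

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer

class Linear(Layer):
    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Weight shapes depend on the last input dimension, known only at call time
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal', trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

layer = Linear(4)
weights_before = len(layer.weights)   # build() has not run yet
outputs = layer(tf.ones((2, 3)))      # first call triggers build() with input shape (2, 3)
weights_after = len(layer.weights)    # w and b now exist

print(weights_before, weights_after)
print(layer.w.shape, layer.b.shape, outputs.shape)
```

Because the shape of `w` is taken from `input_shape[-1]`, the same layer class can be reused for inputs of any feature size without passing that size to the constructor.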

#### Download the data and create the Dataset

In [4]:
### Download the MNIST dataset from the Keras datasets. load_data() returns the x and y values
### of the training and test sets as two separate tuples
(x_train,y_train),(x_test,y_test) = keras.datasets.mnist.load_data()
### Create the dataset with tf.data.Dataset.from_tensor_slices
### This approach is suitable when the whole dataset fits in memory
### Each 28x28 image is flattened into a 784-element vector, cast to float32 and scaled to [0, 1] by dividing by 255
train_ds = tf.data.Dataset.from_tensor_slices((x_train.reshape((60000,784)).astype('float32')/255,y_train))
### Shuffle the dataset and create batches from it
train_ds = train_ds.shuffle(buffer_size=1024).batch(64)
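
As a quick check of what this pipeline yields, the sketch below applies the same steps to synthetic data, so it runs without downloading MNIST. The arrays `fake_x` and `fake_y` and the sample count of 200 are assumptions made for illustration only; they are random values with the same layout as the MNIST arrays.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for MNIST (assumption): 200 random 28x28 uint8 "images" with labels 0-9
fake_x = np.random.randint(0, 256, size=(200, 28, 28)).astype('uint8')
fake_y = np.random.randint(0, 10, size=(200,)).astype('uint8')

# Same preprocessing as above: flatten, cast to float32, scale to [0, 1]
flat_x = fake_x.reshape((-1, 784)).astype('float32') / 255
demo_ds = tf.data.Dataset.from_tensor_slices((flat_x, fake_y))
demo_ds = demo_ds.shuffle(buffer_size=1024).batch(64)

# 200 samples in batches of 64 leave a smaller final batch
batch_sizes = [int(x.shape[0]) for x, y in demo_ds]
first_x, first_y = next(iter(demo_ds))

print(batch_sizes)
print(first_x.shape, first_x.dtype, first_y.dtype)
```

Note that the last batch is smaller than 64 whenever the number of samples is not a multiple of the batch size, which the training loop handles without any change.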

#### Create the layer, specify the parameters and train
Using tf.GradientTape, we record the forward pass, compute the gradients of the loss with respect to the trainable parameters, and pass those gradients to the optimizer, which updates the weights.
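
Before applying the tape to the full layer, a minimal sketch on two scalar variables may help show what `tape.gradient` returns. The variables `w` and `b`, the input `3.0` and the target `10.0` are made-up values for illustration.

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(1.0)
x = tf.constant(3.0)
target = tf.constant(10.0)

# Operations on variables inside the context are recorded by the tape
with tf.GradientTape() as tape:
    prediction = w * x + b               # 2*3 + 1 = 7
    loss = (prediction - target) ** 2    # (7 - 10)^2 = 9

# One gradient per variable, returned in the same order as the list passed in
dloss_dw, dloss_db = tape.gradient(loss, [w, b])

# d(loss)/dw = 2*(prediction - target)*x = -18, d(loss)/db = 2*(prediction - target) = -6
print(float(loss), float(dloss_dw), float(dloss_db))
```

The tape itself holds the recorded operations, and the gradients are only computed when `tape.gradient` is called. By default a tape can be queried once; `tf.GradientTape(persistent=True)` allows repeated calls.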

In [5]:
### Instantiate the Linear layer defined above with 10 output units, one per digit class
model = Linear(10)
### Specify the loss and the optimizer. Sparse Categorical Cross Entropy is used for multi-class
### classification with integer labels. from_logits=True because the layer outputs raw scores, not probabilities
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
### Train the model with one pass over the training batches
for step, (x,y) in enumerate(train_ds):
    ### tf.GradientTape records the operations of the forward pass so that gradients can be computed from them
    with tf.GradientTape() as tape:
        logits = model(x)
        loss = loss_fn(y,logits)
    ### Compute the gradients of the loss with respect to the trainable weights
    gradients = tape.gradient(loss,model.trainable_weights)
    ### The optimizer uses the gradients computed above to update the trainable weights
    optimizer.apply_gradients(zip(gradients,model.trainable_weights))
    if step%100 == 0:
        print("Step is {} and Loss is {}".format(step,loss))

Step is 0 and Loss is 2.355255126953125
Step is 100 and Loss is 2.2778024673461914
Step is 200 and Loss is 2.1152334213256836
Step is 300 and Loss is 2.0014028549194336
Step is 400 and Loss is 1.987851619720459
Step is 500 and Loss is 1.944572925567627
Step is 600 and Loss is 1.76313054561615
Step is 700 and Loss is 1.7469508647918701
Step is 800 and Loss is 1.6767005920410156
Step is 900 and Loss is 1.6092195510864258
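
The gradients returned by the tape can also be applied by hand, without an optimizer. The following is a minimal sketch of one plain gradient-descent step on toy data. The data, the variable names `w` and `b`, and the learning rate of 0.1 are assumptions for illustration and are not taken from the run above.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical toy problem: 32 samples, 5 features, 3 classes
x = tf.random.uniform((32, 5))
y = tf.random.uniform((32,), maxval=3, dtype=tf.int32)

w = tf.Variable(tf.random.normal((5, 3), stddev=0.05))
b = tf.Variable(tf.zeros((3,)))
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
learning_rate = 0.1

with tf.GradientTape() as tape:
    loss_before = loss_fn(y, tf.matmul(x, w) + b)
grads = tape.gradient(loss_before, [w, b])

# Plain SGD by hand: weight <- weight - learning_rate * gradient
for var, grad in zip([w, b], grads):
    var.assign_sub(learning_rate * grad)

# Re-evaluate the loss on the same batch after the update
loss_after = loss_fn(y, tf.matmul(x, w) + b)
print(float(loss_before), float(loss_after))
```

With the layer from this notebook, the same loop would iterate over `zip(model.trainable_weights, gradients)`. For plain SGD without momentum this should match what `optimizer.apply_gradients` does, while an optimizer object becomes more convenient once momentum or adaptive learning rates are involved.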
