In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

---
Separate `Reshape` Layer
---

The `tf.keras.layers.Reshape` layer reshapes its input to a given target shape. Suppose the MNIST images are stored with `X_train.shape = (m, 28, 28)`, i.e. `m` images of 28x28 pixel values. The first (Reshape) layer serves as the input layer and converts the data into the form the following layers expect: each 28x28 = 784-pixel image is flattened from a 2D 28x28 array into a 1D array of 784 values, which is the shape of a single training example for the Dense layers. The batch dimension `m` is left untouched, so the output has shape `(m, 784)`.
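A minimal NumPy sketch of what `Reshape(target_shape=(784,))` does to a batch, assuming random 28x28 arrays as a stand-in for MNIST images: the batch axis is kept and each example is flattened in row-major order (NumPy's default, which is also how a TensorFlow reshape orders elements).

```python
import numpy as np

m = 4  # small batch for illustration
images = np.random.rand(m, 28, 28)  # random stand-in for MNIST images

# Keep the batch axis, flatten each 28x28 example to 784 values
flat = images.reshape(m, 784)  # equivalently images.reshape(-1, 784)

print(images.shape)  # (4, 28, 28)
print(flat.shape)    # (4, 784)

# Each row of `flat` is one example, flattened row by row
assert np.array_equal(flat[0], images[0].ravel())
```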

> Note: The MNIST training set contains 60,000 examples; here we use m = 6000 for convenience (and random arrays stand in for the actual images).

In [2]:
m = 6000  # number of examples (a tenth of the 60,000 MNIST training images)
model_mnist = keras.Sequential(
    [
        # Flattens each (28, 28) example into a (784,) vector; has no weights
        layers.Reshape(target_shape=(784,), input_shape=(28, 28), name="layer1"),
        layers.Dense(units=20, activation="relu", name="layer2"),
        layers.Dense(units=15, activation="relu", name="layer3"),
        # One output neuron per digit class
        layers.Dense(units=10, activation="softmax", name="layer4"),
    ]
)

All the other layers are Dense (fully connected) layers: every neuron receives every output of the previous layer. The `units` parameter sets the number of neurons in each layer.
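Because every input is connected to every neuron, a Dense layer with `n_in` inputs and `units` neurons has `n_in * units` weights plus `units` biases. A small sketch (the helper `dense_params` is hypothetical, not a Keras function) reproduces the Param # column of the model summary for the 784 → 20 → 15 → 10 network:

```python
def dense_params(n_in, units):
    """Trainable parameters of a Dense layer: an (n_in, units) kernel plus `units` biases."""
    return n_in * units + units

layer_sizes = [784, 20, 15, 10]  # input features, then the units of each Dense layer
counts = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [15700, 315, 160]
print(sum(counts))  # 16175
```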

The last (output) layer is special. It has 10 neurons because the data contains 10 classes of digits (0 to 9), and its softmax activation turns the 10 outputs into probabilities that sum to 1. The model's predictions are read from this layer.
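A NumPy sketch of that final step, assuming made-up pre-activation values (logits) for a single example: softmax maps them to probabilities, and the predicted digit is the index of the largest one.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# One example, 10 classes (values are illustrative, not model output)
logits = np.array([[1.0, 2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0]])
probs = softmax(logits)

print(probs.sum(axis=1))     # sums to 1 (up to floating-point rounding)
print(probs.argmax(axis=1))  # [9] -> predicted digit
```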

In [3]:
model_mnist.layers[0].weights

[]

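This is expected: the Reshape layer only rearranges its input, so it has no weights to learn. It does not normalize the data either; scaling the pixel values (e.g. dividing by 255) is a preprocessing step done before the data reaches the model.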
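A sketch of that preprocessing step, assuming raw `uint8` pixel values in [0, 255] (randomly generated here rather than loaded from MNIST):

```python
import numpy as np

# Hypothetical raw MNIST-style pixels: unsigned 8-bit integers in [0, 255]
raw = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Scale to floats in [0, 1] before feeding them to the model
X = raw.astype(np.float32) / 255.0

print(X.dtype)  # float32
print(X.min() >= 0.0, X.max() <= 1.0)
```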

In [4]:
import numpy as np

In [5]:
X = np.random.rand(m, 28, 28)  # random stand-in for m images of 28x28 pixels

In [6]:
y = model_mnist(X)

Let's look at the weight matrix (kernel) of layer 2, which is `model_mnist.layers[1]`:

In [7]:
model_mnist.layers[1].weights[0]

<tf.Variable 'layer2/kernel:0' shape=(784, 20) dtype=float32, numpy=
array([[ 0.06341705, -0.00515229,  0.05339459, ..., -0.07548207,
         0.04109949, -0.08259559],
       [ 0.02626079, -0.08455778, -0.00049573, ...,  0.02106384,
        -0.01787308, -0.07395803],
       [-0.07881464, -0.05320477, -0.02720047, ..., -0.05149955,
         0.02208954,  0.00111201],
       ...,
       [-0.02048101,  0.03883322, -0.02884215, ..., -0.02147762,
         0.04683614, -0.02072953],
       [-0.02436427, -0.02305762,  0.04502301, ..., -0.05181403,
         0.04863092,  0.01638307],
       [ 0.04234535, -0.06593636,  0.05577298, ...,  0.01808231,
         0.05588435, -0.0305473 ]], dtype=float32)>

In [8]:
model_mnist.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Reshape)            (None, 784)               0         
                                                                 
 layer2 (Dense)              (None, 20)                15700     
                                                                 
 layer3 (Dense)              (None, 15)                315       
                                                                 
 layer4 (Dense)              (None, 10)                160       
                                                                 
Total params: 16,175
Trainable params: 16,175
Non-trainable params: 0
_________________________________________________________________


The `.weights` attribute returns a list `[kernel, bias]`, so index 0 gives the weight matrix (kernel) and index 1 gives the bias vector:

In [9]:
model_mnist.layers[1].weights[1]

<tf.Variable 'layer2/bias:0' shape=(20,) dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.], dtype=float32)>

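No need to worry about the kernel's shape, (784, 20) = (input features, units). Keras arranges the examples as the rows of $X$ (shape $(m, 784)$) and computes $X \cdot W + b$, giving an output of shape $(m, 20)$. If the examples are instead arranged as columns, as in the $W^T \cdot X + b$ convention, the kernel is simply transposed: $W^T$ has shape $(20, 784)$ and the output has shape $(20, m)$, i.e. the same values transposed.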
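A NumPy sketch of the computation behind a Dense layer, with random values standing in for the trained kernel: the row-wise Keras form and the column-wise $W^T \cdot X + b$ form produce the same numbers, one being the transpose of the other.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_in, units = 6, 784, 20

X = rng.random((m, n_in))           # examples arranged row-wise, as Keras expects
W = rng.normal(size=(n_in, units))  # kernel, same shape as layer2/kernel: (784, 20)
b = np.zeros(units)                 # bias, shape (20,)

# Row-wise form: each row of X times the kernel, plus b
Z_rows = X @ W + b                  # shape (m, 20)
A = np.maximum(Z_rows, 0.0)         # ReLU, as in layer2

# Column-wise form W^T X + b, with the examples as columns (X.T has shape (784, m))
Z_cols = W.T @ X.T + b[:, None]     # shape (20, m)

print(Z_rows.shape, Z_cols.shape)   # (6, 20) (20, 6)
assert np.allclose(Z_rows, Z_cols.T)  # same values, just transposed
```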

---
`input_shape` & `batch_size` arguments (recommended)
---
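Instead of creating a separate Reshape layer, pass an `input_shape` argument to the first Dense layer, ***specifying the shape of a single training example***. Since that shape is `(784,)`, the input data must already be flattened to `(m, 784)`.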

In `input_shape`, ***the batch dimension is not included.***

If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a `batch_size` argument to a layer. If you pass both `batch_size=32` and `input_shape=(6, 8)` to a layer, it will then expect every batch of inputs to have the batch shape (32, 6, 8).
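The rule above amounts to "prepend the batch dimension to `input_shape`". The tiny helper below is hypothetical (Keras does this internally and does not expose such a function); `None` stands for an unspecified batch size, which is how the first model's summary displays it.

```python
def expected_batch_shape(input_shape, batch_size=None):
    """Hypothetical helper: prepend the batch dimension (None means 'any size')."""
    return (batch_size,) + tuple(input_shape)

print(expected_batch_shape((6, 8), batch_size=32))    # (32, 6, 8)
print(expected_batch_shape((784,)))                   # (None, 784)
print(expected_batch_shape((784,), batch_size=6000))  # (6000, 784)
```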

In [10]:
m = 6000
model_mnist = keras.Sequential(
    [
        # input_shape excludes the batch dimension; batch_size fixes it to m
        layers.Dense(units=20, activation="relu", input_shape=(784,), batch_size=m, name="layer2"),
        layers.Dense(units=15, activation="relu", name="layer3"),
        layers.Dense(units=10, activation="softmax", name="layer4"),
    ]
)

When calling the model, ***pass the whole batch at once, with the examples arranged row-wise***, i.e. an array of shape `(m, 784)` whose first dimension matches `batch_size = m`.
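A NumPy sketch of building such a batch from individual flattened examples (random here, purely hypothetical): stacking along axis 0 makes row `i` hold example `i`.

```python
import numpy as np

m = 3
examples = [np.random.rand(784) for _ in range(m)]  # hypothetical flattened examples

# Stack so that row i is example i: the batch has shape (m, 784)
batch = np.stack(examples, axis=0)
print(batch.shape)  # (3, 784)

# Arranging the examples as columns instead gives (784, m); a Dense layer
# declared with input_shape=(784,) would reject this, since its last axis must be 784
print(batch.T.shape)  # (784, 3)
```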

In [11]:
x = np.random.rand(m, 784)  # m flattened examples, one per row
y = model_mnist(x)

In [12]:
model_mnist.layers

[<keras.layers.core.dense.Dense at 0x1deb99998e0>,
 <keras.layers.core.dense.Dense at 0x1deb9999850>,
 <keras.layers.core.dense.Dense at 0x1deb9999e20>]

In [13]:
model_mnist.layers[0].weights

[<tf.Variable 'layer2/kernel:0' shape=(784, 20) dtype=float32, numpy=
 array([[ 0.0665631 ,  0.08623147,  0.00196733, ..., -0.08250167,
         -0.01381974, -0.04086908],
        [ 0.08275779, -0.08443937, -0.05693385, ..., -0.03143446,
         -0.00321385,  0.05273043],
        [ 0.02321914, -0.0463839 ,  0.07912473, ..., -0.06857607,
          0.01187074, -0.00695767],
        ...,
        [-0.0231508 , -0.04780063,  0.031583  , ...,  0.06229813,
         -0.01030135, -0.0825861 ],
        [ 0.04937162, -0.048889  ,  0.02012544, ..., -0.03416599,
          0.07288265,  0.04098983],
        [ 0.05486134,  0.02025336, -0.01189148, ..., -0.06408609,
         -0.06474005, -0.07898491]], dtype=float32)>,
 <tf.Variable 'layer2/bias:0' shape=(20,) dtype=float32, numpy=
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0.], dtype=float32)>]

In [14]:
model_mnist.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer2 (Dense)              (6000, 20)                15700     
                                                                 
 layer3 (Dense)              (6000, 15)                315       
                                                                 
 layer4 (Dense)              (6000, 10)                160       
                                                                 
Total params: 16,175
Trainable params: 16,175
Non-trainable params: 0
_________________________________________________________________


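Compare the Output Shape column of the two summaries: in the first model the batch dimension is `None` (any batch size is accepted), while here it is fixed to 6000 by `batch_size = m`.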
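The Output Shape column follows a simple rule for Dense layers: `(batch, n_in)` maps to `(batch, units)`, and the batch entry is carried through unchanged. A sketch with a hypothetical helper (not a Keras function) that reproduces the column for both cases:

```python
def dense_output_shapes(batch_size, units_per_layer):
    """Hypothetical helper: a Dense layer maps (batch, n_in) to (batch, units)."""
    return [(batch_size, units) for units in units_per_layer]

units = [20, 15, 10]
print(dense_output_shapes(None, units))  # [(None, 20), (None, 15), (None, 10)]
print(dense_output_shapes(6000, units))  # [(6000, 20), (6000, 15), (6000, 10)]
```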

---
Preferred Method
---
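The method above is equivalent to explicitly adding a `keras.Input` object (named "layer0" here) as the first element of the list, with the same arguments: `batch_size` and `shape` (the counterpart of `input_shape`). `keras.Input` is not counted as a layer of the model, so `model_mnist.layers[0]` is the first Dense layer, as the output of `model_mnist.layers[0].weights` below shows.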
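Whichever way the input shape is declared, the network computes the same function. As a rough NumPy sketch of what calling the model evaluates (an illustration, not Keras' implementation), assuming Keras' default Dense initializers, Glorot uniform for the kernel and zeros for the bias; this is consistent with the printed kernels above (values within about ±0.086) and the all-zero biases.

```python
import numpy as np

rng = np.random.default_rng(42)

def glorot_uniform(n_in, n_out):
    """Glorot/Xavier uniform init: U(-limit, limit), limit = sqrt(6 / (n_in + n_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

sizes = [784, 20, 15, 10]
# One (kernel, bias) pair per Dense layer; biases start at zero
params = [(glorot_uniform(n_in, n_out), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(x @ W1 + b1)        # (m, 20)
    h2 = relu(h1 @ W2 + b2)       # (m, 15)
    return softmax(h2 @ W3 + b3)  # (m, 10), each row sums to 1

m = 6000
x = rng.random((m, 784))
y = forward(x)
print(y.shape)                    # (6000, 10)
print(np.sqrt(6.0 / (784 + 20)))  # about 0.0864, the bound on layer-1 kernel values
```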

In [15]:
m = 6000
model_mnist = keras.Sequential(
    [
        # Explicit input: the shape of one example plus a fixed batch size
        keras.Input(shape=(784,), batch_size=m, name="layer0"),
        layers.Dense(units=20, activation="relu", name="layer1"),
        layers.Dense(units=15, activation="relu", name="layer2"),
        layers.Dense(units=10, activation="softmax", name="layer3"),
    ]
)
x = np.random.rand(m, 784)
y = model_mnist(x)

In [16]:
model_mnist.layers[0].weights

[<tf.Variable 'layer1/kernel:0' shape=(784, 20) dtype=float32, numpy=
 array([[-0.02716694,  0.04937129, -0.0426433 , ...,  0.05493189,
          0.04167481,  0.03356742],
        [-0.05586587,  0.02772526, -0.0515061 , ...,  0.0767339 ,
         -0.01936822, -0.00356047],
        [-0.05499648,  0.06784353, -0.04141156, ...,  0.00869787,
         -0.05308804, -0.03030455],
        ...,
        [-0.08397981, -0.05376915, -0.06199621, ...,  0.05157672,
         -0.05533327,  0.07808746],
        [-0.02395387, -0.02428569,  0.03156265, ...,  0.00683144,
          0.06160909, -0.00181548],
        [ 0.03074189,  0.07069701,  0.01362167, ..., -0.00930146,
         -0.06262088, -0.02077889]], dtype=float32)>,
 <tf.Variable 'layer1/bias:0' shape=(20,) dtype=float32, numpy=
 array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0.], dtype=float32)>]

In [17]:
model_mnist.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Dense)              (6000, 20)                15700     
                                                                 
 layer2 (Dense)              (6000, 15)                315       
                                                                 
 layer3 (Dense)              (6000, 10)                160       
                                                                 
Total params: 16,175
Trainable params: 16,175
Non-trainable params: 0
_________________________________________________________________
