# Chapter 10 - Introducing DL and TF

In [1]:
import tensorflow as tf
from tensorflow import keras

In [4]:
# Loading the Fashion MNIST dataset
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [5]:
# Inspecting the training set: shape and pixel data type
print(X_train_full.shape)
print(X_train_full.dtype)

(60000, 28, 28)
uint8


In [6]:
# Creating the validation set and scaling pixel intensities to the 0-1 range
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.0  # the test set needs the same scaling

In [7]:
# Class names, indexed by label (0-9)
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
class_names[y_train[0]]

'Coat'

In [None]:
# Model using the Sequential API
model = keras.models.Sequential()
# Flatten turns each 28x28 image into a 1D array, like X.reshape(-1, 28 * 28)
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
# Softmax output layer: one probability per class, since the classes are exclusive
model.add(keras.layers.Dense(10, activation="softmax"))
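To check what _Flatten_ actually does, here is a small sketch that compares it with a NumPy reshape on a made-up batch of two random 28x28 "images" (the name _X_batch_ and its random contents are illustrative assumptions, not Fashion MNIST data):

```python
import numpy as np
from tensorflow import keras

# Hypothetical batch: 2 random 28x28 "images" (illustration only)
X_batch = np.random.rand(2, 28, 28).astype("float32")

# Flatten keeps the batch dimension and merges all the others
flat = keras.layers.Flatten()(X_batch).numpy()

# Equivalent NumPy reshape: one row of 28 * 28 = 784 values per instance
reshaped = X_batch.reshape(-1, 28 * 28)

print(flat.shape)                      # (2, 784)
print(np.array_equal(flat, reshaped))  # True
```

Note that _reshape(-1, 1)_ would instead produce a single column with one pixel per row, mixing all instances together.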

In [10]:
# Equivalently, we can pass the list of layers to the Sequential constructor
model = keras.models.Sequential([
    # Flatten turns each 28x28 image into a 1D array, like X.reshape(-1, 28 * 28)
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])



In [11]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________


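Dense layers have a lot of trainable parameters: the first hidden layer alone has 784 × 300 connection weights plus 300 bias terms, i.e. 235,500 parameters. This gives the model a lot of flexibility to fit the training data, but it also raises the risk of overfitting, especially when there isn't much training data.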
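The _Param #_ column above can be reproduced by hand: a Dense layer has one weight per (input, neuron) pair plus one bias per neuron. A minimal sketch of that arithmetic (the helper _dense_params_ is a hypothetical name, not a Keras function):

```python
# Hypothetical helper: parameters of a Dense layer = weights + biases
def dense_params(n_inputs, n_neurons):
    return n_inputs * n_neurons + n_neurons

# Flattened input size, then the sizes of the three Dense layers
layer_sizes = [28 * 28, 300, 100, 10]
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [235500, 30100, 1010]
print(sum(counts))  # 266610
```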

In [13]:
# Accessing the parameters of the first hidden layer (model.layers[0] is Flatten)
weights, biases = model.layers[1].get_weights()
print(weights)
print(biases)

[[-0.0427193   0.06983759  0.01617632 ...  0.03645301  0.05874191
  -0.02300327]
 [-0.04519181  0.01063262 -0.03289299 ...  0.02470661  0.0589585
  -0.0414156 ]
 [ 0.06095725  0.02318412 -0.03409308 ... -0.02777764 -0.01964277
   0.05559188]
 ...
 [ 0.00396863 -0.05053726 -0.03997453 ...  0.06997421 -0.05281679
  -0.03008704]
 [-0.0585525   0.0056704  -0.03836118 ... -0.05477173 -0.05920067
  -0.01448533]
 [-0.06434825  0.06261031 -0.02577966 ... -0.06849533 -0.0001585
  -0.03367925]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.

To initialize the weights or biases differently, set _kernel_initializer_ or _bias_initializer_ when creating the layer.
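As a sketch, the layer below swaps the defaults (_"glorot_uniform"_ for the weights, zeros for the biases) for He initialization, which is often paired with ReLU, and a small constant bias. The specific choices and the 784-feature input are illustrative assumptions:

```python
from tensorflow import keras

# Illustrative choices: He-normal weights and a constant 0.01 bias
layer = keras.layers.Dense(
    300,
    activation="relu",
    kernel_initializer="he_normal",
    bias_initializer=keras.initializers.Constant(0.01),
)
layer.build((None, 784))  # create the weights for a 784-feature input

weights, biases = layer.get_weights()
print(weights.shape)  # (784, 300)
print(biases[:3])     # [0.01 0.01 0.01]
```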

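If _input_shape_ isn't specified, Keras waits until it knows the input shape: either when it first receives actual data (e.g. during training) or when you call the _build()_ method. Until then the layers have no weights, so the random initialization doesn't happen before that point either.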
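A minimal sketch of this deferred building, using the same architecture as above but without _input_shape_ (the name _model_lazy_ is just for illustration). The _None_ in the shape passed to _build()_ means "any batch size":

```python
from tensorflow import keras

# Same architecture, but Keras doesn't know the input shape yet
model_lazy = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
print(model_lazy.built)  # False: no weights exist yet

# Giving the input shape creates and randomly initializes all the weights
model_lazy.build(input_shape=(None, 28, 28))
print(model_lazy.built)           # True
print(model_lazy.count_params())  # 266610
```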

In [None]:
# Compiling the model and specifying the loss function and the optimizer
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])
# Equivalent to passing loss=keras.losses.sparse_categorical_crossentropy,
# optimizer=keras.optimizers.SGD() and metrics=[keras.metrics.sparse_categorical_accuracy]

We use the __sparse_categorical_crossentropy__ loss because the labels are sparse: each instance has a single target class index (0 to 9), and the classes are mutually exclusive. If instead we had one target probability per class for each instance (e.g. one-hot vectors, such as class 0 = [1, 0, ..., 0]), we would use __categorical_crossentropy__. For binary classification, the output layer would use the __sigmoid__ activation function and the loss would be __binary_crossentropy__.
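To see that the two categorical losses only differ in the label format, here is a sketch that feeds the same made-up predictions (2 instances, 3 classes; the numbers are invented for illustration) to both Keras loss functions. Both return, per instance, minus the log of the probability assigned to the correct class:

```python
import numpy as np
from tensorflow import keras

# Made-up predicted probabilities: 2 instances, 3 classes
y_proba = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6]], dtype="float32")

y_sparse = np.array([0, 2])                     # one class index per instance
y_one_hot = np.array([[1., 0., 0.],
                      [0., 0., 1.]], dtype="float32")  # one probability per class

sparse_loss = keras.losses.sparse_categorical_crossentropy(y_sparse, y_proba).numpy()
dense_loss = keras.losses.categorical_crossentropy(y_one_hot, y_proba).numpy()

print(np.allclose(sparse_loss, dense_loss))            # True
print(np.allclose(sparse_loss, -np.log([0.7, 0.6])))   # True
```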

To convert sparse labels to one-hot vectors, use _keras.utils.to_categorical()_. To go the other way, use _argmax(axis=1)_ (e.g. _np.argmax(y, axis=1)_).
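A short round trip between both formats, on three example labels (4, 0 and 9, i.e. Coat, T-shirt/top and Ankle boot in _class_names_):

```python
import numpy as np
from tensorflow import keras

y_sparse = np.array([4, 0, 9])  # Coat, T-shirt/top, Ankle boot

# Sparse class indices -> one-hot vectors (one column per class)
y_one_hot = keras.utils.to_categorical(y_sparse, num_classes=10)
print(y_one_hot[0])  # [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

# One-hot vectors -> sparse class indices
y_back = y_one_hot.argmax(axis=1)
print(y_back)        # [4 0 9]
```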

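Using _optimizer="sgd"_ gives SGD with its default learning rate of 0.01. Since the learning rate usually needs tuning, it's better to pass an optimizer instance: _keras.optimizers.SGD(learning_rate=???)_ (_lr_ is the older, deprecated name of this argument).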
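A sketch of compiling the same model with an explicit optimizer object. The value 0.01 simply matches the _"sgd"_ default and is only a starting point for tuning, not a recommended setting:

```python
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# Explicit optimizer so the learning rate can be chosen (and later tuned)
optimizer = keras.optimizers.SGD(learning_rate=0.01)

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])
```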