## Notes on Chapter 12 of Aurélien Géron's Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd edition)

In [39]:
import numpy as np

np.random.seed(42)

import tensorflow as tf

tf.random.set_seed(42)

from tensorflow import keras

About 95% of the use cases you encounter will require nothing beyond tf.keras and tf.data. For the remaining cases, this chapter shows how to create custom:
  - loss functions
  - metrics
  - layers
  - models
  - initializers
  - regularizers
  - weight constraints
  - training loops
    - 99% of the time you will not need a custom training loop
    - in the remaining 1% of cases, a custom loop lets you apply
      - special transformations or constraints to the gradients
      - multiple optimizers for different parts of the network
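
As a rough illustration of the "special transformations to the gradients" case in the list above, here is a minimal sketch of a single manual training step that clips each gradient by its norm before applying it. The toy data, the one-layer model, the `train_step` helper, and the `clip_norm` value are all hypothetical and chosen only to keep the example short.

```python
import tensorflow as tf
from tensorflow import keras

tf.random.set_seed(42)

# Hypothetical toy data and model, used only for illustration.
X = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
model = keras.Sequential([keras.layers.Dense(1)])
model.build(input_shape=(None, 4))
optimizer = keras.optimizers.SGD(learning_rate=0.01)
loss_fn = keras.losses.MeanSquaredError()


def train_step(X_batch, y_batch, clip_norm=1.0):
    """Run one manual training step, clipping each gradient by its norm."""
    with tf.GradientTape() as tape:
        y_pred = model(X_batch, training=True)
        loss = loss_fn(y_batch, y_pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    # The custom transformation: rescale any gradient whose norm exceeds clip_norm.
    clipped = [tf.clip_by_norm(g, clip_norm) for g in gradients]
    optimizer.apply_gradients(zip(clipped, model.trainable_variables))
    return loss, clipped


loss, clipped = train_step(X, y)
```

A full custom loop would call such a step once per batch and per epoch; `model.fit()` remains the simpler choice whenever this level of control is not needed.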

## TensorFlow Library Tour

  - Well suited for heavy computations
  - Fine-tuned for large-scale machine learning
  - Developed by the Google Brain team. Powers many of Google's large-scale services
  - Overview of Tensorflow offerings:
    - Similar to numpy, but with GPU support
    - Each operation
      - is implemented using highly efficient C++ code
      - has multiple implementations called kernels (one for CPUs, one for GPUs, one for TPUs)
    - Supports distributed computing
    - Extracts the computation graph from a Python function, optimizes it (e.g. by pruning unused nodes), and runs it efficiently (e.g. by running independent operations in parallel)
    - Lets you train a model in one environment (e.g. Python on Linux), export its computation graph, and run it in another (e.g. Java on Android)
    - Implements automatic differentiation (autodiff) to compute gradients
    - Provides excellent optimizers such as RMSProp and Nadam
  - Main uses of Tensorflow:
    - High-level deep learning APIs
      - tf.keras (recommended)
      - tf.estimator
    - Low-level deep learning APIs
      - tf.nn
      - tf.losses
      - tf.metrics
      - tf.optimizers
      - tf.train
      - tf.initializers
    - Autodiff
      - tf.GradientTape
      - tf.gradients()
    - I/O and processing
      - tf.data
      - tf.feature_column
      - tf.audio
      - tf.image
      - tf.io
      - tf.queue
    - Visualization with TensorBoard
      - tf.summary
    - Deployment and optimization
      - tf.distribute
      - tf.saved_model
      - tf.autograph
      - tf.graph_util
      - tf.lite
      - tf.quantization
      - tf.tpu
      - tf.xla
    - Special data structures
      - tf.lookup
      - tf.nest
      - tf.ragged
      - tf.sets
      - tf.sparse
      - tf.strings
    - Mathematics, including linear algebra and signal processing
      - tf.math
      - tf.linalg
      - tf.signal
      - tf.random
      - tf.bitwise
    - Miscellaneous
      - tf.compat
      - tf.config
      - and more ...
  - TensorFlow library ecosystem
    - TensorBoard for visualization
    - TensorFlow Extended (TFX) to productionize TensorFlow projects. Includes tools for
      - Data validation
      - Preprocessing
      - Model analysis
      - Serving with TF Serving
    - To download and re-use pre-trained models for particular datasets:
      - https://www.tensorflow.org/resources/models-datasets
      - https://github.com/jtoy/awesome-tensorflow
      - Tensorflow Hub: https://www.tensorflow.org/hub/
      - https://github.com/tensorflow/models/
      - Machine learning papers with code: https://paperswithcode.com/
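
Two of the features listed in the overview above, graph extraction from a Python function and autodiff, can be sketched in a few lines. The function `f` and the variable values are hypothetical; `tf.function` and `tf.GradientTape` are the APIs the tour refers to.

```python
import tensorflow as tf


def f(w1, w2):
    """Hypothetical function: f(w1, w2) = 3*w1^2 + 2*w1*w2."""
    return 3 * w1 ** 2 + 2 * w1 * w2


# tf.function traces the Python function into a computation graph.
f_graph = tf.function(f)

w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f_graph(w1, w2)

# Autodiff: df/dw1 = 6*w1 + 2*w2 = 36 and df/dw2 = 2*w1 = 10.
grads = tape.gradient(z, [w1, w2])
```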

## Using TensorFlow like NumPy

### Tensor operations

In [40]:
t = tf.constant([[1., 2., 3.], [4., 5., 6.]])    # matrix
t

<tf.Tensor: id=86, shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [41]:
tf.constant(10)                                  # scalar

<tf.Tensor: id=87, shape=(), dtype=int32, numpy=10>

In [42]:
t.shape, t.dtype

(TensorShape([2, 3]), tf.float32)

#### TensorFlow indexing works much like NumPy indexing

In [43]:
t[:, :1]                                         # Selecting a column as a 2-D array

<tf.Tensor: id=91, shape=(2, 1), dtype=float32, numpy=
array([[1.],
       [4.]], dtype=float32)>

In [44]:
t[..., 1]                                         # Selecting a column as a 1-D array

<tf.Tensor: id=95, shape=(2,), dtype=float32, numpy=array([2., 5.], dtype=float32)>

In [45]:
t[..., 1, tf.newaxis]                            # t[..., 1] keeps every leading dimension and
                                                 # selects index 1 of the last one, giving the
                                                 # 1-D tensor [2, 5].
                                                 # tf.newaxis in the last position then appends
                                                 # a length-1 axis, wrapping each element in
                                                 # its own brackets:
                                                 #   [[2], [5]], shape (2, 1)

<tf.Tensor: id=99, shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>

In [46]:
t[0, ..., tf.newaxis]                            # t[0, ...] selects index 0 of the first
                                                 # dimension and keeps all remaining dimensions,
                                                 # giving the 1-D tensor [1, 2, 3].
                                                 # tf.newaxis in the last position then appends
                                                 # a length-1 axis, wrapping each element in
                                                 # its own brackets:
                                                 #   [[1], [2], [3]], shape (3, 1)

<tf.Tensor: id=103, shape=(3, 1), dtype=float32, numpy=
array([[1.],
       [2.],
       [3.]], dtype=float32)>

In [47]:
t[0, tf.newaxis, ...]                            # t[0, ...] again gives the 1-D tensor [1, 2, 3].
                                                 # Here tf.newaxis comes before the remaining
                                                 # dimensions, so the length-1 axis is inserted
                                                 # in front and the whole row is wrapped in one
                                                 # extra pair of brackets:
                                                 #   [[1, 2, 3]], shape (1, 3)

<tf.Tensor: id=107, shape=(1, 3), dtype=float32, numpy=array([[1., 2., 3.]], dtype=float32)>
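
The three `tf.newaxis` selections above differ only in where the length-1 axis lands. A small sketch comparing them with `tf.expand_dims`, which inserts the same axis at an explicit position (the variable names are illustrative):

```python
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])

col = t[..., 1, tf.newaxis]         # column 1 kept as a (2, 1) matrix
row_as_col = t[0, ..., tf.newaxis]  # row 0 turned into a (3, 1) column
row_as_row = t[0, tf.newaxis, ...]  # row 0 kept as a (1, 3) matrix

# tf.expand_dims states the position of the new axis explicitly.
same_col = tf.expand_dims(t[..., 1], axis=-1)
same_row = tf.expand_dims(t[0], axis=0)

print(col.shape, row_as_col.shape, row_as_row.shape)
```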

In [48]:
t + 10

<tf.Tensor: id=109, shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>

In [49]:
tf.square(t)

<tf.Tensor: id=110, shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>

In [50]:
t @ tf.transpose(t)                              # @ represents matrix multiplication. You can
                                                 # also say tf.matmul(t, tf.transpose(t))

<tf.Tensor: id=113, shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>

In [51]:
tf.matmul(t, tf.transpose(t))

<tf.Tensor: id=116, shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>

In [52]:
# Other math operations:          tf.add(), tf.multiply(), tf.square(), tf.exp(), tf.sqrt()
# Same names as in NumPy:         tf.reshape(), tf.squeeze(), tf.tile()
# Different names than in NumPy:  tf.reduce_mean(), tf.reduce_sum(), tf.reduce_max(),
#                                 tf.math.log()
#                                 (NumPy: np.mean(), np.sum(), np.max(), np.log())
# A different name usually signals a difference in behavior. For example, NumPy's t.T
# is tf.transpose(t) in TensorFlow: t.T returns a transposed view of the same data,
# whereas tf.transpose(t) creates a new tensor with its own copy of the transposed data.
#
# Many functions and classes have aliases, e.g. tf.add() is the same as tf.math.add().
# This keeps the packages organized while giving common operations concise names.
#
# If you want code that is portable to other Keras implementations, use only the
# functions in keras.backend. Note that they cover only a subset of the functions
# available in TensorFlow. For example:
K = keras.backend
K.square(K.transpose(t)) + 10

<tf.Tensor: id=121, shape=(3, 2), dtype=float32, numpy=
array([[11., 26.],
       [14., 35.],
       [19., 46.]], dtype=float32)>
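
To make the naming differences above concrete, here is a minimal sketch that runs the differently named TensorFlow reductions next to their NumPy counterparts. The array `a` is simply a NumPy copy of `t` created for the comparison.

```python
import numpy as np
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
a = np.array([[1., 2., 3.], [4., 5., 6.]], dtype=np.float32)

col_means_tf = tf.reduce_mean(t, axis=0)  # counterpart of np.mean
col_means_np = np.mean(a, axis=0)
total_tf = tf.reduce_sum(t)               # counterpart of np.sum
row_max_tf = tf.reduce_max(t, axis=1)     # counterpart of np.max
logs_tf = tf.math.log(t)                  # counterpart of np.log

# NumPy's .T is a view: it shares memory with the original array.
is_view = np.shares_memory(a, a.T)

# An alias and its full name compute the same result.
alias_matches = bool(tf.reduce_all(tf.add(t, 10.) == tf.math.add(t, 10.)))
```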

### Tensors and NumPy

In [53]:
a = np.array([2., 4., 5.])
tf.constant(a)                            # Results in a float64 tensor.

<tf.Tensor: id=122, shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>

In [54]:
# NumPy uses 64-bit precision by default, whereas TensorFlow uses 32-bit.
# 32-bit precision is generally enough for neural networks, runs faster, and uses less RAM.
# When creating a tensor from a NumPy array, set dtype=tf.float32.
tf.constant(a, dtype=tf.float32)

<tf.Tensor: id=123, shape=(3,), dtype=float32, numpy=array([2., 4., 5.], dtype=float32)>

In [55]:
t.numpy()                                 # Get numpy array from tensor

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)
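
Beyond explicit conversion, TensorFlow operations accept NumPy arrays and NumPy operations accept tensors. A short sketch reusing the values of `a` and `t` from the cells above:

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])
t = tf.constant([[1., 2., 3.], [4., 5., 6.]])

squared_a = tf.square(a)  # TensorFlow op on a NumPy array: returns a float64 tf.Tensor
squared_t = np.square(t)  # NumPy op on a tensor: returns a float32 np.ndarray

print(type(squared_a).__name__, squared_a.dtype)
print(type(squared_t).__name__, squared_t.dtype)
```

Note that the dtype follows the input, so `tf.square(a)` stays in float64 unless `a` is cast first.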

### Type conversions

In [56]:
# TensorFlow does not perform type conversions automatically.
# Both of the following raise InvalidArgumentError:
# tf.constant(2.) + tf.constant(40)                      # float32 + int32
# tf.constant(2.) + tf.constant(40., dtype=tf.float64)   # float32 + float64
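
When a conversion really is needed, `tf.cast()` performs it explicitly. A minimal sketch, assuming default TensorFlow settings; the `mixed_ok` flag and the variable names are only there to record what happens:

```python
import tensorflow as tf

t_float32 = tf.constant(2.)
t_int32 = tf.constant(40)
t_float64 = tf.constant(40., dtype=tf.float64)

# Mixing dtypes fails instead of converting silently.
try:
    t_float32 + t_int32
    mixed_ok = True
except (tf.errors.InvalidArgumentError, TypeError):
    mixed_ok = False

# An explicit cast makes the conversion visible in the code.
total = t_float32 + tf.cast(t_float64, tf.float32)
```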

### Variables

In [57]:
# Tensors (tf.Tensor) are immutable, so they cannot be used to hold the weights
# of a neural network, which must be updated during training.
# Use a tf.Variable instead. It can be modified in place using:
#   assign()      to assign a new value
#   assign_add()  to increment by a given value
#   assign_sub()  to decrement by a given value
#
# You can also modify individual cells (or slices) by calling assign() on the cell
# (or slice), or by using scatter_update() or scatter_nd_update() ("nd" stands for
# n-dimensional).
#
# In practice you will rarely need to create variables manually: Keras layers
# provide an add_weight() method that does it for you.
v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [58]:
v.assign(2 * v)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [59]:
v[0, 1].assign(42)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [60]:
v[:, 2].assign([0., 1.])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  0.],
       [ 8., 10.,  1.]], dtype=float32)>

In [61]:
v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[100.,  42.,   0.],
       [  8.,  10., 200.]], dtype=float32)>
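
The remaining update methods mentioned in the notes above (`assign_add()`, `assign_sub()`, and `scatter_update()`) can be sketched the same way. The values are arbitrary, and the `item_assignment_ok` flag only records that plain item assignment is rejected.

```python
import tensorflow as tf

v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

v.assign_add(tf.ones_like(v))      # increment every cell by 1
v.assign_sub(2 * tf.ones_like(v))  # decrement every cell by 2

# Plain Python item assignment is not supported on variables.
try:
    v[0, 1] = 42.
    item_assignment_ok = True
except TypeError:
    item_assignment_ok = False

v[0, 1].assign(42.)                # use the cell's assign() instead

# scatter_update() replaces whole rows described by a tf.IndexedSlices object.
v.scatter_update(tf.IndexedSlices(values=[[7., 8., 9.]], indices=[1]))

print(v.numpy())
```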

### Other Data Structures
  - Sparse tensors (tf.SparseTensor)
    - Efficiently represent tensors containing mostly zeros.
  - Tensor arrays (tf.TensorArray)
    - Lists of tensors. They have a fixed size by default but can optionally be made dynamic.
      All the tensors they contain must have the same shape and data type.
  - Ragged tensors (tf.RaggedTensor)
    - Lists of lists whose inner lists may have different lengths. All elements must have
      the same data type.
  - String tensors
    - Regular tensors of type tf.string. They hold byte strings, not Unicode strings: a Unicode
      string is encoded to UTF-8 automatically. Alternatively, represent a Unicode string as a
      tensor of type tf.int32, where each item is one Unicode code point.
  - Sets
    - Represented as regular tensors or sparse tensors. e.g. tf.constant([[1, 2], [3, 4]]) represents
      the two sets {1, 2} and {3, 4}. Each set is a vector along the tensor's last axis.
  - Queues
    - Store tensors across multiple steps. Available types:
      - FIFOQueue
      - PriorityQueue
      - RandomShuffleQueue (shuffles its items)
      - PaddingFIFOQueue (pads its differently shaped items)
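
Several of the data structures listed above can be tried in a few lines. This is a minimal sketch with made-up values, not an exhaustive tour; queues are left out for brevity.

```python
import tensorflow as tf

# Sparse tensor: only the non-zero cells are stored.
s = tf.SparseTensor(indices=[[0, 1], [1, 0], [2, 3]],
                    values=[1., 2., 3.],
                    dense_shape=[3, 4])
dense = tf.sparse.to_dense(s)

# Tensor array: a list of tensors that share one shape and dtype.
ta = tf.TensorArray(dtype=tf.float32, size=2)
ta = ta.write(0, tf.constant([1., 2.]))
ta = ta.write(1, tf.constant([3., 4.]))
stacked = ta.stack()

# Ragged tensor: inner lists may have different lengths.
r = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
row_lengths = r.row_lengths()

# String tensor: a Unicode string is stored as UTF-8 bytes.
b = tf.constant("café")
n_bytes = tf.strings.length(b)                       # counts bytes
n_chars = tf.strings.length(b, unit="UTF8_CHAR")     # counts characters
code_points = tf.strings.unicode_decode(b, "UTF-8")  # one int32 per code point

# Sets: each row along the last axis is one set.
set1 = tf.constant([[1, 2], [3, 4]])
set2 = tf.constant([[2, 5], [4, 6]])
union = tf.sparse.to_dense(tf.sets.union(set1, set2))
```

The set operations in `tf.sets` return sparse tensors, which is why the union is converted with `tf.sparse.to_dense()` before inspection.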

## Customizing Models and Training Algorithms