# Basic TensorFlow operations

In [3]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow import data  # same module, imported in a form that works across TensorFlow versions
import numpy as np

## Data preparation

### Creating datasets

In [11]:
def dataset_slice(dataset, n):
    """Return the first n items of dataset as a list."""
    items = []
    for i, item in enumerate(dataset):
        if i >= n:  # stop once n items have been collected
            break
        items.append(item)
    return items


`from_tensor_slices`(*array*) treats the first axis of *array* as the index over dataset elements.
For example, a NumPy array of shape (4, 2) yields a 4-element dataset in which each element has shape (2,).

In [4]:
dataset1 = data.Dataset.from_tensor_slices(np.array([
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0]
]))

In [12]:
dataset_slice(dataset1, 3)

[<tf.Tensor 'IteratorGetNext_46:0' shape=(2,) dtype=int32>,
 <tf.Tensor 'IteratorGetNext_47:0' shape=(2,) dtype=int32>,
 <tf.Tensor 'IteratorGetNext_48:0' shape=(2,) dtype=int32>]

`from_tensors`(*array*) creates a dataset with exactly one element, which is *array* itself.
A (4, 2)-shaped input therefore yields a dataset whose single element has shape (4, 2).

In [14]:
dataset2 = data.Dataset.from_tensors(np.array([
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0]
]))
dataset_slice(dataset2, 3)

[<tf.Tensor 'IteratorGetNext_50:0' shape=(4, 2) dtype=int32>,
 <tf.Tensor 'IteratorGetNext_51:0' shape=(4, 2) dtype=int32>,
 <tf.Tensor 'IteratorGetNext_52:0' shape=(4, 2) dtype=int32>]
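
The outputs above show only symbolic tensors. As a minimal sketch, assuming TensorFlow 2.x with eager execution (an assumption; the cells above were run in graph mode), the difference between the two constructors can be checked by listing the element shapes. The variable names are illustrative only.

```python
import numpy as np
import tensorflow as tf

array = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])

# One dataset element per row of the array.
sliced = tf.data.Dataset.from_tensor_slices(array)
# A single dataset element holding the whole array.
whole = tf.data.Dataset.from_tensors(array)

# Assumes eager execution, so that .numpy() is available on each element.
sliced_shapes = [item.numpy().shape for item in sliced]
whole_shapes = [item.numpy().shape for item in whole]

print(sliced_shapes)  # four elements of shape (2,)
print(whole_shapes)   # one element of shape (4, 2)
```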

The input need not be a NumPy array; a nested Python list works as well:

In [27]:
dataset_slice(
    data.Dataset.from_tensor_slices([
        [1, 1],
        [1, 0],
        [0, 1],
        [0, 0]
    ]), 5
)

[<tf.Tensor 'IteratorGetNext_66:0' shape=(2,) dtype=int32>,
 <tf.Tensor 'IteratorGetNext_67:0' shape=(2,) dtype=int32>,
 <tf.Tensor 'IteratorGetNext_68:0' shape=(2,) dtype=int32>,
 <tf.Tensor 'IteratorGetNext_69:0' shape=(2,) dtype=int32>,
 <tf.Tensor 'IteratorGetNext_70:0' shape=(2,) dtype=int32>]

In [28]:
dataset_slice(
    data.Dataset.from_tensor_slices([
        [1., 1],
        [1, 0],
        [0, 1],
        [0, 0]
    ]), 5
)

[<tf.Tensor 'IteratorGetNext_72:0' shape=(2,) dtype=float32>,
 <tf.Tensor 'IteratorGetNext_73:0' shape=(2,) dtype=float32>,
 <tf.Tensor 'IteratorGetNext_74:0' shape=(2,) dtype=float32>,
 <tf.Tensor 'IteratorGetNext_75:0' shape=(2,) dtype=float32>,
 <tf.Tensor 'IteratorGetNext_76:0' shape=(2,) dtype=float32>]

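When the input is a tuple of arrays, each array is sliced along its first axis separately, and the i-th dataset element is the tuple of the i-th slices. All arrays must therefore have the same size along the first axis.

The same rule explains the seemingly strange result for a tuple of lists below: the four inner lists are treated as four separate components of length 2, so the dataset has 2 elements, each a 4-tuple of scalars.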

In [17]:
dataset3 = data.Dataset.from_tensor_slices((
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0]
))

In [18]:
dataset_slice(dataset3, 2)

[(<tf.Tensor 'IteratorGetNext_54:0' shape=() dtype=int32>,
  <tf.Tensor 'IteratorGetNext_54:1' shape=() dtype=int32>,
  <tf.Tensor 'IteratorGetNext_54:2' shape=() dtype=int32>,
  <tf.Tensor 'IteratorGetNext_54:3' shape=() dtype=int32>),
 (<tf.Tensor 'IteratorGetNext_55:0' shape=() dtype=int32>,
  <tf.Tensor 'IteratorGetNext_55:1' shape=() dtype=int32>,
  <tf.Tensor 'IteratorGetNext_55:2' shape=() dtype=int32>,
  <tf.Tensor 'IteratorGetNext_55:3' shape=() dtype=int32>)]

In [36]:
dataset4 = data.Dataset.from_tensor_slices((
    np.array([
        [1, 0],
        [0, 1],
        [1, 1],
        [0, 0]
    ]),
    np.array([
        1, 1, 0, 0
    ])
))

dataset_slice(dataset4, 5)

[(<tf.Tensor 'IteratorGetNext_97:0' shape=(2,) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_97:1' shape=() dtype=int32>),
 (<tf.Tensor 'IteratorGetNext_98:0' shape=(2,) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_98:1' shape=() dtype=int32>),
 (<tf.Tensor 'IteratorGetNext_99:0' shape=(2,) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_99:1' shape=() dtype=int32>),
 (<tf.Tensor 'IteratorGetNext_100:0' shape=(2,) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_100:1' shape=() dtype=int32>),
 (<tf.Tensor 'IteratorGetNext_101:0' shape=(2,) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_101:1' shape=() dtype=int32>)]

In [30]:
dataset5 = data.Dataset.from_tensor_slices((
    np.array([
        [[1, 0]],
        [[0, 1]],
        [[1, 1]],
        [[0, 0]]
    ]),
    np.array([
        1, 1, 0, 0
    ])
))

dataset_slice(dataset5, 5)

[(<tf.Tensor 'IteratorGetNext_84:0' shape=(1, 2) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_84:1' shape=() dtype=int32>),
 (<tf.Tensor 'IteratorGetNext_85:0' shape=(1, 2) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_85:1' shape=() dtype=int32>),
 (<tf.Tensor 'IteratorGetNext_86:0' shape=(1, 2) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_86:1' shape=() dtype=int32>),
 (<tf.Tensor 'IteratorGetNext_87:0' shape=(1, 2) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_87:1' shape=() dtype=int32>),
 (<tf.Tensor 'IteratorGetNext_88:0' shape=(1, 2) dtype=int32>,
  <tf.Tensor 'IteratorGetNext_88:1' shape=() dtype=int32>)]
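
To see the actual values rather than symbolic tensors, here is a small sketch that assumes TensorFlow 2.x with eager execution (an assumption, since the cells above ran in graph mode). It shows that every component of a tuple is sliced along its own first axis; the names `features`, `labels`, `pairs` and `columns` are made up for this illustration.

```python
import numpy as np
import tensorflow as tf

features = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
labels = np.array([1, 1, 0, 0])

# Each component of the tuple is sliced along its own first axis,
# so every dataset element is a (feature_row, label) pair.
pairs = tf.data.Dataset.from_tensor_slices((features, labels))
pair_list = [(f.numpy().tolist(), int(l.numpy())) for f, l in pairs]
print(pair_list)

# A tuple of four 2-element lists is treated as four components of
# length 2: the dataset has two elements, each a 4-tuple of scalars.
columns = tf.data.Dataset.from_tensor_slices(([1, 1], [1, 0], [0, 1], [0, 0]))
column_list = [tuple(int(c.numpy()) for c in item) for item in columns]
print(column_list)
```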

### Operations on datasets

In [34]:
dataset1 = dataset1.shuffle(3)
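
`shuffle(buffer_size)` keeps a buffer of `buffer_size` elements and draws from it at random, and `batch(n)` groups consecutive elements. The following is a minimal sketch of both, assuming TensorFlow 2.x with eager execution; the toy dataset of six integers is only an illustration.

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([0, 1, 2, 3, 4, 5])

# buffer_size >= number of elements gives a uniform shuffle;
# a smaller buffer only mixes elements that are close together.
shuffled = dataset.shuffle(buffer_size=6)
# Batches of 4; the last batch holds the remaining 2 elements.
batched = shuffled.batch(4)

batch_sizes = [int(batch.shape[0]) for batch in batched]
# The order is random, but the set of values is unchanged.
values = sorted(int(v) for batch in batched for v in batch.numpy())
print(batch_sizes, values)
```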

## Models

### Creating models

#### Functional API

A model is a collection of connected neurons with designated input and output nodes.
In the code below we first create an input node (strictly speaking, a *bundle* of input nodes, encapsulated as a single vector in the computational graph) and then connect three dense layers to it.

In [2]:
inputs = keras.Input(shape=(10,), name='digits')
x = layers.Dense(32, activation='relu', name='dense_1')(inputs)
x = layers.Dense(32, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, activation='softmax', name='predictions')(x)

model1 = keras.Model(inputs=inputs, outputs=outputs, name='3_layer_mlp')


W0102 19:31:10.708741  4892 deprecation.py:506] From C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Pay special attention to the statement `x = layers.Dense(32, activation='relu', name='dense_1')(inputs)`.
Each `Layer`, `Input` or other Keras object corresponds to some nodes in the computational graph,
and calling the layer on `inputs` links `x` to `inputs`,
creating a path in the computational graph.
When creating a layer, we specify its own size but say nothing about the size of its antecedent.
This raises a question: since the shape of a layer's weights depends on both its own size and the size of its antecedent,
how can the layer be initialized when only its own size is given?
The short answer is that it is not always initialized right away.
If the size of the antecedent (here, `inputs`) is already known,
the shape of the weight matrix of `x` is determined as well, and `x` is *built* immediately.
Otherwise, `x` is not built until the first batch of data arrives,
and it takes the size of that data as the size of its antecedent.
Of course, if a later batch has a different size,
or the first batch has a shape that is incompatible with `x`, an error occurs.
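
The deferred initialization can be observed on a single layer. This is a minimal sketch, assuming TensorFlow 2.x with eager execution; the layer size 4 and the input size 3 are arbitrary values chosen for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

layer = layers.Dense(4)
built_before = layer.built  # no input size known yet, so no weights exist

# The first call fixes the input size (3 here) and creates the weights.
_ = layer(tf.zeros((1, 3)))
built_after = layer.built
kernel_shape = tuple(layer.kernel.shape)  # (input size, layer size)

print(built_before, built_after, kernel_shape)
```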


It is OK to reuse existing nodes to create another model. In this case, the two models share the weights of the reused layers.

In [3]:
y = layers.Dense(2, activation='softmax')(x)
model2 = keras.Model(inputs=inputs, outputs=y)
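
That the weights are really shared can be checked by comparing layer objects. The sketch below rebuilds a smaller pair of models for this purpose; the names `model_a`, `model_b` and `shared_dense` are hypothetical and not part of the notebook.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(10,))
hidden = layers.Dense(32, activation='relu', name='shared_dense')(inputs)
model_a = keras.Model(inputs=inputs,
                      outputs=layers.Dense(10, activation='softmax')(hidden))
model_b = keras.Model(inputs=inputs,
                      outputs=layers.Dense(2, activation='softmax')(hidden))

# Both models contain the very same layer object, hence the same weights.
same_layer = model_a.get_layer('shared_dense') is model_b.get_layer('shared_dense')
print(same_layer)
```

Training either model therefore also changes the hidden layer seen by the other.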

Now we inspect the structure of the two models:

In [9]:
model1.summary()

Model: "3_layer_mlp"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
digits (InputLayer)          [(None, 10)]              0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                352       
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056      
_________________________________________________________________
predictions (Dense)          (None, 10)                330       
Total params: 1,738
Trainable params: 1,738
Non-trainable params: 0
_________________________________________________________________


In [4]:
model2.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
digits (InputLayer)          [(None, 10)]              0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                352       
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense (Dense)                (None, 2)                 66        
Total params: 1,474
Trainable params: 1,474
Non-trainable params: 0
_________________________________________________________________


We can call `summary` on these two models because the size of the input node is specified,
so the size of every other node is known, and the models are therefore already *built* when summarized.
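
The parameter counts reported by `summary` follow directly from the layer sizes: a `Dense` layer with n inputs and m units has n·m weights plus m biases. The helper `dense_params` below is a hypothetical name introduced only for this check of the 10 → 32 → 32 → 10 and 10 → 32 → 32 → 2 networks defined above.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# model1: 10 -> 32 -> 32 -> 10
model1_counts = [dense_params(10, 32), dense_params(32, 32), dense_params(32, 10)]
# model2 shares the first two layers and ends in a 2-unit layer
model2_counts = [dense_params(10, 32), dense_params(32, 32), dense_params(32, 2)]

print(model1_counts, sum(model1_counts))  # [352, 1056, 330] 1738
print(model2_counts, sum(model2_counts))  # [352, 1056, 66] 1474
```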

By default, every `Input` node must have a given size,
so models created with `keras.Model` are always built before they are run.
That is not the case for sequential models, as we will see shortly.

Also note the output shapes in the summary.
They all have the form `(None, xxx)`.
All the layers we defined take one-dimensional inputs, so what does `None` stand for?
It is the batch size, which is unknown until data is fed.
When we define the input and output shapes of a node, the batch dimension is left out; Keras adds it as the leading axis.
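
The implicit batch dimension can also be read off a model programmatically. A minimal sketch, assuming TensorFlow 2.x; the one-layer model is only an illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(10,))  # the batch dimension is not given here
outputs = layers.Dense(32, activation='relu')(inputs)
model = keras.Model(inputs=inputs, outputs=outputs)

# Keras prepends the batch dimension as None.
input_shape = tuple(model.input_shape)
output_shape = tuple(model.output_shape)
print(input_shape, output_shape)
```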

#### Sequential models

To create a sequential model (i.e. a model in which neurons are organized as a sequence of layers, and neurons within one layer have few or no connections to each other), use the following method.

In [6]:
model3 = keras.Sequential()
model3.add(layers.Dense(8, activation='relu'))
model3.add(layers.Dense(8, activation='sigmoid'))

You can avoid the repeated `model.add` calls by passing the layers as a list:

In [4]:
model4 = keras.Sequential([
    layers.Dense(2, activation='relu'),
    layers.Dense(2, activation='sigmoid')
])

Calling `summary` on the model at this point results in an error:

In [7]:
model3.summary()

ValueError: This model has not yet been built. Build the model first by calling `build()` or calling `fit()` with some data, or specify an `input_shape` argument in the first layer(s) for automatic build.

Just as expected, the input shape of `model3` is not specified, so the model is not built
and cannot be summarized.
In fact, `Sequential` uses an input node slightly different from `keras.Input`.
To ensure the model is built from the beginning, we manually set the input shape of the first layer:

In [19]:
model5 = keras.Sequential([
    layers.Dense(2, activation='relu', input_shape=(2,)),
    layers.Dense(3, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
model5.summary()

W0102 23:48:19.868268 16272 deprecation.py:506] From C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 2)                 6         
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 9         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 4         
Total params: 19
Trainable params: 19
Non-trainable params: 0
_________________________________________________________________

Another way to build a sequential model is to invoke its `build` method:

In [3]:
model6 = keras.Sequential([
    layers.Dense(2, activation='relu'),
    layers.Dense(2, activation='relu')
])
model6.build((None,2))
model6.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              multiple                  6         
_________________________________________________________________
dense_4 (Dense)              multiple                  6         
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________


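Note: `multiple` is what `summary` prints when it cannot read a single output shape from a layer. That appears to be the case here because `build` creates the weights without connecting the layers to a symbolic input node, unlike the `input_shape` variant above. TODO: confirm.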

#### Layers

Details of a layer include
- activation function
- initializer
- regularizer

In [12]:
layer1 = layers.Dense(
    16, 
    activation=tf.keras.activations.sigmoid,
    kernel_regularizer=tf.keras.regularizers.l1(0.01),
    bias_regularizer=tf.keras.regularizers.l2(0.01),
    kernel_initializer='orthogonal',
    bias_initializer=tf.keras.initializers.Constant(2.0),
    input_shape=(2,),
    batch_size=2
)

Note: if a string *s* is passed as a regularizer or an initializer,
it is looked up by name, i.e. `tf.keras.regularizers.get`(*s*) or `tf.keras.initializers.get`(*s*) is used as the regularizer or the initializer.
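
String identifiers can also be resolved by hand with the `get` functions of the corresponding modules. The sketch below assumes TensorFlow 2.x; the class names it prints may differ between Keras versions, so they are not listed here.

```python
from tensorflow import keras

# Resolve string identifiers into the objects Keras would use.
initializer = keras.initializers.get('orthogonal')
regularizer = keras.regularizers.get('l2')

# An initializer is a callable that produces a tensor of the requested shape.
weights = initializer(shape=(2, 2))
weights_shape = tuple(weights.shape)
print(type(initializer).__name__, type(regularizer).__name__, weights_shape)
```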

In [28]:
layer1.build((None, 2))  # matches the input_shape=(2,) declared for layer1

### Compiling

Configuration of the model's optimizer, loss function and metrics is called compiling.

In [32]:
model1.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [37]:
model1.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.binary_crossentropy
)

In [38]:
model1.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.BinaryCrossentropy()
)

Note that CamelCase names are classes, while snake_case names are plain functions that compute the same quantity.
That is why we instantiate the class with `BinaryCrossentropy()` but pass `binary_crossentropy` as is.
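
The two forms can be compared directly on a few values. This is a minimal sketch, assuming TensorFlow 2.x with eager execution; `y_true` and `y_pred` are made-up numbers, not results from the models above.

```python
import numpy as np
from tensorflow import keras

y_true = np.array([[0.0], [1.0], [1.0], [0.0]], dtype='float32')
y_pred = np.array([[0.1], [0.8], [0.6], [0.3]], dtype='float32')

# Function form: called directly, returns one loss value per sample.
per_sample = np.asarray(keras.losses.binary_crossentropy(y_true, y_pred))

# Class form: instantiate first, then call the instance;
# with the default reduction it returns a single averaged value.
averaged = float(keras.losses.BinaryCrossentropy()(y_true, y_pred))

print(per_sample.shape, averaged)
```

Either form can be passed as `loss` to `compile`.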

In [20]:
model5.compile(
    optimizer='sgd',
    loss='mse'
)

### Saving and loading models

In [30]:
model1.save('./my_model.h5')
model1_prime = keras.models.load_model('./my_model.h5')
model1_prime.summary()

W0102 20:03:24.132149  4892 deprecation.py:506] From C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\init_ops.py:97: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0102 20:03:24.135138  4892 deprecation.py:506] From C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\init_ops.py:97: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0102 20:03:24.509174  4892 hdf5_format.py:221] No training configuration found in save file: the model was *not* compiled. Compile it manually.
Model: "3_layer_mlp"
_________________________________________________________________
La

In [31]:
model1.summary()

Model: "3_layer_mlp"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
digits (InputLayer)          [(None, 10)]              0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                352       
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056      
_________________________________________________________________
predictions (Dense)          (None, 10)                330       
Total params: 1,738
Trainable params: 1,738
Non-trainable params: 0
_________________________________________________________________


### Illustrating models

In [3]:
model1.summary()

Model: "3_layer_mlp"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
digits (InputLayer)          [(None, 10)]              0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                352       
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056      
_________________________________________________________________
predictions (Dense)          (None, 10)                330       
Total params: 1,738
Trainable params: 1,738
Non-trainable params: 0
_________________________________________________________________


In [4]:
keras.utils.plot_model(model1)
# TODO

Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work.


## Training

In [21]:
x_train = np.array([
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0]
])
y_train = np.array([
    [0],
    [1],
    [1],
    [0]
])

model5.fit(x_train, y_train, epochs=50)

Epoch 1/50
...
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x25643984630>

When each target is a one-element tensor, the labels may also be passed as scalars, i.e. `y_train` may have shape `(4,)` instead of `(4, 1)`:

In [22]:
x_train = np.array([
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0]
])
y_train = np.array([
    0,
    1,
    1,
    0
])

model5.fit(x_train, y_train, epochs=50)

Epoch 1/50
...
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x25647d9fcc0>

It is also possible to combine `x_train` and `y_train` into one dataset.
Note that in this case the dataset must be batched explicitly (here with `.batch(2)`):
when given a dataset, `model.fit` assumes that the dataset takes care of batching
and uses whatever the dataset yields as one batch.
Without batching, each `(2,)`-shaped sample is itself treated as a batch, which leads to the shape error in the second cell below.

In [42]:
model5.fit(tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(2), epochs=3)

W0103 00:36:06.083314 16272 training_utils.py:1300] Expected a shuffled dataset but input dataset `x` is not shuffled. Please invoke `shuffle()` on input dataset.
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x25649e77da0>

In [43]:
model5.fit(tf.data.Dataset.from_tensor_slices((x_train, y_train)), epochs=3)

W0103 00:38:23.392063 16272 training_utils.py:1300] Expected a shuffled dataset but input dataset `x` is not shuffled. Please invoke `shuffle()` on input dataset.


ValueError: Error when checking input: expected dense_input to have shape (2,) but got array with shape (1,)
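
The difference between the two calls lies in the shape of the elements the dataset yields. A minimal sketch, assuming TensorFlow 2.x with eager execution, that inspects the first element of each variant:

```python
import numpy as np
import tensorflow as tf

x = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([0, 1, 1, 0])

unbatched = tf.data.Dataset.from_tensor_slices((x, y))
batched = unbatched.batch(2)

# Without batch(), each element is a single sample of shape (2,);
# with batch(2), each element carries a leading batch dimension of size 2.
unbatched_x_shape = next(iter(unbatched))[0].numpy().shape
batched_x_shape = next(iter(batched))[0].numpy().shape
print(unbatched_x_shape, batched_x_shape)
```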

## Predicting and evaluating

In [6]:
model5.predict(np.array([[1, 0]]))

array([[0.635807]], dtype=float32)

In [7]:
model5.predict(np.array([
    [1, 0], 
    [0, 1],
    [1, 1],
    [0, 0]
]))

array([[0.6358069 ],
       [0.6995458 ],
       [0.7994131 ],
       [0.49304864]], dtype=float32)

Some notes on `predict`:
- Always pass a dataset or a NumPy array to it, not a built-in list or tuple.
- Though the input shape is `(2,)`, `predict` actually takes a `(None, 2)`-shaped input,
because the first dimension is always the batch index.
Likewise, though the output shape is `(1,)`, we get a `(None, 1)`-shaped output.
- The output shape `(1,)` means that each item in the output is a 1-dimensional *vector*, not a scalar.
This should not be surprising, given that the inputs and outputs of a layer are always organized as vectors,
even if there is only one neuron in the layer.
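
These shape rules can be verified on an untrained stand-in model. The sketch assumes TensorFlow 2.x; the model here is a hypothetical one-layer network, not `model5`, so the predicted values themselves are meaningless.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(2,))
outputs = layers.Dense(1, activation='sigmoid')(inputs)
model = keras.Model(inputs=inputs, outputs=outputs)

# A single sample still needs the leading batch dimension: shape (1, 2).
single = model.predict(np.array([[1.0, 0.0]], dtype='float32'), verbose=0)
# Four samples at once: shape (4, 2).
batch = model.predict(
    np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype='float32'), verbose=0)

print(single.shape, batch.shape)
```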

In [None]:
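# x_test and y_test are not defined above; the four XOR samples are reused here (an assumption for illustration).
model5.evaluate(x_train, y_train, verbose=2)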