1. TensorFlow is a library for running highly optimized numerical operations on a wide range of platforms.
2. TensorFlow isn't a drop-in replacement for NumPy, because some functions behave differently and some have different names.
3. No. The values are identical, but the dtypes differ: tf.range(10) returns an int32 tensor, while np.arange(10) defaults to int64 on most platforms, so tf.constant(np.arange(10)) is an int64 tensor.

In [1]:
import tensorflow as tf
tf.range(10)

<tf.Tensor: shape=(10,), dtype=int32, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)>

In [2]:
import numpy as np
tf.constant(np.arange(10))

<tf.Tensor: shape=(10,), dtype=int64, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])>
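
To make answer 2 more concrete, here is a minimal sketch of three differences between NumPy and TensorFlow: a differently named reduction, a different default float precision, and immutability of tensors. The variable names are illustrative only.

```python
import numpy as np
import tensorflow as tf

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)
t = tf.constant(a)

# Different names for the same reduction.
np_total = np.sum(a)
tf_total = tf.reduce_sum(t)

# Different default precision for Python floats.
np_default = np.array(1.0)      # float64
tf_default = tf.constant(1.0)   # float32

# Different mutability: arrays support item assignment, tensors do not.
b = a.copy()
b[0, 0] = 10.0
try:
    t[0, 0] = 10.0
    assignment_failed = False
except TypeError:
    assignment_failed = True
```

Differences like these are why NumPy code usually needs small adjustments before it runs on tensors.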

4.
- RaggedTensors
- Sets
- SparseTensors
- TensorArray
- StringTensors
- Queues
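
As a rough illustration of the structures listed above, the sketch below builds one small instance of each. The values are made up for the example; sets are represented here as regular tensors handled through tf.sets.

```python
import tensorflow as tf

# Ragged tensor: rows of different lengths.
ragged = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
row_lengths = ragged.row_lengths()

# Sparse tensor: only the non-zero entries are stored.
sparse = tf.SparseTensor(indices=[[0, 1], [1, 2]],
                         values=[1.0, 2.0],
                         dense_shape=[2, 3])
dense = tf.sparse.to_dense(sparse)

# String tensor: elements are byte strings.
strings = tf.constant(["café", "TensorFlow"])
char_lengths = tf.strings.length(strings, unit="UTF8_CHAR")

# Sets: regular tensors combined with tf.sets operations,
# which return sparse tensors.
set_a = tf.constant([[1, 2, 3]])
set_b = tf.constant([[2, 3, 4]])
union = tf.sparse.to_dense(tf.sets.union(set_a, set_b))

# Tensor array: a list of tensors; write() returns the updated array.
array = tf.TensorArray(dtype=tf.float32, size=3)
array = array.write(0, 1.0).write(1, 2.0).write(2, 3.0)
stacked = array.stack()

# Queue: first in, first out.
queue = tf.queue.FIFOQueue(capacity=3, dtypes=[tf.int32], shapes=[()])
queue.enqueue([10])
queue.enqueue([20])
first = queue.dequeue()
```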

5. By default you should write a function. When your loss has hyperparameters, subclass keras.losses.Loss and implement get_config(), because otherwise the hyperparameter values won't be saved together with the model.
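
A minimal sketch of the subclassing case from answer 5, assuming a Huber-style loss with a hypothetical `threshold` hyperparameter. The class name and values are illustrative, not taken from the notebook.

```python
import tensorflow as tf
from tensorflow import keras

class HuberLoss(keras.losses.Loss):
    """Huber loss whose threshold is stored in the saved config."""
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold ** 2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)

    def get_config(self):
        # Add the hyperparameter to the base config so it is saved and restored.
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

huber = HuberLoss(threshold=2.0)
restored = HuberLoss.from_config(huber.get_config())
example_loss = huber(tf.constant([[0.0], [0.0]]), tf.constant([[1.0], [4.0]]))
```

With a plain function, the threshold would have to be supplied again by hand whenever the model is reloaded.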

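6. The same applies to metrics: write a function by default, and subclass keras.metrics.Metric when the metric has hyperparameters or needs to keep state across the whole dataset (a streaming metric).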
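
The sketch below shows why state matters for some metrics. It assumes a simplified binary precision metric (the class name is hypothetical and `sample_weight` is ignored): averaging per-batch precision gives a different number than tracking the counts over all batches.

```python
import tensorflow as tf
from tensorflow import keras

class StreamingPrecision(keras.metrics.Metric):
    """Binary precision accumulated over all batches seen so far."""
    def __init__(self, threshold=0.5, name="streaming_precision", **kwargs):
        super().__init__(name=name, **kwargs)
        self.threshold = threshold
        self.true_positives = self.add_weight(name="tp", initializer="zeros")
        self.predicted_positives = self.add_weight(name="pp", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # sample_weight is ignored in this simplified sketch.
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)
        predicted = tf.cast(y_pred >= self.threshold, tf.float32)
        self.true_positives.assign_add(tf.reduce_sum(y_true * predicted))
        self.predicted_positives.assign_add(tf.reduce_sum(predicted))

    def result(self):
        return tf.math.divide_no_nan(self.true_positives, self.predicted_positives)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

precision = StreamingPrecision()
# Batch 1: 5 positive predictions, 4 correct (80% precision).
precision.update_state([1, 1, 1, 1, 0], [0.9, 0.9, 0.9, 0.9, 0.9])
# Batch 2: 3 positive predictions, 0 correct (0% precision).
precision.update_state([0, 0, 0], [0.9, 0.9, 0.9])
overall = float(precision.result())  # 4 correct out of 8 positive predictions
```

The mean of the two batch precisions would be 40%, while the precision over all eight positive predictions is 50%, which is what the stateful metric reports.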

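7. When should you create a custom layer versus a custom model? Technically it makes little difference, since keras.Model is a subclass of keras.layers.Layer. The distinction is mostly conceptual: layers are the reusable internal building blocks, while a model is the object you compile, train, evaluate, and save.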

8. A custom training loop is required when you need very advanced features, for example different optimizers for different parts of your model, or for debugging.
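
One way the "different optimizers for different parts of the model" case from answer 8 might look is sketched below. The toy data, layer sizes, and learning rates are assumptions chosen only to keep the example small.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

tf.random.set_seed(42)
rng = np.random.default_rng(42)
X_toy = rng.random((32, 3)).astype(np.float32)
y_toy = X_toy.sum(axis=1, keepdims=True)

lower = keras.layers.Dense(4, activation="relu")
upper = keras.layers.Dense(1)
toy_model = keras.Sequential([lower, upper])
toy_model(X_toy)  # call once so that the weights are created

# One optimizer per part of the model.
lower_optimizer = keras.optimizers.SGD(learning_rate=0.01)
upper_optimizer = keras.optimizers.Adam(learning_rate=0.01)

initial_upper_bias = upper.bias.numpy().copy()
for step in range(20):
    with tf.GradientTape() as tape:
        y_pred = toy_model(X_toy, training=True)
        loss = tf.reduce_mean(tf.square(y_toy - y_pred))
    lower_vars = lower.trainable_variables
    upper_vars = upper.trainable_variables
    # Compute all gradients in one pass, then split them per optimizer.
    gradients = tape.gradient(loss, lower_vars + upper_vars)
    lower_optimizer.apply_gradients(zip(gradients[:len(lower_vars)], lower_vars))
    upper_optimizer.apply_gradients(zip(gradients[len(lower_vars):], upper_vars))
```

This kind of split is not expressible through a single compile() call with one optimizer, which is why a hand-written loop is used here.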

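9. They can contain arbitrary Python code, but once a component is converted to a TF Function, anything that is not a TensorFlow operation is evaluated only during graph creation (tracing), not on every call.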
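
A small sketch of the behavior described in answer 9, using a hypothetical `add_one` function: the Python side effect runs once, when the function is traced, while the TensorFlow operation runs on every call.

```python
import tensorflow as tf

trace_log = []

@tf.function
def add_one(x):
    # Plain Python side effect: executed only while the graph is traced.
    trace_log.append("traced")
    # TensorFlow operation: part of the graph, executed on every call.
    return x + 1

# Three calls with the same input signature (scalar int32) share one trace.
results = [int(add_one(tf.constant(value))) for value in (1, 2, 3)]
number_of_traces = len(trace_log)
```

Calling the function with a new dtype or shape would trigger another trace and append to `trace_log` again.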


In [3]:
import tensorflow as tf
from tensorflow import keras

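class LayerNormalization(keras.layers.Layer):
    """Layer normalization with a learnable scale (alpha) and offset (beta)."""
    def build(self, input_shape):
        # One scale and one offset per feature on the last axis.
        self.alpha = self.add_weight(shape=input_shape[-1:], name="alpha", dtype=tf.float32, initializer="ones")
        self.beta = self.add_weight(shape=input_shape[-1:], name="beta", dtype=tf.float32, initializer="zeros")
        super().build(input_shape)
    def call(self, inputs, **kwargs):
        # Per-instance mean and variance over the last axis.
        mean, variance = tf.nn.moments(inputs, axes=-1, keepdims=True)
        epsilon = 0.001
        std = tf.sqrt(variance + epsilon)
        # Note: epsilon is applied twice here (inside the sqrt and added to std).
        # keras.layers.LayerNormalization applies it only inside the sqrt, which
        # likely accounts for the small difference measured in the next cell.
        return self.alpha * (inputs - mean) / (std + epsilon) + self.beta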


In [4]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full.astype(np.float32) / 255.
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test.astype(np.float32) / 255.
X = X_train.astype(np.float32)

custom_layer_norm = LayerNormalization()
keras_layer_norm = keras.layers.LayerNormalization()

tf.reduce_mean(keras.losses.mean_absolute_error(
    keras_layer_norm(X), custom_layer_norm(X)))

<tf.Tensor: shape=(), dtype=float32, numpy=0.0028432724>

In [None]:
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

In [8]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

In [6]:
n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(learning_rate=0.01)
loss_fn = keras.losses.sparse_categorical_crossentropy
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.SparseCategoricalAccuracy()]

In [None]:
from tqdm import trange
from collections import OrderedDict

def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

with trange(1, n_epochs + 1, desc="All epochs") as epochs:
    for epoch in epochs:
        with trange(1, n_steps + 1, desc="Epoch {}/{}".format(epoch, n_epochs)) as steps:
            for step in steps:
                X_batch, y_batch = random_batch(X_train, y_train)
                with tf.GradientTape() as tape:
                    y_pred = model(X_batch, training=True)
                    main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
                    loss = tf.add_n([main_loss] + model.losses)
                gradients = tape.gradient(loss, model.trainable_variables)
                optimizer.apply_gradients(zip(gradients, model.trainable_variables))
                for variable in model.variables:
                    if variable.constraint is not None:
                        variable.assign(variable.constraint(variable))
                status = OrderedDict()
                mean_loss(loss)
                status["loss"] = mean_loss.result().numpy()
                for metric in metrics:
                    metric(y_batch, y_pred)
                    status[metric.name] = metric.result().numpy()
                steps.set_postfix(status)
            y_pred = model(X_valid)
            status["val_loss"] = np.mean(loss_fn(y_valid, y_pred))
            status["val_accuracy"] = np.mean(keras.metrics.sparse_categorical_accuracy(
                tf.constant(y_valid, dtype=np.float32), y_pred
            ))
            steps.set_postfix(status)
        for metric in [mean_loss] + metrics:
            metric.reset_states()

All epochs:   0%|          | 0/5 [00:00<?, ?it/s]
Epoch 1/5:   0%|          | 0/1718 [00:00<?, ?it/s]
Epoch 1/5:   0%|          | 0/1718 [00:00<?, ?it/s, loss=2.52, sparse_categorical_accuracy=0.156]
Epoch 1/5:   0%|          | 0/1718 [00:00<?, ?it/s, loss=2.63, sparse_categorical_accuracy=0.156]
Epoch 1/5:   0%|          | 0/1718 [00:00<?, ?it/s, loss=2.48, sparse_categorical_accuracy=0.281]
Epoch 1/5:   0%|          | 0/1718 [00:00<?, ?it/s, loss=2.34, sparse_categorical_accuracy=0.297]
Epoch 1/5:   0%|          | 0/1718 [00:00<?, ?it/s, loss=2.19, sparse_categorical_accuracy=0.3]
Epoch 1/5:   0%|          | 0/1718 [00:00<?, ?it/s, loss=2.02, sparse_categorical_accuracy=0.339]
Epoch 1/5:   0%|          | 0/1718 [00:00<?, ?it/s, loss=1.95, sparse_categorical_accuracy=0.357]
Epoch 1/5:   0%|          | 7/1718 [00:00<00:25, 67.30it/s, loss=1.95, sparse_categorical_accuracy=0.357]
Epoch 1/5:   0%|          | 7/1718 [00:00<00:25, 67.30it/s, loss=1.9, sparse_ca