### Linear Units in Keras

The first argument to layers.Dense, units, sets how many outputs the layer produces. Here we are predicting a single value, 'calories', so we use units=1. The input_shape argument tells Keras how many input features to expect.

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

# Create a network with 1 linear unit
model = keras.Sequential([
    layers.Dense(units=1, input_shape=[3])
])

<img src='pic/linear_unit.png'>

In [None]:
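# A Dense layer stores a kernel of shape (n_inputs, units) and a bias of shape (units,).
# The output shown below has shape (11, 1), so it came from a model with 11 inputs.
w, b = model.weights
print("Weights\n{}\n\nBias\n{}".format(w, b))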

Weights
<tf.Variable 'dense/kernel:0' shape=(11, 1) dtype=float32, numpy=
array([[ 0.19967222],
       [-0.68170553],
       [ 0.47476345],
       [ 0.392785  ],
       [ 0.14066797],
       [ 0.5223351 ],
       [ 0.25675356],
       [-0.595524  ],
       [ 0.6509276 ],
       [-0.39765382],
       [ 0.29334396]], dtype=float32)>

Bias
<tf.Variable 'dense/bias:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>
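
To make the role of the weights and bias printed above concrete, here is a minimal sketch of what a single linear unit computes: a weighted sum of its inputs plus the bias. The function name `linear_unit` and all numeric values are hypothetical and chosen only for illustration; this is plain Python, not the Keras implementation.

```python
def linear_unit(inputs, weights, bias):
    """Return the output of one linear unit: sum(w_i * x_i) + b."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical values for a unit with three inputs (not taken from the model above).
weights = [0.5, -1.0, 2.0]
bias = 4.0
inputs = [2.0, 3.0, 0.5]

output = linear_unit(inputs, weights, bias)
print(output)  # 0.5*2.0 + (-1.0)*3.0 + 2.0*0.5 + 4.0 = 3.0
```

A Dense layer with units=1 performs the same calculation for every row of input, using the kernel as the weights.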

### Stochastic Gradient Descent (SGD) - The Optimizer

The optimizer is an algorithm that adjusts the weights to minimize the loss. It trains the network iteratively, and each training step goes like this:

1. Sample some training data and run it through the network to make predictions.
2. Measure the loss between the predictions and the true values.
3. Adjust the weights in a direction that makes the loss smaller.
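
The three steps above can be sketched in plain Python for a model with one weight and one bias. This is an illustration under stated assumptions, not what Keras does internally: the toy data, the learning rate, the batch size, and the use of mean squared error (rather than the 'mae' loss used in the cell that follows) are all hypothetical choices made to keep the gradient simple.

```python
import random

random.seed(0)

# Hypothetical noiseless data: y = 3*x + 2 with x drawn from [-1, 1].
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
data = [(x, 3.0 * x + 2.0) for x in xs]


def mse(w, b, rows):
    """Mean squared error of the model w*x + b over the given rows."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


w, b = 0.0, 0.0          # start from arbitrary weights
learning_rate = 0.1      # assumed value, chosen for this toy problem
batch_size = 16          # assumed value

initial_loss = mse(w, b, data)

for step in range(500):
    # 1. Sample a minibatch and run it through the model.
    batch = random.sample(data, batch_size)
    errors = [(w * x + b) - y for x, y in batch]

    # 2. Measure the loss on this minibatch.
    batch_loss = sum(e ** 2 for e in errors) / batch_size

    # 3. Move the weights against the gradient of the loss.
    grad_w = 2.0 * sum(e * x for e, (x, _) in zip(errors, batch)) / batch_size
    grad_b = 2.0 * sum(errors) / batch_size
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b, data)
print(initial_loss, final_loss)
print(w, b)  # should end up close to 3 and 2
```

Each pass through the loop uses only a small random sample of the data, which is what makes the descent "stochastic".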

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(512, activation='relu', input_shape=[11]),
    layers.Dense(512, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(1),
])

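model.compile(
    optimizer='adam',  # Adam is an SGD variant with an adaptive learning rate
    loss='mae',        # mean absolute error
)

# X_train, y_train, X_valid, y_valid are assumed to be prepared in an earlier cell
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,  # samples per gradient step
    epochs=10,       # full passes over the training data
)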
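
As a rough illustration of what batch_size=256 and epochs=10 mean for the call above, the arithmetic below counts the gradient steps that training would take. The training-set size `n_train` is a hypothetical number, since the source does not say how many rows X_train holds.

```python
import math

n_train = 1000    # hypothetical number of training rows
batch_size = 256  # as in model.fit above
epochs = 10       # as in model.fit above

# With array inputs, the last batch of each epoch is simply smaller than the rest.
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_steps)  # 4 232 40
```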

### Overfitting and Underfitting - Early Stopping

<img src='pic/dl_earlystopping.png'>

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    min_delta=0.001,  # minimum amount of change to count as an improvement
    patience=20,  # how many epochs to wait before stopping
    restore_best_weights=True,  # keep the weights from the epoch with the lowest validation loss
)

These parameters say: "If the validation loss hasn't improved by at least 0.001 over the previous 20 epochs, stop training and keep the best model you found." It can be hard to tell whether the validation loss is rising because of overfitting or just because of random batch variation. The min_delta and patience settings give us some tolerance before training stops. For the callback to take effect, it must be passed to model.fit through its callbacks argument.
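
The stopping rule described above can be sketched as a small pure-Python function. This is a simplified model of the logic, not the Keras implementation; the function name `early_stopping_epoch` and the list of validation losses are hypothetical, and a patience of 3 is used instead of 20 so the example stays short.

```python
def early_stopping_epoch(val_losses, min_delta=0.001, patience=20):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    An epoch counts as an improvement only if it lowers the best loss
    seen so far by more than min_delta. Training stops once `patience`
    consecutive epochs have passed without such an improvement.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch   # the weights from this epoch would be kept
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before the patience was exhausted.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, one per epoch (epochs are 0-indexed).
losses = [1.0, 0.8, 0.7, 0.6995, 0.71, 0.72, 0.73, 0.5]

result = early_stopping_epoch(losses, min_delta=0.001, patience=3)
print(result)  # (5, 2)
```

Epoch 3 lowers the loss by only 0.0005, which is less than min_delta, so it does not reset the counter. Training stops at epoch 5, and the weights from epoch 2 are the ones that restore_best_weights=True would bring back.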