### Linear Units in Keras

The first argument, units, defines how many outputs the layer produces. Here we are predicting only 'calories', so we use units=1. The second argument, input_shape, gives the number of input features, three in this example.

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

# Create a network with 1 linear unit
model = keras.Sequential([
    layers.Dense(units=1, input_shape=[3])
])
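What a single linear unit computes can be sketched in plain NumPy: a weighted sum of the inputs plus a bias. The weight and input values below are made up for illustration (Keras initializes the kernel randomly and the bias to zero), and the three inputs are assumed to match input_shape=[3].

```python
import numpy as np

# Hypothetical weights for illustration only; not values learned by the model.
w = np.array([[0.5], [-1.0], [2.0]])   # kernel, shape (inputs, units) = (3, 1)
b = np.zeros(1)                        # bias, shape (units,) = (1,)

x = np.array([[1.0, 2.0, 3.0]])        # one sample with 3 features, shape (1, 3)

# Linear unit: y = x . w + b
y = x @ w + b                          # shape (1, 1)
print(y)
```

The kernel has one row per input feature and one column per unit, which is why its shape is (inputs, units).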

<img src='pic/linear_unit.png'>

In [None]:
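# model.weights holds the kernel (one weight per input) and the bias.
# Note: the output below shows shape=(11, 1), so it was recorded from a model
# with 11 inputs; with input_shape=[3] the kernel shape would be (3, 1).
w, b = model.weights
print("Weights\n{}\n\nBias\n{}".format(w, b))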

Weights
<tf.Variable 'dense/kernel:0' shape=(11, 1) dtype=float32, numpy=
array([[ 0.19967222],
       [-0.68170553],
       [ 0.47476345],
       [ 0.392785  ],
       [ 0.14066797],
       [ 0.5223351 ],
       [ 0.25675356],
       [-0.595524  ],
       [ 0.6509276 ],
       [-0.39765382],
       [ 0.29334396]], dtype=float32)>

Bias
<tf.Variable 'dense/bias:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>

### Stochastic Gradient Descent (SGD) - The Optimizer

The optimizer is an algorithm that adjusts the weights to minimize the loss. Virtually all optimizers used in deep learning belong to the stochastic gradient descent family, which trains the network iteratively by repeating three steps:

1. Sample some training data and run it through the network to make predictions.
2. Measure the loss between the predictions and the true values.
3. Adjust the weights in a direction that makes the loss smaller.
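The three steps above can be sketched as a hand-written minibatch SGD loop. This is a minimal illustration on made-up synthetic data with a one-weight linear model and mean squared error; it is not what Keras runs internally, and the learning rate, batch size, and step count are arbitrary assumptions. The 'adam' optimizer used in the Keras cells is a variant of SGD with an adaptive learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, made up for illustration: y = 3*x + 2 plus a little noise.
X = rng.uniform(-1.0, 1.0, size=(512, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0.0, 0.05, size=512)

w, b = 0.0, 0.0          # model parameters, starting from zero
learning_rate = 0.1      # assumed value
batch_size = 32          # assumed value

def mse(w, b):
    """Mean squared error of the model over the full dataset."""
    return np.mean((w * X[:, 0] + b - y) ** 2)

initial_loss = mse(w, b)

for step in range(500):
    # 1. Sample a minibatch and make predictions.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx, 0], y[idx]
    pred = w * xb + b

    # 2. Measure the error between predictions and true values.
    error = pred - yb

    # 3. Move the weights against the gradient of the batch loss.
    grad_w = 2.0 * np.mean(error * xb)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.3f} -> {final_loss:.4f}")
```

Each pass through the loop is one step; one full pass over the training data is an epoch.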

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(512, activation='relu', input_shape=[11]),
    layers.Dense(512, activation='relu'),
    layers.Dense(512, activation='relu'),
    layers.Dense(1),
])

model.compile(
    optimizer='adam',
    loss='mae',
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=10,
)

### Overfitting and Underfitting - Early Stopping

<img src=".\pic\dl_earlystopping.png">

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    min_delta=0.001,  # minimum amount of change to count as an improvement
    patience=20,  # how many epochs to wait before stopping
    restore_best_weights=True,  # keep the weights from the epoch with the lowest validation loss
)

These parameters say: "If there hasn't been at least an improvement of 0.001 in the validation loss over the previous 20 epochs, then stop the training and keep the best model you found." It can sometimes be hard to tell if the validation loss is rising due to overfitting or just due to random batch variation. The parameters allow us to set some allowances around when to stop.
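The stopping rule described above can be sketched in plain Python. This is a simplified illustration of the idea, not the Keras implementation; the function name early_stopping_epoch and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, min_delta=0.001, patience=20):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    An epoch counts as an improvement only if its loss is lower than the
    best loss so far by more than min_delta. Training stops once `patience`
    consecutive epochs pass without such an improvement.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improvement stalls after epoch 2.
losses = [1.0, 0.8, 0.7, 0.71, 0.705, 0.72, 0.9]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

With restore_best_weights=True, the model kept would be the one from best_epoch rather than the one from stop_epoch.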

### Dropout ###

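Overfitting happens when the network learns spurious patterns in the training data, and it often relies on very specific combinations of weights, a kind of "conspiracy" of weights, to recognize them. To break up these conspiracies, we randomly drop out some fraction of a layer's input units at every training step, which makes it much harder for the network to learn those spurious patterns. This keeps the model from over-learning misleading features and so helps prevent overfitting.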

![image.png](attachment:53bea08c-8570-4e82-82ea-c37c6538cc58.png)

In Keras, the rate argument of the Dropout layer defines what fraction of the input units to shut off. Put the Dropout layer just before the layer you want the dropout applied to:

In [None]:
keras.Sequential([
    # ...
    layers.Dropout(rate=0.3), # apply 30% dropout to the next layer
    layers.Dense(16),
    # ...
])
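What a dropout layer does to its inputs can be sketched in NumPy. This is a minimal illustration of "inverted" dropout, assuming the usual behavior where kept units are scaled up by 1 / (1 - rate) during training and nothing is dropped at inference; the function name dropout and the input values are made up.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Zero out roughly `rate` of the entries of x while training."""
    if not training or rate == 0.0:
        return x                              # inference: pass inputs through unchanged
    keep_mask = rng.random(x.shape) >= rate   # True for units that are kept
    return x * keep_mask / (1.0 - rate)       # rescale so the expected sum is unchanged

rng = np.random.default_rng(0)
x = np.ones((1000, 16))                       # made-up activations

train_out = dropout(x, rate=0.3, training=True, rng=rng)
infer_out = dropout(x, rate=0.3, training=False, rng=rng)

dropped_fraction = np.mean(train_out == 0.0)
print(f"dropped: {dropped_fraction:.3f}, mean during training: {train_out.mean():.3f}")
```

A fresh random mask is drawn at every training step, so no unit can rely on any particular input always being present.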

### Batch Normalization ###

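With neural networks, it's generally a good idea to put all of your data on a common scale, perhaps with something like scikit-learn's StandardScaler or MinMaxScaler. The reason is that SGD shifts the network weights in proportion to how large an activation the data produces, so features that tend to produce activations of very different sizes can make training unstable.  
In other words, when some features span a very wide range of values, training becomes unstable, and sometimes the val_loss curve cannot even be plotted because the loss is too large. The usual remedy is to standardize the features, which is what scikit-learn's StandardScaler() does. A batch normalization layer applies the same idea inside the network: it normalizes each incoming batch with that batch's own mean and standard deviation, then rescales it with two trainable parameters.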

In [None]:
layers.Dense(16),
layers.BatchNormalization(),
layers.Activation('relu'),
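The normalization a batch normalization layer performs during training can be sketched in NumPy. This is a simplified illustration that assumes per-feature batch statistics and omits the running averages used at inference time; the function name batch_norm_train and the data are made up.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch, then rescale and shift."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # roughly zero mean, unit variance
    return gamma * x_hat + beta              # gamma and beta are the trainable parameters

rng = np.random.default_rng(0)

# Two made-up features on very different scales.
x = rng.normal(size=(256, 2)) * np.array([1.0, 1000.0]) + np.array([0.0, 50.0])

out = batch_norm_train(x, gamma=np.ones(2), beta=np.zeros(2))
print("before:", x.std(axis=0))
print("after: ", out.std(axis=0))
```

After normalization both features sit on a comparable scale, regardless of how different their original ranges were.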

#### Example - Using Dropout and Batch Normalization ####

When adding dropout, you may need to increase the number of units in your Dense layers.

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(1024, activation='relu', input_shape=[11]),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    layers.Dense(1),
])

In [None]:
model.compile(
    optimizer='adam',
    loss='mae',
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=100,
    verbose=0,
)


import pandas as pd

# Show the learning curves
history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot();

### Binary Classification ###

![image.png](attachment:44158013-dea7-4097-909e-dc6bf36f1465.png)

![image.png](attachment:47a26408-55fe-4680-a19e-d96cb8581771.png)

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(4, activation='relu', input_shape=[33]),
    layers.Dense(4, activation='relu'),    
    layers.Dense(1, activation='sigmoid'), ### refer to chart above
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy', ### refer to chart above
    metrics=['binary_accuracy'],
)

early_stopping = keras.callbacks.EarlyStopping(
    patience=10,
    min_delta=0.001,
    restore_best_weights=True,
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=512,
    epochs=1000,
    callbacks=[early_stopping],
    verbose=0, # hide the output because we have so many epochs
)

import pandas as pd

history_df = pd.DataFrame(history.history)
# Start the plot at epoch 5
history_df.loc[5:, ['loss', 'val_loss']].plot()
history_df.loc[5:, ['binary_accuracy', 'val_binary_accuracy']].plot()

print(("Best Validation Loss: {:0.4f}"
       "\nBest Validation Accuracy: {:0.4f}")
      .format(history_df['val_loss'].min(),
              history_df['val_binary_accuracy'].max()))

![image.png](attachment:2319e94d-ee2b-4c98-9efc-62d1d4e05ed1.png)
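The sigmoid output, the binary_crossentropy loss, and the binary_accuracy metric used in the model above can be sketched in NumPy. This is a minimal illustration with made-up raw outputs and labels; the 0.5 threshold and the clipping constant are assumptions chosen for the sketch.

```python
import numpy as np

def sigmoid(z):
    """Squash a real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Average negative log-likelihood of the true labels."""
    p = np.clip(p, eps, 1.0 - eps)           # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def binary_accuracy(y_true, p, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return np.mean((p > threshold) == (y_true == 1))

raw_outputs = np.array([2.0, -1.0, 0.5, -3.0])   # made-up outputs of the last linear layer
y_true = np.array([1.0, 0.0, 0.0, 0.0])          # made-up labels

probs = sigmoid(raw_outputs)
loss = binary_crossentropy(y_true, probs)
acc = binary_accuracy(y_true, probs)
print(probs, loss, acc)
```

Accuracy changes in jumps as predictions cross the threshold, whereas cross-entropy changes smoothly with the predicted probability, which is why it serves as the loss while accuracy is only reported as a metric.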

Another good example of a multi-layer network:

![image.png](attachment:72911d34-7525-4abe-8ea5-233d22083065.png)

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

# Define the model given in the diagram
model = keras.Sequential([
    layers.BatchNormalization(),
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid'),
])