# Backpropagation and Gradient Descent with Keras

This notebook explores how backpropagation and different variants of
gradient descent work in practice using Keras.


## Sections
1. Simple Neural Network Setup
2. Backpropagation via Training
3. Batch Gradient Descent
4. Stochastic Gradient Descent
5. Mini-batch Gradient Descent


In [5]:
import numpy as np
import tensorflow as tf
import keras 
from keras import layers
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

In [6]:
# Toy dataset: four points on the line y = 2x (one feature per sample).
X = np.array([[1], [2], [3], [4]])
y = np.array([2, 4, 6, 8])

In [8]:
model = Sequential([
    Dense(1, input_shape=(1,))
])

model.compile(
    optimizer=Adam(learning_rate=0.01),
    loss='mse'
)



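The model is a single `Dense(1)` unit, so its forward pass computes `y_hat = w * x + b`. Calling `compile` does not run the model; it only attaches the Adam optimizer and the Mean Squared Error (MSE) loss. The forward pass and the loss computation happen on every training step once `fit` is called.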
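To make the forward pass and the MSE loss concrete, here is a minimal plain-Python sketch of what a single linear unit computes on the notebook's four data points. The starting values `w = 0.5` and `b = 0.0` are hypothetical, chosen only for illustration; Keras picks its own initial values.

```python
# Toy data from the notebook: four points on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Hypothetical starting parameters (Keras chooses its own initial values).
w, b = 0.5, 0.0


def forward(x, w, b):
    """Forward pass of a single linear unit: y_hat = w * x + b."""
    return w * x + b


def mse(preds, targets):
    """Mean of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


preds = [forward(x, w, b) for x in xs]
loss = mse(preds, ys)

print(preds)  # [0.5, 1.0, 1.5, 2.0]
print(loss)   # 16.875
```

Every prediction is too small here, so the loss is large; training should push `w` towards 2 and `b` towards 0.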

In [10]:
history = model.fit(
    X,
    y,
    epochs=25,
    verbose=1
)

Epoch 1/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step - loss: 23.7309
Epoch 2/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step - loss: 23.3863
Epoch 3/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - loss: 23.0451
Epoch 4/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step - loss: 22.7073
Epoch 5/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - loss: 22.3729
Epoch 6/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - loss: 22.0419
Epoch 7/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step - loss: 21.7144
Epoch 8/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step - loss: 21.3903
Epoch 9/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - loss: 21.0697
Epoch 10/25
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - loss: 20.7525
Epoch 11/

In [11]:
model.get_weights()

[array([[0.39028338]], dtype=float32), array([0.432269], dtype=float32)]
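The list returned by `get_weights()` holds the Dense layer's kernel (the weight `w`) followed by its bias `b`. As a rough sanity check, the sketch below plugs the two values printed above into `w * x + b` in plain Python and recomputes the MSE by hand; it assumes nothing beyond those printed numbers and the notebook's data.

```python
# Weight and bias copied from the get_weights() output above.
w, b = 0.39028338, 0.432269

# Toy data from the notebook: four points on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Predictions of the trained linear unit and the resulting mean squared error.
preds = [w * x + b for x in xs]
loss = sum((p - t) ** 2 for p, t in zip(preds, ys)) / len(ys)

for x, p, t in zip(xs, preds, ys):
    print(f"x={x:.0f}  predicted={p:.3f}  target={t:.0f}")
print(f"MSE = {loss:.2f}")  # MSE = 16.14
```

The predictions are still well below the targets, which matches a loss that is falling but has not yet come close to zero after 25 epochs.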

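During training, Keras runs backpropagation automatically: on every step it computes the gradient of the loss with respect to the weight and the bias, and the optimizer (Adam here) uses those gradients to update them. After 25 epochs at this learning rate the weights are still far from the exact solution for this data, `w = 2` and `b = 0`.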
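For a model this small, the gradients that backpropagation produces can be written out by hand with the chain rule. The sketch below is an illustration under stated assumptions, not what Keras executes internally: the starting values `w = 0.5` and `b = 0.0` are hypothetical, and the update is the plain gradient descent rule rather than Adam, which additionally rescales each step using running averages of past gradients.

```python
# Toy data from the notebook: four points on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Hypothetical starting parameters; the learning rate matches the notebook.
w, b = 0.5, 0.0
lr = 0.01


def loss_fn(w, b):
    """MSE of the linear model y_hat = w * x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(w, b):
    """Chain rule for MSE: dL/dw = (2/n) * sum(e * x), dL/db = (2/n) * sum(e)."""
    n = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    return grad_w, grad_b


grad_w, grad_b = gradients(w, b)
print(grad_w, grad_b)  # -22.5 -7.5

# One plain gradient descent step: move against the gradient.
new_w = w - lr * grad_w
new_b = b - lr * grad_b
print(loss_fn(w, b), "->", loss_fn(new_w, new_b))
```

Both gradients are negative, so the step increases `w` and `b`, and the loss drops after the update.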

In [12]:
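# Batch gradient descent: batch_size=len(X) puts all 4 samples in one batch,
# so the weights are updated once per epoch.
# Note: the optimizer is still Adam, not the plain gradient descent update rule.
model_batch = Sequential([Dense(1, input_shape=(1,))])
model_batch.compile(
    optimizer=Adam(learning_rate=0.01),
    loss='mse'
)

model_batch.fit(X, y, epochs=20, batch_size=len(X), verbose=0)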


<keras.src.callbacks.history.History at 0x22bef9cc550>

In [13]:
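# Stochastic gradient descent: batch_size=1 feeds one sample per batch,
# so the weights are updated 4 times per epoch (once per sample).
# Note: the optimizer is still Adam, not the plain gradient descent update rule.
model_sgd = Sequential([Dense(1, input_shape=(1,))])
model_sgd.compile(
    optimizer=Adam(learning_rate=0.01),
    loss='mse'
)

model_sgd.fit(X, y, epochs=20, batch_size=1, verbose=0)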



<keras.src.callbacks.history.History at 0x22bef8a8750>

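Batch size controls how frequently weights are updated: Keras performs one update per batch. With the 4 samples used here, `batch_size=len(X)` gives one update per epoch (batch gradient descent), while `batch_size=1` gives four updates per epoch, one per sample (stochastic gradient descent). Mini-batch gradient descent sits in between, using a batch size greater than 1 but smaller than the dataset.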
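The relationship between batch size and update count can be sketched in a few lines of plain Python. The helper names `updates_per_epoch` and `iter_batches` are hypothetical, and `batch_size=2` is an assumed value for the mini-batch case. Unlike `fit`, which shuffles the samples each epoch by default, this sketch keeps them in order.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch: one per batch, last batch may be partial."""
    return math.ceil(n_samples / batch_size)


def iter_batches(samples, batch_size):
    """Yield consecutive slices of `samples` with at most `batch_size` items each."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = [1, 2, 3, 4]  # stands in for the 4 rows of X

# (variant name, batch size); 2 is an assumed mini-batch size for illustration.
variants = [("batch", len(samples)), ("mini-batch", 2), ("stochastic", 1)]

for name, batch_size in variants:
    batches = list(iter_batches(samples, batch_size))
    print(f"{name:<11} batch_size={batch_size}  "
          f"updates/epoch={updates_per_epoch(len(samples), batch_size)}  "
          f"batches={batches}")
```

Smaller batches mean more updates per epoch, each computed from fewer samples and therefore noisier.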