# Batch vs Stochastic Gradient Descent


| Feature                 | Batch Gradient Descent | Stochastic Gradient Descent (SGD) |
|-------------------------|-----------------------|----------------------------------|
| **Update Frequency**    | After processing **all** data | After **each** data point |
| **Speed**               | Slow per update (each update needs a full pass over the dataset) | Fast per update, but an epoch requires one update per sample |
| **Stability**           | More stable, smooth convergence | Noisy updates, fluctuates a lot |
| **Best for**            | Small datasets | Large datasets |
| **Computation Cost**    | High (needs full dataset for each update) | Low (updates with one sample) |
| **Convergence**         | Can be slow but steady | Fast early progress, but may overshoot and keep fluctuating near the minimum |
| **Memory Usage**        | High (stores all data) | Low (processes one data point at a time) |
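
The "Update Frequency" row can be illustrated with a small NumPy sketch. It is not part of the notebook: the linear model, the toy data (`X_toy`, `y_toy`), the learning rate, and the helper names are all assumptions chosen for illustration. The same loop performs batch gradient descent or SGD depending only on `batch_size`.

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of mean squared error for a linear model y ≈ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def train_one_epoch(w, X, y, batch_size, lr=0.05):
    """Run one pass over the data; return the new weights and the number of updates."""
    updates = 0
    for start in range(0, len(y), batch_size):
        batch = slice(start, start + batch_size)
        w = w - lr * gradient(w, X[batch], y[batch])
        updates += 1
    return w, updates

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(320, 2))        # toy data with 320 rows (hypothetical)
y_toy = X_toy @ np.array([1.5, -2.0])    # noiseless target with known weights

w0 = np.zeros(2)
_, batch_updates = train_one_epoch(w0, X_toy, y_toy, batch_size=len(y_toy))  # batch GD
_, sgd_updates = train_one_epoch(w0, X_toy, y_toy, batch_size=1)             # SGD
print(batch_updates, sgd_updates)  # 1 320
```

With 320 rows, one epoch is a single update for batch gradient descent and 320 updates for SGD.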


### Code Comparison

In [1]:
import pandas as pd
import numpy as np
import time

In [2]:
df = pd.read_csv('Social_Network_Ads.csv')

In [3]:
df.head()

Unnamed: 0,Age,EstimatedSalary,Purchased
0,19,19000,0
1,35,20000,0
2,26,43000,0
3,27,57000,0
4,19,76000,0


In [4]:
df = df[['Age', 'EstimatedSalary', 'Purchased']]

In [5]:
df.head()

Unnamed: 0,Age,EstimatedSalary,Purchased
0,19,19000,0
1,35,20000,0
2,26,43000,0
3,27,57000,0
4,19,76000,0


In [6]:
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

In [7]:
from sklearn.preprocessing import StandardScaler

In [8]:
scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)
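
As a rough illustration of what `StandardScaler` computes, the sketch below standardizes by hand the five `(Age, EstimatedSalary)` rows displayed by `df.head()`. The variable names are hypothetical, and only those five rows are used, so the numbers differ from what the scaler produces on the full dataset.

```python
import numpy as np

# The five (Age, EstimatedSalary) rows displayed by df.head() above.
sample = np.array([
    [19, 19000],
    [35, 20000],
    [26, 43000],
    [27, 57000],
    [19, 76000],
], dtype=float)

# Standardize each column: subtract its mean, divide by its standard deviation.
mean = sample.mean(axis=0)
std = sample.std(axis=0)      # population std (ddof=0), as StandardScaler uses
z = (sample - mean) / std

print(z.mean(axis=0))  # approximately 0 for both columns
print(z.std(axis=0))   # approximately 1 for both columns
```

After this transformation both features are on a comparable scale, even though the raw salaries are about a thousand times larger than the ages.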

In [9]:
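from sklearn.model_selection import train_test_split
# Note: this splits the unscaled X, so X_scaled from the previous cell is never used.
# Pass X_scaled instead of X to train on the standardized features.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)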

In [10]:
X_train.shape

(320, 2)

In [11]:
import tensorflow as tf
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense

In [19]:
model = Sequential()

In [20]:
model.add(Dense(10, activation='relu', input_dim=2))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

In [21]:
model.summary()
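
The output of `model.summary()` is not shown here, but the parameter count it reports can be worked out by hand. The sketch below is a plain-Python check under the assumption that each `Dense` layer has one weight per input-unit pair plus one bias per unit; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

# Input features, followed by the number of units in each Dense layer.
layer_sizes = [2, 10, 10, 1]
per_layer = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [30, 110, 11] 151
```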

### Batch Gradient Descent

In [26]:
# No optimizer is passed, so Keras falls back to its default ('rmsprop')
model.compile(loss='binary_crossentropy', metrics=['accuracy'])
start = time.time()

# batch_size=320 (the number of training rows) gives one update per epoch: Batch Gradient Descent
# batch_size=1 would give one update per sample: Stochastic Gradient Descent
history = model.fit(X_train, y_train, epochs=10, batch_size=320)
print("Total Time Taken in Training: ", time.time() - start)

Epoch 1/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step - accuracy: 0.6406 - loss: 59.5416
Epoch 2/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 125ms/step - accuracy: 0.3594 - loss: 253.6500
Epoch 3/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 103ms/step - accuracy: 0.6406 - loss: 3.9500
Epoch 4/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 88ms/step - accuracy: 0.3594 - loss: 177.9017
Epoch 5/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 198ms/step - accuracy: 0.6406 - loss: 11.2149
Epoch 6/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 128ms/step - accuracy: 0.3594 - loss: 132.7343
Epoch 7/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 179ms/step - accuracy: 0.6406 - loss: 25.4233
Epoch 8/10
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 102ms/step - accuracy: 0.3594 - loss: 97.6708
Epoch 9/10
1/1 ━━━━━━━━━━━━━━━━━━━━

### Stochastic Gradient Descent
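
The "noisy updates" of SGD come from replacing the full-dataset gradient with the gradient of a single sample. The sketch below illustrates this on a toy linear model, which is an assumption made for simplicity and is not the Keras network used in this notebook. The single-sample gradients average out to the full-batch gradient, but individually they scatter widely around it.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(320, 2))   # hypothetical toy features
y_toy = X_toy @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=320)
w = np.zeros(2)                     # current weights of the toy linear model

# Per-sample gradients of squared error, one row per data point.
residual = X_toy @ w - y_toy
per_sample_grads = 2.0 * residual[:, None] * X_toy

full_grad = per_sample_grads.mean(axis=0)   # direction batch gradient descent follows
spread = per_sample_grads.std(axis=0)       # variation among single-sample gradients

print("full-batch gradient:", full_grad)
print("std of single-sample gradients:", spread)
```

Each SGD step follows one row of `per_sample_grads`, so the path is noisier than the batch path even though it points the same way on average.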

In [27]:
model = Sequential()

In [None]:
model.add(Dense(10, activation='relu', input_dim=2))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

In [30]:
model.summary()

In [31]:
model.compile(loss='binary_crossentropy', metrics=['accuracy'])

In [32]:
import time

In [36]:
start = time.time()
# use a batch size of 1 for Stochastic Gradient Descent
history = model.fit(X_train, y_train, epochs=20, batch_size=1)
print("Total Time Taken in Training: ", time.time() - start)

Epoch 1/20
320/320 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.6585 - loss: 0.6423
Epoch 2/20
320/320 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6836 - loss: 0.6249
Epoch 3/20
320/320 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.6286 - loss: 0.6629
Epoch 4/20
320/320 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.6546 - loss: 0.6450
Epoch 5/20
320/320 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.6491 - loss: 0.6487
Epoch 6/20
320/320 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.6304 - loss: 0.6611
Epoch 7/20
320/320 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.6440 - loss: 0.6519
Epoch 8/20
320/320 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.6480 - loss: 0.6491
Epoch 9/20
320/320 ━━━━━━━━

### Mini-Batch Gradient Descent

Mini-Batch Gradient Descent is a compromise between Batch Gradient Descent and Stochastic Gradient Descent (SGD). Instead of updating weights after each data point (SGD) or after processing the entire dataset (Batch GD), Mini-Batch Gradient Descent updates weights after processing a small batch of data.

- The dataset is divided into small batches of size m (e.g., 32, 64, or 128 samples per batch).
- The model updates weights after processing each batch instead of the whole dataset or a single sample.
- This balances computational efficiency and model stability.
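
The batching described above can be sketched in a few lines of NumPy. This is an illustrative helper, not how Keras implements it internally; `iterate_minibatches`, `X_demo`, and `y_demo` are hypothetical names, and the random arrays only stand in for the training data.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs covering the data once, in shuffled order."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(320, 2))        # stand-in for the 320 training rows
y_demo = rng.integers(0, 2, size=320)     # stand-in binary labels

batch_sizes = [len(xb) for xb, _ in iterate_minibatches(X_demo, y_demo, 32, rng)]
print(len(batch_sizes), set(batch_sizes))  # 10 {32}
```

With 320 rows and a batch size of 32, one epoch consists of 10 updates, which matches the `10/10` step counter in the training log further down.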


In [37]:
model = Sequential()

In [38]:
model.add(Dense(10, activation='relu', input_dim=2))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))


In [39]:
model.summary()

In [40]:
model.compile(loss='binary_crossentropy', metrics=['accuracy'])

In [44]:
start = time.time()
# use a batch size of 32 for Mini-Batch Gradient Descent
history = model.fit(X_train, y_train, epochs=20, batch_size=32)
print("Total Time Taken in Training: ", time.time() - start)

Epoch 1/20
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6771 - loss: 0.6316
Epoch 2/20
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6390 - loss: 0.6542
Epoch 3/20
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.6489 - loss: 0.6482
Epoch 4/20
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.6304 - loss: 0.6591
Epoch 5/20
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.6541 - loss: 0.6452
Epoch 6/20
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6293 - loss: 0.6597
Epoch 7/20
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.6394 - loss: 0.6539
Epoch 8/20
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6434 - loss: 0.6515
Epoch 9/20
10/10 ━━━━━━━━━━━━━━━━━━