
# 🌄 GRADIENT DESCENT — explained like you’re 5 years old!

---

## 🧸 1. Simple Definition

> **Gradient Descent** is how the AI **learns** by fixing its mistakes… step-by-step.

It’s like walking down a hill 🏞️ to find the **lowest point** — which is the **least amount of mistake (loss)**.

---

## 🧗‍♂️ 2. Real-Life Analogy: Climbing Down a Mountain ⛰️

### Imagine this:

You are blindfolded 😎 on a mountain.
You want to reach the **bottom (least loss)**.
But you don’t know the way. So…

1. You **feel** which way the slope is going down 🖐️
2. You take a **tiny step** that way 👣
3. You repeat again… and again… and again…

Eventually, you **reach the bottom!** 🎯

That’s **gradient descent!**

---

## 📉 3. What Is Actually Going Down?

You are going **down the loss hill** 🏔️

* **Top of the hill** = high loss (bad prediction 😢)
* **Bottom of the hill** = low loss (good prediction 😄)

So the neural network:

> "Keeps adjusting its weights slowly to go toward less loss."

---

## 🧠 4. What Does “Gradient” Mean?

> Gradient = the **slope** or steepness of the hill 📐
> It tells the network:

* **Which direction to go**
* **How fast to go**

🧮 It’s calculated using **derivatives** in math (don’t worry, we’ll keep it simple here).

---

## 🔢 5. The Core Formula (Simplified):

```txt
New Weight = Old Weight - Learning Rate × Gradient
```

* **Weight** = What the AI adjusts
* **Gradient** = How wrong the weight is (direction + amount)
* **Learning Rate** = How big the steps are (we’ll explain next)

---

## ⚙️ 6. Step-by-Step of Gradient Descent

| Step # | What Happens                                   |
| ------ | ---------------------------------------------- |
| 1      | Make a guess (prediction)                      |
| 2      | Calculate how wrong it was (loss)              |
| 3      | Find the gradient (slope of the loss curve)    |
| 4      | Update weights to reduce the loss              |
| 5      | Repeat until you reach the “bottom” (min loss) |

---

## 🧁 7. Tiny Analogy: Cupcake Tasting Game 🍰

You’re baking cupcakes but the taste is *not perfect*.
You try:

* **More sugar?** Too sweet!
* **Less sugar?** Still not right!
* You keep adjusting **bit by bit**
  Until: “YUM! Perfect!” 😋

That’s gradient descent — **tweak the recipe until the error is small.**

---

## 🧪 8. Types of Gradient Descent

| Type                 | Description                                   |
| -------------------- | --------------------------------------------- |
| **Batch**            | Uses all data at once (slow, but accurate)    |
| **Stochastic (SGD)** | Uses 1 data point at a time (fast, but noisy) |
| **Mini-batch**       | Uses small chunks of data (best of both!)     |

---

## 🧠 Recap Like a Kid:

🗣️ "Gradient descent helps my AI fix its guesses by feeling its mistake and stepping a little better each time."

* Loss is high? Step down ⬇️
* Loss is low? Good job! 🎉

---

## 🎯 Visual Summary

```
Loss
 ▲
 │       🧍 <- AI at start (wrong)
 │     /  
 │   /  
 │ /          <- slope (gradient)
 │──────────────▶ Weights
```

---


![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [3]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

In [8]:
import os
os.listdir("../../")

['.git',
 '.gitignore',
 '.vscode',
 'Datasets',
 'logs',
 'Models',
 'Projects',
 'README.md',
 'Tensorflow']

In [9]:
df = pd.read_csv('../../Datasets/insurance_data.csv')
df.head()

Unnamed: 0,age,affordibility,bought_insurance
0,22,1,0
1,25,0,0
2,47,1,1
3,52,0,0
4,46,1,1


In [43]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(df[['age','affordibility']],df['bought_insurance'],test_size=0.2,random_state=43)

In [44]:
X_train

Unnamed: 0,age,affordibility
18,19,0
8,62,1
27,46,1
5,56,1
24,50,1
13,29,0
23,45,1
20,21,1
1,25,0
9,61,1


In [45]:
len(X_train)

22

In [46]:
len(df[['age','affordibility']])

28

In [47]:
X_train_scale = X_train.copy()
X_train_scale['age'] = X_train_scale['age'] / 100

X_test_scale = X_test.copy()
X_test_scale['age'] = X_test_scale['age'] / 100

In [48]:
X_train_scale

Unnamed: 0,age,affordibility
18,0.19,0
8,0.62,1
27,0.46,1
5,0.56,1
24,0.5,1
13,0.29,0
23,0.45,1
20,0.21,1
1,0.25,0
9,0.61,1


In [12]:
model = keras.Sequential([
    keras.layers.Dense(1,input_shape=(2,),activation='sigmoid',kernel_initializer='ones',bias_initializer='zeros')
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy', # Same as log loss
    metrics=['accuracy'])

model.fit(X_train_scale,y_train,epochs=5000)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/5000
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step - accuracy: 0.5000 - loss: 0.7230
Epoch 2/5000
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 96ms/step - accuracy: 0.5000 - loss: 0.7226
Epoch 3/5000
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 89ms/step - accuracy: 0.5000 - loss: 0.7221
Epoch 4/5000
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 101ms/step - accuracy: 0.5000 - loss: 0.7217
Epoch 5/5000
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 91ms/step - accuracy: 0.5000 - loss: 0.7213
Epoch 6/5000
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 97ms/step - accuracy: 0.5000 - loss: 0.7209
Epoch 7/5000
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 127ms/step - accuracy: 0.5000 - loss: 0.7205
Epoch 8/5000
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 125ms/step - accuracy: 0.5000 - loss: 0.7200
Epoch 9/5000
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x27fd6ff6c90>

In [13]:
model.evaluate(X_test_scale,y_test)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 299ms/step - accuracy: 1.0000 - loss: 0.4368


[0.4368150234222412, 1.0]

In [14]:
model.save('../Models/insurance_model.keras')

In [49]:
model = keras.models.load_model('../Models/insurance_model.keras')

In [50]:
model.predict(X_test_scale)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 243ms/step


array([[0.17834583],
       [0.60256636],
       [0.3249999 ],
       [0.48308071],
       [0.5480989 ],
       [0.7682142 ]], dtype=float32)

In [51]:
y_test

12    0
22    1
10    0
6     0
7     1
15    1
Name: bought_insurance, dtype: int64

In [52]:
ans = 1 if model.predict(np.array([[0.2,1]])) > 0.5 else 0 # Age and affordibility
ans

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 149ms/step


0

In [53]:
coef, intercept = model.get_weights()

In [54]:
coef, intercept # First one is coefficient and second one is for intercept b

(array([[5.2139   ],
        [1.2659582]], dtype=float32),
 array([-2.935348], dtype=float32))

In [55]:
import math
def sigmoid(x):
    return 1 / (1 + math.exp(-x))
sigmoid(50)

1.0

## This is our prediction function
![image.png](attachment:image.png)

In [56]:
def prediction_function(age,affordibility):
    weigted_sum = coef[0] * age + coef[1] * affordibility + intercept
    return sigmoid(weigted_sum)

In [57]:
prediction_function(.2,1)

  return 1 / (1 + math.exp(-x))


0.34827965172678205

In [58]:
def sigmoid_numpy(X):
    return 1/(1 + np.exp(-X))

In [59]:
def log_loss(y_true,y_pred):
    eplision = 1e-15
    y_pred_new = [max(i,eplision) for i in y_pred]
    y_pred_new = [min(i,eplision) for i in y_pred_new]
    y_pred_new = np.array(y_pred_new)
    return -np.mean(y_true * np.log(y_pred_new) + (1 - y_true) + np.log(1 - y_pred_new))

In [60]:
def gradient_descent(age,affordability,y_true,epochs,loss_threshold):
    w1 = w2 = 1
    bias = 0
    rate = 0.5
    n = len(age)

    for i in range(epochs):
        weighted_sum = w1 * age + w2 * affordability + bias
        y_predicted = sigmoid_numpy(weighted_sum)

        loss = log_loss(y_true,y_predicted)

        w1d = (1/n) * np.dot(np.transpose(age),(y_predicted - y_true))
        w2d = (1/n) * np.dot(np.transpose(affordability),(y_predicted - y_true))

        bias_d = np.mean(y_predicted - y_true)

        w1 -= rate * w1d
        w2 -= rate * w2d
        bias -= rate * bias_d

        print(f'Epoch {i}, w1 {w1}, w2 {w2}, bias {bias}, loss {loss}')

        if loss < loss_threshold:
            print(f'Loss threshold reached at epoch {i}')
            break

    return w1, w2, bias

In [61]:
gradient_descent(X_train_scale['age'],X_train_scale['affordibility'],y_train,6000, 0.4375)

Epoch 0, w1 0.9755342677219864, w2 0.9343049617339481, bias -0.1210471021515617, loss 16.76938819745534
Epoch 1, w1 0.9572952952908756, w2 0.880453130983193, bias -0.2259503534026453, loss 16.76938819745534
Epoch 2, w1 0.9447997509740013, w2 0.8376482359739109, bias -0.3160704190939209, loss 16.76938819745534
Epoch 3, w1 0.9374048253268601, w2 0.8047199386951234, bias -0.3931254780683189, loss 16.76938819745534
Epoch 4, w1 0.9344032729224915, w2 0.7803257118964337, bias -0.458956781420467, loss 16.76938819745534
Epoch 5, w1 0.9350973957293062, w2 0.7631120101695599, bias -0.5153519355765014, loss 16.76938819745534
Epoch 6, w1 0.9388454022979359, w2 0.7518179662843999, bias -0.5639380298961366, loss 16.76938819745534
Epoch 7, w1 0.9450838869888639, w2 0.745328171974969, bias -0.6061330242022206, loss 16.76938819745534
Epoch 8, w1 0.953333757008475, w2 0.7426895998588574, bias -0.6431364587084469, loss 16.76938819745534
Epoch 9, w1 0.9631964550810905, w2 0.7431071253787629, bias -0.67594

(13.156835848294012, 1.1211417635344023, -6.046244668429951)

In [62]:
coef, intercept

(array([[5.2139   ],
        [1.2659582]], dtype=float32),
 array([-2.935348], dtype=float32))