# Supervised Learning
---

In [51]:
from sklearn.preprocessing import LabelEncoder
import pandas as pd

data = pd.read_csv("data.csv")
data_label = data.copy()

# Use one encoder per column so each keeps its own class mapping.
passed_encoder = LabelEncoder()
gender_encoder = LabelEncoder()
data_label["Passed_LabelEncoded"] = passed_encoder.fit_transform(data_label["Passed"])
data_label["Gender_LabelEncoded"] = gender_encoder.fit_transform(data_label["Gender"])

print("Label Encoded data")
print(data_label)

Label Encoded data
     Name  Age   City Passed Gender  Passed_LabelEncoded  Gender_LabelEncoded
0   Ayush   18  Surat    Yes   Male                    1                    0
1     Joy   13  Surat     No   Male                    0                    0
2    Prem   17  Surat    Yes   Male                    1                    0
3   Abhay   17  Surat    Yes   Male                    1                    0
4  Piyush   21  Surat    Yes   Male                    1                    0
5    Zaid   28  Surat    Yes   Male                    1                    0
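
To check which integer stands for which category, the fitted encoder's `classes_` attribute can be inspected. The sketch below is only illustrative: it uses a small hand-built frame (`sample`, a hypothetical stand-in for `data.csv`, whose full contents are not shown here) that mirrors the `Passed` column printed above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for data.csv, mirroring the printed "Passed" column.
sample = pd.DataFrame({"Passed": ["Yes", "No", "Yes", "Yes", "Yes", "Yes"]})

passed_encoder = LabelEncoder()
encoded = passed_encoder.fit_transform(sample["Passed"])

# classes_ is sorted; the position of each class is its integer code.
mapping = {label: code for code, label in enumerate(passed_encoder.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original strings.
decoded = passed_encoder.inverse_transform(encoded)
print(list(decoded))
```

Because the classes are sorted alphabetically, `No` comes before `Yes`, which is why `Yes` is encoded as 1 in the output above.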


---
# 📊 StandardScaler

**StandardScaler** is used to standardize features so that they are centered around **0** with a standard deviation of **1**.

This is especially useful for machine learning models that are sensitive to the scale of data (e.g., distance-based or gradient-based models).

---

## 📐 Formula

\[
z = \frac{x - \mu}{\sigma}
\]

### Where:

```text
x  = Actual value
μ  = Mean of the column
σ  = Standard deviation of the column

z  = Standardized value (scaled output)
```

---

## 🧠 What This Means

- Subtracting the **mean (μ)** centers the data around 0.
- Dividing by the **standard deviation (σ)** scales the data so that its spread becomes 1.
- After scaling:
  - Mean ≈ 0
  - Standard Deviation ≈ 1
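
The two steps above can be checked directly with NumPy. This is a minimal sketch, assuming the population standard deviation (`ddof=0`, NumPy's default), which is also what `StandardScaler` uses; the array `x` is just an illustrative column.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)

mu = x.mean()    # mean of the column
sigma = x.std()  # population standard deviation (ddof=0)

# Apply z = (x - mu) / sigma element-wise.
z = (x - mu) / sigma

print(z)
print(z.mean(), z.std())  # mean close to 0, standard deviation close to 1
```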

---

## 📝 Example

```text
Note:
I'm currently working on understanding Standard Deviation (σ).
I will add a full numerical example once I'm completely confident with it.
```

---


In [46]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

# Small toy dataset: hours studied and the resulting test score.
data = {
    "StudyHours": [1, 2, 3, 4, 5],
    "TestScore": [40, 50, 60, 70, 80]
}

df = pd.DataFrame(data)

# Learn each column's mean and standard deviation, then scale in one step.
standard_scaler = StandardScaler()
standard_scaled = standard_scaler.fit_transform(df)

print("Standard Scaler Output")
print(pd.DataFrame(standard_scaled, columns=df.columns))

Standard Scaler Output
   StudyHours  TestScore
0   -1.414214  -1.414214
1   -0.707107  -0.707107
2    0.000000   0.000000
3    0.707107   0.707107
4    1.414214   1.414214
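
Both columns come out identical because `TestScore` is a straight-line function of `StudyHours`, so the two differ only in mean and spread, which standardization removes. As a rough sketch, the statistics the scaler learned can be read from its `mean_` and `scale_` attributes, and `inverse_transform` maps the scaled values back to the originals. The frame is rebuilt here so the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "StudyHours": [1, 2, 3, 4, 5],
    "TestScore": [40, 50, 60, 70, 80],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(df)

print(scaler.mean_)   # per-column mean learned during fit
print(scaler.scale_)  # per-column standard deviation used as the divisor

# Undo the scaling and confirm the original values come back.
restored = scaler.inverse_transform(scaled)
print(np.allclose(restored, df.to_numpy()))
```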


---
# 📊 MinMaxScaler

**MinMaxScaler** scales data so that all values fall within a fixed range, usually between **0 and 1**.

It preserves the original distribution shape but rescales the magnitude of the values.

---

## 📐 Formula

\[
\text{Scaled Value} = \frac{x - \text{min}}{\text{max} - \text{min}}
\]

### Where:

```text
x     = Actual value
min   = Smallest value in the column
max   = Largest value in the column
```

---

## 🧠 What This Means

- Subtracting the **minimum value** shifts the data so it starts at 0.
- Dividing by **(max − min)** rescales the data to fit between 0 and 1.
- After scaling:
  - Minimum value → 0
  - Maximum value → 1
  - All other values → Between 0 and 1

---

## 📝 Example

```text
Original Data:
[1, 2, 3, 4, 5]

min = 1
max = 5

Formula:
(x - min) / (max - min)

If x = 1 → (1 - 1) / (5 - 1) = 0.00
If x = 2 → (2 - 1) / (5 - 1) = 0.25
If x = 3 → (3 - 1) / (5 - 1) = 0.50
If x = 4 → (4 - 1) / (5 - 1) = 0.75
If x = 5 → (5 - 1) / (5 - 1) = 1.00
```
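
The same arithmetic can be applied to a whole frame at once. The sketch below is a plain pandas version of the formula, not the scikit-learn implementation; it assumes every column has at least two distinct values, since a constant column would make `max - min` zero.

```python
import pandas as pd

df = pd.DataFrame({
    "StudyHours": [1, 2, 3, 4, 5],
    "TestScore": [40, 50, 60, 70, 80],
})

# (x - min) / (max - min), computed column by column.
manual_scaled = (df - df.min()) / (df.max() - df.min())

print(manual_scaled)
```

Each column is scaled with its own minimum and maximum, so `TestScore` (40 to 80) ends up with the same 0 to 1 values as `StudyHours` (1 to 5).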

---


In [50]:
# Learn each column's min and max, then rescale to the default range [0, 1].
minmax_scaler = MinMaxScaler()
minmax_scaled = minmax_scaler.fit_transform(df)

print("MinMax Scaler Output")
print(pd.DataFrame(minmax_scaled, columns=df.columns))

MinMax Scaler Output
   StudyHours  TestScore
0        0.00       0.00
1        0.25       0.25
2        0.50       0.50
3        0.75       0.75
4        1.00       1.00
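
The 0 to 1 range is only the default. As a small sketch, `MinMaxScaler` accepts a `feature_range` argument for a different target interval; the range `(-1, 1)` here is an arbitrary choice for illustration, and the frame is rebuilt so the block runs on its own.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "StudyHours": [1, 2, 3, 4, 5],
    "TestScore": [40, 50, 60, 70, 80],
})

# Rescale each column so its minimum maps to -1 and its maximum to 1.
ranged_scaler = MinMaxScaler(feature_range=(-1, 1))
ranged_scaled = ranged_scaler.fit_transform(df)

print(pd.DataFrame(ranged_scaled, columns=df.columns))
```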


In [44]:
import numpy as np

matrix = np.array([[1,2,3], [4,5,6], [7,8,9]])

In [45]:
data = pd.DataFrame(matrix)
print(data)

   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9
