<a href="https://colab.research.google.com/github/pareshrnayak/ml-dl-daily/blob/main/%5B1%5D_mean_median_variance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Mean, Median and Variance (From Scratch)**

In [3]:
data = [2, 4, 6, 8]
print(data)

[2, 4, 6, 8]


In [4]:
#-------MEAN--------
def mean(data):
    total = 0
    for x in data:
        total += x
    return total / len(data)

In [6]:
#------MEDIAN-------
def median(data):
    sorted_data = sorted(data)
    n = len(sorted_data)

    mid = n // 2

    if n % 2 == 0:
        return (sorted_data[mid - 1] + sorted_data[mid]) / 2
    else:
        return sorted_data[mid]

In [7]:
#------VARIANCE------
def variance(data):
    m = mean(data)
    total = 0

    for x in data:
        total += (x - m) ** 2

    # Population variance: divide by n, which matches np.var's default (ddof=0).
    return total / len(data)

In [8]:
print("Data:", data)
print("Mean:", mean(data))
print("Median:", median(data))
print("Variance:", variance(data))

Data: [2, 4, 6, 8]
Mean: 5.0
Median: 5.0
Variance: 5.0


In [9]:
import numpy as np

print("NumPy Mean:", np.mean(data))
print("NumPy Median:", np.median(data))
print("NumPy Variance:", np.var(data))

NumPy Mean: 5.0
NumPy Median: 5.0
NumPy Variance: 5.0


# Day 01 – Mean, Median & Variance

In Machine Learning, before training any model, it is important to **understand the data**.  
We need to know:
- Where the data is centered
- How spread out it is
- Whether it contains extreme values (outliers)

Three basic statistics help with this: **Mean, Median, and Variance**.

---

## 1️⃣ Mean (Average)

**What it is:**  
The mean is the **average value** of all data points: their sum divided by their count. It gives one estimate of where the data is centered.

**Formula:**  
$$
\text{Mean} = \frac{\sum x}{n}
$$

Where:  
- $x$ = each data value  
- $n$ = total number of values

**Why it matters in ML:**  
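- Used in **normalization**: subtracting the mean centers each feature at zero before scaling  
- Required to calculate **variance** and **standard deviation**  
- Centering features on their mean often helps gradient-based models converge faster during training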
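
The scaling mentioned above can be sketched as follows. This is a minimal illustration of one common scheme (z-score standardization), assuming the population standard deviation is used; the helper name `standardize` is made up for this example and is not part of the notebook.

```python
import math

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    n = len(values)
    mu = sum(values) / n
    var = sum((x - mu) ** 2 for x in values) / n
    sigma = math.sqrt(var)
    if sigma == 0:
        # Every value is identical, so there is no spread to scale by.
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(x - mu) / sigma for x in values]

scaled = standardize([2, 4, 6, 8])
print(scaled)
```

After this transformation the values have a mean of 0 and a variance of 1, up to floating-point rounding.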

---

## 2️⃣ Median (Middle Value)

**What it is:**  
The median is the **middle value** once the data is sorted.  
- Odd number of values → pick the middle  
- Even number → average of the two middle values

**Why it matters in ML:**  
- Less affected by **outliers** than the mean  
- Helps understand the **typical value** in skewed data  
- Useful for robust preprocessing, for example filling missing values in a skewed feature
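
To see the robustness to outliers in practice, here is a small sketch using Python's standard `statistics` module. The extreme value `1000` is made up purely for illustration.

```python
import statistics

clean = [2, 4, 6, 8]
with_outlier = [2, 4, 6, 8, 1000]  # hypothetical extreme value added

mean_clean = statistics.mean(clean)
median_clean = statistics.median(clean)
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print("Mean:  ", mean_clean, "->", mean_out)
print("Median:", median_clean, "->", median_out)
```

One extreme value moves the mean from 5 to 204, while the median only moves from 5 to 6.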

---

## 3️⃣ Variance (Data Spread)

**What it is:**  
Variance measures **how far data points are from the mean**, computed as the average of the squared deviations.  
- Low variance → data points are close together  
- High variance → data points are spread out

**Formula:**  
$$
\text{Variance} = \frac{1}{n} \sum (x - \mu)^2
$$

Where:  
- $x$ = data value  
- $\mu$ = mean  
- $n$ = number of values
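
As a worked example, applying the formula to the notebook's data `[2, 4, 6, 8]` gives:

$$
\mu = \frac{2 + 4 + 6 + 8}{4} = 5
$$

$$
\text{Variance} = \frac{(2-5)^2 + (4-5)^2 + (6-5)^2 + (8-5)^2}{4} = \frac{9 + 1 + 1 + 9}{4} = 5
$$

This agrees with the `Variance: 5.0` output printed by the code above.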

**Why it matters in ML:**  
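- Helps in **feature scaling**: features with very different variances are usually rescaled before training  
- Its square root is the **standard deviation**, used in normalization  
- Gives its name to the **bias–variance tradeoff**, where "variance" means how much a model's predictions change across different training sets, not the spread of a feature  
- In that sense, high model variance is associated with **overfitting**, while high bias is associated with **underfitting**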
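
The formula above divides by $n$, which is the population variance. When the data is a sample from a larger population, a common alternative divides by $n - 1$ instead (the sample variance). The sketch below compares the two using Python's standard `statistics` module; it is an illustration alongside the notebook's own implementation, not a replacement for it.

```python
import statistics

data = [2, 4, 6, 8]

population_var = statistics.pvariance(data)  # divides by n
sample_var = statistics.variance(data)       # divides by n - 1

print("Population variance:", population_var)
print("Sample variance:", sample_var)
```

For this data the sum of squared deviations is 20, so the population variance is 20 / 4 = 5 and the sample variance is 20 / 3 ≈ 6.67. NumPy exposes the same choice through the `ddof` parameter of `np.var`: the default `ddof=0` gives the population variance and `ddof=1` gives the sample variance.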

---

## ✅ Summary

- **Mean** → average, shows center  
- **Median** → middle value, robust to outliers  
- **Variance** → spread of data  

Implementing these from scratch helps us **understand the logic**, not just use libraries blindly.
