# Linear Regression Using Deviation Method (Roman Urdu Explanation)

## Purpose
This notebook demonstrates **simple linear regression** fitted with the deviation method (the closed-form least-squares formulas written in terms of deviations from the mean) on the Weight-Height dataset. Code comments and explanations are provided in Roman Urdu for accessibility.

## Dataset
- **Source:** Kaggle, [Weight-Height Dataset](https://www.kaggle.com/datasets/mustafaali96/weight-height)
- **Features:**
  - `Height` (inches)
  - `Weight` (lbs)
  - `Gender`

## Methodology
- Linear regression is implemented using the deviation method:
  - Formula for slope (`m`):
    $$ m = \frac{\sum{(x - \bar{x})(y - \bar{y})}}{\sum{(x - \bar{x})^2}} $$
  - Formula for intercept (`c`):
    $$ c = \bar{y} - m \cdot \bar{x} $$
- The notebook predicts weight from height, prints model parameters, and compares predictions with actual data.
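
As a small worked illustration of the two formulas above, the sketch below applies them to five made-up points and cross-checks the result against `np.polyfit`, which should agree for a degree-1 fit. The toy values (not taken from the Weight-Height dataset) and the helper name `deviation_fit` are assumptions for illustration only.

```python
import numpy as np

# Toy data (hypothetical), chosen so that y = 2x + 1 holds exactly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

def deviation_fit(x, y):
    """Return slope m and intercept c from the deviation formulas."""
    dx = x - x.mean()                      # deviations of x from its mean
    dy = y - y.mean()                      # deviations of y from its mean
    m = np.sum(dx * dy) / np.sum(dx ** 2)  # slope
    c = y.mean() - m * x.mean()            # intercept
    return m, c

m, c = deviation_fit(x, y)

# Cross-check with NumPy's least-squares polynomial fit (degree 1)
m_ref, c_ref = np.polyfit(x, y, deg=1)

print(f"deviation method: m = {m:.4f}, c = {c:.4f}")
print(f"np.polyfit:       m = {m_ref:.4f}, c = {c_ref:.4f}")
```

Here the deviations of `x` are -2 to 2 and those of `y` are -4 to 4, so the numerator is 20, the denominator is 10, and the fit recovers `m = 2` and `c = 1`.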

## Usage
1. Download the dataset with `kagglehub` and load the CSV with `pandas`.
2. Apply linear regression using the deviation method.
3. Make predictions for given heights.
4. Compare predicted weights with actual values.
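
Step 4 does not prescribe a particular comparison metric. One common option, sketched here as an assumption rather than as the notebook's own method, is to summarise the gap between actual and predicted values with the root-mean-square error (RMSE) and the coefficient of determination (R²). The function names and the sample numbers below are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as y."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Hypothetical actual and predicted weights (lbs), for illustration only
actual = [150.0, 160.0, 170.0, 180.0]
predicted = [152.0, 158.0, 171.0, 179.0]

print(f"RMSE: {rmse(actual, predicted):.4f} lbs")
print(f"R^2:  {r_squared(actual, predicted):.4f}")
```

In the notebook itself, the same two functions could be called with the `Weight` column and the predicted weights in place of the hypothetical lists.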

## Notes
- All code comments explain logic in Roman Urdu.
- Ideal for beginners learning regression in a familiar language/context.


In [5]:
# Import the required libraries (zaroori libraries import karte hain)
import os

import numpy as np
import pandas as pd
import kagglehub

# 1) Download the Weight-Height dataset (dataset download karte hain)
# dataset_download returns the local directory that holds the files
path = kagglehub.dataset_download("mustafaali96/weight-height")
df = pd.read_csv(os.path.join(path, "weight-height.csv"))

# Keep only Height (x) and Weight (y) (sirf Height aur Weight lete hain)
x = df["Height"].values
y = df["Weight"].values

def fit_linear_deviation(x, y):
    """
    Fit a line y = m*x + c with the deviation method (closed-form least squares).
    Formula:
        m = sum( (x - mean(x)) * (y - mean(y)) ) / sum( (x - mean(x))^2 )
        c = mean(y) - m * mean(x)
    Roman Urdu explanation (with English gloss):
    - numerator = "sum of product of deviations"
    - denominator = "sum of square of deviations of x"
    - m slope deta hai (line ki chadhai): m is the slope, the change in y per unit of x
    - c intercept deta hai (x = 0 par y ki value): c is the intercept, the value of y at x = 0
    """
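    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    x_mean = x.mean()   # Mean of x (x ka average)
    y_mean = y.mean()   # Mean of y (y ka average)

    # Compute the slope m (slope m nikalte hain)
    numerator = np.sum((x - x_mean) * (y - y_mean))   # Sum of products of deviations (deviations ka product aur unka sum)
    denominator = np.sum((x - x_mean) ** 2)           # Sum of squared deviations of x (x ki deviations ka square aur unka sum)
    if denominator == 0:
        # Every x value is identical, so no unique slope exists
        raise ValueError("All x values are identical; the slope is undefined.")
    m = numerator / denominator

    # Compute the intercept c (intercept c nikalte hain)
    c = y_mean - m * x_mean
    return m, c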

def predict(x, m, c):
    """
    Prediction function: y = m*x + c
    Given a height, return the predicted weight
    (Roman Urdu: Height do, predicted Weight nikal lo).
    """
    return m * x + c

# 2) Fit the model on the dataset (model ko dataset par fit karte hain)
m, c = fit_linear_deviation(x, y)
print(f"Slope (m): {m:.4f}")
print(f"Intercept (c): {c:.4f}")

# 3) Example prediction
sample_height = 70  # inches
predicted_weight = predict(sample_height, m, c)
print(f"Agar Height = {sample_height} inches ho to predicted Weight ≈ {predicted_weight:.2f} lbs")

# 4) Compare predictions with the first 5 actual rows (pehle 5 rows ka actual data ke saath comparison)
y_pred = predict(x, m, c)
df["PredictedWeight"] = y_pred
print(df.head())


Slope (m): 7.7173
Intercept (c): -350.7372
Agar Height = 70 inches ho to predicted Weight ≈ 189.47 lbs
  Gender     Height      Weight  PredictedWeight
0   Male  73.847017  241.893563       219.161480
1   Male  68.781904  162.310473       180.072546
2   Male  74.110105  212.740856       221.191809
3   Male  71.730978  220.042470       202.831401
4   Male  69.881796  206.349801       188.560728
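
To read the table above row by row, the residual (actual minus predicted weight) can be computed directly from the five printed values. The sketch below hard-codes those numbers, copied from the output; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Values copied from the first five rows of the printed output above (lbs)
actual = np.array([241.893563, 162.310473, 212.740856, 220.042470, 206.349801])
predicted = np.array([219.161480, 180.072546, 221.191809, 202.831401, 188.560728])

# Positive residual: the line under-predicts; negative: it over-predicts
residuals = actual - predicted
mae = float(np.mean(np.abs(residuals)))

for row, (a, p, r) in enumerate(zip(actual, predicted, residuals)):
    print(f"row {row}: actual = {a:.2f}, predicted = {p:.2f}, residual = {r:+.2f}")
print(f"Mean absolute error over these 5 rows: {mae:.2f} lbs")
```

For these five rows the residuals range from roughly -18 to +23 lbs, with a mean absolute error of about 16.79 lbs. Five rows, all labelled `Male`, are too few to judge the fit as a whole; the line itself was fitted on every row using `Height` alone.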
