# Lesson 2.5: Linear Algebra Basics for ML

## Why Linear Algebra?

Don't worry - we're keeping this **practical, not theoretical**.

Here's why it matters: most ML algorithms use linear algebra under the hood.
- **Linear Regression**: finding the best weights (dot products)
- **Neural Networks**: matrix multiplications at every layer
- **Recommendations**: matrix factorization

You don't need to be a math expert, but understanding these basics will help you **debug and understand** what's happening.

### PHP Parallel
Think of it this way: in Laravel, you don't write raw SQL for everything - Eloquent handles it. Similarly, scikit-learn handles the linear algebra for you. But understanding the basics (like understanding SQL) makes you much more effective.

In [None]:
import numpy as np

## Vectors

A vector is just a 1D array of numbers. In ML, vectors represent:
- A single data point (e.g., one water filter's readings)
- Model weights (how important each feature is)

In [None]:
# A water filter reading: [TDS_output, flow_rate, pressure, filter_age]
filter_reading = np.array([45, 2.0, 55, 90])
print("Filter reading (vector):", filter_reading)
print("This vector has", len(filter_reading), "dimensions")

# Model weights (learned from data - how important each feature is)
weights = np.array([0.5, -0.3, -0.1, 0.4])
print("Model weights:", weights)
# Positive weight = feature increases risk
# Negative weight = feature decreases risk

## Dot Product - The Most Important Operation

The **dot product** multiplies corresponding elements and sums them up.

```
[a, b, c] · [x, y, z] = a*x + b*y + c*z
```

This is how linear models make predictions: `prediction = features · weights` (plus a bias term, covered at the end of this lesson).
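
To make the formula concrete, here is a minimal sketch of the dot product written as a plain loop, much like a `foreach` you might write in PHP. The helper name `dot_loop` and the example values are made up for illustration; in real code you would call `np.dot`.

```python
import numpy as np

def dot_loop(u, v):
    """Dot product as an explicit loop (hypothetical helper, for illustration only)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for u_i, v_i in zip(u, v):
        total += u_i * v_i  # multiply matching elements, keep a running sum
    return total

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(dot_loop(u, v))  # 1*4 + 2*5 + 3*6
print(np.dot(u, v))    # same value, computed by NumPy
```

Both lines print the same number. The NumPy version is shorter and, on large arrays, much faster because the loop runs in compiled code.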

In [None]:
# Simple example
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Dot product: 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
dot = np.dot(a, b)
print(f"Dot product: {dot}")  # 32

# Equivalent manual calculation:
manual = (a * b).sum()
print(f"Manual: {manual}")    # 32 (same!)

In [None]:
# PRACTICAL: Predicting maintenance risk score
# Higher score = more likely to need maintenance

# Features:       [TDS_output, flow_rate, pressure, filter_age_days]
filter_a = np.array([45,  2.0, 55, 90])    # New-ish filter
filter_b = np.array([120, 0.8, 40, 320])   # Old filter

# Weights learned by model (simplified)
weights = np.array([0.5, -2.0, -0.1, 0.3])
# TDS: positive (higher TDS = more risk)
# Flow: negative (lower flow = more risk, so higher flow reduces score)
# Pressure: slight negative (lower pressure = slightly more risk)
# Age: positive (older = more risk)

score_a = np.dot(filter_a, weights)
score_b = np.dot(filter_b, weights)

print(f"Filter A risk score: {score_a:.1f} (newer filter)")
print(f"Filter B risk score: {score_b:.1f} (older filter)")
print(f"\nFilter B has {'higher' if score_b > score_a else 'lower'} risk - makes sense!")

## Matrices

A matrix is a 2D array. In ML:
- Your **dataset** is a matrix (rows = samples, columns = features)
- **Weight matrices** connect layers in neural networks
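
Since rows are samples and columns are features, indexing a dataset matrix gives you back vectors. The sketch below uses a made-up 3x2 dataset (the names `data`, `first_sample`, and `tds_column` are illustrative only) to show how to pull out one sample or one feature.

```python
import numpy as np

# Hypothetical mini-dataset: 3 filters x 2 features [TDS_output, flow_rate]
data = np.array([
    [45.0, 2.0],
    [70.0, 1.5],
    [120.0, 0.8],
])

first_sample = data[0]    # one row = one filter, shape (2,)
tds_column = data[:, 0]   # one column = one feature across all filters, shape (3,)

print("First sample:", first_sample)
print("TDS column:", tds_column)
print("Mean of each feature:", data.mean(axis=0))  # axis=0 averages down the rows
```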

In [None]:
# Our dataset: 4 filters x 3 features
# [TDS_output, flow_rate, filter_age_days]
X = np.array([
    [45,  2.0, 90],
    [70,  1.5, 180],
    [120, 0.8, 320],
    [35,  2.2, 30],
])
print("Dataset X:")
print(X)
print(f"Shape: {X.shape} → {X.shape[0]} samples, {X.shape[1]} features")

## Matrix Multiplication (@ operator)

Matrix multiplication lets you compute predictions for ALL samples at once!

Instead of looping through each filter, one matrix multiply does it all.
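
As a rough illustration of that claim, the sketch below computes the same scores twice on made-up data: once with an explicit loop of dot products and once with a single `@`. It also shows the error you get when the shapes don't line up, which is one of the most common bugs you will hit. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical data: 3 filters x 2 features, plus one weight per feature
samples = np.array([
    [45.0, 2.0],
    [70.0, 1.5],
    [120.0, 0.8],
])
w = np.array([0.5, -2.0])

# Loop version: one dot product per row
loop_scores = np.array([np.dot(row, w) for row in samples])

# Vectorized version: one matrix-vector product
matmul_scores = samples @ w

print(loop_scores)
print(matmul_scores)
print("Same result:", np.allclose(loop_scores, matmul_scores))

# The inner dimensions must match: (3, 2) @ (2,) works, (3, 2) @ (3,) does not
try:
    samples @ np.array([1.0, 2.0, 3.0])
except ValueError as err:
    print("Shape mismatch:", err)
```

When a matrix multiply fails, printing `.shape` on both operands is usually the quickest way to find the problem.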

In [None]:
# Weights for our 3 features
weights = np.array([0.5, -2.0, 0.3])

# Predict risk for ALL filters at once!
# Matrix (4x3) @ Vector (3,) = Vector (4,)
all_scores = X @ weights  # Same as np.dot(X, weights)

print("Risk scores for all filters:")
for i, score in enumerate(all_scores):
    status = "NEEDS MAINTENANCE" if score > 50 else "OK"
    print(f"  Filter {i}: score = {score:.1f} → {status}")

## Transpose

Flip rows and columns. Shape `(m, n)` becomes `(n, m)`.
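
One place the transpose shows up is in covariance calculations. The following is a minimal sketch on made-up readings: it centers each column, multiplies the transposed matrix by the original, and compares the result with NumPy's own `np.cov`. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical data: 4 filters x 2 features [TDS_output, filter_age_days]
readings = np.array([
    [45.0, 90.0],
    [70.0, 180.0],
    [120.0, 320.0],
    [35.0, 30.0],
])

centered = readings - readings.mean(axis=0)  # subtract each column's mean
n_samples = readings.shape[0]

# (2, 4) @ (4, 2) -> (2, 2): one entry per pair of features
cov_manual = centered.T @ centered / (n_samples - 1)

print(cov_manual)
print("Matches np.cov:", np.allclose(cov_manual, np.cov(readings, rowvar=False)))
```

The transpose is what makes the shapes line up: without it you would be trying to multiply `(4, 2)` by `(4, 2)`, which fails.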

In [None]:
print("Original X (4 filters x 3 features):")
print(X)
print(f"Shape: {X.shape}")

print("\nTransposed X.T (3 features x 4 filters):")
print(X.T)
print(f"Shape: {X.T.shape}")

# Useful for: correlation, covariance calculations

## Identity Matrix

Like multiplying by 1: it leaves whatever you multiply it with unchanged. It is used in regularization (for example, ridge regression).
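
To give a feel for the regularization use, here is a rough sketch of the textbook closed-form solution for ridge regression, which adds a scaled identity matrix to `X.T @ X` before solving for the weights. The data, the strength `lam`, and all variable names are made up, there is no intercept, and this is not meant to mirror how any particular library implements it.

```python
import numpy as np

# Made-up data: 4 samples x 2 features
features = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
])
# Made-up target, built as 1*x1 + 2*x2 so the "true" weights are known
target = np.array([5.0, 4.0, 11.0, 10.0])

lam = 1.0                              # regularization strength (arbitrary choice)
identity = np.eye(features.shape[1])   # 2x2, one row/column per feature

gram = features.T @ features           # (2, 4) @ (4, 2) -> (2, 2)

# Plain least squares vs. ridge: the only difference is "+ lam * identity"
w_plain = np.linalg.solve(gram, features.T @ target)
w_ridge = np.linalg.solve(gram + lam * identity, features.T @ target)

print("Without regularization:", w_plain)
print("With regularization:   ", w_ridge)
```

The regularized weights come out smaller overall than the unregularized ones, which is the point of the technique: the identity term pulls the weights toward zero.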

In [None]:
# Identity matrix: 1s on diagonal, 0s everywhere else
I = np.eye(3)
print("3x3 Identity matrix:")
print(I)

# Anything multiplied by identity stays the same
v = np.array([10, 20, 30])
print("\nv @ I =", v @ I)  # [10. 20. 30.] - same values (floats, because np.eye returns floats)

## Putting It All Together: Manual Linear Regression Prediction

Once a model is trained, this is the computation behind `sklearn.linear_model.LinearRegression`'s `predict()`:

```
prediction = X @ weights + bias
```
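
Where do the weights and bias come from? The sketch below fits them with ordinary least squares using `np.linalg.lstsq` on made-up, noise-free data, then predicts with the formula above. All names and values are illustrative, and scikit-learn's internals may differ in detail.

```python
import numpy as np

# Made-up training data: 5 samples x 2 features
X_train = np.array([
    [1.0, 2.0],
    [2.0, 0.0],
    [3.0, 1.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
# Made-up targets generated as 2*x1 - 1*x2 + 10 (no noise, to keep it checkable)
y_train = 2.0 * X_train[:, 0] - 1.0 * X_train[:, 1] + 10.0

# Append a column of ones so the last fitted coefficient acts as the bias
X_with_ones = np.column_stack([X_train, np.ones(len(X_train))])
coeffs, *_ = np.linalg.lstsq(X_with_ones, y_train, rcond=None)
fitted_weights, fitted_bias = coeffs[:-1], coeffs[-1]

# Prediction uses the same formula: X @ weights + bias
X_new = np.array([[6.0, 2.0]])
prediction = X_new @ fitted_weights + fitted_bias

print("Weights:", fitted_weights)
print("Bias:", fitted_bias)
print("Prediction:", prediction)
```

Because the targets were generated without noise, the fit recovers the generating weights and bias up to floating-point rounding. With real data the fit would only approximate them.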

In [None]:
# Pretend we've trained a model and got these weights
# These predict "days until maintenance needed"
weights = np.array([-0.8, 50.0, -0.5])  # For [TDS, flow_rate, age]
bias = 200  # Starting point (intercept)

# Our filter data
X = np.array([
    [45,  2.0, 90],   # New filter, good readings
    [70,  1.5, 180],  # Middle-aged filter
    [120, 0.8, 320],  # Old filter, bad readings
    [35,  2.2, 30],   # Brand new filter
])

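# Prediction: days until maintenance
# Note: a linear model can output negative values. The old filter comes out
# at about -16 here, which you can read as "maintenance is overdue".
days_until_maintenance = X @ weights + bias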

print("Predicted days until maintenance:")
labels = ["New-ish", "Middle", "Old", "Brand new"]
for label, days in zip(labels, days_until_maintenance):
    print(f"  {label:10s}: {days:.0f} days")

print("\nThis is the prediction step of Linear Regression!")

## Exercise: Build a Simple Predictor

1. Create a dataset of 5 water filters with features: [TDS, flow_rate, pressure]
2. Create a weight vector and bias (make up reasonable values)
3. Use the dot product (or the `@` operator) to predict a "health score" for each filter
4. Classify each filter as "Good", "Warning", or "Critical" based on the score

In [None]:
# YOUR CODE HERE

# 1. Dataset: 5 filters x 3 features [TDS, flow_rate, pressure]
# filters = np.array([...])

# 2. Weights and bias
# weights = np.array([..., ..., ...])
# bias = ...

# 3. Predict health scores
# scores = filters @ weights + bias

# 4. Classify
# for i, score in enumerate(scores):
#     if score > 80:
#         status = "Good"
#     elif score > 50:
#         status = "Warning"
#     else:
#         status = "Critical"
#     print(f"Filter {i}: score={score:.1f} → {status}")