# 🧠 Machine Learning Techniques – Practical Examples & Metrics

**Corporate White + Blue Style • Created by Mohd Salman | Corporate Training Series**

---
In this notebook, you’ll practice **six major ML techniques** with **real-world use cases** and **proper evaluation metrics**:
1. **Supervised (Regression)** – House Price Prediction → MAE, MSE, RMSE, R²  
2. **Supervised (Classification)** – Spam Detection → Accuracy, Precision, Recall, F1  
3. **Unsupervised** – Customer Segmentation → Silhouette Score  
4. **Semi-Supervised** – Tumor Detection (limited labels) → Accuracy  
5. **Reinforcement** – Traffic Light Control → Cumulative Reward  
6. **Self-Supervised** – Masked Word Prediction → Perplexity (concept)  

> ✅ All sections are **clean** (no pre-run outputs). Run cell-by-cell and read each section’s summary first.

---

In [None]:
# Created by Mohd Salman | Corporate Training Series
# This notebook includes section-level Markdown summaries and clean, runnable code.

## 🏡 1) Supervised Learning (Regression): House Price Prediction
**Goal:** Predict a continuous target (price).  
**Why these metrics?**
- **MAE**: average absolute error (easy to interpret in original units)
- **MSE**: average squared error (penalizes large mistakes more heavily)
- **RMSE**: √MSE (same unit as the target; still emphasizes large errors)
- **R²**: fraction of the target's variance explained by the model (1 is perfect; for a linear model with an intercept it lies in 0–1 on the training data, but it can be negative on new data)
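To make the four formulas concrete, here is a small from-scratch sketch in NumPy on three made-up prices (in $1000); the values and the helper name `regression_metrics` are illustrative, not part of the house dataset. The second set of predictions is worse than simply predicting the mean, which drives R² below zero.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                      # average absolute error
    mse = np.mean(err ** 2)                         # average squared error
    rmse = np.sqrt(mse)                             # back in the target's units
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

y_true = [200, 250, 280]
for name, preds in [("good", [210, 240, 300]), ("bad", [300, 300, 300])]:
    mae, mse, rmse, r2 = regression_metrics(y_true, preds)
    print(f"{name}: MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f} R²={r2:.3f}")
```

The "good" predictions miss by 10, 10 and 20, so MAE is about 13.33 while RMSE (about 14.14) is pulled up by the single 20-unit miss.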

**Task:** Fit `LinearRegression` using size & bedrooms. Evaluate MAE, MSE, RMSE, R².

---

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Features: [Size (sqft), Bedrooms]
X = np.array([[1200, 2], [1500, 3], [1700, 3], [2000, 4], [2400, 4]])
y = np.array([200, 250, 280, 320, 360])  # Price in $1000

reg = LinearRegression().fit(X, y)
# Note: we predict on the same data used for fitting, so these are in-sample (optimistic) metrics
y_pred = reg.predict(X)

mae = mean_absolute_error(y, y_pred)
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)  # RMSE = square root of MSE
r2 = r2_score(y, y_pred)

print("=== Regression Metrics ===")
print(f"MAE : {mae:.2f} (thousand dollars)")
print(f"MSE : {mse:.2f} (thousand dollars, squared)")
print(f"RMSE: {rmse:.2f} (thousand dollars)")
print(f"R²  : {r2:.3f}")

# Try: add an outlier house or change the features; observe that RMSE reacts more than MAE.
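The cell above scores the model on the same five houses it was trained on, which flatters the metrics. A minimal sketch of out-of-sample evaluation, assuming leave-one-out cross-validation (a swapped-in choice given only five samples), uses scikit-learn's `cross_val_predict` with `LeaveOneOut`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X = np.array([[1200, 2], [1500, 3], [1700, 3], [2000, 4], [2400, 4]])
y = np.array([200, 250, 280, 320, 360])  # Price in $1000

# In-sample: fit and score on the same five houses
in_sample_mae = mean_absolute_error(y, LinearRegression().fit(X, y).predict(X))

# Leave-one-out: each house is predicted by a model trained on the other four
loo_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
loo_mae = mean_absolute_error(y, loo_pred)

print(f"In-sample MAE    : {in_sample_mae:.2f}")
print(f"Leave-one-out MAE: {loo_mae:.2f}")
```

For ordinary least squares, each leave-one-out error is at least as large in magnitude as the corresponding in-sample error, so the second MAE should not come out smaller.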

## 📧 2) Supervised Learning (Classification): Spam Detection
**Goal:** Predict categorical label (Spam=1, Not Spam=0).  
**Why these metrics?**
- **Accuracy**: overall correctness (good for balanced classes)
- **Precision**: quality of positive predictions (minimize false alarms)
- **Recall**: ability to capture actual positives (minimize misses)
- **F1**: harmonic mean of Precision & Recall (useful when classes are imbalanced)
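All four metrics come from the four cells of the confusion matrix. A plain-Python sketch (the helper name `classification_metrics` is hypothetical) applied to the same `y_true` and `y_pred` used in the cell below:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from raw confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    tn = sum(t != positive and p != positive for t, p in pairs)  # ham kept
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

print(classification_metrics([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 0, 0, 1, 1, 0]))
```

Here there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so accuracy, precision, recall and F1 all equal 0.75.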

**Task:** Evaluate the metrics for the given `y_true` and `y_pred`.

---

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # 1=Spam, 0=Not Spam
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print("=== Classification Metrics ===")
print(f"Accuracy : {acc:.2f}")
print(f"Precision: {prec:.2f}")
print(f"Recall   : {rec:.2f}")
print(f"F1-Score : {f1:.2f}")

# Try: tweak y_pred to trade off precision vs recall; watch F1 balance both.
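Real classifiers usually output a score, and the decision threshold controls the precision/recall trade-off mentioned in the comment above. A sketch with made-up spam scores (hypothetical, chosen so that a 0.5 threshold reproduces the `y_pred` above):

```python
# Hypothetical spam probabilities for the same 8 emails (not from a real model)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]

def precision_recall_at(threshold):
    """Label an email as spam when its score reaches the threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for threshold in (0.35, 0.5, 0.65):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold catches every spam email (recall 1.0) at the cost of a false alarm; raising it removes the false alarm (precision 1.0) but misses one spam email.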

## 🛍️ 3) Unsupervised Learning: Customer Segmentation (Clustering)
**Goal:** Discover groups without labels.  
**Why this metric?**
- **Silhouette Score** (−1 to 1): compares how close each point is to its own cluster (cohesion) with how far it is from the nearest other cluster (separation); higher is better.

**Task:** Cluster by Income & SpendingScore using KMeans; compute Silhouette.
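The score is computed per point as s(i) = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points of the nearest other cluster; the overall score is the mean over all points. A from-scratch NumPy sketch on four made-up 1-D points (the helper name is hypothetical, and it assumes every cluster has at least two points, since the original definition sets s(i) = 0 for singletons):

```python
import numpy as np

def silhouette_manual(X, labels):
    """Mean silhouette: s(i) = (b - a) / max(a, b) for each point i."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                       # exclude the point itself
        a = D[i, same].mean()                 # cohesion: own cluster
        b = min(D[i, labels == c].mean()      # separation: nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

X = [[0], [1], [5], [6]]
print(silhouette_manual(X, [0, 0, 1, 1]))  # two tight, well-separated groups
print(silhouette_manual(X, [0, 1, 0, 1]))  # mixed-up grouping -> negative score
```

A negative score means points are, on average, closer to another cluster than to their own.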

---

In [None]:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

data = pd.DataFrame({
    'Income': [15, 16, 17, 30, 45, 46, 60, 70, 85, 90],
    'SpendingScore': [39, 81, 6, 77, 40, 76, 6, 94, 3, 93]
})

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(data)
sil = silhouette_score(data, labels)
print("Silhouette Score:", round(sil, 3))

# Try: change n_clusters to 2 or 4; observe how Silhouette changes.
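The comment above suggests trying other cluster counts. A short sketch that sweeps k from 2 to 5 on the same data and reports the silhouette for each; picking the k with the highest score is a common heuristic, not a guarantee of meaningful segments:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

data = pd.DataFrame({
    'Income': [15, 16, 17, 30, 45, 46, 60, 70, 85, 90],
    'SpendingScore': [39, 81, 6, 77, 40, 76, 6, 94, 3, 93]
})

# Fit KMeans for several k and record the silhouette of each clustering
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print("Highest silhouette at k =", best_k)
```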

## 🧬 4) Semi-Supervised Learning: Tumor Detection (Limited Labels)
**Goal:** Leverage a few labeled samples plus many unlabeled ones to improve accuracy.  
**Why this metric?**
- **Accuracy/AUC** on the points whose labels were hidden (a small validation set) reflects performance under limited labels.

**Task:** Use `LabelPropagation` to infer labels for unlabeled points; compute accuracy.
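Conceptually, label propagation spreads label mass along a similarity graph while keeping the known labels fixed. A bare-bones NumPy sketch of that idea, assuming a Gaussian affinity with a hand-picked width `sigma` (the helper `propagate_labels` and its parameters are hypothetical; scikit-learn's `LabelPropagation` has its own `kernel` and `gamma` settings and convergence checks):

```python
import numpy as np

def propagate_labels(X, y, sigma=1.0, n_iter=100):
    """Hypothetical helper: iterative label propagation with hard clamping.

    y uses -1 for unlabeled points. Returns a label for every point.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y[y != -1])
    # Gaussian affinity between every pair of points
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2 * sigma ** 2))
    T = W / W.sum(axis=1, keepdims=True)     # row-normalized transition matrix
    labeled = y != -1
    Y0 = np.zeros((len(y), len(classes)))
    Y0[labeled, np.searchsorted(classes, y[labeled])] = 1.0  # one-hot known labels
    Y = Y0.copy()
    for _ in range(n_iter):
        Y = T @ Y                 # each point averages its neighbors' label scores
        Y[labeled] = Y0[labeled]  # clamp: known labels never change
    return classes[Y.argmax(axis=1)]

X = np.array([[1, 1], [2, 1], [3, 1], [6, 6], [7, 7], [8, 6]])
y = np.array([0, 0, 0, -1, -1, 1])
print(propagate_labels(X, y))
```

The two unlabeled points sit far from the benign group and close to the single malignant example, so they receive almost all of their label mass from class 1.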

---

In [None]:
from sklearn.semi_supervised import LabelPropagation
from sklearn.metrics import accuracy_score
import numpy as np

# Toy dataset: 0 = benign, 1 = malignant; -1 = unlabeled
X = np.array([[1,1],[2,1],[3,1],[6,6],[7,7],[8,6]])
y = np.array([0, 0, 0, -1, -1, 1])  # few labels known

lp = LabelPropagation()
lp.fit(X, y)
y_pred = lp.transduction_
print("Predicted labels:", y_pred)

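# Suppose the true labels are known for evaluation (in practice they usually are not)
y_true = np.array([0, 0, 0, 1, 1, 1])
print("Accuracy (all points):", round(accuracy_score(y_true, y_pred), 3))
# Labeled points keep their given labels, so also score the hidden ones separately
unlabeled = y == -1
print("Accuracy (unlabeled only):", round(accuracy_score(y_true[unlabeled], y_pred[unlabeled]), 3))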

# Try: flip a known label to simulate noise; observe accuracy impact.

## 🚦 5) Reinforcement Learning: Traffic Light Control (Q-Learning Sketch)
**Goal:** Learn a policy (Red/Green) that maximizes flow (reward).  
**Why this metric?**
- **Cumulative Reward**: measures how effective the learned policy is over time.

**Task:** Run simple Q-value updates with epsilon-greedy exploration and a fixed toy reward in a single-state setting; inspect the learned preference.
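The update in the code cell below is the standard Q-learning rule, written here for the single-state case the toy uses (Q is indexed by action only):

$$
Q(a) \leftarrow Q(a) + \alpha \left[ r + \gamma \max_{a'} Q(a') - Q(a) \right]
$$

If Green stays the greedy action, its value settles where $Q(\text{Green}) = 1 + \gamma\, Q(\text{Green})$, and because there is only one state, Red's value bootstraps from Green's:

$$
Q(\text{Green}) \to \frac{1}{1-\gamma} = 10, \qquad Q(\text{Red}) \to -1 + \gamma \cdot 10 = 8 \qquad (\gamma = 0.9)
$$

With α = 0.1 and only 200 episodes, the printed Q-values are typically still below these limits; what the greedy policy relies on is the ordering, Green above Red.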

---

In [None]:
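import random

random.seed(0)  # fix the exploration sequence so results are reproducible

actions = ['Red', 'Green']
Q = {a: 0.0 for a in actions}  # single state, so Q is indexed by action only
alpha, gamma, epsilon, episodes = 0.1, 0.9, 0.2, 200
cumulative_reward = 0

for _ in range(episodes):
    # epsilon-greedy: explore with probability epsilon, otherwise pick the best-known action
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(Q, key=Q.get)

    # toy deterministic reward: Green (+1) is better than Red (-1)
    reward = 1 if action == 'Green' else -1
    cumulative_reward += reward
    # Q-learning update; with one state, the "next state" is the same state
    Q[action] = Q[action] + alpha * (reward + gamma * max(Q.values()) - Q[action])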

print("Q-values:", Q)
print("Cumulative Reward:", cumulative_reward)

# Try: invert rewards to simulate different traffic conditions.

## 🗣️ 6) Self-Supervised Learning: Masked Word Prediction (Concept)
**Goal:** Learn representations by predicting masked tokens without manual labels.  
**Why this metric?**
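- **Perplexity** (lower is better): measures how well the model predicts the next/masked words, as the exponential of the average negative log-likelihood it assigns to the true words.

**Task:** Demonstrate masking; the metric is only conceptual here.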
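Perplexity can be computed from the probabilities a model assigns to the correct words. A minimal sketch with hypothetical probabilities (no real model involved); for masked models the same quantity is sometimes reported as a "pseudo-perplexity":

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(average negative log-likelihood of the true tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical probabilities a model assigned to the correct (masked) words
print(perplexity([0.5, 0.25, 0.125]))  # about 4.0
print(perplexity([0.1] * 5))           # uniform guess over 10 words: about 10.0
```

A perplexity of k roughly means the model is as uncertain as if it were choosing uniformly among k words.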

---

In [None]:
import random

sentence = "Machine learning is the future of intelligence."
words = sentence.split()
mask_idx = random.randrange(len(words))
masked = words.copy()
masked[mask_idx] = "[MASK]"

print("Input :", " ".join(masked))
print("Label :", words[mask_idx])
print("Metric: Use Perplexity in real models (lower = better).")

# Try: mask multiple tokens or vary sentences.
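To mask several tokens at once, a common recipe (popularized by BERT, which selects about 15% of tokens) chooses each token independently. A simplified sketch; `mask_tokens` and its parameters are hypothetical, and BERT's extra rule of sometimes substituting a random word or keeping the original is deliberately left out:

```python
import random

def mask_tokens(words, mask_prob=0.15, seed=None):
    """Mask each word independently with probability mask_prob.

    Returns the masked word list and a dict {position: original word}
    holding the prediction targets.
    """
    rng = random.Random(seed)
    masked, targets = list(words), {}
    for i, w in enumerate(words):
        if rng.random() < mask_prob:
            masked[i] = "[MASK]"
            targets[i] = w
    if not targets:  # guarantee at least one word to predict
        i = rng.randrange(len(words))
        masked[i] = "[MASK]"
        targets[i] = words[i]
    return masked, targets

words = "Machine learning is the future of intelligence.".split()
masked, targets = mask_tokens(words, mask_prob=0.3, seed=1)
print("Input  :", " ".join(masked))
print("Targets:", targets)
```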

---
### 📌 Summary Table

| ML Type | Use Case | Key Metrics |
|---|---|---|
| Supervised (Regression) | House Price Prediction | MAE, MSE, RMSE, R² |
| Supervised (Classification) | Spam Detection | Accuracy, Precision, Recall, F1 |
| Unsupervised | Customer Segmentation | Silhouette Score |
| Semi-Supervised | Tumor Detection | Accuracy (AUC if available) |
| Reinforcement | Traffic Light Control | Cumulative Reward |
| Self-Supervised | Masked Word Prediction | Perplexity |
