# EN3150 — Assignment 01 Notebook (Refreshed)
**Learning from Data and Related Challenges & Linear Models for Regression**  
**University of Moratuwa — Department of Electronic & Telecommunication Engineering**


This refreshed version fixes earlier minor issues and keeps plots simple and compatible.

**Student Name:** _<type here>_  
**Index Number:** _<type here>_  


## Setup

In [None]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Ensure inline plotting
%matplotlib inline

# (Optional) Set a base random seed for reproducibility of demo plots
np.random.seed(0)

# ---------- Helper functions ----------
def linear_fit(x, y):
    """Return slope and intercept for y ≈ m x + c using least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, c = np.polyfit(x, y, 1)
    return m, c

def predict_line(x, m, c):
    return m * np.asarray(x, dtype=float) + c

def robust_loss_per_sample(y_true, y_pred, beta):
    """Compute per-sample robust loss: r^2 / (r^2 + beta^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r2 = (y_true - y_pred)**2
    beta2 = (beta**2)
    return r2 / (r2 + beta2)

def robust_loss_mean(y_true, y_pred, beta):
    return robust_loss_per_sample(y_true, y_pred, beta).mean()

def mse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred)**2)

def bce_for_y_equals_1(yhat, eps=1e-12):
    # Clip yhat into [eps, 1 - eps] so that log(0) = -inf cannot occur
    yhat = np.clip(np.asarray(yhat, dtype=float), eps, 1.0 - eps)
    return -np.log(yhat)

# Fallback scaling utilities (no sklearn required)
def standard_scale(x):
    x = np.asarray(x, dtype=float)
    std = x.std(ddof=0)
    return (x - x.mean()) / (std if std != 0 else 1.0)

def minmax_scale(x):
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    return (x - xmin) / (xmax - xmin) if xmax != xmin else np.zeros_like(x)

def maxabs_scale(x):
    x = np.asarray(x, dtype=float)
    denom = np.max(np.abs(x))
    return x / denom if denom != 0 else np.zeros_like(x)


## 1) Linear Regression Impact on Outliers

### Task 1–2: Load dataset (Table 1), fit linear regression, and plot
Data from the assignment (x, y):

In [None]:
# Table 1 data (i from 1..10)
x = np.array([0,1,2,3,4,5,6,7,8,9], dtype=float)
y = np.array([20.26, 5.61, 3.14, -30.00, -40.00, -8.13, -11.73, -16.08, -19.95, -24.03], dtype=float)

# Least squares linear regression
m_hat, c_hat = linear_fit(x, y)
print(f"Learned linear model (Task 2): y = {m_hat:.4f} x + {c_hat:.4f}")

# Plot scatter + fitted line
xx = np.linspace(x.min()-0.5, x.max()+0.5, 200)
yy = predict_line(xx, m_hat, c_hat)

plt.figure(figsize=(7,5))
plt.scatter(x, y, s=35, label='Data points')
plt.plot(xx, yy, linewidth=2, label='Fitted line')
plt.title('Linear Regression on Given Data')
plt.xlabel('x'); plt.ylabel('y')
plt.legend(); plt.grid(True)
plt.show()


### Task 3–4: Robust loss for two given models
Two models:
- **Model 1**: $y = -4x + 12$  
- **Model 2**: $y = -3.55x + 3.91$ (stated as the learned model in Task 2)

Compute per-sample and mean robust loss $L(\theta, \beta)$ for $\beta \in \{1, 10^{-6}, 10^{3}\}$.
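
As a minimal sketch of the quantity being tabulated, assuming the per-sample form $r_i^2 / (r_i^2 + \beta^2)$ used by the Setup helper `robust_loss_per_sample`, the arithmetic at $\beta = 1$ looks like this. The names `y_toy`, `yhat_toy` and `beta_toy` and their values are hypothetical and are not taken from Table 1.

```python
import numpy as np

# Toy targets and predictions (hypothetical values, not Table 1 data)
y_toy = np.array([2.0, 5.0, 1.0])
yhat_toy = np.array([2.0, 4.0, 4.0])
beta_toy = 1.0

r2 = (y_toy - yhat_toy) ** 2          # squared residuals: [0, 1, 9]
per_sample = r2 / (r2 + beta_toy**2)  # [0/1, 1/2, 9/10]
mean_loss = per_sample.mean()         # (0 + 0.5 + 0.9) / 3

print(per_sample)
print(mean_loss)
```

Each per-sample value lies in $[0, 1)$, so the mean does too.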

In [None]:
# Define the two candidate models
m1, c1 = -4.0, 12.0
m2, c2 = -3.55, 3.91

y1 = predict_line(x, m1, c1)
y2 = predict_line(x, m2, c2)

betas = [1.0, 1e-6, 1e3]

rows = []
for model_name, yp in [('Model 1', y1), ('Model 2', y2)]:
    for b in betas:
        per = robust_loss_per_sample(y, yp, b)
        rows.append({'Model': model_name, 'beta': b, 'L_mean': per.mean()})
results_df = pd.DataFrame(rows)
results_df


(Optional) Detailed per-sample table — uncomment in the next cell to view.

In [None]:
# details = []
# for model_name, yp in [('Model 1', y1), ('Model 2', y2)]:
#     for b in betas:
#         per = robust_loss_per_sample(y, yp, b)
#         for i, val in enumerate(per, start=1):
#             details.append({'Model': model_name, 'beta': b, 'i': i, 'x': x[i-1], 'y': y[i-1], 'loss_i': val})
# details_df = pd.DataFrame(details)
# details_df.head(20)


### (Optional) Visual: Robust loss vs. residual magnitude
Compare how MSE (unbounded) and the robust loss $\frac{r^2}{r^2+\beta^2}$ (bounded by 1) behave.

In [None]:
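# Keep the residual range small: over 0..100 the squared error reaches 1e4
# and the robust curve (bounded by 1) is flattened onto the x-axis.
r = np.linspace(0, 5, 400)
mse_curve = r**2
beta_demo = 1.0
robust_curve = r**2 / (r**2 + beta_demo**2)

plt.figure(figsize=(7,5))
plt.plot(r, mse_curve, label='Squared error (r^2)')
plt.plot(r, robust_curve, label=f'Robust loss (beta={beta_demo})')
plt.ylim(0, 4)  # clip the y-axis so the saturation at 1 stays visible
plt.xlabel('Residual magnitude |r|'); plt.ylabel('Loss value')
plt.title('Loss vs Residual Magnitude')
plt.legend(); plt.grid(True)
plt.show()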


### Task 5–6: Choose suitable $\beta$ and the better model
Use the table above and your understanding to justify:
- A suitable $\beta$ to mitigate outliers (hint: as $\beta \to 0$ the loss approaches an indicator of a nonzero residual; as $\beta$ grows it approaches MSE scaled by $1/\beta^2$).
- Which model (1 or 2) is preferred under the chosen $\beta$.
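
The two limits mentioned in the hint can be checked numerically. The sketch below is only an illustration on hypothetical residuals (`r_toy`), with `robust_loss` re-declared locally so the snippet runs on its own. It does not pick a $\beta$ for you.

```python
import numpy as np

def robust_loss(r, beta):
    """Per-sample loss r^2 / (r^2 + beta^2), same form as the Setup helper."""
    r2 = np.asarray(r, dtype=float) ** 2
    return r2 / (r2 + beta**2)

# Toy residuals (hypothetical): one exact fit, two small errors, one outlier
r_toy = np.array([0.0, 0.5, 2.0, 30.0])

# Very small beta: every nonzero residual costs about 1 (indicator-like)
tiny = robust_loss(r_toy, 1e-6)

# Very large beta: loss is about r^2 / beta^2, i.e. a rescaled squared error
beta_big = 1e3
rescaled = robust_loss(r_toy, beta_big) * beta_big**2

print(np.round(tiny, 6))
print(np.round(rescaled, 3), r_toy**2)
```

With a tiny $\beta$ the loss cannot tell a residual of 0.5 from one of 30. With a huge $\beta$ it ranks models the same way MSE does, so the outlier dominates again. A useful $\beta$ sits between these extremes, on the scale of typical inlier residuals.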

**Your justification (write in your own words):**

- **Chosen β:** _<type here>_  
- **Reasoning:** _<explain why this β balances outlier suppression vs. inlier sensitivity>_  
- **Selected model:** _<Model 1 or Model 2>_  
- **Reasoning:** _<compare L_mean values at your β and discuss fit to data>_  


### Task 7: How does this robust estimator reduce the impact of outliers?
**Your explanation (own words):**  
_Hint: The loss saturates near 1 for large residuals, capping the contribution of outliers to the total loss, unlike MSE which grows without bound._
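
One way to see the capping effect is to compare how much of the total loss a single outlier accounts for under each loss. This is a sketch on hypothetical residuals with an assumed `beta_toy = 1.0`, not a result for the assignment data.

```python
import numpy as np

# Toy residuals (hypothetical): four inliers and one gross outlier
r_toy = np.array([1.0, 1.0, 1.0, 1.0, 50.0])
beta_toy = 1.0  # assumed value, chosen only for illustration

sq_err = r_toy**2
robust = sq_err / (sq_err + beta_toy**2)

# Fraction of the total loss contributed by the outlier (last sample)
share_mse = sq_err[-1] / sq_err.sum()
share_robust = robust[-1] / robust.sum()

print(f"outlier share under squared error: {share_mse:.3f}")
print(f"outlier share under robust loss:   {share_robust:.3f}")
```

Under squared error the outlier supplies almost the entire loss. Under the robust loss its contribution stays below 1, so the four inliers together still outweigh it.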

### Task 8: Another robust loss you could use
Examples include **Huber**, **Tukey’s biweight**, **Cauchy**, **Geman–McClure**.  
Briefly describe one and why it helps.
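
As a starting point for your description, here is a minimal sketch of the standard Huber loss. The function name `huber_loss`, the threshold `delta=1.0` and the residual values are illustrative choices, not part of the assignment.

```python
import numpy as np

def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(np.asarray(r, dtype=float))
    quadratic = 0.5 * a**2
    linear = delta * (a - 0.5 * delta)
    return np.where(a <= delta, quadratic, linear)

# Toy residuals (hypothetical)
r_toy = np.array([0.0, 0.5, 1.0, 10.0])
print(huber_loss(r_toy, delta=1.0))
```

Unlike the bounded loss used in this section, Huber keeps growing for large residuals, but only linearly, so an outlier is down-weighted relative to MSE rather than capped.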

## 2) Loss Functions — Linear vs Logistic Regression

### Task 1: Fill the table (y=1), compute MSE and BCE, and plot

In [None]:
y_true = 1.0
yhat = np.array([0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], dtype=float)

mse_vals = (y_true - yhat)**2
bce_vals = bce_for_y_equals_1(yhat)

table_df = pd.DataFrame({
    'True y': [1]*len(yhat),
    'Prediction y_hat': yhat,
    'MSE': mse_vals,
    'BCE': bce_vals
})
table_df


In [None]:
# Plot MSE vs y_hat
plt.figure(figsize=(6,4))
plt.plot(yhat, mse_vals, marker='o')
plt.title('MSE for y=1 vs prediction y_hat')
plt.xlabel('y_hat'); plt.ylabel('MSE')
plt.grid(True)
plt.show()

# Plot BCE vs y_hat
plt.figure(figsize=(6,4))
plt.plot(yhat, bce_vals, marker='o')
plt.title('BCE for y=1 vs prediction y_hat')
plt.xlabel('y_hat'); plt.ylabel('BCE')
plt.grid(True)
plt.show()


### Task 2: Which loss for which application?
- **Application 1 (continuous target, Linear Regression):** Prefer **MSE** (matches Gaussian noise assumption and penalizes squared deviations).
- **Application 2 (binary target, Logistic Regression):** Prefer **BCE** (derived from Bernoulli likelihood; strongly penalizes confident wrong predictions and supports probability outputs).
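
The phrase "strongly penalizes confident wrong predictions" can be made concrete with a single point. The sketch below assumes a positive example and a hypothetical prediction `yhat_bad = 0.01`, and compares both the loss values and their derivatives with respect to the prediction.

```python
import numpy as np

y_pos = 1.0
yhat_bad = 0.01  # confident but wrong prediction for a positive example

sq_err = (y_pos - yhat_bad) ** 2    # 0.9801, cannot exceed 1 for yhat in [0, 1]
bce = -np.log(yhat_bad)             # about 4.605, unbounded as yhat -> 0

# Derivatives with respect to yhat
grad_sq = -2.0 * (y_pos - yhat_bad)   # -1.98
grad_bce = -1.0 / yhat_bad            # -100.0

print(sq_err, bce)
print(grad_sq, grad_bce)
```

At this point BCE gives a much larger loss and a much steeper slope with respect to $\hat{y}$ than squared error, which is bounded on $[0, 1]$.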

Write a short justification in your own words below.

**Your justification:** _<type here>_

## 3) Data Pre-processing — Generate Features & Choose Scaling

### Task 1: Generate Feature 1 (sparse) and Feature 2 (Gaussian-like noise)
> **Enter your index number** below (digits only, no leading zeros). The code reproduces Listing 1 logic and then applies three scaling methods. Choose one suitable method for each feature and justify.

In [None]:
# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Enter your index number (digits only, e.g., 220123A -> 220123) — REQUIRED
index_no = 123456  # <-- change this
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

# Listing 1 (adapted)
signal_length = 100
num_nonzero = 10

def generate_signal(signal_length, num_nonzero, seed=None):
    if seed is not None:
        rng = np.random.default_rng(seed)
        signal = np.zeros(signal_length)
        nonzero_indices = rng.choice(signal_length, num_nonzero, replace=False)
        nonzero_values = 10*rng.standard_normal(num_nonzero)
    else:
        signal = np.zeros(signal_length)
        nonzero_indices = np.random.choice(signal_length, num_nonzero, replace=False)
        nonzero_values = 10*np.random.randn(num_nonzero)
    signal[nonzero_indices] = nonzero_values
    return signal

# Make Feature 1 reproducible w.r.t. index number
seed = int(index_no) if isinstance(index_no, (int, np.integer)) else 0
sparse_signal = generate_signal(signal_length, num_nonzero, seed=seed)

# Inject assignment-specific spike
sparse_signal[10] = (int(index_no) % 10)*2 + 10
if int(index_no) % 10 == 0:
    # Draw a scalar: float() on a 1-element array is deprecated in newer NumPy
    sparse_signal[10] = float(np.random.randn() + 30)
sparse_signal = sparse_signal / 5.0

# Feature 2 (Gaussian-like)
rng2 = np.random.default_rng(seed + 1)
epsilon = rng2.normal(0, 15, signal_length)

# Apply scalers
f1_std   = standard_scale(sparse_signal)
f1_minmax = minmax_scale(sparse_signal)
f1_maxabs = maxabs_scale(sparse_signal)

f2_std   = standard_scale(epsilon)
f2_minmax = minmax_scale(epsilon)
f2_maxabs = maxabs_scale(epsilon)

# Plot originals
plt.figure(figsize=(10,4))
plt.stem(sparse_signal)
plt.title('Feature 1 (Sparse) — Original')
plt.xlabel('Index'); plt.ylabel('Value'); plt.grid(True)
plt.show()

plt.figure(figsize=(10,4))
plt.stem(epsilon)
plt.title('Feature 2 (Gaussian-like) — Original')
plt.xlabel('Index'); plt.ylabel('Value'); plt.grid(True)
plt.show()

# Plot scaled variants for Feature 1
plt.figure(figsize=(10,4))
plt.stem(f1_std); plt.title('Feature 1 — Standard Scaled'); plt.grid(True); plt.show()

plt.figure(figsize=(10,4))
plt.stem(f1_minmax); plt.title('Feature 1 — Min-Max Scaled'); plt.grid(True); plt.show()

plt.figure(figsize=(10,4))
plt.stem(f1_maxabs); plt.title('Feature 1 — Max-Abs Scaled'); plt.grid(True); plt.show()

# Plot scaled variants for Feature 2
plt.figure(figsize=(10,4))
plt.stem(f2_std); plt.title('Feature 2 — Standard Scaled'); plt.grid(True); plt.show()

plt.figure(figsize=(10,4))
plt.stem(f2_minmax); plt.title('Feature 2 — Min-Max Scaled'); plt.grid(True); plt.show()

plt.figure(figsize=(10,4))
plt.stem(f2_maxabs); plt.title('Feature 2 — Max-Abs Scaled'); plt.grid(True); plt.show()


### Your choice & justification (write succinctly)
- **Feature 1 (sparse spikes + many exact zeros):** _<choose one: MaxAbs / MinMax / Standard>_  
  - _<justify: preserving zeros, scale of spikes, sign, etc.>_
- **Feature 2 (Gaussian-like noise with mean≈0, high variance):** _<choose one: Standard / MinMax / MaxAbs>_  
  - _<justify: preserving distribution shape, zero-mean/unit-variance, etc.>_
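
When weighing the "preserving zeros" criterion, it can help to check it on a small vector first. The sketch below uses a hypothetical five-element feature `x_toy` and inline versions of the three scalers. It illustrates the mechanics only and does not settle the choice for your own features.

```python
import numpy as np

# Toy sparse feature (hypothetical values): mostly zeros, two signed spikes
x_toy = np.array([0.0, 0.0, -2.0, 0.0, 4.0])

maxabs = x_toy / np.max(np.abs(x_toy))
minmax = (x_toy - x_toy.min()) / (x_toy.max() - x_toy.min())
standard = (x_toy - x_toy.mean()) / x_toy.std()

was_zero = x_toy == 0.0
for name, scaled in [("max-abs", maxabs), ("min-max", minmax), ("standard", standard)]:
    # Count how many of the original zeros are still exactly zero
    kept = np.count_nonzero(scaled[was_zero] == 0.0)
    print(f"{name:9s} original zeros kept: {kept}/3  values: {np.round(scaled, 3)}")
```

Max-abs scaling only divides by a constant, so zeros and signs survive. Min-max and standard scaling both subtract an offset first, which moves the zeros unless that offset happens to be zero.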

## Appendix — References (for your reading)
- scikit-learn preprocessing: <https://scikit-learn.org/stable/modules/preprocessing.html>  
- Introduction to sparsity in signal processing: <https://eeweb.engineering.nyu.edu/iselesni/lecture_notes/sparsity_intro/sparse_SP_intro.pdf>  
- scikit-learn LinearRegression: <https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html>