# Task 7: Linear Regression

## Introduction to Machine Learning

### What is Machine Learning?

Machine Learning (ML) is a subset of artificial intelligence in which computers learn from data to make predictions, rather than following hand-written rules. ML algorithms identify patterns in data and use them to make decisions.

### Types of Machine Learning

1. **Supervised Learning**: Learning from labeled data (input-output pairs)
   - Classification: Predicting discrete categories
   - Regression: Predicting continuous values

2. **Unsupervised Learning**: Finding patterns in unlabeled data
   - Clustering, Dimensionality Reduction

3. **Reinforcement Learning**: Learning through rewards and penalties

### What is Linear Regression?

Linear Regression is a **supervised learning** algorithm used for **regression** tasks. It models the relationship between a dependent variable ($y$) and one or more independent variables ($x$) by fitting a linear equation.

**Simple Linear Regression**: $y = mx + b$

**Multiple Linear Regression**: $y = b_0 + b_1x_1 + b_2x_2 + ... + b_nx_n$

Where:
- $y$ is the target variable
- $x$ or $x_1, x_2, ...$ are the features
- $m$ or $b_1, b_2, ...$ are the coefficients (weights)
- $b$ or $b_0$ is the intercept
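
As a quick illustration of the two equations, the sketch below evaluates them with NumPy. The coefficient values and inputs are made up for the example; they are not fitted to any data.

```python
import numpy as np

# Simple linear regression: y = m*x + b (hypothetical m and b)
m, b = 2.0, 1.0
x = np.array([0.0, 1.0, 2.0])
y_simple = m * x + b

# Multiple linear regression: y = b0 + b1*x1 + b2*x2 (hypothetical coefficients)
b0 = 1.0
coefs = np.array([2.0, -1.0])   # b1, b2
X = np.array([[1.0, 3.0],       # each row is one sample (x1, x2)
              [2.0, 0.0]])
y_multi = b0 + X @ coefs

print(y_simple)  # [1. 3. 5.]
print(y_multi)   # [0. 5.]
```

Writing the multiple-feature case as `X @ coefs` is the same matrix form the later sections rely on.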

## 1. Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

## 2. Generate Sample Data

In [None]:
# Generate synthetic data for linear regression
np.random.seed(42)

# Features: Years of Experience
X = np.random.rand(100, 1) * 10  # 100 samples, values between 0-10

# Target: Salary (with some noise)
# True relationship: Salary = 30000 + 5000 * Experience + noise
y = 30000 + 5000 * X + np.random.randn(100, 1) * 5000

# Flatten for easier handling
X = X.flatten()
y = y.flatten()

print("Feature (Years of Experience) shape:", X.shape)
print("Target (Salary) shape:", y.shape)
print("\nFirst 10 samples:")
for i in range(10):
    print(f"Experience: {X[i]:.2f}, Salary: ${y[i]:,.2f}")

## 3. Visualize the Data

In [None]:
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', alpha=0.6, edgecolors='black')
plt.xlabel('Years of Experience')
plt.ylabel('Salary ($)')
plt.title('Years of Experience vs Salary')
plt.grid(True, alpha=0.3)
plt.show()

## 4. Implement Linear Regression from Scratch

### 4.1 Using Normal Equation (Closed-form Solution)

The normal equation provides a closed-form solution for linear regression:

$\theta = (X^T X)^{-1} X^T y$

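This method computes the optimal weights directly, without iteration. Here $X$ includes a leading column of ones, so the first entry of $\theta$ is the intercept and the remaining entries are the feature weights.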
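
To make the formula concrete, here is a minimal sketch on four made-up points that lie exactly on $y = 1 + 2x$. It solves the normal equation as a linear system (rather than forming the inverse explicitly) and cross-checks against `np.linalg.lstsq`.

```python
import numpy as np

# Four points that lie exactly on y = 1 + 2x (made-up data)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with a leading column of ones for the intercept
X_b = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) theta = X^T y
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# Cross-check with NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(theta)                           # approximately [1. 2.]
print(np.allclose(theta, theta_lstsq))
```

The recovered `theta` holds the intercept first and the slope second, matching the column order of `X_b`.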

In [None]:
class LinearRegressionFromScratch:
    """
    Linear Regression implementation using the Normal Equation
    """
    def __init__(self):
        self.weights = None
        self.bias = None
    
    def fit(self, X, y):
        """
        Fit the model using the Normal Equation
        
        Parameters:
        X: numpy array of shape (n_samples, n_features)
        y: numpy array of shape (n_samples,)
        """
        n_samples = len(X)
        
        # Add bias term (column of ones)
        X_b = np.c_[np.ones((n_samples, 1)), X]
        
        # Normal Equation: theta = (X^T * X)^-1 * X^T * y
        # The pseudo-inverse (np.linalg.pinv) also handles a singular or ill-conditioned X^T * X
        theta = np.linalg.pinv(X_b.T @ X_b) @ X_b.T @ y
        
        self.bias = theta[0]
        self.weights = theta[1:]
        
        return self
    
    def predict(self, X):
        """
        Make predictions
        
        Parameters:
        X: numpy array of shape (n_samples, n_features)
        
        Returns:
        predictions: numpy array of shape (n_samples,)
        """
        return self.bias + X @ self.weights
    
    def get_params(self):
        """Return model parameters"""
        return {
            'intercept': self.bias,
            'coefficient': self.weights
        }

### 4.2 Using Gradient Descent
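
Gradient descent minimizes the cost $J(w, b) = \frac{1}{2n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$, where $\hat{y}_i = w^T x_i + b$, by repeatedly stepping against the gradient. As a short derivation sketch (with $\alpha$ denoting the learning rate), differentiating $J$ gives the quantities computed in the class below:

$$\frac{\partial J}{\partial w} = \frac{1}{n} X^T(\hat{y} - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$$

$$w \leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$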

In [None]:
class LinearRegressionGradientDescent:
    """
    Linear Regression implementation using Gradient Descent
    """
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
        self.cost_history = []
    
    def _compute_cost(self, X, y):
        """
        Compute the cost function (half the Mean Squared Error)
        
        J = (1/2n) * Σ(y_pred - y)^2
        
        The factor 1/2 cancels the 2 that appears when differentiating the square.
        """
        n = len(y)
        y_pred = self.predict(X)
        cost = (1 / (2 * n)) * np.sum((y_pred - y) ** 2)
        return cost
    
    def fit(self, X, y):
        """
        Fit the model using Gradient Descent
        
        Parameters:
        X: numpy array of shape (n_samples, n_features)
        y: numpy array of shape (n_samples,)
        """
        n_samples, n_features = X.shape
        
        # Initialize weights and bias to zero; reset the history so refitting starts clean
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.cost_history = []
        
        # Gradient Descent optimization
        for i in range(self.n_iterations):
            # Make prediction
            y_pred = np.dot(X, self.weights) + self.bias
            
            # Compute gradients
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
            db = (1 / n_samples) * np.sum(y_pred - y)
            
            # Update weights and bias
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
            
            # Record cost for visualization
            if i % 100 == 0:
                cost = self._compute_cost(X, y)
                self.cost_history.append(cost)
        
        return self
    
    def predict(self, X):
        """
        Make predictions
        """
        return np.dot(X, self.weights) + self.bias
    
    def get_params(self):
        """Return model parameters"""
        return {
            'intercept': self.bias,
            'coefficients': self.weights
        }
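
Gradient descent is sensitive to feature scale: with raw features in the 0-10 range, the intercept can converge slowly at small learning rates. One common remedy, sketched below under the assumption that standardizing the feature is acceptable, is to fit on the z-scored feature and then map the parameters back to the original scale. The data, learning rate, and iteration count here are illustrative choices, not taken from the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)                  # unscaled feature
y = 30000 + 5000 * x + rng.normal(0, 5000, 200)   # same form as the salary data

# Standardize the feature so the cost surface is well conditioned
mu, sigma = x.mean(), x.std()
z = (x - mu) / sigma

w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    err = w * z + b - y
    w -= lr * np.mean(err * z)   # dJ/dw
    b -= lr * np.mean(err)       # dJ/db

# Map parameters back to the original feature scale:
# w*z + b = (w/sigma)*x + (b - w*mu/sigma)
slope = w / sigma
intercept = b - w * mu / sigma

# Closed-form least-squares fit for comparison
slope_ref, intercept_ref = np.polyfit(x, y, 1)
print(np.isclose(slope, slope_ref), np.isclose(intercept, intercept_ref))
```

After standardization the two parameters decouple, so a larger learning rate is safe and a few hundred iterations are enough to match the closed-form fit in this example.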

## 5. Train and Evaluate Models

In [None]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Reshape for our implementation
X_train_reshaped = X_train.reshape(-1, 1)
X_test_reshaped = X_test.reshape(-1, 1)

print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
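
The reshape matters because a matrix product such as `X @ weights` expects `X` to be 2-D, with one column per feature. A minimal sketch with a made-up array:

```python
import numpy as np

x_1d = np.array([1.0, 2.0, 3.0])   # shape (3,): a flat vector
x_2d = x_1d.reshape(-1, 1)         # shape (3, 1): 3 samples, 1 feature

weights = np.array([2.0])          # one weight for the single feature

print(x_1d.shape, x_2d.shape)      # (3,) (3, 1)
print(x_2d @ weights)              # [2. 4. 6.]

# The 1-D version fails because the inner dimensions (3 and 1) do not match
try:
    x_1d @ weights
except ValueError as exc:
    print("1-D input fails:", type(exc).__name__)
```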

### 5.1 Train Linear Regression from Scratch (Normal Equation)

In [None]:
# Train using Normal Equation implementation
model_normal = LinearRegressionFromScratch()
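# Pass the 2-D (n_samples, 1) arrays: predict() computes X @ weights,
# which raises a shape error for 1-D input
model_normal.fit(X_train_reshaped, y_train)

# Make predictions
y_pred_normal = model_normal.predict(X_test_reshaped)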

# Get parameters
params_normal = model_normal.get_params()
print("=== Linear Regression (Normal Equation) ===")
print(f"Intercept (bias): {params_normal['intercept']:,.2f}")
print(f"Coefficient: {params_normal['coefficient'][0]:,.2f}")
print(f"\nEquation: y = {params_normal['intercept']:,.2f} + {params_normal['coefficient'][0]:,.2f} * x")

### 5.2 Train Linear Regression from Scratch (Gradient Descent)

In [None]:
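# Train using Gradient Descent implementation
# Note: on unscaled features, 1000 iterations at this learning rate may not
# fully converge, so the parameters can differ from the closed-form solution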
model_gd = LinearRegressionGradientDescent(
    learning_rate=0.01, 
    n_iterations=1000
)
model_gd.fit(X_train_reshaped, y_train)

# Make predictions
y_pred_gd = model_gd.predict(X_test_reshaped)

# Get parameters
params_gd = model_gd.get_params()
print("=== Linear Regression (Gradient Descent) ===")
print(f"Intercept (bias): {params_gd['intercept']:,.2f}")
print(f"Coefficient: {params_gd['coefficients'][0]:,.2f}")
print(f"\nEquation: y = {params_gd['intercept']:,.2f} + {params_gd['coefficients'][0]:,.2f} * x")

### 5.3 Train using Scikit-Learn

In [None]:
# Train using Scikit-Learn
model_sklearn = LinearRegression()
model_sklearn.fit(X_train_reshaped, y_train)

# Make predictions
y_pred_sklearn = model_sklearn.predict(X_test_reshaped)

print("=== Linear Regression (Scikit-Learn) ===")
print(f"Intercept (bias): {model_sklearn.intercept_:,.2f}")
print(f"Coefficient: {model_sklearn.coef_[0]:,.2f}")
print(f"\nEquation: y = {model_sklearn.intercept_:,.2f} + {model_sklearn.coef_[0]:,.2f} * x")

## 6. Evaluate Model Performance

In [None]:
def evaluate_model(y_true, y_pred, model_name):
    """Calculate and print regression metrics"""
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    
    print(f"\n=== {model_name} ===")
    print(f"Mean Squared Error (MSE): {mse:,.2f}")
    print(f"Root Mean Squared Error (RMSE): {rmse:,.2f}")
    print(f"Mean Absolute Error (MAE): {mae:,.2f}")
    print(f"R² Score: {r2:.4f}")
    
    return {'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'R2': r2}
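
For reference, the four metrics can also be computed by hand. The sketch below uses made-up targets and predictions and follows the standard definitions (the same ones the scikit-learn functions above implement).

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])    # made-up targets
y_pred = np.array([2.0, 5.0, 8.0, 11.0])   # made-up predictions

errors = y_true - y_pred                   # [1, 0, -1, -2]

mse = np.mean(errors ** 2)                 # (1 + 0 + 1 + 4) / 4 = 1.5
rmse = np.sqrt(mse)
mae = np.mean(np.abs(errors))              # (1 + 0 + 1 + 2) / 4 = 1.0

ss_res = np.sum(errors ** 2)                      # residual sum of squares = 6
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares = 20
r2 = 1 - ss_res / ss_tot                          # 1 - 6/20

print(mse, rmse, mae, r2)
```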

In [None]:
# Evaluate all models
results_normal = evaluate_model(y_test, y_pred_normal, "Normal Equation (From Scratch)")
results_gd = evaluate_model(y_test, y_pred_gd, "Gradient Descent (From Scratch)")
results_sklearn = evaluate_model(y_test, y_pred_sklearn, "Scikit-Learn")

### 6.1 Compare Results

In [None]:
# Create comparison dataframe
comparison_df = pd.DataFrame({
    'Model': ['Normal Equation', 'Gradient Descent', 'Scikit-Learn'],
    'Intercept': [params_normal['intercept'], params_gd['intercept'], model_sklearn.intercept_],
    'Coefficient': [params_normal['coefficient'][0], params_gd['coefficients'][0], model_sklearn.coef_[0]],
    'MSE': [results_normal['MSE'], results_gd['MSE'], results_sklearn['MSE']],
    'RMSE': [results_normal['RMSE'], results_gd['RMSE'], results_sklearn['RMSE']],
    'MAE': [results_normal['MAE'], results_gd['MAE'], results_sklearn['MAE']],
    'R2': [results_normal['R2'], results_gd['R2'], results_sklearn['R2']]
})

print("=== Model Comparison ===")
print(comparison_df.to_string(index=False))

## 7. Visualize Results

In [None]:
# Plot predictions vs actual
plt.figure(figsize=(15, 5))

# Subplot 1: All models comparison
plt.subplot(1, 3, 1)
plt.scatter(X_test, y_test, color='blue', alpha=0.6, label='Actual', edgecolors='black')
plt.plot(X_test, y_pred_normal, color='red', linewidth=2, label='Normal Equation')
plt.plot(X_test, y_pred_gd, color='green', linewidth=2, linestyle='--', label='Gradient Descent')
plt.plot(X_test, y_pred_sklearn, color='orange', linewidth=2, linestyle=':', label='Scikit-Learn')
plt.xlabel('Years of Experience')
plt.ylabel('Salary ($)')
plt.title('Predictions Comparison')
plt.legend()
plt.grid(True, alpha=0.3)

# Subplot 2: Gradient Descent Cost History
plt.subplot(1, 3, 2)
# cost_history holds one value per 100 iterations
plt.plot(range(0, model_gd.n_iterations, 100), model_gd.cost_history, color='purple', linewidth=2)
plt.xlabel('Iterations')
plt.ylabel('Cost (MSE / 2)')
plt.title('Gradient Descent Cost History')
plt.grid(True, alpha=0.3)

# Subplot 3: Residuals
plt.subplot(1, 3, 3)
residuals = y_test - y_pred_sklearn
plt.scatter(y_pred_sklearn, residuals, color='brown', alpha=0.6)
plt.axhline(y=0, color='red', linestyle='--', linewidth=2)
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot (Scikit-Learn)')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 8. Multiple Linear Regression Example

Next, we fit a multiple linear regression, which uses more than one feature.

In [None]:
# Generate data for multiple linear regression
# Predicting house prices based on: size, bedrooms, age

np.random.seed(42)
n_samples = 200

size = np.random.rand(n_samples) * 2000 + 500  # Size in sq ft
bedrooms = np.random.randint(1, 6, n_samples)  # Number of bedrooms
age = np.random.randint(0, 50, n_samples)  # Age of house

# Target: House price
# True relationship: Price = 50000 + 150*size + 10000*bedrooms - 500*age + noise
price = 50000 + 150 * size + 10000 * bedrooms - 500 * age + np.random.randn(n_samples) * 20000

# Create feature matrix
X_multi = np.column_stack((size, bedrooms, age))
y_multi = price

print("Multiple Linear Regression Data:")
print("Features: Size (sq ft), Bedrooms, Age (years)")
print("Target: House Price")
print("\nSample data (first 5 rows):")
for i in range(5):
    # column_stack stores every column as float, so format the counts without decimals
    print(f"Size: {X_multi[i, 0]:.0f}, Bedrooms: {X_multi[i, 1]:.0f}, Age: {X_multi[i, 2]:.0f} -> Price: ${y_multi[i]:,.0f}")

In [None]:
# Split data
X_train_multi, X_test_multi, y_train_multi, y_test_multi = train_test_split(
    X_multi, y_multi, test_size=0.2, random_state=42
)

# Train using our implementation
model_multi = LinearRegressionFromScratch()
model_multi.fit(X_train_multi, y_train_multi)

# Train using Scikit-Learn
model_multi_sklearn = LinearRegression()
model_multi_sklearn.fit(X_train_multi, y_train_multi)

# Predictions
y_pred_multi = model_multi.predict(X_test_multi)
y_pred_multi_sklearn = model_multi_sklearn.predict(X_test_multi)

print("=== Multiple Linear Regression Results ===")
print("\nFrom Scratch (Normal Equation):")
print(f"Intercept: ${model_multi.bias:,.2f}")
print(f"Coefficients: Size=${model_multi.weights[0]:,.2f}, Bedrooms=${model_multi.weights[1]:,.2f}, Age=${model_multi.weights[2]:,.2f}")

print("\nScikit-Learn:")
print(f"Intercept: ${model_multi_sklearn.intercept_:,.2f}")
print(f"Coefficients: Size=${model_multi_sklearn.coef_[0]:,.2f}, Bedrooms=${model_multi_sklearn.coef_[1]:,.2f}, Age=${model_multi_sklearn.coef_[2]:,.2f}")

# Evaluate
print("\n=== Model Evaluation ===")
print(f"\nFrom Scratch - R² Score: {r2_score(y_test_multi, y_pred_multi):.4f}")
print(f"Scikit-Learn - R² Score: {r2_score(y_test_multi, y_pred_multi_sklearn):.4f}")

## 9. Key Takeaways

### What We Learned:

1. **Linear Regression** is a fundamental supervised learning algorithm for regression tasks

2. **Normal Equation** provides a closed-form solution:
   - Advantages: Exact solution, no iteration or learning rate needed
   - Disadvantages: Cost grows roughly cubically with the number of features, and inverting $X^T X$ can be numerically unstable when features are collinear

3. **Gradient Descent** provides an iterative solution:
   - Advantages: Scales to large datasets and many features, since each step needs only matrix-vector products
   - Disadvantages: Requires hyperparameter tuning (learning rate, iterations) and is sensitive to feature scaling

4. **Scikit-Learn** provides an optimized, production-ready implementation
   - Use for real-world applications
   - Understanding the math helps in debugging and optimization

5. **Evaluation Metrics**:
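   - MSE/RMSE: Penalize large errors more heavily because errors are squared
   - MAE: More robust to outliers
   - R²: Fraction of the target's variance explained by the model (at most 1, higher is better; can be negative on test data when the model does worse than predicting the mean)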
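
The behavior of these metrics can be seen on a few made-up numbers. The sketch below defines small helper functions (hypothetical names, written here only for illustration) and compares evenly spread errors against a single large one, then shows a case where R² falls below zero.

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Four errors of 1 versus a single error of 4: same MAE, different RMSE
even_errors = y_true + np.array([1.0, 1.0, 1.0, 1.0])
one_outlier = y_true + np.array([0.0, 0.0, 0.0, 4.0])

print(mae(y_true, even_errors), rmse(y_true, even_errors))   # 1.0 1.0
print(mae(y_true, one_outlier), rmse(y_true, one_outlier))   # 1.0 2.0

# Predictions worse than the mean baseline give a negative R²
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])
print(r2(y_true, bad_pred))                                  # -3.0
```

Both prediction sets have the same total absolute error, but RMSE doubles when that error is concentrated in one sample.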

## Summary

This notebook demonstrated:
- ✅ Introduction to Machine Learning concepts
- ✅ Linear Regression implementation from scratch (Normal Equation)
- ✅ Linear Regression implementation from scratch (Gradient Descent)
- ✅ Linear Regression using Scikit-Learn
- ✅ Model evaluation and comparison
- ✅ Multiple Linear Regression example
- ✅ Data visualization and analysis