### **Gradient Descent: Predicting House Prices**

#### Objective:
Your task is to implement **Gradient Descent** yourself and use it to fit a linear regression model that predicts house prices from a set of features. You'll work with a real-world dataset to understand how gradient descent works, evaluate its performance, and compare it with other optimization techniques.

#### Dataset:
Use the **Boston Housing dataset**, which contains 506 records from the Boston area. Each row describes a census tract rather than an individual house, and the 13 feature columns include the per-capita crime rate, the average number of rooms per dwelling, and the weighted distance to employment centers. The target variable is the **median value of owner-occupied homes (in $1000s)**.

Dataset Sources:
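- `sklearn.datasets.load_boston`, in older scikit-learn releases only: the loader was deprecated in version 1.0 and removed in version 1.2. With a current install, download the CSV from an external source such as Kaggle.
- Alternatively, use a similar housing dataset if preferred, for example the California Housing dataset (`sklearn.datasets.fetch_california_housing`).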

#### Steps to Complete:

1. **Data Loading and Exploration**  
   - Load the dataset using `pandas` or `sklearn.datasets`.
   - Explore the data: Understand the features, visualize their distributions, and check for missing values or anomalies.
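   - Standardize or min-max scale the features so that gradient descent converges smoothly. Compute the scaling statistics on the training split only (see step 5) so that no information from the test set leaks into training.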
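The scaling step can be sketched as follows. This is a minimal illustration, assuming the data has already been split and is held in NumPy arrays; the helper name `standardize` and the random placeholder data are hypothetical and not part of the dataset.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale features to zero mean and unit variance using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std

# Placeholder data standing in for the housing features (not real values).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(80, 3))
X_test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

X_train_s, X_test_s = standardize(X_train, X_test)
print(X_train_s.mean(axis=0).round(6), X_train_s.std(axis=0).round(6))
```

The test set is transformed with the training mean and standard deviation, so its own columns will generally not be exactly zero-mean.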

2. **Define the Linear Regression Model**  
   - Create a linear regression model whose parameters (one weight per feature plus a bias term) are initialized to zero or small random values.
   - Define the **cost function** as **Mean Squared Error (MSE)**.
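One possible shape for the model and cost function is shown below. The function names (`init_params`, `predict`, `mse_cost`) and the three-point example are illustrative assumptions; the cost is the plain mean of squared residuals, without the optional factor of 1/2 that some texts use.

```python
import numpy as np

def init_params(n_features):
    """Start from zero weights and a zero bias."""
    return np.zeros(n_features), 0.0

def predict(X, w, b):
    """Linear model: one prediction per row of X."""
    return X @ w + b

def mse_cost(X, y, w, b):
    """Mean squared error between predictions and targets."""
    residuals = predict(X, w, b) - y
    return float(np.mean(residuals ** 2))

# Tiny hand-checkable example (made-up numbers).
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w, b = init_params(X.shape[1])
print(mse_cost(X, y, w, b))                   # cost at zero parameters
print(mse_cost(X, y, np.array([2.0]), 0.0))   # cost at the exact fit
```

With zero parameters the residuals are -2, -4, and -6, so the cost is 56/3; at the exact fit it is 0.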

3. **Implement Gradient Descent**  
   - Write a custom implementation of Gradient Descent to optimize the weights and bias of the regression model:
     - Compute gradients of the cost function with respect to each parameter.
     - Update parameters iteratively using the gradient descent formula:
       $$
       \theta := \theta - \alpha \cdot \nabla J(\theta)
       $$
       where $ \alpha $ is the learning rate.
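For the MSE cost with predictions $\hat{y} = Xw + b$ over $m$ samples, where $\theta$ stands for the pair $(w, b)$, the gradients are

$$
\nabla_w J = \frac{2}{m} X^\top (Xw + b - y), \qquad \frac{\partial J}{\partial b} = \frac{2}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)
$$

A minimal batch implementation of the update rule might look like this. The function name, default hyperparameters, and synthetic data are assumptions chosen for illustration; real housing features would need to be scaled first.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=500):
    """Batch gradient descent for linear regression with an MSE cost."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    history = []
    for _ in range(n_iters):
        error = X @ w + b - y                    # residuals on the full dataset
        history.append(float(np.mean(error ** 2)))
        grad_w = (2.0 / m) * (X.T @ error)       # dJ/dw
        grad_b = 2.0 * error.mean()              # dJ/db
        w = w - alpha * grad_w
        b = b - alpha * grad_b
    return w, b, history

# Synthetic, already-scaled data with known parameters (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w, true_b = np.array([1.5, -2.0, 0.5]), 4.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

w, b, history = gradient_descent(X, y)
print(w, b, history[0], history[-1])
```

Recording the cost before each update gives a history that can be plotted directly against the iteration index.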

4. **Experiment with Hyperparameters**  
   - Test the algorithm with different learning rates ($ \alpha $).
   - Plot the cost function values over iterations for each learning rate to check whether the algorithm converges, converges slowly, or diverges.
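A learning-rate sweep could be organized along these lines. The three values of `alpha` and the synthetic data are arbitrary choices for illustration, and the thresholds at which divergence sets in depend on the data and its scaling.

```python
import numpy as np

def cost_history(X, y, alpha, n_iters=100):
    """Run batch gradient descent and return the MSE at every iteration."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    costs = []
    for _ in range(n_iters):
        error = X @ w + b - y
        costs.append(float(np.mean(error ** 2)))
        w -= alpha * (2.0 / m) * (X.T @ error)
        b -= alpha * 2.0 * error.mean()
    return costs

# Synthetic standardized features (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -1.0]) + 2.0

for alpha in (0.001, 0.1, 1.5):
    costs = cost_history(X, y, alpha)
    trend = "decreasing" if costs[-1] < costs[0] else "diverging"
    print(f"alpha={alpha}: first={costs[0]:.3g}, last={costs[-1]:.3g} ({trend})")
```

On this data the smallest rate makes slow progress, the middle one converges within the 100 iterations, and the largest one overshoots so the cost grows at every step. Each returned list can be passed to a plotting library to draw the convergence curves.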

5. **Model Evaluation**  
   - Split the dataset into **training** and **test sets** before fitting, and train on the training set only.
   - Evaluate the model's performance on both sets using:
     - Mean Squared Error (MSE)
     - R² score
   - Compare predictions with actual values.
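The split and the two metrics can be written with NumPy alone, as in the sketch below. The helper names, the 80/20 split, and the stand-in parameters `w` and `b` are assumptions; in the assignment the parameters would come from your gradient descent run on the training set.

```python
import numpy as np

def train_test_split_np(X, y, test_fraction=0.2, seed=0):
    """Shuffle the rows, then hold out a fraction of them for testing."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up data and made-up parameters, only to exercise the functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(scale=0.2, size=100)
X_tr, X_te, y_tr, y_te = train_test_split_np(X, y)

w, b = np.array([2.0, -1.0]), 0.5   # stand-in for parameters learned on X_tr
for name, Xs, ys in (("train", X_tr, y_tr), ("test", X_te, y_te)):
    pred = Xs @ w + b
    print(name, round(mse(ys, pred), 4), round(r2_score(ys, pred), 4))
```

A large gap between the training and test scores would suggest overfitting, although a plain linear model on this dataset is unlikely to show a dramatic one.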

6. **Visualization**  
   - Plot the cost function against iterations to visualize convergence.
   - Visualize the relationship between features (e.g., number of rooms) and predicted house prices.


#### Bonus Challenges (Optional):

1. **Mini-Batch Gradient Descent**:  
   - Modify your implementation to use mini-batches instead of the entire dataset for each gradient step.
   - Compare the performance and convergence speed of batch and mini-batch gradient descent.

2. **Stochastic Gradient Descent (SGD)**:  
   - Implement stochastic gradient descent and compare its performance with batch gradient descent.
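Both bonus variants can share one routine, since stochastic gradient descent is the special case of a mini-batch of size 1. The sketch below assumes that framing; the function name, defaults, and synthetic data are illustrative.

```python
import numpy as np

def minibatch_gd(X, y, alpha=0.05, n_epochs=50, batch_size=32, seed=0):
    """Mini-batch gradient descent; batch_size=1 is SGD, batch_size=len(X) is batch GD."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    epoch_costs = []
    for _ in range(n_epochs):
        order = rng.permutation(m)                     # reshuffle every epoch
        for start in range(0, m, batch_size):
            batch = order[start:start + batch_size]
            error = X[batch] @ w + b - y[batch]
            w -= alpha * (2.0 / len(batch)) * (X[batch].T @ error)
            b -= alpha * 2.0 * error.mean()
        # Track the full-dataset cost once per epoch for a fair comparison.
        epoch_costs.append(float(np.mean((X @ w + b - y) ** 2)))
    return w, b, epoch_costs

# Synthetic, already-scaled data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w, true_b = np.array([1.0, -2.0, 0.5]), 3.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=256)

for batch_size in (1, 32, 256):   # SGD, mini-batch, full batch
    w, b, costs = minibatch_gd(X, y, batch_size=batch_size)
    print(batch_size, round(costs[0], 4), round(costs[-1], 4))
```

Smaller batches perform more parameter updates per epoch, so they tend to reduce the cost in fewer passes over the data, at the price of a noisier trajectory.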

3. **Multiple Models**:  
   - Train and evaluate **Scikit-learn's LinearRegression** on the same dataset and compare its results with your gradient descent implementation.
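As a quick sanity check that does not depend on scikit-learn, the gradient descent solution can also be compared against a closed-form least squares fit. The sketch below uses `np.linalg.lstsq` as a stand-in for `LinearRegression`: both solve the same ordinary least squares problem, but this is a substitute for illustration, not the scikit-learn comparison the task asks for. Function names and data are hypothetical.

```python
import numpy as np

def fit_least_squares(X, y):
    """Closed-form ordinary least squares; returns (weights, bias)."""
    X_aug = np.column_stack([X, np.ones(len(X))])   # column of ones for the bias
    theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return theta[:-1], float(theta[-1])

def fit_gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent on the MSE cost; returns (weights, bias)."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(n_iters):
        error = X @ w + b - y
        w -= alpha * (2.0 / m) * (X.T @ error)
        b -= alpha * 2.0 * error.mean()
    return w, b

# Synthetic, already-scaled data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))
y = X @ np.array([0.5, 1.0, -1.5]) + 2.0 + rng.normal(scale=0.3, size=150)

w_ls, b_ls = fit_least_squares(X, y)
w_gd, b_gd = fit_gradient_descent(X, y)
print(np.max(np.abs(w_ls - w_gd)), abs(b_ls - b_gd))
```

If gradient descent has converged, the two parameter sets should agree to within numerical tolerance; a visible gap usually points to too few iterations or an unsuitable learning rate.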

#### Deliverables:
- A Python script or Jupyter Notebook containing:
  - Code for loading and preprocessing the data.
  - Implementation of Gradient Descent for linear regression.
  - Results of model training and evaluation.
  - Visualizations of cost function convergence and predictions.
- A brief report answering:
  - How does the learning rate affect convergence?
  - What were the final MSE and R² scores?
  - What insights did you gain from comparing the different optimization techniques?