# Boston Housing Prices Prediction Notebook

## Introduction
This Jupyter Notebook walks through a machine learning workflow for predicting house prices in Boston using the **Boston Housing Dataset**. The dataset contains 506 samples and 13 features, such as crime rate, number of rooms, and socio-economic indicators; the target variable is the median value of owner-occupied homes (MEDV). The project is designed for beginners who want to practice regression techniques and understand the end-to-end machine learning process.

## Objectives
- Explore and visualize the dataset to identify key patterns and correlations.
- Preprocess the data to handle missing values and scale features.
- Train and evaluate regression models, including Linear Regression and Random Forest.
- Interpret results to understand feature importance and model performance.

## Dataset
This project uses the [Boston Housing Dataset](https://www.kaggle.com/datasets/schirmerchad/bostonhoustingmlnd), which includes features such as:
- **CRIM**: Per capita crime rate by town
- **RM**: Average number of rooms per dwelling
- **LSTAT**: Percentage of lower-status population
- **MEDV**: Median value of owner-occupied homes (in $1000s, target variable)
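As a minimal sketch of loading and inspecting these columns with pandas, the snippet below reads a CSV and checks that the expected columns are present. The helper name `load_housing` and the inline sample rows are hypothetical and used only for illustration; the rows are placeholders, not real records from the dataset, and the column names are assumed to match the list above.

```python
import io

import pandas as pd

# Placeholder rows for illustration only; these are NOT real dataset records.
SAMPLE_CSV = """CRIM,RM,LSTAT,MEDV
0.10,6.5,5.0,24.0
0.30,6.0,9.5,21.5
2.50,5.5,15.0,17.0
8.00,5.0,22.0,12.0
"""

EXPECTED_COLUMNS = {"CRIM", "RM", "LSTAT", "MEDV"}


def load_housing(source):
    """Read a CSV (file path or file-like object) and verify expected columns."""
    df = pd.read_csv(source)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


# In the notebook, pass the path to the downloaded CSV instead of the sample.
df = load_housing(io.StringIO(SAMPLE_CSV))
print(df.describe())

# Pearson correlation of each feature with the target, sorted ascending.
corr_with_target = df.corr()["MEDV"].drop("MEDV").sort_values()
print(corr_with_target)
```

Sorting the correlations gives a quick first look at which features move with or against MEDV before plotting a full heatmap.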

## Workflow
1. **Data Loading and Exploration**:
   - Load the dataset using pandas.
   - Visualize distributions and correlations (e.g., heatmap, scatter plots).
2. **Preprocessing**:
   - Check for missing values and outliers.
   - Split data into training (80%) and testing (20%) sets.
   - Standardize numerical features using StandardScaler, fitted on the training set only to avoid data leakage.
3. **Modeling**:
   - Train a Linear Regression model as a baseline.
   - Train a Random Forest model for improved performance.
   - Optionally tune hyperparameters using GridSearchCV.
4. **Evaluation**:
   - Compute Mean Squared Error (MSE) and R² score.
   - Visualize predicted vs. actual prices.
   - Analyze feature importance for Random Forest.
5. **Conclusion**:
   - Summarize model performance and key findings.
   - Discuss limitations and potential improvements.
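The preprocessing, modeling, and evaluation steps above can be sketched with scikit-learn roughly as follows. This is an illustrative outline, not the notebook's exact code: it uses synthetic data from `make_regression` as a stand-in with the same shape as the dataset (506 rows, 13 features), and the hyperparameters and random seeds are arbitrary assumptions. The scaler sits inside a pipeline so that it is fitted on the training split only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with the real features and MEDV.
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    # The pipeline fits StandardScaler on the training data only.
    "linear_regression": make_pipeline(StandardScaler(), LinearRegression()),
    # Tree ensembles do not require feature scaling.
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    print(f"{name}: MSE={results[name]['mse']:.2f}, R2={results[name]['r2']:.3f}")

# Impurity-based feature importances from the fitted Random Forest.
importances = models["random_forest"].feature_importances_
top_features = np.argsort(importances)[::-1][:5]
print("Top feature indices:", top_features)
```

On the real data, the same loop makes it easy to compare the baseline against the Random Forest on identical test rows; a `GridSearchCV` wrapper could be added around either estimator if tuning is desired.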

## Expected Outcomes
- Understand the relationship between housing features and prices.
- Compare the performance of simple (Linear Regression) vs. complex (Random Forest) models.
- Gain hands-on experience with data preprocessing, model training, and evaluation.
