# Project 2: Regression - Predicting House Prices

## Dataset: 

House Prices: Advanced Regression Techniques (available from Kaggle)

## Analysis Goals:

1. Load and preprocess the House Prices dataset.
2. Split the dataset into training and testing sets.
3. Train multiple regression models:
    1. Linear Regression
    2. Ridge Regression
    3. Lasso Regression
    4. Random Forest Regression
4. Tune hyperparameters with scikit-learn's `GridSearchCV` or `RandomizedSearchCV`.
5. Compare model performance using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²).
6. Visualize the relationship between the most significant features and the target variable.
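Goals 2 to 5 can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions: a synthetic `make_regression` dataset stands in for the preprocessed features, and the `alpha`, `n_estimators`, and `max_depth` grids are illustrative placeholders, not tuned values from this project.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# ASSUMPTION: synthetic stand-in for the real preprocessed feature matrix and target.
X_demo, y_demo = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Goal 2: hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Goals 3-4: each model paired with an illustrative (hypothetical) search grid.
candidates = {
    "Linear Regression": (LinearRegression(), {"fit_intercept": [True]}),
    "Ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "Lasso": (Lasso(max_iter=10_000), {"alpha": [0.01, 0.1, 1.0]}),
    "Random Forest": (
        RandomForestRegressor(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [None, 10]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Cross-validated grid search on the training split only; the best
    # configuration is refit on the full training split (refit=True by default).
    search = GridSearchCV(model, grid, cv=3, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)

    # Goal 5: score the refit best model on the held-out test split.
    y_pred = search.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    results[name] = {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_test, y_pred),
        "best_params": search.best_params_,
    }

for name, m in results.items():
    print(f"{name:18s} RMSE={m['RMSE']:.2f}  R2={m['R2']:.3f}  {m['best_params']}")
```

Wrapping every model in `GridSearchCV`, even plain linear regression with a one-point grid, keeps the comparison uniform: all four models are selected by the same cross-validation scheme and scored on the same untouched test split.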

## Analysis

### Load Data

| Variable Name | Role    | Type        | Demographic | Description                                           | Units | Missing Values |
|---------------|---------|-------------|-------------|-------------------------------------------------------|-------|----------------|
| age           | Feature | Integer     | Age         |                                                       | years | no             |
| sex           | Feature | Categorical | Sex         |                                                       |       | no             |
| cp            | Feature | Categorical |             | Chest pain type                                       |       | no             |
| trestbps      | Feature | Integer     |             | Resting blood pressure (on admission to the hospital) | mm Hg | no             |
| chol          | Feature | Integer     |             | Serum cholesterol                                     | mg/dl | no             |
| fbs           | Feature | Categorical |             | Fasting blood sugar > 120 mg/dl                       |       | no             |
| restecg       | Feature | Categorical |             | Resting electrocardiographic results                  |       | no             |
| thalach       | Feature | Integer     |             | Maximum heart rate achieved                           |       | no             |
| exang         | Feature | Categorical |             | Exercise-induced angina                               |       | no             |
| oldpeak       | Feature | Continuous  |             | ST depression induced by exercise relative to rest    |       | no             |
| slope         | Feature | Categorical |             | Slope of the peak exercise ST segment                 |       | no             |
| ca            | Feature | Integer     |             | Number of major vessels (0-3) colored by fluoroscopy  |       | yes            |
| thal          | Feature | Categorical |             | Thal                                                  |       | yes            |
| num           | Target  | Integer     |             | Diagnosis of heart disease                            |       | no             |
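The Type and Missing Values columns map fairly directly onto a preprocessing plan. One possible plan, sketched here with scikit-learn, imputes and scales the numeric columns and imputes and one-hot encodes the categorical ones. The column grouping (treating `ca` and `oldpeak` as numeric), the imputation strategies, and the helper name `build_preprocessor` are illustrative assumptions, and the random `toy` frame is made up purely so the snippet runs offline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column groups read off the Type column of the variable table (assumed grouping).
NUMERIC = ["age", "trestbps", "chol", "thalach", "oldpeak", "ca"]
CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang", "slope", "thal"]


def build_preprocessor() -> ColumnTransformer:
    """Hypothetical helper: impute + scale numeric, impute + one-hot categorical."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # ca has missing values
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # thal has missing values
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    # sparse_threshold=0.0 forces a dense NumPy array as output.
    return ColumnTransformer(
        [("num", numeric, NUMERIC), ("cat", categorical, CATEGORICAL)],
        sparse_threshold=0.0,
    )


# Illustrative random frame with the same column names (NOT the real dataset).
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {col: rng.integers(0, 4, size=8).astype(float) for col in NUMERIC + CATEGORICAL}
)
toy.loc[[1, 5], "ca"] = np.nan
toy.loc[[2], "thal"] = np.nan

features = build_preprocessor().fit_transform(toy)
print(features.shape)
```

Fitting the transformer on the training split only (for example as the first step of a `Pipeline` passed to a grid search) keeps the imputation statistics and encoder categories from leaking information out of the test set.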


```python
from ucimlrepo import fetch_ucirepo

# Fetch the dataset (UCI repository id 45)
heart_disease = fetch_ucirepo(id=45)

# Features and targets as pandas DataFrames
X = heart_disease.data.features
y = heart_disease.data.targets

# Optional: dataset metadata and variable information
# print(heart_disease.metadata)
# print(heart_disease.variables)

# Preview the first five rows (last expression, so the notebook displays it)
X.head()
```

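|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca  | thal |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|-----|------|
| 0 | 63  | 1   | 1  | 145      | 233  | 1   | 2       | 150     | 0     | 2.3     | 3     | 0.0 | 6.0  |
| 1 | 67  | 1   | 4  | 160      | 286  | 0   | 2       | 108     | 1     | 1.5     | 2     | 3.0 | 3.0  |
| 2 | 67  | 1   | 4  | 120      | 229  | 0   | 2       | 129     | 1     | 2.6     | 2     | 2.0 | 7.0  |
| 3 | 37  | 1   | 3  | 130      | 250  | 0   | 0       | 187     | 0     | 3.5     | 3     | 0.0 | 3.0  |
| 4 | 41  | 0   | 2  | 130      | 204  | 0   | 2       | 172     | 0     | 1.4     | 1     | 0.0 | 3.0  |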
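The variable table reports missing values in `ca` and `thal`. A small helper such as the hypothetical `missing_summary` below can confirm which columns are affected, and by how much, before any preprocessing. It is demonstrated on a made-up four-row frame so it runs without downloading the data; on the loaded features it would be called as `missing_summary(X)`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column.

    Only columns with at least one missing value are returned, sorted by count.
    """
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Tiny illustrative frame (made-up values, not taken from the dataset).
demo = pd.DataFrame({
    "age": [50, 60, 45, 70],
    "ca": [0.0, np.nan, 2.0, np.nan],
    "thal": [6.0, 3.0, np.nan, 3.0],
})
print(missing_summary(demo))
```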


### Data Preprocessing