<div class="alert alert-block alert-success">
    <h1 align="center">Melbourne Housing Price Prediction</h1>
    <h3 align="center">Interactive Dashboards in Python - Streamlit Part</h3>
    <h3 align="center">Updated Aug. 2023</h3>
    <a align="center" href="https://www.linkedin.com/in/hsheikh7/" >
        <h3 align="center">Hassan Sheikh</h3>
    </a>
</div>


# **Melbourne Housing Price Prediction Project**

## About

I trained a model to predict Melbourne housing prices and deployed it in a Streamlit app. Users enter a property's features in the app and receive a price prediction in real time.

## Data Source

The dataset for this project comes from Kaggle: [Melbourne Housing Snapshot dataset](https://www.kaggle.com/datasets/dansbecker/melbourne-housing-snapshot/code?select=melb_data.csv)

## Workflow

1. **Data Collection:** I downloaded the Melbourne Housing Snapshot dataset from Kaggle.

2. **Data Preprocessing:** I filled missing values, standardized the numeric features, and one-hot encoded the categorical variables so they could be used for training.

3. **Model Building:** I trained a `RandomForestRegressor` inside a scikit-learn pipeline.

4. **Model Evaluation:** I measured the model's performance with Mean Squared Error on a held-out validation set.

5. **Model Deployment:** Once the model performed acceptably, I saved it with the Joblib library and then loaded it in VSCode.

6. **Streamlit Integration:** In VSCode, I integrated the model into a Streamlit app. The interface lets users enter feature values, get an instant price prediction, and view the map.
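
As a rough illustration of step 6, the sketch below shows one way such a Streamlit page could be wired to the saved pipeline. It is not the app's actual code: the helper names (`build_input_row`, `format_price`, `run_app`), the widget defaults, and the option lists are assumptions made for this example. Only the column names and the `housing_pipeline.joblib` file name come from the notebook. The UI libraries are imported inside `run_app` so the two helpers can be used on their own.

```python
def build_input_row(rooms, distance, bathroom, landsize, building_area,
                    year_built, car, suburb, prop_type, regionname):
    """Map form values to the column names the pipeline was trained on."""
    return {
        "Rooms": rooms,
        "Distance": distance,
        "Bathroom": bathroom,
        "Landsize": landsize,
        "BuildingArea": building_area,
        "YearBuilt": year_built,
        "Car": car,
        "Suburb": suburb,
        "Type": prop_type,
        "Regionname": regionname,
    }


def format_price(value):
    """Format a numeric prediction as a whole-dollar amount with separators."""
    return "${:,.0f}".format(value)


def run_app(model_path="housing_pipeline.joblib"):
    """Render the input form and show a prediction (hypothetical layout)."""
    # Imported lazily so the helpers above work without the UI stack installed.
    import joblib
    import pandas as pd
    import streamlit as st

    st.title("Melbourne Housing Price Prediction")

    # Illustrative options only; a real app would list the dataset's values.
    suburb = st.selectbox("Suburb", ["Abbotsford", "Airport West", "Albert Park"])
    prop_type = st.selectbox("Type", ["h", "u", "t"])
    regionname = st.text_input("Region name", "Northern Metropolitan")

    rooms = st.number_input("Rooms", min_value=1, value=3, step=1)
    bathroom = st.number_input("Bathroom", min_value=0, value=2, step=1)
    car = st.number_input("Car spaces", min_value=0, value=2, step=1)
    year_built = st.number_input("Year built", min_value=1800, value=2000, step=1)
    distance = st.number_input("Distance", min_value=0.0, value=10.0, step=0.5)
    landsize = st.number_input("Land size", min_value=0.0, value=600.0, step=10.0)
    building_area = st.number_input("Building area", min_value=0.0, value=150.0, step=5.0)

    if st.button("Predict"):
        pipeline = joblib.load(model_path)
        row = build_input_row(rooms, distance, bathroom, landsize, building_area,
                              year_built, car, suburb, prop_type, regionname)
        prediction = pipeline.predict(pd.DataFrame([row]))[0]
        st.success(f"Predicted Price: {format_price(prediction)}")


# In an app file (for example app.py), call run_app() at module level and
# start the app with: streamlit run app.py
```

Keeping the row-building logic in a plain function means the mapping from widgets to model columns can be checked without starting Streamlit.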



# Build and Save the Model 

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import warnings
warnings.filterwarnings("ignore") 

# Load the dataset
data = pd.read_csv('melb_data.csv')

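# Handle missing values (this is just an example, you may need a more sophisticated imputation strategy)
# numeric_only=True restricts the mean to numeric columns; text columns are left unchanged
data.fillna(data.mean(numeric_only=True), inplace=True)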

# Define features and target variable
X = data.drop('Price', axis=1)
y = data['Price']

In [10]:
# Define numeric and categorical columns
numeric_columns = ['Rooms', 'Distance', 'Bathroom', 'Landsize', 'BuildingArea', 'YearBuilt', 'Car']
categorical_columns = ['Suburb', 'Type', 'Regionname']


In [11]:
# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=123)

# Create a ColumnTransformer for preprocessing
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_columns),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_columns)  # Handle unknown categories
    ])


# Create a pipeline with preprocessing and RandomForestRegressor
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('rf_model', RandomForestRegressor(n_estimators=100, random_state=123))  
])


In [12]:
# Fit the pipeline on the training data
pipeline.fit(X_train, y_train)

# Predict on the validation data
y_pred = pipeline.predict(X_val)

# Evaluate the model
mse = mean_squared_error(y_val, y_pred)
print(f"Mean Squared Error: {mse:.2f}")


Mean Squared Error: 80045129697.74
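
The MSE above is in squared dollars, which makes it hard to read. A common way to interpret it is to take the square root (the RMSE), which is in the same units as `Price`. The snippet below is a small sketch that reuses the printed value rather than recomputing it from the validation set.

```python
import math

# Value printed by the evaluation cell above (units: dollars squared)
mse = 80045129697.74

# The square root brings the error back to the units of Price (dollars)
rmse = math.sqrt(mse)
print(f"Root Mean Squared Error: {rmse:,.0f}")
```

This works out to roughly 283,000, which can be read as the typical size of a prediction error on the validation set.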


In [13]:
import joblib

# Save the pipeline to a joblib file
joblib.dump(pipeline, 'housing_pipeline.joblib')


['housing_pipeline.joblib']

# Check the Model and Version of Scikit-Learn 

In [14]:
# Check the saved model using a sample case
import joblib
import pandas as pd

# Load the pipeline from the joblib file
loaded_pipeline = joblib.load('housing_pipeline.joblib')

# Create a dictionary with the feature values for prediction
# Example values, replace with your own numbers
input_data = {
    'Rooms': 3,
    'Distance': 10,
    'Bathroom': 2,
    'Landsize': 600,
    'BuildingArea': 150,
    'YearBuilt': 2000,
    'Car': 2,
    'Suburb': 'SuburbName',
    'Type': 'h',
    'Regionname': 'RegionName'
}

# Convert the dictionary into a DataFrame
input_df = pd.DataFrame([input_data])

# Make predictions using the loaded pipeline
predictions = loaded_pipeline.predict(input_df)

# Display the predictions
print(predictions)


[851585.]


In [19]:
# Make predictions using the loaded pipeline
predictions = loaded_pipeline.predict(input_df)

# Format the predictions for better readability
formatted_predictions = ["${:,.0f}".format(prediction) for prediction in predictions]

print(f"Predicted Price: {formatted_predictions[0]}")



Predicted Price: $851,585


In [3]:
# It is extremely important to check the scikit-learn version: the saved pipeline should be
# loaded in the Streamlit app (in VSCode) with the same scikit-learn version used to train it.
import sklearn
print(sklearn.__version__)

1.0.2
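
One way to act on this check is to compare the training version against the version installed where the app runs before loading the pipeline. The sketch below assumes the training version is recorded by hand as `TRAINED_WITH`. The function names and the `strict` flag are hypothetical and not part of scikit-learn. scikit-learn does not guarantee that a saved estimator loads correctly under a different version, so the strict comparison is the safer default.

```python
TRAINED_WITH = "1.0.2"  # version printed in the notebook when the model was saved


def versions_compatible(trained, runtime, strict=True):
    """Compare two version strings.

    strict=True requires an exact match; strict=False only compares the
    major.minor part (a looser rule, used here purely as an illustration).
    """
    if strict:
        return trained == runtime
    return trained.split(".")[:2] == runtime.split(".")[:2]


def check_sklearn_version(runtime, trained=TRAINED_WITH, strict=True):
    """Raise a clear error before a mismatched pipeline is loaded."""
    if not versions_compatible(trained, runtime, strict=strict):
        raise RuntimeError(
            f"Pipeline was trained with scikit-learn {trained}, "
            f"but {runtime} is installed."
        )
```

In the app this could be called as `check_sklearn_version(sklearn.__version__)` right before `joblib.load(...)`.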


# Extract Useful Data 

In [6]:
import pandas as pd

data = pd.read_csv('melb_data.csv')

unique_suburbs = data['Suburb'].unique()
print(unique_suburbs)

unique_regions = data['Regionname'].unique()
print(unique_regions)

['Abbotsford' 'Airport West' 'Albert Park' 'Alphington' 'Altona'
 'Altona North' 'Armadale' 'Ascot Vale' 'Ashburton' 'Ashwood'
 'Avondale Heights' 'Balaclava' 'Balwyn' 'Balwyn North' 'Bentleigh'
 'Bentleigh East' 'Box Hill' 'Braybrook' 'Brighton' 'Brighton East'
 'Brunswick' 'Brunswick West' 'Bulleen' 'Burwood' 'Camberwell'
 'Canterbury' 'Carlton North' 'Carnegie' 'Caulfield' 'Caulfield North'
 'Caulfield South' 'Chadstone' 'Clifton Hill' 'Coburg' 'Coburg North'
 'Collingwood' 'Doncaster' 'Eaglemont' 'Elsternwick' 'Elwood' 'Essendon'
 'Essendon North' 'Fairfield' 'Fitzroy' 'Fitzroy North' 'Flemington'
 'Footscray' 'Glen Iris' 'Glenroy' 'Gowanbrae' 'Hadfield' 'Hampton'
 'Hampton East' 'Hawthorn' 'Heidelberg Heights' 'Heidelberg West'
 'Hughesdale' 'Ivanhoe' 'Kealba' 'Keilor East' 'Kensington' 'Kew'
 'Kew East' 'Kooyong' 'Maidstone' 'Malvern' 'Malvern East' 'Maribyrnong'
 'Melbourne' 'Middle Park' 'Mont Albert' 'Moonee Ponds' 'Moorabbin'
 'Newport' 'Niddrie' 'North Melbourne' 'Northcote'