https://joygeo007-machineknight-hackathon-main-2agjg1.streamlitapp.com/
The dataset consists of housing properties located in Bengaluru and Chennai. The objective is to build an ML model that predicts the rent of a house from the given properties. The model was trained on the train data and makes predictions for the test data. Train.csv has dimensions 20500 rows × 25 columns, whereas test.csv has 4500 rows × 24 columns.
- No null values were found in the dataset
- No duplicate values were found in the dataset
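The null and duplicate checks above can be sketched with pandas; the miniature DataFrame here is a stand-in for the loaded train.csv (its column names are assumptions):

```python
import pandas as pd

# Tiny stand-in frame; in the project this would be the loaded train.csv.
df = pd.DataFrame({
    "property_size": [1000, 650, 1200],
    "bathroom": [2, 1, 2],
    "rent": [18000, 9000, 20000],
})

null_count = df.isnull().sum().sum()  # total missing values across all columns
dup_count = df.duplicated().sum()     # fully duplicated rows
print(null_count, dup_count)
```

Both counts come out to zero here, matching what the project reports for the full dataset.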
- Most of the East-facing flats have rent in the range 15000-20000
- Most of the North-facing flats have rent below 10000
- Most of the 2 BHKs have rent in the range 15000-20000
- There was a slight correlation between the total number of floors and the floor number
- There was a slight correlation between bathroom, property_size, and rent
- Properties with any one of the gym, swimming pool, or lift facilities are more likely to have the other two amenities.
- Dropped columns with high cardinality
- We did not take locality into consideration, since the latitude and longitude variables already capture location
- The amenities column had already been processed, so we removed it
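The high-cardinality drop can be sketched as below; the example columns and the 90% distinctness threshold are assumptions, not the project's actual values:

```python
import pandas as pd

# Hypothetical sample; "id" is unique per row, so it is high-cardinality.
df = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4"],
    "facing": ["East", "North", "East", "West"],
    "rent": [18000, 9000, 20000, 12000],
})

threshold = 0.9  # drop object columns where almost every value is distinct
high_card = [c for c in df.select_dtypes("object")
             if df[c].nunique() / len(df) > threshold]
df = df.drop(columns=high_card)
print(high_card)  # columns that were dropped
```

Columns like a row identifier carry no generalizable signal for a regressor and blow up one-hot dimensionality, which is the usual rationale for dropping them.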
- Since it is a regression problem, we encoded all the categorical variables:
  - Categorical variables with a distinct hierarchy were label-encoded
  - The remaining categorical variables were one-hot encoded
- Separated the target (rent) from the predictor variables
- Scaled the train and test data using StandardScaler
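The scaling step can be sketched as follows (the feature values are made up); the scaler is fit on the train split only and then applied to the test split, so no test statistics leak into training:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrices (e.g. property_size, bathroom).
X_train = np.array([[1000.0, 2.0], [650.0, 1.0], [1200.0, 2.0]])
X_test = np.array([[900.0, 1.0]])

scaler = StandardScaler().fit(X_train)   # statistics from train only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s.mean(axis=0), X_train_s.std(axis=0))
```

After the transform each train column has mean 0 and unit standard deviation, which keeps scale-sensitive models such as Ridge, Lasso, and SVR well-behaved.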
- After removing the columns mentioned above, we performed feature engineering and EDA to gain initial insights from the dataset
- We used Sweetviz for automated EDA, in addition to our own manual analysis
- Next, we checked for the correlation between the columns
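The correlation check reduces to a call to `DataFrame.corr`; the numbers below are an illustrative stand-in, not the project's data:

```python
import pandas as pd

# Hypothetical numeric slice of the data.
df = pd.DataFrame({
    "property_size": [650, 800, 1000, 1200, 1500],
    "bathroom": [1, 1, 2, 2, 3],
    "rent": [9000, 12000, 18000, 20000, 27000],
})

corr = df.corr(numeric_only=True)  # pairwise Pearson correlations
print(corr["rent"].round(2))       # how each column correlates with rent
```

Inspecting the `rent` column of this matrix is what surfaces relationships like the bathroom/property_size/rent correlation noted earlier.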
- We started with a Linear Regression model, followed by Ridge and Lasso
- After this, we used the Gradient Boosting Regressor and Support Vector Regressor
- This was followed by the Random Forest Regressor and Decision Tree Regressor
- To find the best model, we calculated each model's root mean squared error (RMSE) and R² score
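The model comparison loop can be sketched as below, using synthetic regression data in place of the prepared rent features:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the scaled train data.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "lasso": Lasso(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    results[name] = (rmse, r2_score(y_te, pred))
    print(f"{name}: RMSE={rmse:.1f}, R2={results[name][1]:.3f}")
```

Lower RMSE and higher R² together identify the stronger model, which is how the tree-based regressors were singled out.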
- We found that the Decision Tree and Random Forest regressors gave the best scores
- We also deduced that the Decision Tree regressor was overfitting
- We first tried post-pruning the Decision Tree. Then we used RandomizedSearchCV over a specific range of parameters to find the best score
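Post-pruning a fitted decision tree is typically done via cost-complexity pruning; a minimal sketch on synthetic data, where the `ccp_alpha` value is an illustrative assumption:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unpruned tree grows one leaf per training sample and overfits.
unpruned = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)

# ccp_alpha > 0 collapses subtrees whose complexity is not worth their
# impurity reduction; the alpha here is arbitrary for the sketch.
pruned = DecisionTreeRegressor(random_state=0, ccp_alpha=50.0).fit(X_tr, y_tr)

print(unpruned.get_n_leaves(), pruned.get_n_leaves())
```

`cost_complexity_pruning_path` can be used to enumerate candidate alphas and pick one by cross-validation rather than by hand.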
- We did the same with the Random Forest Regressor, which gave us a narrowed range of parameters producing the best score of any model so far
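A sketch of the randomized search over the Random Forest; the candidate parameter values and the small `n_iter` budget are assumptions for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 150, 200],
        "max_depth": [3, 6, 9, 12, 15],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=5,                                # small budget for the sketch
    scoring="neg_root_mean_squared_error",   # maximize = minimize RMSE
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Randomized search samples a handful of combinations cheaply, and the winning region of the parameter space then seeds the finer grid search.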
- We then ran GridSearchCV over that narrowed parameter range and found the parameters that gave the best possible model score
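The refinement step can be sketched as an exhaustive grid over a small neighborhood; the grid values here are illustrative assumptions, not the project's actual tuned range:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)

# Exhaustively evaluate every combination in a narrowed grid.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={
        "n_estimators": [100, 150],
        "max_depth": [8, 10, 12],
    },
    scoring="neg_root_mean_squared_error",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, round(-grid.best_score_, 2))  # best combo and its CV RMSE
```

Unlike the randomized pass, the grid search tries all 6 combinations here, so it is affordable only once the range is already narrow.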