This paper compares three machine learning models used in data science and statistics for predicting apartment sale prices. Many factors affect the final sale price of an apartment, but data can only be collected on what is known and recorded, so the dataset has many missing pieces. Despite these gaps, the Linear Regression, Regression Tree, and Random Forest algorithms are each used to build a statistical model from the dataset features, and the models are set side by side to see which gives the most accurate price. The dataset comes from Amazon Mechanical Turk (MTurk) and contains data from February 2016 until February 2017. It includes the following features:
- zipcode
- approx_year_built
- dining_room_type
- fuel_type
- kitchen_type
- maintenance_cost
- num_bedrooms
- num_floors_in_building
- num_full_bathrooms
- num_total_rooms
- parking_charges
- sale_price
- sq_footage
- walk_score
- price_listings
- avg_prices
- cats_allowed
- dogs_allowed
- coop_condo
- price_per_sqft

The analysis proceeds through the following steps:
- Collecting data from Amazon MTurk.
- Cleaning the data from Amazon MTurk, which includes missing values.
- Researching the missing values.
- Imputing missing numerical values with missForest and MICE.
- Visualizing the data with heatmaps to identify highly correlated variables and avoid overfitting.
- Implementing the Linear Regression model to assess its predictive power and RMSE (Root Mean Squared Error).
- Implementing the Regression Tree model to assess its predictive power and RMSE.
- Implementing the Random Forest Regression model to assess its predictive power and RMSE.
- Declaring which model has the lowest RMSE and the best predictive power.
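
The final comparison step can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this study: the helper names `rmse` and `best_model` are hypothetical, and the sale prices and predictions are made-up placeholder values, not results from the MTurk dataset.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_model(scores):
    """Return the name of the model with the lowest RMSE."""
    return min(scores, key=scores.get)


# Placeholder held-out sale prices (illustrative only, not real data).
actual_prices = [300_000, 450_000, 250_000, 500_000]

# Placeholder predictions for each model (illustrative only, not study results).
predictions = {
    "linear_regression": [320_000, 430_000, 270_000, 480_000],
    "regression_tree": [330_000, 420_000, 280_000, 470_000],
    "random_forest": [310_000, 440_000, 260_000, 490_000],
}

# Score every model on the same held-out prices, then pick the lowest RMSE.
scores = {name: rmse(actual_prices, preds) for name, preds in predictions.items()}
best = best_model(scores)

for name, score in scores.items():
    print(f"{name}: RMSE = {score:,.0f}")
print(f"Lowest RMSE: {best}")
```

Because RMSE is expressed in the same units as `sale_price`, the scores can be read directly as a typical prediction error in dollars, which makes a side-by-side comparison of the three models straightforward as long as all of them are evaluated on the same held-out observations.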