psagun/rent-predictor

Data Science Course Project

Predict Asking Rents for NYC apartments

Team: Sagun Pandey, Ahsun Rasool, Phurpa Sherpa, Sanjay Gurung

Goal

The goal of this project is to apply the data handling and modeling skills taught in class to a real-world data set. Our task is to predict asking rents, and to answer several related modeling questions, for New York City apartments posted on StreetEasy, an online marketplace for New York City homes. Predictions will be judged on the mean squared error of our estimated rents for the provided test sets.
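Since predictions are judged on mean squared error, a minimal sketch of that metric may help make the target concrete. The function name and the rent values below are illustrative assumptions, not taken from the project notebook or data.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between observed and predicted rents."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up rents in dollars, for illustration only.
observed_rents = [2500, 3200, 4100]
predicted_rents = [2400, 3300, 4000]

mse = mean_squared_error(observed_rents, predicted_rents)
print(mse)
```

Because the errors are squared, a single listing that is off by a large amount affects the score far more than several small misses.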

Important: The datasets from NYC Open Data are large and exceed GitHub's 25 MB upload limit. To replicate the whole modeling process from the start, download them to your local machine and change the import paths accordingly. URLs to the datasets are provided in the notebook cells.

Data

The data sets for the project come from a random selection of homes posted for rent on StreetEasy during the summer of 2018. The training set is a sample of 12,000 homes posted in May, June, and July of 2018, along with their asking rents and several details from their StreetEasy listings, including publicly posted bedroom count, bathroom count, descriptions, and select building and unit amenities. We are required to generate predictions on a random set of listings posted on StreetEasy during August 2018. One full test set, including observed rents, is provided with the project posting. We must also submit predicted rents for two additional sets, test2 and test3, which do not include the observed rents.

We are expected to attach at least one additional data set to the one provided. The StreetEasy data set includes several fields designed to make it easier to attach third-party data. Examples include the street address, latitude and longitude, and the New York City BIN (Building Identification Number) and BBL (Borough-Block-Lot) numbers. Additional data could come from the U.S. Census Bureau, NYC Open Data, the NYC Geoclient, or any number of other open sources.
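Attaching a third-party data set through one of these keys could look roughly like the sketch below, which left-joins building-level attributes onto listings by BBL using pandas. The column names (`rental_id`, `bbl`, `bedrooms`, `year_built`) and all values are hypothetical placeholders, not the actual schema of the StreetEasy or NYC Open Data files.

```python
import pandas as pd

# Hypothetical listings keyed by BBL; values are made up for illustration.
listings = pd.DataFrame({
    "rental_id": [1, 2, 3],
    "bbl": [1000010001, 1000020002, 3000030003],
    "bedrooms": [1, 2, 0],
})

# Hypothetical building-level data from an external source, one row per BBL.
building_info = pd.DataFrame({
    "bbl": [1000010001, 3000030003],
    "year_built": [1925, 2008],
})

# A left join keeps every listing even when the external source has no match.
# validate="many_to_one" raises if a BBL appears more than once on the right,
# which would otherwise silently duplicate listings.
merged = listings.merge(
    building_info,
    on="bbl",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Share of listings that found a match in the external data.
match_rate = (merged["_merge"] == "both").mean()
print(merged)
print(match_rate)
```

Checking the match rate after the join is a cheap way to notice key mismatches, such as BBLs stored as strings in one file and integers in the other.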

Deliverables

  • CSV with predictions against test2.csv

  • A 200-300 word explanation:

    • Expected performance of the model in terms of mean squared error
    • Key features driving the team's modeling performance
  • A 200-300 word explanation:

    • Intended strategy to improve the predictions for the final round
  • CSV with predictions against test3.csv
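
Producing a predictions CSV for one of the test sets might be sketched as follows, using only the Python standard library. The column names `rental_id` and `predicted_rent`, the helper name, and the sample values are assumptions for illustration; the required submission format is not specified here.

```python
import csv
import io


def write_predictions(rows, out_file):
    """Write (listing id, predicted rent) pairs as CSV with a header row."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["rental_id", "predicted_rent"])  # hypothetical column names
    for rental_id, rent in rows:
        writer.writerow([rental_id, round(rent, 2)])


# Made-up predictions; an in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_predictions([(101, 2450.5), (102, 3199.994)], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same function can be given a file opened with `open(path, "w", newline="")`.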
