NYC real estate data downloader
Download and pre-process New York City real estate transaction data, which is online in Excel files. This project downloads the data and tranforms it into usable csv files, and optionally loads it into a MySQL, PostgreSQL or SQLite database.
The NYC Department of Finance maintains two real estate data sets:
- Rolling Sales: All sales for the last twelve months (usually running with a 2-3 month delay)
- Annualized Sales: Files with sales records for each year.
DOF also provides Summary Data, year-by-year average prices by neighborhood.
You might be interested in a related project, avroll-download, for downloading the City's assessed value database.
- Node v0.12.4 (higher versions will work except for the summary files due to a bug in J)
- NPM packages: J and sheetstack
- optional: MySQL v14+ or SQLite v3.7.15+ or PostgreSQL 9.3+
git clone email@example.com:fitnr/nycre.git cd nycre make install
This will work if you have node and NPM installed.
The download tasks convert DOF's excel files to CSV, but don't do any cleaning or other processing.
To download all the annual sales data, download or clone the repository and run:
This will create a
sales folder with files named things like
The most recent sales data is in DOF's rolling data files. These files generally have sales for a year long period, up to two-to-three months ago.
Download the most recent rolling data to
Download a specific month:
To load this "rolling" data into your database, use a command in this format:
make rolling-mysql-2015-01 make rolling-sqlite-2015-02 make rolling-postgresql-2015-03
If you run the command on a period not currently covered by the rolling data you've downloaded, you'll get an empty result. So if you plan on running this regularly, you'll need to remove the downloaded "raw" data and download current data:
rm -r rolling/raw make rolling
This command will download DOF's annual neighborhood summary data (beginning in 2007) to the
Limiting by year
To limit the download to only certain years, use the YEARS variable:
make YEARS="2010 2011 2012 2013 2014"
In this example, only transaction data from 2010-14 will be downloaded.
Short version: run one of these commands to download the sales data and load it into a local database:
make mysql USER=me PASS=pass make postgresql USER=me make sqlite
tax_class will be created.
The conversion to MySQL tries to split out the apartment number part of the address field, but doesn't do any other processing beside formatting prices and dates.
Load sales data into a MySQL database:
make mysql USER=username PASS=password
This will try to connect to
localhost and create a database named
nycre. You can customize the database name and add any other mysql flags you might need like so:
make mysql DATABASE=mydatabase MYSQLFLAGS="-H myhost.com -P 5432"
You can leave off the password, you'll be prompted several times to enter it. If your account doesn't have a password, I judge your security practices, but you can run:
make mysql USER=username PASSFLAG=.
Tested with PostgreSQL v9.4, will likely work on lower versions.
make postgresql USER=myusername
This assumes that you don't require a password for access. If you do require a password, add the -W flag. You will be prompted to enter the password approximately 60 times (consider temporarily disabling the password requirement).
make postgresql USER=myusername PSQLFLAGS=-W
The data will be loaded into a new database named
nycre. Customize this and add any additional flags like so:
make postgresql DATABASE=mydatabase PSQLFLAGS="-h myhost.com -p 5432"
Requires SQLite v3.7.15 or higher.
make sqlite downloads all available sales data (2003-) and loads it into an SQLite file named
Due to the limitations of SQLite, addresses and apartment numbers are not as well parsed as in the MySQL commands.
Bouncing back from errors
If there's a problem of some kind and your database is damaged, you might want to start over. Run one of these to completely delete the database:
make mysqlclean make postgresqlclean rm nycre.db # for sqlite