Download NYC real estate transaction data and drop it in a database
Makefile
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.gitignore
LICENSE
Makefile
readme.md
requirements.txt

readme.md

ACRIS Downloader

This Makefile downloads NYC property transfer data and optionally loads it into a database.

It's designed for people who know how to use databases, but don't necessarily want to slog through downloading huge files, manually setting up a schema and importing the those files.

Currently, SQLite, MySQL and PostGreSQL are supported. If you want to use other database software, you already probably know enough to customize the Makefile. It shouldn't be harder than changing a few flags.

the data

The ACRIS data set is big and complicated, see ACRIS Datasets below for some explanatory notes.

The Department of Finance supposedly updates the online records regularly, do you might use this Makefile, along with a cron job, to regularly update a mirror of their database.

Requirements

At least 10 GB of free disk space for the data and:

  • csvkit, a Python package
  • MySQL, SQLite or PostGreSQL

Installation

Download (or git clone) this repository and open the folder in your terminal.

To install MySQL, start here.

To install csvkit, follow the instructions in the csvkit docs, or try one of these:

# If you have admin privileges
sudo make install

# If you don't have admin privileges. Might not work.
make install INSTALLFLAGS=--user

Downloading the data

Run the following command:

make

The data/ folder will slowly fill up with files. If you want to work directly with CSVs, you're done.

MySQL

Check that you have mysql up and running on your machine, and a user capable of creating databases. Don't use root!

make mysql USER=username PASS=mypass

(If you don't want to type your password in plaintext, you can leave off the PASS argument. You'll just have to enter your password many times.)

This will run the following tasks:

  • download the ACRIS real property datasets in CSV format (it will be slow)
  • dedupe the CSVs and reformat them slightly
  • generate schemas for the new MySQL tables
  • Create a new MySQL database (acris) and import the data into several tables
  • Add indices to sensible fields in each table. You may find it profitable to add more indices yourself.

If the downloads are interrupted, just run the command again. That's the power of make!

By default, only the real property datasets will be downloaded. To download and create tables for the personal property datasets:

make mysql_personal USER=myuser PASS=mypass

The ACRIS dataset also includes voluminous cross-reference and remarks files that aren't downloaded by default. To download them and load into MySQL:

make mysql_real_complete USER=mysqluser PASS=mysqlpass
make mysql_personal_complete USER=mysqluser PASS=mysqlpass

Using an existing database

If you want to add the data to tables in an existing database, run:

make DATABASE=mydb USER=myuser PASS=mypass

If you have other connection requirements:

make DATABASE=mydb USER=myuser PASS=mypass HOST=example.com MYSQLFLAGS="--port=123 --example-flag"

SQLite

This command will create acris.db, a database containing the real property datasets.

make sqlite

Download and load even more data into acris.db:

make sqlite_real_complete
make sqlite_personal_complete

PostGreSQL

make psql USER=username

Even more:

make psql_real_complete USER=username
make psql_personal_complete USER=username

Add custom connection paramaters:

make psql_real_complete USER=username PSQLFLAGS="--host=foo.com"

ACRIS Datasets

(The following is a reformatted version of a document published by NYC Department of Finance.)

ACRIS has two types of documents:

Real Property Records include documents in the Deeds and Other Conveyance, Mortgages & Instruments and other documents classes in ACRIS. These documents typically impact rights to real property and as such follow the real property rather than an individual.

Personal Property Records include documents in the UCC and Federal Liens class in ACRIS. These documents typically impact rights to personal property associated with real property and as such follow the individual party rather than the real property.

Each Real Property Record or Personal Property Record contains:

  • A master record
  • Zero or more lot(property) records
  • Zero or more party records
  • Zero or more cross-reference records
  • Zero or more remarks records

The Document ID in the master record is used to link all other record types to a master record. To find all of the lot (property) records associated with a master record, simply retrieve all records in the property dataset with the same “Document ID” as the selected master record. The same process should be repeated for Party, Cross Reference and Remark records.

Real property records

Personal property records

Code mappings

In ACRIS, documents are stored with codes representing longer descriptions that are displayed on images generated by ACRIS and in Document Search. The translation from these codes is done via the following code look up tables:

Known issues

There's a bug in how csvkit <=0.9.1 handles fields that contain only the letter 'A' - they're converted into dates. This will break the recordtype column in certain tables.

License

General Public License version 3