This Makefile downloads NYC property transfer data and optionally loads it into a database.
It's designed for people who know how to use databases, but don't necessarily want to slog through downloading huge files, manually setting up a schema and importing the those files.
Currently, SQLite, MySQL and PostGreSQL are supported. If you want to use other database software, you already probably know enough to customize the Makefile. It shouldn't be harder than changing a few flags.
The ACRIS data set is big and complicated, see
ACRIS Datasets below for some explanatory notes.
The Department of Finance supposedly updates the online records regularly, so you might use this Makefile, along with a cron job, to regularly update a mirror of their database.
At least 10 GB of free disk space for the data and:
- csvkit, a Python package
- MySQL, SQLite or PostGreSQL
$ docker-compose run --rm setup
Go out for happy hour, this will take some time.
When it finishes, use either of the access methods below.
You can also set an env var ACRIS_DATASET, see docker-compose.yml for options, e.g.
$ ACRIS_DATASET=mysql_personal docker-compose run --rm setup
Direct DB access
$ docker-compose up db
$ docker-compose exec db mysql -uroot -ppass
$ docker-compose up myadmin
In a browser got to http://localhost:8080 user: root, pass: pass
When finished with either access method, shut down:
$ docker-compose down
To reset the database, delete ./data/mysql/ To reset the downloads, delete ./data/downloads/
git clone) this repository and open the folder in your terminal.
To install MySQL, start here.
To install csvkit, follow the instructions in the csvkit docs, or try one of these:
# If you have admin privileges sudo make install # If you don't have admin privileges. Might not work. make install INSTALLFLAGS=--user
Downloading the data
Run the following command:
data/ folder will slowly fill up with files. If you want to work directly with CSVs, you're done.
Check that you have mysql up and running on your machine, and a user capable of creating databases. Don't use root!
make mysql USER=username PASS=mypass
(If you don't want to type your password in plaintext, you can leave off the PASS argument. You'll just have to enter your password many times.)
This will run the following tasks:
- download the ACRIS real property datasets in CSV format (it will be slow)
- dedupe the CSVs and reformat them slightly
- generate schemas for the new MySQL tables
- Create a new MySQL database (
acris) and import the data into several tables
- Add indices to sensible fields in each table. You may find it profitable to add more indices yourself.
If the downloads are interrupted, just run the command again. That's the power of make!
By default, only the real property datasets will be downloaded. To download and create tables for the personal property datasets:
make mysql_personal USER=myuser PASS=mypass
The ACRIS dataset also includes voluminous cross-reference and remarks files that aren't downloaded by default. To download them and load into MySQL:
make mysql_real_complete USER=mysqluser PASS=mysqlpass make mysql_personal_complete USER=mysqluser PASS=mysqlpass
Using an existing database
If you want to add the data to tables in an existing database, run:
make DATABASE=mydb USER=myuser PASS=mypass
If you have other connection requirements:
make DATABASE=mydb USER=myuser PASS=mypass HOST=example.com MYSQLFLAGS="--port=123 --example-flag"
This command will create
acris.db, a database containing the real property datasets.
Download and load even more data into
make sqlite_real_complete make sqlite_personal_complete
make psql USER=username
make psql_real_complete USER=username make psql_personal_complete USER=username
Add custom connection paramaters:
make psql_real_complete USER=username PSQLFLAGS="--host=foo.com"
(The following is a reformatted version of a document published by NYC Department of Finance.)
ACRIS has two types of documents:
Real Property Records include documents in the Deeds and Other Conveyance, Mortgages & Instruments and other documents classes in ACRIS. These documents typically impact rights to real property and as such follow the real property rather than an individual.
Personal Property Records include documents in the UCC and Federal Liens class in ACRIS. These documents typically impact rights to personal property associated with real property and as such follow the individual party rather than the real property.
Each Real Property Record or Personal Property Record contains:
- A master record
- Zero or more lot(property) records
- Zero or more party records
- Zero or more cross-reference records
- Zero or more remarks records
Document ID in the master record is used to link all other record types to a master record. To find all of the lot (property) records associated with a master record, simply retrieve all records in the
property dataset with the same “Document ID” as the selected master record. The same process should be repeated for Party, Cross Reference and Remark records.
Real property records
- Real Property Master (document details)
- Real Property Legals (property details)
- Real Property Parties (party names and addresses)
- Real Property References
- Real Property Remarks
Personal property records
- Personal Property Master (document details)
- Personal Property Legals (property details)
- Personal Property Parties (party names and addresses)
- Personal Property References
- Personal Property Remarks
In ACRIS, documents are stored with codes representing longer descriptions that are displayed on images generated by ACRIS and in Document Search. The translation from these codes is done via the following code look up tables:
- Document Control Codes - codes in the real and personal property master datasets
- UCC Collateral Codes - codes in the personal property master dataset
- Property Types Codes - codes in the personal and real property legals datasets
- States Codes - codes in the real and personal parties property datasets
- Country Codes - codes in the real and personal parties property datasets
This example query selects all the transactions for a particular property in Brooklyn. Multiple joins are required to the
real_property_parties table, as there are two (or more) parties per transaction.
SELECT streetnumber, streetname, documentid, c.description, m.recordtype, d.doctypedescription, docdate, docamount, d.party1type, p1.name party1name, d.party2type, p2.name party2name FROM real_property_legals a LEFT JOIN real_property_master m USING (documentid) LEFT JOIN real_property_parties p1 USING (documentid) LEFT JOIN real_property_parties p2 USING (documentid) LEFT JOIN property_type_codes c USING (propertytype) LEFT JOIN document_control_codes d USING (doctype) WHERE a.lot = 65 AND a.borough = 3 AND a.block = 429 and p1.partytype = 1 AND p2.partytype = 2;
There's a bug in how csvkit <=0.9.1 handles fields that contain only the letter 'A' - they're converted into dates. This will break the recordtype column in certain tables.