Skip to content

Dataset: Marshal Evictions

Sam Gartenstein edited this page Jan 20, 2024 · 6 revisions

marshal_evictions

The marshal_evictions dataset can be added to NYCDB by running:

nycdb --download marshal_evictions
nycdb --load marshal_evictions

Provenance

The dataset's original source is the NYC open data portal's Evictions dataset, which is published by the NYC Department of Investigation (DOI). However, since the data published on the NYC open data portal contains very messy addresses and cannot be matched easily to buildings, we rely on some data processing from the ANHD DAP Portal which cleans, deduplicates, and geocodes the raw data from DOI and provides the processed data via API.

The code that ANHD uses to process the data is open source on Github. This code does three things:

  1. It cleans the evictionaddress field from the DOI data using this python script.
  2. It deduplicates any rows that have the same exact evictionaddress, evictionaptnum, executeddate, and marshallastname.
  3. It appends a bbl code to each eviction record using the GeoSearch API provided by NYC Planning Labs. Given the process of cleaning the eviction addresses, GeoSearch typically is able to find a bbl code for about 96% of eviction records from the raw data. See below why having the bbl code is so important.

Beyond the source code, you can also learn more about the ANHD DAP Portal API through their API documentation.

Data dictionary

You can download the data dictionary PDF from NYC Open Data, which provides a decent amount of detail on all of the original columns that DOI provides.

ANHD's DAP Portal also adds a few columns:

  • bbl is the 10-digit padded Borough Block Lot (BBL) number for the record. Having this bbl code allows us to connect this eviction data with all of the other building-level datasets in nycdb.
  • cleanedaddress is the result of running the original value for evictionaddress through ANHD's address cleaning python script. The goal of the script is to standardize these eviction addresses while not actually changing the location they refer to.
  • geosearchaddress is the address recognized by the GeoSearch geocoding service that ANHD DAP Portal uses to find a bbl value for each address. This field is only populated if it differs from the cleanedaddress.
  • uniqueid is a concatenation of evictionaddress, evictionaptnum, executeddate, and marshallastname, which you can treat as a unique identifier for each eviction record.

Tables

This dataset has the following tables:

  • marshal_evictions_all contains all eviction data from January 1st, 2017 to the current day. This data updates daily.
  • marshal_evictions_17, marshal_evictions_18, and marshal_evictions_19 are tables that just have eviction data for a given year. For 2017 and 2018, some pieces of the cleaning, geocoding, and deduplication process were done manually, or using other tools like R. The 2019 data was pulled directly from the ANHD DAP Portal API described above. Other than some potential minor discrepancies, this data should be the same data found in the marshal_evictions_all table.

Example queries

Following are some useful SQL queries related to the dataset.

Getting all residential eviction data in history

Remember, the full dataset contains both residential AND commercial evictions

SELECT *
FROM marshal_evictions_all
WHERE residentialcommercialind = 'RESIDENTIAL'

Finding the top buildings in the Bronx with the most residential evictions since 2017

SELECT bbl, count(*) AS total_evictions
FROM marshal_evictions_all
WHERE residentialcommercialind = 'RESIDENTIAL' AND BOROUGH = 'BRONX'
GROUP BY bbl
ORDER BY total_evictions DESC

Finding the names of Court Marshals who evicted folks in 2020

SELECT concat(marshalfirstname,' ',marshallastname) AS marshalfullname
FROM marshal_evictions_all
WHERE executeddate >= '2020-01-01' AND executeddate < '2021-01-01'
GROUP BY marshalfullname

Finding the count of evictions by year (most recent first)

SELECT date_part('year',executeddate) as year, COUNT(*)
FROM marshal_evictions_all
GROUP BY year
ORDER BY year DESC