Skip to content

Dataset: DCP Housing Database

Joanne Ramadani edited this page Jan 20, 2024 · 3 revisions

DCP Housing Database

The dcp_housingdb dataset can be added to NYCDB by running:

> nycdb --download dcp_housingdb
> nycdb --load dcp_housingdb

Provenance

The dataset comes from the NYC Department of City Planning:

  • DCP Housing Database

    The NYC Department of City Planning’s (DCP) Housing Database contains all NYC Department of Buildings (DOB)-approved housing construction and demolition jobs filed or completed in NYC since January 1, 2010. It includes the three primary construction job types that add or remove residential units: new buildings, major alterations, and demolitions, and can be used to determine the change in legal housing units across time and space. Records in the Database are geocoded to the greatest level of precision possible, subject to numerous quality assurance and control checks, recoded for usability, and joined to other housing data sources relevant to city planners and analysts.

Context

DOB Open Data are the primary source for the NYC Department of City Planning’s (DCP) Housing Database, and despite efforts DCP makes to correct errors, there are still unknown inaccuracies throughout the data. The data are administrative in nature, and most of the information associated with a record is provided by the applicant, and not necessarily verified by DCP.

Data dictionary

You can download the data dictionary XLSX.

Click to Expand the dcp_housingdb Data Dictionary
Column Name Column Description Expected/Allowed Values Field Limitations Additional Notes
Job_Number The DOB job application number assigned when the applicant begins the application. This is the unique identifier for the application submitted to the Department of Buildings (DOB). It may contain several work types, and more work types may be added as the application review and the work continues. It is a 9-digit number where the first digit indicates the borough where the building is located. These values are mapped from the dob_jobapplications field jobnumber
Job_Type DOB's type category for the job application. More information is available here. The following types are included in this database:

New Building (NB): an application to build a new structure. “NB” cannot be selected if any existing building elements are to remain—for example a part of an old foundation, a portion of a façade that will be incorporated into the construction, etc.

Alteration Type I (A1): a major alteration that will change the use, egress, or occupancy of the building.

Demolition (DM): an application to fully or partially demolish an existing building. Note that many demolition permits are only for partial demolitions and for garages (these are also captured).
Mapped from dob_jobapplications field jobtype as follows:
A1 to Alteration
DM to Demolition
NB to New Building
Administrative records are removed based on this field in combination with job_description.
ResidFlag This field is used to identify jobs in buildings containing residential uses. A value of "residential" indicates that the job affects residential units in some way through new construction, alteration, or demolition. Only those jobs with a value of "residential" are included in the housing database. Manual research was conducted at DCP to help ensure that all work on buildings with residences receive this flag, though some records may remain misclassified. This field is dependent on hotel_init, hotel_prop, otherb_init, otherb_prop, classa_init, and classa_prop.
If any of the dependent fields are not NULL, then the job is flagged as being residential.
Manual research adds/removes this flag as necessary
NonresidFlag This field is used to identify jobs in buildings containing non-residential uses. A value of "non-residential" indicates that the job affects some use type beyond residential, including commercial, industrial, or community facility uses. Mixed-use buildings will have values in both the Resid_Flag and Nonresid_Flag, since they contain both residential and nonresidential uses, but having flags in both of those columns does not necessarily mean that it is a mixed-use development since the flags do not distinguish between the initial and proposed uses in the building. This field is dependent on job_description, occ_initial, occ_proposed, and resid_flag.
nonresid_flag is true if:
job_description has one of these keywords OR
occ_initial or occ_proposed are one of these categories. OR
resid_flag IS NULL
Job_Inactv* This field is used to identify job applications that are likely inactive, either because the job is withdrawn, is a duplicate of another job, or the application has stalled for three or more years. Records flagged as inactive should be excluded in most analyses of incomplete jobs since they are extremely unlikely to ever reach completion. Note that this definition is probability based and therefore can not capture all permits that may eventually become inactive. Inactive: Withdrawn: The job status is 9. Withdrawn
Inactive: Stalled: A job has a status of Filed Application, Approved Application, OR Permitted for Construction AND date_lastupdt is 3 or more years before the vintage date.
Inactive: Duplicate: A job with a status that is Filed Application, Approved Application, OR Permitted for Construction MATCHES with a job that has a status of either 4. Partially Completed Construction OR 5. Completed Construction ON the following fields:
-classa_init (where units are NOT NULL) AND
-classa_prop (where units are NOT NULL) AND
-address AND
-job_type AND
-the date_lastupdt of the incomplete job is before the date_lastupdt of the complete job.
note: job_inactive can also get set to Inactive: Duplicate through manual research
NULL: Not inactive
*This field only appears in the HousingDB_post2010_inactive_included dataset
Job_Status DCP recode of DOB's status label. This describes the status of the job at the date of the data vintage. For example, a job marked as "3. Permitted" was at that status as of June 30, 2020 if using version 20Q2 of the DCP Housing Database. More details on each DOB status is available here:

https://www1.nyc.gov/assets/buildings/pdf/bisjobstatus.pdf

Jobs typically move through status A through X over time as they reach certain approval milestones:

1. Filed: job application is at status A - G at the time of publication. Application submitted, but review is not yet in progress.

2. Plan Examination: application is at status H - P. Plan examination is in progress, but not yet approved.

3. Permitted: application is at status Q and R and may begin construction.

4. Partial Complete: application at status U and X, and CO issued for NB or A1 job type, and the CO is a Temporary CO AND less than 80% of the units are completed for a building with 20 or more units.

5. Complete: For new buildings and alterations, application is at status U and X, or a CO has been issued. For demolitions, the application is at status X. DCP has decided to mark demolitions as complete when they reach status X, but list the completion date as equal to status Q because this is likely when the building must be vacated, and it appears that many buildings are physically demolished some time before receiving sign off (status X).

9. Withdrawn: application is at status 3. The application has been withdrawn by the applicant.
This field is dependent on on job_type, co_latest_certtype, classa_complt_pct, classa_complt_diff, classa_net, x_withdrawal, and the date fields.
CompltYear Year the job was completed. For new buildings and alterations, this is defined as the year of the first certificate of occupancy issuance. For demolitions, this is the year that the demolition was permitted (reached status Q). Dependent on date_complete, this field is the year of date_complete.
PermitYear Year the job was permitted. For all job types, this is defined as the status Q year. Dependent on date_permittd, this field is the year of date_permittd
ClassAInit Number of units that initially existed in the building at the time of the job application, as reported by the applicant. This field is edited by DCP to only count Class A units, which are units in houses or apartment buildings intended for long-term residential use (greater than 30 days), and typically do not require the use of shared kitchens and bathrooms. The definition of Class A and other unit types is available here:

https://www1.nyc.gov/assets/buildings/pdf/MultipleDwellingLaw.pdf
Mapped from the dob_jobapplications field existingdwellingunits
Set to 0 for New Buildings
ClassAProp Number of units proposed in the job application after the proposed work has been completed, as reported by the applicant and edited by DCP to count only Class A units. Mapped from the dob_jobapplications field proposeddwellingunits
Set to 0 for Demolitions
ClassANet Net change in Class A unit count between the number of units existing at the time of application and the number of units proposed. Dependent on classa_init and classa_prop
The difference between classa_init and classa_prop
HotelInit Number of hotel units that initially existed in the building at the time of the job application, as determined through DCP research. Note that hotel units are a subtype of Class B units which are typically occupied as for-profit businesses for short-term (less than 30 days), full-service lodging. This field is not populated programatically. All values are directly from the manual research table.
Hotel_Prop Number of hotel units proposed in the job application after the proposed work has been completed, as determined through DCP research. This field is not populated programatically. All values are directly from the manual research table.
OtherBInit Number of Class B units (excluding hotel units) that initially existed in building at the time of the job application, as determined through DCP research. Other Class B units include all dwellings that are not Class A units or hotels, and may include single room occupancy units, dormitories, certain kinds of supportive housing and assisted living, homeless shelters, convents and monasteries, among many other forms of temporary lodging or lodging with communal kitchens or bathrooms. The definition of Class B and other unit types is available here:

https://www1.nyc.gov/assets/buildings/pdf/MultipleDwellingLaw.pdf
This field is not populated programatically. All values are directly from the manual research table.
OtherBProp Number of Class B units (excluding hotel units) proposed in the job application after the proposed work has been completed, as determined through DCP research. This field is not populated programatically. All values are directly from the manual research table.
Units_CO Number of dwelling units provided on the temporary or final certificates of occupancy. This unit count may include Class A units in addition to Class B units and hotel units. Mapped from dob_cofos field numofdwellingunits
Boro The NYC borough code where the proposed work will take place. 1 = Manhattan
2 = Bronx
3 = Brooklyn
4 = Queens
5 = Staten Island
BIN Building Identification Number (BIN), supplied by Geosupport, that identifies each unique building in the city. A 7 digit string starting with the borough code and 6 digit unique building number
BBL Borough-Block-Lot tax ID number of the parcel where the proposed work will take place.
AddressNum The house number for the building where the proposed work will take place. This is the house number returned by Geosupport during geocoding. If, however, Geosupport did not return address information, then this is the housenumber from dob_jobapplications.
AddressSt The street name for the building where the proposed work will take place. This is the street name returned by Geosupport during geocoding. If, however, Geosupport did not return address information, then this is the streetname from dob_jobapplications.
Occ_Init Description of the existing occupancy type at the time of the job application. This indicates what a site was used for before the proposed job. It is a more descriptive recode of the occupancy code that the applicant submitted to DOB as the initial occupancy type of the building. Note that the applicant may only choose one occupancy code, even in a building containing multiple uses, so this code is only able to describe one of many possible uses. This field is mapped from the dob_jobapplications field existingoccupancy and translated using the lookup table:

https://github.com/NYCPlanning/db-developments/blob/dev/developments_build/data/lookup_occ.csv
Occ_Prop Description of the proposed occupancy type at the time of the job application. This indicates what a site will be used for after the proposed job is complete. It is a more descriptive recode of the occupancy code that the applicant submitted to DOB as the proposed occupancy type of the building. Note that the applicant may only choose one occupancy code, even in a building containing multiple uses, so this code is only able to describe one of many possible uses. This field is mapped from the dob_jobapplications field proposedoccupancy and translated using the lookup table:

https://github.com/NYCPlanning/db-developments/blob/dev/developments_build/data/lookup_occ.csv
Bldg_Class Building Class code. A code used by Department of Finance (DOF) and Department of Buildings (DOB) describing the major use of a structure. For more information see the Building Classifiaction code list on DOF's website: https://www1.nyc.gov/assets/finance/jump/hlpbldgcode.html
Job_Desc The general description of the work being applied for. This field is free text, and is filled out by the applicant. This field is mapped from the dob_jobapplications field jobdescription.
Administrative records, or jobs with no actual construction work, are removed based on this field in combination with job_type.
Test records are removed based on this field.
DateFiled Date of job status A (pre-filing application). This is the first step in the process for all job applications. The job application # is assigned at this status. This occurs when the applicant submits any part of the application (even a single form), in person or electronically. This field is mapped from the dob_jobapplications field prefilingdate
DatePermit Date of job status Q (first partial permit issuance). This is when construction work may begin. This field should be used for identifying the number of permits approved in a given year. This field is mapped from the earliest date in the dob_permitissuance field issuancedate
DateLstUpd The date of the last update to the DOB record for the job filing. This field is mapped from the dob_jobapplications field latestactiondate
DateComplt DCP's best estimate of completion date for all jobs. For new buildings and alterations, date complete is equal to the date of the earliest certificate of occupancy. For demolitions, date complete is equal to status Q (permit issued), since demolitions do not receive certificates of occupancy. Blank indicates no certificate of occupancy has been issued. Typically, a building can be considered complete at this stage. Large buildings with many units may have units receiving certificates of occupancy over a longer period of time. This field depends on job_type, date_permitted, and date_statusx
For New Buildings and Alterations, this field is mapped from the earliest effectivedate from dob_cofos
For Demolitions, this field is mapped from date_permitted for as long as date_statusx IS NOT NULL
ZoningDist1 The primary zoning district of the tax lot per the applicant at time of application. For more information see http://www1.nyc.gov/site/planning/zoning/about-zoning.page. This field is the dob_jobapplications field zoningdist1
ZoningDist2 The secondary zoning district of the tax lot per the applicant at time of application. This field is the dob_jobapplications field zoningdist2
ZoningDist3 The tertiary zoning district of the tax lot per the applicant at time of application. This field is the dob_jobapplications field zoningdist3
SpecialDist1 The primary special zoning district of the tax lot per the applicant at time of application. Other zoning designations may appear in this field, such as industrial business zones (IBZ), mandatory inclusionary housing (MIH) areas, or other zoning designations. This field is provided by the applicant, and is likely inconsistent. This field is mapped from the dob_jobapplications field specialdistrict1
SpecialDist2 The secondary special zoning district of the tax lot per the applicant at time of application. This field is mapped from the dob_jobapplications field specialdistrict2
Landmark Indicates that the building has been designated as a landmark building by the Landmarks Preservation Commission. This field is mapped from the dob_jobapplications field landmarked
FloorsInit The existing number of stories/floors in the building, as reported by the applicant. This field depends on job_type
This field is mapped from the dob_jobapplications field existingnumstories if the record is an Alteration or Demolition
Values of '0' are replaced with NULL
FloorsProp The number of stories/floors in the building after the work is done, as reported by the applicant. This field depends on job_type
This field is mapped from the dob_jobapplications field proposednumstories if the record is an Alteration or New Building
Values of '0' are replaced with NULL
Enlargemnt This indicates if the work to be done under the application will result in a horizontal and/or vertical enlargement, as reported by the applicant. Values include: Horizontal, Vertical, Horizontal and Vertical, [blank]. This field is mapped from the dob_jobapplications fields horizontalenlrgmt and verticalenlrgmt
When horizontalenlrgmt = 'Y' AND verticalenlrgmt <> 'Y' then 'Horizontal'
When horizontalenlrgmt <> 'Y' AND verticalenlrgmt = 'Y' then 'Vertical'
When horizontalenlrgmt = 'Y' AND verticalenlrgmt = 'Y' then 'Horizontal and Vertical'
Ownership This indicates whether the property is government owned, the ownership structure, and non-profit status, as reported by the applicant and recoded by DCP. This field is mapped from dob_jobapplications fields cityowned, ownertype, and nonprofit
The three input values are translated into a single value using this lookup table:

https://github.com/NYCPlanning/db-developments/blob/dev/developments_build/data/lookup_ownership.csv
CenBlock20 2020 Census Block FIPS code. This is a reformatted version of bctcb2020 that uses county FIPS instead of one-digit borough code.
CenTract20 2020 Census Tract FIPS code. This is a reformatted version of bct2020 that uses county FIPS instead of one-digit borough code.
BCTCB2020 The Borough - 2020 Census Tract - Census Block code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with Census Block boundaries.
BCT2020 The Borough - 2020 Census Tract code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with Census Tract boundaries.
NTA2020 2020 Neighborhood Tabulation Area (NTA) code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with 2020 NTA boundaries.
NTAName20 2020 Neighborhood Tabulation Area (NTA) descriptive name. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with 2020 NTA boundaries.
CDTA2020 2020 Community District Tabulation Area (NTA) code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with 2020 CDTA boundaries.
CDTAName20 2020 Community District Tabulation Area (NTA) descriptive name. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with 2020 CDTA boundaries.
ComunityDist NYC Community District code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with community district boundaries.
CouncilDist NYC City Council District code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with city council district boundaries.
SchoolSubDist NYC School Subdistrict code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with school subdistrict boundaries.
SchoolCommnty NYC Community School District code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with community school district boundaries.
SchoolElmntry NYC Elementary School Zone code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with elementary school zone boundaries.
SchoolMiddle NYC Middle School Zone code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with middle school zone boundaries.
FireCmpany NYC Fire Company code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with fire company boundaries.
FireBattln NYC Fire Battalion code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with fire battalion boundaries.
FireDivsn NYC Fire Division code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with fire division boundaries.
PolicePcnt NYC Police Precinct code. This field comes from Geosupport, as long as the address returns a value and the record is not in TPAD. Otherwise, the value is generated from a spatial join with police precinct boundaries.
PL_FIRM07 A value of 1 means that some portion of the tax lot falls within the 1% annual chance floodplain as determined by FEMA’s 2007 Flood Insurance Rate Map. Note that buildings on the tax lot may or may not be in the portion of the tax lot that is within the 1% annual chance floodplain. This field is mapped from dcp_mappluto which is the clipped version of MapPLUTO, by joining on the geo_bbl to the PLUTO bbl PL is short for PLUTO
PL_PFIRM15 A value of 1 means that some portion of the tax lot falls within the 1% annual chance floodplain as determined by FEMA’s 2015 Preliminary Flood Insurance Rate Map. Note that buildings on the tax lot may or may not be in the portion of the tax lot that is within the 1% annual chance floodplain. This field is mapped from dcp_mappluto which is the clipped version of MapPLUTO, by joining on the geo_bbl to the PLUTO bbl PL is short for PLUTO
Latitude Latitude in WGS84 / SRID:4326 The latitude and longitude value are assigned in the following order. The associated geomsource values are in parentheses:
DoITT building footprints center point using dob_bin as a join_key (BIN DOB buildingfootprints or BIN DOB buildingfootprints (historical))
DoITT building footprints center point using geo_bin as a join_key
Returned from GeoSupport 1B function (Lat/Lon DCP, Lat/Lon geosupport)
BBL centroid from shoreline clipped MapPLUTO using dob_bbl as a join_key (BBL DOB MapPLUTO)
latitude and longitude from dob_jobapplications (Lat/Lon DOB)
Longitude Longitude in WGS84 / SRID:4326 See the description in latitude above
GeomSource Source of the geographic coordinates for the record. This field is dependent on latitude and longitude
It is assigned as follows:
BIN DOB buildingfootprints: Geometry is the centroid of the DOITT building footprint that has the same BIN as the DOB job BIN
BIN DCP geosupport: Geometry is the centroid of the DOITT building footprint that has the same BIN as the BIN returned by Geosupport
Lat/Lon geosupport: Geometry is created from the latitude and longitude values returned by Geosupport
BBL DOB MapPLUTO: Geometry is the centroid of the Mappluto lot that has the same BBL as the DOB job BBL
BIN DOB buildingfootprints (historical): Geometry is the centroid of the historical DOITT building footprint that has the same BIN as the DOB job BIN
Lat/Lon DOB: Geometry is created from the latitude and longitude values in the DOB source data
Lat/Lon DCP: Geometry is created from latitude and longitude provided in the manual corrections file
DCPEdited List of fields that were edited by DCP. This only lists fields where original DOB source data was overwritten by DCP, and doesn't include fields where DCP recodes data as part of the standard methodology. Whenever a field is altered by a manual correction (as opposed to the programatic logic described in this Data Dictionary), the field name gets added to dcpeditfields
Version Version of the Developments Database.

Tables

This dataset has the following tables:

  • dcp_housingdb is the only table in this dataset.

Example queries

Following are some useful SQL queries related to the dataset.

Get top 50 longest complete alterations

The dcp_housingdb table has information on all NYC Department of Buildings (DOB)-approved housing construction and demolition jobs filed or completed in NYC since January 1, 2010. dcp_housingdb contains, in particular, information on type of occupancy, length of time, type of job, and the status of the job, which can tell you about how long certain renovations in residential buildings can go on. There are two types of date considerations, which are the columns compltyear and permityear as integers, and datefiled and datecomplt as datetime objects. For our purposes, we're using the integer columns as the datecomplt column is not always filled in.

After creating a new column with the length of time, we filter the dataset by types and ownership in order to gain specific insight into types of buildings. Finally, we order the created column in descending order and limit it to 50 rows in order to see the most egregious lengths of time. This script can be modified to answer similar questions by changing the filters, order by, or limit.

select *, compltyear-permityear as constlen from dcp_housingdb
where occprop = 'Residential: 3 or More Units (R-2)'
and jobtype = 'Alteration'
and ownership = 'Private For-Profit: Corporation'
and jobstatus = '5. Completed Construction'
order by constlen desc
limit (50)

Retrieve total amount of alterations in residential buildings grouped by status

As a small preliminary note, the dcp_housing db dataset is updated annually around June for the previous year. With the current data that we have, we can retrieve aggregated amounts of filings that indicate the project's status. In this particular example, we're also filtering by jobtype, occprop, and ownership, which will tell us about a specific subset of projects. We then group the remaining rows by their job status and list them in descending order to understand which job statuses are more numerous.

select count(*), jobstatus from dcp_housingdb
where occprop = 'Residential: 3 or More Units (R-2)'
and jobtype = 'Alteration'
and ownership = 'Private For-Profit: Corporation'
group by jobstatus
order by count desc

Known Issues and Data Pitfalls

The DCP Housing Database is based on DOB Open Data to identify housing construction. However, not all construction in NYC occurs legally by approval through the DOB. Illegal conversions of the housing stock may be prevalent in certain neighborhoods but are not represented in this data.

DCP uses information such as job description, occupancy codes, and unit change to differentiate hotels and class B units from class A housing units. However, there may be hotels and class B units that DCP could not identify using these means.