## Project 5
- created 5-16-22 by GTP
- https://docs.google.com/document/d/1LIJTlCsx54zIG5sOX3heSj00YqdzhZ9c/edit
- *Description*: BSEED has entered a lot of data into free-text fields within Accela. It would be useful to find ways to scrape and organize this data so it is usable. Unit data and Certificates of Occupancy are some of our biggest gaps. This might be a way to use administrative data to version and validate 2020 data.
- Technical Skill Level: Medium-High. Skilled at applying Regex to text strings using SQL and/or Python. Experience working with geospatial data, in ArcGIS or otherwise.
- Scope: There are 595 records in the Certificates of Occupancy dataset and 5,930 records in the Certificates of Compliance dataset. Depending on skill level, this could take 6-8 weeks.
- Inputs: Certificates of Compliance, Certificates of Occupancy, Rental Registration data
- General Process:
  - Use GIS or the Base Units Explorer tool to link Certificates of Occupancy to specific building IDs, to create timestamps for when a building was ready for occupants.
  - Geocode the addresses in the Certificates of Compliance and Rental Registration datasets, and note any addresses that can’t be matched even through manual rematching and may be missing from the database altogether.


In [1]:
#import data libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math
import numbers
import decimal
#import data science packages
import scipy
import scipy.stats as stats

np.random.seed(222)
%matplotlib inline

In [2]:
import re

In [3]:
#import geographic analysis libraries
import geopandas as gpd
from geopandas import GeoDataFrame
import shapely as shp
from shapely.geometry import Point
from shapely.geometry import shape
import os
from fiona.crs import from_epsg
import pysal as ps



In [4]:
#set crs for entire analysis (WGS 84); the {'init': 'epsg:4326'} dict form is deprecated in pyproj
crs = 'EPSG:4326'

### data sources

Certificates of Occupancy: https://data.detroitmi.gov/datasets/certificates-of-occupancy-1/explore
- BSEED certifies that a new (or rehabbed / renovated) building has satisfied its requirements for habitation, so people can move in and the building is ready for occupancy
- note: Alice says that this can be issued for individual floors
- _goal_: the deliverable should be a table of certificate of occupancy numbers (record_id) and building footprint IDs - sometimes these are 1 to 1, and sometimes multiple occupancy numbers relate to a single ID
- the census challenge is interested in having this as a record of exactly when a new building became technically 'habitable' - the "birthdate" of the property in terms of occupancy
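
A minimal sketch of the deliverable described above, under stated assumptions: the crosswalk has not arrived yet, so the `building_id` column name, the `first_cofo_date` output name, and all rows below are hypothetical placeholders rather than real records. Taking the earliest certificate date per building as the "birthdate" is also an assumption; since certificates can be issued for individual floors, the latest date may be the better choice for some uses.

```python
import pandas as pd

# Toy certificate rows; record_ids and dates are placeholders, not real data.
certs = pd.DataFrame({
    "record_id": ["BLD-A", "BLD-B", "BLD-C"],
    "date_statu": ["2021-04-05", "2021-05-28", "2020-08-19"],
})

# Hypothetical crosswalk: "building_id" is an assumed column name.
xwalk = pd.DataFrame({
    "record_id": ["BLD-A", "BLD-B", "BLD-C"],
    "building_id": [101, 101, 202],
})

# One row per (certificate, building) pair: the deliverable table.
deliverable = certs.merge(xwalk, on="record_id", how="left")
deliverable["date_statu"] = pd.to_datetime(deliverable["date_statu"])

# Earliest certificate date per building: the occupancy "birthdate".
birthdates = (
    deliverable.groupby("building_id")["date_statu"]
    .min()
    .rename("first_cofo_date")
    .reset_index()
)
print(birthdates)
```

Here two certificates share building 101, which is the many-to-one case mentioned in the goal.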

Certificates of Compliance: https://data.detroitmi.gov/datasets/certificates-of-compliance-1/explore
- this is for properties to be certified as 'compliant' by the city
- Alice has access to a version of the compliance dataset that has a 'description' field, which should contain additional details..?
- I think this will mostly be about geocoding the records that don't have lat/lon
- _goal_: there are 33 records that didn't geocode - the goal is to geocode these, and to give a description for any that couldn't be geocoded
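
Before the ungeocoded records can be sent to any geocoder, the split address columns need to be joined into one string. The helper below is a sketch, not the notebook's method: `build_address` is a hypothetical name, and appending "Detroit, MI" is an assumption based on the data source.

```python
def build_address(street_num, street_dir, street_nam, street_typ,
                  city="Detroit", state="MI"):
    """Join the non-empty street parts into one geocodable address string."""
    parts = [street_num, street_dir, street_nam, street_typ]
    # Skip missing pieces: None, NaN (NaN != NaN), or blank strings.
    cleaned = [str(p).strip() for p in parts
               if p is not None and p == p and str(p).strip()]
    return f"{' '.join(cleaned)}, {city}, {state}"

# Two of the compliance rows with missing geometry.
print(build_address("924", "E", "LAFAYETTE", None))
print(build_address("13401", "E", "SEVEN MILE", "Rd"))
```

With a DataFrame, the same function could be applied row by row over `street_num`, `street_dir`, `street_nam`, and `street_typ`.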

Rental Registrations: https://data.detroitmi.gov/datasets/rental-statuses-1/explore
- (6-1-22): I'll address this next week with Alice on our next call

Base Units: https://base-units-detroitmi.hub.arcgis.com/datasets/detroitmi::units-1/about
- Jimmy McBroom put this together

https://cityofdetroit.github.io/base-unit-tools/explorer

In [5]:
compliance_gdf = gpd.read_file('../data/Certificates_Of_Compliance/Certificates_Of_Compliance.shp')

In [6]:
len(compliance_gdf)

6071

In [7]:
compliance_gdf.sample(5)

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,task,status,record_sta,parcel_id,lon,lat,ObjectId,geometry
5254,PMB2021-02475,16933,,LITTLEFIELD,,Issue CofC,Issued,2022-02-01,22027203.,-83.1768,42.416109,5255,POINT (-83.17680 42.41611)
2656,PMB2009-05106,9012,,ROHNS,,Issue CofC,Issued,2021-12-08,19009268-9,-83.013897,42.395935,2657,POINT (-83.01390 42.39594)
385,PMB2004-07242,16600,,FIVE POINTS,,Issue CofC,Issued,2022-01-24,22124550.,-83.286121,42.410761,386,POINT (-83.28612 42.41076)
2356,PMB2012-04765,15808,,CHAPEL,,Issue CofC,Issued,2022-05-12,22111339-40,-83.253106,42.406001,2357,POINT (-83.25311 42.40600)
4493,PMB2018-06422,4814,,CABOT,,Issue CofC,Issued,2019-12-17,20007814.,-83.147358,42.327188,4494,POINT (-83.14736 42.32719)


In [8]:
len(compliance_gdf[compliance_gdf['geometry'].isna()])

33

In [9]:
compliance_gdf[compliance_gdf['geometry'].isna()]

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,task,status,record_sta,parcel_id,lon,lat,ObjectId,geometry
40,PMB2003-01577,924,E,LAFAYETTE,,Issue CofC,Issued,2021-08-24,,,,41,
55,PMB2003-01845,2671,,LAFAYETTE,,Issue CofC,Issued,2021-06-09,,,,56,
141,PMB2004-03507,9658,,NORTHLAWN,,Issue CofC,Issued,2019-11-01,,,,142,
165,PMB2004-04234,287,,EDSEL FORD,,Issue CofC,Issued,2021-03-08,,,,166,
260,PMB2004-14249,1387,,LARNED,,Issue CofC,Issued,2021-05-19,,,,261,
831,PMB2004-10797,4727,,THIRD,,Issue CofC,Issued,2020-01-30,,,,832,
833,PMB2004-10802,930,W,FOREST,,Issue CofC,Issued,2020-01-30,,,,834,
843,PMB2004-10938,13401,E,SEVEN MILE,Rd,Issue CofC,Issued,2019-01-11,,,,844,
1109,PMB2005-14865,1330,,PLUM,,Issue CofC,Issued,2021-06-05,,,,1110,
1140,PMB2005-19368,1511,E,LARNED,,Issue CofC,Issued,2021-07-23,,,,1141,


In [10]:
compliance_gdf.sample(2)

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,task,status,record_sta,parcel_id,lon,lat,ObjectId,geometry
4395,PMB2018-00847,15741,,WARD,,Issue CofC,Issued,2021-09-03,22025162.0,-83.174144,42.406453,4396,POINT (-83.17414 42.40645)
1493,PMB2005-08825,6626,,FIRWOOD,,Issue CofC,Issued,2020-01-10,14011887.0,-83.115758,42.353702,1494,POINT (-83.11576 42.35370)


In [11]:
occupancy_gdf = gpd.read_file('../data/Certificates_Of_Occupancy/Certificates_Of_Occupancy.shp')

In [12]:
len(occupancy_gdf[occupancy_gdf['geometry'].isna()])/len(occupancy_gdf)

0.16166666666666665

In [13]:
len(occupancy_gdf[occupancy_gdf['geometry'].isna()])

97

In [14]:
occupancy_gdf[occupancy_gdf['geometry'].isna()]

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,descriptio,status,date_statu,parcel_id,lon,lat,ObjectId,geometry
4,BLD2017-06254,8401,,WOODMONT 14,,"SEE BLD2017-00831\nAKA 8227, 8237, 8243, 8251 ...",CofO Issued,2019-02-23,,,,5,
12,BLD2019-03478,1541,,Fisher Frwy,,AKA 1541 W. Fisher Freeway Unit 29. Per BZA #4...,CofO Issued,2020-08-12,,,,13,
23,BLD2019-04976,2327,,Trumbull,,"AKA 2327 Trumbull Ave. Unit 20. Per BZA #4-18,...",CofO Issued,2020-10-15,,,,24,
25,BLD2018-05987,1230,,LIBRARY,,PERMANENT CERTIFICATE OF OCCUPANCY ISSUED,CofO Issued,2019-04-22,,,,26,
36,BLD2019-00054,2817,,BRUSH,,"ERECT A 4 STORY, TOWNHOME AS PER EPLANS AND CE...",CofO Issued,2021-06-01,,,,37,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
576,BLD2017-09621,286,,ALFRED,,"(AKA 292 ALFRED) ERECT A 3 STORY, TOWNHOUSE W/...",CofO Issued,2021-09-07,,,,577,
582,BLD2019-00679,2807,,Brush,,New residential structure 8 units 4 story tow...,CofO Issued,2021-05-28,,,,583,
592,BLD2020-00477,4501,,St. Aubin,,Revision to BLD2019-02421 per plans.\r\n(Const...,CofO Issued,2021-02-24,,,,593,
595,BLD2019-02529,692,,AMSTERDAM,,Revision to BLD2018-07772 to reflect changes t...,CofO Issued,2020-08-19,,,,596,


In [15]:
occupancy_gdf['descriptio'][occupancy_gdf['record_id']=='BLD2021-02415'].values[0]

'(AKA 3321 Cochrane) Construct (11) unit Rowhouse building and Accessory Garages per BZA (41-19) & (SLU2019-00020) per Plans.\r\n(Permit reviewed under BLD2019-03775)'

In [16]:
occupancy_gdf[occupancy_gdf['record_id']=='BLD2021-02415']

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,descriptio,status,date_statu,parcel_id,lon,lat,ObjectId,geometry
467,BLD2021-02415,3303,,COCHRANE,,(AKA 3321 Cochrane) Construct (11) unit Rowhou...,CofO Issued,2021-10-11,8006537.001,-83.074239,42.339555,468,POINT (-83.07424 42.33956)


In [17]:
len(occupancy_gdf)

600

## Occupancy DF
- notes: the descriptio column has free text that we could leverage to fill in empty geometry cells
- "AKA 2327 Trumbull Ave" is an example of the free text: AKA followed by an address is a common pattern, but an address already exists in the street_num / street_nam columns - maybe those just have to be geocoded

- we're looking for the relationship between housing units and certificates of occupancy. That relationship is often mediated by a building ID (use https://cityofdetroit.github.io/base-unit-tools/explorer?id=3263&type=buildings&streetview=true) / there's a crosswalk that Alice will send over

- the deliverable should be a table of certificate of occupancy numbers (record_id) and building footprint IDs - sometimes these are 1 to 1, and sometimes multiple occupancy numbers relate to a single ID

- the census challenge is interested in having this as a record of exactly when a new building became technically 'habitable' - the "birthdate" of the property in terms of occupancy
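
If the AKA addresses turn out to be useful as alternate geocoding candidates, they could be pulled out of descriptio with a regular expression. This is a rough sketch tuned only to the handful of descriptions seen in this notebook; `AKA_RE` and `extract_aka_addresses` are hypothetical names, and the pattern will miss other phrasings.

```python
import re

AKA_RE = re.compile(
    r"\ba\.?k\.?a\.?\s+"                      # "AKA" or "a.k.a."
    r"(?P<nums>\d+(?:\s*,\s*\d+)*)\s+"        # one or more house numbers
    r"(?P<street>(?:[NSEW]\.\s+)?[A-Za-z]+(?:\s+[A-Za-z]+)*?)"  # street words
    r"(?=\s*(?:\)|\.|,|-|\bunit\b|$))",       # stop at ")", ".", ",", "-", "Unit"
    re.IGNORECASE,
)

def extract_aka_addresses(description):
    """Return a list of 'number street' strings named after AKA, or []."""
    if not isinstance(description, str):
        return []
    match = AKA_RE.search(description)
    if not match:
        return []
    street = match.group("street").strip()
    numbers = [n.strip() for n in match.group("nums").split(",")]
    # A comma-separated list of numbers shares the one street name.
    return [f"{n} {street}" for n in numbers]

print(extract_aka_addresses("AKA 2327 Trumbull Ave. Unit 20. Per BZA #4-18"))
print(extract_aka_addresses("AKA 8032, 8040, 8046, 8056 MEMORIAL. ERECTION OF ONE 4 UNIT"))
```

Only the first AKA clause in a description is read; a description with several AKA clauses would need `finditer` instead of `search`.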

In [18]:
occupancy_gdf_empty = occupancy_gdf[occupancy_gdf['geometry'].isna()]

In [19]:
occupancy_gdf[~occupancy_gdf['parcel_id'].isna()].sample(10)

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,descriptio,status,date_statu,parcel_id,lon,lat,ObjectId,geometry
316,BLD2018-06795,680,,AMSTERDAM,,CONSTRUCT AND ERECT A 5 STORY ACCESSORY PARKIN...,CofO Issued,2020-08-19,04001346.002,-83.076345,42.365318,317,POINT (-83.07634 42.36532)
113,BLD2019-01882,441,,Canfield,,TBD/Interior renovations as per attached docum...,CofO Issued,2019-09-27,02000890.,-83.065518,42.350978,114,POINT (-83.06552 42.35098)
329,BLD2019-03417,9000,,Livernois,,"Per BZA#111-17 & BSEED #12-17 , Add used motor...",CofO Issued,2020-01-31,16017311-3,-83.13806,42.361167,330,POINT (-83.13806 42.36117)
94,BLD2018-07653,10000,,LINWOOD,,(a.k.a. 2470 Collingwood - local address) Chan...,CofO Issued,2019-07-18,10007460.,-83.110907,42.382198,95,POINT (-83.11091 42.38220)
167,BLD2020-02893,12651,,STOUT,,Revision to BLD2020-01402 to reflect Electrica...,CofO Issued,2020-10-12,22101939-46,-83.241149,42.380392,168,POINT (-83.24115 42.38039)
333,BLD2016-08798,2471,,EWALD CIRCLE,,,CofO Issued,2019-07-29,14005949-60,-83.138432,42.391711,334,POINT (-83.13843 42.39171)
446,BLD2019-03717,8126,,GREENFIELD,,Change of Occupancy/Use to Office and alterati...,CofO Issued,2021-01-27,22049495-6,-83.196373,42.352743,447,POINT (-83.19637 42.35274)
543,BLD2017-00001,240,,ALFRED,,"EXPIRES: MAY 10, 2019ERECT A 4 STORY TOWNHOUSE...",CofO Issued,2022-02-03,-501,-83.0517,42.3438,544,POINT (-83.05170 42.34380)
386,BLD2019-06446,465,,SCHAEFER,,"Modify BSEED #36-16, Change of Occupancy to ad...",CofO Issued,2021-02-22,20017996.002,-83.163066,42.285944,387,POINT (-83.16307 42.28594)
404,BLD2019-05993,1375,,Michigan,Ave,Revision to BLD2019-00172 per plans.\r\n(Per S...,CofO Issued,2021-07-30,06000393-9,-83.064965,42.331106,405,POINT (-83.06497 42.33111)


In [20]:
occupancy_gdf_empty[occupancy_gdf_empty['record_id']=='BLD2019-00680']

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,descriptio,status,date_statu,parcel_id,lon,lat,ObjectId,geometry
84,BLD2019-00680,2809,,Brush,,"Erect 4 story , 8 unit townhomes as per eplan...",CofO Issued,2021-04-05,,,,85,


In [21]:
occupancy_gdf_empty['descriptio'][occupancy_gdf_empty['record_id']=='BLD2019-00680'].values

array(['Erect  4 story , 8 unit townhomes as per eplans w/ a certificate of appropriateness'],
      dtype=object)

In [22]:
occupancy_gdf_empty['descriptio'][occupancy_gdf_empty['record_id']=='BLD2020-01564'].values

array(['Modify previous Change of Use Permit to Provisioning Center by adding grow facility; changes to the restroom facilities.'],
      dtype=object)

In [23]:
occupancy_gdf_empty['descriptio'][occupancy_gdf_empty['record_id']=='BLD2019-00033'].values

array(['INTERIOR ALTERATIONS TO ESTABLISH USE FOR TENANT SPACE AS COSMETIC RETAIL\nPERMANENT CERTIFICATE OF OCCUPANCY ISSUED (03-20-2019)'],
      dtype=object)

In [24]:
occupancy_gdf_empty['descriptio'][occupancy_gdf_empty['record_id']=='BLD2020-04413'].values[0]

'Interior alterations per plans.(1500 E. Woodbridge Suite address per plans, Separate Tenant Build-Out Permit required to establish Occupancy). Subject to all Applicable Federal, State, and Local Executive Orders.\r\n(AKA 1583 Franklin)'

- note: this building is at the corner of E. Woodbridge and Franklin (hence the AKA 1583 Franklin)

In [25]:
occupancy_gdf_empty[occupancy_gdf_empty['record_id']=='BLD2020-04413']

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,descriptio,status,date_statu,parcel_id,lon,lat,ObjectId,geometry
393,BLD2020-04413,1522,,WOODBRIDGE,,Interior alterations per plans.(1500 E. Woodbr...,CofO Issued,2021-04-12,,,,394,


In [26]:
occupancy_gdf_empty['descriptio'][occupancy_gdf_empty['record_id']=='BLD2019-04976'].values

array(["AKA 2327 Trumbull Ave. Unit 20. Per BZA #4-18, Construct 34' L X 21' W X 37' H Townhouse per plans."],
      dtype=object)

In [27]:
occupancy_gdf_empty['descriptio'][occupancy_gdf_empty['record_id']=='BLD2017-06240'].values

array(['AKA 8032, 8040, 8046, 8056 MEMORIAL. ERECTION OF ONE 4 UNIT ONE STORY WOOD FRAMED TOWNHOUSE AS PER PLANS. SEE BLD2017-00831 FOR MASTER SET OF PLANS.'],
      dtype=object)
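
Several of the descriptions above state a unit count (`(11) unit`, `8 unit`, `ONE 4 UNIT`), which relates to the unit-data gap noted in the project description. The sketch below pulls the first such count out of a description; `UNIT_RE` and `extract_unit_count` are hypothetical names, and the pattern is an assumption based only on the examples printed in this notebook.

```python
import re

# A number, optionally in parentheses, directly before "unit" or "units".
UNIT_RE = re.compile(r"\(?(\d+)\)?\s*-?\s*units?\b", re.IGNORECASE)

def extract_unit_count(description):
    """Return the first unit count found in a description, or None."""
    if not isinstance(description, str):
        return None
    match = UNIT_RE.search(description)
    return int(match.group(1)) if match else None

print(extract_unit_count("(AKA 3321 Cochrane) Construct (11) unit Rowhouse building"))
print(extract_unit_count("Erect  4 story , 8 unit townhomes as per eplans"))
```

Because the number must come before the word "unit", a unit label such as "Unit 20" in the Trumbull description is not mistaken for a count.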

## Rental Statuses / Rental Registrations
### Notes:
- https://data.detroitmi.gov/datasets/rental-statuses-1/explore
- no descriptions / free text, which could limit our ability to geocode the missing records
- BSEED data relies heavily on parcels, so it can't really account for buildings that have both rentals _and_ owner-occupied units, like 120 Seward
- so, those mixed buildings will not be fully represented in this data, and the counts will be low. 120 Seward has only 2 observations
- BSEED probably only cares about building-level data, i.e. is the building sound and safe
- BSEED is a city department (https://detroitmi.gov/departments/buildings-safety-engineering-and-environmental-department)

In [28]:
rental_gdf = gpd.read_file('../data/Rental_Statuses/Rental_Statuses.shp')

In [29]:
len(rental_gdf[rental_gdf['geometry'].isna()])/len(rental_gdf)

0.00803870216790339

In [30]:
len(rental_gdf[rental_gdf['geometry'].isna()])

221

In [31]:
len(rental_gdf)

27492

In [35]:
rental_gdf[(rental_gdf['street_num']=='120') & (rental_gdf['street_nam']=='SEWARD')]

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,date_statu,zip,record_typ,owner_name,owner_addr,owner_city,owner_stat,owner_zip,task,status,parcel_id,lon,lat,ObjectId,geometry
8322,REG2020-08775,120,,SEWARD,,2020-12-21,48202,Rental Registration,"THOMPSON, BARRY E",120 SEWARD UNIT 401,DETROIT,MI,48202,Issue Registration,Issued,2001199.0,-83.079031,42.374195,8323,POINT (-83.07903 42.37420)
25324,RNT2020-00032,120,,SEWARD,,2020-01-03,48202,Rental Property Initial Registration,ELIZABETH TINTINALLI,863 BARRINGTON ROAD,GROSSE POINTE PARK,MI,48230,Issue Registration,Issued,2001199.0,-83.079031,42.374195,25325,POINT (-83.07903 42.37420)


- 18224 Hartwell could be a typo (Google Maps shows a point right at the Lodge and Hartwell, with no address there) - we _could_ look Donna Coulter up in the assessor's db (the parcel dataset on the open data portal) to see whether the address is correct there

In [45]:
rental_gdf[rental_gdf['geometry'].isna()].sample(5)

Unnamed: 0,record_id,street_num,street_dir,street_nam,street_typ,date_statu,zip,record_typ,owner_name,owner_addr,owner_city,owner_stat,owner_zip,task,status,parcel_id,lon,lat,ObjectId,geometry
12290,REG2021-04347,1533.0,,LARNED,,2021-05-21,48207.0,Rental Registration,ORLEANS OWNER LLC,2550 TELEGRAPH RD SUITE 200,BLOOMFIELD HILLS,MI,48302,Issue Registration,Issued,,,,12291,
5678,REG2020-06129,4018.0,,29th,,2020-09-08,48221.0,Rental Registration,Newberry Homes LDHA-LP,16250 Northland Dr. Suite 301,Southfield,MI,48075,Issue Registration,Issued,,,,5679,
14427,REG2021-06367,5010.0,,OPAL,,2021-08-05,48236.0,Rental Registration,"NUNZIO, RUISE",5010 CANYON ST,GROSSE POINTE,MI,48236 221,Issue Registration,Issued,,,,14428,
27482,RNTR2021-00099,5061.0,,-67 E OUTER DRIVE BLDG F,,2021-05-05,48234.0,Rental Property Registration Renewal,INTERNATIONAL APARTMENTS LLC,32500 W EIGHT MILE,FARMINGTON,MI,48336,Issue Registration,Issued,,,,27483,
25180,RNT2019-08295,23401.0,WEST,Eight MILE,,2019-09-16,48219.0,Rental Property Initial Registration,BONNIEVIEW APARTMENTS LLC,15777 W 10 MILE RD suite 101,SOUTHFIELD,MI,48076,Issue Registration,Issued,,,,25181,


### example for RNTR2021-00099 / 5061 E outer drive bldg F
- 17016320.003 is parcel ID
- dashes (like -67 E outer drive) trip up the geocoder
- we manually looked this up on https://cityofdetroit.github.io/base-unit-tools
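
One way to reduce this kind of geocoder failure is to normalize the street fields first. The sketch below assumes that the leading `-67` is the tail of a house-number range (5061-67) and that a trailing `BLDG F` is a building designator; both readings are guesses from this single record, and the function names are hypothetical.

```python
import re

def clean_street_name(street_nam):
    """Drop a leading '-NN' range suffix and a trailing 'BLDG X' designator."""
    name = street_nam.strip()
    # Assumed: a leading "-67" is the end of a house-number range.
    name = re.sub(r"^-\s*\d+\s*", "", name)
    # Assumed: a trailing "BLDG F" is not part of the street name.
    name = re.sub(r"\s+BLDG\.?\s+\w+$", "", name, flags=re.IGNORECASE)
    return name

def format_street_num(street_num):
    """Render a house number that was read as a float (5061.0) as '5061'."""
    return str(int(float(street_num)))

address = f"{format_street_num(5061.0)} {clean_street_name('-67 E OUTER DRIVE BLDG F')}"
print(address)
```

The removed pieces are worth keeping in a separate column, since the building letter may matter when linking to building IDs.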

### example for REG2021-06367 / 5010 OPAL
- 21078528. is parcel ID
- looking it up in the Base Units Explorer, it is listed as 5010 Canyon - BSEED thought this was Opal (it's on the corner of Opal and Canyon) - could be that the assessor thought it was on Opal
- we can report this to the assessor's office to fix

## Parcel Data
- https://data.detroitmi.gov/datasets/parcels-2/explore?location=42.352680%2C-83.099134%2C10.81
- fuzzy match on Donna Coulter
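
The lookup in this notebook uses exact substring checks on taxpayer_1. A looser comparison that tolerates word order and punctuation could be sketched with the standard library's `difflib`; the helper names and the 0.8 threshold are assumptions for illustration, not a tuned setting.

```python
from difflib import SequenceMatcher

def normalize_name(name):
    """Uppercase, replace punctuation with spaces, and sort the name tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in name.upper())
    return " ".join(sorted(cleaned.split()))

def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring token order."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()

def best_matches(query, candidates, threshold=0.8):
    """Return (candidate, score) pairs at or above the threshold, best first."""
    scored = [(c, name_similarity(query, c)) for c in candidates]
    return sorted([s for s in scored if s[1] >= threshold],
                  key=lambda s: -s[1])

taxpayers = ["COULTER, DONNA M", "ORLEANS OWNER LLC"]
print(best_matches("Donna Coulter", taxpayers))
```

Sorting the tokens lets "Donna Coulter" line up with "COULTER, DONNA M" even though the parcel data lists the surname first.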

In [36]:
parcel_gdf = gpd.read_file('../data/Parcels/Parcels.shp')

In [40]:
parcel_gdf_nona = parcel_gdf[~parcel_gdf['taxpayer_1'].isna()]

In [42]:
parcel_gdf_nona[(parcel_gdf_nona['taxpayer_1'].str.contains('COULTER')) & parcel_gdf_nona['taxpayer_1'].str.contains('DONNA')]

Unnamed: 0,OBJECTID,object_id,parcel_num,ward,address,council_di,zip_code,taxpayer_1,taxpayer_2,taxpayer_s,...,assessed_v,taxable_va,landmap,related,zoning,subdivisio,legal_desc,SHAPE_Leng,SHAPE_Area,geometry
33379,33380,37280,16019088.0,16,18224 STOEPEL,2,48221,"COULTER, DONNA M",,7390 OLD MILL RD,...,26200.0,16094.0,169,,R1,,E STOEPEL 371 CANTERBURY GARDEN NO 1 L37 P66 P...,0.000964,3.711491e-08,"POLYGON ((-83.14213 42.42488, -83.14175 42.424..."


- alright, 18224 HARTWELL should've been 18224 STOEPEL - an alternate goal is to document any mistakes I find in the geocoding / data entry, and give that list to the city. Needs to be added to the db
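
A simple way to keep track of these findings is a running log that can be exported as CSV. This is only a sketch: the column names are hypothetical, and the two rows restate the address discrepancies noted in this notebook.

```python
import csv
import io

FIELDS = ["recorded_address", "alternate_address", "evidence"]

# Discrepancies noted so far in this notebook.
corrections = [
    {"recorded_address": "18224 HARTWELL", "alternate_address": "18224 STOEPEL",
     "evidence": "taxpayer name lookup in the parcel dataset"},
    {"recorded_address": "5010 OPAL", "alternate_address": "5010 CANYON",
     "evidence": "Base Units Explorer lookup"},
]

def corrections_to_csv(rows):
    """Serialize correction rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

report = corrections_to_csv(corrections)
print(report)
```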

## Geocoding
- first 2.5k free...?
https://developers.google.com/maps/documentation/geocoding/#Limits
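
As a quick sanity check on whether the free tier would be enough, the missing-geometry counts from the cells above can be summed and compared with the quota. The 2,500 figure is the unconfirmed number from the note above, not a verified limit, so it is treated as an assumption here.

```python
# Rows with missing geometry, taken from the cell outputs in this notebook.
missing = {"compliance": 33, "occupancy": 97, "rental": 221}

# Assumed free quota: the "2.5k" figure above is unconfirmed.
ASSUMED_FREE_REQUESTS = 2500

total_missing = sum(missing.values())
fits_in_quota = total_missing <= ASSUMED_FREE_REQUESTS
print(total_missing, fits_in_quota)  # 351 True
```

Retries and AKA alternates would add requests on top of the 351 base addresses.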