## Project 5 - Rental Statuses
- created 7-10-22 by GTP
- https://docs.google.com/document/d/1LIJTlCsx54zIG5sOX3heSj00YqdzhZ9c/edit
- this notebook focuses specifically on the rental status dataset:
### Goals:
1. geocode missing lat/lons for rental statuses
2. build a rental status table with building IDs attached (address IDs are nice to have)
- *Description*: BSEED has entered a lot of data into free-text fields within Accela. It would be useful to find ways to scrape and organize this data so it is usable. Unit data and Certificates of Occupancy are some of our biggest gaps. This might be a way to use administrative data to version and validate 2020 data.
- Technical Skill Level: Medium-High. Skilled at applying Regex to text strings using SQL and/or Python. Experience working with geospatial data, in ArcGIS or otherwise.
- Scope: There are 595 records in the Certificates of Occupancy dataset and 5,930 records in the Certificates of Compliance dataset. Depending on skill level, this could take 6-8 weeks.
- Inputs: Certificates of Compliance, Certificates of Occupancy, Rental Registration data
- General Process:
  - Use GIS or the Base Units Explorer tool to link Certificates of Occupancy to specific building IDs, creating timestamps for when each building was ready for occupants.
  - Geocode the addresses in the Certificate of Compliance and Rental Registration datasets, and flag any addresses that can't be matched even after manual rematching, since they may be missing from the database altogether.


In [31]:
#import data libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math
import numbers
import decimal
#import data science packages
import scipy
import scipy.stats as stats

np.random.seed(222)
%matplotlib inline
# set_option changes the display setting for the whole session; option_context only applies inside a `with` block
pd.set_option('display.max_columns', 999)

In [3]:
#import geographic analysis libraries
import geopandas as gpd
from geopandas import GeoDataFrame
import shapely as shp
from shapely.geometry import Point
from shapely.geometry import shape
import os
import re
from fiona.crs import from_epsg
import pysal as ps
# the name `gmaps` is assigned to the googlemaps client below, so the jupyter-gmaps module is not imported here
import googlemaps



In [4]:
# Create the Google Maps client; the API key is read from the GOOGLE_GEOCODER_API environment variable
gmaps = googlemaps.Client(key=os.environ['GOOGLE_GEOCODER_API'])

In [5]:
# set the CRS (WGS 84 lat/lon) for the entire analysis
crs = 'EPSG:4326'  # the {'init': 'epsg:4326'} form is deprecated in pyproj 2+

### Data sources

Rental Registrations: https://data.detroitmi.gov/datasets/rental-statuses-1/explore
- (6-1-22): I'll address this with Alice on our call next week

Base Units: https://base-units-detroitmi.hub.arcgis.com/datasets/detroitmi::units-1/about
- Jimmy McBroom put this together

Base Units Explorer: https://cityofdetroit.github.io/base-unit-tools/explorer

## Rental Statuses / Rental Registrations
### Notes:
- https://data.detroitmi.gov/datasets/rental-statuses-1/explore
- there are no description or free-text fields, which could limit how well these records can be geocoded
- BSEED data relies too heavily on parcels, so it can't really account for buildings that contain both rental _and_ owner-occupied units, like 120 Seward
- as a result, not all of those mixed buildings are in this data, and their counts will be low: 120 Seward has only 2 observations
- BSEED probably only cares about building-level data, i.e. whether the building is sound and safe
- BSEED is a city department (Buildings, Safety Engineering, and Environmental Department: https://detroitmi.gov/departments/buildings-safety-engineering-and-environmental-department)

## Questions for Alice (7-10-22):
1. What is the relationship between unit_id and occupancy_gdf? `addr_id` is null for every record.
2. Which geocoder does the city use?


In [6]:
rental_gdf = gpd.read_file('../data/Rental_Statuses/Rental_Statuses.shp')

In [7]:
len(rental_gdf[rental_gdf['geometry'].isna()])/len(rental_gdf)

0.00803870216790339

In [8]:
len(rental_gdf[rental_gdf['geometry'].isna()])

221

In [9]:
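# .copy() makes this an independent frame, so adding columns below doesn't raise SettingWithCopyWarning
rental_gdf_nogeocode = rental_gdf[rental_gdf['geometry'].isna()].copy()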

In [10]:
len(rental_gdf_nogeocode)

221

In [11]:
rental_gdf_nogeocode['address_for_geocode'] = rental_gdf_nogeocode['street_num'].astype(str) + ' ' +\
                                                  rental_gdf_nogeocode['street_dir'].astype(str) + ' ' +\
                                                  rental_gdf_nogeocode['street_nam'].astype(str) + ' ' +\
                                                  rental_gdf_nogeocode['street_typ'].astype(str) + ' ' +\
                                                  'DETROIT MI'



In [19]:
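# drop the literal 'None' left by missing components, then collapse the leftover repeated spaces
rental_gdf_nogeocode['address_for_geocode'] = rental_gdf_nogeocode['address_for_geocode'].str.replace('None', '', regex=False).str.replace(r'\s+', ' ', regex=True).str.strip()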



In [20]:
rental_gdf_nogeocode.sample(5)

| | record_id | street_num | street_dir | street_nam | street_typ | date_statu | zip | record_typ | owner_name | owner_addr | ... | status | parcel_id | lon | lat | ObjectId | geometry | address_for_geocode | lat_lon | new_lat | new_lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27480 | RNTR2021-00097 | 5091.0 | | -93 E OUTER DRIVE BLDG D | | 2021-05-05 | 48234.0 | Rental Property Registration Renewal | INTERNATIONAL APARTMENTS LLC | 32500 W EIGHT MILE | ... | Issued | | | | 27481 | | 5091 -93 E OUTER DRIVE BLDG D DETROIT MI | (42.331427, -83.0457538) | 42.331427 | -83.045754 |
| 130 | REG2020-00785 | 1503.0 | | LARNED | | 2020-02-01 | 48207.0 | Rental Registration | ORLEANS OWNERS LLC | 1531 E LARNED | ... | Issued | | | | 131 | | 1503 LARNED DETROIT MI | (42.3384166, -83.0243986) | 42.338417 | -83.024399 |
| 22375 | REG2022-04536 | 13910.0 | | SOUTHFIELD | | 2022-03-31 | 48227.0 | Rental Registration | NORTH WEST INVESTMENTS 3 LLC | 21348 Telegraph Road | ... | Issued | | | | 22376 | | 13910.0 SOUTHFIELD DETROIT MI | (42.3521947, -83.2159785) | 42.352195 | -83.215979 |
| 12263 | REG2021-04320 | 1505.0 | | LARNED | | 2021-05-21 | 48207.0 | Rental Registration | ORLEANS OWNER LLC | 2550 TELEGRAPH RD SUITE 200 | ... | Issued | | | | 12264 | | 1505.0 LARNED DETROIT MI | (42.3384166, -83.0243986) | 42.338417 | -83.024399 |
| 17282 | REG2021-10829 | 15452.0 | | SOUTHFIELD | | 2021-12-01 | 48227.0 | Rental Registration | FORECLOSURE CAPITAL OF AMERICA | 1801 CENTURY PARK E STE 2400 | ... | Issued | | | | 17283 | | 15452.0 SOUTHFIELD DETROIT MI | (42.3521947, -83.2159785) | 42.352195 | -83.215979 |
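The sample above shows two side effects of building the address with `astype(str)`: float street numbers keep a trailing `.0` (`13910.0 SOUTHFIELD`), and missing components have to be patched out afterwards. A minimal sketch of a cleaner builder, assuming the same four component columns and that street numbers are whole numbers stored as floats; `build_geocode_address` is a hypothetical helper, not part of the original notebook:

```python
import math


def _is_missing(value):
    """True for None, NaN, and blank strings (how shapefile nulls tend to arrive)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def build_geocode_address(street_num, street_dir, street_nam, street_typ, city='DETROIT MI'):
    """Join address components into one line for the geocoder, skipping missing parts.

    Assumption: street numbers are whole numbers, usually stored as floats (e.g. 13910.0).
    """
    parts = []
    if not _is_missing(street_num):
        try:
            num = float(street_num)
            # 13910.0 -> "13910"; a non-integer value is kept as-is rather than truncated
            parts.append(str(int(num)) if num.is_integer() else str(street_num))
        except (TypeError, ValueError):
            # non-numeric house numbers (e.g. '5091A') are passed through unchanged
            parts.append(str(street_num).strip())
    for component in (street_dir, street_nam, street_typ):
        if not _is_missing(component):
            parts.append(str(component).strip())
    parts.append(city)
    return ' '.join(parts)
```

In the notebook this would be applied row-wise, e.g. `rental_gdf_nogeocode.apply(lambda r: build_geocode_address(r['street_num'], r['street_dir'], r['street_nam'], r['street_typ']), axis=1)`.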


In [21]:
def google_geocode(address_to_geocode):
    # return the (lat, lon) of Google's top match, or (None, None) if nothing matched
    geocode_result = gmaps.geocode(address_to_geocode)
    if not geocode_result:
        return None, None
    location = geocode_result[0]['geometry']['location']
    return location['lat'], location['lng']
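`google_geocode` keeps only the coordinates of the top result. The Geocoding API also reports how precise each match is in `geometry.location_type` (`ROOFTOP`, `RANGE_INTERPOLATED`, `GEOMETRIC_CENTER`, `APPROXIMATE`) and sets `partial_match` when only part of the address matched. A sketch of a variant that keeps those fields so imprecise matches can be sent to manual review; the function name and the choice of which types count as "precise" are assumptions:

```python
# Assumption: which location types are treated as good enough to skip manual review
PRECISE_TYPES = {'ROOFTOP', 'RANGE_INTERPOLATED'}


def geocode_with_quality(client, address):
    """Geocode one address with a googlemaps-style client and report match quality.

    Returns a dict with lat, lon, location_type, partial_match, and needs_review.
    Unmatched addresses come back with None coordinates and needs_review=True.
    """
    results = client.geocode(address)
    if not results:
        return {'lat': None, 'lon': None, 'location_type': None,
                'partial_match': False, 'needs_review': True}
    top = results[0]
    location = top['geometry']['location']
    location_type = top['geometry'].get('location_type')
    partial = bool(top.get('partial_match', False))
    return {
        'lat': location['lat'],
        'lon': location['lng'],
        'location_type': location_type,
        'partial_match': partial,
        # flag anything that is not a precise, full match for manual checking
        'needs_review': partial or location_type not in PRECISE_TYPES,
    }
```

In the notebook the client would be the `gmaps` object created earlier, e.g. `geocode_with_quality(gmaps, '1503 LARNED DETROIT MI')`.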

In [22]:
def return_lat(lat_lon):
    return lat_lon[0]

def return_lon(lat_lon):
    return lat_lon[1]

In [23]:
rental_gdf_nogeocode['lat_lon'] = rental_gdf_nogeocode['address_for_geocode'].apply(google_geocode)



In [24]:
rental_gdf_nogeocode['new_lat'] = rental_gdf_nogeocode['lat_lon'].apply(return_lat)
rental_gdf_nogeocode['new_lon'] = rental_gdf_nogeocode['lat_lon'].apply(return_lon)
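In the sample output above, different addresses come back with identical coordinates: both Southfield addresses at `(42.3521947, -83.2159785)` and both Larned addresses at `(42.3384166, -83.0243986)`. That pattern may mean the geocoder fell back to a street-level point instead of matching the house number, though neighboring addresses in one building could legitimately share a point. A minimal check, written as a hypothetical helper over plain sequences, that groups distinct addresses sharing one point so they can be reviewed by hand:

```python
from collections import defaultdict


def find_shared_points(addresses, lat_lons, precision=6):
    """Group distinct addresses that were geocoded to the same (rounded) point.

    addresses and lat_lons are parallel sequences; a lat_lon may be (None, None)
    for an address the geocoder could not match. Returns {point: sorted addresses}
    for every point shared by two or more distinct addresses.
    """
    groups = defaultdict(set)
    for address, (lat, lon) in zip(addresses, lat_lons):
        if lat is None or lon is None:
            continue  # unmatched addresses are a separate problem
        groups[(round(lat, precision), round(lon, precision))].add(address)
    return {point: sorted(addrs) for point, addrs in groups.items() if len(addrs) > 1}
```

Usage would be `find_shared_points(rental_gdf_nogeocode['address_for_geocode'], rental_gdf_nogeocode['lat_lon'])`.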

In [26]:
rental_gdf_nogeocode.to_csv('../data/exports/rental_gdf_nogeocode_geocoded.csv')

## Match building IDs to record_ids in rental_gdf_nogeocode (address IDs are nice to have)
- Process for looking up building IDs in the rental data:
1. Look the address up manually (zooming in as needed) in the Base Units Explorer: https://cityofdetroit.github.io/base-unit-tools/explorer
2. If I can find something close, great!
3. If not, look up the record's owner_name in the parcel data, i.e. `parcel_gdf[parcel_gdf['taxpayer_1'].str.contains('<name to look up>', na=False)]`
4. If I find a building_id, add it to https://docs.google.com/spreadsheets/d/1DVmnUbSJ4FDOLdcrCOZX5To_WMc1nk9agYTxK5jyFBU/edit#gid=0
5. If not, say so in the notes column
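Step 3 can be wrapped in a small helper. A sketch, assuming the parcel table's `taxpayer_1` column as used in the parcel cells; `find_parcels_by_owner` is a hypothetical name. It matches case-insensitively, treats the name as literal text rather than a regex (owner names can contain characters such as `.` or `(`), and uses `na=False` so parcels with no taxpayer don't need to be dropped first:

```python
import pandas as pd


def find_parcels_by_owner(parcel_df: pd.DataFrame, name: str, column: str = 'taxpayer_1') -> pd.DataFrame:
    """Return parcels whose owner column contains `name` (case-insensitive, literal match).

    na=False treats missing taxpayer names as non-matches instead of propagating NaN.
    """
    mask = parcel_df[column].str.contains(name, case=False, regex=False, na=False)
    return parcel_df[mask]
```

For example, `find_parcels_by_owner(parcel_gdf, 'martina')` should return the same rows as the `str.contains('MARTINA')` search on `parcel_gdf_nona`.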

### base units table
- source: https://base-units-detroitmi.hub.arcgis.com/

- 18224 Hartwell could be a typo: Google Maps shows a point right at the Lodge and Hartwell, with no address there. We _could_ look up Donna Coulter in the assessor's database (the parcel dataset on the open data portal) to see whether the address is correct there.

### example for RNTR2021-00099 / 5061 E Outer Drive Bldg F
- the parcel ID is 17016320.003
- dashes (like -67 E Outer Drive) trip up the geocoder
- we looked this up manually on https://cityofdetroit.github.io/base-unit-tools
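In the data, this shows up as a street number of `5091` with a street name of `-93 E OUTER DRIVE BLDG D`. Assuming `-93` is the end of an address range (5091 to 5093) and `BLDG D` is a building designator, a regex can split those pieces off before geocoding. This is a sketch under those assumptions; the pattern and helper name are hypothetical and only cover the forms seen in this notebook:

```python
import re

# Optional leading "-93" range suffix, the street itself, optional trailing "BLDG D".
# Assumption: these are the only free-text variants; other forms pass through as the street.
RANGE_BLDG_PATTERN = re.compile(
    r'^\s*(?:-(?P<range_end>\d+)\s+)?(?P<street>.*?)(?:\s+BLDG\s+(?P<bldg>\w+))?\s*$',
    re.IGNORECASE,
)


def split_street_name(street_nam):
    """Split a street_nam value into (street, range_end, building); missing parts are None."""
    match = RANGE_BLDG_PATTERN.match(street_nam)
    return match.group('street'), match.group('range_end'), match.group('bldg')
```

The cleaned street could then replace `street_nam` when building the geocoder address, giving `5091 E OUTER DRIVE DETROIT MI` for the record above.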

### example for REG2021-06367 / 5010 OPAL
- the parcel ID is 21078528.
- the Base Units Explorer lists this as 5010 Canyon, while BSEED recorded it on Opal (the property is on the corner of Opal and Canyon); it could also be that the assessor had it on Opal
- we can report this to the assessor's office so they can fix it

## Parcel Data
- https://data.detroitmi.gov/datasets/parcels-2/explore?location=42.352680%2C-83.099134%2C10.81
- fuzzy-match taxpayer names against Donna Coulter (for the 18224 Hartwell lookup)
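The parcel search in the cells that follow uses an exact substring, which misses typos and reordered names. A fuzzy match is one option; this sketch uses the standard library's `difflib.SequenceMatcher` on token-sorted, upper-cased names, so `Donna Coulter` and the assessor-style `COULTER, DONNA` compare as equal. The 0.85 cutoff and the helper names are assumptions to tune against real results, not a tested threshold:

```python
import re
from difflib import SequenceMatcher


def normalize_name(name):
    """Upper-case, drop punctuation, and sort tokens so 'COULTER, DONNA' == 'Donna Coulter'."""
    tokens = re.findall(r'[A-Z0-9]+', name.upper())
    return ' '.join(sorted(tokens))


def fuzzy_owner_matches(target, candidates, cutoff=0.85):
    """Return (candidate, score) pairs scoring at least `cutoff`, best match first."""
    target_norm = normalize_name(target)
    scored = []
    for candidate in candidates:
        if not isinstance(candidate, str):
            continue  # skip missing taxpayer names (None / NaN)
        score = SequenceMatcher(None, target_norm, normalize_name(candidate)).ratio()
        if score >= cutoff:
            scored.append((candidate, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Run against all parcels, e.g. `fuzzy_owner_matches('DONNA COULTER', parcel_gdf['taxpayer_1'])`, this compares every taxpayer name in pure Python, so it may take a while.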

In [33]:
parcel_gdf = gpd.read_file('../data/Parcels/Parcels.shp')

In [41]:
parcel_gdf_nona = parcel_gdf[~parcel_gdf['taxpayer_1'].isna()]

In [46]:
parcel_gdf_nona[(parcel_gdf_nona['taxpayer_1'].str.contains('MARTINA'))]

| | OBJECTID | object_id | parcel_num | ward | address | council_di | zip_code | taxpayer_1 | taxpayer_2 | taxpayer_s | ... | assessed_v | taxable_va | landmap | related | zoning | subdivisio | legal_desc | SHAPE_Leng | SHAPE_Area | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 432 | 433 | 3514 | 16013207. | 16 | 4013 JUNCTION | 6 | 48210.0 | SANCHEZ, MARTINA, TORRES, YOLANDA & | | 4013 JUNCTION | ... | 12500.0 | 5339.0 | 104 | | R2 | | W JUNCTION 35 BLK E-BRUSHS SUB L16 P24 PLATS, ... | 0.000878 | 3.045535e-08 | POLYGON ((-83.11346 42.33280, -83.11314 42.332... |
| 493 | 494 | 3576 | 16001715-6 | 16 | 5656 MICHIGAN AVE | 6 | 48210.0 | SANROMAN, JAMIE & MARTINA | | 32763 BONDIE DR | ... | 103000.0 | 16657.0 | 104 | | B3 | | N MICHIGAN 11 & 12 BLK B-BRUSHS SUB L16 P24 PL... | 0.000919 | 5.081433e-08 | POLYGON ((-83.11280 42.33160, -83.11262 42.331... |
| 5207 | 5208 | 7902 | 18002344-5 | 18 | 7123 MICHIGAN AVE | 6 | 48210.0 | MEDINA, MARTINA | | 7111 MICHIGAN AVE | ... | 3900.0 | 3900.0 | 112 | | B4 | | S MICHIGAN 143&142 EXC MICHIGAN AVE AS WD C A ... | 0.000657 | 2.669341e-08 | POLYGON ((-83.13236 42.33097, -83.13221 42.330... |
| 5242 | 5243 | 7937 | 18002343. | 18 | 7111 MICHIGAN AVE | 6 | 48210.0 | MEDINA, MARTINA | | 7111 MICHIGAN AVE | ... | 34200.0 | 16671.0 | 112 | | B4 | | S MICHIGAN 144 EXC MICHIGAN AVE AS WD C A & J ... | 0.000511 | 1.345893e-08 | POLYGON ((-83.13221 42.33097, -83.13214 42.330... |
| 5270 | 5271 | 7964 | 18002342. | 18 | 7109 MICHIGAN AVE | 6 | 48210.0 | MEDINA, MARTINA | | 7109 MICHIGAN AVE. | ... | 13300.0 | 4967.0 | 112 | | B4 | | S MICHIGAN 145 EXC MICHIGAN AVE AS WD C A & J ... | 0.000511 | 1.34592e-08 | POLYGON ((-83.13214 42.33097, -83.13206 42.330... |
| 5318 | 5319 | 9610 | 18008974. | 18 | 2364 GREEN | 6 | 48209.0 | MEDINA, FRANCISCO A & MARTINA | | 2394 GREEN | ... | 23700.0 | 14690.0 | 115 | | R2 | | E GREEN 57 RIEDENS SUB L29 P77 PLATS, W C R 18... | 0.001038 | 3.743559e-08 | POLYGON ((-83.12166 42.31489, -83.12165 42.314... |
| 9401 | 9402 | 13293 | 20007311. | 20 | 2385 INGLIS | 6 | 48209.0 | HERNANDEZ, JOSE & GARCIA, MARTINA | | 2385 INGLIS ST | ... | 13400.0 | 6756.0 | 124 | | R2 | | W INGLIS 136 GRANTORS SUB L14 P27 PLATS, W C R... | 0.000878 | 3.039149e-08 | POLYGON ((-83.12966 42.31195, -83.12965 42.311... |
| 9951 | 9952 | 13842 | 18009780. | 18 | 4632 CENTRAL | 6 | 48210.0 | MEDINA, MARTINA | | 4632 CENTRAL | ... | 0.0 | 0.0 | 126 | | R2 | | PROPERTY EXEMPT FROM AD VALOREM TAXES AND ASSE... | 0.000952 | 3.372137e-08 | POLYGON ((-83.13648 42.32964, -83.13612 42.329... |
| 10833 | 10834 | 13117 | 20007304. | 20 | 2431 INGLIS | 6 | 48209.0 | MEDINA, FRANCISCO A & MARTINA | | 2392 N GREEN ST | ... | 11800.0 | 7406.0 | 124 | | R2 | | W INGLIS 129 GRANTORS SUB L14 P27 PLATS, W C R... | 0.000878 | 3.039157e-08 | POLYGON ((-83.13003 42.31246, -83.13002 42.312... |
| 11046 | 11047 | 12548 | 20004192. | 20 | 7743 W VERNOR | 6 | 48209.0 | SANROMAN, JAMIE & MARTINA | | 32762 BONDIE DR | ... | 85600.0 | 31609.0 | 124 | | B4 | | S VERNOR HIGHWAY 50&51 FERNDALE AVE SUB L30 P5... | 0.000881 | 4.077866e-08 | POLYGON ((-83.12505 42.31222, -83.12492 42.312... |


In [42]:
len(parcel_gdf_nona)

381183

- Resolved: 18224 HARTWELL should have been 18224 STOEPEL. A secondary goal is to document any mistakes I find in the geocoding or data entry and give them to the city, so the corrections can be added to the database.
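For reporting mistakes to the city, a flat log with one row per correction is probably enough. A sketch using the standard library's `csv` and `dataclasses` modules; the class, the column names, and the `source` wording are assumptions, and the only example row is the Hartwell correction noted above (its record ID isn't noted here, so it is left blank):

```python
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class AddressCorrection:
    record_id: str         # BSEED record ID, blank if not yet identified
    recorded_address: str  # address as entered in the rental data
    corrected_address: str
    source: str            # how the correction was found
    note: str = ''


def write_corrections(corrections, file_obj):
    """Write corrections as CSV (with a header row) to an open text file."""
    writer = csv.DictWriter(file_obj, fieldnames=[f.name for f in fields(AddressCorrection)])
    writer.writeheader()
    for correction in corrections:
        writer.writerow(asdict(correction))


CORRECTIONS = [
    AddressCorrection('', '18224 HARTWELL', '18224 STOEPEL', 'manual lookup'),
]
```

Writing to a file under `../data/exports/` would look like `with open(path, 'w', newline='') as f: write_corrections(CORRECTIONS, f)`; `newline=''` is what the `csv` module documentation recommends for files it writes.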