## Capstone Project "What's Happening in my Neighborhood"
by Lori Butler    

**Data Questions**  
1. Which neighborhoods have had the most, and the least, growth during the past three years, based on the count of building permits by type and on the cost of construction?
2. Do any neighborhoods have recent increases in building permit applications which may signal growth in the near future?
3. In instances where zoning changes are sought, might those be a leading indicator of a subsequent increase in building permit applications? If so, what is the average time lag?  

**Data Sources**  
- All data is through Friday 6/5/2020
- CSV
    - df_bldg_apps = Building Permit Applications, rolling 3 years
    - df_bldg_issued = Building Permits Issued, rolling 3 years
    - df_planning = Planning (Zoning) Department Applications (all pending), and issued (rolling 2 months after issuance)
- GEOJSON
    - df_na_bound = Neighborhood Boundaries GeoJSON (polygon) folders   

**Directory Structure:**  
- data
    - cleaned = clean files, ready to be used for EDA  
    - interim = temporary files used during cleaning
    - raw = unedited csv and Excel files used as the source of analysis
- notebooks
    - 1_dataprep = standard cleanup of columns, data types, and additional data manipulation specific to this project (nulls, lat/lon, rolling week/month, etc.)
    - 2_eda = EDA to create reports, and generate clean files to export to visualization tool.
- notes_and_docs = documents created for reference, metadata, etc
- reports = final reports for presentations  

*Notebook and data structure inspired by **Practical Business Python** article by Chris Moffitt https://pbpython.com/notebook-process.html*
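
The directory layout above can be captured once as path constants so that later cells do not repeat string paths. This is only a sketch: the constant names and the `raw_path` helper are hypothetical, and it assumes the notebook is run from inside the `notebooks` folder (which is what the `../data/raw/...` paths used in this notebook imply).

```python
from pathlib import Path

# Assumption: the notebook runs from notebooks/, so the project root is one level up.
PROJECT_ROOT = Path("..")
DATA_DIR = PROJECT_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"            # unedited source files
INTERIM_DIR = DATA_DIR / "interim"    # temporary files used during cleaning
CLEANED_DIR = DATA_DIR / "cleaned"    # files ready for EDA


def raw_path(filename: str) -> Path:
    """Build a path to a file in data/raw (hypothetical helper)."""
    return RAW_DIR / filename


print(raw_path("Building_Permit_Applications_2020_06_05.csv"))
```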

## Data Prep 01 Notebook: Column and Data Type Cleanup

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd   # Prerequisite: Activate geospatial environment via Conda Prompt
import matplotlib.pyplot as plt  
import folium                   
from folium.plugins import MarkerCluster
import requests  # For use with Google API / Geocoding
import datetime

# May not need this until I do EDA, but importing now as a reminder.
from shapely.geometry import Point  

# To use RegEx to pull out lat/long from building permit applications/issued
import re

## Read in raw files

In [2]:
# Building Dept. Permit Applications

df_bldg_apps = pd.read_csv('../data/raw/Building_Permit_Applications_2020_06_05.csv')
df_bldg_apps.head(2)

Unnamed: 0,Permit #,Permit Type Description,Permit Subtype Description,Parcel,Date Entered,Date Issued,Construction Cost,Address,City,State,ZIP,Subdivision / Lot,Contact,Permit Type,Permit Subtype,IVR Tracking #,Purpose,Council District,Mapped Location
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,03/11/2020,,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944..."
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,12/02/2019,,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36...."


In [3]:
# Confirming shape. OK = 3106 rows (# from data.Nashville.gov website)

df_bldg_apps.shape

(3106, 19)

In [4]:
# Building Dept. Permits Issued
# low_memory = False was added to remove a low-memory warning. Doing this prevents the
# system from trying to assign dtypes until after the full file has been read
# Resource: https://tinyurl.com/stackoverflow-low-memory

df_bldg_issued = pd.read_csv('../data/raw/Building_Permits_Issued_2020_06_05.csv'
                             , low_memory=False)
df_bldg_issued.head(2)

Unnamed: 0,Permit #,Permit Type Description,Permit Subtype Description,Parcel,Date Entered,Date Issued,Construction Cost,Address,City,State,ZIP,Subdivision / Lot,Contact,Permit Type,Permit Subtype,IVR Tracking #,Purpose,Council District,Census Tract,Mapped Location
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,11/18/2019,12/09/2019,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,LOT 49 CARRINGTON PLACE PH 5,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218"
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,03/12/2020,03/12/2020,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,37027,UNIT 70 HEARTHSTONE MANOR CONDOMINIUM PHASE 4,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\..."


In [5]:
# Confirming shape. OK = 33909 rows (# from data.Nashville.gov website)

df_bldg_issued.shape

(33909, 20)
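
`low_memory=False` silences the mixed-type warning, but pandas still has to guess each column's type. An alternative, sketched below on a tiny in-memory sample, is to declare identifier-like columns as text when reading. The choice of columns in `id_dtypes` is an assumption for illustration, and the sample rows are an illustrative mix of values seen in the outputs above, not a real extract.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the raw CSV (illustrative values only).
sample_csv = io.StringIO(
    "Permit #,Parcel,ZIP\n"
    "2019070460,058100C04900CO,37218\n"
    "T2019073204,4600002700,37015\n"
)

# Assumption: permit numbers, parcels, and ZIP codes are identifiers, so keep them as text.
id_dtypes = {"Permit #": str, "Parcel": str, "ZIP": str}

df_sample = pd.read_csv(sample_csv, dtype=id_dtypes)
print(df_sample.dtypes)
```

Reading identifiers as text also avoids a parcel such as `4600002700` being stored as a number while `058100C04900CO` stays a string in the same column.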

In [6]:
# Planning/Zoning Applications & Issued
# Initially read in file through 6/5/2020, but that had wrong dates.

df_planning = pd.read_csv('../data/raw/Planning_Department_Development_Applications_2020_06_05.csv')
df_planning.head(2)

Unnamed: 0,Date Submitted,Application Type Description,MPC Case #,Ordinance #,Status,MPC Meeting Date,MPC Action,Project Name,Location,Reviewer,...,Applicant Address 2,Applicant City,Applicant State,Applicant ZIP,Council 3rd Reading Date,Council 3rd Reading Action,Council District,Latitude,Longitude,Mapped Location
0,04/01/2019,Subdivision (Final Plat),2019S-086-001,,PENDING,06/11/2020,,FINAL PLAT RESUBDIVISION OF LOT 3 AND 4 ON THE...,227 MARCIA AVE 37209,Joren Dunnavant,...,,Nashville,TN,37203,,,20 (Mary Carolyn Roberts),36.143923,-86.868254,"(36.143922831000054, -86.86825400699996)"
1,11/27/2019,Specific Plan (Final Site Plan),2016SP-076-008,,PENDING,01/16/2020,,RED OAKS TOWNHOMES,0 DEW ST 37206,Abbie Rickoff,...,,Nashville,TN,37204,,,06 (Brett Withers),36.165962,-86.75349,"(36.165961579000054, -86.75348957099999)"


In [7]:
# Confirming shape. OK = 521 rows (# from data.Nashville.gov website)

df_planning.shape

(521, 27)
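
The row counts above were confirmed by eye against the counts published on the website. A small check like the one below could make that comparison fail loudly instead. `check_row_count` is a hypothetical helper, and the demo frame is a stand-in so the sketch runs on its own.

```python
import pandas as pd


def check_row_count(df: pd.DataFrame, expected_rows: int, label: str) -> None:
    """Raise ValueError if the DataFrame's row count differs from the expected count."""
    actual = len(df)
    if actual != expected_rows:
        raise ValueError(f"{label}: expected {expected_rows} rows, found {actual}")


# Stand-in frame; in this notebook the calls would use the real frames and counts.
demo = pd.DataFrame({"permit_number": ["T2020016213", "T2019073204"]})
check_row_count(demo, 2, "demo")
print("row count OK")
```

In this notebook the equivalent calls would be along the lines of `check_row_count(df_bldg_apps, 3106, "bldg_apps")`, using the counts noted in the comments above.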

In [8]:
# Neighborhood Assoc boundaries GIS file, using geopandas

df_na_bound = gpd.read_file('../data/raw/Neighborhood Association Boundaries (GIS)_2020_06_03.geojson')
print(df_na_bound.crs)
df_na_bound.head(2)

epsg:4326


Unnamed: 0,name,geometry
0,Historic Buena Vista,"MULTIPOLYGON (((-86.79511 36.17576, -86.79403 ..."
1,Charlotte Park,"MULTIPOLYGON (((-86.87460 36.15758, -86.87317 ..."


## Column Name Cleanup - Bldg Permit Applications

In [9]:
df_bldg_apps.columns

Index(['Permit #', 'Permit Type Description', 'Permit Subtype Description',
       'Parcel', 'Date Entered', 'Date Issued', 'Construction Cost', 'Address',
       'City', 'State', 'ZIP', 'Subdivision / Lot', 'Contact', 'Permit Type',
       'Permit Subtype', 'IVR Tracking #', 'Purpose', 'Council District',
       'Mapped Location'],
      dtype='object')

In [10]:
df_bldg_apps.columns = (df_bldg_apps.columns
                        .str.replace(" ", "_")
                        .str.replace("/", "_")
                        .str.replace("Description", "descr")
                        .str.replace("#", "number")
                        .str.lower())
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision___lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,03/11/2020,,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944..."
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,12/02/2019,,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36...."


In [11]:
# To correct issue with too many underscores in subdivision_lot column name

df_bldg_apps = df_bldg_apps.rename(columns = {'subdivision___lot': 'subdivision_lot'})
df_bldg_apps.columns

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'date_issued', 'construction_cost', 'address', 'city',
       'state', 'zip', 'subdivision_lot', 'contact', 'permit_type',
       'permit_subtype', 'ivr_tracking_number', 'purpose', 'council_district',
       'mapped_location'],
      dtype='object')

## Column Name Cleanup -  Bldg Permits Issued

In [12]:
df_bldg_issued.columns

Index(['Permit #', 'Permit Type Description', 'Permit Subtype Description',
       'Parcel', 'Date Entered', 'Date Issued', 'Construction Cost', 'Address',
       'City', 'State', 'ZIP', 'Subdivision / Lot', 'Contact', 'Permit Type',
       'Permit Subtype', 'IVR Tracking #', 'Purpose', 'Council District',
       'Census Tract', 'Mapped Location'],
      dtype='object')

In [13]:
df_bldg_issued.columns = (df_bldg_issued.columns
                        .str.replace(" ", "_")
                        .str.replace("/", "_")
                        .str.replace("Description", "descr")
                        .str.replace("#", "number")
                        .str.lower())
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision___lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,11/18/2019,12/09/2019,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,LOT 49 CARRINGTON PLACE PH 5,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218"
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,03/12/2020,03/12/2020,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,37027,UNIT 70 HEARTHSTONE MANOR CONDOMINIUM PHASE 4,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\..."


In [14]:
# To correct issue with too many underscores in subdivision_lot column name

df_bldg_issued = df_bldg_issued.rename(columns = {'subdivision___lot': 'subdivision_lot'})
df_bldg_issued.columns

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'date_issued', 'construction_cost', 'address', 'city',
       'state', 'zip', 'subdivision_lot', 'contact', 'permit_type',
       'permit_subtype', 'ivr_tracking_number', 'purpose', 'council_district',
       'census_tract', 'mapped_location'],
      dtype='object')

## Column Name Cleanup -  Planning Dept

In [15]:
df_planning.columns

Index(['Date Submitted', 'Application Type Description', 'MPC Case #',
       'Ordinance #', 'Status', 'MPC Meeting Date', 'MPC Action',
       'Project Name', 'Location', 'Reviewer', 'Reviewer Email',
       'Case Description', 'Applicant', 'Applicant Representative',
       'Applicant Email', 'Applicant Phone', 'Applicant Address 1',
       'Applicant Address 2', 'Applicant City', 'Applicant State',
       'Applicant ZIP', 'Council 3rd Reading Date',
       'Council 3rd Reading Action', 'Council District', 'Latitude',
       'Longitude', 'Mapped Location'],
      dtype='object')

In [16]:
df_planning.columns = (df_planning.columns
                        .str.replace(" ", "_")
                        .str.replace("/", "_")
                        .str.replace("Description", "descr")
                        .str.replace("#", "number")
                        .str.lower())
df_planning.head(2)

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_3rd_reading_date,council_3rd_reading_action,council_district,latitude,longitude,mapped_location
0,04/01/2019,Subdivision (Final Plat),2019S-086-001,,PENDING,06/11/2020,,FINAL PLAT RESUBDIVISION OF LOT 3 AND 4 ON THE...,227 MARCIA AVE 37209,Joren Dunnavant,...,,Nashville,TN,37203,,,20 (Mary Carolyn Roberts),36.143923,-86.868254,"(36.143922831000054, -86.86825400699996)"
1,11/27/2019,Specific Plan (Final Site Plan),2016SP-076-008,,PENDING,01/16/2020,,RED OAKS TOWNHOMES,0 DEW ST 37206,Abbie Rickoff,...,,Nashville,TN,37204,,,06 (Brett Withers),36.165962,-86.75349,"(36.165961579000054, -86.75348957099999)"
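
The same chain of replacements is repeated for all three DataFrames, and replacing spaces and slashes one at a time is what produces `subdivision___lot`. One possible refactor, sketched here with a hypothetical `clean_columns` helper, collapses any run of spaces and slashes into a single underscore so no follow-up rename is needed.

```python
import re

import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with snake_case column names (hypothetical helper)."""

    def clean(name: str) -> str:
        name = name.replace("Description", "descr").replace("#", "number")
        # Collapse any run of spaces and slashes into one underscore,
        # so 'Subdivision / Lot' becomes 'subdivision_lot' directly.
        name = re.sub(r"[ /]+", "_", name.strip())
        return name.lower()

    return df.rename(columns=clean)


# Empty frame with a few of the raw column names shown above.
demo = pd.DataFrame(columns=["Permit #", "Permit Type Description", "Subdivision / Lot"])
print(list(clean_columns(demo).columns))
```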


## Data type cleanup

## Bldg Permit Applications - dtype cleanup

In [17]:
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,03/11/2020,,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944..."
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,12/02/2019,,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36...."


In [18]:
# Confirming dtypes
# NEED TO CHANGE: date_entered should be a datetime field (date only).

df_bldg_apps.info()  

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3106 entries, 0 to 3105
Data columns (total 19 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   permit_number         3106 non-null   object 
 1   permit_type_descr     3106 non-null   object 
 2   permit_subtype_descr  3106 non-null   object 
 3   parcel                3106 non-null   object 
 4   date_entered          3106 non-null   object 
 5   date_issued           0 non-null      float64
 6   construction_cost     1651 non-null   float64
 7   address               3106 non-null   object 
 8   city                  3106 non-null   object 
 9   state                 3106 non-null   object 
 10  zip                   3106 non-null   int64  
 11  subdivision_lot       3105 non-null   object 
 12  contact               3105 non-null   object 
 13  permit_type           3106 non-null   object 
 14  permit_subtype        3106 non-null   object 
 15  ivr_tracking_number  

In [19]:
# All values in date_issued are null. Will remove this from df

df_bldg_apps.isnull().sum()

permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
date_issued             3106
construction_cost       1455
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            1
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                   22
council_district           7
mapped_location            0
dtype: int64

In [20]:
# To drop date_issued from building applications df (all values are null)

df_bldg_apps = df_bldg_apps.drop(columns = ['date_issued'])

In [21]:
# To confirm column dropped successfully. DONE!

df_bldg_apps.columns  

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'construction_cost', 'address', 'city', 'state', 'zip',
       'subdivision_lot', 'contact', 'permit_type', 'permit_subtype',
       'ivr_tracking_number', 'purpose', 'council_district',
       'mapped_location'],
      dtype='object')

In [22]:
# Convert date_entered to datetime.

df_bldg_apps.date_entered = pd.to_datetime(df_bldg_apps.date_entered)
df_bldg_apps.date_entered.head(2)

0   2020-03-11
1   2019-12-02
Name: date_entered, dtype: datetime64[ns]

In [23]:
# Double-checking min/max dates in this df
# Note: .dt.date would show only the date; the Timestamps printed here include a 00:00:00 time

print(df_bldg_apps.date_entered.min())
print(df_bldg_apps.date_entered.max())

2017-06-01 00:00:00
2020-06-04 00:00:00
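
The conversion above lets pandas infer the date format. A slightly stricter variant is sketched below on a stand-in Series: it states the format explicitly and coerces unparseable values to `NaT` so they can be counted. The month/day/year format is an assumption based on the values shown in the outputs (for example `03/11/2020`).

```python
import pandas as pd

# Stand-in values; the last one is deliberately invalid.
dates = pd.Series(["03/11/2020", "12/02/2019", "not a date"])

# Assumption: raw dates are month/day/year. errors="coerce" turns
# anything that does not match the format into NaT instead of raising.
parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")

print(parsed.min().date())   # date only, without the 00:00:00 time part
print(parsed.isna().sum())   # number of values that failed to parse
```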


## Bldg Permits Issued - dtype cleanup

In [24]:
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,11/18/2019,12/09/2019,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,LOT 49 CARRINGTON PLACE PH 5,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218"
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,03/12/2020,03/12/2020,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,37027,UNIT 70 HEARTHSTONE MANOR CONDOMINIUM PHASE 4,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\..."


In [25]:
# Confirming dtypes
# NEED TO CHANGE: date_entered and date_issued should be datetime fields (date only).

df_bldg_issued.info()  

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33909 entries, 0 to 33908
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   permit_number         33909 non-null  object 
 1   permit_type_descr     33909 non-null  object 
 2   permit_subtype_descr  33909 non-null  object 
 3   parcel                33909 non-null  object 
 4   date_entered          33909 non-null  object 
 5   date_issued           33909 non-null  object 
 6   construction_cost     33899 non-null  float64
 7   address               33909 non-null  object 
 8   city                  33909 non-null  object 
 9   state                 33909 non-null  object 
 10  zip                   33909 non-null  int64  
 11  subdivision_lot       33909 non-null  object 
 12  contact               33908 non-null  object 
 13  permit_type           33909 non-null  object 
 14  permit_subtype        33909 non-null  object 
 15  ivr_tracking_number

In [26]:
# Convert date_entered to datetime

df_bldg_issued.date_entered = pd.to_datetime(df_bldg_issued.date_entered)
df_bldg_issued.date_entered.head(2)

0   2019-11-18
1   2020-03-12
Name: date_entered, dtype: datetime64[ns]

In [27]:
# Convert date_issued to datetime (the source has no time component, so only the date shows).
df_bldg_issued.date_issued = pd.to_datetime(df_bldg_issued.date_issued)
df_bldg_issued.date_issued.head(2)

0   2019-12-09
1   2020-03-12
Name: date_issued, dtype: datetime64[ns]

In [28]:
# Double-checking min/max dates in this df

print(df_bldg_issued.date_issued.min())
print(df_bldg_issued.date_issued.max())

2017-06-01 00:00:00
2020-06-04 00:00:00


## Planning Dept Applications & Issued - dtype cleanup

In [29]:
df_planning.head(2)

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_3rd_reading_date,council_3rd_reading_action,council_district,latitude,longitude,mapped_location
0,04/01/2019,Subdivision (Final Plat),2019S-086-001,,PENDING,06/11/2020,,FINAL PLAT RESUBDIVISION OF LOT 3 AND 4 ON THE...,227 MARCIA AVE 37209,Joren Dunnavant,...,,Nashville,TN,37203,,,20 (Mary Carolyn Roberts),36.143923,-86.868254,"(36.143922831000054, -86.86825400699996)"
1,11/27/2019,Specific Plan (Final Site Plan),2016SP-076-008,,PENDING,01/16/2020,,RED OAKS TOWNHOMES,0 DEW ST 37206,Abbie Rickoff,...,,Nashville,TN,37204,,,06 (Brett Withers),36.165962,-86.75349,"(36.165961579000054, -86.75348957099999)"


In [30]:
df_planning.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 521 entries, 0 to 520
Data columns (total 27 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   date_submitted              521 non-null    object 
 1   application_type_descr      521 non-null    object 
 2   mpc_case_number             520 non-null    object 
 3   ordinance_number            157 non-null    object 
 4   status                      521 non-null    object 
 5   mpc_meeting_date            521 non-null    object 
 6   mpc_action                  235 non-null    object 
 7   project_name                417 non-null    object 
 8   location                    498 non-null    object 
 9   reviewer                    521 non-null    object 
 10  reviewer_email              465 non-null    object 
 11  case_descr                  503 non-null    object 
 12  applicant                   501 non-null    object 
 13  applicant_representative    497 non

In [31]:
df_planning.date_submitted = pd.to_datetime(df_planning.date_submitted)
df_planning.date_submitted.head(2)

0   2019-04-01
1   2019-11-27
Name: date_submitted, dtype: datetime64[ns]

In [32]:
df_planning.mpc_meeting_date = pd.to_datetime(df_planning.mpc_meeting_date)
df_planning.mpc_meeting_date.head(2)

0   2020-06-11
1   2020-01-16
Name: mpc_meeting_date, dtype: datetime64[ns]

In [33]:
# Double-checking min/max dates in this df

print(df_planning.date_submitted.min())   #2017-02-28
print(df_planning.date_submitted.max())   #2020-06-04
print(df_planning.mpc_meeting_date.min())  #2017-04-13
print(df_planning.mpc_meeting_date.max())  #2020-07-23 - Future date is correct

2017-02-28 00:00:00
2020-06-04 00:00:00
2017-04-13 00:00:00
2020-07-23 00:00:00


## Fix mapped_location, pull out lat/lon in Bldg Permit Apps & Issued

In [34]:
# Building Permit Applications

df_bldg_apps.mapped_location.unique()

array(['748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944, -86.879062)',
       '4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36.242681, -86.929594)',
       '4119 MURFREESBORO PIKE\nANTIOCH, TN 37013\n(36.032211, -86.594799)',
       ...,
       '6680 CHARLOTTE PIKE B-5\nNASHVILLE, TN 37209\n(36.136609, -86.883701)',
       '3805 CHARLOTTE AVE\nNASHVILLE, TN 37209\n(36.152561, -86.831473)',
       '5610A GRANNY WHITE PIKE\nBRENTWOOD, TN 37027\n(36.046438, -86.815953)'],
      dtype=object)

In [35]:
# To pull out lat/lng from:
# '748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944, -86.879062)'
# Regular Expression - pattern matching
# RegEx link:  https://regex101.com/r/cAI6sh/1

pattern = re.compile(r'.*\((\d*\S\d*)\S\s(\S\d*\S\d*)\)', flags = re.MULTILINE)

def extract_lat_lon(map_loc):
    try:
        lat_lon_match = pattern.search(map_loc)
        lat = float(lat_lon_match.group(1))
        lon = float(lat_lon_match.group(2))
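        return (lat, lon)
    except (AttributeError, TypeError, ValueError):
        # No coordinates in the string, a non-string value (e.g. NaN), or a non-numeric match.
        # np.nan is used because the np.NaN alias was removed in NumPy 2.0.
        return (np.nan, np.nan)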
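
As an alternative to calling the function above row by row, pandas can apply a regular expression to the whole column at once with `Series.str.extract`. The sketch below uses a stricter pattern than the one above (an optional minus sign, digits, optional decimal part). `coord_pattern` and the sample Series are illustrative, with values taken from the outputs in this notebook.

```python
import pandas as pd

mapped = pd.Series([
    "748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944, -86.879062)",
    "1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218",   # address only, no coordinates
])

# Named groups become column names; rows without a match are left null.
coord_pattern = r"\((?P<lat>-?\d+(?:\.\d+)?),\s*(?P<lon>-?\d+(?:\.\d+)?)\)"
coords = mapped.str.extract(coord_pattern).astype(float)
print(coords)
```

The resulting `lat` and `lon` columns could then be attached to the original frame with `pd.concat` or a direct column assignment.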

In [36]:
lat_lon = [extract_lat_lon(map_loc) for map_loc in df_bldg_apps.mapped_location]  #list comprehension

In [37]:
df_bldg_apps['lat'] = [lat for lat, lon in lat_lon]

In [38]:
df_bldg_apps['lon'] = [lon for lat, lon in lat_lon]

In [39]:
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location,lat,lon
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,2020-03-11,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944...",36.125944,-86.879062
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,2019-12-02,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36....",36.242681,-86.929594


In [40]:
# Find out how many of the mapped locations had only the address, not the lat/lon
# Nulls in lat/lon: 297 out of 3,106 = 9.6%
# Too many to leave as is. Will try to add lat/lon from the US Census geocoding (free) service:
# https://geocoding.geo.census.gov/geocoder/geographies/addressbatch?form
# US CENSUS GEOCODING didn't work. Will try Google Maps API next.

print(df_bldg_apps.shape)
print(df_bldg_apps.isnull().sum())

(3106, 20)
permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
construction_cost       1455
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            1
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                   22
council_district           7
mapped_location            0
lat                      297
lon                      297
dtype: int64
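
The comments above mention trying the Google Maps API next. A minimal sketch of a single-address lookup with `requests` is shown below. The function names are hypothetical, `api_key` is a placeholder the caller must supply, and the endpoint and response fields (`status`, `results[0].geometry.location`) reflect my understanding of the Google Geocoding API's JSON format, which should be checked against the current documentation. Only the offline parsing step is exercised here, against a hand-written payload rather than real API output.

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def parse_geocode_response(payload: dict):
    """Return (lat, lon) from a geocoding JSON payload, or (None, None) if no result."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return (None, None)
    location = payload["results"][0]["geometry"]["location"]
    return (location["lat"], location["lng"])


def geocode_address(address: str, api_key: str, timeout: float = 10.0):
    """Look up one address over the network (requires a valid API key)."""
    response = requests.get(
        GEOCODE_URL,
        params={"address": address, "key": api_key},
        timeout=timeout,
    )
    response.raise_for_status()
    return parse_geocode_response(response.json())


# Offline check of the parsing step with a hand-written payload (not real API output).
fake_payload = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 36.0, "lng": -86.5}}}],
}
print(parse_geocode_response(fake_payload))
```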


In [41]:
# Building Permits Issued

df_bldg_issued.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33909 entries, 0 to 33908
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   permit_number         33909 non-null  object        
 1   permit_type_descr     33909 non-null  object        
 2   permit_subtype_descr  33909 non-null  object        
 3   parcel                33909 non-null  object        
 4   date_entered          33909 non-null  datetime64[ns]
 5   date_issued           33909 non-null  datetime64[ns]
 6   construction_cost     33899 non-null  float64       
 7   address               33909 non-null  object        
 8   city                  33909 non-null  object        
 9   state                 33909 non-null  object        
 10  zip                   33909 non-null  int64         
 11  subdivision_lot       33909 non-null  object        
 12  contact               33908 non-null  object        
 13  permit_type     

In [42]:
# Review mapped_location in bldg_issued df before applying regex/function

df_bldg_issued.mapped_location.unique()

array(['1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218',
       '210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\n(36.042219, -86.764816)',
       '812 BRIAR CIR\nMADISON, TN 37115', ...,
       '131 EDENWOLD RD\nMADISON, TN 37115\n(36.287001, -86.703591)',
       '110 2ND AVE N\nNASHVILLE, TN 37201\n(36.162296, -86.77544)',
       '1382 RURAL HILL RD 320\nANTIOCH, TN 37013\n(36.056805, -86.649469)'],
      dtype=object)

In [43]:
# Applying function written for building applications to this building permits issued df

lat_lon = [extract_lat_lon(map_loc) for map_loc in df_bldg_issued.mapped_location]  

In [44]:
df_bldg_issued['lat'] = [lat for lat, lon in lat_lon]

In [45]:
df_bldg_issued['lon'] = [lon for lat, lon in lat_lon]

In [46]:
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,...,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location,lat,lon
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,2019-11-18,2019-12-09,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,...,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218",,
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,2020-03-12,2020-03-12,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,...,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\...",36.042219,-86.764816


## Looked at addresses that have no lat/lon from Bldg Permit Application & Issued dfs
- Discovered that a meaningful number of relevant addresses are missing lat/lon
- Will submit these to census tool to get lat/lon
- Chose to do this extra step because of how many addresses were missing lat/lon:
    - Bldg Permit Applications missing 297 out of 3,106 = 9.6%, of which 249 are new residential
    - Bldg Permits Issued missing 3,928 out of 33,909 = 11.6%
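
The percentages above can be computed directly rather than by hand. The sketch below uses a hypothetical `missing_share` helper on a stand-in frame; in this notebook it would be called on `df_bldg_apps` and `df_bldg_issued`.

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame, column: str) -> float:
    """Percentage of rows where `column` is null (hypothetical helper)."""
    return df[column].isna().mean() * 100


# Stand-in frame: two of the four rows have no latitude.
demo = pd.DataFrame({"lat": [36.1, np.nan, 36.2, np.nan]})
print(f"{missing_share(demo, 'lat'):.1f}% missing")
```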

In [47]:
# Find out how many of the mapped locations had only the address, not the lat/lon: Bldg Permit Applications
# Number of rows missing lat & lon:  297

print(df_bldg_apps.shape)
print(df_bldg_apps.isnull().sum())

(3106, 20)
permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
construction_cost       1455
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            1
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                   22
council_district           7
mapped_location            0
lat                      297
lon                      297
dtype: int64


In [48]:
# Find out how many of the mapped locations had only the address, not the lat/lon: Bldg Permits Issued
# Number of rows missing lat & lon:  3928

print(df_bldg_issued.shape)
print(df_bldg_issued.isnull().sum())

(33909, 22)
permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
date_issued                0
construction_cost         10
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            0
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                  467
council_district          46
census_tract              43
mapped_location            0
lat                     3928
lon                     3928
dtype: int64


In [49]:
# BLDG PERMIT APPLICATIONS
# First: Checked to see which items had null in lat/lon. Do I need these?
#        YES - there are a lot for new residential permits 
# Second: Created new df to submit to census geocoder website.

df_bldg_apps_null_latlon = df_bldg_apps.loc[df_bldg_apps['lat'].isnull()].reset_index(drop = True) 
df_bldg_apps_null_latlon.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location,lat,lon
0,T2020034764,Building Residential - New,Single Family Residence,165140A20100CO,2020-06-04,,4929 CHUTNEY DR,ANTIOCH,TN,37013,LOT 201 DAVENPORT DOWNS PH 2,AMH DEVELOPMENT TENNESSEE GC LLC,CARN,CAA01R301,3830089,to construct a single family residence with 22...,33.0,"4929 CHUTNEY DR\nANTIOCH, TN 37013",,
1,T2020032881,Building Residential - New,Single Family Residence,173100D02400CO,2020-05-27,260000.0,329 BODDINGTON LN,ANTIOCH,TN,37013,LOT 24 DELVIN DOWNS PH 6,CAPITOL HOMES INC,CARN,CAA01R301,3824736,New two story residential home in a approved P...,31.0,"329 BODDINGTON LN\nANTIOCH, TN 37013",,
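
To submit the rows without coordinates to a batch geocoder, the frame has to be written out in the layout the service expects. The sketch below assumes the Census batch geocoder takes a headerless CSV with a unique ID, street address, city, state, and ZIP, in that order. That layout is my assumption and should be confirmed on the geocoder's upload page. The stand-in rows reuse values from the output above.

```python
import io

import pandas as pd

# Stand-in rows with the relevant columns of df_bldg_apps_null_latlon.
missing = pd.DataFrame({
    "permit_number": ["T2020034764", "T2020032881"],
    "address": ["4929 CHUTNEY DR", "329 BODDINGTON LN"],
    "city": ["ANTIOCH", "ANTIOCH"],
    "state": ["TN", "TN"],
    "zip": [37013, 37013],
})

# Assumed upload layout: id, street, city, state, ZIP, with no header row.
batch_cols = ["permit_number", "address", "city", "state", "zip"]
buffer = io.StringIO()
missing[batch_cols].to_csv(buffer, index=False, header=False)
print(buffer.getvalue())
```

In the notebook itself, the `StringIO` buffer would be replaced by a file path, for example one under `data/interim`.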


In [50]:
# BLDG PERMIT APPLICATIONS
# How many of these are meaningful in Bldg Permit Applications?   
# ANSWER: Most are important to know about for residential/commercial growth

df_bldg_apps_null_latlon.permit_type_descr.value_counts()

Building Residential - New                 249
Building Commercial - Tenant Finish Out     19
Building Use & Occupancy                    10
Building Sign Permit                         4
Building Commercial - Rehab                  4
Building Residential - Addition              3
Building Demolition Permit                   2
Building Commercial - New                    2
Building Residential Rehab Storm Damage      1
Building Residential - Roofing / Siding      1
Building Commercial Rehab Storm Damage       1
Building Blasting Permit                     1
Name: permit_type_descr, dtype: int64
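
A minimal sketch of the same check expressed as both a count and a share per permit type, which makes it easier to judge how much of each type would be lost if rows without lat/lon were dropped. The `toy_apps` frame and its values are illustrative stand-ins, not the real `df_bldg_apps` data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_bldg_apps; values are illustrative only.
toy_apps = pd.DataFrame({
    "permit_type_descr": [
        "Building Residential - New",
        "Building Residential - New",
        "Building Sign Permit",
        "Building Sign Permit",
    ],
    "lat": [np.nan, 36.04, 36.16, 36.15],
})

# Flag rows with no latitude, then summarise per permit type:
# n_missing = number of rows without lat, share_missing = fraction of that type.
summary = (
    toy_apps.assign(missing_latlon=toy_apps["lat"].isna())
    .groupby("permit_type_descr")["missing_latlon"]
    .agg(n_missing="sum", share_missing="mean")
    .sort_values("n_missing", ascending=False)
)
print(summary)
```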

In [51]:
# BLDG PERMITS ISSUED
# First: Checked to see which items had null in lat/lon. Do I need these? 
#        YES - there are a lot for new residential permits 
# Second: Created new df to submit to census geocoder website.

df_bldg_issued_null_latlon = df_bldg_issued.loc[df_bldg_issued['lat'].isnull()].reset_index(drop = True) 
df_bldg_issued_null_latlon.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,...,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location,lat,lon
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,2019-11-18,2019-12-09,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,...,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218",,
1,2019011084,Building Use & Occupancy,"Multifamily, Townhome",051100J90000CO,2019-02-25,2019-07-22,0.0,812 BRIAR CIR,MADISON,TN,...,COLE INVESTMENTS LLC,CAUO,CAA03R301,3611315,MASTER PERMIT ONLY NO CONSTRUCTION MULTI-...,8.0,37010802.0,"812 BRIAR CIR\nMADISON, TN 37115",,


In [52]:
# How many of these are meaningful in Bldg Permits Issued?   
# ANSWER: Most are important to know about for residential/commercial growth

df_bldg_issued_null_latlon.permit_type_descr.value_counts()

Building Residential - New                  3294
Building Commercial - Tenant Finish Out      172
Building Commercial - New                    100
Building Sign Permit                          74
Building Use & Occupancy                      57
Building Tree Removal Permit                  45
Building Commercial - Rehab                   43
Building Residential - Addition               25
Building Commercial - Foundation              19
Building Residential - Tenant Finish Out      18
Building Commercial - Shell                   14
Building Residential - Rehab                  14
Building Demolition Permit                    13
Building Residential - Amend Permit           12
Building Commercial - Addition                11
Building Blasting Permit                       6
Building Commercial - Roofing / Siding         6
Building Residential - Shell                   2
Building Residential - Roofing / Siding        1
Building Residential - Change Contractor       1
Building Moving Perm

### Creating new dfs for Bldg Permit Applications & Bldg Permits Issued with ONLY addresses that have no lat/lon, to submit to US Census Geocoder tool
- Will export to CSV, submit to the census geocoder tool to get lat and lon, then merge the results back into the original file (matching on the address/city/state/zip columns)

In [53]:
df_bldg_apps_null_latlon.columns

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'construction_cost', 'address', 'city', 'state', 'zip',
       'subdivision_lot', 'contact', 'permit_type', 'permit_subtype',
       'ivr_tracking_number', 'purpose', 'council_district', 'mapped_location',
       'lat', 'lon'],
      dtype='object')

In [54]:
# Dropping all but the address columns, in preparation for uploading to geocoder website

df_bldg_apps_null_latlon = df_bldg_apps_null_latlon[['address', 'city', 'state', 'zip']]
df_bldg_apps_null_latlon.head(2)

Unnamed: 0,address,city,state,zip
0,4929 CHUTNEY DR,ANTIOCH,TN,37013
1,329 BODDINGTON LN,ANTIOCH,TN,37013


In [55]:
df_bldg_issued_null_latlon = df_bldg_issued_null_latlon[['address', 'city', 'state', 'zip']]
df_bldg_issued_null_latlon.head(2)

Unnamed: 0,address,city,state,zip
0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218
1,812 BRIAR CIR,MADISON,TN,37115


In [56]:
df_bldg_apps_null_latlon.to_csv('../data/interim/bldg_apps_addresses.csv')

In [57]:
df_bldg_issued_null_latlon.to_csv('../data/interim/bldg_issued_addresses.csv')

### Census Geocoder tool only found 1% of addresses. Changed direction: Will use Google Maps API to get geocodes.
- The census geocoder tool found only 1% of addresses in each file as an exact match, and an additional 1% in one of the files as a non-exact match. The remaining 98-99% weren't found.
- Aborting this process because it isn't worth the time to clean and import just 1% of the addresses. It would be cumbersome because:
    - Addresses were submitted as 4 columns (address, city, state, zip), but were returned with all four concatenated in a single cell.
    - It returns [lat, lon] in a single column, along with quite a bit of other information in columns I don't need.
- Returned files are in ../data/interim folder for future reference
- US Census website, for reference: https://geocoding.geo.census.gov/geocoder/geographies/addressbatch?form
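
Whichever geocoder ends up supplying coordinates, the merge-back step (matching on address/city/state/zip) could look roughly like the sketch below. It is not the code used in this notebook: the `permits` and `geocoded` frames, the `lat_geo`/`lon_geo` column names, and all addresses and coordinates are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

KEYS = ["address", "city", "state", "zip"]

# Toy stand-in for a permits df; rows 0 and 1 have no coordinates yet.
permits = pd.DataFrame({
    "permit_number": ["A1", "A2", "A3"],
    "address": ["1 EXAMPLE ST", "2 SAMPLE AVE", "3 DEMO CT"],
    "city": ["NASHVILLE", "NASHVILLE", "MADISON"],
    "state": ["TN", "TN", "TN"],
    "zip": [37000, 37000, 37001],
    "lat": [np.nan, np.nan, 36.25],
    "lon": [np.nan, np.nan, -86.70],
})

# Hypothetical geocoder result: one row per address that was found.
geocoded = pd.DataFrame({
    "address": ["1 EXAMPLE ST"],
    "city": ["NASHVILLE"],
    "state": ["TN"],
    "zip": [37000],
    "lat_geo": [36.10],
    "lon_geo": [-86.75],
})

# Left merge keeps every permit; validate guards against duplicate addresses
# in the geocoder result, which would silently multiply permit rows.
merged = permits.merge(geocoded, on=KEYS, how="left", validate="many_to_one")

# Fill only the missing coordinates, then drop the helper columns.
merged["lat"] = merged["lat"].fillna(merged["lat_geo"])
merged["lon"] = merged["lon"].fillna(merged["lon_geo"])
merged = merged.drop(columns=["lat_geo", "lon_geo"])
print(merged[["permit_number", "lat", "lon"]])
```

Addresses the geocoder could not find simply keep their null lat/lon, so they can still be counted or sent to a second service later.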

### Creating new columns for rolling week, month, quarter, year in Bldg Permit Applications & Issued, and Planning Dept Permits 

### ABORTED rolling date idea on 6/12/2020:
- Discovered that it won't work because the data first needs to be grouped by category (Residential, Commercial, Other).
- It'll be better to do groupby, counts, etc. in the 2_eda notebook as needed, and do additional slicing in Tableau later on.
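
As a rough illustration of the groupby-then-count approach deferred to the 2_eda notebook, the sketch below counts applications per category per calendar month. The `toy` frame is made up, and it assumes a `category` column already exists on the permit data.

```python
import pandas as pd

# Toy stand-in for a permits df that already has a 'category' column.
toy = pd.DataFrame({
    "date_entered": pd.to_datetime(
        ["2020-05-01", "2020-05-15", "2020-05-20", "2020-06-02", "2020-06-03"]
    ),
    "category": ["Residential", "Residential", "Commercial", "Residential", "Other"],
})

# Group by category and by month start ("MS"), count rows, then pivot the
# categories into columns so each row is one month.
monthly_counts = (
    toy.groupby(["category", pd.Grouper(key="date_entered", freq="MS")])
    .size()
    .unstack("category", fill_value=0)
)
print(monthly_counts)
```

Changing `freq` to `"W"`, `"QS"`, or `"YS"` gives the weekly, quarterly, or yearly version without adding any columns to the source data.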

## Exploring and Cleaning: Planning Dept Applications / Issued

In [58]:
# Checking nulls, Planning Dept applications

df_planning.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 521 entries, 0 to 520
Data columns (total 27 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   date_submitted              521 non-null    datetime64[ns]
 1   application_type_descr      521 non-null    object        
 2   mpc_case_number             520 non-null    object        
 3   ordinance_number            157 non-null    object        
 4   status                      521 non-null    object        
 5   mpc_meeting_date            521 non-null    datetime64[ns]
 6   mpc_action                  235 non-null    object        
 7   project_name                417 non-null    object        
 8   location                    498 non-null    object        
 9   reviewer                    521 non-null    object        
 10  reviewer_email              465 non-null    object        
 11  case_descr                  503 non-null    object        

In [59]:
# Counting null values
# Odd to have just one null case number

df_planning.isnull().sum()

date_submitted                  0
application_type_descr          0
mpc_case_number                 1
ordinance_number              364
status                          0
mpc_meeting_date                0
mpc_action                    286
project_name                  104
location                       23
reviewer                        0
reviewer_email                 56
case_descr                     18
applicant                      20
applicant_representative       24
applicant_email                25
applicant_phone                23
applicant_address_1            24
applicant_address_2           382
applicant_city                 24
applicant_state                24
applicant_zip                  24
council_3rd_reading_date      521
council_3rd_reading_action    521
council_district               32
latitude                       32
longitude                      32
mapped_location                32
dtype: int64

In [60]:
# Finding row with null case number
# Project name says "Created in error". Will delete this row

df_planning[df_planning.mpc_case_number.isnull()]

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_3rd_reading_date,council_3rd_reading_action,council_district,latitude,longitude,mapped_location
224,2020-03-02,Rezoning,,,PENDING,2020-03-12,,created in error,,FRONT COUNTER,...,,,,,,,,,,


In [61]:
# Dropping row noted above with project name "created in error"

df_planning = df_planning.dropna(subset=['mpc_case_number'])

In [62]:
# Double-checking that df is now 520 rows. 

df_planning.shape

(520, 27)

In [63]:
# Finding the remaining 22 rows with null 'location'
# Won't be able to get lat/lon for these. Will delete them.

df_planning[df_planning.location.isnull()]


Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_3rd_reading_date,council_3rd_reading_action,council_district,latitude,longitude,mapped_location
17,2019-09-23,Text Amendment,2019Z-015TX-001,BL2019-8,CNCLACTIVE,2020-05-28,Disapprove,AMENDMENT TO SIDEWALK ORDINANCE,,FRONT COUNTER,...,,,,,,,,,,
21,2020-04-13,Text Amendment,2020Z-008TX-001,BL2020-277,CNCLACTIVE,2020-05-28,Approve with Conditions,NONCONFORMING STRUCTURES,,Lisa Milligan,...,Suite 204,Nashville,TN,37219.0,,,,,,
26,2020-04-23,Text Amendment,2020Z-009TX-001,BL2020-288,CNCLACTIVE,2020-05-28,Approve with Conditions,STREET TREES,,Shawn Shepard,...,Suite 204,Nashville,TN,37219.0,,,,,,
29,2020-06-04,Rezoning,2020Z-081PR-001,,NEW,2020-07-23,,,,FRONT COUNTER,...,,,,,,,,,,
30,2020-06-04,Subdivision (Final Plat),2020S-116-001,,NEW,2020-07-23,,,,FRONT COUNTER,...,,,,,,,,,,
31,2020-06-04,Subdivision (Final Plat),2020S-117-001,,NEW,2020-07-23,,,,FRONT COUNTER,...,,,,,,,,,,
69,2020-05-13,Urban Design Overlay (Modify),2013UD-002-024,,PENDING,2020-05-23,,,,FRONT COUNTER,...,,,,,,,,,,
72,2020-05-18,Subdivision (Final Plat),2020S-113-001,,NEW,2020-07-23,,,,FRONT COUNTER,...,,,,,,,,,,
97,2020-02-11,Text Amendment,2020Z-007TX-001,BL2020-188,CNCLACTIVE,2020-04-23,Approve with Conditions,DRIVEWAY ORDINANCE,,Lisa Milligan,...,Suite 204,Nashville,TN,37219.0,,,,,,
138,2020-01-10,Text Amendment,2020Z-005TX-001,BL2020-151,PENDING,2020-02-27,,,,FRONT COUNTER,...,,,,,,,,,,


In [64]:
# Dropping 22 rows with null 'location'
# Resulting df should have 498 rows

df_planning = df_planning.dropna(subset=['location'])
print(df_planning.shape)

(498, 27)


In [65]:
# Testing to ensure all rows with null in 'location' have been dropped.

df_planning[df_planning.location.isnull()]

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_3rd_reading_date,council_3rd_reading_action,council_district,latitude,longitude,mapped_location
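
The drop-and-verify steps above can also be written with assertions, so the notebook stops if the row counts are not what was expected. This is only a sketch on a toy frame; the case numbers are hypothetical, and it drops on both columns in one call rather than in two steps.

```python
import pandas as pd

# Toy stand-in for df_planning with one "created in error" style row
# (null case number and null location) and one row missing only a location.
toy_planning = pd.DataFrame({
    "mpc_case_number": ["CASE-001", None, "CASE-002", "CASE-003"],
    "location": ["227 MARCIA AVE 37209", None, None, "0 DEW ST 37206"],
})

required = ["mpc_case_number", "location"]
rows_before = len(toy_planning)
rows_with_gap = toy_planning[required].isna().any(axis=1).sum()

# Drop rows missing either required field.
cleaned = toy_planning.dropna(subset=required)

# Verify instead of eyeballing the shape.
assert len(cleaned) == rows_before - rows_with_gap
assert cleaned[required].notna().all().all()
print(cleaned.shape)
```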


In [66]:
# Looking at remaining rows that have a 'location', but null in 'mapped_location'
# Will submit them to Google Maps API to get geocoding

# PLANNING DEPT
# First: Checked to see which items had null in lat/lon. Do I need these? 
#        YES, it would be good to keep these.
# Second: Creating new df to send out for geocoding.

df_planning_null_latlon = df_planning.loc[df_planning['latitude'].isnull()].reset_index(drop = True) 
df_planning_null_latlon

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_3rd_reading_date,council_3rd_reading_action,council_district,latitude,longitude,mapped_location
0,2020-02-14,Specific Plan (Final Site Plan),2016SP-076-010,,PENDING,2020-04-09,,BOSCOBEL HEIGHTS LIBRARY,998 SEVIER ST 37210,Abbie Rickoff,...,Suite 210,NASHVILLE,TN,37209.0,,,06 (Brett Withers),,,
1,2019-12-04,Community Plan Amendment,2020CP-000-001,,PENDING,2020-05-12,,16TH AVENUE NORTH,961 16TH AVE N 37208,Marty Sewell,...,Suite 425,Nashville,TN,37203.0,,,19 (Freddie O'Connell),,,
2,2019-04-25,Mandatory Referral Easement,2019M-042ES-001,BL2019-1664,CNCLACTIVE,2019-06-13,Recommend Approval,BOSCOBEL III - CAYCE PLACE,707 S 7TH ST 37206,Sharon O'Conner,...,,,,,,,06 (Brett Withers),,,
3,2018-08-16,Mandatory Referral Easement,2018M-059ES-001,BL2019-41,CNCLACTIVE,2018-10-11,Recommend Approval,2906 AND 2908 FELICIA STREET EASEMENT RIGHTS A...,2908 FELICIA ST 37209,Sharon O'Conner,...,,Nashville,TN,37210.0,,,21 (Ed Kindall),,,
4,2019-11-01,Specific Plan (Final Site Plan),2013SP-030-006,,PENDING,2020-01-16,,PORTER ROAD PHASE 2,1505 PORTER RD 37206,Patrick Napier,...,,Nashville,TN,37205.0,,,07 (Emily Benedict),,,
5,2020-02-26,Specific Plan (Final Site Plan),2015SP-005-011,,PENDING,2020-04-09,,MEDICAL OFFICE BUILDING AT CENTURY FARMS,0 CANE RIDGE RD,Logan Elliott,...,,Nashville,TN,37210.0,,,32 (Joy Styles),,,
6,2020-02-26,Specific Plan (Final Site Plan),2016SP-083-002,,PENDING,2020-04-09,,50 MUSIC SQUARE WEST HOTEL,50 MUSIC SQ W #901 37203,Jason Swaggart,...,Suite 210,Nashville,TN,37209.0,,,19 (Freddie O'Connell),,,
7,2020-04-29,Mandatory Referral Easement,2020M-049ES-001,,PENDING,2020-06-11,,HOBSON FLATS SEWER EXTENSION,0 MURFREESBORO PIKE 37013,Sharon O'Conner,...,,Nashville,TN,37208.0,,,32 (Joy Styles),,,
8,2019-12-10,Mandatory Referral Easement,2020M-005ES-001,BL2020-167,CNCLACTIVE,2020-02-13,Recommend Approval,STATE STREET WOODFIELD EASEMENTS,301 MCMILLIN ST 37203,Sharon O'Conner,...,,Nashville,TN,37208.0,,,21 (Brandon Taylor),,,


In [67]:
# Creating new df with only the 'location' column, to send out for geocoding
# Since there are so few (just 9 rows), I'll edit the location address in Excel to make it fit Google's requirements.

df_planning_null_latlon = df_planning_null_latlon[['location']]
df_planning_null_latlon.head(2)

Unnamed: 0,location
0,998 SEVIER ST 37210
1,961 16TH AVE N 37208


In [68]:
df_planning_null_latlon.to_csv('../data/interim/planning_addresses.csv')
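
For a larger file, the manual Excel edit could be replaced by splitting the trailing ZIP code off each `location` string in pandas. The sketch below is one way to do that; the `"street, TN zip"` query format is an assumption for illustration, not a documented Google requirement.

```python
import pandas as pd

# A few 'location' values in the style of the Planning Dept data.
locations = pd.Series(
    ["998 SEVIER ST 37210", "0 CANE RIDGE RD", "50 MUSIC SQ W #901 37203"],
    name="location",
)

# Capture everything up to an optional trailing 5-digit ZIP.
# Rows without a ZIP get NaN in the 'zip' column.
parts = locations.str.extract(r"^(?P<street>.*?)(?:\s+(?P<zip>\d{5}))?$")

# Assumed query format: "<street>, TN <zip>" (ZIP omitted when missing).
query = parts["street"] + ", TN" + (" " + parts["zip"]).fillna("")
print(query.tolist())
```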

In [69]:
# Taking a look at the types of info in Planning Dept data

print(df_planning.application_type_descr.nunique())
print(df_planning.application_type_descr.value_counts())

31
Rezoning                                    86
Subdivision (Final Plat)                    81
Mandatory Referral Easement                 64
Specific Plan (Final Site Plan)             50
Specific Plan (New)                         39
Mandatory Referral Encroachment             24
Community Plan Amendment                    17
Planned Unit Development (Final Site Pl)    15
Mandatory Referral Agreement                15
Downtown Code (Final Site Plan)             13
Subdivision (Concept Plan)                   9
Mandatory Referral Property                  8
Downtown Code (Modify)                       8
Subdivision (Amendment)                      8
Specific Plan (Amend)                        8
Planned Unit Development (Cancel)            8
Planned Unit Development (Amend)             8
Mandatory Referral  R.O.W. Abandonment       7
Urban Design Overlay (Final)                 6
Historic Landmark (New)                      5
Urban Design Overlay (Modify)                3
Mandatory 

In [70]:
# Taking a look at the types of info in Planning Dept data

print(df_planning.status.nunique())
print(df_planning.status.value_counts())

5
PENDING        216
CNCLACTIVE     195
NEW             64
MPCCOMPLETE     22
UNKNOWN          1
Name: status, dtype: int64


## Exploring and Cleaning: Neighborhood Assoc Boundaries (GIS)
- The only change needed was to fix the crs code from epsg:4326 to EPSG:4326

In [71]:
df_na_bound.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 288 entries, 0 to 287
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   name      288 non-null    object  
 1   geometry  288 non-null    geometry
dtypes: geometry(1), object(1)
memory usage: 4.6+ KB


In [72]:
# To confirm whether any are null.  No nulls! Good :-)

df_na_bound.isnull().sum()

name        0
geometry    0
dtype: int64

In [73]:
# Confirming crs type

print(df_na_bound.crs)

epsg:4326


In [74]:
# Converting to uppercase EPSG

df_na_bound.crs = "EPSG:4326"
print(df_na_bound.crs)

EPSG:4326


## Review permit types, permit subtypes, etc. in Bldg Permit Applications, Bldg Permits Issued, and Planning dfs.
- Are there any types, subtypes, or columns not needed for EDA?

In [75]:
# Bldg Permit Applications
# Which columns not needed?  
# KEEPING ALL. Might want them for popups on visualization

df_bldg_apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3106 entries, 0 to 3105
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   permit_number         3106 non-null   object        
 1   permit_type_descr     3106 non-null   object        
 2   permit_subtype_descr  3106 non-null   object        
 3   parcel                3106 non-null   object        
 4   date_entered          3106 non-null   datetime64[ns]
 5   construction_cost     1651 non-null   float64       
 6   address               3106 non-null   object        
 7   city                  3106 non-null   object        
 8   state                 3106 non-null   object        
 9   zip                   3106 non-null   int64         
 10  subdivision_lot       3105 non-null   object        
 11  contact               3105 non-null   object        
 12  permit_type           3106 non-null   object        
 13  permit_subtype    

In [76]:
# Bldg Permit Applications
# Which PERMIT SUB-TYPES are needed to answer data questions? (quick look; the code below counts permit_subtype_descr)
# DECISION: KEEPING ALL
# REASON: Even though some types only have one count,
#         several with one value may be in a single neighborhood assoc boundary
#         May group this once I start exploring, but will leave all details in place for now.

df_bldg_apps.permit_subtype_descr.value_counts()

Single Family Residence                    1420
Demolition Permit - Residential             173
Accessory Structure, Garage                 167
Sign - Ground /  Wall Signs                 136
Tents, Stages                                99
                                           ... 
Telephone Services - Small Cell Towers        1
Personal Care Svcs, Barber/Beauty Shops       1
Recreation Center, Arenas                     1
Bed & Breakfast Inn, Hotel / Motel            1
Outpatient Clinic                             1
Name: permit_subtype_descr, Length: 94, dtype: int64

In [77]:
# Bldg Permit Applications
# Which PERMIT SUB-TYPES are needed to answer data questions?
# DECISION: KEEPING ALL
# REASON: Even though some types only have one count,
#         several with one value may be in a single neighborhood assoc boundary
#         May group this once I start exploring, but will leave all details in place for now.

# max_rows is left at 50 here, so the output below is truncated.
# To get full results that aren't truncated, set it to 4000 instead:
pd.options.display.max_rows = 50
print(pd.options.display.max_rows)

list_bldg_apps_subtype_counts = df_bldg_apps.permit_subtype_descr.value_counts()
print(list_bldg_apps_subtype_counts)

50
Single Family Residence                    1420
Demolition Permit - Residential             173
Accessory Structure, Garage                 167
Sign - Ground /  Wall Signs                 136
Tents, Stages                                99
                                           ... 
Telephone Services - Small Cell Towers        1
Personal Care Svcs, Barber/Beauty Shops       1
Recreation Center, Arenas                     1
Bed & Breakfast Inn, Hotel / Motel            1
Outpatient Clinic                             1
Name: permit_subtype_descr, Length: 94, dtype: int64
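
An alternative to changing `pd.options.display.max_rows` for the whole notebook is to lift the limit only while printing one result. A minimal sketch with `pd.option_context`, using a made-up `counts` Series in place of the real value counts:

```python
import pandas as pd

# Toy stand-in for a long value_counts() result (100 rows).
counts = pd.Series(range(100), name="toy_counts")

default_max_rows = pd.get_option("display.max_rows")

# Inside the block, max_rows=None means "show every row".
with pd.option_context("display.max_rows", None):
    full_text = repr(counts)
    print(full_text)

# Outside the block the previous setting is restored automatically.
print(pd.get_option("display.max_rows"))
```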


In [78]:
# Bldg Permits Issued
# Which columns not needed?  
# DECISION: KEEPING ALL
# REASON: Might want this information for popups on visualization

df_bldg_issued.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33909 entries, 0 to 33908
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   permit_number         33909 non-null  object        
 1   permit_type_descr     33909 non-null  object        
 2   permit_subtype_descr  33909 non-null  object        
 3   parcel                33909 non-null  object        
 4   date_entered          33909 non-null  datetime64[ns]
 5   date_issued           33909 non-null  datetime64[ns]
 6   construction_cost     33899 non-null  float64       
 7   address               33909 non-null  object        
 8   city                  33909 non-null  object        
 9   state                 33909 non-null  object        
 10  zip                   33909 non-null  int64         
 11  subdivision_lot       33909 non-null  object        
 12  contact               33908 non-null  object        
 13  permit_type     

In [79]:
# Bldg Permits Issued
# Which PERMIT TYPES are needed to answer data questions?
# DECISION: KEEPING ALL
# REASON: Even though some types only have small counts,
#         several may be in a single neighborhood assoc boundary
#         May group this once I start exploring, but will leave all details in place for now.

df_bldg_issued.permit_type_descr.value_counts()

Building Residential - New                  12780
Building Commercial - Rehab                  3384
Building Residential - Rehab                 3273
Building Residential - Addition              3200
Building Demolition Permit                   2796
Building Sign Permit                         2436
Building Use & Occupancy                     1524
Building Commercial - Tenant Finish Out      1113
Building Commercial - New                    1047
Building Tree Removal Permit                  495
Building Commercial - Addition                338
Building Residential - Roofing / Siding       240
Building Commercial - Roofing / Siding        239
Building Commercial - Shell                   173
Building Residential Rehab Storm Damage       119
Building Residential - Tenant Finish Out      117
Building Commercial - Foundation              114
Building Blasting Permit                      107
Building Residential - Change Contractor       91
Building Residential - Fire Damage             90


In [80]:
# Bldg Permits Issued
# Which PERMIT SUB-TYPES are needed to answer data questions?
# DECISION: KEEPING ALL
# REASON: Even though some types only have small counts,
#         several may be in a single neighborhood assoc boundary
#         May group this once I start exploring, but will leave all details in place for now.

list_bldg_issued_subtype_counts = df_bldg_issued.permit_subtype_descr.value_counts()
print(list_bldg_issued_subtype_counts)

Single Family Residence                    15399
Demolition Permit - Residential             2339
Sign - Ground /  Wall Signs                 2313
Multifamily, Townhome                       1346
General Office, Professional Services       1308
                                           ...  
Outpatient Clinic                              1
Camp, Gymnasiums                               1
Building Contractor Supply, Storage S-2        1
Day Care Center (Over 75) - Other              1
Country Club, Banquet Hall                     1
Name: permit_subtype_descr, Length: 178, dtype: int64


In [81]:
# Planning / Zoning Dept. Applications and Permits Issued
# This data shows ALL PENDING, and only the LAST TWO MONTHS of ISSUED.
# ACTIONS TAKEN BELOW: 
#    1. Converted date fields to datetime
#    2. Dropped the two columns that were all null values

df_planning.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 498 entries, 0 to 520
Data columns (total 27 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   date_submitted              498 non-null    datetime64[ns]
 1   application_type_descr      498 non-null    object        
 2   mpc_case_number             498 non-null    object        
 3   ordinance_number            144 non-null    object        
 4   status                      498 non-null    object        
 5   mpc_meeting_date            498 non-null    datetime64[ns]
 6   mpc_action                  223 non-null    object        
 7   project_name                404 non-null    object        
 8   location                    498 non-null    object        
 9   reviewer                    498 non-null    object        
 10  reviewer_email              442 non-null    object        
 11  case_descr                  489 non-null    object        

In [82]:
# Convert date_submitted to datetime 

df_planning.date_submitted = pd.to_datetime(df_planning.date_submitted)
df_planning.date_submitted.head(2)

0   2019-04-01
1   2019-11-27
Name: date_submitted, dtype: datetime64[ns]

In [83]:
# Convert mpc_meeting_date to datetime 

df_planning.mpc_meeting_date = pd.to_datetime(df_planning.mpc_meeting_date)
df_planning.mpc_meeting_date.head(2)

0   2020-06-11
1   2020-01-16
Name: mpc_meeting_date, dtype: datetime64[ns]

In [84]:
# To count nulls

df_planning.isnull().sum()

date_submitted                  0
application_type_descr          0
mpc_case_number                 0
ordinance_number              354
status                          0
mpc_meeting_date                0
mpc_action                    275
project_name                   94
location                        0
reviewer                        0
reviewer_email                 56
case_descr                      9
applicant                       4
applicant_representative        8
applicant_email                 9
applicant_phone                 7
applicant_address_1             8
applicant_address_2           364
applicant_city                  8
applicant_state                 8
applicant_zip                   7
council_3rd_reading_date      498
council_3rd_reading_action    498
council_district                9
latitude                        9
longitude                       9
mapped_location                 9
dtype: int64

In [85]:
# Dropping the two columns for '...3rd_reading_...' that have all null values. Keeping same df name.

df_planning = df_planning.drop(columns = ['council_3rd_reading_date'
                                          , 'council_3rd_reading_action'
                                         ])
df_planning.isnull().sum()

date_submitted                0
application_type_descr        0
mpc_case_number               0
ordinance_number            354
status                        0
mpc_meeting_date              0
mpc_action                  275
project_name                 94
location                      0
reviewer                      0
reviewer_email               56
case_descr                    9
applicant                     4
applicant_representative      8
applicant_email               9
applicant_phone               7
applicant_address_1           8
applicant_address_2         364
applicant_city                8
applicant_state               8
applicant_zip                 7
council_district              9
latitude                      9
longitude                     9
mapped_location               9
dtype: int64

In [86]:
# Confirming that the columns were dropped.

df_planning.columns

Index(['date_submitted', 'application_type_descr', 'mpc_case_number',
       'ordinance_number', 'status', 'mpc_meeting_date', 'mpc_action',
       'project_name', 'location', 'reviewer', 'reviewer_email', 'case_descr',
       'applicant', 'applicant_representative', 'applicant_email',
       'applicant_phone', 'applicant_address_1', 'applicant_address_2',
       'applicant_city', 'applicant_state', 'applicant_zip',
       'council_district', 'latitude', 'longitude', 'mapped_location'],
      dtype='object')
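
Instead of naming the empty columns by hand, columns that are entirely null can be found and dropped generically. A small sketch on a toy frame (the rows are illustrative; only the column names come from the Planning data):

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_planning with two completely empty columns.
toy = pd.DataFrame({
    "status": ["PENDING", "NEW"],
    "mpc_action": [np.nan, "Approved by MPC"],
    "council_3rd_reading_date": [np.nan, np.nan],
    "council_3rd_reading_action": [np.nan, np.nan],
})

# List the all-null columns first, so the drop is visible and reviewable.
all_null_cols = toy.columns[toy.isna().all()].tolist()
print(all_null_cols)

# how="all" drops a column only if every value in it is null,
# so partially filled columns such as mpc_action are kept.
trimmed = toy.dropna(axis="columns", how="all")
print(trimmed.columns.tolist())
```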

In [87]:
# Looking at value counts for mpc (Metropolitan Planning Commission) actions
# Choosing, again, to leave them all in place, for now.

df_planning.mpc_action.value_counts()

Recommend Approval                         98
Approved by MPC                            59
Approve with Conditions                    53
Deferred Indefinitely by App at MPC         4
Approved by Executive Director              2
Withdrawn                                   2
Deferred Indefinitely by App before MPC     2
Deferred by Applic before MPC               1
Deferred by MPC                             1
Disapprove with Conditions                  1
Name: mpc_action, dtype: int64

In [88]:
# Status has no nulls: This includes every entry
# mpc_action, above, is apparently only filled in when an action is taken.

df_planning.status.value_counts()

PENDING        216
CNCLACTIVE     195
NEW             64
MPCCOMPLETE     22
UNKNOWN          1
Name: status, dtype: int64
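
To check how `status` and `mpc_action` relate, including the rows where no action has been recorded, a cross-tabulation is one option. The sketch below uses a toy frame; the `"(no action recorded)"` label is a placeholder chosen for this example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_planning; rows are illustrative only.
toy = pd.DataFrame({
    "status": ["PENDING", "PENDING", "CNCLACTIVE", "NEW"],
    "mpc_action": [np.nan, "Approved by MPC", "Recommend Approval", np.nan],
})

# crosstab ignores NaN by default, so label the nulls explicitly
# to keep rows without an action in the table.
table = pd.crosstab(
    toy["status"],
    toy["mpc_action"].fillna("(no action recorded)"),
)
print(table)
```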

## Making new category column, for Residential, Commercial, Other

In [89]:
# Building Permit Applications df
# Stackoverflow resource: https://stackoverflow.com/questions/36653419/str-contains-to-create-new-column-in-pandas-dataframe

# Set a default value for new category column
df_bldg_apps['category'] = 'Other'

# Assign Commercial and Residential based on permit_type_descr AND permit_subtype_descr, to capture as many of each as possible
# Checking BOTH type and subtype increased the Commercial count the most
# Note: Residential is assigned last, so it wins when a row matches both
df_bldg_apps.loc[df_bldg_apps['permit_type_descr'].str.contains('Commercial'), 'category'] = 'Commercial'
df_bldg_apps.loc[df_bldg_apps['permit_subtype_descr'].str.contains('Commercial'), 'category'] = 'Commercial'

df_bldg_apps.loc[df_bldg_apps['permit_type_descr'].str.contains('Residential'), 'category'] = 'Residential'
df_bldg_apps.loc[df_bldg_apps['permit_subtype_descr'].str.contains('Residential'), 'category'] = 'Residential'

df_bldg_apps['category'].value_counts()

Residential    2204
Commercial      458
Other           444
Name: category, dtype: int64

In [90]:
# Building Permits Issued df
# Stackoverflow resource: https://stackoverflow.com/questions/36653419/str-contains-to-create-new-column-in-pandas-dataframe

# Set a default value for new category column
df_bldg_issued['category'] = 'Other'

# Assign Commercial and Residential based on permit_type_descr AND permit_subtype_descr, to capture as many of each as possible
# Checking BOTH type and subtype increased the Commercial count the most
# Note: Residential is assigned last, so it wins when a row matches both
df_bldg_issued.loc[df_bldg_issued['permit_type_descr'].str.contains('Commercial'), 'category'] = 'Commercial'
df_bldg_issued.loc[df_bldg_issued['permit_subtype_descr'].str.contains('Commercial'), 'category'] = 'Commercial'

df_bldg_issued.loc[df_bldg_issued['permit_type_descr'].str.contains('Residential'), 'category'] = 'Residential'
df_bldg_issued.loc[df_bldg_issued['permit_subtype_descr'].str.contains('Residential'), 'category'] = 'Residential'

df_bldg_issued['category'].value_counts()

Residential    22893
Commercial      7029
Other           3987
Name: category, dtype: int64
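
Since the same categorisation is applied to both building permit dfs, it could be wrapped in one helper. The sketch below is an alternative formulation using `np.select`, not the code that produced the counts above; `add_category` is a hypothetical name, and the toy rows are illustrative. It keeps the same precedence as the cells above (Residential wins over Commercial, everything else is Other).

```python
import numpy as np
import pandas as pd

def add_category(df):
    """Return a copy of df with a 'category' column: Residential, Commercial, or Other."""
    type_col = df["permit_type_descr"]
    subtype_col = df["permit_subtype_descr"]

    is_residential = (
        type_col.str.contains("Residential", na=False)
        | subtype_col.str.contains("Residential", na=False)
    )
    is_commercial = (
        type_col.str.contains("Commercial", na=False)
        | subtype_col.str.contains("Commercial", na=False)
    )

    out = df.copy()
    # np.select takes the first matching condition, so Residential has priority.
    out["category"] = np.select(
        [is_residential, is_commercial],
        ["Residential", "Commercial"],
        default="Other",
    )
    return out

# Toy rows; the last one is a made-up combination that matches both keywords.
toy = pd.DataFrame({
    "permit_type_descr": [
        "Building Residential - New",
        "Building Commercial - Rehab",
        "Building Demolition Permit",
        "Building Sign Permit",
        "Building Commercial - New",
    ],
    "permit_subtype_descr": [
        "Single Family Residence",
        "General Office, Professional Services",
        "Demolition Permit - Residential",
        "Sign - Ground /  Wall Signs",
        "Demolition Permit - Residential",
    ],
})

categorised = add_category(toy)
print(categorised["category"].value_counts())
```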

In [91]:
# Confirm new column added (at end)

df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,...,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location,lat,lon,category
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,2019-11-18,2019-12-09,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,...,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218",,,Residential
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,2020-03-12,2020-03-12,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,...,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\...",36.042219,-86.764816,Residential


In [92]:
df_planning.head(2)

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_phone,applicant_address_1,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_district,latitude,longitude,mapped_location
0,2019-04-01,Subdivision (Final Plat),2019S-086-001,,PENDING,2020-06-11,,FINAL PLAT RESUBDIVISION OF LOT 3 AND 4 ON THE...,227 MARCIA AVE 37209,Joren Dunnavant,...,615-490-3236,1711 Hayes Street,,Nashville,TN,37203,20 (Mary Carolyn Roberts),36.143923,-86.868254,"(36.143922831000054, -86.86825400699996)"
1,2019-11-27,Specific Plan (Final Site Plan),2016SP-076-008,,PENDING,2020-01-16,,RED OAKS TOWNHOMES,0 DEW ST 37206,Abbie Rickoff,...,615-351-3634,214 Oceanside Drive,,Nashville,TN,37204,06 (Brett Withers),36.165962,-86.75349,"(36.165961579000054, -86.75348957099999)"


In [93]:
# Planning Dept df
# This df doesn't reference commercial vs residential, so 'category' column isn't applicable.

df_planning.application_type_descr.value_counts()

Rezoning                                    86
Subdivision (Final Plat)                    81
Mandatory Referral Easement                 64
Specific Plan (Final Site Plan)             50
Specific Plan (New)                         39
Mandatory Referral Encroachment             24
Community Plan Amendment                    17
Planned Unit Development (Final Site Pl)    15
Mandatory Referral Agreement                15
Downtown Code (Final Site Plan)             13
Subdivision (Concept Plan)                   9
Mandatory Referral Property                  8
Downtown Code (Modify)                       8
Subdivision (Amendment)                      8
Specific Plan (Amend)                        8
Planned Unit Development (Cancel)            8
Planned Unit Development (Amend)             8
Mandatory Referral  R.O.W. Abandonment       7
Urban Design Overlay (Final)                 6
Historic Landmark (New)                      5
Urban Design Overlay (Modify)                3
Mandatory Ref

In [94]:
df_planning.status.value_counts()

PENDING        216
CNCLACTIVE     195
NEW             64
MPCCOMPLETE     22
UNKNOWN          1
Name: status, dtype: int64

## Sorting dfs by dates, descending

In [95]:
# Bldg Permit Applications: Sorting by date_entered, descending.

df_bldg_apps = df_bldg_apps.sort_values(by = 'date_entered', ascending = False)
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,...,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location,lat,lon,category
563,T2020034761,Building Residential - New,Single Family Residence,8211018000,2020-06-04,,518 N 2ND ST,NASHVILLE,TN,37207,...,EASTEND CONTRACTING LLC,CARN,CAA01R301,3830078,to construct 2034SF single family residence. 5...,5.0,"518 N 2ND ST\nNASHVILLE, TN 37207\n(36.18049, ...",36.18049,-86.771778,Residential
1033,T2020034642,Building Residential - Rehab,Single Family Residence,9115008600,2020-06-04,80000.0,221 53RD AVE N,NASHVILLE,TN,37209,...,THE KINGSTON GROUP,CARR,CAA01R301,3829667,Finish out a bonus room (that is already frame...,24.0,"221 53RD AVE N\nNASHVILLE, TN 37209\n(36.14734...",36.147342,-86.850887,Residential


In [96]:
# Double-checking min/max in this df

print(df_bldg_apps.date_entered.min())
print(df_bldg_apps.date_entered.max())

2017-06-01 00:00:00
2020-06-04 00:00:00


In [97]:
# Bldg Permits Issued: Sorting by date_issued, descending.

df_bldg_issued = df_bldg_issued.sort_values(by = 'date_issued', ascending = False)
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,...,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location,lat,lon,category
12549,2020032857,Building Commercial Rehab Storm Damage,"Multifamily, Condominium 3&4 Unit Bldg",8214005500,2020-05-27,2020-06-04,800000.0,186 N 1ST ST,NASHVILLE,TN,...,CACL,CAA03R298,3824603,",there is some remodeling to be performed in t...",5.0,37019300.0,"186 N 1ST ST\nNASHVILLE, TN 37213\n(36.173878,...",36.173878,-86.774064,Commercial
12552,2020033336,Building Residential - Rehab,Single Family Residence,7309021300,2020-05-29,2020-06-04,80000.0,1913 ROSEBANK AVE,NASHVILLE,TN,...,CARR,CAA01R301,3826163,General rehabilitation of home within existing...,7.0,37011500.0,"1913 ROSEBANK AVE\nNASHVILLE, TN 37216\n(36.19...",36.198145,-86.704423,Residential


In [98]:
# Double-checking min/max in this df

print(df_bldg_issued.date_issued.min())
print(df_bldg_issued.date_issued.max())

2017-06-01 00:00:00
2020-06-04 00:00:00


In [99]:
# Planning Dept. Applications & Issued: Sorting by mpc_meeting_date, descending.
# Chose mpc_meeting_date because that 

df_planning = df_planning.sort_values(by = 'mpc_meeting_date', ascending = False)
df_planning.head(2)

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_phone,applicant_address_1,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_district,latitude,longitude,mapped_location
59,2020-05-20,Mandatory Referral Property,2020M-007PR-001,BL2020-305,CNCLACTIVE,2020-07-23,Recommend Approval,WEST HAMILTON ACQUISITION-EASEMENT,3129 W HAMILTON AVE 37218,Sharon O'Conner,...,,"METROPOLITAN COURTHOUSE, SUITE 108",P.O. BOX 196300,NASHVILLE,TN,37219-6300,01 (Jonathan Hall),36.215672,-86.822374,"(36.21567153700005, -86.82237439299996)"
377,2020-05-26,Mandatory Referral R.O.W. Abandonment,2020M-008AB-001,,NEW,2020-07-23,,UNNUMBERED ALLEY (OFF CENTER STREET) RIGHT-OF-...,0 CENTER ST 37138,Sharon O'Conner,...,615-862-8781,720 SOUTH FIFTH STREET,,NASHVILLE,TN,37206,11 (Larry Hagar),36.225994,-86.629702,"(36.22599424900005, -86.62970214499995)"


In [100]:
# Double-checking min/max dates in this df

print("Min date_submitted is: ",df_planning.date_submitted.min())
print("Max date_submitted is: ",df_planning.date_submitted.max())
print("Min mpc_meeting_date is: ",df_planning.mpc_meeting_date.min())
print("Max mpc_meeting_date is: ",df_planning.mpc_meeting_date.max())    # Future date is correct

Min date_submitted is:  2017-02-28 00:00:00
Max date_submitted is:  2020-06-04 00:00:00
Min mpc_meeting_date is:  2017-04-13 00:00:00
Max mpc_meeting_date is:  2020-07-23 00:00:00
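The min/max checks above are repeated for each df, so a small summary helper might save some typing. This is a sketch only: `date_range_summary` is a hypothetical name, the demo frame reuses the four dates printed above, and the 6/5/2020 cutoff is the "data through" date from the notebook header.

```python
import pandas as pd

def date_range_summary(df, date_cols):
    """Return one row per date column with its earliest and latest value."""
    return pd.DataFrame({
        'min': [df[col].min() for col in date_cols],
        'max': [df[col].max() for col in date_cols],
    }, index=list(date_cols))

# Demo frame built from the dates printed above
demo = pd.DataFrame({
    'date_submitted': pd.to_datetime(['2017-02-28', '2020-06-04']),
    'mpc_meeting_date': pd.to_datetime(['2017-04-13', '2020-07-23']),
})
summary = date_range_summary(demo, ['date_submitted', 'mpc_meeting_date'])

# Columns whose latest date falls after the data cutoff (expected for scheduled meetings)
cutoff = pd.Timestamp('2020-06-05')
future_cols = summary.index[summary['max'] > cutoff].tolist()
print(summary)
print(future_cols)
```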


## LAST STEP: Save output files in ../data/cleaned/ as filename_clean

In [101]:
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,...,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location,lat,lon,category
563,T2020034761,Building Residential - New,Single Family Residence,8211018000,2020-06-04,,518 N 2ND ST,NASHVILLE,TN,37207,...,EASTEND CONTRACTING LLC,CARN,CAA01R301,3830078,to construct 2034SF single family residence. 5...,5.0,"518 N 2ND ST\nNASHVILLE, TN 37207\n(36.18049, ...",36.18049,-86.771778,Residential
1033,T2020034642,Building Residential - Rehab,Single Family Residence,9115008600,2020-06-04,80000.0,221 53RD AVE N,NASHVILLE,TN,37209,...,THE KINGSTON GROUP,CARR,CAA01R301,3829667,Finish out a bonus room (that is already frame...,24.0,"221 53RD AVE N\nNASHVILLE, TN 37209\n(36.14734...",36.147342,-86.850887,Residential


In [102]:
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,...,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location,lat,lon,category
12549,2020032857,Building Commercial Rehab Storm Damage,"Multifamily, Condominium 3&4 Unit Bldg",8214005500,2020-05-27,2020-06-04,800000.0,186 N 1ST ST,NASHVILLE,TN,...,CACL,CAA03R298,3824603,",there is some remodeling to be performed in t...",5.0,37019300.0,"186 N 1ST ST\nNASHVILLE, TN 37213\n(36.173878,...",36.173878,-86.774064,Commercial
12552,2020033336,Building Residential - Rehab,Single Family Residence,7309021300,2020-05-29,2020-06-04,80000.0,1913 ROSEBANK AVE,NASHVILLE,TN,...,CARR,CAA01R301,3826163,General rehabilitation of home within existing...,7.0,37011500.0,"1913 ROSEBANK AVE\nNASHVILLE, TN 37216\n(36.19...",36.198145,-86.704423,Residential


In [103]:
df_planning.head(2)

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_phone,applicant_address_1,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_district,latitude,longitude,mapped_location
59,2020-05-20,Mandatory Referral Property,2020M-007PR-001,BL2020-305,CNCLACTIVE,2020-07-23,Recommend Approval,WEST HAMILTON ACQUISITION-EASEMENT,3129 W HAMILTON AVE 37218,Sharon O'Conner,...,,"METROPOLITAN COURTHOUSE, SUITE 108",P.O. BOX 196300,NASHVILLE,TN,37219-6300,01 (Jonathan Hall),36.215672,-86.822374,"(36.21567153700005, -86.82237439299996)"
377,2020-05-26,Mandatory Referral R.O.W. Abandonment,2020M-008AB-001,,NEW,2020-07-23,,UNNUMBERED ALLEY (OFF CENTER STREET) RIGHT-OF-...,0 CENTER ST 37138,Sharon O'Conner,...,615-862-8781,720 SOUTH FIFTH STREET,,NASHVILLE,TN,37206,11 (Larry Hagar),36.225994,-86.629702,"(36.22599424900005, -86.62970214499995)"


In [104]:
df_na_bound.head(2)

Unnamed: 0,name,geometry
0,Historic Buena Vista,"MULTIPOLYGON (((-86.79511 36.17576, -86.79403 ..."
1,Charlotte Park,"MULTIPOLYGON (((-86.87460 36.15758, -86.87317 ..."


## Using Google Geocoding API
- Using this on all 3 dfs
    - Bldg Permit Applications: df_bldg_apps_null_latlon - 4 columns: address, city, state, zip
    - Bldg Permits Issued: df_bldg_issued_null_latlon - 4 columns: address, city, state, zip
    - Planning Dept.: df_planning_null_latlon - 1 column: location (doesn't always include zip)

In [105]:
pd.options.display.max_rows = 50
print(pd.options.display.max_rows)


50


In [106]:
# FOR: Bldg Permit Applications
# FIRST: Concatenate addresses to match Google's format, all in one cell separated by spaces

df_bldg_apps_null_latlon['full_address'] = (df_bldg_apps_null_latlon['address'].map(str) 
                                            + " " + df_bldg_apps_null_latlon['city'].map(str) 
                                            + " " + df_bldg_apps_null_latlon['state'].map(str) 
                                            + " " + df_bldg_apps_null_latlon['zip'].map(str)
                                           )

print(df_bldg_apps_null_latlon.shape)
df_bldg_apps_null_latlon.head(2)

(297, 5)


Unnamed: 0,address,city,state,zip,full_address
0,4929 CHUTNEY DR,ANTIOCH,TN,37013,4929 CHUTNEY DR ANTIOCH TN 37013
1,329 BODDINGTON LN,ANTIOCH,TN,37013,329 BODDINGTON LN ANTIOCH TN 37013


## New problems: Some addresses have "0" street number (new construction) 
- Will check to see if any have lat/lon in main dfs, then decide whether to keep or drop the rows.

## FIXING "0" STREET ADDRESS ISSUE FOR:  BLDG PERMIT APPLICATIONS for [1] Main df AND [2] subset df

In [107]:
# Finding addresses that start with 0 in address files where lat/lon is missing

df_bldg_apps_null_latlon.loc[df_bldg_apps_null_latlon['address'].str.startswith('0')]

Unnamed: 0,address,city,state,zip,full_address
17,0 BROOKSBORO PL,NASHVILLE,TN,37217,0 BROOKSBORO PL NASHVILLE TN 37217
94,0 UNKNOWN,NASHVILLE,TN,0,0 UNKNOWN NASHVILLE TN 0
120,0 ROBINSON RD,OLD HICKORY,TN,37138,0 ROBINSON RD OLD HICKORY TN 37138
136,0 CENTENNIAL BLVD,NASHVILLE,TN,37209,0 CENTENNIAL BLVD NASHVILLE TN 37209
189,0 VESTER RD,WHITES CREEK,TN,37189,0 VESTER RD WHITES CREEK TN 37189
249,0 ROBERTA ST,NASHVILLE,TN,37206,0 ROBERTA ST NASHVILLE TN 37206
255,0 CAROTHERS RD,NOLENSVILLE,TN,37135,0 CAROTHERS RD NOLENSVILLE TN 37135
270,0 ELM HILL PIKE,NASHVILLE,TN,37214,0 ELM HILL PIKE NASHVILLE TN 37214


In [108]:
# Curious to see if lat/lon is available in the FULL df for any of
#      these addresses with "0" house number
# Returns 10 rows, 8 of which don't have lat/lon. The two WITH lat/lon are the same address.
# DECISION: Too few rows to be concerned with. Will drop them in next cell.

df_bldg_apps.loc[df_bldg_apps['address'].str.startswith('0')].head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,...,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location,lat,lon,category
251,T2020034772,Building Use & Occupancy,Master Permit Application,13500043700,2020-06-04,1222424.0,0 BROOKSBORO PL,NASHVILLE,TN,37217,...,Steelhead Building Group,CAUO,CAZ03A001,3830071,Ibex Grocery. REJECTED: COMMERCIAL APPLICATIO...,29.0,"0 BROOKSBORO PL\nNASHVILLE, TN 37217",,,Other
1065,T2020027243,Building Commercial Rehab Storm Damage,"Multifamily, Condominium 3&4 Unit Bldg",7900011600,2020-04-30,,0 CENTENNIAL BLVD,NASHVILLE,TN,37209,...,BluSky Restoration,CACL,CAA03R298,3810950,Replace metal roof and insulation from March 3...,20.0,"0 CENTENNIAL BLVD\nNASHVILLE, TN 37209",,,Commercial


In [109]:
# Building Permit Applications - Keeping all rows that DON'T have "0" for house number (total 10 rows)
# Original dataset 3106 rows. After change, should be 3106 - 10 = 3096 rows

df_bldg_apps = df_bldg_apps[~df_bldg_apps.address.str.startswith('0')]  # tilde means take everything EXCEPT
df_bldg_apps.shape

(3096, 21)

In [110]:
df_bldg_apps = df_bldg_apps.reset_index(drop = True)
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,...,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location,lat,lon,category
0,T2020034761,Building Residential - New,Single Family Residence,8211018000,2020-06-04,,518 N 2ND ST,NASHVILLE,TN,37207,...,EASTEND CONTRACTING LLC,CARN,CAA01R301,3830078,to construct 2034SF single family residence. 5...,5.0,"518 N 2ND ST\nNASHVILLE, TN 37207\n(36.18049, ...",36.18049,-86.771778,Residential
1,T2020034642,Building Residential - Rehab,Single Family Residence,9115008600,2020-06-04,80000.0,221 53RD AVE N,NASHVILLE,TN,37209,...,THE KINGSTON GROUP,CARR,CAA01R301,3829667,Finish out a bonus room (that is already frame...,24.0,"221 53RD AVE N\nNASHVILLE, TN 37209\n(36.14734...",36.147342,-86.850887,Residential


In [111]:
# In subset (addresses only), dropping addresses that start with "0"
# There are 8, which is as expected, per findings in main df. 
# After dropping these 8, will expect there to be 297 - 8 = 289 rows

print(df_bldg_apps_null_latlon.shape)
df_bldg_apps_null_latlon.loc[df_bldg_apps_null_latlon['address'].str.startswith('0')]


(297, 5)


Unnamed: 0,address,city,state,zip,full_address
17,0 BROOKSBORO PL,NASHVILLE,TN,37217,0 BROOKSBORO PL NASHVILLE TN 37217
94,0 UNKNOWN,NASHVILLE,TN,0,0 UNKNOWN NASHVILLE TN 0
120,0 ROBINSON RD,OLD HICKORY,TN,37138,0 ROBINSON RD OLD HICKORY TN 37138
136,0 CENTENNIAL BLVD,NASHVILLE,TN,37209,0 CENTENNIAL BLVD NASHVILLE TN 37209
189,0 VESTER RD,WHITES CREEK,TN,37189,0 VESTER RD WHITES CREEK TN 37189
249,0 ROBERTA ST,NASHVILLE,TN,37206,0 ROBERTA ST NASHVILLE TN 37206
255,0 CAROTHERS RD,NOLENSVILLE,TN,37135,0 CAROTHERS RD NOLENSVILLE TN 37135
270,0 ELM HILL PIKE,NASHVILLE,TN,37214,0 ELM HILL PIKE NASHVILLE TN 37214


In [112]:
# Dropping the rows with addresses that start with "0"
# Confirmed, 289 rows now.

df_bldg_apps_null_latlon = df_bldg_apps_null_latlon[~df_bldg_apps_null_latlon.address.str.startswith('0')]
print(df_bldg_apps_null_latlon.shape)

(289, 5)
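The filter-then-check-shape pattern above is repeated for each df, so it could be folded into one helper that also reports how many rows were removed. This is a sketch under a few assumptions: `drop_zero_house_number` is a hypothetical name, the demo addresses include one invented value (`05 MAIN ST`), and the match is deliberately a bit stricter than `startswith('0')`.

```python
import pandas as pd

def drop_zero_house_number(df, col='address'):
    """Return (kept_rows, n_dropped) after removing rows whose house number is exactly 0."""
    # r'0\s' matches a lone leading "0" followed by whitespace, so "05 MAIN ST" is kept;
    # na=False counts missing addresses as non-matching, so the mask stays strictly boolean
    is_zero = df[col].str.match(r'0\s', na=False)
    kept = df.loc[~is_zero].reset_index(drop=True)
    return kept, int(is_zero.sum())

# Demo rows: two taken from the output above, plus a missing and an invented address
demo = pd.DataFrame({'address': ['0 ROBERTA ST', '221 53RD AVE N', '0 UNKNOWN',
                                 None, '05 MAIN ST']})
kept, n_dropped = drop_zero_house_number(demo)
print(n_dropped, kept.shape)
```

Returning the dropped count makes the "should be N rows" comments checkable with an `assert` instead of by eye.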


## FIXING "0" STREET ADDRESS ISSUE FOR:  BLDG PERMITS ISSUED for [1] Main df AND [2] subset df

In [113]:
# Finding addresses that start with 0
# There are 40 rows total with "0" house number. Not significant amount in df of 33k rows total
# Lat/lon isn't available for most. Will just drop these rows from BOTH the main and subset(address only) dfs

print("Full df shape is: ", df_bldg_issued.shape)
print("Rows that have address with 0 house number: ", df_bldg_issued.loc[df_bldg_issued['address'].str.startswith('0')].shape)
# df_bldg_issued.loc[df_bldg_issued['address'].str.startswith('0')]   # Commented out so it doesn't show df

Full df shape is:  (33909, 23)
Rows that have address with 0 house number:  (40, 23)


In [114]:
# Keeping all rows that DON'T have "0" for house number (dropping 40 rows)
# Original dataset 33909 rows. After change, should be 33909 - 40 = 33869 rows

df_bldg_issued = df_bldg_issued[~df_bldg_issued.address.str.startswith('0')]  # tilde means take everything EXCEPT...
df_bldg_issued.shape

(33869, 23)

In [115]:
# In subset (addresses only), dropping addresses that start with "0"
# There are 39. This is correct, because one of the rows with "0" house number had lat/lon so it isn't in this df.
# After dropping these 39, will expect there to be 3928 - 39 = 3889 rows

print("Full df shape is: ", df_bldg_issued_null_latlon.shape)
print("Rows that have address with 0 house number: "
      , df_bldg_issued_null_latlon.loc[df_bldg_issued_null_latlon['address'].str.startswith('0')].shape)
#df_bldg_issued_null_latlon.loc[df_bldg_issued_null_latlon['address'].str.startswith('0')]  #Run this to see full df

Full df shape is:  (3928, 4)
Rows that have address with 0 house number:  (39, 4)


In [116]:
# Dropping the rows with addresses that start with "0"
# Confirmed, 3889 rows.

df_bldg_issued_null_latlon = df_bldg_issued_null_latlon[~df_bldg_issued_null_latlon.address.str.startswith('0')]
print(df_bldg_issued_null_latlon.shape)

(3889, 4)


## FIXING "0" STREET ADDRESS ISSUE FOR:  PLANNING DEPT for [1] Main df AND [2] subset df

In [117]:
# Finding addresses that start with 0 in the main Planning df
# There are 93 rows total with "0" house number HOWEVER - most of them have lat/lon so they don't need to be dropped

print("Full df shape is: ", df_planning.shape)
print("Rows that have address with 0 house number: ", df_planning.loc[df_planning['location'].str.startswith('0')].shape)
df_planning.loc[df_planning['location'].str.startswith('0')].head(2)

Full df shape is:  (498, 25)
Rows that have address with 0 house number:  (93, 25)


Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_phone,applicant_address_1,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_district,latitude,longitude,mapped_location
377,2020-05-26,Mandatory Referral R.O.W. Abandonment,2020M-008AB-001,,NEW,2020-07-23,,UNNUMBERED ALLEY (OFF CENTER STREET) RIGHT-OF-...,0 CENTER ST 37138,Sharon O'Conner,...,615-862-8781,720 SOUTH FIFTH STREET,,NASHVILLE,TN,37206,11 (Larry Hagar),36.225994,-86.629702,"(36.22599424900005, -86.62970214499995)"
459,2020-03-02,Subdivision (Concept Plan),2020S-078-001,,NEW,2020-07-23,,BELLA SERRA,0 BLUFF RD 37027,Jason Swaggart,...,615-297-5166,516 Heather Place,,Nashville,TN,37204,04 (Robert Swope),36.004748,-86.70563,"(36.00474826200008, -86.70562993799996)"
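Since most of the "0" house-number rows in the Planning df already have coordinates, one option is to look only at the rows that have a "0" house number and are also missing lat/lon. The sketch below assumes the column names shown above (`location`, `latitude`, `longitude`); the helper name and the small demo frame are illustrative only.

```python
import pandas as pd

def zero_number_missing_coords(df, addr_col='location', lat_col='latitude', lon_col='longitude'):
    """Return the rows with a "0" house number that also lack a latitude or longitude."""
    starts_with_zero = df[addr_col].str.startswith('0', na=False)
    missing_coords = df[lat_col].isna() | df[lon_col].isna()
    return df.loc[starts_with_zero & missing_coords]

# Small demo frame; the missing coordinates are filled in as None for illustration
demo = pd.DataFrame({
    'location': ['0 CENTER ST 37138', '0 CANE RIDGE RD', '998 SEVIER ST 37210'],
    'latitude': [36.225994, None, None],
    'longitude': [-86.629702, None, None],
})
to_drop = zero_number_missing_coords(demo)
print(to_drop['location'].tolist())
```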


In [118]:
# In subset (addresses only), checking to see how many have "0" house number
# Only 2 rows
# After dropping these 2, will expect there to be 9 - 2 = 7 rows
# This is a small number of rows, however I want to have the code because I've requested more data from 
#     data.nashville.gov and hope to have several thousand rows soon!

print("Full df shape is: ", df_planning_null_latlon.shape)
print("Rows that have address with 0 house number: "
      , df_planning_null_latlon.loc[df_planning_null_latlon['location'].str.startswith('0')].shape)
df_planning_null_latlon.loc[df_planning_null_latlon['location'].str.startswith('0')]

Full df shape is:  (9, 1)
Rows that have address with 0 house number:  (2, 1)


Unnamed: 0,location
5,0 CANE RIDGE RD
7,0 MURFREESBORO PIKE 37013


In [119]:
# Dropping the rows with addresses that start with "0"
# Confirmed, 7 rows

df_planning_null_latlon = df_planning_null_latlon[~df_planning_null_latlon.location.str.startswith('0')]
print(df_planning_null_latlon.shape)

(7, 1)


## Concatenating addresses for use with Google maps API; and creating new dfs with full_address, only

## Building Permit Applications - Full Address df

In [120]:
# FOR: Bldg Permit Applications
# FIRST: Concatenate addresses to match Google's format, all in one cell separated by spaces

df_bldg_apps_null_latlon['full_address'] = (df_bldg_apps_null_latlon['address'].map(str) 
                                            + " " + df_bldg_apps_null_latlon['city'].map(str) 
                                            + " " + df_bldg_apps_null_latlon['state'].map(str) 
                                            + " " + df_bldg_apps_null_latlon['zip'].map(str)
                                           )
print(df_bldg_apps_null_latlon.shape)
df_bldg_apps_null_latlon.head(2)

(289, 5)


Unnamed: 0,address,city,state,zip,full_address
0,4929 CHUTNEY DR,ANTIOCH,TN,37013,4929 CHUTNEY DR ANTIOCH TN 37013
1,329 BODDINGTON LN,ANTIOCH,TN,37013,329 BODDINGTON LN ANTIOCH TN 37013


In [121]:
# READY TO USE IN GOOGLE MAPS API
# Created new df that has only the full address column

df_bldg_apps_full_address = df_bldg_apps_null_latlon['full_address'].to_frame()
df_bldg_apps_full_address.head(2)

Unnamed: 0,full_address
0,4929 CHUTNEY DR ANTIOCH TN 37013
1,329 BODDINGTON LN ANTIOCH TN 37013


In [122]:
type(df_bldg_apps_full_address)

pandas.core.frame.DataFrame

## Building Permits Issued - Full Address df

In [123]:
# FOR: Bldg Permits Issued
# FIRST: Concatenate addresses to match Google's format, all in one cell separated by spaces

df_bldg_issued_null_latlon['full_address'] = (df_bldg_issued_null_latlon['address'].map(str) 
                                            + " " + df_bldg_issued_null_latlon['city'].map(str) 
                                            + " " + df_bldg_issued_null_latlon['state'].map(str) 
                                            + " " + df_bldg_issued_null_latlon['zip'].map(str)
                                           )
df_bldg_issued_null_latlon.head(2)

Unnamed: 0,address,city,state,zip,full_address
0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,1037 LAWSONS RIDGE DR NASHVILLE TN 37218
1,812 BRIAR CIR,MADISON,TN,37115,812 BRIAR CIR MADISON TN 37115
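The same four-column concatenation is written out for both building permit dfs, so it may be worth a shared helper. A minimal sketch, assuming the four column names used above; `add_full_address` is a hypothetical name. Working on a copy also avoids pandas' `SettingWithCopyWarning`, which can appear when a new column is assigned to a df that was created by filtering another df.

```python
import pandas as pd

def add_full_address(df, cols=('address', 'city', 'state', 'zip')):
    """Return a copy of df with the given columns joined by single spaces."""
    out = df.copy()
    # astype(str) plays the role of .map(str); agg joins the values of each row
    out['full_address'] = out[list(cols)].astype(str).agg(' '.join, axis=1)
    return out

# One row taken from the output above (zip stored as an integer here)
demo = pd.DataFrame({'address': ['4929 CHUTNEY DR'], 'city': ['ANTIOCH'],
                     'state': ['TN'], 'zip': [37013]})
full = add_full_address(demo)['full_address'].iloc[0]
print(full)
```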


In [124]:
# READY TO USE IN GOOGLE MAPS API
# Created new df that has only the full address column

df_bldg_issued_full_address = df_bldg_issued_null_latlon['full_address'].to_frame()
df_bldg_issued_full_address.head(2)

Unnamed: 0,full_address
0,1037 LAWSONS RIDGE DR NASHVILLE TN 37218
1,812 BRIAR CIR MADISON TN 37115


In [125]:
type(df_bldg_issued_full_address)

pandas.core.frame.DataFrame

## Planning Dept - Full Address df
- df_planning_null_latlon only has one column. Will just rename that column to "full_address" and copy it to a new df for use with Google Maps API

In [126]:
# Renaming df_planning_null_latlon, and changing column header to "full_address"

df_planning_null_latlon.columns = ['full_address']

In [127]:
# To identify that this full address doesn't have city/state, naming it accordingly

df_planning_full_address_no_cityst = df_planning_null_latlon['full_address'].to_frame()
df_planning_full_address_no_cityst.head(2)

Unnamed: 0,full_address
0,998 SEVIER ST 37210
1,961 16TH AVE N 37208


## Using Google Maps API to get missing lat/lon
- Made code that works for a single address
- NEXT: Make for loop to run it on a list
- AFTER THAT: Make function 

In [128]:
# Read in 6/16/2020 at 11 am; key removed for security

google_api_key = 'REDACTED'

In [129]:
# A Geocoding API request takes the following form, per:
# https://developers.google.com/maps/documentation/geocoding/intro#GeocodingRequests

endpoint = 'https://maps.googleapis.com/maps/api/geocode/json'
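For reference, a single-address lookup against this endpoint might look roughly like the sketch below. The function names are hypothetical, and the response layout (`status`, then `results[0]['geometry']['location']` with `lat` and `lng`) follows the Geocoding API documentation linked above. The parsing step is kept separate so it can be tried on a hand-written payload without spending API calls.

```python
def extract_latlon(payload):
    """Return (lat, lng) from a Geocoding API JSON payload, or (None, None) if no match."""
    if payload.get('status') != 'OK' or not payload.get('results'):
        return None, None
    location = payload['results'][0]['geometry']['location']
    return location['lat'], location['lng']

def geocode_address(address, api_key,
                    endpoint='https://maps.googleapis.com/maps/api/geocode/json'):
    """Send one address string to the endpoint and return (lat, lng)."""
    import requests  # imported here so extract_latlon can be tried without network access
    response = requests.get(endpoint, params={'address': address, 'key': api_key}, timeout=10)
    response.raise_for_status()
    return extract_latlon(response.json())

# Offline check with a hand-written payload shaped like the documented response
sample = {'status': 'OK',
          'results': [{'geometry': {'location': {'lat': 36.18049, 'lng': -86.771778}}}]}
sample_latlon = extract_latlon(sample)
no_match = extract_latlon({'status': 'ZERO_RESULTS', 'results': []})
print(sample_latlon, no_match)
```

Note that the API names the longitude field `lng`, not `lon`, so it needs renaming before it goes back into these dfs.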

In [130]:
# Experimenting with for loop

for key in df_bldg_apps_full_address.items():   # .iteritems() was removed in pandas 2.0; .items() is the equivalent
    print(key)

('full_address', 0                4929  CHUTNEY DR ANTIOCH TN 37013
1              329  BODDINGTON LN ANTIOCH TN 37013
2              1536  DAVIDGE DR NASHVILLE TN 37221
3                4948  CHUTNEY DR ANTIOCH TN 37013
4             5404  LAKE WATER CT ANTIOCH TN 37013
                          ...                     
292            1328  MARITIME PRT ANTIOCH TN 37013
293              1024  TREVINO PL ANTIOCH TN 37013
294    3418  SHELBY BOTTOMS BND NASHVILLE TN 37206
295             2156  CAREFREE LN ANTIOCH TN 37013
296       4351  STONE HALL BLVD HERMITAGE TN 37076
Name: full_address, Length: 289, dtype: object)
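The output above is a single tuple holding the column name and the entire column, because iterating a DataFrame this way goes column by column, not row by row. To visit one address at a time, the loop can run over the column (a Series) instead. A small sketch with two addresses from this df:

```python
import pandas as pd

demo = pd.DataFrame({'full_address': ['4929 CHUTNEY DR ANTIOCH TN 37013',
                                      '329 BODDINGTON LN ANTIOCH TN 37013']})

# DataFrame.items() yields (column name, column Series): one pair per column
column_pairs = [(name, len(col)) for name, col in demo.items()]

# Series.items() yields (index label, value): one pair per row
row_pairs = list(demo['full_address'].items())

print(column_pairs)
print(row_pairs)
```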


In [131]:
df_bldg_apps_full_address.shape

(289, 1)

In [132]:
# Creating 10-row dataset to use for testing for loop

df_bldg_apps_full_add_tenrows = df_bldg_apps_full_address.iloc[:10]
df_bldg_apps_full_add_tenrows

Unnamed: 0,full_address
0,4929 CHUTNEY DR ANTIOCH TN 37013
1,329 BODDINGTON LN ANTIOCH TN 37013
2,1536 DAVIDGE DR NASHVILLE TN 37221
3,4948 CHUTNEY DR ANTIOCH TN 37013
4,5404 LAKE WATER CT ANTIOCH TN 37013
5,212 MUIR AVE NOLENSVILLE TN 37135
6,816 TWIN FALLS DR JOELTON TN 37080
7,4877 CHUTNEY DR ANTIOCH TN 37013
8,1356 GREENSTONE LN NASHVILLE TN 37221
9,833 HAMILTON CROSSINGS ANTIOCH TN 37013


## This isn't working yet. Taking longer than expected. Setting it aside and focusing on MVP with existing data. If I have time I'll come back to this to get the missing addresses.

- **ISSUE:** Missing lat/lon for about 10% of addresses in each df.
- **GOAL:** Get the lat/lon using Google Maps Geocoding API:
    - Bldg Permit Applications (for ~300 addresses)
    - Bldg Permits Issued (for ~3k addresses)
    - Planning Dept Applications / Issued (for ~10 addresses - maybe lots more later if my public info request is fulfilled by Metro)
- **DONE:** 
    - Dropped all addresses that have "0" as house number
    - Created dfs that have full_address field (only) with spaces, as required by Google Maps API. These are in the ../data/interim folder
        - df_bldg_apps_full_address
        - df_bldg_issued_full_address
        - df_planning_full_address_no_cityst  (street address & zip, only)
    - Created small 10-row df for testing: 
        - df_bldg_apps_full_add_tenrows
    - Wrote for loop that iterates over rows and pulls data from Google
- **NEXT:** 
    - Write function to turn for loop results into df (if that's the best approach?)
    - Pull out lat/lon and get it back into the original df, matched to the correct rows. 

CODE (in MarkDown field so it doesn't run)  

#def get_latlon  
'''  
To get lat/lon from Google Maps Geocoding API  
'''  
**for params in df_bldg_apps_full_add_tenrows.iterrows():  
    params = {  
        'address': df_bldg_apps_full_add_tenrows['full_address']  
        , 'key': google_api_key  
    }  
    response = requests.get(endpoint, params = params)  
    print(type(response.json()))  
    df_bldg_apps_ggl_results = pd.DataFrame.from_dict(response).append(response.json(), ignore_index = True)  
    df_bldg_apps_ggl_results**  

#TRIED THIS, didn't work  df_bldg_apps_ggl_results = pd.DataFrame.from_dict(response.json())  
     
**PUT IN NEXT CELLS, WHEN RUNNING CODE:  
requests.get(endpoint, params = params)  
response = requests.get(endpoint, params = params)  
response.json()#['results'][0]['geometry']['location']    # save responses to a list**  
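Two things in the draft loop above likely explain why it didn't work: `params` is used both as the loop variable and as the request dictionary, and `'address'` is given the whole `full_address` column on every pass, not one address string. The sketch below shows one possible shape for the two NEXT steps (results into a df, then lat/lon back into the original df). It takes the geocoding call as a parameter, so a stand-in with made-up coordinates can be used while testing. All function names are hypothetical. It also assumes the address subset still carries the same index labels as the main df, which may not hold here because the main df is re-indexed in cell 110; matching on a key such as `permit_number` would be a safer alternative.

```python
import pandas as pd

def geocode_frame(addresses, geocode_fn):
    """Geocode each value of a Series, keeping the original index labels."""
    rows = {}
    for idx, address in addresses.items():
        lat, lon = geocode_fn(address)   # one address string per call
        rows[idx] = {'lat': lat, 'lon': lon}
    return pd.DataFrame.from_dict(rows, orient='index')

def fill_missing_latlon(df, results):
    """Fill NaN lat/lon in a copy of df from results, aligning on index labels."""
    out = df.copy()
    out['lat'] = out['lat'].fillna(results['lat'])
    out['lon'] = out['lon'].fillna(results['lon'])
    return out

# Stand-in geocoder with made-up coordinates so the sketch runs offline
fake_lookup = {'4929 CHUTNEY DR ANTIOCH TN 37013': (36.05, -86.65)}

def fake_geocode(address):
    return fake_lookup.get(address, (None, None))

main = pd.DataFrame({'lat': [36.18049, None], 'lon': [-86.771778, None]}, index=[0, 7])
todo = pd.Series({7: '4929 CHUTNEY DR ANTIOCH TN 37013'}, name='full_address')
filled = fill_missing_latlon(main, geocode_frame(todo, fake_geocode))
print(filled)
```

Only rows whose lat/lon are missing get filled, so existing coordinates in the main df are left untouched.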

## Saving cleaned files to use for EDA
- What "cleaned" means:
    - Columns renamed (kept all columns that had data; will subset them during EDA)
    - Data types cleaned (datetime, EPSG)
    - Used regex to extract lat/lon from mapped_location column
    - Deleted rows that had "0" for house number in street address

In [133]:
# Save Building Permit Applications file to data\cleaned folder
# Using index = False to prevent duplicate index from being created when files are read into EDA notebook.

df_bldg_apps.to_csv('../data/cleaned/bldg_permit_applications_clean.csv', index = False)

# Printing shape, will add to 2_eda notebook to validate it when it's read in
df_bldg_apps.shape

(3096, 21)

In [134]:
df_bldg_issued.to_csv('../data/cleaned/bldg_permits_issued_clean.csv', index = False)
df_bldg_issued.shape

(33869, 23)

In [135]:
df_planning.to_csv('../data/cleaned/planning_dept_clean.csv', index = False)
df_planning.shape

(498, 25)

In [136]:
df_na_bound.to_file('../data/cleaned/neighborhood_association_boundaries_clean.shp', index = False)
df_na_bound.shape

(288, 2)
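CSV files don't store data types, so the date columns come back as plain strings unless they are parsed again on read. A sketch of how the 2_eda notebook might read a cleaned file and check it against the shape printed above; `read_clean_csv` is a hypothetical helper, and the demo uses a tiny made-up frame in a temporary folder, not the real files.

```python
import os
import tempfile
import pandas as pd

def read_clean_csv(path, date_cols, expected_shape):
    """Read a cleaned CSV, parse its date columns, and check the shape noted at save time."""
    df = pd.read_csv(path, parse_dates=date_cols)
    assert df.shape == expected_shape, f'expected {expected_shape}, got {df.shape}'
    return df

# Round trip on a tiny made-up frame written to a temporary folder
demo = pd.DataFrame({'permit_number': ['T1', 'T2'],
                     'date_entered': pd.to_datetime(['2020-06-04', '2020-06-03'])})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_clean.csv')
    demo.to_csv(path, index=False)
    reloaded = read_clean_csv(path, ['date_entered'], expected_shape=(2, 2))

print(reloaded.dtypes)
```

For the real files, the expected shapes would be the ones printed above, for example `(3096, 21)` for the building permit applications.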