## Capstone Project "What's Happening in my Neighborhood"
by Lori Butler  
*Notebook and data structure inspired by the **Practical Business Python** article by Chris Moffitt: https://pbpython.com/notebook-process.html*

**Data Questions**
1. Which neighborhoods have had the most, and the least, growth during the past three years based on the count of building permits by type, and by cost of construction.
2. Do any neighborhoods have recent increases in building permit applications which may signal growth in the near future?
3. In instances where zoning changes are sought, might those be a leading indicator of a subsequent increase in building permit applications? If so, what is the average time lag?

**Data Sources**
- All data is through Friday 6/5/2020
- CSV
    - df_bldg_apps = Building Permit Applications, rolling 3 years
    - df_bldg_issued = Building Permits Issued, rolling 3 years
    - df_planning = Planning (Zoning) Department Applications (all pending), and issued (rolling 2 months after issuance)
- GEOJSON
    - df_na_bound = Neighborhood Association Boundaries GeoJSON (polygon) file

**Directory Structure:**
- data
    - raw = contains the unedited CSV and Excel files used as the source of analysis
    - interim = scratch/temp location
    - processed = final, clean files used for creating visualizations
    - might_use = data that I considered using (but didn't use)   
- notebooks
    - 1_dataprep = standard cleanup of columns, data types, and additional data manipulation specific to this project (nulls, lat/lon, rolling week/month, etc.)
    - 2_eda = EDA to create reports, and generate clean files to export to visualization tool.
- notes_and_docs = documents created for reference, metadata, etc
- reports = final reports for presentations

## Data Prep 01 Notebook: Column and Data Type Cleanup

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd   # Prerequisite: Activate geospatial environment via Conda Prompt
import matplotlib.pyplot as plt  
import folium                   
from folium.plugins import MarkerCluster

# May not need this until I do EDA, but importing now as a reminder.
from shapely.geometry import Point  

# To use RegEx to pull out lat/lon from building permit applications/issued
import re         

# To create rolling dates to view past week, 30 days, etc.
from datetime import timedelta  

## Read in raw files

In [2]:
# Building Dept. Permit Applications

df_bldg_apps = pd.read_csv('../data/raw/Building_Permit_Applications_2020_06_05.csv')
df_bldg_apps.head(2)

Unnamed: 0,Permit #,Permit Type Description,Permit Subtype Description,Parcel,Date Entered,Date Issued,Construction Cost,Address,City,State,ZIP,Subdivision / Lot,Contact,Permit Type,Permit Subtype,IVR Tracking #,Purpose,Council District,Mapped Location
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,03/11/2020,,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944..."
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,12/02/2019,,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36...."


In [3]:
# Building Dept. Permits Issued
# low_memory=False was added to remove pandas' mixed-dtype warning (DtypeWarning). With it,
# pandas reads the full file before inferring each column's dtype instead of guessing chunk by chunk.
# Resource: https://tinyurl.com/stackoverflow-low-memory

df_bldg_issued = pd.read_csv('../data/raw/Building_Permits_Issued_2020_06_05.csv', low_memory=False)
df_bldg_issued.head(2)

Unnamed: 0,Permit #,Permit Type Description,Permit Subtype Description,Parcel,Date Entered,Date Issued,Construction Cost,Address,City,State,ZIP,Subdivision / Lot,Contact,Permit Type,Permit Subtype,IVR Tracking #,Purpose,Council District,Census Tract,Mapped Location
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,11/18/2019,12/09/2019,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,LOT 49 CARRINGTON PLACE PH 5,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218"
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,03/12/2020,03/12/2020,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,37027,UNIT 70 HEARTHSTONE MANOR CONDOMINIUM PHASE 4,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\..."
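
An alternative to `low_memory=False` is to tell pandas up front which columns should be read as text, so it never has to guess. The sketch below uses a tiny in-memory CSV; which column actually triggers the warning in the real file is not confirmed here, so treating `Parcel` as the mixed-type column is an assumption based on the sample values shown above.

```python
import io

import pandas as pd

# Tiny stand-in for the raw file; the real CSV has many more columns.
csv_text = (
    "Permit #,Parcel,ZIP\n"
    "2019070460,058100C04900CO,37218\n"
    "T2020016213,10216006100,37205\n"
)

# Assumption: 'Permit #' and 'Parcel' hold identifiers, so keep them as strings.
demo_df = pd.read_csv(io.StringIO(csv_text), dtype={'Permit #': str, 'Parcel': str})
print(demo_df.dtypes)
```

Reading identifiers as strings also keeps any leading zeros that a numeric dtype would drop.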


In [4]:
# Planning/Zoning Applications & Issued

df_planning = pd.read_csv('../data/raw/Planning_Department_Development_Applications_2020_06_05.csv')
df_planning.head(2)

Unnamed: 0,Date Submitted,Application Type Description,MPC Case #,Ordinance #,Status,MPC Meeting Date,MPC Action,Project Name,Location,Reviewer,...,Applicant Address 2,Applicant City,Applicant State,Applicant ZIP,Council 3rd Reading Date,Council 3rd Reading Action,Council District,Latitude,Longitude,Mapped Location
0,04/01/2019,Subdivision (Final Plat),2019S-086-001,,PENDING,06/11/2020,,FINAL PLAT RESUBDIVISION OF LOT 3 AND 4 ON THE...,227 MARCIA AVE 37209,Joren Dunnavant,...,,Nashville,TN,37203,,,20 (Mary Carolyn Roberts),36.143923,-86.868254,"(36.143922831000054, -86.86825400699996)"
1,11/27/2019,Specific Plan (Final Site Plan),2016SP-076-008,,PENDING,01/16/2020,,RED OAKS TOWNHOMES,0 DEW ST 37206,Abbie Rickoff,...,,Nashville,TN,37204,,,06 (Brett Withers),36.165962,-86.75349,"(36.165961579000054, -86.75348957099999)"


In [5]:
# Neighborhood Assoc boundaries GIS file, using geopandas

df_na_bound = gpd.read_file('../data/raw/Neighborhood Association Boundaries (GIS)_2020_06_03.geojson')
print(df_na_bound.crs)
df_na_bound.head(2)

epsg:4326


Unnamed: 0,name,geometry
0,Historic Buena Vista,"MULTIPOLYGON (((-86.79511 36.17576, -86.79403 ..."
1,Charlotte Park,"MULTIPOLYGON (((-86.87460 36.15758, -86.87317 ..."


## Column Name Cleanup - Bldg Permit Applications

In [6]:
df_bldg_apps.columns

Index(['Permit #', 'Permit Type Description', 'Permit Subtype Description',
       'Parcel', 'Date Entered', 'Date Issued', 'Construction Cost', 'Address',
       'City', 'State', 'ZIP', 'Subdivision / Lot', 'Contact', 'Permit Type',
       'Permit Subtype', 'IVR Tracking #', 'Purpose', 'Council District',
       'Mapped Location'],
      dtype='object')

In [7]:
df_bldg_apps.columns = (df_bldg_apps.columns
                        .str.replace(" ", "_")
                        .str.replace("/", "_")
                        .str.replace("Description", "descr")
                        .str.replace("#", "number")
                        .str.lower())
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision___lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,03/11/2020,,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944..."
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,12/02/2019,,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36...."


In [8]:
# To correct issue with too many underscores in subdivision_lot column name

df_bldg_apps = df_bldg_apps.rename(columns = {'subdivision___lot': 'subdivision_lot'})
df_bldg_apps.columns

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'date_issued', 'construction_cost', 'address', 'city',
       'state', 'zip', 'subdivision_lot', 'contact', 'permit_type',
       'permit_subtype', 'ivr_tracking_number', 'purpose', 'council_district',
       'mapped_location'],
      dtype='object')
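
The same rename chain is applied to all three dataframes, and the `subdivision___lot` fix is needed each time because " / " turns into three underscores. One possible way to avoid both the repetition and the follow-up rename is a small helper that collapses any run of spaces and slashes into a single underscore. `clean_columns` is a hypothetical name, not part of the original notebook.

```python
import pandas as pd


def clean_columns(columns):
    """Return a lower-case, underscore-separated version of a column Index."""
    return (columns
            .str.replace("Description", "descr", regex=False)
            .str.replace("#", "number", regex=False)
            # Collapse each run of whitespace and/or slashes into ONE underscore
            .str.replace(r"[\s/]+", "_", regex=True)
            .str.lower())


raw_cols = pd.Index(['Permit #', 'Permit Type Description', 'Subdivision / Lot'])
cleaned_cols = clean_columns(raw_cols)
print(list(cleaned_cols))
```

Usage would then be `df_bldg_apps.columns = clean_columns(df_bldg_apps.columns)`, and likewise for the other dataframes.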

In [9]:
# To drop date_issued from building applications df

df_bldg_apps = df_bldg_apps.drop(columns = ['date_issued'])

In [10]:
# To confirm column dropped successfully 

df_bldg_apps.columns  

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'construction_cost', 'address', 'city', 'state', 'zip',
       'subdivision_lot', 'contact', 'permit_type', 'permit_subtype',
       'ivr_tracking_number', 'purpose', 'council_district',
       'mapped_location'],
      dtype='object')

## Column Name Cleanup -  Bldg Permits Issued

In [11]:
df_bldg_issued.columns

Index(['Permit #', 'Permit Type Description', 'Permit Subtype Description',
       'Parcel', 'Date Entered', 'Date Issued', 'Construction Cost', 'Address',
       'City', 'State', 'ZIP', 'Subdivision / Lot', 'Contact', 'Permit Type',
       'Permit Subtype', 'IVR Tracking #', 'Purpose', 'Council District',
       'Census Tract', 'Mapped Location'],
      dtype='object')

In [12]:
df_bldg_issued.columns = (df_bldg_issued.columns
                        .str.replace(" ", "_")
                        .str.replace("/", "_")
                        .str.replace("Description", "descr")
                        .str.replace("#", "number")
                        .str.lower())
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision___lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,11/18/2019,12/09/2019,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,LOT 49 CARRINGTON PLACE PH 5,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218"
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,03/12/2020,03/12/2020,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,37027,UNIT 70 HEARTHSTONE MANOR CONDOMINIUM PHASE 4,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\..."


In [13]:
# To correct issue with too many underscores in subdivision_lot column name

df_bldg_issued = df_bldg_issued.rename(columns = {'subdivision___lot': 'subdivision_lot'})
df_bldg_issued.columns

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'date_issued', 'construction_cost', 'address', 'city',
       'state', 'zip', 'subdivision_lot', 'contact', 'permit_type',
       'permit_subtype', 'ivr_tracking_number', 'purpose', 'council_district',
       'census_tract', 'mapped_location'],
      dtype='object')

## Column Name Cleanup -  Planning Dept

In [14]:
df_planning.columns

Index(['Date Submitted', 'Application Type Description', 'MPC Case #',
       'Ordinance #', 'Status', 'MPC Meeting Date', 'MPC Action',
       'Project Name', 'Location', 'Reviewer', 'Reviewer Email',
       'Case Description', 'Applicant', 'Applicant Representative',
       'Applicant Email', 'Applicant Phone', 'Applicant Address 1',
       'Applicant Address 2', 'Applicant City', 'Applicant State',
       'Applicant ZIP', 'Council 3rd Reading Date',
       'Council 3rd Reading Action', 'Council District', 'Latitude',
       'Longitude', 'Mapped Location'],
      dtype='object')

In [15]:
df_planning.columns = (df_planning.columns
                        .str.replace(" ", "_")
                        .str.replace("/", "_")
                        .str.replace("Description", "descr")
                        .str.replace("#", "number")
                        .str.lower())
df_planning.head(2)

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_3rd_reading_date,council_3rd_reading_action,council_district,latitude,longitude,mapped_location
0,04/01/2019,Subdivision (Final Plat),2019S-086-001,,PENDING,06/11/2020,,FINAL PLAT RESUBDIVISION OF LOT 3 AND 4 ON THE...,227 MARCIA AVE 37209,Joren Dunnavant,...,,Nashville,TN,37203,,,20 (Mary Carolyn Roberts),36.143923,-86.868254,"(36.143922831000054, -86.86825400699996)"
1,11/27/2019,Specific Plan (Final Site Plan),2016SP-076-008,,PENDING,01/16/2020,,RED OAKS TOWNHOMES,0 DEW ST 37206,Abbie Rickoff,...,,Nashville,TN,37204,,,06 (Brett Withers),36.165962,-86.75349,"(36.165961579000054, -86.75348957099999)"


## Convert date fields to datetime: Bldg Permit Apps & Issued

In [16]:
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,03/11/2020,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944..."
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,12/02/2019,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36...."


In [17]:
# Confirming that all dtypes are as expected
# NEED TO CHANGE: date_entered should be datetime.

df_bldg_apps.dtypes   

permit_number            object
permit_type_descr        object
permit_subtype_descr     object
parcel                   object
date_entered             object
construction_cost       float64
address                  object
city                     object
state                    object
zip                       int64
subdivision_lot          object
contact                  object
permit_type              object
permit_subtype           object
ivr_tracking_number       int64
purpose                  object
council_district        float64
mapped_location          object
dtype: object

In [18]:
# Further info about datatypes

df_bldg_apps.info()  

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3106 entries, 0 to 3105
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   permit_number         3106 non-null   object 
 1   permit_type_descr     3106 non-null   object 
 2   permit_subtype_descr  3106 non-null   object 
 3   parcel                3106 non-null   object 
 4   date_entered          3106 non-null   object 
 5   construction_cost     1651 non-null   float64
 6   address               3106 non-null   object 
 7   city                  3106 non-null   object 
 8   state                 3106 non-null   object 
 9   zip                   3106 non-null   int64  
 10  subdivision_lot       3105 non-null   object 
 11  contact               3105 non-null   object 
 12  permit_type           3106 non-null   object 
 13  permit_subtype        3106 non-null   object 
 14  ivr_tracking_number   3106 non-null   int64  
 15  purpose              

In [19]:
# date_issued was entirely null for applications, so it was dropped above. Checking remaining nulls.

df_bldg_apps.isnull().sum()

permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
construction_cost       1455
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            1
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                   22
council_district           7
mapped_location            0
dtype: int64

In [20]:
# Convert date_entered to datetime 

df_bldg_apps.date_entered = pd.to_datetime(df_bldg_apps.date_entered)
df_bldg_apps.date_entered.head()

0   2020-03-11
1   2019-12-02
2   2018-11-29
3   2019-08-07
4   2020-06-04
Name: date_entered, dtype: datetime64[ns]

In [21]:
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,11/18/2019,12/09/2019,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,LOT 49 CARRINGTON PLACE PH 5,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218"
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,03/12/2020,03/12/2020,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,37027,UNIT 70 HEARTHSTONE MANOR CONDOMINIUM PHASE 4,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\..."


In [22]:
# Confirming that all dtypes are as expected

df_bldg_issued.dtypes   

permit_number            object
permit_type_descr        object
permit_subtype_descr     object
parcel                   object
date_entered             object
date_issued              object
construction_cost       float64
address                  object
city                     object
state                    object
zip                       int64
subdivision_lot          object
contact                  object
permit_type              object
permit_subtype           object
ivr_tracking_number       int64
purpose                  object
council_district        float64
census_tract            float64
mapped_location          object
dtype: object

In [23]:
# Further info about datatypes

df_bldg_issued.info()  

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33909 entries, 0 to 33908
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   permit_number         33909 non-null  object 
 1   permit_type_descr     33909 non-null  object 
 2   permit_subtype_descr  33909 non-null  object 
 3   parcel                33909 non-null  object 
 4   date_entered          33909 non-null  object 
 5   date_issued           33909 non-null  object 
 6   construction_cost     33899 non-null  float64
 7   address               33909 non-null  object 
 8   city                  33909 non-null  object 
 9   state                 33909 non-null  object 
 10  zip                   33909 non-null  int64  
 11  subdivision_lot       33909 non-null  object 
 12  contact               33908 non-null  object 
 13  permit_type           33909 non-null  object 
 14  permit_subtype        33909 non-null  object 
 15  ivr_tracking_number

In [24]:
# Convert date_entered to datetime 

df_bldg_issued.date_entered = pd.to_datetime(df_bldg_issued.date_entered)
df_bldg_issued.date_entered.head()

0   2019-11-18
1   2020-03-12
2   2019-02-25
3   2019-07-15
4   2019-07-22
Name: date_entered, dtype: datetime64[ns]

In [25]:
# Convert date_issued to datetime 

df_bldg_issued.date_issued = pd.to_datetime(df_bldg_issued.date_issued)
df_bldg_issued.date_issued.head()

0   2019-12-09
1   2020-03-12
2   2019-07-22
3   2019-07-22
4   2019-07-22
Name: date_issued, dtype: datetime64[ns]
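
The raw dates all appear in month/day/year form (for example `11/18/2019`), so the conversion can also be written with an explicit format. This is a sketch on made-up values, assuming every real date follows that pattern; `errors='coerce'` turns anything that does not parse into `NaT` instead of raising an error.

```python
import pandas as pd

# Two dates in the same layout as the raw files, plus one deliberately bad value
raw_dates = pd.Series(['11/18/2019', '03/12/2020', 'not a date'])

parsed_dates = pd.to_datetime(raw_dates, format='%m/%d/%Y', errors='coerce')
print(parsed_dates)
print("Unparsed rows:", parsed_dates.isna().sum())
```

Stating the format removes any ambiguity between month-first and day-first dates.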

## Fix mapped_location, pull out lat/lon in Bldg Permit Apps & Issued

In [26]:
# Building Permit Applications

df_bldg_apps.mapped_location.unique()

array(['748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944, -86.879062)',
       '4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36.242681, -86.929594)',
       '4119 MURFREESBORO PIKE\nANTIOCH, TN 37013\n(36.032211, -86.594799)',
       ...,
       '6680 CHARLOTTE PIKE B-5\nNASHVILLE, TN 37209\n(36.136609, -86.883701)',
       '3805 CHARLOTTE AVE\nNASHVILLE, TN 37209\n(36.152561, -86.831473)',
       '5610A GRANNY WHITE PIKE\nBRENTWOOD, TN 37027\n(36.046438, -86.815953)'],
      dtype=object)

In [27]:
# To pull out lat/lng from:
# '748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944, -86.879062)'
# Regular Expression - pattern matching: Created on website https://regex101.com/
# RegEx link:  https://regex101.com/r/cAI6sh/1

# FULL PATTERN to get TWO groups to output:
# .*\((\d*\S\d*)\S\s(\S\d*\S\d*)\)

# How do I tell it which df and column to look at to test? Do I need to test?
# Best to find both at once, or one at a time?
# How do I save results to new lat/lon columns?

pattern = re.compile(r'.*\((\d*\S\d*)\S\s(\S\d*\S\d*)\)', flags = re.MULTILINE)

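def extract_lat_lon(map_loc):
    """Return (lat, lon) parsed from a mapped_location string, or (NaN, NaN) if absent."""
    try:
        lat_lon_match = pattern.search(map_loc)
        lat = float(lat_lon_match.group(1))
        lon = float(lat_lon_match.group(2))
        return lat, lon
    except (AttributeError, TypeError, ValueError):
        # No match (AttributeError), non-string input (TypeError), or unparsable number (ValueError)
        return np.nan, np.nan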

In [28]:
lat_lon = [extract_lat_lon(map_loc) for map_loc in df_bldg_apps.mapped_location]  #list comprehension

In [29]:
df_bldg_apps['lat'] = [lat for lat, lon in lat_lon]

In [30]:
df_bldg_apps['lon'] = [lon for lat, lon in lat_lon]
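
To answer the earlier questions about finding both values at once and saving them to new columns: pandas can apply a two-group regex to a whole column with `Series.str.extract`, which returns one column per group. The sketch below is an alternative to the function and list comprehensions above, not the method the notebook uses, and its pattern is a simplified one written for this example.

```python
import pandas as pd

mapped = pd.Series([
    '748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944, -86.879062)',
    '1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218',   # address only, no coordinates
])

# Group 1 = latitude, group 2 = longitude; rows without a match become NaN
coords = mapped.str.extract(r'\((-?\d+\.\d+),\s*(-?\d+\.\d+)\)').astype(float)
coords.columns = ['lat', 'lon']
print(coords)
```

On the real data this could be assigned with `df_bldg_apps[['lat', 'lon']] = coords`.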

In [31]:
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location,lat,lon
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,2020-03-11,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944...",36.125944,-86.879062
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,2019-12-02,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36....",36.242681,-86.929594


In [32]:
# Find out how many of the mapped locations had only the address, not the lat/lon
# Nulls in lat/lon: 297 out of 3,106 = 9.6%
# Too many to leave 'as is'. Will try to add lat/lon from the US Census geocoding (free) service:
# https://geocoding.geo.census.gov/geocoder/geographies/addressbatch?form

print(df_bldg_apps.shape)
print(df_bldg_apps.isnull().sum())

(3106, 20)
permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
construction_cost       1455
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            1
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                   22
council_district           7
mapped_location            0
lat                      297
lon                      297
dtype: int64


In [33]:
# Building Permits Issued

df_bldg_issued.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33909 entries, 0 to 33908
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   permit_number         33909 non-null  object        
 1   permit_type_descr     33909 non-null  object        
 2   permit_subtype_descr  33909 non-null  object        
 3   parcel                33909 non-null  object        
 4   date_entered          33909 non-null  datetime64[ns]
 5   date_issued           33909 non-null  datetime64[ns]
 6   construction_cost     33899 non-null  float64       
 7   address               33909 non-null  object        
 8   city                  33909 non-null  object        
 9   state                 33909 non-null  object        
 10  zip                   33909 non-null  int64         
 11  subdivision_lot       33909 non-null  object        
 12  contact               33908 non-null  object        
 13  permit_type     

In [34]:
# Review mapped_location in bldg_issued df before applying regex/function
df_bldg_issued.mapped_location.unique()

array(['1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218',
       '210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\n(36.042219, -86.764816)',
       '812 BRIAR CIR\nMADISON, TN 37115', ...,
       '131 EDENWOLD RD\nMADISON, TN 37115\n(36.287001, -86.703591)',
       '110 2ND AVE N\nNASHVILLE, TN 37201\n(36.162296, -86.77544)',
       '1382 RURAL HILL RD 320\nANTIOCH, TN 37013\n(36.056805, -86.649469)'],
      dtype=object)

In [35]:
# List comprehension

lat_lon = [extract_lat_lon(map_loc) for map_loc in df_bldg_issued.mapped_location]  

In [36]:
df_bldg_issued['lat'] = [lat for lat, lon in lat_lon]

In [37]:
df_bldg_issued['lon'] = [lon for lat, lon in lat_lon]

In [38]:
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,...,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location,lat,lon
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,2019-11-18,2019-12-09,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,...,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218",,
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,2020-03-12,2020-03-12,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,...,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\...",36.042219,-86.764816


In [39]:
# Find out how many of the mapped locations had only the address, not the lat/lon

print(df_bldg_issued.shape)
print(df_bldg_issued.isnull().sum())

(33909, 22)
permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
date_issued                0
construction_cost         10
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            0
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                  467
council_district          46
census_tract              43
mapped_location            0
lat                     3928
lon                     3928
dtype: int64
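
For the issued permits, 3,928 of 33,909 rows have no coordinates, roughly 11.6%. Rather than working the percentage out by hand, the share of nulls can be read directly from the dataframe. A minimal sketch on made-up rows:

```python
import numpy as np
import pandas as pd

demo_coords = pd.DataFrame({
    'lat': [36.04, np.nan, 36.16, np.nan],
    'lon': [-86.76, np.nan, -86.77, np.nan],
})

# isna() gives True/False per cell; the mean of booleans is the share of True values
missing_share = demo_coords[['lat', 'lon']].isna().mean()
print((missing_share * 100).round(1))
```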


# NEXT ACTION ITEM with dates: Get lat/lon for addresses that didn't have them, using census geocoder website (note is on kanban board)
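
One way to prepare for that step is to pull out only the rows with missing coordinates and split them into upload-sized files. As I understand it, the Census batch geocoder expects a headerless CSV with the columns unique ID, street address, city, state, ZIP, and caps each upload at 10,000 records; both points should be confirmed against the service's own documentation. `build_geocoder_batches` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def build_geocoder_batches(df, batch_size=10_000):
    """Return a list of dataframes holding the rows that still need lat/lon."""
    cols = ['permit_number', 'address', 'city', 'state', 'zip']
    missing = df.loc[df['lat'].isna(), cols]
    return [missing.iloc[i:i + batch_size] for i in range(0, len(missing), batch_size)]


demo_permits = pd.DataFrame({
    'permit_number': ['2019070460', '2020016259', '2019000001'],
    'address': ['1037 LAWSONS RIDGE DR', '210 HEARTHSTONE MANOR LN', '812 BRIAR CIR'],
    'city': ['NASHVILLE', 'BRENTWOOD', 'MADISON'],
    'state': ['TN', 'TN', 'TN'],
    'zip': [37218, 37027, 37115],
    'lat': [np.nan, 36.042219, np.nan],
})

batches = build_geocoder_batches(demo_permits, batch_size=1)
print(len(batches), "batch(es)")
# Each batch could then be written with:
# batch.to_csv('../data/interim/geocode_batch_01.csv', index=False, header=False)
```

The permit number (here with a made-up third row) serves as the unique ID, so the geocoded results can be merged back onto the original dataframe.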

## Creating new columns for rolling week, month, quarter, year in Bldg Permit Applications & Issued

In [40]:
# New columns in df_bldg_apps for prior 
# week, month, quarter, year based on last date in date_entered column
# FIRST: check date format - confirm it is datetime64 (converted above).

df_bldg_apps.date_entered.head()

0   2020-03-11
1   2019-12-02
2   2018-11-29
3   2019-08-07
4   2020-06-04
Name: date_entered, dtype: datetime64[ns]

In [41]:
# Calculate timedelta:  today's date minus max date in date_entered
# https://stackoverflow.com/questions/46459868/add-column-of-new-dates-from-existing-columns-using-pandas

# df_bldg_apps_clean01.prior_week = df_bldg_apps.date_entered - timedelta(days = 7)
# df_bldg_apps_clean01.prior_week.head()
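
A possible approach to the rolling columns, sketched on a few sample dates: take the latest `date_entered` as the reference point, subtract a `timedelta` for each window, and store a True/False flag per row. The window lengths (7, 30, 90, 365 days) and the `prior_*` column names are assumptions for this example.

```python
from datetime import timedelta

import pandas as pd

demo_dates = pd.DataFrame({
    'date_entered': pd.to_datetime(['2020-06-04', '2020-05-20', '2019-12-02', '2018-11-29'])
})

# Reference point: the most recent date in the data, not today's date
last_date = demo_dates['date_entered'].max()

windows = {'prior_week': 7, 'prior_month': 30, 'prior_quarter': 90, 'prior_year': 365}
for col_name, days in windows.items():
    # True when the row falls inside the window ending on last_date
    demo_dates[col_name] = demo_dates['date_entered'] > last_date - timedelta(days=days)

print(demo_dates)
```

Using the last date in the file keeps the flags stable when the notebook is rerun later on the same extract.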

## Exploring and Cleaning: Planning Dept Applications / Issued

In [42]:
# Checking nulls, planning department applications

df_planning.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 521 entries, 0 to 520
Data columns (total 27 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   date_submitted              521 non-null    object 
 1   application_type_descr      521 non-null    object 
 2   mpc_case_number             520 non-null    object 
 3   ordinance_number            157 non-null    object 
 4   status                      521 non-null    object 
 5   mpc_meeting_date            521 non-null    object 
 6   mpc_action                  235 non-null    object 
 7   project_name                417 non-null    object 
 8   location                    498 non-null    object 
 9   reviewer                    521 non-null    object 
 10  reviewer_email              465 non-null    object 
 11  case_descr                  503 non-null    object 
 12  applicant                   501 non-null    object 
 13  applicant_representative    497 non

In [43]:
# Taking a look at the types of info in Planning Dept data

print(df_planning.application_type_descr.nunique())
print(df_planning.application_type_descr.value_counts())

31
Rezoning                                    88
Subdivision (Final Plat)                    86
Mandatory Referral Easement                 65
Specific Plan (Final Site Plan)             50
Specific Plan (New)                         40
Mandatory Referral Encroachment             24
Community Plan Amendment                    17
Mandatory Referral Agreement                15
Planned Unit Development (Final Site Pl)    15
Text Amendment                              14
Downtown Code (Final Site Plan)             13
Subdivision (Concept Plan)                   9
Downtown Code (Modify)                       8
Planned Unit Development (Cancel)            8
Subdivision (Amendment)                      8
Specific Plan (Amend)                        8
Mandatory Referral Property                  8
Planned Unit Development (Amend)             8
Mandatory Referral  R.O.W. Abandonment       7
Urban Design Overlay (Final)                 6
Historic Landmark (New)                      5
Urban Desi

In [44]:
# Taking a look at the status values in Planning Dept data

print(df_planning.status.nunique())
print(df_planning.status.value_counts())

5
PENDING        220
CNCLACTIVE     207
NEW             71
MPCCOMPLETE     22
UNKNOWN          1
Name: status, dtype: int64


## Exploring and Cleaning: Neighborhood Assoc Boundaries (GIS)

In [45]:
df_na_bound.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 288 entries, 0 to 287
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   name      288 non-null    object  
 1   geometry  288 non-null    geometry
dtypes: geometry(1), object(1)
memory usage: 4.6+ KB


In [46]:
df_na_bound.isnull().sum()

name        0
geometry    0
dtype: int64

## LAST STEP: Save output files in .. / data / processed / filename_clean
NOTE: "clean02" means the cleaning was done in dataprep02 notebook

df.to_csv('../data/processed/filename_clean.csv', index = False)

### Need to ask how to save GeoJSON file