## Capstone Project "What's Happening in my Neighborhood"
by Lori Butler  
*Notebook and data structure inspired by the **Practical Business Python** article by Chris Moffitt: https://pbpython.com/notebook-process.html*

**Data Questions**
1. Which neighborhoods have had the most, and the least, growth during the past three years, based on the count of building permits by type and on the cost of construction?
2. Do any neighborhoods have recent increases in building permit applications which may signal growth in the near future?
3. In instances where zoning changes are sought, might those be a leading indicator of a subsequent increase in building permit applications? If so, what is the average time lag?

**Data Sources**
- All data is through Friday 6/5/2020
- CSV
    - df_bldg_apps = Building Permit Applications, rolling 3 years
    - df_bldg_issued = Building Permits Issued, rolling 3 years
    - df_planning = Planning (Zoning) Department Applications (all pending), and issued (rolling 2 months after issuance)
- GEOJSON
    - df_na_bound = Neighborhood Boundaries GeoJSON (polygon) folders

**Directory Structure:**
- data
    - raw = contains the unedited CSV and Excel files used as the source of analysis
    - processed = final, clean files used for creating visualizations
    - final = clean files, ready to be used for creating visualizations   
- notebooks
    - 1_dataprep = standard cleanup of columns, data types, and additional data manipulation specific to this project (nulls, lat/lon, rolling week/month, etc.)
    - 2_eda = EDA to create reports, and generate clean files to export to visualization tool.
- notes_and_docs = documents created for reference, metadata, etc
- reports = final reports for presentations
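
The layout above can be captured as path constants so that every notebook refers to the same folders. This is a minimal sketch, assuming the notebooks are run from inside the `notebooks` folder; the helper name `project_paths` is hypothetical and not part of the original notebooks.

```python
from pathlib import Path


def project_paths(root):
    """Return the project folders described above, keyed by short name.

    `root` is the project's top-level folder (assumed to be the parent
    of the `notebooks` folder).
    """
    root = Path(root)
    return {
        "raw": root / "data" / "raw",
        "processed": root / "data" / "processed",
        "final": root / "data" / "final",
        "notebooks": root / "notebooks",
        "notes_and_docs": root / "notes_and_docs",
        "reports": root / "reports",
    }


# From inside the notebooks folder, the project root is one level up.
paths = project_paths("..")
raw_csv = paths["raw"] / "Building_Permit_Applications_2020_06_05.csv"
```

Building paths this way keeps the `../data/raw/...` strings used in the read cells in one place.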

## Data Prep 01 Notebook: Column and Data Type Cleanup

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd   # Prerequisite: Activate geospatial environment via Conda Prompt
import matplotlib.pyplot as plt  
import folium                   
from folium.plugins import MarkerCluster

# May not need this until I do EDA, but importing now as a reminder.
from shapely.geometry import Point  

# To use RegEx to pull out lat/lon from building permit applications/issued
import re         

# To create rolling dates to view past week, 30 days, etc.
from datetime import timedelta  

## Read in raw files

In [2]:
# Building Dept. Permit Applications

df_bldg_apps = pd.read_csv('../data/raw/Building_Permit_Applications_2020_06_05.csv')
df_bldg_apps.head(2)

Unnamed: 0,Permit #,Permit Type Description,Permit Subtype Description,Parcel,Date Entered,Date Issued,Construction Cost,Address,City,State,ZIP,Subdivision / Lot,Contact,Permit Type,Permit Subtype,IVR Tracking #,Purpose,Council District,Mapped Location
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,03/11/2020,,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944..."
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,12/02/2019,,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36...."


In [3]:
# Building Dept. Permits Issued
# low_memory = False was added to remove a low-memory warning. Doing this prevents the
# system from trying to assign dtypes until after the full file has been read
# Resource: https://tinyurl.com/stackoverflow-low-memory

df_bldg_issued = pd.read_csv('../data/raw/Building_Permits_Issued_2020_06_05.csv', low_memory=False)
df_bldg_issued.head(2)

Unnamed: 0,Permit #,Permit Type Description,Permit Subtype Description,Parcel,Date Entered,Date Issued,Construction Cost,Address,City,State,ZIP,Subdivision / Lot,Contact,Permit Type,Permit Subtype,IVR Tracking #,Purpose,Council District,Census Tract,Mapped Location
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,11/18/2019,12/09/2019,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,LOT 49 CARRINGTON PLACE PH 5,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218"
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,03/12/2020,03/12/2020,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,37027,UNIT 70 HEARTHSTONE MANOR CONDOMINIUM PHASE 4,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\..."
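
An alternative to `low_memory=False` is to declare the dtypes of the identifier columns up front, so pandas does not have to infer them chunk by chunk. The sketch below uses a tiny in-memory CSV so it runs without the raw file. Which columns actually trigger the warning is an assumption here (`Permit #` and `Parcel` are guesses based on the sample rows, where parcel values mix digits and letters).

```python
import io

import pandas as pd

# Tiny stand-in for the raw CSV (assumption: the real file has these columns,
# plus others omitted here).
sample_csv = io.StringIO(
    "Permit #,Parcel,ZIP\n"
    "2019070460,058100C04900CO,37218\n"
    "2020016259,10216006100,37027\n"
)

# Declaring dtypes up front means pandas never has to guess for these
# columns, and identifiers keep any leading zeros.
df_sample = pd.read_csv(sample_csv, dtype={"Permit #": str, "Parcel": str})
print(df_sample.dtypes)
```

Reading identifiers as strings also avoids a parcel such as `058100C04900CO` and a purely numeric one ending up with different types in the same column.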


In [4]:
# Planning/Zoning Applications & Issued

df_planning = pd.read_csv('../data/raw/Planning_Department_Development_Applications_2020_06_05.csv')
df_planning.head(2)

Unnamed: 0,Date Submitted,Application Type Description,MPC Case #,Ordinance #,Status,MPC Meeting Date,MPC Action,Project Name,Location,Reviewer,...,Applicant Address 2,Applicant City,Applicant State,Applicant ZIP,Council 3rd Reading Date,Council 3rd Reading Action,Council District,Latitude,Longitude,Mapped Location
0,04/01/2019,Subdivision (Final Plat),2019S-086-001,,PENDING,06/11/2020,,FINAL PLAT RESUBDIVISION OF LOT 3 AND 4 ON THE...,227 MARCIA AVE 37209,Joren Dunnavant,...,,Nashville,TN,37203,,,20 (Mary Carolyn Roberts),36.143923,-86.868254,"(36.143922831000054, -86.86825400699996)"
1,11/27/2019,Specific Plan (Final Site Plan),2016SP-076-008,,PENDING,01/16/2020,,RED OAKS TOWNHOMES,0 DEW ST 37206,Abbie Rickoff,...,,Nashville,TN,37204,,,06 (Brett Withers),36.165962,-86.75349,"(36.165961579000054, -86.75348957099999)"


In [5]:
# Neighborhood Assoc boundaries GIS file, using geopandas

df_na_bound = gpd.read_file('../data/raw/Neighborhood Association Boundaries (GIS)_2020_06_03.geojson')
print(df_na_bound.crs)
df_na_bound.head(2)

epsg:4326


Unnamed: 0,name,geometry
0,Historic Buena Vista,"MULTIPOLYGON (((-86.79511 36.17576, -86.79403 ..."
1,Charlotte Park,"MULTIPOLYGON (((-86.87460 36.15758, -86.87317 ..."


## Column Name Cleanup - Bldg Permit Applications

In [6]:
df_bldg_apps.columns

Index(['Permit #', 'Permit Type Description', 'Permit Subtype Description',
       'Parcel', 'Date Entered', 'Date Issued', 'Construction Cost', 'Address',
       'City', 'State', 'ZIP', 'Subdivision / Lot', 'Contact', 'Permit Type',
       'Permit Subtype', 'IVR Tracking #', 'Purpose', 'Council District',
       'Mapped Location'],
      dtype='object')

In [7]:
df_bldg_apps.columns = (df_bldg_apps.columns
                        .str.replace(" ", "_")
                        .str.replace("/", "_")
                        .str.replace("Description", "descr")
                        .str.replace("#", "number")
                        .str.lower())
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision___lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,03/11/2020,,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944..."
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,12/02/2019,,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36...."


In [8]:
# To correct issue with too many underscores in subdivision_lot column name

df_bldg_apps = df_bldg_apps.rename(columns = {'subdivision___lot': 'subdivision_lot'})
df_bldg_apps.columns

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'date_issued', 'construction_cost', 'address', 'city',
       'state', 'zip', 'subdivision_lot', 'contact', 'permit_type',
       'permit_subtype', 'ivr_tracking_number', 'purpose', 'council_district',
       'mapped_location'],
      dtype='object')
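
The same replace chain is applied to each dataframe, and it turns `Subdivision / Lot` into `subdivision___lot`, which then needs a separate rename. A possible alternative is one helper that collapses any run of spaces and slashes into a single underscore. This is a sketch only; the name `clean_column_names` is hypothetical.

```python
import re

import pandas as pd


def clean_column_names(columns):
    """Return snake_case names for an iterable of raw column labels."""
    cleaned = []
    for name in columns:
        name = name.replace("Description", "descr").replace("#", "number")
        # Turn spaces and slashes into underscores, collapsing any run
        # (e.g. " / ") into a single underscore.
        name = re.sub(r"[\s/]+", "_", name.strip())
        cleaned.append(name.lower())
    return cleaned


df_demo = pd.DataFrame(columns=["Permit #", "Permit Type Description", "Subdivision / Lot"])
df_demo.columns = clean_column_names(df_demo.columns)
print(list(df_demo.columns))
```

With a helper like this, the three cleanup cells would reduce to one call per dataframe and the follow-up rename would no longer be needed.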

In [9]:
# To drop date_issued from building applications df

df_bldg_apps = df_bldg_apps.drop(columns = ['date_issued'])

In [10]:
# To confirm column dropped successfully 

df_bldg_apps.columns  

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'construction_cost', 'address', 'city', 'state', 'zip',
       'subdivision_lot', 'contact', 'permit_type', 'permit_subtype',
       'ivr_tracking_number', 'purpose', 'council_district',
       'mapped_location'],
      dtype='object')

## Column Name Cleanup -  Bldg Permits Issued

In [11]:
df_bldg_issued.columns

Index(['Permit #', 'Permit Type Description', 'Permit Subtype Description',
       'Parcel', 'Date Entered', 'Date Issued', 'Construction Cost', 'Address',
       'City', 'State', 'ZIP', 'Subdivision / Lot', 'Contact', 'Permit Type',
       'Permit Subtype', 'IVR Tracking #', 'Purpose', 'Council District',
       'Census Tract', 'Mapped Location'],
      dtype='object')

In [12]:
df_bldg_issued.columns = (df_bldg_issued.columns
                        .str.replace(" ", "_")
                        .str.replace("/", "_")
                        .str.replace("Description", "descr")
                        .str.replace("#", "number")
                        .str.lower())
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision___lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,11/18/2019,12/09/2019,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,LOT 49 CARRINGTON PLACE PH 5,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218"
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,03/12/2020,03/12/2020,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,37027,UNIT 70 HEARTHSTONE MANOR CONDOMINIUM PHASE 4,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\..."


In [13]:
# To correct issue with too many underscores in subdivision_lot column name

df_bldg_issued = df_bldg_issued.rename(columns = {'subdivision___lot': 'subdivision_lot'})
df_bldg_issued.columns

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'date_issued', 'construction_cost', 'address', 'city',
       'state', 'zip', 'subdivision_lot', 'contact', 'permit_type',
       'permit_subtype', 'ivr_tracking_number', 'purpose', 'council_district',
       'census_tract', 'mapped_location'],
      dtype='object')

## Column Name Cleanup -  Planning Dept

In [14]:
df_planning.columns

Index(['Date Submitted', 'Application Type Description', 'MPC Case #',
       'Ordinance #', 'Status', 'MPC Meeting Date', 'MPC Action',
       'Project Name', 'Location', 'Reviewer', 'Reviewer Email',
       'Case Description', 'Applicant', 'Applicant Representative',
       'Applicant Email', 'Applicant Phone', 'Applicant Address 1',
       'Applicant Address 2', 'Applicant City', 'Applicant State',
       'Applicant ZIP', 'Council 3rd Reading Date',
       'Council 3rd Reading Action', 'Council District', 'Latitude',
       'Longitude', 'Mapped Location'],
      dtype='object')

In [15]:
df_planning.columns = (df_planning.columns
                        .str.replace(" ", "_")
                        .str.replace("/", "_")
                        .str.replace("Description", "descr")
                        .str.replace("#", "number")
                        .str.lower())
df_planning.head(2)

Unnamed: 0,date_submitted,application_type_descr,mpc_case_number,ordinance_number,status,mpc_meeting_date,mpc_action,project_name,location,reviewer,...,applicant_address_2,applicant_city,applicant_state,applicant_zip,council_3rd_reading_date,council_3rd_reading_action,council_district,latitude,longitude,mapped_location
0,04/01/2019,Subdivision (Final Plat),2019S-086-001,,PENDING,06/11/2020,,FINAL PLAT RESUBDIVISION OF LOT 3 AND 4 ON THE...,227 MARCIA AVE 37209,Joren Dunnavant,...,,Nashville,TN,37203,,,20 (Mary Carolyn Roberts),36.143923,-86.868254,"(36.143922831000054, -86.86825400699996)"
1,11/27/2019,Specific Plan (Final Site Plan),2016SP-076-008,,PENDING,01/16/2020,,RED OAKS TOWNHOMES,0 DEW ST 37206,Abbie Rickoff,...,,Nashville,TN,37204,,,06 (Brett Withers),36.165962,-86.75349,"(36.165961579000054, -86.75348957099999)"


## Convert date fields to datetime: Bldg Permit Apps & Issued

In [16]:
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,03/11/2020,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944..."
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,12/02/2019,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36...."


In [17]:
# Confirming that all dtypes are as expected
# NEED TO CHANGE: date_entered should be datetime.

df_bldg_apps.dtypes   

permit_number            object
permit_type_descr        object
permit_subtype_descr     object
parcel                   object
date_entered             object
construction_cost       float64
address                  object
city                     object
state                    object
zip                       int64
subdivision_lot          object
contact                  object
permit_type              object
permit_subtype           object
ivr_tracking_number       int64
purpose                  object
council_district        float64
mapped_location          object
dtype: object

In [18]:
# Further info about datatypes

df_bldg_apps.info()  

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3106 entries, 0 to 3105
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   permit_number         3106 non-null   object 
 1   permit_type_descr     3106 non-null   object 
 2   permit_subtype_descr  3106 non-null   object 
 3   parcel                3106 non-null   object 
 4   date_entered          3106 non-null   object 
 5   construction_cost     1651 non-null   float64
 6   address               3106 non-null   object 
 7   city                  3106 non-null   object 
 8   state                 3106 non-null   object 
 9   zip                   3106 non-null   int64  
 10  subdivision_lot       3105 non-null   object 
 11  contact               3105 non-null   object 
 12  permit_type           3106 non-null   object 
 13  permit_subtype        3106 non-null   object 
 14  ivr_tracking_number   3106 non-null   int64  
 15  purpose              

In [19]:
# date_issued was entirely null and has already been dropped from this df; checking the remaining null counts

df_bldg_apps.isnull().sum()

permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
construction_cost       1455
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            1
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                   22
council_district           7
mapped_location            0
dtype: int64

In [20]:
# Convert date_entered to datetime 

df_bldg_apps.date_entered = pd.to_datetime(df_bldg_apps.date_entered)
df_bldg_apps.date_entered.head()

0   2020-03-11
1   2019-12-02
2   2018-11-29
3   2019-08-07
4   2020-06-04
Name: date_entered, dtype: datetime64[ns]
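
The raw dates look like `03/11/2020`, so the parse can be made explicit rather than inferred. The sketch below assumes every date in the file follows that month/day/year layout (an assumption based only on the sample rows) and uses `errors="coerce"` so that anything unparseable becomes `NaT` instead of raising.

```python
import pandas as pd

# Sample values copied from the head of the applications data, plus one
# deliberately malformed entry to show the coercion behaviour.
raw_dates = pd.Series(["03/11/2020", "12/02/2019", "not a date"])

parsed = pd.to_datetime(raw_dates, format="%m/%d/%Y", errors="coerce")
print(parsed)
```

Counting the `NaT` values afterwards is a quick way to find out whether any rows did not match the expected format.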

In [21]:
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,11/18/2019,12/09/2019,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218,LOT 49 CARRINGTON PLACE PH 5,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218"
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,03/12/2020,03/12/2020,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,37027,UNIT 70 HEARTHSTONE MANOR CONDOMINIUM PHASE 4,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\..."


In [22]:
# Confirming that all dtypes are as expected

df_bldg_issued.dtypes   

permit_number            object
permit_type_descr        object
permit_subtype_descr     object
parcel                   object
date_entered             object
date_issued              object
construction_cost       float64
address                  object
city                     object
state                    object
zip                       int64
subdivision_lot          object
contact                  object
permit_type              object
permit_subtype           object
ivr_tracking_number       int64
purpose                  object
council_district        float64
census_tract            float64
mapped_location          object
dtype: object

In [23]:
# Further info about datatypes

df_bldg_issued.info()  

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33909 entries, 0 to 33908
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   permit_number         33909 non-null  object 
 1   permit_type_descr     33909 non-null  object 
 2   permit_subtype_descr  33909 non-null  object 
 3   parcel                33909 non-null  object 
 4   date_entered          33909 non-null  object 
 5   date_issued           33909 non-null  object 
 6   construction_cost     33899 non-null  float64
 7   address               33909 non-null  object 
 8   city                  33909 non-null  object 
 9   state                 33909 non-null  object 
 10  zip                   33909 non-null  int64  
 11  subdivision_lot       33909 non-null  object 
 12  contact               33908 non-null  object 
 13  permit_type           33909 non-null  object 
 14  permit_subtype        33909 non-null  object 
 15  ivr_tracking_number

In [24]:
# Convert date_entered to datetime 

df_bldg_issued.date_entered = pd.to_datetime(df_bldg_issued.date_entered)
df_bldg_issued.date_entered.head()

0   2019-11-18
1   2020-03-12
2   2019-02-25
3   2019-07-15
4   2019-07-22
Name: date_entered, dtype: datetime64[ns]

In [25]:
# Convert date_issued to datetime 

df_bldg_issued.date_issued = pd.to_datetime(df_bldg_issued.date_issued)
df_bldg_issued.date_issued.head()

0   2019-12-09
1   2020-03-12
2   2019-07-22
3   2019-07-22
4   2019-07-22
Name: date_issued, dtype: datetime64[ns]
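
With both columns stored as datetimes, the rolling windows mentioned in the imports (past week, 30 days) and the gap between entry and issuance can be computed directly. This is a minimal sketch on a three-row stand-in for `df_bldg_issued`. The `recent` helper is hypothetical, and the as-of date is taken from the Data Sources note that all data is through 6/5/2020.

```python
from datetime import timedelta

import pandas as pd

# Small stand-in for df_bldg_issued with the two converted date columns.
df_demo = pd.DataFrame(
    {
        "date_entered": pd.to_datetime(["2019-11-18", "2020-03-12", "2020-05-20"]),
        "date_issued": pd.to_datetime(["2019-12-09", "2020-03-12", "2020-06-01"]),
    }
)

# All data is through 6/5/2020, so that date anchors the rolling windows.
as_of = pd.Timestamp("2020-06-05")


def recent(df, date_col, days):
    """Rows whose date_col falls within the last `days` days before as_of."""
    cutoff = as_of - timedelta(days=days)
    return df.loc[(df[date_col] > cutoff) & (df[date_col] <= as_of)]


issued_last_30 = recent(df_demo, "date_issued", 30)

# Days between application entry and permit issuance.
df_demo["days_to_issue"] = (df_demo["date_issued"] - df_demo["date_entered"]).dt.days
```

Anchoring the window to the data cut-off date rather than to today's date keeps the results reproducible when the notebook is rerun later.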

## Fix mapped_location, pull out lat/lon in Bldg Permit Apps & Issued

In [26]:
# Building Permit Applications

df_bldg_apps.mapped_location.unique()

array(['748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944, -86.879062)',
       '4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36.242681, -86.929594)',
       '4119 MURFREESBORO PIKE\nANTIOCH, TN 37013\n(36.032211, -86.594799)',
       ...,
       '6680 CHARLOTTE PIKE B-5\nNASHVILLE, TN 37209\n(36.136609, -86.883701)',
       '3805 CHARLOTTE AVE\nNASHVILLE, TN 37209\n(36.152561, -86.831473)',
       '5610A GRANNY WHITE PIKE\nBRENTWOOD, TN 37027\n(36.046438, -86.815953)'],
      dtype=object)

In [27]:
# To pull out lat/lng from:
# '748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944, -86.879062)'
# Regular Expression - pattern matching: Created on website https://regex101.com/
# RegEx link:  https://regex101.com/r/cAI6sh/1

# FULL PATTERN to get TWO groups to output:
# .*\((\d*\S\d*)\S\s(\S\d*\S\d*)\)

# Open questions:
# How do I tell it which df and column to look at to test? Do I need to test?
# Is it best to find both at once, or one at a time?
# How do I save the results to new lat/lon columns?

pattern = re.compile(r'.*\((\d*\S\d*)\S\s(\S\d*\S\d*)\)', flags = re.MULTILINE)

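def extract_lat_lon(map_loc):
    """Return (lat, lon) parsed from a mapped_location string, or (NaN, NaN)."""
    try:
        lat_lon_match = pattern.search(map_loc)
        lat = float(lat_lon_match.group(1))
        lon = float(lat_lon_match.group(2))
        return (lat, lon)
    # AttributeError: no coordinates found (search returned None)
    # TypeError: map_loc is not a string (e.g. NaN); ValueError: not a number
    except (AttributeError, TypeError, ValueError):
        return (np.nan, np.nan)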
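
As an alternative to calling `extract_lat_lon` row by row, pandas can apply a pattern to the whole column with `Series.str.extract`, which returns one column per capture group and NaN where nothing matches. The sketch below uses a stricter pattern than the regex101 one. It assumes the coordinates always appear as two decimal numbers in parentheses separated by a comma, which matches the samples but has not been checked against every row.

```python
import pandas as pd

# Two sample values: one with coordinates and one address-only value.
mapped = pd.Series(
    [
        "748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944, -86.879062)",
        "1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218",
    ]
)

# Two named groups: latitude and longitude inside the trailing parentheses.
coord_pattern = r"\((?P<lat>-?\d+\.\d+),\s*(?P<lon>-?\d+\.\d+)\)"

coords = mapped.str.extract(coord_pattern).astype(float)
print(coords)
```

Because the groups are named, the result already carries `lat` and `lon` column names and could be joined back onto the original dataframe by index.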

In [28]:
lat_lon = [extract_lat_lon(map_loc) for map_loc in df_bldg_apps.mapped_location]  #list comprehension

In [29]:
df_bldg_apps['lat'] = [lat for lat, lon in lat_lon]

In [30]:
df_bldg_apps['lon'] = [lon for lat, lon in lat_lon]

In [31]:
df_bldg_apps.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location,lat,lon
0,T2020016213,Building Residential - New,Single Family Residence,10216006100,2020-03-11,,748 DARDEN PL,NASHVILLE,TN,37205,LOT 168 SEC 9 PT 2 HILLWOOD EST,Kingdom Builders of Tennesse,CARN,CAA01R301,3781725,New Single family dwelling. REJECTED: APPLICA...,23.0,"748 DARDEN PL\nNASHVILLE, TN 37205\n(36.125944...",36.125944,-86.879062
1,T2019073204,Building Moving Permit,Moving Permit - Residential,4600002700,2019-12-02,2500.0,4836 BULL RUN RD,ASHLAND CITY,TN,37015,N OF BULL RUN RD W OF OLD HICKORY BLVD,CLAYTON HOMES #054,CAMV,CAZ09A001,3736813,Move existing mobile home from property out of...,1.0,"4836 BULL RUN RD\nASHLAND CITY, TN 37015\n(36....",36.242681,-86.929594


In [32]:
# Find out how many of the mapped locations had only the address, not the lat/lon
# Nulls in lat/lon: 297 out of 3,106 = 9.6%
# Too many to leave as is. Will try to add lat/lon from the US Census geocoding (free) service:
# https://geocoding.geo.census.gov/geocoder/geographies/addressbatch?form

print(df_bldg_apps.shape)
print(df_bldg_apps.isnull().sum())

(3106, 20)
permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
construction_cost       1455
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            1
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                   22
council_district           7
mapped_location            0
lat                      297
lon                      297
dtype: int64


In [33]:
# Building Permits Issued

df_bldg_issued.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33909 entries, 0 to 33908
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   permit_number         33909 non-null  object        
 1   permit_type_descr     33909 non-null  object        
 2   permit_subtype_descr  33909 non-null  object        
 3   parcel                33909 non-null  object        
 4   date_entered          33909 non-null  datetime64[ns]
 5   date_issued           33909 non-null  datetime64[ns]
 6   construction_cost     33899 non-null  float64       
 7   address               33909 non-null  object        
 8   city                  33909 non-null  object        
 9   state                 33909 non-null  object        
 10  zip                   33909 non-null  int64         
 11  subdivision_lot       33909 non-null  object        
 12  contact               33908 non-null  object        
 13  permit_type     

In [34]:
# Review mapped_location in bldg_issued df before applying regex/function

df_bldg_issued.mapped_location.unique()

array(['1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218',
       '210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\n(36.042219, -86.764816)',
       '812 BRIAR CIR\nMADISON, TN 37115', ...,
       '131 EDENWOLD RD\nMADISON, TN 37115\n(36.287001, -86.703591)',
       '110 2ND AVE N\nNASHVILLE, TN 37201\n(36.162296, -86.77544)',
       '1382 RURAL HILL RD 320\nANTIOCH, TN 37013\n(36.056805, -86.649469)'],
      dtype=object)

In [35]:
# Applying function written for building applications to this building permits issued df

lat_lon = [extract_lat_lon(map_loc) for map_loc in df_bldg_issued.mapped_location]  

In [36]:
df_bldg_issued['lat'] = [lat for lat, lon in lat_lon]

In [37]:
df_bldg_issued['lon'] = [lon for lat, lon in lat_lon]

In [38]:
df_bldg_issued.head(2)

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,...,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location,lat,lon
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,2019-11-18,2019-12-09,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,...,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218",,
1,2020016259,Building Residential - Rehab,Single Family Residence,160150A07000CO,2020-03-12,2020-03-12,12000.0,210 HEARTHSTONE MANOR LN,BRENTWOOD,TN,...,ACCESS & MOBILITY INC,CARR,CAA01R301,3781961,to install a new elevator/platform lift from g...,4.0,37018803.0,"210 HEARTHSTONE MANOR LN\nBRENTWOOD, TN 37027\...",36.042219,-86.764816


## Looked at addresses that have no lat/lon from Bldg Permit Application & Issued dfs
- Discovered that a meaningful number of relevant addresses are missing lat/lon
- Will submit these to the Census geocoding tool to get lat/lon
- Chose to do this extra step because of how many addresses are affected:
    - Bldg Permit Applications: 297 out of 3,106 are missing lat/lon = 9.6%, and 249 of those are new residential
    - Bldg Permits Issued: 3,928 out of 33,909 are missing lat/lon = 11.6%
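
The shares quoted above can be recomputed from the null counts and row counts printed by `isnull().sum()` and `shape`. This is a small arithmetic check; the helper name `pct_missing` is hypothetical.

```python
def pct_missing(n_missing, n_rows):
    """Share of rows with a missing value, as a percentage rounded to 0.1."""
    return round(100 * n_missing / n_rows, 1)


# Counts taken from the isnull().sum() and shape outputs in this notebook.
apps_pct = pct_missing(297, 3106)
issued_pct = pct_missing(3928, 33909)
print(apps_pct, issued_pct)
```

On the dataframes themselves, the same figure is available without hard-coded counts as `df_bldg_apps['lat'].isna().mean() * 100`.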

In [39]:
# Find out how many of the mapped locations had only the address, not the lat/lon: Bldg Permit Applications

print(df_bldg_apps.shape)
print(df_bldg_apps.isnull().sum())

(3106, 20)
permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
construction_cost       1455
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            1
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                   22
council_district           7
mapped_location            0
lat                      297
lon                      297
dtype: int64


In [40]:
# Find out how many of the mapped locations had only the address, not the lat/lon: Bldg Permits Issued

print(df_bldg_issued.shape)
print(df_bldg_issued.isnull().sum())

(33909, 22)
permit_number              0
permit_type_descr          0
permit_subtype_descr       0
parcel                     0
date_entered               0
date_issued                0
construction_cost         10
address                    0
city                       0
state                      0
zip                        0
subdivision_lot            0
contact                    1
permit_type                0
permit_subtype             0
ivr_tracking_number        0
purpose                  467
council_district          46
census_tract              43
mapped_location            0
lat                     3928
lon                     3928
dtype: int64


In [41]:
# BLDG PERMIT APPLICATIONS
# First: Checked to see which items had null in lat/lon. Do I need these?
#        YES - there are a lot for new residential permits 
# Second: Created new df to submit to census geocoder website.

df_bldg_apps_null_latlon = df_bldg_apps.loc[df_bldg_apps.lat.isnull()].reset_index(drop = True) 
df_bldg_apps_null_latlon.head()

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,construction_cost,address,city,state,zip,subdivision_lot,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,mapped_location,lat,lon
0,T2020034764,Building Residential - New,Single Family Residence,165140A20100CO,2020-06-04,,4929 CHUTNEY DR,ANTIOCH,TN,37013,LOT 201 DAVENPORT DOWNS PH 2,AMH DEVELOPMENT TENNESSEE GC LLC,CARN,CAA01R301,3830089,to construct a single family residence with 22...,33.0,"4929 CHUTNEY DR\nANTIOCH, TN 37013",,
1,T2020032881,Building Residential - New,Single Family Residence,173100D02400CO,2020-05-27,260000.0,329 BODDINGTON LN,ANTIOCH,TN,37013,LOT 24 DELVIN DOWNS PH 6,CAPITOL HOMES INC,CARN,CAA01R301,3824736,New two story residential home in a approved P...,31.0,"329 BODDINGTON LN\nANTIOCH, TN 37013",,
2,T2019035619,Building Residential - New,Single Family Residence,126140A04500CO,2019-06-18,250794.0,1536 DAVIDGE DR,NASHVILLE,TN,37221,LOT 45 TRAVIS TRACE SUB PH 3,"JONES CO OF TENNESSEE LLC, THE",CARN,CAA01R301,3681462,HARPETH VALLEY WATER AND SEWER DISTRICT; SINGL...,35.0,"1536 DAVIDGE DR\nNASHVILLE, TN 37221",,
3,T2020034044,Building Residential - New,Single Family Residence,165140A06200CO,2020-06-02,,4948 CHUTNEY DR,ANTIOCH,TN,37013,LOT 62 DAVENPORT DOWNS PH 2,AMH DEVELOPMENT TENNESSEE GC LLC,CARN,CAA01R301,3827970,to construct a single family residence with 19...,33.0,"4948 CHUTNEY DR\nANTIOCH, TN 37013",,
4,T2020034080,Building Residential - New,Single Family Residence,165140A18400CO,2020-06-02,,5404 LAKE WATER CT,ANTIOCH,TN,37013,LOT 184 DAVENPORT DOWNS PH 2,AMH DEVELOPMENT TENNESSEE GC LLC,CARN,CAA01R301,3828052,to construct a single family residence with 22...,33.0,"5404 LAKE WATER CT\nANTIOCH, TN 37013",,


In [42]:
# BLDG PERMIT APPLICATIONS
# How many of these are meaningful in Bldg Permit Applications?   
# ANSWER: Most are important to know about for residential/commercial growth

df_bldg_apps_null_latlon.permit_type_descr.value_counts()

Building Residential - New                 249
Building Commercial - Tenant Finish Out     19
Building Use & Occupancy                    10
Building Commercial - Rehab                  4
Building Sign Permit                         4
Building Residential - Addition              3
Building Demolition Permit                   2
Building Commercial - New                    2
Building Commercial Rehab Storm Damage       1
Building Blasting Permit                     1
Building Residential Rehab Storm Damage      1
Building Residential - Roofing / Siding      1
Name: permit_type_descr, dtype: int64
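
The rows without coordinates then need to be written out in a form the Census batch geocoder accepts. The sketch below rests on two assumptions that should be checked against the geocoder's own documentation: that the upload is a header-less CSV of unique ID, street address, city, state, ZIP, and that a single file may hold at most 10,000 rows. The helper name `to_census_batches` is hypothetical.

```python
import pandas as pd

BATCH_LIMIT = 10_000  # assumed per-file row limit for the batch geocoder


def to_census_batches(df, id_col="permit_number", limit=BATCH_LIMIT):
    """Split address rows into header-less CSV strings, one per batch."""
    cols = [id_col, "address", "city", "state", "zip"]
    addresses = df[cols].drop_duplicates(subset=id_col)
    return [
        addresses.iloc[start:start + limit].to_csv(index=False, header=False)
        for start in range(0, len(addresses), limit)
    ]


# Two rows copied from df_bldg_apps_null_latlon.head() above.
df_demo = pd.DataFrame(
    {
        "permit_number": ["T2020034764", "T2020032881"],
        "address": ["4929 CHUTNEY DR", "329 BODDINGTON LN"],
        "city": ["ANTIOCH", "ANTIOCH"],
        "state": ["TN", "TN"],
        "zip": [37013, 37013],
    }
)
batches = to_census_batches(df_demo)
print(batches[0])
```

Using `permit_number` as the unique ID makes it straightforward to merge the returned coordinates back onto the original dataframe.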

In [43]:
# BLDG PERMITS ISSUED
# First: Checked to see which items had null in lat/lon. Do I need these? 
#        YES - there are a lot for new residential permits 
# Second: Created new df to submit to census geocoder website.

df_bldg_issued_null_latlon = df_bldg_issued.loc[df_bldg_issued.lat.isnull()].reset_index(drop = True) 
df_bldg_issued_null_latlon.head()

Unnamed: 0,permit_number,permit_type_descr,permit_subtype_descr,parcel,date_entered,date_issued,construction_cost,address,city,state,...,contact,permit_type,permit_subtype,ivr_tracking_number,purpose,council_district,census_tract,mapped_location,lat,lon
0,2019070460,Building Residential - New,Single Family Residence,058100C04900CO,2019-11-18,2019-12-09,270585.0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,...,CELEBRATION HOMES LLC,CARN,CAA01R301,3733056,To construct a single family residence of 2402...,1.0,37010105.0,"1037 LAWSONS RIDGE DR\nNASHVILLE, TN 37218",,
1,2019011084,Building Use & Occupancy,"Multifamily, Townhome",051100J90000CO,2019-02-25,2019-07-22,0.0,812 BRIAR CIR,MADISON,TN,...,COLE INVESTMENTS LLC,CAUO,CAA03R301,3611315,MASTER PERMIT ONLY - NO CONSTRUCTION - MULTI-...,8.0,37010802.0,"812 BRIAR CIR\nMADISON, TN 37115",,
2,2019043157,Building Residential - Rehab,Single Family Residence,181070A05400CO,2019-07-22,2019-07-22,20000.0,9405 KAPLAN AVE,BRENTWOOD,TN,...,SHAW CONSTRUCTION,CARR,CAA01R301,3693150,Repairs to single family residence due to tree...,31.0,37019114.0,"9405 KAPLAN AVE\nBRENTWOOD, TN 37027",,
3,2019039568,Building Residential - New,Single Family Residence,085040A40800CO,2019-07-03,2019-07-22,300888.0,4327 STONE HALL BLVD,HERMITAGE,TN,...,MERITAGE HOMES OF TENNESSEE INC,CARN,CAA01R301,3687554,New Single Family construction - Total Sq foot...,14.0,37015402.0,"4327 STONE HALL BLVD\nHERMITAGE, TN 37076",,
4,2019045897,Building Residential - New,"Multifamily, Townhome",104150M02700CO,2019-08-01,2019-08-07,296945.0,544 LITTLE CHANNING WAY,NASHVILLE,TN,...,"CERTIFIED CONSTRUCTION SERVICES, LLC",CARN,CAA03R301,3697323,to construct 2636Sf single family residence wi...,18.0,37016900.0,"544 LITTLE CHANNING WAY\nNASHVILLE, TN 37212",,


In [44]:
# How many of these are meaningful in Bldg Permits Issued?   
# ANSWER: Most are important to know about for residential/commercial growth

df_bldg_issued_null_latlon.permit_type_descr.value_counts()

Building Residential - New                  3294
Building Commercial - Tenant Finish Out      172
Building Commercial - New                    100
Building Sign Permit                          74
Building Use & Occupancy                      57
Building Tree Removal Permit                  45
Building Commercial - Rehab                   43
Building Residential - Addition               25
Building Commercial - Foundation              19
Building Residential - Tenant Finish Out      18
Building Residential - Rehab                  14
Building Commercial - Shell                   14
Building Demolition Permit                    13
Building Residential - Amend Permit           12
Building Commercial - Addition                11
Building Blasting Permit                       6
Building Commercial - Roofing / Siding         6
Building Residential - Shell                   2
Building Residential - Change Contractor       1
Building Residential - Roofing / Siding        1
Building Moving Perm

### Creating new dfs for Bldg Permit Applications & Bldg Permits Issued with ONLY addresses - to submit to US Census Geocoder tool
- Will export to CSV, submit to the Census Geocoder tool to get lat & lon, then add the coordinates back to the original file (matching on the address/city/state/zip columns)
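
A minimal sketch of the add-back step described above, assuming the geocoded results have already been reshaped into `address`, `city`, `state`, `zip`, `lat`, `lon` columns. The helper name `fill_latlon`, the permit numbers, and the coordinates are hypothetical and for illustration only.

```python
import numpy as np
import pandas as pd

KEYS = ["address", "city", "state", "zip"]

def fill_latlon(df, geocoded, keys=KEYS):
    """Left-merge geocoded lat/lon onto df and fill only the missing values."""
    # One row per address, so the merge cannot duplicate permit rows.
    lookup = geocoded.drop_duplicates(subset=keys)[keys + ["lat", "lon"]]
    merged = df.merge(lookup, on=keys, how="left", suffixes=("", "_geo"))
    for col in ("lat", "lon"):
        merged[col] = merged[col].fillna(merged[col + "_geo"])
    return merged.drop(columns=["lat_geo", "lon_geo"])

# Toy data: the first permit lacks coordinates, the second already has them.
permits = pd.DataFrame({
    "permit_number": ["A", "B"],            # hypothetical IDs
    "address": ["4929 CHUTNEY DR", "1037 LAWSONS RIDGE DR"],
    "city": ["ANTIOCH", "NASHVILLE"],
    "state": ["TN", "TN"],
    "zip": [37013, 37218],
    "lat": [np.nan, 36.2],                  # made-up coordinates
    "lon": [np.nan, -86.9],
})
geocoded = pd.DataFrame({
    "address": ["4929 CHUTNEY DR"], "city": ["ANTIOCH"],
    "state": ["TN"], "zip": [37013],
    "lat": [36.05], "lon": [-86.6],         # made-up coordinates
})

filled = fill_latlon(permits, geocoded)
print(filled[["permit_number", "lat", "lon"]])
```

The key columns must have the same dtypes on both sides (for example, `zip` as int in both frames), otherwise the merge silently finds no matches.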

In [45]:
df_bldg_apps_null_latlon.columns

Index(['permit_number', 'permit_type_descr', 'permit_subtype_descr', 'parcel',
       'date_entered', 'construction_cost', 'address', 'city', 'state', 'zip',
       'subdivision_lot', 'contact', 'permit_type', 'permit_subtype',
       'ivr_tracking_number', 'purpose', 'council_district', 'mapped_location',
       'lat', 'lon'],
      dtype='object')

In [46]:
# Dropping all but the address columns, in preparation for uploading to geocoder website

df_bldg_apps_null_latlon = df_bldg_apps_null_latlon[['address', 'city', 'state', 'zip']]
df_bldg_apps_null_latlon.head(2)

Unnamed: 0,address,city,state,zip
0,4929 CHUTNEY DR,ANTIOCH,TN,37013
1,329 BODDINGTON LN,ANTIOCH,TN,37013


In [47]:
df_bldg_issued_null_latlon = df_bldg_issued_null_latlon[['address', 'city', 'state', 'zip']]
df_bldg_issued_null_latlon.head()

Unnamed: 0,address,city,state,zip
0,1037 LAWSONS RIDGE DR,NASHVILLE,TN,37218
1,812 BRIAR CIR,MADISON,TN,37115
2,9405 KAPLAN AVE,BRENTWOOD,TN,37027
3,4327 STONE HALL BLVD,HERMITAGE,TN,37076
4,544 LITTLE CHANNING WAY,NASHVILLE,TN,37212


In [48]:
df_bldg_apps_null_latlon.to_csv('../data/interim/bldg_apps_addresses.csv')

In [49]:
df_bldg_issued_null_latlon.to_csv('../data/interim/bldg_issued_addresses.csv')

### Aborted lat/lon search on 6/12/2020. Census Geocoder tool only found 1% of addresses.
- The Census Geocoder tool matched only 1% of the addresses in each file exactly, plus another 1% as non-exact matches in one of the files. The remaining 98-99% weren't found.
- Aborting this process because it isn't worth the time to clean and import just 1%. It would be cumbersome because:
    - Addresses were submitted as 4 columns (address, city, state, zip), but were returned concatenated in a single cell.
    - The tool returns [lat, lon] in a single column, along with quite a bit of other information in columns I don't need.
- Returned files are in ../data/interim folder for future reference
- US Census website, for reference: https://geocoding.geo.census.gov/geocoder/geographies/addressbatch?form
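
For future reference, the cleanup described above (splitting the concatenated address and the single coordinate column) could look roughly like this. This is a sketch under assumptions: the column names `input_address` and `coords`, the comma separators, and the sample coordinates are all hypothetical, and the order of the two coordinate values should be checked against the returned file before trusting the result.

```python
import pandas as pd

def split_geocoder_output(df, address_col="input_address", coord_col="coords",
                          coord_order=("lat", "lon")):
    """Split a concatenated address and a 'x, y' coordinate string into columns."""
    out = df.copy()

    # Split from the right so a comma inside the street part stays intact.
    parts = out[address_col].str.rsplit(pat=",", n=3, expand=True)
    for i, name in enumerate(["address", "city", "state", "zip"]):
        out[name] = parts[i].str.strip()

    # Split the coordinate pair; unmatched rows stay NaN.
    coords = out[coord_col].str.split(",", expand=True).reindex(columns=[0, 1])
    for i, name in enumerate(coord_order):
        out[name] = pd.to_numeric(coords[i].str.strip(), errors="coerce")
    return out

returned = pd.DataFrame({
    "input_address": ["4929 CHUTNEY DR, ANTIOCH, TN, 37013",
                      "329 BODDINGTON LN, ANTIOCH, TN, 37013"],
    "coords": ["36.05, -86.60", None],   # made-up values; second row unmatched
})
parsed = split_geocoder_output(returned)
print(parsed[["address", "city", "state", "zip", "lat", "lon"]])
```

If the file lists longitude first, pass `coord_order=("lon", "lat")`.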

## ABORTED 6/12/2020: Creating new columns for rolling week, month, quarter, year in Bldg Permit Applications & Issued
- Discovered that I can do what's needed in Tableau. Here are two videos that show a couple ways to accomplish this:
- https://www.youtube.com/watch?v=viIKlFsmLWs
- https://www.youtube.com/watch?v=hMsQ8TvVjo4
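
Although this step was moved to Tableau, a pandas version is short enough to keep as a reference. The sketch below assumes "rolling" means trailing windows counted back from the data cutoff (6/5/2020); the column names `entered_month`, `entered_quarter`, and the `last_*` flags are hypothetical.

```python
import pandas as pd

# First five date_entered values from df_bldg_apps, as printed in this notebook.
toy = pd.DataFrame({
    "date_entered": pd.to_datetime(
        ["2020-03-11", "2019-12-02", "2018-11-29", "2019-08-07", "2020-06-04"]
    )
})
as_of = pd.Timestamp("2020-06-05")  # data cutoff stated in Data Sources

# Calendar period labels, useful for grouping.
toy["entered_month"] = toy["date_entered"].dt.to_period("M")
toy["entered_quarter"] = toy["date_entered"].dt.to_period("Q")

# Trailing-window flags: True when the permit was entered within the window.
age_days = (as_of - toy["date_entered"]).dt.days
windows = {"last_7d": 7, "last_30d": 30, "last_90d": 90, "last_365d": 365}
for label, days in windows.items():
    toy[label] = age_days.between(0, days - 1)

print(toy)
```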

In [50]:
df_bldg_apps.date_entered.head()

0   2020-03-11
1   2019-12-02
2   2018-11-29
3   2019-08-07
4   2020-06-04
Name: date_entered, dtype: datetime64[ns]

In [51]:
df_bldg_issued.date_entered.head()

0   2019-11-18
1   2020-03-12
2   2019-02-25
3   2019-07-15
4   2019-07-22
Name: date_entered, dtype: datetime64[ns]

In [52]:
df_bldg_issued.date_issued.head()

0   2019-12-09
1   2020-03-12
2   2019-07-22
3   2019-07-22
4   2019-07-22
Name: date_issued, dtype: datetime64[ns]

## Exploring and Cleaning: Planning Dept Applications / Issued

In [53]:
# Checking nulls, Planning Dept applications

df_planning.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 521 entries, 0 to 520
Data columns (total 27 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   date_submitted              521 non-null    object 
 1   application_type_descr      521 non-null    object 
 2   mpc_case_number             520 non-null    object 
 3   ordinance_number            157 non-null    object 
 4   status                      521 non-null    object 
 5   mpc_meeting_date            521 non-null    object 
 6   mpc_action                  235 non-null    object 
 7   project_name                417 non-null    object 
 8   location                    498 non-null    object 
 9   reviewer                    521 non-null    object 
 10  reviewer_email              465 non-null    object 
 11  case_descr                  503 non-null    object 
 12  applicant                   501 non-null    object 
 13  applicant_representative    497 non

In [54]:
# Taking a look at the types of info in Planning Dept data

print(df_planning.application_type_descr.nunique())
print(df_planning.application_type_descr.value_counts())

31
Rezoning                                    88
Subdivision (Final Plat)                    86
Mandatory Referral Easement                 65
Specific Plan (Final Site Plan)             50
Specific Plan (New)                         40
Mandatory Referral Encroachment             24
Community Plan Amendment                    17
Planned Unit Development (Final Site Pl)    15
Mandatory Referral Agreement                15
Text Amendment                              14
Downtown Code (Final Site Plan)             13
Subdivision (Concept Plan)                   9
Planned Unit Development (Cancel)            8
Planned Unit Development (Amend)             8
Mandatory Referral Property                  8
Subdivision (Amendment)                      8
Downtown Code (Modify)                       8
Specific Plan (Amend)                        8
Mandatory Referral  R.O.W. Abandonment       7
Urban Design Overlay (Final)                 6
Historic Landmark (New)                      5
Urban Desi

In [55]:
# Taking a look at the status values in Planning Dept data

print(df_planning.status.nunique())
print(df_planning.status.value_counts())

5
PENDING        220
CNCLACTIVE     207
NEW             71
MPCCOMPLETE     22
UNKNOWN          1
Name: status, dtype: int64


## Exploring and Cleaning: Neighborhood Assoc Boundaries (GIS)
- The only change needed was to fix the crs code from epsg:4326 to EPSG:4326

In [56]:
df_na_bound.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 288 entries, 0 to 287
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   name      288 non-null    object  
 1   geometry  288 non-null    geometry
dtypes: geometry(1), object(1)
memory usage: 4.6+ KB


In [57]:
# To confirm whether any are null.  No nulls! Good :-)

df_na_bound.isnull().sum()

name        0
geometry    0
dtype: int64

In [58]:
# Confirming crs type

print(df_na_bound.crs)

epsg:4326


In [59]:
# Converting to uppercase EPSG

df_na_bound.crs = "EPSG:4326"
print(df_na_bound.crs)

EPSG:4326


## Review permit types, permit subtypes, etc. in Bldg Permit Applications, Bldg Permits Issued, and Planning dfs.
- Are there any types, subtypes, or columns not needed for EDA?

In [60]:
# Bldg Permit Applications
# Which columns not needed?  
# KEEPING ALL. Might want them for popups on visualization

df_bldg_apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3106 entries, 0 to 3105
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   permit_number         3106 non-null   object        
 1   permit_type_descr     3106 non-null   object        
 2   permit_subtype_descr  3106 non-null   object        
 3   parcel                3106 non-null   object        
 4   date_entered          3106 non-null   datetime64[ns]
 5   construction_cost     1651 non-null   float64       
 6   address               3106 non-null   object        
 7   city                  3106 non-null   object        
 8   state                 3106 non-null   object        
 9   zip                   3106 non-null   int64         
 10  subdivision_lot       3105 non-null   object        
 11  contact               3105 non-null   object        
 12  permit_type           3106 non-null   object        
 13  permit_subtype    

In [61]:
# Bldg Permit Applications
# Which PERMIT TYPES are needed to answer data questions?
# DECISION: KEEPING ALL
# REASON: Even though some types appear only once,
#         several of those single-count types may fall within the same neighborhood assoc boundary.
#         May group these once I start exploring, but will leave all details in place for now.

df_bldg_apps.permit_type_descr.value_counts()

Building Residential - New                  1184
Building Residential - Addition              388
Building Use & Occupancy                     331
Building Residential - Rehab                 308
Building Demolition Permit                   266
Building Commercial - Rehab                  161
Building Sign Permit                         145
Building Commercial - Tenant Finish Out       80
Building Commercial - New                     63
Building Residential Rehab Storm Damage       38
Building Residential - Roofing / Siding       27
Building Tree Removal Permit                  25
Building Commercial - Addition                20
Building Residential New Storm Damage         16
Building Commercial - Foundation              10
Building Commercial Rehab Storm Damage         9
Building Commercial - Shell                    8
Building Blasting Permit                       6
Building Commercial - Roofing / Siding         6
Building Moving Permit                         4
Building Residential

In [62]:
# Bldg Permit Applications
# Which PERMIT SUB-TYPES are needed to answer data questions?
# DECISION: KEEPING ALL
# REASON: Even though some sub-types appear only once,
#         several of those single-count sub-types may fall within the same neighborhood assoc boundary.
#         May group these once I start exploring, but will leave all details in place for now.

# To get full results that aren't truncated:
pd.options.display.max_rows = 4000
print(pd.options.display.max_rows)

list_bldg_apps_subtype_counts = df_bldg_apps.permit_subtype_descr.value_counts()
print(list_bldg_apps_subtype_counts)

4000
Single Family Residence                     1420
Demolition Permit - Residential              173
Accessory Structure, Garage                  167
Sign - Ground /  Wall Signs                  136
Tents, Stages                                 99
Demolition Permit - Commercial                92
Multifamily, Townhome                         83
Multifamily, Condominium > 5 Unit Bldg        69
General Office, Professional Services         68
Accessory Structure, Shed / Storage Bldg      68
Accessory Structure, Decks                    68
Accessory Structure, Pools - Residential      59
Duplex                                        58
Retail, Department / Retail Stores            49
Multifamily, Tri-Plex, Quad, Apartments       38
Home Occupation, Single Family Residence      30
Master Permit Application                     30
Detached Accessory Dwelling Unit              28
Accessory Structure, Carport                  27
Multifamily, Condominium 3&4 Unit Bldg        24
Tree Removal Pe
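
Setting `pd.options.display.max_rows` changes the display for the rest of the notebook. As an alternative sketch, `pd.option_context` lifts the limit only inside a `with` block, and `Series.to_string()` prints every row regardless of the display setting. The toy Series here stands in for the sub-type counts.

```python
import pandas as pd

# Toy Series long enough to be truncated under the default display settings.
counts = pd.Series(range(100), name="n")

# Option 1: lift the row limit only for this block.
with pd.option_context("display.max_rows", None):
    full_repr = repr(counts)

# Option 2: render all rows explicitly, independent of display options.
full_text = counts.to_string()

print(len(full_repr.splitlines()), len(full_text.splitlines()))
```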

In [63]:
# Bldg Permits Issued
# Which columns not needed?  
# KEEPING ALL. Might want them for popups on visualization

df_bldg_issued.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33909 entries, 0 to 33908
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   permit_number         33909 non-null  object        
 1   permit_type_descr     33909 non-null  object        
 2   permit_subtype_descr  33909 non-null  object        
 3   parcel                33909 non-null  object        
 4   date_entered          33909 non-null  datetime64[ns]
 5   date_issued           33909 non-null  datetime64[ns]
 6   construction_cost     33899 non-null  float64       
 7   address               33909 non-null  object        
 8   city                  33909 non-null  object        
 9   state                 33909 non-null  object        
 10  zip                   33909 non-null  int64         
 11  subdivision_lot       33909 non-null  object        
 12  contact               33908 non-null  object        
 13  permit_type     

In [64]:
# Bldg Permits Issued
# Which PERMIT TYPES are needed to answer data questions?
# DECISION: KEEPING ALL
# REASON: Even though some types only have small counts,
#         several may be in a single neighborhood assoc boundary
#         May group this once I start exploring, but will leave all details in place for now.

df_bldg_issued.permit_type_descr.value_counts()

Building Residential - New                  12780
Building Commercial - Rehab                  3384
Building Residential - Rehab                 3273
Building Residential - Addition              3200
Building Demolition Permit                   2796
Building Sign Permit                         2436
Building Use & Occupancy                     1524
Building Commercial - Tenant Finish Out      1113
Building Commercial - New                    1047
Building Tree Removal Permit                  495
Building Commercial - Addition                338
Building Residential - Roofing / Siding       240
Building Commercial - Roofing / Siding        239
Building Commercial - Shell                   173
Building Residential Rehab Storm Damage       119
Building Residential - Tenant Finish Out      117
Building Commercial - Foundation              114
Building Blasting Permit                      107
Building Residential - Change Contractor       91
Building Residential - Fire Damage             90


In [65]:
# Bldg Permits Issued
# Which PERMIT SUB-TYPES are needed to answer data questions?
# DECISION: KEEPING ALL
# REASON: Even though some types only have small counts,
#         several may be in a single neighborhood assoc boundary
#         May group this once I start exploring, but will leave all details in place for now.

list_bldg_issued_subtype_counts = df_bldg_issued.permit_subtype_descr.value_counts()
print(list_bldg_issued_subtype_counts)

Single Family Residence                     15399
Demolition Permit - Residential              2339
Sign - Ground /  Wall Signs                  2313
Multifamily, Townhome                        1346
General Office, Professional Services        1308
Accessory Structure, Garage                  1227
Accessory Structure, Decks                    714
Multifamily, Tri-Plex, Quad, Apartments       647
Retail, Department / Retail Stores            623
Multifamily, Apt / Twnhome > 5 Unit Bldg      538
Accessory Structure, Pools - Residential      536
Multifamily, Condominium > 5 Unit Bldg        534
Accessory Structure, Shed / Storage Bldg      526
Tree Removal Permit                           476
Demolition Permit - Commercial                429
Restaurant (Full Service)                     420
Accessory Structure, Carport                  334
Warehouse, Storage S-1                        287
Medical Office, Professional Services         268
Detached Accessory Dwelling Unit              210


In [66]:
# Planning / Zoning Dept. Applications and Permits Issued
# This data shows ALL PENDING, and only the LAST TWO MONTHS of ISSUED.
# ACTIONS TAKEN BELOW: 
#    1. Converted date fields to datetime
#    2. Dropped the two columns that were all null values

df_planning.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 521 entries, 0 to 520
Data columns (total 27 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   date_submitted              521 non-null    object 
 1   application_type_descr      521 non-null    object 
 2   mpc_case_number             520 non-null    object 
 3   ordinance_number            157 non-null    object 
 4   status                      521 non-null    object 
 5   mpc_meeting_date            521 non-null    object 
 6   mpc_action                  235 non-null    object 
 7   project_name                417 non-null    object 
 8   location                    498 non-null    object 
 9   reviewer                    521 non-null    object 
 10  reviewer_email              465 non-null    object 
 11  case_descr                  503 non-null    object 
 12  applicant                   501 non-null    object 
 13  applicant_representative    497 non

In [67]:
# Convert date_submitted to datetime 

df_planning.date_submitted = pd.to_datetime(df_planning.date_submitted)
df_planning.date_submitted.head()

0   2019-04-01
1   2019-11-27
2   2020-02-14
3   2019-12-26
4   2020-02-12
Name: date_submitted, dtype: datetime64[ns]

In [68]:
# Convert mpc_meeting_date to datetime 

df_planning.mpc_meeting_date = pd.to_datetime(df_planning.mpc_meeting_date)
df_planning.mpc_meeting_date.head()

0   2020-06-11
1   2020-01-16
2   2020-04-09
3   2020-06-11
4   2020-05-14
Name: mpc_meeting_date, dtype: datetime64[ns]
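
Both date columns can also be converted in one pass, with a check for values that fail to parse. This is a sketch on toy data: the `%m/%d/%Y` format is an assumption about the raw file, and the `"not a date"` entry is there only to show what `errors="coerce"` does.

```python
import pandas as pd

# Toy stand-in for the raw Planning data (string dates, one bad value).
toy = pd.DataFrame({
    "date_submitted": ["04/01/2019", "11/27/2019", "not a date"],
    "mpc_meeting_date": ["06/11/2020", "01/16/2020", "04/09/2020"],
})

date_cols = ["date_submitted", "mpc_meeting_date"]
for col in date_cols:
    # Unparseable values become NaT instead of raising an error.
    toy[col] = pd.to_datetime(toy[col], format="%m/%d/%Y", errors="coerce")

# Count values that could not be parsed, per column.
n_unparsed = toy[date_cols].isna().sum()
print(toy.dtypes)
print(n_unparsed)
```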

In [69]:
# To count nulls

df_planning.isnull().sum()

date_submitted                  0
application_type_descr          0
mpc_case_number                 1
ordinance_number              364
status                          0
mpc_meeting_date                0
mpc_action                    286
project_name                  104
location                       23
reviewer                        0
reviewer_email                 56
case_descr                     18
applicant                      20
applicant_representative       24
applicant_email                25
applicant_phone                23
applicant_address_1            24
applicant_address_2           382
applicant_city                 24
applicant_state                24
applicant_zip                  24
council_3rd_reading_date      521
council_3rd_reading_action    521
council_district               32
latitude                       32
longitude                      32
mapped_location                32
dtype: int64

In [70]:
# Dropping two columns that have all null values. Keeping same df name.

df_planning = df_planning.drop(columns = ['council_3rd_reading_date'
                                          , 'council_3rd_reading_action'
                                         ])
df_planning.isnull().sum()

date_submitted                0
application_type_descr        0
mpc_case_number               1
ordinance_number            364
status                        0
mpc_meeting_date              0
mpc_action                  286
project_name                104
location                     23
reviewer                      0
reviewer_email               56
case_descr                   18
applicant                    20
applicant_representative     24
applicant_email              25
applicant_phone              23
applicant_address_1          24
applicant_address_2         382
applicant_city               24
applicant_state              24
applicant_zip                24
council_district             32
latitude                     32
longitude                    32
mapped_location              32
dtype: int64

In [71]:
# Confirming that the columns were dropped.

df_planning.columns

Index(['date_submitted', 'application_type_descr', 'mpc_case_number',
       'ordinance_number', 'status', 'mpc_meeting_date', 'mpc_action',
       'project_name', 'location', 'reviewer', 'reviewer_email', 'case_descr',
       'applicant', 'applicant_representative', 'applicant_email',
       'applicant_phone', 'applicant_address_1', 'applicant_address_2',
       'applicant_city', 'applicant_state', 'applicant_zip',
       'council_district', 'latitude', 'longitude', 'mapped_location'],
      dtype='object')
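
Instead of naming the empty columns by hand, they can be found and dropped programmatically. A minimal sketch on a toy frame follows; the column names come from the Planning data, but the values (including the ordinance number) are placeholders.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "status": ["PENDING", "NEW"],
    "ordinance_number": [None, "X"],                 # placeholder value
    "council_3rd_reading_date": [np.nan, np.nan],    # entirely null
    "council_3rd_reading_action": [None, None],      # entirely null
})

# List the columns where every value is null, then drop them.
all_null = toy.columns[toy.isna().all()].tolist()
cleaned = toy.dropna(axis="columns", how="all")

print(all_null)
print(cleaned.columns.tolist())
```

Printing `all_null` first keeps a record of what was removed, which is useful if the source data changes later.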

In [72]:
# Looking at value counts for mpc (Metropolitan Planning Commission) actions
# Choosing, again, to leave them all in place, for now.

df_planning.mpc_action.value_counts()

Recommend Approval                         99
Approved by MPC                            62
Approve with Conditions                    59
Deferred Indefinitely by App at MPC         4
Withdrawn                                   2
Deferred Indefinitely by App before MPC     2
Approved by Executive Director              2
Disapprove with Conditions                  1
Deferred by MPC                             1
Deferred by Applic before MPC               1
Neutral / No Position                       1
Disapprove                                  1
Name: mpc_action, dtype: int64

In [73]:
# Status has no nulls: This includes every entry
# mpc_action, above, is apparently only filled in when an action is taken.

df_planning.status.value_counts()

PENDING        220
CNCLACTIVE     207
NEW             71
MPCCOMPLETE     22
UNKNOWN          1
Name: status, dtype: int64
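
The guess that `mpc_action` is only filled in once an action is taken could be checked by cross-tabulating `status` against whether `mpc_action` is present. This is a sketch on made-up rows; the status and action labels are taken from the outputs above, but their pairing here is illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    "status": ["PENDING", "PENDING", "NEW", "CNCLACTIVE", "MPCCOMPLETE"],
    "mpc_action": [None, "Recommend Approval", None,
                   "Approved by MPC", "Approved by MPC"],
})

# Label each row by whether an MPC action has been recorded.
action_state = (
    toy["mpc_action"].notna()
       .map({True: "action recorded", False: "no action yet"})
       .rename("mpc_action_state")
)

table = pd.crosstab(toy["status"], action_state)
print(table)
```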

## Making new dfs with COUNTS
- df_bldg_apps
- df_bldg_issued
- df_planning


## LAST STEP: Save output files in ../data/processed/filename_clean
NOTE: "clean02" means the cleaning was done in dataprep02 notebook

df.to_csv('../data/processed/filename_clean.csv', index = False)

Save GeoJSON file to .shp (shapefile)