# Working with Imperfect Data

### This data set concerns building permits for demolition issued in the past five years or currently in progress

### Attributes:

1. **Application/Permit Number:** *The tracking number used to refer to this application or permit record in various Seattle DCI tracking systems.*
2. **Permit Type:** *Type of activity covered by the permit.*
3. **Address:** *Street address of the work site.*
4. **Description:** *Brief description of the work that will be done under this permit. This is subject to change prior to issuance of the permit, but generally more stable if an issue date exists. Very long descriptions have been truncated.*
5. **Category:** *The broad category of use or occupancy for the building where work is proposed. Valid choices are Commercial, Industrial, Institutional, Multifamily, and Single Family/Duplex. Mixed use structures are generally represented as Commercial.*
6. **Action Type:** *Subclassification for type of work being proposed. Valid choices will vary depending on the permit type.*
7. **Work Type:** *An indicator of the complexity of the project proposed. Easier projects can be issued without plan review; more complex projects generally require plan submittal and review.*
8. **Value:** *The value of the work being proposed based on fair market value (parts plus labor). The value displayed (if any) represents the best available information to date, and is subject to change if the project is modified. Value is not collected for all permit types.*
9. **Applicant Name:** *The name of the person or company listed on the application as the “primary applicant”. This may be the property owner, contractor, design professional, or other type of agent.*
10. **Application Date:** *The date the application was accepted as a complete submittal. If no Application Date exists this generally means the application is in a very early stage.*
11. **Issue Date:** *The date the application was issued as a valid permit. If an Application Date exists but no Issue Date exists, this generally means the application is still under review.*
12. **Final Date:** *The date the permit had all its inspections completed. If an Issue Date exists but no Final Date exists, this generally means the permit is still under inspection.*
13. **Expiration Date:** *The date the application is due to expire. Generally, this is the date by which work is supposed to be completed (barring renewals or further extensions). If no Expiration Date exists, this generally means the application has not been issued yet.*
14. **Status:** *The current status in the application/review/inspection lifecycle. Indicates the last process step that was fully completed.*
15. **Permit and Complaint Status URL:** *Link to view full details and current status information about this permit at Seattle DCI's website.*
16. **Latitude:** *Latitude of the worksite where permit activity occurs. May be missing for a small number of permits considered "Unaddressable".*
17. **Longitude:** *Longitude of the worksite where permit activity occurs. May be missing for a small number of permits considered "Unaddressable".*
18. **Location:** *Mapping coordinates for the permit address.*
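The date attributes above imply a rough lifecycle: no Application Date suggests a very early stage, an Application Date without an Issue Date suggests the application is under review, and an Issue Date without a Final Date suggests the permit is under inspection. The sketch below applies that reading to a small made-up frame. The rows are invented for illustration, `lifecycle_stage` is a hypothetical helper, and the stage labels are our own wording rather than values from the data set.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration; only the column names come from the data set.
toy = pd.DataFrame({
    "Application Date": [None, "01/30/2017", "01/27/2017", "01/26/2017"],
    "Issue Date": [None, None, "02/10/2017", "02/01/2017"],
    "Final Date": [None, None, None, "03/15/2017"],
})

# Parse the MM/DD/YYYY strings; missing values become NaT.
for column in ["Application Date", "Issue Date", "Final Date"]:
    toy[column] = pd.to_datetime(toy[column], format="%m/%d/%Y")


def lifecycle_stage(frame):
    """Hypothetical helper: label each row by the first missing date."""
    conditions = [
        frame["Application Date"].isna(),
        frame["Issue Date"].isna(),
        frame["Final Date"].isna(),
    ]
    labels = ["early stage", "under review", "under inspection"]
    # np.select picks the label of the first condition that is True per row.
    stages = np.select(conditions, labels, default="inspections completed")
    return pd.Series(stages, index=frame.index)


toy["Stage"] = lifecycle_stage(toy)
print(toy)
```

Since the attribute descriptions hedge each rule with "generally", a derived stage like this is best treated as a cross-check against the Status column, not as a replacement for it.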


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
demolition_data = pd.read_csv("../seattle-demolition.csv", sep=",")

*This data set is large and has many columns, several of which we do not need. Because the data set is already filtered, every Permit Type and Action Type value is going to be demolition.
We are also not interested in Latitude, Longitude, and Location, as we will focus on the Address field.
Lastly, we don't need the Permit and Complaint Status URL, as it doesn't concern us at the moment.*

In [3]:
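# Keep only the columns needed for the analysis below.
# Selecting by name raises a KeyError if a column is misspelled or missing.
columns_to_keep = [
    "Application/Permit Number",
    "Address",
    "Description",
    "Category",
    "Work Type",
    "Value",
    "Applicant Name",
    "Application Date",
    "Issue Date",
    "Final Date",
    "Expiration Date",
    "Status",
]
cleaner_demolition_data = demolition_data[columns_to_keep].copy()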

In [4]:
cleaner_demolition_data

Unnamed: 0,Application/Permit Number,Address,Description,Category,Work Type,Value,Applicant Name,Application Date,Issue Date,Final Date,Expiration Date,Status
0,6578443,1908 NOB HILL AVE N,"Demolish existing single family residence, sub...",SINGLE FAMILY / DUPLEX,No plan review,$0.00,"CHRISTIANSON, KIRK",01/30/2017,,,,Reviews Completed
1,6578362,2636 1ST AVE N,Demolish existing SFR per (STFI) Subject to Fi...,SINGLE FAMILY / DUPLEX,No plan review,$0.00,"BIDDLE, DAVE",01/30/2017,,,,Reviews Completed
2,6578484,4027 LATONA AVE NE,Demolish Existing duplex to construct new SFR ...,SINGLE FAMILY / DUPLEX,No plan review,$0.00,"BIDDLE, DAVE",01/27/2017,,,,Reviews Completed
3,6578123,4311 SW BRANDON ST,"Demo existing single family home, subject to f...",COMMERCIAL,No plan review,$0.00,"KENNAN-MEYER, LISA",01/27/2017,,,,Reviews Completed
4,6578249,3909 E HOWELL ST,Demolish existing single family residence for ...,SINGLE FAMILY / DUPLEX,No plan review,$0.00,"ELLIS, CAMPIE",01/27/2017,,,,Reviews Completed
5,6565362,6955 DELRIDGE WAY SW,"Demolish apartment building, foundation to rem...",MULTIFAMILY,Plan Review,$0.00,"SIMONSON, WENDY",01/26/2017,,,,Application Accepted
6,6577721,3935 2ND AVE NE,"Demolish existing single family residence, sub...",MULTIFAMILY,No plan review,$0.00,"LEMONS, JONATHAN",01/26/2017,,,,Reviews Completed
7,6577047,6016 SW ADMIRAL WAY,"Demolish existing duplex, subject to field ins...",SINGLE FAMILY / DUPLEX,No plan review,$0.00,"NOVION, SHAUN",01/26/2017,,,,Reviews Completed
8,6577046,3046 61ST AVE SW,"Demolish existing duplex, subject to field ins...",SINGLE FAMILY / DUPLEX,No plan review,$0.00,"NOVION, SHAUN",01/26/2017,,,,Reviews Completed
9,6578106,3044 38TH AVE SW,Demolition of single famaily residence and ass...,SINGLE FAMILY / DUPLEX,No plan review,$0.00,"DOTY, DWIGHT",01/26/2017,,,,Reviews Completed


### 1. What proportion of the demolition permits are for single-family houses?

In [5]:
single_family = cleaner_demolition_data.loc[cleaner_demolition_data['Category'] == "SINGLE FAMILY / DUPLEX"]

In [6]:
prop_single_family = len(single_family) / len(cleaner_demolition_data) * 100

In [7]:
prop_single_family

61.00820633059789
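The same proportion can be read off for every category at once with `value_counts(normalize=True)`. The sketch below uses a toy frame with invented rows, since the real CSV is not reproduced here; only the `Category` column name and its values come from the data set.

```python
import pandas as pd

# Toy rows invented for illustration.
toy = pd.DataFrame({
    "Category": [
        "SINGLE FAMILY / DUPLEX",
        "SINGLE FAMILY / DUPLEX",
        "COMMERCIAL",
        "MULTIFAMILY",
    ]
})

# Share of each category as a percentage of all rows.
category_share = toy["Category"].value_counts(normalize=True) * 100
single_family_share = category_share["SINGLE FAMILY / DUPLEX"]

# Equivalent one-liner: the mean of a boolean mask is the fraction of True values.
single_family_share_alt = (toy["Category"] == "SINGLE FAMILY / DUPLEX").mean() * 100

print(category_share)
```

Note that the category is "SINGLE FAMILY / DUPLEX", so the figure covers duplexes as well as single-family houses.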

### 2. What is the proportion of pending permits?

In [8]:
pending_permit = cleaner_demolition_data.loc[cleaner_demolition_data["Status"] != "Permit Issued"]

*This excludes every permit whose status is "Permit Issued", but some of the remaining permits are still not pending.*

In [9]:
pending_permit = pending_permit.loc[pending_permit["Status"] != "Permit Closed"]

*Closed permits are now excluded as well. We still need to account for the cancelled ones.*

In [10]:
pending_permit = pending_permit.loc[pending_permit["Status"] != "CANCELLED"]

In [11]:
len(pending_permit) / len(cleaner_demolition_data) * 100

24.126611957796015

*About 24.13% of the permits are pending.*
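The three successive filters can also be written as a single step with `isin`, which keeps the list of non-pending statuses in one place. This is a minimal sketch on invented rows. The three excluded status strings are the ones used in this notebook; treating every other status as pending is an assumption carried over from the analysis here.

```python
import pandas as pd

# Toy rows invented for illustration; the status strings appear in the data set.
toy = pd.DataFrame({
    "Status": [
        "Permit Issued",
        "Permit Closed",
        "CANCELLED",
        "Reviews Completed",
        "Application Accepted",
    ]
})

# Statuses treated as "not pending" in this analysis.
not_pending_statuses = ["Permit Issued", "Permit Closed", "CANCELLED"]

# True for rows whose status is not in the excluded list.
is_pending = ~toy["Status"].isin(not_pending_statuses)
pending = toy.loc[is_pending]

# The mean of a boolean mask is the fraction of True values.
pending_share = is_pending.mean() * 100
print(pending_share)
```

Building the mask from the frame being filtered also avoids relying on index alignment between a subset and the full frame.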

### 3. Who has the most pending applications?

In [12]:
applicants_pending = pending_permit["Applicant Name"].value_counts()

In [13]:
applicants_pending

WEBER, JULIAN                50
O'HARE, JON                  41
PATTERSON-O'HARE, JODI       32
NOVION, SHAUN                27
KHOURI, BRADLEY              19
ARD, BRITTANI                18
BIDDLE, DAVE                 17
LEMONS, JONATHAN             16
HUMBLE, ROBERT               15
SQUIRES, GREG                13
NOVION, EINAR                12
CHRISTIANSON, KIRK           11
NOVION, ANDREW               11
PIERCE, PAUL                 10
SCHAEFFER, HUGH               9
CARTER, TIM                   9
PAROLINE, ANDY                8
SPAAN, RANDALL                7
DUFFUS, DAN                   7
BRANT, GREG                   6
JACKSON, MICHAEL              5
SUSSEX, JAMES                 5
TALLAR, PETER                 4
LINARDIC, ED                  4
DRISCOLL, MATT                4
BARTHOLOMEW, TOM              4
NEIMAN, DAVID                 4
BULL, STEVE                   4
WHOOLERY, AKASHA              4
TRAN, BEN                     4
                             ..
GARY, BR

### 4. Who has the most applications?

In [14]:
applicants = cleaner_demolition_data["Applicant Name"].value_counts()

In [15]:
applicants

WEBER, JULIAN                229
BIDDLE, DAVE                 200
PATTERSON-O'HARE, JODI       157
NOVION, EINAR                107
NOVION, SHAUN                101
NOVION, ANDREW                85
KHOURI, BRADLEY               80
O'HARE, JON                   79
LEDOUX, JULIE                 60
PIERCE, PAUL                  55
CHRISTIANSON, KIRK            49
ARD, BRITTANI                 49
SCHAEFFER, HUGH               49
HUMBLE, ROBERT                47
ZHANG, MOON                   41
SQUIRES, GREG                 41
WHOOLERY, AKASHA              38
STRAIN, JESSICA               34
STEPHENSON, RYAN              31
COBB, PATRICK                 31
KATSAROS, ESTER               30
HAIZLIP, MARK                 30
WEGENER, JEFF                 27
BRANT, GREG                   25
TALLAR, PETER                 24
BELCHER, CRAIG                24
WIERENGA, MARK                21
LEMONS, JONATHAN              20
DUFFUS, DAN                   20
TRAN, BEN                     18
          


*We can theorize that applicants with several applications are contractors, while applicants with a single application may be homeowners.*

### What is the proportion of single-application applicants vs. multiple-application applicants?

In [16]:
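# `applicants` holds one application count per applicant name (from value_counts).
# Count the applicants with exactly one application; all others have several.
total_single_application_applicants = int((applicants == 1).sum())
total_multiple_applications_applicants = len(applicants) - total_single_application_applicants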

In [17]:
print("multiple = " + str(total_multiple_applications_applicants/len(applicants)*100) + "%")
print("single = " + str(total_single_application_applicants/len(applicants) * 100) + "%")

multiple = 34.4275420336269%
single = 65.5724579663731%
