San Francisco Building Permits

A Planning Permit is a legal document that gives you permission to use or develop land in a certain way. It usually includes conditions and approved plans which must be complied with. 

A Building Permit is a legal document, issued before construction commences, to ensure the building meets the minimum requirements for the health, safety and amenity of occupants and the public.

HISTORICAL PERSPECTIVE
* The Department of Building Inspection (DBI) was created by voter referendum which removed the Bureau of Building Inspection from the Department of Public Works. 
* The creation of the new department was the result of the public’s demand for more responsive and more responsible building inspection services. 
VISION
Work with all San Franciscans for a safer community.
MISSION
Under the direction and management of the seven-member citizen Building Inspection
Commission, to oversee the effective, efficient, fair and safe enforcement of the City and
County of San Francisco’s Building, Housing, Plumbing, Electrical and Mechanical codes,
along with the Disability Access Regulations.

For more info: https://sfdbi.org, https://sfdbi.org/ftp/uploadedfiles/dbi/Key_Information/19GettingCityPermitWeb1006.pdf

The permit process consists of a series of reviews by various permitting agencies to
ensure that the work you wish to do meets all required codes. The process begins at the
San Francisco Permit Center, located at 1660 Mission Street and open from 8:00
a.m. until 4:00 p.m., Monday through Friday. Please remember that permit applications are
not accepted after 4:00 p.m.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

In [3]:
df = pd.read_csv("C:/Users/Janu&Vaibh/Desktop/HDSfoundation/Datasets/Building_Permits.csv", low_memory=False)

In [4]:
df

Unnamed: 0,Permit Number,Permit Type,Permit Type Definition,Permit Creation Date,Block,Lot,Street Number,Street Number Suffix,Street Name,Street Suffix,...,Existing Construction Type,Existing Construction Type Description,Proposed Construction Type,Proposed Construction Type Description,Site Permit,Supervisor District,Neighborhoods - Analysis Boundaries,Zipcode,Location,Record ID
0,2.01505E+11,4,sign - erect,5/6/2015,326,23,140,,Ellis,St,...,3.0,constr type 3,,,,3.0,Tenderloin,94102.0,"(37.785719256680785, -122.40852313194863)",1.380610e+12
1,2.01604E+11,4,sign - erect,4/19/2016,306,7,440,,Geary,St,...,3.0,constr type 3,,,,3.0,Tenderloin,94102.0,"(37.78733980600732, -122.41063199757738)",1.420160e+12
2,2.01605E+11,3,additions alterations or repairs,5/27/2016,595,203,1647,,Pacific,Av,...,1.0,constr type 1,1.0,constr type 1,,3.0,Russian Hill,94109.0,"(37.7946573324287, -122.42232562979227)",1.424860e+12
3,2.01611E+11,8,otc alterations permit,11/7/2016,156,11,1230,,Pacific,Av,...,5.0,wood frame (5),5.0,wood frame (5),,3.0,Nob Hill,94109.0,"(37.79595867909168, -122.41557405519474)",1.443570e+12
4,2.01611E+11,6,demolitions,11/28/2016,342,1,950,,Market,St,...,3.0,constr type 3,,,,6.0,Tenderloin,94102.0,"(37.78315261897309, -122.40950883997789)",1.445480e+11
5,2.01706E+11,8,otc alterations permit,6/14/2017,4105,9,800,,Indiana,St,...,1.0,constr type 1,1.0,constr type 1,,10.0,Potrero Hill,94107.0,"(37.75922331346539, -122.39170402628598)",1.466910e+12
6,2.01706E+11,8,otc alterations permit,6/30/2017,1739,20,1291,,11th,Av,...,5.0,wood frame (5),5.0,wood frame (5),,5.0,Inner Sunset,94122.0,"(37.764145640138565, -122.46875112470363)",1.468970e+12
7,M803667,8,otc alterations permit,6/30/2017,4789,14,1465,,Revere,Av,...,,,,,,10.0,Bayview Hunters Point,94124.0,"(37.73005099023611, -122.38784938916618)",1.469040e+12
8,M804227,8,otc alterations permit,7/5/2017,1212,54,2094,,Fell,St,...,,,,,,5.0,Lone Mountain/USF,94117.0,"(37.772393498502595, -122.45231466824669)",1.469200e+12
9,M804767,8,otc alterations permit,7/6/2017,1259,16,89,,Alpine,Tr,...,,,,,,8.0,Haight Ashbury,94117.0,"(37.7691724293766, -122.43734859051908)",1.469320e+11


In [5]:
df.columns

Index(['Permit Number', 'Permit Type', 'Permit Type Definition',
       'Permit Creation Date', 'Block', 'Lot', 'Street Number',
       'Street Number Suffix', 'Street Name', 'Street Suffix', 'Unit',
       'Unit Suffix', 'Description', 'Current Status', 'Current Status Date',
       'Filed Date', 'Issued Date', 'Completed Date',
       'First Construction Document Date', 'Structural Notification',
       'Number of Existing Stories', 'Number of Proposed Stories',
       'Voluntary Soft-Story Retrofit', 'Fire Only Permit',
       'Permit Expiration Date', 'Estimated Cost', 'Revised Cost',
       'Existing Use', 'Existing Units', 'Proposed Use', 'Proposed Units',
       'Plansets', 'TIDF Compliance', 'Existing Construction Type',
       'Existing Construction Type Description', 'Proposed Construction Type',
       'Proposed Construction Type Description', 'Site Permit',
       'Supervisor District', 'Neighborhoods - Analysis Boundaries', 'Zipcode',
       'Location', 'Record ID'],
     

In [6]:
describe = pd.read_excel("C:/Users/Janu&Vaibh/Desktop/HDSfoundation/Datasets/DataDictionaryBuildingPermit.xlsx")

In [7]:
describe[["Column name","Description"]]

Unnamed: 0,Column name,Description
0,Permit Number,Number assigned while filing
1,Permit Type,Type of the permit represented numerically.
2,Permit Type Definition,"Description of the Permit type, for example\n ..."
3,Permit Creation Date,"Date on which permit created, later than \nor ..."
4,Block,Related to address
5,Lot,Related to address
6,Street Number,Related to address
7,Street Number Suffix,Related to address
8,Street Name,Related to address
9,Street Name Suffix,Related to address


In [8]:
df.dtypes

Permit Number                              object
Permit Type                                 int64
Permit Type Definition                     object
Permit Creation Date                       object
Block                                      object
Lot                                        object
Street Number                               int64
Street Number Suffix                       object
Street Name                                object
Street Suffix                              object
Unit                                      float64
Unit Suffix                                object
Description                                object
Current Status                             object
Current Status Date                        object
Filed Date                                 object
Issued Date                                object
Completed Date                             object
First Construction Document Date           object
Structural Notification                    object


Count the missing (null) values in each column.

In [28]:
missing_values_count = df.isnull().sum()

In [29]:
missing_values_count[0:43]

Permit Number                                  0
Permit Type                                    0
Permit Type Definition                         0
Permit Creation Date                           0
Block                                          0
Lot                                            0
Street Number                                  0
Street Number Suffix                      196684
Street Name                                    0
Street Suffix                               2768
Unit                                      169421
Unit Suffix                               196939
Description                                  290
Current Status                                 0
Current Status Date                            0
Filed Date                                     0
Issued Date                                14940
Completed Date                            101709
First Construction Document Date           14946
Structural Notification                   191978
Number of Existing S

In [30]:
df.shape

(198900, 43)

Several columns have more than 190,000 missing values out of 198,900 rows.
Let's take a closer look at those...
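
Raw counts are easier to compare once they are expressed as a share of the rows. The following is a minimal sketch on a toy frame with made-up values; the helper name missing_share and the 0.5 threshold are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the permits data (values are made up).
toy = pd.DataFrame({
    "Permit Number": ["A1", "A2", "A3", "A4"],
    "Unit Suffix": [np.nan, np.nan, np.nan, "B"],
    "Site Permit": [np.nan, "Y", np.nan, np.nan],
})

def missing_share(frame):
    """Return the fraction of missing values per column, largest first."""
    return frame.isnull().mean().sort_values(ascending=False)

share = missing_share(toy)

# Columns where more than half of the rows are missing (threshold is arbitrary).
mostly_missing = share[share > 0.5].index.tolist()

print(share)
print(mostly_missing)
```

On the real data the same helper could be called as missing_share(df).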

In [27]:
df.head(6)

Unnamed: 0,Permit Number,Permit Type,Permit Type Definition,Permit Creation Date,Block,Lot,Street Number,Street Number Suffix,Street Name,Street Suffix,...,Existing Construction Type,Existing Construction Type Description,Proposed Construction Type,Proposed Construction Type Description,Site Permit,Supervisor District,Neighborhoods - Analysis Boundaries,Zipcode,Location,Record ID
0,201505000000.0,4,sign - erect,5/6/2015,326,23,140,,Ellis,St,...,3.0,constr type 3,,,,3.0,Tenderloin,94102.0,"(37.785719256680785, -122.40852313194863)",1380610000000.0
1,201604000000.0,4,sign - erect,4/19/2016,306,7,440,,Geary,St,...,3.0,constr type 3,,,,3.0,Tenderloin,94102.0,"(37.78733980600732, -122.41063199757738)",1420160000000.0
2,201605000000.0,3,additions alterations or repairs,5/27/2016,595,203,1647,,Pacific,Av,...,1.0,constr type 1,1.0,constr type 1,,3.0,Russian Hill,94109.0,"(37.7946573324287, -122.42232562979227)",1424860000000.0
3,201611000000.0,8,otc alterations permit,11/7/2016,156,11,1230,,Pacific,Av,...,5.0,wood frame (5),5.0,wood frame (5),,3.0,Nob Hill,94109.0,"(37.79595867909168, -122.41557405519474)",1443570000000.0
4,201611000000.0,6,demolitions,11/28/2016,342,1,950,,Market,St,...,3.0,constr type 3,,,,6.0,Tenderloin,94102.0,"(37.78315261897309, -122.40950883997789)",144548000000.0
5,201706000000.0,8,otc alterations permit,6/14/2017,4105,9,800,,Indiana,St,...,1.0,constr type 1,1.0,constr type 1,,10.0,Potrero Hill,94107.0,"(37.75922331346539, -122.39170402628598)",1466910000000.0


In [31]:
df[['Street Number Suffix','Unit Suffix','Structural Notification','Voluntary Soft-Story Retrofit','TIDF Compliance','Site Permit']].head(6)

Unnamed: 0,Street Number Suffix,Unit Suffix,Structural Notification,Voluntary Soft-Story Retrofit,TIDF Compliance,Site Permit
0,,,,,,
1,,,,,,
2,,,,,,
3,,,,,,
4,,,,,,
5,,,,,,


Let's try to understand what those columns mean in this domain and why their values are missing...

In [9]:
set(df[['Permit Type Definition']].T.stack().tolist())

{'additions alterations or repairs',
 'demolitions',
 'grade or quarry or fill or excavate',
 'new construction',
 'new construction wood frame',
 'otc alterations permit',
 'sign - erect',
 'wall or painted sign'}
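
The same pattern is repeated for many columns below. Assuming the intent is to list the distinct non-null values of one column, a more direct way is sketched here on a toy frame; the helper name distinct_values is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy column with a repeated value and a missing value.
toy = pd.DataFrame({
    "Permit Type Definition": ["demolitions", "sign - erect", np.nan, "demolitions"],
})

def distinct_values(frame, column):
    """Return the sorted distinct non-null values of one column."""
    return sorted(frame[column].dropna().unique())

values = distinct_values(toy, "Permit Type Definition")
print(values)
```

Sorting the result gives a stable order, which makes the outputs of two columns easier to compare by eye.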

In [10]:
set(df[['Current Status']].T.stack().tolist())

{'appeal',
 'approved',
 'cancelled',
 'complete',
 'disapproved',
 'expired',
 'filed',
 'incomplete',
 'issued',
 'plancheck',
 'reinstated',
 'revoked',
 'suspend',
 'withdrawn'}

Notice that 'Current Status' has no missing values, but 'Issued Date' and 'Completed Date' have a significant number of them. This may mean that the issue and completion dates were not diligently recorded, or that they matter little for the dataset. It may also simply reflect permits that had not yet been issued or completed, since statuses such as 'filed' come before issuance.
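
One way to examine this observation is to look at the share of missing dates per status. This is only a sketch on a toy frame with made-up rows; the name missing_by_status is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows: the dates are made up, the statuses are taken from the list above.
toy = pd.DataFrame({
    "Current Status": ["filed", "issued", "complete", "filed", "complete"],
    "Issued Date": [np.nan, "5/6/2015", "4/19/2016", np.nan, "6/14/2017"],
    "Completed Date": [np.nan, np.nan, "5/27/2016", np.nan, "6/30/2017"],
})

# Fraction of rows with a missing date, computed separately for each status.
missing_by_status = (
    toy[["Issued Date", "Completed Date"]]
    .isnull()
    .groupby(toy["Current Status"])
    .mean()
)

print(missing_by_status)
```

If the missing dates on the real data are concentrated in statuses that come before issuance or completion, that would point to permits still in progress rather than to careless record keeping.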


In [41]:
set(df[['Fire Only Permit']].stack().tolist())

{'Y'}

In [12]:
set(df[['Structural Notification']].T.stack().tolist())

{'Y'}

In [13]:
a=set(df[['Existing Use']].T.stack().tolist())
a

{'1 family dwelling',
 '2 family dwelling',
 'accessory cottage',
 'adult entertainment',
 'ambulance service',
 'amusement center',
 'animal sale or care',
 'antenna',
 'apartments',
 'artist live/work',
 'auto repairs',
 'automobile sales',
 'barber/beauty salon',
 'bath house',
 'building materials',
 'car wash',
 'chemical processing',
 'child care',
 'christmas tree lot',
 'church',
 'clinics-medic/dental',
 'club',
 'convalescent home',
 'dairies/dairy equip.',
 'dance hall',
 'day care center',
 'day care home 7 - 12',
 'day care home gt 12',
 'day care home lt 7',
 'day care, non-res',
 'dry cleaners',
 'fence/retaining wall',
 'filling/service stn',
 'food/beverage hndlng',
 'garment shops',
 'greenhouse',
 'health studios & gym',
 'hospital',
 'jail',
 'laundry/laundromat',
 'lending institution',
 'library',
 'manufacturing',
 'massage parlor',
 'meat/produce marts',
 'misc group residns.',
 'mortuary',
 'moving & storage',
 'muni carbarn',
 'muni driver restroom',
 'museum'

In [14]:
len(a)

93

In [15]:
b=set(df[['Proposed Use']].T.stack().tolist())
b

{'1 family dwelling',
 '2 family dwelling',
 'accessory cottage',
 'adult entertainment',
 'ambulance service',
 'amusement center',
 'animal sale or care',
 'antenna',
 'apartments',
 'artist live/work',
 'auto repairs',
 'automobile sales',
 'barber/beauty salon',
 'bath house',
 'building materials',
 'car wash',
 'chemical processing',
 'child care',
 'christmas tree lot',
 'church',
 'clinics-medic/dental',
 'club',
 'convalescent home',
 'dairies/dairy equip.',
 'dance hall',
 'day care center',
 'day care home 7 - 12',
 'day care home gt 12',
 'day care home lt 7',
 'day care, non-res',
 'dry cleaners',
 'fence/retaining wall',
 'filling/service stn',
 'food/beverage hndlng',
 'garment shops',
 'greenhouse',
 'health studios & gym',
 'hospital',
 'jail',
 'laundry/laundromat',
 'lending institution',
 'library',
 'manufacturing',
 'massage parlor',
 'meat/produce marts',
 'misc group residns.',
 'mortuary',
 'moving & storage',
 'muni carbarn',
 'muni driver restroom',
 'museum'

In [16]:
len(b)

94

In [17]:
set(b-a)

{'not applicable'}

So all the categories of 'Existing Use' also appear in 'Proposed Use'. The only extra value in 'Proposed Use' is 'not applicable'. Hence the two columns share essentially the same set of categories.
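
The difference b - a only shows what is in 'Proposed Use' but not in 'Existing Use'; the conclusion above also relies on the two lengths differing by exactly one. A small sketch with toy sets (the values are a made-up subset) checks both directions explicitly.

```python
# Toy sets standing in for the distinct 'Existing Use' and 'Proposed Use' values.
existing_use = {"apartments", "church", "museum"}
proposed_use = {"apartments", "church", "museum", "not applicable"}

# Values that appear on one side only.
only_in_proposed = proposed_use - existing_use
only_in_existing = existing_use - proposed_use

# True when every existing-use category is also a proposed-use category.
is_subset = existing_use <= proposed_use

print(only_in_proposed, only_in_existing, is_subset)
```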

If the use of a building is changing, dwelling units are being added, or
additional living space is being proposed in a basement, garage or attic, it must be
confirmed that all proposed uses and dwelling unit densities are permitted by the
Planning Code.

In [18]:
set(df[['TIDF Compliance']].T.stack().tolist())

{'P', 'Y'}

The Transit Impact Development Fee (TIDF) is an impact fee levied on most nonresidential new development citywide to offset new development’s impacts on the transit system.
This column records whether the permit is TIDF compliant. This is a new legal requirement.

In [19]:
set(df[['Existing Construction Type Description']].T.stack().tolist())

{'constr type 1',
 'constr type 2',
 'constr type 3',
 'constr type 4',
 'wood frame (5)'}

In [20]:
set(df[['Proposed Construction Type Description']].T.stack().tolist())

{'constr type 1',
 'constr type 2',
 'constr type 3',
 'constr type 4',
 'wood frame (5)'}

Together, 'Existing Construction Type Description' and 'Proposed Construction Type Description' give the construction type before and after the permitted work, which is relevant for permits to remodel a building.
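
To see how often a permit actually changes the construction type, the two columns can be compared row by row. The sketch below uses a toy frame with made-up rows and compares only rows where both descriptions are present; how rows with a missing value should be treated is an assumption.

```python
import numpy as np
import pandas as pd

# Toy rows using the category labels listed above.
toy = pd.DataFrame({
    "Existing Construction Type Description": ["constr type 3", "wood frame (5)", "constr type 1", np.nan],
    "Proposed Construction Type Description": ["constr type 3", "constr type 1", np.nan, np.nan],
})

existing = toy["Existing Construction Type Description"]
proposed = toy["Proposed Construction Type Description"]

# Assumption: rows with a missing description on either side are left out.
both_known = existing.notna() & proposed.notna()
changed = both_known & (existing != proposed)

print(int(changed.sum()), "of", int(both_known.sum()), "rows change construction type")
```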

In [21]:
set(df[['Site Permit']].T.stack().tolist())

{'Y'}

A Site Permit is a set of drawings for new construction that is considered entirely
design development drawings. The primary purpose of a Site Permit is to gain approval
of the physical mass of the building as it relates to the site.
A site permit is needed for work that is generally located out of the building pad or footprint. 
Once Plan Commission approves a plat, a Site Permit may be issued.

This column records whether a site permit was issued.

In [22]:
set(df[['Supervisor District']].T.stack().tolist())

{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0}

San Francisco is divided into 11 supervisorial districts, each represented by a member of the Board of Supervisors. This column records the district in which the property lies.

In [33]:
l=set(df[['Neighborhoods - Analysis Boundaries']].T.stack().tolist())
l

{'Bayview Hunters Point',
 'Bernal Heights',
 'Castro/Upper Market',
 'Chinatown',
 'Excelsior',
 'Financial District/South Beach',
 'Glen Park',
 'Golden Gate Park',
 'Haight Ashbury',
 'Hayes Valley',
 'Inner Richmond',
 'Inner Sunset',
 'Japantown',
 'Lakeshore',
 'Lincoln Park',
 'Lone Mountain/USF',
 'Marina',
 'McLaren Park',
 'Mission',
 'Mission Bay',
 'Nob Hill',
 'Noe Valley',
 'North Beach',
 'Oceanview/Merced/Ingleside',
 'Outer Mission',
 'Outer Richmond',
 'Pacific Heights',
 'Portola',
 'Potrero Hill',
 'Presidio',
 'Presidio Heights',
 'Russian Hill',
 'Seacliff',
 'South of Market',
 'Sunset/Parkside',
 'Tenderloin',
 'Treasure Island',
 'Twin Peaks',
 'Visitacion Valley',
 'West of Twin Peaks',
 'Western Addition'}

In [24]:
len(l)

41

The San Francisco Planning Department officially identifies 36 neighborhoods. Within these 36 official neighborhoods are a large number of minor districts, some of which are historical, and some of which are overlapping.

This column identifies 41 neighborhoods. The assumption is that, for convenience, the office also made use of some minor districts.

In [25]:
loca=set(df[['Location']].T.stack().tolist())

In [26]:
len(loca)

57604

'Location' contains the latitude and longitude, in that order, of the building for which the permit was applied.
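
Since 'Location' is stored as a string, the coordinates have to be parsed before they can be used numerically. A minimal sketch, assuming every non-missing value has the form "(lat, lon)" as in the rows shown earlier; the column names latitude and longitude are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# One real-looking value from the table above and one missing value.
toy = pd.DataFrame({
    "Location": ["(37.785719256680785, -122.40852313194863)", np.nan],
})

# Named groups become the column names of the extracted frame.
pattern = r"\(\s*(?P<latitude>[-\d.]+)\s*,\s*(?P<longitude>[-\d.]+)\s*\)"

# Rows that are missing or do not match the pattern end up as NaN.
coords = toy["Location"].str.extract(pattern).astype(float)

print(coords)
```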

In [32]:
set(df[['Voluntary Soft-Story Retrofit']].T.stack().tolist())

{'Y'}

Voluntary Soft-Story Retrofit: the process of strengthening older buildings in order to make them earthquake resistant.
This column is of object datatype and holds a string flag indicating that a voluntary retrofit was done. Since this is "voluntary", not many owners would have taken it seriously. Under the Mandatory Soft Story Retrofit ordinance of 2013, affected owners would have had to do the retrofit, but that is not recorded in this dataset.
However, it would be safe to check the relationship between the ...
For more info: https://sfdbi.org/soft-story-faq

In [34]:
missing_values_count['Voluntary Soft-Story Retrofit']

198865
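
Columns such as 'Site Permit', 'Structural Notification', 'Fire Only Permit' and 'Voluntary Soft-Story Retrofit' contain only 'Y' or a missing value. One possible treatment, sketched on a toy frame, is to turn them into boolean flags. This assumes that a missing value means "no", which the dataset itself does not confirm.

```python
import numpy as np
import pandas as pd

# Toy rows for two of the flag-like columns (values are made up).
toy = pd.DataFrame({
    "Site Permit": [np.nan, "Y", np.nan],
    "Voluntary Soft-Story Retrofit": [np.nan, np.nan, "Y"],
})

flag_columns = ["Site Permit", "Voluntary Soft-Story Retrofit"]

# Assumption: anything other than 'Y', including a missing value, counts as False.
flags = toy[flag_columns].eq("Y")

print(flags.sum())
```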