# Preparing demolition data from the City of Austin

- We start with [construction permits](https://data.austintexas.gov/Building-and-Development/Issued-Construction-Permits/3syk-w9eu/data) published on the city's Socrata data portal.
- We've filtered it to permits for [full building demolitions](https://data.austintexas.gov/Building-and-Development/demolitions-full-post2017/4d8v-cjdw) and for [partial demos](https://data.austintexas.gov/Building-and-Development/demolitions-partial-post2007/8qw5-9tag).
- We combine those and apply some global filtering to keep active or completed residential permits and to drop permits issued after June 30, 2018.
- We then split again to apply filters specific to full or partial demos.
- We export the data again for analysis in other notebooks.

## Set up and configurations

In [1]:
import pandas as pd

## Download files from Socrata

These are the full and partial demolition filters noted above.

In [2]:
%%bash
curl -L -o ../data-raw/full-downloaded.csv \
https://data.austintexas.gov/resource/4d8v-cjdw.csv?\$limit=10000
curl -L -o ../data-raw/partials-downloaded.csv \
https://data.austintexas.gov/resource/8qw5-9tag.csv?\$limit=10000

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5829k    0 5829k    0     0   475k      0 --:--:--  0:00:12 --:--:--  559k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5936k    0 5936k    0     0   459k      0 --:--:--  0:00:12 --:--:--  699k
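The download URLs above can also be built in Python, which avoids escaping `$` for the shell. This is only a sketch: the helper name `socrata_csv_url` is hypothetical, and it only constructs the URL strings without making a request. `pd.read_csv` accepts a URL, so the result could be passed to it directly if saving a raw copy is not needed.

```python
from urllib.parse import urlencode

# Base of the city's Socrata resource endpoint, as used in the curl commands.
BASE = "https://data.austintexas.gov/resource"


def socrata_csv_url(dataset_id, limit=10000):
    """Build a CSV export URL for a Socrata dataset with a row limit.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # safe="$" keeps the literal "$limit" parameter name unescaped.
    query = urlencode({"$limit": limit}, safe="$")
    return f"{BASE}/{dataset_id}.csv?{query}"


full_url = socrata_csv_url("4d8v-cjdw")
partial_url = socrata_csv_url("8qw5-9tag")
print(full_url)
print(partial_url)
```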


## Import and processing

In [3]:
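# Column type fixes: read these as text so phone numbers and years
# are not converted to numbers.
# (pd.np.str is gone from recent pandas/NumPy releases; the built-in str works.)
column_types = {
    'ApplicantPhone': str,
    'ContractorPhone': str,
    'CalendarYearIssued': str,
}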

# import raw data
raw_full = pd.read_csv(
    '../data-raw/full-downloaded.csv',
    index_col=None,
    dtype=column_types,
    parse_dates=['IssuedDate']
)

raw_full.shape

(7930, 66)

In [4]:
# import partials data
raw_partial = pd.read_csv(
    '../data-raw/partials-downloaded.csv',
    index_col=None,
    dtype=column_types,
    parse_dates=['IssuedDate']
)

raw_partial.shape

(6347, 66)

### Combine full and partial demos

In [5]:
data_raw = pd.concat([raw_full,raw_partial])

data_raw.shape

(14277, 66)
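One thing to be aware of: `pd.concat` keeps each frame's original row labels, so the combined index repeats (the `info()` output at the end of this notebook shows 14277 entries labeled 0 to 6346). The sketch below uses made-up toy frames, not the permit data, to show the difference `ignore_index=True` makes. Whether to renumber is a judgment call; the notebook's filters work either way.

```python
import pandas as pd

# Toy stand-ins for the full and partial downloads (values are invented).
full = pd.DataFrame({"PermitNum": ["A1", "A2"], "WorkClass": ["Demolition", "Demolition"]})
partial = pd.DataFrame({"PermitNum": ["B1"], "WorkClass": ["Addition"]})

# Default: row labels from each frame are kept, so 0 appears twice.
stacked = pd.concat([full, partial])

# ignore_index=True renumbers the rows 0..n-1.
reindexed = pd.concat([full, partial], ignore_index=True)

print(stacked.index.tolist())
print(reindexed.index.tolist())
```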

## Global filters

These are filters that might apply beyond residential permits, in case we look at those later.

### Create Partial vs Full flag

Since we are working with both full and partial demolitions, we need a flag to separate them so we can apply filters later that are specific to each.

In [6]:
# function to determine type
def set_demo_type(row):
    """ Function to evaluate demolition type. """

    if row['WorkClass'] == 'Demolition':
        return 'Full'
    else:
        return 'Partial'

# apply set_demo_type function to new column through .assign
data_typed = data_raw.assign(
    DemoType=data_raw.apply(
        set_demo_type,
        axis=1
        )
)

# peek at it
data_typed.DemoType.value_counts()

Full       7930
Partial    6347
Name: DemoType, dtype: int64
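The same flag can be set without a row-by-row `apply`. The sketch below is an alternative, not what this notebook runs: it uses `numpy.where` on an invented toy frame and follows the same rule as `set_demo_type` (a `WorkClass` of `Demolition` is Full, anything else is Partial).

```python
import numpy as np
import pandas as pd

# Toy data standing in for data_raw (values are invented).
toy = pd.DataFrame({"WorkClass": ["Demolition", "Addition", "Remodel", "Demolition"]})

# Vectorized version of set_demo_type: compare the whole column at once.
toy["DemoType"] = np.where(toy["WorkClass"] == "Demolition", "Full", "Partial")

print(toy["DemoType"].value_counts())
```

On a frame of about 14,000 rows either approach is fast enough; the vectorized form mostly pays off on larger data.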

### Filter for permit status

We want only Active and Final permits, per an interview with the city.

In [7]:
# Consider permit status. 
data_typed.StatusCurrent.value_counts()

Final                              9730
Expired                            2067
Active                             1634
VOID                                612
Withdrawn                           214
Inactive Pending Revision            10
Aborted                               4
Cancelled - Contractor Required       2
Closed                                2
On Hold                               2
Name: StatusCurrent, dtype: int64

In [8]:
# We want only Active and Final, per city
permit_status_filtered = data_typed[
    (data_typed['StatusCurrent'] == 'Active')
    | (data_typed['StatusCurrent'] == 'Final')
]
permit_status_filtered.shape

(11364, 67)
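When the list of statuses to keep grows, chaining `==` comparisons with `|` gets long. A sketch of the same filter written with `Series.isin`, on an invented toy frame; the variable name `keep_statuses` is an assumption for illustration.

```python
import pandas as pd

# Toy data standing in for data_typed (values are invented).
toy = pd.DataFrame({"StatusCurrent": ["Final", "Expired", "Active", "VOID"]})

# Statuses the city said to keep.
keep_statuses = ["Active", "Final"]

# isin() returns True for rows whose status is in the list.
kept = toy[toy["StatusCurrent"].isin(keep_statuses)]

print(kept)
```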

### Filter for residential vs commercial

In [9]:
# show residential vs commercial
permit_status_filtered.PermitClassMapped.value_counts()

Residential    10547
Commercial       817
Name: PermitClassMapped, dtype: int64

In [10]:
# filter to just residential permits
filtered_residential = permit_status_filtered[
    permit_status_filtered['PermitClassMapped'] == 'Residential'
]
filtered_residential.shape

(10547, 67)

### Cut off June 30, 2018

In [11]:
# keep permits issued through June 30, 2018 (the first half of the year)
cutoff = filtered_residential[filtered_residential['IssuedDate'] < '2018-07-01']
cutoff.shape

(10489, 67)

### Set demolitions_all dataframe and export

In [12]:
# Set final dataframe for analysis
demolitions_all = cutoff
demolitions_all.to_csv('../data-processed/demolitions_all.csv')
demolitions_all.shape

(10489, 67)
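A note for the notebooks that read this export: `to_csv` writes the row index as an unnamed first column by default, so reading the file back adds an `Unnamed: 0` column unless `index_col=0` is used on import or `index=False` on export. The sketch below shows both behaviors on an invented toy frame, writing to in-memory strings instead of files.

```python
import io

import pandas as pd

# Toy frame (invented) with a repeated index, like the combined permit data.
toy = pd.DataFrame({"PermitNum": ["A1", "B1"]}, index=[0, 0])

# Default export includes the index as the first, unnamed column.
with_index = toy.to_csv()

# index=False writes only the data columns.
without_index = toy.to_csv(index=False)

cols_with = list(pd.read_csv(io.StringIO(with_index)).columns)
cols_without = list(pd.read_csv(io.StringIO(without_index)).columns)

print(cols_with)
print(cols_without)
```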

## Filters specific to full demolitions

We filter the full demolitions to one- and two-family homes to remove garages and other non-dwelling structures.

In [13]:
# Look at PermitClass to spot those that are full demos
demolitions_all['PermitClass'].value_counts().sort_index()

R- 101 Single Family Houses                 15
R- 102 Secondary Apartment                   9
R- 103 Two Family Bldgs                      2
R- 329 Res Structures Other Than Bldg       20
R- 330  Accessory Use to Primary            18
R- 434 Addition & Alterations             5059
R- 435 Renovations/Remodel                 588
R- 436 Addn to increase housing units        1
R- 437 Residential Boat Dock                 1
R- 438 Residential Garage/Carport Addn      11
R- 645 Demolition One Family Homes        3039
R- 646 Demolition Two Family Bldgs         121
R- 649 Demolition All Other Bldgs Res     1605
Name: PermitClass, dtype: int64

In [14]:
# Filter to only full demo homes people live in
filtered_homes = demolitions_all[
    (demolitions_all['PermitClass'] == "R- 645 Demolition One Family Homes")
    | (demolitions_all['PermitClass'] == "R- 646 Demolition Two Family Bldgs")
]

# peek at the results
filtered_homes.PermitClass.value_counts()

R- 645 Demolition One Family Homes    3039
R- 646 Demolition Two Family Bldgs     121
Name: PermitClass, dtype: int64

### Set Full demolitions dataframe and export

In [15]:
demolitions_full = filtered_homes
demolitions_full.to_csv('../data-processed/demolitions_full.csv')
demolitions_full.shape

(3160, 67)

## Filters specific to partial demolitions

In [16]:
# filter all demos to those we designated as Partials
filtered_demotype_partial = demolitions_all[demolitions_all['DemoType'] == 'Partial']
filtered_demotype_partial.shape

(5724, 67)

In [17]:
# look at the WorkClass so we can filter on them.
filtered_demotype_partial.WorkClass.value_counts()

Addition and Remodel    4386
Addition                 678
Remodel                  532
New                       72
Repair                    52
Life Safety                4
Name: WorkClass, dtype: int64

In [18]:
# Filter on WorkClass to get to possible additions
# SHOULD I REMOVE THE "REMODEL" WORKCLASS??????? 
partials_workclass = filtered_demotype_partial[
    (filtered_demotype_partial['WorkClass'] == "Addition and Remodel")
    | (filtered_demotype_partial['WorkClass'] == "Addition")
]

In [19]:
# Look at PermitClass to filter them
partials_workclass.PermitClass.value_counts()

R- 434 Addition & Alterations             5059
R- 438 Residential Garage/Carport Addn       4
R- 436 Addn to increase housing units        1
Name: PermitClass, dtype: int64

In [20]:
# filter to remove garage/carports
partials_permitclass = partials_workclass[
    (partials_workclass['PermitClass'] != 'R- 438 Residential Garage/Carport Addn')
]
partials_permitclass.shape

(5060, 67)

### Set partials dataframe and export

In [21]:
demolitions_partial = partials_permitclass
demolitions_partial.to_csv('../data-processed/demolitions_partial.csv')
demolitions_partial.shape

(5060, 67)

## Reference
Ignore this. It's just for reference.

In [22]:
data_typed.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14277 entries, 0 to 6346
Data columns (total 67 columns):
PermitType                    14277 non-null object
PermitTypeDesc                14277 non-null object
PermitNum                     14277 non-null object
PermitClassMapped             14277 non-null object
PermitClass                   14277 non-null object
WorkClass                     14277 non-null object
Condominium                   14277 non-null object
ProjectName                   14277 non-null object
Description                   14277 non-null object
TCAD_ID                       14184 non-null object
PropertyLegalDescription      13290 non-null object
AppliedDate                   14277 non-null object
IssuedDate                    14277 non-null datetime64[ns]
DayIssued                     14277 non-null object
CalendarYearIssued            14277 non-null object
FiscalYearIssued              14277 non-null int64
IssuedInLast30Days            14277 non-null object
I