# Preparing demolition data from the City of Austin

- We start with [construction permits](https://data.austintexas.gov/Building-and-Development/Issued-Construction-Permits/3syk-w9eu/data) published on the city's Socrata data portal.
- We've filtered it to permits for [full building demolitions](https://data.austintexas.gov/Building-and-Development/demolitions-full-post2017/4d8v-cjdw).
- In this notebook, we further filter and clean the data based on what we've learned about it, and then export it for analysis in other notebooks.
- The data is currently filtered to permits issued before July 1, 2018, so the analysis has a fixed cutoff date.

## Set up and configurations

In [1]:
import pandas as pd

## Download file from Socrata

This is the full building demolitions filter noted above.

In [2]:
%%bash
curl -L -o ../data-raw/full-downloaded.csv \
'https://data.austintexas.gov/resource/4d8v-cjdw.csv?$limit=10000'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5806k    0 5806k    0     0   753k      0 --:--:--  0:00:07 --:--:--  790k


## Import and processing

In [3]:
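# Column type fixes: read phone numbers and the year as text, not numbers.
# Uses the built-in str; the pd.np alias was removed in pandas 2.0.
column_types = {
    "ApplicantPhone": str,
    "ContractorPhone": str,
    "CalendarYearIssued": str,
}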

# import raw data
data_raw = pd.read_csv(
    '../data-raw/full-downloaded.csv',
    index_col=None,
    dtype=column_types,
    parse_dates=['IssuedDate']
)

data_raw.shape

(7897, 66)
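
As a small illustration of why the column types are set on import, here is a minimal sketch using a made-up two-row sample rather than the real download. The sample values are hypothetical; only the column names come from the data above. It shows that a column read as `str` keeps a leading zero and that `parse_dates` produces a datetime column.

```python
import io

import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
sample_csv = io.StringIO(
    "PermitNum,ApplicantPhone,CalendarYearIssued,IssuedDate\n"
    "2018-000001 BP,0125550100,2018,2018-01-15\n"
    "2018-000002 BP,5125550101,2018,2018-02-20\n"
)

sample = pd.read_csv(
    sample_csv,
    dtype={"ApplicantPhone": str, "CalendarYearIssued": str},
    parse_dates=["IssuedDate"],
)

# The phone column stays text, so the leading zero survives.
print(sample["ApplicantPhone"].tolist())
# IssuedDate is a datetime column, which the date cutoff relies on.
print(sample.dtypes)
```

Without the `dtype` mapping, pandas would infer the phone column as an integer and drop the leading zero.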

## Global filters

For any filters that might apply beyond residential permits, in case we look at those later.

### Filter for permit status

We want only Active and Final permits, per an interview with the city.

In [4]:
# Consider permit status. 
data_raw.StatusCurrent.value_counts()

Final                              4661
Expired                            1640
Active                              925
VOID                                525
Withdrawn                           138
Aborted                               3
Closed                                2
On Hold                               2
Cancelled - Contractor Required       1
Name: StatusCurrent, dtype: int64

In [5]:
# We want only Active and Final, per city
permit_status_filtered = data_raw[
    (data_raw['StatusCurrent'] == 'Active')
    | (data_raw['StatusCurrent'] == 'Final')
]
permit_status_filtered.shape

(5586, 66)
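
The same kind of status filter can also be written with `isin`, which may be easier to extend if more statuses are added later. This is a sketch on a made-up sample, not the notebook's own cell; `keep_statuses` and the sample frame are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample of statuses; the real column has many more rows.
sample = pd.DataFrame(
    {"StatusCurrent": ["Final", "Expired", "Active", "VOID", "Final"]}
)

# Statuses to keep, per the city's guidance described above.
keep_statuses = ["Active", "Final"]

# isin() builds one boolean mask instead of chaining == comparisons with |.
kept = sample[sample["StatusCurrent"].isin(keep_statuses)]
print(kept.shape)
```

In the real data, the Final (4661) and Active (925) counts shown earlier add up to the 5586 rows kept by the filter.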

## Preparing file for residential demolitions

### Filter for residential vs commercial

In [6]:
# show residential vs commercial
permit_status_filtered.PermitClassMapped.value_counts()

Residential    4784
Commercial      802
Name: PermitClassMapped, dtype: int64

In [7]:
# filter to just residential permits
filtered_residential = permit_status_filtered[
    permit_status_filtered['PermitClassMapped'] == 'Residential'
]
filtered_residential.shape

(4784, 66)

### Filter for permit class

In [8]:
# Look at permit class to separate homes from other residential structures
# "All Other" are typically garages, pools, sheds, etc.
filtered_residential['PermitClass'].value_counts().sort_index()

R- 645 Demolition One Family Homes       3056
R- 646 Demolition Two Family Bldgs        122
R- 649 Demolition All Other Bldgs Res    1606
Name: PermitClass, dtype: int64

In [9]:
# Filter to only homes people live in
filtered_homes = filtered_residential[
    (filtered_residential['PermitClass'] == "R- 645 Demolition One Family Homes")
    | (filtered_residential['PermitClass'] == "R- 646 Demolition Two Family Bldgs")
]
filtered_homes.shape

(3178, 66)

## Cut off June 30, 2018

In [10]:
# keep permits issued before July 1, 2018 (the first half of the year)
cutoff = filtered_homes[filtered_homes['IssuedDate'] < '2018-07-01']
cutoff.shape

(3167, 66)
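
To make the boundary explicit, here is a minimal sketch of the cutoff on a made-up sample, assuming `IssuedDate` has already been parsed as a datetime. The sample dates and the `cutoff_date` name are hypothetical. Because the comparison is strictly less than July 1, a permit issued on June 30 is kept and one issued on July 1 is dropped.

```python
import pandas as pd

# Hypothetical issue dates on either side of the cutoff.
sample = pd.DataFrame(
    {
        "IssuedDate": pd.to_datetime(
            ["2018-06-29", "2018-06-30", "2018-07-01", "2018-07-02"]
        )
    }
)

# An explicit Timestamp behaves like the string comparison used above.
cutoff_date = pd.Timestamp("2018-07-01")

# Strictly before July 1, so June 30 is the last date kept.
before_cutoff = sample[sample["IssuedDate"] < cutoff_date]
print(before_cutoff["IssuedDate"].max())
```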

## Set final dataframe and export

In [11]:
# Set final dataframe for analysis
residential_demolitions = cutoff
residential_demolitions.shape

(3167, 66)

### Export

In [12]:
residential_demolitions.to_csv('../data-processed/residential_demolitions.csv')
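
One thing to keep in mind when reading this file in other notebooks: `to_csv` writes the dataframe index as an unnamed first column by default. The sketch below shows a round trip on a made-up sample using an in-memory buffer instead of the real file; the sample values are hypothetical, and how the downstream notebooks actually read the file is not shown here.

```python
import io

import pandas as pd

# Hypothetical sample with a non-contiguous index, as left behind by filtering.
sample = pd.DataFrame(
    {"PermitNum": ["A", "B"], "CalendarYearIssued": ["2017", "2018"]},
    index=[3, 7],
)

# Write to an in-memory buffer; the index becomes the first, unnamed column.
buffer = io.StringIO()
sample.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the index; dtype keeps the year as text again.
restored = pd.read_csv(buffer, index_col=0, dtype={"CalendarYearIssued": str})
print(restored)
```

Passing `index=False` to `to_csv` would drop the index column instead, if the original row positions are not needed downstream.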

## Reference
Ignore this. It's just for reference.

In [13]:
data_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7897 entries, 0 to 7896
Data columns (total 66 columns):
PermitType                    7897 non-null object
PermitTypeDesc                7897 non-null object
PermitNum                     7897 non-null object
PermitClassMapped             7897 non-null object
PermitClass                   7897 non-null object
WorkClass                     7897 non-null object
Condominium                   7897 non-null object
ProjectName                   7897 non-null object
Description                   7897 non-null object
TCAD_ID                       7807 non-null object
PropertyLegalDescription      7006 non-null object
AppliedDate                   7897 non-null object
IssuedDate                    7897 non-null datetime64[ns]
DayIssued                     7897 non-null object
CalendarYearIssued            7897 non-null object
FiscalYearIssued              7897 non-null int64
IssuedInLast30Days            7897 non-null object
IssuanceMethod     