# Preparing demolition data from the City of Austin

- We start with [construction permits](https://data.austintexas.gov/Building-and-Development/Issued-Construction-Permits/3syk-w9eu/data) published on the city's Socrata data portal.
- We've filtered them to permits for [full building demolitions](https://data.austintexas.gov/Building-and-Development/demolitions-full-post2017/4d8v-cjdw).
- In this notebook, we filter and clean the data further based on what we've learned about it, then export it for analysis in other notebooks.

## Setup and configuration

In [1]:
import pandas as pd

## Download file from Socrata

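This is the full building demolitions filter noted above. The `$limit=10000` parameter caps the number of rows returned, so it must stay above the number of rows in the filtered view.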

In [2]:
%%bash
curl -L -o ../data-raw/full-downloaded.csv \
https://data.austintexas.gov/resource/4d8v-cjdw.csv?\$limit=10000

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5796k    0 5796k    0     0   491k      0 --:--:--  0:00:11 --:--:--  479k
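
The same request URL can also be assembled in Python, which makes the `$limit` value easier to change and check. This is a minimal sketch, not the method this notebook uses: `build_resource_url` and `may_be_truncated` are hypothetical helper names, and only the portal host, the `4d8v-cjdw` view ID and the `$limit` parameter come from the curl command above.

```python
from urllib.parse import urlencode

# Values taken from the curl command above
PORTAL = "https://data.austintexas.gov"
VIEW_ID = "4d8v-cjdw"


def build_resource_url(view_id, limit, portal=PORTAL):
    """Return the CSV endpoint for a view with a $limit row cap (hypothetical helper)."""
    # safe="$" keeps the dollar sign literal instead of encoding it as %24
    query = urlencode({"$limit": limit}, safe="$")
    return f"{portal}/resource/{view_id}.csv?{query}"


def may_be_truncated(row_count, limit):
    """A download that returns at least `limit` rows may have been cut off."""
    return row_count >= limit


url = build_resource_url(VIEW_ID, 10000)
print(url)
```

If the row count pandas reports after import ever equals the limit, the download was probably cut short and the limit should be raised.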


## Import and processing

In [3]:
# Column type fixes: keep phone numbers and the year as strings
column_types = {
    "ApplicantPhone": str,
    "ContractorPhone": str,
    "CalendarYearIssued": str,
}

# import raw data
data_raw = pd.read_csv(
    '../data-raw/full-downloaded.csv',
    index_col=None,
    dtype=column_types
)

data_raw.shape

(7882, 66)

## Global filters

These filters apply to all permits, not just residential ones, in case we look at other permit types later.

### Filter for permit status

We want only Active and Final permits, per our interview with the city.

In [4]:
# Look at the permit status values
data_raw.StatusCurrent.value_counts()

Final                              4654
Expired                            1634
Active                              923
VOID                                525
Withdrawn                           138
Aborted                               3
Closed                                2
On Hold                               2
Cancelled - Contractor Required       1
Name: StatusCurrent, dtype: int64

In [5]:
# We want only Active and Final, per city
permit_status_filtered = data_raw[
    (data_raw['StatusCurrent'] == 'Active')
    | (data_raw['StatusCurrent'] == 'Final')
]
permit_status_filtered.shape

(5577, 66)
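
The two equality checks can also be written with `Series.isin`, which stays readable if the list of accepted statuses grows. A minimal sketch on a small made-up frame: the rows and the `PermitNum` column are illustrative assumptions, not taken from the permit data.

```python
import pandas as pd

# Made-up rows that reuse status values seen in the counts above
toy = pd.DataFrame({
    "PermitNum": ["A", "B", "C", "D"],  # hypothetical identifier column
    "StatusCurrent": ["Final", "Expired", "Active", "VOID"],
})

wanted_statuses = ["Active", "Final"]

# isin() returns a boolean mask that is True where the status is in the list
toy_filtered = toy[toy["StatusCurrent"].isin(wanted_statuses)]
print(toy_filtered.shape)
```

On the real data this should select the same rows as the `==` / `|` version.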

## Preparing file for residential demolitions

### Filter for residential vs commercial

In [6]:
# show residential vs commercial
permit_status_filtered.PermitClassMapped.value_counts()

Residential    4774
Commercial      803
Name: PermitClassMapped, dtype: int64

In [7]:
# filter to just residential permits
filtered_residential = permit_status_filtered[
    permit_status_filtered['PermitClassMapped'] == 'Residential'
]
filtered_residential.shape

(4774, 66)

### Filter for permit class

In [8]:
# Look at permit class to separate homes from other residential structures
# "All Other" are typically garages, pools, sheds, etc.
filtered_residential['PermitClass'].value_counts().sort_index()

R- 645 Demolition One Family Homes       3047
R- 646 Demolition Two Family Bldgs        122
R- 649 Demolition All Other Bldgs Res    1605
Name: PermitClass, dtype: int64

In [9]:
# Filter to only homes people live in
filtered_homes = filtered_residential[
    (filtered_residential['PermitClass'] == "R- 645 Demolition One Family Homes")
    | (filtered_residential['PermitClass'] == "R- 646 Demolition Two Family Bldgs")
]
filtered_homes.shape

(3169, 66)

## Set final dataframe and export

In [10]:
# Set final dataframe for analysis
residential_demolitions = filtered_homes
residential_demolitions.shape

(3169, 66)
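
As a sanity check, the counts printed at each step should add up to the shapes that follow. The sketch below only re-adds numbers copied from the outputs above. The variable names are ours.

```python
# Counts copied from the value_counts() outputs in this notebook
status_counts = {"Final": 4654, "Active": 923}
class_mapped_counts = {"Residential": 4774, "Commercial": 803}
home_counts = {
    "R- 645 Demolition One Family Homes": 3047,
    "R- 646 Demolition Two Family Bldgs": 122,
}

# Active + Final should match the status-filtered row count
assert sum(status_counts.values()) == 5577
# Residential + Commercial should account for every status-filtered row
assert sum(class_mapped_counts.values()) == 5577
# One- and two-family permits should match the final row count
assert sum(home_counts.values()) == 3169

homes_total = sum(home_counts.values())
print(homes_total)
```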

### Export

In [11]:
residential_demolitions.to_csv('../data-processed/residential_demolitions.csv')
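
Notebooks that read this file will likely need the same string dtypes, since pandas otherwise infers numeric types again on import. Note also that `to_csv` writes the index as an unnamed first column by default, so `index_col=0` restores it when reading. A minimal round-trip sketch on made-up values, using an in-memory buffer in place of the real file:

```python
import io

import pandas as pd

# Made-up row; the leading zero shows why the string dtype matters
toy = pd.DataFrame({
    "ApplicantPhone": ["0125551234"],
    "CalendarYearIssued": ["2018"],
})

buffer = io.StringIO()
toy.to_csv(buffer)  # the index is written as the first column by default
buffer.seek(0)

# Same idea as the column_types mapping used for the raw import
column_types = {"ApplicantPhone": str, "CalendarYearIssued": str}
restored = pd.read_csv(buffer, index_col=0, dtype=column_types)
print(restored.loc[0, "ApplicantPhone"])
```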