# GTHA housing market database
# OSEMN methodology Step 3: Explore
# Exploratory Spatial Data Analysis (ESDA) of the Teranet dataset
# Alpha-shapes by first sale

This notebook generates [alpha shapes](https://en.wikipedia.org/wiki/Alpha_shape) from Teranet records for each year and colors them by the attribute `first_sale` (column `xy_first_sale`). The maps are intended to approximate the annual supply of new land use, as reflected in transactions on the real estate market.

## Alpha shapes
From [Wikipedia](https://en.wikipedia.org/wiki/Alpha_shape):  
In computational geometry, an alpha shape, or α-shape, is a family of piecewise linear simple curves in the Euclidean plane associated with the shape of a finite set of points. They were first defined by [Edelsbrunner, Kirkpatrick & Seidel (1983)](https://ieeexplore.ieee.org/document/1056714). The alpha-shape associated with a set of points is a generalization of the concept of the convex hull, i.e. every convex hull is an alpha-shape but not every alpha shape is a convex hull.

<img src='img/alpha_shapes.png'>
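
To make the definition above concrete, here is a minimal brute-force sketch of the common "empty disk" formulation: two points are joined by an edge when some disk of a given radius passes through both and has no other point inside it. The names `alpha_shape_edges` and `radius` are illustrative assumptions, and this is not how PySAL computes alpha shapes; it only shows how a large radius recovers the convex hull while a small radius yields a tighter shape.

```python
import math
from itertools import combinations


def alpha_shape_edges(points, radius, tol=1e-9):
    """Return index pairs (i, j) that form alpha-shape edges for the given disk radius.

    An edge is kept when at least one of the two disks of radius `radius`
    passing through both endpoints has no other input point strictly inside it.
    Brute force, O(n^3): suitable for illustration only.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        d = math.hypot(x2 - x1, y2 - y1)
        if d == 0 or d > 2 * radius:
            continue  # coincident points, or too far apart for any such disk
        # the disk centers lie on the perpendicular bisector of the segment
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        h = math.sqrt(max(radius ** 2 - (d / 2) ** 2, 0.0))
        nx, ny = -(y2 - y1) / d, (x2 - x1) / d  # unit normal to the segment
        for sign in (1, -1):
            cx, cy = mx + sign * h * nx, my + sign * h * ny
            if all(math.hypot(px - cx, py - cy) >= radius - tol
                   for k, (px, py) in enumerate(points) if k not in (i, j)):
                edges.append((i, j))
                break
    return edges


# four corners of a unit square plus its center (made-up illustration data)
pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
hull_like = alpha_shape_edges(pts, radius=10)    # large disks: convex hull edges
tighter = alpha_shape_edges(pts, radius=0.45)    # small disks: edges reach the center point
print(hull_like)
print(tighter)
```

With the large radius only the four sides of the square survive; with the small radius the sides are too long to fit on any disk, and the remaining edges connect each corner to the interior point.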

In this notebook, alpha shapes (polygons) will be generated from Teranet point data using the PySAL library in Python.

## Import dependencies

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import contextily as ctx
import seaborn as sns
import os
import sys
from pysal.lib.cg import alpha_shape_auto
from shapely.geometry import Point
from time import time



## Load Teranet data

In [2]:
teranet_path = '../../data/teranet/'
os.listdir(teranet_path)

['1.1_Teranet_DA.csv',
 '1.3_Teranet_DA_TAZ_PG_FSA.csv',
 '2_Teranet_consistent.csv',
 'parcel16_epoi13.csv',
 '1.2_Teranet_DA_TAZ.csv',
 '4_Teranet_lu_encode.csv',
 '1.4_Teranet_DA_TAZ_FSA_LU_LUDMTI.csv',
 '1.4_Teranet_DA_TAZ_FSA_LU.csv',
 '.ipynb_checkpoints',
 'ParcelLandUse.zip',
 'ParcelLandUse',
 'HHSaleHistory.csv',
 '3_Teranet_nonan_new_cols.csv',
 'GTAjoinedLanduseSales']

In [3]:
# load DataFrame with Teranet records
t = time()
teranet_df = pd.read_csv(teranet_path + '4_Teranet_lu_encode.csv',
                         parse_dates=['registration_date'])
elapsed = time() - t
print("----- DataFrame loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(teranet_df.shape[0], teranet_df.shape[1]) + 
      "\n-- Column names:\n", teranet_df.columns)



----- DataFrame loaded
in 61.70 seconds
with 5,188,513 rows
and 76 columns
-- Column names:
 Index(['transaction_id', 'lro_num', 'pin', 'consideration_amt',
       'registration_date', 'postal_code', 'unitno', 'street_name',
       'street_designation', 'street_direction', 'municipality',
       'street_number', 'x', 'y', 'dauid', 'csduid', 'csdname', 'taz_o', 'fsa',
       'pca_id', 'postal_code_dmti', 'pin_lu', 'landuse', 'prop_code',
       'dmti_lu', 'street_name_raw', 'year', 'year_month', 'year3',
       'census_year', 'census2001_year', 'tts_year', 'tts1991_year', 'xy',
       'pin_total_sales', 'xy_total_sales', 'pin_prev_sales', 'xy_prev_sales',
       'xy_first_sale', 'pin_years_since_last_sale',
       'xy_years_since_last_sale', 'da_days_since_last_sale',
       'da_years_since_last_sale', 'pin_sale_next_6m', 'pin_sale_next_1y',
       'pin_sale_next_3y', 'xy_sale_next_6m', 'xy_sale_next_1y',
       'xy_sale_next_3y', 'price_2016', 'pin_price_cum_sum',
       'xy_price_cum_

## Generate maps of alpha shapes by municipality from Teranet records

In [4]:
save_dir = 'results/maps/alpha_shapes/'
os.listdir(save_dir)

['teranet_alpha_1988.png',
 'teranet_alpha_2001.png',
 'teranet_alpha_1993.png',
 'teranet_alpha_1996.png',
 'teranet_alpha_1995.png',
 'teranet_alpha_1989.png',
 'teranet_alpha_2005.png',
 'teranet_alpha_1987.png',
 'teranet_alpha_2007.png',
 'teranet_alpha_2012.png',
 'teranet_alpha_1998.png',
 'teranet_alpha_1986.png',
 'lucr',
 'teranet_alpha_2010.png',
 'teranet_alpha_2014.png',
 'teranet_alpha_2009.png',
 'teranet_alpha_1985.png',
 'teranet_alpha_2016.png',
 'teranet_alpha_2002.png',
 'teranet_alpha_2008.png',
 'teranet_alpha_1990.png',
 'first_sale',
 'teranet_alpha_2004.png',
 'teranet_alpha_2011.png',
 'teranet_alpha_1994.png',
 'teranet_alpha_2006.png',
 'first_saleteranet_alpha_xy_first_sale_1998.png',
 'teranet_alpha_1999.png',
 'teranet_alpha_2015.png',
 'teranet_alpha_2017.png',
 '.ipynb_checkpoints',
 'teranet_alpha_1992.png',
 'teranet_alpha_2000.png',
 'teranet_alpha_1997.png',
 'teranet_alpha_2013.png',
 'teranet_alpha_2003.png',
 'teranet_alpha_1991.png']
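
The listing above contains a file named `first_saleteranet_alpha_xy_first_sale_1998.png`, which looks like the result of concatenating a subdirectory and a file name without a separator. A minimal sketch of a path helper that avoids this, assuming the same naming scheme as the plotting cells below; the function name `map_path` and its `create` flag are hypothetical, not part of the notebook:

```python
import os


def map_path(save_dir, subdir, color_col, year, create=True):
    """Build the output path for one map; optionally create the subdirectory."""
    out_dir = os.path.join(save_dir, subdir)
    if create:
        # avoids a failed save when the subdirectory does not exist yet
        os.makedirs(out_dir, exist_ok=True)
    file_name = 'teranet_alpha_{0}_{1}.png'.format(color_col, year)
    return os.path.join(out_dir, file_name)


# create=False keeps this example free of side effects
example = map_path('results/maps/alpha_shapes', 'first_sale',
                   'xy_first_sale', 1998, create=False)
print(example)
```

Because `os.path.join` inserts the separator itself, the helper behaves the same whether `subdir` is passed as `'first_sale'` or `'first_sale/'`.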

### Colored by `first_sale`

In [5]:
start_year = 1985
end_year = 2017
teranet_crs = {'proj': 'latlong', 'ellps': 'WGS84', 'datum': 'WGS84', 'no_defs': True}
mun_col = 'csdname'
color_col = 'xy_first_sale'
path = 'first_sale/'

t = time()
error_count = 0

for year in range(start_year, end_year + 1):
    s_year = teranet_df.query('year == {0}'.format(year))
    print("Generating alpha shapes from the {0:,} Teranet records from {1}...".format(len(s_year), year))

    f, ax = plt.subplots(1, figsize=(12, 12))

    mun_list = s_year[mun_col].unique()

    for mun in mun_list:
        mask2 = s_year[mun_col] == mun
        s_mun = s_year[mask2]
        
        # try generating and mapping alpha shapes from the subset
        try: 
            alpha = s_mun.groupby(color_col)[['x', 'y']].apply(lambda x: alpha_shape_auto(x.values))
            alpha = gpd.GeoDataFrame({'geometry': alpha}, crs=teranet_crs).to_crs(epsg=3857).reset_index()
            mask3 = alpha[color_col] == True
            alpha[mask3].plot(ax=ax, color='yellow') # map first sales
            alpha[~mask3].plot(ax=ax, color='purple') # map subsequent sales
                        
        # a QhullError is raised when the subset has too few points to form alpha shapes;
        # catch Exception rather than using a bare except so Ctrl-C still interrupts the loop
        except Exception:
            error_count += 1
            
    # add a basemap
    ctx.add_basemap(ax=ax, url=ctx.sources.ST_TONER_BACKGROUND)
    # configure axis parameters
    ax.set_xlim(-8940996.776086302, -8723064.623629777)
    ax.set_ylim(5313237.739935117, 5555494.494204169)
    ax.set_title("Alpha shapes produced from {0:,} Teranet records from {1}"\
                 .format(len(s_year), year) + "\ncolored by {0}".format(color_col), fontsize=20)
    ax.set_axis_off()
    
    plt.savefig(save_dir + path + 'teranet_alpha_' + color_col + '_' + str(year) + '.png', dpi=400, bbox_inches='tight')
    plt.close(f)
    
elapsed = time() - t
print("\n----- Finished plotting, took {0:,.2f} seconds ({1:,.2f} minutes), {2} errors when generating alpha shapes."
      .format(elapsed, elapsed / 60, error_count))

Generating alpha shapes from the 19,912 Teranet records from 1985...
Generating alpha shapes from the 35,291 Teranet records from 1986...
Generating alpha shapes from the 36,529 Teranet records from 1987...
Generating alpha shapes from the 51,180 Teranet records from 1988...
Generating alpha shapes from the 81,903 Teranet records from 1989...
Generating alpha shapes from the 80,297 Teranet records from 1990...
Generating alpha shapes from the 81,096 Teranet records from 1991...
Generating alpha shapes from the 87,769 Teranet records from 1992...
Generating alpha shapes from the 80,936 Teranet records from 1993...
Generating alpha shapes from the 100,207 Teranet records from 1994...
Generating alpha shapes from the 88,685 Teranet records from 1995...
Generating alpha shapes from the 141,955 Teranet records from 1996...
Generating alpha shapes from the 154,189 Teranet records from 1997...
Generating alpha shapes from the 145,558 Teranet records from 1998...
Generating alpha shapes from the 166,631 Teranet records from 1999...
Generating alpha shapes from 
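
The loop above relies on `try`/`except` to count subsets that, according to its comment, have too few points to form an alpha shape. An alternative is to drop such groups before any geometry is computed. The sketch below uses only the standard library; the function name `group_points`, the `min_points=3` threshold, and the sample coordinates are assumptions for illustration (the actual minimum depends on the library and its version).

```python
from collections import defaultdict


def group_points(records, key, min_points=3):
    """Group distinct (x, y) pairs by `key`, keeping only groups with enough points.

    `records` is an iterable of dicts with 'x', 'y' and `key` entries.
    Returns (kept, skipped): a dict mapping each kept key value to a sorted
    list of its distinct points, and a list of the key values that were dropped.
    """
    groups = defaultdict(set)
    for rec in records:
        groups[rec[key]].add((rec['x'], rec['y']))  # a set ignores repeated locations
    kept, skipped = {}, []
    for value, pts in groups.items():
        if len(pts) >= min_points:
            kept[value] = sorted(pts)
        else:
            skipped.append(value)
    return kept, skipped


# made-up records for one municipality and year
records = [
    {'x': -79.40, 'y': 43.65, 'xy_first_sale': True},
    {'x': -79.41, 'y': 43.66, 'xy_first_sale': True},
    {'x': -79.42, 'y': 43.64, 'xy_first_sale': True},
    {'x': -79.40, 'y': 43.65, 'xy_first_sale': True},   # repeated location
    {'x': -79.50, 'y': 43.70, 'xy_first_sale': False},  # lone point: skipped
]
kept, skipped = group_points(records, 'xy_first_sale')
print(len(kept[True]), skipped)
```

Counting distinct locations rather than rows matters here because repeat sales at the same coordinates add records without adding geometry.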

### Colored by relabeled land use ("house", "condo", "other")

In [8]:
teranet_df['lucr'].unique()

array(['other', 'house', 'condo'], dtype=object)

In [9]:
start_year = 1985
end_year = 2017
teranet_crs = {'proj': 'latlong', 'ellps': 'WGS84', 'datum': 'WGS84', 'no_defs': True}
mun_col = 'csdname'
color_col = 'lucr'
path = 'lucr/'
color_col_values = ['house', 'condo', 'other']
colors = ['blue', 'purple', 'red']

t = time()
error_count = 0

for year in range(start_year, end_year + 1):
    s_year = teranet_df.query('year == {0}'.format(year))
    print("Generating alpha shapes from the {0:,} Teranet records from {1}...".format(len(s_year), year))

    f, ax = plt.subplots(1, figsize=(12, 12))

    mun_list = s_year[mun_col].unique()

    for mun in mun_list:
        mask2 = s_year[mun_col] == mun
        s_mun = s_year[mask2]
        
        # try generating and mapping alpha shapes from the subset
        try: 
            alpha = s_mun.groupby(color_col)[['x', 'y']].apply(lambda x: alpha_shape_auto(x.values))
            alpha = gpd.GeoDataFrame({'geometry': alpha}, crs=teranet_crs).to_crs(epsg=3857).reset_index()
            for i in range(len(color_col_values)):
                mask3 = alpha[color_col] == color_col_values[i]
                alpha[mask3].plot(ax=ax, color=colors[i]) # map shapes for this land use class
                        
        # a QhullError is raised when the subset has too few points to form alpha shapes;
        # catch Exception rather than using a bare except so Ctrl-C still interrupts the loop
        except Exception:
            error_count += 1
            
    # add a basemap
    ctx.add_basemap(ax=ax, url=ctx.sources.ST_TONER_BACKGROUND)
    # configure axis parameters
    ax.set_xlim(-8940996.776086302, -8723064.623629777)
    ax.set_ylim(5313237.739935117, 5555494.494204169)
    ax.set_title("Alpha shapes produced from {0:,} Teranet records from {1}"\
                 .format(len(s_year), year) + "\ncolored by {0}".format(color_col), fontsize=20)
    ax.set_axis_off()
    
    plt.savefig(save_dir + path + 'teranet_alpha_' + color_col + '_' + str(year) + '.png', dpi=400, bbox_inches='tight')
    plt.close(f)
    
elapsed = time() - t
print("\n----- Finished plotting, took {0:,.2f} seconds ({1:,.2f} minutes), {2} errors when generating alpha shapes."
      .format(elapsed, elapsed / 60, error_count))

Generating alpha shapes from the 19,912 Teranet records from 1985...
Generating alpha shapes from the 35,291 Teranet records from 1986...
Generating alpha shapes from the 36,529 Teranet records from 1987...
Generating alpha shapes from the 51,180 Teranet records from 1988...
Generating alpha shapes from the 81,903 Teranet records from 1989...
Generating alpha shapes from the 80,297 Teranet records from 1990...
Generating alpha shapes from the 81,096 Teranet records from 1991...
Generating alpha shapes from the 87,769 Teranet records from 1992...
Generating alpha shapes from the 80,936 Teranet records from 1993...
Generating alpha shapes from the 100,207 Teranet records from 1994...
Generating alpha shapes from the 88,685 Teranet records from 1995...
Generating alpha shapes from the 141,955 Teranet records from 1996...
Generating alpha shapes from the 154,189 Teranet records from 1997...
Generating alpha shapes from the 145,558 Teranet records from 1998...
Generating alpha shapes from t
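
The cell above pairs land use classes with colors through two parallel lists indexed by position. A small sketch of the same pairing as a dictionary, which keeps each class next to its color and makes the fallback for unexpected values explicit; the helper name `color_for` and the `'grey'` default are assumptions, not part of the notebook:

```python
color_col_values = ['house', 'condo', 'other']
colors = ['blue', 'purple', 'red']

# one lookup table instead of two lists that must stay in the same order
color_map = dict(zip(color_col_values, colors))


def color_for(value, default='grey'):
    """Return the plot color for a land use class, or `default` if it is unknown."""
    return color_map.get(value, default)


for value in ['house', 'condo', 'other', 'vacant']:
    print(value, color_for(value))
```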