# GTHA housing market database
# OSEMN methodology Step 1: Obtain
# Obtain DA-level Census Income Data

---

This notebook describes _Step 1: Obtain_ of the OSEMN methodology: obtaining DA-level Census income data.

---

For a description of the OSEMN methodology, see `methodology/0.osemn/osemn.pdf`.

## Profile of Income by Dissemination Area - Greater Toronto Area, 2016 Census

The dataset of income profiles by Dissemination Area (DA), provided by the York Municipal Government, is available on Esri's Open Data Portal.

Shared By: YorkMunicipalGovt   
Data Source: services1.arcgis.com 

https://hub.arcgis.com/datasets/9d262f8a576842fbb2afbc8c51a64178_1
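
The Hub page URL ends in the dataset identifier `9d262f8a576842fbb2afbc8c51a64178_1`, and the GeoJSON download URL used later in this notebook is built from the same identifier. The mapping can be sketched as follows, assuming the `https://opendata.arcgis.com/datasets/<id>.geojson` pattern used in this notebook also applies to other Hub datasets (an assumption, not a documented guarantee). `geojson_url_from_hub` is a hypothetical helper.

```python
from urllib.parse import urlparse


def geojson_url_from_hub(hub_url: str) -> str:
    """Build a GeoJSON download URL from an ArcGIS Hub dataset page URL.

    Hypothetical helper: assumes the dataset ID is the last path segment
    of the Hub URL and that the opendata.arcgis.com URL pattern applies.
    """
    # Take the final non-empty path segment as the dataset ID
    dataset_id = urlparse(hub_url).path.rstrip('/').split('/')[-1]
    return f'https://opendata.arcgis.com/datasets/{dataset_id}.geojson'


hub_url = 'https://hub.arcgis.com/datasets/9d262f8a576842fbb2afbc8c51a64178_1'
print(geojson_url_from_hub(hub_url))
```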

## Description of the dataset

This table of income profile information by dissemination area was downloaded from the Statistics Canada website and joined with `bndDisseminationAreaGTHA2016` in DEM. It contains information gathered during the 2016 Census on the population of each dissemination area. This includes breakdowns of income and earnings for families, individuals, and people in economic families, as well as household income and the prevalence of low income. The data covers the dissemination areas in the Greater Toronto and Hamilton Area (GTHA).
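
The join with `bndDisseminationAreaGTHA2016` was performed by the data provider, not in this notebook. As an illustration only, an attribute join of this kind can be sketched with pandas, assuming both tables share a `DAUID` key. The table layouts and all values below are made up.

```python
import pandas as pd

# Hypothetical boundary table: one row per DA (geometry omitted for brevity)
boundaries = pd.DataFrame({
    'DAUID': ['35200001', '35200002', '35200003'],
    'Shape__Area': [51000.0, 62000.0, 48000.0],
})

# Hypothetical income table; DA 35200003 is absent, as a suppressed area might be
income = pd.DataFrame({
    'DAUID': ['35200001', '35200002'],
    'MEDIAN_TOT_INC': [41000.0, 38500.0],
})

# A left join keeps every boundary; validate= raises if DAUID repeats on either side
joined = boundaries.merge(income, on='DAUID', how='left', validate='one_to_one')
print(joined)
```

With a left join, DAs that have no income record keep their boundary row and receive NaN in the income columns.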

Statistics Canada has suppressed the profiles of certain areas because of very low population counts. Suppressed areas appear as NULL values in the attribute table.
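
Once the data is loaded, suppressed DAs can be found through their missing values. The sketch below is minimal and assumes that suppression shows up as NaN in the income columns of the loaded frame. It also treats a DA as fully suppressed only if *all* of its income columns are missing, which is an assumption about how suppression propagates across columns. The column names follow the dataset, but the values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the loaded GeoDataFrame (values are made up)
df = pd.DataFrame({
    'DAUID': ['35200001', '35200002', '35200003'],
    'POP_TOT_INC': [520.0, np.nan, 610.0],
    'MEDIAN_TOT_INC': [41000.0, np.nan, np.nan],
})

# Identifier columns are excluded; everything else is treated as an income attribute
income_cols = df.columns.drop('DAUID')

# Assumption: a fully suppressed DA has NaN in every income column
suppressed = df[income_cols].isna().all(axis=1)
print(f"{suppressed.sum()} of {len(df)} DAs fully suppressed")
print(df.loc[suppressed, 'DAUID'].tolist())

# Per-column share of missing values, useful for spotting partial suppression
print(df[income_cols].isna().mean())
```

On the real frame, the non-income columns (`OBJECTID`, `CSDUID`, `CSDNAME`, `Shape__Area`, `Shape__Length`, `geometry`) would also need to be dropped from `income_cols`.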

For more information regarding this data, please refer to the reference document here:   
http://www12.statcan.gc.ca/census-recensement/2016/ref/98-501/98-501-x2016006-eng.cfm

## Import dependencies

In [1]:
import pandas as pd
import geopandas as gpd
import os
from time import time

In [5]:
data_path = '../../data/da_census/'
os.listdir(data_path)

[]
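
The empty listing shows that the target directory exists but does not contain any files yet. On a fresh checkout where `../../data/da_census/` is missing, `os.listdir` raises `FileNotFoundError`, and the later `to_csv` call would also fail. The following sketch creates the directory first. It runs against a temporary directory so that it does not touch the repository.

```python
import os
import tempfile

# Stand-in for '../../data/da_census/' so the sketch runs anywhere
base = tempfile.mkdtemp()
data_path = os.path.join(base, 'data', 'da_census')

# Create the directory and any missing parents; no error if it already exists
os.makedirs(data_path, exist_ok=True)
print(os.listdir(data_path))
```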

## Load income profiles and geometry of GTHA Dissemination Areas (DAs)

In [4]:
t = time()
api_url = 'https://opendata.arcgis.com/datasets/9d262f8a576842fbb2afbc8c51a64178_1.geojson'
gdf = gpd.read_file(api_url)
elapsed = time() - t

print(f"----- GeoDataFrame loaded\nin {elapsed:.2f} seconds"
      f"\nwith {gdf.shape[0]:,} rows\nand {gdf.shape[1]:,} columns"
      "\n-- Column names:\n", gdf.columns)
gdf.plot();

----- GeoDataFrame loaded
in 19.88 seconds
with 9,182 rows
and 218 columns
-- Column names:
 Index(['OBJECTID', 'DAUID', 'CSDUID', 'CSDNAME', 'POP_TOT_INC',
       'NUM_TOT_INC_PVT_HH', 'MEDIAN_TOT_INC', 'NUM_AFT_TAX_INC_PVT_HH',
       'MEDIAN_AFT_INC', 'NUM_MKT_INC_PVT_HH',
       ...
       'AVG_AFTER_TAX_INC_CPL_W_CHILD', 'TOT_INC_LONE_PARENT_25_SAMP',
       'AVG_INC_LONE_PARENT', 'AVG_AFTER_TAX_LONE_PARENT',
       'TOT_INC_NOT_IN_ECF_25_SAMP', 'AVG_INC_NOT_IN_ECF',
       'AVG_AFTER_TAX_INC_NOT_IN_ECF', 'Shape__Area', 'Shape__Length',
       'geometry'],
      dtype='object', length=218)
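
Each cell in this notebook times its step by hand with `t = time()` ... `elapsed = time() - t`. That pattern could be factored into a small context manager. `timed` is a hypothetical helper, and it swaps in `time.perf_counter`, which is meant for measuring intervals, in place of `time.time`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Time the body of a `with` block and print the result (hypothetical helper)."""
    start = time.perf_counter()
    result = {}
    try:
        yield result
    finally:
        # Record and report elapsed time even if the body raises
        result['elapsed'] = time.perf_counter() - start
        print(f"----- {label}\nin {result['elapsed']:.2f} seconds")


# Usage: wrap the slow call, e.g. gpd.read_file(api_url); sleep stands in here
with timed('GeoDataFrame loaded') as timing:
    time.sleep(0.1)
```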


## Save results to a .csv file

In [7]:
save_path = data_path + 'da_census_profiles_income.csv'
t = time()
gdf.to_csv(save_path, index=False)
elapsed = time() - t
print("DataFrame saved to file:\n", save_path,
      "\ntook {0:.2f} seconds".format(elapsed))

DataFrame saved to file:
 ../../data/da_census/da_census_profiles_income.csv 
took 6.91 seconds
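
`to_csv` writes the `geometry` column as WKT text, and the identifier columns are written as plain digits. When the file is read back, pandas infers a numeric dtype for `DAUID` and `CSDUID` unless told otherwise. The sketch below reads the file back with these identifiers kept as strings, using toy data in a temporary file. Keeping IDs as strings assumes that downstream steps join on them as text.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the saved file (values are made up); geometry is WKT text
df = pd.DataFrame({
    'DAUID': ['35200001', '35200002'],
    'CSDUID': ['3520005', '3520005'],
    'MEDIAN_TOT_INC': [41000.0, None],
    'geometry': ['POLYGON ((0 0, 1 0, 1 1, 0 0))', 'POLYGON ((1 1, 2 1, 2 2, 1 1))'],
})
save_path = os.path.join(tempfile.mkdtemp(), 'da_census_profiles_income.csv')
df.to_csv(save_path, index=False)

# Without dtype=, pandas would infer the ID columns as integers
reloaded = pd.read_csv(save_path, dtype={'DAUID': str, 'CSDUID': str})
print(reloaded.dtypes)
```

If the geometry is needed again, the WKT column can be parsed back with `geopandas.GeoSeries.from_wkt`.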
