# 2 - Geographical Mapping
In the last post, I looked at the most valued neighbourhoods in the city by average assessed value. We were coming across neighbourhoods like "Uplands", "Decoteau", and "Aster"... where are these neighbourhoods? I have no clue to be honest. I can't even say I've heard of these neighbourhoods or anything resembling their names. "Decoteau"? I almost don't even believe that's in Edmonton...

## gmaps Library
In my last project, librosa was a gem of a library that did wonders for audio signal processing. I'm hoping gmaps provides the same kind of breakthrough for me here because I found it using the same methodical approach that I used to find librosa... first result on google!

![](https://s-media-cache-ak0.pinimg.com/originals/bc/aa/a0/bcaaa0df3a76cb47b6e0674f458fcea3.gif)

In all seriousness though, it looks like gmaps is able to embed a google maps interface right into Jupyter and allow you to plot on top of that. That sounds pretty enticing for now and definitely worth checking out for what I want to do here.

In [1]:
# Enable plots in the notebook
%matplotlib inline
import matplotlib.pyplot as plt

# Seaborn makes our plots prettier
import seaborn
seaborn.set(style = 'ticks')

# Import jupyter widgets
from ipywidgets import widgets

import numpy as np
import pandas as pd
import os
import gmaps
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Initiate and configure gmaps with API Key
gmaps.configure(api_key=os.environ["GOOGLE_API_KEY"])

In [3]:
fig = gmaps.figure()
fig

Wow! That was easy. We straight up have an interactive google maps display within jupyter! Let's check out some of gmaps' capabilities using some of its out of the box datasets.

In [4]:
# Import data sets
import gmaps.datasets

In [5]:
# Map all sites of political violence in Africa between 1997 and 2015
locations = gmaps.datasets.load_dataset_as_df("acled_africa")
fig = gmaps.figure()
heatmap_layer = gmaps.heatmap_layer(locations)
fig.add_layer(heatmap_layer)
fig

So there's the heatmap capabilities. It looks like you can also plot dots.

In [6]:
# Plot all the starbucks locations in the UK
df = gmaps.datasets.load_dataset_as_df("starbucks_kfc_uk")

starbucks_df = df[df["chain_name"] == "starbucks"]
starbucks_df = starbucks_df[['latitude', 'longitude']]

starbucks_layer = gmaps.symbol_layer(
    starbucks_df, fill_color="green", stroke_color="green", scale=2
)
fig = gmaps.figure()
fig.add_layer(starbucks_layer)
fig

Awesome. Those are some great tools to start with. Let's look back at the property assessment data now.

## Edmonton Property Assessment Data

In [7]:
# Load data set
edm_data = pd.read_csv('../data/Property_Assessment_Data.csv')

In [8]:
edm_data.dtypes

Account Number        int64
Suite                object
House Number        float64
Street Name          object
Assessed Value       object
Assessment Class     object
Neighbourhood        object
Garage               object
Latitude            float64
Longitude           float64
dtype: object

In [9]:
# Strip the literal dollar sign and cast to int
# (regex=False so '$' is not read as the regex end-of-string anchor)
edm_data['Assessed Value'] = edm_data['Assessed Value'].str.replace('$', '', regex=False).astype(int)
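
The cast above assumes every value looks like `$123456`. If some rows carried thousands separators or stray whitespace (an assumption on my part, I haven't checked the whole file), a slightly more defensive parser could look like this rough sketch. `parse_assessed_value` is a hypothetical helper name, not something from pandas or gmaps.

```python
def parse_assessed_value(raw):
    """Convert a currency string such as '$1,095,500' to an int.

    Hypothetical helper: strips whitespace, dollar signs and commas,
    then casts what is left to int.
    """
    cleaned = raw.strip().replace('$', '').replace(',', '')
    return int(cleaned)


# Example strings; the comma-separated one is made up for illustration
print(parse_assessed_value('$2974396'))
print(parse_assessed_value(' $1,095,500 '))
```

With pandas, something like `edm_data['Assessed Value'].map(parse_assessed_value)` should apply it to the whole column.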

In [10]:
# Filter for only residential buildings
edm_data_res = edm_data[edm_data['Assessment Class'] == 'Residential']

In [11]:
edmonton_res_heatmap_fig = gmaps.figure()
edmonton_res_heatmap_layer = gmaps.heatmap_layer(edm_data_res[['Latitude', 'Longitude']])
edmonton_res_heatmap_fig.add_layer(edmonton_res_heatmap_layer)
edmonton_res_heatmap_fig

![](https://s3.ca-central-1.amazonaws.com/2017edmfasatb/edmonton_property_assessment/images/density.png)

Alright, so with a heatmap, we're purely looking at _**density**_. There are a lot of units downtown and near Whyte, which totally makes sense: lots of condos and apartments. We are weighting each line of data (each unit) equally here, so a condo with 50 units on one block will look 10x as dense as 5 large houses on one block. There is also quite a bit of density now along Edmonton South.
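
To make that 10x concrete, here's a rough sketch of what "density" means when every row counts equally. This is not how gmaps computes its heatmap internally (I haven't looked); `count_units_per_cell`, the grid size, and the coordinates are all made up for illustration.

```python
from collections import Counter


def count_units_per_cell(points, cell_size=0.01):
    """Count how many (lat, lon) points fall into each square grid cell."""
    counts = Counter()
    for lat, lon in points:
        # Floor-divide each coordinate to get an integer cell index
        cell = (int(lat // cell_size), int(lon // cell_size))
        counts[cell] += 1
    return counts


# One condo building: 50 units that all share the same coordinates
condo_units = [(53.545, -113.495)] * 50
# Five large houses spread along one block
houses = [(53.525 + i * 0.0001, -113.525) for i in range(5)]

counts = count_units_per_cell(condo_units + houses)
condo_cell = (int(53.545 // 0.01), int(-113.495 // 0.01))
house_cell = (int(53.525 // 0.01), int(-113.525 // 0.01))

# The condo block ends up with ten times the count of the house block
print(counts[condo_cell] / counts[house_cell])
```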

Another thing that I'm seeing right off the bat is that there seems to be data missing in Edmonton east and Edmonton NW. I have two theories:
1. This is under another jurisdiction (unlikely)
2. There are no "residential" units in these areas per se, and rather they are more industrial (remember I took out commercial from the data set)

That strip along Calgary Trail is blank as well, and I know for a fact that it's basically all commercial properties there, so I'm inclined to think that largely it's due to \#2, but maybe the city just doesn't have this data for another reason.
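
One way to poke at theory \#2 would be to count assessment classes inside a rough bounding box around one of the blank areas, using the unfiltered data. Here's a minimal sketch of that idea. `class_counts_in_box` is a hypothetical helper, and the sample rows, the box coordinates, and the `'Commercial'` label are all invented for illustration (I'm only sure `'Residential'` appears in the real data).

```python
from collections import Counter


def class_counts_in_box(rows, south, north, west, east):
    """Count assessment classes for rows that fall inside a lat/long box."""
    counts = Counter()
    for row in rows:
        inside_lat = south <= row['Latitude'] <= north
        inside_lon = west <= row['Longitude'] <= east
        if inside_lat and inside_lon:
            counts[row['Assessment Class']] += 1
    return counts


# Made-up rows standing in for the real data set
sample_rows = [
    {'Assessment Class': 'Commercial', 'Latitude': 53.480, 'Longitude': -113.495},
    {'Assessment Class': 'Commercial', 'Latitude': 53.482, 'Longitude': -113.494},
    {'Assessment Class': 'Residential', 'Latitude': 53.481, 'Longitude': -113.496},
    {'Assessment Class': 'Residential', 'Latitude': 53.545, 'Longitude': -113.495},
]

# Hypothetical box; the last row sits outside it and is ignored
box_counts = class_counts_in_box(sample_rows, 53.47, 53.49, -113.50, -113.49)
print(box_counts)
```

With the real data, `edm_data.to_dict('records')` should produce rows in this shape. A box dominated by non-residential classes would back up theory \#2.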

Let's take a look at the top 50 communities mapped out.

In [12]:
# Generate statistics per neighbourhood:
# mean value, unit count, and mean lat/long (a rough centre point)
edm_data_neighbour_grouped = edm_data_res.groupby(['Neighbourhood', 'Assessment Class']).agg({
    'Assessed Value': ['mean', 'size'],
    'Latitude': ['mean'],
    'Longitude': ['mean']
}).reset_index()

In [13]:
# Show most valued neighbourhoods with greater than 20 units
most_valuable_50_neighbourhoods = edm_data_neighbour_grouped[edm_data_neighbour_grouped[('Assessed Value', 'size')] > 20].sort_values([('Assessed Value', 'mean')], ascending = False).head(50)
most_valuable_50_neighbourhoods.columns = most_valuable_50_neighbourhoods.columns.droplevel(-1)

In [14]:
# Check results
most_valuable_50_neighbourhoods

| | Neighbourhood | Assessment Class | Latitude | Assessed Value (mean) | Assessed Value (size) | Longitude |
|---|---|---|---|---|---|---|
| 313 | THE UPLANDS | Residential | 53.465887 | 2974396.0 | 24 | -113.658513 |
| 196 | MAPLE RIDGE INDUSTRIAL | Residential | 53.503755 | 1095500.0 | 21 | -113.351915 |
| 330 | WESTBROOK ESTATE | Residential | 53.469744 | 1002055.0 | 338 | -113.548747 |
| 269 | RIVERVIEW AREA | Residential | 53.428226 | 879403.8 | 156 | -113.669871 |
| 341 | WINDSOR PARK | Residential | 53.52456 | 874942.5 | 583 | -113.534766 |
| 89 | DECOTEAU | Residential | 53.414599 | 874164.9 | 97 | -113.385806 |
| 22 | ASTER | Residential | 53.454904 | 844000.0 | 22 | -113.352495 |
| 276 | RURAL NORTH EAST SOUTH STURGEON | Residential | 53.627509 | 828682.8 | 268 | -113.334802 |
| 106 | ELLERSLIE INDUSTRIAL | Residential | 53.414771 | 773386.4 | 22 | -113.486126 |
| 244 | QUESNELL HEIGHTS | Residential | 53.509377 | 756525.4 | 138 | -113.573524 |

In [15]:
# Plot the 50 most highly valued communities with more than 20 units
edm_top_50_layer = gmaps.symbol_layer(
    most_valuable_50_neighbourhoods[['Latitude', 'Longitude']], 
    fill_color = "green", 
    stroke_color = "green", 
    scale = 2,
    info_box_content = most_valuable_50_neighbourhoods['Neighbourhood'].tolist()
)
edm_top_50_fig = gmaps.figure()
edm_top_50_fig.add_layer(edm_top_50_layer)
edm_top_50_fig

![](https://s3.ca-central-1.amazonaws.com/2017edmfasatb/edmonton_property_assessment/images/top_50_neighbourhoods.png)
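
The info boxes on this map only carry the neighbourhood name. If I wanted each marker to also show the average value and unit count, the labels could be built along these lines. This is a minimal sketch: `make_info_labels` is a hypothetical helper, and the three rows are copied from the table above. The resulting list of strings would go to `info_box_content` in place of the plain name list.

```python
def make_info_labels(rows):
    """Build one label per (name, mean_value, units) tuple."""
    return [
        '{}: ${:,.0f} average across {} units'.format(name, mean_value, units)
        for name, mean_value, units in rows
    ]


# First three rows of the top-50 table
top_rows = [
    ('THE UPLANDS', 2974396.0, 24),
    ('MAPLE RIDGE INDUSTRIAL', 1095500.0, 21),
    ('WESTBROOK ESTATE', 1002055.0, 338),
]

labels = make_info_labels(top_rows)
for label in labels:
    print(label)
```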

I'm liking the ability to actually use fully fledged google maps inside jupyter. I can actually street view to places and check the houses out themselves.

From this map, we see a few themes for highly valued properties:
- Outskirts of town
- Along the river
- Southwest Edmonton

Some of these places I know. I actually just went to take a walk around Crestwood with my parents the other day, and I can attest to those houses being super nice. Many houses up in the millions to drag that average up. Other places (especially the outskirts) I've never been to, and judging by street view, aren't even that nice! I'm thinking maybe they are much larger plots of land and are valued more in that way.

I'm looking for more urban areas, so let's filter even one step more and only look at communities with over 200 units.

In [16]:
# Show most valued neighbourhoods with greater than 200 units
most_valuable_50_neighbourhoods_min_200_units = edm_data_neighbour_grouped[edm_data_neighbour_grouped[('Assessed Value', 'size')] > 200].sort_values([('Assessed Value', 'mean')], ascending = False).head(50)
most_valuable_50_neighbourhoods_min_200_units.columns = most_valuable_50_neighbourhoods_min_200_units.columns.droplevel(-1)

In [17]:
# Plot the 50 most highly valued communities with more than 200 units
edm_top_50_min_200_units_layer = gmaps.symbol_layer(
    most_valuable_50_neighbourhoods_min_200_units[['Latitude', 'Longitude']], 
    fill_color = "green", 
    stroke_color = "green", 
    scale = 2,
    info_box_content = most_valuable_50_neighbourhoods_min_200_units['Neighbourhood'].tolist()
)
edm_top_50_min_200_units_fig = gmaps.figure()
edm_top_50_min_200_units_fig.add_layer(edm_top_50_min_200_units_layer)
edm_top_50_min_200_units_fig

![](https://s3.ca-central-1.amazonaws.com/2017edmfasatb/edmonton_property_assessment/images/top_50_neighbourhoods_min_20_units.png)

Got rid of a lot of the outskirts ones and, alas, even more pop up in Southwest Edmonton lol. I get it, LIVE IN THE SOUTHWEST. I guess it makes sense. Lots of new developments, the river flows right through, more and more grocery stores and commercial services are popping up... what's not to love?

## Conclusion
These maps are great for data exploration, but what if I want to build some type of regression model around this? My plots thus far have only given me a sense of location... _**where**_ are some of the most expensive units? Sure, I'm mapping the most expensive communities, but even amongst these communities, I can't quite tell which ones are the most valued. I know "The Uplands" is the most expensive, but outside of that, I can't quite distinguish the others.

I'd like to
1. Summarize lat, long, and assessment value information in some type of model for regression and automation
2. Be able to visually get a sense of value in different regions of the city by average price
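
For the second item, one possible starting point (just a sketch, not necessarily what I'll end up doing) is to bin units into a lat/long grid and average the assessed value per cell. `mean_value_per_cell`, the cell size, and the sample units are all made up for illustration.

```python
from collections import defaultdict


def mean_value_per_cell(units, cell_size=0.01):
    """Average assessed value per grid cell.

    `units` is an iterable of (lat, lon, value) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, count]
    for lat, lon, value in units:
        cell = (int(lat // cell_size), int(lon // cell_size))
        totals[cell][0] += value
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


# Invented units: two share a cell, the third sits in its own cell
sample_units = [
    (53.5250, -113.5250, 400000),
    (53.5255, -113.5255, 600000),
    (53.4650, -113.6550, 3000000),
]

cell_means = mean_value_per_cell(sample_units)
print(sorted(cell_means.values()))
```

Unlike the plain heatmap, a cell holding one very expensive house and a cell holding fifty modest condos would be compared on price rather than on unit count.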

Let's get to it in the next post.