# Calgary Data for Good Micro Hackathon #1

Author: Roman Auriti

Date: April 23, 2020


## Introduction
I work in GIS professionally, and I thought the most useful thing I could contribute would be [geocoding](https://en.wikipedia.org/wiki/Geocoding) the Kijiji datasets that have been provided. Geocoding is the process of converting the address of a physical location into a latitude/longitude coordinate pair. It was mentioned as a stretch goal, and I thought there was no better time to help the Calgary community.

## Summary
I intend to use OpenStreetMap's geocoder, Nominatim, to do the heavy lifting. It has a handy set of [APIs](https://nominatim.org/release-docs/develop/api/Search/) that make this kind of work straightforward. My plan is to read the input `.csv` files into dataframes, then iterate through the addresses and send each one to Nominatim with Python's `requests` library. I will add two columns to each dataframe, one for latitude and one for longitude, and write each record's values there once it has been geocoded.
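
The plan above can be sketched roughly as follows. This is a minimal sketch, not the final implementation: the function names `geocode` and `geocode_all`, the `User-Agent` string, and the optional `get` parameter are all assumptions made for illustration. It relies on the search endpoint accepting a free-form `q` parameter and returning a JSON list whose entries carry `lat` and `lon` as strings.

```python
import time

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
# Placeholder identifier (an assumption); replace it with one describing your application.
HEADERS = {"User-Agent": "calgary-kijiji-geocoding-example"}


def geocode(address, get=None):
    """Return (lat, lon) as floats for an address, or (None, None) if nothing matched."""
    if get is None:
        # Imported lazily so the parsing logic can be tried without network access.
        import requests
        get = requests.get
    params = {"q": address, "format": "json", "limit": 1}
    response = get(NOMINATIM_URL, params=params, headers=HEADERS, timeout=10)
    response.raise_for_status()
    results = response.json()
    if not results:
        return None, None
    # Nominatim returns the coordinates as strings, so convert them.
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode_all(addresses, get=None, pause=1.0):
    """Geocode each address in turn, pausing between requests."""
    coords = []
    for address in addresses:
        coords.append(geocode(address, get=get))
        # The public Nominatim server asks for at most one request per second.
        time.sleep(pause)
    return coords
```

With a dataframe, the list returned by `geocode_all` could be unpacked into two new columns, for example `latitude` and `longitude` (column names assumed here).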

I'd also like to plot some of the coordinates that Nominatim returns, both to check that the geocoding works and to practice using `folium`, a neat mapping library I've seen in a number of projects but never had the chance to use myself.

## Challenges
It looks like the main challenge in this project will be parsing each address into something that Nominatim can use. Hopefully the `location` values are consistently separated by commas, because that will make my life a lot easier.

So, without further ado, let's get started!

In [9]:
import requests
import pandas as pd
import folium
import re

In [10]:
# Read the csv
df = pd.read_csv('kijiji-page_calgary_5KfromCityCentre_longTerm_24032020.csv')

In [11]:
df.head(4)

Unnamed: 0,web-scraper-order,web-scraper-start-url,pages,pages-href,ads,ads-href,title,amount,location,Unit Type,Bedrooms,description,size
0,1585065082-166,https://www.kijiji.ca/b-apartments-condos/calg...,18,https://www.kijiji.ca/b-apartments-condos/calg...,Wanted:\n looking f...,https://www.kijiji.ca/v-apartments-condos/calg...,Wanted: looking for an appartment july 2020,$700.00,"Calgary, AB, Canada, T2M2C2(View Map)",,,"Hi! Me, my boyfriend and my dog are looking fo...",500
1,1585066081-410,https://www.kijiji.ca/b-apartments-condos/calg...,12,https://www.kijiji.ca/b-apartments-condos/calg...,BEAUTIFUL ONE BEDROOM ON THE RIVER FRONT,https://www.kijiji.ca/v-apartments-condos/calg...,BEAUTIFUL ONE BEDROOM ON THE RIVER FRONT,"$1,450.00","315 3 St SE, Calgary, AB T2G 0S3, Canada(View ...",,,Incentives: ONLY $850 DAMAGE DEPOSIT!\n*Pets a...,660
2,1585064971-137,https://www.kijiji.ca/b-apartments-condos/calg...,18,https://www.kijiji.ca/b-apartments-condos/calg...,Central Core 1 Bedroom Suite,https://www.kijiji.ca/v-apartments-condos/calg...,Central Core 1 Bedroom Suite,$965.00,"500 505 4th Ave SW, T2P0J8, 505 4th Ave SW, AB...",,,If you are seeking a great value central core ...,620
3,1585065183-193,https://www.kijiji.ca/b-apartments-condos/calg...,20,https://www.kijiji.ca/b-apartments-condos/calg...,Wanted:\n WANTED FO...,https://www.kijiji.ca/v-apartments-condos/calg...,Wanted: WANTED FOR MARCH 15TH 2020....bachelor...,$700.00,"101 8 Ave SW, Calgary, AB T2P 1B4, Canada(View...",,,need laundry facility for 700 mnth. 50 yr old ...,400


Things are already looking a little fishy with the `location` column. The first 4 records of the original data have the following address patterns:

* City, Province, Country, Postal Code (without the space)
* Apartment Number and Street Address, City, Province and Postal Code (with the space), Country
* Apartment Number and Street Address, Postal Code (without the space), Street Address again, Province
* Apartment Number and Street Address, City, Province and Postal Code (with the space), Country

Let's take a look at the first 15 `location` values.

In [13]:
df.location.values[:15]

array(['Calgary, AB, Canada, T2M2C2(View Map)',
       '315 3 St SE, Calgary, AB T2G 0S3, Canada(View Map)',
       '500 505 4th Ave SW, T2P0J8, 505 4th Ave SW, AB(View Map)',
       '101 8 Ave SW, Calgary, AB T2P 1B4, Canada(View Map)',
       '1805 17 Street SW, Calgary, AB, T2T 4M3(View Map)',
       'Ave SE, Calgary, AB T2G 0Z1, Canada(View Map)',
       '525 56 Ave SW, Calgary, AB T2V 4Z9, Canada(View Map)',
       '211 14 Avenue SW, Calgary, AB, T2R 0M2(View Map)',
       '126 14 Ave SW, Calgary, AB T2R 0L9, Canada(View Map)',
       '419 - 1 Avenue NE, Calgary, AB, T2E 0B3(View Map)',
       '1053 10 St SW, Calgary, AB T2R 0G3, Canada(View Map)',
       ', Calgary T2V 0G5 AB, Canada(View Map)',
       '1431 37 Street SW, Calgary, AB, T3C 1S6(View Map)',
       '1540 17 Ave SW #310, Calgary, AB T2T 0C8, Canada(View Map)',
       '432 2 Avenue, T2E0E6, Calgary, AB(View Map)'], dtype=object)
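
Every value above ends with the `(View Map)` link text, and at least one starts with a bare comma. A first cleanup pass might look like the sketch below, which strips the suffix and splits the remainder on commas. The helper name `clean_location` is hypothetical, and this only tidies the strings; it does not yet decide which piece is the street, the city, or the postal code.

```python
VIEW_MAP_SUFFIX = "(View Map)"


def clean_location(raw):
    """Strip the trailing '(View Map)' text and split the rest on commas."""
    text = raw.strip()
    if text.endswith(VIEW_MAP_SUFFIX):
        text = text[: -len(VIEW_MAP_SUFFIX)]
    # Drop empty pieces, such as the one produced by a leading comma.
    return [part.strip() for part in text.split(",") if part.strip()]


print(clean_location("Calgary, AB, Canada, T2M2C2(View Map)"))
print(clean_location(", Calgary T2V 0G5 AB, Canada(View Map)"))
```

The number and order of the resulting pieces still vary from record to record, which is why splitting on commas alone is not enough.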

## Enter Regular Expressions
Regular expressions are a powerful tool that lets us parse specific patterns out of complicated strings. 
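
As a small illustration of the idea, the sketch below pulls a Canadian postal code out of a `location` string whether or not it contains the space. The pattern and the helper name `extract_postal_code` are assumptions for this example: the pattern only checks the letter-digit-letter, digit-letter-digit shape and does not validate which letters are actually in use.

```python
import re

# Letter-digit-letter, an optional space, then digit-letter-digit.
POSTAL_CODE = re.compile(r"\b([A-Z]\d[A-Z])\s?(\d[A-Z]\d)\b")


def extract_postal_code(location):
    """Return the postal code in 'A1A 1A1' form, or None if none is found."""
    match = POSTAL_CODE.search(location.upper())
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


print(extract_postal_code("Calgary, AB, Canada, T2M2C2(View Map)"))
print(extract_postal_code("315 3 St SE, Calgary, AB T2G 0S3, Canada(View Map)"))
```

Normalizing both spellings to a single form means later steps only need to handle one postal code format.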