# Clustering and Scoring Job Relocation Opportunities - Playground Notebook

Austin Rainwater

---

# Initialization

In [24]:
# engine.execute(), used below, was removed in SQLAlchemy 2.0, so stay on 1.x
!pip install --quiet --upgrade "sqlalchemy<2.0" pymysql

from helpers import RenderJSON
from urllib.parse import quote as url_encode  # Python 3 location of quote()

import pandas as pd
import numpy as np

from sqlalchemy import (
    create_engine,
    Table,
    Column,
    MetaData,
    String,
    Numeric
)

import yaml

# Load credentials (such as the database connection string) kept outside the notebook
with open('secrets.yaml', 'r') as secrets_file:
    secrets = yaml.safe_load(secrets_file)

user_agent = 'datascience jupyter notebook/0.0 (https://github.com/pacorain/datascience-certification-final-project; Austin Rainwater, paco@heckin.io)'
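In Python 3, `quote` lives in `urllib.parse` rather than `urllib`. The sketch below shows what it produces for a page title, plus one caveat worth checking: if an already-quoted title is later handed to something that URL-encodes query parameters itself (such as `urllib.parse.urlencode`, or the `params` argument of `requests.get`), it gets encoded twice. The title string is the one used later in this notebook; the rest is illustrative.

```python
from urllib.parse import quote, urlencode

title = 'Fort Wayne, IN'

# quote() percent-encodes everything except unreserved characters and '/'
quoted = quote(title)
print(quoted)

# urlencode() encodes values itself (spaces become '+'), so it can take the raw title...
print(urlencode({'page': title}))

# ...whereas an already-quoted title has its '%' signs encoded a second time
print(urlencode({'page': quoted}))
```

The three lines print `Fort%20Wayne%2C%20IN`, `page=Fort+Wayne%2C+IN`, and `page=Fort%2520Wayne%252C%2520IN` respectively; the last one is no longer the page title Wikipedia would expect.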

---

# City Definition

The obvious place to start is with some cities. Below is the definition of the table that will hold the cities I will be exploring and their attributes.

In [16]:
# echo=True logs every SQL statement the engine emits (shown in the output below)
engine = create_engine(secrets['db_connection_string'], echo=True)

meta = MetaData()

cities = Table(
    'cities', meta,
    Column('city_name', String(50), primary_key=True, comment='City Name'),
    Column('metro_name', String(50), comment='Metropolitan Area Name'),
    Column('state', String(2), nullable=False, comment='2-Letter abbreviation of State'),
    Column('lat', Numeric(10, 6), nullable=False, comment='Latitude of City'),
    Column('lng', Numeric(10, 6), nullable=False, comment='Longitude of City')
)

# Drop and recreate the table so this cell can be re-run from a clean slate
meta.drop_all(engine)
meta.create_all(engine)

2020-11-21 02:05:27,126 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2020-11-21 02:05:27,127 INFO sqlalchemy.engine.base.Engine {}
2020-11-21 02:05:27,135 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2020-11-21 02:05:27,135 INFO sqlalchemy.engine.base.Engine {}
2020-11-21 02:05:27,144 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2020-11-21 02:05:27,145 INFO sqlalchemy.engine.base.Engine {}
2020-11-21 02:05:27,151 INFO sqlalchemy.engine.base.Engine show collation where `Charset` = 'utf8mb4' and `Collation` = 'utf8mb4_bin'
2020-11-21 02:05:27,152 INFO sqlalchemy.engine.base.Engine {}
2020-11-21 02:05:27,169 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2020-11-21 02:05:27,171 INFO sqlalchemy.engine.base.Engine {}
2020-11-21 02:05:27,183 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2020-11-21 02:05:27,184 INFO sqlalchemy.engine.base.E
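The echoed log above is cut off before the `CREATE TABLE` statement. One way to see the DDL without connecting to a database is to compile the `Table` against SQLAlchemy's MySQL dialect. This is a minimal sketch that assumes the same column definitions as the cell above and does not talk to a server.

```python
from sqlalchemy import Column, MetaData, Numeric, String, Table
from sqlalchemy.dialects import mysql
from sqlalchemy.schema import CreateTable

meta = MetaData()

# Same column definitions as the cities table above
cities = Table(
    'cities', meta,
    Column('city_name', String(50), primary_key=True, comment='City Name'),
    Column('metro_name', String(50), comment='Metropolitan Area Name'),
    Column('state', String(2), nullable=False, comment='2-Letter abbreviation of State'),
    Column('lat', Numeric(10, 6), nullable=False, comment='Latitude of City'),
    Column('lng', Numeric(10, 6), nullable=False, comment='Longitude of City')
)

# Compile the CREATE TABLE statement for MySQL without opening a connection
ddl = str(CreateTable(cities).compile(dialect=mysql.dialect()))
print(ddl)
```

The output makes the `NOT NULL` constraints on `state`, `lat`, and `lng` easy to spot before any insert is attempted.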

Let's start with my birthplace: Fort Wayne, Indiana.

In [18]:
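from sqlalchemy.exc import SQLAlchemyError

new_city = cities.insert()

try:
    engine.execute(new_city, [
        {'city_name': 'Fort Wayne', 'metro_name': 'Fort Wayne', 'state': 'IN'}
    ])
except SQLAlchemyError:
    # Catch database errors only, rather than using a bare except
    print("Oops! That didn't work.")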

2020-11-21 02:08:32,168 INFO sqlalchemy.engine.base.Engine INSERT INTO cities (city_name, metro_name, state) VALUES (%(city_name)s, %(metro_name)s, %(state)s)
2020-11-21 02:08:32,169 INFO sqlalchemy.engine.base.Engine {'city_name': 'Fort Wayne', 'metro_name': 'Fort Wayne', 'state': 'IN'}
2020-11-21 02:08:32,172 INFO sqlalchemy.engine.base.Engine ROLLBACK
Oops! That didn't work.
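The failed insert is consistent with the `nullable=False` constraints on `lat` and `lng`. Below is a self-contained sketch reproducing it, assuming an in-memory SQLite database as a stand-in for the MySQL server. The exact exception class depends on the backend, so it catches SQLAlchemy's base `SQLAlchemyError`. The coordinates in the second insert are placeholders, not Fort Wayne's real location.

```python
from sqlalchemy import Column, MetaData, Numeric, String, Table, create_engine, text
from sqlalchemy.exc import SQLAlchemyError

# Assumption: in-memory SQLite instead of the notebook's MySQL database
engine = create_engine('sqlite://')
meta = MetaData()

cities = Table(
    'cities', meta,
    Column('city_name', String(50), primary_key=True),
    Column('metro_name', String(50)),
    Column('state', String(2), nullable=False),
    Column('lat', Numeric(10, 6), nullable=False),
    Column('lng', Numeric(10, 6), nullable=False),
)
meta.create_all(engine)

row = {'city_name': 'Fort Wayne', 'metro_name': 'Fort Wayne', 'state': 'IN'}

try:
    # engine.begin() commits on success and rolls back if the block raises
    with engine.begin() as conn:
        conn.execute(cities.insert(), [row])
    missing_coords_ok = True
except SQLAlchemyError as err:
    missing_coords_ok = False
    print('Insert without lat/lng failed:', type(err).__name__)

# Placeholder coordinates, only to show the insert succeeds once they are present
with engine.begin() as conn:
    conn.execute(cities.insert(), [{**row, 'lat': 41.0, 'lng': -85.0}])
    count = conn.execute(text('SELECT COUNT(*) FROM cities')).scalar()
print(count)
```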


Ah, the table also requires the latitude and longitude (both columns are `NOT NULL`) before it can insert the record. I could use the geocoder library from before to get them, but since I will be using Wikipedia anyway, let's see if I can grab them from there.

I did some experimenting with the [Wikipedia API Sandbox](https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=Fort%20Wayne%2C%20Indiana&redirects=1&prop=wikitext). Oddly enough, while there are multiple endpoints that can return the _names_ of the templates used on a page, I could not for the life of me find a way to get the _data passed into_ those templates in an easy format such as JSON. Instead, I'm going to grab the page's [wikitext](https://en.wikipedia.org/wiki/Help:Wikitext) and use [regular expressions](https://docs.python.org/3/library/re.html) to pull the data I'm looking for out of the template.
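A minimal sketch of the regex approach, assuming the coordinates appear in a `{{coord|...}}` template, either as signed decimal degrees or as degrees/minutes/seconds followed by N/S/E/W hemisphere letters. Which template and parameters the Fort Wayne article actually uses is an assumption here, the function names are hypothetical, and the sample strings are made up. Results are rounded to six decimal places to match the `Numeric(10, 6)` columns.

```python
import re

# Matches the body of a {{coord|...}} template (case-insensitive, no nested braces)
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)


def _dms_to_decimal(parts, hemisphere):
    """Convert [degrees, minutes, seconds] strings to signed decimal degrees."""
    value = sum(float(p) / 60 ** i for i, p in enumerate(parts))
    return -value if hemisphere in ("S", "W") else value


def parse_coord(wikitext):
    """Return (lat, lng) from the first {{coord}} template, or None if absent."""
    match = COORD_RE.search(wikitext)
    if match is None:
        return None
    # Keep positional numbers and hemisphere letters; drop named params
    # (display=...) and coordinate parameters such as region:US
    params = [p.strip() for p in match.group(1).split("|")
              if "=" not in p and ":" not in p]
    hemis = [i for i, p in enumerate(params) if p.upper() in ("N", "S", "E", "W")]
    if len(hemis) >= 2:
        # Degrees/minutes/seconds form, e.g. 41|4|N|85|8|W
        lat_i, lng_i = hemis[0], hemis[1]
        lat = _dms_to_decimal(params[:lat_i], params[lat_i].upper())
        lng = _dms_to_decimal(params[lat_i + 1:lng_i], params[lng_i].upper())
    else:
        # Signed decimal form, e.g. 41.5|-85.25
        lat, lng = float(params[0]), float(params[1])
    # Six decimal places, matching the Numeric(10, 6) lat/lng columns
    return round(lat, 6), round(lng, 6)


print(parse_coord("{{coord|41|4|N|85|8|W|region:US}}"))   # made-up sample
print(parse_coord("{{Coord|41.5|-85.25|display=inline}}"))  # made-up sample
```

Skipping parameters that contain `=` or `:` keeps the positional logic simple. A real infobox may use a different template or parameter layout, so the pattern likely needs adjusting once the actual wikitext is in hand.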

In [22]:
city_name = 'Fort Wayne'
state_name = 'IN'


wikipedia_url = 'https://en.wikipedia.org/w/api.php'
params = {
    "action": "parse",
    "format": "json",
    "redirects": "1",
    "page": url_encode(f'{city_name}, {state_name}'),
    "prop": "wikitext"
}