# Query WMATA API

We want to add distances to WMATA stations to our data. To do this, we will query the WMATA API for the list of stations.


In [1]:
import http.client, urllib.request, urllib.parse, urllib.error, ssl 
import json
import pandas as pd

In [2]:

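# NOTE: this disables HTTPS certificate verification for the whole process;
# it works around local certificate errors but should not be used in production
ssl._create_default_https_context = ssl._create_unverified_context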

# Thank you WMATA for the code snippet! https://developer.wmata.com/docs/services/
headers = {
    # Request headers
    'api_key': 'your_api_key_here', 
}

params = urllib.parse.urlencode({
})

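try:
    conn = http.client.HTTPSConnection('api.wmata.com')
    # a GET request carries no body; the api_key header authenticates the call
    conn.request("GET", "/Rail.svc/json/jStations?%s" % params, headers=headers)
    response = conn.getresponse()
    data = response.read()
    # print(data)
    conn.close()
except Exception as e:
    # not every exception has errno/strerror, so report the exception itself
    print("Request failed: {0!r}".format(e))
    # re-raise so the parsing below never runs with `data` undefined
    raise

data_json = json.loads(data.decode("UTF-8"))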
wmata_stations_df = pd.DataFrame(data_json['Stations'])
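
As a possible alternative to the snippet above, here is a minimal sketch that separates fetching from parsing, so the parsing step can be checked without network access or an API key. The helper names `parse_stations` and `fetch_stations`, the `timeout` parameter, and the sample payload are my own assumptions for illustration and are not part of the WMATA snippet. Only the endpoint URL, the `api_key` header, and the `Stations` / `Address` / `State` fields come from the code in this notebook.

```python
import json
import urllib.request

# Endpoint taken from the request made earlier in this notebook
STATIONS_URL = "https://api.wmata.com/Rail.svc/json/jStations"


def parse_stations(raw: bytes) -> list:
    """Decode a jStations response body and copy the nested state to the top level."""
    payload = json.loads(raw.decode("utf-8"))
    records = []
    for station in payload["Stations"]:
        record = dict(station)
        record["state"] = station["Address"]["State"]
        records.append(record)
    return records


def fetch_stations(api_key: str, timeout: float = 10.0) -> list:
    """Hypothetical helper: request the station list and parse it."""
    request = urllib.request.Request(STATIONS_URL, headers={"api_key": api_key})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_stations(response.read())


# Offline check with a hand-written payload (not real API output)
sample = json.dumps({
    "Stations": [{
        "Code": "A01",
        "Name": "Metro Center",
        "Lat": 38.898303,
        "Lon": -77.028099,
        "Address": {"Street": "607 13th St. NW", "State": "DC"},
    }]
}).encode("utf-8")

sample_records = parse_stations(sample)
print(sample_records[0]["Name"], sample_records[0]["state"])
```

The list of records returned by `parse_stations` can be passed straight to `pd.DataFrame(...)`. Note that `fetch_stations` is defined but not called here, since it needs a valid key.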

In [3]:
# extract the state from the address data provided
wmata_stations_df['state'] = wmata_stations_df['Address'].apply(lambda x: x['State'])

In [4]:
wmata_stations_df[wmata_stations_df['Name'] == "Metro Center"]

| | Code | Name | StationTogether1 | StationTogether2 | LineCode1 | LineCode2 | LineCode3 | LineCode4 | Lat | Lon | Address | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A01 | Metro Center | C01 | | RD | | | | 38.898303 | -77.028099 | {'Street': '607 13th St. NW', 'City': 'Washing... | DC |
| 27 | C01 | Metro Center | A01 | | BL | OR | SV | | 38.898303 | -77.028099 | {'Street': '607 13th St. NW', 'City': 'Washing... | DC |


In [5]:
wmata_stations_df['Code'].nunique() == len(wmata_stations_df)

True

On quick inspection, some stations appear twice in this data. For example, Metro Center appears two times: once listing the BL, OR, and SV lines, and once listing just the RD line. Every entry has a unique Code, so these duplicated stations share a name and Lat/Lon coordinates but differ in Code. The two entries are linked by the StationTogether1 column, which holds the other entry's Code.
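
To make that observation checkable, here is a small sketch that finds rows sharing a name and coordinates and confirms that their StationTogether1 values point at each other. It runs on a hand-written sample rather than the real API response. The third row (`X99`, "Example Station") is made up for contrast, and representing a missing StationTogether1 as an empty string is an assumption.

```python
import pandas as pd

# Hand-written sample; only the two Metro Center rows mirror the output above
sample_df = pd.DataFrame({
    "Code": ["A01", "C01", "X99"],
    "Name": ["Metro Center", "Metro Center", "Example Station"],
    "StationTogether1": ["C01", "A01", ""],
    "Lat": [38.898303, 38.898303, 38.9],
    "Lon": [-77.028099, -77.028099, -77.0],
})

# Rows that share a name and coordinates with at least one other row
dup_mask = sample_df.duplicated(subset=["Name", "Lat", "Lon"], keep=False)
dups = sample_df[dup_mask]

# Check that each duplicated row points at a partner that points back at it
partner_of = sample_df.set_index("Code")["StationTogether1"]
links_are_mutual = all(
    partner_of.get(partner) == code
    for code, partner in zip(dups["Code"], dups["StationTogether1"])
)

print(dups["Code"].tolist(), links_are_mutual)
```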

If we wanted just a list of unique stations and their locations, we could select the names and coordinates and drop duplicates, but I want to preserve the line assignments in case they are useful later on. The layout of separate LineCode1, LineCode2, LineCode3, and LineCode4 columns is also a bit awkward. Since there are only 6 lines, a boolean column per line indicating whether a station serves that line seems cleaner.
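
For comparison, the simpler drop-duplicates route mentioned above might look like the sketch below, again on a hand-written two-row sample rather than the real response. It shows why I am not using it: the result has one row per station, but the line codes are gone.

```python
import pandas as pd

# Hand-written sample mirroring the two Metro Center entries
sample_df = pd.DataFrame({
    "Code": ["A01", "C01"],
    "Name": ["Metro Center", "Metro Center"],
    "LineCode1": ["RD", "BL"],
    "Lat": [38.898303, 38.898303],
    "Lon": [-77.028099, -77.028099],
})

# Keep only name and coordinates, then collapse identical rows
unique_stations = (
    sample_df[["Name", "Lat", "Lon"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

print(len(unique_stations))
print(list(unique_stations.columns))
```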

In [6]:
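# collect each entry's non-null line codes (LineCode1 through LineCode4) into a list
line_code_cols = ['LineCode1', 'LineCode2', 'LineCode3', 'LineCode4']
wmata_stations_df['lines_list'] = wmata_stations_df.apply(
    lambda x: [x[c] for c in line_code_cols if pd.notna(x[c])],
    axis=1
)

# merge duplicate entries for the same station; summing lists concatenates them
wmata_stations_df = wmata_stations_df.groupby(by=['Name', 'state', 'Lat', 'Lon'])['lines_list'].sum().reset_index()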

wmata_lines = ['OR', 'SV', 'BL', 'YL', 'GR', 'RD']  
for line in wmata_lines:
    wmata_stations_df[line.lower()+'_line'] = wmata_stations_df['lines_list'].apply(lambda x: line in x)

# clean columns/column names
wmata_stations_df = (
    wmata_stations_df
    .drop(columns=['lines_list'])
    .rename(columns={
        'Name':'station_name', 
        'state':'station_state',
        'Lat':'station_lat', 
        'Lon':'station_lon',
    })
)

In [7]:
wmata_stations_df.to_csv("data/wmata_stations.csv", index=False)
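
With the stations saved, the distance step mentioned at the top could be sketched as follows. This notebook does not say which distance measure is used, so the haversine (great-circle) formula here is my assumption, as are the helper names `haversine_km` and `nearest_station`, the mean Earth radius of 6371 km, and the made-up second station in the sample.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return (station_name, distance_km) for the closest station in a list of dicts."""
    best = min(
        stations,
        key=lambda s: haversine_km(lat, lon, s["station_lat"], s["station_lon"]),
    )
    distance = haversine_km(lat, lon, best["station_lat"], best["station_lon"])
    return best["station_name"], distance


# Sample records using the renamed columns; the second station is hypothetical
stations = [
    {"station_name": "Metro Center", "station_lat": 38.898303, "station_lon": -77.028099},
    {"station_name": "Example Station", "station_lat": 38.95, "station_lon": -77.10},
]

name, dist_km = nearest_station(38.898303, -77.028099, stations)
print(name, round(dist_km, 3))
```

Records in this shape can be produced from the cleaned table with `wmata_stations_df.to_dict("records")`.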