These are the code libraries that make our analysis possible. `tqdm` is used only to render a progress bar while we download data from OpenStreetMap, and `overpy` is a Python client for the Overpass API, which is how we query OpenStreetMap data over the internet.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tqdm import tqdm

import overpy

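Here, we list the North Carolina counties that we want to pull data for. The query below matches areas by name alone, so a name shared with counties in other states (Orange County or Franklin County, for example) may also pull in stores from outside North Carolina.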

In [2]:
nc_counties = [
    "Wake County", "Harnett County", "Johnston County", "Wilson County", "Nash County",
    "Franklin County", "Granville County", "Durham County", "Orange County", "Chatham County"
]

First, we create a client for the OpenStreetMap Overpass servers. No request is sent yet; `overpy` only contacts the servers when we run a query.

In [3]:
api = overpy.Overpass()

We then define a function that generates a query in Overpass QL, the somewhat difficult-to-read query language used by the OpenStreetMap servers. The query looks for `variety_store` (e.g. dollar stores), `supermarket`, and `convenience` shops, whether they are mapped as nodes (single points), ways (outlines), or relations, inside the OpenStreetMap "area" (e.g. a county) named by the `area_name` variable. It is structured so that the servers return all the tags of each matching store, including its name, brand, and operating company, along with the coordinates of the nodes needed to locate it.

In [20]:
def generate_query(area_name):
    return f'area["name"="{area_name}"]->.searchArea;(node["shop"="variety_store"](area.searchArea);way["shop"="variety_store"](area.searchArea);relation["shop"="variety_store"](area.searchArea);node["shop"="supermarket"](area.searchArea);way["shop"="supermarket"](area.searchArea);relation["shop"="supermarket"](area.searchArea);node["shop"="convenience"](area.searchArea);way["shop"="convenience"](area.searchArea);relation["shop"="convenience"](area.searchArea););out;>;out skel qt;'
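The query string above repeats one pattern nine times: each of the three shop types is requested as a node, a way, and a relation. As a sketch, the same string can be assembled from lists, which makes it easier to add or drop a shop type. The function name `build_shop_query` and its `shop_types` parameter are hypothetical, not part of `overpy`; with the default shop types it produces exactly the string that `generate_query` returns.

```python
def build_shop_query(area_name, shop_types=("variety_store", "supermarket", "convenience")):
    """Build an Overpass QL query for shops of the given types inside a named area.

    Hypothetical helper: for the default shop types it returns the same string
    as generate_query().
    """
    element_types = ("node", "way", "relation")
    # One selector per (shop type, element type) pair, in the same order as the
    # hand-written query: node, way, relation for each shop type.
    selectors = "".join(
        f'{element}["shop"="{shop_type}"](area.searchArea);'
        for shop_type in shop_types
        for element in element_types
    )
    # out;          -> print the matching elements with all their tags
    # >;            -> recurse down to the nodes (and member ways) they are built from
    # out skel qt;  -> print those extra elements without tags, in quadtile order
    return (
        f'area["name"="{area_name}"]->.searchArea;'
        f"({selectors});"
        "out;>;out skel qt;"
    )


# Example: a query for supermarkets only.
print(build_shop_query("Wake County", shop_types=("supermarket",)))
```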

We then query the OpenStreetMap servers once per county, using the query function defined above, and save the results in parallel lists (one list per column of our final table).

In [21]:
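store_name = []
brand      = []
operator   = []
shop       = []
latitude   = []
longitude  = []
county     = []

def record_store(tags, lat, lon, county_name):
    # Append one store to each of the parallel lists above.
    store_name.append( tags.get("name", "n/a") )
    brand.append( tags.get("brand", "n/a") )
    operator.append( tags.get("operator", "n/a") )
    shop.append( tags.get("shop", "n/a") )
    latitude.append( float(lat) )   # overpy returns coordinates as Decimal
    longitude.append( float(lon) )
    county.append(county_name)

for county_name in tqdm(nc_counties):
    query = generate_query(county_name)
    result = api.query(query)

    # Stores mapped as single points. Untagged nodes are the "skeleton" nodes
    # returned only so that each outline can be located; skip them.
    for node in result.nodes:
        if "shop" in node.tags:
            record_store(node.tags, node.lat, node.lon, county_name)

    # Stores mapped as outlines: use the outline's first node as the location.
    # Untagged ways are relation members pulled in by the query; skip them.
    # Relations themselves are skipped, since they have no single node to use.
    for way in result.ways:
        if "shop" in way.tags:
            first_node = way.nodes[0]
            record_store(way.tags, first_node.lat, first_node.lon, county_name)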

100%|███████████████████████████████████████████| 10/10 [01:03<00:00,  6.32s/it]
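Taking the first node of a way places each outline-mapped store at one corner of its outline, which is off by roughly the size of the building. If a point nearer the middle is preferred, a simple alternative (not what this notebook does) is to average the coordinates of the way's nodes. The sketch below assumes the coordinates have already been pulled out as `(lat, lon)` pairs, for example `[(n.lat, n.lon) for n in way.nodes]`; the helper name `approximate_center` is hypothetical, and a plain average is only an approximation of the true centroid, reasonable for small outlines.

```python
def approximate_center(coords):
    """Average the (lat, lon) pairs of a way's nodes (hypothetical helper).

    Closed ways such as building outlines repeat their first node at the end,
    so that duplicate is dropped to avoid counting one corner twice.
    """
    points = [(float(lat), float(lon)) for lat, lon in coords]
    if len(points) > 1 and points[0] == points[-1]:
        points = points[:-1]
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return lat, lon


# A made-up square outline, closed by repeating its first corner.
square = [(35.0, -78.0), (35.0, -77.998), (35.002, -77.998), (35.002, -78.0), (35.0, -78.0)]
print(approximate_center(square))
```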


Finally, we write these data to a `DataFrame`, pandas' table-like data structure, and save it to the local computer in `.pkl` (short for pickle!) format. The resulting file, `map.pkl`, is written to the current working directory, which is normally the folder containing this notebook.

In [22]:
map_df = pd.DataFrame(dict(
    store_name=store_name, 
    brand=brand, 
    operator=operator, 
    shop=shop,
    latitude=latitude, 
    longitude=longitude,
    county=county
))
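Seven parallel lists work, but every store has to be appended to all seven or the columns fall out of step. A common alternative is to collect one `dict` per store and pass the whole list to `pd.DataFrame`, which turns the dictionary keys into columns. The sketch below uses made-up tag dictionaries standing in for `overpy`'s `tags`; the `elements` and `records` names and the sample values are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the tags and coordinates of two OpenStreetMap elements.
elements = [
    ({"name": "Example Market", "shop": "convenience"}, 35.78, -78.64, "Wake County"),
    ({"name": "Example Grocery", "brand": "Example", "shop": "supermarket"}, 36.00, -78.90, "Durham County"),
]

records = []
for tags, lat, lon, county_name in elements:
    # Build the whole row at once, so the columns cannot get out of sync.
    records.append({
        "store_name": tags.get("name", "n/a"),
        "brand": tags.get("brand", "n/a"),
        "operator": tags.get("operator", "n/a"),
        "shop": tags.get("shop", "n/a"),
        "latitude": float(lat),
        "longitude": float(lon),
        "county": county_name,
    })

records_df = pd.DataFrame(records)
print(records_df)
```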

In [25]:
map_df.to_pickle("./map.pkl")
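Later notebooks can reload the saved table with `pd.read_pickle`, which returns the same `DataFrame`, column types included. The round trip below is a minimal sketch that uses a tiny made-up table and a temporary file rather than the real `map.pkl`. Loading a pickle file can run arbitrary code, so only read pickle files you created yourself or otherwise trust.

```python
import os
import tempfile

import pandas as pd

# A tiny made-up table with the same columns as map_df.
sample_df = pd.DataFrame({
    "store_name": ["Example Market"],
    "brand": ["n/a"],
    "operator": ["n/a"],
    "shop": ["convenience"],
    "latitude": [35.78],
    "longitude": [-78.64],
    "county": ["Wake County"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "map.pkl")
    sample_df.to_pickle(path)          # same call used for map.pkl above
    reloaded = pd.read_pickle(path)    # read it back into a DataFrame
    same = reloaded.equals(sample_df)

print(same)

# With the real file, a quick count of each shop type per county:
# pd.read_pickle("./map.pkl").groupby(["county", "shop"]).size()
```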