# Checking 48 Mountain Weather Locations at Once
> Using Async Python to feed a Streamlit Dashboard

- toc: true 
- badges: true
- comments: true
- categories: [streamlit, python, async, intermediate]
- image: images/weather.png

# Peak Weather: Checking New Hampshire's 48 4,000 Footers

Check it out [live on streamlit cloud](https://share.streamlit.io/gerardrbentley/peak-weather/main/streamlit_app/streamlit_app.py)

Built to give you a dashboard view of the next few hours' forecast for New Hampshire's 48 4,000 ft mountains.
Gonna rain on the Kinsmans?
Is it snowing on Washington?
Should I hike Owl's Head?

Powered by [Streamlit](https://docs.streamlit.io/) + [Open Weather API](https://openweathermap.org/api).
Specifically, Streamlit runs the web interactions and OpenWeather provides the data.

This post will go over a few aspects of the app:

- Data scraping the mountain metadata
- Connecting to Weather API feed
- Making it reasonably fast 

## Data Scraping

I couldn't find an easy csv or api for the latitudes and longitudes of the 48 4,000 footers, so I turned to [Wikipedia](https://en.wikipedia.org/wiki/Four-thousand_footers) for the list.

### Try Pandas

The [`read_html()`](https://pandas.pydata.org/docs/reference/api/pandas.read_html.html) function in Pandas has been a sanity saver in my job for reading data from flat file specification documents.

Unfortunately, the data I'm looking for on Wikipedia is in `<li>...</li>` tags, not a real HTML `<table>...</table>`.

### Naive Copy+Paste

Next I tried just copying the list of names and heights to feed to a search API, yielding a csv like the following after some cleanup:

```txt
name,height_ft
Washington,6288
Adams,5774
Jefferson,5712
```

And this gives us csv access to the data like so:


In [None]:
import pandas as pd
mountains = pd.read_csv('./data/mtns.txt')
mountains.head(3)

### A-Links to the Rescue

Now with the list of peaks, I needed the corresponding latitudes and longitudes.

After searching for a straightforward source, I realized the Wikipedia pages linked from the main list page were the best option.

I saved the portion of the HTML containing the list to a file using dev tools (Chrome F12), though this could also have been done with BeautifulSoup.

#### Scrape Mountain Links


In [None]:
from bs4 import BeautifulSoup
# Chunk from 4,000 footers page containing list of mountains
# https://en.wikipedia.org/wiki/Four-thousand_footers
soup = BeautifulSoup(open("./data/wiki.html"), "html.parser")

# Gather <a> tags, ignore citation
links = [x for x in soup.find_all("a") if x.get("title")]
links[:2]
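
To show what that `title` filter is doing without needing the saved file, here's a dependency-free sketch using the standard library's `html.parser` instead of BeautifulSoup. The HTML snippet, the `TitledLinkParser` class, and the hrefs are all made up for illustration; the real Wikipedia markup is messier.

```python
from html.parser import HTMLParser

# Made-up stand-in for the saved chunk of the Wikipedia list
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/Mount_Washington" title="Mount Washington">Washington</a><sup><a href="#cite_note-1">[1]</a></sup></li>
  <li><a href="/wiki/Mount_Adams" title="Mount Adams">Adams</a></li>
</ul>
"""


class TitledLinkParser(HTMLParser):
    """Collects (title, href) for every <a> tag that has a title attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Citation links like <a href="#cite_note-1"> have no title, so skip them
        if attributes.get("title"):
            self.links.append((attributes["title"], attributes.get("href")))


parser = TitledLinkParser()
parser.feed(SAMPLE_HTML)
titled_links = parser.links
print(titled_links)
```

The citation link is dropped because it has no `title`, which is the same trick the list comprehension above relies on.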

#### Get Lat Lon For One Mountain

With access to the `href` attributes of the `<a>` tags, I could then fetch all of those pages and scrape out the Lat and Lon from each.

Most older guides will use Python's `requests` library for this kind of task, but that library has no built-in way to send asynchronous requests; you need threads or multiprocessing to run them concurrently (Translation: It's difficult to fetch a bunch of pages all at once).

I've found success with [`httpx`](https://www.python-httpx.org/) and [`aiohttp`](https://docs.aiohttp.org/en/stable/) for making asynchronous requests in one Python process.
So I went with `httpx` for fetching each page.

Let's demonstrate fetching one of those pages and scraping the Latitude and Longitude.
We won't worry too much about errors or missed data for this cleaning phase.

In [None]:
import httpx
# English Wikipedia
BASE_URL = "https://en.wikipedia.org"

def convert(raw_tude: str) -> float:
    """Takes a wikipedia latitude or longitude string and converts it to float
    Math Source: https://stackoverflow.com/questions/21298772/how-to-convert-latitude-longitude-to-decimal-in-python

    Args:
        raw_tude (str): Lat or Lon in one of the following forms:
            degrees°minutes′seconds″N,
            degrees°minutes′N,
            degrees-minutes-secondsN,
            degrees-minutesN

    Returns:
        (float): Float converted lat or lon based on supplied DMS
    """
    tude = raw_tude.replace("°", "-").replace("′", "-").replace("″", "")
    if tude[-2] == "-":
        tude = tude[:-2] + tude[-1]
    multiplier = 1 if tude[-1] in ["N", "E"] else -1
    return multiplier * sum(
        float(x) / 60 ** n for n, x in enumerate(tude[:-1].split("-"))
    )

a_link = links[0]
a_link

In [None]:
# bs4 lets us "get" html tag attributes as in python dicts
name = a_link.get("title")
link = a_link.get("href")

# httpx lets us fetch the raw html page
raw_page = httpx.get(BASE_URL + link)
# Which bs4 will help parse
raw_soup = BeautifulSoup(raw_page.text, "html.parser")

# find returns first instance of a tag with this class
raw_lat = raw_soup.find(class_="latitude").text.strip()
lat = convert(raw_lat)
raw_lon = raw_soup.find(class_="longitude").text.strip()
lon = convert(raw_lon)

name, link, lat, lon
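
For a feel of the math inside `convert()`, here's a standalone worked example: decimal degrees are just degrees + minutes/60 + seconds/3600, negated for South and West. The function is restated under the hypothetical name `dms_to_decimal` so the block runs on its own, and the input strings are illustrative rather than scraped values.

```python
def dms_to_decimal(raw_tude: str) -> float:
    """Same math as convert() above, restated so this block runs on its own."""
    tude = raw_tude.replace("°", "-").replace("′", "-").replace("″", "")
    if tude[-2] == "-":  # e.g. "71-18-W" when the seconds are missing
        tude = tude[:-2] + tude[-1]
    sign = 1 if tude[-1] in ("N", "E") else -1
    parts = tude[:-1].split("-")
    # degrees / 60**0 + minutes / 60**1 + seconds / 60**2
    return sign * sum(float(part) / 60**power for power, part in enumerate(parts))


# Illustrative inputs (not scraped values)
lat = dms_to_decimal("44°16′14″N")  # 44 + 16/60 + 14/3600
lon = dms_to_decimal("71°18′W")     # -(71 + 18/60)
print(round(lat, 4), round(lon, 4))
```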

#### Get Lat Lon For Many Mountains

Let's chuck the first 10 mountains into a for-loop and fetch the same pieces of data.

First we'll define a function to encapsulate the synchronous fetch logic.

Then we'll see how long this takes with Jupyter's `%%time` magic.

In [None]:
def sync_get_coords(a_link: BeautifulSoup) -> dict:
    name = a_link.get("title")
    link = a_link.get("href")
    raw_page = httpx.get(BASE_URL + link)
    raw_soup = BeautifulSoup(raw_page.text, "html.parser")
    raw_lat = raw_soup.find(class_="latitude").text.strip()
    lat = convert(raw_lat)
    raw_lon = raw_soup.find(class_="longitude").text.strip()
    lon = convert(raw_lon)
    return {"name": name, "link": link, "lat": lat, "lon": lon}

In [None]:
%%time

for a_link in links[:10]:
    result = sync_get_coords(a_link)
    print(result)

Results will vary by machine, internet connection, Wikipedia server status, and [butterfly wing flaps](https://xkcd.com/378/).

Mine were like this the first time around:

```txt
CPU times: user 2.25 s, sys: 65.1 ms, total: 2.31 s
Wall time: 5.47 s
```

#### Faster Fetching

We're not using the asynchronous capabilities of `httpx` yet, so each of the 10 requests to Wikipedia needs to go over the wire and back in order for the next request to start.

How about we speed things up a little (Jupyter `%%time` doesn't work on async cells):

In [None]:
import asyncio
async def get_coords(client: httpx.AsyncClient, a_link: BeautifulSoup) -> dict:
    """Given http client and <a> link from wikipedia list,
    Fetches the place's html page,
    Attempts to parse and convert lat and lon to decimal from the page (first occurrence)
    Returns entry with keys: "name", "link", "lat", "lon"

    Args:
        client (httpx.AsyncClient): To make requests. See httpx docs
        a_link (BeautifulSoup): <a> ... </a> chunk

    Returns:
        dict: coordinate entry for this wikipedia place
    """    
    name = a_link.get("title")
    link = a_link.get("href")
    raw_page = await client.get(BASE_URL + link)
    raw_soup = BeautifulSoup(raw_page.text, "html.parser")
    raw_lat = raw_soup.find(class_="latitude").text.strip()
    lat = convert(raw_lat)

    raw_lon = raw_soup.find(class_="longitude").text.strip()
    lon = convert(raw_lon)

    return {"name": name, "link": link, "lat": lat, "lon": lon}


async def gather_coords(links: list) -> list:
    """Given List of a links, asynchronously fetch all of them and return results"""
    async with httpx.AsyncClient() as client:
        tasks = [asyncio.ensure_future(get_coords(client, link)) for link in links]
        coords = await asyncio.gather(*tasks)
        return coords

In [None]:
from timeit import default_timer as timer
start = timer()
# Async get all lat lon as list of dictionaries
coords = await gather_coords(links[:10])
end = timer()
print(*coords[:10], f"{end - start :.2f} seconds", sep='\n')

In [None]:
2.16 / 5.47

The async version took about 40% of the time the synchronous loop did (2.16 s vs 5.47 s on my run). Sounds good to me!
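
If you want to see the effect without hitting Wikipedia, here's a sketch that swaps the network call for a simulated delay. `fake_fetch` and the 0.05 second latency are assumptions for illustration, so the numbers only show the shape of the speedup, not real-world timings. It's written as a plain script; in a notebook you'd `await` the coroutines directly instead of calling `asyncio.run`.

```python
import asyncio
from timeit import default_timer as timer

LATENCY_SECONDS = 0.05  # pretend round-trip time per request


async def fake_fetch(page: str) -> str:
    """Stand-in for client.get(); just waits, then returns."""
    await asyncio.sleep(LATENCY_SECONDS)
    return f"fetched {page}"


async def fetch_sequentially(pages: list) -> list:
    # Each request waits for the previous one to finish
    return [await fake_fetch(page) for page in pages]


async def fetch_concurrently(pages: list) -> list:
    # All requests are in flight at once; gather keeps the input order
    return await asyncio.gather(*(fake_fetch(page) for page in pages))


def timed(coroutine) -> tuple:
    start = timer()
    result = asyncio.run(coroutine)
    return result, timer() - start


pages = [f"page-{n}" for n in range(10)]
sequential_result, sequential_seconds = timed(fetch_sequentially(pages))
concurrent_result, concurrent_seconds = timed(fetch_concurrently(pages))
print(f"sequential: {sequential_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

The real scrape doesn't speed up by a full 10x because parsing each page with BeautifulSoup is CPU work that still happens one page at a time.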

#### Data Cleaning

If you thought the "finds first occurrence" strategy for scraping latitude and longitude was going to cause errors, cheers to you.

Turns out just a few mountains have multiple peaks that count as 4,000 footers, so these mountains have 2 sets of latitudes and longitudes.

I fetched these by hand and said LGTM with my csv of:
- Mountain Names
- Heights
- Latitudes
- Longitudes

## Weather Scraping

I figured there's probably a free open API for accessing weather data, and a quick google found two that caught my eye:

- [OpenWeatherMap](https://openweathermap.org/api)
- [Weather.gov](https://www.weather.gov/documentation/services-web-api)

Being a free API helps, but this was the real selling point for OpenWeatherMap for this Proof-of-Concept project:

The [`One Call API`](https://openweathermap.org/api/one-call-api) provides the following weather data for any geographical coordinates:

- *Current weather*
- *Minute forecast* for 1 hour
- *Hourly forecast* for 48 hours
- *Daily forecast* for 7 days
- National weather *alerts*
- *Historical* weather data for the previous 5 days

### API Signup and Prep

Getting a free account and key was straightforward, involving just an email address verification link.

Then off to the races with the following documentation (there's more on their site in better formatting):

```sh
# One Call URL
https://api.openweathermap.org/data/2.5/onecall?lat={lat}&lon={lon}&exclude={part}&appid={API key}
```

**Parameters**

`lat`, `lon`: *required* 
Geographical coordinates (latitude, longitude)

`appid`: *required* 
Your unique API key (you can always find it on your account page under the "API key" tab)

In [None]:
# pydantic v1 import; in pydantic v2, BaseSettings moved to the separate pydantic-settings package
from pydantic import BaseSettings


class Settings(BaseSettings):
    """Handles fetching configuration from environment variables and secrets.
    Type-hinting for config as a bonus"""

    open_weather_api_key: str


settings = Settings()


class WeatherUnit:
    STANDARD = "standard"
    KELVIN = "standard"
    METRIC = "metric"
    IMPERIAL = "imperial"


def get_one_call_endpoint(
    lat: float,
    lon: float,
    units: WeatherUnit = WeatherUnit.IMPERIAL,
    exclude="",
    lang="en",
):
    if exclude != "":
        exclude = f"&exclude={exclude}"
    return f"https://api.openweathermap.org/data/2.5/onecall?lat={lat}&lon={lon}&units={units}{exclude}&lang={lang}&appid={settings.open_weather_api_key}"


def get_one_call_data(lat, lon):
    endpoint = get_one_call_endpoint(lat, lon)
    # Avoid printing the full endpoint: it contains the API key
    print(f"Fetching One Call data for lat={lat}, lon={lon}")
    response = httpx.get(endpoint)
    return response.json()


### Test One Location

I included some of the API parameters as endpoint configuration options as I messed around with it.

For this use case these defaults are sensible to me:

- American users -> `units = Imperial`
- English speaking users -> `lang="en"`
- Exclude -> don't care too much about some extra data coming over to the server
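
As an alternative to the f-string endpoint above, here's a sketch that lets `urllib.parse.urlencode` handle the escaping and only adds `exclude` when it's set. `build_one_call_url` is a hypothetical helper, not part of the app, and it takes the API key as an argument instead of reading it from settings so the block runs on its own.

```python
from urllib.parse import urlencode

ONE_CALL_URL = "https://api.openweathermap.org/data/2.5/onecall"


def build_one_call_url(lat, lon, api_key, units="imperial", exclude="", lang="en"):
    """Hypothetical variant of get_one_call_endpoint() that lets urlencode do the escaping."""
    params = {"lat": lat, "lon": lon, "units": units, "lang": lang}
    if exclude:
        params["exclude"] = exclude  # e.g. "minutely,daily"
    params["appid"] = api_key
    return f"{ONE_CALL_URL}?{urlencode(params)}"


# Placeholder key and illustrative coordinates
url = build_one_call_url(44.2706, -71.3033, api_key="YOUR_KEY", exclude="minutely,daily")
print(url)
```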

Let's see what we get for a live mountain location!

In [None]:
mount_washington = coords[0]
mount_washington

In [None]:
get_one_call_data(mount_washington['lat'], mount_washington['lon'])

### Fetch for Many Locations

Using the same scaffolding as the Wikipedia asynchronous scrape, the helper code for the main streamlit app also relies on `httpx` to fetch 48 responses quickly.

In [None]:

async def async_get_one_call_data(client: httpx.AsyncClient, lat: float, lon: float) -> dict:
    """Given http client and valid lat lon, retrieves open weather "One call" API data

    Args:
        client (httpx.AsyncClient): To make requests. See httpx docs
        lat (float): lat of the desired location
        lon (float): lon of the desired location

    Returns:
        dict: json response from Open Weather One Call
    """
    endpoint = get_one_call_endpoint(lat, lon)
    response = await client.get(endpoint)
    return response.json()


async def gather_one_call_weather_data(lat_lon_pairs: list) -> list:
    """Given list of tuples of lat, lon pairs, will asynchronously fetch the one call open weather api data for those pairs

    Args:
        lat_lon_pairs (list): Destinations to get data for

    Returns:
        list: List of dictionaries which are json responses from open weather
    """
    async with httpx.AsyncClient() as client:
        tasks = [
            asyncio.ensure_future(async_get_one_call_data(client, lat, lon))
            for lat, lon in lat_lon_pairs
        ]
        one_call_weather_data = await asyncio.gather(*tasks)
        return one_call_weather_data

## Web App Component

Goals from the start:
- Usable UI for comparing / viewing weather on 48 locations (mobile-friendly for hikers)
- Not sluggish to load data or click through page after page to get different mountains / times
- Good uptime

Other technical considerations:
- Obeying API limits
    - API key security
- Streamlit resource limits
    - Cloud host or self host
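
On the API limits point: firing 48 requests at once is fine for this app, but if you ever need to be gentler, one option is capping how many requests are in flight with `asyncio.Semaphore`. This is a sketch, not what the app does. `fake_weather_fetch`, the cap of 8, and the coordinates are assumptions, and note this limits simultaneous requests, not calls per minute.

```python
import asyncio

MAX_IN_FLIGHT = 8  # assumed cap; tune to whatever your API plan allows


async def fake_weather_fetch(lat, lon, semaphore, stats):
    """Stand-in for async_get_one_call_data(); tracks how many calls overlap."""
    async with semaphore:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # pretend network latency
        stats["in_flight"] -= 1
        return {"lat": lat, "lon": lon}


async def gather_with_limit(lat_lon_pairs):
    semaphore = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"in_flight": 0, "peak": 0}
    tasks = [fake_weather_fetch(lat, lon, semaphore, stats) for lat, lon in lat_lon_pairs]
    results = await asyncio.gather(*tasks)
    return results, stats["peak"]


pairs = [(44.0 + n / 100, -71.0 - n / 100) for n in range(48)]  # made-up coordinates
results, peak_in_flight = asyncio.run(gather_with_limit(pairs))
print(len(results), "responses, at most", peak_in_flight, "at once")
```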

### Caching Data

There are 2 main points of loading data in the app:

- Load the list of mountains, heights, lats, lons
- Fetch live data from OpenWeatherMap for all locations

With Streamlit, decorating a function with `@st.cache()` will save the computed result so that it can be loaded faster by the next user!

#### Caching Mountain Data

The first list is static, and purely for convenience of fetching columns I load it in with `pandas`. (In hindsight I could have at least reset the index after sorting).

Leaving the default arguments lets this dataset get cached indefinitely (until the app gets shut down / restarted)

*note:* `st.cache` decorators commented out in notebook

In [None]:
import pandas as pd
# import streamlit as st

#@st.cache()
def load_metadata() -> pd.DataFrame:
    """Function to read mountain lat, lon, and other metadata and cache results

    Returns:
        pd.DataFrame: df containing information for 48 mountains
    """
    df = pd.read_csv("./data/mountains.csv")
    df = df.sort_values("name")
    return df

load_metadata().head()

#### Caching Weather Data

With this dataset I don't want to cache things indefinitely.
In fact, we want it to update as often as the API limits will allow us to query it!

Setting a `ttl` or "Time To Live" value in `st.cache(ttl=...)` will cause the cache to bust if the precomputed result is older than the provided time.

We'll set the `ttl` to 60 minutes to respect OpenWeatherMap.

This means that if 100 users all open the app within 59 minutes of one another then only the first call to `load_data()` would actually go to OpenWeatherMap. The other 99 calls would use the cached result.

When any user opens it 61 minutes after the first user, the cache will be busted and another round of requests to OpenWeatherMap will refresh all of the 48 mountains' weather data in the app.
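
To make the 59 vs 61 minute behavior concrete, here's a toy TTL cache in plain Python. This is not how Streamlit implements `st.cache`; `ttl_cache`, `load_weather`, and the hand-cranked clock are all made up to illustrate the semantics without waiting an hour.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Toy single-value cache: recompute only when the stored result is older than ttl_seconds."""
    def decorator(func):
        state = {"value": None, "stored_at": None}

        @wraps(func)
        def wrapper():
            now = clock()
            expired = state["stored_at"] is None or now - state["stored_at"] > ttl_seconds
            if expired:
                state["value"] = func()
                state["stored_at"] = now
            return state["value"]

        return wrapper
    return decorator


fake_now = [0.0]  # seconds since start, set by hand instead of waiting
api_calls = []    # records when the "real" fetch happened


@ttl_cache(ttl_seconds=60 * 60, clock=lambda: fake_now[0])
def load_weather():
    api_calls.append(fake_now[0])
    return f"forecast fetched at t={fake_now[0]:.0f}s"


load_weather()            # first user: real fetch
fake_now[0] = 59 * 60
load_weather()            # 59 minutes later: served from cache
fake_now[0] = 61 * 60
load_weather()            # 61 minutes later: cache expired, fetch again
print(len(api_calls))
```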

In [None]:
#@st.cache(ttl=60 * 60)
def load_data(lat_lon_pairs: list) -> list:
    """Function to fetch Open Weather data and cache results

    Args:
        lat_lon_pairs (list): Destinations to get data for

    Returns:
        list: List of dictionaries which are json responses from open weather
    """
    data = asyncio.run(gather_one_call_weather_data(lat_lon_pairs))
    return data

### Bonuses

#### Display future forecast

Hikers don't need to know just the weather right now.
They also need to know the next few hours' forecast.

The OpenWeatherMap data provides temperature and weather event forecasts hourly.

So how about a row across the screen with 5 hours of data in 5 even columns?

Feels good on desktop, but a horrendous amount of scrolling past locations you don't care about on mobile.

`st.expander()` provides a way to tuck sections away in a drop down hide/expand section.

Then `st.columns()` gives us a list of `x` columns to iterate over.
Zipping this with the hourly results starting from the next hour gives a nice way to match up layout to data.
It also gives some flexibility for how many columns to include.

```py
# lat_lon_pairs: list of (lat, lon) tuples, as in load_data's signature
response = load_data(lat_lon_pairs)[0]
current_temperature = round(response["current"]["temp"], 1)

with st.expander("Expand for future forecast:"):
    for col, entry in zip(st.columns(5), response["hourly"][1:]):
        col.write(f"{clean_time(entry['dt'])}")
        
        temperature = round(entry["temp"], 1)
        col.metric(
            "Temp (F)", temperature, round(temperature - current_temperature, 1)
        )
        current_temperature = temperature
```
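
Here's the same zip-and-delta logic pulled out of Streamlit so you can poke at it in a plain Python session. The app's `clean_time` isn't shown in this post, so the version below is a hypothetical stand-in that formats the timestamp as UTC `HH:MM`, and the sample response is invented with only the fields used here.

```python
from datetime import datetime, timezone


def clean_time(unix_seconds: int) -> str:
    """Hypothetical stand-in for the app's clean_time(); formats a Unix timestamp as UTC HH:MM."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime("%H:%M")


def hourly_cells(response: dict, n_columns: int = 5) -> list:
    """Returns (time label, temperature, change from previous cell) per column."""
    previous = round(response["current"]["temp"], 1)
    cells = []
    # Skip hourly[0], mirroring the [1:] slice in the snippet above;
    # zip stops at whichever runs out first, columns or forecast entries
    for _, entry in zip(range(n_columns), response["hourly"][1:]):
        temperature = round(entry["temp"], 1)
        cells.append((clean_time(entry["dt"]), temperature, round(temperature - previous, 1)))
        previous = temperature
    return cells


# Invented sample with only the fields used here
sample = {
    "current": {"temp": 30.0},
    "hourly": [
        {"dt": 1640995200, "temp": 30.0},
        {"dt": 1640998800, "temp": 31.5},
        {"dt": 1641002400, "temp": 29.0},
    ],
}
cells = hourly_cells(sample)
print(cells)
```

Each delta is relative to the previous column rather than to the current temperature, which matches how `current_temperature` gets reassigned inside the loop above.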

#### Jump Link Table

Using the app on mobile even with expander sections was too much scrolling.

I thought a Markdown table of links would be more straightforward, but I wound up doing a bunch of string mangling to get it running.

Having anchors on most commands such as `st.title()` is great for in-page navigation.

In [None]:
def get_mtn_anchor(mountain: str) -> str:
    anchor = mountain.lower().replace(" ", "-")
    return f"[{mountain}](#{anchor})"

mountains = load_metadata()

table = []

table.append("| Mountains |  |  |")
table.append("|---|---|---|")
for left, middle, right in zip(
    mountains.name[::3], mountains.name[1::3], mountains.name[2::3]
):
    table.append(
        f"| {get_mtn_anchor(left)} | {get_mtn_anchor(middle)} | {get_mtn_anchor(right)} |"
    )
# st.markdown("\n".join(table))
"\n".join(table)