# Testing Visual Crossing Weather API

The goal of this notebook is to see whether we can get year-round, monthly aggregated historical weather data (temperature and precipitation) from the API for any geolocation.

https://www.visualcrossing.com/weather-api.

Pricing: 1000 free results per day, or a pro plan for 35 USD to just download the data. Both options require attribution. See the description of the [pricing plans](https://www.visualcrossing.com/weather-data-editions).


In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import os
import time
from dotenv import load_dotenv
import pandas as pd
import requests
import logging 

load_dotenv()
VISUALCROSSING_KEY = os.getenv("VISUALCROSSING_KEY")

S = requests.Session()

### Get locations

In [None]:
api_data_dir = '../../api/data/'

file_name = 'wikivoyage_destinations.csv'

df_places = pd.read_csv(api_data_dir + file_name)#.set_index("id", drop=False)

### Historical summaries query

This query can fetch exactly what we want. Using the online API query editor, we got the following query:

```
https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/weatherdata/historysummary?aggregateHours=24&combinationMethod=aggregate&maxStations=-1&maxDistance=-1&minYear=2000&maxYear=2020&chronoUnit=months&breakBy=self&dailySummaries=false&contentType=json&unitGroup=metric&locationMode=single&key=YOUR_API_KEY&dataElements=default&locations=25.7617%2C-80.1918
```

Let's translate that into a Python call.

See docs on [historical summaries api](https://www.visualcrossing.com/resources/documentation/weather-api/weather-api-documentation/) for more details.
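The project's own wrapper (`get_visualcrossing_monthly_weather`) is not shown in this notebook. As a minimal sketch of what the translation could look like, the function below rebuilds the query-editor URL from a parameter dict using only the standard library. The name `build_history_summary_url` and its defaults are hypothetical; the parameter values are copied verbatim from the URL above, and whether `locationMode=single` is still appropriate when several locations are passed at once has not been checked.

```python
from urllib.parse import urlencode

# Endpoint copied from the query-editor URL above.
BASE_URL = (
    "https://weather.visualcrossing.com/VisualCrossingWebServices/"
    "rest/services/weatherdata/historysummary"
)


def build_history_summary_url(locations: str, key: str,
                              min_year: int = 2000, max_year: int = 2020) -> str:
    """Hypothetical helper: compose the history-summary URL for pipe-separated 'lat,lng' locations."""
    # Parameter values (and their order) copied from the editor URL; not the stairway implementation.
    params = {
        "aggregateHours": 24,
        "combinationMethod": "aggregate",
        "maxStations": -1,
        "maxDistance": -1,
        "minYear": min_year,
        "maxYear": max_year,
        "chronoUnit": "months",
        "breakBy": "self",
        "dailySummaries": "false",
        "contentType": "json",
        "unitGroup": "metric",
        "locationMode": "single",
        "key": key,
        "dataElements": "default",
        "locations": locations,
    }
    # urlencode percent-encodes ',' as %2C and '|' as %7C, matching the editor URL's encoding.
    return f"{BASE_URL}?{urlencode(params)}"
```

In the notebook, `key` would be `VISUALCROSSING_KEY`, and the resulting URL could be fetched with the session, e.g. `S.get(url)`.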

First, compose a string with the geolocations of the places to query. We seem to get a time-out error when querying more than 4 destinations at once.

In [None]:
from stairway.apis.visualcrossing.monthly_weather import get_visualcrossing_monthly_weather, await_completion

In [None]:
n_locations = 2

# add a column with a comma-separated geolocation string
df_places = df_places.assign(location = lambda df: 
                             df['lat'].round(6).astype(str) + "," + df['lng'].round(6).astype(str))

def create_locations_string(df):
    return '|'.join(df['location'].to_list())

# create a pipe-separated geolocations string
locations = create_locations_string(df_places.sample(n_locations))
locations
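Because larger requests time out, covering all places (rather than a sample of two) means splitting them into batches. Below is a minimal sketch, assuming plain `(lat, lng)` tuples and a batch size of 4 taken from the observation above. `batch_locations` is a hypothetical helper intended to mirror the `round(6)` formatting of the `location` column.

```python
from typing import List, Tuple


def batch_locations(coords: List[Tuple[float, float]], batch_size: int = 4) -> List[str]:
    """Hypothetical helper: split (lat, lng) pairs into pipe-separated strings of at most batch_size places."""
    # Format each pair like the `location` column: rounded to 6 decimals, comma-separated.
    strings = [f"{round(lat, 6)},{round(lng, 6)}" for lat, lng in coords]
    # Join consecutive slices of batch_size entries with '|'.
    return ["|".join(strings[i:i + batch_size]) for i in range(0, len(strings), batch_size)]
```

Each returned string could then be passed as `locations` to one API call.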

Then call the API. The query takes quite some time to compute on their end, so we submit calls [asynchronously](https://www.visualcrossing.com/resources/documentation/weather-api/how-to-submit-weather-api-asynchronously/).

In [None]:
R = get_visualcrossing_monthly_weather(locations, S, VISUALCROSSING_KEY)

R.content

In [None]:
data = await_completion(R, S)
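`await_completion` comes from the project module and its internals are not shown. Conceptually, waiting on an asynchronous job amounts to polling until the result is ready, with a cap on the total wait. A generic sketch of such a loop follows; the `check` callable is a hypothetical stand-in for whatever request reports the job status and is not a Visual Crossing API.

```python
import time
from typing import Any, Callable, Tuple


def poll_until_done(check: Callable[[], Tuple[bool, Any]],
                    interval: float = 5.0,
                    timeout: float = 300.0) -> Any:
    """Call check() every `interval` seconds until it reports done; give up after `timeout`."""
    deadline = time.monotonic() + timeout
    while True:
        done, payload = check()  # hypothetical status check, e.g. one GET request
        if done:
            return payload
        # Stop if waiting another interval would overshoot the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(interval)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline unaffected by system clock changes.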

### Parsing the result

The results come back as nested JSON, which can easily be flattened with the pandas `json_normalize()` function.

Add the `name` and `tz` columns as additional metadata: `name` is needed to join with the places dataframe, and the time zone (`tz`) may come in handy later. Since we get it anyway, we might as well save it.

In [None]:
# pd.io.json.json_normalize is deprecated since pandas 1.0; use the top-level function
df = pd.json_normalize(data["locations"], "values", ["name", "tz"])

print(df.shape)

# df.head()

Voila! 
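To make the flattening concrete, here is `pd.json_normalize` applied to a small, made-up payload with the shape `data["locations"]` is assumed to have: a list of location records, each with a `values` list. The field names `period` and `temp` are placeholders, not the actual response fields, and the `toy_` names avoid overwriting the notebook's variables.

```python
import pandas as pd

# Hypothetical, heavily trimmed payload; the real response has many more fields.
toy_data = {
    "locations": [
        {"name": "25.7617,-80.1918", "tz": "America/New_York",
         "values": [{"period": "January", "temp": 20.1},
                    {"period": "February", "temp": 21.0}]},
        {"name": "52.37,4.89", "tz": "Europe/Amsterdam",
         "values": [{"period": "January", "temp": 3.4}]},
    ]
}

# One row per entry in each `values` list; `name` and `tz` are repeated on every row.
toy_df = pd.json_normalize(toy_data["locations"], "values", ["name", "tz"])
print(toy_df.shape)  # (3, 4)
```

The meta columns are appended after the record columns, so `toy_df` has the columns `period`, `temp`, `name`, `tz`.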

Once all data has been queried, join with the places dataframe to attach the stairway id.

In [None]:
df_out = (
    pd.merge(df_places[['id', 'location']], df,  
             how='inner', left_on=['location'], right_on = ['name'])
    .drop(columns=['name', 'location'])
)

print(df_out.shape)

df_out.head()
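Since the join key is a formatted string, any place whose `location` differs even slightly from the `name` echoed back by the API is silently dropped by the inner join. Below is a quick sketch for spotting such places, using a left join with `indicator=True` on made-up frames (`toy_places` and `toy_weather` are hypothetical).

```python
import pandas as pd

# Made-up frames: place 2 has no matching weather rows.
toy_places = pd.DataFrame({"id": [1, 2], "location": ["25.7617,-80.1918", "52.37,4.89"]})
toy_weather = pd.DataFrame({"name": ["25.7617,-80.1918"], "temp": [20.1]})

# A left join keeps every place; `_merge` records whether a match was found.
check = pd.merge(toy_places, toy_weather, how="left",
                 left_on="location", right_on="name", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "id"].to_list()
print(unmatched)  # [2]
```

A non-empty `unmatched` list points to places that either failed to query or whose location strings do not match exactly.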

### Automation

Implemented a script that does the above automatically in `scripts/visualcrossing_monthly_weather.py`.

However, since we can only query 10 places at once and each query of 10 places takes around a minute, running this for all places in our dataset would take a long time. Therefore, we try multithreading.
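A back-of-the-envelope runtime estimate using the numbers above (10 places per query, about a minute per query). `estimate_minutes` is a hypothetical helper, the place counts below are made up, and the threaded case assumes threads do not slow each other down, which the API may not guarantee.

```python
import math


def estimate_minutes(n_places: int, batch_size: int = 10,
                     minutes_per_batch: float = 1.0, n_threads: int = 1) -> float:
    """Rough wall-clock estimate: batches are processed n_threads at a time."""
    n_batches = math.ceil(n_places / batch_size)   # a partial last batch still costs a full query
    rounds = math.ceil(n_batches / n_threads)      # idealised: threads run fully in parallel
    return rounds * minutes_per_batch


print(estimate_minutes(1000))               # 100.0
print(estimate_minutes(1000, n_threads=4))  # 25.0
```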

*Note:* Now that the threaded script works, there is no reason to keep this script, as we can simply set the number of threads to 1. This script has therefore been archived (i.e. deleted).

### Multithreading

Implemented in script `scripts/visualcrossing_monthly_weather_threaded.py`.

This script sends the asynchronous API calls from multiple threads at the same time and waits for the data to come in. Each thread then parses the nested JSON data and appends it to a shared output CSV file.
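The threaded script itself is not shown here. Below is a minimal sketch of the pattern it describes, assuming a hypothetical `fetch_rows(batch)` function that performs the API call and returns the parsed rows as dicts sharing the same keys. A `ThreadPoolExecutor` runs the batches concurrently, and a lock keeps appends to the shared CSV from interleaving.

```python
import csv
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_threaded(batches: List[str],
                 fetch_rows: Callable[[str], List[Dict[str, object]]],
                 out_path: str,
                 n_threads: int = 4) -> None:
    """Fetch each batch on a worker thread and append its rows to one shared CSV."""
    lock = threading.Lock()

    def worker(batch: str) -> None:
        # Hypothetical fetch_rows: API call + JSON parsing happen outside the lock,
        # so threads can wait on the API concurrently.
        rows = fetch_rows(batch)
        if not rows:
            return
        with lock:  # only one thread appends to the file at a time
            write_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
            with open(out_path, "a", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
                if write_header:
                    writer.writeheader()
                writer.writerows(rows)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Consuming the iterator re-raises any exception from a worker.
        list(pool.map(worker, batches))
```

With `n_threads=1` this processes the batches one after another.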

## Analysing output

We have gathered data with both the single-threaded and the multithreaded script. Also, the script wasn't perfect when we started querying the API, so we need to check which records are missing and see whether we can fill in the blanks by running the query script once more. There may also be duplicate records, because script restarts re-queried places that had already been added.

First combine both datasets after removing the duplicates:

In [None]:
df_1 = pd.read_csv('../../data/visualcrossing/visualcrossing_monthly_weather.csv').drop_duplicates()
df_2 = (
    pd.read_csv('../../data/visualcrossing/visualcrossing_monthly_weather_threaded.csv')
    .drop_duplicates()
    .drop('name', axis=1)  # Drop the name column only generated in the threaded script
)

df_out = pd.concat([df_1, df_2], axis=0, ignore_index=True, sort=False)

print(df_out.shape)

Count how many places have 12 records (and how many have some other number):

In [None]:
import collections

collections.Counter(df_out['stairway_id'].value_counts().values)
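The nested count maps each record count to the number of places with that many records. Here is the same computation in plain Python on made-up ids:

```python
from collections import Counter

# Made-up ids: place 1 has 12 monthly rows, place 2 only 11, place 3 has 24 (e.g. duplicated across files).
ids = [1] * 12 + [2] * 11 + [3] * 24
records_per_place = Counter(ids)                        # {1: 12, 2: 11, 3: 24}
places_per_count = Counter(records_per_place.values())  # how many places have each record count
print(places_per_count)  # Counter({12: 1, 11: 1, 24: 1})
```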

Let's try again with all places that are missing or that do not have exactly 12 records:

In [None]:
# create a list of places that we have successfully queried
successful_places = (
    df_out
    .groupby('stairway_id', as_index=False)
    .agg(
        count=pd.NamedAgg(column="stairway_id", aggfunc="count"),
    )
    .loc[lambda df: df['count'] == 12]  # successful means having 12 months
    ['stairway_id']
    .to_list()
)

# subset places that we want to try again.
df_retry = df_places.loc[~df_places['id'].isin(successful_places)]

print('Successful:', len(successful_places))
print('Yet to do:', len(df_retry))
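An equivalent and somewhat shorter way to select the successful places uses `value_counts` directly. It is sketched here on made-up frames (`toy_weather` and `toy_places` are hypothetical) so that it does not overwrite the notebook's variables.

```python
import pandas as pd

# Made-up data: place 1 is complete, place 2 has 11 months, place 3 is missing entirely.
toy_weather = pd.DataFrame({"stairway_id": [1] * 12 + [2] * 11})
toy_places = pd.DataFrame({"id": [1, 2, 3]})

counts = toy_weather["stairway_id"].value_counts()       # records per place
toy_successful = counts[counts == 12].index.to_list()    # exactly 12 months
toy_retry = toy_places.loc[~toy_places["id"].isin(toy_successful)]
print(toy_successful, toy_retry["id"].to_list())  # [1] [2, 3]
```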

Save the retry dataset and rerun the multithreaded script on it.

In [None]:
df_retry[['id', 'lat', 'lng']].to_csv('../../data/visualcrossing/wikivoyage_destinations_retry.csv', index=False)

Done.