# Testing Visual Crossing Weather API

Goal of this notebook: see whether we can get year-round, monthly aggregated historical weather data (temperature and precipitation) from the API for any geolocation.

https://www.visualcrossing.com/weather-api

Pricing: 1000 free results per day, or a pro plan for 35 USD to just download the data. For these plans, we will have to give attribution. See the description of the [pricing plans](https://www.visualcrossing.com/weather-data-editions).


In [None]:
import os
from dotenv import load_dotenv
import pandas as pd
import requests

load_dotenv()
VISUALCROSSING_KEY = os.getenv("VISUALCROSSING_KEY")

S = requests.Session()

### Get locations

In [None]:
api_data_dir = '../../api/data/'

file_name = 'wikivoyage_destinations.csv'

df_places = pd.read_csv(api_data_dir + file_name).set_index("id", drop=False)

### Historical summaries query

This query can be used to fetch exactly what we want. Using the online API query editor, we got the following query:

```
https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/weatherdata/historysummary?aggregateHours=24&combinationMethod=aggregate&maxStations=-1&maxDistance=-1&minYear=2000&maxYear=2020&chronoUnit=months&breakBy=self&dailySummaries=false&contentType=json&unitGroup=metric&locationMode=single&key=YOUR_API_KEY&dataElements=default&locations=25.7617%2C-80.1918
```

Let's translate that into a nice Python call.

See the docs on the [historical summaries API](https://www.visualcrossing.com/resources/documentation/weather-api/weather-api-documentation/) for more details.

First compose a string with the geolocations of the places to query. It seems we get a time-out error if we query more than 4 destinations at once, so we limit each request to 4 locations.

In [None]:
n_locations = 4

df_places = df_places.assign(location = lambda df: 
                             df['lat'].round(6).astype(str) + "," + df['lng'].round(6).astype(str))

def create_locations_string(df):
    return '|'.join(df['location'].to_list())

locations = create_locations_string(df_places.sample(n_locations))
locations
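Since a single request seems to time out beyond 4 destinations, covering all places means splitting the dataframe into batches. The following is a minimal sketch of that idea; `iter_location_batches` and the `toy_places` coordinates are made up for illustration and are not part of the original notebook.

```python
import pandas as pd


def iter_location_batches(df, batch_size=4):
    """Yield one pipe-separated locations string per batch of rows."""
    for start in range(0, len(df), batch_size):
        batch = df.iloc[start:start + batch_size]
        yield "|".join(batch["location"].to_list())


# Toy data standing in for df_places (coordinates are illustrative only)
toy_places = pd.DataFrame({
    "lat": [25.7617, 52.3676, 48.8566, 35.6762, -33.8688],
    "lng": [-80.1918, 4.9041, 2.3522, 139.6503, 151.2093],
})
toy_places = toy_places.assign(
    location=lambda df: df["lat"].round(6).astype(str) + "," + df["lng"].round(6).astype(str)
)

# Five places with a batch size of 4 give two request strings
batches = list(iter_location_batches(toy_places, batch_size=4))
```

Each string in `batches` can then be passed as the `locations` parameter of one request.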

Then call the API:

In [None]:
URL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/weatherdata/historysummary"

PARAMS = {
    "aggregateHours":24,
    "combinationMethod":"aggregate",
#     "maxStations":-1,  # defaults to 3
#     "maxDistance":-1,  # defaults to 50,000m
    "minYear":1990,
    "maxYear":2020,
    "chronoUnit":"months",
    "breakBy":"self",
    "dailySummaries": False,
    "contentType":"json",
    "unitGroup":"metric",
    "locationMode":"array", # set to locationMode=array when querying multiple destinations
    "key": VISUALCROSSING_KEY,
    "dataElements":"default",
    "locations": locations
}

R = S.get(url=URL, params=PARAMS)
R.raise_for_status()  # fail early on an HTTP error status instead of parsing an error body
DATA = R.json()

In [None]:
# print(DATA)

In [None]:
# contains info on each of the columns, including the metric
# DATA['columns']

### Parsing the result

The results are in a nested JSON. This can easily be flattened using the pandas `json_normalize()` function.

Add the `name` and `tz` columns as additional metadata: `name` to join with the places dataframe, and `tz` (the timezone) for who knows what future purpose. Better to save it if we are getting it anyway.

In [None]:
# pd.json_normalize is the public entry point since pandas 1.0;
# pd.io.json.json_normalize is deprecated
df = pd.json_normalize(DATA["locations"], "values", ["name", "tz"])

print(df.shape)

# df.head()
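To make the flattening step concrete, here is a small sketch on a hand-written payload. The overall shape (a `locations` list whose items carry `name`, `tz` and a `values` list) mirrors how the response is accessed above; the fields inside `values` (`period`, `temp`, `precip`) and all numbers are assumptions for illustration, not actual API output.

```python
import pandas as pd

# Hand-written stand-in for DATA; field names inside "values" are assumed
toy_data = {
    "locations": [
        {
            "name": "25.7617,-80.1918",
            "tz": "America/New_York",
            "values": [
                {"period": "Jan", "temp": 20.1, "precip": 50.0},
                {"period": "Feb", "temp": 21.0, "precip": 55.0},
            ],
        },
        {
            "name": "52.3676,4.9041",
            "tz": "Europe/Amsterdam",
            "values": [{"period": "Jan", "temp": 3.4, "precip": 70.0}],
        },
    ]
}

# One output row per entry in "values"; "name" and "tz" are repeated on each row
toy_df = pd.json_normalize(
    toy_data["locations"], record_path="values", meta=["name", "tz"]
)
```

Every record in `values` becomes one row, and the metadata columns are copied onto each row of the location they belong to.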

Voila! 
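Querying all places means repeating the request and the parsing step once per batch of locations and stacking the results. A rough sketch of such a loop follows, assuming each response has the same shape as above; `collect_batches` and `fake_fetch` are hypothetical helpers, and `fake_fetch` returns made-up rows so the example runs without calling the API.

```python
import pandas as pd


def collect_batches(location_batches, fetch):
    """Call `fetch` once per locations string and stack the flattened results."""
    frames = []
    for locations in location_batches:
        data = fetch(locations)  # expected to return the parsed JSON response
        frames.append(
            pd.json_normalize(data["locations"], "values", ["name", "tz"])
        )
    return pd.concat(frames, ignore_index=True)


def fake_fetch(locations):
    """Stand-in for the real request; returns one made-up row per location."""
    return {
        "locations": [
            {"name": name, "tz": "UTC", "values": [{"period": "Jan", "temp": 0.0}]}
            for name in locations.split("|")
        ]
    }


df_all = collect_batches(["1.0,2.0|3.0,4.0", "5.0,6.0"], fake_fetch)
```

With the real API, `fetch` could be a small function that calls `S.get(url=URL, params={**PARAMS, "locations": locations})` and returns the parsed JSON.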

After having queried all data, just join with the places dataframe to attach the stairway id.

In [None]:
df_out = (
    pd.merge(df_places[['id', 'location']], df,
             how='inner', left_on='location', right_on='name')
    .drop(columns=['name', 'location'])
)

print(df_out.shape)

df_out.head()
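An inner join silently drops any place for which no weather rows came back, or whose returned `name` does not exactly match the `location` string. One possible way to surface such cases is a left join with `indicator=True`; the sketch below uses made-up toy frames, not the notebook's data.

```python
import pandas as pd

# Toy frames standing in for df_places and the parsed weather data
toy_places = pd.DataFrame({
    "id": [1, 2, 3],
    "location": ["25.7617,-80.1918", "52.3676,4.9041", "48.8566,2.3522"],
})
toy_weather = pd.DataFrame({
    "name": ["25.7617,-80.1918", "52.3676,4.9041"],
    "temp": [20.1, 3.4],
})

# indicator=True adds a "_merge" column telling where each row was found
check = pd.merge(
    toy_places, toy_weather,
    how="left", left_on="location", right_on="name",
    indicator=True,
)

# Places present in toy_places but without any weather rows
missing_ids = check.loc[check["_merge"] == "left_only", "id"].to_list()
```

Here `missing_ids` lists the places that an inner join would have dropped.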

Done.