# MountainHub API request tests from Emilio

Adapted from `retrieve_from_MountainHub.ipynb`


## Original header notes from Anthony
This code allows direct access to the MountainHub API. Data are returned in a Pandas DataFrame.

Some code is adapted from Don Setiawan's work stored in the [cso legacy api](https://github.com/communitysnowobs/cso-legacy-api).

A. Arendt 20191231

## Notes from Emilio

- 2020-1-17. https://github.com/emiliom
- The operational javascript code that requests data from the MountainHub API is in [src/import/providers/mountainhub.js](https://github.com/communitysnowobs/cso-api/blob/master/src/import/providers/mountainhub.js)
- https://api.mountainhub.com returns an error, "ResourceNotFound", instead of useful API documentation
- https://api.mountainhub.com/timeline is the working endpoint
- Are there other endpoints besides `timeline`?
- The response is a JSON object with two top-level elements: `results` and `pagination`. `results` is a list where each element is pretty complex and carries a lot of information extraneous to CSO needs (related to the user, social linkages, etc.).
    - The `observation` element of a `results` element seems to have the key data of interest
    - All timestamps are in epoch milliseconds. I assume they're in UTC, but it'd be good to get official confirmation.
- The API response looks **paginated**, with the maximum number of records per page presumably set by the `limit` parameter that is passed.
- Is there an API request that returns the total number of results across all "pages"?
- Where are the API parameters documented? The code below has these:
```python
args = {
  'publisher': 'all',
  'limit': 10000,
  'since' : min_timestamp,
  'before' : max_timestamp,
}
```
- There is a request parameter for 'observation' type: `obs_type`. Known values so far are "snow_conditions" and "snowpack_test"; this was obtained from the current `mountainhub.js` code (see [line 20](https://github.com/communitysnowobs/cso-api/blob/master/src/import/providers/mountainhub.js#L20) and [line 27](https://github.com/communitysnowobs/cso-api/blob/master/src/import/providers/mountainhub.js#L27)) and confirmed via just one set of responses. Incorporating this understanding would probably simplify the processing of the response and would make for smaller responses.
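To make the parameter list above concrete, here is a minimal sketch of the query string a `timeline` request would carry. It only assembles and URL-encodes the parameters with the standard library. The parameter names (`publisher`, `limit`, `since`, `before`, `obs_type`) are taken from the code in this notebook and from `mountainhub.js`, not from official API documentation, and `build_timeline_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.mountainhub.com/timeline'


def build_timeline_url(since_ms, before_ms, limit=10000, obs_type=None):
    """Return the full GET URL for a timeline request (hypothetical helper).

    since_ms / before_ms are epoch milliseconds. obs_type is optional; the
    values known so far are 'snow_conditions' and 'snowpack_test'.
    """
    params = {
        'publisher': 'all',
        'limit': limit,
        'since': since_ms,
        'before': before_ms,
    }
    if obs_type:
        params['obs_type'] = obs_type
    # urlencode keeps the dict's insertion order
    return BASE_URL + '?' + urlencode(params)


url = build_timeline_url(1577779200000, 1579293256000, limit=500,
                         obs_type='snowpack_test')
print(url)
# https://api.mountainhub.com/timeline?publisher=all&limit=500&since=1577779200000&before=1579293256000&obs_type=snowpack_test
```

`requests.get(BASE_URL, params=args)` should build essentially the same query string internally, so a helper like this is mostly useful for pasting a request into a browser or logging exactly what was asked for.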

### My conda environment

For now, while focusing on the MountainHub API alone, I wanted to keep my conda environment slim and without too many extraneous packages. So I created my own environment file, adapted from https://github.com/communitysnowobs/cso-api/blob/master/csodb.yml. I kept the geospatial packages even though I'm not using them yet, but removed everything else not being used. I also slimmed down the jupyter packages to follow my current personal practice of running notebooks in JupyterLab, where JupyterLab itself is started from its own conda env. So the only jupyter-related package needed in that context is `ipykernel`. Here's the content of my yaml conda environment file:

```yaml
name: mountainhubapi
channels:
- conda-forge
- defaults
dependencies:
- python=3.7
- ipykernel
- geopandas
- geojson
- folium
- cartopy
- pyyaml
```

## Imports and utility functions

In [1]:
```python
import sys
import time
import datetime

import requests
import pandas as pd
import geopandas as gpd
```

In [2]:
```python
BASE_URL = 'https://api.mountainhub.com/timeline'  # Are there other API endpoints besides "timeline"?
HEADER = {'Accept-version': '1'}  # How do we know what versions are available?

ONE_MONTH = 2592000000  # 30 days in milliseconds; this isn't used anywhere
```
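`ONE_MONTH` is 30 days expressed in milliseconds (30 * 24 * 3600 * 1000 = 2,592,000,000), matching the epoch-millisecond timestamps the API uses. A minimal sketch of how it could be derived instead of hard-coded, and used to build a trailing one-month request window; the window computation is only an illustration, not something this notebook currently does.

```python
import datetime

# 30 days in milliseconds, derived rather than hard-coded
ONE_MONTH = datetime.timedelta(days=30) // datetime.timedelta(milliseconds=1)
assert ONE_MONTH == 2592000000

# Hypothetical use: a window covering the 30 days before a given epoch-ms timestamp
end_ms = 1579293256000  # the now_ts value recorded later in this notebook
start_ms = end_ms - ONE_MONTH
print(start_ms)  # 1576701256000
```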

Load these [utility functions from the cso-legacy-api](https://github.com/communitysnowobs/cso-legacy-api/blob/master/src/common/utils.py).

Todo: move these to a script after testing.

In [3]:
```python
def date_to_timestamp(date):
    """Converts a datetime object to a unix timestamp in epoch milliseconds.

    Note: time.mktime() interprets a naive datetime as *local* time,
    and sub-second precision is dropped.

    Keyword arguments:
    date -- Datetime object to convert
    """
    if date is None:
        return date
    return int(time.mktime(date.timetuple())) * 1000

def timestamp_to_date(timestamp):
    """Converts a unix timestamp in epoch milliseconds to a local-time datetime.date object.
    Keyword arguments:
    timestamp -- Timestamp to convert
    """
    if timestamp is None:
        return timestamp
    return datetime.date.fromtimestamp(timestamp / 1000)

def timestamp_to_datetime(timestamp):
    """Converts a unix timestamp in epoch milliseconds to a naive, local-time datetime.datetime object.
    Keyword arguments:
    timestamp -- Timestamp to convert
    """
    if timestamp is None:
        return timestamp
    return datetime.datetime.fromtimestamp(timestamp / 1000)
```
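`date_to_timestamp` and `timestamp_to_datetime` work in the machine's *local* time zone (`time.mktime` and `datetime.fromtimestamp` without a `tz` both assume local time), so their results depend on where the notebook runs. If the API's epoch milliseconds are UTC, as assumed in the notes above, timezone-aware counterparts like the following would keep everything in UTC. This is a sketch only: the `_utc` function names are hypothetical and not part of the cso-legacy-api utilities, and unlike the originals they keep millisecond precision.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def date_to_timestamp_utc(date):
    """Convert a datetime to epoch milliseconds, treating naive datetimes as UTC."""
    if date is None:
        return None
    if date.tzinfo is None:
        # Naive datetime: assume it already represents UTC
        date = date.replace(tzinfo=datetime.timezone.utc)
    # Exact integer arithmetic on timedeltas, so no float rounding
    return (date - EPOCH) // datetime.timedelta(milliseconds=1)


def timestamp_to_datetime_utc(timestamp):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    if timestamp is None:
        return None
    return datetime.datetime.fromtimestamp(timestamp / 1000, tz=datetime.timezone.utc)


ts = date_to_timestamp_utc(datetime.datetime(2019, 12, 31))
print(ts)                             # 1577750400000
print(timestamp_to_datetime_utc(ts))  # 2019-12-31 00:00:00+00:00
```

Passing an aware datetime (for example one with a UTC-8 offset) converts it correctly, so local-time requests remain possible when they are intended.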

In [4]:
```python
now_ts = date_to_timestamp(datetime.datetime.now())
start_ts = date_to_timestamp(datetime.datetime(2019, 12, 31))
```

In [5]:
```python
now_ts, start_ts
```

```
(1579293256000, 1577779200000)
```
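As a quick check of the local-time behavior, the recorded `start_ts` can be decoded in UTC with the standard library. `1577779200000` ms is 2019-12-31 08:00 UTC, i.e. midnight of 2019-12-31 at UTC-8, which suggests the notebook ran on a machine set to (presumably US Pacific) local time and that `date_to_timestamp` produced local midnight rather than UTC midnight. The same applies to the datetimes produced by `timestamp_to_datetime` further down.

```python
import datetime

start_ts = 1577779200000  # value recorded in the output above

utc_dt = datetime.datetime.fromtimestamp(start_ts / 1000, tz=datetime.timezone.utc)
print(utc_dt)  # 2019-12-31 08:00:00+00:00

# Offset between the requested (local) midnight and UTC midnight, in hours
utc_midnight = datetime.datetime(2019, 12, 31, tzinfo=datetime.timezone.utc)
offset_hours = (utc_dt - utc_midnight) / datetime.timedelta(hours=1)
print(offset_hours)  # 8.0
```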

Load the parsing function from the [MountainHub cso-legacy-api script](https://github.com/communitysnowobs/cso-webapp/blob/dev/src/csoapi/apps/cso/types/snowobs.py).

#### NOTE: the MountainHub API uses different field names than the ones we were previously using in the legacy script above.

The code below is an attempt to match what is now in the MountainHub API:

In [6]:
```python
def parse_data(results):
    observations = []

    for idx, res in enumerate(results):
        obs_data = {}
        # EM: The use of this try-except-finally scheme seems awkward and makes the
        # intent and flow more opaque -- at least for my taste.
        # I think the intent would be clearer with a simpler if-else block with
        # handling based on res['type'] = 'observation' and res['observation']['type']
        # Also, using enumerate and storing result `idx` is not really necessary
        try:
            observation = res['observation']
            obs_data['obs_id'] = observation['_id']
            obs_data['timestamp'] = int(observation['reported_at'])
            obs_data['obs_type'] = observation['type']
            obs_data['comment'] = observation['description']
            if len(observation['details']) > 0:
                if observation['details'][0]:
                    if 'snowpack_depth' in observation['details'][0]:
                        obs_data['snow_depth'] = observation['details'][0]['snowpack_depth']
        except (KeyError, TypeError, IndexError, ValueError):
            # Catch only lookup/conversion failures (missing keys, a None observation,
            # an unparseable timestamp) instead of a bare `except:`, which would also
            # swallow KeyboardInterrupt and genuine bugs
            obs_data['obs_id'] = 'None'
            #obs_data['timestamp'] = 'None'
            obs_data['obs_type'] = 'None'
            #obs_data['snow_depth'] = some dummy value?
        finally:
            actor = res['actor']
            if 'full_name' in actor:
                obs_data['author_name'] = actor['full_name']
            elif 'fullName' in actor:
                obs_data['author_name'] = actor['fullName']
            obs_data['id'] = idx
            obs_data['lat'] = res['location']['coordinates'][1]
            obs_data['lng'] = res['location']['coordinates'][0]
            obs_data['source'] = 'MountainHub'
        observations.append(obs_data)

    # dropna() drops every row with any missing value, so results lacking a
    # snow_depth (or a timestamp, comment, or author_name) are removed here
    df = pd.DataFrame.from_records(observations).dropna()

    # EM: There is no case (as far as I can see based on spot checks)
    # where snow_depth is coded as 'undefined'. I think it accomplishes
    # what is intended, but how that comes about is opaque.
    if 'snow_depth' not in df.columns:
        # No result reported a snow depth: return an empty frame instead of raising KeyError
        return df.iloc[0:0]
    return df[df['snow_depth'] != 'undefined']
```
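EM's comment suggests replacing the try-except-finally with explicit branching on the result and observation types. A minimal sketch of that idea for a single `results` element follows. The field names (`type`, `observation`, `_id`, `reported_at`, `details`, `snowpack_depth`, `actor`, `location`) are taken from `parse_data` and EM's comment rather than from API documentation, `parse_result` is a hypothetical name, and the `sample` record is made up for illustration. Results that are not observations, or observations without a snow depth, return `None` so the caller can skip them explicitly.

```python
def parse_result(res):
    """Return a flat dict for one snow-depth observation, or None to skip it."""
    # Skip timeline entries that are not observations (assumed top-level 'type' key)
    if res.get('type') != 'observation':
        return None
    observation = res.get('observation') or {}
    details = observation.get('details') or []
    first = details[0] if details and details[0] else {}
    if 'snowpack_depth' not in first:
        # An observation, but not one that reports snow depth
        return None

    actor = res.get('actor') or {}
    lng, lat = res['location']['coordinates'][:2]  # GeoJSON order: [lng, lat]
    return {
        'obs_id': observation.get('_id'),
        'timestamp': int(observation['reported_at']),
        'obs_type': observation.get('type'),
        'comment': observation.get('description'),
        'snow_depth': first['snowpack_depth'],
        'author_name': actor.get('full_name', actor.get('fullName')),
        'lat': lat,
        'lng': lng,
        'source': 'MountainHub',
    }


# Made-up record shaped like the fields parse_data() reads
sample = {
    'type': 'observation',
    'observation': {'_id': 'example-id', 'reported_at': 1579219000000,
                    'type': 'snowpack_test', 'description': 'example',
                    'details': [{'snowpack_depth': 90.0}]},
    'actor': {'full_name': 'Example User'},
    'location': {'coordinates': [-111.5, 40.7]},
}
print(parse_result(sample)['snow_depth'])  # 90.0
print(parse_result({'type': 'post'}))      # None
```

The caller would then build the frame from the non-`None` records, e.g. `pd.DataFrame.from_records([r for r in map(parse_result, results) if r is not None])`, which should make the `dropna()` and `'undefined'` filtering unnecessary for selecting snow-depth observations.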

In [7]:
```python
def fetch_raw_data(min_timestamp, max_timestamp, limit=10000, obs_type_filter=None):
    # is_raw_json=False is not used, so I removed it
    # Also added limit and obs_type_filter as function arguments

    args = {
      'publisher': 'all',
      'limit': limit,
      'since': min_timestamp,
      'before': max_timestamp,
    }
    # currently known, relevant options: snow_conditions, snowpack_test
    if obs_type_filter:
        args['obs_type'] = obs_type_filter

    # The timeout (seconds) is an added safeguard so a stalled request can't hang the notebook
    response = requests.get(BASE_URL, params=args, headers=HEADER, timeout=60)
    # Fail loudly on an HTTP error status instead of trying to parse an error body
    response.raise_for_status()
    data = response.json()
    results = data['results']

    # Added data as an additional returned variable,
    # to provide more info for exploration and debugging
    return parse_data(results), data
```

## Fetch-data tests

In [8]:
```python
df, jsonresponse = fetch_raw_data(start_ts, now_ts,
                                  limit=500, obs_type_filter=None)
```

In [9]:
```python
df.columns
```

```
Index(['obs_id', 'timestamp', 'obs_type', 'author_name', 'id', 'lat', 'lng',
       'source', 'comment', 'snow_depth'],
      dtype='object')
```

In [10]:
```python
len(df)
```

```
20
```

In [11]:
```python
# TODO: move this assignment into parse_data()
df['datetime'] = df['timestamp'].apply(timestamp_to_datetime)
df.head(10)
```

| | obs_id | timestamp | obs_type | author_name | id | lat | lng | source | comment | snow_depth | datetime |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 5e20f99f90f4304604681cf4 | 1579219000000.0 | snowpack_test | Dusty Hanna | 6 | 43.431503 | 142.641712 | MountainHub | 01/16/2019 | 90.0 | 2020-01-16 16:02:19.678 |
| 9 | 5e1fa6894fdb45291ad99625 | 1579133000000.0 | snow_conditions | Karsten von Hoesslin | 9 | 47.46711 | -115.845984 | MountainHub | Full profile. \nCTE4 Down 10\nCTM3 Down 21\nTi... | 170.0 | 2020-01-15 15:55:34.912 |
| 18 | 5e1fe41c21aa1476f41f2861 | 1579098000000.0 | snow_conditions | Brint Markle | 18 | 40.666391 | -111.538634 | MountainHub | CSO Snow Depth Ob | 102.0 | 2020-01-15 06:21:55.349 |
| 22 | 5e1e4b71fe74e848ff10cda0 | 1579044000000.0 | snow_conditions | Karsten von Hoesslin | 22 | 46.919965 | -113.87488 | MountainHub | Field Obs & Full Profile\nTime: 14:15\nWind: N... | 73.0 | 2020-01-14 15:13:45.501 |
| 26 | 5e1e696c27c9690d79ce307d | 1579025000000.0 | snowpack_test | Allen OBannon | 26 | 43.510962 | -110.920887 | MountainHub | ECTN15 20 cm down, ECTN28 40 cm down all on d... | 170.0 | 2020-01-14 10:06:29.357 |
| 29 | 5e1cdabe21b9f44090eae92b | 1578949000000.0 | snow_conditions | DKS | 29 | 42.789151 | -122.146456 | MountainHub | 150c to rain | 239.9999909596181 | 2020-01-13 13:00:37.106 |
| 34 | 5e1cd0a221b9f44090ead68a | 1578947000000.0 | snow_conditions | DKS | 34 | 42.787901 | -122.150746 | MountainHub | 85c to rain layer | 178.99999325738185 | 2020-01-13 12:17:28.198 |
| 36 | 5e1cc32521b9f44090eabf33 | 1578943000000.0 | snow_conditions | DKS | 36 | 42.787124 | -122.142332 | MountainHub | 100cm to rain layer | 152.99999423675655 | 2020-01-13 11:19:55.480 |
| 41 | 5e1e4aaceff2ff59509d5865 | 1578869000000.0 | snow_conditions | Karsten von Hoesslin | 41 | 43.497253 | -110.954924 | MountainHub | Weather obvs on Edelweiss \nTime: 14:19\n2778m... | 170.0 | 2020-01-12 14:37:53.235 |
| 51 | 5e1a6691e8cb0a6ff7a44af4 | 1578788000000.0 | snow_conditions | Karsten von Hoesslin | 51 | 43.513594 | -110.934426 | MountainHub | Field Obs & K9 training \nTime 1625\nWinds Cal... | 150.0 | 2020-01-11 16:20:25.887 |
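Regarding the TODO about moving the `datetime` assignment into `parse_data()`: pandas can convert the whole `timestamp` column in one vectorized call with `pd.to_datetime(..., unit='ms')`, and `utc=True` makes the result timezone-aware instead of depending on the machine's local zone as `timestamp_to_datetime` does. A sketch, assuming the API timestamps are UTC (not yet confirmed); `add_datetime_column` is a hypothetical helper name, and the demo values are the `start_ts` and `now_ts` recorded above.

```python
import pandas as pd


def add_datetime_column(df, tz=None):
    """Return a copy of df with a tz-aware 'datetime' column built from epoch-ms 'timestamp'.

    Assumes 'timestamp' holds UTC epoch milliseconds; pass tz
    (e.g. 'America/Los_Angeles') to convert the result for display.
    """
    out = df.copy()
    out['datetime'] = pd.to_datetime(out['timestamp'], unit='ms', utc=True)
    if tz is not None:
        out['datetime'] = out['datetime'].dt.tz_convert(tz)
    return out


demo = pd.DataFrame({'timestamp': [1577779200000, 1579293256000]})
print(add_datetime_column(demo)['datetime'].tolist())
```

Returning a copy keeps the function free of side effects, so it could be called at the end of `parse_data()` or on an existing frame without modifying it in place.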


In [12]:
```python
df.obs_type.value_counts()
```

```
snow_conditions    15
snowpack_test       5
Name: obs_type, dtype: int64
```

In [13]:
```python
df.datetime.min(), df.datetime.max()
```

```
(Timestamp('2020-01-02 09:31:37.767000'),
 Timestamp('2020-01-16 16:02:19.678000'))
```

### Examine the `pagination` information in the response

In [14]:
```python
jsonresponse.keys()
```

```
dict_keys(['results', 'pagination'])
```

In [15]:
```python
jsonresponse['pagination']
```

```
{'perPage': 500,
 'thisPage': 138,
 'after': 1579292058509,
 'before': 1577826355941}
```

In [16]:
```python
timestamp_to_datetime(jsonresponse['pagination']['before']), timestamp_to_datetime(jsonresponse['pagination']['after'])
```

```
(datetime.datetime(2019, 12, 31, 13, 5, 55, 941000),
 datetime.datetime(2020, 1, 17, 12, 14, 18, 509000))
```
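The `pagination` block hints at how the API pages through time: with `perPage` = 500, this page spans from `before` (2019-12-31 13:05 local) to `after` (2020-01-17 12:14 local), even though the request asked for everything since `start_ts` (2019-12-31 00:00 local). One plausible way to retrieve all records is to repeat the request with `before` set to the previous page's `pagination['before']` until a page comes back short or empty. This is an untested assumption about how the API's cursor works, not documented behavior. The sketch takes the page-fetching function as an argument (for instance a thin wrapper around `requests.get(BASE_URL, params=..., headers=HEADER).json()`) so the loop logic can be checked offline against a fake; `fetch_all_pages` and `make_fake_api` are hypothetical names.

```python
def fetch_all_pages(fetch_page, since_ms, before_ms, limit=500, max_pages=100):
    """Collect 'results' across pages by walking the 'before' cursor backwards in time.

    fetch_page(params) must return the decoded JSON dict with 'results' and
    'pagination' keys. Assumes (unconfirmed) that pagination['before'] is the
    oldest timestamp on the page and can be reused as the next request's 'before'.
    """
    all_results = []
    for _ in range(max_pages):  # hard stop in case the cursor never advances
        params = {'publisher': 'all', 'limit': limit,
                  'since': since_ms, 'before': before_ms}
        data = fetch_page(params)
        page = data.get('results', [])
        all_results.extend(page)
        next_before = data.get('pagination', {}).get('before')
        # Stop on a short or empty page, a missing cursor, a cursor that did not
        # move back in time, or a cursor that reached the start of the window
        if (len(page) < limit or next_before is None
                or next_before >= before_ms or next_before <= since_ms):
            break
        before_ms = next_before
    return all_results


def make_fake_api(timestamps):
    """Fake fetch_page serving newest-first records from a list of epoch-ms timestamps."""
    def fetch_page(params):
        hits = sorted((t for t in timestamps
                       if params['since'] <= t < params['before']), reverse=True)
        page = hits[:params['limit']]
        return {'results': [{'ts': t} for t in page],
                'pagination': {'before': page[-1] if page else None}}
    return fetch_page


fake = make_fake_api(list(range(0, 1200)))  # 1200 fake records, one per ms
records = fetch_all_pages(fake, since_ms=0, before_ms=1200, limit=500)
print(len(records))  # 1200
```

The fake treats `before` as exclusive; if the real API includes the boundary record, consecutive pages would overlap by one entry, so de-duplicating on the observation `_id` afterwards would be a sensible precaution.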