### Identify what we want to do / loose requirements
* Want to know where we can go camping near Mount Hood
* Want to know how many sites are at a campground
* Want to know which sites (and how many) are accessible, near water, have a toilet (y/n), and allow pets

#### This looks like:  
A dataset we can query with criteria and get a list of campsites that match, ideally with info and weblinks for further research.

### Explore data to determine if / how we can do that  
* Identify attributes in RIDB with this info
* Join with data from forest service websites

### Productionalize for scale
* Generic code for getting RIDB data, configured by JSON for specific locations. Store this config in a db / lookup so we can onboard new campsites without a deploy
* Parallelize to reduce runtime, while staying within API rate limits
* How often should the pipeline run?
* Replace or append?
* Metadata: data source, ingested_on timestamp
* Errors: we want to know when something in the pipeline isn't as expected, just like we would at a campground
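
The config and metadata bullets above could look roughly like the sketch below. It is only an illustration: the config layout, `build_params`, and `add_metadata` are hypothetical names, and the JSON is inlined here where a real pipeline might read it from a database or lookup table.

```python
import json
from datetime import datetime, timezone

# Hypothetical config; in production this might live in a db / lookup table
# so that a new location can be added without a deploy.
LOCATIONS_JSON = """
[
  {"name": "mt_hood",
   "query": {"latitude": 45.4977712, "longitude": -121.8211673, "radius": 15}}
]
"""


def build_params(location, activity_id=9):
    """Combine the shared activity filter with one location's query settings."""
    return {"activity_id": activity_id, **location["query"]}


def add_metadata(records, source, ingested_on=None):
    """Stamp each record with where it came from and when it was ingested."""
    ingested_on = ingested_on or datetime.now(timezone.utc).isoformat()
    return [
        {**record, "data_source": source, "ingested_on": ingested_on}
        for record in records
    ]


locations = json.loads(LOCATIONS_JSON)
hood_params = build_params(locations[0])
stamped = add_metadata([{"FacilityID": "251434"}], source="RIDB")
```

Keeping the query settings in data rather than code means onboarding a new area is a config change only.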


### What are some camping analogies that might relate to what I'm trying to convey about scalability?
* Send friends to multiple campgrounds to help find sites instead of everyone going to the same place - parallelization
* Come back to a rendezvous point (no cell service) and compare what we found: 5 sites near water, no accessible sites
* Ikea camping chair?
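
The "send friends to different campgrounds, then meet up" analogy maps onto a small thread pool. The sketch below makes several assumptions: `fetch_campsites` is a hypothetical stand-in that returns made-up counts instead of calling RIDB, and `max_workers` is used as a crude cap on concurrent calls, not as a real rate limiter.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up counts for illustration only; a real version would call the API.
FAKE_SITE_COUNTS = {"facility_a": 5, "facility_b": 0, "facility_c": 12}


def fetch_campsites(facility_id):
    """Hypothetical stand-in for one request to a facility's campsites."""
    return {
        "facility_id": facility_id,
        "site_count": FAKE_SITE_COUNTS.get(facility_id, 0),
    }


def fetch_all(facility_ids, max_workers=2):
    """Send each 'friend' to one facility and gather results at the rendezvous."""
    # max_workers caps how many requests are in flight at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input ids.
        return list(pool.map(fetch_campsites, facility_ids))


results = fetch_all(["facility_a", "facility_b", "facility_c"])
total_sites = sum(result["site_count"] for result in results)
```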

In [1]:
import json
import pandas as pd
import itertools

from camping.mocks.request import requests

pd.set_option('display.max_colwidth', None)

root /Users/gizmo/dev/strata_2021/camping
facilities path: /Users/gizmo/dev/strata_2021/camping/../data/RIDB/facilities


Getting a list of facilities from RIDB: query by lat/long (plus a radius),  
or by state, as a comma-delimited list of 2-character state codes  
https://ridb.recreation.gov/docs  
https://ridb.recreation.gov/ridb-access-agreement 
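
A small helper can make the two query styles explicit. This is a sketch only: `facilities_params` is a hypothetical function, and the parameter names simply mirror the ones used in this notebook, so they should be checked against the RIDB docs before relying on them.

```python
def facilities_params(activity_id, *, latitude=None, longitude=None,
                      radius=None, states=None):
    """Build query params for either a state list or a lat/long/radius search."""
    params = {"activity_id": activity_id}
    if states is not None:
        # Expect 2-character state codes such as "OR" or "WA".
        bad = [s for s in states if len(s) != 2 or not s.isalpha()]
        if bad:
            raise ValueError(f"not 2-character state codes: {bad}")
        params["state"] = ",".join(s.upper() for s in states)
    elif None not in (latitude, longitude, radius):
        params.update(latitude=latitude, longitude=longitude, radius=radius)
    else:
        raise ValueError("provide either states or latitude/longitude/radius")
    return params


by_state = facilities_params(9, states=["or", "wa"])
by_point = facilities_params(9, latitude=45.4977712, longitude=-121.8211673, radius=15)
```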

In [2]:
ridb_facilities_url = "https://ridb.recreation.gov/api/v1/facilities"

Start near Mt. Hood, expand from there

In [3]:
# NOTE: Do not change these params, mock looks for lat/long/radius
params = {"activity_id":9, "latitude":45.4977712, "longitude":-121.8211673, "radius":15}
# params = {"activity_id":9, "state":"OR"}
headers = {"accept": "application/json", "apikey": "key"}
response = requests.get(ridb_facilities_url, params, headers=headers)
# with open('../data/RIDB_OR_facilities.json', 'w') as f:
#     json.dump(response.text, f)
camping_json  = json.loads(response.text)

In [4]:
response

{'status_code': 200, 'reason': 'OK', 'text': '{"RECDATA":[{"FacilityID":"234306","LegacyFacilityID":"75167","OrgFacilityID":"AN375167","ParentOrgID":"131","ParentRecAreaID":"1102","FacilityName":"EAGLE CREEK OVERLOOK GRP SITE","FacilityDescription":"\\u003ch2\\u003eOverview\\u003c/h2\\u003e\\nEagle Creek Overlook Group Site is set on a forested bluff above the Columbia River, providing an ideal setting for family gatherings and group events.\\u003cbr/\\u003e\\u003cbr/\\u003e\\n\\nDeveloped by the Civilian Conservation Corps (CCC) in the 1930s as a place to view construction of the Bonneville Dam, this site features CCC masonry and offers expansive views of the Columbia River and mountains rising from the gorge.\\u003ch2\\u003eRecreation\\u003c/h2\\u003e\\n\\u003cp\\u003eThe Eagle Recreation Area, just a short walk or bike ride away, provides visitors with opportunities for picnicking, hiking and wildlife viewing.\\u003cbr\\u003e\\u003cbr\\u003e \\u003cbr\\u003e\\u003cbr\\u003eBonnevill

In [5]:
len(camping_json['RECDATA'])

32

In [6]:
camping_json['RECDATA'][0].keys()

dict_keys(['FacilityID', 'LegacyFacilityID', 'OrgFacilityID', 'ParentOrgID', 'ParentRecAreaID', 'FacilityName', 'FacilityDescription', 'FacilityTypeDescription', 'FacilityUseFeeDescription', 'FacilityDirections', 'FacilityPhone', 'FacilityEmail', 'FacilityReservationURL', 'FacilityMapURL', 'FacilityAdaAccess', 'GEOJSON', 'FacilityLongitude', 'FacilityLatitude', 'Keywords', 'StayLimit', 'Reservable', 'Enabled', 'LastUpdatedDate'])

In [None]:
df_ridb_camping = pd.DataFrame(camping_json['RECDATA'])

In [None]:
df_ridb_camping.columns

In [None]:
df_ridb_camping[['FacilityID', 'FacilityName']]

In [None]:
df_ridb_camping.query("FacilityID == '251434'")

In [None]:
df_ridb_camping.query("FacilityID == '251434'")[['FacilityLongitude', 'FacilityLatitude']]

For each facility we want to get campground information

In [None]:
# Hmm! Not all these facilities are campgrounds
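
One way to narrow the list might be the `FacilityTypeDescription` column that appears in the keys above. The sketch below uses entirely made-up rows, and the value `"Campground"` is an assumption about how RIDB labels these facilities, so it is worth checking the real values first (for example with `value_counts()`).

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the API keys.
facilities = pd.DataFrame([
    {"FacilityID": "A1", "FacilityName": "EXAMPLE CAMPGROUND",
     "FacilityTypeDescription": "Campground"},
    {"FacilityID": "A2", "FacilityName": "EXAMPLE TRAILHEAD",
     "FacilityTypeDescription": "Trailhead"},
])

# Case-insensitive match on the (assumed) campground label.
is_campground = facilities["FacilityTypeDescription"].str.lower() == "campground"
campgrounds = facilities[is_campground]
```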

In [None]:
df_ridb_camping[['FacilityID', 'FacilityName']]

In [None]:
# Get campsite data for each facility; if a facility has no campsites, drop it (or should we?)
# Consider how we would scale this out
# Lost Lake example

In [None]:
f"{ridb_facilities_url}/251434/campsites".split("/")[-2]

In [None]:
resp = requests.get(f"{ridb_facilities_url}/251434/campsites", headers=headers)

In [None]:
resp.status_code
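
Checking the status by eye works in a notebook, but a pipeline needs to fail loudly. A minimal sketch, assuming the response exposes `status_code` and `text` as above; `parse_recdata` is a hypothetical helper name.

```python
import json


def parse_recdata(status_code, text):
    """Return the RECDATA list, raising if the response is not as expected."""
    if status_code != 200:
        raise RuntimeError(f"unexpected status code: {status_code}")
    payload = json.loads(text)
    if "RECDATA" not in payload:
        raise ValueError("response has no RECDATA key")
    return payload["RECDATA"]


# Tiny hand-written payload standing in for a real response body.
records = parse_recdata(200, '{"RECDATA": [{"CampsiteID": "1"}]}')
```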

In [None]:
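# NOTE: resp.text is already a JSON string, so json.dump writes it as a quoted,
# escaped string. Reading it back needs json.load followed by json.loads.
with open('../data/RIDB/campsites/251434.json', 'w') as f:
    json.dump(resp.text, f)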

In [None]:
campsites = json.loads(resp.text)

In [None]:
len(campsites['RECDATA'])

In [None]:
df_campsites = pd.DataFrame(campsites['RECDATA'])

In [None]:
df_campsites

50 campsites; compare with the Lost Lake website, which says 125:  
https://www.fs.usda.gov/recarea/mthood/recarea/?recid=53228 45.48889, -121.82194

In [None]:
[entry for entry in df_campsites.iloc[0].ATTRIBUTES]

Do the campsite attributes have the information we are looking for? Near water, accessible...

In [None]:
# Collect the distinct attribute names across all campsites
set(itertools.chain.from_iterable(df_campsites['ATTRIBUTES'].apply(lambda x: [entry['AttributeName'] for entry in x])))

near water: "Proximity to Water"   
pets allowed: "Pets Allowed"   
accessibility: "Accessibility" - boolean  
toilet? - no info here, but that's what the forest service sites have

In [None]:
df_campsites.to_json('../data/ridb_campsites_mock.json', orient='records')

In [None]:
with open('../data/ridb_campsites_mock.json') as f:
    ridb_json = json.load(f)

# One row per campsite attribute, carrying the campsite and facility ids along
df = pd.json_normalize(ridb_json, "ATTRIBUTES", ["CampsiteID", "FacilityID"])

In [None]:
df

In [None]:
df.query("AttributeName == 'Pets Allowed'").head()

In [None]:
df.query("AttributeName == 'Proximity to Water'")
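
To answer "which sites allow pets and are near water" in one query, the long attribute table can be pivoted to one row per campsite. The sketch below uses made-up rows; the `AttributeValue` column name and the example values are assumptions about the normalized data, not something confirmed above.

```python
import pandas as pd

# Made-up long-format rows: one row per (campsite, attribute).
long_df = pd.DataFrame([
    {"CampsiteID": "1", "AttributeName": "Pets Allowed", "AttributeValue": "Yes"},
    {"CampsiteID": "1", "AttributeName": "Proximity to Water", "AttributeValue": "Lakefront"},
    {"CampsiteID": "2", "AttributeName": "Pets Allowed", "AttributeValue": "Yes"},
    {"CampsiteID": "3", "AttributeName": "Proximity to Water", "AttributeValue": "Lakefront"},
])

# One row per campsite, one column per attribute; missing attributes become NaN.
wide = long_df.pivot_table(
    index="CampsiteID",
    columns="AttributeName",
    values="AttributeValue",
    aggfunc="first",
)

# Campsites that allow pets and have any water proximity recorded.
matches = wide[(wide["Pets Allowed"] == "Yes") & wide["Proximity to Water"].notna()]
```

`aggfunc="first"` is a design choice for the sketch: it keeps the first value if a campsite repeats an attribute instead of raising.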

In [None]:
df_ridb_camping.dtypes

In [None]:
df.merge(df_ridb_camping, on='FacilityID', how='left')
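
Since the dtypes were worth checking before this merge, here is a hedged sketch of a more defensive join. The frames and names are made up for illustration; it casts the key to a common type, uses `validate` to assert one facility row per id, and uses `indicator` to surface campsites with no matching facility.

```python
import pandas as pd

# Made-up data: the campsite side has integer ids, the facility side has strings.
campsite_rows = pd.DataFrame({
    "CampsiteID": ["1", "2", "3"],
    "FacilityID": [251434, 251434, 999999],
})
facility_rows = pd.DataFrame({
    "FacilityID": ["251434"],
    "FacilityName": ["EXAMPLE CAMPGROUND"],
})

# Align the key dtype first; merging int and string keys raises an error.
campsite_rows["FacilityID"] = campsite_rows["FacilityID"].astype(str)

merged = campsite_rows.merge(
    facility_rows,
    on="FacilityID",
    how="left",
    validate="many_to_one",  # raises if a FacilityID repeats on the facility side
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
unmatched = merged[merged["_merge"] == "left_only"]
```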