### Identify what we want to do (loose requirements)
* Want to know where we can go camping near Mount Hood
* Want to know how many sites are at a campground
* Want to know which sites (or how many sites?) are accessible, near water, have a toilet (y/n), and allow pets

#### This looks like:  
A dataset we can query with criteria and get a list of campsites that match, ideally with info and weblinks for further research.

### Explore data to determine if / how we can do that  
* Identify attributes in RIDB with this info
* Join with data from forest service websites

### Productionize for scale
* Generic code for getting RIDB data, configured by JSON for specific locations. Store the config in a db / lookup so we can onboard new campsites without a deploy
* Parallelizing to reduce runtime, while staying within API rate limits
* How often should the pipeline run?
* Replace or append?
* Metadata: data source, ingested_on timestamp
* Errors: just like you want to know when something isn't as expected at a campground, you want to know when something isn't as expected in the pipeline
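
As a rough sketch of the config-driven idea in the list above: the location-specific query parameters could live in JSON (an inline string here, a db row or lookup in practice), and each ingested record could be stamped with its data source and an `ingested_on` timestamp. The config keys and the helper names `build_params` and `add_metadata` are hypothetical, made up for illustration; only the `activity_id` / `latitude` / `longitude` / `radius` parameters come from this notebook.

```python
import json
from datetime import datetime, timezone

# Hypothetical config: one entry per location we want to onboard.
LOCATIONS_JSON = """
[
  {"name": "mount_hood", "latitude": 45.4977712, "longitude": -121.8211673, "radius": 15}
]
"""

def build_params(location, activity_id=9):
    """Turn one config entry into query params for the facilities endpoint."""
    return {
        "activity_id": activity_id,
        "latitude": location["latitude"],
        "longitude": location["longitude"],
        "radius": location["radius"],
    }

def add_metadata(records, source):
    """Return copies of the records stamped with data source and ingest time."""
    ingested_on = datetime.now(timezone.utc).isoformat()
    return [{**record, "data_source": source, "ingested_on": ingested_on} for record in records]

locations = json.loads(LOCATIONS_JSON)
params_by_location = {loc["name"]: build_params(loc) for loc in locations}
stamped = add_metadata([{"FacilityID": "251434"}], source="ridb")
```

Onboarding a new area then means adding one more JSON entry rather than changing code.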


### What are some camping analogies that might relate to what I'm trying to convey about scalability?
* Send friends to multiple campgrounds to help find sites instead of everyone going to the same place - parallelization
* Come back to the rendezvous point (because there's no cell service) and compare what we found - 5 sites near water, no accessible sites
* Ikea camping chair?
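
The "send friends to different campgrounds, then meet back up" analogy maps fairly directly onto a thread pool. This is a minimal sketch, assuming a hypothetical `fetch_campsites` function that stands in for the real per-facility request; `max_workers` is the knob you would keep small to stay inside API rate limits.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_campsites(facility_id):
    """Hypothetical stand-in for one per-facility API call."""
    # A real version would request the facility's campsites endpoint here.
    return {"FacilityID": facility_id, "campsites": []}

def fetch_all(facility_ids, max_workers=4):
    """Fetch each facility in parallel, then collect everything at the 'rendezvous point'."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {fid: pool.submit(fetch_campsites, fid) for fid in facility_ids}
        for fid, future in futures.items():
            try:
                results[fid] = future.result()
            except Exception as exc:  # record failures instead of silently dropping them
                errors[fid] = exc
    return results, errors

results, errors = fetch_all(["251434", "251894"])
```

Keeping a separate `errors` dict means a failed facility is reported at the rendezvous rather than quietly skipped.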

In [1]:
from csv import DictReader
import geopandas as gpd
import json
import pandas as pd
import itertools

from camping.mocks.request import RequestsMock
from camping.util.scraper import Scraper
from camping.util.distance import distance_merge

def max_col_width(w=100):
    pd.set_option('display.max_colwidth', w)

ridb_facilities_url = "https://ridb.recreation.gov/api/v1/facilities"

Getting a list of facilities from RIDB: filter by lat/long (with a radius), or by state as a comma-delimited list of 2-character state codes.  
API docs: https://ridb.recreation.gov/docs  

Make sure you are making appropriate use of resources  
https://ridb.recreation.gov/ridb-access-agreement 
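
One simple way to be a polite API consumer is to space requests out. This is a minimal sketch with a hypothetical `Throttle` helper; the `min_interval` value below is a placeholder and not an actual RIDB limit, so check the access agreement for the real terms.

```python
import time

class Throttle:
    """Ensure at least `min_interval` seconds pass between calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_call = None

    def wait(self):
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()

throttle = Throttle(min_interval=0.05)  # hypothetical value, not an RIDB limit
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # call this right before each request
elapsed = time.monotonic() - start
```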

Let's take a look at RIDB facilities with camping near Mount Hood, Oregon

In [None]:
# NOTE: Do not change these params, mock looks for lat/long/radius
params = {"activity_id":9, "latitude":45.4977712, "longitude":-121.8211673, "radius":15}
headers = {"accept": "application/json", "apikey": "key"}
response = RequestsMock.get(ridb_facilities_url, params, headers=headers)
camping_json  = json.loads(response.text)
camping_json

In [None]:
# Notice not all facilities are campgrounds
df_ridb_camping = pd.DataFrame(camping_json['RECDATA'])
df_ridb_camping.head(10)

In [None]:
# Hmm! Not all these facilities are campgrounds
# Mixture of casing
df_ridb_camping[['FacilityID', 'FacilityName']]
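
Both observations (mixed casing, non-campground facilities) can be handled with a little pandas. The sketch below runs on a tiny stand-in frame rather than the real response. The `FacilityTypeDescription` column is an assumption about the facilities payload, and the second row is invented for illustration, so verify both against the actual data.

```python
import pandas as pd

# Tiny stand-in for df_ridb_camping; the second row and the
# FacilityTypeDescription column are assumptions for illustration.
sample = pd.DataFrame({
    "FacilityID": ["251434", "000000"],
    "FacilityName": ["LOST LAKE RESORT AND CAMPGROUND", "Example Day Use Area"],
    "FacilityTypeDescription": ["Campground", "Facility"],
})

# Normalize casing so names compare and display consistently.
sample["FacilityNameClean"] = sample["FacilityName"].str.strip().str.title()

# Keep only rows flagged as campgrounds (case-insensitive comparison).
is_campground = sample["FacilityTypeDescription"].str.lower() == "campground"
campgrounds = sample[is_campground]
```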

We have an idea of what we can get from the facilities endpoint:
* Facility Name & Facility ID
* Lat/Long
* ADA accessibility
* Description

Let's get more specific data on campsites at one of these facilities
Insert lost lake pic

In [None]:
# Get campsite data for each area, if no campsites then drop --- or should we?
# Consider how we would scale this out - 
# Lost Lake example

In [None]:
df_ridb_camping.query("FacilityName == 'LOST LAKE RESORT AND CAMPGROUND'")['FacilityID']

In [None]:
resp = RequestsMock.get(f"{ridb_facilities_url}/251434/campsites", headers=headers)
resp.status_code

In [None]:
campsites = json.loads(resp.text)
df_campsites = pd.DataFrame(campsites['RECDATA'])
df_campsites.head()

In [None]:
[entry for entry in df_campsites.iloc[0].ATTRIBUTES]

Do the campsite attributes have the information we are looking for? Near water, accessible...

In [None]:
ridb_attributes = set(itertools.chain(*df_campsites['ATTRIBUTES'].apply(lambda x: [entry['AttributeName'] for entry in x])))
ridb_attributes

near water: "Proximity to Water"   
pets allowed: "Pets Allowed"   
accessibility: "Accessibility" - boolean  
toilet? - no info in RIDB, but that's what the Forest Service sites have
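
With the attribute names mapped, it can help to collapse each campsite's `ATTRIBUTES` list into a plain name-to-value lookup. This is a minimal sketch: `attributes_to_dict` is a hypothetical helper, and the sample values ("Lakefront", "Yes") are made up to show the shape, not taken from the real response.

```python
def attributes_to_dict(attributes):
    """Collapse a campsite's ATTRIBUTES list into {name: value}."""
    return {
        entry["AttributeName"]: entry["AttributeValue"]
        for entry in attributes
        if entry.get("AttributeName")
    }

# Hypothetical sample shaped like one campsite's ATTRIBUTES list.
sample_attributes = [
    {"AttributeName": "Proximity to Water", "AttributeValue": "Lakefront"},
    {"AttributeName": "Pets Allowed", "AttributeValue": "Yes"},
]

lookup = attributes_to_dict(sample_attributes)
near_water = lookup.get("Proximity to Water")
accessible = lookup.get("Accessibility")  # None when the attribute is absent
```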

In [None]:
# Note that boolean attributes are filled if they have a truthy value
attribute_name = 'proximity to water'
for campground in df_campsites['ATTRIBUTES']:
    for attribute in campground:
        if attribute['AttributeName'].lower() == attribute_name:
            print(attribute['AttributeValue'])

In [None]:
# Combine the campsite attributes with the facility data into one large denormalized table to query
max_col_width()
df_combined = df_campsites[['FacilityID', 'CampsiteID', 'CampsiteName', 'ATTRIBUTES']].merge(df_ridb_camping, on='FacilityID', how='left')
df_combined.head()

In [None]:
def query(attributes, fields):
    """Return True when every name in `fields` appears among the campsite's attributes."""
    # Compare sets so a repeated attribute name is not counted twice
    names = {attribute['AttributeName'] for attribute in attributes}
    return set(fields).issubset(names)

In [None]:
# refer to ridb_attributes for field options
fields = ['Accessibility', 'Proximity to Water']
df_combined['Match'] = df_combined['ATTRIBUTES'].apply(lambda x : query(x, fields))
df_combined.query("Match == True")
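
Presence checks work for the boolean-style attributes, but some searches care about the value too. Here is a hedged sketch of a value-aware variant: `matches` is a hypothetical helper, and the sample attribute values are invented for illustration.

```python
def matches(attributes, required):
    """True when every required attribute is present and, if a value is given,
    equals it (case-insensitive). A required value of None means 'just present'."""
    present = {
        entry["AttributeName"].lower(): str(entry["AttributeValue"]).lower()
        for entry in attributes
    }
    for name, wanted in required.items():
        actual = present.get(name.lower())
        if actual is None:
            return False
        if wanted is not None and actual != wanted.lower():
            return False
    return True

# Hypothetical sample shaped like one campsite's ATTRIBUTES list.
sample_attributes = [
    {"AttributeName": "Proximity to Water", "AttributeValue": "Lakefront"},
    {"AttributeName": "Pets Allowed", "AttributeValue": "Yes"},
]

any_water = matches(sample_attributes, {"Proximity to Water": None})
pets_yes = matches(sample_attributes, {"pets allowed": "yes"})
accessible = matches(sample_attributes, {"Accessibility": None})
```

It could be applied the same way as the presence check, for example `df_combined['ATTRIBUTES'].apply(lambda x: matches(x, {'Pets Allowed': 'Yes'}))`, once the real attribute values have been inspected.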

In [2]:
# putting it all together
ridb_facilities_url = "https://ridb.recreation.gov/api/v1/facilities"
params = {"activity_id":9, "state":"OR"}
headers = {"accept": "application/json", "apikey": "key"}


response = RequestsMock.get(ridb_facilities_url, params, headers=headers)
camping_json  = json.loads(response.text)

# Do we really need the campgrounds in a dataframe?
df_ridb_camping = pd.DataFrame(camping_json['RECDATA'])

# DataFrame.append was removed in pandas 2.0, so collect the frames and concat once
campground_frames = []
for facility in camping_json['RECDATA']:
    if facility.get('FacilityID') is not None:
        campground_url = f"{ridb_facilities_url}/{facility['FacilityID']}/campsites"
        resp = RequestsMock.get(campground_url, headers=headers)
        if resp.status_code != 200:
            continue

        campsites = json.loads(resp.text)
        if len(campsites['RECDATA']) > 0:
            df_campsites = pd.DataFrame(campsites['RECDATA'])
            campground_frames.append(df_campsites[['FacilityID', 'CampsiteID', 'CampsiteName', 'ATTRIBUTES']].merge(df_ridb_camping, on='FacilityID', how='left'))

campground_info = pd.concat(campground_frames) if campground_frames else pd.DataFrame()

In [None]:
campground_info['FacilityName'].unique()

At this point we have: 
* Ability to search for site characteristics
* Campground location and site name

Nice to have:
* Water availability
* Restroom access
* Current status - may be in facility description but not always

In [None]:
sc = Scraper("http://www.fs.usda.gov/recarea/mthood/recreation/camping-cabins/recarea/?recid=53228&actid=29", "Lost Lake")
sc.scrape()

In [3]:
nf_sites = []
with open('../data/NF_sites/OR_sitelist.csv') as f:
    reader = DictReader(f)
    for row in reader:
        nf_sites.append(row)
nf_sites

[{'site_name': 'East Lemolo Campground',
  'site_url': 'https://www.fs.usda.gov/recarea/umpqua/recarea/?recid=63492'},
 {'site_name': 'Magone Lake Campground',
  'site_url': 'https://www.fs.usda.gov/recarea/malheur/recarea/?recid=39964'},
 {'site_name': 'East Davis Lake Campground',
  'site_url': 'https://www.fs.usda.gov/recarea/deschutes/recarea/?recid=38854'},
 {'site_name': 'Lost Lake Campground Resort and Day Use Area',
  'site_url': 'https://www.fs.usda.gov/recarea/mthood/recarea/?recid=53228'},
 {'site_name': 'Anthony Lake',
  'site_url': 'https://www.fs.usda.gov/recarea/wallowa-whitman/recarea/?recid=52199'},
 {'site_name': 'Musick Guard Station',
  'site_url': 'https://www.fs.usda.gov/recarea/umpqua/recarea/?recid=63428'},
 {'site_name': 'Lost Lake Campground',
  'site_url': 'https://www.fs.usda.gov/recarea/willamette/recarea/?recid=13362'}]

In [4]:
nf_data = []
for site in nf_sites:
    sc = Scraper(site['site_url'], site['site_name'])
    nf_data.append(sc.scrape())
nf_df = pd.DataFrame(nf_data)
nf_df

| | FacilityStatus | FacilityLatitude | FacilityLongitude | FacilityElevation | Conditions | Reservations | FacilityName | Water | Restroom | Open Season |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Temporarily Closed | 43.310697 | -122.162651 | 4,150 feet | 10/28/2020: Closed for the season. Will reopen... | Reservations can be made at www.recreation.gov... | East Lemolo Campground | | | |
| 1 | Open | 44.55266 | -118.9094 | 5500 | 01/22/2021: The campground is is closed and th... | To reserve the group site, visit www.recreatio... | Magone Lake Campground | Drinking Water | Vault Toilets | |
| 2 | Closed | 43.5867 | -121.85667 | 4400 | | Reservations can be online through Recreation.... | East Davis Lake Campground | Potable Water | Vault Toilet | |
| 3 | Closed | 45.5008 | -121.81641 | 3200 | `CLOSED FOR THE SEASON\n \n**Lost Lake is curre...` | Reservations can be made by visiting Recreatio... | Lost Lake Campground Resort and Day Use Area | Drinking Water | Vault Toilet (18) | |
| 4 | Closed | 44.9625128531073 | -118.228574730768 | 7150 | Current Conditions | https://anthonylakes.com/campgrounds/ | Anthony Lake | Potable Water | Vault Toilets | July - September |
| 5 | Temporarily Closed | 43.581026 | -122.641745 | 5,000 feet | 10/09/2020- This site is currently closed per ... | | Musick Guard Station | | | Early Summer |
| 6 | Temporarily Closed | 44.42927714677809 | -121.912474623539 | 4200 feet | | No advance reservations. All sites are first c... | Lost Lake Campground | | | - late-October (dependent on weather) |


In [None]:

campground_info.shape

In [13]:
dm = distance_merge(nf_df, campground_info, 1500, 'ridb', 'nf')



In [14]:
dm.FacilityName_nf.unique()

array([nan, 'Magone Lake Campground', 'East Davis Lake Campground',
       'Anthony Lake', 'Musick Guard Station'], dtype=object)
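
The internals of `distance_merge` are not shown in this notebook. As a rough illustration of how a distance-based match between two sources could work, the sketch below cross joins two small frames and keeps pairs within a threshold using the haversine formula. Everything here is an assumption for illustration: the function names, the `lat` / `lon` column names, the idea that the threshold is in meters, and the sample points (the second is simply offset by 0.01 degrees of latitude, roughly 1.1 km).

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def nearby_pairs(left, right, max_meters):
    """Cross join two frames and keep pairs within max_meters of each other."""
    pairs = left.merge(right, how="cross", suffixes=("_left", "_right"))
    # Scraped coordinates may arrive as strings, so cast before computing.
    pairs["distance_m"] = haversine_m(
        pairs["lat_left"].astype(float), pairs["lon_left"].astype(float),
        pairs["lat_right"].astype(float), pairs["lon_right"].astype(float),
    )
    return pairs[pairs["distance_m"] <= max_meters].reset_index(drop=True)

# Hypothetical points for illustration only.
scraped = pd.DataFrame({"name": ["Site A"], "lat": ["45.5008"], "lon": ["-121.81641"]})
api = pd.DataFrame({"facility": ["FACILITY A"], "lat": [45.5108], "lon": [-121.81641]})

within_1500 = nearby_pairs(scraped, api, 1500)
within_1000 = nearby_pairs(scraped, api, 1000)
```

A cross join grows with the product of the two frame sizes, so this shape only suits small inputs. It also shows why the threshold matters: a pair about 1.1 km apart matches at 1500 but not at 1000.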

In [22]:
# putting it all together
ridb_facilities_url = "https://ridb.recreation.gov/api/v1/facilities"
params = {"activity_id":9, "state":"OR"}
headers = {"accept": "application/json", "apikey": "key"}


response = RequestsMock.get(ridb_facilities_url, params, headers=headers)
camping_json  = json.loads(response.text)

# Do we really need the campgrounds in a dataframe?
df_ridb_camping = pd.DataFrame(camping_json['RECDATA'])

# DataFrame.append was removed in pandas 2.0, so collect the frames and concat once
campground_frames = []
for facility in camping_json['RECDATA']:
    if facility.get('FacilityID') is not None:
        campground_url = f"{ridb_facilities_url}/{facility['FacilityID']}/campsites"
        resp = RequestsMock.get(campground_url, headers=headers)
        if resp.status_code != 200:
            continue

        campsites = json.loads(resp.text)
        if len(campsites['RECDATA']) > 0:
            df_campsites = pd.DataFrame(campsites['RECDATA'])
            campground_frames.append(df_campsites[['FacilityID', 'CampsiteID', 'CampsiteName', 'ATTRIBUTES']].merge(df_ridb_camping, on='FacilityID', how='left'))

campground_info = pd.concat(campground_frames) if campground_frames else pd.DataFrame()
            
nf_data = []
with open('../data/NF_sites/OR_sitelist.csv') as f:
    reader = DictReader(f)
    for row in reader:
        sc = Scraper(row['site_url'], row['site_name'])
        nf_data.append(sc.scrape())
nf_df = pd.DataFrame(nf_data)
merged = distance_merge(nf_df, campground_info, 2000, 'ridb', 'nf')



In [23]:
merged

| | FacilityID | CampsiteID | CampsiteName | ATTRIBUTES | LegacyFacilityID | OrgFacilityID | ParentOrgID | ParentRecAreaID | FacilityName_ridb | FacilityDescription | ... | FacilityStatus | FacilityLatitude_nf | FacilityLongitude_nf | FacilityElevation | Conditions | Reservations | FacilityName_nf | Water | Restroom | Open Season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 251894 | 98358 | 008 | `[{'AttributeName': 'Location Rating', 'Attribu...` | 135642 | AN435642 | 131 | 1112 | EAST LEMOLO CAMPGROUND | `<h2>Overview</h2>\nEast Lemolo is on the banks...` | ... | | | | | | | | | | |
| 1 | 251894 | 98441 | 014 | `[{'AttributeName': 'Picnic Table', 'AttributeV...` | 135642 | AN435642 | 131 | 1112 | EAST LEMOLO CAMPGROUND | `<h2>Overview</h2>\nEast Lemolo is on the banks...` | ... | | | | | | | | | | |
| 2 | 251894 | 98438 | 004 | `[{'AttributeName': 'Picnic Table', 'AttributeV...` | 135642 | AN435642 | 131 | 1112 | EAST LEMOLO CAMPGROUND | `<h2>Overview</h2>\nEast Lemolo is on the banks...` | ... | | | | | | | | | | |
| 3 | 251894 | 98389 | 006 | `[{'AttributeName': 'Picnic Table', 'AttributeV...` | 135642 | AN435642 | 131 | 1112 | EAST LEMOLO CAMPGROUND | `<h2>Overview</h2>\nEast Lemolo is on the banks...` | ... | | | | | | | | | | |
| 4 | 251894 | 98359 | 005 | `[{'AttributeName': 'Placed on Map', 'Attribute...` | 135642 | AN435642 | 131 | 1112 | EAST LEMOLO CAMPGROUND | `<h2>Overview</h2>\nEast Lemolo is on the banks...` | ... | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45 | 251434 | 96303 | F001 | `[{'AttributeName': 'Checkout Time', 'Attribute...` | 125541 | AN425541 | 131 | 1106 | LOST LAKE RESORT AND CAMPGROUND | `<h2>Overview</h2>\n<p>Lost Lake Campground is ...` | ... | Closed | 45.50080 | -121.81641 | 3200 | `CLOSED FOR THE SEASON\n \n**Lost Lake is curre...` | Reservations can be made by visiting Recreatio... | Lost Lake Campground Resort and Day Use Area | Drinking Water | Vault Toilet (18) | |
| 46 | 251434 | 96053 | B011 | `[{'AttributeName': 'Checkout Time', 'Attribute...` | 125541 | AN425541 | 131 | 1106 | LOST LAKE RESORT AND CAMPGROUND | `<h2>Overview</h2>\n<p>Lost Lake Campground is ...` | ... | Closed | 45.50080 | -121.81641 | 3200 | `CLOSED FOR THE SEASON\n \n**Lost Lake is curre...` | Reservations can be made by visiting Recreatio... | Lost Lake Campground Resort and Day Use Area | Drinking Water | Vault Toilet (18) | |
| 47 | 251434 | 96013 | B002 | `[{'AttributeName': 'Driveway Length', 'Attribu...` | 125541 | AN425541 | 131 | 1106 | LOST LAKE RESORT AND CAMPGROUND | `<h2>Overview</h2>\n<p>Lost Lake Campground is ...` | ... | Closed | 45.50080 | -121.81641 | 3200 | `CLOSED FOR THE SEASON\n \n**Lost Lake is curre...` | Reservations can be made by visiting Recreatio... | Lost Lake Campground Resort and Day Use Area | Drinking Water | Vault Toilet (18) | |
| 48 | 251434 | 96009 | D004 | `[{'AttributeName': 'Grills/Fire Ring', 'Attrib...` | 125541 | AN425541 | 131 | 1106 | LOST LAKE RESORT AND CAMPGROUND | `<h2>Overview</h2>\n<p>Lost Lake Campground is ...` | ... | Closed | 45.50080 | -121.81641 | 3200 | `CLOSED FOR THE SEASON\n \n**Lost Lake is curre...` | Reservations can be made by visiting Recreatio... | Lost Lake Campground Resort and Day Use Area | Drinking Water | Vault Toilet (18) | |
