### Notebook for ODSC blog post "Introduction to Object Oriented Data Science in Python"
To run this notebook you will need:  
python 3.4.3  
pandas 0.18.1  
requests 2.9.0  
json 2.0.9  
numpy 1.11.0  
  
To check your library versions, run the cell below.

Let me know if you have any questions. Happy Pythoning! - sev@thedatascout.com  

In [31]:
import pandas
import requests
import json
import numpy

print(pandas.__version__)
print(requests.__version__)
print(json.__version__)
print(numpy.__version__)

0.18.1
2.9.0
2.0.9
1.11.0


Creating the RidbData object

In [42]:
import pandas as pd 
import requests
import json
from pandas.io.json import json_normalize
import config
import numpy as np

class RidbData:
    """Fetches records from one RIDB endpoint into a pandas DataFrame."""

    def __init__(self, name, endpoint, url_params):
        self.df = pd.DataFrame()
        self.endpoint = endpoint
        self.url_params = url_params
        self.name = name

    def clean(self):
        # Replacing '' with np.nan lets dropna remove rows that are
        # missing required data, such as lat/longs.
        self.df = self.df.replace('', np.nan)
        self.df = self.df.dropna(subset=['FacilityLatitude', 'FacilityLongitude'])

    def extract(self):
        response = requests.get(url=self.endpoint, params=self.url_params)
        response.raise_for_status()  # fail early on an HTTP error status
        data = json.loads(response.text)
        # The RIDB response keeps its records under the 'RECDATA' key
        self.df = json_normalize(data['RECDATA'])

Create an instance of RidbData to connect to the facilities endpoint. <br>
You can get a RIDB API key here: https://ridb.recreation.gov/?action=register
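
The `config` module imported with the class is not included in this notebook. As a minimal sketch, assuming it only needs to expose `API_KEY`, a `config.py` next to the notebook could read the key from an environment variable. The variable name `RIDB_API_KEY` is an assumption made for this example, not something RIDB prescribes.

```python
# config.py (hypothetical contents; the notebook's real module is not shown)
import os

# Reading the key from the environment keeps it out of version control.
# RIDB_API_KEY is an assumed variable name chosen for this sketch.
API_KEY = os.environ.get("RIDB_API_KEY", "")

if not API_KEY:
    print("RIDB_API_KEY is not set; requests to RIDB will likely be rejected.")
```

With a file like this in place, `dict(apiKey=config.API_KEY)` picks up whatever key is set in the shell that launched the notebook.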

In [43]:
ridb = RidbData('ridb', 'https://ridb.recreation.gov/api/v1/facilities', dict(apiKey= config.API_KEY))

After running the extract method, the 'df' attribute holds the fetched data.
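
To see what the parsing step inside `extract` does without calling the API, here is a small offline sketch. The JSON body is hand-written and modeled on the first facility in this notebook; it is not a real response. The `try`/`except` around the import is an addition for newer pandas releases, where `json_normalize` lives at the top level.

```python
import json
import pandas as pd

try:
    json_normalize = pd.json_normalize  # pandas 1.0 and later
except AttributeError:
    from pandas.io.json import json_normalize  # older releases, as pinned here

# Hand-written stand-in for a response body, with records under 'RECDATA'.
body = """
{"RECDATA": [
  {"FacilityID": 200001,
   "FacilityLatitude": 30.612222,
   "FacilityLongitude": -96.331389,
   "GEOJSON": {"TYPE": "Point", "COORDINATES": [-96.331389, 30.612222]}}
]}
"""

data = json.loads(body)
df = json_normalize(data["RECDATA"])

# Nested keys are flattened into dotted column names such as GEOJSON.TYPE.
print(sorted(df.columns))
```

The nested `GEOJSON` object becomes the `GEOJSON.TYPE` and `GEOJSON.COORDINATES` columns, while the coordinate list itself is kept as a single cell value.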

In [44]:
ridb.extract()

In [45]:
ridb.df.head()

Unnamed: 0,FacilityAdaAccess,FacilityDescription,FacilityDirections,FacilityEmail,FacilityID,FacilityLatitude,FacilityLongitude,FacilityMapURL,FacilityName,FacilityPhone,FacilityReservationURL,FacilityTypeDescription,FacilityUseFeeDescription,GEOJSON.COORDINATES,GEOJSON.TYPE,Keywords,LastUpdatedDate,LegacyFacilityID,OrgFacilityID,StayLimit
0,True,"Like the other Presidential Libraries, the Geo...","See the map at <a href=""http://bushlibrary.tam...",Library.Bush@nara.gov,200001,30.612222,-96.331389,http://bushlibrary.tamu.edu/map.html,George Bush Presidential Library and Museum,979-691-4000,,Library,,"[-96.331389, 30.612222]",Point,,2007-02-26,,,
1,True,"The National Archives Building in Washington, ...",The National Archives Building is located betw...,,200002,38.892778,-77.023056,http://www.archives.gov/national_archives_expe...,National Archives Building,(866) 272-6272,,Archives,,"[-77.023056, 38.892778]",Point,,2016-03-21,,,
2,True,The National Archives at College Park opened f...,From I-495 (The Capital Beltway) take exit 28B...,,200003,38.9975,-76.925556,http://www.archives.gov/facilities/md/images/m...,National Archives at College Park,1-866-272-6272,,Archives,,"[-76.925556, 38.9975]",Point,,2007-02-26,,,
3,True,"Located in Atlanta, Georgia, the Jimmy Carter ...",The Jimmy Carter Library and Museum is located...,carter.library@nara.gov,200004,33.7675,-84.3553,http://www.jimmycarterlibrary.gov/images/map_a...,Jimmy Carter Presidential Library and Museum,(404) 865-7100,,Library,,"[-84.3553, 33.7675]",Point,,2007-02-26,,,
4,True,The Eisenhower Presidential Library is a natio...,Abilene is located on I-70 approximately 150 m...,eisenhower.library@nara.gov,200005,38.943889,-97.219167,,Dwight D. Eisenhower Presidential Library and ...,(785) 263-6700,,Library,,"[-97.219167, 38.943889]",Point,,2007-02-26,,,


In [46]:
ridb.df.shape

(50, 20)

Next, we will replace empty strings with np.nan and remove any entries that don't have a lat/long.
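
The two steps inside `clean` can be tried on a toy DataFrame first. This is a sketch with made-up rows, not RIDB data; only the column names are taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for facilities (made-up values, not API output).
df = pd.DataFrame({
    "FacilityName": ["A", "B", "C"],
    "FacilityLatitude": [30.6, "", 38.9],
    "FacilityLongitude": [-96.3, -77.0, ""],
    "FacilityReservationURL": ["", "", "http://example.com"],
})

# Step 1: empty strings become NaN so pandas treats them as missing.
df = df.replace("", np.nan)

# Step 2: drop rows lacking either coordinate; NaN in other columns is allowed.
cleaned = df.dropna(subset=["FacilityLatitude", "FacilityLongitude"])

print(cleaned.shape)
```

Only row A survives, because B has no latitude and C has no longitude. Its empty reservation URL is now NaN, but that column is not in `subset`, so it does not cause the row to be dropped.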

In [47]:
ridb.clean()

Compare the 'FacilityReservationURL' field from above with the cleaned-up column below. After the DataFrame has been cleaned, the empty strings in that column show up as 'NaN'.

In [48]:
ridb.df.head()

Unnamed: 0,FacilityAdaAccess,FacilityDescription,FacilityDirections,FacilityEmail,FacilityID,FacilityLatitude,FacilityLongitude,FacilityMapURL,FacilityName,FacilityPhone,FacilityReservationURL,FacilityTypeDescription,FacilityUseFeeDescription,GEOJSON.COORDINATES,GEOJSON.TYPE,Keywords,LastUpdatedDate,LegacyFacilityID,OrgFacilityID,StayLimit
0,True,"Like the other Presidential Libraries, the Geo...","See the map at <a href=""http://bushlibrary.tam...",Library.Bush@nara.gov,200001,30.612222,-96.331389,http://bushlibrary.tamu.edu/map.html,George Bush Presidential Library and Museum,979-691-4000,,Library,,"[-96.331389, 30.612222]",Point,,2007-02-26,,,
1,True,"The National Archives Building in Washington, ...",The National Archives Building is located betw...,,200002,38.892778,-77.023056,http://www.archives.gov/national_archives_expe...,National Archives Building,(866) 272-6272,,Archives,,"[-77.023056, 38.892778]",Point,,2016-03-21,,,
2,True,The National Archives at College Park opened f...,From I-495 (The Capital Beltway) take exit 28B...,,200003,38.9975,-76.925556,http://www.archives.gov/facilities/md/images/m...,National Archives at College Park,1-866-272-6272,,Archives,,"[-76.925556, 38.9975]",Point,,2007-02-26,,,
3,True,"Located in Atlanta, Georgia, the Jimmy Carter ...",The Jimmy Carter Library and Museum is located...,carter.library@nara.gov,200004,33.7675,-84.3553,http://www.jimmycarterlibrary.gov/images/map_a...,Jimmy Carter Presidential Library and Museum,(404) 865-7100,,Library,,"[-84.3553, 33.7675]",Point,,2007-02-26,,,
4,True,The Eisenhower Presidential Library is a natio...,Abilene is located on I-70 approximately 150 m...,eisenhower.library@nara.gov,200005,38.943889,-97.219167,,Dwight D. Eisenhower Presidential Library and ...,(785) 263-6700,,Library,,"[-97.219167, 38.943889]",Point,,2007-02-26,,,


Check the DataFrame shape after running clean to see whether any rows were dropped for missing lat/longs. If it still matches the (50, 20) shape from before cleaning, no rows were removed.

In [49]:
ridb.df.shape

(50, 20)