# National Park Vacation Route Tool

## Data Source

Data was obtained from the National Park Service (https://www.nps.gov/index.htm) using their API (https://www.nps.gov/subjects/developer/api-documentation.htm).

## Data Preparation

### Import Libraries

In [12]:
# JSON requests
import requests
import json

# Data manipulation
import numpy as np
import pandas as pd

# Displaying plots and maps
import seaborn as sns
import matplotlib.pyplot as plt
import folium as fl

# Timing the algorithm
import time

# Display all fields
pd.set_option('display.max_columns', None)

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

### Import Raw Data from NPS API (external script)

In [None]:
# Inspect full JSON output of the first park to determine fields we need
api_key = "fpyJ9NycrgZX5mK8f0n90c4qXGPcYAsBPwt4BLJk"
url = "https://developer.nps.gov/api/v1/parks"

params = {
    "limit": 1,
    "start": 0,
    "api_key": api_key
}

response = requests.get(url, params=params)
data = response.json()

# Print the full JSON of the first park
print(json.dumps(data["data"][0], indent=2))

{
  "id": "77E0D7F0-1942-494A-ACE2-9004D2BDC59E",
  "url": "https://www.nps.gov/abli/index.htm",
  "fullName": "Abraham Lincoln Birthplace National Historical Park",
  "parkCode": "abli",
  "description": "For over a century people from around the world have come to rural Central Kentucky to honor the humble beginnings of our 16th president, Abraham Lincoln. His early life on Kentucky's frontier shaped his character and prepared him to lead the nation through Civil War. Visit our country's first memorial to Lincoln, built with donations from young and old, and the site of his childhood home.",
  "latitude": "37.5858662",
  "longitude": "-85.67330523",
  "latLong": "lat:37.5858662, long:-85.67330523",
  "activities": [
    {
      "id": "13A57703-BB1A-41A2-94B8-53B692EB7238",
      "name": "Astronomy"
    },
    {
      "id": "D37A0003-8317-4F04-8FB0-4CF0A272E195",
      "name": "Stargazing"
    },
    {
      "id": "1DFACD97-1B9C-4F5A-80F2-05593604799E",
      "name": "Food"
    },
   

In [10]:
# (Moved to fetch_nps_data.py)
"""
# Available fields for each park

id
url
*fullName (used as name)
parkCode
name (short name)
*description
*designation
*latitude
*longitude
latLong (combined "lat:..., long:...")
*activities (array of objects with id and name)
*amenities
topics
*states
contacts (e.g., phone, email)
entranceFees (array)
entrancePasses
fees
directionsInfo
directionsUrl
operatingHours
addresses
images (photos array)
weatherInfo


API_KEY = "fpyJ9NycrgZX5mK8f0n90c4qXGPcYAsBPwt4BLJk"
url = "https://developer.nps.gov/api/v1/parks"

def fetch_all_parks(api_key):
    all_parks = []
    start = 0
    limit = 50

    while True:
        params = {
            "limit": limit,
            "start": start,
            "api_key": api_key
        }

        response = requests.get(url, params=params)
        response.raise_for_status()  # raise error for bad response
        data = response.json().get("data", [])

        if not data:
            break  # no more data

        all_parks.extend(data)
        start += limit  # go to next page

    return all_parks

# Fetch data
parks_raw = fetch_all_parks(API_KEY)

# Convert to DataFrame
records = []
for park in parks_raw:
    activity_list = park.get('activities', [])
    activity_names = [a.get('name', '') for a in activity_list]
    
    records.append({
        'name': park.get('fullName', ''),
        'latitude': park.get('latitude', ''),
        'longitude': park.get('longitude', ''),
        'designation': park.get('designation', ''),
        'states': park.get('states', ''),
        'description': park.get('description', ''),
        'activities': ', '.join(activity_names)
    })

parks = pd.DataFrame(records)

# Save as CSV
parks.to_csv("../data/nps_parks_with_activities.csv", index=False)
"""
# Run from fetch_nps_data.py
%run ../scripts/fetch_nps_data.py

Raw data for yell: [[{'id': '4E4D076A-6866-46C8-A28B-A129E2B8F3DB', 'name': 'Accessible Rooms', 'parks': [{'states': 'ID,MT,WY', 'designation': 'National Park', 'parkCode': 'yell', 'fullName': 'Yellowstone National Park', 'places': [{'title': 'Canyon Lodge and Cabins', 'id': '5D5EEEEF-ACF4-412C-9C99-9F58079D82C4', 'url': 'https://www.nps.gov/places/000/canyon-lodge-and-cabins.htm'}, {'title': 'Grant Village Lodge', 'id': '3C6780E8-F30F-4A64-B69D-BEFD5F187382', 'url': 'https://www.nps.gov/places/000/grant-village-lodge.htm'}, {'title': 'Lake Hotel and Cabins', 'id': '7D8E2400-AD59-4563-B408-59A5AA43CE0A', 'url': 'https://www.nps.gov/places/000/lake-hotel-and-cabins.htm'}, {'title': 'Lake Lodge Cabins', 'id': '285B61EB-0295-4808-9F07-17FEBA2D47CC', 'url': 'https://www.nps.gov/places/000/lake-lodge-cabins.htm'}, {'title': 'Mammoth Hot Springs Hotel and Cabins', 'id': '055C4DCE-97C1-45C5-A999-7BF4A393AA3B', 'url': 'https://www.nps.gov/places/000/mammoth-hot-springs-hotel-and-cabins.htm'}, 

### Import Generated CSV to Dataframe

In [19]:
parks = pd.read_csv('../data/nps_parks_with_activities.csv')

### Inspect Data

In [None]:
# Print dataframe head
print('First five rows of dataframe')
display(parks.head())
print()
    
# Print dataframe sample
print('Random five rows of dataframe')
display(parks.sample(5))
print()
    
# Check for missing values
print('Check for Missing Values')
print(parks.isna().sum())
print()

# Check data types
print('Check Data Types')
print(parks.info())
print()

# Check values for each column
print('Describe Dataframe')
print(parks.describe(include = 'all'))
print()
    
# Check for duplicates
print('Count of Duplicated Rows')
print(parks.duplicated().sum())
print()

# Check for leading/trailing whitespace in string columns
cols = ['name', 'designation', 'states', 'description', 'activities']   # string columns
for col in cols:
    if col in parks.columns:
        # Convert to string just in case, then compare each value to its stripped version
        has_ws = parks[col].astype(str).apply(lambda x: x != x.strip())
        count = has_ws.sum()
        if count > 0:
            print(f"Column '{col}' has {count} rows with leading/trailing whitespace")
        else:
            print(f"Column '{col}' has no leading/trailing whitespace")
        print()

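# Number of unique activity lists (important for web interface)
# Note: each value is one comma-separated string, so this counts distinct lists, not individual activities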
print('Unique Activities')
print(parks['activities'].nunique())

First five rows of dataframe


| | name | latitude | longitude | designation | states | description | activities | amenities |
|---|---|---|---|---|---|---|---|---|
| 0 | Abraham Lincoln Birthplace National Historical... | 37.585866 | -85.673305 | National Historical Park | KY | For over a century people from around the worl... | Astronomy, Stargazing, Food, Picnicking, Guide... | Accessible Rooms |
| 1 | Acadia National Park | 44.409286 | -68.247501 | National Park | ME | Acadia National Park protects the natural beau... | Arts and Culture, Cultural Demonstrations, Ast... | Accessible Rooms |
| 2 | Adams National Historical Park | 42.255396 | -71.011604 | National Historical Park | MA | From the sweet little farm at the foot of Penn... | Guided Tours, Self-Guided Tours - Walking, Liv... | Historical/Interpretive Information/Exhibits |
| 3 | African American Civil War Memorial | 38.9166 | -77.026 | | DC | Over 200,000 African-American soldiers and sai... | Guided Tours, Self-Guided Tours - Walking | Historical/Interpretive Information/Exhibits |
| 4 | African Burial Ground National Monument | 40.714527 | -74.004474 | National Monument | NY | The African Burial Ground is the oldest and la... | Arts and Culture, Guided Tours, Junior Ranger ... | |



Random five rows of dataframe


| | name | latitude | longitude | designation | states | description | activities | amenities |
|---|---|---|---|---|---|---|---|---|
| 249 | Katmai National Park & Preserve | 58.622357 | -155.012657 | National Park & Preserve | AK | A landscape is alive underneath our feet, fill... | Boating, Camping, Backcountry Camping, Canoe o... | Amphitheater |
| 355 | Pipestone National Monument | 44.01352 | -96.324755 | National Monument | MN | For over 3,000 years, Indigenous people have q... | Arts and Culture, Craft Demonstrations, Cultur... | |
| 322 | New River Gorge National Park & Preserve | 37.868786 | -80.99956 | National Park & Preserve | WV | A rugged, whitewater river flowing northward t... | Arts and Culture, Theater, Auto and ATV, Sceni... | |
| 257 | Knife River Indian Villages National Historic ... | 47.354022 | -101.386053 | National Historic Site | ND | Earthlodge people hunted bison and other game,... | Arts and Culture, Craft Demonstrations, Cultur... | Accessible Rooms |
| 338 | Overmountain Victory National Historic Trail | 35.14044 | -81.377 | National Historic Trail | NC,SC,TN,VA | Stretching 330-miles through four states (Virg... | Auto and ATV, Scenic Driving, Biking, Mountain... | |



Check for Missing Values
name             0
latitude         1
longitude        1
designation     35
states           0
description      0
activities      10
amenities      243
dtype: int64

Check Data Types
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 474 entries, 0 to 473
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         474 non-null    object 
 1   latitude     473 non-null    float64
 2   longitude    473 non-null    float64
 3   designation  439 non-null    object 
 4   states       474 non-null    object 
 5   description  474 non-null    object 
 6   activities   464 non-null    object 
 7   amenities    231 non-null    object 
dtypes: float64(2), object(6)
memory usage: 29.8+ KB
None

Describe Dataframe
                                                     name    latitude  \
count                                                 474  473.000000   
unique                                     

**Observations:**

- Several columns have missing values: latitude and longitude (1 each), activities (10), designation (35), and amenities (243). Rows without coordinates cannot be placed on the map, so they should be removed
- Datatypes are all correct
- There is no leading or trailing whitespace to strip
- The activities column has 431 unique values across 474 parks, but each value is a whole comma-separated list, so this counts distinct lists rather than individual activities. The individual activities overlap heavily between parks and need to be streamlined/categorized
- The instructions want a maximum of 9 geographical locations, so we will have to select a subset of this data to use
- We also have to remove parks that are not accessible by road or are extremely far away (AK, HI, PR, and any other island parks)
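
Because each `activities` value is a single comma-separated string, `nunique()` on that column counts distinct lists rather than distinct activities. The following is a minimal sketch of one way to count individual activities instead. The function name `count_individual_activities` and the sample rows are hypothetical, and it assumes no activity name contains a comma itself.

```python
from collections import Counter


def count_individual_activities(activity_lists):
    """Count how many parks offer each individual activity.

    `activity_lists` is any iterable with one comma-separated string per park,
    in the same ", "-joined form as the `activities` column.
    """
    counts = Counter()
    for entry in activity_lists:
        # Skip missing values (None, or NaN floats after pd.read_csv)
        if not isinstance(entry, str):
            continue
        # Use a set so each park is counted at most once per activity
        names = {name.strip() for name in entry.split(",") if name.strip()}
        counts.update(names)
    return counts


# Hypothetical sample rows standing in for parks['activities']
sample = [
    "Astronomy, Stargazing, Food",
    "Astronomy, Stargazing, Hiking",
    None,
]
activity_counts = count_individual_activities(sample)
print(len(activity_counts))            # number of distinct individual activities
print(activity_counts.most_common(2))  # most widely offered activities
```

In the notebook this could be called as `count_individual_activities(parks['activities'])`, which would give a per-activity count to base the categories on.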

### Clean Data
- Remove rows with missing coordinates
- Remove undesired parks (474 is far more than the route needs)
- Remove parks too far for a road trip or not accessible by land (AK, HI, PR, and any other island parks)
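
The state-based part of this cleanup could be sketched as below. This is an illustration under assumptions, not the notebook's actual filter: the helper `is_road_trip_candidate` is hypothetical, AK, HI, and PR come from the list above, and the extra territory codes (VI, GU, AS, MP) are a guess at what "any other island parks" should cover. A park spanning several states is kept if at least one of its states is drivable.

```python
# Assumed exclusion list: AK, HI, PR come from the cleaning notes; the
# remaining territory codes are an illustrative guess, not from the source.
NOT_DRIVABLE = {"AK", "HI", "PR", "VI", "GU", "AS", "MP"}


def is_road_trip_candidate(states, excluded=NOT_DRIVABLE):
    """Return True if at least one of the park's states is not excluded.

    `states` is a comma-separated string such as "ID,MT,WY".
    """
    # Treat missing values as not usable for the route
    if not isinstance(states, str):
        return False
    codes = {code.strip().upper() for code in states.split(",") if code.strip()}
    return bool(codes - excluded)


# Values in the same format as the `states` column
examples = ["KY", "AK", "ID,MT,WY", "HI", "NC,SC,TN,VA"]
kept = [s for s in examples if is_road_trip_candidate(s)]
print(kept)  # ['KY', 'ID,MT,WY', 'NC,SC,TN,VA']
```

Applied to the dataframe, this might look like `parks = parks[parks['states'].apply(is_road_trip_candidate)]`. Island parks located in mainland states are not caught by a state filter and would still need to be removed by name.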


In [5]:
# Drop rows with missing coordinates
parks = parks.dropna(subset=['latitude', 'longitude'])

### Map of All National Parks in the US

In [None]:
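# Center the map on the approximate geographic center of the contiguous US
fig = fl.Figure(width=1200, height=800)
m = fl.Map(location=[39.8283, -98.5795], zoom_start=4)  # Map must be referenced through the fl alias
fig.add_child(m)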

for _, row in parks.iterrows():
    fl.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['name']
    ).add_to(m)

fig