In [1]:
import pandas as pd

The first difference we'll note with Python and Pandas is that instead of selecting from a database, we'll be reading in individual files to query with Pandas. This is primarily a byproduct of _when_ we use tools like Pandas: frequently, we are _not_ working against a database, as with SQL, but against structured or semi-structured data stored in JSON, CSV, Parquet, or other formats.
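As a rough sketch of what reading those other formats looks like, the snippet below uses `pd.read_csv` and `pd.read_json` on small in-memory strings. The sample rows are made up for illustration and are not taken from the Parks dataset files.

```python
import io

import pandas as pd

# Hypothetical sample data, inlined so the example runs without any files on disk
csv_text = (
    "parkCode,fullName\n"
    "feha,Federal Hall National Memorial\n"
    "adam,Adams National Historical Park\n"
)
json_text = '[{"parkCode": "feha", "states": "NY"}, {"parkCode": "adam", "states": "MA"}]'

# Each reader returns a regular DataFrame, whatever the source format was
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

With real files, the same functions take a path instead of a `StringIO` object.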

Here we're using `read_parquet` to pull in the corresponding parquet file for the Parks dataset.

In [2]:
parks_df = pd.read_parquet("../../data/nps/nps_public_data_parks.parquet")
parks_df.head()

Unnamed: 0,relevanceScore,designation,weatherInfo,addresses,operatingHours,entrancePasses,name,description,directionsUrl,fees,...,activities,url,longitude,id,images,directionsInfo,fullName,parkCode,latLong,latitude
0,1,National Memorial,http://forecast.weather.gov/MapClick.php?CityN...,"[{'type': 'Physical', 'line2': '', 'line1': '1...","[{'name': 'Hours of Operation', 'standardHours...",[],Federal Hall,"Here on Wall Street, George Washington took th...",http://www.nps.gov/feha/planyourvisit/directio...,[],...,"[{'name': 'Arts and Culture', 'id': '09DF0950-...",https://www.nps.gov/feha/index.htm,-74.010256,2337D255-2D32-4997-957A-D461EEA03AF8,[{'url': 'https://www.nps.gov/common/uploads/s...,The main entrance of Federal Hall is located a...,Federal Hall National Memorial,feha,"lat:40.70731192, long:-74.01025636",40.707312
1,1,National Historic Trail,"In winter, watch for ice on trails and sidewal...","[{'type': 'Physical', 'line2': '', 'line1': '6...","[{'name': 'Visitor Center Hours', 'standardHou...",[],Lewis & Clark,The Lewis and Clark National Historic Trail wi...,https://www.nps.gov/lecl/,[],...,"[{'name': 'Auto and ATV', 'id': '5F723BAD-7359...",https://www.nps.gov/lecl/index.htm,-95.924515,5D443C5F-19A0-4A06-9CE4-30534A3DD81A,[{'url': 'https://www.nps.gov/common/uploads/s...,Lewis & Clark National Historic Trail Headquar...,Lewis & Clark National Historic Trail,lecl,"lat:41.2646141052, long:-95.9245147705",41.264614
2,1,,"Summers are generally hot and humid, with dayt...","[{'type': 'Physical', 'line2': '', 'line1': '1...",[{'name': 'National Capital Parks-East Headqua...,[],National Capital Parks-East,Welcome to National Capital Parks-East. We inv...,http://www.nps.gov/nace/planyourvisit/directio...,[],...,"[{'name': 'Biking', 'id': '7CE6E935-F839-4FEC-...",https://www.nps.gov/nace/index.htm,-76.994,BA3C1A1D-AA6A-49EB-9237-0222CEEE2670,[{'url': 'https://www.nps.gov/common/uploads/s...,DC295 South to the Exit for I-694/I-395/Capito...,National Capital Parks-East,nace,"lat:38.8659, long:-76.994",38.8659
3,1,National Historical Park,"Be prepared for hot, humid weather. The histor...","[{'type': 'Physical', 'line2': '', 'line1': '1...","[{'name': 'Visitor Center', 'standardHours': {...",[{'description': 'Adams National Historical Pa...,Adams,From the sweet little farm at the foot of Penn...,http://www.nps.gov/adam/planyourvisit/directio...,[],...,"[{'name': 'Guided Tours', 'id': 'B33DC9B6-0B7D...",https://www.nps.gov/adam/index.htm,-71.011604,E4C7784E-66A0-4D44-87D0-3E072F5FEF43,[{'url': 'https://www.nps.gov/common/uploads/s...,"Traveling on U.S. Interstate 93, take exit 7 -...",Adams National Historical Park,adam,"lat:42.2553961, long:-71.01160356",42.255396
4,1,Memorial Parkway,Summers on the parkway are generally hot and h...,"[{'type': 'Physical', 'line2': '700 George Was...",[{'name': 'George Washington Memorial Parkway ...,[],George Washington,The George Washington Memorial Parkway was des...,http://www.nps.gov/gwmp/planyourvisit/directio...,[],...,"[{'name': 'Arts and Culture', 'id': '09DF0950-...",https://www.nps.gov/gwmp/index.htm,-77.1495,E6D5BB41-3251-469F-ABDA-7B43B966F0CF,[{'url': 'https://www.nps.gov/common/uploads/s...,Directions to Parkway Headquarters From the so...,George Washington Memorial Parkway,gwmp,"lat:38.9628, long:-77.1495",38.9628


Let's perform some operations similar to our DuckDB example: renaming a column and expanding the `STRUCT` column, `operatingHours`.

In [3]:
from pprint import pprint

rename_dict = {"operatingHours": "operating_hours"}

# Note that rename requires the names (including casing) to match exactly.
# By default (errors="ignore") it silently skips keys that don't match any column,
# so no error is raised and nothing is renamed. Be sure to check the result.

parks_df.rename(columns=rename_dict, inplace=True)

pprint(list(parks_df.columns))

['relevanceScore',
 'designation',
 'weatherInfo',
 'addresses',
 'operating_hours',
 'entrancePasses',
 'name',
 'description',
 'directionsUrl',
 'fees',
 'topics',
 'states',
 'entranceFees',
 'contacts',
 'activities',
 'url',
 'longitude',
 'id',
 'images',
 'directionsInfo',
 'fullName',
 'parkCode',
 'latLong',
 'latitude']
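If a silent no-op is a concern, `rename` accepts `errors="raise"`, which turns a mismatched key into a `KeyError`. A minimal sketch on a made-up two-column dataframe (the misspelled key is deliberate):

```python
import pandas as pd

# Toy dataframe standing in for parks_df
df = pd.DataFrame({"operatingHours": [1], "parkCode": ["feha"]})

# A misspelled key is silently ignored with the default errors="ignore"
unchanged = df.rename(columns={"operatinghours": "operating_hours"})
print(list(unchanged.columns))

# With errors="raise", the same typo fails loudly
try:
    df.rename(columns={"operatinghours": "operating_hours"}, errors="raise")
    raised = False
except KeyError as exc:
    raised = True
    print(f"KeyError: {exc}")
```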


In [4]:
# We can inspect the operating_hours column to better understand the data structure
# Let's look at a sample record

parks_df["operating_hours"].iloc[0]

# Note how we're "selecting" a column and then using the iloc method to get the first record.

array([{'name': 'Hours of Operation', 'standardHours': {'friday': '10:00AM - 5:00PM', 'sunday': 'Closed', 'thursday': '10:00AM - 5:00PM', 'tuesday': '10:00AM - 5:00PM', 'saturday': 'Closed', 'monday': '10:00AM - 5:00PM', 'wednesday': '10:00AM - 5:00PM'}, 'description': 'Federal Hall is Open.', 'exceptions': array([{'endDate': datetime.date(2025, 1, 15), 'name': 'Martin Luther King Jr. Day', 'startDate': datetime.date(2025, 1, 15), 'exceptionHours': {'friday': None, 'sunday': None, 'thursday': None, 'tuesday': None, 'saturday': None, 'monday': '10:00AM - 5:00PM', 'wednesday': None}},
              {'endDate': datetime.date(2024, 2, 19), 'name': "Washington's Birthday", 'startDate': datetime.date(2024, 2, 19), 'exceptionHours': {'friday': None, 'sunday': None, 'thursday': None, 'tuesday': None, 'saturday': None, 'monday': '10:00AM - 5:00PM', 'wednesday': None}},
              {'endDate': datetime.date(2024, 5, 27), 'name': 'Memorial Day', 'startDate': datetime.date(2024, 5, 27), 'excepti

This record is a _list_ of dictionaries... That means we can't just unpack the values since each row could have more than one value. DuckDB handled that for us, but Pandas won't! 

This will be a theme throughout the remainder of the course. Certain tools might be more effective than others. It's our job to figure out what makes sense.

In [5]:
# The explode() method will unpack our list into individual rows
parks_df_exploded = parks_df.explode("operating_hours")
parks_df_exploded.head()

Unnamed: 0,relevanceScore,designation,weatherInfo,addresses,operating_hours,entrancePasses,name,description,directionsUrl,fees,...,activities,url,longitude,id,images,directionsInfo,fullName,parkCode,latLong,latitude
0,1,National Memorial,http://forecast.weather.gov/MapClick.php?CityN...,"[{'type': 'Physical', 'line2': '', 'line1': '1...","{'name': 'Hours of Operation', 'standardHours'...",[],Federal Hall,"Here on Wall Street, George Washington took th...",http://www.nps.gov/feha/planyourvisit/directio...,[],...,"[{'name': 'Arts and Culture', 'id': '09DF0950-...",https://www.nps.gov/feha/index.htm,-74.010256,2337D255-2D32-4997-957A-D461EEA03AF8,[{'url': 'https://www.nps.gov/common/uploads/s...,The main entrance of Federal Hall is located a...,Federal Hall National Memorial,feha,"lat:40.70731192, long:-74.01025636",40.707312
1,1,National Historic Trail,"In winter, watch for ice on trails and sidewal...","[{'type': 'Physical', 'line2': '', 'line1': '6...","{'name': 'Visitor Center Hours', 'standardHour...",[],Lewis & Clark,The Lewis and Clark National Historic Trail wi...,https://www.nps.gov/lecl/,[],...,"[{'name': 'Auto and ATV', 'id': '5F723BAD-7359...",https://www.nps.gov/lecl/index.htm,-95.924515,5D443C5F-19A0-4A06-9CE4-30534A3DD81A,[{'url': 'https://www.nps.gov/common/uploads/s...,Lewis & Clark National Historic Trail Headquar...,Lewis & Clark National Historic Trail,lecl,"lat:41.2646141052, long:-95.9245147705",41.264614
2,1,,"Summers are generally hot and humid, with dayt...","[{'type': 'Physical', 'line2': '', 'line1': '1...",{'name': 'National Capital Parks-East Headquar...,[],National Capital Parks-East,Welcome to National Capital Parks-East. We inv...,http://www.nps.gov/nace/planyourvisit/directio...,[],...,"[{'name': 'Biking', 'id': '7CE6E935-F839-4FEC-...",https://www.nps.gov/nace/index.htm,-76.994,BA3C1A1D-AA6A-49EB-9237-0222CEEE2670,[{'url': 'https://www.nps.gov/common/uploads/s...,DC295 South to the Exit for I-694/I-395/Capito...,National Capital Parks-East,nace,"lat:38.8659, long:-76.994",38.8659
3,1,National Historical Park,"Be prepared for hot, humid weather. The histor...","[{'type': 'Physical', 'line2': '', 'line1': '1...","{'name': 'Visitor Center', 'standardHours': {'...",[{'description': 'Adams National Historical Pa...,Adams,From the sweet little farm at the foot of Penn...,http://www.nps.gov/adam/planyourvisit/directio...,[],...,"[{'name': 'Guided Tours', 'id': 'B33DC9B6-0B7D...",https://www.nps.gov/adam/index.htm,-71.011604,E4C7784E-66A0-4D44-87D0-3E072F5FEF43,[{'url': 'https://www.nps.gov/common/uploads/s...,"Traveling on U.S. Interstate 93, take exit 7 -...",Adams National Historical Park,adam,"lat:42.2553961, long:-71.01160356",42.255396
3,1,National Historical Park,"Be prepared for hot, humid weather. The histor...","[{'type': 'Physical', 'line2': '', 'line1': '1...","{'name': 'Historic Homes', 'standardHours': {'...",[{'description': 'Adams National Historical Pa...,Adams,From the sweet little farm at the foot of Penn...,http://www.nps.gov/adam/planyourvisit/directio...,[],...,"[{'name': 'Guided Tours', 'id': 'B33DC9B6-0B7D...",https://www.nps.gov/adam/index.htm,-71.011604,E4C7784E-66A0-4D44-87D0-3E072F5FEF43,[{'url': 'https://www.nps.gov/common/uploads/s...,"Traveling on U.S. Interstate 93, take exit 7 -...",Adams National Historical Park,adam,"lat:42.2553961, long:-71.01160356",42.255396
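Notice that the index label `3` appears twice in the output above: `explode` repeats the original row's label for every element it unpacks. A small sketch on made-up data shows the same behavior:

```python
import pandas as pd

# Toy data: park "A" has two operating-hours entries, park "B" has one
toy_df = pd.DataFrame(
    {
        "park": ["A", "B"],
        "hours": [
            [{"name": "Visitor Center"}, {"name": "Historic Homes"}],
            [{"name": "Grounds"}],
        ],
    }
)

exploded = toy_df.explode("hours")

# One row per list element; the other columns and the index label are repeated
print(exploded)
print(exploded.index.tolist())
```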


From there, we can unnest the `operating_hours` column with `pd.json_normalize`.

In [6]:
park_operating_hours_df = pd.json_normalize(parks_df_exploded["operating_hours"])

park_operating_hours_df.rename(
    columns={"name": "category", "description": "operating_hours_description"},
    inplace=True,
)

park_operating_hours_df.head()

Unnamed: 0,category,operating_hours_description,exceptions,standardHours.friday,standardHours.sunday,standardHours.thursday,standardHours.tuesday,standardHours.saturday,standardHours.monday,standardHours.wednesday
0,Hours of Operation,Federal Hall is Open.,"[{'endDate': 2025-01-15, 'name': 'Martin Luthe...",10:00AM - 5:00PM,Closed,10:00AM - 5:00PM,10:00AM - 5:00PM,Closed,10:00AM - 5:00PM,10:00AM - 5:00PM
1,Visitor Center Hours,Lewis and Clark National Historic Trail Visito...,"[{'endDate': 2024-10-28, 'name': 'SUMMER HOURS...",8:30AM - 4:30PM,Closed,8:30AM - 4:30PM,8:30AM - 4:30PM,Closed,8:30AM - 4:30PM,8:30AM - 4:30PM
2,National Capital Parks-East Headquarters,National Capital Parks-East is an administrati...,"[{'endDate': 2024-02-19, 'name': 'President's ...",8:00AM - 4:00PM,Closed,8:00AM - 4:00PM,8:00AM - 4:00PM,Closed,8:00AM - 4:00PM,8:00AM - 4:00PM
3,Visitor Center,The Visitor Center is open Monday through Frid...,"[{'endDate': 2024-05-27, 'name': 'Memorial Day...",10:00AM - 4:00PM,Closed,10:00AM - 4:00PM,10:00AM - 4:00PM,Closed,10:00AM - 4:00PM,10:00AM - 4:00PM
4,Historic Homes,The historic homes are closed for the season. ...,"[{'endDate': 2024-06-19, 'name': 'Juneteenth N...",Closed,Closed,Closed,Closed,Closed,Closed,Closed
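The dotted column names such as `standardHours.friday` come from `json_normalize` flattening nested dictionaries and joining the keys with a separator. A minimal sketch on hypothetical records, including the `sep` parameter for choosing a different separator:

```python
import pandas as pd

# Hypothetical records shaped like one exploded operating_hours entry
records = [
    {
        "name": "Visitor Center",
        "standardHours": {"monday": "9:00AM - 5:00PM", "sunday": "Closed"},
    },
    {
        "name": "Grounds",
        "standardHours": {"monday": "All Day", "sunday": "All Day"},
    },
]

# Nested keys become "parent.child" columns by default
flat = pd.json_normalize(records)
print(list(flat.columns))

# sep="_" yields names that are easier to reference in SQL-style tools
flat_underscore = pd.json_normalize(records, sep="_")
print(list(flat_underscore.columns))
```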


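But now we have a separate dataframe! To join it back to our original df, we can use `pd.concat` with `axis=1`, which tells Pandas to concatenate our dataframes side by side, column-wise.

We also have to call `reset_index(drop=True)` on each dataframe so the rows line up. `pd.concat` matches rows by index label rather than by position, and `explode` repeats the original label for every row it produces (note the two rows labeled `3` above), while `json_normalize` returns a fresh index counting up from 0. Yes, this is confusing. Unfortunately, the only way to learn is through trial, error, [StackOverflow](https://stackoverflow.com/a/47657006), and the Pandas [documentation](https://pandas.pydata.org/docs/reference/api/pandas.concat.html).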

In [7]:
# Because both dataframes have their rows in the same order, we can join them by position
parks_with_hours_df = pd.concat(
    [
        parks_df_exploded.reset_index(drop=True),
        park_operating_hours_df.reset_index(drop=True),
    ],
    axis=1,
)

parks_with_hours_df.head()

Unnamed: 0,relevanceScore,designation,weatherInfo,addresses,operating_hours,entrancePasses,name,description,directionsUrl,fees,...,category,operating_hours_description,exceptions,standardHours.friday,standardHours.sunday,standardHours.thursday,standardHours.tuesday,standardHours.saturday,standardHours.monday,standardHours.wednesday
0,1,National Memorial,http://forecast.weather.gov/MapClick.php?CityN...,"[{'type': 'Physical', 'line2': '', 'line1': '1...","{'name': 'Hours of Operation', 'standardHours'...",[],Federal Hall,"Here on Wall Street, George Washington took th...",http://www.nps.gov/feha/planyourvisit/directio...,[],...,Hours of Operation,Federal Hall is Open.,"[{'endDate': 2025-01-15, 'name': 'Martin Luthe...",10:00AM - 5:00PM,Closed,10:00AM - 5:00PM,10:00AM - 5:00PM,Closed,10:00AM - 5:00PM,10:00AM - 5:00PM
1,1,National Historic Trail,"In winter, watch for ice on trails and sidewal...","[{'type': 'Physical', 'line2': '', 'line1': '6...","{'name': 'Visitor Center Hours', 'standardHour...",[],Lewis & Clark,The Lewis and Clark National Historic Trail wi...,https://www.nps.gov/lecl/,[],...,Visitor Center Hours,Lewis and Clark National Historic Trail Visito...,"[{'endDate': 2024-10-28, 'name': 'SUMMER HOURS...",8:30AM - 4:30PM,Closed,8:30AM - 4:30PM,8:30AM - 4:30PM,Closed,8:30AM - 4:30PM,8:30AM - 4:30PM
2,1,,"Summers are generally hot and humid, with dayt...","[{'type': 'Physical', 'line2': '', 'line1': '1...",{'name': 'National Capital Parks-East Headquar...,[],National Capital Parks-East,Welcome to National Capital Parks-East. We inv...,http://www.nps.gov/nace/planyourvisit/directio...,[],...,National Capital Parks-East Headquarters,National Capital Parks-East is an administrati...,"[{'endDate': 2024-02-19, 'name': 'President's ...",8:00AM - 4:00PM,Closed,8:00AM - 4:00PM,8:00AM - 4:00PM,Closed,8:00AM - 4:00PM,8:00AM - 4:00PM
3,1,National Historical Park,"Be prepared for hot, humid weather. The histor...","[{'type': 'Physical', 'line2': '', 'line1': '1...","{'name': 'Visitor Center', 'standardHours': {'...",[{'description': 'Adams National Historical Pa...,Adams,From the sweet little farm at the foot of Penn...,http://www.nps.gov/adam/planyourvisit/directio...,[],...,Visitor Center,The Visitor Center is open Monday through Frid...,"[{'endDate': 2024-05-27, 'name': 'Memorial Day...",10:00AM - 4:00PM,Closed,10:00AM - 4:00PM,10:00AM - 4:00PM,Closed,10:00AM - 4:00PM,10:00AM - 4:00PM
4,1,National Historical Park,"Be prepared for hot, humid weather. The histor...","[{'type': 'Physical', 'line2': '', 'line1': '1...","{'name': 'Historic Homes', 'standardHours': {'...",[{'description': 'Adams National Historical Pa...,Adams,From the sweet little farm at the foot of Penn...,http://www.nps.gov/adam/planyourvisit/directio...,[],...,Historic Homes,The historic homes are closed for the season. ...,"[{'endDate': 2024-06-19, 'name': 'Juneteenth N...",Closed,Closed,Closed,Closed,Closed,Closed,Closed
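To see why the indexes were reset before concatenating, here is a small sketch on made-up data. It uses unique but non-contiguous labels to keep the behavior easy to follow; the duplicate labels that `explode` produces are a related case that can fail or misalign in a similar way.

```python
import pandas as pd

# Toy frames: same number of rows, same order, but different index labels
left = pd.DataFrame({"park": ["A", "B", "C"]}, index=[0, 2, 3])
right = pd.DataFrame({"hours": ["9-5", "Closed", "All Day"]})  # labels 0, 1, 2

# concat(axis=1) aligns on labels, so unmatched labels produce extra rows with NaN
misaligned = pd.concat([left, right], axis=1)
print(misaligned)

# Resetting both indexes makes the join purely positional
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)
print(aligned)
```

The misaligned result has four rows with gaps, while the aligned one keeps the three original rows paired up by position.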


We can then filter rows, as in SQL:

In [8]:
parks_with_hours_df[parks_with_hours_df["category"] == "Hours of Operation"]

Unnamed: 0,relevanceScore,designation,weatherInfo,addresses,operating_hours,entrancePasses,name,description,directionsUrl,fees,...,category,operating_hours_description,exceptions,standardHours.friday,standardHours.sunday,standardHours.thursday,standardHours.tuesday,standardHours.saturday,standardHours.monday,standardHours.wednesday
0,1,National Memorial,http://forecast.weather.gov/MapClick.php?CityN...,"[{'type': 'Physical', 'line2': '', 'line1': '1...","{'name': 'Hours of Operation', 'standardHours'...",[],Federal Hall,"Here on Wall Street, George Washington took th...",http://www.nps.gov/feha/planyourvisit/directio...,[],...,Hours of Operation,Federal Hall is Open.,"[{'endDate': 2025-01-15, 'name': 'Martin Luthe...",10:00AM - 5:00PM,Closed,10:00AM - 5:00PM,10:00AM - 5:00PM,Closed,10:00AM - 5:00PM,10:00AM - 5:00PM
417,1,National Historic Site,Spring: Temperature range from 30F-70F. Mostly...,"[{'type': 'Physical', 'line2': '', 'line1': '2...","{'name': 'Hours of Operation', 'standardHours'...",[],Theodore Roosevelt Birthplace,This is the boyhood home of the first U.S. pre...,http://www.nps.gov/thrb/planyourvisit/directio...,[],...,Hours of Operation,Theodore Roosevelt Birthplace National Histori...,[],10:00AM - 4:00PM,10:00AM - 4:00PM,10:00AM - 4:00PM,Closed,10:00AM - 4:00PM,Closed,10:00AM - 4:00PM
562,1,National Historical Park,"Located above the heat of the low desert, Tuma...","[{'type': 'Physical', 'line2': '', 'line1': '1...","{'name': 'Hours of Operation', 'standardHours'...",[{'description': 'The annual pass at Tumacácor...,Tumacácori,Tumacácori sits at a cultural crossroads in th...,http://www.nps.gov/tuma/planyourvisit/directio...,[],...,Hours of Operation,These hours apply to the visitor center and mi...,"[{'endDate': 2024-11-28, 'name': 'Closed Thank...",9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM
612,1,National Memorial,Weather on the Outer Banks varies seasonally a...,"[{'type': 'Physical', 'line2': '', 'line1': '1...","{'name': 'Hours of Operation', 'standardHours'...",[],Wright Brothers,"Wind, sand, and a dream of flight brought Wilb...",http://www.nps.gov/wrbr/planyourvisit/directio...,[],...,Hours of Operation,These are the times the park grounds and visit...,"[{'endDate': 2024-12-25, 'name': 'Christmas Da...",9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM,9:00AM - 5:00PM


Note the filter syntax:

```
df[
    df[column] [LOGICAL MODIFIER] [VALUE]
]
```

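We're taking the dataframe and applying a _mask_: the inner expression compares the column to a value with a comparison operator (`==`, `!=`, `>`, and so on) and yields one `True`/`False` per row, and the outer brackets keep only the rows where the mask is `True`. This is fundamentally different from filtering in SQL and can take some getting used to.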

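For multiple filters, we can combine conditions with the operators `&` (and) and `|` (or). Each condition needs its own parentheses, because `&` and `|` bind more tightly than comparisons like `==`. For example: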

```
df[
    (CONDITION 1 & CONDITION 2) | CONDITION 3
]
```
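A minimal runnable version of that pattern, on a made-up dataframe (the `state` column and its values are assumptions for illustration):

```python
import pandas as pd

# Toy data standing in for parks_with_hours_df
df = pd.DataFrame(
    {
        "category": ["Hours of Operation", "Visitor Center", "Hours of Operation", "Hours of Operation"],
        "state": ["NY", "NY", "AZ", "AZ"],
        "monday": ["10:00AM - 5:00PM", "9:00AM - 5:00PM", "Closed", "9:00AM - 5:00PM"],
    }
)

# (CONDITION 1 & CONDITION 2) | CONDITION 3
mask = (
    (df["category"] == "Hours of Operation") & (df["state"] == "NY")
) | (df["monday"] == "Closed")

matches = df[mask]
print(matches)

# ~ negates a mask, like NOT in SQL
non_matches = df[~mask]
print(non_matches)
```

Storing the mask in a variable first can make long filters easier to read and reuse.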

In [20]:
# Let's get the hours of operation for parks whose description mentions "Theodore Roosevelt"

parks_with_hours_df[
    (parks_with_hours_df["category"] == "Hours of Operation")
    & (
        parks_with_hours_df["operating_hours_description"].str.contains(
            "Theodore Roosevelt"
        )
    )
]

# Note the multiline formatting, which makes longer filters easier to read

Unnamed: 0,relevanceScore,designation,weatherInfo,addresses,operating_hours,entrancePasses,name,description,directionsUrl,fees,...,category,operating_hours_description,exceptions,standardHours.friday,standardHours.sunday,standardHours.thursday,standardHours.tuesday,standardHours.saturday,standardHours.monday,standardHours.wednesday
417,1,National Historic Site,Spring: Temperature range from 30F-70F. Mostly...,"[{'type': 'Physical', 'line2': '', 'line1': '2...","{'name': 'Hours of Operation', 'standardHours'...",[],Theodore Roosevelt Birthplace,This is the boyhood home of the first U.S. pre...,http://www.nps.gov/thrb/planyourvisit/directio...,[],...,Hours of Operation,Theodore Roosevelt Birthplace National Histori...,[],10:00AM - 4:00PM,10:00AM - 4:00PM,10:00AM - 4:00PM,Closed,10:00AM - 4:00PM,Closed,10:00AM - 4:00PM


Distinct values can be accessed with the `unique()` method.

In [18]:
pprint(list(parks_with_hours_df["standardHours.thursday"].unique())[:5])

['10:00AM - 5:00PM',
 '8:30AM - 4:30PM',
 '8:00AM - 4:00PM',
 '10:00AM - 4:00PM',
 'Closed']
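Two related tools are worth knowing alongside `unique()`: `nunique()` counts the distinct values, and `drop_duplicates()` works across several columns, roughly like `SELECT DISTINCT a, b` in SQL. A short sketch on made-up data:

```python
import pandas as pd

# Toy data with repeated values
toy_df = pd.DataFrame(
    {
        "thursday": ["Closed", "All Day", "Closed", "9:00AM - 5:00PM"],
        "friday": ["Closed", "All Day", "Closed", "Closed"],
    }
)

# Distinct values of one column, in order of first appearance
distinct_thursday = list(toy_df["thursday"].unique())
print(distinct_thursday)

# Number of distinct values
n_distinct = toy_df["thursday"].nunique()
print(n_distinct)

# Distinct combinations of several columns
distinct_pairs = toy_df[["thursday", "friday"]].drop_duplicates()
print(distinct_pairs)
```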


The Pandas equivalent of `CASE WHEN` is NumPy's `np.where` function. It follows a similar pattern: a condition, the value to use where the condition is true, and the value to use otherwise.

We'll set a condition to check, `parks_with_hours_df['standardHours.monday'] == 'unknown'`. Where it is true, we'll return `Closed`; otherwise, we'll keep the existing value.

In [21]:
import numpy as np

# CASE monday WHEN 'unknown' THEN 'Closed' ELSE monday END as monday_hours,

parks_with_hours_df["monday_hours"] = np.where(
    parks_with_hours_df["standardHours.monday"] == "unknown",
    "Closed",
    parks_with_hours_df["standardHours.monday"],
)
parks_with_hours_df["tuesday_hours"] = np.where(
    parks_with_hours_df["standardHours.tuesday"] == "unknown",
    "Closed",
    parks_with_hours_df["standardHours.tuesday"],
)
parks_with_hours_df["wednesday_hours"] = np.where(
    parks_with_hours_df["standardHours.wednesday"] == "unknown",
    "Closed",
    parks_with_hours_df["standardHours.wednesday"],
)
parks_with_hours_df["thursday_hours"] = np.where(
    parks_with_hours_df["standardHours.thursday"] == "unknown",
    "Closed",
    parks_with_hours_df["standardHours.thursday"],
)
parks_with_hours_df["friday_hours"] = np.where(
    parks_with_hours_df["standardHours.friday"] == "unknown",
    "Closed",
    parks_with_hours_df["standardHours.friday"],
)
parks_with_hours_df["saturday_hours"] = np.where(
    parks_with_hours_df["standardHours.saturday"] == "unknown",
    "Closed",
    parks_with_hours_df["standardHours.saturday"],
)
parks_with_hours_df["sunday_hours"] = np.where(
    parks_with_hours_df["standardHours.sunday"] == "unknown",
    "Closed",
    parks_with_hours_df["standardHours.sunday"],
)


parks_with_hours_df["open_seven_days_a_week"] = np.where(
    (
        (parks_with_hours_df["monday_hours"] != "Closed")
        & (parks_with_hours_df["tuesday_hours"] != "Closed")
        & (parks_with_hours_df["wednesday_hours"] != "Closed")
        & (parks_with_hours_df["thursday_hours"] != "Closed")
        & (parks_with_hours_df["friday_hours"] != "Closed")
        & (parks_with_hours_df["saturday_hours"] != "Closed")
        & (parks_with_hours_df["sunday_hours"] != "Closed")
    ),
    True,
    False,
)

cols_to_select = [
    "fullName",
    "open_seven_days_a_week",
    "monday_hours",
    "tuesday_hours",
    "wednesday_hours",
    "thursday_hours",
    "friday_hours",
    "saturday_hours",
    "sunday_hours",
]

parks_with_hours_df[cols_to_select].head()

Unnamed: 0,fullName,open_seven_days_a_week,monday_hours,tuesday_hours,wednesday_hours,thursday_hours,friday_hours,saturday_hours,sunday_hours
0,Federal Hall National Memorial,False,10:00AM - 5:00PM,10:00AM - 5:00PM,10:00AM - 5:00PM,10:00AM - 5:00PM,10:00AM - 5:00PM,Closed,Closed
1,Lewis & Clark National Historic Trail,False,8:30AM - 4:30PM,8:30AM - 4:30PM,8:30AM - 4:30PM,8:30AM - 4:30PM,8:30AM - 4:30PM,Closed,Closed
2,National Capital Parks-East,False,8:00AM - 4:00PM,8:00AM - 4:00PM,8:00AM - 4:00PM,8:00AM - 4:00PM,8:00AM - 4:00PM,Closed,Closed
3,Adams National Historical Park,False,10:00AM - 4:00PM,10:00AM - 4:00PM,10:00AM - 4:00PM,10:00AM - 4:00PM,10:00AM - 4:00PM,Closed,Closed
4,Adams National Historical Park,False,Closed,Closed,Closed,Closed,Closed,Closed,Closed
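`np.where` covers a single `WHEN ... ELSE`. For a `CASE` with several branches, NumPy's `np.select` takes a list of conditions and a matching list of results, checked in order. A minimal sketch on made-up values (the output labels are hypothetical, not part of the dataset):

```python
import numpy as np
import pandas as pd

# Toy hours column
hours = pd.Series(["All Day", "Closed", "unknown", "9:00AM - 5:00PM"])

# CASE WHEN hours = 'unknown' THEN 'closed'
#      WHEN hours = 'Closed'  THEN 'closed'
#      WHEN hours = 'All Day' THEN 'always open'
#      ELSE 'limited hours' END
conditions = [hours == "unknown", hours == "Closed", hours == "All Day"]
choices = ["closed", "closed", "always open"]

# The first matching condition wins; default plays the role of ELSE
status = np.select(conditions, choices, default="limited hours")
print(list(status))
```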


Now you might be saying "gee Matt, that's a lot of code," and you'd be right! But that's one of the core lessons of this course— some things take lots of work in Python that are easy in SQL... but the opposite is true, too!
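That said, some of the repetition can be trimmed with a loop over the day names. The sketch below is one possible compact version, run on a made-up two-row dataframe rather than the real data; it is meant to produce the same columns as the cell above.

```python
import numpy as np
import pandas as pd

days = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# Toy stand-in for parks_with_hours_df: two parks, one with an "unknown" Sunday
toy_df = pd.DataFrame(
    {f"standardHours.{day}": ["All Day", "9:00AM - 5:00PM"] for day in days}
)
toy_df.loc[1, "standardHours.sunday"] = "unknown"

# One np.where per day, written once
for day in days:
    source = toy_df[f"standardHours.{day}"]
    toy_df[f"{day}_hours"] = np.where(source == "unknown", "Closed", source)

# A park is open seven days a week if no day's hours are "Closed"
hours_cols = [f"{day}_hours" for day in days]
toy_df["open_seven_days_a_week"] = (toy_df[hours_cols] != "Closed").all(axis=1)

print(toy_df[["open_seven_days_a_week"] + hours_cols])
```

The comparison `!= "Closed"` already yields booleans, so `.all(axis=1)` can stand in for the chain of `&` conditions.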

Now we can filter our dataframe to create a new one:

In [22]:
open_seven_days_df = parks_with_hours_df[parks_with_hours_df["open_seven_days_a_week"]]

open_seven_days_df[cols_to_select].head()

Unnamed: 0,fullName,open_seven_days_a_week,monday_hours,tuesday_hours,wednesday_hours,thursday_hours,friday_hours,saturday_hours,sunday_hours
6,George Washington Memorial Parkway,True,All Day,All Day,All Day,All Day,All Day,All Day,All Day
8,Eleanor Roosevelt National Historic Site,True,7:00AM - 5:00PM,7:00AM - 5:00PM,7:00AM - 5:00PM,7:00AM - 5:00PM,7:00AM - 5:00PM,7:00AM - 5:00PM,7:00AM - 5:00PM
10,Morristown National Historical Park,True,8:00AM - 5:00PM,8:00AM - 5:00PM,8:00AM - 5:00PM,8:00AM - 5:00PM,8:00AM - 5:00PM,8:00AM - 5:00PM,8:00AM - 5:00PM
14,Cedar Breaks National Monument,True,All Day,All Day,All Day,All Day,All Day,All Day,All Day
15,Devils Postpile National Monument,True,All Day,All Day,All Day,All Day,All Day,All Day,All Day


Finally, we can save it to a new file:

In [23]:
open_seven_days_df.to_parquet("../../data/pandas/open_seven_days_df.parquet")