## Analyzing Spreadsheet Data with Python

Spreadsheets are one of the most common tools for storing and working with tabular data. However, when a dataset grows large, navigating rows and columns manually becomes inefficient and error-prone. Python provides powerful libraries, such as Pandas, that make it easier to load, manipulate, and analyze spreadsheet data programmatically.

In this project, I demonstrate how to:
- Load spreadsheet data (.xlsx or .xls files) into Python.
- Explore and manipulate datasets using Pandas.
- Perform basic analysis such as subsetting, filtering, and calculating statistics (mean, median, maximum, etc.).
- Write processed data back into a spreadsheet file.

### Import the required packages

We start by importing the necessary Python packages:
- `os` for interacting with the operating system (e.g., building file paths)
- `pandas` for data manipulation and analysis

In [16]:
import os
import pandas as pd

### 1. Loading Data

Next, we load the facilities data from the Excel file into a Pandas `DataFrame`.  
The dataset is stored in `airport_data.xlsx`, and we specifically read the **"Facilities"** sheet.  

In [17]:
df_facilities = pd.read_excel("./airport_data.xlsx", sheet_name="Facilities")
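
The `os` module imported earlier is not used in the call above. One way to put it to work is to build the file path portably and fail early with a clear message if the workbook is missing. The following is a minimal sketch, assuming the workbook sits in the current working directory; `load_sheet` is a hypothetical helper introduced only for illustration.

```python
import os

import pandas as pd


def load_sheet(path, sheet_name):
    """Read one worksheet into a DataFrame, failing early if the file is missing.

    Hypothetical helper, not part of pandas.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Spreadsheet not found: {path}")
    # Reading .xlsx files requires an Excel engine such as openpyxl to be installed.
    return pd.read_excel(path, sheet_name=sheet_name)


# Assumption: the workbook is stored in the current working directory.
excel_path = os.path.join(os.getcwd(), "airport_data.xlsx")
print(excel_path)

# Only attempt the read when the file is actually present.
if os.path.isfile(excel_path):
    df_facilities = load_sheet(excel_path, "Facilities")
```

Checking for the file first turns a long traceback from the Excel reader into a short, explicit error message.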

### 2. Exploring Data

Once the data is loaded, the next step is to inspect the dataset. Displaying the DataFrame lets us quickly verify that the data was imported correctly and get an initial sense of its structure:

In [9]:
df_facilities

Unnamed: 0,SiteNumber,Type,LocationID,EffectiveDate,Region,DistrictOffice,State,StateName,County,CountyState,...,AirportPositionSource,AirportPositionSourceDate,AirportElevationSource,AirportElevationSourceDate,ContractFuelAvailable,TransientStorage,OtherServices,WindIndicator,IcaoIdentifier,BeaconSchedule2
0,50009.*A,AIRPORT,'ADK,3/30/2017,AAL,NONE,AK,ALASKA,ALEUTIANS WEST,AK,...,3RD PARTY SURVEY,00:00:00,3RD PARTY SURVEY,00:00:00,,HGR,CARGO,Y,PADK,SS-SR
1,50016.1*A,AIRPORT,'AKK,3/30/2017,AAL,NONE,AK,ALASKA,KODIAK ISLAND,AK,...,NACO,00:00:00,NACO,00:00:00,,,,Y,PAKH,
2,50017.*A,AIRPORT,'Z13,3/30/2017,AAL,NONE,AK,ALASKA,BETHEL,AK,...,STATE,00:00:00,STATE,00:00:00,,,CARGO,Y-L,,SEE RMK
3,50017.1*C,SEAPLANE BASE,'KKI,3/30/2017,AAL,NONE,AK,ALASKA,BETHEL,AK,...,,,,,,,,N,,
4,50020.*A,AIRPORT,'AKI,3/30/2017,AAL,NONE,AK,ALASKA,BETHEL,AK,...,STATE,00:00:00,STATE,00:00:00,,,CARGO,Y-L,PFAK,SS-SR
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
746,50920.12*C,SEAPLANE BASE,'2Y3,3/30/2017,AAL,NONE,AK,ALASKA,SKAGWAY-YAKUTAT,AK,...,FAA-EST,00:00:00,,,,TIE,CARGO,,,
747,50920.*A,AIRPORT,'YAK,3/30/2017,AAL,NONE,AK,ALASKA,SKAGWAY-YAKUTAT,AK,...,3RD PARTY SURVEY,00:00:00,3RD PARTY SURVEY,00:00:00,,HGR,CARGO,Y-L,PAYA,SS-SR
748,50925.1*A,AIRPORT,'A77,3/30/2017,AAL,NONE,AK,ALASKA,KUSKOKWIM,AK,...,,,,,,,CARGO,,,
749,50928.*C,SEAPLANE BASE,'78K,3/30/2017,AAL,NONE,AK,ALASKA,KETCHIKAN GATEWAY,AK,...,,,,,,,CARGO,N,,


The preview shows the first and last few rows of the dataset, including columns such as **`SiteNumber`**, **`Type`**, and **`StateName`**.
This confirms that the spreadsheet was read correctly and gives us a starting point for further analysis.
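
Beyond displaying the whole DataFrame, a few attributes and methods give a more targeted first look. The sketch below uses a small stand-in DataFrame, `df_sample` (a name introduced here), built from four columns of the first five preview rows; with the real data, the same calls would apply to `df_facilities`.

```python
import pandas as pd

# Stand-in data: four columns of the first five rows shown in the preview.
df_sample = pd.DataFrame({
    "SiteNumber": ["50009.*A", "50016.1*A", "50017.*A", "50017.1*C", "50020.*A"],
    "Type": ["AIRPORT", "AIRPORT", "AIRPORT", "SEAPLANE BASE", "AIRPORT"],
    "StateName": ["ALASKA"] * 5,
    "County": ["ALEUTIANS WEST", "KODIAK ISLAND", "BETHEL", "BETHEL", "BETHEL"],
})

print(df_sample.shape)              # (number of rows, number of columns)
print(df_sample.columns.tolist())   # column names as a plain list
print(df_sample.head(2))            # first two rows
print(df_sample.tail(2))            # last two rows
df_sample.info()                    # column dtypes and non-null counts
```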

#### 2.1. Selecting
Selecting data from a DataFrame is straightforward. We can access a single column by placing its name, in quotes, inside square brackets; to access several columns, we pass a list of names.

Selecting specific columns enables us to examine a specific attribute of the dataset in more detail. Here, we'll retrieve the `Type` column, which contains information about the types of aviation facilities. 

In [14]:
df_facilities["Type"]

0            AIRPORT
1            AIRPORT
2            AIRPORT
3      SEAPLANE BASE
4            AIRPORT
           ...      
746    SEAPLANE BASE
747          AIRPORT
748          AIRPORT
749    SEAPLANE BASE
750          AIRPORT
Name: Type, Length: 751, dtype: object
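
Selecting several columns at once works the same way, except that a list of names goes inside the brackets. The sketch below contrasts the two forms on a small stand-in DataFrame, `df_sample` (a name introduced here), built from the preview rows rather than the full dataset.

```python
import pandas as pd

# Stand-in data: three columns of the first five rows shown in the preview.
df_sample = pd.DataFrame({
    "SiteNumber": ["50009.*A", "50016.1*A", "50017.*A", "50017.1*C", "50020.*A"],
    "Type": ["AIRPORT", "AIRPORT", "AIRPORT", "SEAPLANE BASE", "AIRPORT"],
    "County": ["ALEUTIANS WEST", "KODIAK ISLAND", "BETHEL", "BETHEL", "BETHEL"],
})

# A single name in brackets returns a one-dimensional Series.
type_series = df_sample["Type"]

# A list of names in brackets returns a DataFrame with just those columns.
subset = df_sample[["SiteNumber", "Type"]]

print(type(type_series).__name__)
print(type(subset).__name__)
print(subset)
```

The distinction matters later: Series methods such as `unique()` apply to the single-column form, not to the two-column DataFrame.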

Exploring individual columns helps us understand the types of data we are working with.  
In this case, examining the `"Type"` column gives insight into the categorical values of aviation facilities, which will guide filtering, grouping, or aggregation steps later.  
We can apply the Pandas `unique()` method on the series to quickly retrieve all distinct values present in the column.

In [18]:
df_facilities["Type"].unique()

array(['AIRPORT', 'SEAPLANE BASE', 'HELIPORT'], dtype=object)

The output reveals that the `Type` column has three distinct values: `AIRPORT`, `SEAPLANE BASE`, and `HELIPORT`.
Knowing these categories tells us which values we can filter or group on when comparing facility types in later steps.
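
To see how many rows fall into each category, and not only which categories exist, `value_counts()` is a natural companion to `unique()`. The sketch below runs on a stand-in DataFrame, `df_sample` (a name introduced here), built from the first five preview rows, so it contains only two of the three facility types; the counts are for this sample, not for the full dataset.

```python
import pandas as pd

# Stand-in data: the Type values of the first five rows shown in the preview.
df_sample = pd.DataFrame({
    "SiteNumber": ["50009.*A", "50016.1*A", "50017.*A", "50017.1*C", "50020.*A"],
    "Type": ["AIRPORT", "AIRPORT", "AIRPORT", "SEAPLANE BASE", "AIRPORT"],
})

# Number of rows per distinct value, sorted from most to least frequent.
type_counts = df_sample["Type"].value_counts()

# Number of distinct values in the column.
n_types = df_sample["Type"].nunique()

print(type_counts)
print(n_types)
```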

#### 2.2. Filtering
After selecting a column and examining its values, we can filter the data to focus on specific entries.
To keep only the rows whose `Type` is `SEAPLANE BASE`, we first compare the column against that value. The comparison returns a boolean Series that is `True` for every matching row:

In [21]:
df_facilities["Type"] == "SEAPLANE BASE"

0      False
1      False
2      False
3       True
4      False
       ...  
746     True
747    False
748    False
749     True
750    False
Name: Type, Length: 751, dtype: bool
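
A boolean Series like the one above acts as a mask: passing it back into the DataFrame's square brackets keeps only the rows marked `True`. The sketch below shows that step on a stand-in DataFrame, `df_sample` (a name introduced here), built from the first five preview rows; the variable names are illustrative.

```python
import pandas as pd

# Stand-in data: three columns of the first five rows shown in the preview.
df_sample = pd.DataFrame({
    "SiteNumber": ["50009.*A", "50016.1*A", "50017.*A", "50017.1*C", "50020.*A"],
    "Type": ["AIRPORT", "AIRPORT", "AIRPORT", "SEAPLANE BASE", "AIRPORT"],
    "County": ["ALEUTIANS WEST", "KODIAK ISLAND", "BETHEL", "BETHEL", "BETHEL"],
})

# Step 1: build the boolean mask (True where the row is a seaplane base).
is_seaplane_base = df_sample["Type"] == "SEAPLANE BASE"

# Step 2: use the mask to keep only the matching rows.
seaplane_bases = df_sample[is_seaplane_base]

# .loc combines the row mask with a column selection in one step.
seaplane_sites = df_sample.loc[is_seaplane_base, ["SiteNumber", "County"]]

# True counts as 1, so summing the mask gives the number of matches.
print(is_seaplane_base.sum())
print(seaplane_bases)
print(seaplane_sites)
```

Keeping the mask in its own variable makes it easy to reuse or to combine with other conditions using `&` and `|`.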