### <font color='#0000FF'>**Table of Contents**</font><a class="anchor"  id="0"></a>
1. [Introduction](#1)
2. [Load Packages](#2)
3. [Import Data](#3)
4. [Dataset Overview](#4)
5. [Data Exploration](#5)  
    5.1. [Number of Passengers](#6)  
    5.2. [Number of Flights](#7)  
    5.3. [Revenue Passenger Miles (RPM)](#8)  
    5.4. [Available Seat Miles (ASM)](#9)  
    5.5. [Load Factor](#10)  
6. [Final Note](#11)

# **Introduction**<a class="anchor"  id="1"></a>  [↑](#0)

This notebook provides a quick exploration of U.S. air traffic data using the `dygraphs` package. `Dygraphs` is an easy-to-use package that quickly creates interactive charts for time series data. It also offers flexibility, allowing customization of the graph by adding plugins.
<div class="alert alert-block alert-success">  
  
**References:**  
* [dygraphs for R](https://rstudio.github.io/dygraphs/index.html)
* [Time series visualization with the dygraphs package](https://r-graph-gallery.com/317-time-series-with-the-dygraphs-library.html)
  
</div>

# **Load Packages**<a class="anchor"  id="2"></a>  [↑](#0)

In [None]:
library(tidyverse) # metapackage of all tidyverse packages
library(dygraphs) # Create interactive graph

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


# **Import Data**<a class="anchor"  id="3"></a>  [↑](#0)

In [2]:
# The statistics for U.S. air carrier traffic cover scheduled passenger flights only and are not seasonally adjusted.

df <- data.frame(read_csv("/kaggle/input/u-s-airline-traffic-data/air traffic.csv", show_col_types = FALSE))

str(df)

'data.frame':	249 obs. of  17 variables:
 $ Year   : num  2003 2003 2003 2003 2003 ...
 $ Month  : num  1 2 3 4 5 6 7 8 9 10 ...
 $ Dom_Pax: num  43032450 41166780 49992700 47033260 49152352 ...
 $ Int_Pax: num  4905830 4245366 5008613 4345444 4610834 ...
 $ Pax    : num  47938280 45412146 55001313 51378704 53763186 ...
 $ Dom_Flt: num  785160 690351 797194 766260 789397 ...
 $ Int_Flt: num  57667 51259 58926 55005 55265 ...
 $ Flt    : num  842827 741610 856120 821265 844662 ...
 $ Dom_RPM: num  36211422 34148439 41774564 39465980 41001934 ...
 $ Int_RPM: num  12885980 10715468 12567068 10370592 11575026 ...
 $ RPM    : num  49097402 44863907 54341633 49836572 52576960 ...
 $ Dom_ASM: num  56191300 50088434 57592901 54639679 55349897 ...
 $ Int_ASM: num  17968572 15587880 17753174 15528761 15629821 ...
 $ ASM    : num  74159872 65676314 75346075 70168440 70979718 ...
 $ Dom_LF : num  64.4 68.2 72.5 72.2 74.1 ...
 $ Int_LF : num  71.7 68.7 70.8 66.8 74.1 ...
 $ LF     : num  66.2 68.3 

# **Dataset Overview**<a class="anchor"  id="4"></a>  [↑](#0)

In [3]:
dim(df)

* There are 249 observations and 17 variables in this dataset.

In [4]:
colSums(is.na(df)) # check for NA values

* There are no missing values.

In [5]:
sum(duplicated(df)) # check for duplicated rows

* There are no duplicated rows.

In [6]:
colSums(df == "") # check empty values

* There are no empty values.

In [7]:
summary(df)

      Year          Month           Dom_Pax            Int_Pax        
 Min.   :2003   Min.   : 1.000   Min.   : 2877290   Min.   :  136609  
 1st Qu.:2008   1st Qu.: 3.000   1st Qu.:50982170   1st Qu.: 6395022  
 Median :2013   Median : 6.000   Median :56200104   Median : 7419187  
 Mean   :2013   Mean   : 6.446   Mean   :55209710   Mean   : 7392209  
 3rd Qu.:2018   3rd Qu.: 9.000   3rd Qu.:60892131   3rd Qu.: 8567847  
 Max.   :2023   Max.   :12.000   Max.   :75378157   Max.   :12432615  
      Pax              Dom_Flt          Int_Flt           Flt        
 Min.   : 3013899   Min.   :217262   Min.   : 4996   Min.   :222280  
 1st Qu.:57664576   1st Qu.:662000   1st Qu.:61615   1st Qu.:727898  
 Median :63899130   Median :709933   Median :66557   Median :779011  
 Mean   :62601919   Mean   :706751   Mean   :64736   Mean   :771487  
 3rd Qu.:69447429   3rd Qu.:781804   3rd Qu.:71924   3rd Qu.:848650  
 Max.   :87810772   Max.   :890938   Max.   :82681   Max.   :964102  
    Dom_RPM  

# **Data Exploration**<a class="anchor"  id="5"></a>  [↑](#0)

## **Number of Passengers**<a class="anchor"  id="6"></a>  [↑](#0)

In [8]:
pax <- df %>% 
        subset(,3:5) %>% 
        ts(start=c(2003,1),freq=12)

pax %>%
    dygraph(main = "U.S. Monthly Air Carrier Traffic, 2003-2023", ylab = "Number of Passengers") %>%
    dySeries("Dom_Pax", label = "Domestic") %>%
    dySeries("Int_Pax", label = "International") %>%
    dySeries("Pax", label = "Total") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyOptions(drawPoints = TRUE, pointSize = 1.5, labelsKMB = TRUE) %>%
    dyHighlight(highlightCircleSize = 3, 
                highlightSeriesBackgroundAlpha = 0.75,
                hideOnMouseOut = TRUE) %>%
    dyShading(from = "2020-1-1", to = "2023-12-1", color = "#FAF0F0") %>%
    dyShading(from = "2007-12-1", to = "2009-6-1", color = "#FAF0F0") %>%
    dyRangeSelector(dateWindow = c("2013-01-01", "2023-12-01")) %>%
    dyEvent("2020-1-1", "COVID-19 Pandemic", labelLoc = "bottom") %>%
    dyEvent("2007-12-1", "The Great Recession", labelLoc = "bottom") %>%
    dyEvent("2009-6-1") %>%
    dyUnzoom() %>%
    dyCrosshair(direction = "vertical")

* As April 2020 marked the first full month of widespread shutdowns due to the COVID-19 crisis, a significant drop in passenger numbers was expected.
* The number of air passengers decreased significantly in April 2020 and stayed below the usual trend for nearly two years.
* This prolonged downturn in air travel had a profound impact on the aviation industry. 
* Airlines had to grapple with unprecedented challenges, including grounded fleets, layoffs and significant financial losses.
* Since 2022, the industry has been showing signs of recovery. Vaccination rollouts worldwide and the easing of travel restrictions have slowly led to an increase in passenger numbers.
* However, the road to recovery is expected to be long and uneven, with passenger numbers anticipated to return to pre-pandemic levels only by 2024 or later. 
* The future of the aviation industry hinges on various factors, including the course of the pandemic, economic recovery and shifts in passenger behavior.

In [9]:
df %>% 
    subset(,5) %>% # column 5 is Pax (total passengers), matching the "Total" label below
    ts(start=c(2003,1),end=c(2019, 12), freq=12) %>% 
    dygraph(main = "U.S. Monthly Air Carrier Traffic Before COVID-19", ylab = "Number of Passengers") %>%
    dySeries(label = "Total") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyOptions(drawPoints = TRUE, pointSize = 2.5, labelsKMB = TRUE) %>%
    dyShading(from = "2007-12-1", to = "2009-6-1", color = "#FAF0F0") %>%
    dyEvent("2009-6-1") %>%
    dyEvent("2007-12-1", "The Great Recession", labelLoc = "bottom") %>%
    dyRangeSelector() %>% 
    dyUnzoom() %>%
    dyCrosshair(direction = "vertical")

* While the global financial crisis (2007-2008) resulted in a decline in both leisure and business travel, air passenger traffic returned to its usual trend in the subsequent years. After recovering from this downturn, air passenger traffic grew more or less in line with long-term trends.

## **Number of Flights**<a class="anchor"  id="7"></a>  [↑](#0)

In [10]:
flight <- df %>% 
        subset(,6:8) %>% 
        ts(start=c(2003,1),freq=12)

flight %>%
    dygraph(main = "Number of Flights for All U.S. Carriers, 2003-2023", ylab = "Number of Flights") %>%
    dySeries("Dom_Flt", label = "Domestic") %>%
    dySeries("Int_Flt", label = "International") %>%
    dySeries("Flt", label = "Total") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyOptions(fillGraph = TRUE, fillAlpha = 0.05, labelsKMB = TRUE) %>%
    dyShading(from = "2007-12-1", to = "2009-6-1", color = "#FAF0F0") %>%
    dyEvent("2009-6-1") %>%
    dyEvent("2007-12-1", "The Great Recession", labelLoc = "bottom") %>%
    dyEvent("2020-1-1", "COVID-19 Pandemic", labelLoc = "bottom") %>%
    dyRangeSelector() %>%
    dyCrosshair(direction = "vertical")

* As countries around the world implemented lockdowns and travel restrictions to curb the spread of the virus, both domestic and international air travel saw unprecedented declines.
* In April 2020, the number of flights decreased sharply to 222.28 thousand. Domestic flights made up 97.74% (217.26 thousand) of this, while the remaining 2.26% were international (5.02 thousand).
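
The shares quoted above follow directly from the flight counts. The sketch below is a minimal check, assuming the April 2020 domestic and total counts are the minima reported by `summary(df)` (217,262 and 222,280), as the text states. The variable names are illustrative and not part of the original notebook.

```r
# April 2020 flight counts, taken from the minima in summary(df)
# (assumed to be the April 2020 values, as the text states)
dom_flt <- 217262
tot_flt <- 222280

# International flights are the remainder of the total
int_flt <- tot_flt - dom_flt

# Share of each segment, in percent
dom_share <- dom_flt / tot_flt * 100
int_share <- int_flt / tot_flt * 100

round(c(domestic = dom_share, international = int_share), 2)
```

Rounded to two decimals, the shares reproduce the 97.74% and 2.26% figures in the bullet above.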

## **Revenue Passenger Miles (RPM)**<a class="anchor"  id="8"></a>  [↑](#0)

In [11]:
rpm <- df %>% 
        subset(,9:11) %>% 
        ts(start=c(2003,1),freq=12)

rpm %>%
    dygraph(main = "Revenue Passenger Miles for All U.S. Carriers, 2003-2023", ylab = "Thousands (000)") %>%
    dySeries("Dom_RPM", label = "Domestic") %>%
    dySeries("Int_RPM", label = "International") %>%
    dySeries("RPM", label = "Total") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyOptions(stepPlot = TRUE, labelsKMB = TRUE) %>%
    dyShading(from = "2007-12-1", to = "2009-6-1", color = "#FAF0F0") %>%
    dyEvent("2009-6-1") %>%
    dyEvent("2007-12-1", "The Great Recession", labelLoc = "bottom") %>%
    dyEvent("2020-1-1", "COVID-19 Pandemic", labelLoc = "bottom") %>%
    dyRangeSelector() %>%
    dyCrosshair(direction = "vertical")

* In the aviation industry, `revenue passenger miles (RPM)` is used to measure the number of miles traveled by paying passengers. It is computed by multiplying the number of paying passengers by the distance traveled.
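
As a worked example of the definition above, here is a minimal sketch for a single hypothetical flight. The passenger count and distance are made-up illustrative values, not figures from the dataset.

```r
# Hypothetical single flight (illustrative numbers, not from the dataset)
paying_passengers <- 120
distance_miles    <- 1000

# RPM = number of paying passengers x distance flown
rpm_flight <- paying_passengers * distance_miles
rpm_flight  # 120000 revenue passenger miles
```

Summing this quantity over every flight in a month gives a monthly figure of the kind stored in the `RPM` column.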

In [12]:
rpm2 <- df %>%
            transmute(YoY = (RPM - lag(RPM, 12)) / lag(RPM, 12) * 100) %>%
            ts(start=c(2003,1),freq=12)

rpm2 %>%
    dygraph(main = "Revenue Passenger Miles: Year-Over-Year Change", ylab = "Year-Over-Year Growth (%yoy)") %>%
    dySeries("YoY", label = "YoY") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyOptions(fillGraph = TRUE, drawPoints = TRUE, pointSize = 2.5) %>%
    dyShading(from = "2007-12-1", to = "2009-6-1", color = "#FAF0F0") %>%
    dyEvent("2009-6-1") %>%
    dyEvent("2007-12-1", "The Great Recession", labelLoc = "bottom") %>%
    dyEvent("2020-1-1", "COVID-19 Pandemic", labelLoc = "bottom") %>%
    dyRangeSelector(dateWindow = c("2018-01-01", "2023-12-01")) %>%
    dyCrosshair(direction = "vertical")

* Due to large-scale COVID-19 lockdowns, RPM shrank by 96.62%yoy in April 2020.
* March 2020 had already seen a significant 52.44%yoy fall, and the decline deepened in April.
* The year-over-year growth of 1538.25% in RPM in April 2021 indicates a significant recovery in air travel compared to April 2020, which was one of the months most severely affected by global lockdowns and travel restrictions.
* However, this does not necessarily imply that air travel had returned to pre-pandemic levels. The growth rate is inflated because the comparison base (April 2020) was exceptionally low, so even a modest rise in travel produces a high growth rate.
* The reasons behind this recovery are likely multifaceted: eased travel restrictions, higher vaccination rates, pent-up demand for travel, or improvements in airlines’ COVID-19 safety procedures that gave passengers more confidence to fly.
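
The base effect described above can be made concrete with a little arithmetic. The sketch below defines a small base-R helper for a 12-month year-over-year change (`yoy_pct` is a hypothetical name, equivalent in spirit to the `lag()`-based `transmute()` in the cell above) and then chains the two April growth rates quoted in the text to estimate where April 2021 stood relative to April 2019.

```r
# Year-over-year percentage change for a monthly series (base R only)
yoy_pct <- function(x, k = 12) {
  prev <- c(rep(NA, k), head(x, -k))  # value from k periods earlier
  (x - prev) / prev * 100
}

# Toy series: a flat year at 100 followed by a flat year at 110
toy <- c(rep(100, 12), rep(110, 12))
yoy_pct(toy)  # first 12 values are NA, the rest are 10

# Base effect: chain the two April growth rates quoted in the text
drop_2020    <- -96.62    # April 2020 vs April 2019, in percent
rebound_2021 <- 1538.25   # April 2021 vs April 2020, in percent

# April 2021 RPM as a share of April 2019 RPM
level_vs_2019 <- (1 + drop_2020 / 100) * (1 + rebound_2021 / 100)
round(level_vs_2019 * 100, 1)  # roughly 55.4 (% of the April 2019 level)
```

If both quoted rates are taken at face value, April 2021 RPM was only a little over half of its April 2019 level despite the four-digit growth rate.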

## **Available Seat Miles (ASM)**<a class="anchor"  id="9"></a>  [↑](#0)

In [13]:
asm <- df %>% 
        subset(,12:14) %>% 
        ts(start=c(2003,1),freq=12)

asm %>%
    dygraph(main = "Available Seat Miles for All U.S. Carriers, 2003-2023", ylab = "Thousands (000)") %>%
    dySeries("Dom_ASM", label = "Domestic") %>%
    dySeries("Int_ASM", label = "International") %>%
    dySeries("ASM", label = "Total") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyOptions(labelsKMB = TRUE) %>%
    dyShading(from = "2007-12-1", to = "2009-6-1", color = "#FAF0F0") %>%
    dyEvent("2009-6-1") %>%
    dyEvent("2007-12-1", "The Great Recession", labelLoc = "bottom") %>%
    dyEvent("2020-1-1", "COVID-19 Pandemic", labelLoc = "bottom") %>%
    dyRangeSelector() %>%
    dyCrosshair(direction = "vertical")

* `Available seat miles (ASM)` measures the total flight passenger capacity of an airline, which is calculated by multiplying the number of seats available by the number of miles traveled by a given airplane.
* It is an important measure for investors to evaluate the capability of airlines to generate revenues from the availability of seats to customers.
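
To illustrate the definition, the sketch below computes ASM for a single hypothetical flight and compares it with the passenger miles actually sold on that flight. The seat count, passenger count, and distance are illustrative assumptions, not values from the dataset.

```r
# Hypothetical single flight (illustrative numbers, not from the dataset)
seats_available   <- 150
paying_passengers <- 120
distance_miles    <- 1000

asm_flight <- seats_available * distance_miles     # capacity offered
rpm_flight <- paying_passengers * distance_miles   # capacity actually sold

c(ASM = asm_flight, RPM = rpm_flight)
```

ASM depends only on seats and distance, so it measures supply, while RPM on the same flight measures how much of that supply was purchased.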

In [14]:
asm2 <- df %>%
            transmute(YoY = (ASM - lag(ASM, 12)) / lag(ASM, 12) * 100) %>%
            ts(start=c(2003,1),freq=12)

asm2 %>%
    dygraph(main = "Available Seat Miles: Year-Over-Year Change", ylab = "Year-Over-Year Growth (%yoy)") %>%
    dySeries("YoY", label = "YoY") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyOptions(fillGraph = TRUE, drawPoints = TRUE, pointSize = 2.5) %>%
    dyShading(from = "2007-12-1", to = "2009-6-1", color = "#FAF0F0") %>%
    dyEvent("2009-6-1") %>%
    dyEvent("2007-12-1", "The Great Recession", labelLoc = "bottom") %>%
    dyEvent("2020-1-1", "COVID-19 Pandemic", labelLoc = "bottom") %>%
    dyRangeSelector(dateWindow = c("2018-01-01", "2023-12-01")) %>%
    dyCrosshair(direction = "vertical")

* ASM contracted by a dramatic 79.33%yoy in April 2020, which is more than triple the negative year-on-year growth in March 2020 (-21.54%yoy).
* This huge contraction was a direct result of the global travel restrictions and lockdown measures enforced in response to the COVID-19 pandemic. 
* With fewer people traveling, airlines had to significantly reduce their flight schedules, leading to a steep drop in ASM.
* Again, April 2020 was a period of severe global travel restrictions due to the onset of the pandemic, leading to a drastic reduction in flights. Therefore, the base for comparison (ASM in April 2020) was very low. A more than 200% increase in April 2021 from this low base could still be far from the pre-pandemic levels of ASM.
* Since RPM dropped faster than ASM, a sharp fall in load factor was expected as well.
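
The last point can be quantified. Assuming load factor is RPM divided by ASM (the standard definition, and consistent with the June 2022 row printed later in this notebook), the year-over-year change in load factor follows from the two April 2020 rates quoted in the text. This is a rough sketch: the inputs are rounded percentages, so the result is approximate.

```r
# April 2020 year-over-year changes quoted in the text, in percent
rpm_yoy <- -96.62
asm_yoy <- -79.33

# Load factor = RPM / ASM, so its growth factor is the ratio of the two
lf_ratio <- (1 + rpm_yoy / 100) / (1 + asm_yoy / 100)
lf_yoy   <- (lf_ratio - 1) * 100

round(lf_yoy, 2)  # close to -83.65
```

Because demand (RPM) fell to about 3% of its prior-year level while capacity (ASM) fell to about 21%, the implied load factor fell by roughly 84% year over year.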

## **Load Factor**<a class="anchor"  id="10"></a>  [↑](#0)

In [15]:
# Calculate mean for load factor before COVID-19
mean <- df %>% 
        filter(Year <= 2019) %>% # Year is numeric, so compare with a number
        summarise(mean = mean(LF, na.rm = TRUE))

# Calculate standard deviation for load factor before COVID-19
std <- df %>%
        filter(Year <= 2019) %>% # Year is numeric, so compare with a number
        summarise(std = sd(LF, na.rm = TRUE))

print(mean)

      mean
1 80.91397


In [16]:
lf <- df %>% 
        subset(,15:17) %>% 
        ts(start=c(2003,1),freq=12)

lf %>%
    dygraph(main = "Passenger Load Factor for All U.S. Carriers, 2003-2023") %>%
    dySeries("Dom_LF", label = "Domestic") %>%
    dySeries("Int_LF", label = "International") %>%
    dySeries("LF", label = "Total") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyAxis("y", label = "Percent (%)", valueRange = c(10, 100)) %>%
    dyShading(axis = "y", from = mean$mean - std$std, to = mean$mean + std$std) %>%
    dyShading(from = "2007-12-1", to = "2009-6-1", color = "#FAF0F0") %>%
    dyEvent("2009-6-1") %>%
    dyEvent("2007-12-1", "The Great Recession", labelLoc = "bottom") %>%
    dyEvent("2020-1-1", "COVID-19 Pandemic", labelLoc = "bottom") %>%
    dyOptions(axisLineWidth = 1.5) %>%
    dyRangeSelector() %>%
    dyCrosshair(direction = "vertical")

* `Load factor` serves as a gauge to measure the percentage of total seat availability that is occupied by passengers.
* High load factors suggest that most seats have been sold, which allows the airline to spread its fixed costs across more passengers and can increase profitability.
* The average load factor prior to the COVID-19 pandemic was 80.91%, indicating that for every 100 seats available on the airline’s flights, about 81 seats were occupied by passengers. The remaining 19.09% were unsold seats.
* In April 2020, due to a rapid decrease in demand compared to capacity, the load factor dropped to a record low of 13.83%, suggesting that for every 100 seats available on the airline’s flights, only about 14 seats were occupied by passengers.
* This is a clear indication that the challenges facing the aviation industry during the COVID-19 pandemic have significantly reduced travel demand.

In [17]:
df[which.max(df$LF), ]

| | Year | Month | Dom_Pax | Int_Pax | Pax | Dom_Flt | Int_Flt | Flt | Dom_RPM | Int_RPM | RPM | Dom_ASM | Int_ASM | ASM | Dom_LF | Int_LF | LF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 234 | 2022 | 6 | 67368670 | 10023468 | 77392138 | 640218 | 67900 | 708118 | 63343854 | 24276114 | 87619968 | 70415408 | 27883747 | 98299154 | 89.96 | 87.06 | 89.14 |


* However, the load factor recovered strongly and reached a record high of 89.14% in June 2022.
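
The load factor columns can be cross-checked against RPM and ASM. The sketch below recomputes the three June 2022 load factors from the RPM and ASM values in the row printed above, assuming load factor is RPM divided by ASM expressed as a percentage. The vector names are illustrative.

```r
# Values copied from the June 2022 row shown above
rpm_jun22 <- c(Domestic = 63343854, International = 24276114, Total = 87619968)
asm_jun22 <- c(Domestic = 70415408, International = 27883747, Total = 98299154)

# Load factor = RPM / ASM, as a percentage
lf_jun22 <- rpm_jun22 / asm_jun22 * 100

round(lf_jun22, 2)  # 89.96, 87.06, 89.14: matches Dom_LF, Int_LF and LF
```

The recomputed values agree with the `Dom_LF`, `Int_LF`, and `LF` columns, which supports reading those columns as RPM over ASM.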

In [18]:
lf <- df %>%
            transmute(YoY = (LF - lag(LF, 12)) / lag(LF, 12) * 100) %>%
            ts(start=c(2003,1),freq=12)

lf %>%
    dygraph(main = "Passenger Load Factor: Year-Over-Year Change", ylab = "Year-Over-Year Growth (%yoy)") %>%
    dySeries("YoY", label = "YoY") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyOptions(fillGraph = TRUE, drawPoints = TRUE, pointSize = 2.5) %>%
    dyShading(from = "2007-12-1", to = "2009-6-1", color = "#FAF0F0") %>%
    dyEvent("2009-6-1") %>%
    dyEvent("2007-12-1", "The Great Recession", labelLoc = "bottom") %>%
    dyEvent("2020-1-1", "COVID-19 Pandemic", labelLoc = "bottom") %>%
    dyRangeSelector(dateWindow = c("2018-01-01", "2023-12-01")) %>%
    dyCrosshair(direction = "vertical")

* With demand declining faster than capacity, the load factor shrank by a dramatic 83.66%yoy in April 2020, more than double the year-on-year decline in March 2020 (-39.37%yoy).
* This may be because U.S. airlines cut capacity less sharply than demand fell.
* Year-over-year load factor growth has stayed below 10% since February 2023, which could indicate that passenger demand has recovered relative to the available capacity in the airline industry.

# **Final Note**<a class="anchor"  id="11"></a>  [↑](#0)

In [19]:
pax2 <- df %>% 
            group_by(Year) %>% 
            summarize(Pax = sum(Pax)) %>% 
            subset(,2) %>% 
            ts(start = 2003)

pax2 %>%
    dygraph(main = "U.S. Yearly Air Carrier Traffic, 2003-2023", ylab = "Number of Passengers") %>%
    dySeries("Pax", label = "Total") %>%
    dyAxis("x", drawGrid = FALSE) %>%
    dyLegend(show = "follow") %>%
    dyOptions(fillGraph = TRUE, drawPoints = TRUE, pointSize = 1.5, labelsKMB = TRUE) %>%
    dyHighlight(highlightCircleSize = 3, 
                highlightSeriesBackgroundAlpha = 0.75,
                hideOnMouseOut = TRUE) %>%
    dyRangeSelector() %>% 
    dyUnzoom() %>%
    dyCrosshair(direction = "vertical")

Overall, U.S. air traffic has shown signs of recovery. U.S. airlines carried 186.66 million more passengers in 2022 (852.81 million) than in 2021 (666.15 million), a 28.02% increase year-over-year. In the first three quarters of 2023, U.S. airlines carried 702.45 million passengers.
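
The year-over-year figures above can be reproduced from the two annual totals quoted in the text. This is a minimal arithmetic sketch with illustrative variable names.

```r
# Annual passenger totals quoted in the text, in millions
pax_2021 <- 666.15
pax_2022 <- 852.81

increase <- pax_2022 - pax_2021        # absolute change, in millions
pct      <- increase / pax_2021 * 100  # relative change, in percent

round(c(increase_millions = increase, pct_change = pct), 2)  # 186.66, 28.02
```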

However, it’s important to note that these numbers are still below the pre-pandemic levels of 2019. The recovery of the aviation industry is impacted by several factors, including the pace of COVID-19 vaccinations, government policies and passenger confidence in air travel.

Thanks for reading, and please upvote if you liked it.