<a href="https://colab.research.google.com/github/etrahadias/etrahadias.github.io/blob/main/Trahadias_CS620_DataProject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![Travel](https://drive.google.com/uc?export=view&id=12ADuKga-IdVDh88RBrBmfocYSq6qtfdp)



# **Life After Lockdown: An Analysis of COVID-19's Impact on Travel Trends**

**Name:** Elizabeth Trahadias

**Email:** etrah001@odu.edu

**Portfolio:** https://etrahadias.github.io/


### **Abstract**

In March 2020, the world temporarily shut down due to the COVID-19 pandemic. People spent a lot of time at home, businesses closed, holidays were celebrated via Zoom, and students and employees learned to work from home. Travel was another major aspect of life affected by the pandemic.

I want to explore how travel trends have evolved from 2019 to the present. I am curious to see whether there is a relationship between the number of COVID-19 cases, the popularity of certain travel destinations, and the number of airline passengers.


#### **Goals**

The objectives of this project are to:

* Merge and clean multiple data sets
* Perform exploratory data analysis
* Practice data wrangling
* Analyze U.S. travel trends in 2019 versus during the COVID-19 pandemic (e.g., most popular airlines, common destinations and travel months, number of people traveling, and total number of scheduled flights)
* Determine whether there is a relationship between the number of positive COVID-19 cases and the number of people traveling on airplanes
* Create visualizations that display popular U.S. travel destinations and airlines before and during the pandemic
* Determine the impact of COVID-19 on airlines' profitability in the U.S. (TBD if this will be explored)
* Develop a model that predicts travel trends in the future (TBD if this will--or even can be--explored)

#### **Data Sources**

Four data sets are used to investigate the objectives above. The first three are from the [Bureau of Transportation Statistics](https://www.transtats.bts.gov/Tables.asp?QO_VQ=EED&QO_anzr=Nv4%FDPn44vr4%FDf6n6v56vp5%FD%FLS14z%FDHE%FDg4nssvp%FM-%FD%FDh.f.%FDPn44vr45&QO_fu146_anzr=Nv4%FDPn44vr45). They include information such as the number of airline passengers and the destinations of the flights, along with the month of each flight. Since each CSV file represents one year (2019, 2020, and 2021), I will need to merge the three data sets into one for ease of analysis. A description of the variables in this data set can be found [here](https://www.transtats.bts.gov/TableInfo.asp?gnoyr_VQ=FIL&QO_fu146_anzr=Nv4%20Pn44vr45&V0s1_b0yB=D).
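
One possible way to combine the yearly files is sketched below. It is only an illustration: the `*_demo` frames are made-up stand-ins that mimic a few of the real columns (`PASSENGERS`, `DEST`, `YEAR`, `MONTH`), and it assumes all three files share the same column layout.

```python
import pandas as pd

# Hypothetical stand-ins for the three yearly files (values are made up).
# In the notebook, these frames would come from pd.read_csv instead.
airline19_demo = pd.DataFrame(
    {"PASSENGERS": [120.0, 80.0], "DEST": ["ANC", "IAD"], "YEAR": [2019, 2019], "MONTH": [1, 2]}
)
airline20_demo = pd.DataFrame(
    {"PASSENGERS": [30.0], "DEST": ["ANC"], "YEAR": [2020], "MONTH": [4]}
)
airline21_demo = pd.DataFrame(
    {"PASSENGERS": [95.0], "DEST": ["SJU"], "YEAR": [2021], "MONTH": [2]}
)

# Stack the yearly frames row-wise; ignore_index=True rebuilds a clean 0..n-1 index.
airline_all = pd.concat(
    [airline19_demo, airline20_demo, airline21_demo], ignore_index=True
)

print(airline_all.shape)
print(airline_all.groupby("YEAR")["PASSENGERS"].sum())
```

Because each file already carries its own `YEAR` column, the combined frame can still be split or grouped by year after stacking.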

The fourth data set, which contains the COVID-19 case information, is from the [CDC](https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36/data), and it is relatively messy. It appears the data was entered or submitted by users, and there are many missing and inconsistent values.

Since the airplane flight data is broken down by month and the CDC data is broken down by day, I will have to clean the CDC data, organize it by month, and merge it with a modified version of the airline data to show the relationship between the number of COVID-19 cases and the number of people traveling per month.
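
The daily-to-monthly step could look roughly like the sketch below. It uses small made-up frames rather than the real files, borrows the CDC column names `submission_date` and `new_case`, and assumes dates are stored as `MM/DD/YYYY` strings. The `*_demo` names are hypothetical.

```python
import pandas as pd

# Toy daily CDC-style rows (column names follow the CDC file; values are made up).
covid_demo = pd.DataFrame({
    "submission_date": ["03/30/2020", "03/31/2020", "04/01/2020", "04/02/2020"],
    "state": ["VA", "NY", "VA", "NY"],
    "new_case": [10, 20, 30, 40],
})

# Toy monthly airline rows (values are made up).
airline_demo = pd.DataFrame({
    "YEAR": [2020, 2020],
    "MONTH": [3, 4],
    "PASSENGERS": [1000.0, 200.0],
})

# Parse the date strings, then derive YEAR and MONTH keys that match the airline data.
dates = pd.to_datetime(covid_demo["submission_date"], format="%m/%d/%Y")
covid_demo["YEAR"] = dates.dt.year.astype("int64")
covid_demo["MONTH"] = dates.dt.month.astype("int64")

# Sum daily new cases across all states for each month.
covid_monthly = covid_demo.groupby(["YEAR", "MONTH"], as_index=False)["new_case"].sum()

# Sum passengers per month, then join the two monthly tables on the shared keys.
passengers_monthly = airline_demo.groupby(["YEAR", "MONTH"], as_index=False)["PASSENGERS"].sum()
monthly = passengers_monthly.merge(covid_monthly, on=["YEAR", "MONTH"], how="left")

print(monthly)
```

A left join is used here so that months with airline data but no COVID-19 rows (for example, all of 2019) are kept with a missing case count rather than dropped. Whether that is the right choice depends on the analysis.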

*TBD - I may try to use the data set below to explore COVID-19's impact on airline profitability.*

* Possible data source for airline profitability: https://www.transtats.bts.gov/Data_Elements_Financial.aspx?Qn6n=K

### **Importing Data**


In [None]:
import pandas as pd

In [None]:
# Mount Google Drive so the data files can be read from it
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Read in primary data sets with Pandas
# These data sets show airline information (number of passengers, destinations, etc.)
airline19 = pd.read_csv("/content/drive/My Drive/CS 620/Project/2019Airline.csv")
airline20 = pd.read_csv("/content/drive/My Drive/CS 620/Project/2020Airline.csv")
airline21 = pd.read_csv("/content/drive/My Drive/CS 620/Project/2021Airline.csv")

In [None]:
# Show the first few lines of data for the 2019 airline data
airline19.head()

Unnamed: 0,PASSENGERS,FREIGHT,MAIL,DISTANCE,UNIQUE_CARRIER,AIRLINE_ID,UNIQUE_CARRIER_NAME,UNIQUE_CARRIER_ENTITY,REGION,CARRIER,CARRIER_NAME,CARRIER_GROUP,CARRIER_GROUP_NEW,ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,ORIGIN_CITY_MARKET_ID,ORIGIN,ORIGIN_CITY_NAME,ORIGIN_STATE_ABR,ORIGIN_STATE_FIPS,ORIGIN_STATE_NM,ORIGIN_WAC,DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID,DEST,DEST_CITY_NAME,DEST_STATE_ABR,DEST_STATE_FIPS,DEST_STATE_NM,DEST_WAC,YEAR,QUARTER,MONTH,DISTANCE_GROUP,CLASS,Unnamed: 36
0,0.0,0.0,0.0,476.0,C5,20445,"Commutair Aka Champlain Enterprises, Inc.",6944,D,C5,"Commutair Aka Champlain Enterprises, Inc.",1,6,12339,1233904,32337,IND,"Indianapolis, IN",IN,18,Indiana,42,12264,1226402,30852,IAD,"Washington, DC",VA,51,Virginia,38,2019,3,9,1,F,
1,0.0,0.0,0.0,507.0,C5,20445,"Commutair Aka Champlain Enterprises, Inc.",6944,D,C5,"Commutair Aka Champlain Enterprises, Inc.",1,6,12339,1233904,32337,IND,"Indianapolis, IN",IN,18,Indiana,42,13230,1323002,32070,MDT,"Harrisburg, PA",PA,42,Pennsylvania,23,2019,3,9,2,F,
2,0.0,0.0,0.0,215.0,C5,20445,"Commutair Aka Champlain Enterprises, Inc.",6944,D,C5,"Commutair Aka Champlain Enterprises, Inc.",1,6,12397,1239702,32397,ITH,"Ithaca/Cortland, NY",NY,36,New York,22,10785,1078502,30785,BTV,"Burlington, VT",VT,50,Vermont,16,2019,3,9,1,F,
3,0.0,0.0,0.0,613.0,C5,20445,"Commutair Aka Champlain Enterprises, Inc.",6944,D,C5,"Commutair Aka Champlain Enterprises, Inc.",1,6,12451,1245102,31136,JAX,"Jacksonville, FL",FL,12,Florida,33,11193,1119302,33105,CVG,"Cincinnati, OH",KY,21,Kentucky,52,2019,3,9,2,F,
4,0.0,0.0,0.0,1304.0,C5,20445,"Commutair Aka Champlain Enterprises, Inc.",6944,D,C5,"Commutair Aka Champlain Enterprises, Inc.",1,6,13244,1324402,33244,MEM,"Memphis, TN",TN,47,Tennessee,54,10581,1058102,30581,BGR,"Bangor, ME",ME,23,Maine,12,2019,3,9,3,F,


In [None]:
# Show the first few lines of data for the 2020 airline data
airline20.head()

Unnamed: 0,PASSENGERS,FREIGHT,MAIL,DISTANCE,UNIQUE_CARRIER,AIRLINE_ID,UNIQUE_CARRIER_NAME,UNIQUE_CARRIER_ENTITY,REGION,CARRIER,CARRIER_NAME,CARRIER_GROUP,CARRIER_GROUP_NEW,ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,ORIGIN_CITY_MARKET_ID,ORIGIN,ORIGIN_CITY_NAME,ORIGIN_STATE_ABR,ORIGIN_STATE_FIPS,ORIGIN_STATE_NM,ORIGIN_WAC,DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID,DEST,DEST_CITY_NAME,DEST_STATE_ABR,DEST_STATE_FIPS,DEST_STATE_NM,DEST_WAC,YEAR,QUARTER,MONTH,DISTANCE_GROUP,CLASS,Unnamed: 36
0,0.0,165.0,3641.0,373.0,KO,20341,Alaska Central Express,6019,D,KO,Alaska Central Express,1,5,15991,1599102,35991,YAK,"Yakutat, AK",AK,2,Alaska,1,10299,1029906,30299,ANC,"Anchorage, AK",AK,2,Alaska,1,2020,1,1,1,G,
1,0.0,751.0,161.0,557.0,KO,20341,Alaska Central Express,6019,D,KO,Alaska Central Express,1,5,10299,1029906,30299,ANC,"Anchorage, AK",AK,2,Alaska,1,14738,1473802,34738,SDP,"Sandpoint, AK",AK,2,Alaska,1,2020,1,1,2,L,
2,0.0,0.0,0.0,385.0,AN,21894,"ADVANCED AIR, LLC",1229,D,AN,"ADVANCED AIR, LLC",1,6,12127,1212702,32575,HHR,"Hawthorne, CA",CA,6,California,91,15232,1523201,35232,TKF,"Truckee, CA",CA,6,California,91,2020,1,1,1,F,
3,0.0,0.0,0.0,385.0,AN,21894,"ADVANCED AIR, LLC",1229,D,AN,"ADVANCED AIR, LLC",1,6,15232,1523201,35232,TKF,"Truckee, CA",CA,6,California,91,12127,1212702,32575,HHR,"Hawthorne, CA",CA,6,California,91,2020,1,1,1,F,
4,0.0,0.0,0.0,1624.0,27Q,21652,"Jet Aviation Flight Services, Inc.",11046,I,27Q,"Jet Aviation Flight Services, Inc.",1,1,12197,1219702,31703,HPN,"White Plains, NY",NY,36,New York,22,14843,1484306,34819,SJU,"San Juan, PR",PR,72,Puerto Rico,3,2020,1,1,4,L,


In [None]:
# Show the first few lines of data for the 2021 airline data
airline21.head()

Unnamed: 0,PASSENGERS,FREIGHT,MAIL,DISTANCE,UNIQUE_CARRIER,AIRLINE_ID,UNIQUE_CARRIER_NAME,UNIQUE_CARRIER_ENTITY,REGION,CARRIER,CARRIER_NAME,CARRIER_GROUP,CARRIER_GROUP_NEW,ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,ORIGIN_CITY_MARKET_ID,ORIGIN,ORIGIN_CITY_NAME,ORIGIN_STATE_ABR,ORIGIN_STATE_FIPS,ORIGIN_STATE_NM,ORIGIN_WAC,DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID,DEST,DEST_CITY_NAME,DEST_STATE_ABR,DEST_STATE_FIPS,DEST_STATE_NM,DEST_WAC,YEAR,QUARTER,MONTH,DISTANCE_GROUP,CLASS,Unnamed: 36
0,0.0,732068.0,0.0,196.0,5X,19917,United Parcel Service,6910,D,5X,United Parcel Service,3,3,14730,1473004,33044,SDF,"Louisville, KY",KY,21,Kentucky,52,11823,1182304,31823,FWA,"Fort Wayne, IN",IN,18,Indiana,42,2021,1,2,1,G,
1,0.0,619050.0,0.0,1630.0,5X,19917,United Parcel Service,6910,D,5X,United Parcel Service,3,3,14730,1473004,33044,SDF,"Louisville, KY",KY,21,Kentucky,52,10713,1071302,30713,BOI,"Boise, ID",ID,16,Idaho,83,2021,1,2,4,G,
2,0.0,3966220.0,272048.0,942.0,5X,19917,United Parcel Service,6910,D,5X,United Parcel Service,3,3,14730,1473004,33044,SDF,"Louisville, KY",KY,21,Kentucky,52,14683,1468305,33214,SAT,"San Antonio, TX",TX,48,Texas,74,2021,1,2,2,G,
3,0.0,998701.0,0.0,473.0,5X,19917,United Parcel Service,6910,D,5X,United Parcel Service,3,3,14730,1473004,33044,SDF,"Louisville, KY",KY,21,Kentucky,52,12448,1244807,32448,JAN,"Jackson/Vicksburg, MS",MS,28,Mississippi,53,2021,1,2,1,G,
4,0.0,882144.0,0.0,1102.0,5X,19917,United Parcel Service,6910,D,5X,United Parcel Service,3,3,14730,1473004,33044,SDF,"Louisville, KY",KY,21,Kentucky,52,13256,1325602,33256,MFE,"Mission/McAllen/Edinburg, TX",TX,48,Texas,74,2021,1,2,3,G,


In [None]:
# Read in secondary data set with Pandas
# This is data from the CDC that shows COVID-19 case information
covid = pd.read_csv("/content/drive/My Drive/CS 620/Project/COVID.csv")

In [None]:
# Show the first few lines of data for the COVID-19 case information by state per day
covid.head()

Unnamed: 0,submission_date,state,tot_cases,conf_cases,prob_cases,new_case,pnew_case,tot_death,conf_death,prob_death,new_death,pnew_death,created_at,consent_cases,consent_deaths
0,09/01/2021,ND,118491,107475.0,11016.0,536,66,1562,,,1,0,09/02/2021 01:49:05 PM,Agree,Not agree
1,02/02/2021,IL,1130917,1130917.0,0.0,2304,0,21336,19306.0,2030.0,63,16,02/03/2021 02:55:58 PM,Agree,Agree
2,02/02/2021,MS,280182,176228.0,103954.0,1059,559,6730,4739.0,1991.0,13,7,02/04/2021 12:00:00 AM,Agree,Agree
3,05/03/2020,NH,2518,,,89,0,86,,,2,0,05/04/2020 10:49:24 PM,Not agree,Not agree
4,07/31/2020,ND,6602,6602.0,0.0,133,0,103,,,0,0,08/01/2020 02:38:12 PM,Agree,Not agree
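
The preview above already shows blank entries in columns such as `conf_cases` and `conf_death`. A quick way to gauge how much is missing is sketched below on a small frame built from three of the preview rows. `covid_demo` is a hypothetical name, and the same calls could be run on the full `covid` frame.

```python
import numpy as np
import pandas as pd

# A few values copied from the preview rows (ND, IL, NH); blanks become NaN.
covid_demo = pd.DataFrame({
    "state": ["ND", "IL", "NH"],
    "tot_cases": [118491, 1130917, 2518],
    "conf_cases": [107475.0, 1130917.0, np.nan],
    "conf_death": [np.nan, 19306.0, np.nan],
})

# Number of missing values in each column.
missing_counts = covid_demo.isna().sum()

# Fraction of rows that are missing in each column.
missing_share = covid_demo.isna().mean()

print(missing_counts)
print(missing_share)
```

Columns with a high share of missing values may be better dropped or ignored than filled in. For the monthly analysis, `new_case` is the column that matters most.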


### **References**

* https://www.marktechpost.com/2019/06/07/how-to-connect-google-colab-with-google-drive/
* https://colab.research.google.com/notebooks/io.ipynb#scrollTo=RWSJpsyKqHjH
* https://www.reddit.com/r/GoogleColab/comments/k6qpff/using_an_image_located_in_google_drive_in_colab/