# Final Project: Flight Price Prediction

According to the Bureau of Transportation Statistics, 388 million passengers traveled through U.S. airports in 2020, 658 million in 2021, and over 853 million in 2022. The number of travelers has increased each year over this period as the global aviation industry has expanded and demand for tourism has accelerated. However, the high cost of air travel means that not everyone can afford to fly. We hope to give potential passengers and airlines a clearer picture of market demand and the price of air travel.


In this project we aim to answer the question:
     Can we find the cheapest flight price given certain flight criteria?
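
To make the question concrete, here is a minimal sketch of what "cheapest flight given certain criteria" could look like as a plain lookup, before any prediction is involved. The helper name `cheapest_flights` and the toy fares are hypothetical and only for illustration; the column names are the ones we select from the dataset later in this notebook.

```python
import pandas as pd

def cheapest_flights(flights, origin, destination, flight_date, n=1):
    """Return the n rows with the lowest baseFare that match the given criteria.

    Hypothetical helper for illustration; not part of the project pipeline.
    """
    mask = (
        (flights["startingAirport"] == origin)
        & (flights["destinationAirport"] == destination)
        & (flights["flightDate"] == flight_date)
    )
    return flights.loc[mask].nsmallest(n, "baseFare")

# Toy data: the fares below are made up and do not come from the dataset.
toy_flights = pd.DataFrame({
    "flightDate": ["2022-04-17", "2022-04-17", "2022-04-17", "2022-04-18"],
    "startingAirport": ["LAX", "LAX", "BOS", "LAX"],
    "destinationAirport": ["EWR", "EWR", "ORD", "EWR"],
    "segmentsAirlineName": ["Alaska Airlines", "United", "JetBlue Airways", "United"],
    "baseFare": [310.0, 250.0, 120.0, 199.0],
})

best = cheapest_flights(toy_flights, "LAX", "EWR", "2022-04-17")
print(best[["segmentsAirlineName", "baseFare"]])
```

A lookup like this only works for itineraries that already appear in the data, which is why the rest of the project looks at how price varies with the criteria.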

## Part 1: Exploratory Data Analysis

First, we import the libraries needed for data analysis and preprocess our [Kaggle dataset](https://www.kaggle.com/datasets/dilwong/flightprices).

In [40]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# include imports for machine learning here

# to create the complex graph network
import networkx as nx
# use pyvis to generate better-looking graphs
# from pyvis.network import Network

# chunk the data: use a predetermined chunking pattern before the dataset is loaded in
columns = ['flightDate', 'segmentsAirlineName', 'baseFare', 'startingAirport', 'destinationAirport']


chunksize = 10000
# with chunksize set, read_csv returns an iterator of DataFrames instead of loading the whole file
flight_price_dataset = pd.read_csv("data/itineraries.csv", chunksize=chunksize)
# take only the first chunk and shuffle its rows reproducibly
week_analysis_data = next(flight_price_dataset)
week_analysis_data = week_analysis_data.sample(frac=1, random_state=42)
display(week_analysis_data)
# keep only the columns needed for the analysis of weekend vs. weekday prices
week_analysis_data = week_analysis_data[columns]
# display(week_analysis_data)

# drop many rows,
# or process the analysis in chunks and then aggregate the data

# TODO
#     grab airline, day of journey, and flight price -> these parameters will change based on destination



Unnamed: 0,legId,searchDate,flightDate,startingAirport,destinationAirport,fareBasisCode,travelDuration,elapsedDays,isBasicEconomy,isRefundable,...,segmentsArrivalTimeEpochSeconds,segmentsArrivalTimeRaw,segmentsArrivalAirportCode,segmentsDepartureAirportCode,segmentsAirlineName,segmentsAirlineCode,segmentsEquipmentDescription,segmentsDurationInSeconds,segmentsDistance,segmentsCabinCode
6252,42ad9a7ee89845a6f1ad24fb2752d7b7,2022-04-16,2022-04-17,OAK,SFO,QA0QA0MQ,PT7H51M,0,False,False,...,1650209700||1650232860,2022-04-17T08:35:00.000-07:00||2022-04-17T15:0...,LAX||SFO,OAK||LAX,Delta||United,DL||UA,Embraer 175 (Enhanced Winglets)||Boeing 737-800,5100||5760,338||339,coach||coach
4684,15beafefec9f073253a15565deb36bd1,2022-04-16,2022-04-17,LAX,DTW,LH0OAVMN,PT11H55M,1,False,False,...,1650279600||1650303180,2022-04-18T07:00:00.000-04:00||2022-04-18T13:3...,EWR||DTW,LAX||EWR,Alaska Airlines||United,AS||UA,Boeing 737-900||Embraer 170,19320||7380,2458||485,coach||coach
1731,3f14d949df3493ff68a20427172bb153,2022-04-16,2022-04-17,DEN,EWR,VAA0AKEN,PT8H55M,0,False,False,...,1650218640||1650226740||1650243540,2022-04-17T13:04:00.000-05:00||2022-04-17T15:1...,DFW||IAH||EWR,DEN||DFW||IAH,United||United||United,UA||UA||UA,Airbus A319||Boeing 737 MAX 8||Boeing 737 MAX 8,7200||4320||13080,650||233||1419,coach||coach||coach
4742,9e4264d5d6db5cb271884f97f82c3f7c,2022-04-16,2022-04-17,LAX,EWR,LH0OAVMN,PT5H22M,1,False,False,...,1650279600,2022-04-18T07:00:00.000-04:00,EWR,LAX,Alaska Airlines,AS,Boeing 737-900,19320,2458,coach
4521,f2413f98f5726d05947eacd0402be0ce,2022-04-16,2022-04-17,LAX,DEN,KA0NA0MQ,PT7H8M,0,False,False,...,1650225720||1650244680,2022-04-17T14:02:00.000-06:00||2022-04-17T19:1...,SLC||DEN,LAX||SLC,Delta||Delta,DL||DL,Boeing 737-900||Airbus A321,6720||4980,590||380,coach||coach
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5734,0e94753b40077240ffebe3f246249708,2022-04-16,2022-04-17,MIA,DTW,HA0OA0MQ,PT3H2M,0,False,False,...,1650245340,2022-04-17T21:29:00.000-04:00,DTW,MIA,Delta,DL,Boeing 737-900,10920,1153,coach
5191,f29117f1e75f16010f8652a4a5673420,2022-04-16,2022-04-17,LGA,ATL,EAA0OKEN,PT4H28M,0,False,False,...,1650226080||1650236760,2022-04-17T16:08:00.000-04:00||2022-04-17T19:0...,IAD||ATL,LGA||IAD,United||United,UA||UA,Embraer 175 (Enhanced Winglets)||Embraer 170,5400||7260,221||541,coach||coach
5390,9ee183e46e9a92d6e0cf030b311c3c4a,2022-04-16,2022-04-17,LGA,IAD,QA0NA0MQ,PT5H47M,0,False,False,...,1650230460||1650244020,2022-04-17T17:21:00.000-04:00||2022-04-17T21:0...,CMH||IAD,LGA||CMH,Delta||United,DL||UA,Embraer 175||Embraer 175 (Enhanced Winglets),7260||4260,472||311,coach||coach
860,af37cef7a840e30dddf1273d1a49002f,2022-04-16,2022-04-17,BOS,ORD,V0AJZNN3,PT2H58M,0,False,False,...,1650247980,2022-04-17T21:13:00.000-05:00,ORD,BOS,JetBlue Airways,B6,Embraer 175,10680,862,coach
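
The comments in the cell above mention processing the analysis in chunks and then aggregating, with weekend versus weekday prices as the target. A minimal sketch of that idea follows, assuming `flightDate` parses as a date and `baseFare` is numeric. The function name `weekday_weekend_mean_fare` and the toy fares are hypothetical; in the notebook the source would be `"data/itineraries.csv"` rather than the in-memory sample used here.

```python
import io
import pandas as pd

def weekday_weekend_mean_fare(csv_source, chunksize=10000):
    """Stream a CSV in chunks and return the mean baseFare for weekday and weekend flights."""
    # Running [sum of fares, number of fares] per group, updated chunk by chunk
    totals = {"weekday": [0.0, 0], "weekend": [0.0, 0]}
    reader = pd.read_csv(
        csv_source,
        usecols=["flightDate", "baseFare"],  # load only the columns we need
        parse_dates=["flightDate"],
        chunksize=chunksize,
    )
    for chunk in reader:
        # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend
        is_weekend = chunk["flightDate"].dt.dayofweek >= 5
        for label, mask in (("weekend", is_weekend), ("weekday", ~is_weekend)):
            fares = chunk.loc[mask, "baseFare"]
            totals[label][0] += fares.sum()
            totals[label][1] += fares.count()
    # Divide once at the end so the mean is correct across chunks of unequal size
    return {
        label: (fare_sum / n if n else float("nan"))
        for label, (fare_sum, n) in totals.items()
    }

# Toy data: made-up fares, used only to exercise the function.
sample_csv = io.StringIO(
    "flightDate,baseFare\n"
    "2022-04-16,100.0\n"  # Saturday
    "2022-04-17,200.0\n"  # Sunday
    "2022-04-18,50.0\n"   # Monday
    "2022-04-19,70.0\n"   # Tuesday
)
result = weekday_weekend_mean_fare(sample_csv, chunksize=2)
print(result)
```

Keeping running sums and counts, rather than averaging per-chunk means, avoids biasing the result when the last chunk is smaller than the others.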
