## FLIGHT PRICE PREDICTION 
Anyone who has booked a flight ticket knows how unexpectedly prices vary: the cheapest available ticket on a given flight becomes more or less expensive over time. Airlines usually do this to maximize revenue based on:

1. Time-of-purchase patterns (making sure last-minute purchases are expensive).
2. Keeping the flight as full as they want it (raising prices on a flight that is filling up, to slow sales and hold back seats for expensive last-minute purchases).

In this project, you collect flight-fare data along with other features and build a model to predict flight fares.

STEPS

1. Data Collection

You have to scrape at least 1500 rows of data. You can scrape more if you like; the more data, the better the model.
In this section, scrape flight data from different websites (yatra.com, skyscanner.com, official airline websites, etc.). There is no limit on the number of columns; it is up to you and your creativity. Typical columns are airline name, date of journey, source, destination, route, departure time, arrival time, duration, total stops, and the target variable, price. You can add or remove columns depending on the website you fetch the data from.
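One way to keep the columns listed above together while scraping is to build each row as a single record instead of one list per column. The sketch below assumes every field stays plain text until cleaning; the class `FlightRecord`, the helper `meets_minimum` and the field names are hypothetical, and the example row is illustrative (its `route` value is made up).

```python
from dataclasses import asdict, dataclass
from typing import List

MIN_ROWS = 1500  # minimum number of rows required by the brief


@dataclass
class FlightRecord:
    """One scraped flight; every field is kept as raw text until cleaning."""
    airline: str
    date_of_journey: str
    source: str
    destination: str
    route: str
    departure_time: str
    arrival_time: str
    duration: str
    total_stops: str
    price: str  # target variable


def meets_minimum(records: List[FlightRecord], minimum: int = MIN_ROWS) -> bool:
    """Return True once enough rows have been collected."""
    return len(records) >= minimum


# Illustrative row (the route value is invented for the example)
row = FlightRecord("IndiGo", "30 Jun", "Lucknow", "Mumbai", "Lucknow-Mumbai",
                   "14:30", "16:35", "2h 05m", "Non Stop", "5363")
print(asdict(row))
print(meets_minimum([row]))  # False: one row is below the minimum
```

Because a record is created only once all of its fields are available, a page that yields partial data cannot shift one column relative to another; `pd.DataFrame([asdict(r) for r in records])` then turns the list into a table.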

2. Data Analysis
After cleaning the data, analyze it. Do airfares change frequently? Do they move in small increments or in large jumps? Do they tend to go up or down over time? What is the best time to buy so that the consumer saves the most while taking the least risk? Does the price increase as the departure date approaches? Is IndiGo cheaper than Jet Airways? Are morning flights more expensive?
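Several of these questions reduce to group-wise comparisons in pandas. The sketch below runs on a tiny hypothetical DataFrame (the fares are invented, not results from the scraped data) and assumes a `Days_to_departure` column, which the scraper in this notebook does not collect; it could be derived by also recording the date on which each page was scraped.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned scrape (not real fares)
df = pd.DataFrame({
    "Airline": ["IndiGo", "IndiGo", "Jet Airways", "Jet Airways"],
    "Departure_time": ["06:30", "19:15", "07:45", "21:00"],
    "Days_to_departure": [1, 15, 1, 15],
    "Price": [6000, 4000, 7000, 5000],
})

# Is one airline cheaper than another? Compare median fares per airline.
median_by_airline = df.groupby("Airline")["Price"].median()

# Are morning flights more expensive? Split departures at noon.
hour = pd.to_datetime(df["Departure_time"], format="%H:%M").dt.hour
df["Slot"] = ["morning" if h < 12 else "later" for h in hour]
median_by_slot = df.groupby("Slot")["Price"].median()

# Does price rise near departure? A negative correlation means
# fares are higher when fewer days remain.
corr = df["Price"].corr(df["Days_to_departure"])

print(median_by_airline, median_by_slot, corr, sep="\n\n")
```

Medians are used instead of means so that a few very expensive last-minute fares do not dominate the comparison.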

3. Model Building
After collecting the data, build a machine learning model. Complete all data pre-processing steps before building the model.

Try different models with different hyperparameters and select the best one.
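A minimal way to compare models and hyperparameters is a cross-validated grid search per model family, keeping the family with the best score. This sketch assumes scikit-learn, uses synthetic stand-in features (think duration in minutes, number of stops, departure hour) rather than the scraped data, and picks Ridge regression and a random forest purely as examples.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features and fares (placeholders for the cleaned data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 4000 + 3000 * X[:, 0] + 1500 * X[:, 1] + rng.normal(0, 100, size=200)

# Candidate models, each paired with its own hyperparameter grid
candidates = {
    "ridge": (Pipeline([("scale", StandardScaler()), ("model", Ridge())]),
              {"model__alpha": [0.1, 1.0, 10.0]}),
    "forest": (RandomForestRegressor(random_state=0),
               {"n_estimators": [50, 100], "max_depth": [None, 5]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated search over this model's grid, scored by R^2
    search = GridSearchCV(estimator, grid, cv=5, scoring="r2")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Keep the model family whose best cross-validated score is highest
best_name = max(results, key=lambda k: results[k][0])
print(best_name, results[best_name])
```

For the final evaluation step, hold out a test set with `train_test_split` before running the search and report the chosen model's score on it, so that selection and evaluation do not reuse the same rows.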

Follow the complete data science life cycle, including these steps:
1. Data Cleaning
2. Exploratory Data Analysis
3. Data Pre-processing
4. Model Building
5. Model Evaluation
6. Selecting the best model
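For the cleaning and pre-processing steps, the scraped text columns have to become numbers before a model can use them. The sketch below assumes labels in the formats visible in the scraped table (`2h 15m`, `Non Stop`, `HH:MM`), guesses that multi-stop flights are labelled like `1 Stop` or `2 Stops`, and treats any price string without digits (such as the scraper's `No details available` placeholder) as missing. The helper names and the comma-formatted price are hypothetical.

```python
import re

import pandas as pd


def duration_to_minutes(text: str) -> int:
    """Convert strings such as '2h 15m', '2h' or '45m' to minutes."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    return (int(hours.group(1)) if hours else 0) * 60 + (int(minutes.group(1)) if minutes else 0)


def stops_to_int(text: str) -> int:
    """'Non Stop' -> 0, '1 Stop' -> 1, '2 Stops' -> 2 (assumed label format)."""
    if text.strip().lower().startswith("non"):
        return 0
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"unrecognised stops label: {text!r}")
    return int(match.group())


# Toy raw columns in the assumed scraped formats
raw = pd.DataFrame({
    "Duration": ["2h 15m", "2h 05m", "45m"],
    "Stops": ["Non Stop", "1 Stop", "2 Stops"],
    "Departure_time": ["18:55", "06:30", "13:35"],
    "Price": ["5,362", "5362", "No details available"],
})

clean = raw.assign(
    Duration_min=raw["Duration"].map(duration_to_minutes),
    Stops_n=raw["Stops"].map(stops_to_int),
    Dep_hour=pd.to_datetime(raw["Departure_time"], format="%H:%M").dt.hour,
    # keep digits only; strings with no digits become NaN and are dropped
    Price=pd.to_numeric(raw["Price"].str.replace(r"[^\d]", "", regex=True), errors="coerce"),
).dropna(subset=["Price"])

print(clean)
```

Raising on an unknown stops label, rather than guessing a value, makes unexpected website formats show up early instead of silently entering the training data.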

In [1]:
# import libraries
import pandas as pd
import time
from bs4 import BeautifulSoup
from selenium import webdriver  # Selenium WebDriver for browser automation
from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException  # exceptions handled while scraping
import requests  # HTTP requests
import re  # regular expressions


# import warnings and suppress them
import warnings
warnings.filterwarnings('ignore')

In [2]:
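from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r"C:\Users\dell\Downloads\chromedriver_win32\chromedriver.exe"))
url = 'https://www.yatra.com/flight-schedule/domestic-flight-routes'
driver.get(url)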

In [3]:
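from selenium.webdriver.common.by import By

# collect the link to every route page listed on the schedule index
all_loc = []
for link in driver.find_elements(By.XPATH, '//li[@class="txt-cap mb10"]/a'):
    all_loc.append(link.get_attribute('href'))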

In [4]:
all_loc

['https://www.yatra.com/flight-schedule/lucknow-to-mumbai-flights.html',
 'https://www.yatra.com/flight-schedule/madurai-to-chennai-flights.html',
 'https://www.yatra.com/flight-schedule/ahmedabad-to-lucknow-flights.html',
 'https://www.yatra.com/flight-schedule/bagdogra-to-kolkata-flights.html',
 'https://www.yatra.com/flight-schedule/chennai-to-madurai-flights.html',
 'https://www.yatra.com/flight-schedule/kolkata-to-guwahati-flights.html',
 'https://www.yatra.com/flight-schedule/kochi-to-bangalore-flights.html',
 'https://www.yatra.com/flight-schedule/kolkata-to-bagdogra-flights.html',
 'https://www.yatra.com/flight-schedule/delhi-to-dibrugarh-flights.html',
 'https://www.yatra.com/flight-schedule/chandigarh-to-delhi-flights.html',
 'https://www.yatra.com/flight-schedule/delhi-to-raipur-flights.html',
 'https://www.yatra.com/flight-schedule/mumbai-to-kochi-flights.html',
 'https://www.yatra.com/flight-schedule/bangalore-to-darbhanga-flights.html',
 'https://www.yatra.com/flight-sche

In [5]:
len(all_loc)

2390
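Each collected link appears to encode its route in the file name (for example `lucknow-to-mumbai-flights.html`). A small parser can recover the pair, which is handy for cross-checking the Source and Destination text scraped from each page or for skipping routes already visited. The regular expression is an assumption based only on the URLs printed above, whether `all_loc` actually contains repeated links is not shown here, and the helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed slug pattern, inferred from the links listed above
ROUTE_RE = re.compile(r"/flight-schedule/(.+?)-to-(.+)-flights\.html$")


def route_from_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (source, destination) parsed from a route URL, or None if it does not match."""
    match = ROUTE_RE.search(url)
    if match is None:
        return None
    source, destination = (part.replace("-", " ").title() for part in match.groups())
    return source, destination


def unique_links(links: List[str]) -> List[str]:
    """Drop repeated links while keeping their first-seen order."""
    return list(dict.fromkeys(links))


print(route_from_url("https://www.yatra.com/flight-schedule/lucknow-to-mumbai-flights.html"))
# ('Lucknow', 'Mumbai')
```

The first group is non-greedy so that it stops at the first `-to-`, while hyphenated city names in either position are turned back into spaced, title-cased names.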

In [6]:
from selenium.webdriver.common.by import By

name = []
number = []
journey_date = []
source = []
destination = []
arrival_time = []
departure_time = []
time_of_journey = []
stops = []
price = []

for url in all_loc:
    driver.get(url)

    try:
        # airline name
        for i in driver.find_elements(By.XPATH, '//div[@class="fs-13 airline-name no-pad col-8"]/span'):
            name.append(i.text)

        # flight number (e.g. G8-396)
        for i in driver.find_elements(By.XPATH, '//div[@class="fs-13 airline-name no-pad col-8"]/p'):
            number.append(i.text)

        # source city
        for i in driver.find_elements(By.XPATH, '//div[@class="i-b col-4 no-wrap text-right dtime col-3"]/p'):
            source.append(i.text)

        # destination city
        for i in driver.find_elements(By.XPATH, '//div[@class="i-b pdd-0 text-left atime col-5"]/p[2]'):
            destination.append(i.text)

        # departure time from the source airport
        for i in driver.find_elements(By.XPATH, '//div[@class="fs-15 bold time"]'):
            departure_time.append(i.text)

        # arrival time at the destination airport
        for i in driver.find_elements(By.XPATH, '//p[@class="bold fs-15 mb-2 pr time"]'):
            arrival_time.append(i.text)

        # flight duration
        for i in driver.find_elements(By.XPATH, '//p[@class="fs-12 bold du mb-2"]'):
            time_of_journey.append(i.text)

        # number of stops
        for i in driver.find_elements(By.XPATH, '//div[@class=" font-lightgrey fs-10 tipsy i-b fs-10"]/span'):
            stops.append(i.text)

        # journey date: text of the active date tab after the first comma, without the leading space
        date = driver.find_elements(By.XPATH, '//div[@class="day-li text-center cursor-pointer pr active font-primary-color"]/p[1]')[0].text.split(",")[1][1:]

        # flight price; the journey date is repeated once per price
        try:
            for i in driver.find_elements(By.XPATH, '//div[@class="i-b tipsy fare-summary-tooltip fs-18"]'):
                price.append(i.text)
                journey_date.append(date)
        except (NoSuchElementException, StaleElementReferenceException):
            price.append('No details available')
            journey_date.append('No details available')

    except Exception:
        # skip pages that fail to load or parse; lists already filled from this page keep their entries
        continue

In [7]:
len(name), len(number), len(source), len(destination), len(departure_time), len(arrival_time), len(time_of_journey), len(stops), len(journey_date), len(price)

(11636, 11636, 11636, 11636, 11636, 11636, 11636, 11636, 11627, 11627)
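The counts above are not equal: the first eight lists hold 11636 entries, while `journey_date` and `price` hold 11627. Because the loop appends to each list in turn, a page that fails part-way (for instance, when no active date tab is found and the `[0]` index raises before any price is read) leaves entries in the earlier lists only; a page with fewer price elements than airline names has a similar effect. From that page on, the same position in different lists may no longer refer to the same flight. One alternative is to gather a page's values first and keep them only when every field has the same length. The sketch below uses toy values and a hypothetical helper `collect_page`.

```python
from typing import Dict, List


def collect_page(columns: Dict[str, List[str]], page: Dict[str, List[str]]) -> bool:
    """Append one page's values to `columns` only if all fields have equal length.

    Returns True if the page was kept and False if it was skipped.
    """
    lengths = {len(values) for values in page.values()}
    if len(lengths) != 1:
        return False  # ragged page (or no fields at all): skip it instead of shifting later rows
    for key, values in page.items():
        columns.setdefault(key, []).extend(values)
    return True


# Inside the scraping loop, `page` would be filled from the find_elements calls for one URL,
# with the single journey date repeated once per price, e.g. [date] * len(prices).
columns: Dict[str, List[str]] = {}
good_page = {"Airline": ["IndiGo", "Go First"], "Price": ["5363", "5362"]}
ragged_page = {"Airline": ["IndiGo", "Air India"], "Price": ["4862"]}
print(collect_page(columns, good_page), collect_page(columns, ragged_page))  # True False
print(columns)
```

Skipping a ragged page loses its rows, but every row that is kept stays internally consistent, which matters more for a price model than a handful of extra samples.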

In [8]:
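# the first eight lists have 11636 entries but journey_date and price only 11627,
# so every list is cut to the shortest length (this equalises lengths; it does not re-pair shifted entries)
n = min(len(name), len(number), len(source), len(destination), len(departure_time),
        len(arrival_time), len(time_of_journey), len(stops), len(journey_date), len(price))
df = pd.DataFrame({'Airline': name[:n],
                   'Airline_Unique_ID': number[:n],
                   'Source': source[:n],
                   'Destination': destination[:n],
                   'Departure_time': departure_time[:n],
                   'Arrival_time': arrival_time[:n],
                   'Duration': time_of_journey[:n],
                   'Stops': stops[:n],
                   'Date_of_Journey': journey_date[:n],
                   'Price': price[:n]})
df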

|  | Airline | Airline_Unique_ID | Source | Destination | Departure_time | Arrival_time | Duration | Stops | Date_of_Journey | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Go First | G8-396 | Lucknow | Mumbai | 18:55 | 21:10 | 2h 15m | Non Stop | 30 Jun | 5362 |
| 1 | Go First | G8-2620 | Lucknow | Mumbai | 06:30 | 08:50 | 2h 20m | Non Stop | 30 Jun | 5362 |
| 2 | Go First | G8-307 | Lucknow | Mumbai | 13:35 | 15:55 | 2h 20m | Non Stop | 30 Jun | 5362 |
| 3 | IndiGo | 6E-5392 | Lucknow | Mumbai | 14:30 | 16:35 | 2h 05m | Non Stop | 30 Jun | 5363 |
| 4 | IndiGo | 6E-5312 | Lucknow | Mumbai | 19:15 | 21:30 | 2h 15m | Non Stop | 30 Jun | 5363 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11622 | Air India | AI-669 | Mumbai | Bhubaneswar | 12:15 | 14:25 | 2h 10m | Non Stop | 14 Jul | 4862 |
| 11623 | Go First | G8-532 | Ahmedabad | Kolkata | 05:55 | 08:35 | 2h 40m | Non Stop | 14 Jul | 5502 |
| 11624 | IndiGo | 6E-966 | Ahmedabad | Kolkata | 06:20 | 09:00 | 2h 40m | Non Stop | 14 Jul | 5502 |
| 11625 | IndiGo | 6E-6559 | Ahmedabad | Kolkata | 19:35 | 22:15 | 2h 40m | Non Stop | 14 Jul | 5502 |


In [9]:
df.to_csv('Flight Prediction data.csv', index=False)  # index=False keeps the row index out of the file
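Before the cleaning and analysis steps, it is worth reading the saved file back to check its columns. By default `to_csv` also writes the row index as an unnamed first column, which `read_csv` returns as `Unnamed: 0` unless the file was written with `index=False` or is read with `index_col=0`. The toy frame below only illustrates that behaviour; it is not the scraped data.

```python
import io

import pandas as pd

# Toy frame standing in for the scraped `df`
frame = pd.DataFrame({"Airline": ["Go First", "IndiGo"], "Price": [5362, 5363]})

# Writing with the default index=True and reading back naively adds an extra column
with_index = pd.read_csv(io.StringIO(frame.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'Airline', 'Price']

# index=False on write (or index_col=0 on read) keeps the columns as they were
round_trip = pd.read_csv(io.StringIO(frame.to_csv(index=False)))
print(list(round_trip.columns))  # ['Airline', 'Price']
```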

In [10]:
driver.quit()  # end the WebDriver session and close every browser window (close() only closes the current one)