## Plotting Airline Data with Plotly

The Reporting Carrier On-Time Performance Dataset contains information on approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. The dataset contains basic information about each flight (such as date, time, departure airport, arrival airport) and, if applicable, the amount of time the flight was delayed and information about the reason for the delay. This dataset can be used to predict the likelihood of a flight arriving on time.

Import required modules

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
print("Plotly imported")

Plotly imported


Import Airline Data

In [2]:
airline_data =  pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/airline_data.csv')
print("Airline Data Imported")

Airline Data Imported


In [3]:
airline_data.head()

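| | Unnamed: 0 | Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | ... | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1295781 | 1998 | 2 | 4 | 2 | 4 | 1998-04-02 | AS | 19930 | AS | ... | | | | | | | | | | |
| 1 | 1125375 | 2013 | 2 | 5 | 13 | 1 | 2013-05-13 | EV | 20366 | EV | ... | | | | | | | | | | |
| 2 | 118824 | 1993 | 3 | 9 | 25 | 6 | 1993-09-25 | UA | 19977 | UA | ... | | | | | | | | | | |
| 3 | 634825 | 1994 | 4 | 11 | 12 | 6 | 1994-11-12 | HP | 19991 | HP | ... | | | | | | | | | | |
| 4 | 1888125 | 2017 | 3 | 8 | 17 | 4 | 2017-08-17 | UA | 19977 | UA | ... | | | | | | | | | | |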


In [4]:
airline_data.shape

(27000, 110)

Now randomly sample 500 data points, setting the random state to 42 so that we get the same result on every run.

In [5]:
data = airline_data.sample(n=500, random_state=42)
data.shape

(500, 110)
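
As a small illustration of why the seed matters, the sketch below samples twice from a toy DataFrame. The frame `toy` and its columns are hypothetical stand-ins, not the airline data; only the behaviour of `DataFrame.sample` with `random_state` is being shown.

```python
import pandas as pd

# Hypothetical stand-in for airline_data; only the sampling behaviour matters here.
toy = pd.DataFrame({"FlightID": range(100), "Distance": range(100, 1100, 10)})

# The same seed selects the same rows in the same order.
first = toy.sample(n=5, random_state=42)
second = toy.sample(n=5, random_state=42)
print(first.equals(second))

# A different seed will generally select different rows.
other = toy.sample(n=5, random_state=0)
print(first.index.tolist(), other.index.tolist())
```

Fixing the seed is what keeps the later plots and inferences comparable between runs of the notebook.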

In [6]:
data.columns

Index(['Unnamed: 0', 'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek',
       'FlightDate', 'Reporting_Airline', 'DOT_ID_Reporting_Airline',
       'IATA_CODE_Reporting_Airline',
       ...
       'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5AirportID',
       'Div5AirportSeqID', 'Div5WheelsOn', 'Div5TotalGTime',
       'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'],
      dtype='object', length=110)

Let's visually capture details such as:

* How departure time changes with respect to airport distance

* Average flight delay time over the months

* Number of flights to each destination state

* Number of flights per reporting airline

* Distribution of arrival delay

* Proportion of distance group by month (month indicated by numbers)

* Hierarchical view, ordered by month and destination state, of the number of flights

### `Scatter Plot:`
Let us use a scatter plot to represent how departure time changes with respect to airport distance.

In [7]:
fig = go.Figure()
fig.add_trace(go.Scatter(x = data['Distance'], y = data['DepTime'], 
                         mode ='markers', marker =dict(color ='darkmagenta')))

fig.update_layout(title ="Departure time changes with respect to airport distance.",
                  xaxis_title ='Distance', yaxis_title ='DepTime')

fig.show()

#### Inferences

The plot suggests that short-distance flights depart around the clock, while longer-distance flights are fewer and spread more thinly through the day.
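
When reading the vertical axis, it may help to turn departure times into hours of the day. The sketch below assumes `DepTime` follows the hhmm convention used in the BTS on-time data (for example, 1455 means 14:55). The frame `dep` and the columns `DepHour` and `DepMinute` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical departure times in hhmm form (assumption), including one missing value.
dep = pd.DataFrame({
    "Distance": [250, 1200, 2400, 500],
    "DepTime": [655.0, 1455.0, None, 2359.0],
})

# Integer division by 100 keeps the hour; the remainder is the minutes.
# Missing departure times stay missing (NaN) in both new columns.
dep["DepHour"] = dep["DepTime"] // 100
dep["DepMinute"] = dep["DepTime"] % 100
print(dep)
```

Plotting `DepHour` instead of the raw value would give an axis that runs from 0 to 23 rather than from 0 to 2400.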


### `Line Plot:` 
Let us now use a line plot to extract average monthly arrival delay time and see how it changes over the year.

In [8]:
line_data = data.groupby("Month")['ArrDelay'].mean().reset_index()
fig = go.Figure()
fig.add_trace(go.Scatter(x =line_data['Month'], y= line_data['ArrDelay'],
                         mode ='lines', marker = dict(color='crimson')))

fig.update_layout(title ='Monthly Flight Arrival Delay',
                  xaxis_title ='Month', yaxis_title = 'Arrival Delay')

fig.show() 

#### Inferences

The average arrival delay is highest in June and drops in November.
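
An inference like this can be checked numerically rather than read off the chart. The following is a minimal sketch on made-up numbers (the frame `toy` is hypothetical), using the same group-and-average step as the line plot and then `idxmax` and `idxmin` to locate the extreme months.

```python
import pandas as pd

# Hypothetical delays for three months; not taken from the airline sample.
toy = pd.DataFrame({
    "Month": [1, 1, 6, 6, 11, 11],
    "ArrDelay": [5.0, 7.0, 20.0, 30.0, -4.0, 2.0],
})

# Same aggregation as the line plot: mean arrival delay per month.
monthly = toy.groupby("Month")["ArrDelay"].mean()

# idxmax / idxmin return the index label (the month) of the extreme values.
print("highest:", monthly.idxmax(), monthly.max())
print("lowest:", monthly.idxmin(), monthly.min())
```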

### `Bar Chart:`
Let us use a bar chart to show the number of flights going to each destination state.

In [9]:
bar_data = data.groupby(['DestState'])['Flights'].sum().reset_index()

fig = px.bar(bar_data, x='DestState', y='Flights',
             title='Total number of flights to the destination state',
             color='DestState')

fig.show()
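
Bars are often easier to compare when they are ordered by height. The sketch below shows one way to prepare such a frame with `sort_values`. The data in `toy` is hypothetical, and it assumes each row carries `Flights` equal to 1, as a per-flight record would.

```python
import pandas as pd

# Hypothetical per-flight rows; each row is assumed to count as one flight.
toy = pd.DataFrame({
    "DestState": ["CA", "TX", "CA", "NY", "TX", "CA"],
    "Flights": [1.0] * 6,
})

# Same aggregation as the bar chart, then ordered from most to fewest flights.
bar_toy = (
    toy.groupby("DestState")["Flights"]
    .sum()
    .reset_index()
    .sort_values("Flights", ascending=False)
)
print(bar_toy)
```

Passing a frame sorted this way to `px.bar` should draw the states in that order, since the categories are generally plotted in the order they appear in the data.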

### `Histogram:`
Let us represent the distribution of arrival delay using a histogram.

*First we need to set missing values to zero in the ArrDelay column.*

In [17]:
data['ArrDelay'] = data['ArrDelay'].fillna(0)

# One histogram over all arrival delays; no per-value colour split is needed.
fig = px.histogram(data, x='ArrDelay',
                   title='Distribution of arrival delay')

fig.show()
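
Filling missing delays with zero changes the shape of the distribution slightly, because every missing value lands in the bin that contains zero. The sketch below compares `fillna(0)` with `dropna()` on a small made-up series. The values are hypothetical, and `dropna()` is shown only as a point of comparison, not as what this notebook does.

```python
import pandas as pd

# Hypothetical arrival delays in minutes, with two missing values.
delays = pd.Series([-5.0, 12.0, None, 30.0, None, 0.0], name="ArrDelay")

filled = delays.fillna(0)   # missing values become 0 (the approach used above)
dropped = delays.dropna()   # missing values are removed instead

# How many observations sit exactly at zero in each version.
print((filled == 0).sum(), (dropped == 0).sum())

# The mean shifts because the filled version has more zero-valued rows.
print(filled.mean(), dropped.mean())
```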

### `Bubble Plot:`
Let us use a bubble plot to represent the number of flights per reporting airline.

In [15]:
data['Reporting_Airline'].unique()

array(['OO', 'DL', 'HP', 'UA', 'FL', 'NW', 'WN', 'KH', 'MQ', 'AA', '9E',
       'US', 'CO', 'B6', 'PA (1)', 'TW', 'AS', 'YV', 'NK', 'EA', 'F9',
       'EV', 'HA', 'OH', 'YX', 'XE', 'VX', 'PI', 'PS'], dtype=object)

In [12]:
bub_data = data.groupby('Reporting_Airline')['Flights'].sum().reset_index()

fig =px.scatter(bub_data, x ='Reporting_Airline', y ='Flights',
                size ='Flights', hover_name = "Reporting_Airline", size_max =60,
                title ="Reporting Airline vs Number of Flights",
                color = bub_data['Reporting_Airline'])

fig.show()


#### Inferences

The reporting airline **WN** has the highest number of flights in the sample, around 86.
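
The largest bubbles can also be listed directly from the aggregated data. This is a minimal sketch on hypothetical rows (the frame `toy` is not the airline sample), using `nlargest` on the per-airline totals.

```python
import pandas as pd

# Hypothetical per-flight rows; each row is assumed to count as one flight.
toy = pd.DataFrame({
    "Reporting_Airline": ["WN", "DL", "WN", "AA", "WN", "DL"],
    "Flights": [1.0] * 6,
})

# Same aggregation as the bubble plot, kept as a Series indexed by airline.
counts = toy.groupby("Reporting_Airline")["Flights"].sum()

# The two airlines with the most flights, largest first.
top = counts.nlargest(2)
print(top)
```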

### `Pie Chart:`
Let us represent the proportion of distance group by month (month indicated by numbers).

In [22]:
fig =px.pie(data, values ='Month', names ='DistanceGroup',
            title = "Distance group proportion by month")

fig.show()
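
To see what each slice represents: `px.pie` groups rows that share the same `names` value and sums their `values`. With `values='Month'`, a slice is therefore the sum of month numbers within a distance group, which is not the same thing as the number of flights in that group. The sketch below reproduces both quantities on a hypothetical frame `toy` so they can be compared.

```python
import pandas as pd

# Hypothetical rows: two flights in distance group 1, two in distance group 2.
toy = pd.DataFrame({
    "Month": [1, 12, 6, 3],
    "DistanceGroup": [1, 1, 2, 2],
})

# What the slices are built from when values='Month': summed month numbers.
month_sum = toy.groupby("DistanceGroup")["Month"].sum()
share_by_month_sum = month_sum / month_sum.sum()

# For comparison: the share of rows (flights) in each distance group.
share_by_count = toy["DistanceGroup"].value_counts(normalize=True).sort_index()

print(share_by_month_sum)
print(share_by_count)
```

In this toy case the two groups hold the same number of flights, yet the month-weighted shares differ, so the choice of `values` column affects how the chart should be read.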

### `Sunburst Chart:`
Let us represent the hierarchical view, ordered by month and destination state, of the number of flights.

This plot should contain the following

* Define the hierarchy of sectors from root to leaves in the `path` parameter. Here, we go from the `Month` to the `DestStateName` feature.
* Set sector values in the `values` parameter. Here, we can pass in the `Flights` feature.
* Title as **Flights Distribution Hierarchy**.

In [23]:
fig =px.sunburst(
    data,
    path =['Month', 'DestStateName'],
    values ='Flights',
    title ='Flights Distribution Hierarchy')

fig.show()

#### Inferences

Here the **Month** numbers in the innermost ring form the root level, and under each month we can read the **number of flights** to each **destination state**.
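
The numbers behind the rings can be approximated with two group-by steps. This is a sketch on a hypothetical frame `toy`, assuming each row counts as one flight: the outer ring corresponds to totals per month and state, and the inner ring to those totals summed per month.

```python
import pandas as pd

# Hypothetical per-flight rows; each row is assumed to count as one flight.
toy = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2],
    "DestStateName": ["Texas", "Texas", "Ohio", "Texas", "Ohio"],
    "Flights": [1.0] * 5,
})

# Outer ring: flights per (month, destination state) pair.
leaves = toy.groupby(["Month", "DestStateName"])["Flights"].sum()

# Inner ring: each month is the sum of its states.
inner = leaves.groupby(level="Month").sum()

print(leaves)
print(inner)
```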

## <h3 align="center"> Data Visualization by Lovish Garlani | Source IBM </h3>