## Objectives


We will be using the [airline dataset](https://developer.ibm.com/exchanges/data/all/airline/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDV0101ENSkillsNetwork20297740-2021-01-01) from [Data Asset eXchange](https://developer.ibm.com/exchanges/data/).

#### Airline Reporting Carrier On-Time Performance Dataset

The Reporting Carrier On-Time Performance Dataset contains information on approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. The dataset contains basic information about each flight (such as date, time, departure airport, arrival airport) and, if applicable, the amount of time the flight was delayed and information about the reason for the delay. This dataset can be used to predict the likelihood of a flight arriving on time.

Preview data, dataset metadata, and data glossary [here.](https://dax-cdn.cdn.appdomain.cloud/dax-airline/1.0.1/data-preview/index.html)


In [26]:
!pip install plotly



In [27]:
# Import required libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [28]:
# Read the airline data into pandas dataframe
airline_data =  pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/airline_data.csv', 
                            encoding = "ISO-8859-1",
                            dtype={'Div1Airport': str, 'Div1TailNum': str, 
                                   'Div2Airport': str, 'Div2TailNum': str})

In [29]:
# Preview the first 5 lines of the loaded data 
airline_data.head()

Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,...,Div4WheelsOff,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum
0,1295781,1998,2,4,2,4,1998-04-02,AS,19930,AS,...,,,,,,,,,,
1,1125375,2013,2,5,13,1,2013-05-13,EV,20366,EV,...,,,,,,,,,,
2,118824,1993,3,9,25,6,1993-09-25,UA,19977,UA,...,,,,,,,,,,
3,634825,1994,4,11,12,6,1994-11-12,HP,19991,HP,...,,,,,,,,,,
4,1888125,2017,3,8,17,4,2017-08-17,UA,19977,UA,...,,,,,,,,,,


In [30]:
# Shape of the data
airline_data.shape

(27000, 110)

In [31]:
# Randomly sample 2000 data points. Setting random_state to 42 makes the sample reproducible.
data = airline_data.sample(n=2000, random_state=42)

In [32]:
# Get the shape of the trimmed data
data.shape

(2000, 110)
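The effect of `random_state` can be checked on a small frame. The sketch below uses a hypothetical 100-row stand-in (`toy`) rather than the airline data, and only illustrates that the same seed returns the same rows.

```python
import pandas as pd

# Hypothetical stand-in for airline_data: 100 rows, one column
toy = pd.DataFrame({"Distance": range(100)})

# Two draws with the same seed select exactly the same rows
first = toy.sample(n=10, random_state=42)
second = toy.sample(n=10, random_state=42)

print(first.equals(second))  # True
print(first.shape)           # (10, 1)
```

Without `random_state`, each call to `sample` would generally return a different subset, so the plots further down would change from run to run.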

# plotly.graph_objects


## 1. Scatter Plot


In [22]:
# First, create a figure with go.Figure and add a trace to it with go.Scatter

fig = go.Figure(data=go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers', marker=dict(color='red')))

# Updating layout through `update_layout`. Here we are adding title to the plot and providing title to x and y axis.

fig.update_layout(title='Distance vs Departure Time', xaxis_title='Distance', yaxis_title='DepTime')

# Display the figure
fig.show()

In [37]:
fig=go.Figure(data=go.Scatter(x=data['Distance'],y=data['ArrTime'],mode='markers',marker=dict(color='green')))
fig.update_layout(title="Distance vs Arrival Time", xaxis_title="Distance", yaxis_title="Arrival Time")
fig.show()
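The time axes in these scatter plots can be hard to read if `DepTime` and `ArrTime` are stored as `hhmm` numbers (for example `1445` for 14:45), which is an assumption here based on how the BTS on-time data is usually encoded; check the data glossary before relying on it. Under that assumption, a minimal sketch for converting such values to decimal hours on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy values, assumed to be hhmm-encoded local times
toy = pd.DataFrame({"DepTime": [5.0, 930.0, 1445.0, 2359.0]})

# Split hhmm into hours and minutes, then combine into decimal hours
hours = toy["DepTime"] // 100
minutes = toy["DepTime"] % 100
toy["DepHour"] = hours + minutes / 60

print(toy)
```

Plotting `DepHour` instead of `DepTime` would put the y-axis on a continuous 0 to 24 scale.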

## 2. Line Plot


In [50]:
# Group the data by Month and compute the average arrival delay for each month.
line_data = data.groupby('Month')['ArrDelay'].mean().reset_index()
line_data

Unnamed: 0,Month,ArrDelay
0,1,5.3625
1,2,5.714286
2,3,12.63125
3,4,6.344444
4,5,3.304598
5,6,11.408163
6,7,9.923529
7,8,3.315789
8,9,4.221477
9,10,2.680982


In [39]:
# Create line plot here

fig = go.Figure(data=go.Scatter(x=line_data['Month'], y=line_data['ArrDelay'], mode='lines', marker=dict(color='green')))

# Updating layout through `update_layout`. Here we are adding title to the plot and providing title to x and y axis.

fig.update_layout(title='Month vs Average Flight Delay Time', xaxis_title='Month', yaxis_title='ArrDelay')

# Display the figure
fig.show()

In [53]:
dep_delay=data.groupby('Month')['DepDelay'].mean().reset_index()
dep_delay

Unnamed: 0,Month,DepDelay
0,1,8.639752
1,2,6.357143
2,3,14.7625
3,4,7.938889
4,5,4.747126
5,6,11.993289
6,7,12.777778
7,8,6.48538
8,9,7.892617
9,10,5.699387


In [55]:
fig=go.Figure(data=go.Scatter(x=dep_delay['Month'], y=dep_delay['DepDelay'], mode='lines', marker=dict(color='lightblue')))
fig.update_layout(title='Average Departure Delay per month', xaxis_title='Month',yaxis_title='Mean Departure Delay')
fig.show()

# plotly.express


## 1. Bar Chart


In [56]:
# Group the data by destination state. Compute the total number of flights to each state
bar_data = data.groupby(['DestState'])['Flights'].sum().reset_index()

In [58]:
# Display the data
bar_data.head()

Unnamed: 0,DestState,Flights
0,AK,13.0
1,AL,7.0
2,AR,4.0
3,AZ,57.0
4,CA,253.0


In [59]:
# Use plotly express bar chart function px.bar. Provide input data, x and y axis variable, and title of the chart.
# This will give total number of flights to the destination state.
fig = px.bar(bar_data, x="DestState", y="Flights", title='Total number of flights to each destination state') 
fig.show()

In [64]:
rep_airline=data.groupby('Reporting_Airline')['Flights'].sum().reset_index()
rep_airline.head()

Unnamed: 0,Reporting_Airline,Flights
0,9E,24.0
1,AA,238.0
2,AS,48.0
3,B6,41.0
4,CO,93.0


In [65]:
fig=px.bar(rep_airline, x='Reporting_Airline', y='Flights', title='Total Number of flights by airline')
fig.show()

## 2. Bubble Chart


Learn more about bubble charts [here](https://plotly.com/python/bubble-charts/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDV0101ENSkillsNetwork20297740-2021-01-01).

#### Idea: Get number of flights as per reporting airline


In [67]:
# Group the data by reporting airline and get number of flights
bub_data = data.groupby('Reporting_Airline')['Flights'].sum().reset_index()
bub_data.head()

Unnamed: 0,Reporting_Airline,Flights
0,9E,24.0
1,AA,238.0
2,AS,48.0
3,B6,41.0
4,CO,93.0


In [68]:
# Create bubble chart here
fig = px.scatter(bub_data, x="Reporting_Airline", y="Flights", size="Flights",
                 hover_name="Reporting_Airline", title='Reporting Airline vs Number of Flights', size_max=60)
fig.show()

In [69]:
fig=px.scatter(bar_data, x='DestState', y='Flights', size='Flights', hover_name='DestState', title='Total Number of Flights by Destination State', size_max=60)
fig.show()

## 3. Histogram


In [74]:
# Set missing values to 0
data['ArrDelay'] = data['ArrDelay'].fillna(0)

In [75]:
# Create histogram here
fig = px.histogram(data, x="ArrDelay")
fig.show()

In [76]:
data['DepDelay'].isnull().sum()

39

In [77]:
data['DepDelay']=data['DepDelay'].fillna(0)
data['DepDelay'].isnull().sum()

0

In [78]:
fig=px.histogram(data, x='DepDelay')
fig.show()
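Filling missing delays with 0 treats those flights as exactly on time, which adds extra counts to the bin at zero and pulls the mean towards zero. A small sketch on a hypothetical series compares this with dropping the missing values instead:

```python
import numpy as np
import pandas as pd

# Hypothetical delays in minutes, two of them missing
delays = pd.Series([-5.0, 10.0, np.nan, 30.0, np.nan], name="ArrDelay")

filled = delays.fillna(0)   # missing values become on-time flights
dropped = delays.dropna()   # missing values are excluded

print((filled == 0).sum())  # 2
print(filled.mean())        # 7.0
print(dropped.mean())       # about 11.67
```

Which option is appropriate depends on why the values are missing; `px.histogram(data.dropna(subset=['ArrDelay']), x='ArrDelay')` would be the equivalent of the second choice for the histogram.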

In [84]:
data['DistanceGroup'].value_counts()

2     552
3     382
1     331
4     269
5     190
7      70
6      69
8      41
10     41
11     31
9      24
Name: DistanceGroup, dtype: int64

## 4. Pie Chart


In [85]:
# Use px.pie function to create the chart. Input dataset. 
# Values parameter will set values associated to the sector. 'Month' feature is passed to it.
# labels for the sector are passed to the `names` parameter.
fig = px.pie(data, values='Month', names='DistanceGroup', title='Distance group proportion by month')
fig.show()

## 5. Sunburst Charts


In [86]:
# Create sunburst chart here
fig = px.sunburst(data, path=['Month', 'DestStateName'], values='Flights')
fig.show()