## Basic Plotly Charts

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

In [5]:
airline_data = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/airline_data.csv",
                           encoding = "ISO-8859-1",
                           dtype={'Div1Airport': str, 'Div1TailNum': str, 
                           'Div2Airport': str, 'Div2TailNum': str})
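
The `dtype` argument above forces the listed diversion columns to be read as strings. A minimal sketch of the effect, using a small in-memory CSV as a stand-in for the real file (the two sample rows are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical two-row CSV; the real lab reads airline_data.csv from a URL.
csv_text = (
    "FlightDate,Div1Airport,Div1TailNum\n"
    "1998-04-02,,\n"
    "2013-05-13,ORD,N123AA\n"
)

# Same idea as the lab: read the sparse diversion columns as strings.
toy = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Div1Airport": str, "Div1TailNum": str},
)

# Empty fields are still read as missing values, not as empty strings.
missing_flags = toy["Div1Airport"].isna().tolist()
print(missing_flags)
```

Columns like these are mostly empty in the airline data, so naming their type explicitly is presumably meant to avoid mixed-type guessing while the file is parsed.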

In [6]:
airline_data.head()

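| | Unnamed: 0 | Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | ... | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1295781 | 1998 | 2 | 4 | 2 | 4 | 1998-04-02 | AS | 19930 | AS | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1125375 | 2013 | 2 | 5 | 13 | 1 | 2013-05-13 | EV | 20366 | EV | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 118824 | 1993 | 3 | 9 | 25 | 6 | 1993-09-25 | UA | 19977 | UA | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 634825 | 1994 | 4 | 11 | 12 | 6 | 1994-11-12 | HP | 19991 | HP | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 1888125 | 2017 | 3 | 8 | 17 | 4 | 2017-08-17 | UA | 19977 | UA | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |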


In [7]:
airline_data.shape

(27000, 110)

In [10]:
# Randomly sample 500 rows. Fixing random_state at 42 makes the sample reproducible.

data = airline_data.sample(n=500, random_state=42)
data.shape

(500, 110)
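
To illustrate why `random_state` is fixed, here is a minimal sketch on a small made-up frame (the `flight_id` and `Distance` values are placeholders, not the lab's data). Sampling twice with the same seed should return the same rows:

```python
import pandas as pd

# Stand-in frame with 100 rows; the lab samples the real airline_data instead.
toy = pd.DataFrame({
    "flight_id": range(100),
    "Distance": range(100, 1100, 10),
})

# Two draws with the same seed.
first = toy.sample(n=5, random_state=42)
second = toy.sample(n=5, random_state=42)

# Compare the row labels selected by each draw.
same_rows = first.index.equals(second.index)
print(same_rows)
```

Leaving `random_state` unset would generally give a different subset on each run, so the charts below would change between executions.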

### Lab structure
#### plotly.graph_objects

1. Review scatter plot creation

Theme: How departure time changes with respect to airport distance

2. To do - Create line plot

Theme: Extract average monthly delay time and see how it changes over the year

#### plotly.express
1. Review bar chart creation

Theme: Extract the number of flights that go to each destination state

2. To do - Create bubble chart

Theme: Get number of flights as per reporting airline

3. To do - Create histogram

Theme: Get distribution of arrival delay

4. Review pie chart

Theme: Proportion of distance group by month (month indicated by numbers)

5. To do - Create sunburst chart

Theme: Hierarchical view in the order of month and destination state, with the number of flights as the value

## plotly.graph_objects

### 1. Scatter Plot 

#### Idea: How departure time changes with respect to airport distance

In [15]:
# First, create a figure with go.Figure and add a trace to it through go.Scatter.
fig = go.Figure(data=go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers', marker=dict(color='red')))
# Update the layout through `update_layout`: add a plot title and titles for the x and y axes.
fig.update_layout(title="Distance vs Departure Time", xaxis_title="Distance", yaxis_title="Departure Time")
fig.show()

### 2. Line Plot

#### Idea: Extract average monthly arrival delay time and see how it changes over the year.

In [47]:
linedata = data.groupby('Month')['ArrDelay'].mean().to_frame()

In [48]:
linedata = linedata.reset_index()
linedata

| | Month | ArrDelay |
|---|---|---|
| 0 | 1 | 2.232558 |
| 1 | 2 | 2.6875 |
| 2 | 3 | 10.868421 |
| 3 | 4 | 6.229167 |
| 4 | 5 | -0.27907 |
| 5 | 6 | 17.310345 |
| 6 | 7 | 5.088889 |
| 7 | 8 | 3.121951 |
| 8 | 9 | 9.081081 |
| 9 | 10 | 1.2 |
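
The aggregation used here can be checked by hand on a tiny example. The sketch below uses made-up delay values and shows that `mean()` skips missing delays by default, which matters because `ArrDelay` contains nulls in the sample:

```python
import numpy as np
import pandas as pd

# Made-up delays for two months; one value is missing on purpose.
toy = pd.DataFrame({
    "Month": [1, 1, 2, 2, 2],
    "ArrDelay": [10.0, np.nan, -4.0, 6.0, 1.0],
})

# Group by month, average the delays, and turn the month index back into a column.
monthly = toy.groupby("Month")["ArrDelay"].mean().reset_index()

# Month 1 averages only the non-missing value (10.0).
# Month 2 averages (-4.0 + 6.0 + 1.0) / 3 = 1.0.
print(monthly)
```

Calling `reset_index()` on the grouped Series already yields a DataFrame, so the intermediate `to_frame()` step is optional.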


To do:
- Create a line plot with the month on the x-axis and the computed average arrival delay on the y-axis. Update the plot title and the x-axis and y-axis titles.

- Hint: A scatter plot and a line plot differ only in the value of the `mode` parameter.

In [52]:
# mode='lines' draws a line, so the color is set on `line` rather than `marker`.
fig = go.Figure(data=go.Scatter(x=linedata['Month'], y=linedata['ArrDelay'], mode='lines', line=dict(color='red')))

fig.update_layout(title="Monthly Average Arrival Delay Time", xaxis_title='Month', yaxis_title='ArrDelay')
fig.show()

## plotly.express

### 1. Bar Chart

#### Idea: Extract the number of flights that go to each destination state

In [55]:
bardata = data.groupby('DestState')['Flights'].sum().to_frame()
bardata = bardata.reset_index()
bardata.head()

| | DestState | Flights |
|---|---|---|
| 0 | AK | 4.0 |
| 1 | AL | 3.0 |
| 2 | AZ | 8.0 |
| 3 | CA | 68.0 |
| 4 | CO | 20.0 |


In [59]:
fig = px.bar(bardata, x='DestState', y='Flights', title='Total number of flights to each destination state')

fig.show()

### 2. Bubble Chart

#### Idea: Get number of flights as per reporting airline

In [62]:
bubble_data = data.groupby('Reporting_Airline')['Flights'].sum().reset_index()

type(bubble_data)

pandas.core.frame.DataFrame

To do

- Create a bubble chart using `bubble_data`, with the reporting airline on the x-axis and the number of flights on the y-axis.
- Give the chart a title.
- Scale the bubble size by the number of flights using the size parameter.
- Set the hover tooltip name to Reporting_Airline using the hover_name parameter.

In [73]:
fig = px.scatter(bubble_data, x='Reporting_Airline', y='Flights', size='Flights', hover_name='Reporting_Airline',
                 title='Number of Flights as per Reporting Airline', size_max=50)

fig.show()


### 3. Histogram

#### Idea: Get distribution of arrival delay

In [81]:
data['ArrDelay'].isnull().value_counts()

False    485
True      15
Name: ArrDelay, dtype: int64

In [82]:
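# Assign the result back instead of using inplace=True on a column selection,
# which newer pandas versions may not propagate to `data`.
data['ArrDelay'] = data['ArrDelay'].fillna(0)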

In [83]:
data['ArrDelay'].isnull().value_counts()

False    500
Name: ArrDelay, dtype: int64
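
Filling missing delays with 0 treats those flights as exactly on time, which shifts the distribution toward zero. The sketch below compares that choice with dropping the missing rows, using made-up values; which one is appropriate for the lab is a judgment call:

```python
import numpy as np
import pandas as pd

# Made-up delays with two missing entries.
delays = pd.Series([10.0, np.nan, -5.0, np.nan, 25.0], name="ArrDelay")

# Option used in the lab: replace missing values with 0.
filled = delays.fillna(0)

# Alternative: leave the missing rows out entirely.
dropped = delays.dropna()

# (10 + 0 - 5 + 0 + 25) / 5 = 6.0 versus (10 - 5 + 25) / 3 = 10.0
mean_filled = filled.mean()
mean_dropped = dropped.mean()
print(mean_filled, mean_dropped)
```

With `fillna(0)`, the histogram gains extra counts in the bin that contains 0; with `dropna()`, the plot reflects only flights that have a recorded delay.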

To do

- Use px.histogram and pass the dataset.
- Pass ArrDelay to x parameter.

In [91]:
# Pass the column name; px looks it up in `data`.
fig = px.histogram(data, x='ArrDelay', title='Distribution of Arrival Delay')

fig.show()

### 4. Pie Chart

#### Idea: Proportion of distance group by month (month indicated by numbers)

In [93]:
# Each slice is one DistanceGroup; its size is the sum of the Month values in that group.
fig = px.pie(data, values='Month', names='DistanceGroup', title='Distance group proportion by month')
fig.show()

### 5. Sunburst Chart

#### Idea: Hierarchical view in the order of month and destination state, with the number of flights as the value

To do

- Create sunburst chart using px.sunburst.
- Define the hierarchy of sectors from root to leaves in the path parameter. Here, we go from the Month feature to the DestState feature.
- Set the sector values in the values parameter. Here, we can pass in the Flights feature.
- Show the figure.

In [97]:
fig = px.sunburst(data, path = ['Month', 'DestState'], values='Flights')

fig.show()