# Integrated Continuous Assignment 2
**MSc Data Analytics**
<p>Student: 2021322 </p>
<p>Name: Luciana Teixeira</p>

# Ireland Transportation: Aviation

<p>This study analyses air transportation data for Ireland, focusing on passenger travel, flight operations, and freight movement. The objective is to compare these metrics with data from the airport of San Francisco in the United States. Dublin and San Francisco are similar in population size and geographical area, which makes them suitable subjects for a comparative assessment.
<p>Air transportation serves as a critical component of modern-day connectivity, facilitating passenger mobility, cargo logistics, and global trade. Ireland, recognized for its strategic geographical location, and San Francisco, renowned for its economic prominence in the United States, offer compelling cases for comparative analysis in the realm of air travel data. 
<p>The project also includes an additional Jupyter Notebook with the remaining tasks.
<p>The project also contains a Postman collection with the API calls used for data acquisition, available on GitHub.

In [1]:
# Importing the libraries
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore") 


**Datasets for the Interactive Dashboard**
<p>These datasets will be used as the data sources for this interactive dashboard.

**Dataset One: Queueing Time**
<p>This dataset was obtained via QSensor and will be used to generate the following visualisation:
<p>Average waiting time per hour, by day of the week and month.
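<p>As a rough illustration of the aggregation behind this chart, the sketch below averages the minimum waiting time per hour for one month and weekday. The sample rows and the helper name `average_wait_by_hour` are invented for illustration; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical sample rows; only the column names mirror waiting_time.csv.
sample = pd.DataFrame({
    'minWaitTime': [5, 15, 10, 20],
    'created_at': ['2023-01-01 08:05:00', '2023-01-01 08:35:00',
                   '2023-01-08 09:10:00', '2023-01-02 08:00:00'],
})
sample['created_at'] = pd.to_datetime(sample['created_at'])
sample['hour'] = sample['created_at'].dt.hour
sample['day_of_week'] = sample['created_at'].dt.day_name()
sample['month'] = sample['created_at'].dt.month_name()


def average_wait_by_hour(df, month, day):
    """Mean minWaitTime per hour for the chosen month and weekday (illustrative helper)."""
    selected = df[(df['month'] == month) & (df['day_of_week'] == day)]
    return selected.groupby('hour')['minWaitTime'].mean()


sunday_january = average_wait_by_hour(sample, 'January', 'Sunday')
print(sunday_january)
```

<p>Readings from the same hour on different Sundays fall into the same group, so each bar would summarise every matching day in the month rather than a single date.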

In [2]:
queue_time = pd.read_csv('waiting_time.csv')

In [3]:
queue_time.head()

Unnamed: 0,terminal,minWaitTime,maxWaitTime,waitTimeText,created_at
0,T2,5,5,5,2022-12-31 23:26:39
1,T1,5,5,5,2023-01-01 00:28:28
2,T2,5,5,5,2023-01-01 00:28:28
3,T1,5,5,5,2023-01-01 01:26:33
4,T2,5,5,5,2023-01-01 01:26:33


In [4]:
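# Parse the timestamp and derive the fields used by the dashboard filters
queue_time['created_at'] = pd.to_datetime(queue_time['created_at'])
queue_time['hour'] = queue_time['created_at'].dt.hour
queue_time['day_of_week'] = queue_time['created_at'].dt.day_name()
queue_time['month'] = queue_time['created_at'].dt.strftime('%B')
# Drop the first row, which was recorded on 2022-12-31, before the 2023 data starts
queue_time = queue_time.iloc[1:].copy()
# Keep only the minimum waiting time; the other columns are not plotted
queue_time.drop(['terminal', 'maxWaitTime', 'waitTimeText'], axis=1, errors='ignore', inplace=True)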


In [5]:
queue_time.head()

Unnamed: 0,minWaitTime,created_at,hour,day_of_week,month
1,5,2023-01-01 00:28:28,0,Sunday,January
2,5,2023-01-01 00:28:28,0,Sunday,January
3,5,2023-01-01 01:26:33,1,Sunday,January
4,5,2023-01-01 01:26:33,1,Sunday,January
5,5,2023-01-01 02:26:39,2,Sunday,January


**Dataset Two: Reviews**
<p>This dataset was obtained via a script from the website 'Airline Quality' and will be used to generate the following visualisations: 
<p>Overall Rating;
<p>Word Cloud;
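<p>The overall rating view rests on two simple statistics: the share of each rating value and the mean rating. A minimal sketch with invented ratings (not taken from the reviews file) could look like this:

```python
import pandas as pd

# Hypothetical ratings, invented for illustration.
ratings = pd.Series([7, 1, 9, 7, 3, 7, 9, 1], name='Rating')

# Share of each rating value in percent, ordered by rating.
rating_share = ratings.value_counts(normalize=True).sort_index() * 100
average_rating = ratings.mean()

print(rating_share)
print(f'Average: {average_rating:.2f}')
```

<p>Sorting by index keeps the slices in rating order, which makes the chart easier to read than the default ordering by frequency.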


In [6]:
reviews = pd.read_csv('reviews_without_verified.csv')

In [7]:
reviews.head()

Unnamed: 0,Rating,Review
0,7,"After reading some of the reviews here, I was..."
1,1,"The place is dirty, uncared for, inefficient a..."
2,9,I think Dublin airport does a very good job o...
3,7,I have traveled through Dublin Airport a few t...
4,3,Upon landing at Dublin Airport after a long f...


In [8]:
# Remove the literal string '12345' left in the scraped review text
reviews['Review'] = reviews['Review'].str.replace('12345', '', regex=False)

**Dataset Three: Arrival & Departure Locations**
<p>This dataset was obtained via a script from the website 'CSO' and will be used to generate the following visualisations: 
<p>Map for Arrivals;
<p>Map for Departures;
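<p>Both maps depend on matching each airport name to a coordinate pair, so a name that is spelled differently in the source table and in the lookup silently loses its marker. The sketch below shows one possible check; the two-row table and the lookup are invented for illustration.

```python
import pandas as pd

# Hypothetical extract with a misspelled airport name, as can happen in source tables.
airports = pd.DataFrame({'City Airport': ['London - Heathrow', 'Amterdam - Schiphol']})

# Illustrative lookup that uses the correct spelling, so the second row will not match.
coordinates = {
    'London - Heathrow': (51.4700, -0.4543),
    'Amsterdam - Schiphol': (52.3086, 4.7639),
}

# List every airport name that has no entry in the lookup.
has_coordinates = airports['City Airport'].isin(list(coordinates))
unmatched = airports.loc[~has_coordinates, 'City Airport'].tolist()
print(unmatched)
```

<p>An empty list means every row can be placed on the map; otherwise either the lookup key or the source name needs to be aligned.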

In [9]:
excel_file = 'P-AS2023Q3TBL5A.xlsx' 
df_excel = pd.read_excel(excel_file) 

In [10]:
df_excel.head(100)

Unnamed: 0,"Table 5A Top 10 arrivals and departures for Dublin airport by number of passengers, Quarter 3 2023",Unnamed: 1,Unnamed: 2,Unnamed: 3
0,,,,Number
1,,Arrivals,Departures,Total
2,London - Heathrow,236538,226856,463394
3,London - Gatwick,164873,160363,325236
4,Amterdam - Schiphol,160961,155658,316619
5,Malaga,137477,134036,271513
6,Faro,132594,125945,258539
7,London - Stansted,123953,120272,244225
8,Manchester,121478,121126,242604
9,Edinburgh,105447,98374,203821


In [11]:
airport_loc = df_excel.drop([0, 1]).reset_index(drop=True)

airport_loc.columns = ['City Airport', 'Arrivals', 'Departures', 'Total']


In [12]:
airport_loc.head(10)

Unnamed: 0,City Airport,Arrivals,Departures,Total
0,London - Heathrow,236538,226856,463394
1,London - Gatwick,164873,160363,325236
2,Amterdam - Schiphol,160961,155658,316619
3,Malaga,137477,134036,271513
4,Faro,132594,125945,258539
5,London - Stansted,123953,120272,244225
6,Manchester,121478,121126,242604
7,Edinburgh,105447,98374,203821
8,Birmingham,100471,98979,199450
9,Barcelona,91626,88242,179868


In [13]:
airport_coordinates = {
    'London - Heathrow': (51.4700, -0.4543),
    'London - Gatwick': (51.1537, -0.1821),
    'Amterdam - Schiphol': (52.3086, 4.7639),
    'Malaga': (36.6749, -4.4991),
    'Faro': (37.0186, -7.9706),
    'London - Stansted': (51.8853, 0.2354),
    'Manchester': (53.3651, -2.2722),
    'Edinburgh': (55.9500, -3.3725),
    'Birmingham': (52.4539, -1.7481),
    'Barcelona': (41.2980, 2.0799)
}

airport_loc['Latitude'] = airport_loc['City Airport'].apply(lambda x: airport_coordinates.get(x, (None, None))[0])
airport_loc['Longitude'] = airport_loc['City Airport'].apply(lambda x: airport_coordinates.get(x, (None, None))[1])


In [14]:
iso_codes = {
    'London - Heathrow': 'GBR',
    'London - Gatwick': 'GBR',
    'Amterdam - Schiphol': 'NLD',  # key keeps the misspelling from the source table so the lookup matches
    'Malaga': 'ESP',  # Malaga is in Spain
    'Faro': 'PRT',  # Faro is in Portugal; 'FAO' is the airport's IATA code, not an ISO-3 country code
    'London - Stansted': 'GBR',
    'Manchester': 'GBR',
    'Edinburgh': 'GBR',
    'Birmingham': 'GBR',
    'Barcelona': 'ESP'
}
airport_loc['ISO Code'] = airport_loc['City Airport'].map(iso_codes)

In [15]:
airport_loc.head(10)

Unnamed: 0,City Airport,Arrivals,Departures,Total,Latitude,Longitude,ISO Code
0,London - Heathrow,236538,226856,463394,51.47,-0.4543,GBR
1,London - Gatwick,164873,160363,325236,51.1537,-0.1821,GBR
2,Amterdam - Schiphol,160961,155658,316619,52.3086,4.7639,NLD
3,Malaga,137477,134036,271513,36.6749,-4.4991,ESP
4,Faro,132594,125945,258539,37.0186,-7.9706,PRT
5,London - Stansted,123953,120272,244225,51.8853,0.2354,GBR
6,Manchester,121478,121126,242604,53.3651,-2.2722,GBR
7,Edinburgh,105447,98374,203821,55.95,-3.3725,GBR
8,Birmingham,100471,98979,199450,52.4539,-1.7481,GBR
9,Barcelona,91626,88242,179868,41.298,2.0799,ESP


In [16]:
airport_loc['Arrivals'] = pd.to_numeric(airport_loc['Arrivals'], errors='coerce')
airport_loc['Departures'] = pd.to_numeric(airport_loc['Departures'], errors='coerce')

In [17]:
airport_loc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   City Airport  10 non-null     object 
 1   Arrivals      10 non-null     int64  
 2   Departures    10 non-null     int64  
 3   Total         10 non-null     object 
 4   Latitude      10 non-null     float64
 5   Longitude     10 non-null     float64
 6   ISO Code      10 non-null     object 
dtypes: float64(2), int64(2), object(3)
memory usage: 692.0+ bytes


**Dataset Four: Dublin Airport Passenger**
<p>This dataset was obtained via the CSO and will be used to generate the following visualisations: 
<p>Passenger volume over the years.
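<p>Month names sort alphabetically by default, which would put April before January on a time axis. The sketch below, using invented values in the same 'year month' label format as the CSO file, splits the label and sorts the months by calendar position with an ordered categorical.

```python
import calendar

import pandas as pd

# Hypothetical values; only the 'Month' label format follows the CSO file.
sample = pd.DataFrame({
    'Month': ['2019 March', '2019 January', '2020 February'],
    'VALUE': [300, 100, 200],
})

# Split the combined label into a year and a month name.
sample[['Year', 'Month']] = sample['Month'].str.split(' ', n=1, expand=True)

# An ordered categorical makes months sort by calendar position, not alphabetically.
month_order = list(calendar.month_name)[1:]
sample['Month'] = pd.Categorical(sample['Month'], categories=month_order, ordered=True)

by_calendar = sample.sort_values(['Year', 'Month']).reset_index(drop=True)
print(by_calendar)
```

<p>The same ordering could be reused for any chart that places months on the x-axis.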

In [18]:
passenger = pd.read_csv('TAM08.20240105T110114.csv')

In [19]:
passenger.head()

Unnamed: 0,Statistic Label,Month,Airport,UNIT,VALUE
0,Passengers,2019 January,Dublin,Number,2054794
1,Passengers,2019 February,Dublin,Number,1993325
2,Passengers,2019 March,Dublin,Number,2432195
3,Passengers,2019 April,Dublin,Number,2789660
4,Passengers,2019 May,Dublin,Number,2965517


In [20]:
passenger.drop(columns=['Statistic Label', 'UNIT'], inplace=True)
passenger[['Year', 'Month']] = passenger['Month'].str.split(' ', n=1, expand=True)


In [21]:
passenger.tail()

Unnamed: 0,Month,Airport,VALUE,Year
53,June,Dublin,3218315,2023
54,July,Dublin,3425253,2023
55,August,Dublin,3419986,2023
56,September,Dublin,3083300,2023
57,October,Dublin,2980282,2023


In [22]:
passenger.shape

(58, 4)

**Dataset Five: Dublin Airport Delays**
<p>This dataset was obtained via the FlightLabs API and will be used to generate the following visualisations: 
<p>Delay Average per Airline;
<p>Top ten Airlines with the highest Delays;
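<p>Both delay charts come down to a grouped mean followed by a ranking. The sketch below uses invented delays to show the pattern; the column names follow the flattened API response, and the cut-off of two airlines stands in for the ten used in the dashboard.

```python
import pandas as pd

# Hypothetical delays in minutes, invented for illustration.
delays = pd.DataFrame({
    'Airline_Name': ['Lufthansa', 'Lufthansa', 'Air France', 'Finnair', 'Finnair'],
    'departure.delay': [10.0, 30.0, 45.0, None, 5.0],
})

# mean() skips missing delays by default, so the NaN row does not affect the average.
average_delay = delays.groupby('Airline_Name')['departure.delay'].mean()

# Keep the airlines with the largest averages (two here, ten in the dashboard).
top_two = average_delay.nlargest(2)
print(top_two)
```

<p>Because missing delays are ignored rather than treated as zero, an airline with few recorded delays can rank highly on the strength of a handful of flights.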

In [23]:
import os
import requests
import pandas as pd

# The access key is read from an environment variable instead of being hardcoded.
# The variable name FLIGHTLABS_ACCESS_KEY is an assumption; adjust it to your setup.
access_key = os.environ['FLIGHTLABS_ACCESS_KEY']

base_url = 'https://app.goflightlabs.com/historical'

# (start date, end date) windows of delayed departures from Dublin
date_ranges = [
    ('2023-11-21', '2023-11-30'),
    ('2023-12-01', '2023-12-10'),
    ('2023-12-11', '2023-12-21'),
]

all_responses = []

for date_from, date_to in date_ranges:
    params = {
        'access_key': access_key,
        'code': 'DUB',
        'type': 'departure',
        'date_to': date_to,
        'status': 'delayed',
    }
    response = requests.get(f'{base_url}/{date_from}', params=params)

    if response.status_code != 200:
        print(f"Request for {date_from} failed with status {response.status_code}")
        continue

    response_data = response.json()
    if 'data' in response_data:
        all_responses.extend(response_data['data'])
    else:
        print(f"No 'data' key found in response: {response_data}")

# Flatten the nested JSON records into one table and cache it on disk
df = pd.json_normalize(all_responses)

file_path = 'dash_delay.csv'
df.to_csv(file_path, index=False)
print(f"Data saved successfully to: {file_path}")

Data saved successfully to: dash_delay.csv


In [24]:
dash_delay = pd.read_csv('dash_delay.csv')

In [25]:
dash_delay.head()

Unnamed: 0,type,status,departure.iataCode,departure.icaoCode,departure.delay,departure.scheduledTime,departure.estimatedTime,departure.actualTime,departure.estimatedRunway,departure.actualRunway,...,codeshared.airline.iataCode,codeshared.airline.icaoCode,codeshared.flight.number,codeshared.flight.iataNumber,codeshared.flight.icaoNumber,arrival.actualTime,arrival.estimatedRunway,arrival.actualRunway,arrival.baggage,arrival.delay
0,departure,active,dub,eidw,8.0,2023-11-21t02:10:00.000,2023-11-21t02:20:00.000,2023-11-21t02:17:00.000,2023-11-21t02:17:00.000,2023-11-21t02:17:00.000,...,,,,,,,,,,
1,departure,active,dub,eidw,14.0,2023-11-21t03:50:00.000,2023-11-21t04:04:00.000,2023-11-21t04:04:00.000,2023-11-21t04:04:00.000,2023-11-21t04:04:00.000,...,,,,,,,,,,
2,departure,active,dub,eidw,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,2023-11-21t05:25:00.000,2023-11-21t05:25:00.000,...,lh,dlh,983.0,lh983,dlh983,,,,,
3,departure,active,dub,eidw,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,2023-11-21t05:25:00.000,2023-11-21t05:25:00.000,...,lh,dlh,983.0,lh983,dlh983,,,,,
4,departure,active,dub,eidw,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,2023-11-21t05:25:00.000,2023-11-21t05:25:00.000,...,lh,dlh,983.0,lh983,dlh983,,,,,


In [26]:
columns_to_keep = [
    'type', 'status', 'departure.iataCode', 'departure.delay', 
    'departure.scheduledTime', 'departure.estimatedTime', 'departure.actualTime',
    'codeshared.airline.icaoCode'
]
# .copy() gives an independent frame, so later in-place edits do not act on a slice
dash_delay = dash_delay[columns_to_keep].copy()


In [27]:
dash_delay.head()

Unnamed: 0,type,status,departure.iataCode,departure.delay,departure.scheduledTime,departure.estimatedTime,departure.actualTime,codeshared.airline.icaoCode
0,departure,active,dub,8.0,2023-11-21t02:10:00.000,2023-11-21t02:20:00.000,2023-11-21t02:17:00.000,
1,departure,active,dub,14.0,2023-11-21t03:50:00.000,2023-11-21t04:04:00.000,2023-11-21t04:04:00.000,
2,departure,active,dub,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh
3,departure,active,dub,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh
4,departure,active,dub,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh


In [28]:
print(dash_delay.isnull().sum())

type                              0
status                            0
departure.iataCode                0
departure.delay                 797
departure.scheduledTime           0
departure.estimatedTime         168
departure.actualTime           2640
codeshared.airline.icaoCode    8865
dtype: int64


In [29]:
# The airline code is missing in many rows; since the charts are grouped by airline, those rows are removed.
dash_delay.dropna(subset=['codeshared.airline.icaoCode'], inplace=True)


In [30]:
unique_icao_codes = dash_delay['codeshared.airline.icaoCode'].unique()
print(unique_icao_codes)


['dlh' 'klm' 'afr' 'ein' 'baw' 'qtr' 'etd' 'tra' 'aca' 'ual' 'aal' 'dal'
 'ibs' 'ice' 'log' 'sas' 'uae' 'vlg' 'thy' 'tap' 'fin' 'joc' 'pvg' 'ibe']


In [31]:
unique_airlines = {
    'dlh': 'Lufthansa',
    'klm': 'KLM Royal Dutch Airlines',
    'afr': 'Air France',
    'ein': 'Aer Lingus',
    'baw': 'British Airways',
    'qtr': 'Qatar Airways',
    'etd': 'Etihad Airways',
    'tra': 'Transavia',
    'aca': 'Air Canada',
    'ual': 'United Airlines',
    'aal': 'American Airlines',
    'dal': 'Delta Air Lines',
    'ibs': 'Iberia Express',
    'ice': 'Icelandair',
    'log': 'Loganair',
    'sas': 'Scandinavian Airlines',
    'uae': 'Emirates',
    'vlg': 'Vueling Airlines',
    'thy': 'Turkish Airlines',
    'tap': 'TAP Air Portugal',
    'fin': 'Finnair'
}
dash_delay['Airline_Name'] = dash_delay['codeshared.airline.icaoCode'].map(unique_airlines)


In [32]:
dash_delay.head()

Unnamed: 0,type,status,departure.iataCode,departure.delay,departure.scheduledTime,departure.estimatedTime,departure.actualTime,codeshared.airline.icaoCode,Airline_Name
2,departure,active,dub,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh,Lufthansa
3,departure,active,dub,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh,Lufthansa
4,departure,active,dub,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh,Lufthansa
5,departure,active,dub,10.0,2023-11-21t05:15:00.000,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh,Lufthansa
11,departure,active,dub,22.0,2023-11-21t05:55:00.000,2023-11-21t06:13:00.000,2023-11-21t06:17:00.000,klm,KLM Royal Dutch Airlines


In [33]:
dash_delay['departure.scheduledTime'] = pd.to_datetime(dash_delay['departure.scheduledTime'])

dash_delay['day_of_week'] = dash_delay['departure.scheduledTime'].dt.day_name()


In [34]:
print(dash_delay.isnull().sum())

type                              0
status                            0
departure.iataCode                0
departure.delay                 308
departure.scheduledTime           0
departure.estimatedTime          46
departure.actualTime           1247
codeshared.airline.icaoCode       0
Airline_Name                     16
day_of_week                       0
dtype: int64


In [35]:
# Rows whose code has no entry in unique_airlines ('joc', 'pvg', 'ibe') are removed
dash_delay.dropna(subset=['Airline_Name'], inplace=True)

In [36]:
dash_delay.head()

Unnamed: 0,type,status,departure.iataCode,departure.delay,departure.scheduledTime,departure.estimatedTime,departure.actualTime,codeshared.airline.icaoCode,Airline_Name,day_of_week
2,departure,active,dub,10.0,2023-11-21 05:15:00,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh,Lufthansa,Tuesday
3,departure,active,dub,10.0,2023-11-21 05:15:00,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh,Lufthansa,Tuesday
4,departure,active,dub,10.0,2023-11-21 05:15:00,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh,Lufthansa,Tuesday
5,departure,active,dub,10.0,2023-11-21 05:15:00,2023-11-21t05:15:00.000,2023-11-21t05:25:00.000,dlh,Lufthansa,Tuesday
11,departure,active,dub,22.0,2023-11-21 05:55:00,2023-11-21t06:13:00.000,2023-11-21t06:17:00.000,klm,KLM Royal Dutch Airlines,Tuesday


In [37]:
dash_delay.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7768 entries, 2 to 16629
Data columns (total 10 columns):
 #   Column                       Non-Null Count  Dtype         
---  ------                       --------------  -----         
 0   type                         7768 non-null   object        
 1   status                       7768 non-null   object        
 2   departure.iataCode           7768 non-null   object        
 3   departure.delay              7466 non-null   float64       
 4   departure.scheduledTime      7768 non-null   datetime64[ns]
 5   departure.estimatedTime      7722 non-null   object        
 6   departure.actualTime         6528 non-null   object        
 7   codeshared.airline.icaoCode  7768 non-null   object        
 8   Airline_Name                 7768 non-null   object        
 9   day_of_week                  7768 non-null   object        
dtypes: datetime64[ns](1), float64(1), object(8)
memory usage: 667.6+ KB


In [38]:
#pip install dash

In [39]:
import io
import base64

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import plotly.express as px
import plotly.graph_objs as go
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
from wordcloud import WordCloud


In [40]:
#Dash App
app = dash.Dash(__name__)

**Dash Components**

In [41]:
#App Layout
app.layout = html.Div([
    html.H1("Dublin Airport Analytics Dashboard", style={'textAlign': 'center'}),
    
#Components: Visualisation One
    html.Label("Select Month"),
    dcc.Dropdown(
        id='month-dropdown',
        options=[
            {'label': month, 'value': month} for month in queue_time['month'].unique()
        ],
        value=queue_time['month'].unique()[0]  
    ),
    html.Label("Select Day of the Week"),
    dcc.Dropdown(
        id='day-dropdown',
        options=[
            {'label': day, 'value': day} for day in queue_time['day_of_week'].unique()
        ],
        value=queue_time['day_of_week'].unique()[0]  
    ),
    dcc.Graph(id='bar-chart'),
    
#Components: Visualisation Two and Three
    html.Div([
        html.Div([
            dcc.Graph(id='donut-chart'),
        ], style={'width': '50%', 'display': 'inline-block', 'vertical-align': 'top'}),
        
        html.Div([
            html.Button('Arrivals', id='btn-arrivals', n_clicks=0),
            html.Button('Departures', id='btn-departures', n_clicks=0),
            dcc.Graph(id='flight-map', style={'width': '100%', 'display': 'inline-block'}),
        ], style={'width': '50%', 'display': 'inline-block', 'vertical-align': 'top'}),
    ]),

#Components: Visualisation Four and Five
    html.Div([
        html.Div([
            html.Img(id='word-cloud'),
        ], style={'width': '50%', 'display': 'inline-block', 'vertical-align': 'top'}),
        
        html.Div([
            dcc.Graph(id='bar-chart-volume'),
        ], style={'width': '50%', 'display': 'inline-block', 'vertical-align': 'top'}),
    ]),

#Components: Visualisation Six 
    html.Div([
        dcc.Graph(id='passenger-volume'),
        html.Div(id='index', style={'display': 'none'}, children=0),
        dcc.Interval(
            id='interval-component',
            interval=2 * 1000, 
            n_intervals=0
        )  
    ]),
    

#Components: Visualisation Seven

  
    html.Label("Top Ten Airlines with Highest Delays (in minutes)"),
    dcc.Graph(id='top-ten-delays-bar-chart'),

     
#Components: Visualisation Eight
    html.Label("Select Airline"),
    dcc.Dropdown(
        id='airline-dropdown',
        options=[
            {'label': airline, 'value': airline} for airline in dash_delay['Airline_Name'].unique()
        ],
        value=dash_delay['Airline_Name'].unique()[0]
    ),
    dcc.Graph(id='average-delay-per-day-bar-chart'),
    
    
])

**Dash Callbacks**

In [42]:
#Callback: Visualisation One (Bar Chart)

@app.callback(
    Output('bar-chart', 'figure'),
    [Input('month-dropdown', 'value'),
     Input('day-dropdown', 'value')]
)
def update_graph(selected_month, selected_day):
    filtered_df = queue_time[(queue_time['month'] == selected_month) & (queue_time['day_of_week'] == selected_day)]
    avg_wait_times = filtered_df.groupby('hour')['minWaitTime'].mean().reset_index()

    fig = px.bar(avg_wait_times, x='hour', y='minWaitTime', 
                 labels={'hour': 'Time of the Day (hour)', 'minWaitTime': 'Average Wait Time (minutes)'},
                 title=f'Average Wait Time per Hour for {selected_day} in {selected_month}',
                 color_discrete_sequence=['#C8553D']) 

    fig.update_xaxes(range=[0, 23], dtick=1)
    fig.update_layout(title_x=0.5)

    return fig


In [43]:
#Callback: Visualisation Two

@app.callback(
    Output('donut-chart', 'figure'),
   [Input('month-dropdown', 'value')]  
)

#Visualization Two: Donut Chart

def update_donut_chart(selected_month):
    # selected_month only triggers the callback; the reviews are not filtered by month
    avg_rating = reviews['Rating'].mean()

    rating_percentages = reviews['Rating'].value_counts(normalize=True) * 100
    rating_percentages = rating_percentages.sort_index()

    labels = [str(rating) for rating in rating_percentages.index]
    values = rating_percentages.values

    custom_colors = ['#FFD5C2', '#F28F3B', '#C8553D', '#2D3047', '#93B7BE', '#F28F3B', '#C8553D', '#2D3047', '#93B7BE', '#FFD5C2']

    fig = px.pie(values=values, names=labels, hole=0.4)
    fig.update_traces(textposition='inside', textinfo='percent+label',
                      marker=dict(colors=custom_colors))
    
    fig.update_layout(
        title='Average Review Rating',
        annotations=[
            dict(text=f'Average: {avg_rating:.2f}', x=0.5, y=0.5, font_size=10, showarrow=False)
        ],
        
        showlegend=False,  
        height=500,  
        width=600, 
        title_x=0.5,
        title_y=0.95,
        title_xanchor='center',
        title_yanchor='top'
        
    )

    return fig


In [44]:
#Callback: Visualization Three

@app.callback(
    Output('flight-map', 'figure'),
    [Input('btn-arrivals', 'n_clicks'), Input('btn-departures', 'n_clicks')]
)


#Visualization Three: Map Chart

def update_map(n_clicks_arrivals, n_clicks_departures):
    ctx = dash.callback_context
    if not ctx.triggered:
        button_id = None
    else:
        button_id = ctx.triggered[0]['prop_id'].split('.')[0]

    if button_id == 'btn-arrivals':
        value = 'Arrivals'
    elif button_id == 'btn-departures':
        value = 'Departures'
    else:
        value = 'Arrivals'

    fig = px.scatter_geo(airport_loc, lat='Latitude', lon='Longitude', 
                         hover_name='City Airport', text=value, size=value,
                         projection='natural earth', 
                         title=f'Flight {value} by Location',
                         locations=airport_loc['ISO Code'], 
                         locationmode='ISO-3')  

    fig.update_geos(projection_type="orthographic", showcoastlines=True, coastlinecolor="#2D3047")
    fig.update_layout(height=400, margin={"r": 20, "t": 40, "l": 20, "b": 20})

    fig.update_traces(marker=dict(color='#2D3047'))

    fig.update_geos(
        visible=False,  
        projection_type="orthographic",  
        showland=True,
        landcolor='#93B7BE', 
        showocean=True, 
        oceancolor="#FFFFFF"  
    )
    
    return fig

In [45]:
#Callback: Visualization Four

text_data = ' '.join(reviews['Review'])

def generate_wordcloud(text_data, title):
        
    custom_colors = ['#F28F3B', '#C8553D', '#2D3047', '#93B7BE']
    colormap = ListedColormap(custom_colors)
    wordcloud = WordCloud(width=900, height=300, background_color='white', colormap=colormap).generate(text_data)


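    #Word Cloud
    fig = plt.figure(figsize=(6, 6), facecolor=None)
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title, fontsize=10, ha='center')

    plt.tight_layout(pad=0)

    # Render the figure to an in-memory PNG and encode it for the image src attribute
    img = io.BytesIO()
    plt.savefig(img, format='png')
    plt.close(fig)  # release the figure so repeated callbacks do not accumulate open figures
    img.seek(0)
    img_base64 = base64.b64encode(img.getvalue()).decode()

    return f"data:image/png;base64,{img_base64}"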

@app.callback(
    Output('word-cloud', 'src'),
    [Input('month-dropdown', 'value')]
)


#Visualization Four: Word Cloud

def update_wordcloud(selected_month):
    # selected_month only triggers the callback; the word cloud uses all reviews
    title = ""
    return generate_wordcloud(text_data, title)
 


In [46]:
#Callback: Visualisation Five 
@app.callback(
    Output('bar-chart-volume', 'figure'),
    [Input('interval-component', 'n_intervals')]
)
def update_bar_chart_volume(n_intervals):
    age_range = ['18 - 24', '25 - 34', '35 - 44', '45 - 54', '55 - 64', '65+']
    number_of_passengers = [274, 1193, 1064, 825, 519, 278]
    percentages = [6.6, 28.7, 25.6, 19.9, 12.5, 6.7]

    bar_labels = [f'{perc}%' for perc in percentages]

    # Custom colors for each bar
    custom_colors = ['#588B8B', '#FFD5C2', '#F28F3B', '#93B7BE', '#C8553D', '#2D3047', '#93B7BE']

    bar_data = []
    for i, (age, passengers, label, color) in enumerate(zip(age_range, number_of_passengers, bar_labels, custom_colors)):
        bar_data.append(
            go.Bar(
                x=[age],
                y=[passengers],
                text=[label],
                name=age,
                marker=dict(color=color),
                showlegend=False  
            )
        )

    bar_layout_volume = go.Layout(
        title='Number of Passengers by Age Range',
        xaxis=dict(title='Age Range'),
        yaxis=dict(title='Number of Passengers'),
        barmode='group' 
    )

    fig_volume = go.Figure(data=bar_data, layout=bar_layout_volume)
    return fig_volume




In [47]:
#Callback: Visualization Six

years = sorted(passenger['Year'].unique())  

@app.callback(
    Output('passenger-volume', 'figure'),
    [Input('interval-component', 'n_intervals')]
)


#Visualisation Six: Area Chart

def update_passenger_volume(n_intervals):
    year_index = n_intervals % len(years)

    year_data = passenger[passenger['Year'] == years[year_index]]

    colors = ['#588B8B', '#FFD5C2', '#F28F3B', '#93B7BE']

    fig = px.area(year_data, x='Month', y='VALUE',
                  title=f'Passenger Volume for {years[year_index]}',
                  labels={'VALUE': '', 'Month': ''},
                  color_discrete_sequence=[colors[year_index % len(colors)]])
    fig.update_layout(title_x=0.5)

    return fig


In [48]:
#Callback: Visualization Seven
@app.callback(
    Output('top-ten-delays-bar-chart', 'figure'),
    [Input('month-dropdown', 'value'),
     Input('day-dropdown', 'value')]
)
def update_top_ten_average_delays(selected_month, selected_day):
    # The dropdown values only trigger the callback; the delays are not filtered by them
    avg_delays = dash_delay.groupby('Airline_Name')['departure.delay'].mean().nlargest(10).reset_index()
    # One label per airline; passing it to px.bar keeps each label on its own bar,
    # because colouring by airline creates a separate trace for every bar
    avg_delays['delay_label'] = avg_delays['departure.delay'].round(2)
    
    # Define the color sequence with repeated colors
    custom_colors = ['#FFD5C2', '#F28F3B', '#C8553D', '#2D3047', '#93B7BE']
    colors = custom_colors * 2
    
    fig = px.bar(avg_delays, x='Airline_Name', y='departure.delay', text='delay_label',
                 labels={'Airline_Name': '', 'departure.delay': 'Average Delay (minutes)'},
                 title='Top Ten Airlines with Highest Average Delays',
                 color='Airline_Name', color_discrete_sequence=colors)
    
    fig.update_traces(textposition='inside', showlegend=False)
    
    return fig




In [49]:
# Callback: Visualization Eight
@app.callback(
    Output('average-delay-per-day-bar-chart', 'figure'),
    [Input('airline-dropdown', 'value')]
)
def update_average_delay_per_day(selected_airline):
    airline_data = dash_delay[dash_delay['Airline_Name'] == selected_airline]

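    # Order the weekdays chronologically; groupby alone would sort them alphabetically
    day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    avg_delay_per_day = (airline_data.groupby('day_of_week')['departure.delay'].mean()
                         .reindex(day_order).rename_axis('day_of_week').reset_index())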
    
    fig = px.line(avg_delay_per_day, x='day_of_week', y='departure.delay',
                  labels={'day_of_week': 'Day of the Week', 'departure.delay': 'Average Delay (minutes)'},
                  title=f'Average Delay per Day for {selected_airline}',
                  markers=True)  
    
    fig.update_traces(line=dict(color='#2D3047', width=3))  
    
    return fig

**Deployment**

In [50]:
if __name__ == '__main__':
    app.run_server(debug=True)

**References**

<p>https://dash.plotly.com/layout
<p>https://community.plotly.com/t/run-dash-locally/25876
<p>https://plotly.com/examples/dashboards/
<p>https://www.youtube.com/watch?v=WOWVat5BgM4&ab_channel=CharmingData
<p>https://www.youtube.com/watch?v=74mHRPzqK9g&ab_channel=Plotly
<p>https://www.youtube.com/watch?v=pGMvvq7R1IM&ab_channel=RealPython