<div>
   <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/Jupyter_Notebook_Title_Screen.png" alt="Title" align="middle"/>
</div>

# Introduction
This hands-on lab is designed to show you how to build Python applications that retrieve data from a Db2 database using <b>Db2 Magic Commands</b> and analyze that data using <b>IBM Watson Studio</b>. Exercises in this lab are designed to obtain data from any Db2 database, regardless of whether the database is on-premises (<b>created using IBM Db2 Community Edition</b>) or in the Cloud (<b>created under the Db2 "Lite" service on IBM Cloud</b>).

## Scenario
<b>Global Travel Associates</b> (GTA) has been receiving feedback from their air travel clients within the United States concerning the increase in flight delays during the return to travel in 2022 after the pandemic. GTA has learned that the <i>U.S. Department of Transportation, Bureau of Transportation Statistics</i> has a data set available that contains flight delay information for the United States from 2019 through September 2022. GTA will perform analysis on this data using a Jupyter Notebook to determine if flight delays have increased in 2022 over prior years as their customers claim. 

GTA wants to keep the cost of this analysis to a minimum, so the current data (2019 through September 2022) will be stored in a fully managed database on IBM Cloud that uses the free Db2 on Cloud Lite plan.

### How this lab is organized
The exercises found in this hands-on lab are organized as follows:
<br><br>
<dl>
    <dt><b>Section 1.</b> Prepare the lab environment</dt>
    <ul>
        <dd><b>Step 1.</b> Download and install the appropriate software packages</dd>
        <dd><b>Step 2.</b> Set up the Jupyter Notebook environment</dd>
        <dd><b>Step 3.</b> Assign values to application variables that will be used to establish a database connection</dd>
    </ul>
    <dt><b>Section 2.</b> Interact with the Db2 database</dt>
    <ul>
        <dd><b>Step&ensp; 1.</b> Establish a database connection with a <b>Db2 Magic Command</b></dd>
        <dd><b>Step&ensp; 2.</b> Describe a table named AIRLINE_DELAY_CAUSE</dd>
        <dd><b>Step&ensp; 3.</b> Establish a database connection with the <b>ibm_db_dbi.connect() API</b></dd>
        <dd><b>Step&ensp; 4.</b> Check the data in the AIRLINE_DELAY_CAUSE table using <b>Db2 Magic Commands</b> and pandas DataFrames</dd>
    </ul>
    <dt><b>Section 3.</b> Data analysis</dt>
    <ul>
        <dd><b>Analysis 1.</b> Comparing total COVID-era delayed flights by reason</dd>
        <dd><b>Analysis 2.</b> Creating a normalized view of COVID-era flight delay data</dd>
        <dd><b>Analysis 3.</b> Understanding flight delays in the context of historical data</dd>
    </ul>
</dl>

### Using IBM Db2 Community Edition with this hands-on lab
While this hands-on lab is set up to be run with the <b>Db2 "Lite" service</b> on IBM Cloud, this Jupyter Notebook can be used with <b>IBM Db2 Community Edition</b> as well. IBM Db2 Community Edition is a special no-charge, full-featured version of Db2 that enables data professionals to develop and deploy applications using any of the features and functionality found in the latest release of IBM Db2. There are no limits on how long IBM Db2 Community Edition can be used, and unlike with the Db2 "Lite" service on IBM Cloud, there are no limits on the size of databases that can be created. However, there are limits on the number of cores and amount of memory supported: IBM Db2 Community Edition can be used with up to four processor cores and no more than 16 gigabytes (GB) of random access memory (RAM).

<div class="alert alert-block alert-warning">
    <b>IMPORTANT:</b> If you do not complete this lab in a single sitting, you must re-run all the exercises in <b>Section 1</b> and the first exercise in <b>Section 2</b> before you attempt to pick back up where you left off!
</div>

# Section 1. Prepare the lab environment

## Step 1. Download and install the appropriate software packages
### Overview:
Before you can perform some of the exercises in this lab, you must ensure that the appropriate Python software packages have been installed.
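
If you want to see which packages are already available before running the install cell, the check can be sketched as follows. This is a minimal illustration and not part of the lab itself: the helper name `find_missing_modules` is hypothetical, and the list of import names is an assumption about how the packages installed in this step are imported.

```python
import importlib.util

def find_missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec() returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

# Import names assumed to match the packages this lab installs
lab_modules = ["ibm_db", "ibm_db_sa", "pandasql", "qgrid", "ipywidgets", "seaborn"]
print("Modules not yet installed:", find_missing_modules(lab_modules))
```

An empty list suggests every package can already be imported. Running the install cell is still harmless in that case.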

### Execute the code:
The code in the next "cell" installs all of the packages needed to perform the exercises in this lab. It also downloads and installs a Jupyter Notebook named <b>db2.ipynb</b> that contains the source code behind the <b>Db2 Magic Commands</b>.
<ol>
   <li>Select the code cell below and carefully read through the comments (i.e., the text that begins with a <b>#</b> character). This will help you understand the actions the code performs.</li><br>
    <li><div style="display:inline-block; vertical-align:middle;">
        When you are ready, click on the
    </div>
    <div style="display:inline-block; vertical-align:middle;">
        <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/CP4D_JN_Run.png" alt="Run" width="60px" align="middle"/>
    </div>
    <div style="display:inline-block; vertical-align:middle;">
       button or press <b>Shift+Enter</b> to execute the code.
    </div></li>
</ol>

In [None]:
#----------------------------------------------------------------------------------------------#
# Install The Appropriate Software Packages                                                    #
#                                                                                              #
# NOTE: The code in this section only needs to be executed once in a runtime environment and   #
# the packages may have already been installed. If so, it is not harmful to attempt to install #
# the packages identified again, as subsequent attempts will simply state that the package     #
# requirement has already been satisfied.                                                      #
#----------------------------------------------------------------------------------------------#

#----------------------------------------------------------------------------------------------#
# Download And Install The ibm_db Driver Package                                               #
#----------------------------------------------------------------------------------------------#
print()
!python3 -m pip install ibm_db

#----------------------------------------------------------------------------------------------#
# Download And Install The ibm_db_sa (Db2 for SQLAlchemy) Driver Package (An Open-Source SQL   #
# Toolkit And Object-Relational Mapper For Python)                                             #
#----------------------------------------------------------------------------------------------#
print()
!python3 -m pip install ibm_db_sa

#----------------------------------------------------------------------------------------------#
# Download And Install The pandasql Package (An Open-Source Package That Enables Users         #
# To Query pandas DataFrames Using SQL Syntax)                                                 #
#----------------------------------------------------------------------------------------------#
print()
!python3 -m pip install pandasql

#----------------------------------------------------------------------------------------------#
# Download And Install The qgrid Package (An Open-Source Package That Uses SlickGrid To        #
# Sort, Filter, And Manipulate DataFrames In Jupyter Notebooks)                                #
#----------------------------------------------------------------------------------------------#
print()
!python3 -m pip install qgrid

#----------------------------------------------------------------------------------------------#
# Download And Install The ipywidgets Package (An Open-Source Package That Is Needed           #
# For qgrid)                                                                                   #
#----------------------------------------------------------------------------------------------#
print()
!python3 -m pip install 'ipywidgets~=7.0'

#----------------------------------------------------------------------------------------------#
# Download And Install The seaborn Package (An Open-Source Package That Provides A             #
# High-Level Interface For Drawing Attractive And Informative Statistical Graphics)            #
#----------------------------------------------------------------------------------------------#
print()
!python3 -m pip install seaborn

#----------------------------------------------------------------------------------------------#
# Download And Install The Jupyter Notebook db2.ipynb, Which Contains The Code Needed To       #
# Support The SQL Magic Commands                                                               #
#----------------------------------------------------------------------------------------------#
print()
!wget https://raw.githubusercontent.com/IBM/db2-jupyter/master/db2.ipynb -O db2.ipynb

#----------------------------------------------------------------------------------------------#
# Display A Status Message Indicating This Work Is Complete                                    #
#----------------------------------------------------------------------------------------------#
print("All software packages needed have been installed!\n")

## Step 2. Set up the Jupyter Notebook environment
### Overview:
Before you can begin interacting with an IBM Db2 database using Python or Jupyter Notebook, there are some basic steps you must perform. These steps include:
<ol>
    <li>Loading (importing) the <strong>ibm_db</strong> driver into the Python application or Jupyter Notebook.</li>
    <li>Loading any additional external Python modules needed into the Python application or Jupyter Notebook.</li> 
    <li>Defining any variables that will be used to supply information to, or obtain information from application programming interfaces (APIs) in the <strong>ibm_db</strong> driver. (These APIs are used to do things like establish a connection to a database, submit a query for execution, retrieve query results, and so forth.)</li>
</ol>

### Execute the code:
The code in the next "cell" performs all but the last of the tasks just identified — the task of defining variables will be performed, when necessary, throughout the remaining exercises in this lab.
<ol>
   <li>Select the code cell below and carefully read through the comments (i.e., the text that begins with a <b>#</b> character). This will help you understand the actions the code performs.</li><br>
    <li><div style="display:inline-block; vertical-align:middle;">
        When you are ready, click on the
    </div>
    <div style="display:inline-block; vertical-align:middle;">
        <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/CP4D_JN_Run.png" alt="Run" width="60px" align="middle"/>
    </div>
    <div style="display:inline-block; vertical-align:middle;">
       button or press <b>Shift+Enter</b> to execute the code.
    </div></li>
</ol>

In [None]:
#----------------------------------------------------------------------------------------------#
# Set Up The Jupyter Notebook Environment                                                      #
#----------------------------------------------------------------------------------------------#

#----------------------------------------------------------------------------------------------#
# Display A Status Message Indicating Environment Initialization Is Being Done                 #
#----------------------------------------------------------------------------------------------#
print("\nNecessary initialization work is starting.")

#----------------------------------------------------------------------------------------------#
# Load The Appropriate Python Modules                                                          #
#----------------------------------------------------------------------------------------------#
import sys                        # Provides Information About Python Interpreter Constants,
                                  # Functions, And Methods
import os                         # Provides The Ability To Interact With The Underlying
                                  # Operating System    
import types                      # Contains Type Objects For All Object Types (INTEGER, FLOAT, 
                                  # STRING, And So On) Defined By The Standard Interpreter
import datetime                   # Provides Classes For Manipulating Dates And Times
import warnings                   # Used To Control Whether Warnings Are Ignored, Displayed, Or
                                  # Turned Into Errors
from io import StringIO           # Implements A File-Like Class That Reads And Writes A
                                  # String Buffer (i.e., A Memory File)
from IPython import get_ipython   # Simple Function That Can Be Called To Get The Current
                                  # Interactive Shell Instance
import pandas as pd               # Provides An Open-Source Data Analysis Library (Built On
                                  # Top Of Python) That Is Used To Work With Data
                                  # "pd" Is A Common Alias For This Library
import numpy as np                # An Open-Source Library For Python That Provides Support For
                                  # Large, Multi-Dimensional Arrays And Matrices, Along With A
                                  # Collection Of High-Level Mathematical Functions To Operate
                                  # On These Arrays
                                  # "np" Is A Common Alias For This Library
import matplotlib.pyplot as plt   # Provides An Object-Oriented API For Embedding Plots Into
                                  # Applications Using General-Purpose GUI Toolkits
                                  # "plt" Is A Common Alias For This Library
import seaborn as sns             # Provides A High-Level Interface For Drawing Attractive
                                  # And Informative Statistical Graphics
from pandasql import sqldf        # Provides A Simple Wrapper To Run SQL (SQLite) Queries On
                                  # pandas.DataFrame Objects
import ibm_db                     # Contains The APIs Needed To Work With Db2 Databases
import ibm_db_dbi                 # Contains The APIs Needed To Work With Db2 Databases 
                                  # Using The Python Database API Specification 2.0 

#----------------------------------------------------------------------------------------------#
# Define A Python Class Named ipynb_Exit()                                                     #
#----------------------------------------------------------------------------------------------#
#  CLASS NAME:  ipynb_Exit()                                                                   #
#  PURPOSE:     This class contains the programming logic needed to allow Python "exit()"      #
#               functionality to work without raising an error or stopping the Jupyter         #
#               Notebook kernel in the event the exit() function is called.                    #
#----------------------------------------------------------------------------------------------#
class ipynb_Exit(SystemExit):
    """Exit Exception for IPython. Exception Temporarily Redirects stderr To Buffer."""

    #------------------------------------------------------------------------------------------#
    #  FUNCTION NAME:  __init__()                                                              #
    #  PURPOSE:        This method initializes an instance of the ipynb_Exit class.            #
    #------------------------------------------------------------------------------------------#
    def __init__(self):
        sys.stderr = StringIO()      # Redirect sys.stderr to a StringIO (memory buffer) object.

    #------------------------------------------------------------------------------------------#
    #  FUNCTION NAME:  __del__()                                                               #
    #  PURPOSE:        This method cleans up when an instance of the ipynb_Exit class is       #
    #                  deleted.                                                                #
    #------------------------------------------------------------------------------------------#
    def __del__(self):
        sys.stderr = sys.__stderr__  # Restore sys.stderr to the original values it had at the
                                     # start of the program.    

#----------------------------------------------------------------------------------------------#
# Define A Python Function Named customExit()                                                  #
#----------------------------------------------------------------------------------------------#
#  FUNCTION:  customExit()                                                                     #
#  PURPOSE:   This function is used to define a customized exit process.                       #
#----------------------------------------------------------------------------------------------#
def customExit(returnCode=0):
    if returnCode == 0:
        ipynb_Exit()
    else:
        raise ipynb_Exit

#----------------------------------------------------------------------------------------------#
# If An IPython Application (i.e., A Jupyter Notebook) Calls The "exit()" Function, Call A     #
# Customized Exit Routine So The Jupyter Notebook Will Not Stop Running - Otherwise, Call The  #
# Default Exit Routine                                                                         #
#----------------------------------------------------------------------------------------------#
if get_ipython():
    exit = customExit                # Rebind To The Custom Exit Function
else:
    exit = exit                      # Just Call The Exit Function

#----------------------------------------------------------------------------------------------#
# Tell This Jupyter Notebook To Ignore Warnings                                                #
#----------------------------------------------------------------------------------------------#
warnings.filterwarnings('ignore')
print()
print ("Configuring the environment to ignore warnings.")

#----------------------------------------------------------------------------------------------#
# Display The Version Of pandas Being Used                                                     #
#----------------------------------------------------------------------------------------------#
print()
print ("Using pandas version " + pd.__version__ + ".")

#----------------------------------------------------------------------------------------------#
# Run The db2.ipynb Jupyter Notebook To Get SQL Magic Command Functionality                    #
#----------------------------------------------------------------------------------------------#
print()
print("Setting up SQL Magic Command functionality.")
%run db2.ipynb

#----------------------------------------------------------------------------------------------#
# Display A Status Message Indicating This Work Is Complete                                    #
#----------------------------------------------------------------------------------------------#
print("\nAll initialization work is complete!\n")

## Step 3. Assign values to application variables that will be used to establish a database connection
### Overview:
Before operations can be performed against an IBM Db2 database, a connection to the database must first be established. To do this, you need information about the database environment such as the database's name or alias, the host name or IP address of the database server, and the port number Db2 uses for TCP/IP communications. You also need appropriate authorization credentials, which typically consist of a user or authentication ID and a corresponding password. Instructions on how to obtain this information are provided in the main guide for this lab.   
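
Because a connection attempt fails if any of these values is missing, it can help to confirm that no placeholder text remains before you try to connect. The following is a minimal sketch under stated assumptions: the helper name `find_unset_settings` is hypothetical, the sample values (including the port number) are illustrative only, and the check relies on the lab's template placeholders beginning with the word "replace".

```python
def find_unset_settings(settings):
    """Return the names of settings that are empty or still hold a placeholder."""
    unset = []
    for name, value in settings.items():
        text = str(value).strip()
        # The lab's template cells use placeholder values that start with "replace"
        if not text or text.lower().startswith("replace"):
            unset.append(name)
    return unset

# Illustrative values only; substitute the variables you assign in Step 3
example_settings = {
    "dbName": "bludb",
    "hostName": "replace-with-your-hostname",
    "portNum": "50001",
    "userID": "replace-with-your-userID",
    "passWord": "replace-with-your-password",
}
print("Settings that still need values:", find_unset_settings(example_settings))
```

In a notebook, the same function could be given a dictionary built from the `dbName`, `hostName`, `portNum`, `userID`, and `passWord` variables after the appropriate Step 3 cell has been executed.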

### Execute the code:
Once you have collected the information needed, perform these steps to assign it to the application variables that will be used later to establish a database connection:<br>
<ol>
    <li>
        <div>Select the appropriate code cell below and assign values to the <code>dbName</code>, <code>hostName</code>, <code>portNum</code>, <code>userID</code>, and <code>passWord</code> application variables. If the database you are using is provided through the <b>Db2 "Lite" service on IBM Cloud</b>, modify the code in the cell immediately below the heading <b>Step 3a: Connect to a remote, Db2 on Cloud database</b>. On the other hand, if you are using a local database that was created with <b>IBM Db2 Community Edition</b> (or some other Db2 Edition), modify the code in the cell immediately below the heading <b>Step 3b: Connect to a local, on-premises Db2 database</b>.
        </div>
    </li><br>
    <li>
        <div style="display:inline-block; vertical-align:middle;">
            Click on the </div>
        <div style="display:inline-block; vertical-align:middle;">
            <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/CP4D_JN_Save.png" alt="Save" width="30px" align="middle"/></div>
        <div style="display:inline-block; vertical-align:middle;">
            button or press <b>Ctrl+s</b> (or <b>command+s</b> on Mac) to save your changes.</div>
    </li><br>
    <li>
        <div style="display:inline-block; vertical-align:middle;">
            When you are ready, click on the
        </div>
        <div style="display:inline-block; vertical-align:middle;">
            <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/CP4D_JN_Run.png" alt="Run" width="60px" align="middle"/>
        </div>
        <div style="display:inline-block; vertical-align:middle;">
            button or press <b>Shift+Enter</b> to execute the code.
        </div>
    </li>
</ol>

<div class="alert alert-block alert-danger">
    <b>IMPORTANT:</b> Whenever new values are assigned to the variables in the cell below, the code must be saved and <i><u>re-executed</u></i>. Otherwise, the code in the code cells that follow may not execute correctly.
</div>

## Step 3a: Connect to a remote, Db2 on Cloud database

In [None]:
#----------------------------------------------------------------------------------------------#
# Initialize All User-Specific Connection Variables - Db2 on Cloud Database                    #
#   IMPORTANT: UPDATE WITH VALUES FROM YOUR OWN ENVIRONMENT, AS PER LAB INSTRUCTIONS.          #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
dbName = "bludb"
hostName = "replace-with-your-hostname"
portNum = "replace-with-your-port-number"
userID = "replace-with-your-userID"
passWord = "replace-with-your-password"
secureComm = True              # Use SSL (Secure Sockets Layer) Communication

# Display A Status Message Indicating This Work Is Complete
print("\nUser-specific connection variable initialization work complete!\n")

## Step 3b: Connect to a local, on-premises Db2 database

In [None]:
#----------------------------------------------------------------------------------------------#
# Initialize All User-Specific Connection Variables - Local Db2 Database                       #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
dbName = "replace-with-your-db-name"       # The Alias For The Db2 Database
userID = "replace-with-your-userID"        # The Instance User ID
passWord = "replace-with-your-password"    # The Password For The Instance User ID
hostName = "replace-with-your-hostname"    # The Host Name
portNum = "replace-with-your-port-number"  # The TCP/IP Port Number That Receives Db2 Connections
                                           # On Db2 11.5.5 and older, the default port number
                                           # is 50000 (Non-SSL) or 50001 (SSL).
                                           # On Db2 11.5.6 and newer, the default port number
                                           # is 25000 (Non-SSL) or 25001 (SSL).
secureComm = False              # Do NOT Use SSL (Secure Sockets Layer) Communication

# Display A Status Message Indicating This Work Is Complete
print("\nUser-specific connection variable initialization work complete!\n")

# Section 2. Interact with the Db2 database

## Step 1: Establish a database connection using a Db2 Magic Command
### Overview:
As mentioned earlier, before anything can be done with a Db2 database, a connection to the database must first be established. With <b>Db2 Magic Commands</b>, the Db2 <b>CONNECT</b> command can be used to perform this task. The syntax for the actual <b>Db2 Magic Command</b> used has the following format:

<b>%sql CONNECT TO {<i>dbName</i>} USER {<i>userName</i>} USING {<i>passWord</i>} HOST {<i>hostName</i>} PORT {<i>portNum</i>} SSL TRUE</b>

where:
<table style="border:none; border-collapse:collapse; font-size:14px; width:100%;">
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>dbName</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The name of the Db2 server or database the connection is to be made to.</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>userName</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The user name/authorization ID that is to be used for authentication when the connection is established.</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style=" width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>passWord</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The password that corresponds to the user name/authorization ID specified in the <b><i>userName</i></b> parameter.</td>
    </tr>    
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>hostName</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The host name or IP address of the Db2 server — as it is known to the TCP/IP network — the connection is to be made to.</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>portNum</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The port number that receives Db2 connections on the server the connection is to be made to.</td>
    </tr>   
</table>

The <b>%sql</b> text preceding the <b>CONNECT</b> command indicates the <b>Db2 Magic Commands</b> are in use. These commands are designed to make it easy for individuals who do not have experience coding in Python to interact with Db2 databases using a Jupyter Notebook.

<div class="alert alert-block alert-success">
    <b>NOTE:</b> When working with a local Db2 database, a Db2 instance must be up and running before a connection can be established. A Db2 instance can be started by executing the <b>START DATABASE MANAGER</b> (<b>db2start</b>) command. (<i>You can learn more about the START DATABASE MANAGER command here:</i> <a href="https://www.ibm.com/docs/en/db2/11.5?topic=commands-start-database-manager">START DATABASE MANAGER command</a>.) With the Db2 "Lite" service on IBM Cloud, a Db2 instance is always running.<br><br>
    In addition, if Transport Layer Security (TLS), formerly known as Secure Sockets Layer (SSL), is not being used to establish a connection to a local Db2 database, the "<b>SSL TRUE</b>" clause should not be included in the command used.
</div>
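
To make the role of each clause concrete, the command text can be assembled in plain Python as sketched here. This is an illustration only: the function name `build_connect_command` and the sample values are hypothetical, and the function does not contact a database. It simply shows how the optional "SSL TRUE" clause is appended when a secure connection is requested.

```python
def build_connect_command(db_name, user_name, password, host_name, port_num, use_ssl):
    """Assemble the text of a CONNECT command in the format shown above."""
    command = (
        f"CONNECT TO {db_name} USER {user_name} USING {password} "
        f"HOST {host_name} PORT {port_num}"
    )
    # Append the SSL clause only when a secure connection is requested
    if use_ssl:
        command += " SSL TRUE"
    return command

# Placeholder values for illustration; the password is masked before printing
print(build_connect_command("bludb", "user1", "********", "db.example.com", 50001, True))
```

Calling the function with `use_ssl` set to `False` yields the same command without the trailing clause, which matches the form described for local connections that do not use TLS.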

### Execute the code:
The code in the next cell builds a <b>CONNECT</b> command in the format just described, using the values assigned to application variables earlier. It then attempts to establish a connection to the Db2 database specified using a <b>Db2 Magic Command</b>.
<ol>
    <li>Select the code cell below and carefully read through the comments. This will help you understand the actions the code performs.</li><br>
    <li>
        <div style="display:inline-block; vertical-align:middle;">
            When you are ready, click on the
        </div>
        <div style="display:inline-block; vertical-align:middle;">
            <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/CP4D_JN_Run.png" alt="Run" width="60px" align="middle"/>
        </div>
        <div style="display:inline-block; vertical-align:middle;">
            button or press <b>Shift+Enter</b> to execute the code.
        </div>
        (Output containing the phrase "<b>Connection successful</b>" indicates a database connection was established.)
    </li>
</ol>

<div class="alert alert-block alert-danger">
    <b>IMPORTANT:</b> Make sure the correct Db2 server/database name, host name, port number, user ID, and password have been assigned to the appropriate variables <i><u>before</u></i> executing the code in this cell.
</div>

In [None]:
#----------------------------------------------------------------------------------------------#
# Establish A Db2 Database Connection Using A Db2 Magic Command                                #
#----------------------------------------------------------------------------------------------#

# Display A Status Message Indicating An Attempt To Establish A Connection To A Db2 Database
# Is About To Be Made
print("\nConnecting to the \'" + dbName + "\' database ... ", end="")

# If The Connection Requires Secure Sockets Layer (SSL) Communications, Ensure The "SSL TRUE" 
# Option Is Provided
if secureComm:
    %sql CONNECT TO {dbName} USER {userID} USING {passWord} HOST {hostName} PORT {portNum} SSL TRUE
else:
    %sql CONNECT TO {dbName} USER {userID} USING {passWord} HOST {hostName} PORT {portNum}

## Step 2: Obtain information about the AIRLINE_DELAY_CAUSE table
### Overview:
The <b>DESCRIBE</b> command displays metadata about the columns, indexes, and data partitions of a particular table or view.

### Execute the code:
The code in the next cell uses a <b>Db2 Magic Command</b> to describe the <b>AIRLINE_DELAY_CAUSE</b> table that was created earlier.
<ol>
   <li>Select the code cell below and carefully read through the comments. This will help you understand the actions the code performs.</li><br>
    <li><div style="display:inline-block; vertical-align:middle;">
        When you are ready, click on the
    </div>
    <div style="display:inline-block; vertical-align:middle;">
        <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/CP4D_JN_Run.png" alt="Run" width="60px" align="middle"/>
    </div>
    <div style="display:inline-block; vertical-align:middle;">
       button or press <b>Shift+Enter</b> to execute the code.
    </div></li>
</ol>

In [None]:
#----------------------------------------------------------------------------------------------#
# Describe The Structure Of The AIRLINE_DELAY_CAUSE Table Using A Db2 Magic Command            #
#----------------------------------------------------------------------------------------------#

%sql DESCRIBE TABLE AIRLINE_DELAY_CAUSE

## Step 3: Establish a database connection with the ibm_db_dbi.connect API
### Overview:
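When the <b>Db2 Magic Commands</b> are not being used (and in some cases, even when they are), the <b>ibm_db_dbi.connect( )</b> API can be used to establish a Db2 database connection. (The code in this step calls <b>ibm_db.connect( )</b> with the same kind of connection string and then wraps the resulting connection in an <b>ibm_db_dbi.Connection</b> object.) The connection string has the following format: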

<b>DRIVER={IBM DB2 ODBC DRIVER};ATTACH=<i>connType</i>;DATABASE=<i>dbName</i>;HOSTNAME=<i>hostName</i>;PORT=<i>portNum</i>;PROTOCOL=TCPIP;SECURITY=SSL;UID=<i>userName</i>;PWD=<i>passWord</i></b>

where:
<table style="border:none; border-collapse:collapse; font-size:14px; width:100%;">
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>connType</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Specifies whether a connection is to be made to a Db2 server (<code>TRUE</code>) or a Db2 database (<code>FALSE</code>).</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>dbName</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The name of the Db2 server or database the connection is to be made to.</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>hostName</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The host name or IP address of the Db2 server — as it is known to the TCP/IP network — the connection is to be made to.</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>portNum</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The port number that receives Db2 connections on the server the connection is to be made to.</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>userName</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The user name/authorization ID that is to be used for authentication when the connection is established.</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style=" width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:20px;">&bull;</td>
        <td style="width:4%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;"><i>passWord</i></td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">:</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">The password that corresponds to the user name/authorization ID specified in the <b><i>userName</i></b> parameter.</td>
    </tr>      
</table>
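
The format described above can be sketched as a small helper function. This is only an illustration: the function name <code>build_conn_string</code>, its parameter names, and the placeholder values passed to it are assumptions made for this example and are not part of the lab code.

```python
def build_conn_string(db_name, host_name, port_num, user_name, pass_word,
                      use_ssl=True, attach=False):
    """Assemble a Db2 connection string in the keyword=value format shown above."""
    parts = [
        "DRIVER={IBM DB2 ODBC DRIVER}",
        "ATTACH=" + ("TRUE" if attach else "FALSE"),   # TRUE = server, FALSE = database
        "DATABASE=" + db_name,
        "HOSTNAME=" + host_name,
        "PORT=" + str(port_num),                       # Accepts an int or a str
        "PROTOCOL=TCPIP",
    ]
    # SECURITY=SSL is only included when an encrypted connection is required
    if use_ssl:
        parts.append("SECURITY=SSL")
    parts.append("UID=" + user_name)
    parts.append("PWD=" + pass_word)
    return ";".join(parts)


# Placeholder values for illustration only; substitute your own settings
example = build_conn_string("BLUDB", "db2.example.com", 50001, "user1", "secret")
print(example)
```

Collecting the pieces in a list and joining them with semicolons keeps each keyword on its own line, which makes it easier to see which options are conditional.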

### Execute the code:
The code in the next cell builds a connection string in the format just described, using the values assigned to application variables earlier. It then attempts to establish a connection to the Db2 database specified. (<i>This is the connection that is used by the pandas DataFrames that are utilized in different exercises in this lab.</i>)
<ol>
    <li>Select the code cell below and carefully read through the comments. This will help you understand the actions the code performs.</li><br>
    <li>
        <div style="display:inline-block; vertical-align:middle;">
            When you are ready, click on the
        </div>
        <div style="display:inline-block; vertical-align:middle;">
            <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/CP4D_JN_Run.png" alt="Run" width="60px" align="middle"/>
        </div>
        <div style="display:inline-block; vertical-align:middle;">
            button or press <b>Shift+Enter</b> to execute the code.
        </div>
    </li>
</ol>

<div class="alert alert-block alert-danger">
    <b>IMPORTANT:</b> Make sure the correct Db2 server/database name, host name, port number, user ID, and password have been assigned to the appropriate variables <i><u>before</u></i> executing the code in this cell.
</div>

In [None]:
#----------------------------------------------------------------------------------------------#
# Establish A Db2 Database Connection                                                          #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
connString = ""                                 # Db2 Connection String
connOption = ""                                 # ibm_db.connect() API Connection Option
connectionID = None                             # Db2 Connection ID
pConnID = None                                  # Python Database API Specification 2.0 
                                                # Db2 Connection ID
errorMsg = ""                                   # Detailed Error Information

# Construct The String That Will Be Used To Establish A Db2 Database Connection
connString = "DRIVER={IBM DB2 ODBC DRIVER}"
connString += ";ATTACH=FALSE"            # Connect To A Database - Not A Server
connString += ";DATABASE=" + dbName      # Database Name
connString += ";HOSTNAME=" + hostName    # Host Name
connString += ";PORT=" + str(portNum)    # Port Number (Works For An int Or A str Value)
connString += ";PROTOCOL=TCPIP"          # Protocol (TCP/IP)

# If The Connection Requires Secure Sockets Layer (SSL) Communications, Add "SECURITY=SSL"
# To The Connection String
if secureComm == True:
    connString += ";SECURITY=SSL"        # Security (SSL)

# Finish Constructing The Database Connection String
connString += ";UID=" + userID           # Authorization ID
connString += ";PWD=" + passWord         # Password

# Define The Db2 Database Connection Option That Will Enable AUTOCOMMIT Behavior
connOption = {ibm_db.SQL_ATTR_AUTOCOMMIT : ibm_db.SQL_AUTOCOMMIT_ON}

# Display A Status Message Indicating An Attempt To Establish A Connection To A Db2 Database
# Is About To Be Made
print("\nConnecting to the \'" + dbName + "\' database ... ", end="")

# Attempt To Establish A Connection To The Database Specified, Using The Connection String
# Just Constructed By Calling The ibm_db.connect() API - Turn AUTOCOMMIT Behavior ON and
# QUOTED_LITERAL_REPLACEMENT Behavior OFF
try:
    connectionID = ibm_db.connect(connString, '', '', connOption,
        ibm_db.QUOTED_LITERAL_REPLACEMENT_OFF)
except Exception:
    print("\n\nERROR: Unable to connect to the \'" + dbName + "\' database.")

# If A Database Connection Could Not Be Established, Display A Detailed Error Message And
# Exit (Call The ibm_db.conn_errormsg() API To Obtain Detailed Error Information)
if connectionID is None:
    errorMsg = ibm_db.conn_errormsg()
    print(errorMsg + "\n")
    exit(-1)

# Otherwise, Complete The Status Message
else:
    print("Done!\n")

# Display The Connection String That Was Used To Establish The Connection
print("Connection string used:\n\"" + connString + "\"\n")

# Using The Database Connection Just Established, Create A Corresponding
# Python Database API Specification 2.0 Db2 Connection ID
pConnID = ibm_db_dbi.Connection(connectionID)

## Step 4: Check the data in the AIRLINE_DELAY_CAUSE table

### Overview:
The first objective in this analysis is to assess the current state of airline delays. To do this, the data stored in the <b>AIRLINE_DELAY_CAUSE</b> table <i>must</i> contain records for the years 2019 through 2022. We can validate that this table was populated with the correct data by running some simple SQL queries, using either <b>Db2 Magic Commands</b> or <b>pandas DataFrames</b>.
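
The year-coverage check described above can be sketched without a Db2 connection. The example below is a stand-in, not the lab's method: it uses Python's built-in <code>sqlite3</code> module in place of Db2, and the rows it inserts (including the carrier codes) are made up purely for illustration. Only the <code>SELECT DISTINCT</code> query mirrors what is run against the real table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Db2 (assumption for this sketch)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airline_delay_cause (flight_year INTEGER, carrier TEXT)")

# Made-up rows for illustration only; they are not taken from the real data set
rows = [(2019, "AA"), (2020, "AA"), (2020, "DL"), (2021, "DL"), (2022, "UA")]
conn.executemany("INSERT INTO airline_delay_cause VALUES (?, ?)", rows)

# Same style of query the lab uses to list the years present in the table
years = [r[0] for r in conn.execute(
    "SELECT DISTINCT flight_year FROM airline_delay_cause ORDER BY 1")]

# Compare the years found against the years the analysis requires
expected = list(range(2019, 2023))
missing = sorted(set(expected) - set(years))

print("Years found:  ", years)
print("Missing years:", missing)
conn.close()
```

An empty list of missing years indicates that every year needed for the analysis is present.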

### Execute the code:
The code in the next four cells queries the <b>AIRLINE_DELAY_CAUSE</b> table and returns information that can be used to verify that the table was populated with the correct data.
<ol>
   <li>Select each code cell below and carefully read through the comments. This will help you understand the actions the code performs.</li><br>
    <li><div style="display:inline-block; vertical-align:middle;">
        When you are ready, click on the
    </div>
    <div style="display:inline-block; vertical-align:middle;">
        <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/CP4D_JN_Run.png" alt="Run" width="60px" align="middle"/>
    </div>
    <div style="display:inline-block; vertical-align:middle;">
       button or press <b>Shift+Enter</b> to execute the code.
    </div></li>
</ol>

### Step 4a: Using a Db2 Magic Command, retrieve the first five rows of data stored in the AIRLINE_DELAY_CAUSE table

In [None]:
#----------------------------------------------------------------------------------------------#
# Retrieve The First Five Rows From The AIRLINE_DELAY_CAUSE Table Using A Db2 Magic Command    #
#----------------------------------------------------------------------------------------------#

%sql SELECT * FROM airline_delay_cause LIMIT 5

### Understanding the data stored in the AIRLINE_DELAY_CAUSE table 

Each record in the AIRLINE_DELAY_CAUSE table contains the following information:

<table style="border:none; border-collapse:collapse; font-size:14px; width:100%;">
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">YEAR</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Flight year</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">MONTH</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Flight month</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style=" width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">CARRIER</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Carrier abbreviation</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">CARRIER_NAME</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Full carrier name</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">AIRPORT</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Airport abbreviation code</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">AIRPORT_NAME</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Full airport name</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">ARR_FLIGHTS</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total number of flights that arrived</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">ARR_DEL15</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total number of flights delayed by 15 minutes or more</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">CARRIER_CT</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total number of flights delayed because of air carrier issues</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">WEATHER_CT</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total number of flights delayed because of extreme weather</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">NAS_CT</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total number of flights delayed because of the National Aviation System (NAS)</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">SECURITY_CT</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total number of flights delayed because of security</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">LATE_AIRCRAFT_CT</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total number of flights delayed because of late arriving aircraft</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">ARR_CANCELLED</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total number of flights cancelled</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">ARR_DIVERTED</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total number of flights diverted</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">ARR_DELAY</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total minutes delayed</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">CARRIER_DELAY</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total minutes delayed due to the carrier</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">WEATHER_DELAY</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total minutes delayed due to the weather</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">NAS_DELAY</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total minutes delayed due to the National Aviation System (NAS)</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">SECURITY_DELAY</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total minutes delayed due to security</td>
    </tr>
    <tr style="background-color: #FFFFFF;">
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; font-size:16px;">&bull;</td>
        <td style="width:9%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">LATE_AIRCRAFT_DELAY</td>
        <td style="width:1%; text-align:right; vertical-align:top; padding-top:0px; padding-bottom:0px; vertical-align:top;">-</td>
        <td style="width:40%; text-align:left; vertical-align:top; padding-top:0px; padding-bottom:0px;">Total minutes delayed due to late arriving aircraft</td>
    </tr>
</table>

<div class="alert alert-block alert-success">
    <b>NOTE:</b> Columns containing "count" information (<b><i>nnnnn</i>_CT</b>) are pro-rated by minutes. For example, if the total flight delay was 45 minutes, of which 15 minutes was due to weather and 30 minutes was due to a late-arriving aircraft, the count for <b>WEATHER_CT</b> would be <b>.33</b> and the count for <b>LATE_AIRCRAFT_CT</b> would be <b>.67</b>. 
</div>
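
The pro-rating arithmetic in the note above can be worked through in a few lines. This is a minimal sketch: the function name <code>prorate_counts</code> is hypothetical, and the code only reproduces the 45-minute example from the note rather than the method used to build the data set.

```python
def prorate_counts(delay_minutes):
    """Split one delayed flight across causes in proportion to the minutes each contributed."""
    total = sum(delay_minutes.values())
    if total == 0:
        # No delay minutes recorded, so no cause receives a share
        return {cause: 0.0 for cause in delay_minutes}
    return {cause: minutes / total for cause, minutes in delay_minutes.items()}


# The example from the note: 15 minutes of weather delay, 30 minutes of late-aircraft delay
shares = prorate_counts({"WEATHER_CT": 15, "LATE_AIRCRAFT_CT": 30})

# Display each share rounded to two decimal places
for cause, share in shares.items():
    print(f"{cause}: {share:.2f}")
```

Because the shares for a single flight always sum to one, adding a count column across many rows gives a fractional number of flights attributed to that cause.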

The cancellation/delay categories used are defined as follows:

<b>Air Carrier</b>: The cause of the cancellation or delay was due to circumstances within the airline's control (for example, maintenance or crew problems, aircraft cleaning, baggage loading, fueling, and other reasons).

<b>Extreme Weather</b>: The cause of the cancellation or delay was significant meteorological conditions (actual or forecasted) that, in the judgment of the carrier, delayed or prevented the operation of a flight (for example, a thunderstorm, tornado, blizzard or hurricane).

<b>National Aviation System (NAS)</b>: Delays and cancellations attributable to the National Aviation System, which refers to a broad set of conditions such as non-extreme weather conditions, airport operations, heavy traffic volume, and air traffic control.

<b>Late-arriving aircraft</b>: The cause of the cancellation or delay was the late arrival of a previous flight using the same aircraft (forcing the present flight to depart late).

<b>Security</b>: Delays or cancellations caused by evacuation of a terminal or concourse, re-boarding of aircraft because of a security breach, inoperative screening equipment and/or long lines in excess of 29 minutes at screening areas.

<br>

### Step 4b: Using a pandas DataFrame, retrieve the first five rows of data stored in the AIRLINE_DELAY_CAUSE table

In [None]:
#----------------------------------------------------------------------------------------------#
# Retrieve The First Five Rows From The AIRLINE_DELAY_CAUSE Table Using A pandas DataFrame     #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
sqlStatement = ""                               # Structured Query Language (SQL) Statement

# Define The SQL Statement To Be Executed
sqlStatement = "SELECT * FROM airline_delay_cause LIMIT 5"

# Execute The SQL Statement And Copy The Data Returned Into A pandas DataFrame
datapd = pd.read_sql(sqlStatement, pConnID)

# Display The Data In The pandas DataFrame Just Populated
print()
print(datapd)

### Step 4c: Using a Db2 Magic Command, confirm that the data stored in the AIRLINE_DELAY_CAUSE table covers the years 2019 through 2022

In [None]:
#----------------------------------------------------------------------------------------------#
# Use A Db2 Magic Command To Confirm That The Data In The AIRLINE_DELAY_CAUSE Table Covers The #
# Years 2019 Through 2022                                                                      #
#----------------------------------------------------------------------------------------------#
%sql SELECT DISTINCT flight_year FROM airline_delay_cause ORDER BY 1

### Step 4d: Using a pandas DataFrame, confirm that the data stored in the AIRLINE_DELAY_CAUSE table covers the years 2019 through 2022

In [None]:
#----------------------------------------------------------------------------------------------#
# Use A pandas DataFrame To Confirm That The Data In The AIRLINE_DELAY_CAUSE Table Covers The  #
# Years 2019 Through 2022                                                                      #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
sqlStatement = ""                               # Structured Query Language (SQL) Statement

# Define The SQL Statement To Be Executed
sqlStatement = "SELECT DISTINCT flight_year FROM airline_delay_cause ORDER BY 1"

# Execute The SQL Statement And Copy The Data Returned Into A pandas DataFrame 
datapd = pd.read_sql(sqlStatement, pConnID)

# Display The Data In The pandas DataFrame Just Populated
print()
print(datapd)

# Section 3. Data analysis

## Analysis 1 - Comparing total COVID-era delayed flights by reason
The end goal of this exercise is to gain an understanding of how airline delays have changed over time. So, the analysis begins by comparing total delayed flights by reason (carrier, weather, NAS, security, and late aircraft).  

### Analysis 1a: Query the AIRLINE_DELAY_CAUSE table and calculate the total number of delays encountered during the years 2019 through 2022 <i>for each delay type</i> 

In [None]:
#----------------------------------------------------------------------------------------------#
# Determine The Causes Of Delays During The Peak COVID Period                                  #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
sqlStatement = ""                               # Structured Query Language (SQL) Statement

# Define The SQL Statement To Be Executed
sqlStatement = "SELECT flight_year,"
sqlStatement += " SUM(CAST(carrier_ct AS FLOAT)) AS carrier_delay,"
sqlStatement += " SUM(CAST(weather_ct AS FLOAT)) AS weather_delay,"
sqlStatement += " SUM(CAST(nas_ct AS FLOAT)) AS nas_delay,"
sqlStatement += " SUM(CAST(security_ct AS FLOAT)) AS security_delay,"
sqlStatement += " SUM(CAST(late_aircraft_ct AS FLOAT)) AS late_aircraft_delay"
sqlStatement += " FROM airline_delay_cause"
sqlStatement += " GROUP BY flight_year"
sqlStatement += " ORDER BY flight_year"

# Display All Floating Point Numbers With A Precision Of Two Decimals
pd.options.display.float_format = '{:,.2f}'.format

# Execute The SQL Statement And Copy The Data Returned Into A pandas DataFrame 
delays_cause_df = pd.read_sql_query(sqlStatement, pConnID)

# Display The Data In The pandas DataFrame Just Populated
delays_cause_df

### Analysis 1b: Graph the results

Although this data provides useful information, it is not as easy to interpret as a graphical view. The following code, when executed, displays a bar chart that illustrates the different delay counts by reason. This allows a quick assessment of why flight delays might have been occurring. 

In [None]:
#----------------------------------------------------------------------------------------------#
# Display A Bar Chart Showing The Different Delay Counts By Reason                             #
#----------------------------------------------------------------------------------------------#

# Set The width Of The Bar Chart Bars
barWidth = 0.15

# Set The Position Of Bar Chart Bars On The X-Axis
r1 = np.arange(len(delays_cause_df['CARRIER_DELAY']))
r2 = [x + barWidth for x in r1]
r3 = [x + barWidth for x in r2]
r4 = [x + barWidth for x in r3]
r5 = [x + barWidth for x in r4]

# Set The Bar Chart Size
plt.figure(figsize=(15, 6))

# Generate The Bar Chart
plt.bar(r1, delays_cause_df['CARRIER_DELAY'], color='blue',
    width=barWidth, edgecolor='white', label='Carrier Delay')
plt.bar(r2, delays_cause_df['WEATHER_DELAY'], color='orange',
    width=barWidth, edgecolor='white', label='Weather Delay')
plt.bar(r3, delays_cause_df['NAS_DELAY'], color='pink',
    width=barWidth, edgecolor='white', label='NAS Delay')
plt.bar(r4, delays_cause_df['LATE_AIRCRAFT_DELAY'], color='purple',
    width=barWidth, edgecolor='white', label='Late Aircraft Delay')
plt.bar(r5, delays_cause_df['SECURITY_DELAY'], color='green',
    width=barWidth, edgecolor='white', label='Security Delay')

# Add X-Axis Tick Marks To The Middle Of "Group" Bars
plt.xlabel('Year', fontweight='bold')
plt.ylabel('Delayed Flights', fontweight='bold')
plt.xticks([r + 2 * barWidth for r in range(len(delays_cause_df['CARRIER_DELAY']))], ['2019', '2020', '2021', '2022'])

# Create The Bar Chart Legend
plt.legend()

# Display The Completed Bar Chart
plt.show()

When analyzing data, numerical output does not always make insights easy to see. In these situations, a pictorial view of the data using some type of graph is preferable. pandas DataFrames, together with Matplotlib, can be used to generate a graph of the data that was returned by a prior query (which is what was done in the code cell just executed).
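
The grouped bar chart above places each delay category a fixed distance to the right of the previous one and puts the year label under the middle of each group. The following is a minimal sketch of that arithmetic on its own, without any plotting. The helper name `grouped_bar_positions` is hypothetical and is not part of the lab's code.

```python
# Hypothetical helper for illustration; it is not part of the lab's code.
def grouped_bar_positions(n_groups, n_series, bar_width):
    """Return the x positions for each series and the tick that centers each group."""
    positions = [
        [group + series * bar_width for group in range(n_groups)]
        for series in range(n_series)
    ]
    # The middle of a group sits halfway between its first and last bar.
    center_offset = (n_series - 1) * bar_width / 2
    ticks = [group + center_offset for group in range(n_groups)]
    return positions, ticks


# Four years with five delay categories, as in the chart above.
positions, ticks = grouped_bar_positions(n_groups=4, n_series=5, bar_width=0.15)
print([round(tick, 2) for tick in ticks])
```

With five bars per group, the center lands on the third bar, which is two bar widths to the right of the first. With three bars per group, it lands one bar width to the right.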

With the data visualized, it is clear that there are some major discrepancies both by <b>year</b> and by <b>reason</b>. Security delays are minimal across all years, while the top two causes of flight delays are consistently circumstances within the airline's control (<b>Carrier Delay</b>) and late aircraft arrivals (<b>Late Aircraft Delay</b>).  
<br>
An uninformed viewer may interpret years 2020 and 2022 as being particularly great years to travel since there were significantly fewer delays. However, due to the impacts of COVID-19, this assumption is probably inaccurate. By examining the total flight delays compared to the total number of flights for the period, a more accurate analysis and comparison of flight delay data can be achieved.

## Analysis 2 - Creating a normalized view of COVID-era flight delay data

To provide a more accurate analysis, the overall airline delay totals are not sufficient. Therefore, in this analysis, the data is normalized so that flight delays for 2022 can be compared fairly against those for earlier time periods, that is, years 2019 through 2021. (Comparing airline flight delays against the total number of flights for the year provides a true comparison of flight delays between different time periods.)
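
As a minimal sketch of why normalization matters, the following example compares two made-up years. All of the numbers and the `delay_percentage` helper are hypothetical and are invented purely for illustration; they are not taken from the lab's data set.

```python
# Hypothetical yearly totals, invented purely to illustrate normalization.
yearly_totals = {
    "Year A": {"delayed_flights": 200_000, "total_flights": 1_000_000},
    "Year B": {"delayed_flights": 150_000, "total_flights": 600_000},
}


def delay_percentage(delayed_flights, total_flights):
    """Return delayed flights as a percentage of all flights."""
    return delayed_flights / total_flights * 100


for year, totals in yearly_totals.items():
    pct = delay_percentage(totals["delayed_flights"], totals["total_flights"])
    print(f"{year}: {totals['delayed_flights']:,} delayed = {pct:.1f}% of flights")
```

In this made-up case, Year B has fewer delayed flights in absolute terms but a higher percentage of delayed flights, which is the kind of effect that raw totals hide.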

### Analysis 2a: Calculate sums for flight cancellations and delays

The code in the next cell calculates sums for Total Cancelled Flights, Total Delayed Flights, Total Flights, Carrier Delays, Weather Delays, NAS Delays, Security Delays, and Late Aircraft Delays <i>for each year (2019 through 2022)</i>. It then stores the results in a pandas DataFrame that will be used later.

In [None]:
#----------------------------------------------------------------------------------------------#
# Collect The Sums For Flight Cancellations And Delays From The AIRLINE_DELAY_CAUSE Table      #
# Using A pandas DataFrame                                                                     #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
sqlStatement = ""                               # Structured Query Language (SQL) Statement

# Define The SQL Statement To Be Executed
sqlStatement = "SELECT flight_year,"
sqlStatement += " SUM(CAST(arr_cancelled AS FLOAT)) AS total_cancelled_flights,"
sqlStatement += " SUM(CAST(arr_del15 AS FLOAT)) AS total_delayed_flights,"
sqlStatement += " SUM(CAST(arr_flights AS FLOAT)) AS total_flights,"
sqlStatement += " SUM(CAST(carrier_ct AS FLOAT)) AS carrier_delay,"
sqlStatement += " SUM(CAST(weather_ct AS FLOAT)) AS weather_delay,"
sqlStatement += " SUM(CAST(nas_ct AS FLOAT)) AS nas_delay,"
sqlStatement += " SUM(CAST(security_ct AS FLOAT)) AS security_delay,"
sqlStatement += " SUM(CAST(late_aircraft_ct AS FLOAT)) AS late_aircraft_delay"
sqlStatement += " FROM AIRLINE_DELAY_CAUSE"
sqlStatement += " GROUP BY flight_year"
sqlStatement += " ORDER BY flight_year"

# Display All Floating Point Numbers With A Precision Of Two Decimals
pd.options.display.float_format = '{:,.2f}'.format

# Execute The SQL Statement And Copy The Data Returned Into A pandas DataFrame 
delays_cause_df2 = pd.read_sql_query(sqlStatement, pConnID)

# Display The Data In The pandas DataFrame Just Populated
delays_cause_df2

### Analysis 2b: Calculate flight delays as a <i>ratio</i> of the overall flight total

The code in the next cell calculates flight delays as a <i>ratio</i> of overall flight totals (On-Time Flights, Cancelled Flights, and Delayed Flights) to provide a better understanding of the flight delay data. It does this by capturing the number of cancelled flights, delayed flights, and the total number of flights for each year (2019 through 2022) using a <b>Db2 Magic Command</b>. A <b>Db2 Magic Command</b> is used here because it provides a way to review the results before copying this information into a pandas DataFrame.

In [None]:
#----------------------------------------------------------------------------------------------#
# Create A View Containing Flight Delay Data Using A Db2 Magic Command                         #
#----------------------------------------------------------------------------------------------#
%sql -q CREATE OR REPLACE VIEW flight_delays AS \
    SELECT flight_year, \
    SUM(arr_cancelled) AS total_cancelled_flights, \
    SUM(arr_del15) AS total_delayed_flights, \
    SUM(arr_cancelled) + SUM(arr_del15) AS total_disrupted_flights, \
    SUM(arr_flights) AS total_flights, \
    SUM(carrier_ct) AS carrier_delay_sum, \
    SUM(weather_ct) AS weather_delay_sum, \
    SUM(nas_ct) AS nas_delay_sum, \
    SUM(security_ct) AS security_delay_sum, \
    SUM(late_aircraft_ct) AS late_aircraft_delay_sum \
    FROM airline_delay_cause \
    GROUP BY flight_year

#----------------------------------------------------------------------------------------------#
# Calculate Flight Delay Ratios To Account For COVID Disruptions Using A Db2 Magic Command     #
#----------------------------------------------------------------------------------------------#

%sql SELECT flight_year, \
    CAST(total_delayed_flights AS FLOAT) / CAST(total_flights AS FLOAT) * 100 AS delayed_total_ratio, \
    CAST(total_cancelled_flights AS FLOAT) / CAST(total_flights AS FLOAT) * 100 AS cancelled_total_ratio, \
    CAST(total_disrupted_flights AS FLOAT) / CAST(total_flights AS FLOAT) * 100 AS disrupted_total_ratio, \
    total_cancelled_flights, \
    total_delayed_flights, \
    total_disrupted_flights, \
    total_flights \
    FROM flight_delays \
    ORDER BY flight_year

### Analysis 2c: Calculate flight delay <i>ratios</i> and store this information in a pandas DataFrame

The code in the next cell uses an SQL query to calculate flight delay <i>ratios</i> for each of the flight delay categories and stores this information in a pandas DataFrame. Having this information available makes it possible to calculate a flight delay percentage for each year (and provides a normalized comparison between years).

In [None]:
#----------------------------------------------------------------------------------------------#
# Calculate Flight Delay Ratios To Account For COVID Disruptions Using A pandas DataFrame      #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
sqlStatement = ""                               # Structured Query Language (SQL) Statement

#----------------------------------------------------------------------------------------------#
# Create A View Containing Flight Delay Data Using A Db2 Magic Command                         #
#----------------------------------------------------------------------------------------------#
%sql -q CREATE OR REPLACE VIEW flight_delays AS \
    SELECT flight_year, \
    SUM(arr_cancelled) AS total_cancelled_flights, \
    SUM(arr_del15) AS total_delayed_flights, \
    SUM(arr_cancelled) + SUM(arr_del15) AS total_disrupted_flights, \
    SUM(arr_flights) AS total_flights, \
    SUM(carrier_ct) AS carrier_delay_sum, \
    SUM(weather_ct) AS weather_delay_sum, \
    SUM(nas_ct) AS nas_delay_sum, \
    SUM(security_ct) AS security_delay_sum, \
    SUM(late_aircraft_ct) AS late_aircraft_delay_sum \
    FROM airline_delay_cause \
    GROUP BY flight_year

#----------------------------------------------------------------------------------------------#
# Calculate Flight Delay Ratios Using A pandas DataFrame                                       #
#----------------------------------------------------------------------------------------------#

# Define The SQL Statement To Be Executed
sqlStatement = " SELECT flight_year,"
sqlStatement += " CAST(total_delayed_flights AS FLOAT) / CAST(total_flights AS FLOAT) * 100"
sqlStatement += "  AS delayed_total_ratio,"
sqlStatement += " CAST(total_cancelled_flights AS FLOAT) / CAST(total_flights AS FLOAT) * 100"
sqlStatement += "  AS cancelled_total_ratio,"
sqlStatement += " CAST(total_disrupted_flights AS FLOAT) / CAST(total_flights AS FLOAT) * 100"
sqlStatement += "  AS distrupted_total_ratio"
sqlStatement += " FROM flight_delays"
sqlStatement += " ORDER BY flight_year"

# Display All Floating Point Numbers With A Precision Of Two Decimals
pd.options.display.float_format = '{:,.2f}'.format

# Execute The SQL Statement And Copy The Data Returned Into A pandas DataFrame 
delays_ratio_df = pd.read_sql_query(sqlStatement, pConnID)

# Display The Data In The pandas DataFrame Just Populated
delays_ratio_df
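
The ratio queries above cast both operands to FLOAT before dividing. One likely reason, assuming the view's summed columns have an integer type (the table definition is not shown in this section, so this is an assumption), is that integer division in SQL discards the fractional part of the quotient. The sketch below uses Python's `//` operator to mimic that behavior with made-up counts.

```python
# Hypothetical counts, invented for illustration only.
delayed_flights = 150_000
total_flights = 600_000

# Integer division truncates the quotient before it is scaled to a percentage.
truncated_pct = delayed_flights // total_flights * 100

# Floating-point division keeps the fractional part.
float_pct = delayed_flights / total_flights * 100

print(truncated_pct, float_pct)
```

Because the number of delayed flights is always smaller than the total number of flights, the truncated quotient is zero before it is multiplied by 100, so the cast is what keeps the percentage meaningful.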

### Analysis 2d: Graph the results

The code in the next cell uses the data generated and stored in a pandas DataFrame earlier to produce a graph of the results.

In [None]:
#----------------------------------------------------------------------------------------------#
# Display A Bar Chart Showing A Normalized View Of Flight Delay Data                           #
#----------------------------------------------------------------------------------------------#

# Set The width Of The Bar Chart Bars
barWidth = 0.15

# Set The Position Of Bar Chart Bars On The X-Axis
r1 = np.arange(len(delays_ratio_df))
r2 = [x + barWidth for x in r1]
r3 = [x + barWidth for x in r2]

# Set The Bar Chart Size
plt.figure(figsize=(14, 6))

# Generate The Bar Chart
plt.bar(r1, delays_ratio_df['DELAYED_TOTAL_RATIO'], color='#d0c4ef',
    width=barWidth, edgecolor='white', label='Delayed Ratio')
plt.bar(r2, delays_ratio_df['CANCELLED_TOTAL_RATIO'], color='#c75f44',
    width=barWidth, edgecolor='white', label='Cancelled Ratio')
plt.bar(r3, delays_ratio_df['DISTRUPTED_TOTAL_RATIO'], color='#7e9b89',
    width=barWidth, edgecolor='white', label='Disrupted Ratio')

# Add X-Axis Tick Marks To The Middle Of "Group" Bars
plt.xlabel('Year', fontweight='bold')
plt.ylabel('Percentage', fontweight='bold')
plt.xticks([r + barWidth for r in range(len(delays_ratio_df))], ['2019', '2020', '2021', '2022'])

# Create The Bar Chart Title
plt.title(label = "Normalized Disruption Ratios")

# Create The Bar Chart Legend
plt.legend(bbox_to_anchor=(1.01, 1), loc='upper left', borderaxespad=0)

# Display The Completed Bar Chart
plt.show()

With this new normalized view of flight delay data, it can be seen that the initial analysis of flight delays was not completely fair. Although 2020 was not a great year to travel because of COVID-19, out of the flights that did operate, the overall chance of experiencing a flight delay was relatively low compared to prior years.

And while 2022 looked as though it had a low number of flight delays, when the number of flight delays is compared to the total number of flights scheduled, it's clear that 2022 saw a larger share of delayed flights than previous years. Thus, it appears that the return to travel post COVID-19, together with airline challenges (shortages of pilots, baggage handlers, and other personnel, among other factors), has led to a rise in airline flight delays.

### Analysis 2e: A closer look at monthly delay trends for 2022

The code in the next cell obtains and displays the total number of delayed flights for each month in 2022. As you might expect, the month of December had the largest number of delayed flights.

In [None]:
#----------------------------------------------------------------------------------------------#
# Calculate Monthly Delay Trends For 2022 Using A Pandas DataFrame                             #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
sqlStatement = ""                               # Structured Query Language (SQL) Statement

# Define The SQL Statement To Be Executed
sqlStatement = "SELECT flight_month,"
sqlStatement += " SUM(arr_del15) AS total_flight_delay"
sqlStatement += " FROM airline_delay_cause"
sqlStatement += " WHERE flight_year = 2022"
sqlStatement += " GROUP BY flight_month"
sqlStatement += " ORDER BY flight_month"

# Execute The SQL Statement And Copy The Data Returned Into A Pandas DataFrame
delay_by_month_df3 = pd.read_sql_query(sqlStatement, pConnID)

# pandas Truncates The Display Of Any DataFrame That Has More Rows Than The display.max_rows
# Setting (60 By Default), Showing Only The First And Last Few Rows. So, Set display.max_rows
# To 12 To Ensure That The Data For All 12 Months Will Be Displayed
pd.set_option('display.max_rows', 12)

# Display The Data In The Pandas DataFrame Just Populated
delay_by_month_df3

## Analysis 3 - Understanding flight delays in the context of historical data

This analysis looks at more historical data obtained from the U.S. Department of Transportation in an attempt to understand whether the flight delays seen in 2022 are a new trend or whether the airline industry, as a whole, is simply repeating a historical pattern.

### Analysis 3a: Using a pandas DataFrame, confirm that the data stored in the AIRLINE_DELAY_CAUSE_TOTAL table covers the years 2013 through 2022

In [None]:
#----------------------------------------------------------------------------------------------#
# Use A Pandas DataFrame To Confirm That The Data In The AIRLINE_DELAY_CAUSE_TOTAL Table       #
# Covers The Years 2013 Through 2022                                                           #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
sqlStatement = ""                               # Structured Query Language (SQL) Statement

# Define The SQL Statement To Be Executed
sqlStatement = "SELECT DISTINCT flight_year FROM airline_delay_cause_total ORDER BY 1"

# Execute The SQL Statement And Copy The Data Returned Into A Pandas DataFrame
datapd = pd.read_sql_query(sqlStatement, pConnID)

# Display The Data In The Pandas DataFrame Just Populated
print()
print(datapd)

### Analysis 3b: Determining the cause of airline delays during the years 2013 through 2022 

The code in the next cell obtains and displays the total number of cancelled flights, delayed flights, and flights, together with the number of delays attributed to each cause, for each year found in the historical data set used (i.e., for years 2013 through 2022).

In [None]:
#----------------------------------------------------------------------------------------------#
# Determine The Causes Of Delays During The Years 2013 Through 2022                            #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
sqlStatement = ""                               # Structured Query Language (SQL) Statement

# Define The SQL Statement To Be Executed
sqlStatement = "SELECT flight_year,"
sqlStatement += " SUM(CAST(arr_cancelled AS FLOAT)) AS total_cancelled_flights,"
sqlStatement += " SUM(CAST(arr_del15 AS FLOAT)) AS total_delayed_flights,"
sqlStatement += " SUM(CAST(arr_flights AS FLOAT)) AS total_flights,"
sqlStatement += " SUM(CAST(carrier_ct AS FLOAT)) AS carrier_delay,"
sqlStatement += " SUM(CAST(weather_ct AS FLOAT)) AS weather_delay,"
sqlStatement += " SUM(CAST(nas_ct AS FLOAT)) AS nas_delay,"
sqlStatement += " SUM(CAST(security_ct AS FLOAT)) AS security_delay,"
sqlStatement += " SUM(CAST(late_aircraft_ct AS FLOAT)) AS late_aircraft_delay"
sqlStatement += " FROM airline_delay_cause_total"
sqlStatement += " GROUP BY flight_year"
sqlStatement += " ORDER BY flight_year"

# Display All Floating Point Numbers With A Precision Of Two Decimals
pd.options.display.float_format = '{:,.2f}'.format

# Execute The SQL Statement And Copy The Data Returned Into A Pandas DataFrame
delays_cause_df2 = pd.read_sql_query(sqlStatement, pConnID)

# Display The Data In The Pandas DataFrame Just Populated
delays_cause_df2

### Analysis 3c: Calculate flight delay <i>ratios</i> and store this information in a pandas DataFrame

The code in the next cell uses an SQL query to calculate flight delay <i>ratios</i> for each of the flight delay categories and stores this information in a pandas DataFrame. As before, having this information available makes it possible to calculate a flight delay percentage for each year (and provides a normalized comparison between years).

In [None]:
#----------------------------------------------------------------------------------------------#
# Calculate Flight Delay Ratios For Historical Data Using A Pandas DataFrame                   #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
sqlStatement = ""                               # Structured Query Language (SQL) Statement

#----------------------------------------------------------------------------------------------#
# Create A View Containing Flight Delay Data Using A Db2 Magic Command                         #
#----------------------------------------------------------------------------------------------#
%sql -q CREATE OR REPLACE VIEW flight_delays AS \
    SELECT flight_year, \
    SUM(arr_cancelled) AS total_cancelled_flights, \
    SUM(arr_del15) AS total_delayed_flights, \
    SUM(arr_cancelled) + SUM(arr_del15) AS total_disrupted_flights, \
    SUM(arr_flights) AS total_flights, \
    SUM(carrier_ct) AS carrier_delay_sum, \
    SUM(weather_ct) AS weather_delay_sum, \
    SUM(nas_ct) AS nas_delay_sum, \
    SUM(security_ct) AS security_delay_sum, \
    SUM(late_aircraft_ct) AS late_aircraft_delay_sum \
    FROM airline_delay_cause_total \
    GROUP BY flight_year

#----------------------------------------------------------------------------------------------#
# Calculate Flight Delay Ratios Using A pandas DataFrame                                       #
#----------------------------------------------------------------------------------------------#

# Define The SQL Statement To Be Executed
sqlStatement = " SELECT flight_year,"
sqlStatement += " CAST(total_delayed_flights AS FLOAT) / CAST(total_flights AS FLOAT) * 100"
sqlStatement += "  AS delayed_total_ratio,"
sqlStatement += " CAST(total_cancelled_flights AS FLOAT) / CAST(total_flights AS FLOAT) * 100"
sqlStatement += "  AS cancelled_total_ratio,"
sqlStatement += " CAST(total_disrupted_flights AS FLOAT) / CAST(total_flights AS FLOAT) * 100"
sqlStatement += "  AS disrupted_total_ratio"
sqlStatement += " FROM flight_delays"
sqlStatement += " ORDER BY flight_year"

# Display All Floating Point Numbers With A Precision Of Two Decimals
pd.options.display.float_format = '{:,.2f}'.format

# Execute The SQL Statement And Copy The Data Returned Into A Pandas DataFrame
delays_ratio_df = pd.read_sql_query(sqlStatement, pConnID)

# Display The Data In The Pandas DataFrame Just Populated
delays_ratio_df

### Analysis 3d: Graph the results

Now, the data is normalized, enabling a more accurate comparison to be made between each year. The code in the next cell uses this data to produce a graph of the results.

In [None]:
#----------------------------------------------------------------------------------------------#
# Display A Bar Chart Showing A Normalized View Of Historical Flight Delay Data                #
#----------------------------------------------------------------------------------------------#

# Set The width Of The Bar Chart Bars
barWidth = 0.15

# Set The Position Of Bar Chart Bars On The X-Axis
r1 = np.arange(len(delays_ratio_df))
r2 = [x + barWidth for x in r1]
r3 = [x + barWidth for x in r2]

# Set The Bar Chart Size
plt.figure(figsize=(14, 10))

# Generate The Bar Chart
plt.bar(r1, delays_ratio_df['DELAYED_TOTAL_RATIO'], color='#d0c4ef',
    width=barWidth, edgecolor='white', label='Delayed Ratio')
plt.bar(r2, delays_ratio_df['CANCELLED_TOTAL_RATIO'], color='#c75f44',
    width=barWidth, edgecolor='white', label='Cancelled Ratio')
plt.bar(r3, delays_ratio_df['DISRUPTED_TOTAL_RATIO'], color='#7e9b89',
    width=barWidth, edgecolor='white', label='Disrupted Ratio')

# Add X-Axis Tick Marks To The Middle Of "Group" Bars
plt.xlabel('Year', fontweight='bold')
plt.ylabel('Percentage', fontweight='bold')
plt.xticks([r + barWidth for r in range(len(delays_ratio_df))],
    ['2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022'])

# Create The Bar Chart Title
plt.title(label = "Normalized Disruption Ratios")

# Create The Bar Chart Legend
plt.legend(bbox_to_anchor=(1.01, 1), loc='upper left', borderaxespad=0)

# Display The Completed Bar Chart
plt.show()

Graphing this normalized view of flight delay data makes it much easier to compare airline delays observed over the past 10 years. Looking at the results, the percentage of delayed flights U.S. travelers saw in 2022 appears very similar to the percentage observed in 2014. That said, 2022 shows a higher percentage of flight delays than the years immediately preceding it. This analysis suggests that the return to travel post COVID-19, together with airline challenges, has had a larger impact on airline flight delays than was experienced in most prior years.
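
A visual comparison like this one can also be checked numerically by finding the year whose delay percentage is closest to that of the year of interest. The following is a minimal sketch of that check. The year labels and percentages are hypothetical placeholders and are not results from the lab's data set.

```python
# Hypothetical delay percentages keyed by placeholder year labels.
# These values are invented for illustration and are NOT the lab's results.
delayed_pct_by_year = {"Y1": 21.0, "Y2": 18.0, "Y3": 9.5, "Y4": 16.5, "Y5": 20.5}
target_year = "Y5"

# Compare the target year against every other year.
other_years = {
    year: pct for year, pct in delayed_pct_by_year.items() if year != target_year
}
target_pct = delayed_pct_by_year[target_year]
closest_year = min(other_years, key=lambda year: abs(other_years[year] - target_pct))

# A positive gap means the target year had the higher delay percentage.
gap = target_pct - other_years[closest_year]
print(f"Closest to {target_year}: {closest_year} (gap of {gap:+.1f} percentage points)")
```

With the lab's data, the same dictionary could be built from the `FLIGHT_YEAR` and `DELAYED_TOTAL_RATIO` columns of the DataFrame created in Analysis 3c.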

#### An interesting side note

According to the <b>International Civil Aviation Organization</b> (ICAO), in 2014 North America, the world’s largest domestic market with 44% of the world's domestic traffic (at the time), grew by 3.1%. Low-cost carriers carried an estimated 900 million passengers in 2014, indicating a growth of 10.3% when compared to the number of passengers carried by low-cost carriers in 2013. Capacity offered by the world’s airlines, expressed as available seat-kilometres, increased globally by 5.6%. And air freight, expressed in terms of scheduled total freight tonne-kilometres performed, posted an increase of 4.9%. This growth could have something to do with the higher flight delay numbers observed in 2014.

<i>Source:</i> <a href="https://www.icao.int/annual-report-2014/Pages/the-world-of-air-transport-in-2014.aspx">The World of Air Transport in 2014</a>.

## Terminate the database connection
### Overview:
When a database connection is established, it remains in effect until it is explicitly terminated or until the application that established the connection ends. That said, it is good programming practice to explicitly terminate any open database connections before ending an application and returning control to the operating system. While this is typically done as part of the "exit and cleanup" work of an application, it can also be done any time an error is raised that forces an application to terminate prematurely.
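
One common way to make sure cleanup happens on both the normal path and the error path is a `try`/`finally` block. The following is a minimal sketch of that pattern. It uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a Db2 server, and the `run_query` helper is hypothetical. With the lab's connection, the cleanup call would be `ibm_db.close()` instead.

```python
import sqlite3


# Hypothetical helper; sqlite3 is only a stand-in for a Db2 connection here.
def run_query(database_path, sql):
    """Run one query and always close the connection, even if the query fails."""
    connection = sqlite3.connect(database_path)
    try:
        return connection.execute(sql).fetchall()
    finally:
        # Runs on success and on error, so the connection is never left open.
        connection.close()


rows = run_query(":memory:", "SELECT 1 + 1")
print(rows)
```

If the query raises an exception, the `finally` clause still closes the connection before the exception propagates to the caller.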

### Execute the code:<br>
The code in the next cell terminates the Db2 database connection established earlier and returns control to the operating system.
<ol>
   <li>Select the code cell below and carefully read through the comments. This will help you understand the actions the code performs.</li><br>
    <li><div style="display:inline-block; vertical-align:middle;">
        When you are ready, click on the
    </div>
    <div style="display:inline-block; vertical-align:middle;">
        <img src="https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Db2-L3-Tech-Lab/CP4D_JN_Run.png" alt="Run" width="60px" align="middle"/>
    </div>
    <div style="display:inline-block; vertical-align:middle;">
       button or press <b>Shift+Enter</b> to execute the code.
    </div></li>
</ol>

In [None]:
#----------------------------------------------------------------------------------------------#
# Attempt To Close The Db2 Database Connection That Was Opened Earlier                         #
#----------------------------------------------------------------------------------------------#

# Define And Initialize The Appropriate Variables
retCode = 0                                     # Return Code
errorMsg = ""                                   # Detailed Error Information

# If A Db2 Database Connection Exists, Print A Status Message And Close It By Calling
# The ibm_db.close() API
if connectionID is not None:
    print("\nDisconnecting from the \'" + dbName + "\' database ... ", end="")
    try:
        retCode = ibm_db.close(connectionID)
    except Exception:
        print("\nERROR: Unable to disconnect from the " + dbName + " database.")
        pass

    # If The Db2 Database Connection Was Not Closed, Display An Error Message And Exit
    if retCode == False:
        errorMsg = ibm_db.conn_error(connectionID)
        print(errorMsg + "\n")
        exit(-1)

    # Otherwise, Complete The Status Message
    else:
        print("Done!\n")

# Return Control To The Operating System
exit()

## This concludes this portion of the lab.