# Case Study: Analysis of Airline Delay using Spark
### Airline delay is a critical issue affecting both airlines and passengers. In this assignment, you will use PySpark to analyze a dataset containing information about airline flights and predict flight delays.


In [1]:
sc

In [2]:
spark

### 1. Create a new Spark session with Spark config

In [3]:
sc.stop()

In [4]:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

In [5]:
config = SparkConf().setAppName('Flights Delay Analysis').setMaster('local[4]')
sc = SparkContext.getOrCreate(conf=config)

In [6]:
# Build a Spark session with Hive support enabled
spark = (SparkSession.builder.appName("pyspark-Hive-integration")
         .config('spark.sql.warehouse.dir', '/user/hive/warehouse/')
         .enableHiveSupport().getOrCreate())

In [7]:
spark.sql("show databases").show()

+------------+
|databaseName|
+------------+
|  banking_db|
|     default|
+------------+



In [8]:
spark

### 2. Create a new Spark SQL session and define a new DataFrame from the Flights_Delay.csv dataset

In [9]:
flights_df = spark.read.csv("file:///home/hadoop/Downloads/Flights_Delay.csv", header=True, inferSchema=True)

### 3. Create the Spark Hive table flights_table

In [10]:
flights_df.createOrReplaceTempView('flights_table')
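
The call above registers a session-scoped temporary view rather than a table in the Hive metastore. If a persistent Hive table is wanted, a minimal sketch could look like the following. The helper name `save_as_hive_table` and the overwrite mode are assumptions for illustration, not part of the original notebook.

```python
def save_as_hive_table(df, table_name):
    """Persist df as a table in the metastore (hypothetical helper).

    Unlike createOrReplaceTempView, a table written with saveAsTable
    remains available after the Spark session ends.
    """
    # mode("overwrite") replaces an existing table with the same name
    df.write.mode("overwrite").saveAsTable(table_name)
```

It would be called as `save_as_hive_table(flights_df, "flights_table")`, after which `spark.sql("show tables").show()` should list the table.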

### 4. Show the top 10 rows and describe the table

In [11]:
spark.sql("select * from flights_table").show(10)

+---+----+-----+---+-----------+-------+-------------+-----------+--------------+-------------------+-------------------+--------------+---------------+--------+----------+--------------+------------+--------+--------+---------+-------+-----------------+------------+-------------+--------+---------+-------------------+----------------+--------------+-------------+-------------------+-------------+
| ID|YEAR|MONTH|DAY|DAY_OF_WEEK|AIRLINE|FLIGHT_NUMBER|TAIL_NUMBER|ORIGIN_AIRPORT|DESTINATION_AIRPORT|SCHEDULED_DEPARTURE|DEPARTURE_TIME|DEPARTURE_DELAY|TAXI_OUT|WHEELS_OFF|SCHEDULED_TIME|ELAPSED_TIME|AIR_TIME|DISTANCE|WHEELS_ON|TAXI_IN|SCHEDULED_ARRIVAL|ARRIVAL_TIME|ARRIVAL_DELAY|DIVERTED|CANCELLED|CANCELLATION_REASON|AIR_SYSTEM_DELAY|SECURITY_DELAY|AIRLINE_DELAY|LATE_AIRCRAFT_DELAY|WEATHER_DELAY|
+---+----+-----+---+-----------+-------+-------------+-----------+--------------+-------------------+-------------------+--------------+---------------+--------+----------+--------------+-----------

### Describe the table

In [12]:
flights_df.printSchema()

root
 |-- ID: integer (nullable = true)
 |-- YEAR: integer (nullable = true)
 |-- MONTH: integer (nullable = true)
 |-- DAY: integer (nullable = true)
 |-- DAY_OF_WEEK: integer (nullable = true)
 |-- AIRLINE: string (nullable = true)
 |-- FLIGHT_NUMBER: integer (nullable = true)
 |-- TAIL_NUMBER: string (nullable = true)
 |-- ORIGIN_AIRPORT: string (nullable = true)
 |-- DESTINATION_AIRPORT: string (nullable = true)
 |-- SCHEDULED_DEPARTURE: integer (nullable = true)
 |-- DEPARTURE_TIME: integer (nullable = true)
 |-- DEPARTURE_DELAY: integer (nullable = true)
 |-- TAXI_OUT: integer (nullable = true)
 |-- WHEELS_OFF: integer (nullable = true)
 |-- SCHEDULED_TIME: integer (nullable = true)
 |-- ELAPSED_TIME: integer (nullable = true)
 |-- AIR_TIME: integer (nullable = true)
 |-- DISTANCE: integer (nullable = true)
 |-- WHEELS_ON: integer (nullable = true)
 |-- TAXI_IN: integer (nullable = true)
 |-- SCHEDULED_ARRIVAL: integer (nullable = true)
 |-- ARRIVAL_TIME: integer (nullable = true)

### 5. Apply query performance optimization techniques such as partitioning the DataFrame by a specific column, Parquet storage, caching, and predicate pushdown

In [13]:
partition_df = flights_df.repartition(3)

In [14]:
flights_df.cache()

DataFrame[ID: int, YEAR: int, MONTH: int, DAY: int, DAY_OF_WEEK: int, AIRLINE: string, FLIGHT_NUMBER: int, TAIL_NUMBER: string, ORIGIN_AIRPORT: string, DESTINATION_AIRPORT: string, SCHEDULED_DEPARTURE: int, DEPARTURE_TIME: int, DEPARTURE_DELAY: int, TAXI_OUT: int, WHEELS_OFF: int, SCHEDULED_TIME: int, ELAPSED_TIME: int, AIR_TIME: int, DISTANCE: int, WHEELS_ON: int, TAXI_IN: int, SCHEDULED_ARRIVAL: int, ARRIVAL_TIME: int, ARRIVAL_DELAY: int, DIVERTED: int, CANCELLED: int, CANCELLATION_REASON: string, AIR_SYSTEM_DELAY: int, SECURITY_DELAY: int, AIRLINE_DELAY: int, LATE_AIRCRAFT_DELAY: int, WEATHER_DELAY: int]

### Persistence of the DataFrame with a specific storage level

In [15]:
from pyspark import StorageLevel
flights_df.persist(StorageLevel.MEMORY_AND_DISK)

DataFrame[ID: int, YEAR: int, MONTH: int, DAY: int, DAY_OF_WEEK: int, AIRLINE: string, FLIGHT_NUMBER: int, TAIL_NUMBER: string, ORIGIN_AIRPORT: string, DESTINATION_AIRPORT: string, SCHEDULED_DEPARTURE: int, DEPARTURE_TIME: int, DEPARTURE_DELAY: int, TAXI_OUT: int, WHEELS_OFF: int, SCHEDULED_TIME: int, ELAPSED_TIME: int, AIR_TIME: int, DISTANCE: int, WHEELS_ON: int, TAXI_IN: int, SCHEDULED_ARRIVAL: int, ARRIVAL_TIME: int, ARRIVAL_DELAY: int, DIVERTED: int, CANCELLED: int, CANCELLATION_REASON: string, AIR_SYSTEM_DELAY: int, SECURITY_DELAY: int, AIRLINE_DELAY: int, LATE_AIRCRAFT_DELAY: int, WEATHER_DELAY: int]
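
The cells above cover repartitioning and caching. The heading also lists partitioning by a column, Parquet, and predicate pushdown, which could be sketched as below. The function names, the `AIRLINE` partition column, and any output path passed in are hypothetical choices, not taken from the notebook.

```python
def write_partitioned_parquet(df, path, partition_col="AIRLINE"):
    """Write df as Parquet with one sub-directory per partition_col value."""
    (df.repartition(partition_col)     # co-locate rows that share the same key
       .write.mode("overwrite")
       .partitionBy(partition_col)     # layout: <path>/AIRLINE=WN/part-...
       .parquet(path))


def read_one_airline(spark, path, airline):
    """Read back the flights of a single airline.

    A filter on the partition column lets Spark skip the directories of
    other airlines; filters on ordinary columns can be pushed down to the
    Parquet reader.
    """
    return spark.read.parquet(path).filter(f"AIRLINE = '{airline}'")
```

For example, `write_partitioned_parquet(flights_df, "/tmp/flights_parquet")` followed by `read_one_airline(spark, "/tmp/flights_parquet", "WN").explain()`; the printed physical plan typically shows the partition and pushed filters on the file scan.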

### 6. Average arrival delay caused by airlines

In [16]:
spark.sql("select AIRLINE , round(avg(ARRIVAL_DELAY),2) as avg_arrival_delay  from flights_table group by AIRLINE having avg_arrival_delay > 1").show()

+-------+-----------------+
|AIRLINE|avg_arrival_delay|
+-------+-----------------+
|     UA|              6.7|
|     NK|            14.21|
|     AA|             8.39|
|     EV|            10.88|
|     B6|            13.96|
|     DL|             2.81|
|     OO|            10.15|
|     F9|             24.1|
|     US|             5.98|
|     MQ|            19.23|
|     HA|             4.07|
|     VX|             5.13|
|     WN|              3.7|
+-------+-----------------+



### 7. Days of the month with respect to average arrival delay

In [17]:
spark.sql("select DAY,round(avg(ARRIVAL_DELAY),2) as avg_arrival_delay from flights_table group by DAY order by DAY").show()

+---+-----------------+
|DAY|avg_arrival_delay|
+---+-----------------+
|  1|            14.81|
|  2|            15.05|
|  3|            18.14|
|  4|            17.16|
|  5|            16.24|
|  6|            10.61|
|  7|             2.83|
|  8|             5.23|
|  9|             4.42|
| 10|            -0.05|
| 11|             3.99|
| 12|            11.25|
| 13|             3.38|
| 14|             1.33|
| 15|             2.97|
| 16|             9.12|
| 17|             8.76|
| 18|             3.57|
| 19|             1.63|
| 20|             3.88|
+---+-----------------+
only showing top 20 rows



### 8. Arrange weekdays with respect to the average arrival delay

In [18]:
spark.sql("select DAY_OF_WEEK , round(avg(ARRIVAL_DELAY),2) as avg_arrival_delay from flights_table group by DAY_OF_WEEK order by DAY_OF_WEEK").show()

+-----------+-----------------+
|DAY_OF_WEEK|avg_arrival_delay|
+-----------+-----------------+
|          1|            10.81|
|          2|             8.03|
|          3|             5.59|
|          4|             7.17|
|          5|             6.01|
|          6|             4.89|
|          7|            10.11|
+-----------+-----------------+



### 9. Arrange days of the month by number of cancellations in descending order

In [19]:
spark.sql("select DAY,count(CANCELLED) as cancelled from flights_table group by DAY order by cancelled desc").show()

+---+---------+
|DAY|cancelled|
+---+---------+
|  4|     2660|
|  2|     2644|
|  9|     2620|
|  5|     2619|
|  6|     2596|
|  3|     2481|
|  8|     2403|
|  7|     2267|
|  1|     2248|
| 10|     1813|
| 13|     1788|
| 16|     1724|
| 23|     1712|
| 20|     1700|
| 26|     1700|
| 27|     1678|
| 22|     1671|
| 18|     1663|
| 12|     1655|
| 19|     1654|
+---+---------+
only showing top 20 rows
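
`count(CANCELLED)` counts every row in which the column is non-null, so the figures above are flights per day rather than cancellations per day. Assuming `CANCELLED` is a 0/1 flag (as the `SUM(CANCELLED)` ratio in question 20 suggests), summing the flag counts only the cancelled flights. The names below are hypothetical.

```python
# Sum the 0/1 flag instead of counting non-null rows.
CANCELLATIONS_BY_DAY = """
    SELECT DAY, SUM(CANCELLED) AS cancelled
    FROM flights_table
    GROUP BY DAY
    ORDER BY cancelled DESC
"""


def cancellations_by_day(spark):
    """Return a DataFrame with the number of cancelled flights per day of month."""
    return spark.sql(CANCELLATIONS_BY_DAY)
```

Calling `cancellations_by_day(spark).show()` would print the ranking. The same substitution applies wherever `COUNT(CANCELLED)` or `COUNT(DIVERTED)` is used on a flag column.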



### 10. Find the top 10 busiest airports with respect to day of week

In [20]:
spark.sql("""select DAY_OF_WEEK, airport, max(countOfFlights) as maxCount from
(
select DAY_OF_WEEK, ORIGIN_AIRPORT as airport, count(*) as countOfFlights from flights_table group by DAY_OF_WEEK, ORIGIN_AIRPORT
union all
select DAY_OF_WEEK, DESTINATION_AIRPORT as airport, count(*) as countOfFlights from flights_table group by DAY_OF_WEEK, DESTINATION_AIRPORT
) as result
group by DAY_OF_WEEK, airport order by maxCount desc limit 10""").show()


+-----------+-------+--------+
|DAY_OF_WEEK|airport|maxCount|
+-----------+-------+--------+
|          5|    ATL|     644|
|          4|    ATL|     557|
|          1|    ATL|     555|
|          7|    ATL|     522|
|          3|    ATL|     505|
|          2|    ATL|     485|
|          5|    ORD|     483|
|          5|    DFW|     447|
|          4|    ORD|     441|
|          1|    ORD|     436|
+-----------+-------+--------+
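
The query above takes, for each airport and weekday, the larger of its departure count and its arrival count. If "busiest" is read as total traffic (departures plus arrivals), one alternative sketch counts both kinds of movement together. That reading is an assumption, and the names below are hypothetical.

```python
# Stack departures and arrivals as "movements", then count them per airport and weekday.
BUSIEST_AIRPORTS = """
    SELECT DAY_OF_WEEK, airport, COUNT(*) AS total_flights
    FROM (
        SELECT DAY_OF_WEEK, ORIGIN_AIRPORT AS airport FROM flights_table
        UNION ALL
        SELECT DAY_OF_WEEK, DESTINATION_AIRPORT AS airport FROM flights_table
    ) AS movements
    GROUP BY DAY_OF_WEEK, airport
    ORDER BY total_flights DESC
    LIMIT 10
"""


def busiest_airports(spark):
    """Return the ten (weekday, airport) pairs with the most departures plus arrivals."""
    return spark.sql(BUSIEST_AIRPORTS)
```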



### 11. Find the airline with the maximum number of cancellations

In [21]:
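# Caution: COUNT(CANCELLED) counts every non-null row, so this ranks airlines by total flights;
# SUM(CANCELLED) would count cancellations, assuming CANCELLED is a 0/1 flag.
spark.sql("select AIRLINE,COUNT(CANCELLED) as cancelled from flights_table group by AIRLINE order by cancelled desc limit 1").show()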

+-------+---------+
|AIRLINE|cancelled|
+-------+---------+
|     WN|    11738|
+-------+---------+



### 12. Find airlines and order them in descending order of number of diversions

In [22]:
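# Caution: COUNT(DIVERTED) counts every non-null row (WN's 11738 equals its total flight count in question 17);
# SUM(DIVERTED) would count diversions, assuming DIVERTED is a 0/1 flag.
spark.sql("select AIRLINE,COUNT(DIVERTED) as total_diverted FROM flights_table \
          group by AIRLINE ORDER BY total_diverted desc ").show()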

+-------+--------------+
|AIRLINE|total_diverted|
+-------+--------------+
|     WN|         11738|
|     DL|          7989|
|     EV|          5916|
|     OO|          5708|
|     AA|          5250|
|     UA|          4701|
|     US|          3925|
|     MQ|          3502|
|     B6|          2548|
|     AS|          1586|
|     NK|          1048|
|     F9|           794|
|     HA|           722|
|     VX|           573|
+-------+--------------+



### 13. Find the days of the month with the most diversions

In [23]:
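# Caution: COUNT(DIVERTED) counts every non-null row, so the result matches the per-day flight counts in question 9;
# SUM(DIVERTED) would count diversions, assuming DIVERTED is a 0/1 flag.
spark.sql("select DAY, count(DIVERTED) as most_diverted from flights_table group by DAY order by most_diverted desc").show()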

+---+-------------+
|DAY|most_diverted|
+---+-------------+
|  4|         2660|
|  2|         2644|
|  9|         2620|
|  5|         2619|
|  6|         2596|
|  3|         2481|
|  8|         2403|
|  7|         2267|
|  1|         2248|
| 10|         1813|
| 13|         1788|
| 16|         1724|
| 23|         1712|
| 26|         1700|
| 20|         1700|
| 27|         1678|
| 22|         1671|
| 18|         1663|
| 12|         1655|
| 19|         1654|
+---+-------------+
only showing top 20 rows



### 14. Calculate the mean and standard deviation of departure delay for all flights, in minutes

In [24]:
spark.sql("select round(avg(DEPARTURE_DELAY),2) as mean ,round(stddev(DEPARTURE_DELAY),2) as standard_deviation from flights_table group by AIRLINE").show()

+-----+------------------+
| mean|standard_deviation|
+-----+------------------+
|14.29|             36.37|
|15.58|              46.0|
| 11.5|             50.59|
|11.53|              40.6|
|16.07|             44.45|
| 9.94|             44.58|
| 11.6|              41.9|
|23.51|             55.22|
| 7.81|             29.95|
|17.07|             43.47|
| 1.18|             30.28|
| 2.34|             29.15|
| 9.86|             35.18|
|10.12|             28.66|
+-----+------------------+
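
The question asks for a mean and standard deviation over all flights, while the query above groups by `AIRLINE` without selecting it, so each output row belongs to an unlabeled airline. A sketch of both readings follows; the function names are hypothetical. Note that `stddev` in Spark SQL is the sample standard deviation.

```python
def overall_delay_stats_sql(delay_col="DEPARTURE_DELAY"):
    """SQL producing a single mean/stddev row over every flight."""
    return f"""
        SELECT ROUND(AVG({delay_col}), 2) AS mean,
               ROUND(STDDEV({delay_col}), 2) AS standard_deviation
        FROM flights_table
    """


def per_airline_delay_stats_sql(delay_col="DEPARTURE_DELAY"):
    """SQL producing one labeled mean/stddev row per airline."""
    return f"""
        SELECT AIRLINE,
               ROUND(AVG({delay_col}), 2) AS mean,
               ROUND(STDDEV({delay_col}), 2) AS standard_deviation
        FROM flights_table
        GROUP BY AIRLINE
        ORDER BY AIRLINE
    """
```

For instance, `spark.sql(overall_delay_stats_sql()).show()` gives the overall figures, and passing `"ARRIVAL_DELAY"` does the same for arrival delay.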



### 15. Calculate the mean and standard deviation of arrival delay for all flights, in minutes

In [25]:
spark.sql("select round(avg(ARRIVAL_DELAY),2) as mean ,round(stddev(ARRIVAL_DELAY),2) as standard_deviation from flights_table group by AIRLINE").show()

+-----+------------------+
| mean|standard_deviation|
+-----+------------------+
|  6.7|             38.97|
|14.21|             47.58|
| 8.39|             53.57|
|10.88|             43.39|
|13.96|             47.64|
| 2.81|             46.96|
|10.15|             43.76|
| 24.1|             56.27|
| 5.98|             34.11|
|19.23|              46.4|
| 4.07|             32.38|
|-1.53|             31.37|
| 5.13|             40.87|
|  3.7|             31.23|
+-----+------------------+



### 16. Find all diverted routes from a source to a destination airport, and the most diverted route

In [26]:
spark.sql("SELECT ORIGIN_AIRPORT, DESTINATION_AIRPORT, COUNT(DIVERTED) AS diverted_count \
FROM flights_table WHERE DIVERTED = 1 GROUP BY ORIGIN_AIRPORT, DESTINATION_AIRPORT order by diverted_count desc").show()

+--------------+-------------------+--------------+
|ORIGIN_AIRPORT|DESTINATION_AIRPORT|diverted_count|
+--------------+-------------------+--------------+
|           HOU|                DAL|             2|
|           PHL|                SAN|             2|
|           STT|                PHL|             2|
|           IAH|                ASE|             2|
|           TPA|                LGA|             2|
|           JFK|                EGE|             2|
|           JFK|                SEA|             2|
|           ORD|                ASE|             2|
|           CLT|                IAH|             2|
|           EWR|                STL|             1|
|           SNA|                SFO|             1|
|           FLL|                PVD|             1|
|           COS|                ORD|             1|
|           FLL|                BWI|             1|
|           BOS|                LAS|             1|
|           ASE|                LAX|             1|
|           

In [27]:
most_diverted_route = spark.sql("SELECT ORIGIN_AIRPORT, DESTINATION_AIRPORT, COUNT(DIVERTED) AS diverted_count \
FROM flights_table WHERE DIVERTED = 1 GROUP BY ORIGIN_AIRPORT, DESTINATION_AIRPORT ORDER BY diverted_count DESC LIMIT 1")

most_diverted_route.show()

+--------------+-------------------+--------------+
|ORIGIN_AIRPORT|DESTINATION_AIRPORT|diverted_count|
+--------------+-------------------+--------------+
|           TPA|                LGA|             2|
+--------------+-------------------+--------------+
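
Nine routes in the earlier listing tie at two diversions, so `LIMIT 1` returns only one of them and the choice among ties is not guaranteed. A hedged alternative keeps every route that reaches the maximum count; the names below are hypothetical.

```python
# Keep all routes whose diversion count equals the maximum, so ties are not dropped.
MOST_DIVERTED_ROUTES = """
    WITH route_counts AS (
        SELECT ORIGIN_AIRPORT, DESTINATION_AIRPORT, COUNT(*) AS diverted_count
        FROM flights_table
        WHERE DIVERTED = 1
        GROUP BY ORIGIN_AIRPORT, DESTINATION_AIRPORT
    )
    SELECT *
    FROM route_counts
    WHERE diverted_count = (SELECT MAX(diverted_count) FROM route_counts)
    ORDER BY ORIGIN_AIRPORT, DESTINATION_AIRPORT
"""


def most_diverted_routes(spark):
    """Return every route tied for the highest number of diversions."""
    return spark.sql(MOST_DIVERTED_ROUTES)
```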



### 17. Find airlines with their total flight count, the number of flights whose arrival was delayed by more than 30 minutes, and the percentage of such flights on non-weekend days, for airlines with more than 10 flights. Exclude airlines 'AK', 'HI', 'PR', 'VI' and order the output by that percentage in descending order.

In [28]:
spark.sql("""
  SELECT 
    AIRLINE,
    COUNT(*) AS total_flights,
    SUM(CASE WHEN ARRIVAL_DELAY > 30 AND DAY_OF_WEEK NOT IN (6, 7) THEN 1 ELSE 0 END) AS delayed_flights,
    round(CAST(SUM(CASE WHEN ARRIVAL_DELAY > 30 AND DAY_OF_WEEK NOT IN (6, 7) THEN 1 ELSE 0 END) AS float) / COUNT(*) * 100,2) AS delay_percentage
  FROM 
    flights_table 
  WHERE 
    AIRLINE NOT IN ('AK', 'HI', 'PR', 'VI') 
  GROUP BY 
    AIRLINE 
  HAVING 
    COUNT(*) > 10 
  ORDER BY 
    delay_percentage DESC
""").show()

+-------+-------------+---------------+----------------+
|AIRLINE|total_flights|delayed_flights|delay_percentage|
+-------+-------------+---------------+----------------+
|     F9|          794|            139|           17.51|
|     MQ|         3502|            601|           17.16|
|     B6|         2548|            360|           14.13|
|     NK|         1048|            139|           13.26|
|     EV|         5916|            665|           11.24|
|     OO|         5708|            633|           11.09|
|     UA|         4701|            497|           10.57|
|     AA|         5250|            484|            9.22|
|     VX|          573|             47|             8.2|
|     US|         3925|            310|             7.9|
|     DL|         7989|            592|            7.41|
|     WN|        11738|            869|             7.4|
|     AS|         1586|             64|            4.04|
|     HA|          722|             23|            3.19|
+-------+-------------+---------------+----------------+
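
The query above divides weekday delays by each airline's total flight count, weekends included. If the percentage is meant to be taken over weekday flights only, the weekday condition can move into `WHERE` so that numerator and denominator cover the same flights. This is one possible reading of the question; it keeps the original query's assumption that `DAY_OF_WEEK` values 6 and 7 are the weekend. The names below are hypothetical.

```python
# Restrict to weekday flights first, then compute the share delayed by more than 30 minutes.
WEEKDAY_DELAY_SHARE = """
    SELECT AIRLINE,
           COUNT(*) AS weekday_flights,
           SUM(CASE WHEN ARRIVAL_DELAY > 30 THEN 1 ELSE 0 END) AS delayed_flights,
           ROUND(100.0 * SUM(CASE WHEN ARRIVAL_DELAY > 30 THEN 1 ELSE 0 END) / COUNT(*), 2)
               AS delay_percentage
    FROM flights_table
    WHERE DAY_OF_WEEK NOT IN (6, 7)
      AND AIRLINE NOT IN ('AK', 'HI', 'PR', 'VI')
    GROUP BY AIRLINE
    HAVING COUNT(*) > 10
    ORDER BY delay_percentage DESC
"""


def weekday_delay_share(spark):
    """Return per-airline share of weekday flights arriving more than 30 minutes late."""
    return spark.sql(WEEKDAY_DELAY_SHARE)
```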

### 18. Find airlines with their total flight count, the number of flights whose departure was delayed by less than 30 minutes, and the percentage of such flights on weekends, for airlines with more than 10 flights. Exclude airlines 'AK', 'HI', 'PR', 'VI' and order the output by that percentage in descending order.

In [29]:
spark.sql("SELECT AIRLINE,COUNT(*) AS total_flights,\
SUM(CASE WHEN DEPARTURE_DELAY < 30 AND DAY_OF_WEEK IN (6, 7) THEN 1 ELSE 0 END) AS delayed_flights,\
round(CAST(SUM(CASE WHEN DEPARTURE_DELAY < 30 AND DAY_OF_WEEK IN (6, 7) THEN 1 ELSE 0 END) AS float) / COUNT(*) * 100,2) \
AS delay_percentage FROM flights_table WHERE AIRLINE NOT IN ('AK', 'HI', 'PR', 'VI') \
GROUP BY AIRLINE HAVING COUNT(*) > 10 order by delay_percentage desc").show()


+-------+-------------+---------------+----------------+
|AIRLINE|total_flights|delayed_flights|delay_percentage|
+-------+-------------+---------------+----------------+
|     AS|         1586|            412|           25.98|
|     HA|          722|            179|           24.79|
|     NK|         1048|            253|           24.14|
|     AA|         5250|           1214|           23.12|
|     DL|         7989|           1814|           22.71|
|     VX|          573|            129|           22.51|
|     WN|        11738|           2636|           22.46|
|     US|         3925|            867|           22.09|
|     OO|         5708|           1244|           21.79|
|     B6|         2548|            543|           21.31|
|     EV|         5916|           1203|           20.33|
|     UA|         4701|            950|           20.21|
|     MQ|         3502|            622|           17.76|
|     F9|          794|            133|           16.75|
+-------+-------------+---------------+----------------+

### 19. When is the best time of day, day of week, and time of year to fly with minimum delays?

In [30]:
# Calculate average delays by hour of day
hourly_delays = spark.sql("""
  SELECT
    FLOOR(DEPARTURE_TIME / 60) AS HOUR_OF_DAY,
    round(AVG(DEPARTURE_DELAY),2) as DepartDelay,
    round(AVG(ARRIVAL_DELAY),2) AS Arrival_Delay
  FROM flights_table
  WHERE DEPARTURE_DELAY IS NOT NULL and ARRIVAL_DELAY IS NOT NULL
  GROUP BY FLOOR(DEPARTURE_TIME / 60)
  ORDER BY DepartDelay, Arrival_Delay
""")

# Calculate average delays by day of week
daily_delays = spark.sql("""
  SELECT
    DAY_OF_WEEK,
    round(AVG(DEPARTURE_DELAY),2) as DepartDelay,
    round(AVG(ARRIVAL_DELAY),2) AS Arrival_Delay
  FROM flights_table
  WHERE DEPARTURE_DELAY IS NOT NULL and ARRIVAL_DELAY IS NOT NULL
  GROUP BY DAY_OF_WEEK
  ORDER BY DepartDelay, Arrival_Delay
""")

# Calculate average delays by month
monthly_delays = spark.sql("""
  SELECT
    MONTH,
    round(AVG(DEPARTURE_DELAY),2) as DepartDelay,
    round(AVG(ARRIVAL_DELAY),2) AS Arrival_Delay
  FROM flights_table
  WHERE DEPARTURE_DELAY IS NOT NULL and ARRIVAL_DELAY IS NOT NULL
  GROUP BY MONTH
  ORDER BY DepartDelay, Arrival_Delay
""")

# Show the results
hourly_delays.show()
daily_delays.show()
monthly_delays.show()

+-----------+-----------+-------------+
|HOUR_OF_DAY|DepartDelay|Arrival_Delay|
+-----------+-----------+-------------+
|          9|      -4.64|        -7.47|
|         10|      -0.88|        -4.55|
|          8|      -0.44|        -4.73|
|         11|       0.84|        -2.36|
|         12|        0.9|        -2.72|
|         13|        2.7|        -0.92|
|         14|        4.2|         0.59|
|         15|       6.15|         3.16|
|         16|       6.16|         3.26|
|         18|       8.96|         5.05|
|         17|       9.14|         5.32|
|         19|       9.21|         5.67|
|         24|      10.15|         6.51|
|         21|      10.35|          7.0|
|          7|      10.35|         7.72|
|         20|      10.89|         6.54|
|         22|       11.5|         7.28|
|         27|      11.85|         8.04|
|         29|      13.21|         9.61|
|         28|       13.4|         9.11|
+-----------+-----------+-------------+
only showing top 20 rows
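
Hours such as 24, 27 and 29 in the output above suggest that `DEPARTURE_TIME` is stored as an HHMM integer (for example 1435 for 14:35) rather than as minutes since midnight; that encoding is an assumption here. Under it, the hour is the value divided by 100, not by 60. A sketch with hypothetical names:

```python
def hhmm_to_hour(hhmm):
    """Extract the hour from an HHMM-encoded integer, e.g. 1435 -> 14."""
    return hhmm // 100


# Same aggregation as the hourly query, with the hour taken from the HHMM value.
HOURLY_DELAYS = """
    SELECT CAST(DEPARTURE_TIME / 100 AS INT) AS HOUR_OF_DAY,
           ROUND(AVG(DEPARTURE_DELAY), 2) AS DepartDelay,
           ROUND(AVG(ARRIVAL_DELAY), 2) AS Arrival_Delay
    FROM flights_table
    WHERE DEPARTURE_DELAY IS NOT NULL AND ARRIVAL_DELAY IS NOT NULL
    GROUP BY CAST(DEPARTURE_TIME / 100 AS INT)
    ORDER BY DepartDelay, Arrival_Delay
"""
```

Running `spark.sql(HOURLY_DELAYS).show(24)` would then list at most one row per clock hour (a value of 2400, if present in the data, would still map to hour 24).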

+-----------+-

### 20. Which airline is the best to travel with, considering cancellations, arrival and departure delays, and all other factors affecting airline performance?

In [31]:
spark.sql("""
  SELECT
    AIRLINE,
    CAST(SUM(CANCELLED) AS float) / COUNT(*) AS AirCancellationRatio,
    round(AVG(COALESCE(ARRIVAL_DELAY, 0)),2) AS AverageLateArrivals,
    round(AVG(COALESCE(DEPARTURE_DELAY, 0)),2) AS AverageLateDepartures,
    round(AVG(COALESCE(ELAPSED_TIME, 0)) - AVG(COALESCE(SCHEDULED_TIME, 0)),2) AS AverageTimeDifference,
    CAST(SUM(DIVERTED) AS float) / COUNT(*) AS FlightDiversionRate
  FROM Flights_table
  GROUP BY AIRLINE
""").show()

+-------+--------------------+-------------------+---------------------+---------------------+--------------------+
|AIRLINE|AirCancellationRatio|AverageLateArrivals|AverageLateDepartures|AverageTimeDifference| FlightDiversionRate|
+-------+--------------------+-------------------+---------------------+---------------------+--------------------+
|     UA|0.025951925122314402|               6.51|                13.94|               -13.14|0.001701765581791...|
|     NK|0.020038167938931296|              13.92|                15.28|                -4.41|                 0.0|
|     AA| 0.04590476190476191|               7.98|                10.99|               -11.04|0.002285714285714286|
|     EV| 0.05273833671399594|              10.27|                10.95|                 -6.4|0.003718728870858...|
|     B6| 0.05690737833594976|              13.08|                15.15|               -12.72|0.006279434850863423|
|     DL|0.022155463762673678|               2.75|                 9.72|

In [32]:
# Calculate the metrics for each airline
airline_metrics = spark.sql("""
  SELECT
    AIRLINE,
    CAST(SUM(CANCELLED) AS float) / COUNT(*) AS AirCancellationRatio,
    round(AVG(COALESCE(ARRIVAL_DELAY, 0)),2) AS AverageLateArrivals,
    round(AVG(COALESCE(DEPARTURE_DELAY, 0)),2) AS AverageLateDepartures,
    round(AVG(COALESCE(ELAPSED_TIME, 0)) - AVG(COALESCE(SCHEDULED_TIME, 0)),2) AS AverageTimeDifference,
    CAST(SUM(DIVERTED) AS float) / COUNT(*) AS FlightDiversionRate
  FROM Flights_table
  GROUP BY AIRLINE
""")

# Register the metrics as a temporary view so they can be ranked with SQL
# (createOrReplaceTempView returns None, so its result is not assigned)
airline_metrics.createOrReplaceTempView("airline_metrics")
ranked_airlines = spark.sql("""
  SELECT *,
    RANK() OVER (ORDER BY AirCancellationRatio, AverageLateArrivals, AverageLateDepartures, AverageTimeDifference, FlightDiversionRate) AS Rank
  FROM airline_metrics
  ORDER BY Rank
""")

# Show the ranked airlines
ranked_airlines.show()

+-------+--------------------+-------------------+---------------------+---------------------+--------------------+----+
|AIRLINE|AirCancellationRatio|AverageLateArrivals|AverageLateDepartures|AverageTimeDifference| FlightDiversionRate|Rank|
+-------+--------------------+-------------------+---------------------+---------------------+--------------------+----+
|     HA|0.004155124653739612|               4.05|                 1.18|                 2.66|0.001385041551246...|   1|
|     AS|0.007566204287515763|              -1.52|                 2.32|                -5.01|                 0.0|   2|
|     F9|0.013853904282115869|              23.77|                23.19|                -1.68|                 0.0|   3|
|     NK|0.020038167938931296|              13.92|                15.28|                -4.41|                 0.0|   4|
|     DL|0.022155463762673678|               2.75|                 9.72|               -10.63|0.002253098009763...|   5|
|     VX| 0.02268760907504363|  
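
`RANK() OVER (ORDER BY a, b, c, ...)` sorts lexicographically: later columns only break ties in earlier ones, so the ranking above is effectively by cancellation ratio alone (F9 places third despite the largest average delays shown). One hedged alternative ranks each metric separately and averages the ranks. The equal weighting and the `MeanRank` name are assumptions, and the query relies on the `airline_metrics` view registered in the cell above.

```python
# Rank each metric on its own (lower is better), then average the five ranks.
MEAN_RANK = """
    SELECT AIRLINE,
           ( RANK() OVER (ORDER BY AirCancellationRatio)
           + RANK() OVER (ORDER BY AverageLateArrivals)
           + RANK() OVER (ORDER BY AverageLateDepartures)
           + RANK() OVER (ORDER BY AverageTimeDifference)
           + RANK() OVER (ORDER BY FlightDiversionRate) ) / 5.0 AS MeanRank
    FROM airline_metrics
    ORDER BY MeanRank, AIRLINE
"""


def rank_airlines_by_mean_rank(spark):
    """Return airlines ordered by their average rank across the five metrics."""
    return spark.sql(MEAN_RANK)
```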