# Line plots

This notebook shows how to create line plots and area plots with `lets-plot`, using the ["Airlines Delays from 2003-2016"](https://www.kaggle.com/datasets/giovamata/airlinedelaycauses) dataset by [Priank Ravichandar](https://www.kaggle.com/priankravichandar), licensed under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/). The dataset contains information on flight delays and cancellations at US airports from 2003 to 2016.

## Import the dependencies

In [1]:
import pandas as pd
from lets_plot import *

LetsPlot.setup_html()

## Import and process the data
* Parse the year/month label in `TimeLabel` into a datetime column, `Time`
* Drop 2003 and 2016, the first and last years, because they contain only partial-year records

In [2]:
airlines = pd.read_csv("data/airlines.csv")
airlines["Time"] = pd.to_datetime(airlines["TimeLabel"])
airlines = airlines[~airlines["TimeYear"].isin([2003, 2016])]  # keep only complete years (2004-2015)
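The two preparation steps can be checked on a small synthetic frame. This is a minimal sketch. It assumes `TimeLabel` holds strings in a `YYYY/MM` form, so check the real file and adjust the `format` string if it differs. The sample rows are hypothetical. Passing an explicit `format` makes the parse strict instead of relying on format inference.

```python
import pandas as pd

# Hypothetical sample rows; the real CSV has many more columns.
sample = pd.DataFrame({
    "TimeLabel": ["2003/06", "2004/01", "2015/12", "2016/01"],
    "TimeYear": [2003, 2004, 2015, 2016],
})

# Assumes labels look like "YYYY/MM"; the parsed day defaults to the 1st.
sample["Time"] = pd.to_datetime(sample["TimeLabel"], format="%Y/%m")

# Keep only the complete years (2004-2015).
complete = sample[~sample["TimeYear"].isin([2003, 2016])]
print(complete[["TimeLabel", "Time"]])
```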

## Summarize the number of delays per month by cause

In [3]:
delay_columns = ["NumDelaysLateAircraft", "NumDelaysWeather", "NumDelaysSecurity", "NumDelaysCarrier"]

delays_by_time_and_cause = (
    airlines[["Time"] + delay_columns]
    .groupby("Time")
    .sum()
    .reset_index()
    .melt(id_vars="Time", value_vars=delay_columns,
          var_name="TypeOfDelay", value_name="NumberDelays")
    .assign(TypeOfDelay=lambda x: x["TypeOfDelay"].str.replace("NumDelays", ""))
)
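The reshaping step converts the wide summary (one column per cause) into long format (one row per month and cause). Long format lets a single `TypeOfDelay` column be mapped to `color` or `fill` in the plots. The sketch below reproduces the step on hypothetical toy data from two airports reporting in the same two months.

```python
import pandas as pd

# Hypothetical toy data: two airports reporting in the same two months.
toy = pd.DataFrame({
    "Time": pd.to_datetime(["2004-01-01", "2004-01-01", "2004-02-01", "2004-02-01"]),
    "NumDelaysWeather": [1, 2, 3, 4],
    "NumDelaysCarrier": [10, 20, 30, 40],
})

# Wide summary: one row per month, summed across airports.
wide = toy.groupby("Time").sum().reset_index()

# Long format: one row per (month, cause) pair, with the prefix stripped from cause names.
long = (
    wide.melt(id_vars="Time", var_name="TypeOfDelay", value_name="NumberDelays")
    .assign(TypeOfDelay=lambda d: d["TypeOfDelay"].str.replace("NumDelays", ""))
)
print(long)
```

Passing `var_name` and `value_name` to `melt` names the new columns directly, so a separate `rename` call is not needed.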

## Create a line plot showing total delays over time due to late aircraft

In [29]:
(
    ggplot(delays_by_time_and_cause[delays_by_time_and_cause["TypeOfDelay"] == "LateAircraft"],
           aes(x="Time", y="NumberDelays"))
    + geom_line(color="#fbb4ae", size=1)
    + scale_x_datetime()
    + xlab("Time")
    + ylab("Number of delays")
    + ggtitle("Total delays due to late aircraft at US airports, 2004-2015")
)
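The summary is in long format, so the single-cause line plot needs exactly one row per month for the chosen cause. Duplicate `Time` values would produce vertical jumps within a month. The sketch below shows a quick sanity check on hypothetical data with the same columns as `delays_by_time_and_cause`.

```python
import pandas as pd

# Hypothetical long-format summary with the same columns as delays_by_time_and_cause.
summary = pd.DataFrame({
    "Time": pd.to_datetime(["2004-02-01", "2004-01-01", "2004-01-01", "2004-02-01"]),
    "TypeOfDelay": ["LateAircraft", "LateAircraft", "Carrier", "Carrier"],
    "NumberDelays": [7, 5, 30, 20],
})

# Keep one cause and order the rows by time for easier inspection.
late = (
    summary[summary["TypeOfDelay"] == "LateAircraft"]
    .sort_values("Time")
    .reset_index(drop=True)
)

# One row per month means the line has exactly one point per x value.
assert late["Time"].is_unique
print(late)
```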

## Create a line plot showing delays over time due to late aircraft and carrier issues

In [27]:
(
    ggplot(delays_by_time_and_cause[delays_by_time_and_cause["TypeOfDelay"].isin(["LateAircraft", "Carrier"])],
           aes(x="Time", y="NumberDelays", color="TypeOfDelay"))
    + geom_line(size=1)
    + scale_x_datetime()
    + xlab("Time")
    + ylab("Number of delays")
    + ggtitle("Delays due to late aircraft and carrier issues at US airports, 2004-2015")
    + scale_color_brewer(type="qual", palette="Pastel1", name="Type of delay",
                         labels=["Late aircraft", "Carrier"])
)
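The `labels` list in `scale_color_brewer` is matched to the legend entries by position. If the order of the levels changes, the legend is silently mislabeled. One way to avoid that dependency is to store display names in the data before plotting, sketched here with a hypothetical `DELAY_LABELS` dictionary. The legend can then show those names without a `labels` argument.

```python
import pandas as pd

# Hypothetical mapping from the melted cause codes to display names.
DELAY_LABELS = {
    "LateAircraft": "Late aircraft",
    "Carrier": "Carrier",
    "Weather": "Weather",
    "Security": "Security",
}

# Hypothetical rows in the same shape as the filtered plot data.
summary = pd.DataFrame({
    "Time": pd.to_datetime(["2004-01-01", "2004-01-01"]),
    "TypeOfDelay": ["LateAircraft", "Carrier"],
    "NumberDelays": [5, 30],
})

# Each row now carries its own readable label, so no positional list is needed.
summary["TypeOfDelay"] = summary["TypeOfDelay"].map(DELAY_LABELS)
print(summary)
```

Apply the mapping after filtering on the original codes, or filter on the display names instead. The plot call can then keep `scale_color_brewer(type="qual", palette="Pastel1", name="Type of delay")` and drop `labels`.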

## Create an area plot showing delays due to weather and late aircraft

In [22]:
(
    ggplot(
        delays_by_time_and_cause[delays_by_time_and_cause["TypeOfDelay"].isin(["LateAircraft", "Weather"])]
        .sort_values("TypeOfDelay", ascending=False),
        aes(x="Time", y="NumberDelays", fill="TypeOfDelay"))
    + geom_area(color="white")
    + scale_x_datetime()
    + xlab("Time")
    + ylab("Number of delays")
    + ggtitle("Delays due to weather and late aircraft at US airports, 2004-2015")
    + scale_fill_brewer(type="qual", palette="Pastel1", name="Type of delay",
                        labels=["Weather", "Late aircraft"])
)
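In `lets-plot`, `geom_area` uses the `"stack"` position by default. As a result, the upper edge of the filled region in each month is the combined count of weather and late-aircraft delays, not either series on its own. The sketch below computes those stacked heights from hypothetical long-format rows, which can help when reading values off the chart.

```python
import pandas as pd

# Hypothetical long-format rows for two causes over two months.
summary = pd.DataFrame({
    "Time": pd.to_datetime(["2004-01-01", "2004-02-01"] * 2),
    "TypeOfDelay": ["Weather", "Weather", "LateAircraft", "LateAircraft"],
    "NumberDelays": [3, 4, 50, 60],
})

# One column per cause, one row per month.
wide = summary.pivot(index="Time", columns="TypeOfDelay", values="NumberDelays")

# Height of the top edge of the stacked area in each month.
stack_top = wide.sum(axis=1)
print(stack_top)
```

To overlap the two areas instead of stacking them, `geom_area` accepts `position="identity"`. In that case, a semi-transparent fill is usually needed to keep both series visible.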