# Histograms

This notebook shows how to create histograms and density plots in `lets-plot`, using the ["Airlines Delays from 2003-2016"](https://www.kaggle.com/datasets/giovamata/airlinedelaycauses) dataset by [Priank Ravichandar](https://www.kaggle.com/priankravichandar), licensed under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/). The dataset covers flight delays and cancellations at US airports from 2003 to 2016.

In [3]:
import pandas as pd
from lets_plot import *

LetsPlot.setup_html()

## Import and process the data
* Create a date/time variable from the month/year column
* Remove the first and last years, as they only contain partial records for the year
* Create a variable holding the average delay, in minutes, per delayed flight for each airport and month

In [4]:
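airlines = pd.read_csv("data/airlines.csv")
# Parse the month/year label into a datetime column
airlines["Time"] = pd.to_datetime(airlines["TimeLabel"])
# Drop the partial first and last years; copy so the next assignment does not trigger SettingWithCopyWarning
airlines = airlines[~(airlines["TimeYear"].isin([2003, 2016]))].copy()
# Average minutes of delay per delayed flight
airlines["AverageMinutesDelayed"] = airlines["MinutesDelayedTotal"] / airlines["FlightsDelayed"]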
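
To check the processing steps on something small, the sketch below applies the same year filter and division as the cell above to a toy frame. The rows are hypothetical values made up for illustration; only the column names come from the airlines data.

```python
import pandas as pd

# Hypothetical toy rows; only the column names match the airlines data.
toy = pd.DataFrame({
    "TimeYear": [2003, 2010, 2010],
    "MinutesDelayedTotal": [500, 1200, 0],
    "FlightsDelayed": [10, 40, 0],
})

# Drop the partial years, as in the cell above.
toy = toy[~toy["TimeYear"].isin([2003, 2016])].copy()

# Average minutes per delayed flight; 0 / 0 yields NaN rather than raising.
toy["AverageMinutesDelayed"] = toy["MinutesDelayedTotal"] / toy["FlightsDelayed"]
print(toy)
```

A row with no delayed flights ends up as `NaN` in `AverageMinutesDelayed`, which is worth keeping in mind when reading the counts in the plots that follow.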

## Histogram showing the distribution of time that flights are delayed

In [6]:
(
        ggplot(airlines, aes(x="AverageMinutesDelayed"))
        + geom_histogram(fill="#b3cde3")
        + xlab("Minutes delayed per flight")
        + ylab("Count")
        + ggtitle("Average minutes flights delayed in US airports, 2004-2015")
)
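
The histogram above splits the range of `AverageMinutesDelayed` into equal-width bins and counts the rows falling into each. The sketch below illustrates that idea in plain Python on a handful of hypothetical delay values. The helper `bin_counts` is made up for this example and does not reproduce how `lets-plot` places its bin edges.

```python
def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical: put everything in the first bin.
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical average delays in minutes, not taken from the dataset.
delays = [12.0, 15.5, 18.0, 22.0, 41.0, 43.5, 60.0]
print(bin_counts(delays, 4))  # -> [4, 0, 2, 1]
```

Changing the number of bins changes the shape of the picture, so it can be worth trying a few values through the `bins` or `binwidth` arguments of `geom_histogram` before settling on one.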

## Histogram showing the distribution of time that flights are delayed for EWR and LAS

In [8]:
(
        ggplot(airlines[airlines["AirportCode"].isin(["EWR", "LAS"])],
               aes(x="AverageMinutesDelayed", fill="AirportCode"))
        + geom_histogram()
        + xlab("Minutes delayed per flight")
        + ylab("Count")
        + ggtitle("Average minutes flights delayed in EWR vs LAS, 2004-2015")
        + scale_fill_brewer(type="qual", palette="Pastel1", name="Airport code")
)

## Density plot showing the distribution of time that flights are delayed for EWR and LAS

In [11]:
(
        ggplot(airlines[airlines["AirportCode"].isin(["EWR", "LAS"])],
               aes(x="AverageMinutesDelayed", color="AirportCode"))
        + geom_density(size=1)
        + xlab("Minutes delayed per flight")
        + ylab("Density")
        + ggtitle("Average minutes flights delayed in EWR vs LAS, 2004-2015")
        + scale_color_brewer(type="qual", palette="Pastel1", name="Airport code")
)