# Scatterplots

This notebook shows how to create scatterplots and weighted scatterplots in `lets-plot`, using the ["Airlines Delays from 2003-2016"](https://www.kaggle.com/datasets/giovamata/airlinedelaycauses) dataset by [Priank Ravichandar](https://www.kaggle.com/priankravichandar), licensed under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/). The dataset records flight delays and cancellations at US airports from 2003 to 2016.

In [1]:
import pandas as pd
from lets_plot import *

LetsPlot.setup_html()

## Import and process the data
* Create a date/time variable from the `TimeLabel` column, which holds year/month labels such as `2014/1`
* Remove the first and last years (2003 and 2016), as they contain only partial records

In [2]:
airlines = pd.read_csv("data/airlines.csv")
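# TimeLabel holds "year/month" strings such as "2014/1"; an explicit format
# replaces infer_datetime_format, which is deprecated in pandas 2.x.
airlines["Time"] = pd.to_datetime(airlines["TimeLabel"], format="%Y/%m")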
airlines = airlines[~(airlines["TimeYear"].isin([2003, 2016]))]
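
The two processing steps can be checked on a small made-up table before running them on the full file. This is only a sketch: the `toy` frame and its values are invented for illustration, and the `"%Y/%m"` format is an assumption based on labels such as `2014/1` that appear later in this notebook.

```python
import pandas as pd

# Toy stand-in for data/airlines.csv; the rows are made up for illustration.
toy = pd.DataFrame(
    {
        "TimeLabel": ["2003/6", "2004/1", "2015/12", "2016/1"],
        "TimeYear": [2003, 2004, 2015, 2016],
    }
)

# Parse the "year/month" labels (assumed format) into timestamps.
toy["Time"] = pd.to_datetime(toy["TimeLabel"], format="%Y/%m")

# Drop the partial first and last years, keeping only complete years.
toy = toy[~toy["TimeYear"].isin([2003, 2016])]

kept_years = toy["TimeYear"].tolist()
first_time = toy["Time"].iloc[0]
print(toy)
```

Only the 2004 and 2015 rows should remain, with each `Time` value set to the first day of its month.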

## Scatterplot showing relationship between delays from weather and delays from late aircraft

In [5]:
(
        ggplot(airlines, aes(x="NumDelaysWeather", y="NumDelaysLateAircraft"))
        + geom_point(color="#fbb4ae")
        + xlab("Number of delays from weather")
        + ylab("Number of delays from late aircraft")
        + ggtitle("Relationship between delays in US airports, 2004-2015")
)

## Scatterplot showing relationship between delays from weather and delays from late aircraft by airport

In [7]:
(
        ggplot(airlines[airlines["AirportCode"].isin(["LGA", "LAS"])],
               aes(x="NumDelaysWeather", y="NumDelaysLateAircraft", color="AirportCode"))
        + geom_point()
        + xlab("Number of delays from weather")
        + ylab("Number of delays from late aircraft")
        + scale_color_brewer(type="qual", palette="Pastel1", name="Airport code")
        + ggtitle("Relationship between delays for LAS and LGA, 2004-2015")
)

## Scatterplot showing relationship between delays from weather and delays from late aircraft by airport size

In [19]:
(
        ggplot(airlines[(airlines["TimeLabel"].isin(["2013/12", "2014/1", "2014/2", "2014/12", "2015/1", "2015/2"]))],
               aes(x="NumDelaysWeather", y="NumDelaysLateAircraft", size="FlightsTotal"))
        + geom_point(color="#b3cde3")
        + xlab("Number of delays from weather")
        + ylab("Number of delays from late aircraft")
        + ggtitle("Relationship between delays by total flights, winters 2013/14 and 2014/15")
        + scale_size(name="Total flights")
)