## Interactive data visualization

This application uses [Mosaic](https://uwdata.github.io/mosaic/) to display three linked histograms.

Data: [Cross-filter flights (10M)](https://uwdata.github.io/mosaic/examples/flights-10m.html#cross-filter-flights-10m).
The dataset is a subset of the [ASA Data Expo](https://community.amstat.org/jointscsg-section/dataexpo/dataexpo2009) dataset, which contains arrival and departure details for all commercial flights within the USA from October 1987 to April 2008.

Hosted on [Ploomber Cloud](https://ploomber.io/).

How to use it: Select a histogram region to cross-filter the charts.

In [9]:
%load_ext sql

In [5]:
%%sql duckdb://
INSTALL httpfs
-- to avoid httpfs.duckdb_extension not found error

Done.


Success


In [6]:
from mosaic_widget import MosaicWidget

In [7]:
spec = {
  "meta": {
    "title": "Cross-Filter Flights (10M)",
    "description": "Histograms showing arrival delay, departure time, and distance flown for 10 million flights.\nOnce loaded, automatically-generated indexes enable efficient cross-filtered selections.\n\n_You may need to wait a few seconds for the dataset to load._\n"
  },
  "data": {
    "flights10m": "SELECT GREATEST(-60, LEAST(ARR_DELAY, 180))::DOUBLE AS delay, DISTANCE AS distance, DEP_TIME AS time FROM 'https://uwdata.github.io/mosaic-datasets/data/flights-10m.parquet'"
  },
  "params": {
    "brush": {
      "select": "crossfilter"
    }
  },
  "vconcat": [
    {
      "plot": [
        {
          "mark": "rectY",
          "data": {
            "from": "flights10m",
            "filterBy": "$brush"
          },
          "x": {
            "bin": "delay"
          },
          "y": {
            "count": None
          },
          "fill": "steelblue",
          "inset": 0.5
        },
        {
          "select": "intervalX",
          "as": "$brush"
        }
      ],
      "xDomain": "Fixed",
      "marginLeft": 75,
      "width": 600,
      "height": 200
    },
    {
      "plot": [
        {
          "mark": "rectY",
          "data": {
            "from": "flights10m",
            "filterBy": "$brush"
          },
          "x": {
            "bin": "time"
          },
          "y": {
            "count": None
          },
          "fill": "steelblue",
          "inset": 0.5
        },
        {
          "select": "intervalX",
          "as": "$brush"
        }
      ],
      "xDomain": "Fixed",
      "marginLeft": 75,
      "width": 600,
      "height": 200
    },
    {
      "plot": [
        {
          "mark": "rectY",
          "data": {
            "from": "flights10m",
            "filterBy": "$brush"
          },
          "x": {
            "bin": "distance"
          },
          "y": {
            "count": None
          },
          "fill": "steelblue",
          "inset": 0.5
        },
        {
          "select": "intervalX",
          "as": "$brush"
        }
      ],
      "xDomain": "Fixed",
      "marginLeft": 75,
      "width": 600,
      "height": 200
    }
  ]
}

### Visualizations

The visualizations consist of three histograms showing arrival delay, departure time, and distance flown for 10 million flights.

- Arrival Delay Histogram: This histogram displays the distribution of arrival delays. The x-axis represents the delay, which the query in the spec clamps to the range -60 to 180, and the y-axis represents the count of flights whose delay falls into each bin.

- Departure Time Histogram: This histogram shows the distribution of departure times. The x-axis represents the time of day, while the y-axis represents the count of flights departing at different times.

- Distance Histogram: This histogram visualizes the distribution of flight distances. The x-axis represents the flight distance, and the y-axis represents the count of flights covering various distances.
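
The three histograms described above differ only in the column they bin, so the `vconcat` part of the spec could be generated rather than written out three times. The following is a minimal sketch under that assumption; `histogram` and `build_vconcat` are hypothetical helper names, not part of Mosaic, and the dictionary keys are copied from the spec in this notebook.

```python
import json


def histogram(column, source="flights10m", brush="$brush"):
    """Return one vconcat entry: a binned count histogram plus an interval brush.

    Hypothetical helper; the keys mirror the hand-written spec in this notebook.
    """
    return {
        "plot": [
            {
                "mark": "rectY",
                "data": {"from": source, "filterBy": brush},
                "x": {"bin": column},
                "y": {"count": None},
                "fill": "steelblue",
                "inset": 0.5,
            },
            # The brush on this plot writes its interval into the shared selection.
            {"select": "intervalX", "as": brush},
        ],
        "xDomain": "Fixed",
        "marginLeft": 75,
        "width": 600,
        "height": 200,
    }


def build_vconcat(columns=("delay", "time", "distance")):
    """Build one histogram entry per column, in display order."""
    return [histogram(column) for column in columns]


vconcat = build_vconcat()
# Show which column each generated plot bins.
print(json.dumps([entry["plot"][0]["x"] for entry in vconcat]))
```

The resulting list could be assigned to the `"vconcat"` key of `spec` in place of the three literal entries.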

#### Cross-Filtering Interaction

The plots are interactive and linked together through cross-filtering: selecting a range in one plot updates the other plots to reflect the filtered data. For example, if you select a range of delays in the first histogram, the other two histograms adjust to show the distribution of departure times and distances for only those flights. In the spec, this is configured by the `brush` param with `"select": "crossfilter"`: every histogram is filtered by `$brush`, and every `intervalX` interactor writes its selection to `$brush`. The idea is similar to the Vega [crossfilter](https://vega.github.io/vega/docs/transforms/crossfilter/) transform, but here the filtering is handled by Mosaic and DuckDB; as the spec description notes, automatically generated indexes keep cross-filtered selections efficient.
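
To make the cross-filtering behavior concrete, here is a toy in-memory sketch of the semantics: each view is filtered by the intervals selected in the other views, but not by its own. This is not how Mosaic implements it (Mosaic sends queries to DuckDB); the `crossfilter` function and the sample rows are made up for illustration.

```python
# Toy rows standing in for flights; the values are invented for illustration.
flights = [
    {"delay": -5, "time": 6.5, "distance": 300},
    {"delay": 20, "time": 9.0, "distance": 1200},
    {"delay": 90, "time": 17.5, "distance": 800},
    {"delay": 10, "time": 18.0, "distance": 2400},
]


def crossfilter(rows, selections, view):
    """Return the rows visible in `view`.

    `selections` maps a column name to an inclusive (low, high) interval.
    Every active interval is applied except the one brushed on `view` itself,
    so a histogram keeps showing its full range while its own brush is active.
    """
    def keep(row):
        return all(
            low <= row[column] <= high
            for column, (low, high) in selections.items()
            if column != view
        )

    return [row for row in rows if keep(row)]


# Brush the delay histogram to the interval [0, 30].
selections = {"delay": (0, 30)}

delay_view = crossfilter(flights, selections, "delay")  # ignores its own brush
time_view = crossfilter(flights, selections, "time")    # filtered by delay

print(len(delay_view), len(time_view))
```

Adding a second interval, for example on `time`, would narrow the `distance` view by both selections while each brushed view is still narrowed only by the other one.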

In [10]:
# The widget creates a connection to an in-memory DuckDB database.
# Reference: https://uwdata.github.io/mosaic/jupyter/

MosaicWidget(spec)