<img src="images/panel_logo.png" width="10%" align="right">


# Big data dashboarding with Panel

In this notebook, we put our visualizations together into a dashboard. To do that, we introduce a new tool from the HoloViz suite: Panel, a high-level app and dashboarding solution for Python.

---

## Reconnect to Dask Cluster

In [None]:
import dask_gateway
import dask.dataframe as dd

In [None]:
gateway = dask_gateway.Gateway()

In [None]:
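# Reuse a running cluster if there is one; otherwise start a new one
if len(running_clusters := gateway.list_clusters()) > 0:
    cluster = gateway.connect(running_clusters[0].name)
else:
    cluster = gateway.new_cluster(conda_environment="global/global-data-of-unusual-size", profile="Medium Worker")
    # Scale adaptively between 1 and 10 workers
    cluster.adapt(minimum=1, maximum=10)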

In [None]:
cluster

In [None]:
client = cluster.get_client()
client

## Load the airline on-time performance dataset

In [None]:
columns = [
    'YEAR', 'MONTH', 'DAY_OF_MONTH', 'DAY_OF_WEEK', 'FL_DATE', 'OP_CARRIER', 
    'TAIL_NUM', 'OP_CARRIER_FL_NUM', 'ORIGIN', 'DEST', 'CRS_DEP_TIME', 
    'DEP_TIME', 'DEP_DELAY', 'ARR_TIME', 'ARR_DELAY', 'CANCELLED', 
    'CANCELLATION_CODE', 'DIVERTED', 'AIR_TIME', 'FLIGHTS', 'DISTANCE',
    'CARRIER_DELAY', 'WEATHER_DELAY', 'NAS_DELAY', 'SECURITY_DELAY', 
    'LATE_AIRCRAFT_DELAY', 'DIV_ARR_DELAY'
]

In [None]:
flights = dd.read_parquet(
    "gcs://quansight-datasets/airline-ontime-performance/sorted/full_dataset.parquet",
    columns=columns
)

## Load the latitude-longitude data for the airports

The file `prep/airports.csv` contains the latitude and longitude of each airport.

In [None]:
import panel as pn
import pandas as pd

pn.extension()

In [None]:
import hvplot.dask
import hvplot.pandas

In [None]:
airports = pd.read_csv("prep/airports.csv", 
                        usecols=["AIRPORT", "DISPLAY_AIRPORT_NAME", "LATITUDE", "LONGITUDE"], 
                        dtype={
                            "AIRPORT": "string", 
                            "DISPLAY_AIRPORT_NAME": "string", 
                            "LATITUDE":"float", 
                            "LONGITUDE":"float",
                        }
                      ).set_index('AIRPORT')

# drop duplicates keeping the last entry
airports = airports[~airports.index.duplicated(keep='last')]
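
To see what the de-duplication step does, here is a minimal sketch on a made-up three-row table (the airport codes and names are hypothetical, not taken from `prep/airports.csv`). `Index.duplicated(keep='last')` flags every repeated index label except its last occurrence, and `~` inverts the mask so that only the unflagged rows are kept.

```python
import pandas as pd

# Hypothetical toy table: the code "AAA" appears twice
toy = pd.DataFrame(
    {"DISPLAY_AIRPORT_NAME": ["Old name", "Other", "New name"]},
    index=pd.Index(["AAA", "BBB", "AAA"], name="AIRPORT"),
)

# True for every repeated label except its last occurrence
is_earlier_duplicate = toy.index.duplicated(keep="last")

# Keep only the rows that are not flagged
deduped = toy[~is_earlier_duplicate]
print(deduped)
```

In this toy table the first `AAA` row is dropped and the later `AAA` row survives, so each airport code ends up with exactly one entry.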

Let's quickly plot the map of airports:

In [None]:
airports.hvplot.points('LONGITUDE', 'LATITUDE',  geo=True, 
                       color='red', alpha=0.2, hover_cols=['AIRPORT'],
                       tiles='CartoLight')

## Build a Panel dashboard

Panel has 3 methods for building out interactive dashboards:

- hvPlot `.interactive` turns any of your DataFrame processing pipelines into a dashboard (great if you want to explore a dataset!);
- Panel `.bind` binds your widgets with your interactive plot (great if you want to build an arbitrary app!);
- Param encapsulates your dashboard as self-contained classes (great if you want to build a complex codebase supporting both GUI and non-GUI usage).

For more details, read this excellent blog post: [3 ways to build a panel visualization dashboard](https://towardsdatascience.com/3-ways-to-build-a-panel-visualization-dashboard-6e14148f529d).

We will use the `hvplot.interactive` method here, but first, let's build a small pandas-style data pipeline on our Dask DataFrame:

In [None]:
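pipeline = (
    # Keep flights dated after "2020" and up to "2021"
    flights[
        (flights['FL_DATE'] > "2020") &
        (flights['FL_DATE'] <= "2021")
    ]
    # Mean arrival delay per day of the week, stored in a column named "how"
    .groupby('DAY_OF_WEEK')["ARR_DELAY"].agg(how="mean")
    # Give the result column a descriptive name
    .rename(columns={"how": "ARR_DELAY - mean"})
)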

In [None]:
pipeline

## Move from static pipeline to a dynamic version

In the data pipeline above, let's use variables to represent the quantities we want to select in our dashboard: 

- `daterange` - start and end dates to filter by
- `groupby` - variable we wish to group by
- `field` - data field we wish to plot
- `method` - statistic we wish to calculate (min, max, mean etc)
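
Before bringing in widgets, it can help to see the four variables as arguments of an ordinary function. The following is a minimal sketch in plain pandas on a made-up four-row table; the function name `summarize` and the toy data are assumptions for illustration, not part of the flights dataset.

```python
import pandas as pd

def summarize(df, daterange, groupby, field, method):
    """Filter by date, group, aggregate one field, and name the result column."""
    start, end = daterange
    selected = df[(df["FL_DATE"] > start) & (df["FL_DATE"] <= end)]
    return (
        selected.groupby(groupby)[field]
        .agg(how=method)
        .rename(columns={"how": f"{field} - {method}"})
    )

# Hypothetical toy data with the same column names as the flights table
toy = pd.DataFrame({
    "FL_DATE": pd.to_datetime(["2020-01-06", "2020-01-07", "2020-01-13", "2021-03-01"]),
    "DAY_OF_WEEK": [1, 2, 1, 1],
    "ARR_DELAY": [10.0, -5.0, 30.0, 100.0],
})

result = summarize(
    toy,
    daterange=(pd.Timestamp("2020-01-01"), pd.Timestamp("2021-01-01")),
    groupby="DAY_OF_WEEK",
    field="ARR_DELAY",
    method="mean",
)
print(result)
```

Each argument here is a fixed value; in the dashboard, a widget supplies it instead.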

### Pick some Panel widgets for each of these variables

Further reading: [Panel documentation](https://panel.holoviz.org/reference/index.html#widgets)

**We're picking the `DateRangeSlider` widget for the `daterange` variable:**

In [None]:
import datetime as dt

daterange = pn.widgets.DateRangeSlider(
    name='Date Range Slider',
    start=dt.datetime(2003, 1, 1), end=dt.datetime(2022, 12, 31),
    value=(dt.datetime(2022, 1, 1), dt.datetime(2022, 12, 31)),
    step=24*3600*2*1000,  # intended as a two-day step, written in milliseconds
    bar_color="green",
    width=800
)

In [None]:
daterange

In [None]:
daterange.value
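
The `step` argument of the slider is written as bare arithmetic, which is easy to misread. As a small check, the sketch below rebuilds the same number from a `timedelta`; it only confirms that the expression equals two days in milliseconds. Which unit the slider expects for `step` is an assumption worth checking against the widget reference for your installed Panel version.

```python
import datetime as dt

# Two days written as a timedelta, then converted to milliseconds
two_days = dt.timedelta(days=2)
step_ms = int(two_days.total_seconds() * 1000)

# Compare with the hand-written arithmetic passed to `step`
print(step_ms == 24 * 3600 * 2 * 1000)
```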

**We're picking the `RadioButtonGroup` widget for the `groupby` variable:**

In [None]:
groupby = pn.widgets.RadioButtonGroup(
    name='Period', 
    options=['YEAR', 'MONTH', 'DAY_OF_MONTH', 'OP_CARRIER'],
    value='MONTH',
)
groupby

In [None]:
groupby.value

### 💻 Your turn: Choose an appropriate widget for `field` and `method`

In [None]:
# Your code here. When ready, click on the three dots below for the solutions.

**We're picking the `Select` widget for the `method` variable:**

In [None]:
method = pn.widgets.Select(
    name='Method', 
    options=['min', 'max', 'mean', 'count'],
    value='mean',
)
method

**We're picking the `RadioBoxGroup` widget for the `field` variable:**

In [None]:
field = pn.widgets.RadioBoxGroup(
    name='Field', 
    options=['DEP_DELAY', 'ARR_DELAY'],
)
field

## Put everything together with `hvplot.interactive`

First, we need to make the DataFrame into an interactive DataFrame:

In [None]:
iflights = flights.interactive()

Then, we can make an interactive pipeline using our widgets as variables.

For reference, here's our pipeline code from before:

```python
pipeline = (
    flights[
        (flights['FL_DATE'] > "2020") &
        (flights['FL_DATE'] <= "2021")
    ]
    .groupby('DAY_OF_WEEK')["ARR_DELAY"].agg(how="mean")
)
```

In [None]:
# Combine pipeline and widgets

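ipipeline = (
    iflights[
        # Use the slider's value_start/value_end parameters rather than
        # daterange.value[0]/[1], which are read once and never update
        (iflights['FL_DATE'] > daterange.param.value_start) &
        (iflights['FL_DATE'] <= daterange.param.value_end)
    ]
    .groupby(groupby)[field]
    .agg(how=method)
    .rename(columns={"how": f"{field} - {method}"})
)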

In [None]:
ipipeline

We can now use the interactive pipeline object like a regular DataFrame. For example, calling `.hvplot()` on it gives a plot that updates with the widgets.

In [None]:
data_plot = ipipeline.hvplot()
data_plot

### 💻 Your turn: Create an interactive version of the airport map plot, but color the airports based on data values

In [None]:
# Your code here. When ready, click on the three dots below for the solutions.

In [None]:
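flight_delays = (
    iflights[
        # Use the slider's value_start/value_end parameters rather than
        # daterange.value[0]/[1], which are read once and never update
        (iflights['FL_DATE'] > daterange.param.value_start) &
        (iflights['FL_DATE'] <= daterange.param.value_end)
    ]
    .groupby('ORIGIN')[field]
    .agg(how=method)
    # Attach each origin airport's name and coordinates
    .join(airports)
    .rename(columns={"how": f"{field} - {method}"})
)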

flight_delays

In [None]:
map_plot = flight_delays.hvplot.points('LONGITUDE', 'LATITUDE', geo=True, color=f"{field} - {method}", 
                 hover_cols=['ORIGIN', f"{field} - {method}"],
                 xlim=(-180, -30), ylim=(-20, 75), 
                 cmap='viridis', tiles='CartoLight')

# Uncomment the next line to see the map plot (note: this will take a bit of time)
# map_plot

## Panel's Row/Column Grid system

Panel arranges components with `pn.Row` and `pn.Column`, which can be nested to build a grid. Panel also has a customizable template system that allows you to build apps that have a header, sidebar, main area and popup windows. For details, see: https://panel.holoviz.org/user_guide/Templates.html

Let's re-arrange the components of our dashboard with `pn.Row` and `pn.Column`:

In [None]:
pn.Column(
    daterange,
    pn.Row(field, pn.Column(groupby, method)),
    data_plot.panel(),
    map_plot.panel(),
)

---

## Next →

[Big data application pipeline](./08-big-data-application-pipeline.ipynb)