<a href="https://colab.research.google.com/github/hluling/ph-dash/blob/master/interactive-data-visualization-dashboard.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creating a Dashboard for Interactive Data Visualization with Dash in Python
Use this notebook to test specific lines of code for the TV airtime case study.

Refer to [the lesson](https://programminghistorian.github.io/ph-submissions/en/drafts/originals/interactive-data-visualization-dashboard).

## Coding the Dashboard

In [None]:
# you may need to install and upgrade the following libraries
!pip install dash --upgrade
!pip install jupyter_dash --upgrade
!pip install dash_bootstrap_components --upgrade

### Import Libraries

In [3]:
import datetime
import requests
import pandas as pd
from io import StringIO
from datetime import date
import dash
from jupyter_dash import JupyterDash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output
import dash_bootstrap_components as dbc
import plotly.express as px

### Retrieve Data Using API

In [4]:
# Specify a date range
start_day_str = '20211228'
last_day_str = '20221231'

Code explanation: We first define the date range for the dataset we want to retrieve through the API by creating two string objects: `start_day_str` and `last_day_str`. Note that we restrict the range to 365 days for demonstration purposes only.
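As a minimal sketch, the same two strings could be derived from `date` objects instead of being typed by hand. The variable names below are hypothetical (chosen so they do not overwrite the notebook's own), and the 368-day offset is an assumption picked only to reproduce the hard-coded start date above.

```python
from datetime import date, timedelta

# Hypothetical names, kept separate from start_day_str and last_day_str
example_last_day = date(2022, 12, 31)
# Assumed offset: 368 days before the last day lands on December 28, 2021
example_start_day = example_last_day - timedelta(days=368)

# Format both dates as YYYYMMDD strings, the shape used in the query URLs
example_start_str = example_start_day.strftime('%Y%m%d')
example_last_str = example_last_day.strftime('%Y%m%d')
print(example_start_str, example_last_str)
```

Deriving the strings this way makes it easier to shift the whole window by changing a single date.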

In [8]:
start_day_str # using this date will return data from January 1, 2022 from the API

'20211228'

In [9]:
last_day_str

'20221231'

In [10]:
query_url_ukr = f"https://api.gdeltproject.org/api/v2/tv/tv?query=(ukraine%20OR%20ukrainian%20OR%20zelenskyy%20OR%20zelensky%20OR%20kiev%20OR%20kyiv)%20market:%22National%22&mode=timelinevol&format=html&datanorm=perc&format=csv&timelinesmooth=5&datacomb=sep&timezoom=yes&STARTDATETIME={start_day_str}120000&ENDDATETIME={last_day_str}120000"

In [11]:
query_url_rus = f"https://api.gdeltproject.org/api/v2/tv/tv?query=(kremlin%20OR%20russia%20OR%20putin%20OR%20moscow%20OR%20russian)%20market:%22National%22&mode=timelinevol&format=html&datanorm=perc&format=csv&timelinesmooth=5&datacomb=sep&timezoom=yes&STARTDATETIME={start_day_str}120000&ENDDATETIME={last_day_str}120000"

Code explanation: We create two query strings: one for Ukraine-related terms and one for Russia-related terms. The parameters to specify include the keywords, geographic market, output mode, output format, and date range. See [this documentation](https://blog.gdeltproject.org/gdelt-2-0-television-api-debuts/) for a complete description of the query parameters.
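To make the individual parameters easier to see, here is a hedged sketch that assembles a comparable URL from a dictionary using the standard library. The helper name `build_tv_query_url` is hypothetical and not part of the lesson. The sketch passes only `format=csv`, and it percent-encodes a few more characters (parentheses and the colon) than the hand-written URLs above, which should decode to the same query text.

```python
from urllib.parse import urlencode, quote

def build_tv_query_url(keywords, start_day_str, last_day_str):
    """Hypothetical helper: assemble a TV API query URL from its parts."""
    # Join the keywords with OR and restrict the search to the national market
    query = "(" + " OR ".join(keywords) + ') market:"National"'
    params = {
        "query": query,
        "mode": "timelinevol",
        "format": "csv",
        "datanorm": "perc",
        "timelinesmooth": 5,
        "datacomb": "sep",
        "timezoom": "yes",
        "STARTDATETIME": f"{start_day_str}120000",
        "ENDDATETIME": f"{last_day_str}120000",
    }
    # quote_via=quote encodes spaces as %20 rather than +
    return "https://api.gdeltproject.org/api/v2/tv/tv?" + urlencode(params, quote_via=quote)

example_url = build_tv_query_url(["ukraine", "kyiv"], "20211228", "20221231")
print(example_url)
```

Keeping the parameters in a dictionary makes it simpler to change one setting, such as the smoothing window, without editing a long string.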

In [12]:
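def to_df(queryurl):
    # Send the query to the API and stop early if the request failed
    response = requests.get(queryurl)
    response.raise_for_status()
    # Wrap the decoded CSV text in a file-like object so pandas can read it
    content_text = StringIO(response.content.decode('utf-8'))
    df = pd.read_csv(content_text)
    return df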

Code explanation: Now we use the `requests` library to execute the queries and transform the query results into a `pandas` dataframe. To do this, we create a function called `to_df()` to streamline the workflow.
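The parsing step can be tried without a network connection. The sketch below assumes a small made-up CSV string (the values are illustrative, not real API output) and a hypothetical helper, `text_to_df`, that mirrors only the `StringIO` and `pd.read_csv` part of the workflow.

```python
from io import StringIO
import pandas as pd

# Made-up sample text imitating a date/station/value CSV layout (illustrative values only)
sample_csv = (
    "Date,Series,Value\n"
    "2022-01-01,CNN,1.5\n"
    "2022-01-01,FOXNEWS,0.5\n"
)

def text_to_df(csv_text):
    """Hypothetical helper: parse CSV text held in memory into a dataframe."""
    # StringIO makes the string behave like a file, which read_csv expects
    return pd.read_csv(StringIO(csv_text))

sample_df = text_to_df(sample_csv)
print(sample_df)
```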

In [13]:
df_ukr = to_df(query_url_ukr)

In [14]:
df_rus = to_df(query_url_rus)

In [15]:
# Take a look at the retrieved dataframe
df_ukr.tail()

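|      | Date (Daily +00:00: 12/28/2021 - 12/31/2022) | Series | Value  |
|------|----------------------------------------------|--------|--------|
| 3280 | 2022-12-27                                   | MSNBC  | 3.1592 |
| 3281 | 2022-12-28                                   | MSNBC  | 3.4652 |
| 3282 | 2022-12-29                                   | MSNBC  | 3.4551 |
| 3283 | 2022-12-30                                   | MSNBC  | 3.5233 |
| 3284 | 2022-12-31                                   | MSNBC  | 3.2535 |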


In [16]:
df_ukr[df_ukr.Series == 'CNN'].tail()

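|      | Date (Daily +00:00: 12/28/2021 - 12/31/2022) | Series | Value  |
|------|----------------------------------------------|--------|--------|
| 1090 | 2022-12-27                                   | CNN    | 2.9924 |
| 1091 | 2022-12-28                                   | CNN    | 2.7289 |
| 1092 | 2022-12-29                                   | CNN    | 3.2791 |
| 1093 | 2022-12-30                                   | CNN    | 3.811  |
| 1094 | 2022-12-31                                   | CNN    | 3.4444 |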


We now have two dataframes: one for Ukraine and one for Russia. Each has three columns, from left to right: the date, the station, and the relative frequency of keyword mentions.



### Clean Data for Further Use

In [17]:
# Rename the first column to something shorter for convenience
df_ukr = df_ukr.rename(columns={df_ukr.columns[0]: "date_col"})
df_rus = df_rus.rename(columns={df_rus.columns[0]: "date_col"})

In [18]:
# Transform the first column to the datetime format
df_ukr['date_col'] = pd.to_datetime(df_ukr['date_col'])
df_rus['date_col'] = pd.to_datetime(df_rus['date_col'])

In [19]:
# Select three stations for comparison
# CNN: presumed to represent an ideological middle ground
# FOXNEWS: presumed to represent the conservative side
# MSNBC: presumed to represent the liberal side
stations = ['CNN', 'FOXNEWS', 'MSNBC']
df_rus = df_rus[df_rus['Series'].isin(stations)]
df_ukr = df_ukr[df_ukr['Series'].isin(stations)]

In [20]:
df_ukr

|      | date_col   | Series | Value  |
|------|------------|--------|--------|
| 730  | 2022-01-01 | CNN    | 2.3055 |
| 731  | 2022-01-02 | CNN    | 2.6079 |
| 732  | 2022-01-03 | CNN    | 2.6540 |
| 733  | 2022-01-04 | CNN    | 1.8096 |
| 734  | 2022-01-05 | CNN    | 1.0919 |
| ...  | ...        | ...    | ...    |
| 3280 | 2022-12-27 | MSNBC  | 3.1592 |
| 3281 | 2022-12-28 | MSNBC  | 3.4652 |
| 3282 | 2022-12-29 | MSNBC  | 3.4551 |
| 3283 | 2022-12-30 | MSNBC  | 3.5233 |


### Initiate a Dashboard Instance



In [None]:
app = JupyterDash(__name__, external_stylesheets=[dbc.themes.LITERA]) # use JupyterDash because we are working in a Jupyter environment
server = app.server

Code explanation: These are the formalities of creating a dashboard. To control how our dashboard looks, we use the LITERA theme from [Dash Bootstrap Components](https://dash-bootstrap-components.opensource.faculty.ai/) (`dbc`). You can choose any theme you prefer from [this list](https://dash-bootstrap-components.opensource.faculty.ai/docs/themes/).

### Coding the Frontend

In [22]:
app.layout = dbc.Container(
    [
        dbc.Row([  # row 1: title
            dbc.Col(
                [html.H1('US National Television News Coverage of the War in Ukraine')],
                className="text-center mt-3 mb-1"
            )
        ]),

        dbc.Row([  # row 2: instruction text
            dbc.Label("Select a date range:", className="fw-bold")
        ]),

        dbc.Row([  # row 3: date-range selector
            dcc.DatePickerRange(
                id='date-range',
                min_date_allowed=df_ukr['date_col'].min().date(),
                max_date_allowed=df_ukr['date_col'].max().date(),
                initial_visible_month=df_ukr['date_col'].min().date(),
                start_date=df_ukr['date_col'].min().date(),
                end_date=df_ukr['date_col'].max().date()
            )
        ]),

        dbc.Row([  # row 4: line graph for Ukraine-related keywords
            dbc.Col(dcc.Graph(id='line-graph-ukr'))
        ]),

        dbc.Row([  # row 5: line graph for Russia-related keywords
            dbc.Col(dcc.Graph(id='line-graph-rus'))
        ])
    ]
)

Code explanation: Think of the dashboard layout as a grid with rows and columns. Our dashboard has five rows from top to bottom: the title, the instruction text for the date-range selector, the date-range selector, the first line graph, and the second line graph. To add columns within a row, nest multiple `dbc.Col` components under the same `dbc.Row` component. Below is an example of placing the two line graphs side by side on the same row:

In [23]:
dbc.Row([
          dbc.Col(dcc.Graph(id='line-graph-ukr'),
                  ),
          dbc.Col(dcc.Graph(id='line-graph-rus'),
                  )
  ])

Row([Col(Graph(id='line-graph-ukr')), Col(Graph(id='line-graph-rus'))])

Note also that, in the frontend code above, we explicitly name the components that are involved in user interaction. In our case, there are three such components: the date-range selector as input and the two line graphs as output (i.e., they react to any update a user makes in the date-range selector). We assign these names with the `id` parameter, and we will refer to them again when we code the backend.

### Coding the Backend

In [24]:
# callback decorator
@app.callback(
    Output('line-graph-ukr', 'figure'),
    Output('line-graph-rus', 'figure'),
    Input('date-range', 'start_date'),
    Input('date-range', 'end_date')
)

# callback function
def update_output(start_date, end_date):
    # filter dataframes based on the updated date range
    mask_ukr = (df_ukr['date_col'] >= start_date) & (df_ukr['date_col'] <= end_date)
    mask_rus = (df_rus['date_col'] >= start_date) & (df_rus['date_col'] <= end_date)
    df_ukr_filtered = df_ukr.loc[mask_ukr]
    df_rus_filtered = df_rus.loc[mask_rus]

    # create line graphs based on filtered dataframes
    line_fig_ukr = px.line(df_ukr_filtered, x="date_col", y="Value",
                     color='Series', title="Coverage of Ukrainian Keywords")
    line_fig_rus = px.line(df_rus_filtered, x='date_col', y='Value',
                     color='Series', title="Coverage of Russian Keywords")

    # set x-axis title and y-axis title in line graphs
    line_fig_ukr.update_layout(
                   xaxis_title='Date',
                   yaxis_title='Percentage of Airtime')
    line_fig_rus.update_layout(
                   xaxis_title='Date',
                   yaxis_title='Percentage of Airtime')

    # set tick label format on the x-axis in line graphs
    line_fig_ukr.update_xaxes(tickformat="%b %d<br>%Y")
    line_fig_rus.update_xaxes(tickformat="%b %d<br>%Y")

    return line_fig_ukr, line_fig_rus

Code explanation: In the backend, the core concepts are the *callback decorator* and the *callback function*. In the above code, the callback decorator, `@app.callback`, defines which outputs and inputs take part in a user interaction. For example, when we coded the frontend, we named the line graph for Ukraine 'line-graph-ukr'. Now we refer to this name in one of our `Output` objects. The parameter 'figure' specifies which property of the referenced component is updated when needed.

The callback function, `update_output()`, defines how the interaction occurs: the two line graphs are updated whenever a user changes the start date or the end date in the date-range selector. This is called *reactive programming*, similar to [the server logic used in R Shiny](https://programminghistorian.org/en/lessons/shiny-leaflet-newspaper-map-tutorial#shiny-and-reactive-programming). More detailed explanations are provided as comments in the above code. Note that the two returned objects (`line_fig_ukr` and `line_fig_rus`) must appear in the same order as the outputs in the callback decorator (i.e., Ukraine's line graph goes first).
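The filtering step inside the callback can be explored on its own, outside the dashboard. The sketch below is an illustration under assumptions: `filter_by_date` is a hypothetical helper, the toy dataframe contains made-up values, and the dates are passed as `YYYY-MM-DD` strings, which is the form in which the date-range selector is assumed to deliver them.

```python
import pandas as pd

def filter_by_date(df, start_date, end_date, date_col="date_col"):
    """Hypothetical helper: keep rows whose date falls within [start_date, end_date]."""
    # Convert the incoming strings to timestamps before comparing
    start = pd.to_datetime(start_date)
    end = pd.to_datetime(end_date)
    mask = (df[date_col] >= start) & (df[date_col] <= end)
    return df.loc[mask]

# Toy data with made-up values, for illustration only
toy_df = pd.DataFrame({
    "date_col": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"]),
    "Series": ["CNN"] * 4,
    "Value": [1.0, 2.0, 3.0, 4.0],
})

filtered = filter_by_date(toy_df, "2022-01-02", "2022-01-03")
print(filtered)
```

Both endpoints are inclusive, matching the `>=` and `<=` comparisons used in `update_output()`.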

### Testing the Dashboard

In [None]:
app.run_server(debug=True)

Code explanation: Now we can run the above code to see and test the dashboard. We recommend turning on debug mode so that any errors can be inspected when they occur.

In [None]:
# or
app.run_server(debug=True, mode="inline")
# this may not work in Google Colab