[DIY Disease Tracking Dashboard Kit](https://github.com/fsmeraldi/diy-covid19dash) (C) Fabrizio Smeraldi, 2020,2024 ([f.smeraldi@qmul.ac.uk](mailto:f.smeraldi@qmul.ac.uk) - [web](http://www.eecs.qmul.ac.uk/~fabri/)). This notebook is released under the [GNU GPLv3.0 or later](https://www.gnu.org/licenses/).

# DIY Disease Tracking Dashboard - Extracting public health data - Rolling Mean COVID-19 Dashboard


During the pandemic, Public Health England (PHE) launched a COVID-19 dashboard. This timely service came with an Application Programming Interface (API) allowing users programmatic access to the data for the purpose of creating visualisations or data analysis. Interestingly, it also included a wrapper library written in Python that made access to the data seamless. At the end of 2023, the PHE dashboard was replaced by the UK Health Security Agency dashboard (UKHSA dashboard). This new API, in the Beta stage at the time of writing, includes data on various infectious diseases, including respiratory and gastrointestinal infections, bloodstream infections, and vaccine-preventable diseases. The data are better organised and documented, and many of the quirks of the old API have been fixed. An interesting feature of the new system is that all of its code has been open-sourced.

## Step 1: Fetching and Storing the Data

The data was accessed through a public API provided by the UKHSA. The specific metric I worked with was ```COVID-19_cases_rateRollingMean```, which represents the 7-day rolling average of daily case rates. I queried the API with parameters such as geography (England), year (e.g., 2022), and page size. The data returned was in JSON format, which I saved locally in a file named ```RollingMeancases.json```. This allows the dashboard to load "canned" data on startup, which is useful if the API is unavailable or if the user prefers not to wait for live data.
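The query and the "canned" file described above can be sketched as follows. This is a minimal illustration, not the code used to produce ```RollingMeancases.json```: the root URL, the path layout and the filter names (```year```, ```page_size```) are assumptions about the UKHSA API, and ```build_query_url```, ```save_canned``` and ```load_canned``` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

# Assumed root URL of the UKHSA dashboard API (check the API documentation)
ROOT_URL = "https://api.ukhsa-dashboard.data.gov.uk"

def build_query_url(structure, filters):
    """Join 'level/value' pairs into a path and append the filters as a query string."""
    path = "/".join(f"{level}/{value}" for level, value in structure.items())
    return f"{ROOT_URL}/{path}?{urlencode(filters)}"

# Assumed hierarchy leading to the metric used in this notebook
structure = {
    "themes": "infectious_disease",
    "sub_themes": "respiratory",
    "topics": "COVID-19",
    "geography_types": "Nation",
    "geographies": "England",
    "metrics": "COVID-19_cases_rateRollingMean",
}
filters = {"year": 2022, "page_size": 365}  # assumed filter names
url = build_query_url(structure, filters)

def save_canned(records, filename):
    """Write the downloaded records to disk as JSON."""
    with open(filename, "wt") as outfile:
        json.dump(records, outfile)

def load_canned(filename):
    """Read previously saved records back from disk."""
    with open(filename, "rt") as infile:
        return json.load(infile)
```

Keeping the URL construction separate from the download makes it easy to inspect the query before sending it, and the two file helpers give the dashboard a fallback when the API cannot be reached.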

## Step 2: Data Wrangling (Cleaning and Structuring)

The raw JSON data was a list of dictionaries, each containing:

- a date
- an age group
- a sex field (ignored for our purposes)
- a ```metric_value``` (the case rate)

This structure isn't suitable for direct plotting, so I performed the following data wrangling steps:

1. Loaded the JSON data into Python.
2. Converted the list of dictionaries into a Pandas DataFrame.
3. Parsed the date strings into actual datetime objects.
4. Pivoted the DataFrame so that rows are dates, columns are age groups (e.g., '00-04', '20-24', 'all'), and values are the rolling mean case rate (```metric_value```).
5. Filled missing values with 0.0 to avoid plotting issues.

This final DataFrame (```meandf```) became the foundation for all visualizations.
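The wrangling steps listed above can be sketched with a pivot table. This is an illustration under assumptions rather than the notebook's own ```wrangle_data``` function, which builds the table from a nested dictionary instead: ```pivot_records``` is a hypothetical name, the sample records are made up, and keeping the last value per date and age group is a choice made for this sketch.

```python
import pandas as pd

def pivot_records(records):
    """Turn a list of API-style records into a date x age-group table."""
    frame = pd.DataFrame(records)
    # parse the date strings into datetime objects
    frame["date"] = pd.to_datetime(frame["date"])
    # rows = dates, columns = age groups, values = case rate;
    # if a (date, age) pair occurs more than once, keep the last value
    table = frame.pivot_table(index="date", columns="age",
                              values="metric_value", aggfunc="last")
    # missing combinations become 0.0 so that plotting does not break
    return table.fillna(0.0)

# Made-up records with the same fields as the raw data
sample = [
    {"date": "2022-01-03", "age": "all", "sex": "all", "metric_value": 10.0},
    {"date": "2022-01-03", "age": "00-04", "sex": "all", "metric_value": 2.5},
    {"date": "2022-01-10", "age": "all", "sex": "all", "metric_value": 12.0},
]
table = pivot_records(sample)
```

Note that filling gaps with 0.0 makes a missing observation look like a zero case rate on the graph, so it may be worth checking how many values were filled.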

## Step 3: Static and Monthly Aggregated Visualizations

With the cleaned DataFrame:

- I created a line plot showing case rates for a specific age group ('all') over time.
- I also aggregated the data monthly, normalised the values (so each month's total equals 100%), and displayed the result as a horizontal stacked bar chart to highlight differences in age group contributions over time.

These charts helped explore both trends over time and relative comparisons between age groups.
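The monthly aggregation and normalisation can be sketched as below. The helper name ```monthly_shares``` and the sample values are hypothetical, and dropping the 'all' column before normalising is an assumption made here so that the total is not counted twice.

```python
import pandas as pd

def monthly_shares(df, total_column="all"):
    """Average each age group per calendar month, then rescale rows to sum to 100."""
    # leave out the overall total so it does not distort the shares
    groups = df.drop(columns=[total_column], errors="ignore")
    # group the rows by calendar month and take the mean of each column
    monthly = groups.groupby(groups.index.to_period("M")).mean()
    # divide each row by its own total to obtain percentages
    totals = monthly.sum(axis=1)
    return monthly.div(totals, axis=0) * 100

# Made-up weekly data covering January and February 2022
index = pd.date_range("2022-01-03", periods=8, freq="W-MON")
sample = pd.DataFrame({"00-04": 1.0, "05-09": 3.0, "all": 4.0}, index=index)
shares = monthly_shares(sample)
```

The result can be passed to ```shares.plot(kind='barh', stacked=True)``` to obtain the horizontal stacked bar chart. A month whose total is zero would produce missing values after the division, so such rows may need special handling.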

## Step 4: Adding Interactivity with ipywidgets

To make the dashboard more dynamic, I used the ipywidgets library to add interactive controls:

- a ```Dropdown``` widget to select an age group
- a ```RadioButtons``` widget to toggle between linear and logarithmic scales on the y-axis

These controls were linked to a callback function (```plot_age_group```) using ```wdg.interactive_output()```. Each time the user changes a selection, the graph updates automatically.

## Final Output

The final output is an interactive dashboard that:

- loads and processes COVID-19 case data
- lets users choose which age group they want to explore
- allows switching between linear and log scales for better insight
- optionally supports data refresh from the live API

The whole dashboard is built inside a Jupyter Notebook and can be launched as a standalone web app using Voila, which hides the code and displays only the UI and output.

## Technologies Used

- Python (Pandas, Matplotlib, JSON)
- ipywidgets (for interactive controls)
- Voila (for dashboard deployment)
- UKHSA API (for real COVID-19 data)



In [113]:
from IPython.display import clear_output
import ipywidgets as wdg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import time
import json

In [114]:
%matplotlib inline
# make figures larger
plt.rcParams['figure.dpi'] = 100

## Load initial data from disk

You should include "canned" data in ```.json``` files along with your dashboard. When the dashboard starts, it should load that data and assign it as a dictionary to the ```jsondata``` variable (the code below will be hidden when the dashboard is rendered by Voila).

In [115]:
# Load JSON files and store the raw data in some variable. Edit as appropriate
jsondata={}

## Wrangle the data

The dashboard should contain the logic to wrangle the raw data into a ```DataFrame``` (or more than one, as required) that will be used for plotting. The wrangling code should be put into a function and called on the data from the JSON file (we'll need to call it again on any data downloaded from the API). In this template, we just pretend we are wrangling ```rawdata``` and instead generate a dataframe with some random data.

In [116]:
def wrangle_data(rawdata):
    """ Parameters: rawdata - data from json file or API call. Returns a dataframe.
    Edit to include the code that wrangles the data, creates the dataframe and fills it in. """
    df=pd.DataFrame(index=range(0,100), columns=['One', 'Two'])
    # we have no real data to wrangle, so we just generate two random walks.
    one=two=0.0
    for i in range(0,100):
        df.loc[i,'One']=one
        df.loc[i,'Two']=two
        one+=np.random.randn()
        two+=2*np.random.randn()
    return df

# putting the wrangling code into a function allows you to call it again after refreshing the data through 
# the API. You should call the function directly on the JSON data when the dashboard starts, by including 
# the call in this cell as below:
df=wrangle_data(jsondata) # df is the dataframe for plotting

## Download current data


The code below implements a 'Refresh' button that lets users update the dataset by retrieving the most recent data from an external API. This is useful for live datasets, such as those from public health APIs, where the data changes frequently: instead of re-running code or reloading the notebook, users click the button to fetch the latest available data.

When the button is clicked, it triggers the ```api_button_callback()``` function. This function calls ```access_api()```, which is where the connection to the external data source should happen. Although ```access_api()``` is currently a placeholder, it is intended to return new raw data from the API. Once retrieved, the data is cleaned and structured by ```wrangle_data()```, and the result replaces the global variable ```df``` that the graphs are drawn from.

Graphs in Jupyter or Voila dashboards do not redraw automatically when the underlying data changes. To work around this, the callback also calls ```refresh_graph()```, which simulates user interaction with the widgets and forces the graph to redraw with the updated data. Finally, the button icon switches to a check mark once the process completes, giving the user visual feedback that the data has been refreshed.

In [117]:
# Place your API access code in this function. Do not call this function directly; it will be called by 
# the button callback. 
def access_api():
    """ Accesses the UKHSA API. Return data as a like-for-like replacement for the "canned" data loaded from the JSON file. """
    return {} # return data read from the API

In [118]:
# Printout from this function will be lost in Voila unless captured in an
# output widget - therefore, we give feedback to the user by changing the 
# appearance of the button
def api_button_callback(button):
    """ Button callback - it must take the button as its parameter (unused in this case).
    Accesses API, wrangles data, updates global variable df used for plotting. """
    # Get fresh data from the API. If you have time, include some error handling
    # around this call.
    apidata=access_api()
    # wrangle the data and overwrite the dataframe for plotting
    global df
    df=wrangle_data(apidata)
    # the graph won't refresh until the user interacts with the widget.
    # this function simulates the interaction, see Graph and Analysis below.
    # The function needs to be adapted to your graph; you can omit this call
    # in the first instance
    refresh_graph()
    # after all is done, you can switch the icon on the button to a "check" sign
    # and optionally disable the button - it won't be needed again. If you are 
    # implementing error handling, you can use icons "unlink" or "times" and 
    # change the button text to "Unavailable" when the api call fails.
    apibutton.icon="check"
    # apibutton.disabled=True

    
apibutton=wdg.Button(
    description='REFRESH', # you may want to change this...
    disabled=False,
    button_style='info', # 'success', 'info', 'warning', 'danger' or ''
    tooltip="Keep calm and carry on",
    # FontAwesome names without the `fa-` prefix - try "download"
    icon='exclamation-triangle'
)

# remember to register your button callback function with the button
apibutton.on_click(api_button_callback) # the name of your function inside these brackets

display(apibutton)

# run all cells before clicking on this button

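The comments in the callback suggest adding error handling and switching the icon to "unlink" or "times" when the API call fails. One possible shape for this is sketched below. The function name ```refresh_with_feedback``` is hypothetical, the function only assumes that the button object has ```icon```, ```description``` and ```tooltip``` attributes, and catching every exception is a simplification for the sake of the example.

```python
def refresh_with_feedback(button, fetch, wrangle):
    """Fetch and wrangle fresh data, reporting the outcome on the button.

    Returns the wrangled data, or None if fetching or wrangling failed.
    """
    try:
        fresh = wrangle(fetch())
    except Exception as err:  # broad on purpose: any failure means "no new data"
        button.icon = "unlink"
        button.description = "Unavailable"
        button.tooltip = f"Refresh failed: {err}"
        return None
    # success: show the "check" icon, as the original callback does
    button.icon = "check"
    return fresh
```

Inside ```api_button_callback```, the result could be assigned to the global ```df``` only when it is not ```None```, so that a failed download leaves the canned data in place.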

## Graphs and Analysis

Include at least one graph with interactive controls, as well as some instructions for the user and/or comments on what the graph represents and how it should be explored. The graph here shows the rolling mean COVID-19 case rate over time for the age group selected in the dropdown.

age_selector = wdg.Dropdown(
    options=sorted(df.columns),
    value='all',
    description='Age Group:',
    disabled=False
)

# Plot function
def plot_age_group(selected_age):
    plt.figure(figsize=(12, 6))
    df[selected_age].plot(color='blue', label=selected_age)
    plt.title(f"Rolling Mean COVID-19 Cases for Age Group: {selected_age}")
    plt.xlabel("Date")
    plt.ylabel("Rolling Mean Case Rate")
    plt.grid(True)
    plt.legend()
    plt.tight_layout()
    plt.show()

# Graph output
graph = wdg.interactive_output(plot_age_group, {'selected_age': age_selector})

# Display controls and graph
display(age_selector, graph)

**Author and License** Remember that if you deploy your dashboard as a Binder it will be publicly accessible. Change the copyright notice and take credit for your work! Also acknowledge your sources and the conditions of the license by including this notice: "Based on UK Government [data](https://ukhsa-dashboard.data.gov.uk/) published by the [UK Health Security Agency](https://www.gov.uk/government/organisations/uk-health-security-agency) and on the [DIY Disease Tracking Dashboard Kit](https://github.com/fsmeraldi/diy-covid19dash) by Fabrizio Smeraldi. Released under the [GNU GPLv3.0 or later](https://www.gnu.org/licenses/)."

In [119]:
import json

with open("RollingMeancases.json", "rt") as INFILE:
    jsondata = json.load(INFILE)

In [120]:
from datetime import datetime

def parse_date(s):
    return datetime.strptime(s, "%Y-%m-%d")

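def wrangle_data(rawdata):
    """ Parameters: rawdata - list of records with 'date', 'age' and 'metric_value' keys.
    Returns a dataframe indexed by date (weekly, Mondays) with one column per age group. """
    # collect the values into a nested dictionary: date -> age group -> value
    data = {}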
    for entry in rawdata:
        date = entry['date']
        age = entry['age']
        value = entry['metric_value']
        if date not in data:
            data[date] = {}
        data[date][age] = value

    dates = list(data.keys())
    dates.sort()
    startdate = parse_date(dates[0])
    enddate = parse_date(dates[-1])

    age_groups = []
    for entry in data.values():
        for x in entry.keys():
            if x not in age_groups:
                age_groups.append(x)

    age_groups.sort()

    index = pd.date_range(startdate, enddate, freq='W-MON')
    df = pd.DataFrame(index=index, columns=age_groups)

    for date, entry in data.items():
        pd_date = parse_date(date)
        if pd_date in df.index:
            for column in entry.keys():
                df.loc[pd_date, column] = entry[column]

    # convert to numeric first, so that values coerced to NaN are also filled with 0.0
    df = df.apply(pd.to_numeric, errors='coerce').fillna(0.0)
    return df

In [121]:
df = wrangle_data(jsondata)

In [122]:
def access_api():
    """ Placeholder: returns the "canned" data instead of querying the UKHSA API. """
    return jsondata
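If ```access_api()``` is later changed to query the live API, the results will probably arrive in pages. The sketch below follows the links between pages and collects all records. It assumes that each page is a dictionary with a ```results``` list and a ```next``` URL (or ```None``` on the last page); ```fetch_all_pages``` and its ```get_page``` parameter are hypothetical names introduced for this example.

```python
import time

def fetch_all_pages(first_url, get_page, pause=0.3):
    """Collect the records from every page, following the 'next' links.

    get_page is any function that takes a URL and returns the decoded page.
    """
    records = []
    url = first_url
    while url is not None:
        page = get_page(url)
        records.extend(page["results"])
        url = page.get("next")  # None (or missing) on the last page
        if url is not None:
            time.sleep(pause)  # short wait between requests to avoid overloading the server
    return records
```

With the ```requests``` library imported at the top of the notebook, ```get_page``` could be ```lambda url: requests.get(url, timeout=10).json()```. Passing the download function as a parameter also makes it possible to try the loop on canned pages without network access.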

In [123]:
age_selector = wdg.Dropdown(
    options=sorted(df.columns),
    value='all',
    description='Age Group:',
    disabled=False
)

# Plot function
def plot_age_group(selected_age):
    plt.figure(figsize=(12, 6))
    df[selected_age].plot(color='blue', label=selected_age)
    plt.title(f"Rolling Mean COVID-19 Cases for Age Group: {selected_age}")
    plt.xlabel("Date")
    plt.ylabel("Rolling Mean Case Rate")
    plt.grid(True)
    plt.legend()
    plt.tight_layout()
    plt.show()

# Graph output
graph = wdg.interactive_output(plot_age_group, {'selected_age': age_selector})

# Display controls and graph
display(age_selector, graph)

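The button callback calls ```refresh_graph()```, but this notebook as shown does not define it, so clicking the button would raise a ```NameError```. A minimal sketch of the redraw trick is given below: it briefly switches the selector to a different option and back, which triggers the plotting callback. The helper name ```force_redraw``` is hypothetical, and it only assumes that the widget exposes ```value``` and ```options```.

```python
def force_redraw(selector):
    """Trigger the selector's observers by changing its value and restoring it."""
    current = selector.value
    options = list(selector.options)
    if len(options) < 2:
        return  # there is no other option to switch to
    # pick any option that differs from the current one
    other = options[1] if current == options[0] else options[0]
    selector.value = other    # forces the redraw
    selector.value = current  # restores the user's choice and redraws again
```

In this notebook, ```refresh_graph``` could then be defined as a function that calls ```force_redraw(age_selector)```. It has to be defined before the button is clicked.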

In [124]:
rollingmeandf=pd.read_pickle("meandf.pkl")

In [125]:
rollingmeandf = rollingmeandf.apply(pd.to_numeric, errors='coerce')

In [126]:
import calendar

month_numbers = sorted(rollingmeandf.index.month.unique())
month_names = [calendar.month_name[m] for m in month_numbers]
month_map = dict(zip(month_names, month_numbers))

month = wdg.Select(
    options=month_names,
    value=month_names[-1],
    rows=1,
    description='Month:',
    disabled=False
)

def rollingmean_month_graph(graphmonth):
    # Convert month name to number
    graphmonth_num = month_map[graphmonth]

    # Filter dataframe for that month
    monthdf = rollingmeandf[rollingmeandf.index.month == graphmonth_num]

    # Group by week and compute mean
    weekly = monthdf.groupby(pd.Grouper(freq='1W')).mean()
    totals = weekly.sum(axis=1)
    weekly = weekly.div(totals, axis=0) * 100
    weekly = weekly[::-1]  # reverse order for plotting

    # Plot
    ax = weekly.plot(kind='barh', stacked=True, cmap='tab20')
    ax.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
    ax.set_yticklabels(weekly.index.strftime('%Y-%m-%d'))
    ax.set_title(f'Rolling Mean Cases for {graphmonth}')
    plt.show()

output_month = wdg.interactive_output(rollingmean_month_graph, {'graphmonth': month})

display(month, output_month)
