# Public health dashboards in the UK

During the COVID-19 pandemic, Public Health England (PHE) launched a COVID-19 dashboard. This timely service came with an Application Programming Interface ([API](https://en.wikipedia.org/wiki/API)) that gave users programmatic access to the data for creating visualisations or performing data analysis. It also included a wrapper library written in Python that made access to the data seamless. At the end of 2023, the PHE dashboard was [replaced](https://ukhsa.blog.gov.uk/2023/12/21/ukhsa-data-dashboard-takes-over-from-the-coronavirus-covid-19-dashboard/) by the UK Health Security Agency dashboard ([UKHSA dashboard](https://ukhsa-dashboard.data.gov.uk/)). The new dashboard's API, at the time of writing in the [Beta](https://en.wikipedia.org/wiki/Software_release_life_cycle) stage, includes data on various infectious diseases, including respiratory and gastrointestinal infections, bloodstream infections, and vaccine-preventable diseases. The data are better organised and documented, and many of the quirks of the old API have been fixed. An interesting feature of the new system is that all of its [code](https://ukhsa-dashboard.data.gov.uk/coding-in-the-open) has been open-sourced (though this is the type of read you most likely want to save for a later time!). On the downside, the Python wrapper has been dropped for the time being, so one needs to access the API directly, via *http* requests.

In this series of notebooks, we will guide you through creating your own simple dashboard based on UKHSA data and putting it online as a [Binder](https://mybinder.org/). 

## The UKHSA web-based API 

Many websites support access to their underlying data through a web-based [Application Programming Interface](https://en.wikipedia.org/wiki/API) (an API for short). This is often based on the *http* protocol, and may involve the exchange of information in [JSON format](https://en.wikipedia.org/wiki/JSON). Specifically, using a web-based API typically involves sending *http* requests with parameters conforming to a given schema to a dedicated URL (the API *endpoint*), to which the server responds with JSON content. All this is specified in the UKHSA [API documentation](https://ukhsa-dashboard.data.gov.uk/access-our-data); this is a good point to start reading it.

Briefly, the [data structure](https://ukhsa-dashboard.data.gov.uk/access-our-data/data-structure) can be navigated as a URL path:
```
/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}/geography_types/{geography_type}/geographies/{geography}/metrics/{metric}
```
where the parts in curly braces need to be replaced by the themes, topics, etc. of interest, as specified in the documentation. In particular, the ```{metric}``` section specifies the actual statistics to be downloaded; see the [documentation on metrics](https://ukhsa-dashboard.data.gov.uk/metrics-documentation?search=) for a searchable index.

The URL path needs to be appended to the API *access point*, which is 
```
https://api.ukhsa-dashboard.data.gov.uk/
```
to obtain a complete URL (the *endpoint*) for the data of interest. For example, some browsing through the documentation shows that the number of ```COVID-19``` (a ```respiratory``` ```infectious_disease```) cases by day for ```England``` can be obtained from the following endpoint: 
```
https://api.ukhsa-dashboard.data.gov.uk/themes/infectious_disease/sub_themes/respiratory/topics/COVID-19/geography_types/Nation/geographies/England/metrics/COVID-19_cases_casesByDay
```
Note that opening [the above link](https://api.ukhsa-dashboard.data.gov.uk/themes/infectious_disease/sub_themes/respiratory/topics/COVID-19/geography_types/Nation/geographies/England/metrics/COVID-19_cases_casesByDay) or [the access point](https://api.ukhsa-dashboard.data.gov.uk/) in a browser actually returns documentation. This is useful, as you can navigate the API structure and figure out the various options at each level. However, the response changes when you access this link via a utility such as ```wget``` or programmatically. On the JHub, Unix, Linux or macOS, try the cells below. 

In [None]:
# Will only work on the JHub or Unix-like systems;
# saves data to the COVID-19_cases_casesByDay file
!wget https://api.ukhsa-dashboard.data.gov.uk/themes/infectious_disease/sub_themes/respiratory/topics/COVID-19/geography_types/Nation/geographies/England/metrics/COVID-19_cases_casesByDay

In [None]:
# Will only work on the JHub or Unix-like systems;
# displays the contents of the file
!cat COVID-19_cases_casesByDay

On other systems you can force this behaviour by adding the parameter ```?format=json``` to the URL, as done [here](https://api.ukhsa-dashboard.data.gov.uk/themes/infectious_disease/sub_themes/respiratory/topics/COVID-19/geography_types/Nation/geographies/England/metrics/COVID-19_cases_casesByDay?format=json); however, the browser may pretty-print the response by default, so if you want to see the raw data you should view the page source.

The above response is in [JSON format](https://en.wikipedia.org/wiki/JSON), a standard format for exchanging object information that is generally human-readable and closely resembles the notation for nested lists and dictionaries in Python.
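
To make the resemblance concrete, here is a minimal sketch that decodes a small JSON string with the `json` module from the standard library. The string is hand-written for this illustration: it is not an actual API response, and its values are invented.

```python
import json

# Hand-written sample, NOT real API output: all values are invented
raw = ('{"count": 2, "next": null, "results": ['
       '{"date": "2022-01-01", "metric_value": 1.5}, '
       '{"date": "2022-01-02", "metric_value": 2.0}]}')

decoded = json.loads(raw)  # JSON text -> Python objects

print(type(decoded))             # a JSON object becomes a dict
print(type(decoded["results"]))  # a JSON array becomes a list
print(decoded["next"])           # JSON null becomes None
```

Apart from a few keywords (`null`, `true`, `false`), the JSON text could almost be pasted into Python as a literal.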

Finally, the API offers access to [Swagger documentation](https://api.ukhsa-dashboard.data.gov.uk/api/swagger), which allows you to try out the various parameters interactively. You may want to use it to test the parameters of interest to you.
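
As a recap of this section, the sketch below assembles an endpoint URL from the path template and the access point, and optionally appends query parameters such as `format=json`. The function name `build_endpoint` is hypothetical (it is not part of any UKHSA library), and the code only builds a string: it does not contact the server.

```python
from urllib.parse import quote, urlencode

ACCESS_POINT = "https://api.ukhsa-dashboard.data.gov.uk"

def build_endpoint(theme, sub_theme, topic, geography_type, geography, metric,
                   params=None):
    """Hypothetical helper: assemble the endpoint URL from the structure parameters."""
    # Pair each fixed keyword of the path template with the chosen value
    segments = [("themes", theme), ("sub_themes", sub_theme), ("topics", topic),
                ("geography_types", geography_type), ("geographies", geography),
                ("metrics", metric)]
    path = "".join(f"/{key}/{quote(str(value), safe='')}" for key, value in segments)
    url = ACCESS_POINT + path
    if params:
        # Query parameters go after a '?', encoded as key=value pairs
        url += "?" + urlencode(params)
    return url

url = build_endpoint("infectious_disease", "respiratory", "COVID-19",
                     "Nation", "England", "COVID-19_cases_casesByDay",
                     params={"format": "json"})
print(url)
```

The printed URL should match the ```?format=json``` link given earlier in this section.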

# Using the "requests" library

We will access the API using the [requests](https://requests.readthedocs.io/en/latest/) library, a high-level *http* library that is not part of the standard library but that's bundled with most Python distributions. For example:

In [None]:
import requests

# Same as the above but in Python, with a couple of extra parameters added to the URL
requests.get("https://api.ukhsa-dashboard.data.gov.uk/themes"
             "/infectious_disease/sub_themes/respiratory/topics"
             "/COVID-19/geography_types/Nation/geographies/England"
             "/metrics/COVID-19_cases_casesByDay", 
             params={'year': 2022, 'page_size': 3, 'page': 2}).json()

This is the same as the request issued via ```wget```, except that we added a few parameters to the URL: the ```year``` 2022, a ```page_size``` of 3 (meaning that 3 data points are returned with each query), and ```page``` number 2.

As you can see, the decoded API response is organised as a dictionary. Notable features are:
* ```count```: the total number of data points available for this request (365, since we requested one year);
* ```next```: the URL of the next page, if any exists (here, the URL for page 3);
* ```previous```: the URL of the previous page, if any exists (here the 1st page, with no page number);
* ```results```: a list of dictionaries with the actual data points ([] if there are no more data, or no data at all). Because ```page_size``` is 3 and ```page``` is 2, we get the data for the 4th, 5th and 6th day of the year.

The interesting data are in this case contained in the ```date``` and ```metric_value``` fields.
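
The sketch below shows one way of picking these fields out of a decoded response. To keep it runnable offline it uses a mock dictionary shaped like the one described above; the values and the `example.org` URLs are invented placeholders, and a real response contains additional fields.

```python
# Mock response shaped like the dictionary described above.
# All values (and the URLs) are invented placeholders, not real data.
response = {
    "count": 365,
    "next": "https://example.org/page3",
    "previous": "https://example.org/page1",
    "results": [
        {"date": "2022-01-04", "metric_value": 10.0},
        {"date": "2022-01-05", "metric_value": 12.0},
        {"date": "2022-01-06", "metric_value": 11.0},
    ],
}

# Keep only the two fields of interest from each data point
points = [(item["date"], item["metric_value"]) for item in response["results"]]
has_more = response["next"] is not None  # becomes False on the last page

print(points)
print(f"{len(points)} of {response['count']} data points; more pages: {has_more}")
```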

# A simple API wrapper object

This is all that there is to it, really. However, it is convenient to have some code to build the endpoint from the structure and handle the paging. The ```APIwrapper``` class implements the following functionality:
* ```.__init__()```: builds the URL of the endpoint starting from the structure parameters;
* ```.get_page()```: returns at each call the results for the next page, allowing you to specify filters and page size;
* ```.count```: after the first API call, this attribute is set to the number of data points available.

The class also restricts API access rates to a maximum of 3 requests/second, to prevent you from getting banned. You may want to start playing with this class as is, but feel free to modify it or subclass it as suits you best.

In [None]:
import requests
import time

class APIwrapper:
    # class variables shared among all instances
    _access_point="https://api.ukhsa-dashboard.data.gov.uk"
    _last_access=0.0 # time of last api access
    
    def __init__(self, theme, sub_theme, topic, geography_type, geography, metric):
        """ Init the APIwrapper object, constructing the endpoint from the structure
        parameters """
        # build the path with all the required structure parameters
        url_path=(f"/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}/geography_types/" +
                  f"{geography_type}/geographies/{geography}/metrics/{metric}")
        # our starting API endpoint
        self._start_url=APIwrapper._access_point+url_path
        self._filters=None
        self._page_size=-1
        # will contain the number of items
        self.count=None

    def get_page(self, filters=None, page_size=5):
        """ Access the API and download the next page of data. Sets the count
        attribute to the total number of items available for this query. Changing
        filters or page_size will cause get_page to restart from page 1. Rate
        limited to three requests per second. The page_size parameter sets the number
        of data points in one response page (maximum 365); use the default value 
        for debugging your structure and filters, increase when you start looping 
        over all pages. """
        # avoid a mutable default argument: None stands for "no filters"
        if filters is None:
            filters={}
        # Check page size is within range
        if page_size>365:
            raise ValueError("Max supported page size is 365")
        # restart from first page if page size or filters have changed
        if filters!=self._filters or page_size!=self._page_size:
            self._filters=filters
            self._page_size=page_size
            self._next_url=self._start_url
        # signal the end of data condition
        if self._next_url is None: 
            return [] # we already fetched the last page
        # simple rate limiting to avoid bans
        curr_time=time.time() # Unix time: number of seconds since the Epoch
        deltat=curr_time-APIwrapper._last_access
        if deltat<0.33: # max 3 requests/second
            time.sleep(0.33-deltat)
        # record the access time after any sleep, so that consecutive
        # requests are really spaced by at least 0.33 seconds
        APIwrapper._last_access=time.time()
        # build parameter dictionary by removing all the None
        # values from filters and adding page_size
        parameters={x: y for x, y in filters.items() if y is not None}
        parameters['page_size']=page_size
        # the page parameter is already included in _next_url.
        # This is the API access. Response is a dictionary with various keys.
        # the .json() method decodes the response into Python object (dictionaries,
        # lists; 'null' values are translated as None).
        response = requests.get(self._next_url, params=parameters).json()
        # update url so we'll fetch the next page
        self._next_url=response['next']
        self.count=response['count']
        # data are in the nested 'results' list
        return response['results'] 

To start with, you may want to define a ```structure``` dictionary that initially contains the main parameters of your query as defined [here](https://api.ukhsa-dashboard.data.gov.uk/), for instance:

In [None]:
structure={"theme": "infectious_disease", 
           "sub_theme": "respiratory",
           "topic": "COVID-19",
           "geography_type": "Nation", 
           "geography": "England"}

You may then want to add the specific [metric](https://ukhsa-dashboard.data.gov.uk/metrics-documentation?search=) you are interested in (we'll download more than one):

In [None]:
# COVID-19 cases by day
structure["metric"]="COVID-19_cases_casesByDay" 

At this point, all that you have to do is create the ```APIwrapper``` object and call ```get_page()``` to retrieve the first page of data:

In [None]:
# ** unpacks the structure dictionary over the __init__ arguments
api=APIwrapper(**structure)
data=api.get_page() # default size is 5
print(api.count)
print(data)

If you want, you can define a dictionary of query parameters to filter your results (see for instance [here](https://api.ukhsa-dashboard.data.gov.uk/themes/infectious_disease/sub_themes/respiratory/topics/COVID-19/geography_types/Nation/geographies/England/metrics/COVID-19_cases_casesByDay)):


In [None]:
# Let's filter for the year 2022.
# None values will be ignored by the APIwrapper

filters={"stratum" : None, # Smallest subgroup a metric can be broken down into e.g. ethnicity, testing pillar
         "age": None, # Smallest subgroup a metric can be broken down into e.g. 15_44 for the age group of 15-44 years
         "sex": None, #  Patient gender e.g. 'm' for Male, 'f' for Female or 'all' for all genders
         "year": 2022, #  Epi year of the metrics value (important for annual metrics) e.g. 2020
         "month": None, # Epi month of the metric value (important for monthly metrics) e.g. 12
         "epiweek" :None, # Epi week of the metric value (important for weekly metrics) e.g. 30
         "date" : None, # The date which this metric value was recorded in the format YYYY-MM-DD e.g. 2020-07-20
         "in_reporting_delay_period": None # Boolean indicating whether the data point is considered to be subject to retrospective updates
        }

You can pass this to ```get_page``` as follows:

In [None]:
data_2022=api.get_page(filters, page_size=3)
print(api.count)
print(data_2022)

Note that not all metrics are available for all structures, and not all filters apply to each metric. If these are mismatched, the API will fail silently, returning a count of 0 and an empty list (for instance, try replacing ```"England"``` with ```"Scotland"``` in the structure). The documentation is sparse, so it may take some trial and error before you get the result you want. I suggest you keep the ```page_size``` low while you experiment; the default value of 5 is adequate.
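
Since a mismatched query does not raise an error, it can be handy to turn an empty result into an explicit one. The following is only a sketch: `require_data` is a hypothetical helper, meant to be called on the first page of a query (an empty list on later pages simply means that the data have run out). It is demonstrated here on simulated values rather than on real API calls.

```python
def require_data(count, results):
    """Hypothetical helper: raise instead of silently accepting an empty first page."""
    if not count or not results:
        raise ValueError("The query returned no data: check that the structure, "
                         "metric and filters are compatible")
    return results

# Simulated outcomes (no network access needed)
print(require_data(3, [{"metric_value": 1.0}]))  # the data are passed through
try:
    require_data(0, [])                          # mimics a mismatched query
except ValueError as err:
    print(f"Caught: {err}")
```

With the class above, one might write `data = require_data(api.count, data)` right after the first call to `get_page()`.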

## Downloading all data for cases, hospital admissions and deaths

As an example, let us download all data for daily cases. Once we have checked that our structure and metric work as expected, we can simply increase the page size and write a loop to go through all the pages:

In [None]:
# The original structure, just in case you edited it:
structure={"theme": "infectious_disease", 
           "sub_theme": "respiratory",
           "topic": "COVID-19",
           "geography_type": "Nation", 
           "geography": "England"}

In [None]:
structure["metric"]="COVID-19_cases_casesByDay"
api=APIwrapper(**structure)
cases=[]
page=1
while True:
    data=api.get_page(page_size=365)
    if data==[]:
        break
    # count a page only once we know it contains data
    print(f"Pages retrieved: {page}")
    cases.extend(data)
    page+=1
print(f"Data points expected: {api.count}")
print(f"Data points retrieved: {len(cases)}")

After checking that the metrics work, we can do the same for admissions and deaths:

In [None]:
structure["metric"]="COVID-19_healthcare_admissionByDay"
# the structure has changed, so we need to create a new object
api=APIwrapper(**structure)
admissions=[]
while True:
    data=api.get_page(page_size=365)
    if data==[]:
        break
    admissions.extend(data)


and finally:

In [None]:
structure["metric"]="COVID-19_deaths_ONSByDay"
api=APIwrapper(**structure)
deaths=[]
while True:
    data=api.get_page(page_size=365)
    if data==[]:
        break
    deaths.extend(data)


You can obviously save yourself the trouble of cutting and pasting these loops by defining a small function, or even better by adding a suitable ```.get_all_pages()``` method to the ```APIwrapper``` class.
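
One possible shape for such a function is sketched below. It is an illustration rather than a reference implementation: `get_all_pages` is a hypothetical name, and `FakePager` is an invented offline stand-in for ```APIwrapper``` that lets the example run without contacting the server.

```python
def get_all_pages(api, filters=None, page_size=365):
    """Hypothetical helper: call api.get_page() until it returns an empty list."""
    filters = {} if filters is None else filters
    results = []
    while True:
        # Same filters and page_size at each call, so paging is not restarted
        page = api.get_page(filters, page_size=page_size)
        if not page:  # an empty list signals the end of the data
            break
        results.extend(page)
    return results


class FakePager:
    """Offline stand-in for APIwrapper (an assumption, for demonstration only)."""
    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def get_page(self, filters=None, page_size=5):
        page = self._items[self._pos:self._pos + page_size]
        self._pos += len(page)
        return page


demo = get_all_pages(FakePager(range(10)), page_size=4)
print(demo)
```

With the real class, `cases = get_all_pages(APIwrapper(**structure))` should then replace the explicit loops above.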

## Another example: lineage prevalence

The example above lends itself to visualisation as a plot of daily cases, hospital admissions and fatalities vs time. In this example, instead, we investigate the prevalence of the various COVID-19 variants as a fraction of the total; these data are available weekly and vary more slowly. Eventually, we'll display this as a stacked bar chart.

Again we define our *metric* and test it out as follows: 

In [None]:
structure["metric"]="COVID-19_cases_lineagePercentByWeek" 
api=APIwrapper(**structure)
data=api.get_page(page_size=3)
print(data)

As can be seen, here there is more than one data point per date, and the name of the variant is included under the ```stratum``` field. We will worry about this in the next notebook. In the meantime, let us retrieve all the data for this metric:

In [None]:
structure["metric"]="COVID-19_cases_lineagePercentByWeek" 
api=APIwrapper(**structure)
lineage=[]
while True:
    data=api.get_page(page_size=365)
    if data==[]:
        break
    lineage.extend(data)
print(f"Data points expected: {api.count}")
print(f"Data points retrieved: {len(lineage)}")
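
As a preview of how records with several data points per date might be organised, the sketch below groups them into one dictionary per date, keyed by ```stratum```. It runs on mock records: the variant names, dates and percentages are invented placeholders, and only the three field names are taken from the responses discussed above.

```python
from collections import defaultdict

# Mock records; the variant names and percentages are invented placeholders
records = [
    {"date": "2024-01-01", "stratum": "variant_A", "metric_value": 60.0},
    {"date": "2024-01-01", "stratum": "variant_B", "metric_value": 40.0},
    {"date": "2024-01-08", "stratum": "variant_A", "metric_value": 55.0},
    {"date": "2024-01-08", "stratum": "variant_B", "metric_value": 45.0},
]

# date -> {variant name -> percentage}
by_date = defaultdict(dict)
for rec in records:
    by_date[rec["date"]][rec["stratum"]] = rec["metric_value"]

for date in sorted(by_date):
    print(date, by_date[date])
```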

# Saving the data in JSON format

At this point, we want to save the result of our API queries in order to 
* have something definite to work on in the other notebooks
* eventually, give our dashboard some starting data.

The question is how to save these dictionaries to disk. Luckily, we do not have to save them in a bespoke way at this stage: we can use the [json module](https://docs.python.org/3/library/json.html) in the standard library to dump them as they are in [JSON format](https://en.wikipedia.org/wiki/JSON). This is straightforward:

In [None]:
import json

In [None]:
with open("cases.json", "wt") as OUTF:
    json.dump(cases, OUTF)

In [None]:
with open("admissions.json", "wt") as OUTF:
    json.dump(admissions, OUTF)

In [None]:
with open("deaths.json", "wt") as OUTF:
    json.dump(deaths, OUTF)

In [None]:
with open("lineage.json", "wt") as OUTF:
    json.dump(lineage, OUTF)

If you now open these files in a text editor (or in the Jupyter Notebook interface), you will see that their content closely resembles the tangle of dictionaries and lists we have seen above. Technically, however, these are no longer Python dictionaries and lists but rather their JSON representation, which could be opened by a program written in another language and mapped to an equivalent data structure (whichever that language provides).
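
Reading the data back into Python is just as simple, using `json.load`. The sketch below performs a round trip on a small placeholder list written to a temporary directory, so that it does not touch the files saved above; the file name `sample.json` and the data values are invented for the example.

```python
import json
import os
import tempfile

# Placeholder data standing in for the lists downloaded above
sample = [{"date": "2022-01-01", "metric_value": 1.0}]

path = os.path.join(tempfile.mkdtemp(), "sample.json")  # hypothetical file name
with open(path, "wt") as outf:
    json.dump(sample, outf)

with open(path, "rt") as inf:
    restored = json.load(inf)  # back to Python lists and dictionaries

print(restored == sample)
```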

## Your turn

Explore the various [structures](https://api.ukhsa-dashboard.data.gov.uk/) and [metrics](https://ukhsa-dashboard.data.gov.uk/metrics-documentation?search=) available for the various diseases, and think of a query that may be of interest to you, and how you might then want to visualise the data. You can modify either the *structure*, in order to select different diseases and locations, or the *metrics*, to specify different statistics. Possible graphs of interest might include
* comparison of statistics at different sites or in different years;
* comparison between different respiratory diseases;
* comparisons between different age groups, where available;
* and so on... the choice is yours!
  
Please keep in mind the following points:
* Not all metrics are available for all dates, or at all levels of granularity; querying for data that's unavailable will not return any results.
* Documentation is somewhat lacking - welcome to the real world. A BSc in Reverse Engineering would come in handy.
* Experimenting is fine; the [Swagger documentation](https://api.ukhsa-dashboard.data.gov.uk/api/swagger) may be of help.
* Avoid flooding the server with multiple queries at machine speed - the last thing you want is for UKHSA to ban you. The rate limiting in ```APIwrapper``` is there for a reason - do not remove it.

Once you succeed in retrieving the data you want, save them in JSON format and move on to the next stage - visualisation.


**(C) 2020,2024 Fabrizio Smeraldi** ([f.smeraldi@qmul.ac.uk](mailto:f.smeraldi@qmul.ac.uk) - [web](http://www.eecs.qmul.ac.uk/~fabri/)). This notebook is released under the [GNU GPLv3.0 or later](https://www.gnu.org/licenses/).