## COVID-19 Data Visualiser by Pietro Capece Galeota

# Importing all necessary modules

For this coursework I shall be using the `uk_covid19` module to obtain the data to plot, the `json` module to handle the JSON that the API returns, the pandas library to organise that data into tables, the Matplotlib library to plot the data as graphs, and the ipywidgets module to display the graphs in interactive widgets.

In [67]:
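from uk_covid19 import Cov19API
import json
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as wdg
from IPython.display import display  # available automatically in Jupyter; imported explicitly for clarity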

# Creating API query

To send the API query we need to pass two variables to `Cov19API`: `filters`, which selects the area, and `structure`, which specifies which metrics to retrieve and what to call them in the response.

First we collect data for new cases, new hospital admissions, and the cumulative death rate:

In [68]:
filters1 = [
    'areaType=overview'
]

In [69]:
structure1 = {
    "Date": "date",
    "Cases": "newCasesByPublishDate",
    "Hospitalised": "newAdmissions",
    "Deaths": "cumDeaths28DaysByDeathDateRate"    
}

In [70]:
api = Cov19API(filters=filters1, structure=structure1)

In [71]:
timeseries=api.get_json()

Then we collect data for male and female cases:

In [72]:
filters2 = [
    'areaType=nation',
    'areaName=England'
]
structure2 = {
    "Males": "maleCases",
    "Females": "femaleCases"
}

In [73]:
api = Cov19API(filters=filters2, structure=structure2)

In [74]:
agedistribution=api.get_json()

We also define a button that lets the user refresh this data:

In [75]:
def access_api(button):
    """Re-download both datasets when the button is clicked."""
    # Without `global`, the assignments below would only create local variables
    global timeseries, agedistribution
    print("Downloading API data...")
    # get_json must be called (with parentheses); otherwise we store the method itself
    timeseries = Cov19API(filters=filters1, structure=structure1).get_json()
    agedistribution = Cov19API(filters=filters2, structure=structure2).get_json()
    if timeseries is not None and agedistribution is not None:
        print("API data was refreshed")
    else:
        print("Initial API call failed, trying again. If this keeps happening, check your internet connection!")
        timeseries = Cov19API(filters=filters1, structure=structure1).get_json()
        agedistribution = Cov19API(filters=filters2, structure=structure2).get_json()
 
apibutton=wdg.Button(
    description='Refresh data',
    disabled=False,
    button_style='info',
    tooltip='Click to download current Public Health England data',
    icon='download'
)
apibutton.on_click(access_api)
display(apibutton)
print()
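The refresh callback retries the download once if it fails. A slightly more general pattern is a small helper that retries any download function a fixed number of times. The sketch below is hypothetical: `fetch_with_retry`, its `attempts` parameter and the fake `flaky_fetch` are illustrations, not part of `uk_covid19`. It also treats a raised exception as a failure, on the assumption that a network error may raise rather than return `None`.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(fetch: Callable[[], Optional[T]], attempts: int = 2) -> Optional[T]:
    """Hypothetical helper: call `fetch` up to `attempts` times.

    Returns the first non-None result; exceptions and None both count as failures.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
        except Exception as err:  # e.g. a connection error raised by the client
            print(f"Attempt {attempt} failed: {err}")
            continue
        if result is not None:
            return result
        print(f"Attempt {attempt} returned no data")
    return None


# Usage with a fake fetcher that fails once, then succeeds.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated outage")
    return {"data": []}


print(fetch_with_retry(flaky_fetch))  # → {'data': []}
```

In the notebook, the fetcher could be `Cov19API(filters=filters1, structure=structure1).get_json`, passed without parentheses so that the helper is the one that calls it.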

# Saving API query as JSON

We save the output of the two queries above into JSON files:

In [76]:
with open("timeseries.json", "wt") as OUTF:
    json.dump(timeseries, OUTF)

In [77]:
with open("agedistribution.json", "wt") as OUTF:
    json.dump(agedistribution, OUTF)

Here we enable inline Matplotlib output and set the figure resolution to 100 dpi:

In [78]:
%matplotlib inline
plt.rcParams['figure.dpi'] = 100

# Loading JSON

Next we load the timeseries JSON file and extract the list of dictionaries stored under its `data` key.

In [79]:
with open("timeseries.json", "rt") as INFILE:
    data=json.load(INFILE)

In [80]:
datalist=data['data']
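To make the shape concrete, here is a round trip through a JSON file using a small hypothetical payload shaped like the saved response: a dictionary whose `data` key holds one dictionary per day, keyed by the names chosen in `structure1`. The values are made up, and the real response may contain other top-level keys, which this notebook ignores.

```python
import json
import os
import tempfile

# Hypothetical payload shaped like the API response (values are made up).
payload = {
    "data": [
        {"Date": "2020-03-02", "Cases": 40, "Hospitalised": None, "Deaths": 0.1},
        {"Date": "2020-03-01", "Cases": 35, "Hospitalised": None, "Deaths": 0.0},
    ]
}

path = os.path.join(tempfile.mkdtemp(), "timeseries.json")
with open(path, "wt") as outfile:
    json.dump(payload, outfile)  # Python None is written as JSON null

with open(path, "rt") as infile:
    loaded = json.load(infile)  # JSON null is read back as None

datalist = loaded["data"]
print(len(datalist), datalist[0]["Date"])  # → 2 2020-03-02
```

The `None` values that survive the round trip are why the populating loop further down has to handle missing values explicitly.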

Now we extract the date from each dictionary in the list, then sort the dates from oldest to newest.

In [81]:
dates=[dictionary['Date'] for dictionary in datalist ]
dates.sort()
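Because the dates are zero-padded ISO 8601 strings (`YYYY-MM-DD`), plain string sorting already puts them in chronological order, oldest first, so no parsing is needed before `dates.sort()`. A quick check with made-up dates:

```python
dates = ["2020-11-02", "2020-03-15", "2020-10-30"]
dates.sort()  # lexicographic order equals chronological order for zero-padded ISO dates
print(dates[0], dates[-1])  # → 2020-03-15 2020-11-02
```

Newest-first order would need `dates.sort(reverse=True)` instead.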

# Parsing data to Pandas

Once the dates have been obtained, they must be converted from strings into datetime values that pandas can use as a time axis.

In [82]:
def parse_date(datestring):
    """ Convert a date string into a pandas datetime object """
    return pd.to_datetime(datestring, format="%Y-%m-%d")

The function above converts a date string into a pandas `Timestamp`, which lets us plot it on a time axis. Below we apply it to the first and last dates so we know the range our table must cover.

In [83]:
startdate=parse_date(dates[0])
enddate=parse_date(dates[-1])

# Creating Dataframe

Here we instantiate a dataframe with the data we have so far. The parsed start and end dates define a daily date index, and for convenience the columns are named after the fields we requested from the API.

In [84]:
index=pd.date_range(startdate, enddate, freq='D')
timeseriesdf=pd.DataFrame(index=index, columns=['Cases', 'Hospitalised', 'Deaths'])
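A small illustration with made-up start and end dates: `pd.date_range` with `freq='D'` includes both endpoints, so five calendar days give five rows, and every cell of the new dataframe starts out missing (NaN) until it is populated.

```python
import pandas as pd


def parse_date(datestring):
    """Convert a date string into a pandas Timestamp."""
    return pd.to_datetime(datestring, format="%Y-%m-%d")


startdate = parse_date("2020-03-01")  # made-up range for illustration
enddate = parse_date("2020-03-05")

index = pd.date_range(startdate, enddate, freq="D")  # both ends are included
df = pd.DataFrame(index=index, columns=["Cases", "Hospitalised", "Deaths"])

print(len(index))  # → 5
print(df.isna().all().all())  # → True
```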

# Populating Dataframe

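Below we iterate through the list of dictionaries obtained from the JSON. For each entry we parse its date to find the matching row, then loop over the three data columns and copy each value into the dataframe, replacing missing (`None`) values with 0.0. The `pd.isna` check means only the first value seen for each date and column is kept. Finally, days with no entry at all are filled with 0.0.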

In [85]:
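for entry in datalist:
    date = parse_date(entry['Date'])
    for column in ['Cases', 'Hospitalised', 'Deaths']:
        # Only fill a cell once, keeping the first value seen for each date
        if pd.isna(timeseriesdf.loc[date, column]):
            value = float(entry[column]) if entry[column] is not None else 0.0
            timeseriesdf.loc[date, column] = value

# Days missing from the response become 0.0; casting to float gives numeric columns for plotting
timeseriesdf = timeseriesdf.fillna(0.0).astype(float)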
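As an alternative, the same table can be built without an explicit loop: construct a dataframe straight from the list of dictionaries, index it by date, and `reindex` onto the full daily range so missing days become NaN before filling with 0.0. This is a sketch with a made-up `datalist`; the `duplicated` filter mimics the loop's rule of keeping the first value per date.

```python
import pandas as pd

# Made-up records shaped like the API rows; 2020-03-02 is missing on purpose.
datalist = [
    {"Date": "2020-03-03", "Cases": 50, "Hospitalised": 7, "Deaths": 0.2},
    {"Date": "2020-03-01", "Cases": 35, "Hospitalised": None, "Deaths": 0.0},
]

df = pd.DataFrame(datalist)
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
df = df.set_index("Date")
df = df[~df.index.duplicated(keep="first")]  # keep the first value per date, like the loop

full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
timeseriesdf = (
    df.reindex(full_range)[["Cases", "Hospitalised", "Deaths"]]
    .astype(float)
    .fillna(0.0)  # None values and missing days both become 0.0
)
print(timeseriesdf)
```

The loop is easier to follow step by step; the vectorised version scales better if the time series grows long.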

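We can then plot the populated dataframe with the pandas `plot` method, which draws with Matplotlib. Below is the graph of all three columns (the call is commented out here; the interactive widget at the end of the notebook draws the same graph):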

In [86]:
#timeseriesdf.plot()

We can also plot this graph on a logarithmic scale.

In [87]:
#timeseriesdf.plot(logy=True)
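One thing to keep in mind with `logy=True`: the dataframe contains 0.0 wherever data was missing, and zero has no logarithm, so those points cannot be drawn at their true value on a log axis. A minimal check with made-up values, using Matplotlib's off-screen `Agg` backend, shows that `logy=True` sets the y-axis scale of the returned axes to `'log'`:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for this standalone check; omit inside the notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"Cases": [10.0, 100.0, 1000.0], "Deaths": [0.0, 1.0, 2.0]},  # made-up values
    index=pd.date_range("2020-03-01", periods=3, freq="D"),
)

ax = df.plot(logy=True)  # DataFrame.plot returns a Matplotlib Axes
print(ax.get_yscale())  # → log
plt.close(ax.figure)
```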

# Getting the Male and Female graphs

Next we create a graph of cases for males and females. First we open the file we retrieved from the API.

In [88]:
with open("agedistribution.json", "rt") as INFILE:
    data=json.load(INFILE)

Then we take the first record in the file, a dictionary holding the male and female case data.

In [89]:
datadic=data['data'][0]

In [90]:
males=datadic['Males']
females=datadic['Females']

Here we collect the age ranges present in the data. Because the age ranges are the same for males and females, we only need to do this for males.

In [91]:
ageranges=[x['age'] for x in males]

Next we define a function that converts an age-range string, such as `'20_to_24'` or `'90+'`, into a number: it strips the `+` sign and returns the lower bound of the range as an integer. We then use this function as the sort key, so the age ranges are ordered from youngest to oldest.

In [92]:
def min_age(agerange):
    agerange=agerange.replace('+','')
    start=agerange.split('_')[0]
    return int(start)

In [93]:
ageranges.sort(key=min_age)
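Sorting is why the helper is needed: as plain strings, `'10_to_14'` comes before `'5_to_9'` because `'1' < '5'`. With `min_age` as the key, the bands sort by their numeric lower bound. The labels below are illustrative, in the `X_to_Y` / `X+` form that `min_age` expects:

```python
def min_age(agerange):
    """Return the lower bound of an age band such as '20_to_24' or '90+'."""
    agerange = agerange.replace('+', '')
    start = agerange.split('_')[0]
    return int(start)


bands = ['90+', '10_to_14', '5_to_9', '0_to_4']
print(sorted(bands))               # → ['0_to_4', '10_to_14', '5_to_9', '90+']
print(sorted(bands, key=min_age))  # → ['0_to_4', '5_to_9', '10_to_14', '90+']
```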

Here we create a new dataframe with columns for males, females and their total, using the sorted age ranges as the index.

In [94]:
age_df=pd.DataFrame(index=ageranges, columns=['Males','Females', 'Total'])

Then we populate the dataframe with data from our `males` and `females` lists, and add the two columns together to get the total.

In [95]:
for entry in males:
    ageband=entry['age']
    age_df.loc[ageband, 'Males']=entry['value']
    
for entry in females:
    ageband=entry['age']
    age_df.loc[ageband, 'Females']=entry['value']
    
age_df['Total']=age_df['Males']+age_df['Females']
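The same table can also be assembled by turning each list of `{'age': ..., 'value': ...}` dictionaries into a Series indexed by age band; pandas then aligns males and females on the age labels, so the two lists need not be in the same order. A sketch with made-up values:

```python
import pandas as pd

# Made-up records in the shape of the notebook's `males` / `females` lists.
males = [{'age': '5_to_9', 'value': 120}, {'age': '0_to_4', 'value': 100}]
females = [{'age': '0_to_4', 'value': 90}, {'age': '5_to_9', 'value': 130}]


def to_series(records):
    """Map each age band to its case count."""
    return pd.Series({r['age']: r['value'] for r in records})


age_df = pd.DataFrame({'Males': to_series(males), 'Females': to_series(females)})
age_df['Total'] = age_df['Males'] + age_df['Females']  # aligned by age band
print(age_df)
```

To get the youngest-to-oldest row order, the result could then be reindexed with the sorted `ageranges` list.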

We then plot this data as bar charts: one comparing males, females and the total, and one showing the total cases only.

In [96]:
#age_df.plot(kind='bar', y=['Males','Females', 'Total'])

In [97]:
#age_df.plot(kind='bar', y='Total')

Next we save both dataframes to pickle files, then read the age dataframe back in.

In [98]:
timeseriesdf.to_pickle("timeseriesdf.pkl")
age_df.to_pickle("agedf.pkl")

In [99]:
age_df=pd.read_pickle("agedf.pkl")
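`to_pickle` and `read_pickle` store and restore the whole dataframe, index and dtypes included, so the reloaded table does not need its dates re-parsed. A quick round-trip check on a temporary file with made-up data (pickle files should only be loaded from trusted sources, since unpickling can execute code):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"Cases": [1.0, 2.0]},
    index=pd.date_range("2020-03-01", periods=2, freq="D"),
)
path = os.path.join(tempfile.mkdtemp(), "timeseriesdf.pkl")
df.to_pickle(path)
restored = pd.read_pickle(path)

print(restored.equals(df))  # → True
print(type(restored.index).__name__)  # → DatetimeIndex
```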

Below we create a widget that lets us choose which of the age columns (males, females, total) to show in the bar chart.

In [100]:
agecols=wdg.SelectMultiple(
    options=['Males', 'Females', 'Total'],
    value=['Males', 'Females'],
    rows=3,
    description='Sex',
    disabled=False
)

def age_graph(graphcolumns):
    ncols=len(graphcolumns)
    if ncols>0:
        age_df.plot(kind='bar', y=list(graphcolumns))
    else:
        print("Click to select data for graph")
        print("(CTRL-Click to select more than one category)")
       
output=wdg.interactive_output(age_graph, {'graphcolumns': agecols})

display(agecols, output)


In [101]:
timeseriesdf=pd.read_pickle("timeseriesdf.pkl")

Here we define widgets to choose which of the time series plotted earlier to display, and whether to use a linear or logarithmic scale.

In [102]:
series=wdg.SelectMultiple(
    options=['Cases', 'Hospitalised', 'Deaths'],
    value=['Cases', 'Hospitalised', 'Deaths'],
    rows=3,
    description='Stats:',
    disabled=False
)

scale=wdg.RadioButtons(
    options=['linear', 'log'],
    description='Scale:',
    disabled=False
)

controls=wdg.HBox([series, scale])

def timeseries_graph(gcols, gscale):
    if gscale=='linear':
        logscale=False
    else:
        logscale=True
    ncols=len(gcols)
    if ncols>0:
        timeseriesdf[list(gcols)].plot(logy=logscale)
    else:
        print("Click to select data for graph")
        print("(CTRL-Click to select more than one category)")

graph=wdg.interactive_output(timeseries_graph, {'gcols': series, 'gscale': scale})

#display(controls, graph)
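The `if`/`else` that maps the radio-button value to `logy` can be collapsed into one boolean expression, and the callback can be exercised without any widgets by calling it directly, which is a handy way to test it. A sketch with made-up data and the off-screen `Agg` backend; the function body mirrors `timeseries_graph` above:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for this standalone check; omit inside the notebook
import matplotlib.pyplot as plt
import pandas as pd

timeseriesdf = pd.DataFrame(
    {"Cases": [10.0, 20.0], "Hospitalised": [1.0, 2.0], "Deaths": [0.1, 0.2]},  # made-up values
    index=pd.date_range("2020-03-01", periods=2, freq="D"),
)


def timeseries_graph(gcols, gscale):
    """Plot the selected columns; 'log' selects a logarithmic y-axis."""
    if len(gcols) > 0:
        timeseriesdf[list(gcols)].plot(logy=(gscale == 'log'))
    else:
        print("Click to select data for graph")
        print("(CTRL-Click to select more than one category)")


timeseries_graph(('Cases', 'Deaths'), 'log')
ax = plt.gca()  # pandas drew into a new figure, which is now the current one
print(ax.get_yscale())  # → log
plt.close(ax.figure)
```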

Lastly we arrange the widgets: the graph on the left, with the series selector and scale buttons stacked vertically on the right.

In [103]:
ctrls=wdg.VBox([series, scale])
form=wdg.HBox([graph, ctrls])
display(form)


That's all folks! Thank you for taking the time to look at my first project in Python!