## Get daily COVID19 report data as tidy data using the covid19.mathdro.id API

The covid19.mathdro.id API (documentation at https://github.com/mathdroid/covid-19-api ) returns the detailed reports for a single day using this method:

- `/api/daily/[date]`: detail of updates in a [date] (e.g. /api/daily/2-14-2020)

We'd like to collect all of the reports, one day at a time, and compile them into a "tidy data" format (one report per row) that we can then work with.



### Set up Python libraries and API base

In [0]:
import requests
import pandas as pd
from datetime import datetime

API_BASE_URL = 'https://covid19.mathdro.id/api'

### Get daily case summaries

We first need to know which dates we can request reports for. We'll use the result of the `/daily` API method only to get the range of available dates:

In [0]:
daily_cases = requests.get(API_BASE_URL + '/daily').json()
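
A network call can time out or come back with an error status, in which case `.json()` may fail or return something unexpected. The sketch below is one possible wrapper, not part of the original notebook: `get_json` is a hypothetical helper name, the `session` parameter exists only so a stand-in object can be passed for testing, and the 30 second timeout is an arbitrary assumption.

```python
import requests

API_BASE_URL = 'https://covid19.mathdro.id/api'

def get_json(path, session=None, timeout=30):
    """Fetch API_BASE_URL + path and return the decoded JSON body.

    `session` is a hypothetical hook: any object with a requests-style
    get(url, timeout=...) method can be passed in; by default the
    requests module itself is used.
    """
    http = session if session is not None else requests
    response = http.get(API_BASE_URL + path, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error body.
    response.raise_for_status()
    return response.json()
```

With this in place, the call above could be written as `daily_cases = get_json('/daily')`.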

Go through the cases and pull out the `reportDateString` value, adding these to a Python list as we go along.

In [0]:
day_data = []
for day_cases in daily_cases:
  # day_cases['reportDateString'] = pd.to_datetime(day_cases['reportDateString'])
  day_data.append(day_cases['reportDateString'])

This is what part of our list looks like (the slice `[1:10]` skips the first entry and shows the next nine):

In [4]:
day_data[1:10]

['2020/01/21',
 '2020/01/22',
 '2020/01/23',
 '2020/01/24',
 '2020/01/25',
 '2020/01/26',
 '2020/01/27',
 '2020/01/28',
 '2020/01/29']
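
The same extraction can be written as a list comprehension. This is a small sketch on hand-made sample data shaped like the `/daily` response; only the `reportDateString` key is taken from the notebook, and the three sample entries are an assumption standing in for the real response.

```python
# Sample shaped like the '/daily' response; only 'reportDateString' is used here.
daily_cases = [
    {'reportDateString': '2020/01/20'},
    {'reportDateString': '2020/01/21'},
    {'reportDateString': '2020/01/22'},
]

# One pass over the summaries, keeping just the date string from each.
day_data = [day_cases['reportDateString'] for day_cases in daily_cases]

# Slicing from index 1 skips the first element of the list.
print(day_data[1:3])  # ['2020/01/21', '2020/01/22']
```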

### Request daily detail for each day

Using the `daily/[date]` API method, and our list of dates, we get a list of daily detail reports (one for each location with a report for that day).  This entails 1 API call per day available.  We'll store these report lists in a list which we'll call `all_days_detail`.

Note that we have to convert date formats to make this work.

In [5]:
all_days_detail = []
for day in day_data:
  # Convert 2020/12/01 to 12-1-2020 without the zeroes
  day_as_datetime = datetime.strptime(day, '%Y/%m/%d')
  date_day = day_as_datetime.day
  date_month = day_as_datetime.month
  date_year = day_as_datetime.year
  day_with_dashes = '%d-%d-%d' % (date_month, date_day, date_year)
  print(API_BASE_URL + '/daily/' + day_with_dashes)
  day_detail = requests.get(API_BASE_URL + '/daily/' + day_with_dashes).json()
  all_days_detail.append(day_detail)

https://covid19.mathdro.id/api/daily/1-20-2020
https://covid19.mathdro.id/api/daily/1-21-2020
https://covid19.mathdro.id/api/daily/1-22-2020
https://covid19.mathdro.id/api/daily/1-23-2020
https://covid19.mathdro.id/api/daily/1-24-2020
https://covid19.mathdro.id/api/daily/1-25-2020
https://covid19.mathdro.id/api/daily/1-26-2020
https://covid19.mathdro.id/api/daily/1-27-2020
https://covid19.mathdro.id/api/daily/1-28-2020
https://covid19.mathdro.id/api/daily/1-29-2020
https://covid19.mathdro.id/api/daily/1-30-2020
https://covid19.mathdro.id/api/daily/1-31-2020
https://covid19.mathdro.id/api/daily/2-1-2020
https://covid19.mathdro.id/api/daily/2-2-2020
https://covid19.mathdro.id/api/daily/2-3-2020
https://covid19.mathdro.id/api/daily/2-4-2020
https://covid19.mathdro.id/api/daily/2-5-2020
https://covid19.mathdro.id/api/daily/2-6-2020
https://covid19.mathdro.id/api/daily/2-7-2020
https://covid19.mathdro.id/api/daily/2-8-2020
https://covid19.mathdro.id/api/daily/2-9-2020
https://covid19.mathdr
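
The date conversion inside the loop can be isolated in a small function, which makes it easy to check on its own. This is a minimal sketch; `to_api_date` is a hypothetical name, not something the API or the notebook defines.

```python
from datetime import datetime

def to_api_date(report_date):
    """Convert 'YYYY/MM/DD' (as in reportDateString) to 'M-D-YYYY' with no zero padding."""
    parsed = datetime.strptime(report_date, '%Y/%m/%d')
    # month and day are plain ints, so formatting them drops any leading zero.
    return f'{parsed.month}-{parsed.day}-{parsed.year}'

print(to_api_date('2020/01/21'))  # 1-21-2020
print(to_api_date('2020/12/01'))  # 12-1-2020
```

The request URL for a day is then `API_BASE_URL + '/daily/' + to_api_date(day)`.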

Let's see what one day's report list looks like. It's a list of dictionaries decoded from the JSON response, one per location.

In [6]:
all_days_detail[4]

[{'confirmed': '549',
  'countryRegion': 'Mainland China',
  'deaths': '24',
  'lastUpdate': '1/24/20 17:00',
  'recovered': '31',
  '\ufeffprovinceState': 'Hubei'},
 {'confirmed': '53',
  'countryRegion': 'Mainland China',
  'deaths': '',
  'lastUpdate': '1/24/20 17:00',
  'recovered': '2',
  '\ufeffprovinceState': 'Guangdong'},
 {'confirmed': '43',
  'countryRegion': 'Mainland China',
  'deaths': '',
  'lastUpdate': '1/24/20 17:00',
  'recovered': '1',
  '\ufeffprovinceState': 'Zhejiang'},
 {'confirmed': '36',
  'countryRegion': 'Mainland China',
  'deaths': '',
  'lastUpdate': '1/24/20 17:00',
  'recovered': '1',
  '\ufeffprovinceState': 'Beijing'},
 {'confirmed': '27',
  'countryRegion': 'Mainland China',
  'deaths': '',
  'lastUpdate': '1/24/20 17:00',
  'recovered': '',
  '\ufeffprovinceState': 'Chongqing'},
 {'confirmed': '24',
  'countryRegion': 'Mainland China',
  'deaths': '',
  'lastUpdate': '1/24/20 17:00',
  'recovered': '',
  '\ufeffprovinceState': 'Hunan'},
 {'confirmed'

### Create a data frame with one report per row

Iterate through the list of lists, and add each report to a pandas data frame.

Note that early reports seemed to use a key name of `\ufeffprovinceState` so where we detect that, we convert it to `provinceState`.

In [0]:
leading_columns = ['provinceState', 'countryRegion', 'lastUpdate', 'confirmed', 'deaths', 'recovered']
rows = []
for day_detail in all_days_detail:
  for report_detail in day_detail:
    # Early data has '\ufeffprovinceState' as the key name, so fix it where it occurs.
    if '\ufeffprovinceState' in report_detail:
      report_detail['provinceState'] = report_detail.pop('\ufeffprovinceState')

    report_detail['lastUpdate'] = pd.to_datetime(report_detail['lastUpdate'])
    rows.append(report_detail)

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows.
df = pd.DataFrame(rows)
# Keep the six core columns first; any extra keys (e.g. latitude, longitude) follow.
df = df.reindex(columns=leading_columns + [c for c in df.columns if c not in leading_columns])
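
The `\ufeff` prefix is the Unicode byte order mark (U+FEFF). It most likely leaked into the first column name of the early source files, although the notebook does not confirm the cause. If it could appear on other keys too, a more general clean-up strips it from every key. The sketch below assumes that is acceptable for this data; `strip_bom_keys` is a hypothetical helper name.

```python
def strip_bom_keys(report):
    """Return a copy of report with a leading byte order mark removed from every key."""
    return {key.lstrip('\ufeff'): value for key, value in report.items()}

# Sample values taken from the first report shown earlier.
sample = {'confirmed': '549', 'countryRegion': 'Mainland China', '\ufeffprovinceState': 'Hubei'}
cleaned = strip_bom_keys(sample)
print(sorted(cleaned))  # ['confirmed', 'countryRegion', 'provinceState']
```

Unlike the in-place fix in the loop, this returns a new dictionary and leaves the original report untouched.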


Let's take a look at some data in the middle of the data frame:

In [10]:
df.tail(1000).head(100)

Unnamed: 0,provinceState,countryRegion,lastUpdate,confirmed,deaths,recovered,latitude,longitude
4632,,Slovakia,2020-03-10 05:13:07,7,0,0,48.6690,19.6990
4633,,South Africa,2020-03-10 05:13:07,7,0,0,-30.5595,22.9375
4634,North Carolina,US,2020-03-10 02:33:04,7,0,0,35.6301,-79.8064
4635,South Carolina,US,2020-03-10 02:33:04,7,0,0,33.8569,-80.9450
4636,Tennessee,US,2020-03-10 19:53:13,7,0,0,35.7478,-86.6923
...,...,...,...,...,...,...,...,...
4727,Henan,China,2020-03-11 08:13:09,1273,22,1249,33.8820,113.6140
4728,Zhejiang,China,2020-03-11 09:33:12,1215,1,1195,29.1832,120.0934
4729,Hunan,China,2020-03-11 02:18:14,1018,4,995,27.6104,111.7088
4730,Anhui,China,2020-03-11 02:18:14,990,6,984,31.8257,117.2264


We can now work with the data frame, for example by subsetting the rows for a single location:

In [11]:
df[df['provinceState'] == 'Fairfax County, VA']

Unnamed: 0,provinceState,countryRegion,lastUpdate,confirmed,deaths,recovered,latitude,longitude
4153,"Fairfax County, VA",US,2020-03-08 21:33:02,2,0,0,38.9085,-77.2405
4416,"Fairfax County, VA",US,2020-03-08 21:33:02,2,0,0,38.9085,-77.2405
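
Beyond subsetting, aggregation needs numeric counts, whereas the early reports shown earlier store counts as strings and use `''` for missing values. The sketch below converts them on a small sample built from those early rows and then totals by country. Treating a blank count as zero is an assumption made here for illustration; keeping it as missing (NaN) may be more appropriate for some analyses.

```python
import pandas as pd

# Small sample in the shape of the early reports, where counts are strings.
df = pd.DataFrame([
    {'provinceState': 'Hubei', 'countryRegion': 'Mainland China',
     'confirmed': '549', 'deaths': '24', 'recovered': '31'},
    {'provinceState': 'Guangdong', 'countryRegion': 'Mainland China',
     'confirmed': '53', 'deaths': '', 'recovered': '2'},
    {'provinceState': 'Chongqing', 'countryRegion': 'Mainland China',
     'confirmed': '27', 'deaths': '', 'recovered': ''},
])

count_columns = ['confirmed', 'deaths', 'recovered']
for column in count_columns:
    # errors='coerce' yields NaN for values that cannot be parsed instead of raising;
    # fillna(0) then applies the blank-means-zero assumption.
    df[column] = pd.to_numeric(df[column], errors='coerce').fillna(0).astype(int)

# Sum the counts for each country across its provinces.
totals = df.groupby('countryRegion')[count_columns].sum()
print(totals)
```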


### Write out the data to a CSV file

In [0]:
df.to_csv('reports.csv', index=False)

In [0]:
# If you're in a Google Colab notebook, you'll need to do this in order to download the report to your computer:

# from google.colab import files
# files.download('reports.csv')