## Thailand Covid-19 Data

This notebook uses data from the Thailand Department of Disease Control.
The README for this repo contains the URLs.
To download all daily counts for cases and outcomes, use the URL
<https://covid19.th-stat.com/api/open/timeline>.

### (As Needed) Download the Dataset

Run this only if you do not have up-to-date data or want a different dataset. The file is saved in a subdirectory named `data`.


In [None]:
# Use wget (standard Unix/Linux util) to download data file
data_url = "https://covid19.th-stat.com/api/open/timeline"
! [ -d data ] || mkdir data
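# -t number of retries, -O output file name (the next cell reads data/timeline.json;
# without -O, wget would save the file as "timeline" with no extension)
! cd data && wget -t 5 -O timeline.json $data_url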
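
Where `wget` is not available (for example on a stock Windows install), the same download can be sketched with the Python standard library alone. This is a minimal sketch, not part of the original notebook: the function name `download_timeline`, its parameters, and the 30-second timeout are illustrative assumptions.

```python
import shutil
import urllib.request
from pathlib import Path


def download_timeline(url, data_dir="data", name="timeline.json", timeout=30):
    """Download `url` into data_dir/name and return the resulting path.

    Hypothetical helper: the name, defaults, and timeout are assumptions.
    """
    target_dir = Path(data_dir)
    # Create the data directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    # Stream the response body straight into the target file.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download_timeline("https://covid19.th-stat.com/api/open/timeline")` would then write `data/timeline.json`, the file that the reading cell expects. Unlike the `wget` call, this sketch does not retry on failure.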

### Read Data from the Previously Downloaded Dataset


In [60]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import json
# timeline.json: large file of all data since 1/1/2020
# (latest.json is a smaller file that can be used for development)
filename = "data/timeline.json"

# the useful Covid data is in the named element 'Data'
with open(filename, 'r') as f:
    all_data = json.load(f)

covid = pd.DataFrame.from_records(all_data['Data'])

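# keep only the last 60 days (about 2 months);
# copy() so the column assignment below does not act on a view of the full frame
covid = covid[-60:].copy()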

# convert string date to datetime object
covid['Date'] = pd.to_datetime(covid['Date'])

# describe the data
print(f"Dataset has {len(covid)} records")
# %Y-%m-%d is the portable spelling of %F, which is not supported on every platform
print(f"Start date  {covid['Date'].min():%Y-%m-%d}")
print(f"End date    {covid['Date'].max():%Y-%m-%d}")
covid.info()

Dataset has 60 records
Start date  2021-03-03
End date    2021-05-01
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 426 to 485
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Date             60 non-null     datetime64[ns]
 1   NewConfirmed     60 non-null     int64         
 2   NewRecovered     60 non-null     int64         
 3   NewHospitalized  60 non-null     int64         
 4   NewDeaths        60 non-null     int64         
 5   Confirmed        60 non-null     int64         
 6   Recovered        60 non-null     int64         
 7   Hospitalized     60 non-null     int64         
 8   Deaths           60 non-null     int64         
dtypes: datetime64[ns](1), int64(8)
memory usage: 4.3 KB


In [55]:
covid.head()

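|     | Date       | NewConfirmed | NewRecovered | NewHospitalized | NewDeaths | Confirmed | Recovered | Hospitalized | Deaths |
|-----|------------|--------------|--------------|-----------------|-----------|-----------|-----------|--------------|--------|
| 426 | 2021-03-03 | 35           | 63           | -28             | 0         | 26108     | 25483     | 541          | 84     |
| 427 | 2021-03-04 | 54           | 79           | -26             | 1         | 26162     | 25562     | 515          | 85     |
| 428 | 2021-03-05 | 79           | 79           | 0               | 0         | 26241     | 25641     | 515          | 85     |
| 429 | 2021-03-06 | 64           | 45           | 19              | 0         | 26305     | 25686     | 534          | 85     |
| 430 | 2021-03-07 | 65           | 58           | 7               | 0         | 26370     | 25744     | 541          | 85     |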
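
In the five rows shown by `covid.head()`, the columns appear to be related: each `New*` column matches the day-to-day difference of its cumulative counterpart, and `Hospitalized` equals `Confirmed - Recovered - Deaths`. The sketch below checks these relationships on a retyped copy of those five rows. The helper name `check_consistency` is hypothetical, and whether the relationships hold across the whole dataset is an assumption to verify, not a documented guarantee of the API.

```python
import pandas as pd

# The five rows from covid.head(), retyped as a standalone sample.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2021-03-03", "2021-03-04", "2021-03-05",
                            "2021-03-06", "2021-03-07"]),
    "NewConfirmed": [35, 54, 79, 64, 65],
    "NewRecovered": [63, 79, 79, 45, 58],
    "NewHospitalized": [-28, -26, 0, 19, 7],
    "NewDeaths": [0, 1, 0, 0, 0],
    "Confirmed": [26108, 26162, 26241, 26305, 26370],
    "Recovered": [25483, 25562, 25641, 25686, 25744],
    "Hospitalized": [541, 515, 515, 534, 541],
    "Deaths": [84, 85, 85, 85, 85],
})


def check_consistency(df):
    """Return a dict mapping each check description to True or False."""
    results = {}
    pairs = [("Confirmed", "NewConfirmed"), ("Recovered", "NewRecovered"),
             ("Hospitalized", "NewHospitalized"), ("Deaths", "NewDeaths")]
    for total, daily in pairs:
        # diff() is NaN on the first row, so compare from the second row on.
        same = df[total].diff().iloc[1:] == df[daily].iloc[1:]
        results[f"{total}.diff() == {daily}"] = bool(same.all())
    # Active (hospitalized) cases as confirmed minus recovered minus deaths.
    active = df["Confirmed"] - df["Recovered"] - df["Deaths"]
    results["Hospitalized == Confirmed - Recovered - Deaths"] = bool(
        (active == df["Hospitalized"]).all())
    return results


checks = check_consistency(sample)
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Running `check_consistency(covid)` on the loaded frame applies the same checks to all 60 rows; any `False` result would point to a correction or revision in the source data.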


In [56]:
covid.plot.line(x='Date', y=['NewConfirmed','NewHospitalized','NewRecovered'], ylabel="Daily Cases", title="Daily New Cases")

<AxesSubplot:title={'center':'Daily New Cases'}, xlabel='Date', ylabel='Daily Cases'>

Plot daily cases and deaths in separate plots, since their magnitudes differ greatly.

In [63]:
import matplotlib.dates as mdates

fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True)
# Use matplotlib's bar() instead of DataFrame.plot.bar(): pandas draws bars at
# categorical positions 0..n-1, so a date formatter would mislabel the ticks.
ax1.bar(covid['Date'], covid['NewConfirmed'])
ax1.set_title("Daily Confirmed Cases")

ax2.bar(covid['Date'], covid['NewDeaths'])
ax2.set_title("Daily Deaths")

# %Y-%m-%d is the portable spelling of %F; rotate the labels so they do not overlap
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()