## World and Country Covid-19 Data

This notebook plots Covid-19 case data from Our World in Data, which offers several datasets in different formats.

The data used here comes from JHU (Johns Hopkins University) and has one record per country per date.  The stable URL is:
<http://covid.ourworldindata.org/data/jhu/full_data.csv>


### Download the Covid Dataset (As Needed) 

Run this cell only if you do not have up-to-date data or want a different dataset.  

In [6]:
# Use wget to download the data file.
# Unfortunately, GitHub doesn't send the HTTP Last-Modified header,
# so wget will always download the file, even if it's identical to the local copy.
data_url = "http://covid.ourworldindata.org/data/jhu/full_data.csv"
! [ -d data ] || mkdir data
# -N use timestamps for conditional get, -nv non-verbose, -t retries
! cd data && wget -nv -N -t 5  $data_url

Last-modified header missing -- time-stamps turned off.
2021-05-13 13:04:24 URL:http://covid.ourworldindata.org/data/jhu/full_data.csv [5579541] -> "full_data.csv" [1]
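
Because the server sends no Last-Modified header, `wget -N` cannot skip an unchanged file. One possible workaround is to check the age of the local copy before running the download cell. The sketch below is only an illustration: the helper name `needs_download` and the 24-hour threshold are assumptions, not part of the original notebook.

```python
import os
import time

def needs_download(path, max_age_hours=24):
    """Return True if the file is missing or older than max_age_hours.

    Hypothetical helper; the default threshold is an arbitrary choice.
    """
    if not os.path.exists(path):
        return True
    # Age of the local copy, based on its modification time
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds > max_age_hours * 3600

# Same local path that the download cell writes to
print(needs_download("data/full_data.csv"))
```

If this prints `False`, the local copy is recent and the download cell can be skipped.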


### Read the Data and Describe It (Required in order to run other cells)

Use the date field as the DataFrame index, but also keep `date` as a regular column.

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the data and show basic info

covid = pd.read_csv('data/full_data.csv', 
                     parse_dates=['date'])
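# use 'date' as the index and also keep it as a column
covid = covid.set_index('date', drop=False)

# describe the data
def asdate(timestamp):
    """Return only the date part of a timestamp, as a string."""
    s = str(timestamp)
    k = s.find("T")   # find() returns -1 if "T" is absent; index() would raise ValueError
    return s[0:k] if k > 0 else s
print(f"Dataset has {len(covid)} records")
# These are the dates of the first and last rows, which are the
# earliest and latest dates only if the rows are sorted by date.
print("Start date  ", asdate(covid.index.values[0]))
print("End date    ", asdate(covid.index.values[-1]))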
covid.info()

Dataset has 85467 records
Start date   2020-02-24
End date     2021-05-12
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 85467 entries, 2020-02-24 to 2021-05-12
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   date             85467 non-null  datetime64[ns]
 1   location         85467 non-null  object        
 2   new_cases        85458 non-null  float64       
 3   new_deaths       75862 non-null  float64       
 4   total_cases      85460 non-null  float64       
 5   total_deaths     75704 non-null  float64       
 6   weekly_cases     84457 non-null  float64       
 7   weekly_deaths    84457 non-null  float64       
 8   biweekly_cases   83064 non-null  float64       
 9   biweekly_deaths  83064 non-null  float64       
dtypes: datetime64[ns](1), float64(8), object(1)
memory usage: 7.2+ MB
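
The start and end dates printed above are taken from the first and last rows. If the file happens to be grouped by location rather than sorted by date (an assumption about its layout, not something verified here), those need not be the earliest and latest dates in the dataset. The sketch below uses a tiny made-up table, with placeholder locations and values, to show the difference and to confirm that `date` stays available as a column.

```python
import pandas as pd

# Made-up rows for illustration only; grouped by location, then by date
rows = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-24", "2020-02-25", "2020-01-22", "2020-01-23"]),
    "location": ["Country A", "Country A", "Country B", "Country B"],
    "new_cases": [1.0, 0.0, 5.0, 3.0],
})

# drop=False keeps 'date' as a regular column in addition to the index
rows = rows.set_index("date", drop=False)

print("First row date:", rows.index[0].date())
print("Earliest date: ", rows.index.min().date())
print("Latest date:   ", rows.index.max().date())
print("Columns:", list(rows.columns))
```

On the real data, `covid.index.min()` and `covid.index.max()` give the full date range regardless of row order.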


### Create Plots for a Single Country

Specify the country name exactly as it appears in the "location" attribute of the dataset,
and the number of days to show in the plot.
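
Since the name has to match the "location" value exactly, it can help to search the available names first. The following is a minimal sketch on a made-up stand-in for the dataset; `find_locations` is a hypothetical helper, not something the notebook defines.

```python
import pandas as pd

# Made-up stand-in for the 'location' column of the dataset
covid_sample = pd.DataFrame({"location": ["India", "India", "United States", "World"]})

def find_locations(df, text):
    """Return sorted location names containing `text`, ignoring case."""
    names = df["location"].drop_duplicates()
    mask = names.str.contains(text, case=False, regex=False)
    return sorted(names[mask])

print(find_locations(covid_sample, "united"))
```

With the real data loaded, the same call would be `find_locations(covid, "united")`.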

The daily values are noisy.  The JHU dataset contains `weekly_` attributes that sum the last 7 days of data, so dividing them by 7 gives a
7-day moving average without having to compute one directly from the daily values.
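
To see why a 7-day sum divided by 7 matches a 7-day moving average, here is a small sketch on made-up daily counts. It assumes the weekly total covers the current day plus the 6 days before it; the exact window used in the dataset is not checked here.

```python
import pandas as pd

# Made-up daily counts for illustration only (not real Covid data)
dates = pd.date_range("2021-01-01", periods=10, freq="D")
daily = pd.Series([10, 12, 8, 15, 20, 18, 25, 30, 22, 28], index=dates, dtype=float)

# Weekly total: sum of the current day and the 6 days before it
weekly = daily.rolling(window=7).sum()

# Dividing the weekly total by 7 gives the 7-day moving average
avg_from_weekly = weekly / 7

# The same average computed directly from the daily values
avg_direct = daily.rolling(window=7).mean()

# The first 6 days have no full 7-day window, so both averages are NaN there
comparison = pd.DataFrame({
    "daily": daily,
    "weekly/7": avg_from_weekly,
    "rolling mean": avg_direct,
})
print(comparison)
```

The two average columns agree on every day that has a full window, which is why the plots use `weekly_cases/7` and `weekly_deaths/7`.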

In [8]:
# Specify the name of the country to plot.
# For USA use:
# country = 'United States'
country = 'India'

# Number of most recent days to show in plot
ndays = 150

cdata = covid[covid['location']==country][-ndays:]
# 'new_cases', 'new_deaths' are very noisy
# 'weekly_cases', 'weekly_deaths' are smoother
plt.figure(figsize=[10,6])   # [width, height] in inches
plt.title("Daily Covid Cases for "+country)
plt.bar(cdata.index, cdata['new_cases'], color='gray')
plt.plot(cdata['weekly_cases']/7, color='blue')
plt.grid(axis='y')
plt.tight_layout()   # adjust padding so the title and labels fit inside the figure
plt.show()


In [14]:
# Plot of daily deaths
plt.figure(figsize=[10,6])   # [width, height] in inches
plt.title("Daily Covid Deaths for "+country)
plt.bar(cdata.index, cdata['new_deaths'], color='gray')
plt.plot(cdata['weekly_deaths']/7, color='red')
plt.grid(axis='y')
plt.tight_layout()   # adjust padding so the title and labels fit inside the figure
plt.show()