In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from collections import Counter
from ipywidgets import widgets
from scipy.signal import find_peaks

params = {
    'legend.fontsize' : 'x-large',
    'figure.figsize'  : (16, 12),
    'axes.labelsize'  : 'x-large',
    'axes.titlesize'  : 'x-large',
    'xtick.labelsize' : 'x-large',
    'ytick.labelsize' : 'x-large'
}

plt.rcParams.update(params)

## Periodicity in COVID-19 case data

Eyeballing the COVID-19 case data reveals some fairly obvious periodicities and patterns, which I'd like to explore in this notebook. Before we begin, select `Run -> Run All Cells` from the menu above to initialize all of the cells. When this finishes, you should see some plots and widgets appear below. The widgets combine interactive controls with output plots to let you explore a few different datasets.

  1. [National level data (countries) widget](#National-Level-Case-Autocorrelation)
  1. [Canada Provincial data widget](#Canada-Provincial-Case-Autocorrelation)

The analysis consists of pulling in a live copy of the data, extracting the daily new case count, taking its day-over-day difference (daily change), smoothing that with a rolling average, then looking at the autocorrelation of the resulting series. If you want to explore the analysis or change it, click on the "three dots" icons to expand code cells (or choose `View -> Expand all code` in the menu). The data are ingested from regularly updated public sources.
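
To make the difference-then-autocorrelate idea concrete, here is a minimal sketch on a synthetic series. The data, the 7-day cycle, and the helper name `autocorr_at` are all assumptions made up for illustration; they are not taken from the real case data. The widgets below also smooth the differences with a rolling mean, which this sketch skips for brevity.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (an assumption for illustration, not real case data):
# a slow upward trend plus a 7-day cycle and some noise.
rng = np.random.default_rng(0)
days = pd.date_range("2020-03-01", periods=200, freq="D")
t = np.arange(len(days))
new_cases = pd.Series(
    100 + 2 * t + 30 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 5, len(t)),
    index=days,
)

# Daily change in new cases; differencing removes the linear trend.
# The first value is NaN and is dropped.
daily_change = new_cases.diff().dropna()


def autocorr_at(series, lag):
    """Correlation of `series` with a copy of itself shifted by `lag` days."""
    return series.autocorr(lag=lag)


weekly = autocorr_at(daily_change, 7)     # one full cycle apart
half_week = autocorr_at(daily_change, 3)  # roughly half a cycle apart
print(f"lag 7: {weekly:.2f}, lag 3: {half_week:.2f}")
```

For a series with a weekly cycle, the autocorrelation should be strongly positive at a lag of 7 days and negative near half a cycle, which is the kind of signature the plots below are meant to reveal.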

## Raw Case Data

Let's look at the raw data for new cases each day in the world to see the patterns I mentioned above. There's a lot going on in the underlying data, but even so you should be able to see some periodicities.

In [None]:
worldCSV = 'https://covid.ourworldindata.org/data/owid-covid-data.csv'
worldDF = pd.read_csv(
    worldCSV,
    index_col='date',
    parse_dates=['date']
)
wDF = worldDF[worldDF['location'] == 'World'].dropna(subset=['new_cases'])

plt.plot(wDF.new_cases, linewidth=3)
plt.title('Daily New Cases - Worldwide')

We want to dig a bit deeper into those wiggles. First, let's look at national-level data.

In [None]:
class AutoCorrDisplay:
    
    def __init__(self, df, location='Canada', ndays=14):
        """
        df: a pandas dataframe with a datetime index and columns for 
        location and new_cases.
        
        ## TODO: Assumes daily data, resample if not
        """
        self.df = df
        self.location = location
        self.ndays = ndays
        self.output_widget = widgets.Output()
        self.container = widgets.VBox()
        self.redraw_whole_plot()
        self.draw_app()
        
    def draw_app(self):
        """
        Run once at startup to provide controls and canvas
        """
        self.ndays_slider = widgets.HBox([
                widgets.Label('Rolling avg. window (days):'),
                widgets.IntSlider(
                    min=1,
                    max=60,
                    step=1,
                    value=self.ndays,
                )
        ], layout=widgets.Layout(margin="0 0 0 auto"))
        self.ndays_slider.children[-1].observe(self._on_ndays_slider_change, names='value')
        self.location_dropdown = widgets.Dropdown(
            options=sorted(set(self.df.location)),
            value = self.location,
            description='Location:',
            disabled=False
        )
        self.location_dropdown.observe(self._on_location_dropdown_change, names='value')

        self.container.children = [
            widgets.HBox(
                [self.location_dropdown, self.ndays_slider],
                layout=widgets.Layout(padding='40px 0 25px 0')
            ),
            self.output_widget
        ]
        
        
    def _on_ndays_slider_change(self, change):
        """
        Called whenever the rolling average window size changes.

        Recompute rolling average and redraw the plot.
        """
        self.ndays = change.new
        self.redraw_whole_plot()

    def _on_location_dropdown_change(self, change):
        """
        Called whenever the location selector changes.

        Pull data for the selected location and redraw the plot.
        """
        self.location = change.new
        self.redraw_whole_plot()

    def redraw_whole_plot(self):
        """
        Redraw canvas
        """
        out = self.output_widget
        with out:
            localDF = self.df[self.df['location'] == self.location].dropna(subset=['new_cases'])['new_cases']

            fig, ax = plt.subplots(1)
            out.clear_output(wait=True)
            ax.set_xlim([0, 400])
            ax.set_ylim([-1, 1])
            ax.set_ylabel('Autocorrelation')
            ax.set_xlabel('Lag')
            ax.set_title(f"{self.location} - new_cases change rolling avg. autocorrelation, {self.ndays} days")

            # Only plot when there are enough observations to be meaningful
            if len(localDF) > 10:
                pd.plotting.autocorrelation_plot(
                    localDF.diff().rolling(self.ndays).mean()[self.ndays:], 
                    ax=ax, 
                    linewidth=3
                )
            else:
                ax.text(165, 0, "Insufficient data", fontsize=20)
                ax.grid(True)
            plt.show()

#app = AutoCorrDisplay()
#app.container

## National Level Case Autocorrelation

This widget uses data from [Our World in Data](https://covid.ourworldindata.org). Use the dropdown to select specific locations and the slider to control the length of the rolling average window.

In [None]:
worldApp = AutoCorrDisplay(worldDF)
worldApp.container

## Canada Provincial Case Autocorrelation

This widget uses data from Canada's [Health Infobase](https://health-infobase.canada.ca). Use the dropdown to select specific provinces and the slider to control the length of the rolling average window.

In [None]:
canadaCSV = 'https://health-infobase.canada.ca/src/data/covidLive/covid19-download.csv'
canadaDF = pd.read_csv(
    canadaCSV,
    index_col='date',
    parse_dates=['date']
)
canadaDF.rename(columns={
    'numtoday' : 'new_cases',
    'prname' : 'location'
}, inplace=True)
canadaApp = AutoCorrDisplay(canadaDF)
canadaApp.container

### Possible explanations

* An artifact of huge case counts in a few countries?
* Artifact of the data collection intervals?
* Seasonal artifacts?
* Related to public health intervention times?
* Is it a coding or classification issue with the data?
* Something else, such as a mistake in the analysis above?
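
One way to start narrowing these down is to measure the dominant period directly, since a period of exactly 7 days would point towards reporting intervals. The sketch below is only a rough illustration on synthetic data: the series, the 30-day lag range, and the 0.5 height threshold are all assumptions, and real case data would likely need the differencing used above plus some tuning.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

# Synthetic series with a 7-day cycle plus noise (an assumption for
# illustration, not real case data).
rng = np.random.default_rng(1)
t = np.arange(280)
series = pd.Series(50 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 5, len(t)))

# Autocorrelation at lags 1..max_lag.
max_lag = 30
acf = np.array([series.autocorr(lag=k) for k in range(1, max_lag + 1)])

# Local maxima of the autocorrelation above an (arbitrary) threshold.
# find_peaks returns array indices; index 0 corresponds to lag 1.
peaks, _ = find_peaks(acf, height=0.5)
peak_lags = peaks + 1

print("Lags with autocorrelation peaks:", list(peak_lags))
```

If the peaks land on multiples of a single lag, that lag is a reasonable estimate of the period. Comparing it across locations with different reporting practices might help separate collection artifacts from the other explanations listed above.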

