In [3]:
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from IPython.display import FileLink, FileLinks

## FluView Forecasting: Introduction

Earlier this year, we conducted preliminary time series modeling on FluView data. The goal was to develop a baseline model for forecasting cases of all types of flu for North America. We used about 5 years' worth of data obtained from the CDC to develop this model. We found that the data had seasonal components and were nonstationary. Our recommended model is a "nonstationary ensemble" composed of an ARUMA model (specifically ARUMA(5,0,2) with s = 52) and a multivariate neural network. In-depth EDA and model testing can be found at the link below:

In [4]:
FileLink('Time_Series_Flu_Data_Forecasting.html')

We will continue this analysis with more depth and breadth. Specifically, we will introduce more data into the analysis, going back to the 2010 flu season. This time period is critical for two reasons. First, these data were gathered before the FluView weekly report began in 2015, so they combine clinical and public health data, whereas FluView contains only public health data. We would like to determine whether the difference in how these data were gathered confounds our already-established model. Second, the H1N1 swine flu epidemic ended around the beginning of 2010, and we are interested in seeing what effect this has had on tracking the flu from then until now.

Since we will also be analyzing the different subtypes of flu, we will not introduce our neural network component into this analysis; instead, we will focus on our ARUMA model.
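
To make the ARUMA(5,0,2), s = 52 specification concrete: assuming the notation follows the common convention in which the model is φ(B)(1 − B^52)X_t = θ(B)a_t, the nonstationary seasonal factor is removed by a lag-52 difference, and a stationary ARMA(5,2) is then fit to the differenced series. The sketch below illustrates this on a synthetic weekly series rather than on the FluView data. The helper names `seasonal_difference` and `fit_arma_5_2` are hypothetical, and the statsmodels import is deferred so that the differencing step runs without it.

```python
import numpy as np
import pandas as pd

SEASON = 52  # weekly data with a yearly cycle, matching s = 52


def seasonal_difference(series: pd.Series, season: int = SEASON) -> pd.Series:
    """Apply (1 - B^season): subtract the value observed one season earlier."""
    return series.diff(season).dropna()


def fit_arma_5_2(stationary: pd.Series):
    """Fit ARMA(5,2) to an already-differenced series (needs statsmodels >= 0.12)."""
    from statsmodels.tsa.arima.model import ARIMA
    return ARIMA(stationary, order=(5, 0, 2)).fit()


# Synthetic stand-in for roughly five seasons of weekly counts (not CDC data).
rng = np.random.default_rng(0)
weeks = np.arange(5 * SEASON)
seasonal_part = 100 + 80 * np.sin(2 * np.pi * weeks / SEASON)
synthetic = pd.Series(seasonal_part + rng.normal(0, 5, size=weeks.size))

# The first 52 observations have no value one season earlier, so they are dropped.
differenced = seasonal_difference(synthetic)
print(len(synthetic), len(differenced))
print(round(synthetic.std(), 1), round(differenced.std(), 1))
```

If statsmodels is installed, `fit_arma_5_2(differenced).summary()` would report the estimated coefficients. Forecasts on the original scale additionally require adding back the value from 52 weeks earlier.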

## Load the Data

In [6]:
flu2015 = pd.read_csv('Data/WHO_NREVSS_Public_Health_Labs.csv', sep=',', header=0, skiprows=1)
flu2015

Unnamed: 0,REGION TYPE,REGION,YEAR,WEEK,TOTAL SPECIMENS,A (2009 H1N1),A (H3),A (Subtyping not Performed),B,BVic,BYam,H3N2v
0,National,X,2015,40,1139,4,65,2,10,0,1,0
1,National,X,2015,41,1152,5,41,2,7,3,0,0
2,National,X,2015,42,1198,10,50,1,8,3,2,0
3,National,X,2015,43,1244,9,31,4,9,1,4,0
4,National,X,2015,44,1465,4,23,4,9,1,4,0
...,...,...,...,...,...,...,...,...,...,...,...,...
241,National,X,2020,21,536,0,0,1,0,0,0,0
242,National,X,2020,22,515,2,0,1,1,0,0,0
243,National,X,2020,23,322,0,0,0,1,0,0,0
244,National,X,2020,24,305,0,0,0,1,0,0,0
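
As a quick check on the loaded frame, the subtype columns can be summed into one weekly count of positive specimens, which is the kind of "all types of flu" series a baseline model would use. The sketch below retypes the first five rows of the output above so that it runs without the CSV file. Treating the seven subtype columns as non-overlapping is an assumption, and `POSITIVE` and `YEAR_WEEK` are hypothetical column names.

```python
import pandas as pd

SUBTYPE_COLUMNS = [
    "A (2009 H1N1)", "A (H3)", "A (Subtyping not Performed)",
    "B", "BVic", "BYam", "H3N2v",
]

# First five rows of the flu2015 output, retyped by hand.
rows = [
    (2015, 40, 1139, 4, 65, 2, 10, 0, 1, 0),
    (2015, 41, 1152, 5, 41, 2, 7, 3, 0, 0),
    (2015, 42, 1198, 10, 50, 1, 8, 3, 2, 0),
    (2015, 43, 1244, 9, 31, 4, 9, 1, 4, 0),
    (2015, 44, 1465, 4, 23, 4, 9, 1, 4, 0),
]
sample = pd.DataFrame(
    rows, columns=["YEAR", "WEEK", "TOTAL SPECIMENS"] + SUBTYPE_COLUMNS
)

# Assumption: the subtype columns do not overlap, so their sum counts positives.
sample["POSITIVE"] = sample[SUBTYPE_COLUMNS].sum(axis=1)

# A sortable "YYYY-WW" label; it is only a label, not a calendar date.
sample["YEAR_WEEK"] = (
    sample["YEAR"].astype(str) + "-" + sample["WEEK"].astype(str).str.zfill(2)
)

print(sample[["YEAR_WEEK", "TOTAL SPECIMENS", "POSITIVE"]])
```

The same two assignments would apply unchanged to the full `flu2015` frame, provided its column names match those shown in the output.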


In [None]:
flu2010 = pd.read_csv('Data/WHO_NREVSS_Combined_prior_to_2015_16.csv', sep=',', header=0, skiprows=1)
flu2010
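
Once both files are loaded, they need to be aligned before they can form one series. A minimal sketch, assuming the two frames share at least the `YEAR` and `WEEK` columns and some subtype columns: keep only the columns present in both, tag each row with its source so that the change in collection method can be examined later, and concatenate. The toy frames, their values, the `ONLY_IN_OLD` column, and the `SOURCE` column are hypothetical placeholders, not CDC data.

```python
import pandas as pd


def combine_on_shared_columns(older: pd.DataFrame, newer: pd.DataFrame) -> pd.DataFrame:
    """Stack two frames using only the columns they have in common."""
    shared = [col for col in older.columns if col in newer.columns]
    combined = pd.concat(
        [
            older[shared].assign(SOURCE="combined"),
            newer[shared].assign(SOURCE="public_health"),
        ],
        ignore_index=True,
    )
    # Assumes YEAR and WEEK are among the shared columns.
    return combined.sort_values(["YEAR", "WEEK"], kind="stable").reset_index(drop=True)


# Toy stand-ins with made-up values; the real column sets may differ.
toy_old = pd.DataFrame({"YEAR": [2010, 2010], "WEEK": [40, 41],
                        "A (H3)": [1, 2], "B": [0, 1], "ONLY_IN_OLD": [9, 9]})
toy_new = pd.DataFrame({"YEAR": [2015, 2015], "WEEK": [40, 41],
                        "A (H3)": [65, 41], "B": [10, 7], "BVic": [0, 3]})

combined = combine_on_shared_columns(toy_old, toy_new)
print(combined)
```

With the real data the call would be `combine_on_shared_columns(flu2010, flu2015)`. Keeping the `SOURCE` tag makes it possible to fit or evaluate the model separately on each collection regime.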