# Forecasting Influenza
Flu season in the United States takes between 3,000 and 49,000 lives annually. Because reliable forecasting methods are lacking, policymakers and public health officials cannot optimally prepare for these deadly epidemics.

With the advent of the internet, and consequently of global internet traffic datasets, we can now begin to feasibly create models capable of epidemiological forecasting.

This notebook examines the feasibility of creating a model that predicts epidemiological trends, specifically the US influenza season, using a widely available free data source: Wikipedia article access logs.

In [98]:
import epidata as delphi
import pandas as pd
import requests
import time
from collections import defaultdict
%matplotlib inline

# restore cached API responses from the last session
%store -r pageViewResps
%store -r wILIresp

#### Getting Wikipedia Page Views for Flu-Related Articles

In [78]:
if 'pageViewResps' not in list(globals().keys()):
    epidata = delphi.Epidata() # interface to CMU delphi API

    with open('./data/allarticles.txt') as f:
        fluRelatedArticles = [article.strip() for article in f]
    years = range(2008, 2017) # 2008 - 2016 full years
    epiranges = [ epidata.range(int(str(yr) + '01'), int(str(yr) + '52')) for yr in years]
    pageViewResps = []
    # API calls to the delphi epidata API
    for epiyear in epiranges:
        resp = epidata.wiki(fluRelatedArticles, epiweeks=epiyear)['epidata']
        pageViewResps.extend(resp)
        time.sleep(15)
    %store pageViewResps
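
The ranges above stop at week 52 of every year. Assuming epiweeks follow the CDC's MMWR convention (weeks start on Sunday, and week 1 is the first week with at least four days in the new year), some years contain 53 weeks, so their final week would be left out of the request. A minimal standard-library sketch of how the last epiweek of a year could be computed; the helper names `week1_start`, `weeks_in_epi_year`, and `epiweek_bounds` are hypothetical and not part of the delphi client:

```python
from datetime import date, timedelta

def week1_start(year):
    """Sunday that begins epiweek 1 of `year` (the week containing January 4)."""
    jan4 = date(year, 1, 4)
    days_since_sunday = (jan4.weekday() + 1) % 7  # weekday(): Monday=0 ... Sunday=6
    return jan4 - timedelta(days=days_since_sunday)

def weeks_in_epi_year(year):
    """Number of epiweeks (52 or 53) in `year`."""
    return (week1_start(year + 1) - week1_start(year)).days // 7

def epiweek_bounds(year):
    """First and last epiweek of `year` as YYYYWW integers."""
    return year * 100 + 1, year * 100 + weeks_in_epi_year(year)

# years in the notebook's 2008 - 2016 span that have a 53rd week
long_years = [yr for yr in range(2008, 2017) if weeks_in_epi_year(yr) == 53]
```

For the 2008 to 2016 span used here, this flags 2008 and 2014 as 53-week years. The bounds returned by `epiweek_bounds` could be passed to `epidata.range` in place of the hard-coded `'01'` and `'52'` suffixes.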

#### Getting national level ILINet data

In [96]:
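if 'wILIresp' not in globals():
    # create the client here too: it is undefined when pageViewResps was restored from the store
    epidata = delphi.Epidata()
    wILIresp = epidata.fluview('nat', epidata.range(200801, 201652))['epidata']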
    %store wILIresp

Stored 'wILIresp' (list)


#### Putting the pageViews API response data in a DataFrame

In [94]:
pageToViews = defaultdict(list)
pageViewsIndex = { week['epiweek'] for week in pageViewResps }
pageViewsIndex = list(pageViewsIndex)
pageViewsIndex.sort()

# map each article to its weekly view counts (from 2008 to 2016)
for week in pageViewResps:
    page, weeklyViews = week['article'], week['count']
    pageToViews[page].append(weeklyViews)
    
pageViews = pd.DataFrame.from_dict(pageToViews, orient='index')
pageViews = pageViews.transpose()
# if any article has fewer weekly entries, from_dict pads it with NaN,
# which forces float columns; fill the gaps with 0 and convert back to ints
pageViews = pageViews.fillna(0).astype('int')
pageViews.index = pageViewsIndex
pageViews[:2]

epiweek,influenza_b_virus,shivering,neuraminidase_inhibitor,equine_influenza,common_cold,influenza_a_virus_subtype_h9n2,avian_influenza,chills,influenza_a_virus_subtype_h1n1,fatigue_(medical),...,influenza_a_virus_subtype_h5n1,influenza_a_virus_subtype_h3n8,influenza_a_virus_subtype_h7n3,influenza_prevention,influenza_a_virus_subtype_h7n9,swine_influenza,vomiting,influenza_a_virus_subtype_h3n2,orthomyxoviridae,viral_pneumonia
200801,14,1251,362,187,19240,16,3292,777,34,957,...,218,13,6,0,0,17,8876,242,565,1222
200802,12,1299,461,212,21084,11,4870,858,33,1045,...,304,15,9,0,0,21,9782,260,885,1482
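
Appending counts to a per-article list relies on every article having an entry for every week. If the API omits a week for some article (an assumption, not something verified here), that article's later counts shift up by one row and no longer line up with the epiweek index. A minimal sketch of an alternative that aligns on the `epiweek` field itself, using a few hypothetical records with the same `article`, `epiweek`, and `count` keys as the response above:

```python
import pandas as pd

# hypothetical records; 'common_cold' deliberately has no entry for 200801
records = [
    {'article': 'influenza', 'epiweek': 200801, 'count': 100},
    {'article': 'influenza', 'epiweek': 200802, 'count': 120},
    {'article': 'common_cold', 'epiweek': 200802, 'count': 50},
]

long_form = pd.DataFrame.from_records(records)

# one row per epiweek, one column per article; missing (week, article) pairs become 0
aligned = (
    long_form
    .pivot_table(index='epiweek', columns='article', values='count',
                 aggfunc='sum', fill_value=0)
    .sort_index()
    .astype(int)
)
```

Here the missing `common_cold` week shows up as a 0 in row 200801 rather than pulling the 200802 count into that row. The same call could be applied to `pageViewResps` directly, since it is already a list of such records.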


#### Putting the ILINet API response in a DataFrame

In [97]:
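# sort the records themselves so the values and the index stay aligned
wILIsorted = sorted(wILIresp, key=lambda week: week['epiweek'])
wILIvalues = [ week['ili'] for week in wILIsorted ]
wILIindex = [ week['epiweek'] for week in wILIsorted ]
wILI = pd.DataFrame(wILIvalues, columns=['Weekly ILI'], index=wILIindex)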
wILI[:2]

epiweek,Weekly ILI
200801,2.254048
200802,2.091472
