# Libraries

To open this notebook and generate the plots, first clone the repo, then open a terminal and change into the repo folder. Run the following command to start JupyterLab in a Docker container ([Docker Engine](https://docs.docker.com/install/) must already be installed on your machine):

`docker run --rm -p 10000:8888 -e JUPYTER_ENABLE_LAB=yes -v "$PWD":/home/jovyan/work arashsaeidpour/fabjupyterlab:plotly`

Now open your browser and go to this address to open JupyterLab:

`localhost:10000`

Copy and paste the token from the terminal window to log in, then navigate to `/work/src/` to find this notebook.

In [150]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objs as go
import plotly.offline as py
import plotly.figure_factory as ff
py.init_notebook_mode(connected=True)
import cufflinks as cf
cf.go_offline()
cf.set_config_file(offline=False, world_readable=True)

# Loading data

In [162]:
df_raw = pd.read_csv('../00-RawData/Per_capita_incidence/ili.clean.percap.csv')
df_raw['date'] = pd.to_datetime(df_raw['date'])

# Processing

In [163]:
try:
    df_A = df_raw.pivot(columns='state',values='percap_a',index='date')
except ValueError:
    print('There are duplicates in the data!')

There are duplicates in the data!


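`pivot` raises a `ValueError` because some (`date`, `state`) pairs appear more than once in the data, so the duplicates have to be removed before the table can be reshaped.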
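Before removing anything, it can help to look at the offending rows. The sketch below uses a small made-up frame standing in for `df_raw` (the values are illustrative, not taken from the dataset) and assumes a duplicate means a repeated (`state`, `date`) pair.

```python
import pandas as pd

# Toy stand-in for df_raw; the numbers are invented for illustration.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-03', '2010-01-03', '2010-01-10', '2010-01-03']),
    'state': ['Ohio', 'Ohio', 'Ohio', 'Texas'],
    'percap_a': [1.0, 1.5, 2.0, 0.5],
})

# keep=False flags every member of a duplicated (state, date) group,
# so both copies show up for inspection.
dup_rows = toy[toy.duplicated(subset=['state', 'date'], keep=False)]

# With the default keep='first', only the extra copies are flagged,
# which gives the number of rows that de-duplication would drop.
n_extra_rows = int(toy.duplicated(subset=['state', 'date']).sum())

print(dup_rows)
print('rows that would be dropped:', n_extra_rows)
```

Comparing the flagged rows shows whether the copies are identical or carry conflicting values, which matters when choosing which copy to keep.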

## Removing duplicates

In [165]:
# Keep the first row for each (state, date) pair.
df = df_raw.drop_duplicates(subset=['state','date'],keep='first')

In [166]:
df_A = df.pivot(columns='state',values='percap_a',index='date').astype('float64')
df_B = df.pivot(columns='state',values='percap_b',index='date').astype('float64')
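To make the effect of `keep='first'` and the reshape concrete, here is a minimal sketch on a made-up frame (the values are illustrative only). It also shows where missing values come from after pivoting: any (`date`, `state`) combination absent from the long table becomes `NaN` in the wide one.

```python
import pandas as pd

# Toy long-format table; 'Ohio' on 2010-01-03 appears twice with different values.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-03', '2010-01-03', '2010-01-10', '2010-01-03']),
    'state': ['Ohio', 'Ohio', 'Ohio', 'Texas'],
    'percap_a': [1.0, 1.5, 2.0, 0.5],
})

# keep='first' retains the earliest row of each (state, date) pair, here 1.0.
deduped = toy.drop_duplicates(subset=['state', 'date'], keep='first')

# One row per date, one column per state.
wide = deduped.pivot(index='date', columns='state', values='percap_a').astype('float64')
print(wide)
```

Texas has no row for 2010-01-10 in the toy data, so that cell is `NaN` in `wide`.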

## Counting null values for each state

In [167]:
df_n_nulls = pd.DataFrame(columns=['A','B'])
for state in df['state'].unique():
    df_n_nulls.loc[state,'A'] = df_A[state].isnull().sum()
    df_n_nulls.loc[state,'B'] = df_B[state].isnull().sum()
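The same per-state counts can also be obtained without an explicit loop, since `isnull().sum()` on a wide frame already returns one count per column. This is a sketch on made-up frames standing in for `df_A` and `df_B`; note that the result is indexed in column order and holds integers rather than objects.

```python
import numpy as np
import pandas as pd

# Toy wide-format frames standing in for df_A and df_B (values invented).
wide_a = pd.DataFrame({'Ohio': [1.0, np.nan, 3.0], 'Texas': [np.nan, np.nan, 2.0]})
wide_b = pd.DataFrame({'Ohio': [1.0, 2.0, 3.0], 'Texas': [np.nan, 1.0, 2.0]})

# isnull().sum() counts missing values column by column; the two resulting
# Series are aligned on the state names when combined into one frame.
null_counts = pd.DataFrame({'A': wide_a.isnull().sum(), 'B': wide_b.isnull().sum()})
print(null_counts)
```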

# Filling missing values

In [169]:
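# Interpolate linearly between known values, then back-fill any leading gaps.
# .bfill() replaces fillna(method='bfill'), which is deprecated in recent pandas.
df_A_47_states = df_A.interpolate().bfill()
df_B_47_states = df_B.interpolate().bfill()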
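The two-step fill is easier to follow on a tiny example. In this sketch (toy values, not from the dataset), the default linear `interpolate()` fills the interior gap and carries the last valid value forward over the trailing gap, but leaves the leading gap untouched; the back-fill then covers that leading gap with the first valid value.

```python
import numpy as np
import pandas as pd

# One toy column with a leading, an interior, and a trailing gap.
toy = pd.DataFrame({'Ohio': [np.nan, 1.0, np.nan, 3.0, np.nan]})

# Step 1: linear interpolation in the default forward direction.
interpolated = toy.interpolate()

# Step 2: back-fill whatever is still missing at the start of the column.
filled = interpolated.bfill()

print(interpolated['Ohio'].tolist())
print(filled['Ohio'].tolist())
```

A column that is missing entirely would stay all-`NaN` after both steps, since there is nothing to interpolate from or fill with.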

In [170]:
# States excluded from the 37-state versions of both tables.
states_to_drop = ['Alaska','Idaho','Kansas','Maine','Michigan','Nevada','New Hampshire','North Dakota','Vermont','Wyoming']
df_A_37_states = df_A_47_states.drop(columns=states_to_drop)

In [171]:
df_B_37_states = df_B_47_states.drop(columns=states_to_drop)

In [172]:
df_A_47_states.to_pickle('../00-RawData/Per_capita_incidence/df_A_47_states.pickle')
df_B_47_states.to_pickle('../00-RawData/Per_capita_incidence/df_B_47_states.pickle')
df_A_37_states.to_pickle('../00-RawData/Per_capita_incidence/df_A_37_states.pickle')
df_B_37_states.to_pickle('../00-RawData/Per_capita_incidence/df_B_37_states.pickle')
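A saved frame can be loaded back with `pd.read_pickle`, which restores the `DatetimeIndex` and column dtypes without re-parsing. The sketch below round-trips a small made-up frame through a temporary directory instead of touching the project files; the file name `toy.pickle` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Toy wide-format frame with a date index (values invented).
frame = pd.DataFrame(
    {'Ohio': [1.0, 2.0], 'Texas': [0.5, 0.7]},
    index=pd.to_datetime(['2010-01-03', '2010-01-10']),
)

# Write to a throwaway location and read the frame back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy.pickle')  # hypothetical file name
    frame.to_pickle(path)
    restored = pd.read_pickle(path)

# equals() compares values, index, and dtypes.
round_trip_ok = restored.equals(frame)
print(round_trip_ok)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.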