# Manual - United States

I didn't include the US data because the counts for 2020 are too low, which indicates that the data is incomplete even though the `PERCENT COMPLETE` field reports `> 100%`.

In [None]:
#export
from weekly_mort.imports import *
from weekly_mort.core import *

## Manual steps

When I was writing this notebook (2020/04/21), the API was not working. To update the United States data, do the following steps:

1. Go to [the CDC influenza mortality surveillance site](https://gis.cdc.gov/grasp/fluview/mortality.html)
2. Click on `Downloads` -> `Custom Data` -> `Surveillance area: National` -> `Select All Seasons` -> `Select All Age Groups` -> `Download`
3. Do the same for the state-level data: click on `Downloads` -> `Custom Data` -> `Surveillance area: State` -> `Select All Seasons` -> `Download`
4. Move the .csv files to `_downloads/United States/`
5. Update the LAST_MODIFIED cell below with the latest date, then run all cells.
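
Before running the remaining cells, it may help to confirm that both downloads ended up in the right folder. The following is a minimal sketch, not part of `weekly_mort`: the helper `missing_downloads` is hypothetical, and the two file names are the ones read later in this notebook.

```python
from pathlib import Path

# File names as read later in this notebook; the helper itself is hypothetical.
EXPECTED_FILES = ('National_Custom_Data.csv', 'State_Custom_Data.csv')


def missing_downloads(down_dir):
    """Return the expected CSV names that are not present in `down_dir`."""
    down_dir = Path(down_dir)
    return [name for name in EXPECTED_FILES if not (down_dir / name).is_file()]


# Usage: an empty list means both files are in place.
missing = missing_downloads(Path('_downloads') / 'United States')
print('Missing files:', missing or 'none')
```

A non-empty result lists the files that still need to be moved into `_downloads/United States/`.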

In [None]:
#export
LAST_MODIFIED = datetime.date(2020, 4, 22)
LAST_MODIFIED

datetime.date(2020, 4, 22)

## Process Data

In [None]:
down_dir, proc_dir = create_country_dirs('United States')

In [None]:
nat = pd.read_csv(down_dir/'National_Custom_Data.csv')
state = pd.read_csv(down_dir/'State_Custom_Data.csv')

In [None]:
nat.SEASON.value_counts()

2014-15    53
2017-18    52
2013-14    52
2016-17    52
2015-16    52
2018-19    52
2019-20    28
Name: SEASON, dtype: int64

In [None]:
assert all(state.AREA == 'State')
assert all(nat.AREA == 'National')

In [None]:
assert all(nat['SUB AREA'].isna())

Unfortunately, there is no breakdown by age (despite what the website might suggest).

In [None]:
assert all(nat['AGE GROUP'] == 'All')
assert all(state['AGE GROUP'] == 'All')

In [None]:
keepcols = ['AREA', 'SUB AREA', 'SEASON', 'WEEK', 
            'NUM INFLUENZA DEATHS', 'NUM PNEUMONIA DEATHS',
            'TOTAL DEATHS', 'PERCENT COMPLETE']

In [None]:
df = pd.concat([nat[keepcols], state[keepcols]])

In [None]:
df.head()

|   | AREA | SUB AREA | SEASON | WEEK | NUM INFLUENZA DEATHS | NUM PNEUMONIA DEATHS | TOTAL DEATHS | PERCENT COMPLETE |
|---|---|---|---|---|---|---|---|---|
| 0 | National | NaN | 2019-20 | 40 | 16 | 2707 | 52465 | > 100% |
| 1 | National | NaN | 2019-20 | 41 | 16 | 2770 | 52870 | > 100% |
| 2 | National | NaN | 2019-20 | 42 | 18 | 2978 | 54143 | > 100% |
| 3 | National | NaN | 2019-20 | 43 | 30 | 2986 | 53922 | > 100% |
| 4 | National | NaN | 2019-20 | 44 | 31 | 2908 | 54000 | > 100% |


In [None]:
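# A season such as '2019-20' starts at week 40 of its first year:
# weeks >= 40 belong to the first year, earlier weeks to the second year.
df['Year'] = df.apply(lambda x: int(x.SEASON[:4] if x.WEEK >= 40 else ('20' + x.SEASON[-2:])), axis=1)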

In [None]:
df = df[df.Year >= 2017]

In [None]:
df['Region'] = np.where(df.AREA == 'National', 'Total', df['SUB AREA'])

In [None]:
# Every complete year should have 52 weeks x 53 regions (national total plus the sub areas)
assert all(df[df.Year < 2020].Year.value_counts() == 52*53)

In [None]:
len(df)

9063

In [None]:
df = df[df['PERCENT COMPLETE'] == '> 100%']

In [None]:
len(df)

6565

I have no idea why previous years still have less than 100% completion...

In [None]:
df[df.Year < 2020].Year.value_counts()

2019    2167
2018    1943
2017    1842
Name: Year, dtype: int64
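
To put the counts above in perspective, the share of rows that survive the `> 100%` filter can be worked out from the numbers already printed in this notebook. This is only an arithmetic sketch: the variable names are illustrative, and the 2020 figures are inferred by subtraction rather than read from the data.

```python
# Row counts copied from the outputs in this notebook.
ROWS_PER_FULL_YEAR = 52 * 53      # asserted above for every year before 2020
TOTAL_BEFORE_FILTER = 9063        # len(df) before filtering
TOTAL_AFTER_FILTER = 6565         # len(df) after filtering
kept = {2019: 2167, 2018: 1943, 2017: 1842}

before = {year: ROWS_PER_FULL_YEAR for year in kept}
# Rows not accounted for by 2017-2019 must belong to 2020.
before[2020] = TOTAL_BEFORE_FILTER - sum(before.values())
kept[2020] = TOTAL_AFTER_FILTER - sum(kept.values())

retained = {year: kept[year] / before[year] for year in sorted(kept)}
for year, share in retained.items():
    print(f'{year}: kept {kept[year]} of {before[year]} rows ({share:.1%})')
```

Under these assumptions, roughly two thirds to four fifths of the rows in each year are marked `> 100%`, so the filter removes a substantial part of every year, not only of 2020.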

In [None]:
df.rename({'WEEK': 'Week', 'NUM INFLUENZA DEATHS': 'Influenza',
           'NUM PNEUMONIA DEATHS': 'Pneumonia', 
           'TOTAL DEATHS': 'Total'}, axis=1, inplace=True)

In [None]:
df = df[['Week', 'Year', 'Region', 'Influenza', 'Pneumonia', 'Total']]

Change to 0 indexing

In [None]:
df.Week.min(), df.Week.max()

(1, 52)

In [None]:
df['Week'] = df.Week - 1

In [None]:
df.Week.min(), df.Week.max()

(0, 51)

In [None]:
df = pd.melt(df, id_vars = ['Week', 'Year', 'Region'], var_name='Condition', value_name='Deaths')

Convert deaths to integers

In [None]:
df['Deaths'] = df.Deaths.apply(lambda x: int(x.replace(',', '')))
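
The same conversion can be sketched on plain Python values. This assumes the counts arrive either as strings with thousands separators or as integers; `parse_count` is a hypothetical helper, and the example values are made up for illustration.

```python
def parse_count(value):
    """Convert a count such as '1,939' or 1939 to an int (hypothetical helper)."""
    # str() lets the same code path handle values that are already integers.
    return int(str(value).replace(',', '').strip())


# Illustrative values only, not taken from the CDC files.
print(parse_count('1,939'))
print(parse_count(2035))
```

Going through `str()` avoids an `AttributeError` when a value is already numeric; floats such as `1939.0` are deliberately not handled here.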

In [None]:
df.sort_values(['Year', 'Condition', 'Week'], ascending=[False, False, True], inplace=True)

## Caution

I don't know how the data is aggregated, but there are some strange artefacts: when writing this notebook (2020/04/22), the deaths in New York City were larger than in New York! I checked the CDC web visualization where the data was downloaded from, and it had the exact same issue.

Also, the death counts don't seem to match up well with totals from other sources.

In [None]:
sub = df[(df.Year == 2020) & (df.Condition == 'Total') & (df.Week >= 10)]

In [None]:
sub[sub.Region == 'New York']

|   | Week | Year | Region | Condition | Deaths |
|---|---|---|---|---|---|
| 14064 | 10 | 2020 | New York | Total | 1939 |
| 14065 | 11 | 2020 | New York | Total | 2035 |
| 14066 | 12 | 2020 | New York | Total | 2386 |
| 14067 | 13 | 2020 | New York | Total | 3182 |
| 14068 | 14 | 2020 | New York | Total | 2736 |


In [None]:
sub[sub.Region == 'New York City']

|   | Week | Year | Region | Condition | Deaths |
|---|---|---|---|---|---|
| 14485 | 10 | 2020 | New York City | Total | 1101 |
| 14486 | 11 | 2020 | New York City | Total | 1353 |
| 14487 | 12 | 2020 | New York City | Total | 2474 |
| 14488 | 13 | 2020 | New York City | Total | 4408 |
| 14489 | 14 | 2020 | New York City | Total | 3426 |
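
The comparison can be checked directly from the two outputs above. The sketch below hard-codes the weekly totals shown there and lists the weeks in which the New York City count exceeds the New York count; the variable names are illustrative and not part of the notebook's pipeline. Whether the `New York` rows are meant to include the city is not stated in the download, so the result only flags the discrepancy and does not explain it.

```python
# Weekly totals copied from the two outputs above (0-indexed weeks, 2020).
new_york = {10: 1939, 11: 2035, 12: 2386, 13: 3182, 14: 2736}
new_york_city = {10: 1101, 11: 1353, 12: 2474, 13: 4408, 14: 3426}

# Weeks in which the city reports more deaths than the "New York" rows.
excess = {week: new_york_city[week] - new_york[week]
          for week in sorted(new_york)
          if new_york_city[week] > new_york[week]}

print(excess)  # → {12: 88, 13: 1226, 14: 690}
```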


## Save

In [None]:
df.to_csv(proc_dir/'deaths.csv', index=False)

## Dates

While there is no proper documentation, the visualization tool shows when weeks start/end.

It appears weeks end on a Saturday, and fractional weeks from the previous year are included in week 1. For example, 2020 week 1 starts on 2019 December 29th.
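
The rule described above can be sketched with the standard library alone. This is not the `gen_weekdates` helper from `weekly_mort.core`; `week1_start` and `week_bounds` are hypothetical names, and the rule (week 1 starts on the Sunday on or before January 1st) is an assumption that matches the four start dates used in this notebook but has not been checked for other years.

```python
import datetime


def week1_start(year):
    """Sunday on or before January 1st of `year` (assumed rule, see text)."""
    jan1 = datetime.date(year, 1, 1)
    # date.weekday() is Monday=0 ... Sunday=6, so this counts days since Sunday.
    days_since_sunday = (jan1.weekday() + 1) % 7
    return jan1 - datetime.timedelta(days=days_since_sunday)


def week_bounds(year, week):
    """Start (Sunday) and end (Saturday) of a 0-indexed week."""
    start = week1_start(year) + datetime.timedelta(weeks=week)
    return start, start + datetime.timedelta(days=6)


print(week1_start(2020))      # → 2019-12-29
print(week_bounds(2017, 14))  # ends on 2017-04-15
```

The same end dates (2017 week 14 and 2020 week 5, 0-indexed) are asserted against the generated table further down, which gives a quick cross-check of the assumed rule.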

In [None]:
dates = pd.DataFrame(itertools.chain(gen_weekdates(2017, datetime.date(2017, 1, 1), 5),
                                     gen_weekdates(2018, datetime.date(2017, 12, 31), 5),
                                     gen_weekdates(2019, datetime.date(2018, 12, 30), 5),
                                     gen_weekdates(2020, datetime.date(2019, 12, 29), 5)))

dates.columns = ['Year', 'Week', 'Start', 'End']

dates = dates[dates.Week != 52]  # Get rid of last fractional week of each year

In [None]:
dates.head()

|   | Year | Week | Start | End |
|---|---|---|---|---|
| 0 | 2017 | 0 | 2017-01-01 | 2017-01-07 |
| 1 | 2017 | 1 | 2017-01-08 | 2017-01-14 |
| 2 | 2017 | 2 | 2017-01-15 | 2017-01-21 |
| 3 | 2017 | 3 | 2017-01-22 | 2017-01-28 |
| 4 | 2017 | 4 | 2017-01-29 | 2017-02-04 |


In [None]:
# All weeks have 7 days
assert all((dates.End - dates.Start).apply(lambda x: x.days + 1) == 7)

assert all(dates.Year.value_counts() == 52)

assert dates[(dates.Year == 2017) & (dates.Week == 14)].End.iloc[0] == datetime.date(2017, 4, 15)

assert dates[(dates.Year == 2020) & (dates.Week == 5)].End.iloc[0] == datetime.date(2020, 2, 8)

In [None]:
dates.to_csv(proc_dir/'week_dates.csv', index=False)