# StorAge Selection functions and COVID-19

In this notebook, we will estimate the relationship between the age-ranked population and the (normalized) age-ranked number of deaths.


## 1. Load data

We will use the data provided by the CDC (https://www.cdc.gov/nchs/nvss/vsrr/covid_weekly/index.htm, accessed on Nov. 11, 2020). The 'data_CDC.csv' file contains the data needed for this exercise. Let's read the file first, using the [*read_csv*](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function in *pandas*.

(Deaths were counted from 2/1/2020 through 11/7/2020. Population is based on the 2019 postcensal estimates from the U.S. Census Bureau.)

In [44]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('./data_CDC.csv', delimiter = ',')
data

Unnamed: 0,Age group,All deaths involving COVID-19,Death caused by all cases,Population
0,Under 1 year,26,13879,3783052
1,1–4 years,16,2598,15793631
2,5–14 years,39,4139,40994163
3,15–24 years,410,26662,42687510
4,25–34 years,1725,54516,45940321
5,35–44 years,4426,76852,41659144
6,45–54 years,11740,140304,40874902
7,55–64 years,28227,321282,42448537
8,65–74 years,48363,486783,31483433
9,75–84 years,59760,592264,15969872
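
The `Unnamed: 0` column in the output above is what *pandas* shows when the first column of a CSV file has an empty header, which typically happens when a table was saved together with its row index. The sketch below assumes that this is how the file was written (the actual file layout is not shown here) and uses two rows copied from the table to illustrate the `index_col` parameter of `read_csv`.

```python
import io
import pandas as pd

# Two rows copied from the table above, written the way a CSV with a
# saved index column would look (assumption: the header starts with an empty name).
csv_text = """,Age group,All deaths involving COVID-19,Death caused by all cases,Population
0,Under 1 year,26,13879,3783052
1,1–4 years,16,2598,15793631
"""

# Default: the unnamed first column is kept as data and labelled 'Unnamed: 0'.
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```

Either form works for the rest of the notebook, because the later cells select columns by name.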


## 2. Estimate the age-ranked population and the age-ranked number of deaths

The age-ranked population (the population ranked by age) can be estimated as the cumulative sum of the population data, ordered from the youngest to the oldest age group. We will use the [*cumsum*](https://numpy.org/doc/stable/reference/generated/numpy.cumsum) function in *numpy*. The age-ranked numbers of deaths involving and not involving COVID-19 can be estimated in the same way, and we will normalize each of them by the total number of deaths in its category.

We will have to add 0 at the start of each array. (*A quick question: Why do we need to add 0?*)

In [39]:
# Cumulative sums over the age groups (youngest first)
age_ranked_population = np.cumsum(data['Population'].values)
age_ranked_death_COVID = np.cumsum(data['All deaths involving COVID-19'].values)
# Deaths not involving COVID-19 = deaths from all causes minus deaths involving COVID-19
age_ranked_death_non_COVID = np.cumsum(data['Death caused by all cases'].values) - age_ranked_death_COVID

# Prepend 0 and normalize each death curve by its total (the last cumulative value)
age_ranked_population = np.append(0, age_ranked_population)
age_ranked_death_COVID = np.append(0, age_ranked_death_COVID / age_ranked_death_COVID[-1])
age_ranked_death_non_COVID = np.append(0, age_ranked_death_non_COVID / age_ranked_death_non_COVID[-1])
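
To see what these steps produce, here is the same sequence applied to a hypothetical data set with three age groups. The numbers and variable names are invented for illustration and are not CDC data. Comparing the arrays before and after the 0 is prepended may help with the quick question above.

```python
import numpy as np

# Hypothetical toy data: three age groups, youngest first.
population = np.array([100, 200, 100])
deaths = np.array([1, 3, 6])

# Cumulative sums, with the deaths normalized by their total.
cum_population = np.cumsum(population)
cum_deaths = np.cumsum(deaths) / deaths.sum()
print(cum_population, cum_deaths)

# np.append(0, x) returns a new array with 0 placed in front of x.
cum_population = np.append(0, cum_population)
cum_deaths = np.append(0, cum_deaths)
print(cum_population, cum_deaths)
```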

## 3. Create a plot

Let's plot the normalized age-ranked number of deaths vs. the age-ranked population.

In [45]:
%matplotlib notebook
figM = plt.figure(1)
plt.plot(age_ranked_population,age_ranked_death_COVID, label = 'involving COVID-19')
plt.plot(age_ranked_population,age_ranked_death_non_COVID, label = 'not involving COVID-19')
plt.legend()
plt.xlabel('Age-ranked population')
plt.ylabel('Normalized age-ranked number of deaths [-]')

Text(0, 0.5, 'Normalized age-ranked number of deaths [-]')
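
One way to read the plot: between two consecutive points, the curve rises by an age group's share of the deaths and runs by that group's population, so the slope of each segment is proportional to the number of deaths per person in that group. The sketch below checks this numerically using only the rows shown in the table in Section 1. The variable names are for illustration and are not part of the notebook.

```python
import numpy as np

# Values copied from the rows shown in the table in Section 1 (youngest first).
population = np.array([3783052, 15793631, 40994163, 42687510, 45940321,
                       41659144, 40874902, 42448537, 31483433, 15969872])
covid_deaths = np.array([26, 16, 39, 410, 1725, 4426, 11740, 28227, 48363, 59760])

# Same construction as in Section 2.
x = np.append(0, np.cumsum(population))
y = np.append(0, np.cumsum(covid_deaths) / covid_deaths.sum())

# Slope of each line segment: rise over run between consecutive points.
slope = np.diff(y) / np.diff(x)

# The same quantity written as deaths per person, scaled by the total deaths.
per_capita = covid_deaths / population / covid_deaths.sum()

print(np.allclose(slope, per_capita, rtol=1e-9, atol=0))
print(slope)
```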

## 4. Discussion

1. Can you explain the shape of the blue line (deaths involving COVID-19)? What does this plot tell you? Think about why the slope is steeper for the older population. Which age groups are more vulnerable to death?


2. What is the difference between the two lines? What does that difference tell you?


3. As another exercise, please download the '[data_CDC_male_female.xlsx](https://drive.google.com/file/d/1ZeoqLCBuYjgu_Q-vP-_rY1zLVfzy1wOw/view?usp=sharing)' file and estimate the age-ranked population and the age-ranked number of deaths for males and for females. Then plot the normalized age-ranked number of deaths vs. the age-ranked population, as above, for the deaths involving COVID-19 and the deaths not involving COVID-19. Do you see any difference? You can do this using Python (or Jupyter Notebook) or any other program you prefer (e.g., Microsoft Excel).
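
For the third exercise, one possible starting point is to wrap the steps from Section 2 in a small function and call it once per group. The sketch below is only an outline: the function name `age_ranked_curve`, the column names, and the numbers are hypothetical, and the real spreadsheet may be organized differently. In practice the table could be loaded with `pd.read_excel`, but a small in-memory table is used here so that the example runs on its own.

```python
import numpy as np
import pandas as pd

def age_ranked_curve(population, deaths):
    """Return the age-ranked population and normalized age-ranked deaths, each starting at 0."""
    x = np.append(0, np.cumsum(population))
    y = np.append(0, np.cumsum(deaths) / np.sum(deaths))
    return x, y

# Hypothetical stand-in for the spreadsheet; real column names and values will differ.
toy = pd.DataFrame({
    'Population (male)': [100, 200, 100],
    'COVID-19 deaths (male)': [1, 3, 6],
    'Population (female)': [100, 200, 150],
    'COVID-19 deaths (female)': [1, 2, 5],
})

x_m, y_m = age_ranked_curve(toy['Population (male)'].values,
                            toy['COVID-19 deaths (male)'].values)
x_f, y_f = age_ranked_curve(toy['Population (female)'].values,
                            toy['COVID-19 deaths (female)'].values)

print(x_m, y_m)
print(x_f, y_f)
```

The resulting pairs can be passed to `plt.plot` in the same way as in Section 3, with one line per group.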