### Average Number of Years Lost For People Who Died of Coronavirus in France

We illustrate the use of the package by estimating the average number of years by which the lives of people who died of coronavirus were shortened. The data come from [here](https://www.data.gouv.fr/fr/datasets/donnees-de-certification-electronique-des-deces-associes-au-covid-19-cepidc/) and give the age distribution of people who died from COVID-19 in France. Under conservative assumptions (treating every person who died as male and placing each death at the middle of its age range), we find that lives were shortened by about 9 years on average. The estimate is conservative for one additional reason: among people of the same age, those who died likely had a shorter expected longevity than those who did not. And because the bulk of the deaths are among older people, who are more often infirm, the quality-adjusted years lost are likely more modest still. Using the most recent SSA data, which assumes the same longevity as in the US, we also get about 9 years. Assuming everyone lives to 90, the average number of years lost is 7. With WHO data, again taking the middle of each age range, it is 11.

In [1]:
import pandas as pd
from lost_years import lost_years_hld, lost_years_ssa, lost_years_who

### Prepare example input in DataFrame
Country codes are listed at https://www.lifetable.de/cgi-bin/country_codes.php

In [2]:
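# Read only the age-class and cumulative-death columns from the semicolon-delimited file.
df = pd.read_csv('/opt/github/lost_years/examples/covid19/covid-cedc-quot.csv', usecols=['cl_age90', 'Dc_Elec_Covid_cum'], delimiter=';')
df.columns = ['age', 'n_deaths']
# Drop the rows with age code 0 so that only the age classes 9, 19, ..., 90 remain.
df.drop(df.loc[df.age == 0].index, inplace=True)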
df

Unnamed: 0,age,n_deaths
73,9,0
74,9,0
75,9,0
76,9,0
77,9,0
...,...,...
16055,90,0
16056,90,0
16057,90,0
16058,90,0


In [3]:
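# Total the deaths within each age class ('sum' as a string avoids passing the Python builtin).
gdf = df.groupby('age').agg({'n_deaths': 'sum'})
df = gdf.reset_index()
# Age bounds for each class, in the same row order as the grouped data.
# The open-ended 90+ class uses 99 as both its middle and its highest age.
df2 = pd.DataFrame({'lowest_age': [0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
                    'middle_age': [5, 15, 25, 35, 45, 55, 65, 75, 85, 99],
                    'highest_age': [9, 19, 29, 39, 49, 59, 69, 79, 89, 99]})
# Join on the row index, so row i of df2 describes age class i of df.
df = df.join(df2)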
df['year'] = 2020
df['country'] = 'FRA'
df['sex'] = 'M'
df

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex
0,9,11,0,5,9,2020,FRA,M
1,19,68,10,15,19,2020,FRA,M
2,29,437,20,25,29,2020,FRA,M
3,39,1561,30,35,39,2020,FRA,M
4,49,3628,40,45,49,2020,FRA,M
5,59,14106,50,55,59,2020,FRA,M
6,69,36555,60,65,69,2020,FRA,M
7,79,76238,70,75,79,2020,FRA,M
8,89,145018,80,85,89,2020,FRA,M
9,90,92290,90,99,99,2020,FRA,M
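
The `df.join(df2)` step pairs the bounds with the age classes by row position. As a minimal sketch of an alternative, the bounds can be derived from the class label itself. `age_bounds` is a hypothetical helper (not part of `lost_years`), and it assumes that each label is the upper bound of a ten-year class, with 90 standing for the open-ended 90+ class capped at 99 as in the table above.

```python
def age_bounds(label, open_ended_label=90, cap=99):
    """Return (lowest_age, middle_age, highest_age) for one age-class label.

    Assumption: labels 9, 19, ..., 89 are the upper bounds of ten-year
    classes, and `open_ended_label` marks the open-ended top class, for
    which the notebook uses `cap` as both the middle and the highest age.
    """
    if label == open_ended_label:
        return (label, cap, cap)
    lowest = label - 9
    middle = lowest + 5
    return (lowest, middle, label)


labels = [9, 19, 29, 39, 49, 59, 69, 79, 89, 90]
bounds = {label: age_bounds(label) for label in labels}
for label, (lowest, middle, highest) in bounds.items():
    print(label, lowest, middle, highest)
```

Keying the bounds on the label means a missing or reordered age class cannot silently shift every bound by one row.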


### Get Human Life Table data columns from HLD dataset

In [4]:
highest_ldf = lost_years_hld(df, {'age': 'lowest_age', 'country': 'country', 'sex': 'sex', 'year': 'year'})
highest_ldf.head()

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,hld_country,hld_region,...,hld_version,hld_ref-id,hld_year1,hld_year2,hld_typelt,hld_sex,hld_age,hld_age_interval,hld_life_expectancy,hld_life_expectancy_orig
0,9,11,0,5,9,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,0,1,79.23,79.26
1,19,68,10,15,19,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,10,5,69.63,69.66
2,29,437,20,25,29,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,20,5,59.77,59.79
3,39,1561,30,35,39,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,30,5,50.12,50.15
4,49,3628,40,45,49,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,40,5,40.57,40.6


**Note that the input year is 2020, but the life table we match to starts in 2015 (`hld_year1`) and ends in 2017 (`hld_year2`).**

### Assuming all the people who died were at the bottom of the age ranges

In [5]:
highest_ldf['years_lost'] = highest_ldf['hld_life_expectancy'] * highest_ldf['n_deaths'] / highest_ldf['n_deaths'].sum()
highest_ldf

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,hld_country,hld_region,...,hld_ref-id,hld_year1,hld_year2,hld_typelt,hld_sex,hld_age,hld_age_interval,hld_life_expectancy,hld_life_expectancy_orig,years_lost
0,9,11,0,5,9,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,0,1,79.23,79.26,0.002356
1,19,68,10,15,19,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,10,5,69.63,69.66,0.0128
2,29,437,20,25,29,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,20,5,59.77,59.79,0.07061
3,39,1561,30,35,39,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,30,5,50.12,50.15,0.211503
4,49,3628,40,45,49,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,40,5,40.57,40.6,0.3979
5,59,14106,50,55,59,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,50,5,31.41,31.43,1.19777
6,69,36555,60,65,69,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,60,5,23.05,23.07,2.277819
7,79,76238,70,75,79,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,70,5,15.6,15.6,3.215124
8,89,145018,80,85,89,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,80,5,8.94,8.9,3.504782
9,90,92290,90,99,99,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,90,5,4.36,4.15,1.087784


In [6]:
highest_ldf['years_lost'].sum().round()

12.0
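
Cells 5 and 6 together compute a death-weighted mean of remaining life expectancy. Written out, with $n_i$ the number of deaths in age class $i$ and $e_i$ the remaining life expectancy matched to that class (notation introduced here for illustration only):

$$
\bar{Y} \;=\; \sum_i e_i \, \frac{n_i}{\sum_j n_j} \;=\; \frac{\sum_i n_i e_i}{\sum_i n_i}
$$

For example, the 80 to 89 class contributes $8.94 \times 145018 / 369912 \approx 3.50$ years to the total of about 12.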

In [7]:
lowest_ldf = lost_years_hld(df, {'age': 'highest_age', 'country': 'country', 'sex': 'sex', 'year': 'year'})
lowest_ldf.head()

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,hld_country,hld_region,...,hld_version,hld_ref-id,hld_year1,hld_year2,hld_typelt,hld_sex,hld_age,hld_age_interval,hld_life_expectancy,hld_life_expectancy_orig
0,9,11,0,5,9,2020,FRA,M,FRA,0,...,1,3201.02,2013,2015,1,1,9,1,70.42,70.42
1,19,68,10,15,19,2020,FRA,M,FRA,0,...,1,3201.02,2013,2015,1,1,19,1,60.53,60.53
2,29,437,20,25,29,2020,FRA,M,FRA,0,...,1,3201.02,2013,2015,1,1,29,1,50.88,50.89
3,39,1561,30,35,39,2020,FRA,M,FRA,0,...,1,3201.02,2013,2015,1,1,39,1,41.33,41.33
4,49,3628,40,45,49,2020,FRA,M,FRA,0,...,1,3201.02,2013,2015,1,1,49,1,32.13,32.14


### Assuming all the people who died were at the top of the age ranges

In [8]:
lowest_ldf['years_lost'] = lowest_ldf['hld_life_expectancy'] * lowest_ldf['n_deaths'] / lowest_ldf['n_deaths'].sum()
lowest_ldf

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,hld_country,hld_region,...,hld_ref-id,hld_year1,hld_year2,hld_typelt,hld_sex,hld_age,hld_age_interval,hld_life_expectancy,hld_life_expectancy_orig,years_lost
0,9,11,0,5,9,2020,FRA,M,FRA,0,...,3201.02,2013,2015,1,1,9,1,70.42,70.42,0.002094
1,19,68,10,15,19,2020,FRA,M,FRA,0,...,3201.02,2013,2015,1,1,19,1,60.53,60.53,0.011127
2,29,437,20,25,29,2020,FRA,M,FRA,0,...,3201.02,2013,2015,1,1,29,1,50.88,50.89,0.060108
3,39,1561,30,35,39,2020,FRA,M,FRA,0,...,3201.02,2013,2015,1,1,39,1,41.33,41.33,0.174409
4,49,3628,40,45,49,2020,FRA,M,FRA,0,...,3201.02,2013,2015,1,1,49,1,32.13,32.14,0.315123
5,59,14106,50,55,59,2020,FRA,M,FRA,0,...,3201.02,2013,2015,1,1,59,1,23.74,23.74,0.905287
6,69,36555,60,65,69,2020,FRA,M,FRA,0,...,3201.02,2013,2015,1,1,69,1,16.21,16.21,1.601885
7,79,76238,70,75,79,2020,FRA,M,FRA,0,...,3201.02,2013,2015,1,1,79,1,9.42,9.42,1.94144
8,89,145018,80,85,89,2020,FRA,M,FRA,0,...,3201.02,2013,2015,1,1,89,1,4.51,4.52,1.768072
9,90,92290,90,99,99,2020,FRA,M,FRA,0,...,3201.05,2014,2016,4,1,99,1,2.19,2.41,0.546387


In [9]:
lowest_ldf['years_lost'].sum().round()

7.0

In [10]:
middle_ldf = lost_years_hld(df, {'age': 'middle_age', 'country': 'country', 'sex': 'sex', 'year': 'year'})
middle_ldf.head()

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,hld_country,hld_region,...,hld_version,hld_ref-id,hld_year1,hld_year2,hld_typelt,hld_sex,hld_age,hld_age_interval,hld_life_expectancy,hld_life_expectancy_orig
0,9,11,0,5,9,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,5,5,74.6,74.63
1,19,68,10,15,19,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,15,5,64.66,64.69
2,29,437,20,25,29,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,25,5,54.93,54.96
3,39,1561,30,35,39,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,35,5,45.32,45.35
4,49,3628,40,45,49,2020,FRA,M,FRA,0,...,1,3201.06,2015,2017,4,1,45,5,35.92,35.94


### Assuming all the people who died were at the middle of the age ranges

In [11]:
middle_ldf['years_lost'] = middle_ldf['hld_life_expectancy'] * middle_ldf['n_deaths'] / middle_ldf['n_deaths'].sum()
middle_ldf

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,hld_country,hld_region,...,hld_ref-id,hld_year1,hld_year2,hld_typelt,hld_sex,hld_age,hld_age_interval,hld_life_expectancy,hld_life_expectancy_orig,years_lost
0,9,11,0,5,9,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,5,5,74.6,74.63,0.002218
1,19,68,10,15,19,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,15,5,64.66,64.69,0.011886
2,29,437,20,25,29,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,25,5,54.93,54.96,0.064892
3,39,1561,30,35,39,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,35,5,45.32,45.35,0.191247
4,49,3628,40,45,49,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,45,5,35.92,35.94,0.352294
5,59,14106,50,55,59,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,55,5,27.09,27.11,1.033034
6,69,36555,60,65,69,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,65,5,19.26,19.27,1.903289
7,79,76238,70,75,79,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,75,5,12.12,12.11,2.497904
8,89,145018,80,85,89,2020,FRA,M,FRA,0,...,3201.06,2015,2017,4,1,85,5,6.28,6.18,2.461972
9,90,92290,90,99,99,2020,FRA,M,FRA,0,...,3201.05,2014,2016,4,1,99,1,2.19,2.41,0.546387


In [12]:
middle_ldf['years_lost'].sum().round()

9.0
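
The three HLD estimates differ only in which age is assumed for each death. The sketch below recomputes them side by side from the values printed in the tables above, using plain Python. `average_years_lost` is a hypothetical helper written for this illustration and is not a function of the `lost_years` package.

```python
# Deaths per age class, copied from the tables above (0-9, ..., 80-89, 90+).
n_deaths = [11, 68, 437, 1561, 3628, 14106, 36555, 76238, 145018, 92290]

# hld_life_expectancy matched at each assumed age, copied from the tables above.
life_expectancy = {
    "bottom": [79.23, 69.63, 59.77, 50.12, 40.57, 31.41, 23.05, 15.6, 8.94, 4.36],
    "middle": [74.6, 64.66, 54.93, 45.32, 35.92, 27.09, 19.26, 12.12, 6.28, 2.19],
    "top": [70.42, 60.53, 50.88, 41.33, 32.13, 23.74, 16.21, 9.42, 4.51, 2.19],
}


def average_years_lost(expectancies, deaths):
    """Death-weighted mean of remaining life expectancy."""
    total = sum(deaths)
    return sum(e * n for e, n in zip(expectancies, deaths)) / total


estimates = {
    name: average_years_lost(values, n_deaths)
    for name, values in life_expectancy.items()
}
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

Rounded to whole years, the three values are 12, 9, and 7, matching the results reported for the bottom, middle, and top of the age ranges.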

### Assume Longevity Is the Same as in the US

In [13]:
ssa_middle_ldf = lost_years_ssa(df, {'age': 'middle_age', 'sex': 'sex', 'year': 'year'})
ssa_middle_ldf.head()

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,ssa_age,ssa_year,ssa_life_expectancy
0,9,11,0,5,9,2020,FRA,M,5,2016,71.6
1,19,68,10,15,19,2020,FRA,M,15,2016,61.7
2,29,437,20,25,29,2020,FRA,M,25,2016,52.3
3,39,1561,30,35,39,2020,FRA,M,35,2016,43.15
4,49,3628,40,45,49,2020,FRA,M,45,2016,34.08


In [14]:
ssa_middle_ldf['years_lost'] = ssa_middle_ldf['ssa_life_expectancy'] * ssa_middle_ldf['n_deaths'] / ssa_middle_ldf['n_deaths'].sum()
ssa_middle_ldf

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,ssa_age,ssa_year,ssa_life_expectancy,years_lost
0,9,11,0,5,9,2020,FRA,M,5,2016,71.6,0.002129
1,19,68,10,15,19,2020,FRA,M,15,2016,61.7,0.011342
2,29,437,20,25,29,2020,FRA,M,25,2016,52.3,0.061785
3,39,1561,30,35,39,2020,FRA,M,35,2016,43.15,0.18209
4,49,3628,40,45,49,2020,FRA,M,45,2016,34.08,0.334248
5,59,14106,50,55,59,2020,FRA,M,55,2016,25.52,0.973164
6,69,36555,60,65,69,2020,FRA,M,65,2016,17.92,1.770869
7,79,76238,70,75,79,2020,FRA,M,75,2016,11.18,2.304172
8,89,145018,80,85,89,2020,FRA,M,85,2016,5.94,2.328681
9,90,92290,90,99,99,2020,FRA,M,99,2016,2.25,0.561356


In [15]:
ssa_middle_ldf['years_lost'].sum().round()

9.0

### Assume Everyone Lives Till 90

In [16]:
y90_middle_ldf = df.copy()
y90_middle_ldf['y90_life_expectancy'] = 90 - y90_middle_ldf['middle_age']
y90_middle_ldf.head()

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,y90_life_expectancy
0,9,11,0,5,9,2020,FRA,M,85
1,19,68,10,15,19,2020,FRA,M,75
2,29,437,20,25,29,2020,FRA,M,65
3,39,1561,30,35,39,2020,FRA,M,55
4,49,3628,40,45,49,2020,FRA,M,45


In [17]:
y90_middle_ldf['years_lost'] = y90_middle_ldf['y90_life_expectancy'] * y90_middle_ldf['n_deaths'] / y90_middle_ldf['n_deaths'].sum()
y90_middle_ldf

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,y90_life_expectancy,years_lost
0,9,11,0,5,9,2020,FRA,M,85,0.002528
1,19,68,10,15,19,2020,FRA,M,75,0.013787
2,29,437,20,25,29,2020,FRA,M,65,0.076789
3,39,1561,30,35,39,2020,FRA,M,55,0.232096
4,49,3628,40,45,49,2020,FRA,M,45,0.441348
5,59,14106,50,55,59,2020,FRA,M,35,1.334669
6,69,36555,60,65,69,2020,FRA,M,25,2.47052
7,79,76238,70,75,79,2020,FRA,M,15,3.091465
8,89,145018,80,85,89,2020,FRA,M,5,1.960169
9,90,92290,90,99,99,2020,FRA,M,-9,-2.245426


In [18]:
y90_middle_ldf['years_lost'].sum().round()

7.0
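
The 90+ class is assigned a middle age of 99, so `90 - middle_age` is -9 for that row, which subtracts about 2.2 years from the total. One possible variant floors the remaining years at zero, so that nobody is credited with negative years lost. The sketch below shows it in plain Python; it is an illustrative alternative under that stated assumption, not the calculation reported above.

```python
# Deaths and assumed ages per class, copied from the table above.
n_deaths = [11, 68, 437, 1561, 3628, 14106, 36555, 76238, 145018, 92290]
middle_age = [5, 15, 25, 35, 45, 55, 65, 75, 85, 99]
HORIZON = 90  # assumed age that everyone would otherwise reach


def weighted_mean(values, weights):
    """Mean of `values` weighted by `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# As in the notebook: remaining years may go negative for the 90+ class.
unclipped = [HORIZON - age for age in middle_age]
# Variant (assumption): remaining years cannot fall below zero.
clipped = [max(HORIZON - age, 0) for age in middle_age]

unclipped_avg = weighted_mean(unclipped, n_deaths)
clipped_avg = weighted_mean(clipped, n_deaths)
print(f"unclipped: {unclipped_avg:.2f}")
print(f"clipped:   {clipped_avg:.2f}")
```

With these inputs the unclipped average is about 7.4 years and the clipped one about 9.6, so the choice made for the oldest class moves this estimate by roughly two years.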

### Get life table data columns from WHO dataset (bottom of the age ranges)

In [19]:
who_highest_ldf = lost_years_who(df, {'age': 'lowest_age', 'country': 'country', 'sex': 'sex', 'year': 'year'})
who_highest_ldf.head()

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,who_age,who_country,who_life_expectancy,who_sex,who_year
0,9,11,0,5,9,2020,FRA,M,1,FRA,80.1,MLE,2016
0,9,11,0,5,9,2020,FRA,M,1,FRA,79.4,MLE,2016
1,19,68,10,15,19,2020,FRA,M,10,FRA,70.5,MLE,2016
2,29,437,20,25,29,2020,FRA,M,20,FRA,60.6,MLE,2016
3,39,1561,30,35,39,2020,FRA,M,30,FRA,50.9,MLE,2016


### Assuming all the people who died were at the bottom of the age ranges

In [20]:
who_highest_ldf['years_lost'] = who_highest_ldf['who_life_expectancy'] * who_highest_ldf['n_deaths'] / who_highest_ldf['n_deaths'].sum()
who_highest_ldf

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,who_age,who_country,who_life_expectancy,who_sex,who_year,years_lost
0,9,11,0,5,9,2020,FRA,M,1,FRA,80.1,MLE,2016,0.002382
0,9,11,0,5,9,2020,FRA,M,1,FRA,79.4,MLE,2016,0.002361
1,19,68,10,15,19,2020,FRA,M,10,FRA,70.5,MLE,2016,0.012959
2,29,437,20,25,29,2020,FRA,M,20,FRA,60.6,MLE,2016,0.071588
3,39,1561,30,35,39,2020,FRA,M,30,FRA,50.9,MLE,2016,0.214788
4,49,3628,40,45,49,2020,FRA,M,40,FRA,41.3,MLE,2016,0.405048
5,59,14106,50,55,59,2020,FRA,M,50,FRA,32.1,MLE,2016,1.224046
6,69,36555,60,65,69,2020,FRA,M,60,FRA,23.8,MLE,2016,2.351865
7,79,76238,70,75,79,2020,FRA,M,70,FRA,16.2,MLE,2016,3.338683
8,89,145018,80,85,89,2020,FRA,M,80,FRA,9.4,MLE,2016,3.685008


In [21]:
who_highest_ldf['years_lost'].sum().round()

13.0
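
The WHO lookup above returns two rows for the youngest class (index 0), so its 11 deaths enter the weighted sum twice. The sketch below drops the repeat before aggregating. `keep_first_per_group` is a hypothetical helper, and keeping the first match rather than the second is an arbitrary choice made for illustration.

```python
# (row index, n_deaths, who_life_expectancy), copied from the first rows shown above.
rows = [
    (0, 11, 80.1),
    (0, 11, 79.4),
    (1, 68, 70.5),
    (2, 437, 60.6),
    (3, 1561, 50.9),
]


def keep_first_per_group(rows):
    """Keep only the first row seen for each row index."""
    seen = set()
    kept = []
    for row in rows:
        group = row[0]
        if group not in seen:
            seen.add(group)
            kept.append(row)
    return kept


deduped = keep_first_per_group(rows)
deaths_with_duplicate = sum(n for _, n, _ in rows)
deaths_deduplicated = sum(n for _, n, _ in deduped)
print(deaths_with_duplicate, deaths_deduplicated)
```

On a DataFrame, the same effect can be had with `ldf[~ldf.index.duplicated(keep='first')]`. Because the youngest class holds so few deaths, the duplicate barely moves the average here, but it would matter for data with more deaths among young children.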

### Get life table data columns from WHO dataset (top of the age ranges)

In [22]:
who_lowest_ldf = lost_years_who(df, {'age': 'highest_age', 'country': 'country', 'sex': 'sex', 'year': 'year'})
who_lowest_ldf.head()

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,who_age,who_country,who_life_expectancy,who_sex,who_year
0,9,11,0,5,9,2020,FRA,M,10,FRA,70.5,MLE,2016
1,19,68,10,15,19,2020,FRA,M,20,FRA,60.6,MLE,2016
2,29,437,20,25,29,2020,FRA,M,30,FRA,50.9,MLE,2016
3,39,1561,30,35,39,2020,FRA,M,40,FRA,41.3,MLE,2016
4,49,3628,40,45,49,2020,FRA,M,50,FRA,32.1,MLE,2016


### Assuming all the people who died were at the top of the age ranges

In [23]:
who_lowest_ldf['years_lost'] = who_lowest_ldf['who_life_expectancy'] * who_lowest_ldf['n_deaths'] / who_lowest_ldf['n_deaths'].sum()
who_lowest_ldf

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,who_age,who_country,who_life_expectancy,who_sex,who_year,years_lost
0,9,11,0,5,9,2020,FRA,M,10,FRA,70.5,MLE,2016,0.002096
1,19,68,10,15,19,2020,FRA,M,20,FRA,60.6,MLE,2016,0.01114
2,29,437,20,25,29,2020,FRA,M,30,FRA,50.9,MLE,2016,0.060131
3,39,1561,30,35,39,2020,FRA,M,40,FRA,41.3,MLE,2016,0.174283
4,49,3628,40,45,49,2020,FRA,M,50,FRA,32.1,MLE,2016,0.314828
5,59,14106,50,55,59,2020,FRA,M,60,FRA,23.8,MLE,2016,0.907575
6,69,36555,60,65,69,2020,FRA,M,70,FRA,16.2,MLE,2016,1.600897
7,79,76238,70,75,79,2020,FRA,M,80,FRA,9.4,MLE,2016,1.937318
8,89,145018,80,85,89,2020,FRA,M,85,FRA,6.7,MLE,2016,2.626626
9,90,92290,90,99,99,2020,FRA,M,85,FRA,6.7,MLE,2016,1.671595


In [24]:
who_lowest_ldf['years_lost'].sum().round()

9.0

### Get life table data columns from WHO dataset (middle of the age ranges)

In [25]:
who_middle_ldf = lost_years_who(df, {'age': 'middle_age', 'country': 'country', 'sex': 'sex', 'year': 'year'})
who_middle_ldf.head()

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,who_age,who_country,who_life_expectancy,who_sex,who_year
0,9,11,0,5,9,2020,FRA,M,5,FRA,75.4,MLE,2016
1,19,68,10,15,19,2020,FRA,M,15,FRA,65.5,MLE,2016
2,29,437,20,25,29,2020,FRA,M,25,FRA,55.8,MLE,2016
3,39,1561,30,35,39,2020,FRA,M,35,FRA,46.1,MLE,2016
4,49,3628,40,45,49,2020,FRA,M,45,FRA,36.6,MLE,2016


### Assuming all the people who died were at the middle of the age ranges

In [26]:
who_middle_ldf['years_lost'] = who_middle_ldf['who_life_expectancy'] * who_middle_ldf['n_deaths'] / who_middle_ldf['n_deaths'].sum()
who_middle_ldf

Unnamed: 0,age,n_deaths,lowest_age,middle_age,highest_age,year,country,sex,who_age,who_country,who_life_expectancy,who_sex,who_year,years_lost
0,9,11,0,5,9,2020,FRA,M,5,FRA,75.4,MLE,2016,0.002242
1,19,68,10,15,19,2020,FRA,M,15,FRA,65.5,MLE,2016,0.012041
2,29,437,20,25,29,2020,FRA,M,25,FRA,55.8,MLE,2016,0.06592
3,39,1561,30,35,39,2020,FRA,M,35,FRA,46.1,MLE,2016,0.194538
4,49,3628,40,45,49,2020,FRA,M,45,FRA,36.6,MLE,2016,0.358963
5,59,14106,50,55,59,2020,FRA,M,55,FRA,27.8,MLE,2016,1.060108
6,69,36555,60,65,69,2020,FRA,M,65,FRA,20.0,MLE,2016,1.976416
7,79,76238,70,75,79,2020,FRA,M,75,FRA,12.7,MLE,2016,2.61744
8,89,145018,80,85,89,2020,FRA,M,85,FRA,6.7,MLE,2016,2.626626
9,90,92290,90,99,99,2020,FRA,M,85,FRA,6.7,MLE,2016,1.671595


In [27]:
who_middle_ldf['years_lost'].sum().round()

11.0