# Notebook for Age and Sex adjusted disease prevalence rate estimates

## Standard Imports

In [1]:
import pandas as pd
import numpy as np

## Data input

In [18]:
# The input file is read as tab-separated (sep='\t') despite its .csv extension
df = pd.read_csv('T2D_case_total_stats_by_age-range.csv', sep='\t')

In [19]:
df.shape

(12, 5)

In [23]:
df

Unnamed: 0,T2D CASE,TOTAL COHORT,AGE RANGE,SEX,ETHNICITY
0,36,1030,40-49,Male,White
1,126,1510,50-59,Male,White
2,256,2035,60-69,Male,White
3,120,1244,40-49,Male,Black
4,159,889,50-59,Male,Black
5,190,572,60-69,Male,Black
6,33,1217,40-49,Female,White
7,84,1897,50-59,Female,White
8,163,2311,60-69,Female,White
9,132,1630,40-49,Female,Black


### Description of the above dataframe format
1. Column 1 (`T2D CASE`): number of T2D cases in each bin, where a bin is one combination of age range, sex and ethnicity.

2. Column 2 (`TOTAL COHORT`): total number of cases plus controls in that bin.

3. Column 3 (`AGE RANGE`): the age-range bins for your study. This example uses 10-year bins (40-49, 50-59 and 60-69) because the UKB weights data was available only for the 40-69 age range, which is the initial age range of the eligible population for the UK Biobank survey program (see page 3, paragraph 2 of https://www.ukbiobank.ac.uk/media/gnkeyh2q/study-rationale.pdf).

**NOTE: For your own study, choose 5-year age-range bins to follow the US Census or survey criteria, such as 30-34, 35-39, 40-44, 45-49, and so on.**

4. Column 4 (`SEX`): sex of the binned group. For the purposes of the US study, consider only binary categories for now, and discuss this with your mentor. The All of Us program survey data is somewhat more extensive.

5. Column 5 (`ETHNICITY`): ethnicity as per the UK Biobank data criteria. For the All of Us program, choose race variables and consider Hispanic vs non-Hispanic data.
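The 5-year binning suggested in the note above could be sketched with `pd.cut`. This is only an illustration: the `ages` series, the 30-69 range and the variable names are assumptions, not part of the original notebook, and the label format simply mirrors the `AGE RANGE` strings used here.

```python
import pandas as pd

# Hypothetical participant-level ages; replace with your own cohort data.
ages = pd.Series([30, 34, 35, 49, 69, 70], name="AGE")

# Bin edges every 5 years. With right=False each bin is [lo, lo + 5),
# so the bin starting at 30 covers ages 30, 31, 32, 33 and 34.
edges = list(range(30, 75, 5))                     # 30, 35, ..., 70
labels = [f"{lo}-{lo + 4}" for lo in edges[:-1]]   # "30-34", ..., "65-69"

# Ages outside the edges (here, 70) become NaN rather than being forced into a bin.
age_range = pd.cut(ages, bins=edges, right=False, labels=labels)
print(age_range.tolist())
```

Cases and cohort totals per bin can then be counted by grouping on the binned column together with sex and race.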

In [25]:
## UK 2011 census standard population for the above age bins and criteria 
df['Standard_population'] = [3546474,3055202,2815166,156546,76303,28112,3619888,3106030,2925263,172861,86062,38796]

### Resource link for the census data used above and below
https://www.ons.gov.uk/aboutus/transparencyandgovernance/freedomofinformationfoi/ethnicgroupsbysexandagefromthe2011census

In [26]:
## UK 2011 census total population for the above sex and ethnicity criteria 
df['Total_population'] = [23630918,23630918,23630918,898200,898200,898200,24578477,24578477,24578477,966690,966690,966690]

In [27]:
df

Unnamed: 0,T2D CASE,TOTAL COHORT,AGE RANGE,SEX,ETHNICITY,Standard_population,Total_population
0,36,1030,40-49,Male,White,3546474,23630918
1,126,1510,50-59,Male,White,3055202,23630918
2,256,2035,60-69,Male,White,2815166,23630918
3,120,1244,40-49,Male,Black,156546,898200
4,159,889,50-59,Male,Black,76303,898200
5,190,572,60-69,Male,Black,28112,898200
6,33,1217,40-49,Female,White,3619888,24578477
7,84,1897,50-59,Female,White,3106030,24578477
8,163,2311,60-69,Female,White,2925263,24578477
9,132,1630,40-49,Female,Black,172861,966690


In [28]:
df['Age_Distribution_of_Std_Pop'] = df['Standard_population']/df['Total_population']

In [29]:
df

Unnamed: 0,T2D CASE,TOTAL COHORT,AGE RANGE,SEX,ETHNICITY,Standard_population,Total_population,Age_Distribution_of_Std_Pop
0,36,1030,40-49,Male,White,3546474,23630918,0.150078
1,126,1510,50-59,Male,White,3055202,23630918,0.129288
2,256,2035,60-69,Male,White,2815166,23630918,0.119131
3,120,1244,40-49,Male,Black,156546,898200,0.174289
4,159,889,50-59,Male,Black,76303,898200,0.084951
5,190,572,60-69,Male,Black,28112,898200,0.031298
6,33,1217,40-49,Female,White,3619888,24578477,0.147279
7,84,1897,50-59,Female,White,3106030,24578477,0.126372
8,163,2311,60-69,Female,White,2925263,24578477,0.119017
9,132,1630,40-49,Female,Black,172861,966690,0.178817


### Here, the variable "Age_Distribution_of_Std_Pop" is the factor by which the crude prevalence rate is multiplied to get the final age- and sex-adjusted T2D prevalence rate estimate

## Crude T2D rate calculations for the Age, Sex and ethnicity bins

In [30]:
df['Crude_rate'] = (df['T2D CASE']/df['TOTAL COHORT'])*100

In [31]:
df

Unnamed: 0,T2D CASE,TOTAL COHORT,AGE RANGE,SEX,ETHNICITY,Standard_population,Total_population,Age_Distribution_of_Std_Pop,Crude_rate
0,36,1030,40-49,Male,White,3546474,23630918,0.150078,3.495146
1,126,1510,50-59,Male,White,3055202,23630918,0.129288,8.344371
2,256,2035,60-69,Male,White,2815166,23630918,0.119131,12.579853
3,120,1244,40-49,Male,Black,156546,898200,0.174289,9.646302
4,159,889,50-59,Male,Black,76303,898200,0.084951,17.885264
5,190,572,60-69,Male,Black,28112,898200,0.031298,33.216783
6,33,1217,40-49,Female,White,3619888,24578477,0.147279,2.711586
7,84,1897,50-59,Female,White,3106030,24578477,0.126372,4.428044
8,163,2311,60-69,Female,White,2925263,24578477,0.119017,7.053224
9,132,1630,40-49,Female,Black,172861,966690,0.178817,8.09816


### The age- and sex-adjusted rate for each bin is Crude T2D rate * Age_Distribution_of_Std_Pop factor

### Source link for the above rate calculation method:
https://seer.cancer.gov/seerstat/tutorials/aarates/step3.html
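As a worked check of the per-bin calculation, the arithmetic for the first bin (White males aged 40-49) can be reproduced by hand. This is a small sketch using the values already shown in the dataframe; the variable names are illustrative only.

```python
# Values for the first bin (White, Male, 40-49) taken from the dataframe above.
cases, cohort = 36, 1030
std_pop, total_pop = 3546474, 23630918

crude_rate = cases / cohort * 100      # cases per 100 cohort members
weight = std_pop / total_pop           # Age_Distribution_of_Std_Pop for this bin
adjusted_rate = crude_rate * weight    # this bin's contribution to the adjusted rate

print(round(crude_rate, 4), round(weight, 4), round(adjusted_rate, 4))
```

The result is roughly 0.5245 per 100, in line with the `Adjusted_rate` the notebook computes for that row.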

In [32]:
df['Adjusted_rate'] = df['Crude_rate']*df['Age_Distribution_of_Std_Pop']

In [33]:
df

Unnamed: 0,T2D CASE,TOTAL COHORT,AGE RANGE,SEX,ETHNICITY,Standard_population,Total_population,Age_Distribution_of_Std_Pop,Crude_rate,Adjusted_rate
0,36,1030,40-49,Male,White,3546474,23630918,0.150078,3.495146,0.524543
1,126,1510,50-59,Male,White,3055202,23630918,0.129288,8.344371,1.07883
2,256,2035,60-69,Male,White,2815166,23630918,0.119131,12.579853,1.498646
3,120,1244,40-49,Male,Black,156546,898200,0.174289,9.646302,1.68124
4,159,889,50-59,Male,Black,76303,898200,0.084951,17.885264,1.519371
5,190,572,60-69,Male,Black,28112,898200,0.031298,33.216783,1.039624
6,33,1217,40-49,Female,White,3619888,24578477,0.147279,2.711586,0.399359
7,84,1897,50-59,Female,White,3106030,24578477,0.126372,4.428044,0.559581
8,163,2311,60-69,Female,White,2925263,24578477,0.119017,7.053224,0.839455
9,132,1630,40-49,Female,Black,172861,966690,0.178817,8.09816,1.448092


## Now,
### 1. The crude rate for a particular category (such as White Males or Black Females) is total cases in that category / total cohort size for that category, expressed per 100

Example, for Black Females: Crude rate = (Total cases / Total cohort size) * 100 = (543 / 3587) * 100 = 15.14 (per 100 samples)


### 2. The adjusted rate for a particular category is the sum of the adjusted rates for the bins in that category.

Example, for Black Females: Adjusted rate = Adj_rate(40-49) + Adj_rate(50-59) + Adj_rate(60-69) = 4.104 (per 100 samples)

| Ethnicity | Crude_Prevalence (per 100) | Adjusted_Prevalence (per 100) |
|---|---|---|
| White | 6.98 | 4.900413917 |
| Black | 16.08391608 | 8.34397393 |
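The two aggregation rules above can be sketched with a pandas `groupby`. This is a minimal sketch, not the notebook's own code: it uses only the six White bins that are fully displayed in the tables above, and the `bins` frame and `summarise` helper are hypothetical names introduced for illustration. Grouping by ethnicity alone pools both sexes, which appears to be how the summary figures were obtained.

```python
import pandas as pd

# The six White bins (3 age ranges x 2 sexes) copied from the tables above.
bins = pd.DataFrame({
    "T2D CASE": [36, 126, 256, 33, 84, 163],
    "TOTAL COHORT": [1030, 1510, 2035, 1217, 1897, 2311],
    "AGE RANGE": ["40-49", "50-59", "60-69"] * 2,
    "SEX": ["Male"] * 3 + ["Female"] * 3,
    "ETHNICITY": ["White"] * 6,
    "Standard_population": [3546474, 3055202, 2815166,
                            3619888, 3106030, 2925263],
    "Total_population": [23630918] * 3 + [24578477] * 3,
})

# Per-bin crude and adjusted rates, computed as in the notebook cells.
bins["Crude_rate"] = bins["T2D CASE"] / bins["TOTAL COHORT"] * 100
bins["Adjusted_rate"] = (
    bins["Crude_rate"] * bins["Standard_population"] / bins["Total_population"]
)

def summarise(keys):
    """Hypothetical helper: category-level crude and adjusted prevalence."""
    grouped = bins.groupby(keys)
    return pd.DataFrame({
        # Rule 1: total cases / total cohort size, per 100.
        "Crude_Prevalence": grouped["T2D CASE"].sum()
        / grouped["TOTAL COHORT"].sum() * 100,
        # Rule 2: sum of the per-bin adjusted rates.
        "Adjusted_Prevalence": grouped["Adjusted_rate"].sum(),
    })

by_sex = summarise(["ETHNICITY", "SEX"])      # e.g. White Males, White Females
by_ethnicity = summarise(["ETHNICITY"])       # both sexes pooled
print(by_ethnicity)
```

Note that the category-level crude rate is computed from summed counts rather than by averaging the per-bin crude rates, so larger bins contribute proportionally more.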

## US Census data from 2010 to 2020
Here are some important links for the distribution by age, sex and race variables:

1. https://www.census.gov/newsroom/press-kits/2020/population-estimates-detailed.html
2. https://www.census.gov/library/visualizations/interactive/race-and-ethnicity-in-the-united-state-2010-and-2020-census.html

### Note: Feel free to plot the data to visualize interesting results