CSV file found at: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2019/adult19csv.zip

Codebook for variables in data: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2019/adult-codebook.pdf

In [4]:
import pandas as pd

In [5]:
nhis = pd.read_csv('raw_data/adult19.csv')
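If the extracted `raw_data/adult19.csv` is not at hand, pandas can read the zipped file directly, either from a local copy or from the CDC URL at the top of this notebook. The sketch below assumes the archive contains exactly one CSV, because pandas' zip handling requires that. `load_nhis` is a hypothetical helper. Its `columns` argument is passed to `usecols` so that only the variables of interest are parsed.

```python
import pandas as pd

# URL from the top of this notebook; the archive is assumed to hold exactly one CSV file
NHIS_2019_ADULT_ZIP = (
    "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2019/adult19csv.zip"
)


def load_nhis(source=NHIS_2019_ADULT_ZIP, columns=None):
    """Read the NHIS adult file from a local .zip path or a URL (hypothetical helper).

    compression="zip" only works when the archive holds a single file.
    `columns` is forwarded to usecols; None parses every column.
    """
    return pd.read_csv(source, compression="zip", usecols=columns)
```

For example, `load_nhis(columns=["DIBEV_A", "AGEP_A", "SEX_A", "EDUC_A", "WTFA_A"])` would download and parse just those columns. Pointing it at a local zip avoids re-downloading on every run.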

In [6]:
nhis.head()

Unnamed: 0,URBRRL,RATCAT_A,INCGRP_A,INCTCFLG_A,FAMINCTC_A,IMPINCFLG_A,PPSU,PSTRAT,HISPALLP_A,RACEALLP_A,...,PROXYREL_A,PROXY_A,AVAIL_A,HHSTAT_A,INTV_QRT,RECTYPE,WTFA_A,WTIA_A,HHX,POVRATTC_A
0,4,9,3,0,60000,2,2,122,3,2,...,,,1,1,1,10,13177.008,7601.336,H048109,2.96
1,4,9,3,0,50000,0,2,122,2,1,...,,,1,1,1,10,6140.552,3344.434,H027044,2.97
2,4,12,3,0,65000,1,2,122,2,1,...,,,1,1,1,10,9191.061,6949.498,H058855,4.28
3,4,14,5,0,120000,0,2,122,1,8,...,,,1,1,1,10,7900.035,6446.327,H031993,7.13
4,1,4,1,0,30000,0,2,115,2,1,...,,,1,1,1,10,10875.772,8646.586,H007122,1.13


# Survey questions represented by each variable, with page numbers from the codebook:

PREDIB_A (pg 117)- Has a doctor or other health professional EVER told you that you had
prediabetes or borderline diabetes?

GESDIB_A (pg 118)- Has a doctor or other health professional EVER told you that you had
gestational diabetes, a type of diabetes that occurs ONLY during pregnancy?

DIBEV_A (pg 119)- Not including gestational diabetes or prediabetes, has a doctor or other health professional EVER told you that you had
diabetes?

DIBAGETC_A (pg 120)- Age first diagnosed w/diabetes

DIFYRSTC_A (pg 121)- Years since first diagnosed w/diabetes

DIBPILL_A (pg 122)- Are you NOW taking diabetic pills to lower your blood sugar? These are
sometimes called oral agents or oral hypoglycemic agents.

DIBINS_A (pg 123)- Insulin can be taken by shot or pump. Are you NOW taking insulin?

DIBINSTIME_A (pg 124)- Thinking back to when you were first diagnosed with diabetes, how long was it
before you started taking insulin?

DIBINSSTOP_A (pg 125)- Since you started taking insulin, have you ever stopped taking it for more
than 6 months?

DIBINSSTYR_A (pg 126)- Was this only during the first year after you were diagnosed with diabetes?

DIBTYPE_A (pg 127)- According to your doctor or other health professional, what type of diabetes
do you have? Is it type 1, type 2, or some other type? If you don't remember
or weren't told, that's OK.
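Most of the diabetes questions above share a yes/no answer format, and the cells below compare raw codes such as `== 1` and `== 2`. Mapping the codes to labels makes tables easier to read. The sketch below assumes the usual NHIS codes: 1 = Yes, 2 = No, 7 = Refused, 8 = Not Ascertained, and 9 = Don't Know. Confirm these on each variable's codebook page. `label_yes_no` and `DIABETES_YES_NO` are hypothetical names.

```python
import pandas as pd

# Assumed response codes for the yes/no questions above (the usual NHIS coding;
# confirm on each variable's codebook page): 1 = Yes, 2 = No, 7 = Refused,
# 8 = Not Ascertained, 9 = Don't Know. Blank cells mean the question was not asked.
YES_NO_LABELS = {1: "Yes", 2: "No", 7: "Refused", 8: "Not Ascertained", 9: "Don't Know"}

# Yes/no variables from the list above
DIABETES_YES_NO = ["PREDIB_A", "GESDIB_A", "DIBEV_A", "DIBPILL_A", "DIBINS_A"]


def label_yes_no(df, columns=DIABETES_YES_NO):
    """Return the selected columns with numeric codes swapped for text labels.

    Codes missing from YES_NO_LABELS, and blank (NaN) cells, come back as NaN.
    """
    return df[list(columns)].apply(lambda col: col.map(YES_NO_LABELS))
```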

In [7]:
survey_df = nhis  # pd.read_csv already returns a DataFrame, so no conversion is needed



In [9]:
total = len(survey_df.index)
print(total)

31997


In [10]:
diabetes = survey_df.loc[survey_df['DIBEV_A'] == 1]
print(diabetes)

       URBRRL  RATCAT_A  INCGRP_A  INCTCFLG_A  FAMINCTC_A  IMPINCFLG_A  PPSU  \
4           1         4         1           0       30000            0     2   
5           1         9         2           0       40000            0     2   
10          1         9         1           0       34301            2     3   
32          4         1         1           0        5000            0     1   
39          4         4         1           0       28900            0     1   
...       ...       ...       ...         ...         ...          ...   ...   
31948       4        14         4           0       92000            0    28   
31959       2        11         2           0       45000            1    48   
31965       4         6         1           0       20000            0    27   
31968       4         5         1           0       25000            0    27   
31974       4         3         1           0       12000            0    28   


## Information about participants with diabetes in the survey:

In [11]:
total_w_diabetes = len(diabetes)
print("Number with diabetes: ",total_w_diabetes)

Number with diabetes:  3355


In [12]:
print("Percentage with diabetes in survey: ",(total_w_diabetes/total)*100,"%")

Percentage with diabetes in survey:  10.485358002312717 %
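The 10.49% above is an unweighted share of the people who happened to be sampled. NHIS provides survey weights that turn sample counts into national estimates. The header shows `WTFA_A`, which this sketch assumes is the sample-adult final annual weight described in the codebook. `weighted_share` is a hypothetical helper that gives a weighted point estimate only. Standard errors would also require the design variables (`PSTRAT`, `PPSU`).

```python
import pandas as pd


def weighted_share(df, column, code=1, weight="WTFA_A"):
    """Weighted percentage of rows where df[column] == code.

    Assumes `weight` holds the sample-adult final annual weight (WTFA_A).
    Point estimate only: no variance estimation.
    """
    w = df[weight]
    return 100 * w[df[column] == code].sum() / w.sum()
```

Calling `weighted_share(survey_df, "DIBEV_A")` would give a weighted counterpart to the unweighted percentage above.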


In [13]:
ages = diabetes["AGEP_A"].replace({99: float("nan")})  # treat 99 ("Don't Know") as missing, not 0, so it cannot drag the mean age down
ages

4        60
5        78
10       63
32       60
39       74
         ..
31948    74
31959    74
31965    66
31968    79
31974    61
Name: AGEP_A, Length: 3355, dtype: int64

### Simple Demographics

In [14]:
print("Average age: ", diabetes["AGEP_A"].replace({99: float("nan")}).mean())  # mean() skips NaN, so "Don't Know" answers are excluded

Average age:  64.05365126676602
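The cells above handle only code 99 in AGEP_A. The codebook may list other reserve codes that are not ages. The sketch below assumes 97 (Refused), 98 (Not Ascertained), and 99 (Don't Know), which should be checked against the AGEP_A entry. It can optionally weight the mean with WTFA_A. `mean_age` and `AGE_RESERVE_CODES` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Assumed AGEP_A reserve codes (only 99 is used in the notebook; check the codebook):
# 97 = Refused, 98 = Not Ascertained, 99 = Don't Know
AGE_RESERVE_CODES = [97, 98, 99]


def mean_age(ages, weights=None):
    """Mean of AGEP_A with reserve codes dropped; weighted when `weights` is given."""
    valid = ~ages.isin(AGE_RESERVE_CODES)
    if weights is None:
        return float(ages[valid].mean())
    return float(np.average(ages[valid], weights=weights[valid]))
```

For example, `mean_age(diabetes["AGEP_A"], diabetes["WTFA_A"])` would give a weighted average age that excludes non-response codes.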


In [15]:
males = len(diabetes.loc[diabetes['SEX_A'] == 1])
print("Number of males with diabetes: ",males)
print("Percentage: ",males/total_w_diabetes*100,"%")

Number of males with diabetes:  1618
Percentage:  48.22652757078987 %


In [16]:
females = len(diabetes.loc[diabetes['SEX_A'] == 2])
print("Number of females with diabetes: ",females)
print("Percentage: ",females/total_w_diabetes*100,"%")

Number of females with diabetes:  1737
Percentage:  51.77347242921013 %
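The two cells above repeat the same count-and-divide pattern for each code. `value_counts` can produce both columns at once. This sketch uses a hypothetical `sex_breakdown` helper and the SEX_A codes already used above (1 = male, 2 = female). Because `value_counts` skips NaN, any other SEX_A code would drop out of the percentage denominator. The cells above divide by `total_w_diabetes` instead, but here 1618 + 1737 = 3355, so the two approaches agree.

```python
import pandas as pd

# SEX_A codes as used in the cells above: 1 = male, 2 = female
SEX_LABELS = {1: "Male", 2: "Female"}


def sex_breakdown(df):
    """Counts and percentages by SEX_A in one table (other codes are dropped)."""
    sexes = df["SEX_A"].map(SEX_LABELS)
    return pd.DataFrame({
        "count": sexes.value_counts(),
        "percent": sexes.value_counts(normalize=True) * 100,
    })
```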


### Education levels:

In [17]:
did_not_grad = diabetes['EDUC_A'].isin([0, 1, 2]).sum()  # EDUC_A codes 0-2
print("Number with diabetes that did not graduate from high school: ",did_not_grad)
print("Percentage: ",did_not_grad/total_w_diabetes *100,"%")

Number with diabetes that did not graduate from high school:  560
Percentage:  16.691505216095383 %


In [18]:
high_school = diabetes['EDUC_A'].isin([3, 4, 5]).sum()  # EDUC_A codes 3-5
print("Number with diabetes that did graduate from high school: ",high_school)
print("Percentage: ",high_school/total_w_diabetes *100,"%")

Number with diabetes that did graduate from high school:  1562
Percentage:  46.557377049180324 %


In [19]:
college_grad = diabetes['EDUC_A'].isin([6, 7, 8]).sum()  # EDUC_A codes 6-8
print("Number with diabetes that graduated from a college program: ", college_grad)
print("Percentage: ",college_grad/total_w_diabetes *100,"%")

Number with diabetes that graduated from a college program:  924
Percentage:  27.54098360655738 %


In [20]:
advanced_degree = diabetes['EDUC_A'].isin([9, 10, 11]).sum()  # EDUC_A codes 9-11
print("Number with diabetes that received an advanced or professional degree: ", advanced_degree)
print("Percentage: ",advanced_degree/total_w_diabetes *100,"%")

Number with diabetes that received an advanced or professional degree:  283
Percentage:  8.435171385991058 %
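The four education counts add up to 560 + 1562 + 924 + 283 = 3329. That is 26 fewer than the 3355 respondents with diabetes, so the four percentages sum to about 99.2% rather than 100%. Those 26 presumably have an EDUC_A value outside 0-11, such as a non-response code. The sketch below builds the same grouping with `pd.cut` and keeps `len(df)` as the denominator, like the cells above. Any gap then shows up in a single table. `education_breakdown` and the group labels are hypothetical.

```python
import pandas as pd

# EDUC_A bins matching the cells above: (-1, 2] -> codes 0-2, (2, 5] -> 3-5,
# (5, 8] -> 6-8, (8, 11] -> 9-11. Anything outside 0-11 falls in no bin (NaN).
EDUC_BINS = [-1, 2, 5, 8, 11]
EDUC_LABELS = [
    "Did not graduate high school",
    "High school graduate",
    "College program graduate",
    "Advanced/professional degree",
]


def education_breakdown(df):
    """Count and percentage per education group, out of all rows in df
    (the same denominator as total_w_diabetes above)."""
    groups = pd.cut(df["EDUC_A"], bins=EDUC_BINS, labels=EDUC_LABELS)
    counts = groups.value_counts().reindex(EDUC_LABELS, fill_value=0)
    return pd.DataFrame({"count": counts, "percent": counts / len(df) * 100})
```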


### Comorbidity

In [21]:
high_bp = len(diabetes.loc[diabetes['HYPEV_A'] == 1])
print("Number with Diabetes AND High Blood Pressure: ",high_bp )
print("Percentage with both: ",high_bp/total_w_diabetes*100,"%")

Number with Diabetes AND High Blood Pressure:  2522
Percentage with both:  75.17138599105813 %


In [22]:
high_chol = len(diabetes.loc[diabetes['CHLEV_A'] == 1])
print("Number with Diabetes AND High Cholesterol: ",high_chol )
print("Percentage with both: ",high_chol/total_w_diabetes*100,"%")

Number with Diabetes AND High Cholesterol:  2129
Percentage with both:  63.4575260804769 %


In [23]:
heart_dis = len(diabetes.loc[diabetes['CHDEV_A'] == 1])
print("Number with Diabetes AND Coronary Heart Disease: ",heart_dis )
print("Percentage with both: ",heart_dis/total_w_diabetes*100,"%")

Number with Diabetes AND Coronary Heart Disease:  603
Percentage with both:  17.97317436661699 %


In [24]:
any_cancer = len(diabetes.loc[diabetes['CANEV_A'] == 1])
print("Number with Diabetes AND Cancer: ",any_cancer )
print("Percentage with both: ",any_cancer/total_w_diabetes*100,"%")

Number with Diabetes AND Cancer:  599
Percentage with both:  17.853949329359164 %
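The four comorbidity cells differ only in the variable name. The sketch below computes all of them together. Like the cells above, it assumes that 1 means "Yes" for HYPEV_A, CHLEV_A, CHDEV_A, and CANEV_A. `comorbidity_table` is a hypothetical helper. `eq(1)` treats blanks and other codes as "not Yes", so the percentages use the same denominator as above.

```python
import pandas as pd

# Comorbidity variables from the cells above (1 = "Yes")
COMORBIDITIES = {
    "HYPEV_A": "High blood pressure",
    "CHLEV_A": "High cholesterol",
    "CHDEV_A": "Coronary heart disease",
    "CANEV_A": "Cancer",
}


def comorbidity_table(df):
    """Count and percentage of rows answering 1 for each comorbidity variable."""
    has = df[list(COMORBIDITIES)].eq(1)
    table = pd.DataFrame({"count": has.sum(), "percent": has.mean() * 100})
    return table.rename(index=COMORBIDITIES)
```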


### Diabetes Treatment Information

In [25]:
pills = len(diabetes.loc[diabetes['DIBPILL_A'] == 1])
print("Number taking pills for treatment: ",pills)
print("Percentage: ",pills/total_w_diabetes*100,"%")

Number taking pills for treatment:  2456
Percentage:  73.20417287630401 %


In [26]:
insulin = len(diabetes.loc[diabetes['DIBINS_A'] == 1])
print("Number taking insulin for treatment: ",insulin)
print("Percentage: ", insulin/total_w_diabetes*100,"%")

Number taking insulin for treatment:  1069
Percentage:  31.862891207153503 %


In [27]:
no_insulin = diabetes.loc[diabetes['DIBINS_A'] == 2]
no_treat = len(no_insulin.loc[no_insulin['DIBPILL_A'] == 2])
print("Number NOT taking pills OR insulin for treatment: ",no_treat)
print("Percentage NOT taking treatment: ",no_treat/total_w_diabetes*100,"%")

Number NOT taking pills OR insulin for treatment:  409
Percentage NOT taking treatment:  12.190760059612519 %
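Pill users (2456) plus insulin users (1069) come to 3525, which is more than the 3355 respondents with diabetes. The 409 untreated respondents belong to neither group. So at least 2456 + 1069 - (3355 - 409) = 579 people must report taking both. A cross-tabulation shows the overlap directly. This sketch uses a hypothetical `treatment_crosstab` helper and labels only the 1 = Yes / 2 = No answers used above, so rows with other codes are left out.

```python
import pandas as pd

# Only the 1 = Yes / 2 = No answers are labelled; other codes become NaN,
# and crosstab leaves those rows out.
YES_NO = {1: "Yes", 2: "No"}


def treatment_crosstab(df):
    """Pills (rows) by insulin (columns), with row and column totals."""
    return pd.crosstab(
        df["DIBPILL_A"].map(YES_NO).rename("Pills"),
        df["DIBINS_A"].map(YES_NO).rename("Insulin"),
        margins=True,
    )
```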


### Looking at treatment and education level

In [28]:
no_treat = no_insulin.loc[no_insulin['DIBPILL_A'] == 2]  # neither insulin nor pills
no_treat_total = len(no_treat)
no_treat_no_degree = no_treat['EDUC_A'].isin([0, 1, 2]).sum()  # EDUC_A 0-2: no high school diploma
print("Number with diabetes that do not take treatment and did not graduate high school: ",no_treat_no_degree)
print("Percentage: ",no_treat_no_degree/no_treat_total*100,"%")

Number with diabetes that do not take treatment and did not graduate high school:  66
Percentage:  16.13691931540342 %


In [29]:
no_treat_hs_degree = no_treat['EDUC_A'].isin([3, 4, 5]).sum()  # EDUC_A 3-5: high school graduate
print("Number with diabetes that do not take treatment and graduated high school: ",no_treat_hs_degree)
print("Percentage: ",no_treat_hs_degree/no_treat_total*100,"%")

Number with diabetes that do not take treatment and graduated high school:  168
Percentage:  41.075794621026894 %
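The two education cells cover 66 + 168 = 234 of the 409 untreated respondents. The other 175 are in the college or advanced-degree groups or have no valid EDUC_A answer. Comparing the education mix of untreated and treated respondents side by side puts these shares in context. The sketch below assumes "treated" means DIBPILL_A == 1 or DIBINS_A == 1, and it reuses the notebook's EDUC_A groupings. `education_by_treatment` is hypothetical. Unlike the cells above, it drops respondents outside EDUC_A 0-11 from the denominator.

```python
import pandas as pd

EDUC_BINS = [-1, 2, 5, 8, 11]  # codes 0-2, 3-5, 6-8, 9-11 as in the cells above
EDUC_LABELS = ["No HS diploma", "HS graduate", "College program", "Advanced degree"]


def education_by_treatment(df):
    """Percent of each treatment group falling in each education group.

    Untreated = DIBPILL_A == 2 and DIBINS_A == 2 (as in In [27]);
    treated = pills or insulin (either == 1). Rows that are neither, or whose
    EDUC_A is outside 0-11, are dropped by crosstab.
    """
    educ = pd.cut(df["EDUC_A"], bins=EDUC_BINS, labels=EDUC_LABELS)
    untreated = (df["DIBPILL_A"] == 2) & (df["DIBINS_A"] == 2)
    treated = (df["DIBPILL_A"] == 1) | (df["DIBINS_A"] == 1)
    # "Untreated" where untreated, NaN elsewhere, then "Treated" where treated
    status = pd.Series("Untreated", index=df.index).where(untreated).mask(treated, "Treated")
    return pd.crosstab(educ, status.rename("Treatment"), normalize="columns") * 100
```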


### Insurance Data 

In [34]:
insurance = len(diabetes.loc[diabetes['NOTCOV_A'] == 2])
no_insurance = len(diabetes.loc[diabetes['NOTCOV_A'] == 1])
print("Number with diabetes with health insurance coverage: ", insurance)
print("Number with diabetes without health insurance coverage: ", no_insurance )


Number with diabetes with health insurance coverage:  3189
Number with diabetes without health insurance coverage:  158


In [41]:
percentage_insurance = insurance/total_w_diabetes*100
print("Percentage with insurance:", percentage_insurance, "%")

percentage_no_insurance = no_insurance/total_w_diabetes*100
print("Percentage without insurance:", percentage_no_insurance, "%")

Percentage with insurance: 95.0521609538003 %
Percentage without insurance: 4.709388971684054 %
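Adding 3189 covered and 158 uncovered respondents gives 3347, so 8 respondents with diabetes gave some other NOTCOV_A answer. As a result, the two percentages above sum to about 99.76%. Dividing by the number of valid answers instead makes the shares add to 100%, though which denominator to report is a judgment call. The sketch below uses a hypothetical `coverage_shares` helper and the codes above (1 = not covered, 2 = covered).

```python
import pandas as pd


def coverage_shares(df):
    """Insurance shares among respondents with a valid NOTCOV_A answer.

    Follows the codes used above: 1 = not covered, 2 = covered. Any other
    value is left out of the denominator.
    """
    answers = df.loc[df["NOTCOV_A"].isin([1, 2]), "NOTCOV_A"]
    covered_pct = (answers == 2).mean() * 100
    return {"n_valid": len(answers), "covered_pct": covered_pct, "uncovered_pct": 100 - covered_pct}
```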


The insurance section also records coverage type (private, Medicare, Medicaid, and other), and its questions are split at age 65.

It includes many questions about Medicare coverage for adults 65 and older, starting on page 175 of the codebook.

In [49]:
avg_income = diabetes['FAMINCTC_A'].mean()  # mean family income (FAMINCTC_A)
print(avg_income)

53204.67421758569


Income variables start on page 501 of the codebook.
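Income distributions are usually right-skewed, so a handful of high incomes can pull the mean well above the income of a typical family. The median is less sensitive to those values. The sketch below reports the mean, median, and weighted mean side by side. It again assumes that WTFA_A is the sample-adult final annual weight. `income_summary` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def income_summary(df, income="FAMINCTC_A", weight="WTFA_A"):
    """Mean, median and survey-weighted mean of family income.

    Assumes `weight` is the sample-adult final annual weight (WTFA_A).
    """
    inc = df[income]
    return pd.Series({
        "mean": inc.mean(),
        "median": inc.median(),
        "weighted_mean": np.average(inc, weights=df[weight]),
    })
```

Calling `income_summary(diabetes)` would place the unweighted mean computed above next to its median and weighted counterparts.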