# Prevalence of Diabetes Among Adults in CA

In this analysis notebook, I will look specifically into differences in prevalence of diabetes across the age groups in the data set.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import math

In [2]:
# Read in the data set:

diabetes_df = pd.read_csv('../data/Cleaned/diabetes_CLEANED.csv')

In [3]:
# Checking the data type:

diabetes_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 147 entries, 0 to 146
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Geography       147 non-null    object 
 1   year            147 non-null    int64  
 2   category        147 non-null    object 
 3   category_name   147 non-null    object 
 4   percent         147 non-null    float64
 5   lower_cl        147 non-null    float64
 6   upper_cl        147 non-null    float64
 7   Standard Error  147 non-null    float64
dtypes: float64(4), int64(1), object(3)
memory usage: 9.3+ KB


In [4]:
# Looking at the columns we have:

diabetes_df.columns

Index(['Geography', 'year', 'category', 'category_name', 'percent', 'lower_cl',
       'upper_cl', 'Standard Error'],
      dtype='object')

### Observations:

* The age groups are the same as those in the original adult depression rates!
* I know I want to look at differences among age groups, but before that I will quickly organize the data and see if I can glean any interesting info from looking at the data this way:

In [5]:
# Using a groupby function to organize the data by year, category & name, percent:

diabetes_df.groupby(['year','category','category_name'])['percent'].mean().to_frame()

year,category,category_name,percent
2012,Age,18 to 34 years,1.8
2012,Age,35 to 44 years,6.2
2012,Age,45 to 54 years,9.7
2012,Age,55 to 64 years,17.9
2012,Age,65 years and above,20.1
...,...,...,...
2018,Race-Ethnicity,Hispanic,12.1
2018,Race-Ethnicity,White,8.4
2018,Sex,Female,10.3
2018,Sex,Male,10.5


### Observations:

We have data from 2012-2018! Just looking at the very first year, I can tell this factor will be one I include in my data story.

The two oldest age groups have a much higher prevalence of diabetes, wow. In 2012 they're both in double digits (17.9% for 55 to 64 and 20.1% for 65 and above), while every younger group is under 10%. Let's dive deeper into differences among age groups with an 'Age' filter.

In [6]:
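# Creating an age_filter to keep only the rows broken down by age:

age_filter = diabetes_df['category'] == 'Age'

age_diabetes_df = diabetes_df[age_filter]

# Pivot to one row per year and one column per age group.
# Each (year, age group) pair has a single row, so mean() returns that value unchanged.
age_grouped = age_diabetes_df.groupby(['year', 'category_name'])['percent'].mean().unstack()

age_grouped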

year,18 to 34 years,35 to 44 years,45 to 54 years,55 to 64 years,65 years and above
2012,1.8,6.2,9.7,17.9,20.1
2013,1.5,5.1,12.0,18.3,20.7
2014,1.6,5.4,10.2,18.2,21.8
2015,1.6,5.5,10.0,17.3,20.5
2016,1.5,4.3,10.9,17.3,24.2
2017,1.2,3.7,8.4,15.9,21.3
2018,4.0,5.1,11.6,17.7,22.4


Yup, it definitely looks like diabetes prevalence increases with age. However, it's interesting to see that the 18-34 age group ends the period higher than it started (1.8% in 2012, 4.0% in 2018). That rise comes entirely from the 2018 value, though: from 2012 to 2017 the rate actually drifted down from 1.8% to 1.2%.

The 55-64 age group didn't experience an overall increase (17.9% in 2012, 17.7% in 2018), but its prevalence is still much higher than in the three younger age ranges.
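
To make the start-to-end comparison explicit for every age group, here is a minimal sketch. It rebuilds the table by hand from the output above so it runs on its own, and the column labels `2012_to_2017` and `2012_to_2018` are names I made up for illustration. It shows the percentage-point change with and without the 2018 data point, since that single year seems to drive the 18-34 result.

```python
import pandas as pd

# Prevalence (%) by year and age group, copied from the age_grouped output above.
age_grouped = pd.DataFrame(
    {
        "18 to 34 years": [1.8, 1.5, 1.6, 1.6, 1.5, 1.2, 4.0],
        "35 to 44 years": [6.2, 5.1, 5.4, 5.5, 4.3, 3.7, 5.1],
        "45 to 54 years": [9.7, 12.0, 10.2, 10.0, 10.9, 8.4, 11.6],
        "55 to 64 years": [17.9, 18.3, 18.2, 17.3, 17.3, 15.9, 17.7],
        "65 years and above": [20.1, 20.7, 21.8, 20.5, 24.2, 21.3, 22.4],
    },
    index=pd.Index(range(2012, 2019), name="year"),
)

# Percentage-point change from 2012 to each end year, one row per age group.
change = pd.DataFrame(
    {
        "2012_to_2017": age_grouped.loc[2017] - age_grouped.loc[2012],
        "2012_to_2018": age_grouped.loc[2018] - age_grouped.loc[2012],
    }
).round(1)

print(change)
```

Comparing the two columns side by side is a quick way to see which conclusions depend on the last year alone.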

In [7]:
# Looking at the mean prevalence for each age group across all years:

age_means = age_grouped.mean(axis=0)

age_means

category_name
18 to 34 years         1.885714
35 to 44 years         5.042857
45 to 54 years        10.400000
55 to 64 years        17.514286
65 years and above    21.571429
dtype: float64

### Observations:

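* Mean diabetes prevalence rises with each age group, from about 1.9% (18 to 34) to about 21.6% (65 and above)
* Diabetes could be another factor in mental health, alongside other health conditions that are more likely to affect older individuals
* Living with diabetes may act as another potential stress factor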

In [8]:
# Only highlighting the 55 to 64 year age group:

col_use = ['55 to 64 years']
diabetes = age_grouped[col_use]
diabetes

year,55 to 64 years
2012,17.9
2013,18.3
2014,18.2
2015,17.3
2016,17.3
2017,15.9
2018,17.7
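
The first cell imports matplotlib, so one way to view these trends is a simple line chart. This is only a sketch: it rebuilds the table by hand from the `age_grouped` output so it runs on its own, and the figure size, labels, and thicker line for the 55 to 64 group are my own styling choices. In a notebook the figure should render at the end of the cell.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Prevalence (%) by year and age group, copied from the age_grouped output above.
age_grouped = pd.DataFrame(
    {
        "18 to 34 years": [1.8, 1.5, 1.6, 1.6, 1.5, 1.2, 4.0],
        "35 to 44 years": [6.2, 5.1, 5.4, 5.5, 4.3, 3.7, 5.1],
        "45 to 54 years": [9.7, 12.0, 10.2, 10.0, 10.9, 8.4, 11.6],
        "55 to 64 years": [17.9, 18.3, 18.2, 17.3, 17.3, 15.9, 17.7],
        "65 years and above": [20.1, 20.7, 21.8, 20.5, 24.2, 21.3, 22.4],
    },
    index=pd.Index(range(2012, 2019), name="year"),
)

fig, ax = plt.subplots(figsize=(8, 5))

# One line per age group; the 55 to 64 group is drawn thicker so it stands out.
for group in age_grouped.columns:
    highlighted = group == "55 to 64 years"
    ax.plot(
        age_grouped.index,
        age_grouped[group],
        marker="o",
        linewidth=3 if highlighted else 1.5,
        label=group,
    )

ax.set_xlabel("Year")
ax.set_ylabel("Adults with diabetes (%)")
ax.set_title("Diabetes prevalence among CA adults by age group")
ax.legend(title="Age group")
fig.tight_layout()
```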


With that, I have gained more insight into the prevalence of diabetes by age group. This is going to be another factor I can point to, besides adverse childhood experiences, that connects back to mental health. Overall, we saw that the older age groups report higher prevalence of diabetes. For the 18-34 year-olds, rates are low, but they rose from 1.8% in 2012 to 4.0% in 2018, with the whole increase coming in the final year.

For the 55-64 age range, we see consistently high percentages, but no increase overall (17.9% in 2012 versus 17.7% in 2018). I will discuss this in my final data story, but I'm intrigued about this: maybe people in these older age groups have the time and money to invest in learning more about their mental and physical health? They can afford to take the time to meet with health professionals and then put effort into their minds and bodies for a healthier future.

I'm excited to talk about this and point to other factors in my data story. This notebook has been very helpful for me to characterize the diabetes trends among age groups and understand the 55-64 year-olds better.

Thank you for looking through this, and let's move on to the next analysis notebook!