# Project Title: Seasonal Variation in Aging-Associated Health Measures: Alzheimer's and Mental Health Patterns

# **Introduction**
Aging populations face a range of health challenges, notably the growing prevalence of neurodegenerative diseases such as Alzheimer's and rising mental health concerns. Research suggests that these conditions may vary seasonally, influenced by environmental and temporal factors. This project analyzes seasonal patterns in aging-associated health metrics, focusing on Alzheimer's disease and mental health trends across years and locations. Using data visualization, it explores how location and time period contribute to variation in cognitive decline and mental health conditions, with the aim of clarifying the broader public health impact of aging-related conditions.

# **Objective**
This project aims to analyze seasonal and yearly patterns in aging-associated health metrics, focusing on Alzheimer's disease prevalence and mental health concerns such as frequent mental distress. Using data stratified by age, gender, race, and geographic location, the analysis looks for trends and correlations among these variables. The goal is to uncover patterns or disparities that may help policymakers and healthcare professionals target interventions more effectively.

# Importing Required Libraries

In [91]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Data Collection

In [92]:
df=pd.read_csv('/content/drive/MyDrive/Alzheimer_s_Disease_and_Healthy_Aging_Data (1).csv')

In [93]:
df

Unnamed: 0,RowId,YearStart,YearEnd,LocationAbbr,LocationDesc,Datasource,Class,Topic,Question,Data_Value_Unit,...,Stratification2,Geolocation,ClassID,TopicID,QuestionID,LocationID,StratificationCategoryID1,StratificationID1,StratificationCategoryID2,StratificationID2
0,BRFSS~2022~2022~42~Q03~TMC01~AGE~RACE,2022,2022,PA,Pennsylvania,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,...,Native Am/Alaskan Native,POINT (-77.86070029 40.79373015),C05,TMC01,Q03,42,AGE,5064,RACE,NAA
1,BRFSS~2022~2022~46~Q03~TMC01~AGE~RACE,2022,2022,SD,South Dakota,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,...,Asian/Pacific Islander,POINT (-100.3735306 44.35313005),C05,TMC01,Q03,46,AGE,65PLUS,RACE,ASN
2,BRFSS~2022~2022~16~Q03~TMC01~AGE~RACE,2022,2022,ID,Idaho,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,...,"Black, non-Hispanic",POINT (-114.36373 43.68263001),C05,TMC01,Q03,16,AGE,65PLUS,RACE,BLK
3,BRFSS~2022~2022~24~Q03~TMC01~AGE~RACE,2022,2022,MD,Maryland,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,...,"Black, non-Hispanic",POINT (-76.60926011 39.29058096),C05,TMC01,Q03,24,AGE,65PLUS,RACE,BLK
4,BRFSS~2022~2022~55~Q03~TMC01~AGE~GENDER,2022,2022,WI,Wisconsin,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,...,Male,POINT (-89.81637074 44.39319117),C05,TMC01,Q03,55,AGE,65PLUS,GENDER,MALE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
284137,BRFSS~2016~2016~55~Q15~TSC02~AGE~RACE,2016,2016,WI,Wisconsin,BRFSS,Screenings and Vaccines,Colorectal cancer screening,Percentage of older adults who had either a ho...,%,...,"Black, non-Hispanic",POINT (-89.81637074 44.39319117),C03,TSC02,Q15,55,AGE,AGE_OVERALL,RACE,BLK
284138,BRFSS~2017~2017~56~Q45~TOC13~AGE~RACE,2017,2017,WY,Wyoming,BRFSS,Overall Health,Fair or poor health among older adults with ar...,Fair or poor health among older adults with do...,%,...,Hispanic,POINT (-108.1098304 43.23554134),C01,TOC13,Q45,56,AGE,5064,RACE,HIS
284139,BRFSS~2015~2015~56~Q42~TCC04~AGE~RACE,2015,2015,WY,Wyoming,BRFSS,Cognitive Decline,Talked with health care professional about sub...,Percentage of older adults with subjective cog...,%,...,Asian/Pacific Islander,POINT (-108.1098304 43.23554134),C06,TCC04,Q42,56,AGE,AGE_OVERALL,RACE,ASN
284140,BRFSS~2019~2019~54~Q46~TOC10~AGE~RACE,2019,2019,WV,West Virginia,BRFSS,Overall Health,"Disability status, including sensory or mobili...",Percentage of older adults who report having a...,%,...,Hispanic,POINT (-80.71264013 38.6655102),C01,TOC10,Q46,54,AGE,65PLUS,RACE,HIS


In [94]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284142 entries, 0 to 284141
Data columns (total 31 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   RowId                       284142 non-null  object 
 1   YearStart                   284142 non-null  int64  
 2   YearEnd                     284142 non-null  int64  
 3   LocationAbbr                284142 non-null  object 
 4   LocationDesc                284142 non-null  object 
 5   Datasource                  284142 non-null  object 
 6   Class                       284142 non-null  object 
 7   Topic                       284142 non-null  object 
 8   Question                    284142 non-null  object 
 9   Data_Value_Unit             284142 non-null  object 
 10  DataValueTypeID             284142 non-null  object 
 11  Data_Value_Type             284142 non-null  object 
 12  Data_Value                  192808 non-null  float64
 13  Data_Value_Alt

In [95]:
df.shape

(284142, 31)

In [96]:
df.describe()

Unnamed: 0,YearStart,YearEnd,Data_Value,Data_Value_Alt,Low_Confidence_Limit,High_Confidence_Limit,LocationID
count,284142.0,284142.0,192808.0,192808.0,192597.0,192597.0,284142.0
mean,2018.596065,2018.657735,37.676757,37.676757,33.027824,42.595333,800.322677
std,2.302815,2.360105,25.213484,25.213484,24.290016,26.156408,2511.564977
min,2015.0,2015.0,0.0,0.0,-0.7,1.3,1.0
25%,2017.0,2017.0,15.9,15.9,12.6,19.7,19.0
50%,2019.0,2019.0,32.8,32.8,27.0,38.9,34.0
75%,2021.0,2021.0,56.9,56.9,49.4,64.6,49.0
max,2022.0,2022.0,100.0,100.0,99.6,100.0,9004.0


# Data Preprocessing

# Data Cleaning

In [97]:
df.isnull().sum()

Unnamed: 0,0
RowId,0
YearStart,0
YearEnd,0
LocationAbbr,0
LocationDesc,0
Datasource,0
Class,0
Topic,0
Question,0
Data_Value_Unit,0


In [98]:
# Drop the measurement, footnote, and confidence-limit columns in place
df.drop(columns=['Data_Value','Data_Value_Alt', 'Data_Value_Footnote_Symbol', 'Data_Value_Footnote', 'Low_Confidence_Limit','High_Confidence_Limit'],inplace=True)
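
Before dropping columns, it can help to look at how much of each one is actually missing. In the `df.describe()` output above, `Data_Value` has 192808 non-null values out of 284142 rows, so roughly a third of its entries are missing. The sketch below shows one way to rank columns by missing share and drop only those above a cutoff. The toy frame and the 60% threshold are assumptions made for illustration, not values taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are invented for illustration).
toy = pd.DataFrame({
    "YearStart": [2019, 2020, 2021, 2022],
    "Data_Value": [12.5, np.nan, 30.1, np.nan],
    "Data_Value_Footnote": [np.nan, "suppressed", np.nan, np.nan],
})

# Fraction of missing entries per column, highest first.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)

# Hypothetical threshold: drop only columns missing in more than 60% of rows.
threshold = 0.6
to_drop = missing_share[missing_share > threshold].index.tolist()
reduced = toy.drop(columns=to_drop)
```

With a cutoff like this, a partially populated measurement column would be kept and its missing rows handled separately, which may matter if the values are needed for later trend analysis.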

In [99]:
df.isnull().sum()

Unnamed: 0,0
RowId,0
YearStart,0
YearEnd,0
LocationAbbr,0
LocationDesc,0
Datasource,0
Class,0
Topic,0
Question,0
Data_Value_Unit,0


In [100]:
# Fill missing values in each of these columns with that column's most frequent value (mode)
df[['StratificationCategory2', 'Stratification2', 'Geolocation']] = df[['StratificationCategory2', 'Stratification2', 'Geolocation']].apply(lambda x: x.fillna(x.mode()[0]))
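
The mode imputation above can be illustrated on a small example. This is a minimal sketch with invented values. It computes the modes once with `DataFrame.mode()` and passes them to `fillna`, which should have the same effect as the `apply`/`lambda` form for these columns.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; column names mirror two of the imputed columns.
toy = pd.DataFrame({
    "Stratification2": ["Male", np.nan, "Female", "Male"],
    "Geolocation": [
        "POINT (-89.8 44.4)",
        np.nan,
        "POINT (-89.8 44.4)",
        "POINT (-77.9 40.8)",
    ],
})

cols = ["Stratification2", "Geolocation"]

# mode() returns one row per tied mode; the first row holds the top value per column.
modes = toy[cols].mode().iloc[0]

# fillna with a Series fills each column using the entry that matches its name.
filled = toy.fillna(modes)
print(filled)
```

One caveat worth keeping in mind: every row with a missing `Geolocation` receives the same coordinates, so location-based summaries built on the imputed column could be skewed toward whichever location is most frequent.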


In [101]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284142 entries, 0 to 284141
Data columns (total 25 columns):
 #   Column                     Non-Null Count   Dtype 
---  ------                     --------------   ----- 
 0   RowId                      284142 non-null  object
 1   YearStart                  284142 non-null  int64 
 2   YearEnd                    284142 non-null  int64 
 3   LocationAbbr               284142 non-null  object
 4   LocationDesc               284142 non-null  object
 5   Datasource                 284142 non-null  object
 6   Class                      284142 non-null  object
 7   Topic                      284142 non-null  object
 8   Question                   284142 non-null  object
 9   Data_Value_Unit            284142 non-null  object
 10  DataValueTypeID            284142 non-null  object
 11  Data_Value_Type            284142 non-null  object
 12  StratificationCategory1    284142 non-null  object
 13  Stratification1            284142 non-null  

In [102]:
df.isnull().sum()

Unnamed: 0,0
RowId,0
YearStart,0
YearEnd,0
LocationAbbr,0
LocationDesc,0
Datasource,0
Class,0
Topic,0
Question,0
Data_Value_Unit,0


In [103]:
df.duplicated().sum()

0

# Dividing the dataset by data type

In [104]:
def divide_dataset_dtype(df):
    """Split df into numeric (int64/float64) and categorical (object) column subsets."""
    numerical_df = df.select_dtypes(include=['int64', 'float64'])
    categorical_df = df.select_dtypes(include=['object'])
    return numerical_df, categorical_df

numerical_df, categorical_df = divide_dataset_dtype(df)
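
Listing specific dtypes works for this dataset, but it would skip columns stored as, say, `int32` or `float32`. As a hedged alternative, `select_dtypes` also accepts the generic `"number"` selector, and `exclude` gives the complementary set. The toy frame below is invented for illustration.

```python
import pandas as pd

# Toy frame with invented values; column names follow the dataset's schema.
toy = pd.DataFrame({
    "YearStart": [2021, 2022],
    "LocationID": [42, 46],
    "Topic": ["Frequent mental distress", "Frequent mental distress"],
})

# "number" matches every numeric dtype, not only int64 and float64.
numerical = toy.select_dtypes(include="number")

# Everything that is not numeric (here, the text column).
categorical = toy.select_dtypes(exclude="number")

print(list(numerical.columns))
print(list(categorical.columns))
```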

In [105]:
numerical_df.head()

Unnamed: 0,YearStart,YearEnd,LocationID
0,2022,2022,42
1,2022,2022,46
2,2022,2022,16
3,2022,2022,24
4,2022,2022,55


In [112]:
numerical_df.shape

(284142, 3)

In [106]:
numerical_df.columns

Index(['YearStart', 'YearEnd', 'LocationID'], dtype='object')

In [107]:
categorical_df.head()

Unnamed: 0,RowId,LocationAbbr,LocationDesc,Datasource,Class,Topic,Question,Data_Value_Unit,DataValueTypeID,Data_Value_Type,...,StratificationCategory2,Stratification2,Geolocation,ClassID,TopicID,QuestionID,StratificationCategoryID1,StratificationID1,StratificationCategoryID2,StratificationID2
0,BRFSS~2022~2022~42~Q03~TMC01~AGE~RACE,PA,Pennsylvania,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,PRCTG,Percentage,...,Race/Ethnicity,Native Am/Alaskan Native,POINT (-77.86070029 40.79373015),C05,TMC01,Q03,AGE,5064,RACE,NAA
1,BRFSS~2022~2022~46~Q03~TMC01~AGE~RACE,SD,South Dakota,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,PRCTG,Percentage,...,Race/Ethnicity,Asian/Pacific Islander,POINT (-100.3735306 44.35313005),C05,TMC01,Q03,AGE,65PLUS,RACE,ASN
2,BRFSS~2022~2022~16~Q03~TMC01~AGE~RACE,ID,Idaho,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,PRCTG,Percentage,...,Race/Ethnicity,"Black, non-Hispanic",POINT (-114.36373 43.68263001),C05,TMC01,Q03,AGE,65PLUS,RACE,BLK
3,BRFSS~2022~2022~24~Q03~TMC01~AGE~RACE,MD,Maryland,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,PRCTG,Percentage,...,Race/Ethnicity,"Black, non-Hispanic",POINT (-76.60926011 39.29058096),C05,TMC01,Q03,AGE,65PLUS,RACE,BLK
4,BRFSS~2022~2022~55~Q03~TMC01~AGE~GENDER,WI,Wisconsin,BRFSS,Mental Health,Frequent mental distress,Percentage of older adults who are experiencin...,%,PRCTG,Percentage,...,Gender,Male,POINT (-89.81637074 44.39319117),C05,TMC01,Q03,AGE,65PLUS,GENDER,MALE


In [108]:
categorical_df.columns

Index(['RowId', 'LocationAbbr', 'LocationDesc', 'Datasource', 'Class', 'Topic',
       'Question', 'Data_Value_Unit', 'DataValueTypeID', 'Data_Value_Type',
       'StratificationCategory1', 'Stratification1', 'StratificationCategory2',
       'Stratification2', 'Geolocation', 'ClassID', 'TopicID', 'QuestionID',
       'StratificationCategoryID1', 'StratificationID1',
       'StratificationCategoryID2', 'StratificationID2'],
      dtype='object')

In [118]:
# to_csv returns None when given a file path, so keep the cleaned frame separately
df.to_csv('cleaned_data.csv', index=False)
cleaned_df = df.copy()

In [None]:
cleaned_df
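
Because `DataFrame.to_csv` returns `None` when it is given a file path, a simple way to confirm what was written is to read the file back. The sketch below does this on a small invented frame, writing to a temporary directory so that it does not touch `cleaned_data.csv`.

```python
import os
import tempfile

import pandas as pd

# Toy frame with invented values standing in for the cleaned dataset.
toy = pd.DataFrame({"LocationAbbr": ["PA", "SD"], "YearStart": [2022, 2022]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cleaned_data.csv")

    # Writing to a path returns None; the data lives in the file, not the return value.
    result = toy.to_csv(path, index=False)

    # Read the file back to check that rows and columns survived the round trip.
    reloaded = pd.read_csv(path)

print(result)
print(reloaded.shape)
```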