# Analysis of Biological Resilience in the Aging Population
**Author**: Dr. Jose Alvarez Bustamante, PhD
**Field**: Applied Data Science & Biophysics
**Target Sector**: Silver Economy / Health-Tech / Insurance

---

## 1. Project Overview and Business Value

As the global population shifts toward an older demographic, giving rise to what is often called the **"Silver Economy"**, understanding aging beyond chronological years becomes vital.

This project goes beyond simple data visualization. It aims to model **Homeostatic Stability** (resilience) using biomarkers. For healthcare providers and insurance companies, identifying individuals with high biological resilience supports better risk assessment and personalized intervention strategies.

## 2. Research Question

*Can we identify a "Biological Age" signature that predicts health stability better than chronological age alone using NHANES data?*
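One way to make this question testable is to compare a model that uses chronological age alone against one that also includes a biomarker. The following is a minimal sketch under stated assumptions: the data are synthetic stand-ins (not NHANES), the helper `r_squared` is a hypothetical name, and ordinary least squares is used only for illustration, not as the project's chosen model.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return in-sample R^2."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT NHANES): the outcome depends on age and one biomarker
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(60, 80, n)
biomarker = rng.normal(0.0, 1.0, n)
outcome = 0.05 * age + 0.8 * biomarker + rng.normal(0.0, 1.0, n)

# Nested comparison: age only vs. age plus biomarker
r2_age = r_squared(age.reshape(-1, 1), outcome)
r2_full = r_squared(np.column_stack([age, biomarker]), outcome)
print(f"R^2, age only: {r2_age:.3f}")
print(f"R^2, age + biomarker: {r2_full:.3f}")
```

In-sample R² can never decrease when a predictor is added, so a fair comparison on real data would rely on held-out data or cross-validation rather than on this number alone.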

## 3. Methodology

We apply a **Biophysical approach** to data science, treating the human body as a complex system. We will use:

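* **NHANES 2017-2018 Dataset**: The U.S. National Health and Nutrition Examination Survey, a nationally representative survey that combines interviews, physical examinations, and laboratory tests.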
* **Feature Engineering**: Creating composite indices for Cardiovascular and Metabolic health.
* **Statistical Modeling**: Correlation analysis and predictive modeling of health outcomes.
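As an illustration of the feature-engineering step, a composite index can be built by standardizing each component and averaging the resulting z-scores. This is only a sketch: the column names `systolic_bp` and `pulse`, the helper `composite_index`, and the toy values are assumptions made for the example, not the indices this project ultimately defines.

```python
import pandas as pd

def composite_index(df, columns):
    """Average the z-scores of the given columns, row by row."""
    z = (df[columns] - df[columns].mean()) / df[columns].std(ddof=0)
    return z.mean(axis=1)  # skips missing values within each row

# Toy values for illustration only (not NHANES records)
toy = pd.DataFrame({
    "systolic_bp": [118.0, 135.0, 150.0, 127.0],
    "pulse":       [62.0, 75.0, 88.0, 70.0],
})
toy["cardio_index"] = composite_index(toy, ["systolic_bp", "pulse"])
print(toy)
```

Standardizing first puts components measured in different units on a common scale, so no single biomarker dominates the index simply because its raw values are larger.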




In [14]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from nhanes.load import load_NHANES_data

# Set aesthetics for professional plots
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = [10, 6]

# Loading the dataset
print("Loading NHANES 2017-2018 data...")
df = load_NHANES_data(year='2017-2018')

# Filtering for our target demographic (Silver Economy: Adults 60+)
silver_df = df[df['AgeInYearsAtScreening'] >= 60].copy()

print(f"Dataset loaded. Total participants aged 60 and over: {len(silver_df)}")
silver_df.head()

Loading NHANES 2017-2018 data...
Dataset loaded. Total participants aged 60 and over: 2018


| SEQN | GeneralHealthCondition | EverBreastfedOrFedBreastmilk | AgeStoppedBreastfeedingdays | AgeFirstFedFormuladays | AgeStoppedReceivingFormuladays | AgeStartedOtherFoodbeverage | AgeFirstFedMilkdays | TypeOfMilkFirstFedWholeMilk | TypeOfMilkFirstFed2Milk | TypeOfMilkFirstFed1Milk | ... | DaysSmokedCigsDuringPast30Days | AvgCigarettesdayDuringPast30Days | TriedToQuitSmoking | TimesStoppedSmokingCigarettes | HowLongWereYouAbleToStopSmoking | UnitOfMeasureDayweekmonthyear_2_SMQ | CurrentSelfreportedHeightInches | CurrentSelfreportedWeightPounds | TriedToLoseWeightInPastYear | TimesLost10LbsOrMoreToLoseWeight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 93705.0 | Good |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  | 63.0 | 165.0 | 0.0 | 11 times or more |
| 93708.0 | Good |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  | 60.0 | 118.0 | 0.0 | Never |
| 93709.0 |  |  |  |  |  |  |  |  |  |  | ... | 30.0 | 5.0 | 0.0 |  |  |  | 62.0 | 200.0 | 0.0 | 6 to 10 |
| 93713.0 | Very good |  |  |  |  |  |  |  |  |  | ... | 30.0 | 15.0 | 1.0 | 3.0 | 1.0 | Days | 70.0 | 165.0 | 0.0 | Never |
| 93715.0 | Fair or |  |  |  |  |  |  |  |  |  | ... | 30.0 | 25.0 | 1.0 | 3.0 | 2.0 | Days | 68.0 | 154.0 | 0.0 | Never |


## 4. Exploratory Data Analysis (EDA) - Statistical Distribution

In this section, we examine the distribution of key biomarkers within the 60+ population.

As a physicist, I look for "outliers" and "variance" in the system, which in biological terms represent the diversity of the aging process.

We will focus on three primary indicators:

1. **BMI (Body Mass Index)**: A proxy for metabolic health.
2. **Systolic Blood Pressure**: An indicator of cardiovascular health, including arterial stiffness.
3. **Glycohemoglobin (HbA1c)**: A long-term measure of blood sugar stability.
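To make the idea of examining spread and outliers concrete, here is a small sketch that summarizes one biomarker and flags extreme values with the common 1.5 × IQR rule. The rule, the helper name `iqr_outliers`, and the toy BMI values are illustrative assumptions; the notebook itself does not state which outlier criterion it uses.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Toy BMI values for illustration only (not NHANES records)
bmi = pd.Series([22.0, 24.5, 27.0, 29.5, 31.0, 26.0, 55.0], name="bmi")

print(bmi.describe())          # count, mean, std, and quartiles
print(bmi[iqr_outliers(bmi)])  # values flagged as outliers
```

The same function could be applied to each biomarker column in turn, for example the columns listed in `target_columns`, once their names are confirmed against the dataset.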

In [15]:
# Feature engineering: average the three systolic readings into one value per participant
bp_cols = ['SystolicBloodPres1StRdgMmHg', 'SystolicBloodPres2NdRdgMmHg', 'SystolicBloodPres3RdRdgMmHg']
silver_df['SystolicBloodPressure'] = silver_df[bp_cols].mean(axis=1)  # mean() skips missing readings

target_columns = ['AgeInYearsAtScreening', 'BodyMassIndexKgm2', 'SystolicBloodPressure', 'Glycohemoglobin']

# Print the available columns to confirm these names exist in this dataset version
print("Columns available in the dataset:")
print(silver_df.columns.tolist())

# Count missing values in each key biomarker
missing_data = silver_df[target_columns].isnull().sum()
print("\nMissing values in key biomarkers:")
print(missing_data)

Columns available in the dataset:
['GeneralHealthCondition', 'EverBreastfedOrFedBreastmilk', 'AgeStoppedBreastfeedingdays', 'AgeFirstFedFormuladays', 'AgeStoppedReceivingFormuladays', 'AgeStartedOtherFoodbeverage', 'AgeFirstFedMilkdays', 'TypeOfMilkFirstFedWholeMilk', 'TypeOfMilkFirstFed2Milk', 'TypeOfMilkFirstFed1Milk', 'TypeOfMilkFirstFedFatFreeMilk', 'TypeOfMilkFirstFedSoyMilk', 'TypeOfMilkFirstFedOther', 'HowHealthyIsTheDiet', 'Past30DayMilkProductConsumption', 'YouDrinkWholeOrRegularMilk', 'YouDrink2FatMilk', 'YouDrink1FatMilk', 'YouDrinkFatFreeskimMilk', 'YouDrinkSoyMilk', 'YouDrinkAnotherTypeOfMilk', 'RegularMilkUse5TimesPerWeek', 'HowOftenDrankMilkAge512', 'HowOftenDrankMilkAge1317', 'HowOftenDrankMilkAge1835', 'CommunitygovernmentMealsDelivered', 'EatMealsAtCommunityseniorCenter', 'AttendKindergartenThruHighSchool', 'SchoolServesSchoolLunches', 'OfTimesweekGetSchoolLunch', 'SchoolLunchFreeReducedOrFullPrice', 'SchoolServeCompleteBreakfastEachDay', 'OfTimesweekGetSchoolBreakfas

KeyError: "['BodyMassIndex', 'SystolicBloodPressure'] not in index"
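
The `KeyError` above reports that `BodyMassIndex` and `SystolicBloodPressure` were not found in the index, which suggests that at least one requested name did not match a column in the loaded DataFrame. A minimal sketch of a pre-check that compares the requested names against the available ones and proposes close matches is shown below. The helper `check_columns` is hypothetical, and the short `available` list is an illustrative stand-in for `silver_df.columns.tolist()`, not a confirmed list of NHANES column names.

```python
import difflib

def check_columns(available, wanted):
    """Split wanted names into those present and those missing (with suggestions)."""
    present = [name for name in wanted if name in available]
    suggestions = {
        name: difflib.get_close_matches(name, available, n=3, cutoff=0.6)
        for name in wanted if name not in available
    }
    return present, suggestions

# Illustrative stand-in; in the notebook this would be silver_df.columns.tolist()
available = ["AgeInYearsAtScreening", "BodyMassIndexKgm2", "Glycohemoglobin"]
wanted = ["AgeInYearsAtScreening", "BodyMassIndex", "Glycohemoglobin"]

present, suggestions = check_columns(available, wanted)
print("Present:", present)
print("Missing, with close matches:", suggestions)
```

Running a check like this before indexing with `silver_df[target_columns]` turns a hard failure into a readable report of which names need correcting.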