# PopHealth Observatory Exploratory Workbook

This notebook is a sandbox for exercising and validating the `PopHealthObservatory` and `NHANESExplorer` classes.

Sections include: introspection, synthetic data generation, analytical method smoke tests, stratification, trend evaluation, visualization, edge cases, performance profiling, reproducibility, exports, quality assertions, and an end-to-end integration workflow.

> Note: Most sections of this workbook run on synthetic data generated for testing; live NHANES downloads are used only in optional sections.
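
To make the idea of synthetic test data concrete, here is a minimal sketch of a generator. The helper name `make_synthetic_participants`, the value ranges, and the rule that `education` is missing for participants under 20 are all assumptions made for illustration; only the column names are taken from the merged dataset printed later in this workbook.

```python
import numpy as np
import pandas as pd

def make_synthetic_participants(n=200, seed=42):
    """Build a small synthetic participant table (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # seeded so repeated runs give identical data
    df = pd.DataFrame({
        "participant_id": np.arange(1, n + 1),
        "gender": rng.integers(1, 3, size=n),            # assumed codes: 1 or 2
        "age_years": rng.integers(0, 81, size=n),        # assumed range: 0 to 80
        "race_ethnicity": rng.integers(1, 8, size=n),    # assumed codes: 1 to 7
        "education": rng.integers(1, 6, size=n).astype(float),  # assumed codes: 1 to 5
    })
    # Assumption: education is not recorded for participants under 20
    df.loc[df["age_years"] < 20, "education"] = np.nan
    return df

synthetic = make_synthetic_participants()
print(synthetic.head())
```

Seeding the generator keeps the smoke tests reproducible: two calls with the same `seed` return identical frames.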

## 1. Environment & Library Imports

Import core scientific, profiling, and inspection libraries plus the project modules.

In [1]:
# Core imports
import os, sys, time, math, json, inspect, gc, tempfile, hashlib, statistics
from pathlib import Path
from datetime import datetime, timedelta
from typing import List, Dict, Any, Callable, Optional

# Data & analysis libs
import numpy as np
import pandas as pd

# Visualization (optional imports guarded)
plt = sns = None  # defined up front so later cells can test availability
try:
    import matplotlib.pyplot as plt
    import seaborn as sns
except Exception as e:
    print(f"Matplotlib/Seaborn not available: {e}")

try:
    import plotly.express as px
except Exception:
    px = None

# Profiling
import cProfile, pstats

# Ensure project root on path
ROOT = Path.cwd()
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

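from datetime import timezone  # datetime.utcnow() is deprecated as of Python 3.12
print(f"Notebook started: {datetime.now(timezone.utc).replace(tzinfo=None).isoformat()}Z")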
print(f"Python: {sys.version.split()[0]} | pandas: {pd.__version__} | numpy: {np.__version__}")

Notebook started: 2025-09-14T09:47:31.999335Z
Python: 3.13.7 | pandas: 2.3.2 | numpy: 2.3.3



## 2. Load or Import Observatory Classes
Attempt to import the classes. If the import fails, both names are set to `None` and the error is printed, so later cells can check availability before using them.

In [2]:
try:
    from pophealth_observatory import PopHealthObservatory, NHANESExplorer
    print("Imported PopHealthObservatory & NHANESExplorer successfully.")
except Exception as e:
    PopHealthObservatory = None
    NHANESExplorer = None
    print(f"Import failed: {e}")

print('PopHealthObservatory available:', PopHealthObservatory is not None)
print('NHANESExplorer available:', NHANESExplorer is not None)

Imported PopHealthObservatory & NHANESExplorer successfully.
PopHealthObservatory available: True
NHANESExplorer available: True


In [5]:
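# Optional: this cell downloads live NHANES files from the CDC and needs network access
from pophealth_observatory.observatory import NHANESExplorer
exp = NHANESExplorer()
# Merge demographics, body measures (BMX), and blood pressure (BPX) for one survey cycle
merged = exp.create_merged_dataset('2017-2018')
print(merged.head())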


Creating merged dataset for 2017-2018...
Trying demographics URL: https://wwwn.cdc.gov/Nchs/Data/Nhanes/Public/2017/DataFiles/DEMO_J.xpt
✓ Success loading demographics from: https://wwwn.cdc.gov/Nchs/Data/Nhanes/Public/2017/DataFiles/DEMO_J.xpt
Trying BMX URL: https://wwwn.cdc.gov/Nchs/Data/Nhanes/Public/2017/DataFiles/BMX_J.xpt
✓ Success loading BMX from: https://wwwn.cdc.gov/Nchs/Data/Nhanes/Public/2017/DataFiles/BMX_J.xpt
Trying BPX URL: https://wwwn.cdc.gov/Nchs/Data/Nhanes/Public/2017/DataFiles/BPX_J.xpt
✓ Success loading BPX from: https://wwwn.cdc.gov/Nchs/Data/Nhanes/Public/2017/DataFiles/BPX_J.xpt
Merged dataset created with 9254 participants and 23 variables
   participant_id  gender  age_years  race_ethnicity  education  \
0         93703.0     2.0        2.0             6.0        NaN   
1         93704.0     1.0        2.0             3.0        NaN   
2         93705.0     2.0       66.0             4.0        2.0   
3         93706.0     1.0       18.0             6.0    

In [8]:

## Generate Manifest (XPT only, 2010-2022)
manifest = exp.get_detailed_component_manifest(
    as_dataframe=True,
    year_range=("2010","2022"),
    file_types=["XPT"],
)
print(manifest['summary_counts'])

{'Demographics': {'XPT': 7}, 'Examination': {'XPT': 105}, 'Laboratory': {'XPT': 440}, 'Dietary': {'XPT': 79}, 'Questionnaire': {'XPT': 286}}
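
The nested dictionary is easier to scan as a table. The sketch below copies the printed counts into a literal so it runs without a network call, then uses pandas to tabulate them and compute a total; the variable names are illustrative only.

```python
import pandas as pd

# Summary counts copied from the manifest output above
summary_counts = {
    "Demographics": {"XPT": 7},
    "Examination": {"XPT": 105},
    "Laboratory": {"XPT": 440},
    "Dietary": {"XPT": 79},
    "Questionnaire": {"XPT": 286},
}

# One row per component, one column per file type
counts = pd.DataFrame.from_dict(summary_counts, orient="index").rename_axis("component")
total_xpt = int(counts["XPT"].sum())

print(counts.sort_values("XPT", ascending=False))
print(f"Total XPT files: {total_xpt}")
```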
