Please note: For the public repository, data/ has been omitted to respect data privacy/licensing.
The data/ directory contains the datasets used in the project. It includes two subdirectories:
processed/
Interim data files generated manually or by a script within this repository
The assets/ directory contains final, presentation-ready tables, figures, and slides
figures/
Figures saved by interactive Dash apps in PNG (raster) and PDF (vector) formats
slides/
Summary slides
The src/ directory contains the source code for this project. It is organized into the following subdirectories:
get_stage_list(): returns the list of names of stages stratified by biomarkers Ab42, amyloid PET, (p-Tau or t-Tau)
demographics_characteristics(): computes a summary statistics table of the study population
multiple_linear_regression(): fits a multiple linear regression model and returns model statistics
cluster_corr_df(): hierarchically clusters a correlation matrix
get_linkage_methods(): returns the list of available linkage methods for hierarchical clustering
get_cluster_criteria(): returns the list of available cluster criteria for hierarchical clustering
remove_diagonal(): masks the diagonal of a square matrix with NaN
fill_mirror(): fills a triangular matrix to a square matrix with its transpose
mask_outlier(): returns a mask that removes outliers when applied; outliers are determined by Local Outlier Factor
corr_remove_outliers(): computes the outlier-removed correlation coefficient for each pair of variables; returns a correlation matrix
dialog_select_directory(): prompts the user to select a directory in a dialog selection window and returns its absolute path
dialog_select_file(): prompts the user to select a file in a dialog selection window and returns its absolute path
standard_layout(): configures a standard layout for plot template, axes, and fonts
add_box(): draws a box plot on top of a strip plot
add_pairwise_comparison(): annotates pairwise comparison results on top of a strip plot or box plot
annotation_t_test(): computes the p-value from an independent-samples t-test
annotation_cohens_d(): computes Cohen's d effect size
annotation_tukey(): performs Tukey's multiple comparison post-hoc test to obtain the p-value
tukey(): conducts Tukey's multiple comparison post-hoc test in R for a single dependent variable
tukey_multiple_dvs(): conducts Tukey's multiple comparison post-hoc test in R sequentially for a list of dependent variables; returns a table of the resultant p-values

- Input: VITALS_14Jul2023.csv
- Output: bmi.csv

- Input: ADNI_HAASS_WASHU_LAB_13Jul2023.csv
- Output: strem2.csv

- Input: ADNIMERGE_14Jul2023.csv, bmi.csv
- Output: demographics.csv

- Input: ADNIMERGE_14Jul2023.csv, bmi.csv
- Output: demographics_tau.csv

- Input: ADNIMERGE_14Jul2023.csv, bmi.csv, strem2.csv
- Output: demographics_biomarkers.csv

- Input: ADNIMERGE_14Jul2023.csv
- Output: converters.csv

- Input: ADNIMERGE_14Jul2023.csv
- Output: converters_to_ad.csv

- Input from ARIC server: ARIC_NP/DATA_NP/Visits/Visit 5/derive54_np.sas7bdat, DATA_NP/Visits/Visit 5/derive_ncs51_np.sas7bdat, DATA_NP/Visits/Visit 1/derive13_np.sas7bdat
- Input: all_eleigible_samples_AS2021_25v3.xlsx, lipoproteins_6_29_23.csv, dictionary.csv
- Output: lipoprotein_list.csv, pilot.csv, demographic_characteristics.csv

- Input from ARIC server: ARIC_NP/DATA_NP/Visits/Visit 5/derive54_np.sas7bdat, DATA_NP/Visits/Visit 5/derive_ncs51_np.sas7bdat, DATA_NP/Visits/Visit 1/derive13_np.sas7bdat, DATA_NP/Visits/MultiVisit/V5_V11 Longitudinal MRI data/v5_v11_mri_derv_np_240221.sas7bdat
- Input: all_eleigible_samples_AS2021_25v3.xlsx, ARIC_Pilot_Updated_06032022.csv, lipoproteins_6_29_23.csv, dictionary.csv
- Output: lipoprotein_list.csv, pilot.csv, demographic_characteristics.csv

- Input: HDL Proteome Watch 2023 Final.xlsx
- Output: hdl_proteome_davidson.csv

lipidomics_tukey.ipynb: ANCOVA followed by Tukey post-hoc to determine which plasma lipids or biomarkers differ significantly between stages
lipidomics_boxplot.ipynb: Distribution of plasma lipids or biomarkers across stages
survival.ipynb: Survival analysis (Kaplan-Meier survival curve, Cox's proportional hazard model) comparing risk of conversion to AD between biomarker groups
survival_hdl_ratio.ipynb: Survival analysis comparing cognitive decline between tertiles of non-small HDL FC-to-CE ratio
somascan_pca.ipynb: Clustering of CSF proteins by PCA, followed by linear regression with dependent variable pTau
somascan_boxplot.ipynb: Distribution of CSF proteins across cognitive statuses
strem2_lipidomics_regression.ipynb: Linear regression of CSF sTREM2 on plasma lipids
strem2_lipoprotein_regression.ipynb: Linear regression of CSF sTREM2 on plasma lipoprotein subclasses
calcium_all_sites.ipynb: Distribution of calcium measurements compared between Vista and Roche, data from all sites combined
imagej_particle_results_hdl.ipynb: HDL1 and HDL2 particle analysis on EM images using results exported from ImageJ
This directory contains the library code for the project. The utility functions are organized into the following categories:
general.py
General utility functions
stats.py
Statistical analysis
dialog.py
Tkinter dialogs
plotly.py
Modifications to Plotly figure objects
r_interface.py
Interface to R
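As an illustration of the kind of helper described above, here is a minimal sketch of a Cohen's d calculation for two independent groups. It is not the repository's annotation_cohens_d() implementation: the function name cohens_d, its signature, and the choice of the pooled standard deviation as the denominator are assumptions made for this example, and the sample values are invented.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Hypothetical helper for illustration; not the repository's own function.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Invented values, for illustration only.
control = [2.0, 4.0, 6.0]
treated = [4.0, 6.0, 8.0]
print(cohens_d(treated, control))  # 1.0: the means differ by one pooled SD
```

The sign of the result follows the argument order, so swapping the two groups flips the sign without changing the magnitude.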
This directory contains scripts that convert original files and/or previously processed files into processed files for downstream analyses.
adni/
bmi.ipynb: Body Mass Index (BMI)
strem2.ipynb: CSF soluble triggering receptor expressed on myeloid cells 2 (sTREM2)
demographics.ipynb: Basic demographics
demographics_tau.ipynb: Demographics with tau biomarker data
demographics_biomarkers.ipynb: Demographics with amyloid and tau biomarker data and stage assignment
lipidomics.ipynb: Plasma lipidomics, Meikle lab, longitudinal
lipoprotein.ipynb: Nightingale NMR analysis of lipoproteins and metabolites
somascan.ipynb: CSF proteomics SOMAscan 7000+ proteins post-QC, Cruchaga lab
converters.ipynb: Longitudinal decline in cognitive status (CN to MCI, MCI to AD, or CN to AD), excluding participants diagnosed with AD at baseline
converters_to_ad.ipynb: Longitudinal decline in cognitive status from CN or MCI to AD, excluding participants diagnosed with AD at baseline
aric/
pilot.ipynb: Demographics and brain MRI data from the ARIC server for participants included in the pilot study
pilot_eligible.ipynb: Demographics and brain MRI data for ARIC participants eligible under the inclusion criteria
other/
davidson.ipynb: Sean Davidson HDL Proteome Watch 2023
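To make the converter definition used by converters.ipynb more concrete, the sketch below flags participants whose cognitive status worsens after baseline and skips anyone diagnosed with AD at baseline. It is only an illustration under stated assumptions: the (participant_id, months_from_baseline, diagnosis) tuple layout, the function name find_converters, and the rule of comparing baseline against the worst later diagnosis are all hypothetical and do not reflect the ADNIMERGE schema or the notebook's actual code.

```python
from collections import defaultdict

# Ordering of cognitive status from least to most impaired.
SEVERITY = {"CN": 0, "MCI": 1, "AD": 2}


def find_converters(visits):
    """Return {participant_id: (baseline_dx, worst_dx)} for participants who declined.

    `visits` is an iterable of (participant_id, months_from_baseline, diagnosis)
    tuples. This layout is hypothetical, chosen only for the example.
    """
    by_participant = defaultdict(list)
    for pid, month, dx in visits:
        by_participant[pid].append((month, dx))

    converters = {}
    for pid, records in by_participant.items():
        records.sort()  # chronological order; the earliest visit is the baseline
        baseline_dx = records[0][1]
        if baseline_dx == "AD":
            continue  # exclude participants diagnosed with AD at baseline
        # Worst diagnosis seen at any follow-up visit (assumed rule).
        worst_dx = max(
            (dx for _, dx in records[1:]), key=SEVERITY.get, default=baseline_dx
        )
        if SEVERITY[worst_dx] > SEVERITY[baseline_dx]:
            converters[pid] = (baseline_dx, worst_dx)
    return converters


# Invented visits, for illustration only.
visits = [
    (1, 0, "CN"), (1, 12, "MCI"),
    (2, 0, "MCI"), (2, 24, "AD"),
    (3, 0, "AD"), (3, 12, "AD"),
    (4, 0, "CN"), (4, 12, "CN"),
]
print(find_converters(visits))  # {1: ('CN', 'MCI'), 2: ('MCI', 'AD')}
```

Keeping only the entries whose worst diagnosis is "AD" would give an analogue of the narrower definition described for converters_to_ad.ipynb.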
This directory contains Jupyter notebook files that perform analyses.
adni/
calcium/
other/
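The survival notebooks described above rely on Kaplan-Meier curves. As a rough illustration of what that estimator computes, here is a minimal pure-Python sketch of the product-limit formula. It is not the code used in the notebooks, which may well use a dedicated survival analysis library; the function name kaplan_meier, its arguments, and the sample data are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per participant
    events: 1 if the event (e.g. conversion) was observed, 0 if censored
    Hypothetical helper for illustration only.
    """
    observed = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Participants censored at t still count as at risk at t itself.
        at_risk -= leaving[t]
    return curve


# Invented follow-up times (months) and event indicators.
durations = [6, 12, 12, 18, 24]
events = [1, 0, 1, 1, 0]
print([(t, round(s, 3)) for t, s in kaplan_meier(durations, events)])
# [(6, 0.8), (12, 0.6), (18, 0.3)]
```

Comparing groups, as the notebooks do, would amount to running such an estimate separately per group; the statistical comparison itself (for example a Cox model) is outside this sketch.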
The analysis in this repository contributed to the following publications:
- Li, D.; Mantyh, W. G.; Men, L.; Jain, I.; Glittenberg, M.; An, B.; Zhang, L.; Li, L.; for the Alzheimer’s Disease Neuroimaging Initiative. sTREM2 in Discordant CSF Aβ42 and P‐tau181. Alz & Dem Diag Ass & Dis Mo 2025, 17 (1), e70072. https://doi.org/10.1002/dad2.70072.
- Li, D.; An, B.; Men, L.; Glittenberg, M.; Lutsey, P. L.; Mielke, M. M.; Yu, F.; Hoogeveen, R. C.; Gottesman, R.; Zhang, L.; Meyer, M.; Sullivan, K.; Zantek, N.; Alonso, A.; Walker, K. A. The Association of High-Density Lipoprotein Cargo Proteins with Brain Volume in Older Adults in the Atherosclerosis Risk in Communities (ARIC). Journal of Alzheimer’s Disease 2025, 103 (3), 724–734. https://doi.org/10.1177/13872877241305806.