Deep biomarkers of human aging: Application of deep neural networks to biomarker development #37

Open
cgreene opened this issue Aug 5, 2016 · 2 comments

Comments

@cgreene
Member

cgreene commented Aug 5, 2016

http://www.impactaging.com/papers/v8/n5/full/100968.html

@sw1
Contributor

sw1 commented Aug 23, 2016

Summary

This is a pretty straightforward ML paper. Aiming to predict chronological age from blood chemistry, the authors built a set of DNNs, optimized them, reported the results, and combined the best-performing ones into an ensemble. They also identified important features.

This can clearly be grouped with other EHR papers, probably as a cited example for deep learning's use in a hospital context. I can't imagine anything beyond that, though.

Problem

Research has shown that biomarkers of age-associated pathology may reflect senescence modifications and hence may act as aging clocks, but most of these biomarkers are not important in inferring health status and are therefore not frequently collected.

Here, they tried to use readily collected data (blood chemistry) to predict patient chronological age.

Methods

Data

62,414 patient records containing age, sex, and 41 common blood markers such as glucose, cholesterol, etc. They omitted WBC data because of how variable it is across the general population. All biomarkers were normalized to 0-1.
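
As a rough illustration of the 0-1 normalization: the summary above only says the markers were scaled to 0-1, so per-feature min-max scaling is my assumption here, and the function name and glucose values are made up.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the [0, 1] range (assumed min-max scheme)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: there is no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical glucose readings in mg/dL, not taken from the paper.
glucose = [70.0, 85.0, 100.0, 130.0]
scaled = min_max_scale(glucose)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

In practice the min and max would be taken from the training split only and reused on the test split.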

Design

They treated this as a regression problem -- not a classification problem -- primarily so they could associate patient age with biomarkers that are correlated.

Four metrics were evaluated using a 90/10 training/testing split: Pearson's correlation, R-squared, MAE, and a prediction accuracy in which each prediction is coded 1 if it falls within a predefined neighborhood of the true value (+/- 10 years) and 0 otherwise.
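
A minimal sketch of the four metrics, to make the definitions concrete. The function names are mine, R-squared is taken to be the usual coefficient of determination, the +/- 10 year boundary is treated as inclusive, and the ages are made up; none of those details are confirmed by the paper.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in years."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def epsilon_accuracy(y_true, y_pred, eps=10.0):
    """Fraction of predictions within +/- eps years of the true age."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= eps)
    return hits / len(y_true)


# Made-up ages and predictions, for illustration only.
ages = [25.0, 40.0, 60.0, 75.0]
preds = [30.0, 38.0, 45.0, 80.0]
print(mae(ages, preds))               # 6.75
print(epsilon_accuracy(ages, preds))  # 0.75
print(r_squared(ages, preds))
print(pearson_r(ages, preds))
```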

They ranked important features in a fashion similar to RFs: they randomly shuffled features and calculated the drop in performance, in terms of R-squared.
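
The shuffling procedure can be sketched roughly like this. It is a generic permutation-importance loop, not the authors' code: `predict` stands in for any fitted model, and the number of repeats, the seed, and the toy data are assumptions.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in R-squared when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r_squared(y, [predict(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            total += baseline - r_squared(y, [predict(r) for r in X_perm])
        drops.append(total / n_repeats)
    return drops


# Toy stand-in model: "age" depends on feature 0 only; feature 1 is ignored.
def toy_predict(row):
    return 20.0 + 60.0 * row[0]


X = [[i / 7.0, (i * 3 % 8) / 7.0] for i in range(8)]
y = [toy_predict(row) for row in X]
drops = permutation_importance(toy_predict, X, y)
print(drops)  # feature 0 shows a positive drop, feature 1 a drop of 0.0
```

Features are then ranked by the size of the drop, largest first.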

40 DNNs were fit with various hyperparameters, and they selected the 21 best-performing DNNs to be combined into an ensemble via stacked generalization (elastic net turned out to be the best stacking model). They selected these 21 DNNs by iteratively adding each DNN's prediction vector to the ensemble in an order based on either (1) decreasing R-squared or (2) decreasing correlation. Both orderings identified a 21-net ensemble.
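
To illustrate the iterative selection, here is a simplified sketch. Two things in it are my assumptions rather than the paper's method: it combines members by plain averaging instead of an elastic net stacker, and it keeps the ensemble size with the best R-squared on held-out predictions (the stopping rule is not described above). The model names and numbers are made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def greedy_ensemble(preds, y):
    """Add models in order of decreasing R-squared; keep the best-scoring prefix.

    preds maps a model name to its list of held-out predictions.
    Members are combined by averaging (a stand-in for the elastic net stacker).
    """
    ranked = sorted(preds, key=lambda name: r_squared(y, preds[name]), reverse=True)
    history = []
    for k in range(1, len(ranked) + 1):
        members = ranked[:k]
        avg = [sum(preds[m][i] for m in members) / k for i in range(len(y))]
        history.append((k, r_squared(y, avg)))
    best_k = max(history, key=lambda item: item[1])[0]
    return ranked[:best_k], history


# Made-up held-out ages and predictions from three hypothetical nets.
y = [30.0, 50.0, 70.0]
preds = {
    "net_a": [32.0, 48.0, 71.0],
    "net_b": [28.0, 53.0, 69.0],
    "net_c": [50.0, 50.0, 50.0],  # predicts the mean, adds nothing
}
selected, history = greedy_ensemble(preds, y)
print(selected)  # ['net_a', 'net_b']
```

Swapping the averaging step for a fitted elastic net on the members' prediction vectors would bring this closer to stacked generalization as described.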

They used feedforward NNs with at least 4 layers. The best performing DNN in their ensemble had 5 hidden layers with 2000, 1500, 1000, 500, and 1 nodes, respectively, and used PReLU, AdaGrad, L2 regularization, and dropout (0.2).
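
For a sense of scale, the layer sizes above imply a parameter count that is easy to work out. This sketch assumes fully connected layers with biases and an input of just the 41 blood markers (whether sex was also fed in is not stated above). The PReLU slope of 0.25 is only a common starting value; in a real PReLU it is learned, and those slopes are not counted here.

```python
def prelu(x, alpha=0.25):
    """PReLU activation: identity for positive inputs, slope alpha otherwise."""
    return x if x > 0 else alpha * x


def count_dense_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumed input width of 41, followed by the layer sizes from the summary.
sizes = [41, 2000, 1500, 1000, 500, 1]
total = count_dense_parameters(sizes)
print(total)        # 5087501
print(prelu(-2.0))  # -0.5
```

So the best net has roughly 5 million weights, trained on about 56,000 records under the 90/10 split, which may be why they leaned on both L2 and dropout.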

Results

That best-performing DNN had an R-squared of 0.8 and 82% prediction accuracy, and it outperformed KNNs, SVMs, and RFs, among others.

The ensemble had an R-squared of 0.82 and a prediction accuracy of 83.5%.

They compared their performance to studies that predicted age with transcriptomic biomarkers (R-squared = 0.6) and epigenetic biomarkers (R-squared = 0.93 and R-squared = 0.89).

The feature importance procedure ranked albumin, glucose, ALP, urea and RBCs at the top.

Notes

The DNN is available here: www.aging.ai

@agitter
Collaborator

agitter commented Aug 23, 2016

DOI link: http://doi.org/10.18632/aging.100968

dhimmel added a commit to dhimmel/deep-review that referenced this issue Nov 3, 2017