Proofreading #845
Conversation
Co-authored-by: Anna Greene <annagreene@users.noreply.github.com> Co-authored-by: Casey Greene <cgreene@users.noreply.github.com>
One quick question. Otherwise LGTM
@@ -30,7 +30,7 @@ Similarly, for continuous outcomes, linear regression can be seen as a single-la
 Thus, in some ways, supervised deep learning approaches can be seen as an extension of regression models that allow for greater flexibility and are especially well-suited for modeling non-linear relationships among the input features.
 Recently, hardware improvements and very large training datasets have allowed these deep learning techniques to surpass other machine learning algorithms for many problems.
 In a famous and early example, scientists from Google demonstrated that a neural network "discovered" that cats, faces, and pedestrians were important components of online videos [@url:http://research.google.com/archive/unsupervised_icml2012.html] without being told to look for them.
-What if, more generally, deep learning take advantage of the growth of data in biomedicine to tackle challenges in this field? Could these algorithms identify the "cats" hidden in our data---the patterns unknown to the researcher---and suggest ways to act on them? In this review, we examine deep learning's application to biomedical science and discuss the unique challenges that biomedical data pose for deep learning methods.
+What if, more generally, deep learning takes advantage of the growth of data in biomedicine to tackle challenges in this field? Could these algorithms identify the "cats" hidden in our data---the patterns unknown to the researcher---and suggest ways to act on them? In this review, we examine deep learning's application to biomedical science and discuss the unique challenges that biomedical data pose for deep learning methods.
👍
@@ -112,7 +112,7 @@ In oncology, current "gold standard" approaches include histology, which require
 One example is the PAM50 approach to classifying breast cancer where the expression of 50 marker genes divides breast cancer patients into four subtypes.
 Substantial heterogeneity still remains within these four subtypes [@doi:10.1200/JCO.2008.18.1370; @doi:10.1158/1078-0432.CCR-13-0583].
 Given the increasing wealth of molecular data available, a more comprehensive subtyping seems possible.
-Several studies have used deep learning methods to better categorize breast cancer patients: For instance, denoising autoencoders, an unsupervised approach, can be used to cluster breast cancer patients [@doi:10.1142/9789814644730_0014], and CNN can help count mitotic divisions, a feature that is highly correlated with disease outcome in histological images [@doi:10.1007/978-3-642-40763-5_51].
+Several studies have used deep learning methods to better categorize breast cancer patients: For instance, denoising autoencoders, an unsupervised approach, can be used to cluster breast cancer patients [@doi:10.1142/9789814644730_0014], and CNNs can help count mitotic divisions, a feature that is highly correlated with disease outcome in histological images [@doi:10.1007/978-3-642-40763-5_51].
👍
@@ -12,7 +12,7 @@ In spite of such optimism, the ability of deep learning models to indiscriminate
 Imagine a deep neural network is provided with clinical test results gleaned from electronic health records.
 Because physicians may order certain tests based on their suspected diagnosis, a deep neural network may learn to "diagnose" patients simply based on the tests that are ordered.
 For some objective functions, such as predicting an International Classification of Diseases (ICD) code, this may offer good performance even though it does not provide insight into the underlying disease beyond physician activity.
-This challenge is not unique to deep learning approaches; however, it is important for practitioners to be aware of these challenges and the possibility in this domain of constructing highly predictive classifiers of questionable actual utility.
+This challenge is not unique to deep learning approaches; however, it is important for practitioners to be aware of these challenges and the possibility in this domain of constructing highly predictive classifiers of questionable utility.
👍
@@ -48,30 +48,28 @@ The technique of reusing features from a different task falls into the broader a
 Though we've mentioned numerous successes for the transfer of natural image features to new tasks, we expect that a lower proportion of negative results have been published.
 The analysis of magnetic resonance images (MRIs) is also faced with the challenge of small training sets.
 In this domain, Amit et al. [@tag:Amit2017_breast_mri] investigated the tradeoff between pre-trained models from a different domain and a small CNN trained only with MRI images.
-In contrast with the other selected literature, they found a smaller network trained with data augmentation on few hundred images from a few dozen patients can outperform a pre-trained out-of-domain classifier.
+In contrast with the other selected literature, they found a smaller network trained with data augmentation on a few hundred images from a few dozen patients can outperform a pre-trained out-of-domain classifier.
👍
 Another way of dealing with limited training data is to divide rich data---e.g. 3D images---into numerous reduced projections.
 Shin et al. [@tag:Shin2016_cad_tl] compared various deep network architectures, dataset characteristics, and training procedures for computer tomography-based (CT) abnormality detection.
 They concluded that networks as deep as 22 layers could be useful for 3D data, despite the limited size of training datasets.
 However, they noted that choice of architecture, parameter setting, and model fine-tuning needed is very problem- and dataset-specific.
 Moreover, this type of task often depends on both lesion localization and appearance, which poses challenges for CNN-based approaches.
 Straightforward attempts to capture useful information from full-size images in all three dimensions simultaneously via standard neural network architectures were computationally unfeasible.
-Instead, two-dimensional models were used to either process image slices individually
-(2D), or aggregate information from a number of 2D projections in the native space (2.5D).
+Instead, two-dimensional models were used to either process image slices individually (2D) or aggregate information from a number of 2D projections in the native space (2.5D).
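As an aside on the 2D and 2.5D decompositions this hunk describes, the idea can be sketched with NumPy: a 3D volume is reduced to per-slice inputs (2D) or to stacks of orthogonal planes through a point of interest (2.5D). The function names, shapes, and the three-plane construction below are illustrative assumptions, not the specific pipeline Shin et al. used:

```python
import numpy as np

def slices_2d(volume, axis=0):
    """The '2D' approach: split a 3D volume into individual 2D slices,
    each of which is fed to a 2D model independently."""
    return [np.take(volume, i, axis=axis) for i in range(volume.shape[axis])]

def projections_2_5d(volume, index):
    """A common '2.5D' representation: the three orthogonal planes
    through one voxel, stacked as channels of a single 2D input."""
    i, j, k = index
    axial    = volume[i, :, :]
    coronal  = volume[:, j, :]
    sagittal = volume[:, :, k]
    return np.stack([axial, coronal, sagittal])  # (3, H, W) for a cubic volume

# Hypothetical 64^3 CT-like volume
vol = np.random.rand(64, 64, 64)
slices = slices_2d(vol)                       # 64 separate 2D inputs
planes = projections_2_5d(vol, (32, 32, 32))  # one 3-channel 2D input
```

Either reduction lets a standard 2D CNN consume volumetric data at a fraction of the memory cost of a full 3D architecture, at the price of discarding some spatial context.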
👍
-Jagannatha and Yu [@pmcid:PMC5119627] further employed a bidirectional LSTM structure to extract adverse drug events from electronic health records, and Lin et al. [@doi:10.18653/v1/w17-2341] investigated using CNN to extract temporal relations.
-While promising, a lack of rigorous evaluations of the real-world utility of these kinds of features makes current contributions in this area difficult to evaluate.
+Jagannatha and Yu [@pmcid:PMC5119627] further employed a bidirectional LSTM structure to extract adverse drug events from electronic health records, and Lin et al. [@doi:10.18653/v1/w17-2341] investigated using CNNs to extract temporal relations.
+While promising, a lack of rigorous evaluation of the real-world utility of these kinds of features makes current contributions in this area difficult to evaluate.
👍 above here (stopped doing each individual change)
 Taken together, these results suggest that differentially private GANs may be an attractive way to generate sharable datasets for downstream reanalysis.

-Federated learning [@url:http://proceedings.mlr.press/v54/mcmahan17a.html] and secure aggregations [@url:https://eprint.iacr.org/2017/281.pdf
-@tag:Papernot2017_pate] are complementary approaches that reinforce differential privacy.
+Federated learning [@url:http://proceedings.mlr.press/v54/mcmahan17a.html] and secure aggregations [@url:https://eprint.iacr.org/2017/281.pdf; @tag:Papernot2017_pate] are complementary approaches that reinforce differential privacy.
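For readers unfamiliar with the federated learning idea cited in this hunk, a minimal sketch: each client computes a model update on its private data and only parameters, never raw records, are sent back and averaged. The least-squares local step, function names, and learning rate are assumptions for illustration, not the method of McMahan et al., and no secure aggregation or differential privacy mechanism is modeled here:

```python
import numpy as np

def local_update(weights, data, lr=0.1):
    """One hypothetical local training step on a client's private (X, y)
    data, here a single gradient step on a least-squares objective."""
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_average(weights, clients):
    """One round of federated averaging: every client trains locally,
    and only the resulting parameter vectors are averaged centrally."""
    updates = [local_update(weights, data) for data in clients]
    return np.mean(updates, axis=0)

# Four hypothetical clients, each holding 20 private samples
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):          # ten communication rounds
    w = federated_average(w, clients)
```

Secure aggregation strengthens this setup by letting the server learn only the sum of the client updates, not any individual contribution.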
👍 the above changes in this file
@@ -387,7 +387,7 @@ They achieved impressive performance, even for cell types where the subset perce
 However, they did not benchmark against random forests, which tend to work better for imbalanced data, and their data was relatively low dimensional.

 Neural networks can also learn low-dimensional representations of single-cell gene expression data for visualization, clustering, and other tasks.
-Both scvis [@doi:10.1101/178624] and scVI [@arxiv:1709.02082] are unsupervised approaches based on VAEs.
+Both scvis [@doi:10.1101/178624] and scVI [@arxiv:1709.02082] are unsupervised approaches based on variational autoencoders (VAEs).
👍 for the changes on this file
@@ -1,6 +1,6 @@
 ## The impact of deep learning in treating disease and developing new treatments

-Given the need to make better, faster interventions at the point of care---incorporating the complex calculus of a patients symptoms, diagnostics, and life history---there have been many attempts to apply deep learning to patient treatment.
+Given the need to make better, faster interventions at the point of care---incorporating the complex calculus of a patient's symptoms, diagnostics, and life history---there have been many attempts to apply deep learning to patient treatment.
👍
content/06.discussion.md
Outdated
@@ -198,7 +198,7 @@ The contribution scores were then used to identify key phrases from a model trai
 Interpretation of embedded or latent space features learned through generative unsupervised models can reveal underlying patterns otherwise masked in the original input.
 Embedded feature interpretation has been emphasized mostly in image and text based applications [@tag:Radford_dcgan; @tag:Word2Vec], but applications to genomic and biomedical domains are increasing.

-For example, Way and Greene trained a variational autoencoder (VAE) on gene expression from The Cancer Genome Atlas (TCGA) [@doi:10.1038/ng.2764] and use latent space arithmetic to rapidly isolate and interpret gene expression features descriptive of high grade serous ovarian cancer subtypes [@tag:WayGreene2017_tybalt].
+For example, Way and Greene trained a variational autoencoder on gene expression from The Cancer Genome Atlas (TCGA) [@doi:10.1038/ng.2764] and use latent space arithmetic to rapidly isolate and interpret gene expression features descriptive of high grade serous ovarian cancer subtypes [@tag:WayGreene2017_tybalt].
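The latent space arithmetic mentioned in this hunk can be sketched as follows: encode samples from two groups, then subtract the mean latent vectors to obtain a direction that describes the difference between them. The `encode` function here is a stand-in random projection, not Tybalt's trained VAE encoder, and all names and dimensions are hypothetical:

```python
import numpy as np

def encode(x):
    """Stand-in encoder: a real VAE encoder maps expression profiles to
    latent means; a fixed random projection is used here for illustration."""
    rng = np.random.default_rng(42)
    W = rng.normal(size=(x.shape[-1], 8))  # 8-dimensional latent space
    return x @ W

# Hypothetical expression matrices: 50 samples x 100 genes per subtype
subtype_a = np.random.rand(50, 100)
subtype_b = np.random.rand(50, 100)

# Latent arithmetic: the difference of group means in latent space gives a
# direction that separates the subtypes and can be decoded or interpreted.
direction = encode(subtype_a).mean(axis=0) - encode(subtype_b).mean(axis=0)
```

In a trained VAE, such a direction can be added to or subtracted from individual latent codes and decoded back to expression space to inspect which genes drive the subtype difference.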
VAE was defined above - should we just use VAE here?
Yes, I'll change to VAE. I primarily wanted to move the acronym definition to the first use.
By the way, I'm not sure if this counts for our iteration of the deep-review or if we should rely on @evancofer and @stephenra to review & merge. What do you think?
Let's let the new maintainers finish the review and merge when they approve.
Ok - sounds good! Probably important to note that this one should probably be a regular merge instead of a squash merge due to the multi-contributor commits 😁
I wonder whether the merge commit in a squash merge would retain the co-authoring? The GitHub guide isn't clear. The
LGTM
LGTM, as well.
This build is based on a759d28. This commit was created by the following Travis CI build and job: https://travis-ci.org/greenelab/deep-review/builds/356509108 https://travis-ci.org/greenelab/deep-review/jobs/356509109 [ci skip] The full commit message that triggered this build is copied below: Merge pull request #845 from agitter/typos Proofreading
These are primarily minor edits noticed during proofreading. The missing reference separator led to inconsistent reference numbering. I'll open a Manubot issue to discuss that. The updated citation tag now cites the research paper instead of the correction to that paper.