Proofreading #845

Merged: 7 commits merged into greenelab:master from agitter/typos on Mar 21, 2018
Conversation

@agitter (Collaborator) commented Mar 21, 2018

These are primarily minor edits noticed during proofreading. The missing reference separator led to inconsistent reference numbering. I'll open a Manubot issue to discuss that. The updated citation tag now cites the research paper instead of the correction to that paper.

@cgreene (Member) left a comment

One quick question. Otherwise LGTM

@@ -30,7 +30,7 @@ Similarly, for continuous outcomes, linear regression can be seen as a single-la
Thus, in some ways, supervised deep learning approaches can be seen as an extension of regression models that allow for greater flexibility and are especially well-suited for modeling non-linear relationships among the input features.
Recently, hardware improvements and very large training datasets have allowed these deep learning techniques to surpass other machine learning algorithms for many problems.
In a famous and early example, scientists from Google demonstrated that a neural network "discovered" that cats, faces, and pedestrians were important components of online videos [@url:http://research.google.com/archive/unsupervised_icml2012.html] without being told to look for them.
-What if, more generally, deep learning take advantage of the growth of data in biomedicine to tackle challenges in this field? Could these algorithms identify the "cats" hidden in our data---the patterns unknown to the researcher---and suggest ways to act on them? In this review, we examine deep learning's application to biomedical science and discuss the unique challenges that biomedical data pose for deep learning methods.
+What if, more generally, deep learning takes advantage of the growth of data in biomedicine to tackle challenges in this field? Could these algorithms identify the "cats" hidden in our data---the patterns unknown to the researcher---and suggest ways to act on them? In this review, we examine deep learning's application to biomedical science and discuss the unique challenges that biomedical data pose for deep learning methods.
Member: 👍
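To make the regression analogy in this hunk concrete, here is a minimal sketch, assuming PyTorch is available; the 20 input features and the layer widths are arbitrary placeholders rather than anything from the cited work. Logistic regression is exactly a single linear layer with a sigmoid output, and a deep classifier extends it by stacking nonlinear hidden layers:

```python
import torch.nn as nn

# Logistic regression: a single linear layer followed by a sigmoid.
logistic_regression = nn.Sequential(
    nn.Linear(in_features=20, out_features=1),
    nn.Sigmoid(),
)

# A deep feed-forward network extends the same idea with hidden
# nonlinear layers, which is what lets it model non-linear
# relationships among the input features.
deep_classifier = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),
)
```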

@@ -112,7 +112,7 @@ In oncology, current "gold standard" approaches include histology, which require
One example is the PAM50 approach to classifying breast cancer where the expression of 50 marker genes divides breast cancer patients into four subtypes.
Substantial heterogeneity still remains within these four subtypes [@doi:10.1200/JCO.2008.18.1370; @doi:10.1158/1078-0432.CCR-13-0583].
Given the increasing wealth of molecular data available, a more comprehensive subtyping seems possible.
-Several studies have used deep learning methods to better categorize breast cancer patients: For instance, denoising autoencoders, an unsupervised approach, can be used to cluster breast cancer patients [@doi:10.1142/9789814644730_0014], and CNN can help count mitotic divisions, a feature that is highly correlated with disease outcome in histological images [@doi:10.1007/978-3-642-40763-5_51].
+Several studies have used deep learning methods to better categorize breast cancer patients: For instance, denoising autoencoders, an unsupervised approach, can be used to cluster breast cancer patients [@doi:10.1142/9789814644730_0014], and CNNs can help count mitotic divisions, a feature that is highly correlated with disease outcome in histological images [@doi:10.1007/978-3-642-40763-5_51].
Member: 👍
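As a rough illustration of the denoising-autoencoder idea referenced in this hunk, a sketch assuming PyTorch; the gene count, hidden size, and noise level are placeholders, not values from the cited study:

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Reconstructs expression profiles from noise-corrupted input;
    the low-dimensional code can then be clustered (e.g. with k-means).
    Assumes expression values have been scaled to [0, 1]."""
    def __init__(self, n_genes=5000, n_hidden=100):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_genes), nn.Sigmoid())

    def forward(self, x, noise_std=0.1):
        corrupted = x + noise_std * torch.randn_like(x)  # corrupt the input
        code = self.encoder(corrupted)                   # compressed representation
        return self.decoder(code), code                  # reconstruction + features
```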

@@ -12,7 +12,7 @@ In spite of such optimism, the ability of deep learning models to indiscriminate
Imagine a deep neural network is provided with clinical test results gleaned from electronic health records.
Because physicians may order certain tests based on their suspected diagnosis, a deep neural network may learn to "diagnose" patients simply based on the tests that are ordered.
For some objective functions, such as predicting an International Classification of Diseases (ICD) code, this may offer good performance even though it does not provide insight into the underlying disease beyond physician activity.
-This challenge is not unique to deep learning approaches; however, it is important for practitioners to be aware of these challenges and the possibility in this domain of constructing highly predictive classifiers of questionable actual utility.
+This challenge is not unique to deep learning approaches; however, it is important for practitioners to be aware of these challenges and the possibility in this domain of constructing highly predictive classifiers of questionable utility.
Member: 👍

@@ -48,30 +48,28 @@ The technique of reusing features from a different task falls into the broader a
Though we've mentioned numerous successes for the transfer of natural image features to new tasks, we expect that a lower proportion of negative results have been published.
The analysis of magnetic resonance images (MRIs) is also faced with the challenge of small training sets.
In this domain, Amit et al. [@tag:Amit2017_breast_mri] investigated the tradeoff between pre-trained models from a different domain and a small CNN trained only with MRI images.
-In contrast with the other selected literature, they found a smaller network trained with data augmentation on few hundred images from a few dozen patients can outperform a pre-trained out-of-domain classifier.
+In contrast with the other selected literature, they found a smaller network trained with data augmentation on a few hundred images from a few dozen patients can outperform a pre-trained out-of-domain classifier.
Member: 👍
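A hypothetical sketch of the trade-off discussed in this hunk, assuming a recent torchvision; the specific transforms and the ResNet backbone are illustrative choices, not the architectures used by Amit et al.:

```python
from torchvision import models, transforms

# Option 1: reuse an out-of-domain (ImageNet) feature extractor.
pretrained_backbone = models.resnet18(weights="IMAGENET1K_V1")

# Option 2: train a small CNN from scratch, stretching a few hundred
# MRI slices with aggressive augmentation (flips, rotations, crops).
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
```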


Another way of dealing with limited training data is to divide rich data---e.g. 3D images---into numerous reduced projections.
Shin et al. [@tag:Shin2016_cad_tl] compared various deep network architectures, dataset characteristics, and training procedures for computer tomography-based (CT) abnormality detection.
They concluded that networks as deep as 22 layers could be useful for 3D data, despite the limited size of training datasets.
However, they noted that choice of architecture, parameter setting, and model fine-tuning needed is very problem- and dataset-specific.
Moreover, this type of task often depends on both lesion localization and appearance, which poses challenges for CNN-based approaches.
Straightforward attempts to capture useful information from full-size images in all three dimensions simultaneously via standard neural network architectures were computationally unfeasible.
-Instead, two-dimensional models were used to either process image slices individually
-(2D), or aggregate information from a number of 2D projections in the native space (2.5D).
+Instead, two-dimensional models were used to either process image slices individually (2D) or aggregate information from a number of 2D projections in the native space (2.5D).
Member: 👍
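To illustrate the 2D/2.5D reduction described in this hunk, a small NumPy sketch; the volume size and slice location are placeholders, and real pipelines typically crop patches around candidate lesions rather than taking whole-volume slices:

```python
import numpy as np

def extract_2_5d_views(volume, center):
    """Pull three orthogonal 2D slices through a point of interest in a
    3D volume, the '2.5D' trick for feeding volumetric data to 2D CNNs."""
    z, y, x = center
    axial    = volume[z, :, :]
    coronal  = volume[:, y, :]
    sagittal = volume[:, :, x]
    return np.stack([axial, coronal, sagittal])  # (3, H, W) for a cubic volume

volume = np.random.rand(64, 64, 64)          # placeholder CT volume
views = extract_2_5d_views(volume, (32, 32, 32))
```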

-Jagannatha and Yu [@pmcid:PMC5119627] further employed a bidirectional LSTM structure to extract adverse drug events from electronic health records, and Lin et al. [@doi:10.18653/v1/w17-2341] investigated using CNN to extract temporal relations.
-While promising, a lack of rigorous evaluations of the real-world utility of these kinds of features makes current contributions in this area difficult to evaluate.
+Jagannatha and Yu [@pmcid:PMC5119627] further employed a bidirectional LSTM structure to extract adverse drug events from electronic health records, and Lin et al. [@doi:10.18653/v1/w17-2341] investigated using CNNs to extract temporal relations.
+While promising, a lack of rigorous evaluation of the real-world utility of these kinds of features makes current contributions in this area difficult to evaluate.
Member: 👍 above here (stopped doing each individual change)
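For readers unfamiliar with the bidirectional LSTM setup referenced in this hunk, a minimal sequence-tagging sketch assuming PyTorch; the vocabulary size, dimensions, and tag set are placeholders, not those of Jagannatha and Yu:

```python
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Tags each token in a clinical note (e.g. as part of an adverse
    drug event mention or not) using a bidirectional LSTM."""
    def __init__(self, vocab_size=20000, embed_dim=100, hidden_dim=128, n_tags=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.classify = nn.Linear(2 * hidden_dim, n_tags)  # forward + backward states

    def forward(self, token_ids):                  # (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))
        return self.classify(states)               # per-token tag scores
```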

Taken together, these results suggest that differentially private GANs may be an attractive way to generate sharable datasets for downstream reanalysis.

-Federated learning [@url:http://proceedings.mlr.press/v54/mcmahan17a.html] and secure aggregations [@url:https://eprint.iacr.org/2017/281.pdf
-@tag:Papernot2017_pate] are complementary approaches that reinforce differential privacy.
+Federated learning [@url:http://proceedings.mlr.press/v54/mcmahan17a.html] and secure aggregations [@url:https://eprint.iacr.org/2017/281.pdf; @tag:Papernot2017_pate] are complementary approaches that reinforce differential privacy.
Member: 👍 the above changes in this file

@@ -387,7 +387,7 @@ They achieved impressive performance, even for cell types where the subset perce
However, they did not benchmark against random forests, which tend to work better for imbalanced data, and their data was relatively low dimensional.

Neural networks can also learn low-dimensional representations of single-cell gene expression data for visualization, clustering, and other tasks.
-Both scvis [@doi:10.1101/178624] and scVI [@arxiv:1709.02082] are unsupervised approaches based on VAEs.
+Both scvis [@doi:10.1101/178624] and scVI [@arxiv:1709.02082] are unsupervised approaches based on variational autoencoders (VAEs).
Member: 👍 for the changes on this file
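A bare-bones sketch of the VAE idea underlying tools like scvis and scVI, assuming PyTorch; the layer sizes are placeholders, and the plain decoder shown here is a simplification (scVI, for example, uses a count-based likelihood for expression data):

```python
import torch
import torch.nn as nn

class ExpressionVAE(nn.Module):
    """Compresses an expression profile to a low-dimensional latent vector
    usable for visualization, clustering, and other downstream tasks."""
    def __init__(self, n_genes=2000, n_latent=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, n_latent)
        self.to_logvar = nn.Linear(128, n_latent)
        self.decoder = nn.Sequential(nn.Linear(n_latent, 128), nn.ReLU(),
                                     nn.Linear(128, n_genes))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.decoder(z), mu, logvar
```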

@@ -1,6 +1,6 @@
## The impact of deep learning in treating disease and developing new treatments

-Given the need to make better, faster interventions at the point of care---incorporating the complex calculus of a patients symptoms, diagnostics, and life history---there have been many attempts to apply deep learning to patient treatment.
+Given the need to make better, faster interventions at the point of care---incorporating the complex calculus of a patient's symptoms, diagnostics, and life history---there have been many attempts to apply deep learning to patient treatment.
Member: 👍

@@ -198,7 +198,7 @@ The contribution scores were then used to identify key phrases from a model trai
Interpretation of embedded or latent space features learned through generative unsupervised models can reveal underlying patterns otherwise masked in the original input.
Embedded feature interpretation has been emphasized mostly in image and text based applications [@tag:Radford_dcgan; @tag:Word2Vec], but applications to genomic and biomedical domains are increasing.

-For example, Way and Greene trained a variational autoencoder (VAE) on gene expression from The Cancer Genome Atlas (TCGA) [@doi:10.1038/ng.2764] and use latent space arithmetic to rapidly isolate and interpret gene expression features descriptive of high grade serous ovarian cancer subtypes [@tag:WayGreene2017_tybalt].
+For example, Way and Greene trained a variational autoencoder on gene expression from The Cancer Genome Atlas (TCGA) [@doi:10.1038/ng.2764] and use latent space arithmetic to rapidly isolate and interpret gene expression features descriptive of high grade serous ovarian cancer subtypes [@tag:WayGreene2017_tybalt].
Member: VAE was defined above - should we just use VAE here?

Author (Collaborator): Yes, I'll change to VAE. I primarily wanted to move the acronym definition to the first use.
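Since the thread above concerns this VAE example, a toy sketch of what "latent space arithmetic" amounts to in practice; this uses NumPy, and the encodings and subtype labels are randomly generated placeholders rather than TCGA data:

```python
import numpy as np

# `z` stands in for VAE latent encodings of tumor samples (n_samples, n_latent)
# and `subtype` for the assigned subtype label of each sample. The difference
# between subtype centroids gives a latent "direction" separating the groups,
# which can be decoded or correlated back to genes for interpretation.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 100))                       # placeholder encodings
subtype = rng.choice(["mesenchymal", "immunoreactive"], size=200)

direction = (z[subtype == "mesenchymal"].mean(axis=0)
             - z[subtype == "immunoreactive"].mean(axis=0))
top_latent_features = np.argsort(-np.abs(direction))[:5]  # most discriminating dimensions
```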

@cgreene (Member) commented Mar 21, 2018

By the way, I'm not sure if this counts for our iteration of the deep-review or if we should rely on @evancofer and @stephenra to review & merge. What do you think?

@agitter (Collaborator, Author) commented Mar 21, 2018

Let's let the new maintainers finish the review and merge when they approve.

@cgreene (Member) commented Mar 21, 2018

Ok - sounds good! Probably important to note that this one should probably be a regular merge instead of a squash merge due to the multi-contributor commits 😁

@agitter (Collaborator, Author) commented Mar 21, 2018

I wonder whether the merge commit in a squash merge would retain the co-authoring? The GitHub guide isn't clear. The Co-authored-by: keyword is retained in the commit message of the merge commit, but I don't know whether it needs to be at the beginning or end of the commit message.

@evancofer (Collaborator) commented Mar 21, 2018

LGTM
@cgreene I will go ahead and merge once @stephenra takes a look at it.

@stephenra (Collaborator)

LGTM, as well.

@evancofer merged commit a759d28 into greenelab:master on Mar 21, 2018
dhimmel pushed a commit that referenced this pull request Mar 21, 2018
This build is based on a759d28.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/deep-review/builds/356509108
https://travis-ci.org/greenelab/deep-review/jobs/356509109

[ci skip]

The full commit message that triggered this build is copied below:

Merge pull request #845 from agitter/typos

Proofreading
@agitter deleted the typos branch on March 21, 2018 at 21:45