Pattern Discovery / Categorize #167

cgreene · 2016-12-22T22:18:06Z

Motivation: We need to start filling in the sections with relevant literature.

Progress: This includes some thoughts on unsupervised EHR analysis and some extraction work via NLP. I have stubbed in portions for @sw1 and potentially @traversc. If you guys want to contribute, you can either work off of this PR and make a pull request into this branch of my repo, or you can start writing now and create a PR against the repo once this goes through. My impression is that @sw1 is the lead on this portion, given his previous volunteering.

Timing: I expect to wrap up the text around EHRs tomorrow. I just obtained a paper that I had been struggling to get and that seems relevant, so I need some time to review it. I will also aim to discuss some of the imaging work, but that may be a separate PR.

cgreene · 2016-12-22T22:18:31Z

No reviewers directly named at this time - will name reviewers after I get to make some changes tomorrow.

sw1 · 2016-12-23T08:44:48Z

sections/03_categorize.md

+researcher time and cost required to develop specific solutions, but it may not
+lead to performance increases.
+
+TODO: survival analysis/readmission prediction methods from EHR/EMR style data


Any preference for how long? How detailed? For now:

Nevertheless, recent work has revealed one domain in which deep networks have proven superior to traditional methods. Survival analysis models the time leading to an event of interest from a shared starting point and, in the context of EHR data, often associates these events with subject covariates. Exploring this relationship is difficult, however, because EHR data types are heterogeneous, covariates are often missing, and conventional approaches require the covariate-event relationship to be linear and aligned to a specific starting point (@arXiv:1608.02158). Early approaches, such as the Faraggi-Simon feed-forward network, aimed to relax the linearity assumption, but performance gains were lacking (@doi:10.1016/S0167-9473(99)00098-5). Katzman et al. in turn developed a deep implementation of the Faraggi-Simon network that, in addition to outperforming Cox regression, was capable of comparing the risk between a given pair of treatments, thus potentially acting as a recommender system (@arXiv:1606.00931). To overcome the remaining difficulties, researchers have turned to deep exponential families, a class of latent generative models that can be constructed from any type of exponential family distribution (@arXiv:1411.2581v1). This led to a deep survival analysis model capable of overcoming the challenges posed by missing data and heterogeneous data types while uncovering nonlinear relationships between covariates and failure time. The authors showed that their model stratified patients by disease risk score more accurately than the current clinical implementation (@arXiv:1608.02158).
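For readers less familiar with the Cox setup: as I understand it, both the Faraggi-Simon network and the deep variant from Katzman et al. keep the Cox negative log partial likelihood as the objective and only replace the linear predictor (`x @ beta`) with a network output, which is how the linearity assumption gets relaxed. Below is a minimal NumPy sketch of that loss, assuming no tied event times (ties would need e.g. Breslow or Efron handling); the function name and toy data are hypothetical and not taken from either paper.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log Cox partial likelihood (assumes no tied event times).

    risk  : (n,) predicted log-risk scores, e.g. x @ beta for Cox regression
            or a neural network output for Faraggi-Simon-style models
    time  : (n,) observed times (event or censoring)
    event : (n,) 1 if the event was observed, 0 if censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # Sort by descending time so the risk set of position i is positions 0..i.
    order = np.argsort(-time, kind="stable")
    risk, event = risk[order], event[order]
    # Running log-sum-exp over each risk set, computed in a numerically stable way.
    log_risk_set = np.logaddexp.accumulate(risk)
    # Only observed events add terms; censored subjects appear only in risk sets.
    return -np.sum((risk - log_risk_set)[event])

# Toy check: giving the subject who fails first the higher risk lowers the loss.
t = np.array([1.0, 2.0])
e = np.array([1, 1])
print(cox_neg_log_partial_likelihood([2.0, 0.0], t, e))  # concordant ranking
print(cox_neg_log_partial_likelihood([0.0, 2.0], t, e))  # discordant ranking
```

Because the loss only depends on the ordering of risk scores within each risk set, any differentiable model can supply `risk`; that is the sense in which the deep versions generalize Cox regression.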

traversc · 2016-12-23

This sounds good. I have nothing more to add, except possibly mentioning other nonlinear methods. I can think of random forests used in survival analysis (@arXiv:0811.1645).

cgreene · 2016-12-23T21:21:54Z

Ok - given that @sw1 has some content to contribute this is probably a good time to remove the [WIP] tag and aim to merge. This is a bit more of a laundry list than I like, but once we get the pieces in place we should return for some deeper analysis. This has some placeholders for expected contributions.

agitter · 2016-12-27T13:55:19Z

Acknowledging the review request; I should be able to do it within 24 hours.

agitter

I have no major comments; the general direction looks good to me. I left some minor questions, but feel free to keep things as they are.

agitter · 2016-12-28T13:39:06Z

sections/06_discussion.md

@@ -137,3 +137,7 @@ interpretability problems, how can we best ensure reproducible models? What
 might a clinician, or policy maker, need to see in a deep model in order to
 influence healthcare decisions? Or, is deep learning a hypothesis generation
 machine that requires manual validation?*
+
+### Transfer learning/transferability of features


As you point out, transfer learning and multi-task learning have come up as themes in many papers and domains. Do you suggest we defer most of that discussion from the domain-specific sections and coalesce it here? Or introduce it in the specific sections and have a cross-domain recap here?

I guess I prefer to introduce it in the specific sections and give a cross-domain recap here. I think it's going to be key in bio for many near-future advances.

agitter · 2016-12-28T13:41:33Z

sections/03_categorize.md

@@ -19,7 +19,7 @@ tackle any of them? Are there example approaches whereby deep learning is
 already having a transformational impact? I (Casey) have added some sections
 below where I think we could contribute to the field with our discussion.*

-### Major challenges
+### Major Areas of Existing Contributions


Should we settle on title case or lower case for section and sub-section headings?

We should settle on one. I don't have a preference either way. What do you prefer? I suggest we make one quick PR to go through and match case as soon as we decide.

Went with sentence case.

agitter · 2016-12-28T13:42:59Z

sections/03_categorize.md

+all of this work, the researchers must work around a specific challenge - the
+limited number of well annotated training images. To expand the number and
+diversity of images, the researchers have employed approaches where they employ
+adversarial examples [@doi:10.1101/095786] or first train towards human-created


If we haven't introduced adversarial networks earlier in the manuscript, it may deserve more attention.

Definitely. These are adversarial examples, though: they take the same base image and apply perturbations to it. I think you're probably thinking of something more like this, which is a training example generated by an adversarial network:
https://arxiv.org/pdf/1612.07828v1.pdf

Got it. You're right, my mind jumped to Generative Adversarial Networks when I saw adversarial.
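To make the distinction concrete: an adversarial example keeps the base image and adds a small perturbation chosen to increase the model's loss, whereas a GAN generates new images from noise. The sketch below uses one common way to build such perturbations, the fast gradient sign method (FGSM), on a toy logistic-regression classifier in NumPy. This is an illustrative assumption, not necessarily the scheme used in [@doi:10.1101/095786]; `fgsm_perturb`, `epsilon`, and the random "image" are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(x, y, w, b):
    """Binary cross-entropy of a logistic-regression model on one example."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm_perturb(x, y, w, b, epsilon):
    """Fast gradient sign method for a logistic-regression classifier.

    Shifts every pixel of x by epsilon in the direction that increases the
    cross-entropy loss, then clips to the valid pixel range [0, 1].
    """
    p = sigmoid(x @ w + b)
    # For logistic regression with cross-entropy, d(loss)/dx = (p - y) * w.
    grad_x = (p - y) * w
    return np.clip(x + epsilon * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
w = rng.normal(size=64)              # toy weights for a flattened 8x8 image
b = 0.0
x = rng.uniform(0.2, 0.8, size=64)   # base image, kept away from the clip bounds
y = 1
x_adv = fgsm_perturb(x, y, w, b, epsilon=0.05)
print(bce(x, y, w, b), bce(x_adv, y, w, b))  # the perturbed copy has higher loss
```

Each perturbed copy keeps its original label, so it can be appended to the training set as an additional, deliberately hard example.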

agitter · 2016-12-28T13:44:26Z

sections/03_categorize.md

+limited number of well annotated training images. To expand the number and
+diversity of images, the researchers have employed approaches where they employ
+adversarial examples [@doi:10.1101/095786] or first train towards human-created
+features before subsequent fine tuning [@doi:10.1007/978-3-319-46723-8_13]. The


What does "train towards human-created features" mean? It sounds interesting. Is this initializing weights using prior knowledge? Training a network with manually-defined features first and then using those weights to initialize a different network?

There are some features that have been developed in the past for this task. They first perform supervised learning where the task is to regress output nodes (if I recall correctly) towards those features. Then, after that process, they fine tune specifically on their supervised examples. This is perhaps a more efficient way (in terms of # of examples) to build the network by pushing it towards intermediate features that are thought to be useful.
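The two-stage idea can be sketched roughly as follows, under stated assumptions: a one-hidden-layer NumPy network is first trained to regress a set of hand-crafted features, then its regression head is discarded and a new classification head is fine-tuned on the labels together with the pretrained hidden layer. The data, the stand-in `handcrafted_features` function, layer sizes, learning rate, and step counts are all hypothetical; this is not the architecture used in [@doi:10.1007/978-3-319-46723-8_13].

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 20, 16            # samples, input dimension, hidden units (toy sizes)
X = rng.normal(size=(n, d))      # stand-in for flattened image patches

def handcrafted_features(X):
    # Hypothetical stand-in for previously published, expert-designed features.
    raw = np.stack([X[:, :5].mean(1), X[:, 5:10].mean(1), (X[:, 10:] ** 2).mean(1)], axis=1)
    return (raw - raw.mean(0)) / raw.std(0)   # standardize each feature

F = handcrafted_features(X)
y = (F[:, 0] + F[:, 1] > 0).astype(float)     # toy binary labels
k, lr = F.shape[1], 0.2

# Shared hidden layer: these weights carry over from stage 1 into stage 2.
W1, b1 = rng.normal(scale=0.1, size=(d, h)), np.zeros(h)

def hidden(X):
    return np.tanh(X @ W1 + b1)

def backprop_hidden(dH, H):
    # Gradients of the loss w.r.t. W1 and b1, given dLoss/dH.
    dZ = dH * (1 - H ** 2)
    return X.T @ dZ, dZ.sum(0)

# Stage 1: regress the hand-crafted features (MSE) through a temporary linear head.
W2, b2 = rng.normal(scale=0.1, size=(h, k)), np.zeros(k)

def pretrain_loss():
    return np.mean((hidden(X) @ W2 + b2 - F) ** 2)

pre_before = pretrain_loss()
for _ in range(500):
    H = hidden(X)
    dP = 2 * (H @ W2 + b2 - F) / (n * k)
    gW1, gb1 = backprop_hidden(dP @ W2.T, H)   # uses W2 before it is updated
    W2 -= lr * (H.T @ dP)
    b2 -= lr * dP.sum(0)
    W1 -= lr * gW1
    b1 -= lr * gb1
pre_after = pretrain_loss()

# Stage 2: discard the regression head, attach a logistic head, fine-tune everything.
v, c = np.zeros(h), 0.0

def finetune_loss():
    p = 1 / (1 + np.exp(-(hidden(X) @ v + c)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

ft_before = finetune_loss()
for _ in range(1000):
    H = hidden(X)
    p = 1 / (1 + np.exp(-(H @ v + c)))
    ds = (p - y) / n
    gW1, gb1 = backprop_hidden(np.outer(ds, v), H)
    v -= lr * (H.T @ ds)
    c -= lr * ds.sum()
    W1 -= lr * gW1
    b1 -= lr * gb1
ft_after = finetune_loss()
print(pre_before, pre_after, ft_before, ft_after)
```

The design point is that stage 1 needs no labels beyond the feature values, so the hidden layer starts stage 2 already pushed towards intermediate representations thought to be useful.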

agitter · 2016-12-28T13:46:33Z

sections/03_categorize.md

+with deep learning approaches. In recent work, Wang et al.[@arxiv:1606.05718]
+analyzed stained slides to identify cancers within slides of lymph node slices.
+The approach provided a probability map for each slide. On this task a
+pathologist has about a 3% error rate. Their algorithm had about a 7% error


Do we need to be more precise or is this the terminology used in the paper? Does their error rate include false positives and false negatives?

I think the language in the paper was pretty loose. The pathologist had no false positives, so all of the pathologist's errors are false negatives. The algorithm had the normal FP/FN tradeoff. I think we may want to be less precise here. I'll make that revision now.
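Since a single "error rate" can hide very different false-positive/false-negative mixes, a small sketch may help when revising the wording: threshold a per-slide tumor probability into calls and tally each error type separately. Summarizing a slide's probability map by a single score, the 0.5 threshold, the function name `error_breakdown`, and the toy labels and scores are all hypothetical; none of these are the numbers reported in [@arxiv:1606.05718].

```python
import numpy as np

def error_breakdown(probs, labels, threshold=0.5):
    """Split overall error into false positives and false negatives.

    probs  : per-slide tumor probabilities (one summary score per slide)
    labels : 1 for slides containing tumor, 0 otherwise
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    calls = (probs >= threshold).astype(int)
    fp = int(np.sum((calls == 1) & (labels == 0)))
    fn = int(np.sum((calls == 0) & (labels == 1)))
    return {
        "error_rate": (fp + fn) / labels.size,
        "false_positives": fp,
        "false_negatives": fn,
        "fpr": fp / max(int(np.sum(labels == 0)), 1),
        "fnr": fn / max(int(np.sum(labels == 1)), 1),
    }

# Two hypothetical readers with the same overall error rate but different error types.
labels        = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
conservative  = [0.9, 0.8, 0.2, 0.7, 0.1, 0.1, 0.2, 0.1, 0.3, 0.1]  # one missed tumor
trigger_happy = [0.9, 0.8, 0.7, 0.7, 0.6, 0.1, 0.2, 0.1, 0.3, 0.1]  # one false alarm
print(error_breakdown(conservative, labels))
print(error_breakdown(trigger_happy, labels))
```

Both readers have the same overall error rate, yet one only misses tumors and the other only raises false alarms, which is why the comparison reads better when the error types are reported separately.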

agitter · 2016-12-28T13:48:55Z

sections/03_categorize.md

+evaluated were unigrams and bigrams. These are the counts for single words and
+two-word combinations in a free text document. They subset the full set of words
+and word combinations to the 400 most commonly used ones. The machine learning
+algorithms that they employed (naive bayes, logistic regression, and deep neural


"Naive Bayes" or "naive Bayes"?
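The feature construction described in that diff (unigram and bigram counts restricted to the 400 most commonly used ones) can be sketched with the standard library as below. The lowercasing, whitespace tokenizer, function names, and toy reports are assumptions for illustration; the paper's exact preprocessing may differ.

```python
from collections import Counter

def ngrams(text):
    """Unigrams and bigrams from a lowercased, whitespace-tokenized document."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def build_vocabulary(docs, max_features=400):
    """Keep the max_features most common n-grams across the whole corpus."""
    totals = Counter()
    for doc in docs:
        totals.update(ngrams(doc))
    return [gram for gram, _ in totals.most_common(max_features)]

def vectorize(doc, vocabulary):
    """Count vector for one document over the fixed vocabulary."""
    counts = Counter(ngrams(doc))
    return [counts[gram] for gram in vocabulary]

docs = [
    "tumor in left breast",
    "no tumor seen",
    "left breast tumor margin clear",
]
vocab = build_vocabulary(docs, max_features=5)   # 5 only because the toy corpus is tiny
vectors = [vectorize(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```

The resulting fixed-length count vectors are what a classifier such as naive Bayes, logistic regression, or a feed-forward network would consume. scikit-learn's `CountVectorizer(ngram_range=(1, 2), max_features=400)` provides similar behavior, though its default tokenizer differs from the whitespace split assumed here.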

agitter · 2016-12-28T13:50:17Z

sections/03_categorize.md

+
+##### Opportunities
+
+However, significant work needs to be done to move these from conceptual


@cgreene are you planning to write these placeholder sections or are you looking for help here?

At this point, bullets are probably good. Still have some more work to do on this but wanted to give @sw1 a chance to integrate his new text.

agitter · 2016-12-28T13:51:52Z

sections/03_categorize.md

+Additionally, unique barriers exist in this space that may hinder progress in
+this field.
+
+###### Data sharing and privacy?


@XieConnect are you planning to write something about this topic?

cgreene · 2016-12-28T18:23:53Z

Based on @agitter's review, I'm going to go ahead and merge this. This will allow @sw1 to have the token. @sw1 - please open a quick PR to add in your paragraph. Thanks!

cgreene added 4 commits December 21, 2016 12:39

some initial text

d382442

fill in a bit more, express some opinions

ff28a44

add @sw1 section stub

8a4bfc0

reflow -> 80c/line

339ba00

sw1 reviewed Dec 23, 2016

cgreene added 3 commits December 23, 2016 15:28

discuss mammography work greenelab#163 greenelab#164 greenelab#168 greenelab#169

53565ec

mention transfer learning point greenelab#139

2b87389

discuss slide analysis paper greenelab#102

6568f87

cgreene mentioned this pull request Dec 23, 2016

Deep Learning for Identifying Metastatic Breast Cancer #102

Closed

mention greenelab#71 which also exists and which @brettbj will provide notes on

fd68c6d

cgreene mentioned this pull request Dec 23, 2016

Multi-task Deep Neural Networks for Automated Extraction of Primary Site and Laterality Information from Cancer Pathology Reports #139

Closed

cgreene added 2 commits December 23, 2016 16:07

clean up a blank line that makes codeclimate angry

703fad7

discuss greenelab#158 which combines imaging + EHRs

a7440bf

cgreene requested review from agitter and brettbj December 23, 2016 21:22

cgreene changed the title ~~[WIP] Pattern Discovery / Categorize~~ Pattern Discovery / Categorize Dec 23, 2016

cgreene added 2 commits December 23, 2016 16:23

note additional interesting area currently not covered

4217c87

fix typo

2e13564

agitter requested changes Dec 28, 2016

cgreene added 2 commits December 28, 2016 13:00

capitalization, stub in some opportunities

11a9d28

try to clarify pathologist/algorithm comparison

765992b

cgreene mentioned this pull request Dec 28, 2016

Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration #158

Open

cgreene merged commit f878276 into greenelab:master Dec 28, 2016

cgreene deleted the ehr-pattern-discovery branch December 28, 2016 18:23

dhimmel pushed a commit to dhimmel/deep-review that referenced this pull request Nov 3, 2017

Merge pull request greenelab#167 from cgreene/ehr-pattern-discovery

95bdecf

Pattern Discovery / Categorize
