
Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders. #6

Closed
cgreene opened this issue Aug 5, 2016 · 15 comments


cgreene (Member) commented Aug 5, 2016

Paper needs to be read carefully for relevance.
http://dx.doi.org/10.1142/9789814644730_0014

cgreene (Member, Author) commented Aug 5, 2016

As an author of this paper, I am not the best person to review it. I would love to have at least one non-author pitch in. Maybe @hussius, since I know you wrote a blog post about it.

Biology:
What patterns do denoising autoencoders identify in gene expression data from breast cancer biopsies? The paper primarily uses these data as a proof of concept for the method's robustness.

Computational Methods:

  • Single-layer denoising autoencoder (a minimal sketch follows this list)
  • Implemented in Theano; trained on the large METABRIC dataset
  • Model, data, etc. all available: http://discovery.dartmouth.edu/~cgreene/da-psb2015/
  • No source code
  • Subsequent evaluation of the unsupervised model using well-known features (tumor/normal status, breast cancer subtypes, transcription factors via ENCODE, pathway information)
  • All annotated features are then evaluated on independent TCGA data
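
For concreteness, here is a minimal numpy sketch of a single-layer, tied-weight denoising autoencoder in the spirit of the paper (the paper itself used Theano; the dimensions, corruption level, and training schedule below are placeholder assumptions, not the published settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Placeholder stand-in for a zero-one scaled expression matrix
# (samples x genes); the paper trained on METABRIC.
n_samples, n_genes, n_hidden = 200, 500, 100
X = rng.random((n_samples, n_genes))

# Tied-weight parameters: one W shared by encoder and decoder.
W = rng.normal(0.0, 0.01, size=(n_genes, n_hidden))
b_h = np.zeros(n_hidden)  # hidden (encoder) bias
b_v = np.zeros(n_genes)   # visible (decoder) bias

lr, corruption, epochs, batch = 0.1, 0.2, 20, 20
for epoch in range(epochs):
    for start in range(0, n_samples, batch):
        x = X[start:start + batch]
        # Masking noise: zero out a random fraction of the inputs.
        x_tilde = x * (rng.random(x.shape) > corruption)
        h = sigmoid(x_tilde @ W + b_h)   # encode the corrupted input
        z = sigmoid(h @ W.T + b_v)       # decode with tied weights
        # Cross-entropy loss against the *uncorrupted* input; with
        # sigmoid outputs, the pre-activation gradient is z - x.
        dz = z - x
        dh = (dz @ W) * h * (1.0 - h)    # backprop into the encoder
        gW = x_tilde.T @ dh + dz.T @ h   # encoder + decoder parts of dL/dW
        W -= lr * gW / len(x)
        b_h -= lr * dh.mean(axis=0)
        b_v -= lr * dz.mean(axis=0)
```

With tied weights, each hidden node corresponds to one column of W, which is what makes this "shallow" architecture directly inspectable by a biologist.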

Results:

  • Finds that known features (e.g., subtype) are represented in the DA model.
  • Interestingly (though perhaps not surprisingly for unsupervised methods), the features show no signs of overfitting on the independent TCGA dataset.
  • Suggests that these methods may be robust to dataset effects and that feature construction may help overcome cross-dataset issues.

This is an example of an unsupervised method and an analysis of transcriptional regulation, with potentially some discussion of pathway activities relevant to the review.

hussius commented Aug 5, 2016

I can give it a shot.

michaelmhoffman (Contributor) commented

Does a single-layer model fit into a review on "deep" learning?

cgreene (Member, Author) commented Aug 9, 2016

@michaelmhoffman: I am not necessarily a proponent of building models that are sufficiently complex to trigger some arbitrary "deep" nomenclature. I'm most interested in methods that include strong data-driven feature construction work; within the scope of this review, probably with the "with neural networks" constraint added.

The thing I like about true deep architectures is that feature construction gets baked into the learning algorithm. The thing I like about this "shallow learning" architecture is that a biologist can take a look at it and interpret the features.

I guess I'd say, personally, that if it passes the threshold of data-driven feature construction with neural networks, then it's the type of research that I think will be primed for data-intensive discoveries.

akundaje (Contributor) commented Aug 9, 2016

@cgreene Fully agree with you. My only caution, which I think is again not stressed enough in current reviews, is that interpretation of even a single-layer model should be done very cautiously. Neural nets learn distributed representations, and even though individual neurons/filters may appear interpretable, they should not be overinterpreted as "this filter is a CTCF motif," as some papers do. There are often many filters that collectively capture a single predictive pattern, such as a motif. There are ways to re-derive these. Looking at filters for an intuitive feel of what the network has learned is great. Using individual filters outside of the network is dangerous and wrong, IMHO.
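
To make that concrete, here is a hypothetical numpy sketch (random data and made-up names, not from any paper): ranking one node's top-weight genes is fine for intuition, but a quick check of how its weight vector overlaps with other nodes' shows why one filter shouldn't be read as "the" pattern.

```python
import numpy as np

# Hypothetical trained weights: W is (n_genes x n_hidden) from a
# single-layer model; gene_names is the matching gene list.
rng = np.random.default_rng(0)
n_genes, n_hidden = 500, 100
W = rng.normal(size=(n_genes, n_hidden))
gene_names = np.array([f"gene_{i}" for i in range(n_genes)])

node = 7  # an apparently "interpretable" hidden node
top = np.argsort(-np.abs(W[:, node]))[:20]
print(gene_names[top])  # fine for an intuitive feel of this node

# The caution: a single predictive pattern is often spread across
# several nodes. Strongly correlated weight vectors are one hint.
corr = np.corrcoef(W.T)  # node-by-node correlation of weight vectors
partners = np.argsort(-np.abs(corr[node]))[1:6]
print(partners)  # other nodes that partially capture the same pattern
```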

On a side note, sorry if I'm being negatively critical of too many things :). I just feel that the use of deep nets in compbio is still in its infancy, and if we can avoid propagating suboptimal practices through this review and our papers, we should.

cgreene (Member, Author) commented Aug 9, 2016

@akundaje: It is definitely important not to over-interpret outside the context of the network. No problem, and I totally agree on the infancy point. I think we need people to take the optimistic and pessimistic sides on many topics if we want to put together a solid perspective.

hussius commented Aug 9, 2016

@akundaje Personally, I think it's great that you are critical; I am learning a lot here. Perhaps one of the "features" of this review, as @cgreene implies, could be a more balanced/objective perspective. The existing reviews, while good, seem to downplay the problems.

cgreene (Member, Author) commented Aug 9, 2016

Yes! If the answer to our question is that the things that would need to be true for this to be a disruptive approach are implausible, then I think that would be a particularly unique contribution!


cgreene (Member, Author) commented Oct 14, 2016

I've labeled this paper for the 'study' component. It's not receiving more discussion at this point, so I've closed it. We're now keeping papers 'open' only while they are under active discussion.

@akundaje - maybe you could contribute a paragraph to the study section on the hazards of over-interpretation? I agree with @hussius that this is an important topic in the field.

cgreene closed this as completed Oct 14, 2016
akundaje (Contributor) commented

Sure. I can help with that next week.


cgreene (Member, Author) commented Oct 14, 2016

Awesome! @agitter: when you stub in the "study" section, can you make sure there's a spot for interpretation of these models? We may instead end up putting it in our concluding/general thoughts, but that seems like a good home for now.

agitter (Collaborator) commented Oct 18, 2016

@cgreene Sure, I can include an interpretation subsection in 'study' for now. Soon we should have a better idea of whether all of the meta-commentary (interpretation, evaluation, pitfalls, etc.) fits in the study/treat/categorize sections or warrants a separate discussion section.

rezahay commented Mar 18, 2018

Dear Casey,
I just read your excellent paper. Would you please elaborate a bit more on linking features to the sample characteristics? ("We evaluated the balanced accuracy for each node at each threshold to predict the desired sample characteristic.") I am wondering how you calculated the balanced accuracy based on the activity values at the thresholds.
Thanks in advance,
Reza

cgreene (Member, Author) commented Mar 19, 2018

Hi @rezahay: If I recall correctly, Jie defined the specified thresholds. Then she determined, at each threshold, what the balanced accuracy would have been if that threshold were the cut-point for a classifier. The same node + threshold then gets tested on the independent test dataset.
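
Roughly, in numpy terms (a sketch of the idea as I described it, not Jie's actual code; the names are mine):

```python
import numpy as np

def balanced_accuracy(activities, labels, threshold):
    """Balanced accuracy of the one-node classifier that calls a sample
    positive when the node's activity exceeds `threshold`.

    activities: per-sample activity of one hidden node (1-D array)
    labels: 0/1 sample characteristic (e.g., tumor vs. normal)
    """
    pred = activities > threshold
    sensitivity = pred[labels == 1].mean()     # true positive rate
    specificity = (~pred)[labels == 0].mean()  # true negative rate
    return 0.5 * (sensitivity + specificity)

# Evaluate the node + threshold on training data, then apply the same
# pair unchanged to the independent test dataset (TCGA in the paper).
```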

rezahay commented Mar 20, 2018

Dear Casey,
Thanks a lot for your response.
