
Added PPI section with MHC subsection #638

Merged
merged 9 commits into greenelab:master from zietzm:ppi_section on Jan 4, 2018

Conversation

@zietzm
Member

commented Aug 10, 2017

References #575 and includes MHC-peptide papers

Added a section on Protein-Protein Interactions (PPI) with a subsection on MHC-peptide binding prediction.

@agitter mentioned PPI networks as a possible area of interest in #575, but I have left that topic out here to avoid adding too much to an already long section. If a PPI network subsection is still desired, I would be more than happy to add one, but I understand the need to minimize the additional length added to this paper.

@cgreene cgreene referenced this pull request Nov 3, 2017
@agapow

agapow approved these changes Nov 8, 2017

Contributor

left a comment

Looks good - only minor suggestions. My one fear is that this is fairly technically detailed, perhaps more so than a lot of the rest of the ms.

However, because many PPIs are transient or dependent on biological context, high-throughput methods can fail to capture a number of interactions.
Additionally, common types of high-throughput screens for PPIs, such as the yeast two-hybrid, can have issues with high rates of false positive results [@doi:10.1186/s12964-015-0116-8 @doi:10.1002/pmic.200800150].

This section will focus on advances in *de novo* PPI prediction.

@agapow

agapow Nov 8, 2017

Contributor

Might benefit from a more explicit linking statement of the need for PPI prediction and thus DL.

Beyond predicting whether or not two proteins interact, Du et al. [@doi:10.1016/j.ymeth.2016.06.001] showed that a tandem stacked-autoencoder/deep-neural-network method could be used to predict residue contacts for the interfacial regions of interacting proteins.
A combination of a hidden Markov model with Fisher scores yielded uniform-length features for each residue. Their method significantly exceeded classical machine learning accuracy.
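To make the tandem stacked-autoencoder/deep-neural-network idea concrete, here is a minimal sketch of the general pattern only: unsupervised autoencoder pretraining, then reusing the encoder as the front end of a supervised contact classifier. The layer sizes and placeholder inputs are assumptions, and the HMM/Fisher-score feature step is not shown, so this is not Du et al.'s exact model.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Single-layer autoencoder; assumed 400-dimensional residue-pair features."""
    def __init__(self, n_in=400, n_hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def pretrain(ae, x, epochs=10, lr=1e-3):
    """Unsupervised reconstruction pretraining."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(ae(x), x).backward()
        opt.step()

x_unlabeled = torch.randn(1024, 400)   # placeholder residue-pair feature vectors
ae = AutoEncoder()
pretrain(ae, x_unlabeled)

# Stack a supervised head on the pretrained encoder to classify contact vs.
# non-contact for a residue pair; this would be fine-tuned with labeled pairs.
contact_classifier = nn.Sequential(ae.encoder, nn.Linear(128, 2))
```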

Because many studies used predefined higher-level features, one of the benefits of deep learning— automatic feature extraction— is not fully leveraged.

@agapow

agapow Nov 8, 2017

Contributor

space before emdash

Because MHCnuggets had to be trained for every MHC allele, performance was far better for alleles with abundant, balanced training data.

In a comparison of several current methods, Bhattacharya et al. found that the top methods— NetMHC, NetMHCpan, MHCflurry, and MHCnuggets— showed comparable performance, but large differences in speed.
In the authors analysis, convolutional neural networks (in this case, HLA-CNN) showed comparatively poor performance, while shallow and recurrent neural networks performed the best.

@agapow

agapow Nov 8, 2017

Contributor

Delete "in the authors analysis" as unnecessary

@zietzm zietzm force-pushed the zietzm:ppi_section branch from 25552b0 to aca3eb8 Nov 8, 2017

@agitter agitter referenced this pull request Nov 9, 2017

@agitter agitter added this to the journal-revisions milestone Nov 17, 2017

@cgreene

Member

commented Dec 18, 2017

@zietzm were you planning to make modifications as discussed in #689? Wondering if we should wait for that before reviewing this PR.

@zietzm

Member Author

commented Dec 18, 2017

@cgreene I had planned to incorporate those changes here. Sorry for the delay in getting those updates ready. I want to get the section finished ASAP, and I hope to push some changes by the beginning of next week.

@cgreene

Member

commented Dec 18, 2017

@zietzm 👍 will wait to review further until then

@agitter agitter added the study label Dec 19, 2017

@agitter
Collaborator

left a comment

Thanks for the great contributions. I have several suggestions, and my main comment is to think about how to summarize the many MHC methods. In some places I tried trimming text that isn't critical.

@cgreene do you think we need to further shorten this section? I think that if we can condense some of the MHC paragraphs we'll be okay.

@@ -424,6 +424,92 @@ summarized above also apply to interfacial contact prediction for protein
complexes but may be less effective since on average protein complexes have
fewer sequence homologs.

### Protein-Protein Interactions

Protein-protein interactions (PPIs) are highly specific and non-accidental physical contacts between proteins which occur for purposes other than generic protein production or degradation [@doi:10.1371/journal.pcbi.1000807].

@agitter

agitter Dec 30, 2017

Collaborator

Comma before which

@@ -424,6 +424,92 @@ summarized above also apply to interfacial contact prediction for protein
complexes but may be less effective since on average protein complexes have
fewer sequence homologs.

### Protein-Protein Interactions

@agitter

agitter Dec 30, 2017

Collaborator

Capitalize only the first Protein

### Protein-Protein Interactions

Protein-protein interactions (PPIs) are highly specific and non-accidental physical contacts between proteins which occur for purposes other than generic protein production or degradation [@doi:10.1371/journal.pcbi.1000807].
PPIs are key to many cellular processes like metabolism and immune responses.

@agitter

agitter Dec 30, 2017

Collaborator

PPIs are involved in almost all cellular processes. Perhaps we could cut this line? I'm looking for places to shorten the text.

PPIs are key to many cellular processes like metabolism and immune responses.
Abundant interaction data have been generated in-part thanks to advances in high-throughput screening methods, such as yeast two-hybrid and affinity-purification with mass spectrometry.
However, because many PPIs are transient or dependent on biological context, high-throughput methods can fail to capture a number of interactions.
Additionally, common types of high-throughput screens for PPIs, such as the yeast two-hybrid, can have issues with high rates of false positive results [@doi:10.1186/s12964-015-0116-8 @doi:10.1002/pmic.200800150].

@agitter

agitter Dec 30, 2017

Collaborator

If we keep this line, the new manubot style requires ; between references.

Protein-protein interactions (PPIs) are highly specific and non-accidental physical contacts between proteins which occur for purposes other than generic protein production or degradation [@doi:10.1371/journal.pcbi.1000807].
PPIs are key to many cellular processes like metabolism and immune responses.
Abundant interaction data have been generated in-part thanks to advances in high-throughput screening methods, such as yeast two-hybrid and affinity-purification with mass spectrometry.
However, because many PPIs are transient or dependent on biological context, high-throughput methods can fail to capture a number of interactions.

@agitter

agitter Dec 30, 2017

Collaborator

This sentence alone might be enough to motivate the need for PPI prediction. Then you could cut the line about false positive rates, because a reader might wonder whether computational predictions really have lower false positive rates than Y2H.

A way of working with different network types was shown by Gligorijevic et al., [@doi:10.1101/223339] who developed a multimodal deep autoencoder, deepNF, to find a feature representation common among several different PPI networks.
This common lower-level representation allows for the combination of various PPI data sources towards a single predictive task.
An SVM classifier trained on the compressed features from the middle layer of the autoencoder outperformed previous methods in predicting protein function.
The key advancement of this method is the use of deep learning to incorporate higher-order network information for protein function prediction.
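As a minimal sketch of the multimodal autoencoder idea, the snippet below gives each network view its own encoder and decoder around a shared middle layer, whose activations would then be handed to a separate classifier such as an SVM. The layer sizes, number of views, and random inputs are assumptions rather than deepNF's exact architecture.

```python
import torch
import torch.nn as nn

class MultimodalAE(nn.Module):
    """View-specific encoders/decoders joined by a shared middle layer."""
    def __init__(self, view_dims=(500, 500, 500), shared_dim=128):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Linear(d, 256) for d in view_dims])
        self.shared = nn.Linear(256 * len(view_dims), shared_dim)
        self.unshared = nn.Linear(shared_dim, 256 * len(view_dims))
        self.decoders = nn.ModuleList([nn.Linear(256, d) for d in view_dims])

    def forward(self, views):
        encoded = [torch.relu(enc(v)) for enc, v in zip(self.encoders, views)]
        z = torch.relu(self.shared(torch.cat(encoded, dim=1)))   # common representation
        h = torch.relu(self.unshared(z)).chunk(len(self.decoders), dim=1)
        recon = [dec(hi) for dec, hi in zip(self.decoders, h)]
        return z, recon

model = MultimodalAE()
views = [torch.rand(32, 500) for _ in range(3)]  # e.g. rows of network adjacency/diffusion matrices
z, recon = model(views)                          # z is what a downstream SVM would be trained on
```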

@agitter

agitter Dec 30, 2017

Collaborator

This might already be clear enough from the rest of the paragraph. You could cut it.

The key advancement of this method is the use of deep learning to incorporate higher-order network information for protein function prediction.

Hamilton et al. addressed the issue of large, heterogeneous, and changing networks with an inductive approach called GraphSAGE [@arxiv:1706.02216v2].
By finding node embeddings through learned aggregator functions which describe the node and its neighbors in the network, the GraphSAGE approach allows for the generalization of the model to unknown nodes.
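As an illustration of the learned aggregator functions mentioned above, here is a minimal sketch of a GraphSAGE-style mean aggregator: a node's embedding is built from its own features plus the mean of its sampled neighbors' features, which is what lets a trained model embed nodes it never saw during training. The dimensions, number of sampled neighbors, and random inputs are assumptions.

```python
import torch
import torch.nn as nn

class MeanAggregator(nn.Module):
    """GraphSAGE-style mean aggregator: concatenate self features with the
    mean of neighbor features, apply a learned linear map, and normalize."""
    def __init__(self, in_dim=64, out_dim=32):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, node_feats, neighbor_feats):
        # node_feats: (n_nodes, in_dim); neighbor_feats: (n_nodes, n_sampled, in_dim)
        neigh_mean = neighbor_feats.mean(dim=1)
        h = torch.relu(self.linear(torch.cat([node_feats, neigh_mean], dim=1)))
        return nn.functional.normalize(h, p=2, dim=1)  # L2-normalize the embeddings

agg = MeanAggregator()
h_nodes = torch.randn(10, 64)      # features for 10 nodes (possibly unseen at training time)
h_neigh = torch.randn(10, 5, 64)   # 5 sampled neighbors per node
embeddings = agg(h_nodes, h_neigh) # shape (10, 32)
```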

@agitter

agitter Dec 30, 2017

Collaborator

Change which to that.


Hamilton et al. addressed the issue of large, heterogeneous, and changing networks with an inductive approach called GraphSAGE [@arxiv:1706.02216v2].
By finding node embeddings through learned aggregator functions which describe the node and its neighbors in the network, the GraphSAGE approach allows for the generalization of the model to unknown nodes.
Generalization to unseen nodes is especially useful for PPI networks, as these networks represent various types of interactions between proteins in a variety of species, and they can be updated frequently.

@agitter

agitter Dec 30, 2017

Collaborator

I don't think I'm following this. Isn't an unseen node a new protein in a PPI network? Do we encounter new proteins? Or is the idea that a trained model generalizes to new graphs?

Hamilton et al. addressed the issue of large, heterogeneous, and changing networks with an inductive approach called GraphSAGE [@arxiv:1706.02216v2].
By finding node embeddings through learned aggregator functions which describe the node and its neighbors in the network, the GraphSAGE approach allows for the generalization of the model to unknown nodes.
Generalization to unseen nodes is especially useful for PPI networks, as these networks represent various types of interactions between proteins in a variety of species, and they can be updated frequently.
In a classification task for the prediction of protein function, Chen and Zhu [@arxiv:1710.10568v1] optimized this approach and enhanced the graph convolutional network with a preprocessing step to improve significantly both training time and prediction accuracy.

@agitter

agitter Dec 30, 2017

Collaborator

What is the preprocessing step?

They found that MHCnuggets — the recurrent neural network — was by far the fastest training among the top performing methods.
In predicting interactions between proteins, deep learning has achieved state-of-the-art results and shows promise to overcome previous challenges in the field.

### PPI networks and graph analysis

@agitter

agitter Dec 30, 2017

Collaborator

We also discuss graph convolutions in the drug discovery section and could link those topics. We have a sentence "Modern neural networks can operate directly on the molecular graph as input." that could be changed to "Modern neural networks, such as those discussed previously for PPI networks, can operate directly on the molecular graph as input."

@agitter
Collaborator

left a comment

Thanks, these are excellent revisions and address all of my initial comments. The only remaining items to resolve before merging are:

  • two minor commas noted here
  • decide what you'd like to do with the #### header
  • resolve conflicts with master

Shallow, feed-forward neural networks are competitive methods and have made progress toward pan-allele and pan-length peptide representations.
Sequence alignment techniques are useful for representing variable-length peptides as uniform-length features [@doi:10.1110/ps.0239403; @doi:10.1093/bioinformatics/btv639].
For pan-allelic prediction, NetMHCpan [@doi:10.1007/s00251-008-0341-z; @doi:10.1186/s13073-016-0288-x] used a pseudo-sequence representation of the MHC class I molecule which included only polymorphic peptide contact residues.
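As a simplified illustration of turning a variable-length peptide plus an MHC pseudo-sequence into a uniform-length input, here is a sketch using one-hot encoding with padding. Real methods such as NetMHCpan use alignment-based insertion/deletion encodings and BLOSUM-derived features; the lengths and the example pseudo-sequence below are placeholders, not actual NetMHCpan parameters.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq, length):
    """One-hot encode `seq`, padded (or trimmed) to `length` positions."""
    mat = np.zeros((length, len(AMINO_ACIDS)))
    for i, aa in enumerate(seq[:length]):
        mat[i, AA_INDEX[aa]] = 1.0
    return mat

def encode_pair(peptide, mhc_pseudo_seq, pep_len=11, pseudo_len=34):
    """Concatenate peptide and MHC pseudo-sequence encodings into one fixed-length vector."""
    pep = one_hot(peptide, pep_len)
    mhc = one_hot(mhc_pseudo_seq, pseudo_len)
    return np.concatenate([pep.ravel(), mhc.ravel()])

# Placeholder peptide and 34-residue pseudo-sequence, for illustration only.
x = encode_pair("SIINFEKL", "YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY")
print(x.shape)  # (900,) with the assumed lengths: (11 + 34) * 20
```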

@agitter

agitter Jan 2, 2018

Collaborator

Comma before which

MHCflurry's imputation method increases its performance on poorly characterized alleles, making it competitive with NetMHCpan for this task.
Kuksa et al. [@doi:10.1093/bioinformatics/btv371] developed a shallow, higher-order neural network (HONN) comprised of both mean and covariance hidden units to capture some of the higher-order dependencies between amino acid locations.
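As a generic illustration of the imputation idea (filling in unmeasured peptide-allele affinities so that poorly characterized alleles still contribute training signal), here is a sketch of iterative low-rank matrix completion. The random data, rank, and algorithm are assumptions for the sketch, not MHCflurry's actual imputation pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
affinity = rng.random((50, 20))          # peptides x alleles, normalized affinities
mask = rng.random(affinity.shape) < 0.3  # True where the measurement is missing
observed = np.where(mask, np.nan, affinity)

def impute_low_rank(X, rank=3, n_iter=50):
    """Iteratively fill missing entries with a rank-`rank` SVD approximation."""
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X), X)  # initialize missing entries with the mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        filled[missing] = low_rank[missing]       # only overwrite the missing entries
    return filled

completed = impute_low_rank(observed)  # dense matrix usable for training
```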

@agitter

agitter Jan 2, 2018

Collaborator

Nice improvement, the HONN makes sense now.


An important challenge in PPI network prediction is the task of combining different networks and types of networks.
A way of working with different network types was shown by Gligorijevic et al., [@doi:10.1101/223339] who developed a multimodal deep autoencoder, deepNF, to find a feature representation common among several different PPI networks.
Gligorijevic et al., [@doi:10.1101/223339] developed a multimodal deep autoencoder, deepNF, to find a feature representation common among several different PPI networks.

@agitter

agitter Jan 2, 2018

Collaborator

Can remove the comma after et al.

zietzm added some commits Jan 4, 2018

Re-add PPI section reference
", such as those discussed previously for PPI networks,"
@zietzm

Member Author

commented Jan 4, 2018

@agitter thanks for your help on these sections! I think my last few commits should now have the PR ready.

@agitter

agitter approved these changes Jan 4, 2018

@agitter agitter merged commit 75f0dc2 into greenelab:master Jan 4, 2018

2 checks passed

codeclimate All good!
continuous-integration/travis-ci/pr The Travis CI build passed
@agitter agitter referenced this pull request Jan 4, 2018

dhimmel pushed a commit that referenced this pull request Jan 4, 2018

Added PPI section with MHC subsection (#638)
This build is based on 75f0dc2.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/deep-review/builds/325010006
https://travis-ci.org/greenelab/deep-review/jobs/325010007

[ci skip]

The full commit message that triggered this build is copied below:

Added PPI section with MHC subsection (#638)

* Added PPI and MHC sections

* Updates to PPI/MHC subsection

* PPI network section

* Updates to all PPI sections

* Commas and header

* Remove accidental newline

* Re-add PPI section reference

", such as those discussed previously for PPI networks,"
