Added PPI section with MHC subsection #638

Merged 9 commits on Jan 4, 2018

Conversation

@zietzm (Contributor) commented Aug 10, 2017

References #575 and includes MHC-peptide papers

Added a section on Protein-Protein Interactions (PPI) with a subsection on MHC-peptide binding prediction.

@agitter mentioned PPI networks as a possible area of interest in #575, but this has been neglected here for the sake of not adding too much to an already long section. If a PPI network subsection is still desired, I would be more than happy to add one, but I understand the necessity to minimize additional length being added to this paper.

@cgreene cgreene mentioned this pull request Nov 3, 2017
17 tasks
@agapow (Contributor) approved these changes Nov 8, 2017 and left a comment

Looks good - only minor suggestions. My one fear is that this is fairly technically detailed, perhaps more so than a lot of the rest of the ms.

However, because many PPIs are transient or dependent on biological context, high-throughput methods can fail to capture a number of interactions.
Additionally, common types of high-throughput screens for PPIs, such as the yeast two-hybrid, can have issues with high rates of false positive results [@doi:10.1186/s12964-015-0116-8 @doi:10.1002/pmic.200800150].

This section will focus on advances in *de novo* PPI prediction.

@agapow (Contributor) commented Nov 8, 2017

Might benefit from a more explicit linking statement of the need for PPI prediction and thus DL.

Beyond predicting whether or not two proteins interact, Du et al. [@doi:10.1016/j.ymeth.2016.06.001] showed that a tandem stacked-autoencoder/deep-neural-network method could be used to predict residue contacts for the interfacial regions of interacting proteins.
A combination of a hidden Markov model with Fisher scores yielded uniform-length features for each residue. Their method significantly exceeded classical machine learning accuracy.

Because many studies used predefined higher-level features, one of the benefits of deep learning— automatic feature extraction— is not fully leveraged.

@agapow (Contributor) commented Nov 8, 2017

space before emdash

Because MHCnuggets had to be trained for every MHC allele, performance was far better for alleles with abundant, balanced training data.

In a comparison of several current methods, Bhattacharya et al. found that the top methods— NetMHC, NetMHCpan, MHCflurry, and MHCnuggets— showed comparable performance, but large differences in speed.
In the authors analysis, convolutional neural networks (in this case, HLA-CNN) showed comparatively poor performance, while shallow and recurrent neural networks performed the best.

@agapow (Contributor) commented Nov 8, 2017

Delete "in the authors analysis" as unnecessary

@agitter agitter mentioned this pull request Nov 9, 2017
@agitter agitter added this to the journal-revisions milestone Nov 17, 2017
@cgreene (Member) commented Dec 18, 2017

@zietzm were you planning to make modifications as discussed in #689? Wondering if we should wait for that before reviewing this PR.

@zietzm (Contributor, Author) commented Dec 18, 2017

@cgreene I had planned to incorporate those changes here. Sorry for the delay in getting those updates ready. I want to get the section finished ASAP, and I hope to push some changes by the beginning of next week.

@cgreene (Member) commented Dec 18, 2017

@zietzm 👍 will wait to review further until then

@agitter agitter added the study label Dec 19, 2017

@agitter (Collaborator) left a comment

Thanks for the great contributions. I have several suggestions, and my main comment is to think about how to summarize the many MHC methods. In some places I tried trimming text that isn't critical.

@cgreene do you think we need to further shorten this section? I think that if we can condense some of the MHC paragraphs we'll be okay.

@@ -424,6 +424,92 @@ summarized above also apply to interfacial contact prediction for protein
complexes but may be less effective since on average protein complexes have
fewer sequence homologs.

### Protein-Protein Interactions

Protein-protein interactions (PPIs) are highly specific and non-accidental physical contacts between proteins which occur for purposes other than generic protein production or degradation [@doi:10.1371/journal.pcbi.1000807].

@agitter (Collaborator) commented Dec 30, 2017

Comma before which

@@ -424,6 +424,92 @@ summarized above also apply to interfacial contact prediction for protein
complexes but may be less effective since on average protein complexes have
fewer sequence homologs.

### Protein-Protein Interactions

@agitter (Collaborator) commented Dec 30, 2017

Capitalize only the first Protein

### Protein-Protein Interactions

Protein-protein interactions (PPIs) are highly specific and non-accidental physical contacts between proteins which occur for purposes other than generic protein production or degradation [@doi:10.1371/journal.pcbi.1000807].
PPIs are key to many cellular processes like metabolism and immune responses.

@agitter (Collaborator) commented Dec 30, 2017

PPIs are involved in almost all cellular processes. Perhaps we could cut this line? I'm looking for places to shorten the text.

PPIs are key to many cellular processes like metabolism and immune responses.
Abundant interaction data have been generated in-part thanks to advances in high-throughput screening methods, such as yeast two-hybrid and affinity-purification with mass spectrometry.
However, because many PPIs are transient or dependent on biological context, high-throughput methods can fail to capture a number of interactions.
Additionally, common types of high-throughput screens for PPIs, such as the yeast two-hybrid, can have issues with high rates of false positive results [@doi:10.1186/s12964-015-0116-8 @doi:10.1002/pmic.200800150].

@agitter (Collaborator) commented Dec 30, 2017

If we keep this line, the new manubot style requires ; between references.
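
The separator change requested here can be sketched as a small text transformation. This is only an illustration, assuming citations appear as whitespace-separated `@`-prefixed keys inside one pair of square brackets, as in the line under review. The function name `add_citation_separators` is hypothetical and is not part of Manubot.

```python
import re

# A bracketed group whose content starts with an "@"-prefixed citation key.
CITATION_GROUP = re.compile(r"\[(@[^\[\]]+)\]")


def add_citation_separators(text: str) -> str:
    """Join the citation keys inside each bracket group with "; "."""

    def fix_group(match: re.Match) -> str:
        # Split on whitespace and/or existing semicolons, so running the
        # function on already-converted text leaves it unchanged.
        keys = [key for key in re.split(r"[;\s]+", match.group(1)) if key]
        return "[" + "; ".join(keys) + "]"

    return CITATION_GROUP.sub(fix_group, text)


line = "false positive results [@doi:10.1186/s12964-015-0116-8 @doi:10.1002/pmic.200800150]."
print(add_citation_separators(line))
```

The sketch assumes every token inside the brackets is a citation key; a bracket group that mixes keys with ordinary words would need a stricter pattern.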

Protein-protein interactions (PPIs) are highly specific and non-accidental physical contacts between proteins which occur for purposes other than generic protein production or degradation [@doi:10.1371/journal.pcbi.1000807].
PPIs are key to many cellular processes like metabolism and immune responses.
Abundant interaction data have been generated in-part thanks to advances in high-throughput screening methods, such as yeast two-hybrid and affinity-purification with mass spectrometry.
However, because many PPIs are transient or dependent on biological context, high-throughput methods can fail to capture a number of interactions.

@agitter (Collaborator) commented Dec 30, 2017

This sentence alone might be enough to motivate the need for PPI prediction. Then you could cut the line about false positive rates, because a reader might wonder whether computational predictions really have lower false positive rates than Y2H.

A way of working with different network types was shown by Gligorijevic et al., [@doi:10.1101/223339] who developed a multimodal deep autoencoder, deepNF, to find a feature representation common among several different PPI networks.
This common lower-level representation allows for the combination of various PPI data sources towards a single predictive task.
An SVM classifier trained on the compressed features from the middle layer of the autoencoder outperformed previous methods in predicting protein function.
The key advancement of this method is the use of deep learning to incorporate higher-order network information for protein function prediction.

@agitter (Collaborator) commented Dec 30, 2017

This might already be clear enough from the rest of the paragraph. You could cut it.

The key advancement of this method is the use of deep learning to incorporate higher-order network information for protein function prediction.

Hamilton et al. addressed the issue of large, heterogeneous, and changing networks with an inductive approach called GraphSAGE [@arxiv:1706.02216v2].
By finding node embeddings through learned aggregator functions which describe the node and its neighbors in the network, the GraphSAGE approach allows for the generalization of the model to unknown nodes.

@agitter (Collaborator) commented Dec 30, 2017

Change which to that.


Hamilton et al. addressed the issue of large, heterogeneous, and changing networks with an inductive approach called GraphSAGE [@arxiv:1706.02216v2].
By finding node embeddings through learned aggregator functions which describe the node and its neighbors in the network, the GraphSAGE approach allows for the generalization of the model to unknown nodes.
Generalization to unseen nodes is especially useful for PPI networks, as these networks represent various types of interactions between proteins in a variety of species, and they can be updated frequently.
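
To make the "learned aggregator" idea in the excerpt above concrete, here is a minimal sketch of one mean-aggregator layer in the style of GraphSAGE, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the names `matvec` and `mean_aggregate_embedding` are hypothetical, the weight matrices are fixed stand-ins for parameters that would normally be learned, and using separate self and neighbor weights is equivalent to applying one weight matrix to the concatenated vectors.

```python
import math


def matvec(vec, weights):
    """Multiply a length-d vector by a d x k matrix given as a list of rows."""
    n_out = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(n_out)]


def mean_aggregate_embedding(node, features, neighbors, w_self, w_neigh):
    """Embed one node from its own features and the mean of its neighbors' features."""
    dim = len(features[node])
    neigh_ids = neighbors.get(node, [])
    if neigh_ids:
        neigh_mean = [
            sum(features[n][i] for n in neigh_ids) / len(neigh_ids) for i in range(dim)
        ]
    else:
        # An isolated node contributes no neighborhood signal.
        neigh_mean = [0.0] * dim
    # Combine the self and neighborhood terms, then apply a ReLU nonlinearity.
    combined = zip(matvec(features[node], w_self), matvec(neigh_mean, w_neigh))
    hidden = [max(a + b, 0.0) for a, b in combined]
    # L2-normalize the embedding.
    norm = math.sqrt(sum(h * h for h in hidden))
    return [h / norm for h in hidden] if norm > 0 else hidden


# Toy graph with hypothetical protein names and two-dimensional features.
features = {"P1": [1.0, 0.0], "P2": [0.0, 1.0]}
neighbors = {"P1": ["P2"], "P2": ["P1"]}
w_self = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for learned weights
w_neigh = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weights

# A node that was absent when the weights were fixed can still be embedded,
# because the layer depends only on features and local neighborhoods.
features["P3"] = [1.0, 1.0]
neighbors["P3"] = ["P1", "P2"]
embedding = mean_aggregate_embedding("P3", features, neighbors, w_self, w_neigh)
print(embedding)
```

Because the weights are shared across nodes rather than stored per node, the same function applies to a node added later or to a different graph with the same feature dimension, which is the inductive property the excerpt describes.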

@agitter (Collaborator) commented Dec 30, 2017

I don't think I'm following this. Isn't an unseen node a new protein in a PPI network? Do we encounter new proteins? Or is the idea that a trained model generalizes to new graphs?

Hamilton et al. addressed the issue of large, heterogeneous, and changing networks with an inductive approach called GraphSAGE [@arxiv:1706.02216v2].
By finding node embeddings through learned aggregator functions which describe the node and its neighbors in the network, the GraphSAGE approach allows for the generalization of the model to unknown nodes.
Generalization to unseen nodes is especially useful for PPI networks, as these networks represent various types of interactions between proteins in a variety of species, and they can be updated frequently.
In a classification task for the prediction of protein function, Chen and Zhu [@arxiv:1710.10568v1] optimized this approach and enhanced the graph convolutional network with a preprocessing step to improve significantly both training time and prediction accuracy.

@agitter (Collaborator) commented Dec 30, 2017

What is the preprocessing step?

They found that MHCnuggets — the recurrent neural network — was by far the fastest training among the top performing methods.
In predicting interactions between proteins, deep learning has achieved state-of-the-art results and shows promise to overcome previous challenges in the field.

### PPI networks and graph analysis

@agitter (Collaborator) commented Dec 30, 2017

We also discuss graph convolutions in the drug discovery section and could link those topics. We have a sentence, "Modern neural networks can operate directly on the molecular graph as input.", that could be changed to "Modern neural networks, such as those discussed previously for PPI networks, can operate directly on the molecular graph as input."

@agitter (Collaborator) left a comment

Thanks, these are excellent revisions and address all of my initial comments. The only remaining items to resolve before merging are:

  • two minor commas noted here
  • decide what you'd like to do with the #### header
  • resolve conflicts with master


Shallow, feed-forward neural networks are competitive methods and have made progress toward pan-allele and pan-length peptide representations.
Sequence alignment techniques are useful for representing variable-length peptides as uniform-length features [@doi:10.1110/ps.0239403; @doi:10.1093/bioinformatics/btv639].
For pan-allelic prediction, NetMHCpan [@doi:10.1007/s00251-008-0341-z; @doi:10.1186/s13073-016-0288-x] used a pseudo-sequence representation of the MHC class I molecule which included only polymorphic peptide contact residues.

@agitter (Collaborator) commented Jan 2, 2018

Comma before which

MHCflurry's imputation method increases its performance on poorly characterized alleles, making it competitive with NetMHCpan for this task.
Kuksa et al. [@doi:10.1093/bioinformatics/btv371] developed a shallow, higher-order neural network (HONN) comprised of both mean and covariance hidden units to capture some of the higher-order dependencies between amino acid locations.

@agitter (Collaborator) commented Jan 2, 2018

Nice improvement, the HONN makes sense now.


An important challenge in PPI network prediction is the task of combining different networks and types of networks.
A way of working with different network types was shown by Gligorijevic et al., [@doi:10.1101/223339] who developed a multimodal deep autoencoder, deepNF, to find a feature representation common among several different PPI networks.
Gligorijevic et al., [@doi:10.1101/223339] developed a multimodal deep autoencoder, deepNF, to find a feature representation common among several different PPI networks.

@agitter (Collaborator) commented Jan 2, 2018

Can remove the comma after et al.

@zietzm (Contributor, Author) commented Jan 4, 2018

@agitter thanks for your help on these sections! I think my last few commits should now have the PR ready.

@agitter approved these changes Jan 4, 2018
@agitter agitter merged commit 75f0dc2 into greenelab:master Jan 4, 2018
2 checks passed
@agitter agitter mentioned this pull request Jan 4, 2018
7 tasks
dhimmel pushed a commit that referenced this issue Jan 4, 2018
This build is based on
75f0dc2.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/deep-review/builds/325010006
https://travis-ci.org/greenelab/deep-review/jobs/325010007

[ci skip]

The full commit message that triggered this build is copied below:

Added PPI section with MHC subsection (#638)

* Added PPI and MHC sections

* Updates to PPI/MHC subsection

* PPI network section

* Updates to all PPI sections

* Commas and header

* Remove accidental newline

* Re-add PPI section reference

", such as those discussed previously for PPI networks,"