
Major Histocompatibility Complex (MHC) Binding Affinity Prediction


nstrodt/USMPep


USMPep: Universal Sequence Models for Major Histocompatibility Complex Binding Affinity Prediction

USMPep is a simple recurrent neural network for MHC binding affinity prediction. Even a single model trained from scratch is competitive with state-of-the-art tools, and ensembling multiple regressors and language model pretraining can slightly improve its performance further. In our paper we report the predictive performance of USMPep on several benchmark datasets.

For technical details and experimental results, please refer to our paper:

USMPep: Universal Sequence Models for Major Histocompatibility Complex Binding Affinity Prediction

Johanna Vielhaben, Markus Wenzel, Wojciech Samek, and Nils Strodthoff

@article{Vielhaben:2020USMPep,
author = {Vielhaben, Johanna and Wenzel, Markus and Samek, Wojciech and Strodthoff, Nils},
title = {{USMPep: Universal Sequence Models for Major Histocompatibility Complex Binding Affinity Prediction}},
journal = {BMC Bioinformatics},
year = {2020},
month={Jul},
volume = {21},
number = {1},
pages={279},
issn={1471-2105},
doi = {10.1186/s12859-020-03631-1},
url= {https://doi.org/10.1186/s12859-020-03631-1}
}

This is the accompanying code repository, where we also provide a pretrained language model and the predictions of our models on the test datasets discussed in our paper.

We present an extended version of USMPep, which we evaluated on a recent SARS-CoV-2 dataset, in our paper:

Predicting the Binding of SARS-CoV-2 Peptides to the Major Histocompatibility Complex with Recurrent Neural Networks

Johanna Vielhaben, Markus Wenzel, Eva Weicken, Nils Strodthoff

@misc{Vielhaben:2021USMPep,
      title={Predicting the Binding of SARS-CoV-2 Peptides to the Major Histocompatibility Complex with Recurrent Neural Networks}, 
      author={Johanna Vielhaben and Markus Wenzel and Eva Weicken and Nils Strodthoff},
      year={2021},
      eprint={2104.08237},
      archivePrefix={arXiv},
      primaryClass={q-bio.QM}
}

USMPep builds on the UDSMProt framework (Universal Deep Sequence Models for Protein Classification).

Dependencies

For training/evaluation: pytorch, fastai, fire

For dataset creation: numpy, pandas, scikit-learn, biopython, sentencepiece, lxml

Installation

We recommend using conda as the Python package and environment manager. Either create the environment from the provided proteomics.yml by running conda env create -f proteomics.yml, or follow the steps below:

  1. Create conda environment: conda create -n proteomics and conda activate proteomics
  2. Install pytorch: conda install pytorch -c pytorch
  3. Install fastai: conda install -c fastai fastai=1.0.52
  4. Install fire: conda install fire -c conda-forge
  5. Install scikit-learn: conda install scikit-learn
  6. Install Biopython: conda install biopython -c conda-forge
  7. Install sentencepiece: pip install sentencepiece
  8. Install lxml: conda install lxml
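After installation, it can be useful to confirm that the packages listed under Dependencies are visible from the active environment. The following is a minimal sketch and not part of the repository: the helper name find_missing and the package-to-module mapping are assumptions made for illustration (for example, scikit-learn is imported as sklearn and biopython as Bio). It only checks that each module can be located and does not verify versions such as the fastai pin above.

```python
import importlib.util

# Package name -> import name. This mapping is an assumption of the sketch;
# several packages are imported under a different name than they are installed.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "fastai": "fastai",
    "fire": "fire",
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "biopython": "Bio",
    "sentencepiece": "sentencepiece",
    "lxml": "lxml",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the sorted package names whose import module cannot be located."""
    return sorted(
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

Run the script inside the activated proteomics environment; any package it reports can then be installed with the corresponding step above.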

Optionally (for support of threshold 0.4 clusters), install cd-hit and add it to the default search path.
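Whether cd-hit is reachable on the search path can be checked from Python. This is a small sketch under the assumption that the executable is named cd-hit; the helper locate_cd_hit is hypothetical and not part of the repository.

```python
import shutil


def locate_cd_hit(executable="cd-hit"):
    """Return the full path of the executable, or None if it is not on PATH.

    The default name "cd-hit" is an assumption of this sketch.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = locate_cd_hit()
    if path is None:
        print("cd-hit was not found on the search path.")
    else:
        print("cd-hit found at", path)
```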

Usage

See the USMPep User Guide for extensive usage information.

A second User Guide provides usage information for the extended version of USMPep.

Binding Affinity Predictions

We provide peptide binding affinity predictions for our tools; see git-data-folder and the corresponding readme file for details.
