Predicting immunogenic peptide recognized by TCR through ensemble deep learning

JiangBioLab/DLpTCR

DLpTCR: an ensemble deep learning framework for predicting immunogenic peptide recognized by T cell receptor

Overview

Here, we report DLpTCR, a computational framework that integrates three deep-learning models to predict the likelihood of interaction between a TCR and a peptide presented by MHC molecules. DLpTCR performed well on an independent testing dataset, supporting robust identification of immunogenic T cell epitopes.

Installation

Download DLpTCR with:

git clone https://github.com/JiangBioLab/DLpTCR

This package can be installed in any of the following ways:

# If needed:
pip install -r requirements.txt
# Or
conda install --yes --file requirements.txt
# Or you can create a new environment 
conda create --name dlptcr --file requirements.txt

Note that the code depends on NumPy, TensorFlow and other packages, so install those first; the build will likely fail if it cannot find them. For more information, see:

  • NumPy: Library for efficient matrix math in Python
  • tensorflow: An end-to-end open source machine learning platform in Python
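
Before running anything, it can help to confirm that the two packages named above are importable. The following is a minimal sketch using only the Python standard library; the helper name missing_packages is hypothetical and not part of DLpTCR, and requirements.txt remains the authoritative dependency list.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Only the two packages named in this README are checked here.
    missing = missing_packages(["numpy", "tensorflow"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("numpy and tensorflow were found.")
```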

Contents

data

We collected experimentally verified TCR-pMHC pairs from the VDJdb, IEDB and TetTCR-seq datasets to construct a high-quality benchmark dataset. These peptide-TCR pairs were split into training, testing and independent testing datasets according to their TCR α- and β-chains, so that each peptide-TCR pair exists in only one split:

  1. TCRA_train.csv and TRB_Train.csv are the training datasets for constructing and training the models.
  2. TCRA_test.csv and TCRB_test.csv are the testing datasets for testing the constructed models.
  3. TCRA_COVID-19.csv and TCRB_COVID-19.csv are independent testing data for evaluating the performance of the ensemble classifiers.
  4. TRA-VDJdb_TCR cross-reactivity.rar and TRB_VDJdb_TCR cross-reactivity.rar are used to assess how well the ensemble classifiers predict TCR cross-reactivity.
  5. TCRAB_IEDB.csv is used to evaluate the integrated model for predicting the peptide-TCRαβ interaction.
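
The requirement that each peptide-TCR pair appears in only one split can be checked with a few lines of standard-library Python. This is a sketch only: the column names "CDR3" and "Epitope" and the toy rows are assumptions made for illustration, so check the headers of the actual CSV files before using it.

```python
import csv
import io


def read_pairs(handle, tcr_column, peptide_column):
    """Read one split from an open CSV file and return its (TCR, peptide) pairs as a set."""
    return {(row[tcr_column], row[peptide_column]) for row in csv.DictReader(handle)}


def shared_pairs(split_a, split_b):
    """Return the pairs present in both splits; an empty set means no leakage."""
    return split_a & split_b


if __name__ == "__main__":
    # Toy in-memory data; the column names and sequences are hypothetical.
    train = read_pairs(io.StringIO("CDR3,Epitope\nAAA,PEP1\nCCC,PEP2\n"), "CDR3", "Epitope")
    test = read_pairs(io.StringIO("CDR3,Epitope\nDDD,PEP1\n"), "CDR3", "Epitope")
    print("Pairs shared between splits:", shared_pairs(train, test))
```

With real files, open each CSV with open(path, newline="") and pass the handle to read_pairs.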

model

The final base classifiers of DLpTCR are deposited in this folder.

  1. FULL_A_ALL_onehot.h5, CNN_A_ALL_onehot.h5 and RESNET_A_ALL_pca15.h5 are the base classifiers of the ensemble model for predicting the peptide-TCRα interaction.
  2. FULL_B_ALL_pca18.h5, CNN_B_ALL_pca20.h5 and RESNET_B_ALL_pca10.h5 are the base classifiers of the ensemble model for predicting the peptide-TCRβ interaction.
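
This README does not state how the outputs of the three base classifiers are combined. One common choice for an ensemble is soft voting, that is, averaging the per-sample probabilities. The sketch below illustrates that generic technique under this assumption; it is not taken from DLpTCR.py, and the function name and the example probabilities are hypothetical.

```python
def average_probabilities(per_model_probs):
    """Average per-sample probabilities across models (soft voting).

    `per_model_probs` is a list with one inner list per model, each holding
    one probability per sample, in the same sample order.
    """
    if not per_model_probs:
        raise ValueError("at least one model is required")
    n_samples = len(per_model_probs[0])
    if any(len(probs) != n_samples for probs in per_model_probs):
        raise ValueError("all models must score the same number of samples")
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


if __name__ == "__main__":
    # Made-up scores from three hypothetical base classifiers for two samples.
    scores = [[0.9, 0.2], [0.6, 0.4], [0.3, 0.6]]
    print(average_probabilities(scores))
```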

pca

This folder contains the features generated from the full training datasets using the PCA encoding method. We padded each sequence of a pair to the maximum length of 20 and encoded it using Principal Component Analysis (PCA) encoding. For each amino acid, we selected the top 20 PCs, which explained over 95% of the total data variation, and generated different vectors using 8-20 PCs to represent its biochemical signatures.
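
The padding and per-residue lookup described above can be sketched as follows. Several details are assumptions made for illustration: the padding symbol "X", padding on the right, a zero vector for padded positions, and the toy table TOY_PC_TABLE, whose values are invented. The real per-residue PC values come from the files in this folder.

```python
PAD = "X"  # hypothetical padding symbol

# Invented values for two residues, three PCs each, for illustration only.
TOY_PC_TABLE = {
    "A": [0.1, -0.2, 0.3],
    "C": [0.5, 0.0, -0.4],
}


def pad_sequence(seq, max_len=20, pad=PAD):
    """Right-pad `seq` with `pad` up to `max_len` characters."""
    if len(seq) > max_len:
        raise ValueError("sequence is longer than max_len")
    return seq + pad * (max_len - len(seq))


def encode(seq, pc_table, n_pcs, max_len=20):
    """Return a max_len x n_pcs matrix holding the first n_pcs PCs per residue."""
    matrix = []
    for residue in pad_sequence(seq, max_len):
        if residue == PAD:
            matrix.append([0.0] * n_pcs)  # padded positions become zero vectors
        else:
            matrix.append(list(pc_table[residue][:n_pcs]))
    return matrix


if __name__ == "__main__":
    encoded = encode("AC", TOY_PC_TABLE, n_pcs=2)
    print(len(encoded), "rows of", len(encoded[0]), "values")
```

Changing n_pcs is what yields the differently sized feature sets (8-20 PCs) mentioned above; a residue missing from the table raises KeyError rather than being silently zeroed.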

code

The source code for feature extraction, five-fold cross-validation, model construction and training, and prediction is deposited in the 'code' folder.

  1. The source code in the 'fold' folder is used to select the appropriate features by five-fold cross-validation.
  2. The source code in the 'train' folder is used to construct and train the base classifiers.
  3. The source code in XXX_Feature_Extraction.py implements feature extraction.
  4. The source code in DLpTCR.py predicts the peptide-TCR interaction.

How to Use

Running on GPU or CPU

TensorFlow is installed along with the other DLpTCR requirements. Refer to the Keras documentation to configure TensorFlow to run on GPU or CPU. To use a GPU, you also need to install CUDA and cuDNN; refer to their websites for instructions. TensorFlow can also be installed with "conda install tensorflow-gpu". CPU is only suitable for prediction, not training.
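
To see whether TensorFlow can actually use a GPU on your machine, a quick check such as the one below may help. It is a sketch that assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available; the helper name available_gpus is hypothetical. It returns an empty list when TensorFlow is not installed or no GPU is visible.

```python
def available_gpus():
    """Return the names of GPUs visible to TensorFlow, or [] if there are none."""
    try:
        import tensorflow as tf
    except ImportError:
        return []  # TensorFlow is not installed in this environment
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = available_gpus()
    if gpus:
        print("GPUs visible to TensorFlow:", gpus)
    else:
        print("No GPU visible; use this machine for prediction only.")
```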

For general users who want to perform immunogenic peptide prediction with our provided model:

cd to the DLpTCR/code folder, which contains DLpTCR_server.py and Model_Predict_Feature_Extraction.py, then start Python:

python
>>> from Feature_Extraction import *
>>> from DLpTCR_server import *
>>> input_file_path = '../data/Example_file.xlsx'

Please refer to the document 'Example_file.xlsx' for the format of the input file. The column names must not be changed.

>>> model_select = "AB"  

Available values of model_select:

  • model pTCRα: model_select = "A"
  • model pTCRβ: model_select = "B"
  • model pTCRαβ: model_select = "AB"

>>> job_dir_name = 'test'
>>> user_dir = './user/' + str(job_dir_name) + '/'

The predicted files will be stored in the path "user_dir".

>>> import os
>>> if not os.path.exists(user_dir):
...     os.makedirs(user_dir)

>>> error_info, TCRA_cdr3, TCRB_cdr3, Epitope = deal_file(input_file_path, user_dir, model_select)
>>> output_file_path = save_outputfile(user_dir, model_select, input_file_path, TCRA_cdr3, TCRB_cdr3, Epitope)
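
The setup steps above (choosing a model and creating the output directory) can be wrapped in a small helper that rejects an invalid model choice before any prediction is attempted. This is a sketch using only the standard library; prepare_job_dir and VALID_MODELS are hypothetical names and not part of DLpTCR.

```python
import os

VALID_MODELS = {"A", "B", "AB"}  # the three choices listed in this README


def prepare_job_dir(job_dir_name, model_select, base_dir="./user"):
    """Validate the model choice and create the output directory if needed."""
    if model_select not in VALID_MODELS:
        raise ValueError("model_select must be one of 'A', 'B' or 'AB'")
    # Keep the trailing slash used in the README's user_dir.
    user_dir = os.path.join(base_dir, str(job_dir_name)) + "/"
    os.makedirs(user_dir, exist_ok=True)  # no error if it already exists
    return user_dir
```

The returned path can then be passed as user_dir to deal_file and save_outputfile, as in the session above.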

Alternatively, you can use API.py to predict the peptide-TCR interaction:

python API.py

For advanced users who want to perform training and predicting by using their own data:

For custom training:

CPU is only suitable for prediction, not training. To train on your own data, first run the feature extraction scripts:

python Train_Test_Onehot_Chem_Feature_Extraction.py
python Train_Test_PCA_Feature_Extraction.py

The code in the folder DLpTCR/code/fold is then used for 5-fold cross-validation to select the best features:

#example
python CNN_A_fold_onehot.py
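
For readers unfamiliar with the procedure, the sketch below shows how 5-fold cross-validation partitions sample indices: every sample is used for validation exactly once and for training in the other four folds. It is a generic standard-library illustration, not the code in DLpTCR/code/fold, and k_fold_indices is a hypothetical name.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducible folds
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized parts
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    for fold_number, (train, validation) in enumerate(k_fold_indices(10), start=1):
        print("fold", fold_number, "train size", len(train), "validation size", len(validation))
```

Each candidate feature set (for example one-hot, or PCA with a given number of PCs) would be scored on the validation part of every fold, and the scores averaged to compare feature sets.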

The code in the folder DLpTCR/code/train is then used to construct and train the base classifiers with the selected features:

#example
python CNN_A_ALL_onehot.py

Citation:

Please cite the following paper for using DLpTCR:

DLpTCR: an ensemble deep learning framework for predicting immunogenic peptide recognized by T cell receptor
