The 5-fold cross-validation split used in the paper can be downloaded from here. The DeepLigand model provided in this repository was trained on all five folds combined.
- R > 3.3
- CUDA 8.0 with cudnn 5.1
With the above prerequisites installed, create and activate a Conda environment with all the necessary Python packages:
conda env create -f environment.yml
source activate deepligand
python update_bilm.py
To deactivate this environment:
source deactivate
python preprocess.py -f $INFILE -o $OUTDIR
- INFILE: a file of MHC-peptide pairs to predict on (example). The names of the supported MHC alleles are listed in the first column of this file.
- OUTDIR: output directory
python main.py -p $OUTDIR/test.h5.batch -o $OUTDIR/prediction
- OUTDIR: output directory
The resulting predictions will be saved in batches as HDF5 datasets under $OUTDIR/prediction. Below is an example of accessing the dataset in the first batch:
import h5py
# Replace $OUTDIR with your actual output directory; Python does not expand shell variables.
with h5py.File('$OUTDIR/prediction/h5.batch1', 'r') as f:
    pred = f['pred'][()]
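Since the predictions are written in batches, it may help to list the batch files in numeric order before opening them. The sketch below assumes the remaining batches follow the same naming as the first one (h5.batch1, h5.batch2, ...), which is inferred from the single file name shown above and not confirmed elsewhere. The helper names `batch_number` and `list_batches` are hypothetical.

```python
import glob
import os
import re


def batch_number(path):
    """Return the integer suffix of a name like 'h5.batch12', or None if absent."""
    match = re.search(r"h5\.batch(\d+)$", os.path.basename(path))
    return int(match.group(1)) if match else None


def list_batches(prediction_dir):
    """List batch files in numeric order, so that batch2 comes before batch10.

    Assumption: files are named h5.batch<N>; anything else is ignored.
    """
    paths = glob.glob(os.path.join(prediction_dir, "h5.batch*"))
    numbered = [(batch_number(p), p) for p in paths]
    numbered = [(n, p) for n, p in numbered if n is not None]
    return [p for _, p in sorted(numbered)]


# Read the output directory from the environment, falling back to the current directory.
outdir = os.environ.get("OUTDIR", ".")
for path in list_batches(os.path.join(outdir, "prediction")):
    print(path)
```

Each listed path can then be opened with `h5py.File(path, 'r')` in the same way as the first batch.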
The dataset (pred) has three columns. The first two columns are the predicted mean and variance of the binding affinity between the input peptide and the MHC allele. The third column is the predicted probability that the input peptide is a natural ligand of the input MHC allele.
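As a minimal sketch of working with these three columns, the snippet below turns each row into a labelled record and converts the variance into a standard deviation. The function name `summarize_predictions` and the dictionary keys are hypothetical, and the example rows are made up for illustration; they are not real DeepLigand output. It assumes the variance column is non-negative.

```python
import math


def summarize_predictions(pred):
    """Convert rows of (mean, variance, ligand probability) into labelled records.

    `pred` can be the array loaded from a batch file or any sequence of
    three-element rows.
    """
    records = []
    for row in pred:
        mean, variance, ligand_prob = (float(v) for v in row)
        records.append({
            "affinity_mean": mean,
            # Standard deviation is the square root of the predicted variance.
            "affinity_std": math.sqrt(variance),
            "ligand_prob": ligand_prob,
        })
    return records


# Made-up rows for illustration only.
example = [[0.62, 0.04, 0.91], [0.15, 0.01, 0.08]]
for record in summarize_predictions(example):
    print(record)
```

Rows are returned in the same order as they appear in `pred`.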