
MHCfovea integrates a supervised prediction module and an unsupervised summarization module to connect important residues to binding motifs.


First, MHCfovea's predictor was trained on 150 observed alleles, and 42 important positions were highlighted in the MHC-I sequence (182 a.a.) using ScoreCAM. Next, we made predictions for 150 observed and 12,858 unobserved alleles against a dataset of 254,742 peptides and extracted the positive predictions (score > 0.9) to generate the binding motif of each allele. Finally, after clustering the N- and C-terminal sub-motifs, we built hyper-motifs and the corresponding allele signatures based on the 42 important positions to reveal the relation between binding motifs and MHC-I sequences.
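The motif-generation step can be illustrated with a minimal sketch. It keeps the peptides whose score exceeds 0.9 and tabulates per-position amino acid frequencies. The function names and the use of a plain frequency table are assumptions made for illustration, not MHCfovea's actual code; the toy data are copied from the example output in this README.

```python
from collections import Counter

# Toy (peptide, score) pairs for one allele, copied from this README's example output.
PREDICTIONS = [
    ("PVPTYGLSV", 0.606),
    ("APGARNTAAVL", 0.987),
    ("SPAPPTCHEL", 0.997),
    ("PGLAVKELK", 0.569),
    ("GPMVAGGLL", 0.966),
]


def positive_peptides(predictions, threshold=0.9):
    """Keep peptides whose score is strictly above the threshold."""
    return [peptide for peptide, score in predictions if score > threshold]


def position_frequencies(peptides, n_positions):
    """Amino acid frequencies at each of the first n_positions residues.

    Assumption: a plain frequency table stands in for the binding motif.
    Peptides shorter than n_positions are skipped.
    """
    usable = [p for p in peptides if len(p) >= n_positions]
    table = []
    for i in range(n_positions):
        counts = Counter(p[i] for p in usable)
        total = sum(counts.values())
        table.append({aa: n / total for aa, n in counts.items()})
    return table


positives = positive_peptides(PREDICTIONS)
n_terminal_table = position_frequencies(positives, 4)
```

With only three positive peptides the table is not a meaningful motif; the sketch is meant to show the shape of the computation, which in practice would run over all positive predictions of an allele.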

The resultant pairs of hyper-motifs and allele signatures can be easily queried through a web interface (


MHCfovea takes MHC-I alleles and peptide sequences as inputs and predicts the binding probability. All alleles in the IPD-IMGT/HLA database (version 3.41.0) are available. For each queried allele, MHCfovea provides the cluster information and the allele information for the N- and C-terminal clusters, respectively.

  • cluster information
    • hyper-motif: the pattern of binding peptides in a specific cluster
    • allele signature: the pattern of MHC-I alleles in a specific cluster
  • allele information
    • sub-motif: the binding sub-motif of the queried allele
    • highlighted allele signature: the consensus residues of the allele signature and the queried allele

If you find MHCfovea useful in your research, please cite:

@article {MHCfovea_2021,
title   = {Connecting {MHC}-{I}-binding motifs with {HLA} alleles via deep learning},
author  = {Lee, Ko-Han and Chang, Yu-Chuan and Chen, Ting-Fu and Juan, Hsueh-Fen and Tsai, Huai-Kuang and Chen, Chien-Yu},
journal = {Communications Biology},
year    = {2021},
volume  = {4},
number  = {1},
pages   = {1194},
doi     = {10.1038/s42003-021-02716-8},
issn    = {2399-3642}
}


  1. Python3 is required
  2. Download/Clone MHCfovea
git clone
cd MHCfovea
  3. Install the required packages
pip3 install -r requirements.txt


usage: predictor [-h] [--alleles ALLELES] [--get_metrics] input output_dir

    MHCfovea, an MHCI-peptide binding predictor. In this prediction process, GPU is recommended.

    Having two modes:
    1. specific mode: each peptide has its corresponding MHC-I allele in the input file; column "mhc" or "allele" is required
    2. general mode: all peptides are predicted with all alleles in the "alleles" argument

    Input file:
    only .csv file is acceptable
    column "sequence" or "peptide" is required as peptide sequences
    column "mhc" or "allele" is optional as MHC-I alleles

    Output directory contains:
    1. prediction.csv: with new column "score" for specific mode or [allele] for general mode
    2. interpretation: a directory contains interpretation figures of each allele
    3. metrics.json: all and allele-specific metrics (AUC, AUC0.1, AP, PPV); column "bind" as benchmark is required

positional arguments:
  input              The input file
  output_dir         The output directory

optional arguments:
  -h, --help         show this help message and exit
  --alleles ALLELES  alleles for general mode
  --get_metrics      calculate the metrics between prediction and benchmark
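
The input file described in the help text above can be prepared with a short script. The following is a minimal sketch that uses only Python's standard `csv` module; the helper names are hypothetical, and the allele spelling (`B*07:02`) follows the example output in this README. Specific mode needs both columns, while general mode needs only the peptide column.

```python
import csv
import io


def write_specific_mode_input(handle, pairs):
    """Write (peptide, allele) pairs with the "sequence" and "mhc" columns."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sequence", "mhc"])
    for peptide, allele in pairs:
        writer.writerow([peptide, allele])


def write_general_mode_input(handle, peptides):
    """Write peptides only; alleles are then given via the --alleles argument."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sequence"])
    for peptide in peptides:
        writer.writerow([peptide])


# Write to an in-memory buffer here; for a real file use
# open("input.csv", "w", newline="") as the handle instead.
buffer = io.StringIO()
write_specific_mode_input(buffer, [("PVPTYGLSV", "B*07:02"), ("SPAPPTCHEL", "B*07:02")])
print(buffer.getvalue())
```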


python3 mhcfovea/ example/input.csv example/output

input file

sequence mhc

output file

sequence     mhc      score  %rank
PVPTYGLSV    B*07:02  0.606  0.616
APGARNTAAVL  B*07:02  0.987  0.015
SPAPPTCHEL   B*07:02  0.997  0.004
PGLAVKELK    B*07:02  0.569  0.692
GPMVAGGLL    B*07:02  0.966  0.024
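
Once a prediction has finished, the output file can be post-processed with the standard library. The sketch below assumes that `prediction.csv` is comma-separated and that its header uses the column names shown above (`sequence`, `mhc`, `score`, `%rank`). The helper names are hypothetical, and the 0.9 cutoff is borrowed from the positive-prediction threshold mentioned earlier in this README rather than being a recommended value.

```python
import csv
import io

# The example rows from this README, written as CSV text (assumed layout).
EXAMPLE = """sequence,mhc,score,%rank
PVPTYGLSV,B*07:02,0.606,0.616
APGARNTAAVL,B*07:02,0.987,0.015
SPAPPTCHEL,B*07:02,0.997,0.004
PGLAVKELK,B*07:02,0.569,0.692
GPMVAGGLL,B*07:02,0.966,0.024
"""


def load_predictions(handle):
    """Parse prediction rows, converting score and %rank to floats."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "sequence": row["sequence"],
            "mhc": row["mhc"],
            "score": float(row["score"]),
            "rank": float(row["%rank"]),
        })
    return rows


def top_binders(rows, min_score=0.9):
    """Keep rows with score > min_score, ordered by ascending %rank."""
    kept = [r for r in rows if r["score"] > min_score]
    return sorted(kept, key=lambda r: r["rank"])


rows = load_predictions(io.StringIO(EXAMPLE))
for r in top_binders(rows):
    print(r["sequence"], r["mhc"], r["score"], r["rank"])
```

For a real run, replace `io.StringIO(EXAMPLE)` with an open handle on the `prediction.csv` file in the output directory.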

interpretation figure


The development folder contains all the source code used to develop MHCfovea. The files are described below.

  • for building training, validation, and benchmark dataset
  • utility functions for data analysis
  • for the training process
  • the model architecture
  • utility functions for training process
  • for the prediction process
  • functions for CAM algorithm
  • for the CAM process
  • for the prediction on all HLA alleles
  • utility functions for the interpretation of ScoreCAM results
  • utility functions for the summarization