hlabud provides methods to retrieve sequence alignment data from IMGTHLA and convert the data into convenient R matrices ready for downstream analysis. See the usage examples to learn how to use the data with logistic regression and dimensionality reduction. We also share tips on how to visualize the 3D molecular structure of HLA proteins and highlight specific amino acid residues.
For example, let’s consider a simple question about two HLA genotypes.
What amino acid positions are different between two genotypes?
library(hlabud)
a <- hla_alignments("DRB1")
a$release
## [1] "3.56.0"
dosage(a$onehot, c("DRB1*03:01:05", "DRB1*03:02:03"))
##               F26 Y26 D28 E28 F47 Y47 G86 V86
## DRB1*03:01:05   0   1   1   0   1   0   0   1
## DRB1*03:02:03   1   0   0   1   0   1   1   0
What nucleotides are different?
n <- hla_alignments("DRB1", type = "nuc")
n$release
## [1] "3.56.0"
dosage(n$onehot, c("DRB1*03:01:05", "DRB1*03:02:03"))
##               A164 T164 C171 G171 A227 T227 A240 G240 G344 T344 G345 T345 A357 G357
## DRB1*03:01:05    1    0    1    0    0    1    1    0    0    1    1    0    1    0
## DRB1*03:02:03    0    1    0    1    1    0    0    1    1    0    0    1    0    1
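To build intuition for the output above, here is a minimal base R sketch of the underlying idea: given a one-hot matrix with one row per allele and one column per residue-position pair, keep only the columns where the selected rows disagree. This is not hlabud's implementation of `dosage()`. The toy matrix, the allele names, and the helper `differing_columns()` are all made up for illustration.

```r
# Toy one-hot matrix: rows are alleles, columns are residue + position.
# The allele names and values are invented for illustration only.
onehot <- matrix(
  c(1, 0, 1, 0,
    1, 0, 0, 1,
    0, 1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("allele_A", "allele_B", "allele_C"),
    c("F26", "Y26", "D28", "E28")
  )
)

# Hypothetical helper (not part of hlabud): keep the columns where the
# selected rows do not all share the same value.
differing_columns <- function(mat, rows) {
  sub <- mat[rows, , drop = FALSE]
  varies <- apply(sub, 2, function(col) length(unique(col)) > 1)
  sub[, varies, drop = FALSE]
}

differing_columns(onehot, c("allele_A", "allele_B"))
##          D28 E28
## allele_A   1   0
## allele_B   0   1
```

In this toy example, the two selected alleles share F at position 26, so only the two columns for position 28 are reported.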
The quickest way to get hlabud is to install from GitHub:
# install.packages("devtools")
devtools::install_github("slowkow/hlabud")
See the usage examples to get some ideas for how to use hlabud in your analyses.
- Get HLA allele frequencies from Allele Frequency Net Database (AFND)
- Download and unpack all data from the latest IMGTHLA release
hlabud provides access to the data in the IMGT/HLA database. Therefore, if you use hlabud, please cite the IMGT/HLA paper:
- Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. IPD-IMGT/HLA Database. Nucleic Acids Res. 2020;48: D948–D955. https://doi.org/10.1093/nar/gkz950
hlabud also provides access to the data in the Allele Frequency Net Database (AFND). Therefore, if you use hlabud::hla_frequencies(), please cite the AFND paper:
- Gonzalez-Galarza FF, McCabe A, Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools. Nucleic Acids Res. 2020;48: D783–D788. https://doi.org/10.1093/nar/gkz1029
You can also cite the hlabud package itself:
- Slowikowski K. hlabud: HLA analysis in R. Zenodo. https://doi.org/10.5281/zenodo.11093557
I recommend this article for anyone new to HLA, because the beautiful figures help to build intuition:
- La Gruta NL, Gras S, Daley SR, Thomas PG, Rossjohn J. Understanding the drivers of MHC restriction of T cell receptors. Nat Rev Immunol. 2018;18: 467–478.
Learn about the conventions for HLA nomenclature:
- Marsh SGE, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, et al. Nomenclature for factors of the HLA system, 2010. Tissue Antigens. 2010;75: 291–455.
HATK is a set of Python scripts for processing and analyzing IMGT-HLA data. Here is the related article:
- Choi W, Luo Y, Raychaudhuri S, Han B. HATK: HLA analysis toolkit. Bioinformatics. 2021;37: 416–418. https://doi.org/10.1093/bioinformatics/btaa684
For case-control analysis of HLA genotype data, consider the BIGDAWG R package available on CRAN. Here is the related article:
- Pappas DJ, Marin W, Hollenbach JA, Mack SJ. Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): An integrated case-control analysis pipeline. Hum Immunol. 2016;77: 283–287.
HLAdivR is another R package for calculating HLA divergence.