
slowkow/hlabud


hlabud


hlabud provides methods to retrieve sequence alignment data from IMGT/HLA and convert it into convenient R matrices ready for downstream analysis. See the usage examples to learn how to use the data with logistic regression and dimensionality reduction. We also share tips on how to visualize the 3D molecular structure of HLA proteins and highlight specific amino acid residues.

For example, let’s consider a simple question about two HLA genotypes.

What amino acid positions are different between two genotypes?

library(hlabud)
a <- hla_alignments("DRB1")
a$release
## [1] "3.56.0"
dosage(a$onehot, c("DRB1*03:01:05", "DRB1*03:02:03"))
##               F26 Y26 D28 E28 F47 Y47 G86 V86
## DRB1*03:01:05   0   1   1   0   1   0   0   1
## DRB1*03:02:03   1   0   0   1   0   1   1   0
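To make the comparison concrete without downloading any data, here is a minimal base-R sketch of the underlying idea. It assumes, as the output above suggests, that a one-hot matrix has one row per allele and one 0/1 column per residue at each position. The allele names and values below are hypothetical toy data, and this is not necessarily how `dosage()` is implemented internally.

```r
# Toy one-hot matrix (hypothetical alleles and values, not real IMGT/HLA data).
# Rows are alleles; a 1 means the allele carries that residue at that position.
onehot <- matrix(
  c(1, 0, 1, 0, 1, 0,
    0, 1, 1, 0, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("allele_1", "allele_2"),
    c("F26", "Y26", "D28", "E28", "G86", "V86")
  )
)

# Keep only the columns where the two alleles disagree;
# position 28 is dropped because both alleles carry D28.
differs <- onehot["allele_1", ] != onehot["allele_2", ]
onehot[, differs, drop = FALSE]
##          F26 Y26 G86 V86
## allele_1   1   0   1   0
## allele_2   0   1   0   1
```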

What nucleotides are different?

n <- hla_alignments("DRB1", type = "nuc")
n$release
## [1] "3.56.0"
dosage(n$onehot, c("DRB1*03:01:05", "DRB1*03:02:03"))
##               A164 T164 C171 G171 A227 T227 A240 G240 G344 T344 G345 T345 A357 G357
## DRB1*03:01:05    1    0    1    0    0    1    1    0    0    1    1    0    1    0
## DRB1*03:02:03    0    1    0    1    1    0    0    1    1    0    0    1    0    1
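Each column name combines a residue and its position. A small sketch, assuming the names follow the letter-then-number pattern shown above (no gaps or negative positions), strips the leading letters to list the distinct positions that differ between the two alleles:

```r
# Column names copied from the nucleotide dosage() output above.
cols <- c("A164", "T164", "C171", "G171", "A227", "T227", "A240", "G240",
          "G344", "T344", "G345", "T345", "A357", "G357")

# Assumes each name is residue letter(s) followed by a positive integer position:
# remove the letters, convert to integer, and drop duplicate positions.
positions <- unique(as.integer(sub("^[A-Za-z]+", "", cols)))
positions
## [1] 164 171 227 240 344 345 357
```

Applied to the amino acid column names from the first example, the same line gives positions 26, 28, 47 and 86.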

Installation

The quickest way to get hlabud is to install from GitHub:

# install.packages("devtools")
devtools::install_github("slowkow/hlabud")

Examples

See the usage examples to get some ideas for how to use hlabud in your analyses.

Citation

hlabud provides access to the data in the IMGT/HLA database. If you use hlabud, please cite the IMGT/HLA paper:

hlabud also provides access to the data in the Allele Frequency Net Database (AFND). If you use hlabud::hla_frequencies(), please also cite the AFND paper:

You can also cite the hlabud package like this:

Related work

I recommend this article for anyone new to HLA because its figures help to build intuition:

Learn about the conventions for HLA nomenclature:

HATK is a set of Python scripts for processing and analyzing IMGT/HLA data. Here is the related article:

  • Choi W, Luo Y, Raychaudhuri S, Han B. HATK: HLA analysis toolkit. Bioinformatics. 2021;37: 416–418. doi:10.1093/bioinformatics/btaa684

For case-control analysis of HLA genotype data, consider the BIGDAWG R package available on CRAN. Here is the related article:

HLAdivR is another R package for calculating HLA divergence.