# Prepare HLA allelic sequences from IPD-IMGT/HLA database

By Joyce Kang (joyce_kang@hms.harvard.edu)

Last updated Feb 15, 2023

This notebook describes how to generate a database of allelic sequences for scHLApers, starting from the IMGTHLA database downloaded from https://github.com/ANHIG/IMGTHLA.

For the Kang et al. paper (and this tutorial), we used IMGTHLA version 3.47.0, and we provide a pre-prepared database built from that version. If you use the pre-prepared database, you can skip these steps and move straight to Step 2 of the pipeline (`2_make_personalized_refs`). If you want to prepare your own database from a newer IMGTHLA version, run the steps below.

In [1]:
# Load libraries
suppressPackageStartupMessages({
    library(Biostrings)
    library(dplyr)
    library(purrr)
    library(readr)
    library(tidyr)
    library(stringr)
    library(stringi)
})

source('hlaseqlib_code.R')

The `hlaseqlib_code.R` file contains utility functions derived from https://github.com/genevol-usp/hlaseqlib, written by Vitor Aguiar. Use the functions in `hlaseqlib_code.R` rather than the original hlaseqlib, because they have been modified to accommodate genomic sequences.

# Process genomic .fa files

## Example of genomic alignment and how to compile the index

Let's read in some example gene alignments. The starting files are in the `IMGTHLA/alignments` folder.
In each alignment, a dash (-) indicates identity to the reference sequence, an asterisk (\*) denotes an unsequenced/unknown base, and a period (.) denotes the position of an insertion/deletion (via: https://www.ebi.ac.uk/ipd/imgt/hla/alignment/help/).
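
To make the notation concrete, here is a minimal sketch of how a row in this notation could be resolved against the reference row. It uses a made-up toy alignment, and `expand_aligned_row` is a hypothetical helper written for illustration; it is not a function from `hlaseqlib_code.R` and may differ from how that code handles the alignment files.

```r
# Hypothetical helper (not part of hlaseqlib_code.R): replace every dash in an
# aligned row with the reference base at the same position. Asterisks
# (unknown) and periods (indel positions) are left as they are.
expand_aligned_row <- function(ref, row) {
    ref_chars <- strsplit(ref, "")[[1]]
    row_chars <- strsplit(row, "")[[1]]
    # Aligned rows must have the same number of columns as the reference
    stopifnot(length(ref_chars) == length(row_chars))
    same <- row_chars == "-"
    row_chars[same] <- ref_chars[same]
    paste(row_chars, collapse = "")
}

# Toy data (invented for this example, not real HLA sequence)
ref <- "ATG.CCGT"
row <- "--*.-A--"
expanded <- expand_aligned_row(ref, row)
print(expanded)
```

In this toy case the third base stays unknown (`*`), the indel column (`.`) is kept so that the row remains aligned, and the only real difference from the reference is the `A` at the sixth column.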

In [2]:
# Read in the alignments file for HLA-DRB1
hla_df = hla_read_alignment('DRB1', 'IMGTHLA', imgtfile='gen')
hla_df$len = str_length(hla_df$cds)
unique(hla_df$len) # All aligned sequences are of length 18487

Note that the column is named "cds" but actually contains the genomic sequence (not CDS).
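
Because every row shares the same aligned length, the number of bases that are actually sequenced for an allele is not visible from `len` alone. The sketch below, on an invented toy data frame with the same `allele` and `cds` column names, counts unknown (`*`) and indel (`.`) positions per row. `count_char` and the `n_*` columns are illustrative names, not part of the pipeline.

```r
# Toy data frame mimicking the allele/cds layout (sequences are invented)
toy_df <- data.frame(
    allele = c("TOY*01:01", "TOY*01:02"),
    cds = c("**ATG..CCGT", "****G..CCAT"),
    stringsAsFactors = FALSE
)

# Count occurrences of a single literal character in each string
count_char <- function(x, ch) {
    nchar(x) - nchar(gsub(ch, "", x, fixed = TRUE))
}

toy_df$aligned_len <- nchar(toy_df$cds)            # identical for all rows
toy_df$n_unknown <- count_char(toy_df$cds, "*")    # unsequenced positions
toy_df$n_indel <- count_char(toy_df$cds, ".")      # indel padding
toy_df$n_bases <- toy_df$aligned_len - toy_df$n_unknown - toy_df$n_indel

print(toy_df[, c("allele", "aligned_len", "n_unknown", "n_indel", "n_bases")])
```

The same idea should carry over to `hla_df$cds`, assuming the sequences there contain only bases, `*`, and `.` as in the output shown in this notebook.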

In [3]:
hla_df %>% head(4)

allele,cds,len
<chr>,<chr>,<int>
DRB1*01:01:01:01,********************************GAAAGACCTGAAAGATCACGGTGCCTTCATTTCAA.CTGTGAGACATGAAGTAATTTTCCCAAATCTACAACATTAAGATATGGTGCAATAAGGACCAGATTAAAGGTCTCCTGATTTGCGGCCATGTTCCCTCCATCTCCTTTACTCCTAAACACACTCACAC..TCACTACTGCAAATAGTTGTCTTGTCAAGTGGGAAATGAATGCTCTTACAAGGCTCAAACTTGTGAACACATCACTGACCAGCACAGAGCT...........GGCTA.CAATAGCTCCCCAATTAAGGTGTTTTACATGCAACTGGTTCAAACCTTCCAAGTGCTAAATTAAAACAATCCTTTAAAGAAGGAAATTCTGTTTCAGAA.GAGGACCTTCATACAGCATCTCTGACCAGCAACTGATGATGCTATTGAACTCAGATGCTGATTGGTTCTCCAACACGAGATTACCCAACCCAGGAGCAAGGAAATCAGTAACTTCCTCCCTATAACTTGGAATGTGGGTGGAG.GGGTTCATAGTTCTCCCTGAGTGAGACTTGCCTGCTTCTCTGGCCCCTGGTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATGACAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGCGGGTGCTGAGCTACTATGGGGTGGGGAAAA.TAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAATTGTGACGTTTTCTTCAGAGATTGCCCATCTTTATCAT.TGGATCCCAAATTATTT..CCTCCATAAAAGGAGCTTGGGTACTTGCCCTCTTCATGAGACTTGTGTAAGGGGCCTTTGCACAAGTCATTT...CTTTTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATTCTAAAA.TAAATTCCCCATACAGCACTTCCCTTTATTATGTTGACTTATGTCAGACAAAAGGAGGTTCTTA.CTGAAAATTTTGTGGGAGTCAAGGGAATTCAAAGGGTCTCTCCTAGACGATCCAGTGTTAGGTTCCCCACAGGACCTTTGGTGTTGGCCA.TAGTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTCCTTTCTTTTCTTGCTGAACTCCAATGTTTATAAGGCCTGTATCCCTGTAGCGTATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTCCTTTCTTGTGGGGAAAAGTCCCTGGAACTGAAGCTGAGATTGTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTA............................................AAGAACTTAAGGCATCCTCTGAAAAA.CTAGCCCAGGTTCGTGTTCATTATGAATCTTTTTT.AACCTTTCTGTACTTGTTTCTCTTGCATCTCCTATGTGCTCTAACTAGACATGACAGAAGAGATTTAACTAA...TGTATAAATTATATGAAATTCTATTTTT.TAAGTCAAAAATAATCAACTATCAGAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGATGATGTGAACATTGTTCACGTCTCATAGGGCTGAAAGTCAATGGGCAAGTCTTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCCTAGGTTCACCCATCATGCCCTCAGCTTTCCTTAACTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCT.....ATTTTTCCCCAGCTATGTTGTCATCATTTCCAGAAATCTCTAAAGCTTGCACAGAACCTTAGCACTATGCGATTCATTGAAAGAGACT.TTTTTTCTCTTTTTGAGGTAGGG
CCTGGCTCTGTCACCCAGTCTATAGCTCAGTGGTGTGATCGAGGCTCACTGCAACCTCTGCCACCCATGCTCAAGTGATCCTCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGACATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGTTTCAAGCAATCCTCCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAA.GAAATGACTCTTAATAAAAAAAT.TTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATTAAAATGAAAGGGTGCTGGAGCTTGAGCCCCTTCTTGCTTTCCAGGATCCCTACAGTGATCAGTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTCAGCAGCTACTCTTTACTGGGCTCCATTCTAGGTTCAAATCATTCTATTTGATTAAGTTAGAGAGCGTCCCTACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGATAATGTTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACCTACTTTAGTTCATATG.TTAGGGAGCTCCACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCGTG..AAGATCTAAGGGATGAAAGTTCCAGGGAGACCAAATGGCGGGAAAGCCCTGGTGTGGGAAACTATGTGGAGGGAGAGAAAGAAAGCTAGAGGGGCTGAAGTATAGAAAGCAACGAAATGGAGAGGCAGAAGATGAGG.TAGGACACAGAGAGGAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAAAATTTGAATTTT.ATT...ATTTATTTACCT.ATTT.A...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................TTTGTTTATTTATTTTATTTTATTTT....GTGATGGAGTCTCTCTCTGTTGCCCAGGCTAGATGGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCCAATTAGCTGGGGCCACAGGTGCATGCCACCACACCTGGCTAATTTTTT.GTATTTTT..AGTAGGGATAGGATTTCACCGTGTTAGCCA
GGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACTGCACCCGGCCTATATTCAATTTTTAAAA.CTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGGTTCACCAGTGGTTAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGGGAAAACAAGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAGAAGATAAAGTGGACAGACTCGGCATGTATTTTT.GCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGA........GCTTATTCCTAAGGATTTTCTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATGATTTGAAGTGGGTGGTTTGAAATAAAAGTTTTGTTTAAATATGAGATGATTGACTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGG.....CTAGGGCTTCAGGTATTTATGTTGGAGGCATCAATACGTGTAGTGTGTTAAA.TTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAACAAGAAGAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAACCAAGAGAACATCATGTATGTAAGTCAAGGAAAACAGATTTTTTT.CAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAATATGACAGAGAAGCAAGTGCTGGGATTGGTGGAGTTGATATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGGA.GAGCAAATAAAAGTGAGGACGAGGTTAAGGTTGACTGTTCTGTGT.AGAGAGCTTCAGGGAAGGATTGTTCTCTGGGTTCAGGGAGCCAGCTGAACCTAAAGGAAAAGG.CTAAAGAAGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATGCTCAGCCATTATTAGCAAGGAAATACAAGAGAGCCCCTGTGTGCAGTGGTGACTACTCATGCAAAATGTCACACAGCCAATATTTAACACAGCCAGGATTTCACACAGCCAATATTTATTAGTGACATAGAATATATTTGTTATTGCTCTAGGTCATGAGAATGGAGTGACAAA....ATGAATCC.GGTCGCCATCAGTATATGCCACATAACATTTTGCAGTGACTGTGTGCCAGGCCTATGAATTTCAGTATTCAATTTCAATAATGATCCTGTTGTATCTGTGGTATTTAAAAA..CATATACATCTCTGTAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTTAGTGCTGGAAATT.GATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTT.CAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATACTAATTGCTTTAGAATTAAGGAATACAAA...TAA.TGTGAGCTGCAGTTATAGGGATCAAAAAAGTTA.CAATGGGAATGTATTTGAGTGTTTATTATGTGATCAGTGCTAAGAAGTGTCATCATTTAA.TTTTACACTTAACAGTAATCCTGTGAGGATTATGCTATTATTAAATGCACTTGATACATTACAAAAAGGCTTATGGTTGATATAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGACCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGT.ACCAGGTTGTTTCTTAGAGTGGGAGCAGAGATGCAAGGGCTGCTA
GTT.CCGATGTGTAGGAGAAACTATCATTCATTTTGCATTTATCATTTTAAACGTTCTA....................................................................................................................................................................................................................................................................................................................TATGTCTATCCTGGGCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTGAAGTTT..........GACTGGCAAAATTGGGCTAGAGTTACGAAATAAAATACAGGTTCCTAGTGAAATCTGAATTTCAGATACACAACCATAATTTATTGGAAATCCAAATTTAACTGGATA........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................TCTTCTATTATTCTCTGTTCTAGAACTCCACACTTCTAACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCGCTTTCACTGCTCTTTAAGCTCCCCCAGTAGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTACAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGGTGAAGGCGGTGGGTGCTGTTGAAGGAACCGGTAAAGCCT..........GTGGGATGAGAGAAGGAGCAGAGAGTGTTTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGGACGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGG.GGG.TCCCAAAAGCCTGGGGATGAGAAGGGGTTTTCCCGCATGGTCCCCCAGGCCCCC.GTTCGCCTCAGGAAGACGGAGGATGAGCTCCTGGGCTGCTGGTGGTGGGCGTTGCGGGTGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCTGCCCAGGGAGCCCCGGATGGTGGCGTCGCTGTCAGTGTCTTCTCAGGAGGCTGCCCGTGTGACCGGATCCTT.CGTGTCCCCACAGCACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTAAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT....................................................AAG............
........AAAGAGAGAGAGCGCGCCATCTGTGAGCATTTAGAATCCTCTCAATCCCCAGCAAGCAGTTCTGAGAGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGGGTATGTC....TGTTGTGGGAGGGGAGGCAGGAGGGGG.CTGATTCTTATCCTTGGAGACCTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGAGACCTTGATTGTCTCGGGTCCTTAGAGATGCGGGGAAGGGAAATGTAAGGGGTGTGTGATTGGGGTGAAGGTTTAGGGGAGGACTGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCGGTTCCAGACTCTCCCTGGCATACACCCTTCGTGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGAGGAACTGAGTAGACCTCTGAGGCACTCCTGAAGCTTCTTTA.....TATCTAAATTTCTTGCTAGTTTTTTGGGTTTTTTTAGTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAA..CACAGTTACTATTTTATTATAATGCTA...GTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTATGACACAACTCACCTCACTTTC.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................CCCTTTGTTGACCTTTATTATGACATTCACCAAAAGTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTT..ATTTGTAATTGCTTTGAATTATTTT.GACCTATTTA
TTGGCGGATTATAATTATTGCTCT.....AAGAATTCCCTATTGTATTTGGTAGGCAATGGACAATGATCTATGGTCTGATATCTTGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTGCTATAAGATTATTAACACTTTATTGATATTTGATCCAGCATTTGCTTCAGTTTGTGGTTTGTATGTGGATTTTGAAAGTTCTTTTCCATGTTAAGAATTTGAAC..TTTTTATTTAATAAAATATATTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGCATTGTTGTCTCCAGGTTCCTCCTTACTTCTAAAAAAAA.TAGTTGTATTTATTGAGAGTATGCTAGTGT..TGGGGATTTT.CCTGGGCATAAGCACCCCAAGTAACAAGTCCCAGACACTGCCTTAATCCAAATGTGACTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTATGCTTGTGTTACACGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACATTCCCTAATCCTTTTCTCTGCTTCGGGACTCATGCTTTTCTAGGAACGTAAAAA.TTTGGAGAATCATTTCTGTCTGTCCCACATTCCCAGGGGCAGAACAATTTCTGTTTTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAA.TTCATACTCGGTTTCCTCATGTACCCAATTCTGTCCCTTTATCTATGCATATTTCTTTAAATCATATTTTTCTGTCATGGGGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCGCCCTCACAGTGGAATGGAGTGAGAAGCTTTCTGACCTTATAAACTGAAGGCTATCTTCAGTCATTGTTTTATATATTTTACATGCATTAATCCTCATATAACCCCAAGAGGTAAATTAGTATAATTATCCTTCATTGTAGGTGACAAAGTTGAGACACAGAAGAATCAAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTCGAACCAAGCAACCTGGCTCAAATATCAGTTTTAATTACTACACTCTATACTTTCCAAGATTTGTAAACAGTTTGACAATGCATGCCAATTTAAAGCTATGAAGAAACAAACACAATTTTTCACAACACCTCTCAAATCTAATGGGTCCTCACTGTCAAGATTAAATTCCAGGCTGATGACACTGTAAGGTCACATGGCCAGCTGTGCTATAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACACACAGCCAAACCAGGAGACTTACTCTGTCTTCCTGACTCATTCTCTCTACATTTGTTTTCTCCTAGTTGAGCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGGCTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGATTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGTGTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTAATCCCT.GAGTGTCAGGTTTCTCCTCTCCCACACCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGG.TAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCACCTGCCAGGCAGGAGAGACTGTCCCTCTTTTGAACCTCCCCATGATTTCGCAGGTCAGGGTCACCCACTCTCCCCAGGCTCCAGGCCCTGCTTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGCTCTGAGTTATT
TTTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTACCTGACATGAGGGGAATCCAATCTCAGCCCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCTTCATTGAGTCTCAGGCTTTCTGTTGATCAGATGTTGAACGCTTGCCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGTCTGAACATCGTAACTGTTCAGCGTGATTTGAAATCCTTTTTTTCTCCTG.AAATGGCTAGTTATTTTAATTCTTGTGGGGCAGGCTTCTGCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAGCATCACTAAACCTGGATTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGGAGCTGACTCTCTCCATAGGCTTTTCTGGAGGAGGAACCATGGTTTTGCTGAGAGGTTAGTTCTCAGTATACGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGC.TCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTCAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTTGTTTGAGATCCTGGAGCAAATTAAGGAAGAGCCACTAAGGCTAATGGAATTACACTGGATCCTGTGACAGACACTTCAGGCTTCATGGGTCACATGGTCTGTCTCTGCTCCTTTCTGCCCCTTTCTGCCCTGGTTGGTGCGGGTTGTGGTGTTAGAGAAATCTCAGGTGGGAGATCTGCGGCTGGGACATTGTGTTGGAGGACAGATTTGCTTCAATAACTTTTAAGTGTATTTCTTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTAATCCTCTTTTAGAAACAGACACAGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTGGGGTGGGGACTTGGAGAACTTTTCTTA...................................................................................................................................................................................................................................................................................................................................CAAGAGGTTTCTAAA.TGCCCCAATCAGTGCTTTGTAAAAACACACCAATAGGTTC.TCTGTGGCTAGCTGGATGTTTGTAAAATGGACCAATTTGCACTCTGTAAAATGGACCAATCAGCATTCTTTAAAATGGACCAATCAGCAC.TTT.TAAAATGGGCCAATCAGCACTCTTTAAAATGGACCAATCAGCACTCTTTAAAATGGACCAATCTGCAGGACATGGGCAGGGACAAATACAGGAATAAAAGCTGGCCACCCCAACCAGCAGTGGCAACCCACTCAGGTCCTCTTCCCTGCTGTGGAAGTTTTGTTCTTTTGATCTTCACAATAAATCTTGCTGCTGCCTACTCTTTGGGTCCCTGCCGCCTTTAAGAGCTGTAACACTCACTGTGAAGGTCTG..........C..........TTGAAGTCAGCGAGACCACAAACCCACTGGAAGGAACAAACTGCGGACACACTAGAAT.GATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAATTCCTCCTCTCC
ACTTAGAAAAGGGCTGTGCTCTGCGGGACTATTGGCTCAGGGGAGACTCAGGAACTTGTTTTTCTGCTTCCTGCAGTGCTCTCATCTGAGCCCTTGAAAGAGGAGAAAAGAAACTGTTAGTAGAGCCAGGT.TGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAATGCAGATGACCACATTCAAGGAAGAACCTTCTGTCCCAGCTTTGCAGAATGAAAAGCTTTCCTGCTTGGCAGTTATTCTTCCACAAGAGAGGGCTTTCTCAGGACCTGGTTGCTACTGGTTCGGCAACT..GCAGAAAATGTCCTCCCTTGTGGCTTCCTCAGCTCCTGCCCTTGGCCTGAAGTCCCA..GCATTGATGACAGCGCCTCATCTTC.AACTTTT******************************************************************************************************,18487
DRB1*01:01:01:02,*******************************************************************.********************************************************************************************************************************..*******************************************************************************************...........*****.***************************************************************************************************.*******************************************************************CTCCAACACGAGATTACCCAACCCAGGAGCAAGGAAATCAGTAACTTCCTCCCTATAACTTGGAATGTGGGTGGAG.GGGTTCATAGTTCTCCCTGAGTGAGACTTGCCTGCTTCTCTGGCCCCTGGTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATGACAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGCGGGTGCTGAGCTACTATGGGGTGGGGAAAA.TAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAATTGTGACGTTTTCTTCAGAGATTGCCCATCTTTATCAT.TGGATCCCAAATTATTT..CCTCCATAAAAGGAGCTTGGGTACTTGCCCTCTTCATGAGACTTGTGTAAGGGGCCTTTGCACAAGTCATTT...CTTTTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATTCTAAAA.TAAATTCCCCATACAGCACTTCCCTTTATTATGTTGACTTATGTCAGACAAAAGGAGGTTCTTA.CTGAAAATTTTGTGGGAGTCAAGGGAATTCAAAGGGTCTCTCCTAGACGATCCAGTGTTAGGTTCCCCACAGGACCTTTGGTGTTGGCCA.TAGTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTCCTTTCTTTTCTTGCTGAACTCCAATGTTTATAAGGCCTGTATCCCTGTAGCGTATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTCCTTTCTTGTGGGGAAAAGTCCCTGGAACTGAAGCTGAGATTGTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTA............................................AAGAACTTAAGGCATCCTCTGAAAAA.CTAGCCCAGGTTCGTGTTCATTATGAATCTTTTTT.AACCTTTCTGTACTTGTTTCTCTTGCATCTCCTATGTGCTCTAACTAGACATGACAGAAGAGATTTAACTAA...TGTATAAATTATATGAAATTCTATTTTT.TAAGTCAAAAATAATCAACTATCAGAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGATGATGTGAACATTGTTCACGTCTCATAGGGCTGAAAGTCAATGGGCAAGTCTTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCCTAGGTTCACCCATCATGCCCTCAGCTTTCCTTAACTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCT.....ATTTTTCCCCAGCTATGTTGTCATCATTTCCAGAAATCTCTAAAGCTTGCACAGAACCTTAGCACTATGCGATTCATTGAAAGAGACT.TTTTTTCTCTTTTTGAGGTAGGG
CCTGGCTCTGTCACCCAGTCTATAGCTCAGTGGTGTGATCGAGGCTCACTGCAACCTCTGCCACCCATGCTCAAGTGATCCTCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGACATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGTTTCAAGCAATCCTCCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAA.GAAATGACTCTTAATAAAAAAAT.TTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATTAAAATGAAAGGGTGCTGGAGCTTGAGCCCCTTCTTGCTTTCCAGGATCCCTACAGTGATCAGTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTCAGCAGCTACTCTTTACTGGGCTCCATTCTAGGTTCAAATCATTCTATTTGATTAAGTTAGAGAGCGTCCCTACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGATAATGTTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACCTACTTTAGTTCATATG.TTAGGGAGCTCCACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCGTG..AAGATCTAAGGGATGAAAGTTCCAGGGAGACCAAATGGCGGGAAAGCCCTGGTGTGGGAAACTATGTGGAGGGAGAGAAAGAAAGCTAGAGGGGCTGAAGTATAGAAAGCAACGAAATGGAGAGGCAGAAGATGAGG.TAGGACACAGAGAGGAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAAAATTTGAATTTT.ATT...ATTTATTTACCT.ATTT.A...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................TTTGTTTATTTATTTTATTTTATTTT....GTGATGGAGTCTCTCTCTGTTGCCCAGGCTAGATGGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCCAATTAGCTGGGGCCACAGGTGCATGCCACCACACCTGGCTAATTTTTT.GTATTTTT..AGTAGGGATAGGATTTCACCGTGTTAGCCA
GGATGGTCTCGATCTCCTGACCTCGTGATCCACCCGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACTGCACCCGGCCTATATTCAATTTTTAAAA.CTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGGTTCACCAGTGGTTAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGGGAAAACAAGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAGAAGATAAAGTGGACAGACTCGGCATGTATTTTT.GCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGA........GCTTATTCCTAAGGATTTTCTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATGATTTGAAGTGGGTGGTTTGAAATAAAAGTTTTGTTTAAATATGAGATGATTGACTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGG.....CTAGGGCTTCAGGTATTTATGTTGGAGGCATCAATACGTGTAGTGTGTTAAA.TTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAACAAGAAGAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAACCAAGAGAACATCATGTATGTAAGTCAAGGAAAACAGATTTTTTT.CAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAATATGACAGAGAAGCAAGTGCTGGGATTGGTGGAGTTGATATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGGA.GAGCAAATAAAAGTGAGGACGAGGTTAAGGTTGACTGTTCTGTGT.AGAGAGCTTCAGGGAAGGATTGTTCTCTGGGTTCAGGGAGCCAGCTGAACCTAAAGGAAAAGG.CTAAAGAAGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATGCTCAGCCATTATTAGCAAGGAAATACAAGAGAGCCCCTGTGTGCAGTGGTGACTACTCATGCAAAATGTCACACAGCCAATATTTAACACAGCCAGGATTTCACACAGCCAATATTTATTAGTGACATAGAATATATTTGTTATTGCTCTAGGTCATGAGAATGGAGTGACAAA....ATGAATCC.GGTCGCCATCAGTATATGCCACATAACATTTTGCAGTGACTGTGTGCCAGGCCTATGAATTTCAGTATTCAATTTCAATAATGATCCTGTTGTATCTGTGGTATTTAAAAA..CATATACATCTCTGTAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTTAGTGCTGGAAATT.GATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTT.CAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATACTAATTGCTTTAGAATTAAGGAATACAAA...TAA.TGTGAGCTGCAGTTATAGGGATCAAAAAAGTTA.CAATGGGAATGTATTTGAGTGTTTATTATGTGATCAGTGCTAAGAAGTGTCATCATTTAA.TTTTACACTTAACAGTAATCCTGTGAGGATTATGCTATTATTAAATGCACTTGATACATTACAAAAAGGCTTATGGTTGATATAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGACCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGT.ACCAGGTTGTTTCTTAGAGTGGGAGCAGAGATGCAAGGGCTGCTA
GTT.CCGATGTGTAGGAGAAACTATCATTCATTTTGCATTTATCATTTTAAACGTTCTA....................................................................................................................................................................................................................................................................................................................TATGTCTATCCTGGGCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTGAAGTTT..........GACTGGCAAAATTGGGCTAGAGTTACGAAATAAAATACAGGTTCCTAGTGAAATCTGAATTTCAGATACACAACCATAATTTATTGGAAATCCAAATTTAACTGGATA........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................TCTTCTATTATTCTCTGTTCTAGAACTCCACACTTCTAACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCGCTTTCACTGCTCTTTAAGCTCCCCCAGTAGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTACAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGGTGAAGGCGGTGGGTGCTGTTGAAGGAACCGGTAAAGCCT..........GTGGGATGAGAGAAGGAGCAGAGAGTGTTTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGGACGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGG.GGG.TCCCAAAAGCCTGGGGATGAGAAGGGGTTTTCCCGCATGGTCCCCCAGGCCCCC.GTTCGCCTCAGGAAGACGGAGGATGAGCTCCTGGGCTGCTGGTGGTGGGCGTTGCGGGTGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCTGCCCAGGGAGCCCCGGATGGTGGCGTCGCTGTCAGTGTCTTCTCAGGAGGCTGCCCGTGTGACCGGATCCTT.CGTGTCCCCACAGCACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTAAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT....................................................AAG............
........AAAGAGAGAGAGCGCGCCATCTGTGAGCATTTAGAATCCTCTCAATCCCCAGCAAGCAGTTCTGAGAGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGGGTATGTC....TGTTGTGGGAGGGGAGGCAGGAGGGGG.CTGATTCTTATCCTTGGAGACCTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGAGACCTTGATTGTCTCGGGTCCTTAGAGATGCGGGGAAGGGAAATGTAAGGGGTGTGTGATTGGGGTGAAGGTTTAGGGGAGGACTGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCGGTTCCAGACTCTCCCTGGCATACACCCTTCGTGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGAGGAACTGAGTAGACCTCTGAGGCACTCCTGAAGCTTCTTTA.....TATCTAAATTTCTTGCTAGTTTTTTGGGTTTTTTTAGTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAA..CACAGTTACTATTTTATTATAATGCTA...GTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTATGACACAACTCACCTCACTTTC.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................CCCTTTGTTGACCTTTATTATGACATTCACCAAAAGTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTT..ATTTGTAATTGCTTTGAATTATTTT.GACCTATTTA
TTGGCGGATTATAATTATTGCTCT.....AAGAATTCCCTATTGTATTTGGTAGGCAATGGACAATGATCTATGGTCTGATATCTTGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTGCTATAAGATTATTAACACTTTATTGATATTTGATCCAGCATTTGCTTCAGTTTGTGGTTTGTATGTGGATTTTGAAAGTTCTTTTCCATGTTAAGAATTTGAAC..TTTTTATTTAATAAAATATATTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGCATTGTTGTCTCCAGGTTCCTCCTTACTTCTAAAAAAAA.TAGTTGTATTTATTGAGAGTATGCTAGTGT..TGGGGATTTT.CCTGGGCATAAGCACCCCAAGTAACAAGTCCCAGACACTGCCTTAATCCAAATGTGACTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTATGCTTGTGTTACACGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACATTCCCTAATCCTTTTCTCTGCTTCGGGACTCATGCTTTTCTAGGAACGTAAAAA.TTTGGAGAATCATTTCTGTCTGTCCCACATTCCCAGGGGCAGAACAATTTCTGTTTTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAA.TTCATACTCGGTTTCCTCATGTACCCAATTCTGTCCCTTTATCTATGCATATTTCTTTAAATCATATTTTTCTGTCATGGGGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCGCCCTCACAGTGGAATGGAGTGAGAAGCTTTCTGACCTTATAAACTGAAGGCTATCTTCAGTCATTGTTTTATATATTTTACATGCATTAATCCTCATATAACCCCAAGAGGTAAATTAGTATAATTATCCTTCATTGTAGGTGACAAAGTTGAGACACAGAAGAATCAAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTCGAACCAAGCAACCTGGCTCAAATATCAGTTTTAATTACTACACTCTATACTTTCCAAGATTTGTAAACAGTTTGACAATGCATGCCAATTTAAAGCTATGAAGAAACAAACACAATTTTTCACAACACCTCTCAAATCTAATGGGTCCTCACTGTCAAGATTAAATTCCAGGCTGATGACACTGTAAGGTCACATGGCCAGCTGTGCTATAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACACACAGCCAAACCAGGAGACTTACTCTGTCTTCCTGACTCATTCTCTCTACATTTGTTTTCTCCTAGTTGAGCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGGCTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGATTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGTGTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTAATCCCT.GAGTGTCAGGTTTCTCCTCTCCCACACCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGG.TAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCACCTGCCAGGCAGGAGAGACTGTCCCTCTTTTGAACCTCCCCATGATTTCGCAGGTCAGGGTCACCCACTCTCCCCAGGCTCCAGGCCCTGCTTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGCTCTGAGTTATT
TTTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTACCTGACATGAGGGGAATCCAATCTCAGCCCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCTTCATTGAGTCTCAGGCTTTCTGTTGATCAGATGTTGAACGCTTGCCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGTCTGAACATCGTAACTGTTCAGCGTGATTTGAAATCCTTTTTTTCTCCTG.AAATGGCTAGTTATTTTAATTCTTGTGGGGCAGGCTTCTGCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAGCATCACTAAACCTGGATTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGGAGCTGACTCTCTCCATAGGCTTTTCTGGAGGAGGAACCATGGTTTTGCTGAGAGGTTAGTTCTCAGTATACGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGC.TCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTCAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTTGTTTGAGATCCTGGAGCAAATTAAGGAAGAGCCACTAAGGCTAATGGAATTACACTGGATCCTGTGACAGACACTTCAGGCTTCATGGGTCACATGGTCTGTCTCTGCTCCTTTCTGCCCCTTTCTGCCCTGGTTGGTGCGGGTTGTGGTGTTAGAGAAATCTCAGGTGGGAGATCTGCGGCTGGGACATTGTGTTGGAGGACAGATTTGCTTCAATAACTTTTAAGTGTATTTCTTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTAATCCTCTTTTAGAAACAGACACAGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTGGGGTGGGGACTTGGAGAACTTTTCTTA...................................................................................................................................................................................................................................................................................................................................CAAGAGGTTTCTAAA.TGCCCCAATCAGTGCTTTGTAAAAACACACCAATAGGTTC.TCTGTGGCTAGCTGGATGTTTGTAAAATGGACCAATTTGCACTCTGTAAAATGGACCAATCAGCATTCTTTAAAATGGACCAATCAGCAC.TTT.TAAAATGGGCCAATCAGCACTCTTTAAAATGGACCAATCAGCACTCTTTAAAATGGACCAATCTGCAGGACATGGGCAGGGACAAATACAGGAATAAAAGCTGGCCACCCCAACCAGCAGTGGCAACCCACTCAGGTCCTCTTCCCTGCTGTGGAAGTTTTGTTCTTTTGATCTTCACAATAAATCTTGCTGCTGCCTACTCTTTGGGTCCCTGCCGCCTTTAAGAGCTGTAACACTCACTGTGAAGGTCTG..........C..........TTGAAGTCAGCGAGACCACAAACCCACTGGAAGGAACAAACTGCGGACACACTAGAAT.GATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAATTCCTCCTCTCC
ACTTAGAAAAGGGCTGTGCTCTGCGGGACTATTGGCTCAGGGGAGACTCAGGAACTTGTTTTTCTGCTTCCTGCAGTGCTCTCATCTGAGCCCTTGAAAGAGGAGAAAAGAAACTGTTAGTAGAGCCAGGT.TGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAATGCAGATGACCACATTCAAGGAAGAACCTTCTGTCCCAGCTTTGCAGAATGAAAAGCTTTCCTGCTTGGCAGT********************************************************..*********************************************************..*************************.*************************************************************************************************************,18487
DRB1*01:01:01:03,*******************************************************************.********************************************************************************************************************************..*******************************************************************************************...........*****.***************************************************************************************************.*******************************************************************CTCCAACACGAGATTACCCAACCCAGGAGCAAGGAAATCAGTAACTTCCTCCCTATAACTTGGAATGTGGGTGGAG.GGGTTCATAGTTCTCCCTGAGTGAGACTTGCCTGCTTCTCTGGCCCCTGGTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATGACAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGCGGGTGCTGAGCTACTATGGGGTGGGGAAAA.TAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAATTGTGACGTTTTCTTCAGAGATTGCCCATCTTTATCAT.TGGATCCCAAATTATTT..CCTCCATAAAAGGAGCTTGGGTACTTGCCCTCTTCATGAGACTTGTGTAAGGGGCCTTTGCACAAGTCATTT...CTTTTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATTCTAAAA.TAAATTCCCCATACAGCACTTCCCTTTATTATGTTGACTTATGTCAGACAAAAGGAGGTTCTTA.CTGAAAATTTTGTGGGAGTCAAGGGAATTCAAAGGGTCTCTCCTAGACGATCCAGTGTTAGGTTCCCCACAGGACCTTTGGTGTTGGCCA.TAGTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTCCTTTCTTTTCTTGCTGAACTCCAATGTTTATAAGGCCTGTATCCCTGTAGCGTATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTCCTTTCTTGTGGGGAAAAGTCCCTGGAACTGAAGCTGAGATTGTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTA............................................AAGAACTTAAGGCATCCTCTGAAAAA.CTAGCCCAGGTTCGTGTTCATTATGAATCTTTTTT.AACCTTTCTGTACTTGTTTCTCTTGCATCTCCTATGTGCTCTAACTAGACATGACAGAAGAGATTTAACTAA...TGTATAAATTATATGAAATTCTATTTTT.TAAGTCAAAAATAATCAACTATCAGAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGATGATGTGAACATTGTTCACGTCTCATAGGGCTGAAAGTCAATGGGCAAGTCTTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCCTAGGTTCACCCATCATGCCCTCAGCTTTCCTTAACTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCT.....ATTTTTCCCCAGCTATGTTGTCATCACTTCCAGAAATCTCTAAAGCTTGCACAGAACCTTAGCACTATGCGATTCATTGAAAGAGACT.TTTTTTCTCTTTTTGAGGTAGGG
CCTGGCTCTGTCACCCAGTCTATAGCTCAGTGGTGTGATCGAGGCTCACTGCAACCTCTGCCACCCATGCTCAAGTGATCCTCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGACATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGTTTCAAGCAATCCTCCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAA.GAAATGACTCTTAATAAAAAAAT.TTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATTAAAATGAAAGGGTGCTGGAGCTTGAGCCCCTTCTTGCTTTCCAGGATCCCTACAGTGATCAGTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTCAGCAGCTACTCTTTACTGGGCTCCATTCTAGGTTCAAATCATTCTATTTGATTAAGTTAGAGAGCGTCCCTACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGATAATGTTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACCTACTTTAGTTCATATG.TTAGGGAGCTCCACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCGTG..AAGATCTAAGGGATGAAAGTTCCAGGGAGACCAAATGGCGGGAAAGCCCTGGTGTGGGAAACTATGTGGAGGGAGAGAAAGAAAGCTAGAGGGGCTGAAGTATAGAAAGCAACGAAATGGAGAGGCAGAAGATGAGG.TAGGACACAGAGAGGAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAAAATTTGAATTTT.ATT...ATTTATTTACCT.ATTT.A...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................TTTGTTTATTTATTTTATTTTATTTT....GTGATGGAGTCTCTCTCTGTTGCCCAGGCTAGATGGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCCAATTAGCTGGGGCCACAGGTGCATGCCACCACACCTGGCTAATTTTTT.GTATTTTT..AGTAGGGATAGGATTTCACCGTGTTAGCCA
GGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACTGCACCCGGCCTATATTCAATTTTTAAAA.CTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGGTTCACCAGTGGTTAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGGGAAAACAAGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAGAAGATAAAGTGGACAGACTCGGCATGTATTTTT.GCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGA........GCTTATTCCTAAGGATTTTCTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATGATTTGAAGTGGGTGGTTTGAAATAAAAGTTTTGTTTAAATATGAGATGATTGACTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGG.....CTAGGGCTTCAGGTATTTATGTTGGAGGCATCAATACGTGTAGTGTGTTAAA.TTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAACAAGAAGAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAACCAAGAGAACATCATGTATGTAAGTCAAGGAAAACAGATTTTTTT.CAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAATATGACAGAGAAGCAAGTGCTGGGATTGGTGGAGTTGATATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGGA.GAGCAAATAAAAGTGAGGACGAGGTTAAGGTTGACTGTTCTGTGT.AGAGAGCTTCAGGGAAGGATTGTTCTCTGGGTTCAGGGAGCCAGCTGAACCTAAAGGAAAAGG.CTAAAGAAGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATGCTCAGCCATTATTAGCAAGGAAATACAAGAGAGCCCCTGTGTGCAGTGGTGACTACTCATGCAAAATGTCACACAGCCAATATTTAACACAGCCAGGATTTCACACAGCCAATATTTATTAGTGACATAGAATATATTTGTTATTGCTCTAGGTCATGAGAATGGAGTGACAAA....ATGAATCC.GGTCGCCATCAGTATATGCCACATAACATTTTGCAGTGACTGTGTGCCAGGCCTATGAATTTCAGTATTCAATTTCAATAATGATCCTGTTGTATCTGTGGTATTTAAAAA..CATATACATCTCTGTAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTTAGTGCTGGAAATT.GATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTT.CAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATACTAATTGCTTTAGAATTAAGGAATACAAA...TAA.TGTGAGCTGCAGTTATAGGGATCAAAAAAGTTA.CAATGGGAATGTATTTGAGTGTTTATTATGTGATCAGTGCTAAGAAGTGTCATCATTTAA.TTTTACACTTAACAGTAATCCTGTGAGGATTATGCTATTATTAAATGCACTTGATACATTACAAAAAGGCTTATGGTTGATATAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGACCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGT.ACCAGGTTGTTTCTTAGAGTGGGAGCAGAGATGCAAGGGCTGCTA
GTT.CCGATGTGTAGGAGAAACTATCATTCATTTTGCATTTATCATTTTAAACGTTCTA....................................................................................................................................................................................................................................................................................................................TATGTCTATCCTGGGCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTGAAGTTT..........GACTGGCAAAATTGGGCTAGAGTTACGAAATAAAATACAGGTTCCTAGTGAAATCTGAATTTCAGATACACAACCATAATTTATTGGAAATCCAAATTTAACTGGATA........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................TCTTCTATTATTCTCTGTTCTAGAACTCCACACTTCTAACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCGCTTTCACTGCTCTTTAAGCTCCCCCAGTAGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTACAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGGTGAAGGCGGTGGGTGCTGTTGAAGGAACCGGTAAAGCCT..........GTGGGATGAGAGAAGGAGCAGAGAGTGTTTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGGACGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGG.GGG.TCCCAAAAGCCTGGGGATGAGAAGGGGTTTTCCCGCATGGTCCCCCAGGCCCCC.GTTCGCCTCAGGAAGACGGAGGATGAGCTCCTGGGCTGCTGGTGGTGGGCGTTGCGGGTGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCTGCCCAGGGAGCCCCGGATGGTGGCGTCGCTGTCAGTGTCTTCTCAGGAGGCTGCCCGTGTGACCGGATCCTT.CGTGTCCCCACAGCACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTAAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT....................................................AAG............
........AAAGAGAGAGAGCGCGCCATCTGTGAGCATTTAGAATCCTCTCAATCCCCAGCAAGCAGTTCTGAGAGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGGGTATGTC....TGTTGTGGGAGGGGAGGCAGGAGGGGG.CTGATTCTTATCCTTGGAGACCTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGAGACCTTGATTGTCTCGGGTCCTTAGAGATGCGGGGAAGGGAAATGTAAGGGGTGTGTGATTGGGGTGAAGGTTTAGGGGAGGACTGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCGGTTCCAGACTCTCCCTGGCATACACCCTTCGTGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGAGGAACTGAGTAGACCTCTGAGGCACTCCTGAAGCTTCTTTA.....TATCTAAATTTCTTGCTAGTTTTTTGGGTTTTTTTAGTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAA..CACAGTTACTATTTTATTATAATGCTA...GTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTATGACACAACTCACCTCACTTTC.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................CCCTTTGTTGACCTTTATTATGACATTCACCAAAAGTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTT..ATTTGTAATTGCTTTGAATTATTTT.GACCTATTTA
TTGGCGGATTATAATTATTGCTCT.....AAGAATTCCCTATTGTATTTGGTAGGCAATGGACAATGATCTATGGTCTGATATCTTGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTGCTATAAGATTATTAACACTTTATTGATATTTGATCCAGCATTTGCTTCAGTTTGTGGTTTGTATGTGGATTTTGAAAGTTCTTTTCCATGTTAAGAATTTGAAC..TTTTTATTTAATAAAATATATTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGCATTGTTGTCTCCAGGTTCCTCCTTACTTCTAAAAAAAA.TAGTTGTATTTATTGAGAGTATGCTAGTGT..TGGGGATTTT.CCTGGGCATAAGCACCCCAAGTAACAAGTCCCAGACACTGCCTTAATCCAAATGTGACTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTATGCTTGTGTTACACGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACATTCCCTAATCCTTTTCTCTGCTTCGGGACTCATGCTTTTCTAGGAACGTAAAAA.TTTGGAGAATCATTTCTGTCTGTCCCACATTCCCAGGGGCAGAACAATTTCTGTTTTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAA.TTCATACTCGGTTTCCTCATGTACCCAATTCTGTCCCTTTATCTATGCATATTTCTTTAAATCATATTTTTCTGTCATGGGGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCGCCCTCACAGTGGAATGGAGTGAGAAGCTTTCTGACCTTATAAACTGAAGGCTATCTTCAGTCATTGTTTTATATATTTTACATGCATTAATCCTCATATAACCCCAAGAGGTAAATTAGTATAATTATCCTTCATTGTAGGTGACAAAGTTGAGACACAGAAGAATCAAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTCGAACCAAGCAACCTGGCTCAAATATCAGTTTTAATTACTACACTCTATACTTTCCAAGATTTGTAAACAGTTTGACAATGCATGCCAATTTAAAGCTATGAAGAAACAAACACAATTTTTCACAACACCTCTCAAATCTAATGGGTCCTCACTGTCAAGATTAAATTCCAGGCTGATGACACTGTAAGGTCACATGGCCAGCTGTGCTATAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACACACAGCCAAACCAGGAGACTTACTCTGTCTTCCTGACTCATTCTCTCTACATTTGTTTTCTCCTAGTTGAGCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGGCTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGATTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGTGTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTAATCCCT.GAGTGTCAGGTTTCTCCTCTCCCACACCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGG.TAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCACCTGCCAGGCAGGAGAGACTGTCCCTCTTTTGAACCTCCCCATGATTTCGCAGGTCAGGGTCACCCACTCTCCCCAGGCTCCAGGCCCTGCTTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGCTCTGAGTTATT
TTTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTACCTGACATGAGGGGAATCCAATCTCAGCCCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCTTCATTGAGTCTCAGGCTTTCTGTTGATCAGATGTTGAACGCTTGCCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGTCTGAACATCGTAACTGTTCAGCGTGATTTGAAATCCTTTTTTTCTCCTG.AAATGGCTAGTTATTTTAATTCTTGTGGGGCAGGCTTCTGCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAGCATCACTAAACCTGGATTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGGAGCTGACTCTCTCCATAGGCTTTTCTGGAGGAGGAACCATGGTTTTGCTGAGAGGTTAGTTCTCAGTATACGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGC.TCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTCAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTTGTTTGAGATCCTGGAGCAAATTAAGGAAGAGCCACTAAGGCTAATGGAATTACACTGGATCCTGTGACAGACACTTCAGGCTTCATGGGTCACATGGTCTGTCTCTGCTCCTTTCTGCCCCTTTCTGCCCTGGTTGGTGCGGGTTGTGGTGTTAGAGAAATCTCAGGTGGGAGATCTGCGGCTGGGACATTGTGTTGGAGGACAGATTTGCTTCAATAACTTTTAAGTGTATTTCTTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTAATCCTCTTTTAGAAACAGACACAGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTGGGGTGGGGACTTGGAGAACTTTTCTTA...................................................................................................................................................................................................................................................................................................................................CAAGAGGTTTCTAAA.TGCCCCAATCAGTGCTTTGTAAAAACACACCAATAGGTTC.TCTGTGGCTAGCTGGATGTTTGTAAAATGGACCAATTTGCACTCTGTAAAATGGACCAATCAGCATTCTTTAAAATGGACCAATCAGCAC.TTT.TAAAATGGGCCAATCAGCACTCTTTAAAATGGACCAATCAGCACTCTTTAAAATGGACCAATCTGCAGGACATGGGCAGGGACAAATACAGGAATAAAAGCTGGCCACCCCAACCAGCAGTGGCAACCCACTCAGGTCCTCTTCCCTGCTGTGGAAGTTTTGTTCTTTTGATCTTCACAATAAATCTTGCTGCTGCCTACTCTTTGGGTCCCTGCCGCCTTTAAGAGCTGTAACACTCACTGTGAAGGTCTG..........C..........TTGAAGTCAGCGAGACCACAAACCCACTGGAAGGAACAAACTGCGGACACACTAGAAT.GATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAATTCCTCCTCTCC
ACTTAGAAAAGGGCTGTGCTCTGCGGGACTATTGGCTCAGGGGAGACTCAGGAACTTGTTTTTCTGCTTCCTGCAGTGCTCTCATCTGAGCCCTTGAAAGAGGAGAAAAGAAACTGTTAGTAGAGCCAGGT.TGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAATGCAGATGACCACATTCAAGGAAGAACCTTCTGTCCCAGCTTTGCAGAATGAAAA**************************************************************************..*********************************************************..*************************.*************************************************************************************************************,18487
DRB1*01:01:01:04,*******************************************************************.********************************************************************************************************************************..*******************************************************************************************...........*****.***************************************************************************************************.***********************************************************************************************************************************************.******************GAGTGAGACTTGCCTGCTTCTCTGGCCCCTGGTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATGACAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGCGGGTGCTGAGCTACTATGGGGTGGGGAAAA.TAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAATTGTGACGTTTTCTTCAGAGATTGCCCATCTTTATCAT.TGGATCCCAAATTATTT..CCTCCATAAAAGGAGCTTGGGTACTTGCCCTCTTCATGAGACTTGTGTAAGGGGCCTTTGCACAAGTCATTT...CTTTTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATTCTAAAA.TAAATTCCCCATACAGCACTTCCCTTTATTATGTTGACTTATGTCAGACAAAAGGAGGTTCTTA.CTGAAAATTTTGTGGGAGTCAAGGGAATTCAAAGGGTCTCTCCTAGACGATCCAGTGTTAGGTTCCCCACAGGACCTTTGGTGTTGGCCA.TAGTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTCCTTTCTTTTCTTGCTGAACTCCAATGTTTATAAGGCCTGTATCCCTGTAGCGTATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTCCTTTCTTGTGGGGAAAAGTCCCTGGAACTGAAGCTGAGATTGTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTA............................................AAGAACTTAAGGCATCCTCTGAAAAA.CTAGCCCAGGTTCGTGTTCATTATGAATCTTTTTT.AACCTTTCTGTACTTGTTTCTCTTGCATCTCCTATGTGCTCTAACTAGACATGACAGAAGAGATTTAACTAA...TGTATAAATTATATGAAATTCTATTTTT.TAAGTCAAAAATAATCAACTATCAGAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGATGATGTGAACATTGTTCACGTCTCATAGGGCTGAAAGTCAATGGGCAAGTCTTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCCTAGGTTCACCCATCATGCCCTCAGCTTTCCTTAACTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCT.....ATTTTTCCCCAGCTATGTTGTCATCATTTCCAGAAATCTCTAAAGCTTGCACAGAACCTTAGCACTATGCGATTCATTGAAAGAGACT.TTTTTTCTCTTTTTGAGGTAGGG
CCTGGCTCTGTCACCCAGTCTATAGCTCAGTGGTGTGATCGAGGCTCACTGCAACCTCTGCCACCCATGCTCAAGTGATCCTCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGACATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGTTTCAAGCAATCCTCCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAA.GAAATGACTCTTAATAAAAAAAT.TTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATTAAAATGAAAGGGTGCTGGAGCTTGAGCCCCTTCTTGCTTTCCAGGATCCCTACAGTGATCAGTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTCAGCAGCTACTCTTTACTGGGCTCCATTCTAGGTTCAAATCATTCTATTTGATTAAGTTAGAGAGCGTCCCTACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGATAATGTTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACCTACTTTAGTTCATATG.TTAGGGAGCTCCACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCGTG..AAGATCTAAGGGATGAAAGTTCCAGGGAGACCAAATGGCGGGAAAGCCCTGGTGTGGGAAACTATGTGGAGGGAGAGAAAGAAAGCTAGAGGGGCTGAAGTATAGAAAGCAACGAAATGGAGAGGCAGAAGATGAGG.TAGGACACAGAGAGGAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAAAATTTGAATTTT.ATT...ATTTATTTACCT.ATTT.A...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................TTTGTTTATTTATTTTATTTTATTTT....GTGATGGAGTCTCTCTCTGTTGCCCAGGCTAGATGGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCCAATTAGCTGGGGCCACAGGTGCATGCCACCACACCTGGCTAATTTTTT.GTATTTTT..AGTAGGGATAGGATTTCACCGTGTTAGCCA
GGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACTGCACCCGGCCTATATTCAATTTTTAAAA.CTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGGTTCACCAGTGGTTAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGGGAAAACAAGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAGAAGATAAAGTGGACAGACTCGGCATGTATTTTT.GCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGA........GCTTATTCCTAAGGATTTTCTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATGATTTGAAGTGGGTGGTTTGAAATAAAAGTTTTGTTTAAATATGAGATGATTGACTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGG.....CTAGGGCTTCAGGTATTTATGTTGGAGGCATCAATACGTGTAGTGTGTTAAA.TTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAACAAGAAGAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAACCAAGAGAACATCATGTATGTAAGTCAAGGAAAACAGATTTTTTT.CAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAATATGACAGAGAAGCAAGTGCTGGGATTGGTGGAGTTGATATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGGA.GAGCAAATAAAAGTGAGGACGAGGTTAAGGTTGACTGTTCTGTGT.AGAGAGCTTCAGGGAAGGATTGTTCTCTGGGTTCAGGGAGCCAGCTGAACCTAAAGGAAAAGG.CTAAAGAAGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATGCTCAGCCATTATTAGCAAGGAAATACAAGAGAGCCCCTGTGTGCAGTGGTGACTACTCATGCAAAATGTCACACAGCCAATATTTAACACAGCCAGGATTTCACACAGCCAATATTTATTAGTGACATAGAATATATTTGTTATTGCTCTAGGTCATGAGAATGGAGTGACAAA....ATGAATCC.GGTCGCCATCAGTATATGCCACATAACATTTTGCAGTGACTGTGTGCCAGGCCTATGAATTTCAGTATTCAATTTCAATAATGATCCTGTTGTATCTGTGGTATTTAAAAA..CATATACATCTCTGTAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTTAGTGCTGGAAATT.GATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTT.CAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATACTAATTGCTTTAGAATTAAGGAATACAAA...TAA.TGTGAGCTGCAGTTATAGGGATCAAAAAAGTTA.CAATGGGAATGTATTTGAGTGTTTATTATGTGATCAGTGCTAAGAAGTGTCATCATTTAA.TTTTACACTTAACAGTAATCCTGTGAGGATTATGCTATTATTAAATGCACTTGATACATTACAAAAAGGCTTATGGTTGATATAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGACCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGT.ACCAGGTTGTTTCTTAGAGTGGGAGCAGAGATGCAAGGGCTGCTA
GTT.CCGATGTGTAGGAGAAACTATCATTCATTTTGCATTTATCATTTTAAACGTTCTA....................................................................................................................................................................................................................................................................................................................TATGTCTATCCTGGGCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTGAAGTTT..........GACTGGCAAAATTGGGCTAGAGTTACGAAATAAAATACAGGTTCCTAGTGAAATCTGAATTTCAGATACACAACCATAATTTATTGGAAATCCAAATTTAACTGGATA........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................TCTTCTATTATTCTCTGTTCTAGAACTCCACACTTCTAACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCGCTTTCACTGCTCTTTAAGCTCCCCCAGTAGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTACAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGGTGAAGGCGGTGGGTGCTGTTGAAGGAACCGGTAAAGCCT..........GTGGGATGAGAGAAGGAGCAGAGAGTGTTTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGGACGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGG.GGG.TCCCAAAAGCCTGGGGATGAGAAGGGGTTTTCCCGCATGGTCCCCCAGGCCCCC.GTTCGCCTCAGGAAGACGGAGGATGAGCTCCTGGGCTGCTGGTGGTGGGCGTTGCGGGTGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCTGCCCAGGGAGCCCCGGATGGTGGCGTCGCTGTCAGTGTCTTCTCAGGAGGCTGCCCGTGTGACCGGATCCTT.CGTGTCCCCACAGCACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTAAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT....................................................AAG............
........AAAGAGAGAGAGCGCGCCATCTGTGAGCATTTAGAATCCTCTCAATCCCCAGCAAGCAGTTCTGAGAGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGGGTATGTC....TGTTGTGGGAGGGGAGGCAGGAGGGGG.CTGATTCTTATCCTTGGAGACCTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGAGACCTTGATTGTCTCGGGTCCTTAGAGATGCGGGGAAGGGAAATGTAAGGGGTGTGTGATTGGGGTGAAGGTTTAGGGGAGGACTGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCGGTTCCAGACTCTCCCTGGCATACACCCTTCGTGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGAGGAACTGAGTAGACCTCTGAGGCACTCCTGAAGCTTCTTTA.....TATCTAAATTTCTTGCTAGTTTTTTGGGTTTTTTTAGTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAA..CACAGTTACTATTTTATTATAATGCTA...GTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTATGACACAACTCACCTCACTTTC.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................CCCTTTGTTGACCTTTATTATGACATTCACCAAAAGTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTT..ATTTGTAATTGCTTTGAATTATTTT.GACCTATTTA
TTGGCGGATTATAATTATTGCTCT.....AAGAATTCCCTATTGTATTTGGTAGGCAATGGACAATGATCTATGGTCTGATATCTTGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTGCTATAAGATTATTAACACTTTATTGATATTTGATCCAGCATTTGCTTCAGTTTGTGGTTTGTATGTGGATTTTGAAAGTTCTTTTCCATGTTAAGAATTTGAAC..TTTTTATTTAATAAAATATATTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGCATTGTTGTCTCCAGGTTCCTCCTTACTTCTAAAAAAAA.TAGTTGTATTTATTGAGAGTATGCTAGTGT..TGGGGATTTT.CCTGGGCATAAGCACCCCAAGTAACAAGTCCCAGACACTGCCTTAATCCAAATGTGACTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTATGCTTGTGTTACACGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACATTCCCTAATCCTTTTCTCTGCTTCGGGACTCATGCTTTTCTAGGAACGTAAAAA.TTTGGAGAATCATTTCTGTCTGTCCCACATTCCCAGGGGCAGAACAATTTCTGTTTTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAA.TTCATACTCGGTTTCCTCATGTACCCAATTCTGTCCCTTTATCTATGCATATTTCTTTAAATCATATTTTTCTGTCATGGGGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCGCCCTCACAGTGGAATGGAGTGAGAAGCTTTCTGACCTTATAAACTGAAGGCTATCTTCAGTCATTGTTTTATATATTTTACATGCATTAATCCTCATATAACCCCAAGAGGTAAATTAGTATAATTATCCTTCATTGTAGGTGACAAAGTTGAGACACAGAAGAATCAAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTCGAACCAAGCAACCTGGCTCAAATATCAGTTTTAATTACTACACTCTATACTTTCCAAGATTTGTAAACAGTTTGACAATGCATGCCAATTTAAAGCTATGAAGAAACAAACACAATTTTTCACAACACCTCTCAAATCTAATGGGTCCTCACTGTCAAGATTAAATTCCAGGCTGATGACACTGTAAGGTCACATGGCCAGCTGTGCTATAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACACACAGCCAAACCAGGAGACTTACTCTGTCTTCCTGACTCATTCTCTCTACATTTGTTTTCTCCTAGTTGAGCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGGCTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGATTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGTGTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTAATCCCT.GAGTGTCAGGTTTCTCCTCTCCCACACCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGG.TAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCACCTGCCAGGCAGGAGAGACTGTCCCTCTTTTGAACCTCCCCATGATTTCGCAGGTCAGGGTCACCCACTCTCCCCAGGCTCCAGGCCCTGCTTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGCTCTGAGTTATT
TTTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTACCTGACATGAGGGGAATCCAATCTCAGCCCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCTTCATTGAGTCTCAGGCTTTCTGTTGATCAGATGTTGAACGCTTGCCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGTCTGAACATCGTAACTGTTCAGCGTGATTTGAAATCCTTTTTTTCTCCTG.AAATGGCTAGTTATTTTAATTCTTGTGGGGCAGGCTTCTGCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAGCATCACTAAACCTGGATTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGGAGCTGACTCTCTCCATAGGCTTTTCTGGAGGAGGAACCATGGTTTTGCTGAGAGGTTAGTTCTCAGTATACGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGC.TCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTCAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTTGTTTGAGATCCTGGAGCAAATTAAGGAAGAGCCACTAAGGCTAATGGAATTACACTGGATCCTGTGACAGACACTTCAGGCTTCATGGGTCACATGGTCTGTCTCTGCTCCTTTCTGCCCCTTTCTGCCCTGGTTGGTGCGGGTTGTGGTGTTAGAGAAATCTCAGGTGGGAGATCTGCGGCTGGGACATTGTGTTGGAGGACAGATTTGCTTCAATAACTTTTAAGTGTATTTCTTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTAATCCTCTTTTAGAAACAGACACAGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTGGGGTGGGGACTTGGAGAACTTTTCTTA...................................................................................................................................................................................................................................................................................................................................CAAGAGGTTTCTAAA.TGCCCCAATCAGTGCTTTGTAAAAACACACCAATAGGTTC.TCTGTGGCTAGCTGGATGTTTGTAAAATGGACCAATTTGCACTCTGTAAAATGGACCAATCAGCATTCTTTAAAATGGACCAATCAGCAC.TTT.TAAAATGGGCCAATCAGCACTCTTTAAAATGGACCAATCAGCACCCTTTAAAATGGACCAATCTGCAGGACATGGGCAGGGACAAATACAGGAATAAAAGCTGGCCACCCCAACCAGCAGTGGCAACCCACTCAGGTCCTCTTCCCTGCTGTGGAAGTTTTGTTCTTTTGATCTTCACAATAAATCTTGCTGCTGCCTACTCTTTGGGTCCCTGCCGCCTTTAAGAGCTGTAACACTCACTGTGAAGGTCTG..........C..........TTGAAGTCAGCGAGACCACAAACCCACTGGAAGGAACAAACTGCGGACACACTAGAAT.GATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAATTCCTCCTCTCC
ACTTAGAAAAGGGCTGTGCTCTGCGGGACTATTGGCTCAGGGGAGACTCAGGAACTTGTTTTTCTGCTTCCTGCAGTGCTCTCATCTGAGCCCTTGAAAGAGGAGAAAAGAAACTGTTAGTAGAGCCAGGT.TGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAATGCAGATGACCACATTCAAGGAAGAACCTTCTGTCCCAGCTTTGCAGAATGAAAAGCTTTCCTGCTTGGCAGTTATTCTTCCACAAGAGAGGGCTTTCTCAGGACCTGGTTGCTACTGGTTCGGCAACT..GCAGAAAATGTCCTCCCTTGTGGCTTCCTCAGCTCCTGCCCTTGGCCTGAAGTCCCA..GCATTGATGACAG************.*************************************************************************************************************,18487


The next step processes the HLA alignment data in the `*.gen` files and returns a data frame. For any allele with missing (unsequenced) bases, the missing positions are filled in with the bases from the closest allele.

In [4]:
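# Compile the allele index for one locus (here DRB1)
hladb = tibble(locus = 'DRB1') %>%
    mutate(data = map(locus, ~hla_compile_index(., 'IMGTHLA'))) %>%
    filter(!is.na(data)) %>%
    unnest(data) %>%
    # Drop alleles whose name ends in "N" (null alleles)
    filter(!grepl("N$", allele)) %>%
    select(-locus) %>%
    # Prefix allele names with "IMGT_"
    mutate(allele = paste0("IMGT_", allele))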

# Print distribution of allele sequence lengths
hladb$len = str_length(hladb$cds)
table(hladb$len)

Processing locus DRB1

The `...` argument of `group_indices()` is deprecated as of dplyr 1.0.0.
Please `group_by()` first



11229 11232 11237 11240 11260 11545 11546 11547 11548 11565 11567 11568 11569 
    2    11     2     1     1     1     3     4     1     1     6     3     7 
11571 11573 11576 11581 11583 11587 13882 13883 13890 13893 13894 13895 13896 
    4     6     1     1     2     3     3     1     1     2     9     3     2 
13897 13898 13899 13900 13901 13902 13903 13904 13905 13906 13907 13908 13909 
    1     6     1     2    12     2    10     2    11     2     7     3     6 
13911 13912 13913 13914 13915 13916 13917 13918 13919 13920 13921 13922 13923 
    3     3     8    13     4     8    11     3    14     3    11     2     1 
13924 13925 13926 13927 13928 13930 13931 13932 13933 13934 13935 13936 13937 
    4     2     2     2     1     1     3     4     2     5     5     2     2 
13938 13940 13941 13946 13950 13953 13994 15229 15231 15232 15233 15235 15236 
    1     1     4     2     3     1     1     2     8     6     2     4     2 
15237 15239 15240 15241 15242 15243 15244 15245 152
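
The varying lengths above follow from the gap-filling step and from dropping the indel placeholders (`.`) of the alignment. The following is a minimal base-R sketch of that idea on toy data, not the actual `hla_compile_index` implementation. The sequences, the helper names `known_mismatches` and `fill_from_closest`, and the choice of "closest" as the complete allele with the fewest mismatches at known positions are all assumptions made for illustration.

```r
# Toy aligned sequences (hypothetical, not real HLA alleles).
# "*" = unknown base, "." = indel placeholder, as in the IMGT alignments.
aligned <- c(
  A1 = "ACGT.ACGTA",
  A2 = "ACGTTACGAA",
  A3 = "**GT.ACG**"
)

# Split each aligned sequence into single positions.
chars <- strsplit(aligned, "")

# Count mismatches between two alleles, ignoring positions unknown in either.
known_mismatches <- function(x, y) {
  known <- x != "*" & y != "*"
  sum(x[known] != y[known])
}

# Fill the "*" positions of one allele from the closest complete allele
# (assumption: "closest" = fewest mismatches at the known positions).
fill_from_closest <- function(name, chars) {
  x <- chars[[name]]
  if (!any(x == "*")) return(x)
  is_incomplete <- vapply(chars, function(s) any(s == "*"), logical(1))
  complete <- names(chars)[!is_incomplete]
  d <- vapply(complete, function(n) known_mismatches(x, chars[[n]]), integer(1))
  donor <- chars[[complete[which.min(d)]]]
  x[x == "*"] <- donor[x == "*"]
  x
}

filled <- lapply(setNames(names(chars), names(chars)),
                 fill_from_closest, chars = chars)

# Remove indel placeholders to obtain the unaligned sequence of each allele.
seqs <- vapply(filled, function(x) paste(x[x != "."], collapse = ""),
               character(1))
nchar(seqs)  # A1 and A3 have length 9, A2 has length 10
```

In this toy case all aligned rows have length 10, but after filling and removing `.` the alleles differ in length, which mirrors why the aligned DRB1 sequences all have length 18487 while the compiled sequences do not.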

You can confirm that the allelic sequences match those published on the IPD-IMGT/HLA website: https://www.ebi.ac.uk/ipd/imgt/hla/alignment/ (remember to select "Genomic-Full Length" when viewing the alignment).

In [5]:
example_alleles = c('IMGT_DRB1*01:01:01:01', 'IMGT_DRB1*03:01:01:01','IMGT_DRB1*04:01:01:01')
hladb[which(hladb$allele %in% example_alleles),] %>% head()

allele,cds,len
<chr>,<chr>,<int>
IMGT_DRB1*01:01:01:01,GCATCCACAGAATCACATTTTCTAGTGTTGAAAGACCTGAAAGATCACGGTGCCTTCATTTCAACTGTGAGACATGAAGTAATTTTCCCAAATCTACAACATTAAGATATGGTGCAATAAGGACCAGATTAAAGGTCTCCTGATTTGCGGCCATGTTCCCTCCATCTCCTTTACTCCTAAACACACTCACACTCACTACTGCAAATAGTTGTCTTGTCAAGTGGGAAATGAATGCTCTTACAAGGCTCAAACTTGTGAACACATCACTGACCAGCACAGAGCTGGCTACAATAGCTCCCCAATTAAGGTGTTTTACATGCAACTGGTTCAAACCTTCCAAGTGCTAAATTAAAACAATCCTTTAAAGAAGGAAATTCTGTTTCAGAAGAGGACCTTCATACAGCATCTCTGACCAGCAACTGATGATGCTATTGAACTCAGATGCTGATTGGTTCTCCAACACGAGATTACCCAACCCAGGAGCAAGGAAATCAGTAACTTCCTCCCTATAACTTGGAATGTGGGTGGAGGGGTTCATAGTTCTCCCTGAGTGAGACTTGCCTGCTTCTCTGGCCCCTGGTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGCTCCCTGGAGGCTCCTGCATGACAGCGCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGCGGGTGCTGAGCTACTATGGGGTGGGGAAAATAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAATTGTGACGTTTTCTTCAGAGATTGCCCATCTTTATCATTGGATCCCAAATTATTTCCTCCATAAAAGGAGCTTGGGTACTTGCCCTCTTCATGAGACTTGTGTAAGGGGCCTTTGCACAAGTCATTTCTTTTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATTCTAAAATAAATTCCCCATACAGCACTTCCCTTTATTATGTTGACTTATGTCAGACAAAAGGAGGTTCTTACTGAAAATTTTGTGGGAGTCAAGGGAATTCAAAGGGTCTCTCCTAGACGATCCAGTGTTAGGTTCCCCACAGGACCTTTGGTGTTGGCCATAGTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTCCTTTCTTTTCTTGCTGAACTCCAATGTTTATAAGGCCTGTATCCCTGTAGCGTATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTCCTTTCTTGTGGGGAAAAGTCCCTGGAACTGAAGCTGAGATTGTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTAAAGAACTTAAGGCATCCTCTGAAAAACTAGCCCAGGTTCGTGTTCATTATGAATCTTTTTTAACCTTTCTGTACTTGTTTCTCTTGCATCTCCTATGTGCTCTAACTAGACATGACAGAAGAGATTTAACTAATGTATAAATTATATGAAATTCTATTTTTTAAGTCAAAAATAATCAACTATCAGAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGATGATGTGAACATTGTTCACGTCTCATAGGGCTGAAAGTCAATGGGCAAGTCTTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCCTAGGTTCACCCATCATGCCCTCAGCTTTCCTTAACTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCTATTTTTCCCCAGCTATGTTGTCATCATTTCCAGAAATCTCTAAAGCTTGCACAGAACCTTAGCACTATGCGATTCATTGAAAGAGACTTTTTTTCTCTTTTTGAGGTAGGGCCTGGCTCTGTCACCCAGTCTATAGCTCAGTGGTGTGATCGAGGCTCACTGCAACCTCTGCCACCCATGCTCAAGTGATCC
TCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGACATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGTTTCAAGCAATCCTCCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAAGAAATGACTCTTAATAAAAAAATTTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATTAAAATGAAAGGGTGCTGGAGCTTGAGCCCCTTCTTGCTTTCCAGGATCCCTACAGTGATCAGTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTCAGCAGCTACTCTTTACTGGGCTCCATTCTAGGTTCAAATCATTCTATTTGATTAAGTTAGAGAGCGTCCCTACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGATAATGTTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACCTACTTTAGTTCATATGTTAGGGAGCTCCACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCGTGAAGATCTAAGGGATGAAAGTTCCAGGGAGACCAAATGGCGGGAAAGCCCTGGTGTGGGAAACTATGTGGAGGGAGAGAAAGAAAGCTAGAGGGGCTGAAGTATAGAAAGCAACGAAATGGAGAGGCAGAAGATGAGGTAGGACACAGAGAGGAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAAAATTTGAATTTTATTATTTATTTACCTATTTATTTGTTTATTTATTTTATTTTATTTTGTGATGGAGTCTCTCTCTGTTGCCCAGGCTAGATGGCAGTGGCACAATCTCGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAGCAATTCTCCTGCCTCAGCCTCCCAATTAGCTGGGGCCACAGGTGCATGCCACCACACCTGGCTAATTTTTTGTATTTTTAGTAGGGATAGGATTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACTGCACCCGGCCTATATTCAATTTTTAAAACTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGGTTCACCAGTGGTTAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGGGAAAACAAGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAGAAGATAAAGTGGACAGACTCGGCATGTATTTTTGCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGAGCTTATTCCTAAGGATTTTCTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATGATTTGAAGTGGGTGGTTTGAAATAAAAGTTTTGTTTAAATATGAGATGATTGACTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGGCTAGGGCTTCAGGTATTTATGTTGGAGGCATCAATACGTGTAGTGTGTTAAATTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAACAAGAAGAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAACCAAGAGAACATCATGTATGTAAGTCAAGGAAAACAGATTTTTTTCAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAATATGACAGAGAAGCAAGTGCTGGGATTGGTGGAGTTGAT
ATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGGAGAGCAAATAAAAGTGAGGACGAGGTTAAGGTTGACTGTTCTGTGTAGAGAGCTTCAGGGAAGGATTGTTCTCTGGGTTCAGGGAGCCAGCTGAACCTAAAGGAAAAGGCTAAAGAAGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATGCTCAGCCATTATTAGCAAGGAAATACAAGAGAGCCCCTGTGTGCAGTGGTGACTACTCATGCAAAATGTCACACAGCCAATATTTAACACAGCCAGGATTTCACACAGCCAATATTTATTAGTGACATAGAATATATTTGTTATTGCTCTAGGTCATGAGAATGGAGTGACAAAATGAATCCGGTCGCCATCAGTATATGCCACATAACATTTTGCAGTGACTGTGTGCCAGGCCTATGAATTTCAGTATTCAATTTCAATAATGATCCTGTTGTATCTGTGGTATTTAAAAACATATACATCTCTGTAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTTAGTGCTGGAAATTGATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTTCAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATACTAATTGCTTTAGAATTAAGGAATACAAATAATGTGAGCTGCAGTTATAGGGATCAAAAAAGTTACAATGGGAATGTATTTGAGTGTTTATTATGTGATCAGTGCTAAGAAGTGTCATCATTTAATTTTACACTTAACAGTAATCCTGTGAGGATTATGCTATTATTAAATGCACTTGATACATTACAAAAAGGCTTATGGTTGATATAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGACCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGTACCAGGTTGTTTCTTAGAGTGGGAGCAGAGATGCAAGGGCTGCTAGTTCCGATGTGTAGGAGAAACTATCATTCATTTTGCATTTATCATTTTAAACGTTCTATATGTCTATCCTGGGCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTGAAGTTTGACTGGCAAAATTGGGCTAGAGTTACGAAATAAAATACAGGTTCCTAGTGAAATCTGAATTTCAGATACACAACCATAATTTATTGGAAATCCAAATTTAACTGGATATCTTCTATTATTCTCTGTTCTAGAACTCCACACTTCTAACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCGCTTTCACTGCTCTTTAAGCTCCCCCAGTAGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTACAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGGTGAAGGCGGTGGGTGCTGTTGAAGGAACCGGTAAAGCCTGTGGGATGAGAGAAGGAGCAGAGAGTGTTTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGGACGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGGGGGTCCCAAAAGCCTGGGGATGAGAAGGGGTTTTCCCGCATGGTCCCCCAGGCCCCCGTTCGCCTCAGGAAGACGGAGGATGAGCTCCTGGGCTGCTGGTGGTGGGCGTTGCGGGTGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCTGCCCAGGGAGCCCCGGATGGTGGCGTCGCTGTCAGTGTCTTCTCAGGAGGCTGCCCGTGTGACCGGATCCTTCGTGTCCCCACAGCACGTTT
CTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTAAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTAAGAAAGAGAGAGAGCGCGCCATCTGTGAGCATTTAGAATCCTCTCAATCCCCAGCAAGCAGTTCTGAGAGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGGGTATGTCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTGATTCTTATCCTTGGAGACCTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGAGACCTTGATTGTCTCGGGTCCTTAGAGATGCGGGGAAGGGAAATGTAAGGGGTGTGTGATTGGGGTGAAGGTTTAGGGGAGGACTGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCGGTTCCAGACTCTCCCTGGCATACACCCTTCGTGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGAGGAACTGAGTAGACCTCTGAGGCACTCCTGAAGCTTCTTTATATCTAAATTTCTTGCTAGTTTTTTGGGTTTTTTTAGTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAACACAGTTACTATTTTATTATAATGCTAGTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTATGACACAACTCACCTCACTTTCCCCTTTGTTGACCTTTATTATGACATTCACCAAAAGTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTTATTTGTAATTGCTTTGAATTATTTTGACCTATTTATTGGCGGATTATAATTATTGCTCTAAGAATTCCCTATTGTATTTGGTAGGCAATGGACAATGATCTATGGTCTGATATCTTGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTGCTATAAGATTATTAACACTTTATTGATATTTGATCCAGCATTTGCTTCAGTTTGTGGTTTGTATGTGGATTTTGAAAGTTCTTTTCCATGTTAAGAATTTGAACTTTTTATTTAATAAAATATATTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGCATTGTTGTCTCCAGGTTCCTCCTTACTTCTAAAAAAAATAGTTGTATTTATTGAGAGTATGCTAGTGTTGGGGATTTTCCTGGGCATAAGCACCCCAAGTAACAAGTCCCAGACACTGCCTTAATCCAAATGTGACTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTATGCTTGTGTTACACGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACATTCCCTAATCCTTTTCTCTGCTTCGGGACTCATGCTTTTCTAGGAACGTAAAAATTTGGAGAATCATTTCTGTCTGTCCCACATTCCCAGGGGCAGAACAATTTCTGTTTTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAATTCATACTCGGTTTCCTCATGTACCCAATTCTGTCCCTTTATCTATGCATATTTCTTTAAATCATATTTTTCTGTCATGGGGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCGCCCTCACAGTGGAATGGAGTGAGAAGCTTTCTGACCTTATA
AACTGAAGGCTATCTTCAGTCATTGTTTTATATATTTTACATGCATTAATCCTCATATAACCCCAAGAGGTAAATTAGTATAATTATCCTTCATTGTAGGTGACAAAGTTGAGACACAGAAGAATCAAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTCGAACCAAGCAACCTGGCTCAAATATCAGTTTTAATTACTACACTCTATACTTTCCAAGATTTGTAAACAGTTTGACAATGCATGCCAATTTAAAGCTATGAAGAAACAAACACAATTTTTCACAACACCTCTCAAATCTAATGGGTCCTCACTGTCAAGATTAAATTCCAGGCTGATGACACTGTAAGGTCACATGGCCAGCTGTGCTATAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACACACAGCCAAACCAGGAGACTTACTCTGTCTTCCTGACTCATTCTCTCTACATTTGTTTTCTCCTAGTTGAGCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGGCTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGATTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGTGTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTAATCCCTGAGTGTCAGGTTTCTCCTCTCCCACACCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGGTAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCACCTGCCAGGCAGGAGAGACTGTCCCTCTTTTGAACCTCCCCATGATTTCGCAGGTCAGGGTCACCCACTCTCCCCAGGCTCCAGGCCCTGCTTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGCTCTGAGTTATTTTTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTACCTGACATGAGGGGAATCCAATCTCAGCCCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCTTCATTGAGTCTCAGGCTTTCTGTTGATCAGATGTTGAACGCTTGCCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGTCTGAACATCGTAACTGTTCAGCGTGATTTGAAATCCTTTTTTTCTCCTGAAATGGCTAGTTATTTTAATTCTTGTGGGGCAGGCTTCTGCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAGCATCACTAAACCTGGATTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGGAGCTGACTCTCTCCATAGGCTTTTCTGGAGGAGGAACCATGGTTTTGCTGAGAGGTTAGTTCTCAGTATACGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGCTCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTCAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTTGTTTGAGATCCTGGAGCAAATTAAGGAAGAGCCACTAAGGCTAATGGAATTACACTGGATCCTGTGACAGACACTTCAGGCTTCATGGGTCACATGGTCTGTCTCTGCTCCTTTCTGCCCCTTTCTGCCCTGGTTGGTGCGGGTTGTGGTGTTAGAGAAATCTCAGGTGGGAGATCTGCG
GCTGGGACATTGTGTTGGAGGACAGATTTGCTTCAATAACTTTTAAGTGTATTTCTTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTAATCCTCTTTTAGAAACAGACACAGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTGGGGTGGGGACTTGGAGAACTTTTCTTACAAGAGGTTTCTAAATGCCCCAATCAGTGCTTTGTAAAAACACACCAATAGGTTCTCTGTGGCTAGCTGGATGTTTGTAAAATGGACCAATTTGCACTCTGTAAAATGGACCAATCAGCATTCTTTAAAATGGACCAATCAGCACTTTTAAAATGGGCCAATCAGCACTCTTTAAAATGGACCAATCAGCACTCTTTAAAATGGACCAATCTGCAGGACATGGGCAGGGACAAATACAGGAATAAAAGCTGGCCACCCCAACCAGCAGTGGCAACCCACTCAGGTCCTCTTCCCTGCTGTGGAAGTTTTGTTCTTTTGATCTTCACAATAAATCTTGCTGCTGCCTACTCTTTGGGTCCCTGCCGCCTTTAAGAGCTGTAACACTCACTGTGAAGGTCTGCTTGAAGTCAGCGAGACCACAAACCCACTGGAAGGAACAAACTGCGGACACACTAGAATGATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAATTCCTCCTCTCCACTTAGAAAAGGGCTGTGCTCTGCGGGACTATTGGCTCAGGGGAGACTCAGGAACTTGTTTTTCTGCTTCCTGCAGTGCTCTCATCTGAGCCCTTGAAAGAGGAGAAAAGAAACTGTTAGTAGAGCCAGGTTGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAATGCAGATGACCACATTCAAGGAAGAACCTTCTGTCCCAGCTTTGCAGAATGAAAAGCTTTCCTGCTTGGCAGTTATTCTTCCACAAGAGAGGGCTTTCTCAGGACCTGGTTGCTACTGGTTCGGCAACTGCAGAAAATGTCCTCCCTTGTGGCTTCCTCAGCTCCTGCCCTTGGCCTGAAGTCCCAGCATTGATGACAGCGCCTCATCTTCAACTTTTGTGCTCCCCTTTGCCTAAACCGTATGGCCTCCCGTGCATCTGTACTCACCCTGTACGACAAACACATTACATTATTAAATGTTTCTCAAAGATGGAGTTAAA,11232
IMGT_DRB1*03:01:01:01,GCATCCACAGAATCACATTTTCCAGTATTGAAAGACCTGAAAGATCACGGTGCCTTCATTTCAACTGTGAGACATGATGTAATTTTCACAAATCTACAACAGTAAGATATAGTGCAACAGGACCAGATTAAGGTCTCCTGGTTTGCAACCATGTCCCCTCCATCTCCTTTACTCCTGAACACACTCACTCCTGCAAACAGTTCTCTTGTCAAGTGGGAAATGAATGCTCTTACAAGGCTCAAACTTGTGAACACATCACTGACCAGCACAGAGCTAAAATAATTGGGGCTAAAAATACCGCCCCAATTAAAGTGTTTTACATGCAACTGGTTCAAACCTTTCAAGTACTAAAAACAATCCTGTAAAGAAGGAAATTCTGTTTCAGAAGAGGACCTTCATACAGCATCTCTGACCAGCAACTGATGATGCTATTGAACTCAGATGCTGATTCGTTCTCCAACACTAGATTACCCAATCCAGGAGCAAGGAAATCAGTAACTTACTCCCTATAACTTGGAATGTGGGTGGAGGGGTTCATAGTTCTCCCTGAGTGAGACTTGCCTGCTGCTCTGGCCCCTGGTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAGGCTCCCTGGAGGCTCCTGCATGGCAGTTCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCAGACGTAAGTGCACATTGTGGGTGCTGAGCTACTATGGGGTGGGGAAAATAGGGAGTTTTGTTAACATTGTGCCCAGGCCATGTACCTTAAGAAATTGTGACGTATTTTTCAGAGATTGCCCATCTTTATCATATGGATCCCAAATAATTTCCCCACCACAAAAGGAGCTTGGCTACTTGCCCACTCCATGAGACTTGTGTAAGGGGCCTCCATACAGGTCATTTCTACTCAAATCTCCACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATGCTAAAATAAATTCCCCATACAGCACTTGCCTTTATTATGTTGACTTATGTCAGACAAAGGAGGTTTTTTCTGAAAATTTTGTGGAAGTCAAGGGAATTTAAAGGGTCTCTCCTAAACGATCCTGGGTTATGTCACCCACAGGACCTTTGGTGTTGGCCCCTCTTCCTCATATGTGAGGATGGACCCAGTGGCCTCCCCATTATCTACTTTCTTTTCTTTCTGAACTCCAATGTTTATAAAGCCTGTACCCCTGTAGTGTATGTAGGTTGTCTGACAGAAGTTATACTTAGTGCTCTTTCTTTCTTGTGGGGAAAAATCCCTGGAACTGAAGCTGAGATCTTTAGTACTTGGAGTCACCTTACAGATACAGAGCATTTATGAGGTATTCTTTGGTGCCTAAAGAACTTAAGGCATCCTCTGAAAACCTGGCCCAGGTTAGTGTTTATTATGAATCTCTTTTAACCTTTCTATACTTGTTTCTCCTACATCTCCTAAGTGCTCCAACTAGACATGACAGAAGAGATTTAACTAACGTAGTATGAGTTATATAAAATTCTATTTTTGTAAGTCAAAAATAATCAAATATCAAAAATTTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGACAATGTGGACATTGTTCACATCTCATAGGGCTGAAAGTCAATGGGCAAGTCCTGGGAACTCATTGTCTTACTGGGGTCTTGTCCTAAATTTCATAGGTTCACCCATCATGCCCTCAACTTTCCTTAATTAGCCATGTCTGCTTACCTCTTCCTCCAGTTTCTCTCTTATTTTTCCCCAGCTATGTTGTTATCATTTCCAGAAATCTCTAAAGCTTGCACAGATCCTTAGCACTATGAGATCCATTGAAAGAGATAGTTTTTTTCTTTTTGAGATAGGGCCTGGCTCTGTCACCCAGGCTGTAGCTCAGTGGTGCGATCGAGGCTCACTGCAACCTCTGCCTCCCACG
CTCAAGTGATCCTCCCTCCTCAGGCTCCAGAGTAGCTGGGAATACAGGCAGGCAACCACGCCCAGCTAATTTTTGTAATTTTGGTAGAAATGAGATTTTGCCATGTTGCCCAGGCTGGTCTTAAACTGCTGGACTCAAGCAATCCACCTGCCTTGGCCTCCCAACATGCTAGGATTATAGATGTGAGCCACTGTGCCCAGGCAAAAAGAGATGACTCTCAATAAAAAAAAGTCCTTTTTCTTAAATCACTGTTTCTTTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGAAGGGGTCTAGTGTGACATCAGAATGAAAGAGTGCTGGAGTTTGAGCCCCTTCTTGCTTTCCAAGATCCCCACAGTTATCACTTCCCATACCCTGGTTTATTCATGTAAACCACACTTATTTTTCTTAGCAGCTACTGTGTACTCGGCTCCATTCTAGGTTCAGATCATTCTATTTGATTAAGACAGAGAGGGTCCCGACTCTCATGGAAGTTACACAACAATAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAATAAAATATTAGAGAGTCATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAGAAGAGAGACATGGATTCACCTACTTTAGTTCATATGTTTAGGGAGCTCTACCTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAGTAGACAGTCATGAAGATCTAATGGACGAAAGCTCCAGAGAGACCAAATGGGGGGAAAGCCCTGGTGTGGGAAATTATGTGGAGAGAGAGAAAGACGGCTAGAGGGGCTGATGTATAGAAAGTAAGGAAATGGAGAGGCAGAAGATGAGGTAGGACACAGAGAGAAAGTCAGGAGCCTCATCATTATAGGCTCTGATGTCCACGGTAAAAAATTTGAATTTTATTTTATTTATTTATTTATTTATTTATTTATTTTATTTATTTATTTATTTATTTTTTTAAGATGGAGTCTCGCTTTGTTGCCCAGGCTGGAGTGCAGTGGCGTGATCTCAGCTCACTGCAAGCTCCACCTCCTGGGTTCATGCCATTCTCCTGCCTCAGCCTCCCTAGTAGCTGGGACTACAGGCACCTGCCACCACGCCTGGCTAATTTTTAGTATTTTTAGTAGTGAGGGCATTTTGTCATGTTAACCAGGGTGGTCTCCATCTCCTGACCTCGTGATCCACCTGCCTCAGCCTCCCAAAGTGCTGAGACTACAGGTGTGAGCCACCACGCCTGGCCTATTTTTTTTTTTTTTGAGACGGAGTTTCGTTCTTGTTGCCCAGGCTGGAGTGCAATGGTGCAATCTCGGCTCACTGCAACCTTCGCCTCCCTGGTTCAAGTGATTCTCCTGCCTCAGCTTCCCAAGTATCTGAGATTACAGGCACCCACCACCATACCTGGCTAATTTTTATTTTTTTGTATTTTTAGTAGACATCGGGTATCACCATGTTGACCATGCTGGTCTCGAACTCCTGACCTCAGATAATCTGTCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACCGCGGCCGGACTGAATTTTATTTAAATAGATACGAGAAGCTACTGTATGGTTACAAGGAGAGTCAATTTATATTCAATTTATTTTTATTTTATTTTATTTTTTTGGTGATGGGGTCTTGCTCTGTTGCCCAGGCTAGATTGTAGTGGCACAATCTCGGCTCACTGCAACCTCTGCCTTCTGGGTTCAGGCGATTCTCCTGACTCAGCCTCCAGAGTAGCTGGGACCACAGGTACATGCCACCACACCTGGCTAAGTTTTTGTATTTTTTAGTAGAGACAGGATTTCACCATGTTAGCCAGGATGGTCTCGATCTCCTGATCTCGTGATCCACCCACCTTGGCCTCCCAAAGTGCAGGGATTACAGGCCTGAGCCACTGCGCCCGGCCTATATTCAATTTTTAAAACTAATTCT
AGCTACTCTGTGGGGATTGGAATGTTGGGGTTCACAAGTGGTCAGGAAGACTATTTAGGAGCACAGCAGGGAATTCTCCAGCGAAAACAGGCTTGTGGCTTCATGGAGTGCATTAGTGATAAAGACGGTGAAAAAGATAAAGTGGACAGACTTGGCATGTATTTTTCCTTAGCTTGTTAATGAATTACTGTAAAGGGGGTAGAACAATCAAGCTTATTCCTAAGGATTTTGTTTTGACAAATAAGTGGGTGGTAGTGTTGTTTATTGAGATAGGAAAAACTATGGGAGGAAATTATTTGAAGTGGGTGGTTGGAAATAAAAGTTTTGTTTAAATTTGAGATGATTTATTGACATTTATGTGGAGCAATCAGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGGCTAGGGCTTCAAGTATTTATGTTGGCGGCATCAATACGTGTAGTGTGTTAAATTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAAAAAGAACAGGGCACAGGCCAGCAAAGGGGGCTGAGAAAGAGCCCAGGGATGTTGGAGAAAAATCAAGAGAACATGATGCGTGTACGTCAAGGAAAATAGATTTTTTTCAAGGAGAAGGGAGAGGCCAATTGTGGTGAGTACCACTAAGCGGAGGGGGAAGTGAGAACGTGACAGAGAAGCAAGTGCTGGGTTTGGTGGAGTTGATATTTGCAGTCAGTGGAGTATCCAGGGAGGAAACTGGATTGGACAATTTGAAGAGCGAGTAGAAGTGAGGATGAGGTTAAGATTGACTGTTTTGAGTAGAGAGCTTCAGGGAAGGACTGCACTCTGGGTTCAGGGAGCCAGCTGGATCAAAAGGAAAAGGCTAAAGAGGCTGAAGAGAAGCAGGAGGACCTGTGAACCAGAGATGCTCAGTCATTATTAGCGAGGAAATACTAGAAAGCCCCTGTGTGCAGTGATGACTACTCATGCAGAAGGTCACACAGCCAATATTTAACACAGCCAGTATTTCACACAGCTAATATTTATTAGTGACATAGAATATACCAGTTATTACTCTAGGTCATGAGAATGGAGTGATAAATAAAATGAATCCGGTCGCCATCAGTATATGCCATGTAACATTTTGCAGTGACTGTGTACCAGGCCTGTGAATTTCAGTATGCAATTTCAATAATGATCCTGCTGTATCTGTGGTGTTTAAAAACATATACATCTCTGGAATCTAAAATTGAGAGGTTATAAGTAAAACCCAGTATTACAAATTGAGTGCTGGAAATCAGATTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGCTGGACTGCATTATTAGACATAAACCAGTGTTTCTGCCCTATGTTTTCAGAATGACAATTCTTTATGAAACTCATAGAAGAACAGAAGACAACTGCAAAATCATGATGAAGATAGTAATTGCTTTAGAATTAAGGAATACAAAAAATAATGTGAGCTGTAGTTATAGGGATCATAAAAGTTTAAATGGGAATGTATTTGAGTATGTGATCAGTGCTAAGAAGAGTCATCATTTAATTTTACACTTAACAGTAATCTCGTGAGGATTACGCTATTATTAAATGCATTTGATAGATTACAAAAAGGCTTATGGTTGGTAAAAATTGACCCAAGTAGAAGAGATCATGTTTTTATTCAGGTTTTCTGATTCTAGAGTTTGAGAGTTTGTCCATCATTAGTGAGTAGTGACTATATTGTGTCTGAATTATTGACAGAATTTCTGATATTCATATGTACCAGGTTGTTTCTTAGAGTGGGGATAGAGATGCAAGGGCTGCTAGTTCCGATGTATTGGGGAAACTTTCATTCATTTTGCATTTATCATTTTAAAAGTTCTGTATGTCTATAATGGTCATGTGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTTCTAAGGTTTGACTAGCAAGTTGGGC
TAGAGTTACCAAATAAAATACATGTTCCTAGTGAAATCTGAATTTCAGATACAAAACCATAATTTATTGAAAATCCAAATTTAACTGGGCATCCTCTGGTTTTATTTGCCACATCTGTCAACCCTAAGTGTGACACATGGACATGGATTACAGTGCTAACCATGCAAGCCACGGTGACAGCAACTTCACACATGTTTATTTTTAACTTTCTCTGTAAGAAAGTGCTTAGATAATTTAGGGATAAAAAGATAGACATTGCTTGATCCAGGGTGCACACCTCTCTGCCACCATTTCTAAAGGGCAAAGGGAGATTTCTGCAGGTCTTGCTCACAGTCTGGGGAGCTGCTCATTTTTGTAAAGTGTCTGTATGAGAATGTCATTTTCTTGGTTTCCTCCTTTCCGAGGGGACTTGACTACAAAACCAAGAGTTCTGCCTCTGGCCAAGGCTGGTAATTTGATGCCTGCTAGTATTGTTGGGAGTGGGAGACTGAAAGAAATGAGTTAGTTGGGGCATTTAACGGGAATAAAATAGCTGTGGTTGTGACTCATTACTACAGATAATTAGTGGACCAGTGGCAGAGAAATTAAGAAAAAAGATGATGTGAATGATAAATGATATGATTAGTGACTGCTTGGTAAGGCAAGGAAATCATTAAATCTTGGTTCTCATCAAGTTCATTTTCTGGAAAGATAGCACTGTATTGGGAGCAGAATTCTACAAAACCTTCTTTTTACATAGGACCAAGATTTTCAACAAATATTTTTCAATGCAATTCTCAGCTGCTCCATAACTAATAGTAGCTTGTTCAACACAGATTTTTTCAGATGATTCACACCTGTGGTACTTACCCAGGGATGGTTCACCACCCCTCCCTTCCCTCTCATCATCGTTGGGGAACGGTGACAATGTTTGGAACAATTTTTGGTTGTCACAAACAGGGGTTTCTTCTGATATTCAATGAGTAGAAGCCAGGGACACTGCTAGAGAACCCACAATGTTCAGAACAGCCTCTGCCATCAACAAGGAATTATCTGGTCCAAAATGTCAATAGTGCTGAGGCTAAGAGCACTGGTTCACACTGTGCTCTTTCTGAAAATTCTAGACTCACATCTGTTATACACTCACCACACAGTTTAGTCTTTTATTTTTGCTTGTTTCATTATAAACAATTAGACAGTTGCATAAATTCAACCACTTTCTTGTTGAATCCATTTAGTCAATGCAAGCTCAATATTTTCATATTTATTTTTTGCCTTATGCAATATTTTTCAACATTTTCATGAGTTGTCGGTCATCACTATCTCTATTAACTTTCAACAACTTGCCCTTGTAAGTCACAAATAGTGATGCTGCTGAAATTATTTCTCACTAACATGCCTCAGATTTCTGTAGTGATTCTACATTTGATATTATTCACAATGTAAAATGCTTCTATTTATTCATTTCGCTTTTACCCAAGGATTATTTTTAAGTTATTTTTGTCATTTTCACACTTCAAACATAAAGACAAAAACATCAAAAATATAGTGTTTTACATATGTGCATATTTTCACACATATATGTATGTATATTTATATGTATTGAAAGTACAGAAGCACATGTCACCAATAAGAGCTCTGAGACACCTTCGACCACTTACCCTTATCAGATGAGTTGTGGAAACAAGTTTTTTTAACTGAATTTCTGAGCTTTGTGGATTTAGAAATGCAAAGGAAGGTTTGTGGACATTCACAGGGATCATGATTTTATTCTCCTTAAAACTCTTCTGTACTTTCCAATTGTCCTTAGTATAAATCCAAAATCCTAACACCACCCAAGAGGCTTTTCAATACCTGGCTCCTGTGATTTCTCCGGGCTAATCTGTTACCCTCCTTCCCCTCAGCCTCTCTGCTTTAGTGAACTTTCTCCTAGTTTTTTGAAGAAGTTCATCAATTCAAGCTTTTGTACATGGGATTTCCTAAACCTGAAATGTGCCTCCCGTTTTGTCCAAACAGACACG
GGCTCCACTCTGCCCCCTGGCTCACACCTGCTTAACCTGTCAAGTCACATCTGAACCGTCACTATTAAGAGGGTCCTTCTCTGGCACCCTAATGTAATTGAGATCATCCTATTATTCTCTGTTCTAGAACTCCACACTTCCGACATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTGTTGGGCCATCACTTTCACTGCTCTTTAAGCTCCCCCAGCGGAGTGGAGAGGTCTGTTTTCCCTCGTTTGGATTCCTAGAGGCAGCGCAGACCAGGCACAAGGTCAGCACTAAGGAAGGGTTCAGAGGATGAACGCGGTGGGTGCTGTTTAAGGAACCGGTAAACATGTGGGATGAGAGAAGGAGCAGAGTGTCTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGCGCGGGCTGCGGTGCTGGGCGGATCCTCCTCCAGCTCCTGCTTGGAGGTCTCCAGAACAGGCTGGAGGTAGGGAGGGGGGTCCCAAAAGCCTGGGGATCAGACGTGGTTTTCCCGCCTGGTCCCCCAGGCCCCCTTTCGCCTCAGGAAGACAGAGGAGGAGCCCCTGGGCTGCTGGTGGTGGGCGTTGCGGCGGGGGCCGGTTAAGGTTCCCAGTGCCCGCACCCCACCCAGGGAGCCCCGGATGGCGGCGTCACTGTCAGTGTCTTCTCAGGAGGCCGCCTGTGTGACTGGATCGTTCGTGTCCCCACAGCACGTTTCTTGGAGTACTCTACGTCTGAGTGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTACCTGGACAGATACTTCCATAACCAGGAGGAGAACGTGCGCTTCGACAGCGACGTGGGGGAGTTCCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGTGGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTGAGCTGGGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGAGAGAGAGACAGAGAGACAGAGAGAGAGAGCGCCATCTGTGAGCATTTAGAATCCTCTCTATCCTGAGCAAGGAGTTCTGAGGGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCTGTGTCTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTGCTTCTTATTCTTGGAGGACTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCGGGCGGCTGGAGAGAGAGGTGACCTTGATTGTCTCGGGTCCTTAGAGATGCAGGGAAGGGAAATGTAAGGGGTGTGTGGTTGGGGTGAAGGTTTAGGGGAGGAGAGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAGGAGGCCAGTTCCAGACTGTCCCTGGCACACACCCTTCATGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAATTTCTAGGGGAATTGAGGAGACCTCTGAGGCATCTCTGAAGCTTCTTTAGGTCTAAATTTCTTGCTAGTTTTTTGTTTTTTATTGTGTATATTTTTACATAGTAGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTTAACACAGTTACTATTTTATTATAATGCTAATAGTTTTCTAGTAGTTACATATTATTCTTTTATATATAATAGTTGTGACACAACTTACCTCACTTTCCCCTTTGTTGACCTTTATTATGACATTCACCAAAATTTGAAAATGTATGTTTCTGGTTAATTTTTAATTTATATTTTTTTCATTTATAATTCTTTTGAATTATTTTGACCTATTTATTGGCCAGTTTTAATAACTGCTGTAAGAATTCCCTATTGTATTTGGTAGGGAATGGACAATGATCTACTGCCTAATATCTCGAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTCTTTGTACTGTGAGAT
TATTAACACTTTATTGATATTTGATTCAGCATTTGCTCCAGTTTGTGGTTTGTATGTTGATTTTGAAAATTCTTTTCCATGTTAAGAATTTGAACATTTTTATATAATAAAATATGTTGCAAAATTTTTATTAATGATTTACAATCCATCTTAAATCTGCCATTTTGTGGTATTGTTGTCTCCAGGTTTCTCCTTACTTCTAAAAAAAATTGCATTTATTGAGAGTCTGCTAGTGTTAGGGATTTTCCTGGGCATAAGCACCCCAAGTGACGAGTCCCAGACACTGCCTTAATCCAAATGTGATTCTGGAAAGAAAAATCATTTTACAATGATAGGCCTAATAATAATTAAGCTTGTGTTGCATGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACGTTCCTTAATCCTTCTCTCTGCTTTATGACTCATGCTTTTCTGGGAAAGTAAAAATTTGGAGAATCATTTCTGTCTGTCCCACCTTCCCAGGGGCAGAACCATTTCTGTGGTGTTCTAAGGTGTGAGTGCATGGCGGTAGTATTCCTAAAAATTCATATTCGGTTTCGTCATGTACCCAACTCTGTCCCGTTATCTATCAACATTGTTTTAAATCATATATTTCTGTCAAGGTGTACAAGGATGATAAATAGGTGCCAAGTGGAGCACCCAAGTGTGATGAGCCCCCTCACAGTGGAATGGAGTGTGAAGCTTTATGACCTCATAAATTGAAGGTTATCTTCAGGCATTGTTTTATATATTTTACATGCATTAATCCTCATATAATCCCAAGAGGTAAATTAGTATAATTATCCTTCATTATAGGTGACAAAGTTGAGACACAGAAGAATCAAACTCTTAAGGCAGACCTTGGATTTGAACCAGGCAACCTGGCTCAGATATCAGTTTTAATTACTACACTCTGTACTTTCAAAGATTTGTAAACACTTTGACAATGCATGACAATTTCAAGCTATGAAGAAACAAACACAATTTTTCACAATATCTCTCAAATCTAATAGGTCCTCACTATCAAGATTAAGTTCCAGGCTGATGACACTGTAAGGCCACATGGCCAGCTGTGCTGGAGGCCTGGTCAAGGTCAGAGCCTGGGTTTGCAGAGAAGCAGACAAACAGCCAAACAAGGAGACTTACTCTGTCTTCATGACTCATTCCCTCTACCTTTTTTCTCCTAGTCCATCCTAAGGTGACTGTGTATCCTTCAAAGACCCAGCCCCTGCAGCACCATAACCTCCTGGTCTGTTCTGTGAGTGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAATGGCCAGGAAGAGAAGACTGGGGTGGTGTCCACAGGCCTGATCCACAATGGAGACTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGCGTGACAAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGAAGGGGACTGTGCTCATCCCTGAGTGTCAGGTTTCTCCTCTCCGACATCCTATTTTCATTTGCTCCATGTTCTCATCTCCATCAGCACAGGTCACTGGGGGTAGCCCTGTAGGTGTTTCTAGAAACACCTGTACCTCCTGGAGAAGCAGTCTCGCCTGCCAGGCAGGAGAGGCTGTCCCTCTTTTGAACCTCCCCATGATGTCACAGGTCAGGGTCACCCACCCTCCCCGGGCTCCAGGCACTGCCTCTGGGTCTGAGACTGAGTTTCTGGTGCTGTTGATCTGAGTTATTTGTTGTGATCTGGGAAGAGGAGAAGTGTAGGGGCCTTCCTGACATGAGGGGAGTCCAATCTCAGCTCTGCCTTTTATTAGCTCTGTCACTCTAGACAAACTACTTAGCCTCATTGAGTCTCAGGCTTTCTGTGGATCAGATGTTGAACTCTTGCCTTACATCAAGGCTGTA
ATATTTGAATGAGTTTGATGTCTGAACCTTGTAACTGTTCAGTGTGATTTGAAATCCTTTTTTTCTCCAGAAATGGCTAGTTATTTTAGTTCTTGTGGGTCAGACTTCTTCCCCATTTTCAAAGCTCTGAATCTTAGAGTCTCAATTAAAGAGGTTCAATTTGGAATAAACACTAAACCTGGCTTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTTGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGTAGCTGGCTCTCTCCATAGGCTTTTCTGGAGGAGGAACTATGGCTTTGCTGAGGTTAGTTCTCAGTATATGAGTGGCCCTGAATAAAGCCTTTCTTTCCCCAAACGGCTCTAATGTCCTGCTAATCCAGAAATCATCAGTGCATGGTTACTATGTGAAAGCATAATAGCTTGTGGCCTGCAGAGACAAGAGGAAGGTTAACAAGTAGGGGTCCTTTGGTTTGAGATCTTGGAGCAGATTAAGGAAGAGCCACTAAGACTAATGGAATTACACTGGATCCTGTGACAGACACTTTACCCTTCATGGGTCACATGGTCTGTTTCTGCTCCTCTCTGCCCTGGCTGGTGTGGGTTGTAGTGACAGAGAACTCTCCGGTGGGAGATCTGGGGCTGGGACATTGTGTTGGAAGACAGATTTGCTTCCATAAATTTTAAGTGTATATATTTTCCTCTTTTTCCCAGGACACTCTGGACTTCAGCCAAGAGGTAATACCTTTTAATCCTCTTTTAGAAACAGATACGGTTTCCCTAGTGAGAGGTGAAGCCAGCTGGACTTCTGGGTCGGGTAGGGACTTGCAGAACTTTCCTGTCTTAGGAGAGGTTTCTAAATGCACCAATCAGTGCTCTGTAAAAACACACCAATTGGCACTCTGTGGCTAGATAGATGTTTGTAAAATGGACTAATCAGCACTCTGTAAAATGGAGCAATCCACACTCTGTAAAATGGACCAATCAATGCTCTTTAAAATGGACCAATCAGCAGGACATGGGCGGGGACAAATAAGGGAATACAAGCTGGCCACCCCAGCCAGCAGCAGCAACCCGCTCAGGTCGCCTTCCATGCTGTGGAAGCTTTGTTCTTTTGCTCTTCACAATAAATCTTGCTGTTGCTCACTCTTCGGGTCTGTGCCACCTTTAAGAGCTGTAACACTCACTGTGAAGATTCGCGGCTTCATTCTTGAAGTCAGCGAAACCACGAACCCACCGGAAGGAACAAACTCTGGACACACTAGAATTGATGGTAGAGGTGATAAGGCATGAGACAGAAATAATAGGAAAGACTTTGGATCCAAATTTCTGATCAGGCAATTTACACCAAAACTCCTCCTCTCCACTTAGAAAAGGCCTGTGCTCTGTGGGACTATTGGCTCTGGGAGACTCAGGAACTTGTTTTTCTTCTTCCTGCAGTGCTCTCATCTGAGTCCCTGAAAGAGAGGAAAAGAAACTGTTAGTAGAGTCAGGTTGAAAACAACACTCTCCTCTGTCTTTTGCAGGATTCCTGAGCTGAAGTGCAGATGACACATTCAAAGAAGAACTTTCTGCCCCAGCTTTGCAGGATGAAAAGCTTTCCCTCCTGGCTGTTATTCTTCCACAAGAGAGGGCTTTCTCAGGACCTGGTTGCTACTGGTTCAGCAACTGCAGAAAATGTCCTCCCTTGTGGCTTCCTCAGCTCCTGTTCTTGGCCTGAAGCCCCACAGCTTTGATGGCAGTGCCTCATCTTCAACTTTTGTGCTCCCCTTTGCCTAAACCCTATGGCCTCCTGTGCATCTGTACTCACCCTGTACCACAAACACATTACATTATTAAATGTTTCTCAAAGATGGAGTTAAA,13908
IMGT_DRB1*04:01:01:01,GCATCCACAGAATCACAGCATTTTCCAGTATTGAAAGACCTGAAAGATCACGGTGCCTTCATTTTAACTGTGAGACATGAAGTAATTTTCCCAAGTCTACAACAGTAAGATATGGTGCAATAAGGACCAGATTAAAAGTCTCCTGATTTGCAACCATGTTCCCTCCATCTCCTTTACTCCTAAGCACACTCACACACTCACTCCTGCAAACAATTCTCTTGTCAAGTGGGAAATGAATGCTCTTACAAGGCTCAAATTTGTGAACACATCACTGACCAGCACAGAGCTGGCTAACAATAGGGACACAATTAAGGTGTTTTACACGCAACTGGTTCAAACCTTTCAAGTACTAAATTAAAACAATCCTTTAAAGAAGGAAATTGTTTCAGAAAAGGACCTTCATACAGCATCTCTGACCAGCGACTGATGATGCTATTGTACTCAGATGCTGATTCGTTCTCCAACACTAGATTACCCAATCCACGAGCAAGGAAATCAGTAACTTCTTCCCTATAATTTGGAATGTGGGTGGAGAGGGGTCATAGTTCTCCCTGAGTGAGACTCACCTGCTCCTCTGGCCCCTGGTCCTGTCCTGTTCTCCAGCATGGTGTGTCTGAAGTTCCCTGGAGGCTCCTGCATGGCAGCTCTGACAGTGACACTGATGGTGCTGAGCTCCCCACTGGCTTTGGCTGGGGACACCCGACGTAAGTGCACATTGTGGGTGCTGACCTACTATGGGGTGGGGAAAAAAGGGAGTTGTGTTAACATTGTGCCCAGGCCATGTCCCTTAAGAAAGTGTGACATTTTCTTCAGGGATTGCCCATCTTTATCATATGGATCCCAAATTATTTCCACCACAAATGGAACTTGGCTACTTGCCCTATTCATGAGACTGTGTAAAGGGCCTTTGTACAGGCCATGTTTTACTTTAAATCTCTACCAATAAAACCTTTGCATCACATGTCCTCAGGGTCTTTAGAGGATTTAGAAATAAGGATGCTAAAATAAATTCCTCATACAGCACTTCCCTTTATCATGTTGACTTATGTCAGACGAAACAAGGTTTTGTTTTGAAAATTTTGTGGGAGTCAAAGGAATTCAAAGGGTCTCTCCTAGACGATCCTGTGTTGTCCTCCACAGGACCTGTGGTGTTGGCCCCTCTTCCTCATATGTGAGGATGTACCCAGTGGCCTCCCCATTGTTTCCTTTCTTTTTTTTCTGAACTCCAGTGTTTATAAAGCCTGTATCCCTGTAGCATATGTAGGTTCTCTGACAGAAGTTATACTTAGTGCTCTTTCTTTCTTATGGGGAAAAATCCCTGGATCTGAAACTGACATCTTTAGTACTTGGAGTCACCCTACAGGTAAAGACCATTTATGAGGTATTCATTGGTGCCTCCTCTTGATCGGTCTCTCAGACCCCTGAATACTTGGATACTCCTCAAGAACTTAAGGCATCCTCTGAAAAACTGGCCCAGATTAGTGCTTATTATTAATCTTTTATAACCTTTCTATACTTGTTTCTCCTGCATGCTCTAACTAGACATGACAGAAGAGATTCAACTAACATAGGATAAATTATATGAAATTCTATTTTTGTAAGTCAAAAATAGTCAAATACCAGAAAATTAATAATGTTCAAACTATATACTCTGTGTGGGGTTACCGAGACGACGTGGACATTGTTCACATCTAATAGGGCTGAAAGTCAATGAAGAAGTCCTGGAAACTCCTTGTCTTACTGGGGTCTTGTCCTAAATTTCATAGGTTCACCCATCATGCCCTCAGCTTTCCTTAATTAGCCATGTCTGCTTATCTCTACCTCCAGTTTCTCTCTATTTTTCCCCAGCTATGTTGTCATCATTTCCAGAAATCTCTAAAACTTGCAAAGATCCTTAGCACTATGAGATCCATTGAAAGAGATAATTTTTTTCTTTTTGAGACAGGGCTTGGTTCTGTCACCCAGGCTGTAGTGCAGT
GGTGTGATCTAGGCTCACTGCAACCTCTGCTTCCCACGCTCAAGTGATCCTCCCTCCTCAGCCTCCAGAGTAGCGGAGACTACAGGCAGGCAAACATGTGCAGCTAATTTTCATGATTTTGTTAGAGATGAGATTTTGCCATGTTGCCCAGGCTGTTCTTAAACTCCTGGACTCAAGCAATCCTCCTGCCTTAGCCTCCCAATATGCTAGGATTATAGATGTGAGCCATTGTGCCCAGGCAAAAAGAGATGAACCTTAATTTAAAAATTTCCTTTTTCTTAAATCACTGTTTCTCTATCTGTGAATTCTTCTTCCAACTAGAAGGAGGAGAAAGAAGAAGTTTGCCTGTATTTCTCACCAGGAGGAGGAGTCTAGTGTGATATCAAAATGAAAGAGTGCTGGAGCTTGATCCCCTTCTTGCTTTCCAGGATCCCTGCAGTGATCAGTTCCCACACCCTGGTTTATTCATGTAAAGCACACTTATTTTTTTCAGCAGCTACTCTTTACTGGGCTCCATTCTAAGTTCAAATCATTCTATTTGAGTAAGATAGAGAGGGTCCCGACTCTCATGGAAGTTACACAAGAGTAGAGGAGACAGACACTAACCCAATAAGCATTTAACAAAGAAGAAAATGTTAGAGAGACATAGTGCACTGAAGAAAAGACATCAGGTTTGTGAAAAAGAGAGACATGGATTCACTTACTTTGGTTCATATGCTTAGGCAGCTATAACTGAGAAAGTGACATTCAGCTGAGACAACAAAATAAATAGACAGTCGTGAAGATCTAAAGGACGAAAGTTCCAGGGAGAATGAATGGGGGGGAAGCTCTGGTGTGGGAAATTATGTGGAAGGACAGAAAGAAGGCTAGAGGGACTGAACTATAGCAAGCAAGGAAATGGAGAGGCAGAAGATGAGGTAGGACACAGAGAGGAAGTCAGGAGCCTCATCATATTAGACTCTGATGGCCATGGTAAAAAAATTGAATTTTATTTTATTTTTATTTATTTTTTGAGACGGAGATTTGTTCTTGTTGCCCAGGCTGGAGTGCAATGGCGCGATCTCGACTCACTGCAACCTCTGCCTCCTGGGTTCAAGTGATTCTCCTGCCTCAGCTTCCCAAGTAGCTGGGATTACAGGTGCCTGCGACCATACTCGGCTTATTTTTTTGTATTTTTAGTAGAGACAGGGTATCACCATGTTGGCCAGGCTGGTCTCAAACTCCTGACCTCAGATAATCTGCCTGGCTTCCCAAAGTGCTGAGATTACAGGCGTGAGCCACCATGCCCAACCTGAATTTTATTTGAATAGATATGAGAAGCTACTGTATGGTTACAAGGACAGTCAATTTATATTCGATTTTTTTTTTTTGAGACAGAGTCTTGCTCTGTTGCCCAGGCTAGATTGCAGTGGTACAATCTCAGCTCACTGCAACCTCTGCCTCCTGGGTTCCAGCAATTCTCCTGCCTCAGCCTCCCAAGTAGCTGAGACCACAGGTACATGCCACTACACCTGGCTAATTTTTTGTATTTTTAGTAGAGATGGGGTTTCACCGTGTTAGCCAGGATGGTCTTGATCTCCTGACCTCGTGATCCACTCCCCTCGGCCTCCCAAAGTGCTGGGATTACAGGTGTGAGCCACCACGCCCGGCCTATATTCAATTATTAAAATTAATTCTAGCTACTCTGTGGGGATTGGATTGTTGGGTTTCACAAGTGGTCAGGAAGACTATTTAGGATCACAGCAGGGAATTCTCCAGGGAAAACAGGCTTGTGGCTTCATAGAGTGCATTAGTGATAAAGACAGTGAAAACGACAAAGTGGACAGACTAGGCATGTATTTTTGCTTAGCTTGTTAATGGATTACTCTAAAGGGGGTAGAAAAATCAAGCTTATTCCTAAGGATTTTGTTTTGACAAATAAGTGGATGGTGGTGTTTATTGAGATAGGAAAAACTGTGGGAGGAAATGATTTGAAGTGGGTGGTTGGAAATAAAAGTTTT
GTTTAAATTTGAGATGATTTATTGACATTTATGTGGAGCAATCCGAAGGTCAATGGCATTTAAGAGACTCATGGTGAGGTGAGGCCAGGGCTTCAGGTATTTATGTTGGCGGCATCAGTACGTGTAATGTGTTAAATTCCAGGGAGTGGAAGAGGATACATAGGGAGATGGATTGTGTGGAGAAAAAAGAAGAGGGTACAGGCCAGCAAAGGGGGCTGAGACAGAGCCCAGGGATGCTGGAGAAAACCCAAGAGAACATAATGGGTGTAAGTCATGGAAAATAGATTATTTTCAAGGAGAAGGGAGAGGTCAATTGTGGTGAGTACCACTAAGAGGAGGGGGAAGTGAGAACGTGACAGAGAAGCAAGTGCTGGGTTTGCTGGAGTTGATATTTGCAGTCAATGGAGTATCCAGGGAGGAAACTGGATTGGACCATTTGAAGAGCAAGTAGAAGTGAGGACGAGGTTAAGGGTGACTATTTTAAGTAGAGAGCTTCAGGGAAGGACTGTGCTCTGGGTTCAGGGAGCCTGCTGGATCTAAAGGAAAAGGGCTGAAGAGGCTGAAGAGAAGGAGGAGGACCTGTGAACCAGAGATACTGAGTTATTATTAGCAAGGAAATACTAGAGGGTCCCTGTGTGCAGTGCTGACTGCTCATGCAAAAGGTCACACAGACAATATTTCACACAGCCAGTATTTATTAGTGACATAGAATATGCCAGTTATTACTCTAGGTCATGAGAATAGAGTGATAAATAAAATGAATCTGGTCGCCATCGGTATATGCCATGTAACATTTTGCAGTGACTGTGTACCAGGCCTATGAATTTCAGTATGCAATTTCAATAACGATCCTGTTGTATCTGTGGTGTTTAAAAACATATACATCTCTGGAATCTAAAATTGAGAGGATATAAGTAAAACCCAGTATTAGAAATTTAGTGCTGGAAATCAGACTGCAGTTTAAATCTGAGCATATAGAAAGTCCCTTTCTTCTATGTCAGCAGATGCCTTTTGTGTGAGGTTTAGGTATACTACATTATTAGACATAAACCAGTGATTCTGCCCTATGTTTTCAGAATGACAATTCTTTATGAAACTAATAGAAGAACAGAAGACAATTGCAAAATCATGATGAAGATGCTAGTGGCTTTAGAACCAAGGAATACAAAAAATAATGTGAGCTGCAGTTATAGGGATTATAAAAGTTAAAATGGGAATGCATTTGAGTGTTTATTATGTGATCAGTGCTAATAAGAGTCATCATTTAATTTTACACTTAACAATAATCCTGTGAGGATTAAGCTATTATTAAATGCATTTGATAGATTACAAAAAGGCTTACCGTTGGTAAAAATTGACCCAAGGGGAAGAGGTCACATTTTTATTCAGATTTTCTGATTCTAGAGTTTGAGAGTCTGTCCATCATTAGTGAGTAGTGACAATACTGTGTCTAAATTATCGACAGAATTTCTGATATTCATATGTACTATGTTGTTTCTTAGAGTGTGGGCAGAGATTCAGGGCTGCTAGTTCCAATGTATAGGAGAAACTTTCATTCATTGTGCATTTATCATTTTAAAAGTTCTAGGCTGGGTGCGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCAAGGCGGGCAGATCACGAGGTCAGGAGATGAAGACCATCCTGGCTAACATGGTGAAACCTCGTCTCTACTAAAAATACAAAAAATTAGCTGGGCGTGGTGGTGTGCACCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCATGAACCTGGGAGGCGGAGCTTGCAGTGAGCTGAGATCGTGCCACTGCACTCCAGCTCCACCCTGGGCAAAAGAGCGAAACTCCGTCTCAAAAAAAAAAAAAAGTTCTATATGTCTGTCATGGCATATGTTGAAGAACACAAGGAAGTATTAAATCACTCCTTCTGAGGTTTGTCTAGCAAGTTGGGCTAGGATTGCCAAATAAAATACAGGTTT
CTAGTTAAATCTGAATTTCAGATACACAACTATAATTTACTGAAAATCCAAATGTAACTTGGCATCCTCTGATTTTATTTGCCAAATCTGTCAACCCTACATGAGACACATGAGCATGGATTACGGTGTTACCCATGGAAGCCACAGCCACAGTGACAGCGACTTCACACATGTTTATTTTTTAACTTTCTCTCTGTAAAGAAAGTGCTTAGATAATTTAGGGATAAAAAGATAGACATTGTTTGATCCAGGATGCACTCCTCTCTGCCATCGTTTCTAAAGGGCAAAGAGAGATTTCCACAGGTCTTACTCACAGTCTGACTCACAGTCTGGGGACCTGCTCATGCTTTGAAACTGTCTGTATGAGAATGTCATTTTCTTGGTTTCTCCCTTTCTGAGGGGACTTGACTACAAAACTGAGAGTTCTACCTCTGGCCAAGGCTGGAAATTTGATGCCTGCTAGTATTGTTGGGAATGGGAGACTGAAATAAATGAGTTAGTTGGGGCATTAAACAGGAATAAAATAGCTGTGGTTGTGATTCATTACTACAATTAGTGGACTAGTGGCAGAGAAATTAAGAAAGAAGATGATGTGAGAGATAAATTATATGATTTGGTAAGGCAAGGGAATCAGTAAATCTTGGTTCTGAACAAGTTCATTTTCTGGAAAGATAGCACTGTACTGGGACCAGAATTCTACAAAACATCCGTTTTATGTAAGACCAAGATTTTCAACAAATATTTTTCAATGCAGTTCTCAGCTGCTCCATAACTAATAGTGACTTATTCAACACAGATATTTTCAGATGGTTCACACCCATGTTTCTTACCCAGGGACAGTTCACCACCCCTCCCCTTCCCTCCCATCACTCTTGAGGAACATGTGGCAATGTTAGAATAATTTTTGGTTGTCACAACAGGGGTTTCTTCTGATATTTAATGAGCAGAAGCCAGGGACACTGCTAGAGAACCCACAATGTTCAGAATAGACTCCATCACCAACCAAGATTTATCTCGTCCAAAATGTCAATAGTGCTGAGGCTGGAAGCATTGGTTCACACTGTGCTCTTTCTGAAAAATGTAGACTCGCTTTTTTTTTTTTTTTTTTTGAGATGGGGTCTTGCTCTGTCGCCCAGACTGGAGTGCAGTGGCTCCATCTCAGCTCACTACAACCTCTGCCTCCCAGGTTCAAGCGATTCTCCTGTCTCAGCCTCCCCAGTAGCTGGGATTACAGGTGCACCCTGCCATGCCCGGCTAATTTTTTGTATTTTAGTAGAGATGGGGTTTCACCATGTTGCCCAGGCTGGTCTCGAACTCCTGAGCTCAGGCAATCCACCCGCTTTGGTCTCCCAAAGTGCTAGGATTACACGCATGAGCCACCGCGCCCGGCCTAGACTCACATCTTTTATACACTTACTGCCCAATTCAGTTCTTTATGGTTTATTTTTGCTTGTTTCATTATAAAAAACTAGACAGTTGCATAAATTCAACCACTTACTTGTTGAATCCATTTAGTCAATGCAAGCTCAACATTTTCATATTTATTTTTTGCCTTATGCAATATTGTTCAACATTTTCATAAGTTGTTGGTCAGCACTATCTCTATTAACTTTCAACAGTTTGCCCTTCTAAGTCACAAATAGTGATGCTGCTGCAATTATTTTTCACTAACATGCCTCAGATTTCTGTAGTGATTCTACATTTGATATTATTCACAATGTAAAATGCTTCTATTTATTCATTTCACTTTTACCCACAGGATTATTTTTAAGTTATTTTTGTCATTTTCACACTTCAACCAAACATAAAGACAAAAACATCAAAAATATGTACATAGTGTTATACATAGGTGTATATTTACACACATATATGCACATATGTTTATATGTATTGAAACTACAGAAGCACATGTCACCAATAAGAGCTCTGAGACACCTTTGACCACTTACCCTTATCAGATGAGATTTGCCAAATGAGTTTTGGGAACAAATTTCTTTTAA
CTGAATTTCTGAGCTTTGTGGATTTAGAAATGCAACTGAAAGTTTGTGGACATTTACGAGGATCATAGTTTTATTCTCCTTAAAACTCTTCAATACTTTCCCATTGTCTTTAGTAAATCCAAAATCCTAACACCACTCACGAGGCTTTTCAACACCTGGCTTCTTGTGATTTCTCCAATCTAACCTTTTACCCTCCTTCCCCTCAGCCTCTCTGCTTTAGTGAACTTTGTTCTAGTTTTTTGAAGTTCATCATCAATTCAAGCTTTTGTACATGGGATTTCCTAAACCTGAAATGTGCCTCCGGTTTTGTCCAAACAGACACACAGGCTCCACTCTGCCCCCTGGCTCACACCTGCTTAACTTGTTAAGTCACATCTGTAACTGTCACTCTTCTCTGGCACCCTAAAGGAATTGAGATCATCCTATTATTCTCTGTTCTAGAACTCCACACTTCTGAAATTTCTCATTCCTGTCTAAGCTCTTGTGTGTTTGGTTTTTGGCCATCACTTTCACTGCTCTTAAAGCTCCCCCAGCGGAGTGGAGAGGTCTGTTTTCCCGTGTTTGGATTCCTAGAGGCAGCGCAGGCCTGGCACAAGGTCATCACTAAGGAAGTGTTCACAGGATGAAAGCGGTGCGTGCTGTTTAAGGAAAGGGTAAAGCCTTTAAATGGTAAAGGGTTGAGAGAAGGAGCAAAGTGCCTTTGGGGTGGAGGCTCCCAGGAGGAGGCGGCGCGGGCTGCGGTGCTGGACGGATCCTCCTCCAGCTCCTGCCTGGAGGTCTCCAGAACAGGCTGGAGGCAGGGAGGGGGTCCCAAAAGCCTTGGGATCAGAGGTAGTTTTTCCACCTGGTCCCCCAGACCCCCGTCCGCCTCAGAAAGACAGAGGATGAGCCCCTGGGCTGCGTGTTGTCGGGGTTGCGGGTGGGGCCAGATAGTGTCTTCCCCGGAGGCCGCTTCTGTAACCGGATCGTTCTTGTCCCCCCAGCACGTTTCTTGGAGCAGGTTAAACATGAGTGTCATTTCTTCAACGGGACGGAGCGGGTGCGGTTCCTGGACAGATACTTCTATCACCAAGAGGAGTACGTGCGCTTCGACAGCGACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACCTCCTGGAGCAGAAGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAGGTGAGCGCGGCGCGGGGCGGGGCCTGAGTCCCTGTGAGCGGAGAATCTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGCGCCATCTGTGAGCATTTAGAATCCTCTCTATCCTGAGCAAGGAGTTCTGCGGGCACAGGTGTGTGTGTAGAGTGTGGATTTGTCCGTGTCTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTGCTTCTTATTCTTGGAGACTTCTGTGGGGAGGTGACAAGGGAGGTGGGTGCTGGGGGCTGGAGAGAGAGGCGACCTTGATTGTCTCGGGTCCTTAGAGATGCAAGGAAGGGAAATGTATGGGGTGTGTGGTTGGGGTGAAGGTTTAGGGGAGGAGAGCTGAGGGGTAAGGAAGGTTTGGGATAATGTGAAGAGGCCAGTTTCAGACTGTCCCTGGCACACACCCTTCATGTAATCTCTGAAATAAAAGTGTGTGCTGTTTGTTTGTAAAAGCATTAGATTAACTTCTAGGGGAATTGAGTAGACCTCTGAGGCACCTCTGAAGCTTCTTTAGGTATAAATTTCTTGCTAGTTTTTTGTTTTCTTAGTGTTATATTTTTACATAGTTGAAATGACTGTGAAACTAACTTTTTGAATTAAAGTTTGAAAACACTGTTACTATTTTATTATAATGCTAATAATTTCATAGTTACTTTTTAAATATATAATAGTTGTG
ACACAAATTACCTCACTTTCTTTGTTTTTTTTTTTCTTACACTTTAAGTTTTAGGGTACATGTGCACAACGTGCAGGTTTGTTACATATGTATACATGTGCCATGTTGGTGTGCTGCACCCATTAACTCGTCATTTAACATTAGGTATATCTCCTAATGCTATCCCTCCCCACCCCCCCACCCCACAACAGGCCCCAGTGTGTGATGTTCCCCTTCCTGTGTCCATGTGTTCTCACTGTTCAATTCCCACCTATGAGTGAGAACATGCGGTGTTCGGTTTTTTGTCCTTGCCATAGTTTGCTGAGAATGATGGTTTCCAGCTTCATCCATGTCCCTACAAAGGACATGAACTCATTCTTTTTTGTGGCTGCATAGTATTCCATAGTGTATATGTGCCACATTTTCTTAATCCAGTCTATCATTGTTGGACATTTGGGTTGGTTCCAAGTCTTTGCTATTGTGAATAGTGCCGCAATAAACATACATGTGCATATGTCTTTATAGCAGCATGATTTATAATCCTTGGGTTATATACCCAGTAATGGGATGGCTGGGTCAAATGGTATTTCTAGTTCTAGATCCCTGAGGAATCGCCACACTGACTTCCACAATGGTTGAACTAGTTTAGAGTCCCACCAACAGGGTAAAAGTGTTCCTATTTCTCCACATCCTCTCCAGCACCTGTTGCTTCCTGACTTTTTAATGATCGCCATTCTAACTGGTGTGAGATGGTATCTCATTGTGGTTTTGATTTGCAATTCTCTGATGGCCAGTGATGATGAGCATTTTTTCATGTGTCTTTTGGCTGCATAAATGTCTTCTTTTGAGAAGTGTCTGTTCATGTCCTTTGCCCACTTTTTGATGGGGTATTTTGTTTTTTTCTTGTAAATTTGTTTGAGTTCATTGCAGATTCTGGATATTAGCCCTTTGTCATATGAGTAGATTGCAAAAATTTTCTCCCATTCTGTAGGTTGCCTCTTCACTCTGATGGTAGTTTCTTTTGCTGTGCAGAAGCTCTTTAGTTTAATTAGATCCCATTTGTCCATTTTGGCTTTTGTTGCCATTGCTTTTGGTGTTTTAGACATGAAGTTCTTGCCCATGCCTATGTCCTGAATGGTATTGCCTAGGTTTTCTTCTAGGGTTTTTATGGTTTCAGGTCTAACATTTAAGTCTTTAATCCATCTTGAATTAATTTTTGTATAAGCAAATTACGTCACTTTCCCCATTGATGACCTTTATTATGACATTCACCAATAGTTGAAAATGTATGTTTCTGGTTAATTTTTGATTTATATTTTTTTGATTTGTAATTATTTTGAATTATTTTGACCTATTTATTGGCCAGTTGTAATTACTGCTCTGCTCTACGAATTACCTGTTGTATTTGGTAGGTAATGGACAATGATCTATTGTCTCTTATCTTTAGGGCTTAGTATTTTTCTCAGTGACTTTGTGGGTTTGTTGTACTGTAAGATTATTAACACTTTATTGATATTTGATTCAGTATTTTCTCCAGTTTGTGGTATGTATATTTTGAAAATTCTTTTCCATGTTAAGAATTTGAACATTTTTATTTAATAAAATATATTGCAAAATGTTAATTAATGATTCACAAACTAGCTCAAGTCTACCATTTTGTGGTATTGATGTCTCCAGGTTTCTCCTTCCTTCTTAAAAAAAAATGTATTTATTGAGAGTATGCTAGTGTCAGGGATTTCCCTAGGCATAAGCACTCCAAGTAATGAGTCCCAGACACTGCCTTGATCCAAATGTCATTCTGGAAAGAAAAATCATTTTACAGTGATAAGCCTAATAATAGTTATACTTGTTTTGCCTGGGAGATGCATTGATCAGCTAAATGTAAATATAAGAACTTTCAAAACTAAAATGACGTTCCTTAATCTTTCTCTCTGCTTTAGGAATCATGCTTTCTTAGGAACTTAAAGATTTGGAGAATCATTTCTGTCTGTCCCACCTTCCCAGGAGCATAACCATTTCTG
TGGTGTTCTAAGGTGTGAGTGCATGGCAGTAGTATTCCTAAAAATCCATATTCAGTTTCCTCATGTGCCCTACTCCGTCCCTTTCTCTATCCACATTGCTTTAAATCATATTTTTCTCTCAAGGTGTACAAGGATGATAAATAGGTGCCAAGTGGAGAACCCAAGTGTGACGAGCCCTCTCACAGTAGAATGGAGTGAGAAGCTTTCTGACCTCATAAATTGAAGGCTATCGTAATTCATTCTTTTATATATTTTACTTGCATTAATCCTCATATAACCTCAAGAGGTAAATTAATATAATTATCCTCCATTATTGGAGAGAAAGTTGAGACACAAAAGAATCAAAAACTCTTCCAGGATCAACCAGTAAAAGGCAGACCTTGGATTTGAACCAGGCAACCTGGCTCAGAAGTCAGTTTTAATTACCACACTCTGTACTTTCAAAGATTTGTAAACGCTTTGACAATGCATGTCAATTTCAAGCTATGAAGAGCCAAACATAATTTTTCACAATATCTCTCAAATCTAATGGGTCCCCACTATAAAGATTAAATTCCAGGCTGATGACACTGTGAGGCCACATGGCCAGCTGTGCTGGAGGCCTGCTCAAGGCCAGAGCCTAGGTTTACAGAGAAGCAGACAAAAAGCTAAACAAGGAGACTTACTCTGTCTGCATGACTTATTCCCTCTACCTTGTTTTCTCCTAGTCTATCCTGAGGTGACTGTGTATCCTGCAAAGACCCAGCCCCTGCAGCACCACAACCTCCTGGTCTGCTCTGTGAATGGTTTCTATCCAGGCAGCATTGAAGTCAGGTGGTTCCGGAACGGCCAGGAAGAGAAGACTGGGGTGGTGTCCACAGGCCTGATCCAGAATGGAGACTGGACCTTCCAGACCCTGGTGATGCTGGAAACAGTTCCTCGGAGTGGAGAGGTTTACACCTGCCAAGTGGAGCACCCAAGCCTGACGAGCCCTCTCACAGTGGAATGGAGTGAGCAGCTTTCTGACTTCATAAATTTCTCACCCACCAAGACGCGAACTTTACTAATCCCTGAGTATCAGGCTTCTCCTATCCCACATCCTATTTTCATTTGCTCCACGTTCTCATCTCCATCAGCACAGGTCACTGGGGGGTAGCCCTGTAATACTTTCTAGAAACACCTGTACCCCCTGGGGAAGCAGTCATGCCTGCCAGGCAGGAGAGGCTGTCCCTCTTTTGAACCTCCCCATGATGTCACAAGTCGGGGTCACCTGCTGTCTGTGGGCTCCAGGCCCTGCCTCTGGGTCTGAGACTGAGTTTCTGGTACTGTTGCTCTGAGTCGTTTGTTGTAATCTGAGAAGAGGAGAAGTATAGGGACCTTCCTGACATGAGGGGAGTCCAATCTCAGCTCCGCCTTTTATTAGATCTGTCACTCTAGGCAACTACTTAACCTCATTGGGTCTCAGGCTTTCTGTTCATCAGATGTTGAAGTCCTGTCTTACATCAAGGCTGTAATATTTGAATGAGTTTGATGACTGAACCTTGTAACTGTTCAGTGTGATTTGAAAACCTTTCTCAAGAAATGGTCAGTTATTTTAGTTCTTGCAGAGCAGCCTTCTTTCTCATTTTCAAAGCTCTGAATCTCAAGGTGTCAATTAAAGAGGTTCCATTTGGGATAAAAATCACTAAACCTGGCTTCCTCTCTCAGGAGCACGGTCTGAATCTGCACAGAGCAAGATGCTGAGTGGAGTCGGGGGCTTCGTGCTGGGCCTGCTCTTCCTTGGGGCCGGGCTGTTCATCTACTTCAGGAATCAGAAAGGTGAGGAGCCTTTGGTAGCTGGCTGTCTCCATACGCTTTTCTGGAGGAGGAACTATGGCTTTGCTGAAGTTGGTTCTCAGCATATGAATGGCCCTGGATAAAGCCTCTCTACTCCCAAATGACCTCCAATGTTCTGCAAATCCAGAAATCATCAGTGCATGGTTGCTATGTCAAAGCATAATAGCTTGTGGCCTACAGAGATA
ACAGAAAGATTAACAGGTATAGGTGCTTTGGTTGAGATCGTGGAGCAAATTAAGGAAGAGCAACTAAAGCTAATACAATTACACTGGATCCTGTGACAGACACTTCACACTTCATGGGTCACATGGTCTGTTTCTGCTCCTCTCTGCCCTGGCTGGTGTGGGTTGTGGTGTCAGAGAACTCTCAGGTGGGAGATCTGGAGCTGGGACATTGTGTTGGAGGACAGATTTGCTTCCATATCCTTTAAGTGTATATCTTCTCTTTTTCCTAGGACACTCTGGACTTCAGCCAACAGGTAATACCTTTTCATCCTCTTTAAGAAACAGATTTGGAGGCCAGGCGCAGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGAATCATGAGGTCAGGAGTTCGAGACCAGCCTGACCAACGTGGTGAAACCCCGTCTCTACTAAAAATACAAAAAAAAATCAGTCGGGCGTGGTGGTGTGCGCCTGTAATCCCAGCTACTCAGGAGGCCAAGGCAGGAGAATCGCTGGAACCCAGGAGGCAGAGGTTGCAGTGAGCCGAGATTGGGCCACTGCACTCCAGCCTAGGTGACAGAGTGAGACCCCATCTCAAAAAAACAAAAAAAAGAAAGAAAGAAACAGATTTCCTTTCCCTAGAATGATGGTAGAGGTAATAAGGCATGAGACAGAAGTAATAGCAAAGACATTGGATCCAAATTTCTGATCAGGCAATTTACACCAGAACTCCTCCTCTCCACTTAGAAAAGGCCTGTGCTCTGCAGGAGTATTGACTCATGGAGACTTCAGAACTTGTTTTTCTTCTTCCTGCAGTGCTCTCATCTGAGTCCTTGAAAGAGGGCAAAATAAACTGTTAGTAGAGCCAGGTCTGAAAACAACACTTTCTTGCGTCTCTGCAGGATTCCTGAGCTGAAGTGAAGATGACCACATTCAAGGAAGAACCTTCTGCCCCAGCTTTGCAGGATGAAACACTTCCCCGCTTGGCTCTCATTCTTCCACAAGAGAGACCTTTCTCCGGACCTGGTTGCTACTGGTTCAGCAGCTGCAGAAAATGTCCTCCCTTGTGGCTGCCTCAGCTCGTACCTTTGGCCTGAAGTCCCAGCATTAATGGCAGCCCCTCATCTTCAAGTTTTGTGCTCCCCTTTACCTAATGCTTCCTGCCTCCCATGCATCTGTACTCCTGCTGTGCCACAAACACATTACATTATTAAATGTTTCTCAAACATGGAGTTAAA,15235


## Build genomic sequences for all classical HLA genes

This is the same procedure as the single-gene example above, applied in a loop to all eight classical HLA genes.

In [6]:
if (! dir.exists('IMGTHLA/alignments_gen_aligned_filled')) {
    dir.create('IMGTHLA/alignments_gen_aligned_filled') # output directory
}

for (g in c('A', 'B', 'C', 'DRB1', 'DQA1', 'DQB1', 'DPA1', 'DPB1')) { # takes ~20 mins

    ## Code from HLApers
    hladb = tibble(locus = g) %>%
        mutate(data = map(locus, ~hla_compile_index(., 'IMGTHLA'))) %>%
        filter(!is.na(data)) %>%
        unnest(data) %>%
        filter(!grepl("N$", allele)) %>% # drop alleles whose name ends in "N"
        select(-locus) %>%
        mutate(allele = paste0("IMGT_", allele)) %>% # prefix allele names with "IMGT_"
        split(.$allele) %>%
        map_chr("cds") %>% # column is named "cds" but holds the genomic sequence
        DNAStringSet() %>%
        writeXStringSet(paste0('IMGTHLA/alignments_gen_aligned_filled/', g, '_gen_aligned_filled.fa'))
}

Processing locus A

Processing locus B

Processing locus C

Processing locus DRB1

Processing locus DQA1

Processing locus DQB1

Processing locus DPA1

Processing locus DPB1
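
As a quick sanity check once the loop finishes, it can help to summarize each output FASTA (number of alleles and range of sequence widths). The sketch below is an illustration, not part of the original pipeline: `summarize_fasta_dir` is a hypothetical helper, and the demo runs on two made-up sequences written to a temporary directory so that it does not depend on the real IMGTHLA files.

```r
suppressPackageStartupMessages(library(Biostrings))

# Hypothetical helper (not in hlaseqlib_code.R): one row per FASTA file with
# the number of sequences and the smallest/largest sequence width.
summarize_fasta_dir <- function(fasta_dir, pattern = "\\.fa$") {
    files <- list.files(fasta_dir, pattern = pattern, full.names = TRUE)
    rows <- lapply(files, function(f) {
        seqs <- readDNAStringSet(f)
        data.frame(file = basename(f),
                   n_alleles = length(seqs),
                   min_width = min(width(seqs)),
                   max_width = max(width(seqs)),
                   stringsAsFactors = FALSE)
    })
    do.call(rbind, rows)
}

# Toy demo: made-up sequences and allele names, not real HLA alleles
demo_dir <- file.path(tempdir(), "toy_gen_aligned_filled")
dir.create(demo_dir, showWarnings = FALSE)
toy <- DNAStringSet(c("IMGT_X*01:01" = "ACGTACGT", "IMGT_X*01:02" = "ACGTAC"))
writeXStringSet(toy, file.path(demo_dir, "X_gen_aligned_filled.fa"))

fasta_summary <- summarize_fasta_dir(demo_dir)
print(fasta_summary)
```

On the real output, `summarize_fasta_dir('IMGTHLA/alignments_gen_aligned_filled')` would be expected to return one row per gene processed in the loop above.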



## Add additional sequence from GRCh38 to the 5' and 3' ends

We noticed that the IMGT sequences for HLA-A, HLA-DQA1, and HLA-DQB1 are truncated at both the 5' UTR and the 3' UTR relative to GRCh38 (gene start/stop defined by Gencode v38). We therefore "pad" the IMGT sequences for those genes with the flanking sequences in `add_to_IMGT.fa`. The cells below also pad HLA-DPA1 at the 5' end and HLA-DPB1 at the 3' end. Note: if you are using the latest version of IMGTHLA and/or a different Gencode GTF file, you should check whether the start/stop boundaries have changed.

The resulting padded FASTA files are written to `IMGTHLA/alignments_gen_aligned_filled_added`.
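
One way to check the boundaries against a different Gencode release is to pull the gene-level start and end coordinates for the HLA genes out of the GTF and compare the implied gene lengths with the padded sequence widths. The sketch below assumes the standard nine-column GTF layout with a `gene_name` attribute; `read_gene_bounds` is a hypothetical helper, and the coordinates in the demo are made up, not real GRCh38 positions.

```r
# Hypothetical helper: gene-level coordinates for selected genes from a GTF file.
read_gene_bounds <- function(gtf_path, genes) {
    lines <- readLines(gtf_path)
    lines <- lines[!startsWith(lines, "#")]  # drop header/comment lines
    gtf <- read.delim(text = lines, header = FALSE, sep = "\t", quote = "",
                      comment.char = "", stringsAsFactors = FALSE,
                      col.names = c("seqname", "source", "feature", "start", "end",
                                    "score", "strand", "frame", "attribute"))
    gtf <- gtf[gtf$feature == "gene", ]
    # Extract the value of the gene_name attribute
    gtf$gene_name <- sub('.*gene_name "([^"]+)".*', "\\1", gtf$attribute)
    out <- gtf[gtf$gene_name %in% genes,
               c("gene_name", "seqname", "start", "end", "strand")]
    out$length <- out$end - out$start + 1L
    rownames(out) <- NULL
    out
}

# Toy demo: a three-record GTF with invented coordinates
toy_gtf <- tempfile(fileext = ".gtf")
writeLines(c(
    "##description: toy annotation with made-up coordinates",
    paste("chr6", "TOY", "gene", "1001", "4000", ".", "+", ".",
          'gene_id "G1"; gene_type "protein_coding"; gene_name "HLA-A";', sep = "\t"),
    paste("chr6", "TOY", "exon", "1001", "1100", ".", "+", ".",
          'gene_id "G1"; gene_name "HLA-A";', sep = "\t"),
    paste("chr6", "TOY", "gene", "9001", "9500", ".", "-", ".",
          'gene_id "G2"; gene_name "OTHER";', sep = "\t")
), toy_gtf)

bounds <- read_gene_bounds(toy_gtf, genes = c("HLA-A", "HLA-DQA1"))
print(bounds)
```

Because allele lengths vary, the comparison is only approximate, but a large gap between `length` and the padded widths for a gene would suggest that the flanks in `add_to_IMGT.fa` need revisiting.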

In [7]:
if (! dir.exists('IMGTHLA/alignments_gen_aligned_filled_added')) {
    dir.create('IMGTHLA/alignments_gen_aligned_filled_added') # output directory
}

# Read in additional padding sequences
additions = Biostrings::readDNAStringSet('add_to_IMGT.fa')
additions

DNAStringSet object of length 8:
    width seq                                               names               
[1]   994 AAGGGGAGAGGAGGGCCTGAAGG...ACATTGAGACAGAGCGCTT

In [8]:
# Read in genomic fastas to be padded
A = Biostrings::readDNAStringSet('IMGTHLA/alignments_gen_aligned_filled/A_gen_aligned_filled.fa')
DQA1 = Biostrings::readDNAStringSet('IMGTHLA/alignments_gen_aligned_filled/DQA1_gen_aligned_filled.fa')
DQB1 = Biostrings::readDNAStringSet('IMGTHLA/alignments_gen_aligned_filled/DQB1_gen_aligned_filled.fa')
DPA1 = Biostrings::readDNAStringSet('IMGTHLA/alignments_gen_aligned_filled/DPA1_gen_aligned_filled.fa')
DPB1 = Biostrings::readDNAStringSet('IMGTHLA/alignments_gen_aligned_filled/DPB1_gen_aligned_filled.fa')
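
The five padding cells below all follow the same pattern: prepend a 5' flank and/or append a 3' flank to every allele. As a compact illustration of that pattern, the sketch below wraps it in a hypothetical helper, `pad_alleles` (not part of `hlaseqlib_code.R`), and runs it on made-up toy sequences. It goes through `as.character()` and `paste0()`, so it is an alternative formulation under those assumptions, not the code used in this notebook.

```r
suppressPackageStartupMessages(library(Biostrings))

# Hypothetical helper: add a 5' and/or 3' flank to every sequence in a set.
# Flanks may be DNAString objects or plain character strings; "" means no flank.
pad_alleles <- function(seqs, five_prime = "", three_prime = "") {
    padded <- DNAStringSet(paste0(as.character(five_prime),
                                  as.character(seqs),
                                  as.character(three_prime)))
    names(padded) <- names(seqs)  # paste0() drops names, so restore them
    padded
}

# Toy demo: invented sequences and flanks
toy <- DNAStringSet(c("IMGT_X*01:01" = "CCGG", "IMGT_X*01:02" = "CCGGA"))
toy_padded <- pad_alleles(toy, DNAString("AA"), DNAString("TTT"))
toy_3p_only <- pad_alleles(toy, three_prime = "TTT")
print(toy_padded)
```

With the real objects, a call such as `pad_alleles(A, additions[['HLA-A_5p']], additions[['HLA-A_3p']])` would be expected to give the same sequences as the loop, and leaving one flank at its default covers the one-sided cases (HLA-DPA1 and HLA-DPB1).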

### Pad HLA-A

In [9]:
A

DNAStringSet object of length 3886:
       width seq                                            names               
   [1]  3503 CAGGAGCAGAGGGGTCAGGGCG...CTTCTTCCTTCCCTATTAA

In [10]:
newA = A
# Prepend the 5' flank and append the 3' flank to every HLA-A allele
for (allele in names(A)) {
    newA[[allele]] = paste0(additions[['HLA-A_5p']], A[[allele]], additions[['HLA-A_3p']])
}
newA

# Write new file
Biostrings::writeXStringSet(newA, 'IMGTHLA/alignments_gen_aligned_filled_added/A_gen_final.fa')

DNAStringSet object of length 3886:
       width seq                                            names               
   [1]  4626 AAGGGGAGAGGAGGGCCTGAAG...ATGAGAACCTTCCAGAGTC

### Pad HLA-DQA1

In [11]:
DQA1

DNAStringSet object of length 200:
      width seq                                             names               
  [1]  6492 AAAAATTCCTGGAGGTTGTAAG...CTGGGGTAAGCCACCCGGC

In [12]:
newDQA1 = DQA1
# Prepend the 5' flank and append the 3' flank to every HLA-DQA1 allele
for (allele in names(DQA1)) {
    newDQA1[[allele]] = paste0(additions[['HLA-DQA1_5p']], DQA1[[allele]], additions[['HLA-DQA1_3p']])
}
newDQA1

# Write new file
Biostrings::writeXStringSet(newDQA1, 'IMGTHLA/alignments_gen_aligned_filled_added/DQA1_gen_final.fa')

DNAStringSet object of length 200:
      width seq                                             names               
  [1] 18892 AAGAGCAAAATAATATGAAAAG...CATAATAAAATAATTATGA

### Pad HLA-DQB1

In [13]:
DQB1

DNAStringSet object of length 557:
      width seq                                             names               
  [1]  7480 TTCTAAGAACTTTGCTCTTTTC...CTTCCTGCACTGACCCACA

In [14]:
newDQB1 = DQB1
# Prepend the 5' flank and append the 3' flank to every HLA-DQB1 allele
for (allele in names(DQB1)) {
    newDQB1[[allele]] = paste0(additions[['HLA-DQB1_5p']], DQB1[[allele]], additions[['HLA-DQB1_3p']])
}
newDQB1

# Write new file
Biostrings::writeXStringSet(newDQB1, 'IMGTHLA/alignments_gen_aligned_filled_added/DQB1_gen_final.fa')

DNAStringSet object of length 557:
      width seq                                             names               
  [1]  9295 AGGATGGATTAGAGACCTCTAT...AAATACTATTTTTGAGTCT

### Pad HLA-DPA1

In [15]:
DPA1

DNAStringSet object of length 189:
      width seq                                             names               
  [1]  9775 GGTGGACCTGAAAGAAAGATTA...AAAAAAGGAATTGTTTAAA

In [16]:
newDPA1 = DPA1
# Only the 5' end is padded for HLA-DPA1
for (allele in names(DPA1)) {
    newDPA1[[allele]] = paste0(additions[['HLA-DPA1_5p']], DPA1[[allele]])
}
newDPA1

# Write new file
Biostrings::writeXStringSet(newDPA1, 'IMGTHLA/alignments_gen_aligned_filled_added/DPA1_gen_final.fa')

DNAStringSet object of length 189:
      width seq                                             names               
  [1] 16457 GTCGAAGCGCGCGAACTCCTCC...AAAAAAGGAATTGTTTAAA

### Pad HLA-DPB1

In [17]:
DPB1

DNAStringSet object of length 634:
      width seq                                             names               
  [1] 11468 TAATCCCTGTAGATGGGCCAGC...AATATAATCTAATACACTT

In [18]:
newDPB1 = DPB1
for (allele in names(DPB1)) { # append the 3' addition to every allele
    newDPB1[[allele]] = paste0(DPB1[[allele]], additions[['HLA-DPB1_3p']])
}
newDPB1

# Write new file
Biostrings::writeXStringSet(newDPB1, 'IMGTHLA/alignments_gen_aligned_filled_added/DPB1_gen_final.fa')

DNAStringSet object of length 634:
      width seq                                             names               
  [1] 13963 TAATCCCTGTAGATGGGCCAGC...AATAAAATACTTCTACAAG
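
The padding step can be checked with simple length arithmetic: in the output above, the first DPB1 record grows from 11468 to 13963 bases, so 2495 bases were appended. The following is a minimal sketch of that check on toy strings, using base R only. The sequences and flanks (`allele_seqs`, `pad_5p`, `pad_3p`) are invented for illustration and are not taken from `additions`.

```r
# Toy stand-ins; the notebook itself uses `additions` and DNAStringSet objects
allele_seqs <- c(a1 = "ACGTAC", a2 = "ACGTTC")
pad_5p <- "GGG"   # hypothetical 5' flank
pad_3p <- "TT"    # hypothetical 3' flank

padded_5p <- paste0(pad_5p, allele_seqs)   # flank goes in front (as for DPA1)
padded_3p <- paste0(allele_seqs, pad_3p)   # flank goes at the end (as for DPB1)
names(padded_5p) <- names(padded_3p) <- names(allele_seqs)

# Every padded sequence should grow by exactly the flank length
grew_5p <- all(nchar(padded_5p) == nchar(allele_seqs) + nchar(pad_5p))
grew_3p <- all(nchar(padded_3p) == nchar(allele_seqs) + nchar(pad_3p))
```

The same comparison could be run on the real objects with `width()` from Biostrings in place of `nchar()`.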

### Copy other genes (HLA-B, HLA-C, HLA-DRB1) into the new folder

These genes were not truncated relative to the reference sequence, so they don't need to be padded.

In [19]:
old_prefix = 'IMGTHLA/alignments_gen_aligned_filled'
new_prefix = 'IMGTHLA/alignments_gen_aligned_filled_added'
genes = c('B', 'C', 'DRB1')
old_suffix = '_gen_aligned_filled.fa'
new_suffix = '_gen_final.fa'
done = purrr::map(genes, ~file.copy(file.path(old_prefix, paste0(., old_suffix)), new_prefix))
file.rename(file.path(new_prefix, paste0(genes, old_suffix)), file.path(new_prefix, paste0(genes, new_suffix)))

# Process nucleotide coding sequence .fa files

## Example of alleles with a CDS (.nuc) sequence but no genomic (.gen) sequence

In [21]:
## Get genomic database
gen_hladb = tibble(locus = 'DPB1') %>%
        mutate(data = map(locus, ~hla_compile_index(., 'IMGTHLA'))) %>%
        filter(!is.na(data)) %>%
        unnest(data) %>%
        filter(!grepl("N$", allele)) %>%
        select(-locus) %>%
        mutate(allele = paste0("IMGT_", allele)) %>%
        split(.$allele) %>%
        map_chr("cds") %>%
        DNAStringSet()

Processing locus DPB1



Some alleles (e.g. DPB1\*112:01) have a nucleotide CDS sequence (.nuc) entry in the IMGTHLA database but are missing a corresponding entry in the (.gen) alignment. To be able to provide a genomic-length sequence in cases like this, for each allele with only a .nuc sequence, we find the closest allele (in terms of Hamming distance on the .nuc sequence) that does have a .gen sequence.
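
As a rough illustration of the nearest-allele idea, here is a minimal base-R sketch on made-up sequences. The allele names and sequences are invented, and the helper `hamming()` is hypothetical; the notebook itself uses `stringdist::stringdistmatrix` later on.

```r
# Toy CDS sequences of equal length (invented, not real HLA alleles)
nuc_seqs <- c(
    "X*01:01" = "ATGGCCTTAGGA",
    "X*02:01" = "ATGGACTTCGGT",
    "X*03:01" = "ATGGCCTTAGGT"   # pretend this allele has no .gen entry
)
has_gen <- c("X*01:01", "X*02:01")

# Hamming distance: number of positions at which two equal-length strings differ
hamming <- function(a, b) {
    stopifnot(nchar(a) == nchar(b))
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

query <- "X*03:01"
dists <- vapply(has_gen, function(ref) hamming(nuc_seqs[[query]], nuc_seqs[[ref]]), integer(1))
closest <- names(dists)[which.min(dists)]
```

Here `closest` is the allele whose genomic sequence would stand in for the query allele.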

In [22]:
grep('IMGT_DPB1\\*01:01', names(gen_hladb)) %>% any()

In [23]:
grep('IMGT_DPB1\\*112:01', names(gen_hladb)) %>% any()

In [24]:
## Get nuc database
nuc_hladb = tibble(locus = 'DPB1') %>%
        mutate(data = map(locus, ~hla_compile_index(., 'IMGTHLA', imgtfile = 'nuc'))) %>% # note change to nuc
        filter(!is.na(data)) %>%
        unnest(data) %>%
        filter(!grepl("N$", allele)) %>%
        select(-locus) %>%
        mutate(allele = paste0("IMGT_", allele)) %>%
        split(.$allele) %>%
        map_chr("cds") %>%
        DNAStringSet()

Processing locus DPB1



In [25]:
'IMGT_DPB1*112:01' %in% names(nuc_hladb) # present in nuc but not gen

In [26]:
names(nuc_hladb) %>% length()
names(gen_hladb) %>% length()

In [27]:
# Shorten names in both nuc and gen files to 4-digit (2-field) resolution
nuc_4_digit = str_extract(names(nuc_hladb), "[^:]*:[^:]*") %>% unique()
gen_4_digit = str_extract(names(gen_hladb), "[^:]*:[^:]*") %>% unique()
length(nuc_4_digit)
length(gen_4_digit)

In [28]:
# Get a list of names of nuc alleles not present in the gen database
nuc_4_digit[which(!nuc_4_digit %in% gen_4_digit)] %>% head()

In [29]:
# Get a list of names of gen alleles not present in the nuc database --> should be none
gen_4_digit[which(!gen_4_digit %in% nuc_4_digit)] %>% head()

## Build nuc sequences for all classical HLA genes

We ran the script (`compile_nuc_index.R`) as a batch job because of its longer runtime (~1 hour).

## For each gene, get a list of "nuc only" and "nuc and gen" alleles:

"nuc only" alleles = 4-digit alleles that have no matching .gen sequence

"nuc and gen" alleles = 4-digit alleles that do have a matching .gen sequence

In [30]:
if (! dir.exists('IMGTHLA/nuc_only_alleles')) {
    dir.create('IMGTHLA/nuc_only_alleles') # output directory
}

In [31]:
for (g in c('A', 'B', 'C', 'DRB1', 'DQA1', 'DQB1', 'DPA1', 'DPB1')) {
    nuc = readDNAStringSet(paste0('IMGTHLA/alignments_nuc_aligned_filled/', 
                                  g, '_nuc_aligned_filled.fa'), format="fasta")
    gen = readDNAStringSet(paste0('IMGTHLA/alignments_gen_aligned_filled_added/', 
                                  g, '_gen_final.fa'), format="fasta")
    
    # Shorten to 4-digit alleles
    nuc_4_digit = str_extract(names(nuc), "[^:]*:[^:]*") %>% unique()
    gen_4_digit = str_extract(names(gen), "[^:]*:[^:]*") %>% unique()
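    # Report counts (a bare length() call prints nothing inside a for loop)
    message(g, ': ', length(nuc_4_digit), ' nuc and ', length(gen_4_digit), ' gen alleles (4-digit)')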

    # Get a list of names of nuc alleles not present in the gen database
    write(nuc_4_digit[which(!nuc_4_digit %in% gen_4_digit)], 
          paste0('IMGTHLA/nuc_only_alleles/', g, '_nuc_only_alleles_4digit.txt'))
    # Get a list of names of nuc alleles present in the gen database
    write(nuc_4_digit[which(nuc_4_digit %in% gen_4_digit)], 
          paste0('IMGTHLA/nuc_only_alleles/', g, '_nuc_and_gen_alleles_4digit.txt'))
}
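
The bookkeeping in the loop above can be tried on a toy example. This sketch uses base R only; the allele names are hypothetical and the helper `to_two_field()` is introduced here for illustration (it applies the same pattern as the notebook's `str_extract` call).

```r
# Hypothetical allele names, chosen only to illustrate the bookkeeping
nuc_names <- c("IMGT_X*01:01:01:01", "IMGT_X*01:01:02", "IMGT_X*02:01", "IMGT_X*03:01:01")
gen_names <- c("IMGT_X*01:01:01:01", "IMGT_X*03:01:01")

# Keep the first two colon-separated fields and drop duplicates
to_two_field <- function(x) unique(regmatches(x, regexpr("[^:]*:[^:]*", x)))

nuc_2f <- to_two_field(nuc_names)
gen_2f <- to_two_field(gen_names)

nuc_only    <- setdiff(nuc_2f, gen_2f)     # in nuc but not in gen
nuc_and_gen <- intersect(nuc_2f, gen_2f)   # in both
```

`setdiff()` and `intersect()` give the same two lists as the `which(... %in% ...)` indexing used in the notebook, assuming the names are already unique.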

In [32]:
nuc_only_alleles = read.table('IMGTHLA/nuc_only_alleles/DPB1_nuc_only_alleles_4digit.txt')
'IMGT_DPB1*112:01' %in% nuc_only_alleles$V1

In [33]:
nuc_and_gen = read.table('IMGTHLA/nuc_only_alleles/DPB1_nuc_and_gen_alleles_4digit.txt')
'IMGT_DPB1*112:01' %in% nuc_and_gen$V1

## Function to find closest nuc allele with matching gen sequence
This function finds the closest allele with a .gen sequence for each allele with only a .nuc sequence.

In [34]:
# Modified from hlaseqlib function by Vitor Aguiar
make_dist_matrix <- function(hla_df, has_gen_alleles, nuc_only_alleles) {
    cds_sequenced <- stringr::str_split(hla_df$cds, "", simplify = TRUE) %>%
        apply(2, function(x) !any(x == "*")) # sequence fully known

    run <- rle(cds_sequenced)

    ends <- cumsum(run$lengths)
    starts <- ends - run$lengths + 1L

    run_df <- tibble::tibble(value = run$values, start = starts, end = ends) %>%
        dplyr::filter(value == TRUE) %>%
        dplyr::slice(which.max(end - start + 1)) # find the longest stretch that is common amongst the CDS-sequenced alleles

    hla_df_cds_common <- hla_df %>%
        dplyr::mutate(cds = substring(cds, run_df$start, run_df$end))

    hla_df_cds_common_has_gen <- hla_df_cds_common %>%
        dplyr::filter(paste0('IMGT_', str_extract(allele, "[^:]*:[^:]*")) %in% has_gen_alleles) # 4-digit

    hla_df_cds_common_nuc_only <- hla_df_cds_common %>%
        dplyr::filter(paste0('IMGT_', str_extract(allele, "[^:]*:[^:]*")) %in% nuc_only_alleles) # 4-digit

    cds_common_has_gen <- hla_df_cds_common_has_gen$cds
    names(cds_common_has_gen) <- hla_df_cds_common_has_gen$allele
    
    cds_common_nuc_only <- hla_df_cds_common_nuc_only$cds
    names(cds_common_nuc_only) <- hla_df_cds_common_nuc_only$allele

    # Make distance matrix of dimensions [#nuc only sequences] x [# nuc with gen sequences],
    # where each entry is the Hamming distance between the "nuc only" and "nuc with gen" sequences.
    stringdist::stringdistmatrix(cds_common_nuc_only, cds_common_has_gen,
                                 method = "hamming", useNames = "names", nthread = 1)
}
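
The first half of `make_dist_matrix` restricts all sequences to the longest stretch of alignment columns in which no allele has an unsequenced base. A minimal sketch of that step on invented sequences, written in base R rather than dplyr, might look like this:

```r
# Toy aligned sequences; '*' marks unsequenced positions (invented for illustration)
cds <- c("**ATGC*TTGACA", "*AATGCGTTGAC*", "**ATGC*TTGTCA")

# TRUE for alignment columns where no sequence has '*'
chars <- do.call(rbind, strsplit(cds, ""))
fully_known <- apply(chars, 2, function(col) !any(col == "*"))

# Run-length encode the TRUE/FALSE vector and recover run boundaries
run <- rle(fully_known)
ends <- cumsum(run$lengths)
starts <- ends - run$lengths + 1L

# Among the TRUE runs, keep the longest one
is_true <- which(run$values)
best <- is_true[which.max(run$lengths[is_true])]
common <- substring(cds, starts[best], ends[best])
```

Because every sequence is cut to the same window, the resulting strings have equal length, which is what the Hamming distance requires.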

### Example of what function does

This example covers HLA-DRB1 only. For each "nuc only" allele, we take the closest nuc allele that has a .gen version at the 4-digit level.

In [35]:
str_extract("04:01:01:01", "[^:]*:[^:]*")

In [38]:
gene = 'DRB1'
has_gen_alleles = read.table('IMGTHLA/nuc_only_alleles/DRB1_nuc_and_gen_alleles_4digit.txt')$V1
nuc_only_alleles = read.table('IMGTHLA/nuc_only_alleles/DRB1_nuc_only_alleles_4digit.txt')$V1

# Process alignments
hla_df <- hla_read_alignment('DRB1', 'IMGTHLA', imgtfile='nuc')

# find closest allele with a .gen sequence for each allele with only a .nuc sequence
distmatrix <- make_dist_matrix(hla_df, has_gen_alleles, nuc_only_alleles)

Rows are alleles with only a .nuc sequence; columns are alleles with both a .nuc and a .gen sequence.

In [39]:
distmatrix[1:5, 1:5]
dim(distmatrix)

,DRB1*01:01:01:01,DRB1*01:01:01:02,DRB1*01:01:01:03,DRB1*01:01:01:04,DRB1*01:01:01:05
DRB1*01:04,4,4,4,4,4
DRB1*01:05,1,1,1,1,1
DRB1*01:06,4,4,4,4,4
DRB1*01:08,1,1,1,1,1
DRB1*01:09,2,2,2,2,2


In [40]:
matching_alleles = colnames(distmatrix)[max.col(-distmatrix, ties.method="first")] 
df = cbind(rownames(distmatrix), matching_alleles) %>% as.data.frame()
colnames(df) = c('nuc_only_allele', 'matching_allele')
head(df)

,nuc_only_allele,matching_allele
,<chr>,<chr>
1,DRB1*01:04,DRB1*01:02:06
2,DRB1*01:05,DRB1*01:01:01:01
3,DRB1*01:06,DRB1*01:02:06
4,DRB1*01:08,DRB1*01:01:01:01
5,DRB1*01:09,DRB1*01:01:01:01
6,DRB1*01:10,DRB1*01:01:01:01
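
The `max.col(-distmatrix, ties.method="first")` idiom picks, for each row, the column with the smallest distance. A small sketch on a made-up matrix (row and column names are placeholders) shows how ties are resolved:

```r
# Toy distance matrix: rows are "nuc only" alleles, columns are alleles with a .gen sequence
d <- matrix(c(4, 4, 1,
              1, 1, 3,
              2, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("q1", "q2", "q3"), c("ref_a", "ref_b", "ref_c")))

# max.col() finds the column of the row maximum, so negating d gives the row minimum.
# ties.method = "first" returns the leftmost column among ties.
nearest <- colnames(d)[max.col(-d, ties.method = "first")]
min_dist <- apply(d, 1, min)
```

The default `ties.method` of `max.col()` is `"random"`, so setting it to `"first"` keeps the choice reproducible when several alleles are equally close.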


Write the matching .gen sequence to the database, listed under the name of the nuc-only allele.

In [41]:
if (! dir.exists('IMGTHLA/alignments_FINAL')) {
    dir.create('IMGTHLA/alignments_FINAL') # output directory
}

In [42]:
# Read in genomic string set
genSet = readDNAStringSet(paste0('IMGTHLA/alignments_gen_aligned_filled_added/', 
                                 gene, '_gen_final.fa'), format = 'fasta')
newSet = DNAStringSet()

for (i in seq_len(nrow(df))) {
    # Truncate the matching allele name to 4-digit (2-field) resolution
    matching_allele = sub("(:[^:]+):.*", "\\1", df[i, 'matching_allele'])
    
    genSet_match = genSet[which(grepl(matching_allele, names(genSet), fixed = TRUE))]
    # The matching allele should always have at least one entry in genSet
    if (length(genSet_match) == 0) {
        stop('No .gen sequence found for matching allele ', matching_allele)
    }
    # Use the first .gen entry that matches at the 4-digit level
    g = genSet_match[1]
    
    # Record the nuc-only allele name, then add the sequence to newSet
    metadata(g)$names = df[i, 'nuc_only_allele']
    gene_to_add = DNAStringSet(g, use.names=TRUE)
    newSet = append(newSet, gene_to_add)
}

# Add "matching_gen" to name to indicate that we matched the nuc to its nearest nuc with a gen
newNames = paste0('IMGT_', df$nuc_only_allele, ':matching_gen') 
names(newSet) = newNames
newSet
finalSet = append(genSet, newSet)

# Write final output alignment files, containing genomic sequences for virtually all possible alleles
Biostrings::writeXStringSet(finalSet, paste0('IMGTHLA/alignments_FINAL/', gene, '_all_alleles.fa'))

DNAStringSet object of length 2063:
       width seq                                            names               
   [1] 11229 GCATCCACAGAATCACATTTTC...TTTCTCAAAGATGGAGTTA
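
The loop above relies on two string operations: `sub()` to truncate a name to two fields, and `grepl(..., fixed = TRUE)` to find database entries containing that name. The sketch below shows what each does on small inputs. The `X*` names are hypothetical, the `IMGT_` prefix on database names is an assumption, and the stricter prefix match is only a possible alternative, not the notebook's method. Whether any real allele pair behaves like this toy case in a given IMGTHLA release is not checked here.

```r
# Reduce full allele names to their first two fields
two_field <- sub("(:[^:]+):.*", "\\1", c("DRB1*01:02:06", "DRB1*01:01:01:01", "DRB1*01:04"))

# Hypothetical database names, made up to contrast the two matching styles
db_names <- c("IMGT_X*02:10:01", "IMGT_X*02:101:01:01")
target <- "X*02:10"

# Fixed substring matching, as in the loop above
substring_hits <- grepl(target, db_names, fixed = TRUE)

# Stricter (assumed) alternative: the two-field name must be followed by ':' or end the name
prefix <- paste0("IMGT_", target)
strict_hits <- db_names == prefix | startsWith(db_names, paste0(prefix, ":"))
```

With substring matching both toy names are hits, while the prefix match keeps only the first one.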

## Repeat for all genes

In [44]:
for (gene in c('A', 'B', 'C', 'DRB1', 'DQA1', 'DQB1', 'DPA1', 'DPB1')) {
    message(gene)
    has_gen_alleles = read.table(paste0('IMGTHLA/nuc_only_alleles/', gene, '_nuc_and_gen_alleles_4digit.txt'))$V1
    nuc_only_alleles = read.table(paste0('IMGTHLA/nuc_only_alleles/', gene, '_nuc_only_alleles_4digit.txt'))$V1

    # Process alignments
    hla_df <- hla_read_alignment(gene, 'IMGTHLA', imgtfile='nuc')

    # find closest allele with a .gen sequence for each allele with only a .nuc sequence
    distmatrix <- make_dist_matrix(hla_df, has_gen_alleles, nuc_only_alleles)

    matching_alleles = colnames(distmatrix)[max.col(-distmatrix, ties.method="first")] 
    hamming_dist = matrixStats::rowMins(distmatrix)
    #matching_alleles %>% length()
    df = cbind(rownames(distmatrix), matching_alleles, hamming_dist) %>% as.data.frame()
    colnames(df) = c('nuc_only_allele', 'matching_allele', 'min_dist')

    # Read in genomic string set
    genSet = readDNAStringSet(paste0('IMGTHLA/alignments_gen_aligned_filled_added/', gene, '_gen_final.fa'), format = 'fasta')
    newSet = DNAStringSet()

    for (i in seq_len(nrow(df))) {
        # Truncate the matching allele name to 4-digit (2-field) resolution
        matching_allele = sub("(:[^:]+):.*", "\\1", df[i, 'matching_allele'])
    
        genSet_match = genSet[which(grepl(matching_allele, names(genSet), fixed = TRUE))]
        # The matching allele should always have at least one entry in genSet
        if (length(genSet_match) == 0) {
            stop('No .gen sequence found for matching allele ', matching_allele)
        }
        # Use the first .gen entry that matches at the 4-digit level
        g = genSet_match[1]
    
        # Record the nuc-only allele name, then add the sequence to newSet
        metadata(g)$names = df[i, 'nuc_only_allele']
        gene_to_add = DNAStringSet(g, use.names = TRUE)
        newSet = append(newSet, gene_to_add)
    }

    # matching_gen indicates that we matched the nuc to its nearest nuc with a gen
    newNames = paste0('IMGT_', df$nuc_only_allele, ':matching_gen') 
    #newNames %>% head()
    names(newSet) = newNames

    finalSet = append(genSet, newSet)
    write.csv(df, paste0('IMGTHLA/alignments_FINAL/', gene, '_matching_gen.csv'), quote = FALSE)
    Biostrings::writeXStringSet(finalSet, paste0('IMGTHLA/alignments_FINAL/', gene, '_all_alleles.fa'))
}

A

B

C

DRB1

DQA1

DQB1

DPA1

DPB1
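
One detail of the loop worth knowing when working with `df` in memory: `cbind()` on a mix of character and numeric vectors builds a character matrix, so `min_dist` is no longer numeric after `as.data.frame()`. A minimal sketch with invented values follows; `df_typed` is a possible alternative construction, not what the notebook does.

```r
# Toy values standing in for one gene's results
nuc_only <- c("X*01:04", "X*01:05")
matching <- c("X*01:02:06", "X*01:01:01:01")
min_dist <- c(4, 1)

# cbind() coerces everything to text, so min_dist loses its numeric type
df_cbind <- as.data.frame(cbind(nuc_only, matching, min_dist))

# data.frame() keeps each column's own type
df_typed <- data.frame(nuc_only_allele = nuc_only,
                       matching_allele = matching,
                       min_dist = min_dist)

cbind_is_numeric <- is.numeric(df_cbind$min_dist)
typed_is_numeric <- is.numeric(df_typed$min_dist)
```

If the distances are needed for filtering or summaries, `as.numeric()` on the column restores the numbers.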



# Concatenate all alleles for all genes together

To make the final database (ready for `2_make_personalized_refs`), run the following in the terminal:

`cat IMGTHLA/alignments_FINAL/*.fa > IMGTHLA_all_alleles_FINAL.fa`
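
After concatenating, a quick sanity check on the FASTA headers can catch duplicated names. The following is a minimal base-R sketch that runs on a tiny stand-in file; `fa_path` and the records written to it are placeholders, and in practice `fa_path` would point to `IMGTHLA_all_alleles_FINAL.fa`.

```r
# Write a tiny stand-in FASTA so the sketch is runnable on its own
fa_path <- tempfile(fileext = ".fa")
writeLines(c(">IMGT_X*01:01:01:01", "ACGTACGT", "ACGT",
             ">IMGT_X*01:04:matching_gen", "ACGTACGA"), fa_path)

# Header lines start with '>'; strip it to get the record names
lines <- readLines(fa_path)
headers <- sub("^>", "", lines[startsWith(lines, ">")])

n_records <- length(headers)                              # total number of sequences
dup_names <- headers[duplicated(headers)]                 # names that occur more than once
n_inferred <- sum(endsWith(headers, ":matching_gen"))     # sequences borrowed from a nearest allele
```

An empty `dup_names` suggests that each allele name appears only once in the combined database.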

# Done!

In [2]:
sessionInfo()

R version 4.0.5 (2021-03-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.5 (Santiago)

Matrix products: default
BLAS/LAPACK: /PHShome/jbk37/anaconda3/envs/hla_new/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] stringi_1.7.8       stringr_1.4.0       tidyr_1.2.0        
 [4] readr_2.1.2         purrr_0.3.4         dplyr_1.0.8        
 [7] Biostrings_2.58.0   XVector_0.30.0      IRanges_2.24.1     
[10] S4Vectors_0.28.1    BiocGenerics_0.36.1
