kir-mapper

Castelli EC et al. kir-mapper: A Toolkit for Killer-Cell Immunoglobulin-Like Receptor (KIR) Genotyping From Short-Read Second-Generation Sequencing Data. HLA 2025 Mar;105(3):e70092. doi: 10.1111/tan.70092

Version 1.01 (December, 2024)

Author: Erick C. Castelli (erick.castelli@unesp.br)

What is kir-mapper?

kir-mapper is a toolkit for calling SNPs, alleles, and haplotypes for KIR genes from short-read second-generation sequencing (NGS) data. kir-mapper supports both single-end and paired-end Illumina sequencing data. It may also work with Ion Torrent data after some adjustments. The toolkit provides methods for:

  1. Getting unbiased alignments in the context of the hg38 reference genome.
  2. Estimating copy numbers.
  3. Calling SNPs and InDels across the KIR genes in the context of the hg38 reference genome.
  4. Calling KIR alleles, with reports listing potential new SNPs.
  5. Inferring haplotypes within KIR genes, and among KIR genes.

Summary

What is kir-mapper

Important notes

Install

-- Configuring kir-mapper environment using Conda/Miniconda

-- Installing using Docker

-- Installing everything by yourself

kir-mapper configuration

Quick reference for kir-mapper usage

-- Aligning reads to the hg38 reference genome - map

-- Estimating copy numbers - ncopy

-- Calling SNPs and alleles - genotype

-- Calling haplotypes and solving ambiguities - haplotype

Other methods

Practical notes

Support

Version history

Manual

Important notes:

Data compatibility: We tested kir-mapper with Illumina short-read data from whole-genome sequencing (WGS), whole-exome sequencing (WES), and targeted sequencing. It might work with Ion Torrent with some adjustments.

System compatibility: MacOS (Intel), Linux, or WSL2/Linux. We have tested it with MacOS 10.15, Ubuntu 22.04 LTS, and Ubuntu 22.04 LTS under WSL2. Other versions might be compatible. For MacOS, we tested only with Intel Macs.

Read depth: Please note that read depth is essential. We recommend coverage of at least 20x for WGS and 50x for WES.

Read size: You will get much better results with paired-end sequencing and reads longer than 100 nucleotides. kir-mapper is also compatible with single-end sequencing data. The pipeline might produce biased results with shorter reads (< 100 nucleotides).

Sample size: The minimum sample size we tested is 50 samples. The sample size is essential to get accurate estimations for copy numbers.

Always indicate the full path for any input file or output folder.
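
As a quick check of the read-size recommendation above, the mean read length of a gzipped FASTQ file can be estimated with standard tools. This is a minimal sketch, not part of kir-mapper; the function name and the file name in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Report the mean read length of a gzipped FASTQ file.
# Every FASTQ record has four lines; the sequence is line 2 of each record.
mean_read_length() {
    gzip -dc "$1" | awk 'NR % 4 == 2 { total += length($0); n++ }
                         END { if (n > 0) printf "%.1f\n", total / n; else print "0.0" }'
}

# Example (hypothetical file name):
# mean_read_length /home/USER/fastq/sample.R1.fastq.gz
```

A value well below 100 suggests that the results should be interpreted with caution.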

Back to Summary

Install

kir-mapper depends on third-party programs, including samtools, bcftools, and freebayes, and on libraries such as ZLIB and BOOST.

You can choose how to install kir-mapper and its dependencies. You can opt for Conda, Docker, or install everything by yourself. We recommend Conda.

Configuring kir-mapper environment using conda

To install kir-mapper and all its dependencies, use Conda and the kir-mapper.yml file, as follows.

This tutorial assumes that you are using Miniconda (version 3). You can adapt it as necessary.

USER represents your username. Change it to fit your username.

  1. If Conda/Bioconda is not installed, follow the Bioconda installation instructions, with a preference for Miniconda.

  2. Make sure you added the proper channels:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority true
  3. Clone the kir-mapper GitHub repo.
git clone https://github.com/erickcastelli/kir-mapper
  4. Enter the kir-mapper repository.
cd kir-mapper
  5. Now, download the latest version of the kir-mapper database and unzip it:
wget --no-check-certificate https://www.castelli-lab.net/support/kir-mapper_db_latest.zip
unzip kir-mapper_db_latest.zip
  6. Use conda to create an environment for kir-mapper using the kir-mapper.yml from the repository.
conda env create -f kir-mapper.yml
  7. Use conda to create an environment for shapeit4 using the shapeit4.yml from the repository.
conda env create -f shapeit4.yml
  8. Copy the shapeit4 binary to the kir-mapper conda environment. Replace USER with your username.
cp /home/USER/miniconda3/envs/shapeit4/bin/shapeit4 /home/USER/miniconda3/envs/kir-mapper/bin
  9. Now, activate the kir-mapper environment.
conda activate kir-mapper
  10. From the kir-mapper directory (you are already there), create a new folder named build and enter it.
mkdir build && cd build
  11. Compile kir-mapper from the build folder. If this doesn't work, try step 12.
cmake ../src/
make
  12. If step 11 doesn't work, try this.

First, delete the build folder and create a new one

cd ..
rm -rf build
mkdir build && cd build

Now, compile kir-mapper with this option, replacing USER with your username

BOOST_ROOT=/home/USER/miniconda3/envs/kir-mapper ZLIB_ROOT=/home/USER/miniconda3/envs/kir-mapper cmake ../src/
make
  13. If steps 11 and 12 failed, try this.

First, delete the build folder and create a new one

cd ..
rm -rf build
mkdir build && cd build

Now, compile kir-mapper with this option, replacing USER with your username

cmake -DBoost_INCLUDE_DIR=/home/USER/miniconda3/envs/kir-mapper/include \
      -DBoost_LIBRARY_DIR=/home/USER/miniconda3/envs/kir-mapper/lib \
      -DZLIB_ROOT=/home/USER/miniconda3/envs/kir-mapper \
      ../src
make
  14. Copy the kir-mapper binary to /usr/bin, /usr/local/bin, or the bin folder of your kir-mapper environment (e.g., /home/USER/miniconda3/envs/kir-mapper/bin). Alternatively, you can run kir-mapper from the build folder. Replace USER with your username.
cp kir-mapper /home/USER/miniconda3/envs/kir-mapper/bin/

or

cp kir-mapper /usr/local/bin
  15. Run kir-mapper. The setup process usually starts automatically. If it doesn't, you can call it by typing the following:
kir-mapper setup
  16. Follow the setup steps. kir-mapper will automatically detect most programs (BWA, samtools, bcftools, freebayes, etc.). The only exception is the path for the database (from step 5).

  17. You are all set. Don't forget to activate the kir-mapper environment before using it.

conda activate kir-mapper
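
The paths used in the steps above assume Miniconda installed at /home/USER/miniconda3. As a hedged alternative, the environment folders can be derived from `conda info --base`. The sketch below assumes that environments live under the default `envs` folder of the Conda base directory, which may not hold if `envs_dirs` was customized; `conda_env_bin` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Build the bin/ path of a Conda environment from the Conda base directory.
# Assumption: environments are stored in BASE/envs/NAME (the Conda default).
conda_env_bin() {
    # $1: Conda base directory, $2: environment name
    printf '%s/envs/%s/bin\n' "$1" "$2"
}

if command -v conda >/dev/null 2>&1; then
    base="$(conda info --base)"
    src="$(conda_env_bin "$base" shapeit4)/shapeit4"
    dest="$(conda_env_bin "$base" kir-mapper)"
    # Print the command instead of running it; remove "echo" to perform the copy.
    echo cp "$src" "$dest/"
else
    echo "conda not found on PATH" >&2
fi
```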

Back to Summary

Installing using Docker

Here, we assume that Docker is already installed in your system.

  1. Clone the kir-mapper GitHub repo.
git clone https://github.com/erickcastelli/kir-mapper
  2. Enter the kir-mapper repository.
cd kir-mapper
  3. Build the kir-mapper image. This might take a while.
docker build -t kir-mapper -f Dockerfile .
or
sudo docker build -t kir-mapper -f Dockerfile .
  4. Once completed, you may run the kir-mapper Docker image like this:
docker run -it kir-mapper
or
sudo docker run -it kir-mapper
  5. Now, type kir-mapper. The program should be available, with all dependencies and the database already set.
kir-mapper
  6. With this method, Docker has already downloaded the database, and the software is configured. There is no need to run kir-mapper setup.

  7. To run Docker with access to the host system, you may use a command like the one below. Here, /home/USER is the host folder containing the data you want to process, and /data is the path where that folder is available inside the container.

docker run -it -v /home/USER:/data kir-mapper
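
With a mount like the one above, files must be referenced inside the container by their /data path, not by their host path. The helper below is a small sketch of that translation; `to_container_path` is a hypothetical name and the file path is only an example.

```shell
#!/usr/bin/env bash
# Translate a host path into the path seen inside the container,
# given the two sides of a "-v HOST_DIR:CONTAINER_DIR" mount.
to_container_path() {
    host_dir="${1%/}"; container_dir="${2%/}"; path="$3"
    case "$path" in
        "$host_dir"/*) printf '%s/%s\n' "$container_dir" "${path#"$host_dir"/}" ;;
        *) echo "not under $host_dir: $path" >&2; return 1 ;;
    esac
}

to_container_path /home/USER /data /home/USER/fastq/R1.fastq.gz
# prints /data/fastq/R1.fastq.gz
```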

Back to Summary



Installing everything by yourself

kir-mapper depends on a list of libraries and third-party programs, as follows:

  • boost 1.74
  • cmake >= 3.26.4
  • make >= 4.3
  • zlib >= 1.2.13
  • R >= 4.2
  • R ggplot2, plotly, htmlwidgets, stringr, shiny, dplyr, forcats, pacman
  • gcc/cpp compiler >= 11
  • bwa 0.7.17
  • freebayes 1.3.6
  • samtools 1.19.2
  • bcftools 1.19
  • STAR 2.7.10b or 2.7.11a
  • whatshap 2.2
  • shapeit4 4.2.2
  • picard-tools 3.3.0
  • java >= 11
  • bgzip and tabix
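
Before compiling, it may help to check which of the programs listed above are visible on PATH. This is a minimal sketch; the executable names are assumptions (for example, Picard is often distributed as a .jar file rather than a command, so it is not checked here), and version requirements are not verified.

```shell
#!/usr/bin/env bash
# Print the names of the given programs that are not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names are assumptions; some installations use different names.
missing="$(missing_tools cmake make R gcc bwa freebayes samtools bcftools STAR whatshap shapeit4 java bgzip tabix)"
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed programs were found on PATH."
fi
```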

To compile the program yourself, assuming that everything above is available, follow these steps:

  1. Clone the kir-mapper GitHub repo.
git clone https://github.com/erickcastelli/kir-mapper
  2. Enter the kir-mapper repository.
cd kir-mapper
  3. Create a new folder named build and enter it.
mkdir build && cd build
  4. Compile the program.
cmake ../src/
make
  5. Now, download the latest version of the kir-mapper database:
wget --no-check-certificate https://www.castelli-lab.net/support/kir-mapper_db_latest.zip
  6. Unzip the database.
unzip kir-mapper_db_latest.zip
  7. Run kir-mapper. The setup process usually starts automatically. If it doesn't, you can call it by typing the following:
kir-mapper setup
  8. Follow the setup steps. kir-mapper will automatically detect most programs if they are available in the system. The only exception is the path for the database (from steps 5 and 6). You can also indicate the path for each binary.

Back to Summary



kir-mapper configuration

If you followed any installation mode described above, kir-mapper is already configured. If it is not, you can configure it by typing kir-mapper setup

Remember, you need a copy of the kir-mapper database to run any analysis.

wget --no-check-certificate https://www.castelli-lab.net/support/kir-mapper_db_latest.zip
unzip kir-mapper_db_latest.zip

kir-mapper uses a hidden plain-text configuration file in your home folder (~/.kir-mapper) containing the path for all necessary programs. If the program does not find this file, it enters the setup mode automatically. You can also call this mode by typing kir-mapper setup

kir-mapper setup

Follow the instructions provided to indicate the path of all necessary programs. kir-mapper might find the programs automatically. The exceptions are the database and the PICARD jar file.

The setup process will save the configuration file in your home folder. Below is an example of this file. You can edit it by using nano ~/.kir-mapper. Replace USER with your username.

db=/home/USER/kir-mapper/kir-mapper_db_latest/
samtools=/home/USER/miniconda3/envs/kir-mapper/bin/samtools
bcftools=/home/USER/miniconda3/envs/kir-mapper/bin/bcftools
bwa=/home/USER/miniconda3/envs/kir-mapper/bin/bwa
whatshap=/home/USER/miniconda3/envs/kir-mapper/bin/whatshap
freebayes=/home/USER/miniconda3/envs/kir-mapper/bin/freebayes
picard=/home/USER/miniconda3/envs/kir-mapper/bin/picard.jar
star=/home/USER/miniconda3/envs/kir-mapper/bin/STAR
shapeit4=/home/USER/miniconda3/envs/shapeit4/bin/shapeit4

If you need to use a different configuration file while running kir-mapper, please use the -config option to indicate it. Example:

kir-mapper map -config /alternative_path/.kir-mapper
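
To confirm that every path in a configuration file exists, a small check like the one below can be used. It is a sketch that assumes the file contains only key=value lines, as in the example above; `check_config` is a hypothetical helper, not a kir-mapper command.

```shell
#!/usr/bin/env bash
# Read a key=value configuration file and report entries whose path does not exist.
# Returns 1 if at least one path is missing, 0 otherwise.
check_config() {
    config="$1"
    status=0
    # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case "$key" in
            ''|'#'*) continue ;;      # skip empty lines and comment lines
        esac
        if [ ! -e "$value" ]; then
            echo "missing: $key -> $value"
            status=1
        fi
    done < "$config"
    return "$status"
}

# Check the default configuration file if it exists.
if [ -f "$HOME/.kir-mapper" ]; then
    check_config "$HOME/.kir-mapper" || true
fi
```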

Back to Summary



Quick reference for kir-mapper usage

For full details on how to use kir-mapper, please check the kir-mapper documentation (MANUAL.md).

In brief, there are four main methods that should be used in this specific order:

  • map
  • ncopy
  • genotype
  • haplotype

Typing kir-mapper will display all the functions available.

Typing kir-mapper map, for instance, will display all the options for the map function.
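
The order of the four methods can be sketched as a short dry-run script that only prints the commands. It uses only options shown in this document; the sample name and FASTQ paths are placeholders, and `print_pipeline` is a hypothetical helper. In practice, map is run once per sample, and the ncopy thresholds should be reviewed before running genotype.

```shell
#!/usr/bin/env bash
# Print the four main kir-mapper steps in the order they should be run
# for one output folder. The map step is per sample and is shown for one sample.
print_pipeline() {
    out="$1"; sample="$2"; r1="$3"; r2="$4"
    echo "kir-mapper map -r1 $r1 -r2 $r2 -sample $sample -output $out"
    echo "kir-mapper ncopy -output $out"
    echo "kir-mapper genotype -output $out"
    echo "kir-mapper haplotype -output $out"
}

print_pipeline /home/USER/output HG00096 \
    /home/USER/samples/HG00096.R1.fastq.gz /home/USER/samples/HG00096.R2.fastq.gz
```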

Aligning reads to the hg38 reference genome - map

Usage: kir-mapper map [OPTIONS]
Required:
	-r1  STRING and -r2 STRING: path to paired-end read files
		 or
	-r0  STRING: path to single-end read file
		 or
	-bam STRING: path to BAM file (reads aligned to the hg38 reference with BWA-MEM)
	
	
Optional:
	-output STRING: full path to the output folder.
	-sample STRING: name/id for the sample
	-threads INT: number of threads
	--exome: indicate that this data is WES
	(check manual for other options)

Example for a sample tagged as "test". "test" will be the name for the sample in all outputs.

# Re-aligning a BAM file
kir-mapper map -bam original_BAM.bam -sample test -output /home/USER/output 

# Aligning FASTQ 
kir-mapper map -r1 R1.fastq.gz -r2 R2.fastq.gz -sample test -output /home/USER/output 

# Aligning FASTQ from exomes 
kir-mapper map -r1 R1.fastq.gz -r2 R2.fastq.gz -sample test -output /home/USER/output --exome 

Examples using the sample data provided in /samples

kir-mapper map -r1 HG00096.R1.fastq.gz -r2 HG00096.R2.fastq.gz -sample HG00096 -output /home/USER/output 
kir-mapper map -r1 HG02461.R1.fastq.gz -r2 HG02461.R2.fastq.gz -sample HG02461 -output /home/USER/output 
kir-mapper map -bam HG00403.KIR.bam -sample HG00403 -output /home/USER/output 
kir-mapper map -bam HG01583.KIR.bam -sample HG01583 -output /home/USER/output 

When evaluating many samples simultaneously, run map for every sample, indicating the same output but different sample names (as indicated in the example above).
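
For many paired-end samples, the per-sample map commands can be generated with a loop. The sketch below assumes files named SAMPLE.R1.fastq.gz and SAMPLE.R2.fastq.gz (the naming used by the sample data) and prints the commands rather than running them; `print_map_commands` and the folder names are hypothetical.

```shell
#!/usr/bin/env bash
# Print one "kir-mapper map" command per paired-end sample found in a folder.
# Assumes files are named SAMPLE.R1.fastq.gz and SAMPLE.R2.fastq.gz.
print_map_commands() {
    fastq_dir="$1"; out="$2"
    for r1 in "$fastq_dir"/*.R1.fastq.gz; do
        [ -e "$r1" ] || continue                 # no matching files
        sample="$(basename "$r1" .R1.fastq.gz)"
        r2="$fastq_dir/$sample.R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "skipping $sample: $r2 not found" >&2
            continue
        fi
        # Same output folder for every sample, different sample names.
        echo "kir-mapper map -r1 $r1 -r2 $r2 -sample $sample -output $out"
    done
}

# Review the printed commands, then pipe them to "sh" to run them:
# print_map_commands /home/USER/samples /home/USER/output | sh
```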

The outputs from map are BAM files with reads aligned to the hg38 reference genome and gene-specific fastq files. The final BAM is the ".adjusted.bam" file when not using PICARD tools or the ".adjusted.nodup.bam" file when using PICARD tools. You can inspect/explore the BAM files using IGV.

Back to Summary

Estimating copy numbers - ncopy

Usage: kir-mapper ncopy [OPTIONS]
Required:
	-output STRING: full path to the output folder. The same used by function map.
	
Optional:
	-threads INT: number of threads
	-reference STRING: the reference with two copies for all samples - KIR3DL3, HLA-G, HLA-E, 5UPKIR
	--exome: indicate that this data is WES
	(check manual for all the optionals)

Example

kir-mapper ncopy -output /home/USER/output 
or
kir-mapper ncopy -output /home/USER/output --exome

This function will estimate the number of copies for every KIR gene and sample. The final outputs are plots in PNG and HTML format with the coverage ratio between the target gene and the selected reference. Users must evaluate the plots to define the correct thresholds and edit the thresholds.txt file accordingly. If any threshold is modified, you must run ncopy again to reflect the modifications.

To evaluate the thresholds, open the .html files inside the folder /home/USER/output/ncopy/plots in a browser. Define the thresholds that separate samples with 0, 1, 2, 3, or >3 copies, and change them in the thresholds.txt file at /home/USER/output/ncopy

Then, run ncopy again. This will update all the plots and the copy numbers for all samples.

kir-mapper ncopy -output /home/USER/output 

Alternatively, you can use the R script named kir-mapper_plot_app.R inside /home/USER/output/ncopy. This script can assist you in defining the thresholds and updating the plots. When using this script, there is no need to run ncopy again if you change any threshold.

Back to Summary

Calling SNPs and alleles - genotype

Usage: kir-mapper genotype [OPTIONS]
Required:
	-output STRING: full path to the output folder. The same used by function map and ncopy.
	
Optional:
	-threads INT: number of threads
	--full: call SNPs and Indels also in introns
	(check manual for all the optionals)

Example

kir-mapper genotype -output /home/USER/output 

This function will call SNPs and InDels across all exons from KIR genes, using freebayes and an internal algorithm to detect and remove unlikely genotypes. It also phases the variants using whatshap. Afterward, the program detects which KIR alleles are compatible with the observed variants.

The outputs are VCF files for every gene, and reports with the detected alleles for every sample, listing eventual mismatches.

  • The VCF files are placed inside /home/USER/output/genotype/[GENE_NAME]/vcf
  • The reports for each sample are placed inside /home/USER/output/genotype/[GENE_NAME]/reports
  • The summary with all allele calls is placed inside /home/USER/output/genotype/[GENE_NAME]/calls

All the SNPs are reported in the context of the hg38 reference genome. For genes that are not annotated in the primary sequence of chr19 (e.g. KIR2DL5), reads from these are aligned and reported in an alternative contig.

These are the locations for all genes in the alternative contigs:

  • KIR2DL2, chr19_KI270921v1_alt:53185-67900
  • KIR2DL5AB, chr19_KI270921v1_alt:175661-185557
  • KIR2DS1, chr19_KI270921v1_alt:204223-218437
  • KIR2DS2, chr19_KI270921v1_alt:36890-51500
  • KIR2DS3, chr19_KI270921v1_alt:81118-95700
  • KIR2DS5, chr19_KI270890v1_alt:36829-52100
  • KIR3DP1, chr19_KI270923v1_alt:61981-67693
  • KIR3DS1, chr19_KI270921v1_alt:159375-174162
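
When inspecting results for these genes, the coordinates above can be passed as a region to tools such as samtools. The lookup below is a minimal sketch; `kir_alt_region` is a hypothetical helper, the BAM file name is a placeholder, and the BAM must be indexed for region queries.

```shell
#!/usr/bin/env bash
# Look up the hg38 alternative-contig region for a KIR gene
# (coordinates copied from the list above).
kir_alt_region() {
    case "$1" in
        KIR2DL2)   echo "chr19_KI270921v1_alt:53185-67900" ;;
        KIR2DL5AB) echo "chr19_KI270921v1_alt:175661-185557" ;;
        KIR2DS1)   echo "chr19_KI270921v1_alt:204223-218437" ;;
        KIR2DS2)   echo "chr19_KI270921v1_alt:36890-51500" ;;
        KIR2DS3)   echo "chr19_KI270921v1_alt:81118-95700" ;;
        KIR2DS5)   echo "chr19_KI270890v1_alt:36829-52100" ;;
        KIR3DP1)   echo "chr19_KI270923v1_alt:61981-67693" ;;
        KIR3DS1)   echo "chr19_KI270921v1_alt:159375-174162" ;;
        *) echo "no alternative-contig region listed for $1" >&2; return 1 ;;
    esac
}

# Example: print a samtools command for the reads overlapping KIR2DL2.
# SAMPLE.adjusted.bam is a placeholder file name.
region="$(kir_alt_region KIR2DL2)"
echo "samtools view -b -o KIR2DL2.bam SAMPLE.adjusted.bam $region"
```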

Sometimes, kir-mapper genotype reports ambiguities, i.e., more than one combination of alleles that fits the observed genotypes. The next method, kir-mapper haplotype, may resolve these ambiguities.

Back to Summary

Calling haplotypes and solving ambiguities - haplotype

Usage: kir-mapper haplotype [OPTIONS]
Required:
	-output STRING: full path to the output folder. The same used by function map, ncopy, genotype.
	
Optional:
	-threads INT:   number of threads
	--centromeric:  calling haplotype only on the centromeric genes
	--telomeric:    calling haplotype only on the telomeric genes
	(check manual for all the optionals)

Example. Note that this function only works with larger sample sizes.

kir-mapper haplotype -output /home/USER/output --centromeric

This function will call full haplotypes (all variants will be phased) using shapeit4. Afterward, it generates the predicted sequences for each gene and sample, comparing them with those in the IPD-IMGT/KIR database.

The outputs are phased VCFs and reports with the detected alleles for every sample.

All the SNPs are reported in the context of the hg38 reference genome. For genes that are not annotated in the primary sequence of chr19 (e.g. KIR2DL5), reads are aligned and reported in an alternative contig.

Back to Summary



Practical notes

Custom database, with alleles that are not in the IPD-IMGT/KIR database

For now, it is not possible to add new alleles to kir-mapper. We will update the database regularly. Please contact the author if you need to add something.

Evaluating samples from different ancestry backgrounds

We do not recommend applying kir-mapper ncopy to a mix of samples from different populations. The thresholds are quite different among Europeans, Africans, and Asians, for instance. In our tests, we applied kir-mapper map and kir-mapper ncopy to European and African samples separately. After that, we grouped all samples by using kir-mapper group to create a kir-mapper output with all samples before running kir-mapper genotype.

Version history

Version 1.01, December 2024

New features:

  • Included support for calling an alternative configuration file (-config).
  • Included the Dockerfile and support for Docker.
  • Picard tools is automatically detected by the setup function.

Version 1.0, November 2024, first kir-mapper release.

Support

Create a GitHub issue.
