hapBlocker

(Work Related)

Algorithm to output haplotype blocks across a population in the manner of a mosaic-style plot.

Overview

Given a VCF with samples from a population, it is useful to find the genomic regions (blocks) in which samples share a high percentage of identical SNPs. This algorithm takes a population VCF and outputs these blocks in the manner of a mosaic-style plot.
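The core measurement is how similar two samples are over a stretch of SNPs. The snippet below is a minimal sketch of that comparison and is not taken from hapBlocker.py. It assumes genotypes are plain VCF GT strings such as "0/0" or "1/1", that positions where either sample has a missing call ("./.") are skipped, and that the result is a percentage from 0 to 100. The helper name `percent_identical` is hypothetical.

```python
def percent_identical(gts_a, gts_b):
    """Percentage (0-100) of compared positions where two samples carry the same genotype.

    Hypothetical helper: genotypes are VCF GT strings; positions where either
    sample has a missing call (contains '.') are not counted.
    """
    compared = 0
    identical = 0
    for a, b in zip(gts_a, gts_b):
        if "." in a or "." in b:
            continue  # skip missing calls in either sample
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0  # nothing comparable in this stretch
    return 100.0 * identical / compared


# Three comparable positions, two identical -> about 66.7
print(percent_identical(["0/0", "1/1", "0/1", "./."], ["0/0", "1/1", "1/1", "0/0"]))
```

Plain string equality treats "0/1" and "1/0" as different. How the real script handles heterozygous, phased, or missing calls is not stated here.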

'Mosaic-style' means that the first sample has a single block carrying its own haplotype for each chromosome. For each remaining sample, the algorithm compares its SNPs to those of the samples above it. If enough SNPs are identical (and the other criteria pass), the current sample declares a block with that above sample's haplotype. Precedence for comparison is given to the top sample, working downward. If no sample above passes the criteria, a block is declared with the current sample's own haplotype.
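The top-down precedence described above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual logic:

- Each chromosome is split into fixed, non-overlapping windows of `min_snps` SNPs.
- A window takes the name of the first sample above it (top first) whose identity meets the threshold.
- Consecutive windows with the same label are merged into blocks.

All function names here are hypothetical, and "the above sample's haplotype" is assumed to mean that sample's name.

```python
def percent_identical(gts_a, gts_b):
    """Percentage of non-missing positions where two genotype lists agree."""
    pairs = [(a, b) for a, b in zip(gts_a, gts_b) if "." not in a and "." not in b]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def assign_mosaic(windows, samples, min_pct, comparators):
    """Label every window of every sample with a haplotype (a sample name).

    Hypothetical layout: windows[sample] is a list of per-window genotype lists,
    and all samples have the same number of windows.
    """
    labels = {}
    for i, sample in enumerate(samples):
        # Only the first `comparators` rows above this sample are candidates.
        candidates = samples[:min(i, comparators)]
        sample_labels = []
        for w, gts in enumerate(windows[sample]):
            label = sample  # default: the sample's own haplotype
            for above in candidates:  # top sample has precedence
                if percent_identical(gts, windows[above][w]) >= min_pct:
                    label = above
                    break
            sample_labels.append(label)
        labels[sample] = sample_labels
    return labels


def merge_blocks(labels):
    """Collapse runs of equal labels into (first_window, last_window, label) blocks."""
    blocks = []
    for w, label in enumerate(labels):
        if blocks and blocks[-1][2] == label:
            blocks[-1] = (blocks[-1][0], w, label)
        else:
            blocks.append((w, w, label))
    return blocks


# Toy data: two parents and one offspring, two windows of two SNPs each.
samples = ["P1", "P2", "F1"]
windows = {
    "P1": [["0/0", "0/0"], ["0/0", "0/0"]],
    "P2": [["1/1", "1/1"], ["1/1", "1/1"]],
    "F1": [["0/0", "0/0"], ["1/1", "1/1"]],
}
labels = assign_mosaic(windows, samples, min_pct=90, comparators=2)
print(merge_blocks(labels["F1"]))  # [(0, 0, 'P1'), (1, 1, 'P2')]
```

Because the first sample has no candidates above it, all of its windows keep its own label and merge into one block, as the description requires.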

Usage & Parameters

python3 hapBlocker.py vcf.gz index.gz.[csi,tbi] samples min_snps percent-threshold comparators

Parameters (all required):

vcf.gz: gzipped VCF file. Chromosome names must contain the string 'Chr'.
index.gz: the VCF's index file, in csi or tbi format.
samples: path to a file listing the samples, one per line. Sample names must match those in the VCF file. If you plan to visualize the output, list the samples in the order they should appear in the plot.
min_snps: the number of SNPs that must be compared before a block can be declared.
percent-threshold: the percentage of identical SNPs, across the 'min_snps' compared SNPs, required to declare a block.
comparators: the number of rows, counted from the top of the 'samples' file, that each sample may be compared to. For example, if you have many samples and only want to see blocks with parental haplotypes, use 2 (assuming the parents are the first 2 samples in the 'samples' file). To compare each sample to every sample above it, use the number of samples minus 1.
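The six positional parameters and the effect of 'comparators' can be sketched as shown below. This is not the script's own argument handling. `argparse` is assumed as the parser, and the names `parse_args`, `read_samples` and `candidates_for`, the `percent_threshold` spelling, and the file names in the example are all hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the six required positional arguments (hypothetical re-creation)."""
    p = argparse.ArgumentParser(prog="hapBlocker.py")
    p.add_argument("vcf", help="gzipped VCF (vcf.gz)")
    p.add_argument("index", help="matching csi or tbi index")
    p.add_argument("samples", help="file with one sample name per line, in plot order")
    p.add_argument("min_snps", type=int, help="SNPs compared before a block can be declared")
    p.add_argument("percent_threshold", type=float, help="percentage of identical SNPs required")
    p.add_argument("comparators", type=int, help="number of top rows in the samples file to compare to")
    return p.parse_args(argv)


def read_samples(path):
    """Return sample names in file order, skipping blank lines."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def candidates_for(samples, i, comparators):
    """Rows that sample i may be compared to: the first `comparators` rows above it."""
    return samples[:min(i, comparators)]


args = parse_args(["pop.vcf.gz", "pop.vcf.gz.tbi", "samples.txt", "50", "95", "2"])
samples = ["P1", "P2", "F2_1", "F2_2"]
print(candidates_for(samples, 3, args.comparators))  # ['P1', 'P2']
```

With comparators set to the number of samples minus 1, `min(i, comparators)` is always `i`, so every sample is compared to all rows above it, which matches the behaviour described for that setting.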

Output

The blocks are written to stdout in the following format:

Chr Start End Block# Sample Haplotype

The output is tab-delimited. 'Block#' gives the position of the block in that sample's output, so the nth block written for a sample has Block# n.
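To build a mosaic plot, the stdout stream first has to be read back into per-sample block lists. The sketch below assumes exactly the six tab-separated columns listed above, integer Start, End and Block# values, and an optional header line whose first field is exactly 'Chr'. The function name `read_blocks` and the example lines are made up for illustration and are not real hapBlocker output.

```python
import csv
import io
from collections import defaultdict


def read_blocks(text):
    """Parse tab-delimited block lines into {sample: [(chr, start, end, block_no, haplotype), ...]}."""
    blocks = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0] == "Chr":
            continue  # skip blank lines and a possible header row
        chrom, start, end, block_no, sample, hap = row
        blocks[sample].append((chrom, int(start), int(end), int(block_no), hap))
    return blocks


# Made-up example lines in the documented column order.
example = "Chr01\t1\t50000\t1\tF2_1\tP1\nChr01\t50001\t90000\t2\tF2_1\tP2\n"
for chrom, start, end, n, hap in read_blocks(example)["F2_1"]:
    print(chrom, start, end, n, hap)
```

Whether Start and End are 0- or 1-based, and inclusive or exclusive, is not stated in this README, so the sketch keeps the values exactly as written.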

mcbbaker/hapBlocker

Folders and files

Latest commit

History

Repository files navigation

hapBlocker

Overview

Usage & Parameters

Output

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages