Quick and dirty Python script to plot the different types of mutations by sample. The input is a .vcf file.
Dependencies:
- argparse, to parse CLI arguments
- scikit-allel, to parse .vcf files
- pandas and numpy, to create and manipulate data tables
- matplotlib, to plot the results
Each genotype is classified as one of:
- homref: homozygous position that matches the reference sequence
- het: heterozygous position
- homalt: homozygous position that does not match the reference sequence
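The snippet below is a minimal sketch of how these three classes could be counted per sample. It assumes genotypes are held in a NumPy integer array of shape (variants, samples, 2), with allele 0 as the reference and -1 for a missing allele. This is the layout scikit-allel's `read_vcf` returns under `calldata/GT`. The helper `count_genotype_classes` is hypothetical and is not taken from plotmut.py.

```python
import numpy as np


def count_genotype_classes(gt):
    """Count homref, het and homalt calls per sample (hypothetical helper).

    Assumes gt has shape (n_variants, n_samples, 2), allele 0 is the
    reference and -1 marks a missing allele.
    """
    gt = np.asarray(gt)
    a, b = gt[..., 0], gt[..., 1]
    called = (a >= 0) & (b >= 0)            # both alleles were called
    homref = called & (a == 0) & (b == 0)   # 0/0
    het = called & (a != b)                 # e.g. 0/1 or 1/2
    homalt = called & (a == b) & (a > 0)    # e.g. 1/1
    # Sum over the variant axis to get one count per sample.
    return {
        "homref": homref.sum(axis=0),
        "het": het.sum(axis=0),
        "homalt": homalt.sum(axis=0),
    }


# Two variants (rows) by two samples (columns); [-1, -1] is a missing call.
gt = np.array([
    [[0, 0], [0, 1]],
    [[1, 1], [-1, -1]],
])
counts = count_genotype_classes(gt)
print({k: v.tolist() for k, v in counts.items()})
# {'homref': [1, 0], 'het': [0, 1], 'homalt': [1, 0]}
```

Missing calls fall into none of the three classes, so they do not inflate any of the counts.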
The script takes two positional arguments: the input .vcf file, followed by the stem name of the output files. For example:
python plotmut.py input.vcf outstem
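The script parses its arguments with argparse. Below is a minimal sketch of an equivalent parser. The destination names `vcf` and `outstem` are assumptions and may not match the names plotmut.py uses.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments (sketch; names are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Plot the different types of mutations by sample."
    )
    parser.add_argument("vcf", help="input .vcf file")
    parser.add_argument("outstem", help="stem name of the output files")
    return parser.parse_args(argv)


# Passing a list simulates: python plotmut.py input.vcf outstem
args = parse_args(["input.vcf", "outstem"])
print(args.vcf, args.outstem)  # input.vcf outstem
```

Calling `parse_args()` with no argument reads `sys.argv` instead, which is what happens when the script runs from the command line.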
The script writes two tab-separated tables: one with the raw counts of the different mutations, and one with their ratios. The homref, het and homalt ratios are relative to all positions. The ratios in the remaining columns are relative to the number of mutations (i.e. homref is excluded). The total and variable positions are counted for each sample and used to compute these ratios.
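The ratio calculation described above can be sketched with pandas. Several details are assumptions for illustration and are not taken from plotmut.py: the per-sample count table, its column names (including mutation labels such as `A>G`), and the use of het + homalt as the number of variable positions.

```python
import pandas as pd

# Hypothetical raw-count table: one row per sample.
counts = pd.DataFrame(
    {
        "homref": [90, 80],
        "het": [6, 12],
        "homalt": [4, 8],
        "A>G": [7, 15],
        "C>A": [3, 5],
    },
    index=["sample1", "sample2"],
)

genotype_cols = ["homref", "het", "homalt"]
mutation_cols = [c for c in counts.columns if c not in genotype_cols]

# All positions vs. variable positions (homref excluded), per sample.
total = counts[genotype_cols].sum(axis=1)
variable = counts["het"] + counts["homalt"]

# Divide each row by its own sample's denominator (axis=0 aligns on the index).
ratios = pd.concat(
    [
        counts[genotype_cols].div(total, axis=0),
        counts[mutation_cols].div(variable, axis=0),
    ],
    axis=1,
)
print(ratios.round(2))
```

Either table could then be written as tab-separated text with `DataFrame.to_csv(path, sep="\t")`.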
The script also writes .pdf files showing the counts, the ratios of homozygous and heterozygous positions relative to all positions, and the ratios of the different transitions and transversions relative to variable positions.
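Transitions are substitutions within the same base class: purine to purine (A and G) or pyrimidine to pyrimidine (C and T). Transversions swap between the two classes. The hypothetical `classify_snp` below sketches that split. The README does not say how plotmut.py groups individual substitutions, for example whether strand-complementary pairs such as A>G and T>C are merged.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    # Reject identical bases and anything that is not a single A/C/G/T base.
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(f"{ref}>{alt}", classify_snp(ref, alt))
# A>G transition
# C>T transition
# A>C transversion
# G>T transversion
```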
Limitations:
- Not very fast
- Only uses diploid sites and SNPs (indels and MNPs are excluded)
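The filters implied by the last limitation can be sketched on plain VCF text fields: REF/ALT allele strings and GT strings such as `0/1` or `1|1`. This is an illustration only. plotmut.py reads the file through scikit-allel, and the helper names `is_snp_site` and `is_diploid_call` are hypothetical.

```python
import re

BASES = set("ACGT")


def is_snp_site(ref, alts):
    """Hypothetical filter: True if REF and every ALT allele is a single base.

    Indels (alleles of different lengths) and MNPs (multi-base alleles of
    equal length) fail the length check; symbolic alleles such as '*' or
    '<DEL>' fail the base check.
    """
    return all(len(a) == 1 and a.upper() in BASES for a in [ref, *alts])


def is_diploid_call(gt):
    """Hypothetical filter: True if a GT string like '0/1' has two alleles."""
    return len(re.split(r"[/|]", gt)) == 2


print(is_snp_site("A", ["G"]))    # True
print(is_snp_site("A", ["AT"]))   # False (insertion)
print(is_snp_site("AC", ["GT"]))  # False (MNP)
print(is_diploid_call("0/1"), is_diploid_call("1"))  # True False
```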