Spaced seeds improve k-mer-based metagenomic classification
This repository contains all supplementary material for the paper "Spaced seeds improve k-mer-based metagenomic classification" by K. Brinda, M. Sykulski, and G. Kucherov. The current version is available at http://arxiv.org/abs/1502.06256.
The Snakemake scripts used in Sections 3.1.2 (Classifying unaligned reads) and 3.3 (Correlation on real genomes) are available here.
- rank.cor.seed..weight.*.pdf - Spearman rank correlation between alignment (dis)similarity and score (hits or coverage), for alignment (read) lengths 100 and 250 and various spaced seed weights
- relative.mutual.information..seed.weight.*.pdf - mutual information divided by entropy, plotted as a measure of interdependence between alignment (dis)similarity and score (hits or coverage)
- smooth.scatter..spaced.vs.contig.pdf - scatter plots of alignment (dis)similarity vs score (hits or coverage), alignment length 100
- smooth.scatter..spaced.vs.contig.zoom.rl100.pdf - as above, zoomed in on a region
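
For readers who want to see what the two statistics in these plots measure, here is a minimal, standard-library-only sketch. It is not the code used to produce the PDFs. The function names are hypothetical, and normalising the mutual information by the entropy of the first variable is an assumption, since the file description does not say which entropy is used.

```python
import math
from collections import Counter


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def relative_mutual_information(x, y):
    """I(X;Y) / H(X) for discrete samples (choice of H(X) is an assumption)."""
    h_x = entropy(x)
    mutual_information = h_x + entropy(y) - entropy(list(zip(x, y)))
    return mutual_information / h_x
```

Here `x` would hold the alignment (dis)similarity values and `y` the corresponding scores (hits or coverage). Continuous values need to be binned before calling `relative_mutual_information`.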
Three report files with scatter plots of alignment (dis)similarity vs score (hits or coverage); the plots come in several flavors and cover experiments on 3 real genomes.
Plots comparing seed-Kraken with original Kraken, performance and sensitivity on several data sets, spaced seeds of various weights and spans, tables with all results and used seeds.
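
To make the terms "weight", "span", "hits" and "coverage" concrete, the sketch below computes them for a spaced seed on a gapless alignment. It is an illustration only, not code from seed-Kraken. The function names and the binary-string encoding are assumptions: in the seed, `1` marks a position that must match and `0` a don't-care position; in the alignment, `1` marks a matching column and `0` a mismatch. Coverage is taken here to be the number of alignment columns that fall under a `1` of at least one hit.

```python
def seed_weight(seed):
    """Weight: the number of must-match ('1') positions in the seed."""
    return seed.count("1")


def seed_span(seed):
    """Span: the total length of the seed, don't-care positions included."""
    return len(seed)


def seed_hits(seed, alignment):
    """Return the start positions at which the seed hits the alignment."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    return [
        start
        for start in range(len(alignment) - len(seed) + 1)
        if all(alignment[start + i] == "1" for i in care)
    ]


def hit_count(seed, alignment):
    """Number of seed hits in the alignment."""
    return len(seed_hits(seed, alignment))


def coverage(seed, alignment):
    """Number of alignment columns under a '1' position of at least one hit."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    covered = {start + i for start in seed_hits(seed, alignment) for i in care}
    return len(covered)
```

A contiguous k-mer is the special case of a seed made only of `1`s, so the same functions can compare spaced and contiguous seeds of equal weight on the same alignment.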