Add a tutorial of detecting specific pathogen in sequencing data #31

Closed
shenwei356 opened this issue Jun 8, 2023 · 1 comment
Labels: documentation (Improvements or additions to documentation)


Sample data:

Creating a KMCP database:

# split reference genomes into 10 chunks with 150-bp overlaps
kmcp compute -k 21 -n 10 -l 150 -I refs/ -O refs-n10-l150

# index with a small FPR for small genomes
kmcp index -f 0.001 -I refs-n10-l150/ -O refs.kmcp
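
To make the -n 10 -l 150 options more concrete, here is a rough sketch of what splitting a genome into 10 overlapping chunks could look like. The splitting rule below (equal steps, each chunk extended by the overlap) is an assumption for illustration and not necessarily KMCP's exact algorithm. chunk_ranges is a hypothetical helper, and the genome length of 29,903 bp is that of NC_045512.2, which is not stated in this issue.

```shell
# chunk_ranges LEN N OVERLAP
# Hypothetical helper: print 1-based chunk number, start and end (tab-separated).
# Assumed rule (not taken from KMCP): split into N equal steps and extend
# every chunk by OVERLAP bases, clipped at the end of the genome.
chunk_ranges() {
    awk -v len="$1" -v n="$2" -v ovl="$3" 'BEGIN {
        step = int((len + n - 1) / n)          # ceiling of len / n
        for (i = 0; i < n; i++) {
            start = i * step + 1
            end = (i + 1) * step + ovl
            if (end > len) end = len           # clip the last chunk(s)
            printf "%d\t%d\t%d\n", i + 1, start, end
        }
    }'
}

# 29903 bp genome, 10 chunks, 150-bp overlaps
chunk_ranges 29903 10 150
```

Under this assumed rule, neighbouring chunks share 150 bases, so a 150-bp read spanning a chunk boundary still falls entirely inside one chunk.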

Searching reads against the KMCP database:

kmcp search -d refs.kmcp/ testdata.fq.gz -o testdata.fq.gz.kmcp.tsv.gz

23:19:42.530 [INFO] processed queries: 676694, speed: 32.606 million queries per minute
23:19:42.530 [INFO] 8.0837% (54702/676694) queries matched
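
The matched percentage in the log can be re-derived from the two counts it reports, which is a handy sanity check when comparing samples. A minimal sketch; match_rate is a hypothetical helper and not part of KMCP:

```shell
# Counts copied from the kmcp search log above.
matched=54702     # queries with at least one match
total=676694      # processed queries

# match_rate MATCHED TOTAL
# Hypothetical helper: print the matched fraction as a percentage
# with four decimals, the same precision the log uses.
match_rate() {
    awk -v m="$1" -v t="$2" 'BEGIN { printf "%.4f%%\n", 100 * m / t }'
}

match_rate "$matched" "$total"
```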

Profiling:

# --level strain is used when no taxonomy is given.
# some preset profiling modes are available.
kmcp profile --level strain testdata.fq.gz.kmcp.tsv.gz \
    | tee profile.tsv

csvtk cut -t -f ref,percentage,coverage,score,chunksFrac,reads profile.tsv \
    | csvtk pretty -t
ref           percentage   coverage     score    chunksFrac   reads
-----------   ----------   ----------   ------   ----------   -----
NC_045512.2   100.000000   275.461793   100.00   1.00         54702

In this table, coverage is the vertical coverage (sequencing depth), score is a similarity score, and chunksFrac is the fraction of genome chunks with matches, which reflects the horizontal coverage (breadth) of the genome.
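
These columns can be used to decide whether a target pathogen counts as detected. The following is a minimal sketch that needs only awk. detect_refs is a hypothetical helper, and the thresholds (chunksFrac of at least 0.8, coverage of at least 10) are illustrative assumptions, not recommendations from KMCP. The sample file simply reproduces the row shown above.

```shell
# Recreate the six-column table shown above as a small TSV file.
profile_tsv=$(mktemp)
printf 'ref\tpercentage\tcoverage\tscore\tchunksFrac\treads\n' > "$profile_tsv"
printf 'NC_045512.2\t100.000000\t275.461793\t100.00\t1.00\t54702\n' >> "$profile_tsv"

# detect_refs FILE MIN_CHUNKS_FRAC MIN_COVERAGE
# Hypothetical helper: print ref, coverage and chunksFrac for every
# reference that passes both thresholds. Columns are looked up by
# header name, so the column order in FILE does not matter.
detect_refs() {
    awk -F '\t' -v min_frac="$2" -v min_cov="$3" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $col["chunksFrac"] >= min_frac && $col["coverage"] >= min_cov {
            print $col["ref"] "\t" $col["coverage"] "\t" $col["chunksFrac"]
        }' "$1"
}

# Illustrative thresholds only.
detect_refs "$profile_tsv" 0.8 10
```

Requiring a high chunksFrac guards against calls where reads pile up on one small region of the genome, which would give a high depth but a low breadth.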

shenwei356 added the documentation label on Jun 8, 2023

shenwei356 commented Jun 9, 2023

Added: https://bioinf.shenwei.me/kmcp/tutorial/detecting-pathogens/

KMCP v0.9.3 or later is required; that release fixed a bug in chunk computation when splitting circular genomes.
