[MRG] Adding threshold-bp and scaled relationship to faqs (#2930)
This pull request responds to and fixes #2929.

I have added a short discussion of threshold-bp, as referenced in other
documents, along with how I understand its function.
ccbaumler committed Jan 17, 2024
1 parent bc00e28 commit ccb1d49
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions doc/faq.md
@@ -141,6 +141,25 @@ them with a lower scaled value.

Please also see [What resolution should my signatures be?](using-sourmash-a-guide.md#what-resolution-should-my-signatures-be-how-should-i-create-them).

## What threshold-bp value should I use with `sourmash prefetch` and `sourmash gather`?

The parameter `--threshold-bp` sets the minimum estimated overlap, in base
pairs, required to report a match in both the `gather` and `prefetch` commands.
The default is 50 kb, which works well for microbial-genome-scale work, where
the genomes are often quite large (one or more megabases).

If you need more sensitivity, setting `--threshold-bp=0` will return any
match that shares at least one hash. However, this will also increase the
number of potential false positives.

We have found that a good intermediate threshold is 3 times the `scaled` value,
e.g. `--threshold-bp=3000` for a scaled value of 1000. This requires at least
three overlapping hashes before a match is reported. If you are using a lower
scaled value (a higher density sketch) because you are looking for matches
between shorter sequences, then setting `--threshold-bp` to 3 times that scaled
value will take advantage of the increased sensitivity to short matches without
introducing more false positives.
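
The arithmetic behind this rule of thumb can be sketched in a few lines of
Python. This is only an illustration of the relationship described above, not
sourmash's own code: the helper names `approx_min_hashes` and
`suggested_threshold_bp` are hypothetical, and the sketch assumes that each
hash stands for roughly `scaled` base pairs and that at least one shared hash
is always needed for a match.

```python
def approx_min_hashes(threshold_bp: int, scaled: int) -> int:
    """Approximate number of shared hashes implied by a threshold-bp value.

    Hypothetical helper: assumes one hash represents about `scaled` bp,
    and that a match always needs at least one shared hash.
    """
    if scaled <= 0:
        raise ValueError("scaled must be a positive integer")
    if threshold_bp < 0:
        raise ValueError("threshold_bp must be non-negative")
    # Integer ceiling division, so a partial hash counts as a whole one.
    hashes = -(-threshold_bp // scaled)
    return max(1, hashes)


def suggested_threshold_bp(scaled: int, min_hashes: int = 3) -> int:
    """Rule-of-thumb threshold: `min_hashes` times the scaled value."""
    if scaled <= 0 or min_hashes < 1:
        raise ValueError("scaled and min_hashes must be positive integers")
    return min_hashes * scaled


if __name__ == "__main__":
    for scaled in (1000, 100):
        bp = suggested_threshold_bp(scaled)
        print(f"scaled={scaled}: --threshold-bp={bp} "
              f"(~{approx_min_hashes(bp, scaled)} hashes)")
```

Under the same assumptions, the default of 50 kb at a scaled value of 1000
corresponds to roughly 50 shared hashes, which is why lowering the threshold
matters most when the expected overlaps are short.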

## How do k-mer-based analyses compare with read mapping?

tl;dr very well! But it's a bit one sided: if k-mers match, reads will