Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sbt_search error: mismatch in max hash; comparison fail #146

Closed
brooksph opened this issue Mar 13, 2017 · 8 comments · Fixed by #183
Closed

sbt_search error: mismatch in max hash; comparison fail #146

brooksph opened this issue Mar 13, 2017 · 8 comments · Fixed by #183
Labels

Comments

@brooksph
Copy link
Contributor

brooksph commented Mar 13, 2017

When using sbt_search with microbes.k31.sbt.json sourmash spits out an error. I calculated a signature for a single genome using sourmash compute -k 31 --scaled 10000 *fna.gz. The output of this works fine with sbt_gather but fails with sbt_search.

../dev/sourmash sbt_search -k 31 --dna microbes.k31.sbt.json ../fasta/herpetosiphon_geysericola.fa.gz.sig --threshold=0.001 --best-only

# running sourmash subcommand: sbt_search
loaded query: herpetosiphon_geysericola.fa.g... (k=31, DNA)
Traceback (most recent call last):
  File "/Users/philliptbrooks/dev/sourmash/sourmash", line 5, in <module>
    main()
  File "/Users/philliptbrooks/dev/sourmash/sourmash_lib/__main__.py", line 56, in main
    cmd(sys.argv[2:])
  File "/Users/philliptbrooks/dev/sourmash/sourmash_lib/commands.py", line 572, in sbt_search
    for leaf in tree.find(search_fn, query, args.threshold):
  File "/Users/philliptbrooks/dev/sourmash/sourmash_lib/sbt.py", line 155, in find
    if search_fn(node_g, *args):
  File "/Users/philliptbrooks/dev/sourmash/sourmash_lib/sbtmh.py", line 37, in search_minhashes
    matches = node.data.estimator.count_common(sig.estimator)
  File "/Users/philliptbrooks/dev/sourmash/sourmash_lib/__init__.py", line 179, in count_common
    return self.mh.count_common(other.mh)
  File "sourmash_lib/_minhash.pyx", line 100, in sourmash_lib._minhash.MinHash.count_common (sourmash_lib/_minhash.cpp:2698)
ValueError: mismatch in max_hash; comparison fail

@ctb
Copy link
Contributor

ctb commented Mar 13, 2017 via email

@ctb
Copy link
Contributor

ctb commented Mar 17, 2017

bump.

@brooksph
Copy link
Contributor Author

@ctb I think this is a bug in sbt_search unless sbt_search and sbt_gather use scaled signatures in a different manner. I'm using the same sbt with sbt_search and sbt_gather. When I calculate a signature with --scaled 10000 and use sbt_search I get an error but sbt_gather returns the best match as expected. In contrast, if I don't calculate the signature with scaled sbt_search returns several matches. Same behavior with the -k 21, 31, and 51 sbt's. I will poke @luizirber about this.

@ctb
Copy link
Contributor

ctb commented Mar 19, 2017

sounds reasonable. can you build a small test case (i.e. using only a few .sig files and putting them in an SBT) that generates the error?

I bet if you calculate the signatures with --with-cardinality and then build an SBT with them, sbt_search with a --scaled signature will fail and sbt_gather will succeed.

@brooksph
Copy link
Contributor Author

@ctb worked just like that. Also, gather doesn't work when the query signatures are calculated with cardinality. Notebook https://github.com/brooksph/2017_sbt_search_project/blob/master/dev/Untitled.ipynb
and signatures https://github.com/brooksph/2017_sbt_search_project/tree/master/dev

@ctb
Copy link
Contributor

ctb commented Apr 19, 2017

Hi @brooksph could you check out #183 and see if it works on the full microbes SBT? thx

@luizirber luizirber moved this from TODO to Done in sourmash 2.0 May 19, 2017
@jolespin
Copy link

I'm getting a similar error. How can I fix this?

(metagenomics_env) -bash-4.1$ rm scaffolds.drop_bacteria.fasta.sig; sourmash compute --scaled 2000 --dna --seed 0 scaffolds.drop_bacteria.fasta
== This is sourmash version 2.0.1. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

setting num_hashes to 0 because --scaled is set
computing signatures for files: scaffolds.drop_bacteria.fasta
Computing signature for ksizes: [21, 31, 51]
Computing only nucleotide (and not protein) signatures.
Computing a total of 3 signature(s).
... reading sequences from scaffolds.drop_bacteria.fasta
calculated 3 signatures for 738 sequences in scaffolds.drop_bacteria.fasta
saved 3 signature(s). Note: signature license is CC0.
(metagenomics_env) -bash-4.1$ sourmash search -k 21 -o sourmash_output scaffolds.drop_bacteria.fasta.sig /usr/local/scratch/METAGENOMICS/jespinoz/db/sourmash_db/genbank-d2-k21.sbt.json --threshold 0
== This is sourmash version 2.0.1. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loaded query: scaffolds.drop_bacteria.fasta... (k=21, DNA)
loaded 1 databases.

Traceback (most recent call last):
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/bin/sourmash", line 11, in <module>
    sys.exit(main())
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/lib/python2.7/site-packages/sourmash/__main__.py", line 83, in main
    cmd(sys.argv[2:])
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/lib/python2.7/site-packages/sourmash/commands.py", line 857, in search
    args.best_only, args.ignore_abundance)
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/lib/python2.7/site-packages/sourmash/search.py", line 62, in search_databases
    for leaf in tree.find(search_fn, tree_query, threshold):
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/lib/python2.7/site-packages/sourmash/sbt.py", line 213, in find
    if search_fn(node_g, *args):
  File "/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/lib/python2.7/site-packages/sourmash/sbtmh.py", line 119, in search_minhashes
    score = node.data.minhash.similarity(sig.minhash)
  File "sourmash/_minhash.pyx", line 359, in sourmash._minhash.MinHash.similarity
  File "sourmash/_minhash.pyx", line 338, in sourmash._minhash.MinHash.jaccard
  File "sourmash/_minhash.pyx", line 333, in sourmash._minhash.MinHash.compare
  File "sourmash/_minhash.pyx", line 321, in sourmash._minhash.MinHash.intersection
ValueError: mismatch in seed; comparison fail

@ctb
Copy link
Contributor

ctb commented Jul 25, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

4 participants