Parallelization of `sourmash search` #2071

mr-eyes · 2022-06-01T14:21:10Z

I am running an experiment that will require searching thousands of signatures (from wort), and I was wondering whether it would be possible to implement a parallel version of `sourmash search` to speed up the process. Integrating the multi-processing Rust code would be great, or alternatively adding parallelization to the current Python code.

Relevant: #2069 #2066


ctb · 2022-06-01T14:37:55Z

Hi @mr-eyes, parallel search is not yet implemented in the Rust code included in the main sourmash codebase; see greyhound (#1752) for that. It might not fit this use case anyway, since some of the individual metagenomes are quite large and the greyhound technique involves loading multiple of them into memory at once. Not sure, though.

@luizirber implemented a different approach in MAGsearch; see http://ivory.idyll.org/blog/2021-MAGsearch.html and https://blog.luizirber.org/2020/07/24/mag-results/ for background. The `sra_search` code loads many query genomes/metagenomes into memory and then does a parallel search against hundreds of thousands of signatures.
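The pattern described above (hold all queries in memory once, then fan the database out over parallel workers) can be sketched in a few lines of Python. This is a hypothetical illustration, not the `sra_search` code, which is written in Rust. Sketches are modeled as plain sets of hash values, and `containment`, `search_one` and `parallel_search` are made-up names. Each database entry is a `(name, loader)` pair so that a worker loads one signature, compares it against every query, and then drops it. Threads keep the example self-contained; in CPython, CPU-bound set intersections are limited by the GIL, so a real speedup would need processes or native threads.

```python
from concurrent.futures import ThreadPoolExecutor


def containment(query, target):
    """Fraction of the query's hashes that also occur in the target."""
    if not query:
        return 0.0
    return len(query & target) / len(query)


def search_one(queries, name, target, threshold):
    """Compare every in-memory query against one database sketch."""
    hits = []
    for qname, qsketch in queries.items():
        c = containment(qsketch, target)
        if c >= threshold:
            hits.append((qname, name, c))
    return hits


def parallel_search(queries, database, threshold=0.1, max_workers=4):
    """Hypothetical sketch: `queries` maps names to hash sets and is shared
    by all workers; `database` is an iterable of (name, loader) pairs,
    where loader() returns that signature's hash set."""

    def work(item):
        name, load = item
        target = load()  # loaded inside the worker, discarded after comparing
        return search_one(queries, name, target, threshold)

    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in database order
        for hits in pool.map(work, database):
            results.extend(hits)
    return results


queries = {"q1": frozenset({1, 2, 3, 4}), "q2": frozenset({10, 11})}
database = [
    ("sigA", lambda: frozenset({1, 2, 3, 4, 5})),
    ("sigB", lambda: frozenset({10, 99})),
    ("sigC", lambda: frozenset({42})),
]
print(parallel_search(queries, database, threshold=0.5))
# [('q1', 'sigA', 1.0), ('q2', 'sigB', 0.5)]
```

Keeping the queries resident and streaming the database is what lets a search like this scale to a very large collection: memory grows with the number of queries plus one loaded signature per worker, not with the size of the database.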

Note that it only computes containment, not Jaccard similarity.
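The difference matters when a small genome is searched against a large metagenome. Below is a minimal illustration, using plain Python sets in place of sketches; the function names are illustrative and not sourmash's API, and sourmash itself estimates these quantities from sketches rather than from full k-mer sets. Containment asks how much of A is found in B, |A ∩ B| / |A|, while Jaccard similarity, |A ∩ B| / |A ∪ B|, is diluted by the size difference between the two sets.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|: symmetric similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """|A ∩ B| / |A|: how much of A is found in B (asymmetric)."""
    return len(a & b) / len(a) if a else 0.0


genome = set(range(100))        # small query, e.g. a genome
metagenome = set(range(1000))   # much larger target that includes it

print(containment(genome, metagenome))  # 1.0
print(jaccard(genome, metagenome))      # 0.1
```

A genome fully present in the metagenome scores 1.0 for containment but only 0.1 for Jaccard, which is why containment is the natural measure for finding genomes inside metagenomes.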

Somewhere along the way, either Luiz or @bluegenes put together a snakemake setup that is working quite well, for me at least. My copy is on farm at `~ctbrown/scratch/magsearch`. I think I use the command

```shell
snakemake -s magsearch.snakefile --configfile config-seaphage.yml -j 48
```

to run it.

Note that it's extremely disk-intensive, so we try to avoid running it with more than 48 threads, or running more than one instance at a time, on the cluster.

ctb · 2023-09-04T13:24:26Z

https://github.com/sourmash-bio/pyo3_branchwater now covers some of this; see `manysearch` and `multisearch`.

ctb mentioned this issue on Feb 5, 2023 in: sourmash sketch & search use one thread only #2458 (open)
