Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OMP parallellism generates wrong strobe<--> read associations in the index #14

Open
blinard-BIOINFO opened this issue Apr 4, 2024 · 1 comment

Comments

@blinard-BIOINFO
Copy link

@ksahlin

Fix: we had to comment this line in main.cpp to cancel parallelism:

//#pragma omp parallel for num_threads(n_threads) shared(read_cnt, output_file, q_id) private(acc,seq_rc, query_mers,query_mers_rc)

Bug observed with parallelism:

  • For a set of 100 reads of 100bp, we expected around 100 strobemers associated to each query in the computed index.
  • Observation: One read was associated to 300 strobemers in the index (3 times more than expected), two reads were associated to only ~50 strobemers.

It seems that the omp pragma is incorrect and variables q_id or read_cnt have the wrong value or are read incorrectly by the threads (probably updated by another threads beforehand?).

As a consequence, the full set of strobemers that should be associated to a given query can be actually partially associated to a other queries in the final index. Because is it related to parallelism, it is hard to reproduce with an example.

Maybe variables used query identifiers should be made more atomic ?

@ksahlin
Copy link
Owner

ksahlin commented Apr 8, 2024

I see, thanks for the report!

In more optimized implementations of the multiple query mapping problem we have removed OpenMP and are instead relying on std::thread - which makes the parallelisation more efficient, especially as the number of cores increase, using a "single producer multiple consumers" solution, e.g., as in strobealign.

Nevertheless, it is good to know this for future projects using the implementation in this repository. The easy way out for me is to, like you, deactivate parallelisation in this repo. I may add a PC solution later if I have time or find someone to implement it with threads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants