Quasimapping with multiple threads odd behaviour #127

Closed
iqbal-lab opened this issue Jun 4, 2018 · 15 comments

iqbal-lab commented Jun 4, 2018

I'm running on a single dedicated server (not shared).

Running with this commit:

{
    "version_number": "0.5.0",
    "last_git_commit_hash": "d8a3082a921579e65081fa1932c42c4f2fb7953a",
    "truncated_git_commits": [
        "d8a3082 - Robyn Ffrancon, 1527688551 : enhancement: build command optionally skips building PRG",
        "2dac562 - Robyn Ffrancon, 1527601335 : enhancement: quasimap commands ensures that build command executed successfully",
        "760b759 - Robyn Ffrancon, 1527599820 : enhancement: build stops and returns non-zero if no variants sites found in prg",
        "f3b8cff - Robyn Ffrancon, 1527597315 : enhancment: removed unused skip optimisation code",
        "e22cd4f - Robyn Ffrancon, 1527590325 : fix: SA indexes associated with correct site-allele paths for allele encapsulated mappings"
    ]
}

With k=5 and 8 threads, quasimapping of a fixed fastq to a fixed PRG takes 1 hr 45 mins; the output shows 67 million reads processed.

With k=7 and 8 threads, it is still running 15 hours after starting; the output shows 37 million reads processed (the last printout was 2 hours ago).

Machine: ebi7-017
Command:
gramtools quasimap --gram-directory results/gramk7 --reads fastq/out.fq.gz --max-threads=8 2>> error_qmapk7_thread8 1>> output_qmapk7_thread8

pwd
/tmp/benchmarking

iqbal-lab commented Jun 5, 2018

This process eventually finished after 25 hours.

k=7 (8 threads) results:
Count all reads: 67115296
Count skipped reads: 630240
Count mapped reads: 185418

Timer report:
seconds
Load data 38.42
Quasimap 512413
Total elapsed time: 512452

For comparison, the k=5 results (8 threads):

Count all reads: 67115296
Count skipped reads: 630240
Count mapped reads: 76199

Timer report:
seconds
Load data 37.32
Quasimap 29117.6

The different number of mapped reads between the two kmer sizes surprised me a bit.

@iqbal-lab

I've started a new run to see if it is reproducible.

ffranr added the bug label Jun 7, 2018

iqbal-lab commented Jun 11, 2018

Also seen in #117: going from k=5 to k=10, the time taken goes from 1.75 to about 7 hours (8 threads). See the bottom tables in the main description of that issue.

@iqbal-lab

With k=11, using the standard Plasmodium PRG

/nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps/gram_k11/

Quasimapping 4.7 million 150bp reads with 1 thread takes 187,000 seconds (~52 hours).

Quasimapping with 8 threads takes 45,000 seconds (12.5 hours).
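
For reference, the scaling these numbers imply (a quick sketch in plain Python, using only the timings above):

```python
# Timings reported above for the k=11 PRG (seconds)
t_1_thread = 187_000
t_8_threads = 45_000

speedup = t_1_thread / t_8_threads   # ~4.2x on 8 threads
efficiency = speedup / 8             # ~52% of ideal linear scaling
print(f"speedup: {speedup:.1f}x, parallel efficiency: {efficiency:.0%}")
```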

Maybe this can be closed

ffranr commented Oct 9, 2018

@iqbal-lab Was the number of mapped reads consistent between runs?

@iqbal-lab

Precisely the same!

ffranr commented Oct 9, 2018

@iqbal-lab In a previous comment you showed that the number of mapped reads was erroneously inconsistent between kmer sizes (when using multiple threads?). Do we know if that issue persists?

@iqbal-lab

I am now running at k=13 and will confirm.

ffranr commented Oct 15, 2018

> With k=11, using the standard Plasmodium PRG
> /nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps/gram_k11/
> Quasimapping 4.7 million 150bp reads with 1 thread takes 187,000 seconds (~52 hours).
> Quasimapping with 8 threads takes 45,000 seconds (12.5 hours).
> Maybe this can be closed

@iqbal-lab Where can I find the quasimap output directory for the above please?

iqbal-lab commented Oct 15, 2018

/nfs/research1/zi/zi/analysis/2018/0920_test_gramtools_for_leffler/quasimapk11

@bricoletc

Tested multi-threading of quasimap by mapping 250,000 reads (500,000 including the reverse complements) from (yoda) /nfs/leia/research/iqbal/bletcher/Pf_benchmark/all_reads/original/PG0496-C.bam to the (big) pf3k PRG.

Results: [plot: Threading_Pf; times in seconds]

So here multi-threading gives no speedup.

All runs produced consistent results for reads mapped:

Count all reads: 500000
Count skipped reads: 190
Count mapped reads: 204169

@iqbal-lab

Can you give the exact command line and LSF command?

@bricoletc

> Can you give the exact command line and LSF command?

bsub -R "select[mem>60000] rusage[mem=60000]" -M60000 -J threads_4 -n 4 \
    -o /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/threads/logs/t4_k8.o \
    -e /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/threads/logs/t4_k8.e \
    singularity exec /nfs/leia/research/iqbal/bletcher/Singularity_Images/8b46a86_gramtools.img gramtools quasimap \
    --gram-dir /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/gram_k8 \
    --run-dir /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/threads/run_t4 \
    --reads /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/subsetted_reads/PG0496-C.trim.fq.1.1MSubset.gz \
    --max-threads 4

@bricoletc

UPDATE: the previous plot shows total CPU time, not elapsed real (wall clock) time. Looking at wall-clock time, we see that multi-threading really is working:

[plot: Wall_clock_mthreads]

Of interest: going from 1 thread to 10 threads, k=8 achieves a 4.1-fold speedup, compared to only a 2.7-fold speedup at k=11.
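
To make the CPU-time vs wall-clock distinction concrete, here is a minimal Python sketch (illustrative only, not gramtools code): with parallel workers the total CPU consumed stays roughly constant while the elapsed time drops, so a plot of CPU time alone shows no apparent speedup.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def burn(n):
    # CPU-bound busy work
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as pool:
        list(pool.map(burn, [10_000_000] * 4))
    wall = time.perf_counter() - start

    # os.times() includes CPU consumed by reaped child processes
    t = os.times()
    cpu = t.children_user + t.children_system
    print(f"wall clock: {wall:.1f} s, total CPU: {cpu:.1f} s")
    # Total CPU is roughly the same with 1 or 4 workers; only the
    # wall clock shrinks, which is what the corrected plot captures.
```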

leoisl commented Apr 11, 2019

Hello!

Nice, thanks for this! If it is not too much to ask, could you also make an additional plot showing the measured speedup alongside the theoretical best speedup (see e.g. https://stackoverflow.com/questions/26514264/plot-speed-up-curve-vs-number-of-openmp-threads-scalability)? That would help us decide whether we should try to improve the multithreading.
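
For example, something along these lines (a rough sketch assuming matplotlib, using the two k=11 wall-clock timings reported earlier in this thread as the measured points):

```python
import matplotlib.pyplot as plt

# Wall-clock timings reported earlier in this thread (k=11 PRG, seconds)
threads = [1, 8]
wall_seconds = [187_000, 45_000]

# Speedup relative to the single-threaded run
speedup = [wall_seconds[0] / t for t in wall_seconds]

plt.plot(threads, speedup, "o-", label="measured")
plt.plot(threads, threads, "--", label="ideal (linear)")
plt.xlabel("number of threads")
plt.ylabel("speedup (T1 / Tn)")
plt.legend()
plt.savefig("speedup_curve.png")
```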

Thanks!

Cheers!
