bismark_methylation_extractor multicore option #52

Closed
avilella opened this issue Jul 1, 2016 · 3 comments

avilella commented Jul 1, 2016

Hi,

I am running bismark_methylation_extractor on a machine instance with 8 CPUs and 15GB of RAM. I run it with the --multicore 8 option and --buffer_size 10G, i.e. 10G of RAM for sorting.

I am looking at the logs (see below), and it's only using 13% CPU. Could it be that I am missing a parameter here? Is there a way to make it faster or more efficient?

++ ./bismark_v0.15.0/bismark_methylation_extractor -s --ignore 10 --cytosine_report --bedGraph --gzip --multicore 8 --buffer_size 10G file.bam --genome_folder input_indexes/
 *** Bismark methylation extractor version v0.15.0 ***
Summarising Bismark methylation extractor parameters:
===============================================================
Bismark single-end SAM format specified (default)
Number of cores to be used: 8
First 10 bp will be disregarded when processing the methylation call string
Output will be written to the current directory ('/home/dnanexus')
Summarising bedGraph parameters:
[...]
Stored sequence information of 2580 chromosomes/scaffolds in total
==============================================================================
Methylation information will now be written into a genome-wide cytosine report
==============================================================================
>>> Writing genome-wide cytosine report to: CEG40-111-P027B000idx-1_S1_L001_R1_001.1.cutB_bismark_bt2_se.barcoded.CpG_report.txt.gz <<<
Writing cytosine report for chromosome chr1 (stored 16706 different covered positions)
CPU: 13% (8 cores) * Memory: 4457/15039MB * Storage: 129GB free * Net: 15↓/0↑MBps Jul 1, 2016 8:55 AM
CPU: 13% (8 cores) * Memory: 4456/15039MB * Storage: 129GB free * Net: 15↓/0↑MBps
Writing cytosine report for chromosome chr2 (stored 13873 different covered positions) Jul 1, 2016 8:58 AM
Writing cytosine report for chromosome chr3 (stored 9621 different covered positions) Jul 1, 2016 9:02 AM
CPU: 13% (8 cores) * Memory: 4457/15039MB * Storage: 129GB free * Net: 0MBps Jul 1, 2016 9:05 AM
CPU: 13% (8 cores) * Memory: 4457/15039MB * Storage: 129GB free * Net: 0MBps
Writing cytosine report for chromosome chr4 (stored 8915 different covered positions) Jul 1, 2016 9:06 AM
Writing cytosine report for chromosome chr5 (stored 10201 different covered positions) Jul 1, 2016 9:09 AM
Writing cytosine report for chromosome chr6 (stored 9174 different covered positions) Jul 1, 2016 9:12 AM
CPU: 13% (8 cores) * Memory: 4456/15039MB * Storage: 129GB free * Net: 0MBps Jul 1, 2016 9:15 AM
CPU: 13% (8 cores) * Memory: 4456/15039MB * Storage: 129GB free * Net: 0MBps
[...]
@FelixKrueger
Owner

Hi Albert, the multi-core option applies only to extracting the methylation information from the BAM file. The subsequent bedGraph sorting and cytosine report steps do in fact run on a single core. It would appear that your status report was generated at the cytosine report stage, and 13% is just about 1 core at 100% on an 8-core machine (1/8 = 12.5%); would that make sense? This step iterates through the genomic sequence, identifies every C and looks up whether or not the C was found in the coverage file. In theory this could be made more parallel, e.g. by doing it for each chromosome separately, but we didn't find it very pressing to work on the efficiency of this step.
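
For readers wondering what "doing it for each chromosome separately" could look like, here is a minimal Python sketch of the idea. It is not Bismark's code (Bismark is written in Perl), and everything in it is a hypothetical illustration: the function names, the in-memory `genome` and `coverage` dictionaries, and the simplified row layout are all assumptions. It also assumes that a G on the top strand is reported as a cytosine on the "-" strand, and it leaves out the sequence-context columns of the real report.

```python
from concurrent.futures import ProcessPoolExecutor


def report_chromosome(task):
    """Build simplified report rows for one chromosome.

    task is (name, seq, covered), where covered maps a 1-based position
    to a (methylated, unmethylated) count pair. Hypothetical layout.
    """
    name, seq, covered = task
    rows = []
    for index, base in enumerate(seq.upper()):
        pos = index + 1
        if base == "C":
            strand = "+"
        elif base == "G":  # assumed: a C on the reverse strand
            strand = "-"
        else:
            continue
        # Cytosines absent from the coverage data get zero counts.
        meth, unmeth = covered.get(pos, (0, 0))
        rows.append((name, pos, strand, meth, unmeth))
    return rows


def cytosine_report(genome, coverage, workers=2):
    """Process each chromosome in its own worker process."""
    tasks = [(name, seq, coverage.get(name, {})) for name, seq in genome.items()]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in task order, so the output order is stable.
        per_chromosome = list(pool.map(report_chromosome, tasks))
    return [row for rows in per_chromosome for row in rows]


if __name__ == "__main__":
    genome = {"chr1": "ACGT", "chr2": "TTCC"}   # toy sequences
    coverage = {"chr1": {2: (3, 1)}}            # toy coverage data
    for row in cytosine_report(genome, coverage):
        print(*row, sep="\t")
```

Because each chromosome is independent, the work splits cleanly, but with 2580 chromosomes/scaffolds of very different sizes the largest chromosome would still bound the wall-clock time, and writing a single ordered output file remains a serial step.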

@avilella
Author

avilella commented Jul 1, 2016

Gotcha. So I should expect 100% usage in the BAM reading step, then after that, 1 CPU.

Definitely going to create a new feature request for this ;-)
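
As a quick sanity check on those numbers, the arithmetic can be sketched in a few lines of Python. This assumes the monitor reports utilisation averaged over all cores, which the "CPU: 13% (8 cores)" log lines suggest but do not prove; the function name is made up for illustration.

```python
def overall_cpu_percent(busy_cores, total_cores):
    """Machine-wide CPU percentage when busy_cores cores run at 100%."""
    return 100.0 * busy_cores / total_cores


# Single-core stages (bedGraph sorting, cytosine report) on 8 cores:
print(overall_cpu_percent(1, 8))  # 12.5, which a monitor would show as about 13%

# Fully parallel extraction with --multicore 8 on 8 cores:
print(overall_cpu_percent(8, 8))  # 100.0
```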


@FelixKrueger
Owner

Considering this question resolved.
