bismark_methylation_extractor multicore option #52

avilella · 2016-07-01T08:18:27Z

Hi,

I am running bismark_methylation_extractor on a machine instance with 8 CPUs and 15GB of RAM. I set it up to run with the --multicore 8 option and a 10G buffer for sorting (--buffer_size 10G).

I am looking at the logs (see below), and it's only using 13% CPU. Could it be I am missing a parameter here? Is there a way to make it faster or more efficient?

++ ./bismark_v0.15.0/bismark_methylation_extractor -s --ignore 10 --cytosine_report --bedGraph --gzip --multicore 8 --buffer_size 10G file.bam --genome_folder input_indexes/
 *** Bismark methylation extractor version v0.15.0 ***
Summarising Bismark methylation extractor parameters:
===============================================================
Bismark single-end SAM format specified (default)
Number of cores to be used: 8
First 10 bp will be disregarded when processing the methylation call string
Output will be written to the current directory ('/home/dnanexus')
Summarising bedGraph parameters:
[...]
Stored sequence information of 2580 chromosomes/scaffolds in total
==============================================================================
Methylation information will now be written into a genome-wide cytosine report
==============================================================================
>>> Writing genome-wide cytosine report to: CEG40-111-P027B000idx-1_S1_L001_R1_001.1.cutB_bismark_bt2_se.barcoded.CpG_report.txt.gz <<<
Writing cytosine report for chromosome chr1 (stored 16706 different covered positions)
CPU: 13% (8 cores) * Memory: 4457/15039MB * Storage: 129GB free * Net: 15↓/0↑MBps  Jul 1, 2016 8:55 AM
CPU: 13% (8 cores) * Memory: 4456/15039MB * Storage: 129GB free * Net: 15↓/0↑MBps
Writing cytosine report for chromosome chr2 (stored 13873 different covered positions)  Jul 1, 2016 8:58 AM
Writing cytosine report for chromosome chr3 (stored 9621 different covered positions)  Jul 1, 2016 9:02 AM
CPU: 13% (8 cores) * Memory: 4457/15039MB * Storage: 129GB free * Net: 0MBps  Jul 1, 2016 9:05 AM
CPU: 13% (8 cores) * Memory: 4457/15039MB * Storage: 129GB free * Net: 0MBps
Writing cytosine report for chromosome chr4 (stored 8915 different covered positions)  Jul 1, 2016 9:06 AM
Writing cytosine report for chromosome chr5 (stored 10201 different covered positions)  Jul 1, 2016 9:09 AM
Writing cytosine report for chromosome chr6 (stored 9174 different covered positions)  Jul 1, 2016 9:12 AM
CPU: 13% (8 cores) * Memory: 4456/15039MB * Storage: 129GB free * Net: 0MBps  Jul 1, 2016 9:15 AM
CPU: 13% (8 cores) * Memory: 4456/15039MB * Storage: 129GB free * Net: 0MBps
[...]


FelixKrueger · 2016-07-01T08:31:19Z

Hi Albert, the multi-core option only applies to extracting the methylation information from the BAM file. The subsequent bedGraph sorting and cytosine report steps do in fact run on a single core. It would appear that your status report was generated at the cytosine report stage, and 13% is just about 1 core at 100% on an 8-core machine. Would that make sense? This step iterates through the genomic sequence, identifies every C and looks up whether or not the C was found in the coverage file. In theory this could be made more parallel, e.g. by doing it for each chromosome separately, but we didn't find it very pressing to work on the efficiency of this step.
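To illustrate the per-chromosome idea mentioned above, here is a minimal Python sketch. It is not Bismark's code (Bismark is written in Perl) and not a planned implementation. The function names, the in-memory `genome` and `coverage` dictionaries, and the row layout are all hypothetical. The sketch also ignores sequence context, and it assumes that a G on the top strand is reported as a C on the bottom strand.

```python
from concurrent.futures import ProcessPoolExecutor


def report_chromosome(name, sequence, covered):
    """Return one (chrom, pos, strand, meth, unmeth) row per cytosine.

    `covered` maps a 1-based position to (methylated, unmethylated) counts.
    Cytosines absent from `covered` are reported with zero counts.
    """
    rows = []
    for pos, base in enumerate(sequence.upper(), start=1):
        if base == "C":
            strand = "+"
        elif base == "G":
            strand = "-"  # assumption: top-strand G = bottom-strand C
        else:
            continue
        meth, unmeth = covered.get(pos, (0, 0))
        rows.append((name, pos, strand, meth, unmeth))
    return rows


def _worker(task):
    # Unpack one (name, sequence, covered) task for the process pool.
    return report_chromosome(*task)


def report_genome(genome, coverage, workers=4):
    """Process each chromosome in its own worker process.

    `genome` maps chromosome name -> sequence; `coverage` maps
    chromosome name -> {position: (meth, unmeth)}.
    """
    tasks = [(name, seq, coverage.get(name, {})) for name, seq in genome.items()]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(_worker, tasks)  # map() preserves input order
        return [row for rows in results for row in rows]
```

When run as a script, `report_genome` should be called under an `if __name__ == "__main__":` guard so that worker processes can be started safely on every platform. Because chromosomes are independent of each other here, the output only needs to be concatenated in the original chromosome order.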

avilella · 2016-07-01T08:35:06Z

Gotcha. So I should expect 100% usage during the BAM reading step, and 1 CPU after that.
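As a quick sanity check of those numbers, the arithmetic can be sketched as follows. It assumes the monitor reports the average load across all cores; the function name is made up for illustration.

```python
def overall_cpu_percent(busy_cores, total_cores):
    """Average CPU utilisation when `busy_cores` run at 100% and the rest idle."""
    return 100.0 * busy_cores / total_cores


# Single-core cytosine report stage on an 8-core machine:
print(overall_cpu_percent(1, 8))  # 12.5, which a monitor may display as 13%

# Multi-core extraction stage with all 8 cores busy:
print(overall_cpu_percent(8, 8))  # 100.0
```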

Definitely going to create a new feature request for this ;-)

FelixKrueger · 2016-07-01T18:33:40Z

Considering this question resolved.

FelixKrueger closed this as completed Jul 1, 2016
