Hi,

I am trying to run the MAE and rnaVariantCalling modules and I am getting an OOM error in markDuplicates (see below).

I am submitting this as a slurm job, and I allocated 10 cores and 180 GB for the last run. I do not recall (I may be wrong) having to allocate more memory when running a GATK-based pipeline on RNA-seq data.

Should I just allocate more memory, or should I use the config file to manage it?

Thanks

Error in rule markDuplicates:
    jobid: 247
    input: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam, DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam.bai
    output: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam, DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bai
    log: DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/logs/markDuplicates/661T.log (check log file(s) for error details)
    shell:
        DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/picard-tools-marked-dup-metrics.txt --CREATE_INDEX true --TMP_DIR "/tmp" --VALIDATION_STRINGENCY SILENT 2> DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/logs/markDuplicates/661T.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job markDuplicates since they might be corrupted:
DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam, DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bai

Below is the end of the sample-specific log file DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/logs/markDuplicates/661T.log:
INFO 2024-02-05 23:22:39 MarkDuplicates Sorting list of duplicate records.
INFO 2024-02-05 23:22:42 MarkDuplicates After generateDuplicateIndexes freeMemory: 17094205248; totalMemory: 25199378432; maxMemory: 32178700288
INFO 2024-02-05 23:22:42 MarkDuplicates Marking 29761681 records as duplicates.
INFO 2024-02-05 23:22:42 MarkDuplicates Found 3318 optical duplicate clusters.
INFO 2024-02-05 23:22:42 MarkDuplicates Reads are assumed to be ordered by: coordinate
INFO 2024-02-05 23:23:26 MarkDuplicates Written 10,000,000 records. Elapsed time: 00:00:44s. Time for last 10,000,000: 44s. Last read position: chr4:73,408,762
INFO 2024-02-05 23:24:10 MarkDuplicates Written 20,000,000 records. Elapsed time: 00:01:28s. Time for last 10,000,000: 44s. Last read position: chr8:108,203,053
INFO 2024-02-05 23:25:00 MarkDuplicates Written 30,000,000 records. Elapsed time: 00:02:18s. Time for last 10,000,000: 49s. Last read position: chr14:94,378,547
INFO 2024-02-05 23:25:44 MarkDuplicates Written 40,000,000 records. Elapsed time: 00:03:02s. Time for last 10,000,000: 43s. Last read position: chr19:58,355,146
INFO 2024-02-05 23:26:18 MarkDuplicates Written 50,000,000 records. Elapsed time: 00:03:36s. Time for last 10,000,000: 33s. Last read position: chrM:8,968
INFO 2024-02-05 23:26:40 MarkDuplicates Writing complete. Closing input iterator.
INFO 2024-02-05 23:26:40 MarkDuplicates Duplicate Index cleanup.
INFO 2024-02-05 23:26:40 MarkDuplicates Getting Memory Stats.
INFO 2024-02-05 23:26:40 MarkDuplicates Before output close freeMemory: 288414536; totalMemory: 335544320; maxMemory: 32178700288
INFO 2024-02-05 23:26:40 MarkDuplicates Closed outputs. Getting more Memory Stats.
INFO 2024-02-05 23:26:40 MarkDuplicates After output close freeMemory: 188927600; totalMemory: 234881024; maxMemory: 32178700288
[Mon Feb 05 23:26:40 EST 2024] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 9.16 minutes.
Runtime.totalMemory()=234881024
Using GATK jar $HOME/.conda/envs/drop_env_133/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar $HOME/.conda/envs/drop_env_133/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar MarkDuplicates -I DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam -O DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam -M DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/picard-tools-marked-dup-metrics.txt --CREATE_INDEX true --TMP_DIR /tmp --VALIDATION_STRINGENCY SILENT
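For reference, the maxMemory figure in the log above (32178700288 bytes, roughly 30 GiB) reflects the Java heap cap the JVM was started with, not the 180 GB requested from Slurm. Below is a minimal sketch of rerunning the same command with a larger explicit heap via GATK4's --java-options pass-through; the -Xmx value is only an illustration and is not DROP's own setting.

# Hedged sketch: same MarkDuplicates call as in the log, but with an explicit,
# larger JVM heap. Paths are the 661T paths from the log above; adjust -Xmx to
# fit within whatever memory your node or Slurm allocation actually provides.
gatk --java-options "-Xmx64g" MarkDuplicates \
    -I DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.out.bam \
    -O DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/bam/661T/661T_Aligned.sortedByCoord.dupMarked.out.bam \
    -M DROP/Analysis_60M_60F_ExternalSamples/processed_data/rnaVariantCalling/out/picard-tools-marked-dup-metrics.txt \
    --CREATE_INDEX true --TMP_DIR /tmp --VALIDATION_STRINGENCY SILENT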
UPDATE:
I issued the same "gatk MarkDuplicates" command as in the log file on an interactive node with only 32 GB of memory, and it completed.
Maybe the problem is with the DROP/snakemake default settings for memory management, but I am not sure how to change that. Any suggestions?
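If the limit really is coming from Snakemake's per-rule resource defaults rather than from the Slurm allocation, one generic way to override them at submit time is Snakemake's --default-resources / --set-resources options. Whether DROP's markDuplicates rule actually reads a mem_mb resource, and whether your cluster profile maps it to the Slurm --mem request, are assumptions here, so treat this as a sketch rather than a confirmed fix.

# Hedged sketch: raise the memory resource for a single rule from the command line.
# The rule name (markDuplicates) and resource name (mem_mb) are assumptions and
# have not been verified against DROP's Snakefile.
snakemake --cores 10 \
    --default-resources mem_mb=16000 \
    --set-resources markDuplicates:mem_mb=64000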
Hi, I think there was an issue with maskMultiVCF because a path couldn't be accessed. It is working now, so it could have been that.
180 GB for 10 samples should be more than enough.
You could add specific resource allocations to the headers of the scripts.
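For example, if "headers of the scripts" refers to the Slurm submission script (my assumption), the per-job memory and CPU requests would look something like this; the values are placeholders to adjust for your cluster.

#!/bin/bash
# Hypothetical Slurm submission header for the DROP run; values are illustrative.
#SBATCH --job-name=drop_rnaVariantCalling
#SBATCH --cpus-per-task=10
#SBATCH --mem=180G
#SBATCH --time=48:00:00

snakemake --cores 10   # or however you normally launch the DROP pipeline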