issue with test run #154

Closed
vitorpavinato opened this issue Jan 30, 2024 · 11 comments

@vitorpavinato

Hi,

I got this error message when I tried to run the test code:

snakemake -d .test/ecoli --cores 1 --use-conda
[Tue Jan 30 16:31:01 2024]
Error in rule create_db_intervals:
    jobid: 21
    input: results/GCA_003018455.1/data/genome/GCA_003018455.1.fna, results/GCA_003018455.1/data/genome/GCA_003018455.1.fna.fai, results/GCA_003018455.1/data/genome/GCA_003018455.1.dict, results/GCA_003018455.1/intervals/master_interval_list.list
    output: results/GCA_003018455.1/intervals/db_intervals/intervals.txt, results/GCA_003018455.1/intervals/db_intervals
    log: logs/GCA_003018455.1/db_intervals/log.txt (check log file(s) for error details)
    conda-env: /fs/scratch/PAS1554/snpArcher/.test/ecoli/.snakemake/conda/ec2d2883921c842412450a0289e25d36_
    shell:
        
        gatk SplitIntervals -L results/GCA_003018455.1/intervals/master_interval_list.list         -O results/GCA_003018455.1/intervals/db_intervals -R results/GCA_003018455.1/data/genome/GCA_003018455.1.fna -scatter 1         -mode INTERVAL_SUBDIVISION         --interval-merging-rule OVERLAPPING_ONLY &> logs/GCA_003018455.1/db_intervals/log.txt
        ls -l results/GCA_003018455.1/intervals/db_intervals/*scattered.interval_list > results/GCA_003018455.1/intervals/db_intervals/intervals.txt
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job create_db_intervals since they might be corrupted:
results/GCA_003018455.1/intervals/db_intervals
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-01-30T163042.968256.snakemake.log

Thanks

@cademirch
Collaborator

Hi @vitorpavinato, sorry for the trouble running the test dataset. Could you provide your Snakemake version, as well as the output from the log file specified in your post?

@vitorpavinato
Author

Hi @cademirch,

Sure, here is the Snakemake version: 7.32.4
Where can I find the log file?

I am using a SLURM cluster and I ran on scratch.
I tried to find the file with

find / -n 2024-01-30T163042.968256.snakemake.log 2>/dev/null

But it didn't return anything.

Thanks for the prompt response.
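
The "Complete log:" path that Snakemake prints is relative to the working directory, which in this run was set with `-d .test/ecoli`, so searching under that directory is likely quicker than searching from `/`. Note also that the name test in `find` is spelled `-name`. The following is a minimal sketch; `find_snakemake_log` is a hypothetical helper name, not part of Snakemake or snpArcher.

```shell
# Hypothetical helper: locate a Snakemake run log under a working directory.
#   $1 = working directory passed to `snakemake -d` (here: .test/ecoli)
#   $2 = log file name taken from the "Complete log:" line
find_snakemake_log() {
    workdir="$1"
    logname="$2"
    # Snakemake keeps its run logs in <workdir>/.snakemake/log
    find "$workdir/.snakemake/log" -type f -name "$logname" 2>/dev/null
}
```

From the repository root it could be called as `find_snakemake_log .test/ecoli 2024-01-30T163042.968256.snakemake.log`.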

@cademirch
Collaborator

cademirch commented Jan 31, 2024 via email

@vitorpavinato
Author

Great,

Here is what I got from the log file:

Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                     count
--------------------  -------
all                         1
bam_sumstats                2
bwa_map                     1
callable_bed                2
collect_covstats            1
collect_fastp_stats         2
collect_sumstats            2
compute_d4                  1
create_cov_bed              2
create_db_intervals         2
dedup                       1
fastp                       1
format_interval_list        1
genmap                      1
index_reference             1
mappability_bed             1
merge_d4                    1
picard_intervals            1
sort_gatherVcfs             2
total                      26

Select jobs to execute...

[Tue Jan 30 16:30:52 2024]
rule fastp:
    input: data/local_fastq/my_sample1_1.fastq.gz, data/local_fastq/my_sample1_2.fastq.gz
    output: results/GCA_000008865.2/filtered_fastqs/SAMN12676327/SRR10058855_1.fastq.gz, results/GCA_000008865.2/filtered_fastqs/SAMN12676327/SRR10058855_2.fastq.gz, results/GCA_000008865.2/summary_stats/SAMN12676327/SRR10058855.fastp.out
    log: logs/GCA_000008865.2/fastp/SAMN12676327/SRR10058855.txt
    jobid: 11
    benchmark: benchmarks/GCA_000008865.2/fastp/SAMN12676327_SRR10058855.txt
    reason: Missing output files: results/GCA_000008865.2/filtered_fastqs/SAMN12676327/SRR10058855_2.fastq.gz, results/GCA_000008865.2/summary_stats/SAMN12676327/SRR10058855.fastp.out, results/GCA_000008865.2/filtered_fastqs/SAMN12676327/SRR10058855_1.fastq.gz
    wildcards: refGenome=GCA_000008865.2, sample=SAMN12676327, run=SRR10058855
    resources: tmpdir=/tmp, mem_mb=4000, mem_mib=3815

Activating conda environment: .snakemake/conda/f32d3b737d797443a34140e7912c58cd_
[Tue Jan 30 16:30:53 2024]
Finished job 11.
1 of 26 steps (4%) done
Select jobs to execute...

[Tue Jan 30 16:30:53 2024]
rule collect_fastp_stats:
    input: results/GCA_000008865.2/summary_stats/SAMN12676327/SRR10058855.fastp.out
    output: results/GCA_000008865.2/summary_stats/SAMN12676327_fastp.out
    jobid: 12
    reason: Missing output files: results/GCA_000008865.2/summary_stats/SAMN12676327_fastp.out; Input files updated by another job: results/GCA_000008865.2/summary_stats/SAMN12676327/SRR10058855.fastp.out
    wildcards: refGenome=GCA_000008865.2, sample=SAMN12676327
    resources: tmpdir=/tmp

[Tue Jan 30 16:30:53 2024]
Finished job 12.
2 of 26 steps (8%) done
Select jobs to execute...

[Tue Jan 30 16:30:53 2024]
checkpoint create_db_intervals:
    input: results/GCA_003018455.1/data/genome/GCA_003018455.1.fna, results/GCA_003018455.1/data/genome/GCA_003018455.1.fna.fai, results/GCA_003018455.1/data/genome/GCA_003018455.1.dict, results/GCA_003018455.1/intervals/master_interval_list.list
    output: results/GCA_003018455.1/intervals/db_intervals/intervals.txt, results/GCA_003018455.1/intervals/db_intervals
    log: logs/GCA_003018455.1/db_intervals/log.txt
    jobid: 21
    benchmark: benchmarks/GCA_003018455.1/db_intervals/benchmark.txt
    reason: Missing output files: results/GCA_003018455.1/intervals/db_intervals/intervals.txt
    wildcards: refGenome=GCA_003018455.1
    resources: tmpdir=/tmp, mem_mb=5000, mem_mib=4769
DAG of jobs will be updated after completion.

Activating conda environment: .snakemake/conda/ec2d2883921c842412450a0289e25d36_
[Tue Jan 30 16:31:01 2024]
Error in rule create_db_intervals:
    jobid: 21
    input: results/GCA_003018455.1/data/genome/GCA_003018455.1.fna, results/GCA_003018455.1/data/genome/GCA_003018455.1.fna.fai, results/GCA_003018455.1/data/genome/GCA_003018455.1.dict, results/GCA_003018455.1/intervals/master_interval_list.list
    output: results/GCA_003018455.1/intervals/db_intervals/intervals.txt, results/GCA_003018455.1/intervals/db_intervals
    log: logs/GCA_003018455.1/db_intervals/log.txt (check log file(s) for error details)
    conda-env: /fs/scratch/PAS1554/snpArcher/.test/ecoli/.snakemake/conda/ec2d2883921c842412450a0289e25d36_
    shell:
        
        gatk SplitIntervals -L results/GCA_003018455.1/intervals/master_interval_list.list         -O results/GCA_003018455.1/intervals/db_intervals -R results/GCA_003018455.1/data/genome/GCA_003018455.1.fna -scatter 1         -mode INTERVAL_SUBDIVISION         --interval-merging-rule OVERLAPPING_ONLY &> logs/GCA_003018455.1/db_intervals/log.txt
        ls -l results/GCA_003018455.1/intervals/db_intervals/*scattered.interval_list > results/GCA_003018455.1/intervals/db_intervals/intervals.txt
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job create_db_intervals since they might be corrupted:
results/GCA_003018455.1/intervals/db_intervals
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-01-30T163042.968256.snakemake.log

@cademirch
Collaborator

Thanks. Can you also paste in the SplitIntervals log? It should be in .test/ecoli/logs

@vitorpavinato
Author

Is this the one found at .test/ecoli/logs/GCA_003018455.1/db_intervals?

16:31:01.301 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/fs/scratch/PAS1554/snpArcher/.test/ecoli/.snakemake/conda/ec2d2883921c842412450a0289e25d36_/share/gatk4-4.1.8.0-0/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 30, 2024 4:31:01 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:31:01.474 INFO  SplitIntervals - ------------------------------------------------------------
16:31:01.474 INFO  SplitIntervals - The Genome Analysis Toolkit (GATK) v4.1.8.0
16:31:01.474 INFO  SplitIntervals - For support and documentation go to https://software.broadinstitute.org/gatk/
16:31:01.474 INFO  SplitIntervals - Executing as vitorpavinato@owens-login04.hpc.osc.edu on Linux v3.10.0-1160.102.1.el7.x86_64 amd64
16:31:01.474 INFO  SplitIntervals - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_382-b05
16:31:01.475 INFO  SplitIntervals - Start Date/Time: January 30, 2024 4:31:01 PM EST
16:31:01.475 INFO  SplitIntervals - ------------------------------------------------------------
16:31:01.475 INFO  SplitIntervals - ------------------------------------------------------------
16:31:01.475 INFO  SplitIntervals - HTSJDK Version: 2.22.0
16:31:01.475 INFO  SplitIntervals - Picard Version: 2.22.8
16:31:01.475 INFO  SplitIntervals - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:31:01.475 INFO  SplitIntervals - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:31:01.475 INFO  SplitIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:31:01.475 INFO  SplitIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:31:01.476 INFO  SplitIntervals - Deflater: IntelDeflater
16:31:01.476 INFO  SplitIntervals - Inflater: IntelInflater
16:31:01.476 INFO  SplitIntervals - GCS max retries/reopens: 20
16:31:01.476 INFO  SplitIntervals - Requester pays: disabled
16:31:01.476 INFO  SplitIntervals - Initializing engine
16:31:01.785 INFO  SplitIntervals - Shutting down engine
[January 30, 2024 4:31:01 PM EST] org.broadinstitute.hellbender.tools.walkers.SplitIntervals done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1164443648
***********************************************************************

A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "CP027599.1 : 1 - 5942969" is not valid for this input.

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /fs/scratch/PAS1554/snpArcher/.test/ecoli/.snakemake/conda/ec2d2883921c842412450a0289e25d36_/share/gatk4-4.1.8.0-0/gatk-package-4.1.8.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /fs/scratch/PAS1554/snpArcher/.test/ecoli/.snakemake/conda/ec2d2883921c842412450a0289e25d36_/share/gatk4-4.1.8.0-0/gatk-package-4.1.8.0-local.jar SplitIntervals -L results/GCA_003018455.1/intervals/master_interval_list.list -O results/GCA_003018455.1/intervals/db_intervals -R results/GCA_003018455.1/data/genome/GCA_003018455.1.fna -scatter 1 -mode INTERVAL_SUBDIVISION --interval-merging-rule OVERLAPPING_ONLY
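
The interval quoted in the GATK message above is printed with spaces around `:` and `-` ("CP027599.1 : 1 - 5942969"). If the interval list file itself contains those spaces, that alone could make the interval invalid. A minimal sketch for checking this, assuming the `.list` file holds one `contig:start-end` interval per line; `malformed_intervals` is a hypothetical helper name.

```shell
# Hypothetical helper: print (with line numbers) interval-list lines that
# have whitespace next to ':' or '-', e.g. "CP027599.1 : 1 - 5942969"
# instead of "CP027599.1:1-5942969".
# Assumes one "contig:start-end" interval per line.
# Returns grep's status: 0 if such lines were found, 1 if none were found.
malformed_intervals() {
    grep -nE '[[:space:]][-:]|[-:][[:space:]]' "$1"
}
```

For this run it could be called as `malformed_intervals results/GCA_003018455.1/intervals/master_interval_list.list` from inside `.test/ecoli`.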

@cademirch
Collaborator

Thanks for posting that @vitorpavinato. Unfortunately, I cannot seem to recreate this. Could you let me know your python3 version? I suspect this may be the issue. For reference, I ran the same test on SLURM using python3==3.11.4.

@vitorpavinato
Author

Yes, sure. The conda environment I set up has Python 3.12.1. I should also mention that I used conda instead of mamba here:

mamba create -c conda-forge -c bioconda -n snparcher snakemake
mamba activate snparcher

@cademirch
Collaborator

Could you try with Python 3.11.X? I believe this is an issue with Snakemake and f-strings in Python 3.12.

@vitorpavinato
Author

Hi @cademirch. Just to let you know, the test worked with python=3.11.6. Consider adding a requirements-like file to enforce a Python version that works with Snakemake and f-strings.

@cademirch
Collaborator

Thanks for the update @vitorpavinato. We've pinned the Python and Snakemake versions in the docs now.
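
A pinned setup along these lines can be sketched by reusing the create command quoted earlier in the thread. The pins below are assumptions taken from the versions reported here (Python 3.11, Snakemake 7.32.4) and are not necessarily what the docs specify. `pinned_create_cmd` and `python_version_ok` are hypothetical helper names.

```shell
# Assumed pins, taken from the versions reported in this thread.
PY_PIN="3.11"
SNAKEMAKE_PIN="7.32.4"

# Print (rather than run) the environment-creation command with pins applied.
pinned_create_cmd() {
    printf 'mamba create -c conda-forge -c bioconda -n snparcher python=%s snakemake=%s\n' \
        "$PY_PIN" "$SNAKEMAKE_PIN"
}

# Succeed only when the given version string is on the pinned minor release,
# e.g. 3.11.6 passes and 3.12.1 does not.
python_version_ok() {
    case "$1" in
        "$PY_PIN"|"$PY_PIN".*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The printed command can be reviewed and then run by hand, and `python_version_ok` could guard a wrapper script before it invokes Snakemake.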
