Localization issues : cromwell just uses hard-links #18

Closed
Adrien-Evo opened this issue Jul 9, 2018 · 3 comments

@Adrien-Evo

Hi,
I've also posted this issue on the Cromwell GitHub.
I'm running the ENCODE ATAC-seq pipeline on an SGE cluster.
We don't allow hard links at my facility (BeeGFS filesystem), so I've been trying to use the localization parameters in the Cromwell configuration file, but to no avail. The backend file is definitely being read, since I get error messages when I put an unsupported keyword in the localization array.

I've tried it with different versions of Cromwell (30.2, 31, 32).

Here is the script generated by cromwell based on my WDL file :

# make the directory which will keep the matching files
mkdir /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2

# symlink all the files into the glob directory
( ln -L merge_fastqs_R?_*.fastq.gz /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2 2> /dev/null ) || ( ln merge_fastqs_R?_*.fastq.gz /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2 )

# list all the files that match the glob into a file called glob-[md5 of glob].list
ls -1 /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2 > /sandbox/users/foucal-a/test_atac-pipe/cromwell-executions/atac/f4fd93fa-6f3a-42a6-94f2-459901d245c4/call-trim_adapter/shard-0/execution/glob-4f26c666d13d1cb48973da7f646a7de2.list

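I get the error when the script tries to link all the files into the glob directory. Despite the "symlink" comment in the script, both ln calls are made without -s, so they create hard links.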
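To make the failure mode concrete, here is a minimal sketch of the same linking step with a symbolic-link fallback. Both branches of the generated command call ln without -s, so both attempt hard links; the sketch tries a hard link first and only then falls back to ln -s. The helper name link_into_dir and the throwaway file names are assumptions for illustration and are not part of Cromwell or the pipeline.

```shell
#!/bin/sh
# Sketch only: link_into_dir is a hypothetical helper, not a Cromwell function.
# It tries a hard link (as the generated script does) and falls back to a
# symbolic link when the filesystem refuses hard links.
link_into_dir() {
    src=$1
    dest_dir=$2
    if ln "$src" "$dest_dir/" 2>/dev/null; then
        echo "hard-link"
    else
        # Resolve an absolute path so the symlink stays valid from dest_dir.
        abs_src="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
        if ln -s "$abs_src" "$dest_dir/"; then
            echo "soft-link"
        else
            echo "failed"
            return 1
        fi
    fi
}

# Usage on throwaway files in a temporary directory.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/glob-dir"
printf 'reads\n' > "$work/merge_fastqs_R1_test.fastq.gz"
result=$(link_into_dir "$work/merge_fastqs_R1_test.fastq.gz" "$work/glob-dir")
echo "strategy used: $result"
```

Running this inside a directory on the affected filesystem (by setting TMPDIR so that mktemp creates the directory there) shows whether hard links are refused at that location, independently of any Cromwell configuration.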
Here is the WDL code :

 scatter( i in range(length(fastqs_)) ) {
                # trim adapters and merge trimmed fastqs
                call trim_adapter { input :
                        fastqs = fastqs_[i],
                        adapters = if length(adapters_)>0 then adapters_[i] else [],
                        paired_end = paired_end,
                }
                # align trimmed/merged fastqs with bowtie2
                call bowtie2 { input :
                        idx_tar = bowtie2_idx_tar,
                        fastqs = trim_adapter.trimmed_merged_fastqs, #[R1,R2]
                        paired_end = paired_end,
                        multimapping = multimapping,
                }
        }

With the task:

task trim_adapter { # trim adapters and merge trimmed fastqs
        # parameters from workflow
        Array[Array[File]] fastqs               # [merge_id][read_end_id]
        Array[Array[String]] adapters   # [merge_id][read_end_id]
        Boolean paired_end
        # mandatory
        Boolean? auto_detect_adapter    # automatically detect/trim adapters
        # optional
        Int? min_trim_len               # minimum trim length for cutadapt -m
        Float? err_rate                 # Maximum allowed adapter error rate
                                                        # for cutadapt -e
        # resource
        Int? cpu
        Int? mem_mb
        Int? time_hr
        # Commenting this line as a test. Problem with hard link
        String? disks

        command {
                python $(which encode_trim_adapter.py) \
                        ${write_tsv(fastqs)} \
                        --adapters ${write_tsv(adapters)} \
                        ${if paired_end then "--paired-end" else ""} \
                        ${if select_first([auto_detect_adapter,false]) then "--auto-detect-adapter" else ""} \
                        ${"--min-trim-len " + select_first([min_trim_len,5])} \
                        ${"--err-rate " + select_first([err_rate,'0.1'])} \
                        ${"--nth " + select_first([cpu,2])}
        }
        output {
                # WDL glob() globs in an alphabetical order
                # so R1 and R2 can be switched, which results in an
                # unexpected behavior of a workflow
                # so we prepend merge_fastqs_'end'_ (R1 or R2)
                # to the basename of original filename
                # this prefix will be later stripped in bowtie2 task
                Array[File] trimmed_merged_fastqs = glob("merge_fastqs_R?_*.fastq.gz")
        }
        runtime {
                cpu : select_first([cpu,2])
                memory : "${select_first([mem_mb,'12000'])} MB"
                time : select_first([time_hr,24])
                disks : select_first([disks,"local-disk 100 HDD"])
        }
}

My backend.conf :

include required(classpath("application"))

backend {
  default="SGE"
  providers {
    SGE {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        concurrent-job-limit = 10000
        runtime-attributes= """
        Int? cpu=1
        Int? memory=4
        String? disks
        String? time
        String? preemptible
        """
        submit = """
        qsub \
            -terse \
            -V \
            -b n \
            -wd ${cwd} \
            -N ${job_name} \
            ${'-pe smp ' + cpu} \
            ${'-l h_vmem=' + memory + "G"} \
            -o ${out} \
            -e ${err} \
            ${script}
        """
        kill = "qdel ${job_id}"
        check-alive = "qstat -j ${job_id}"
        job-id-regex = "(\\d+)"

        filesystems {
          local {
            localization: [
              "soft-link","copy","hard-link"
            ]
            caching {
              duplication-strategy: [ "soft-link","copy","hard-link"]
              hashing-strategy: "file"
            }
          }
        }
      }
    }
  }
}
engine {
  filesystems {
    local {
      localization: [
        "soft-link","copy","hard-link"
      ]
      caching {
        duplication-strategy: [ "soft-link","copy","hard-link"]
        hashing-strategy: "file"
      }
    }
  }
}

I wonder if there is something wrong with my config files or if Cromwell's localization is at fault.
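One way to narrow this down is to check how files in a call directory were actually materialized, which separates "the localization setting is ignored" from "a different step is creating the hard links". The sketch below classifies a path as a soft link, a hard link (link count above 1), or a plain file. The helper name classify_link and the demo file names are assumptions for illustration; in practice it would be pointed at files under a cromwell-executions call directory.

```shell
#!/bin/sh
# Sketch only: classify_link is a hypothetical helper, not a Cromwell tool.
# It reports how a path was materialized on disk.
classify_link() {
    f=$1
    if [ -L "$f" ]; then
        echo "soft-link"
    elif [ -f "$f" ]; then
        # The second field of `ls -ld` is the hard-link count.
        n=$(ls -ld "$f" | awk '{print $2}')
        if [ "$n" -gt 1 ]; then
            echo "hard-link"
        else
            echo "copy-or-original"
        fi
    else
        echo "missing"
    fi
}

# Demo on throwaway files: one copy and one symbolic link.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'x\n' > "$demo/original.txt"
cp "$demo/original.txt" "$demo/copied.txt"
ln -s "$demo/original.txt" "$demo/symlinked.txt"
kind_copy=$(classify_link "$demo/copied.txt")
kind_soft=$(classify_link "$demo/symlinked.txt")
echo "copied.txt: $kind_copy"
echo "symlinked.txt: $kind_soft"
```

A plain file with a link count of 1 cannot be told apart from the original by this check, hence the combined "copy-or-original" label.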

@leepc12
Contributor

leepc12 commented Jul 9, 2018

Can you completely remove "hard-link" from the backend file and try again? Also, please post this issue on the cromwell github repo too. They might have some insights about this. https://github.com/broadinstitute/cromwell/issues

@Adrien-Evo
Author

Adrien-Evo commented Jul 10, 2018

I tried quite a few combinations of localization options, but to no avail. Cromwell does report an error, though, when I put in a fake option like

 localization: [
                        "DOG","copy","hard-link"
]

I reported it on the cromwell repo.

@leepc12
Contributor

leepc12 commented Oct 16, 2018

Closing this due to long inactivity.

@leepc12 leepc12 closed this as completed Oct 16, 2018