Error with AWS batch params #500

Closed

jdwheaton opened this issue Nov 19, 2020 · 8 comments
Labels
bug Something isn't working

Comments

@jdwheaton

Thanks for the new release! I was attempting to migrate a functioning workflow from 1.4.2 to 2.0 and I get this error:

Specify correct --awsqueue and --awsregion parameters on AWSBatch!. Expression: (params.awsqueue || params.awsregion)

The executed command is:

nextflow run nf-core/rnaseq -profile docker,awsbatch --awsregion us-west-2 --awsqueue <a_functional_queue_name> --outdir "s3://<an_s3_bucket>/rnaseq_test/" -work-dir "s3://<an_s3_bucket>/rnaseq_test/work/" --reads "s3://<an_s3_bucket>/00000001/*_{1,2}.fastq.gz" --fasta "s3://<an_s3_bucket>/genome.fa" --star_index "s3://<an_s3_bucket>/STARIndex/" --gtf "s3://<an_s3_bucket>/genes.gtf"

This command is fully functional on release 1.4.2 -- what am I missing here?

@drpatelh
Member

Hi @jdwheaton! The latest version of the pipeline now takes a samplesheet input via the --input parameter; --reads has been deprecated. I'm not entirely sure that will solve the error you are getting, though 🤔 The error is being raised from this line. I'll need to play with this a little to make sure there isn't an issue with the evaluation in that function.
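
For context on where that message comes from: a Groovy assert with a custom message appends ". Expression: <expr>" to it on failure, so the check is presumably shaped something like the sketch below (an illustration, not the pipeline's actual source):

// Sketch only: reproduces the error format quoted above.
// Groovy's `assert <expr> : <message>` throws:
//   java.lang.AssertionError: <message>. Expression: <expr>
def params = [awsqueue: null, awsregion: null]  // as if the CLI values never arrived
assert (params.awsqueue || params.awsregion) :
    'Specify correct --awsqueue and --awsregion parameters on AWSBatch!'

The fact that it fails even though you passed both flags suggests both values are evaluating as falsy inside that function, i.e. the CLI parameters are not making it into the params map that the check sees.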

For now, you can just create your own custom.config file with the contents below:

params {
  tracedir = './'
}

process.executor = 'awsbatch'
process.queue    = '<a_functional_queue_name>'
aws.region       = 'us-west-2'
executor.awscli  = '/home/ec2-user/miniconda/bin/aws'
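
(Replace <a_functional_queue_name> with the name of your Batch job queue, quoted as a string. The executor.awscli path assumes the aws binary lives at that location on your custom AMI; adjust it if yours differs.)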

Then try running with the command:

nextflow run nf-core/rnaseq \
    --input "samplesheet.csv" \
    --fasta "s3://<an_s3_bucket>/genome.fa" \
    --gtf "s3://<an_s3_bucket>/genes.gtf" \
    --star_index "s3://<an_s3_bucket>/STARIndex/" \
    --outdir "s3://<an_s3_bucket>/rnaseq_test/" \
    -profile docker \
    -work-dir "s3://<an_s3_bucket>/rnaseq_test/work/" \
    -c custom.config 

@drpatelh added the bug (Something isn't working) label Nov 20, 2020
@drpatelh
Member

I haven't used AWS Batch to submit jobs before, but I am hoping that will work.

@drpatelh
Member

This is indeed quite a problematic bug. We don't normally test the awsbatch profile directly, which is why it was missed here. Thanks for reporting! It should be fixed when #510 is merged into dev. Once that happens, it would be great if you could re-run with -r dev to see if all is well again. This fix warrants a minor release, but I am hoping to test and fix something else before that, so hopefully I will do it in the next week or so 🤞

@drpatelh
Member

This should be fixed now in #510. I'd be grateful if you could test with -r dev please 🙂
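
That just means adding Nextflow's -r flag to run the dev branch, keeping your other options the same (the s3:// paths below are the placeholders from your original command):

nextflow run nf-core/rnaseq -r dev \
    --input "samplesheet.csv" \
    --outdir "s3://<an_s3_bucket>/rnaseq_test/" \
    -profile docker \
    -work-dir "s3://<an_s3_bucket>/rnaseq_test/work/" \
    -c custom.config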

@jdwheaton
Author

Sorry, I've been in the wet lab all week and haven't had a chance to test. I'll try to run with the dev branch soon and let you know how it goes!

As an aside, I am particularly interested in using the public dataset feature, but I'm not clear on how the list of accessions works with the --input flag since I don't know the file names ahead of time. Can you provide any clarification on this?

@drpatelh
Member

No worries. Thanks.

The commands to obtain the public data and to run the main arm of the pipeline are completely independent. This is intentional: it allows you to download all of the data in an initial pipeline run, curate the samplesheet based on the sample metadata, and then run the pipeline again properly.

First, download the public data for a given set of IDs specified in ids.txt using the command:

nextflow run nf-core/rnaseq \
    --public_data_ids ids.txt \
    -profile <docker/singularity/podman/conda/institute>
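
Here, ids.txt is a plain-text file with one database accession per line, for example (hypothetical SRA run accessions):

SRR1234567
SRR1234568
SRR1234569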

The downloaded FastQ files will then be placed in results/public_data/ along with an auto-created samplesheet. You can then use that auto-created samplesheet to run the main portion of the pipeline with a separate command:

nextflow run nf-core/rnaseq \
    --input samplesheet.csv \
    --genome GRCh37 \
    -profile <docker/singularity/podman/conda/institute>
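
For illustration, the auto-created samplesheet is a plain CSV. The exact columns depend on the pipeline version, but for paired-end data it will look something along these lines (hypothetical values):

sample,fastq_1,fastq_2,strandedness
SRR1234567,results/public_data/SRR1234567_1.fastq.gz,results/public_data/SRR1234567_2.fastq.gz,unstranded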

Hope that helps! Would be interested in getting some feedback as to how it works for you.

@jdwheaton
Author

Thanks, that two-step process wasn't completely clear from the documentation. This makes much more sense! I'm happy to provide feedback once I have a chance to run it.

@drpatelh
Member

Closing via #509 (comment)
