Problems with loading igenomes on default #81

Closed
relaxingbob opened this issue May 29, 2019 · 18 comments

@relaxingbob

bwilson@genomicscore:~$ nextflow run /home/bwilson/rnafusion --star-fusion --genomeDir GRCh38 -profile docker --reads 'data/*_R{1,2}.fastq.gz'

N E X T F L O W ~ version 19.01.0
Launching /home/bwilson/rnafusion-master/main.nf [serene_dubinsky] - revision: ea16375b17
WARN: There's no process matching config selector: download_star_fusion_ensembl
WARN: Access to undefined parameter genomes -- Initialise it to a default value eg. params.genomes = some_value
Missing fromPath parameter

This is running on a workstation with Ubuntu 18.04 with 24 cores and 48G memory.

I tried configuring the igenomes.config file and could not get it to find the genome files.

@apeltzer
Member

Same here...

@matq007
Member

matq007 commented May 29, 2019

Ah, I think I know where the problem is: set the parameter igenomes_base in the config file. That should solve it.

@relaxingbob
Author

In the igenomes.config file, I removed igenomes_base and gave it the complete paths to the genome files, and that didn't work.

@matq007
Member

matq007 commented May 29, 2019

You also have to set genomes to some value like GRCh38.

@relaxingbob
Author

This is the command line that I used: nextflow run nf-core/rnafusion --star_fusion --genomeDir GRCh38 -profile docker --reads '/data/*_R{1,2}.fastq.gz'

@matq007
Member

matq007 commented May 29, 2019

[EDIT]:
So that means you have to run the pipeline like this instead: nextflow run nf-core/rnafusion --star_fusion --genome GRCh38 --igenomesIgnore false -profile docker --reads '/data/*_R{1,2}.fastq.gz' and then, if you hard-code the paths, your igenomes.config should be defined like this:

params {
  // Illumina iGenomes reference file paths
  genomes {
    'GRCh38' {
      bed12   = "/PATH/TO/genes.bed"
      fasta   = "/PATH/TO/genome.fa"
      gtf     = "/PATH/TO/genes.gtf"
      star    = "/PATH/TO/STARIndex/"
    }
  }
}
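
Hard-coded paths like these are easy to get wrong, and a missing file only shows up once the pipeline has started. A minimal pre-flight sketch in shell, assuming the placeholder paths from the config above; `check_refs` is a hypothetical helper and not part of nf-core/rnafusion:

```shell
# Hypothetical helper: return non-zero if any given reference path is missing.
check_refs() {
    local missing=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            # Report every missing path, not just the first one.
            echo "missing reference: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called before the run, for example `check_refs /PATH/TO/genes.bed /PATH/TO/genome.fa /PATH/TO/genes.gtf /PATH/TO/STARIndex/ && nextflow run ...`, so that the pipeline only launches when every path exists.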

@relaxingbob
Author

Yes. You are correct.

@relaxingbob
Author

I don't know if this additional warning will help narrow this down or not: WARN: There's no process matching config selector: download_star_fusion_ensembl

@relaxingbob
Author

Sorry for the confusion. With this docker, mproksik/rnafusion, everything works fine.

The failure I am having is with nextflow nf-core/rnafusion.

Thanks

@matq007
Member

matq007 commented May 30, 2019

Have you tried the updated comment I've left @relaxingbob?

@relaxingbob
Author

Sorry. I don't see your comment.

@matq007
Member

matq007 commented Jun 2, 2019

Hey @relaxingbob, did you solve the issue?

@apeltzer
Member

apeltzer commented Jun 2, 2019

Hi @matq007 !

I had to do two things here (showing the result of my git diff on a checked-out master branch of rnafusion version 1.0.2):

git diff
diff --git a/main.nf b/main.nf
index a6a2302..cd2d53f 100644
--- a/main.nf
+++ b/main.nf
@@ -96,7 +96,7 @@ Channel
     .ifEmpty { exit 1, "GTF annotation file not found: ${params.gtf}" }
     .into { gtf; gtf_squid }

-if (!params.star_index && (!params.fasta && !params..gtf)) {
+if (!params.star_index && (!params.fasta && !params.gtf)) {
     exit 1, "Either specify STAR-INDEX or fasta and gtf file!"
 }

diff --git a/nextflow.config b/nextflow.config
index a5a23c6..68c22fc 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -47,7 +47,6 @@ params {
   clusterOptions = false
   awsqueue = false
   awsregion = 'eu-west-1'
-  readPaths = null
   debug = false

   // Options: download-references.nf
@@ -58,8 +57,8 @@ params {

   // Shared default variables across different scripts
   download_db = false
-  igenomesIgnore = true
   outdir = './results'
+  igenomesIgnore = false
   tracedir = "${params.outdir}/pipeline_info"

   // Boilerplate options

Using this, I was able to use my bash script (as mentioned on slack, @maxulysse was also helping out there - thank you a lot!) to get things in a "batch style mode":

#!/usr/bin/bash
module purge
module load devel/java_jdk/1.8.0u112
module load devel/singularity/3.0.3

##SampleIDs to make Batch mode possible
list=( I15R018a01 I15R018b02 I15R018e02 I16R003a02 I16R003a03 I16R003c01 I16R003d01 I16R003e01 I16R003f01 I16R003g01 I16R003i01 I16R003i02 I16R003j01 I17R018Ra01 I17R018Ra02 I17R018Rc01 I17R018Rd03 I17R018Rf02 I18R020Ra01 I18R020Ra02 I18R020Rg01 I18R020Rh01 )

#Run RNAfusion
for i in "${list[@]}"
do
    echo "Running on sample $i"
    suffix='*{1,2}.fastq.gz'
    prefix='RAW/'
    path=${prefix}${i}${suffix}
    nextflow run ./rnafusion/main.nf --reads "$path" -profile binac --databases '/beegfs/work/iiipe01/2019-05-27_SFB209_RNAfusion/dbs_for_fusion/dbs_for_fusion' --fusioncatcher_ref '/beegfs/work/iiipe01/2019-05-27_SFB209_RNAfusion/dbs_for_fusion/dbs_for_fusion/fusioncatcher_ref/human_v90/' --star_fusion --star_fusion_ref '/beegfs/work/iiipe01/2019-05-27_SFB209_RNAfusion/dbs_for_fusion/dbs_for_fusion/star_fusion_ref/GRCh38_v27_CTAT_lib_Feb092018/ctat_genome_lib_build_dir' --ericscript --ericscript_ref '/beegfs/work/iiipe01/2019-05-27_SFB209_RNAfusion/dbs_for_fusion/dbs_for_fusion/ericscript_ref/ericscript_db_homosapiens_ensembl84' --pizzly --pizzly_fasta '/beegfs/work/iiipe01/2019-05-27_SFB209_RNAfusion/dbs_for_fusion/dbs_for_fusion/pizzly_ref/Homo_sapiens.GRCh38.cdna.all.fa.gz' --pizzly_gtf '/beegfs/work/iiipe01/2019-05-27_SFB209_RNAfusion/dbs_for_fusion/dbs_for_fusion/pizzly_ref/Homo_sapiens.GRCh38.94.gtf' --genome "GRCh38" --outdir "results_$i" --star_index '/nfsmounts/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/' --igenomes_base '/nfsmounts/igenomes/' -resume -dump-channels
done
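
The loop above launches one run per sample even when a FASTQ pair is missing. A hedged sketch of a guard, assuming the same `RAW/` prefix and `*{1,2}.fastq.gz` naming as in the script; `reads_glob` and `has_read_pair` are hypothetical helper names:

```shell
# Hypothetical helper: print the glob string to pass (quoted) to --reads.
reads_glob() {
    local sample=$1 prefix=${2:-RAW/}
    printf '%s%s*{1,2}.fastq.gz\n' "$prefix" "$sample"
}

# Hypothetical helper: succeed only if a file for each mate (1 and 2) exists.
has_read_pair() {
    local sample=$1 prefix=${2:-RAW/} mate f found
    for mate in 1 2; do
        found=no
        # An unmatched glob stays literal, so the -e test below fails for it.
        for f in "$prefix$sample"*"$mate".fastq.gz; do
            [ -e "$f" ] && found=yes && break
        done
        [ "$found" = yes ] || return 1
    done
    return 0
}
```

Inside the loop this could be used as `has_read_pair "$i" || { echo "Skipping $i" >&2; continue; }`, followed by `--reads "$(reads_glob "$i")"` on the nextflow command line.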

@apeltzer
Member

apeltzer commented Jun 2, 2019

So it runs well and ~8 samples ran through during the weekend (cluster is more crowded than I anticipated ;-)). I guess I'll use the resume option to also add reports for the other potential tools such as squid to have more complete reports - but this was merely a test balloon and seems to work just fine 👍 Batch mode in-built would be nicest I guess however ;-)

@matq007
Member

matq007 commented Jun 4, 2019

@apeltzer thanks for figuring this out! You 🚀. I will implement the fix tomorrow in the dev branch. I will probably remove igenomesIgnore and just include the genomes.config on default, to make it less confusing.

@apeltzer
Member

apeltzer commented Jun 5, 2019

No problem - I just wanted to share that info back since it took me a while 😓

@matq007 changed the title from "params.genomes = some_value Missing fromPath` parameter" to "Problems with loading igenomes on default" on Jun 7, 2019
@apeltzer
Member

apeltzer commented Jun 7, 2019

30 samples processed, ~350GB crunched! Thanks a bunch for all the input!

@relaxingbob
Author

I finally got this to work. Thanks for the help.
