Nonfunctioning conversion from .gff to .gtf? #215

Closed
silviamorins opened this issue Jun 6, 2019 · 17 comments

@silviamorins
Contributor

silviamorins commented Jun 6, 2019

Hi,

I am running Nextflow v 19.01.0 with RNASeq data from S. coelicolor; using the Streptomyces_coelicolor_a3_2_.ASM20383v1.37.gff3 file downloaded from Ensembl as annotation leads to the error:

ERROR ~ No such variable: gtf_{makeSTARindex,makeHisatSplicesites}

depending on the aligner I am using.
However, if I run the command from the respective process on my own machine and then pass the resulting .gtf file via the --gtf flag, the pipeline gets past this step.

I have looked into this process, but have no idea why this is happening.
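
For reference, the manual workaround described above could be scripted roughly as follows. This is only a sketch: the helper name `gffread_command` and the file name `annotation.gff3` are made up for illustration, and the `-T` (write GTF), `-o` (output file) and `-F` (keep all attributes) options are assumed to behave as in standard gffread builds, so please check them against your installed version.

```python
import shutil
import subprocess
from pathlib import Path


def gffread_command(gff_path, keep_attributes=False):
    """Build the gffread call that converts a GFF3 file to GTF.

    Returns the argument list and the path of the GTF that would be written.
    """
    gff = Path(gff_path)
    gtf = gff.with_suffix(".gtf")  # e.g. annotation.gff3 -> annotation.gtf
    cmd = ["gffread", str(gff), "-T", "-o", str(gtf)]
    if keep_attributes:
        # Assumption: -F asks gffread to keep all attributes from the GFF.
        cmd.insert(2, "-F")
    return cmd, gtf


if __name__ == "__main__":
    # Hypothetical input file name; replace with the real annotation.
    cmd, gtf = gffread_command("annotation.gff3", keep_attributes=True)
    print(" ".join(cmd))
    # Only run the conversion when gffread and the input are actually present.
    if shutil.which("gffread") and Path("annotation.gff3").exists():
        subprocess.run(cmd, check=True)
        print(f"Pass {gtf} to the pipeline with --gtf")
```

The resulting file is then what gets handed to the pipeline with --gtf instead of using --gff.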

@lpantano
Contributor

lpantano commented Jun 6, 2019

Hi @silviamorins

It is indeed weird, thanks for posting. Can you share the .nextflow.log file? Maybe we will see more information there that can help us.

What is the command line you used to run nextflow?

cheers

@silviamorins
Contributor Author

Hi @lpantano ,

I'm sorry, I can't share the whole file because it contains some private information, but here you can find all the lines after the initial description that comes after nextflow.

For running Nextflow I used:

nextflow run nf-core/rnaseq -r 1.3 -profile cfc --fasta 'Streptomyces_coelicolor_a3_2_.ASM20383v1.dna_rm.toplevel.fa' --gff 'Streptomyces_coelicolor_a3_2_.ASM20383v1.43.chromosome.Chromosome.gff' --reads '[...]*_{R1,R2}.fastq.gz' -resume --skip_preseq --reverse_stranded --skip_genebody_coverage

@lpantano
Contributor

lpantano commented Jun 6, 2019

Thanks, it seems like it is not entering that conditional process, so the variables are never defined.

In the summary of all the variables that the pipeline prints when it starts, do you see something like:

GFF3 Annotation : Streptomyces_coelicolor_a3_2_.ASM20383v1.43.chromosome.Chromosome.gff

from these lines :

rnaseq/main.nf

Line 254 in 37f260d

if(params.gff) summary['GFF3 Annotation'] = params.gff

@silviamorins
Contributor Author

silviamorins commented Jun 7, 2019

I see...

I have something else to add, since I've encountered another error further on in the pipeline: featureCounts expects to find an attribute called gene_biotype, which you don't get if you run gffread without the option -F (at least this is what I experienced with the S. coelicolor .gff3 file). Even with the option -F, although all the information is preserved, the field is called "biotype", so I had to replace it with "gene_biotype" myself in order for the pipeline to run successfully to the end.
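
A minimal sketch of that kind of manual rename, in case it is useful to others. It assumes the usual GTF layout (nine tab-separated columns, attributes written as `key "value";` in the last one); the function name `rename_biotype` and the sample record are invented for illustration and are not taken from the actual annotation.

```python
import re

# Match `biotype` only when it is a complete attribute key, i.e. at the start
# of the attribute column or right after a ';'. Keys such as `gene_biotype`
# or `transcript_biotype` are therefore left untouched.
_BIOTYPE_KEY = re.compile(r'(^|;\s*)biotype(\s+")')


def rename_biotype(gtf_line, new_key="gene_biotype"):
    """Rename the `biotype` attribute key in one GTF line."""
    if gtf_line.startswith("#"):
        return gtf_line  # header or comment line
    newline = "\n" if gtf_line.endswith("\n") else ""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return gtf_line  # not a standard GTF record, leave it alone
    fields[8] = _BIOTYPE_KEY.sub(rf"\g<1>{new_key}\g<2>", fields[8])
    return "\t".join(fields) + newline


# Invented example record, only to show the effect of the rename.
sample = (
    "Chromosome\tena\tgene\t446\t1123\t.\t-\t.\t"
    'gene_id "g1"; biotype "protein_coding";\n'
)
print(rename_biotype(sample), end="")
```

Applied line by line to the converted file, this leaves every other column and attribute as it was.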

@ewels
Member

ewels commented Jun 7, 2019

so I had to replace it with "gene_biotype" myself in order for the pipeline to run until the end successfully.

Please see the usage docs on how to specify a different field instead of biotype: https://github.com/nf-core/rnaseq/blob/master/docs/usage.md#default-attribute-type
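
Before choosing which field to group by, it can help to check which attribute keys the converted GTF actually contains. The following is a rough sketch under the assumption that attributes are written as `key "value";` in the ninth column; the function name `attribute_keys` and the example records are made up for illustration, and unquoted values (for example plain numbers) are not counted.

```python
import re
from collections import Counter

# Attribute pairs in GTF column 9 look like: key "value";
_ATTRIBUTE = re.compile(r'([^\s;]+)\s+"[^"]*"')


def attribute_keys(gtf_lines):
    """Count how often each quoted attribute key occurs in column 9."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        counts.update(_ATTRIBUTE.findall(fields[8]))
    return counts


# Invented records, only to demonstrate the counting.
example = [
    "#!made-up header\n",
    'Chromosome\tena\tgene\t1\t900\t.\t+\t.\tgene_id "g1"; biotype "protein_coding";\n',
    'Chromosome\tena\texon\t1\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
for key, count in sorted(attribute_keys(example).items()):
    print(key, count)
```

On a real file the same function can be fed an open file handle, since it only iterates over lines.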

@drpatelh
Member

@silviamorins Did you manage to resolve this issue?

@silviamorins
Contributor Author

I have converted the .gff3 file to .gtf prior to starting the workflow, so I can't say whether performing this conversion inside the pipeline works now (I don't know if any changes have been applied?).

I just realized I had missed @ewels 's comments, sorry - I think his answer would have solved the problem I had with naming the features, but since I had already completed the analysis I haven't used it so far.

@drpatelh
Member

Ok. Thanks. Is it something you could test quite easily? E.g. just running the command that gave you the error with the appropriate modification that @ewels suggested and -r dev? It should give you an error straight away if it's still an issue. If so, I'll try and fix it on the next PR to dev 👍

@silviamorins
Contributor Author

It should be, yes, will do it and let you know!

@silviamorins
Contributor Author

silviamorins commented Jun 19, 2019

The flag --fcGroupFeaturesType suggested by @ewels worked perfectly for my needs.

The conversion from .gff3 to .gtf still gives me problems, though. With the command:

nextflow run nf-core/rnaseq -r dev -profile cfc --fasta 'Streptomyces_coelicolor_a3_2_.ASM20383v1.dna_rm.toplevel.fa' --gff 'Sequence/Streptomyces_coelicolor_a3_2_.ASM20383v1.43.chromosome.Chromosome.gff3' --reads '*_{R1,R2}.fastq.gz' -resume --skip_preseq --reverse_stranded --skip_genebody_coverage --fcGroupFeaturesType "biotype"

I get the following:
ERROR ~ No such variable: gtf_makeSTARindex

@drpatelh
Member

Thanks @silviamorins ! I suspect it's something to do with the optional channel creations when specifying --gff as opposed to --gtf. Have you tracked down the bug? If not, I'll look into it 👍

@silviamorins
Contributor Author

@drpatelh sorry, no... I have looked into the code but it seems fine to me. I am not very familiar with the channel structure, so I guess that might be where the issue is.

@silviamorins
Contributor Author

silviamorins commented Jul 5, 2019

I just ran into this again with another genome (apple). The content in here is parsed correctly, right? (i.e., is it passing the information on?)

@drpatelh
Member

drpatelh commented Jul 5, 2019

Sorry, this evaded my extensive TODO list. Apple genome! I didn't even know that existed 🥇

I've had a look and can't see anything obvious with the relevant channels. May have to create a minimal test case that reproduces the error...

@lpantano
Contributor

lpantano commented Jul 5, 2019

I think this line:

rnaseq/main.nf

Line 427 in 37f260d

if(params.gff){
that converts GFF into GTF needs to be at the very beginning, so that makeSTARindex uses that GTF to make the index. Right now makeSTARindex happens before convertGFFtoGTF. (I think that is the issue.)
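
To make the ordering hypothesis concrete, here is a toy model in Python. It is not Nextflow code and only assumes that the script is evaluated top to bottom, so a name must be defined before the first step that reads it. The process names and `gtf_makeSTARindex` come from this thread; the `gff` input name, the `Step` type and the `first_undefined` helper are made up for illustration.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Step(NamedTuple):
    name: str
    needs: Tuple[str, ...]    # names the step reads
    defines: Tuple[str, ...]  # names the step creates


def first_undefined(
    steps: Sequence[Step], predefined: Sequence[str] = ()
) -> Optional[Tuple[str, str]]:
    """Walk the steps in order and report the first use-before-define.

    Returns (step name, missing name), or None if every name is defined
    before it is read.
    """
    defined = set(predefined)
    for step in steps:
        for name in step.needs:
            if name not in defined:
                return step.name, name
        defined.update(step.defines)
    return None


# Order as described in the comment above: the index step comes first.
current_order = [
    Step("makeSTARindex", needs=("gtf_makeSTARindex",), defines=()),
    Step("convertGFFtoGTF", needs=("gff",), defines=("gtf_makeSTARindex",)),
]
# Suggested order: convert first, then build the index.
proposed_order = list(reversed(current_order))

# With --gff only the hypothetical `gff` input exists at the start.
print(first_undefined(current_order, predefined=("gff",)))
print(first_undefined(proposed_order, predefined=("gff",)))
```

In this model the current order reports `gtf_makeSTARindex` as missing at makeSTARindex, which matches the error message, while the reordered list reports nothing. Whether the real pipeline behaves the same way still needs to be tested.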

@silviamorins
Contributor Author

Hi @lpantano , you're right, if the processes are executed in order that must be it. I will try to insert the change in my fork tomorrow and test it; if the error doesn't show up anymore I'll open a PR. Thanks!

@drpatelh
Member

drpatelh commented Jul 8, 2019

This should be fixed now with:
#248

@drpatelh drpatelh closed this as completed Jul 8, 2019