Add warning when user provides transcript_fasta #753

Closed
lianov opened this issue Jan 25, 2022 · 1 comment

lianov commented Jan 25, 2022

Description of feature

Feature: Add a warning linked to user provided transcript_fasta

If a user provides a file to transcript_fasta, it should be a transcriptome with the same entries (transcript IDs) as the GTF file provided in gtf. Otherwise, the pipeline will fail with the error below.

(In this case, the error was caused by using concatenated non-coding and coding transcriptome files from Ensembl v.102 for the mouse references.)

[2022-01-10 23:10:02.162] [jointLog] [critical] Transcript ENSMUST00000196743 appeared in the BAM header, but was not in the provided FASTA file
[2022-01-10 23:10:02.162] [jointLog] [critical] Transcript ENSMUST00000196901 appeared in the BAM header, but was not in the provided FASTA file
[2022-01-10 23:10:02.162] [jointLog] [critical] Transcript ENSMUST00000197848 appeared in the BAM header, but was not in the provided FASTA file
[2022-01-10 23:10:02.162] [jointLog] [critical] Transcript ENSMUST00000200349 appeared in the BAM header, but was not in the provided FASTA file
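
When the log contains many of these lines, it can help to pull out just the transcript IDs so they can be compared against the FASTA. Below is a minimal sketch, assuming the message wording is exactly as in the lines quoted above; the function name `missing_transcripts` is made up for illustration and is not part of the pipeline.

```python
import re

# Pattern copied from the wording of the log lines quoted in this issue.
# Assumption: other versions of the tool may phrase the message differently.
MISSING_RE = re.compile(
    r"Transcript (\S+) appeared in the BAM header, "
    r"but was not in the provided FASTA file"
)


def missing_transcripts(log_text):
    """Return the transcript IDs reported as missing, in order of first appearance."""
    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys(MISSING_RE.findall(log_text)))
```

The function takes the log contents as a single string, for example the result of reading the log file with `open(path).read()`.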

This behavior has also been observed with transcriptome reference files from other sources such as GENCODE (see for example: https://nfcore.slack.com/archives/CE8SSJV3N/p1642697218225700)

The current workarounds for this issue are:

  1. Do not provide a file to transcript_fasta and let the pipeline generate one (via the RSEM_PREPAREREFERENCE_TRANSCRIPTS process).
  2. Generate a custom transcriptome that meets the criteria described above. This can be done in the same way as in RSEM_PREPAREREFERENCE_TRANSCRIPTS, or with similar tools such as gffread:
gffread -w Mus_musculus.GRCm38_GTF_matched_transcripts.fa -g ./Mus_musculus.GRCm38.dna.primary_assembly.fa Mus_musculus.GRCm38.102.gtf
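
Before passing a file to transcript_fasta, the match between the FASTA and the GTF can be checked ahead of time. The following is a rough sketch, not what the pipeline does internally. It assumes that the transcript ID is the first whitespace-delimited token of each FASTA header (and the first `|`-separated field, as in GENCODE headers), and that a trailing `.N` version suffix can be ignored on both sides. All function names are hypothetical.

```python
import re


def _normalize(tid, strip_version):
    """Optionally drop a trailing '.<digits>' version suffix from an ID."""
    return re.sub(r"\.\d+$", "", tid) if strip_version else tid


def gtf_transcript_ids(gtf_lines, strip_version=True):
    """Collect transcript_id values from the attribute column of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:  # gene-level records carry no transcript_id
            ids.add(_normalize(match.group(1), strip_version))
    return ids


def fasta_transcript_ids(fasta_lines, strip_version=True):
    """Collect IDs from FASTA header lines (assumed header layout, see above)."""
    ids = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        tokens = line[1:].split()
        if not tokens:
            continue  # empty header
        tid = tokens[0].split("|")[0]
        ids.add(_normalize(tid, strip_version))
    return ids


def missing_from_fasta(gtf_lines, fasta_lines, strip_version=True):
    """Return sorted transcript IDs present in the GTF but absent from the FASTA."""
    gtf_ids = gtf_transcript_ids(gtf_lines, strip_version)
    fasta_ids = fasta_transcript_ids(fasta_lines, strip_version)
    return sorted(gtf_ids - fasta_ids)
```

Both inputs are any iterables of lines, so open file handles can be passed directly. An empty result suggests that every transcript in the GTF has an entry in the FASTA under the stated assumptions; a non-empty result lists the IDs likely to trigger the error above.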

Such a warning would help users avoid this issue, which otherwise causes the pipeline to fail.

@drpatelh drpatelh added this to the 3.6 milestone Feb 7, 2022
drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue Feb 20, 2022
@drpatelh

Thanks @lianov! The pipeline will now generate the warning below if --transcript_fasta is provided:

image

Will be fixed in #770
