segmentation fault with long read data [ stringtie v2.2.1 ] #356

Closed
zhixue opened this issue Mar 3, 2022 · 7 comments

zhixue commented Mar 3, 2022

Hi,
thanks for this wonderful tool for long-read RNA analysis!

I am trying to run StringTie2 on raw rice ONT reads (FASTQ) by first aligning them with the uLTRA aligner (v0.0.4) and then passing the resulting sorted BAM file to StringTie (v2.2.1). I checked the BAM file in IGV and it looks fine.

I also split the BAM file by chromosome to narrow down the problem, and found that the 90,001st to 95,000th sorted reads trigger the segmentation fault.

The BAM file is here (5.8 MB): chr3_head90000_95000.bam.

The commands are as follows:

# ultra
uLTRA align $ref ${sample}.fq ${sample} --index ~/ricerna/ultra_bam/msu7_ultra_index --ont --t ${th} --prefix ${sample}_msu7 --use_NAM_seeds

# stringtie 
~/tool/stringtie-2.2.1.Linux_x86_64/stringtie -p 1 -L -l N1i1Chr3 -o chr3_head90000_95000.gtf chr3_head90000_95000.bam
#### Segmentation fault ####
gpertea self-assigned this Mar 3, 2022

gpertea (Owner) commented Mar 3, 2022

Thank you for reporting this and providing the debug data. It seems that there is a particular BAM record in the uLTRA output that StringTie has trouble parsing properly; I'll fix that shortly.

gpertea (Owner) commented Mar 3, 2022

The problem seems to be related to record 880af412-ef82-474a-9e85-e6df5784e5ac, whose alignment ends with an intron (the CIGAR string ends with =9I4=1X2=1X5D230N). That does not quite make sense by itself, unless it is a peculiar way of indicating that the read alignment ends exactly at an intron boundary. However, that alignment has no transcription strand assigned, which makes that explanation rather unlikely.

I can modify StringTie to ignore this kind of unusual alignment (a hanging intron with no terminal exon), but I suspect the problem is deeper: it could be an alignment bug, and perhaps it should be reported to the author of the uLTRA aligner.

Most SAM processing tools, including IGV, seem to silently ignore this issue, so I guess I'll do the same. That is certainly preferable to the current crash, which is caused by an unexpected structural anomaly: a mismatch between the number of "exons" and the number of introns detected in that alignment.
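To make the "exons vs. introns" mismatch concrete, here is a minimal shell sketch. The function count_exons_introns is a hypothetical helper (not StringTie or samtools code): it drops the operation lengths from a CIGAR string, splits the remaining operation letters at each N (intron), and counts a block as an exon only if it contains at least one aligned operation (M, = or X). A normally spliced alignment has one more exon than introns; the CIGAR tail quoted above yields equal counts.

```shell
# Hypothetical helper (not StringTie code): print the number of exon-like
# blocks and the number of introns (N operations) in a CIGAR string.
# A block between introns counts as an exon only if it has aligned bases.
count_exons_introns() {
  awk -v cigar="$1" 'BEGIN {
    ops = cigar
    gsub(/[0-9]/, "", ops)          # keep only the operation letters
    n = split(ops, blocks, "N")     # blocks separated by introns
    exons = 0
    for (i = 1; i <= n; i++)
      if (blocks[i] ~ /[MX=]/)      # block contains M, X or = operations
        exons++
    introns = (n > 0) ? n - 1 : 0   # every N separates two blocks
    print exons, introns
  }'
}
```

For example, count_exons_introns '10M200N15M' prints "2 1", while the quoted tail count_exons_introns '=9I4=1X2=1X5D230N' prints "1 1": the final intron has no exon after it.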

gpertea (Owner) commented Mar 3, 2022

Addressed by commit 996f585.

zhixue (Author) commented Mar 4, 2022

Thanks for your rapid response!

I re-downloaded the latest version of StringTie and this sample now runs successfully! I have also reported this case to the author of the uLTRA aligner.

Perhaps the uLTRA output contains something else unexpected in its SAM records, because another sample still causes a segmentation fault. Using the same approach, I narrowed the problem down to a subset of alignment records on Chr1, but I cannot infer anything more.

The BAM file is here (28 KB): Sample2_Chr1head10900_11000.bam.

The command is as follows:

# stringtie 
~/tool/stringtie_996f585/stringtie -p 1 -L -l S2c1 -o Sample2_Chr1head10900_11000.gtf Sample2_Chr1head10900_11000.bam
#### Segmentation fault ####

gpertea (Owner) commented Mar 4, 2022

Hmm, this is the same issue of a hanging intron with no terminal exon, but this time capped by an insertion (the CIGAR of read 03debcb9-2135-431b-bbf0-ff10c64983d1 ends with 1X3=3I1=59N2I).

I'll add a more robust check there: if no aligned operation (M, X or =) precedes the first intron (N) or follows the last intron, that intron should be discarded.
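The rule above can be sketched as a small shell function. has_hanging_intron is a hypothetical helper, not StringTie's actual implementation: it keeps only the CIGAR operation letters and prints "yes" when no M, = or X appears before the first N or after the last N, which covers both the plain trailing intron and the intron capped by an insertion.

```shell
# Hypothetical helper (not StringTie code): print "yes" if a CIGAR string has
# a hanging intron, i.e. no aligned operation (M, = or X) before the first N
# or after the last N; otherwise print "no".
has_hanging_intron() {
  local ops before after
  ops=$(printf '%s' "$1" | tr -d '0-9')   # "1X3=3I1=59N2I" -> "X=I=NI"
  case $ops in
    *N*) ;;                               # at least one intron: check both ends
    *) echo no; return 0 ;;               # no N operation, nothing can hang
  esac
  before=${ops%%N*}                       # operations before the first N
  after=${ops##*N}                        # operations after the last N
  case $before in
    *[MX=]*) ;;
    *) echo yes; return 0 ;;
  esac
  case $after in
    *[MX=]*) echo no ;;
    *) echo yes ;;
  esac
}
```

Both CIGAR tails reported in this thread ('=9I4=1X2=1X5D230N' and '1X3=3I1=59N2I') print "yes", while an ordinary spliced alignment such as '10M200N15M' prints "no". Looking only at operation letters ignores soft clips and insertions as anchors on purpose, since they do not place any read base on the reference.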

gpertea reopened this Mar 4, 2022
gpertea (Owner) commented Mar 4, 2022

Addressed by 62551bb.

gpertea closed this as completed Mar 4, 2022
zhixue (Author) commented Mar 5, 2022

It works.
Thank you~
