CIGAR operations out of range after splice junction correction #16

msto · 2019-09-27T02:43:49Z

Hi,

Since the update in #14 to update transcripts after correcting splice junctions, I get the following error. I think there may be a bug in the rescueNoncanonicalJunction function, which results in a CIGAR string incompatible with the sequence (i.e. more match/insert/sub operations than there are characters in the sequence), but I haven't been able to trace the exact source of the error. Could you please look into it if you have the chance?

Thanks!

  File "bin/TranscriptClean_fork/TranscriptClean.py", line 1042, in <module>
    main()
  File "bin/TranscriptClean_fork/TranscriptClean.py", line 171, in main
    writeTranscriptOutput(noncanTranscripts, sjDict, oSam, oFa, transcriptLog, genome)
  File "bin/TranscriptClean_fork/TranscriptClean.py", line 184, in writeTranscriptOutput
    outSam.write(Transcript2.printableSAM(currTranscript, genome, spliceAnnot) + "\n")
  File "/scratch/mrstone3/pacbio_iPSC/bin/TranscriptClean_fork/transcript2.py", line 294, in printableSAM
    self.NM, self.MD = self.getNMandMDFlags(genome)
  File "/scratch/mrstone3/pacbio_iPSC/bin/TranscriptClean_fork/transcript2.py", line 360, in getNMandMDFlags
    currBase = self.SEQ[seqPos]
IndexError: string index out of range```

The text was updated successfully, but these errors were encountered:

dewyman · 2019-09-27T03:52:45Z

Hi Matt,
Have you tried running the TranscriptClean 2.0 version on this data instead of the patch update? I am hopeful that my test cases covered this issue in the new version of the code, although if this error happens there too I would certainly want to know so that I can address it. Another benefit of the most recent version is that you can run it on multiple cores for a better runtime.
-Dana

dewyman · 2019-10-01T22:36:59Z

Update: I was able to find an example in some data I had of a noisy read that triggered a possibly related problem during non canonical splice junction correction. I fixed the problem and added more tests. The patch will be included in v2.0.2. Hope this helps!

msto · 2019-11-04T17:10:00Z

Hi Dana,

Thanks! The patch fixed that issue. Sorry to bother you again, but I found another bug that I was hoping you could take a look at. I'm not sure if it's related to the problem you found, so I thought I'd post it here.

I'm finding that after correcting noncanonical splice junctions, some transcripts have all of their canonical splice junctions shifted by 1 bp. I've attached an example of an affected transcript here. The issue appears when correcting with respect to either the gencode reference junctions or a set of junctions from Illumina sequencing of the same sample. Please let me know if you can't reproduce and I can share the gencode file I'm using.

Thanks,

Matt

transcriptclean_example.tar.gz

dewyman · 2019-11-05T23:07:01Z

Hi Matt,
Thank you for bringing this to my attention. I will take a look at the files you sent and see if I can reproduce the issue.
Best,
Dana

dewyman · 2019-11-14T01:00:29Z

Hi Matt,
So I tried to reproduce the issue with the latest TranscriptClean version (including reducing it down to a small test case of one junction), and I have not been able to get the same result. Here is the link to a UCSC genome browser session I am using to visualize the reads and get to the bottom of what is going on. The uncorrected and corrected files shown are the ones you sent me, and the one marked 'dana_run_corrected' is the corrected file I obtained when I ran my local version of TC on your file. Using the genome browser, I noticed in your corrected file that the sequence looks messed up right after the leftmost splice junction ends, which made me think that maybe the sequence was being handled incorrectly in some way that led to everything after the junction being shifted over. But as I say, I haven't been able to reproduce the problem.
Let me know what you think- it would be great to get to the bottom of this!
Best,
Dana

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CIGAR operations out of range after splice junction correction #16

CIGAR operations out of range after splice junction correction #16

msto commented Sep 27, 2019

dewyman commented Sep 27, 2019

dewyman commented Oct 1, 2019 •

edited

msto commented Nov 4, 2019

dewyman commented Nov 5, 2019

dewyman commented Nov 14, 2019

CIGAR operations out of range after splice junction correction #16

CIGAR operations out of range after splice junction correction #16

Comments

msto commented Sep 27, 2019

dewyman commented Sep 27, 2019

dewyman commented Oct 1, 2019 • edited

msto commented Nov 4, 2019

dewyman commented Nov 5, 2019

dewyman commented Nov 14, 2019

dewyman commented Oct 1, 2019 •

edited