Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frame issues #30

Open
komalsrathi opened this issue Aug 22, 2019 · 4 comments
Open

Frame issues #30

komalsrathi opened this issue Aug 22, 2019 · 4 comments

Comments

@komalsrathi
Copy link

Hi,

I have Arriba fusion caller results (run with Gencode v27 equivalent to Ensembl 90) and wanted to test the following out-of-frame fusion:

#gene1 gene2 strand1(gene/fusion) strand2(gene/fusion) breakpoint1 breakpoint2 site1 site2 type direction1 direction2 split_reads1 split_reads2 discordant_mates coverage1 coverage2 confidence closest_genomic_breakpoint1 closest_genomic_breakpoint2 filters fusion_transcript reading_frame peptide_sequence read_identifiers sample_name
CHMP1A DPEP1 -/- +/- 16:89651569 16:89625844 splice-site intron deletion/read-through/5'-5' upstream downstream 7 6 1 118 36 high . . duplicates(4),low_entropy(1) GCTTGGTCGGTTCGATCGCCGCCGGGACCTGACACCGCCCGGAGTTGGCGTCCCTTCTCCCTCTCCGAGTGCTGCTCCTGTCATTGTGGCCATGGACG___ATACCCTGTTCCAGTTGAAG___TTCACGGCGAAGCAGCTGGAGAAGCTGGCCAAGAAGGCGGAGAAGGACTCCAAGGCGGAGCAGGCCAAAGTGAAGAAG|TCTCTTCGTCTGTCAAGTAGAAGTCATGGATGTACCTACTTTACGACAGTCCTGCTGTGAGCACAAAGGTACCAGCCACTCAGATCACCCTCGGATCAAG___CTCTGTCTCCCAGAACCACACAGAAGCCCCATCCACAGCCAACATGCAGACGCCACCGTGGCCATTTCGGATGATACCACTTCTGCACAGGCAGCCACAGGCAATGCGGACCTACCCACGCCCCCCACACACAGATCATCGCAG out-of-frame MDDTLFQLKFTAKQLEKLAKKAEKDSKAEQAKVKK|slrlssrshgctyfttvll* . 0062396f-103a-4c25-bfdc-a746782768d9

I ran agfusion like this:

agfusion annotate \
--gene5prime CHMP1A \
--gene3prime DPEP1 \
--junction5prime 89651569 \
--junction3prime 89625844 \
-db agfusion.homo_sapiens.90.db \
-o CHMP1A-DPEP1

I got quite the opposite result from agfusion, predicting it to be an in-frame fusion:

$ cat CHMP1A-DPEP1.fusion_transcripts.txt
5'_gene,3'_gene,5'_transcript,3'_transcript,5'_strand,3'_strand,5'_transcript_biotype,3'_transcript_biotype,Fusion_effect,Protein_length,Protein_weight_(kD)
CHMP1A,DPEP1,ENST00000397901,ENST00000393092,-,+,protein_coding,protein_coding,in-frame,446,49.6940567

I am not sure why that is happening, any ideas/suggestions would be much appreciated.

@murphycj
Copy link
Owner

Thanks for bringing up this issue. I will take a look at this later tonight/tomorrow!

@komalsrathi
Copy link
Author

Hi @murphycj

Did you get a chance to go over this? Thank you!

@murphycj
Copy link
Owner

Sorry for the belated reply. I was on vacation.

I've been looking at your case, and I see two issues here:

  1. According to Ensembl annotation, the junction for DPEP1 never lands within the coding region. It usually lands between two exon that encode portions of the 5'UTR. So this should definitely not be called in-frame. I'll have to fix this.
  2. There seems to be some kind of annotation difference between Ensembl and Gencode. Here are the transcript annotations I used for both genes on Ensembl:

CHMP1A
DPEP1

I used the coordinates and nearby sequences you provided for your fusion to manually compare with what's on Ensembl:

Gene Relative exon number Your results (Gencode) Ensembl
CHMP1A Exon -3 GCTTGGTCGGTTCGATCGCCGCCGGGACCTGACACCGCCCGGAGTTGGCGTCCCTTCTCCCTCTCCGAGTGCTGCTCCTGTCATTGTGGCCATGGACG GGCGACCCCGGAAGTCCCCGCCGGGTGCAGCTTGGTCGGTTCGATCGCCGCCGGGACCTGACACCGCCCGGAGTTGGCGTCCCTTCTCCCTCTCCGAGTGCTGCTCCTGTCATTGTGGCCATGGACG
CHMP1A Exon -2 ATACCCTGTTCCAGTTGAAG ATACCCTGTTCCAGTTGAAG
CHMP1A Exon -1 TTCACGGCGAAGCAGCTGGAGAAGCTGGCCAAGAAGGCGGAGAAGGACTCCAAGGCGGAGCAGGCCAAAGTGAAGAAG TTCACGGCGAAGCAGCTGGAGAAGCTGGCCAAGAAGGCGGAGAAGGACTCCAAGGCGGAGCAGGCCAAAGTGAAGAAG
DPEP1 Exon +1 TCTCTTCGTCTGTCAAGTAGAAGTCATGGATGTACCTACTTTACGACAGTCCTGCTGTGAGCACAAAGGTACCAGCCACTCAGATCACCCTCGGATCAAG GCAGAGTGGCTCCTCACAGCCTGAAGCTCATCCTTCTGCACGGGCCAGCCAGGCCAGCACAGAGGCACCAGGGCAGCAGTGCACACAGGTCCCCGGGGACCCCACCATGTGGAGCGGATGGTGGCTGTGGCCCCTTGTGGCCGTCTGCACTGCAGACTTCTTTCGGGACGAGGCAGAGAGGATCATGAGGGACTCCCCTGTCATTGATGG
DPEP1 Exon +2 CTCTGTCTCCCAGAACCACACAGAAGCCCCATCCACAGCCAACATGCAGACGCCACCGTGGCCATTTCGGATGATACCACTTCTGCACAGGCAGCCACAGGCAATGCGGACCTACCCACGCCCCCCACACACAGATCATCGCAG GCACAATGACCTCCCCTGGCAGCTGCTGGATATGTTCAACAACCGGCTGCAGGACGAGAGGGCCAACCTGACCACCTTGGCCGGCACACACACCAACATCCCCAAGCTGAGGGCCGGCTTTGTGGGAGGCCAG

As you can see the CHMP1A portion that I pulled from Ensembl matches what you got, but the DPEP1 portion does not. According to your fusion, the junction position for DPEP1 is 89625844, which lands somewhere in the intron between the first two exons. Hence, the DEPEP1 portion of this fusion will include the full exon 2 and onward in the cDNA.

Let me know if I am misunderstanding something.

@komalsrathi
Copy link
Author

@murphycj I think definitely 1 is an issue. For 2, are you looking at Ensembl 90?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants