Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run error: panic: start must be less than end #2

Open
TendoLiu opened this issue Nov 4, 2019 · 8 comments
Open

Run error: panic: start must be less than end #2

TendoLiu opened this issue Nov 4, 2019 · 8 comments

Comments

@TendoLiu
Copy link

TendoLiu commented Nov 4, 2019

bash-4.2$ bio-fusion \

-r1=$R1
-r2=$R2
-transcript=$transcript
I1104 12:57:44.433680 11841 main.go:336] Start reading geneDB
I1104 12:57:44.556241 11841 gene_db.go:310] Reading transcriptome /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa denovo
panic: start must be less than end

goroutine 90 [running]:
github.com/grailbio/bio/fusion.(*GeneDB).ReadTranscriptome.func5(0xc000050f00, 0xc00049a5c0, 0xc0003d2a20, 0xc0004865a0, 0xc00024b530, 0xc0003ea050)
/ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:356 +0x2bd
created by github.com/grailbio/bio/fusion.(*GeneDB).ReadTranscriptome
/ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:345 +0x3bb

@eksyang
Copy link

eksyang commented Nov 4, 2019

can you confirm where you have obtained $transcript file thanks

@TendoLiu
Copy link
Author

TendoLiu commented Nov 4, 2019

Thank you so much. Please see my script and the log.

Script:
`module load af4
hg19_UCSC=/bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/ucsc.hg19.fasta/ucsc.hg19.fasta
anno=/bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/Other/annotation/gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz

bio-fusion -generate-transcriptome
-output /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa
-keep-mitochondrial-genes
-keep-readthrough-transcripts
-keep-pary-locus-transcripts
-keep-versioned-genes
$anno $hg19_UCSC

R1=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion/fastq_merged/Sample1/Sample1.R1.fastq
R2=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion/fastq_merged/Sample1/Sample1.R2.fastq
transcript=/bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa
out=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion

bio-fusion
-r1=$R1
-r2=$R2
-umi-in-name
-fasta-output=$out/test.out.fa
-filtered-output=$out/test.filtered.fa
-transcript=$transcript
-output $out
`
Log:

`2019/11/04 15:48:41 Creating /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa
I1104 15:48:41.464042 15119 parsegencode.go:246] GTF: /bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/Other/annotation/gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz
I1104 15:48:50.646038 15119 parsegencode.go:248] Read 66738 genes, 247086 transcripts, 1478937 exons
I1104 15:49:06.822649 15119 parsegencode.go:358] Successfully parsed 66738 genes, 247086 transcripts and 1478937 exons
I1104 15:49:06.822692 15119 parsegencode.go:360] Retained 247086 transcripts and 1478937 exons
E1104 15:49:33.213128 15119 parsegencode.go:490] invalid query range 63686370 - 63688050 for sequence chr20 with length 63025520
panic: invalid query range 63686370 - 63688050 for sequence chr20 with length 63025520

goroutine 1 [running]:
github.com/grailbio/base/log.Panic(0xc0004c99d8, 0x1, 0x1)
/ihome/crc/install/af4/src/github.com/grailbio/base/log/log.go:168 +0xc7
github.com/grailbio/bio/fusion/parsegencode.PrintParsedGTFRecords(0x1042300, 0xc00025ea40, 0x104a020, 0xc07dd83260, 0xc07c132000, 0x104b2, 0x104b2, 0x1010101000000)
/ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/parsegencode/parsegencode.go:490 +0x576
github.com/grailbio/bio/fusion/cmd.GenerateTranscriptome(0x104bc20, 0xc00047d2c0, 0x7fff2027caa0, 0x8f, 0x7fff2027cb30, 0x6b, 0x0, 0x7fff2027c9d8, 0x5c, 0x0, ...)
/ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/cmd/generate_transcriptome.go:114 +0x617
github.com/grailbio/bio/fusion/cmd.Run()
/ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/cmd/main.go:558 +0xaa8
main.main()
/ihome/crc/install/af4/src/github.com/grailbio/bio/cmd/bio-fusion/main.go:9 +0x20

I1104 15:50:30.178381 15172 main.go:336] Start reading geneDB
I1104 15:50:30.256422 15172 gene_db.go:310] Reading transcriptome /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa denovo
panic: start must be less than end

goroutine 90 [running]:
github.com/grailbio/bio/fusion.(*GeneDB).ReadTranscriptome.func5(0xc000050f00, 0xc000496580, 0xc0003c8a20, 0xc000476120, 0xc0002e5350, 0xc0003e0030)
/ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:356 +0x2bd
created by github.com/grailbio/bio/fusion.(*GeneDB).ReadTranscriptome
/ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:345 +0x3bb

`

@TendoLiu
Copy link
Author

TendoLiu commented Nov 4, 2019

It seems that there was a "panic" when generating the transcriptome file, but the file was produced anyway. Note that I am using a newer version of annotation file. And the reference genome is hg19 instead of hg38 in the example code.

@TendoLiu
Copy link
Author

TendoLiu commented Nov 4, 2019

gencode.v26.whole_genes.txt
Just in case you want to look at the transcriptome file. I converted .fa to .txt to be able to upload.

@eksyang
Copy link

eksyang commented Nov 4, 2019

Thanks, will look into it

@TendoLiu
Copy link
Author

Any follow-up?

@eksyang
Copy link

eksyang commented Dec 21, 2019

sorry, at the moment, I do not have linux machine to debug into it. But after relook into the issue, it appears that the sequence chr20 in the provided fasta file has length 63025520, however, there seems to be a gene in the gtf in the range of 63686370 - 63688050 in "chr20". The only issue I can think of is that the gtf file you used ($anno = gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz) and the reference fasta file ($hg19_UCSC = ucsc.hg19.fasta) are not compatible with each other. possible to confirm that ?

@eksyang
Copy link

eksyang commented Dec 21, 2019

hi TendoLiu, a colleague offerred to help after holidays. In order to debug into this, could you share or point to the link to obtain the two files gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz and ucsc.hg19.fasta (if you cannot yet confirm the annotation and the fasta file do not match). Sorry for the delay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants