Skip to content
This repository has been archived by the owner on May 3, 2024. It is now read-only.

Problem running with cage_peak #63

Closed
PJ-Shaw opened this issue Apr 5, 2020 · 2 comments
Closed

Problem running with cage_peak #63

PJ-Shaw opened this issue Apr 5, 2020 · 2 comments

Comments

@PJ-Shaw
Copy link

PJ-Shaw commented Apr 5, 2020

When I ran SQANTI2 using the option of providing a cage_peak file and SQANTI2 aborted without finishing the run. When I ran again leaving out this option, it worked OK suggesting that there's a problem with my cage_peak file. My cage_peak file is made from CAGEr output that looks like this:

(anaCogent3) philip@BT591-001:/mnt/d$ head merged_TSS.bed
Pf3D7_01_v3 5620 5621 cluster1 1000 -
Pf3D7_01_v3 5671 5702 cluster2 1000 -
Pf3D7_01_v3 6351 6362 cluster3 1000 -
Pf3D7_01_v3 6395 6423 cluster4 1000 -
Pf3D7_01_v3 6474 6483 cluster5 1000 -
Pf3D7_01_v3 7770 7789 cluster6 1000 -
Pf3D7_01_v3 35285 35286 cluster7 1000 +
Pf3D7_01_v3 35330 35347 cluster8 1000 +
Pf3D7_01_v3 35404 35412 cluster9 1000 +
Pf3D7_01_v3 35437 35463 cluster10 1000 +

This is what showed up when running SQANTI2:

(anaCogent3) philip@BT591-001:/mnt/d$ python ~/cDNA_Cupcake/sequence/SQANTI2/sqanti_qc2.py /mnt/d/ringschiz_sqanti2.fasta /mnt/d/Pf3D7_RNAsequins_combined.gtf /mnt/d/PlasmoDBv44_Pf3D7_RNAsequins_combined.fasta --cage_peak merged_TSS.bed --polyA_motif_list PAS.txt --polyA_peak sorted_Pf_polyA_sites.bed
R scripting front-end version 3.6.1 (2019-07-05)
Cleaning up isoform IDs...
Cleaned up isoform fasta file written to: /mnt/d/ringschiz_sqanti2.renamed.fasta
Write arguments to /mnt/d/ringschiz_sqanti2.params.txt...
**** Running SQANTI2...
**** Parsing provided files....
Reading genome fasta /mnt/d/PlasmoDBv44_Pf3D7_RNAsequins_combined.fasta....
Aligning reads with Minimap2...
[M::mm_idx_gen::2.410
1.00] collected minimizers
[M::mm_idx_gen::3.112
1.61] sorted minimizers
[M::main::3.113
1.61] loaded/built the index for 17 target sequence(s)
[M::mm_mapopt_update::3.303
1.58] mid_occ = 150
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 17
[M::mm_idx_stat::3.4161.56] distinct minimizers: 7046944 (83.50% are singletons); average occurrences: 1.697; average spacing: 2.835
[M::worker_pipeline::25.745
3.54] mapped 42700 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax splice --secondary=no -C5 -uf -t 10 /mnt/d/PlasmoDBv44_Pf3D7_RNAsequins_combined.fasta /mnt/d/ringschiz_sqanti2.renamed.fasta
[M::main] Real time: 25.810 sec; CPU: 91.281 sec; Peak RSS: 0.776 GB
output written to
**** Predicting ORF sequences...
**** Parsing Reference Transcriptome....
**** Parsing Isoforms....
Splice Junction Coverage files not provided.
**** Reading CAGE Peak data.
Traceback (most recent call last):
File "/home/philip/cDNA_Cupcake/sequence/SQANTI2/sqanti_qc2.py", line 2198, in
main()
File "/home/philip/cDNA_Cupcake/sequence/SQANTI2/sqanti_qc2.py", line 2191, in main
run(args)
File "/home/philip/cDNA_Cupcake/sequence/SQANTI2/sqanti_qc2.py", line 1672, in run
isoforms_info = isoformClassification(args, isoforms_by_chr, refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene, genome_dict, indelsJunc, orfDict)
File "/home/philip/cDNA_Cupcake/sequence/SQANTI2/sqanti_qc2.py", line 1395, in isoformClassification
cage_peak_obj = CAGEPeak(args.cage_peak)
File "/home/philip/cDNA_Cupcake/sequence/SQANTI2/sqanti_qc2.py", line 1896, in init
self.read_bed()
File "/home/philip/cDNA_Cupcake/sequence/SQANTI2/sqanti_qc2.py", line 1905, in read_bed
tss0 = int(raw[6])
IndexError: list index out of range

@Magdoll
Copy link
Owner

Magdoll commented Apr 5, 2020

Hi @PJ-Shaw ,
SQANTI2 is using the format used by FANTOM5 for their CAGE Peak Bed. See data here. Additional info on the refTSS is in their publication

The required columns are

column 1 - chromosome
column 2 - cage peak signal start
column 3 - cage peak signal end
column 6 - strand
column 7 - cage peak "tss" <--- this is what FANTOM5 considers as the actual TSS

Example below

chr1    629886  629898  hg_3.1  1       +       629892  629893  255,255,0
chr1    629904  629938  hg_223433.1     1       +       629921  629922  255,255,0
chr1    629945  629967  hg_223434.1     1       -       629956  629957  255,255,0
chr1    630059  630067  hg_223435.1     1       +       630063  630064  255,255,0
chr1    630098  630103  hg_4.1  1       +       630100  630101  255,255,0
chr1    630129  630161  hg_5.1  1       +       630145  630146  255,255,0
chr1    630225  630237  hg_236023.1     1       +       630231  630232  255,255,0

@PJ-Shaw
Copy link
Author

PJ-Shaw commented Apr 6, 2020

Thank you for spotting the error. I added a 7th column with the "dominant_ctss" nucleotide reported by CAGEr to my bed file and SQANTI2 worked, reporting isoforms "within_cage_peak".

@Magdoll Magdoll closed this as completed Apr 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants