Isoforms defined by reads with high fraction A (>0.5) #129

callumparr · 2023-04-26T10:35:43Z

We are currently filtering out models that have supporting reads with evidence of internal priming.

I've seen a case for permissive where the model's 3' boundary is defined by a read with fracA > 0.5 that just happened to be the first one added to the TALON database.

Is this the correct behaviour for TALON? It seems risky to build novel isoforms from reads that might result from internal priming. I admit it is in the genomic category, so we should probably exclude it anyway. I was just wondering.

❯ grep "TALONT000262684" talon_read_annot/F6_interactome_all_talon_read_annot.tsv
46661e1c-8c22-4bf9-a80d-2246ab8db3c1	iPSC_rep1	hg38	chr3	6522081	6524300	+	1	2209	9894	262684	ENSG00000189229.12	TALONT000262684	ENSG00000189229	TALONT000262684	Known	Genomic	None	0.75	None	None	None	None
1354b452-ccd8-475b-8641-ad9ec0528924	NSC_rep2	hg38	chr3	6523369	6523942	+	1	574	9894	262684	ENSG00000189229.12	TALONT000262684	ENSG00000189229	TALONT000262684	Known	Genomic	None	0.5	None	None	None	None

❯ gunzip -c annotation/F6_interactome_TALON_table1_talon.gencode.v39.gtf.gz| grep "TALONT000262684"
chr3	TALON	transcript	6522081	6524300	.	+	.	gene_id "ENSG00000189229.12"; transcript_id "TALONT000262684"; gene_name "ENSG00000189229"; gene_status "KNOWN"; gene_type "lncRNA"; transcript_status "NOVEL"; transcript_name "TALONT000262684"; talon_gene "9894"; talon_transcript "262684"; genomic_transcript "TRUE";
chr3	TALON	exon	6522081	6524300	.	+	.	gene_id "ENSG00000189229.12"; transcript_id "TALONT000262684"; gene_type "lncRNA"; gene_status "KNOWN"; gene_name "ENSG00000189229"; transcript_status "NOVEL"; transcript_name "TALONT000262684"; exon_number "1"; exon_id "1077077"; talon_gene "9894"; talon_transcript "262684"; talon_exon "1077077"; exon_status "NOVEL";
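
To see how many models are affected, something like the following could be run over the read annotation table. This is only a rough sketch: the function name `high_frac_a_support` is hypothetical, and the column positions (transcript ID in the 13th column, fraction of As in the 19th) are read off the two rows shown above rather than taken from the TALON documentation, so they should be checked against the actual file header.

```python
import csv
import io
from collections import defaultdict


def high_frac_a_support(tsv_text, threshold=0.5, tx_col=12, frac_col=18):
    """Map transcript ID -> (reads with fracA > threshold, total reads).

    Only transcripts with at least one read above the threshold are returned.
    Column indices are 0-based and are an assumption based on the rows
    shown in this issue.
    """
    counts = defaultdict(lambda: [0, 0])
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) <= max(tx_col, frac_col):
            continue  # malformed or short line
        try:
            frac = float(row[frac_col])
        except ValueError:
            continue  # header line or a missing value such as "None"
        entry = counts[row[tx_col]]
        entry[1] += 1
        if frac > threshold:
            entry[0] += 1
    return {tx: tuple(c) for tx, c in counts.items() if c[0] > 0}
```

For the two reads above this would report one of the two supporting reads of TALONT000262684 as exceeding 0.5 (the 0.75 read), while the other sits exactly at 0.5.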

fairliereese · 2023-05-24T23:28:27Z

Hi,
Unfortunately, the only filtering that TALON does based on internal priming happens after the TALON run. If you don't want these reads to be used to generate transcript models, I would try filtering out the alignments in your SAM / BAM files that have high A content at the 3' end before running TALON, as that might get you closer to your desired output.

callumparr · 2023-05-25T04:29:37Z

Yes, that is what we are doing now: after the TALON label step we remove such reads, and only then run TALON.
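
A minimal sketch of that removal step, for anyone landing here later. It assumes the labelled SAM carries the fraction of As in an `fA:f:` optional tag (as I understand the label step to write it; please verify on your own output), and the function name `drop_high_frac_a` is hypothetical. Reads without the tag are kept, which is a design choice you may want to reverse.

```python
def drop_high_frac_a(sam_lines, max_frac_a=0.5, tag="fA"):
    """Yield SAM header lines and reads whose fraction-of-As tag is <= max_frac_a.

    Assumes the tag is a float-typed optional field, e.g. "fA:f:0.75".
    Reads that lack the tag are passed through unchanged.
    """
    prefix = tag + ":f:"
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are always kept
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional fields start after the 11 mandatory SAM columns.
        frac = next(
            (float(f[len(prefix):]) for f in fields[11:] if f.startswith(prefix)),
            None,
        )
        if frac is None or frac <= max_frac_a:
            yield line


# Example use (file names are placeholders):
# with open("labeled.sam") as src, open("filtered.sam", "w") as dst:
#     dst.writelines(drop_high_frac_a(src))
```

The threshold is inclusive on the keep side, so a read with a value of exactly 0.5 is retained and only reads above 0.5 are dropped.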

Thank you for the reply

callumparr closed this as completed May 25, 2023
