Isoforms defined by reads with high fraction A (>0.5) #129

Closed
callumparr opened this issue Apr 26, 2023 · 2 comments

@callumparr

We are currently filtering out models that have supporting reads with evidence of internal priming.

I've seen a case in the permissive output where the model's 3' boundary is defined by a read with fracA > 0.5 that happened to be added to the TALON database first.

Is this the intended behaviour of TALON? It seems risky to build novel isoforms from reads that may result from internal priming. I admit this one is in the genomic category, so it should probably be excluded anyway; I was just wondering.

❯ grep "TALONT000262684" talon_read_annot/F6_interactome_all_talon_read_annot.tsv
46661e1c-8c22-4bf9-a80d-2246ab8db3c1	iPSC_rep1	hg38	chr3	6522081	6524300	+	1	2209	9894	262684	ENSG00000189229.12	TALONT000262684	ENSG00000189229	TALONT000262684	Known	Genomic	None	0.75	None	None	None	None
1354b452-ccd8-475b-8641-ad9ec0528924	NSC_rep2	hg38	chr3	6523369	6523942	+	1	574	9894	262684	ENSG00000189229.12	TALONT000262684	ENSG00000189229	TALONT000262684	Known	Genomic	None	0.5	None	None	None	None
❯ gunzip -c annotation/F6_interactome_TALON_table1_talon.gencode.v39.gtf.gz| grep "TALONT000262684"
chr3	TALON	transcript	6522081	6524300	.	+	.	gene_id "ENSG00000189229.12"; transcript_id "TALONT000262684"; gene_name "ENSG00000189229"; gene_status "KNOWN"; gene_type "lncRNA"; transcript_status "NOVEL"; transcript_name "TALONT000262684"; talon_gene "9894"; talon_transcript "262684"; genomic_transcript "TRUE";
chr3	TALON	exon	6522081	6524300	.	+	.	gene_id "ENSG00000189229.12"; transcript_id "TALONT000262684"; gene_type "lncRNA"; gene_status "KNOWN"; gene_name "ENSG00000189229"; transcript_status "NOVEL"; transcript_name "TALONT000262684"; exon_number "1"; exon_id "1077077"; talon_gene "9894"; talon_transcript "262684"; talon_exon "1077077"; exon_status "NOVEL";
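Filtering out models that have supporting reads with evidence of internal priming can start from the read annotation file shown above. The sketch below is a minimal illustration, not part of TALON: it assumes the file has a header row naming the columns `read_name`, `annot_transcript_id` and `fraction_As` (the grep output above omits the header), and the function name `flag_internal_priming` and the strict `> 0.5` cutoff are hypothetical choices taken from the issue title.

```python
import csv
import io
from collections import defaultdict


def flag_internal_priming(read_annot_lines, max_frac_a=0.5):
    """Map each transcript ID to the supporting reads whose fraction_As
    exceeds max_frac_a (a possible sign of internal priming).

    read_annot_lines: any iterable of tab-separated lines with a header row,
    e.g. an open *_talon_read_annot.tsv file. Column names are assumed.
    """
    flagged = defaultdict(list)
    for row in csv.DictReader(read_annot_lines, delimiter="\t"):
        frac = row["fraction_As"]
        # Reads without a fraction-A value are skipped
        if frac in ("None", "NA", ""):
            continue
        if float(frac) > max_frac_a:
            flagged[row["annot_transcript_id"]].append(row["read_name"])
    return dict(flagged)


if __name__ == "__main__":
    # The two reads from the grep above, reduced to the three columns used here
    demo = io.StringIO(
        "read_name\tannot_transcript_id\tfraction_As\n"
        "46661e1c-8c22-4bf9-a80d-2246ab8db3c1\tTALONT000262684\t0.75\n"
        "1354b452-ccd8-475b-8641-ad9ec0528924\tTALONT000262684\t0.5\n"
    )
    # Only the 0.75 read exceeds the cutoff, so TALONT000262684 is flagged once
    print(flag_internal_priming(demo))
```

With a strict `>` comparison the second read (fraction_As exactly 0.5) is not flagged; use `>=` if borderline reads should count too.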

@fairliereese
Member

Hi,
Unfortunately, the only filtering TALON does based on internal priming happens after the TALON run. If you don't want these reads to be used to build transcript models, I would try filtering your SAM/BAM alignments to remove reads with high A content at the 3' end before running TALON; that should get you closer to the output you want.
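One way to measure "high A content at the 3' end" is the fraction of A in the genomic bases immediately downstream of where the read ends, since internal priming happens when the oligo-dT primer anneals to a genomic A-rich stretch. The sketch below is illustrative only: the function names, the 20 bp window and the 0.5 cutoff are assumptions, and it is not taken from TALON's labeling code, so check the `talon_label_reads` options for the window it actually uses.

```python
def fraction_a_downstream(genome_seq, read_start, read_end, strand, window=20):
    """Fraction of A in the `window` genomic bases just past a read's 3' end.

    Coordinates are 1-based and inclusive (as in SAM/GTF); genome_seq is the
    plus-strand chromosome sequence as a string. On the minus strand the
    3' end is read_start, and a genomic A run appears as T on the plus strand.
    """
    seq = genome_seq.upper()
    if strand == "+":
        # 1-based bases read_end+1 .. read_end+window
        region = seq[read_end:read_end + window]
        target = "A"
    elif strand == "-":
        # 1-based bases read_start-window .. read_start-1
        region = seq[max(0, read_start - 1 - window):read_start - 1]
        target = "T"
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    if not region:  # read ends at the chromosome edge
        return 0.0
    return region.count(target) / len(region)


def looks_internally_primed(genome_seq, read_start, read_end, strand,
                            max_frac_a=0.5, window=20):
    """True if the downstream A fraction exceeds the cutoff (hypothetical 0.5)."""
    return fraction_a_downstream(genome_seq, read_start, read_end,
                                 strand, window) > max_frac_a
```

Reads for which `looks_internally_primed` is true would be dropped from the alignments before TALON sees them.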

@callumparr
Author

Yes, we are now doing this: after the TALON label step we remove such reads and then run TALON.

Thank you for the reply.
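Removing reads after the label step and before running TALON can be done with a simple pass over the labeled SAM file. This is a minimal sketch under an assumption: that the labeled SAM stores each read's fraction of As in an optional field named `fA` with type `f` (for example `fA:f:0.75`). Inspect your labeled SAM to confirm the tag name, which is exposed as a parameter here. The helper names are hypothetical, and BAM input would first need converting to SAM (for example with `samtools view -h`) or reading through a library such as pysam.

```python
def keep_alignment(sam_line, max_frac_a=0.5, tag="fA"):
    """Return True if a SAM line should be kept before running TALON.

    Header lines and alignments without the tag are kept; alignments whose
    fraction-A tag value exceeds max_frac_a are dropped.
    """
    if sam_line.startswith("@"):
        return True
    fields = sam_line.rstrip("\n").split("\t")
    # Optional SAM fields start at column 12 and look like TAG:TYPE:VALUE
    for opt in fields[11:]:
        parts = opt.split(":", 2)
        if len(parts) == 3 and parts[0] == tag:
            return float(parts[2]) <= max_frac_a
    return True


def filter_sam(lines, max_frac_a=0.5, tag="fA"):
    """Yield only the SAM lines that pass keep_alignment."""
    for line in lines:
        if keep_alignment(line, max_frac_a, tag):
            yield line
```

For example, `with open("labeled.sam") as src, open("filtered.sam", "w") as dst: dst.writelines(filter_sam(src))` writes a filtered copy (file names are placeholders). Keeping untagged alignments is a design choice; drop them instead if every read is expected to carry the tag.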
