Inconsistent input files for update_gff #134

Open
Neato-Nick opened this issue Sep 29, 2022 · 2 comments

Comments

@Neato-Nick

Hi, I'm correcting a query assembly against a reference assembly and updating the query GFF to reflect the corrections, but I'm hitting the "Inconsistent input files" error.
Here's my command and the error from updategff:

ragtag.py updategff PHRA102.gff PR-102_v3.1_cor-PA-u/ragtag.correct.agp
Thu Sep 29 09:15:16 2022 --- VERSION: RagTag v2.1.0
Thu Sep 29 09:15:16 2022 --- CMD: ragtag.py updategff PHRA102.RXLRs.CRNs.gff PR-102_v3.1_cor-PA-u/ragtag.correct.agp
##gff-version 3

Phyram_PR-102_s0001	AUGUSTUS	gene	133386	134116	.	+	.	ID=PHRA102_1;Name=hprt1_2;locus_tag=KRP23_1
Traceback (most recent call last):
  File "/nfs5/BPP/Grunwald_Lab/home/carleson/opt/conda/envs/ragtag/bin/ragtag_update_gff.py", line 162, in <module>
    main()
  File "/nfs5/BPP/Grunwald_Lab/home/carleson/opt/conda/envs/ragtag/bin/ragtag_update_gff.py", line 156, in main
    sup_update(gff_file, agp_file)
  File "/nfs5/BPP/Grunwald_Lab/home/carleson/opt/conda/envs/ragtag/bin/ragtag_update_gff.py", line 114, in sup_update
    raise ValueError("Inconsistent input files.")
ValueError: Inconsistent input files.

Here was my command for correction:

ragtag.py correct -u --gff PHRA102.gff -o PR-102_v3.1_cor-PA-u $ref_assembly PR-102_v3.1.fasta -t 16

And here are the first 10 lines of my GFF:

##gff-version 3
Phyram_PR-102_s0001	AUGUSTUS	gene	133386	134116	.	+	.	ID=PHRA102_1;Name=hprt1_2;locus_tag=KRP23_1
Phyram_PR-102_s0001	AUGUSTUS	mRNA	133386	134116	.	+	.	ID=PHRA102_1.1;Parent=PHRA102_1;Dbxref=CDD:cd06223,InterPro:IPR000836,InterPro:IPR005904,PFAM:PF00156,TIGRFAM:TIGR01203,UniProtKB/Swiss-Prot:Q6WIT9;Name=hprt1_2;Ontology_term=GO:0009116,GO:0004422,GO:0006166;locus_tag=KRP23_1;product=Hypoxanthine-guanine phosphoribosyltransferase
Phyram_PR-102_s0001	AUGUSTUS	exon	133386	133391	.	+	.	ID=PHRA102_1.1-exon1;Parent=PHRA102_1.1;locus_tag=KRP23_1
Phyram_PR-102_s0001	AUGUSTUS	exon	133454	134116	.	+	.	ID=PHRA102_1.1-exon2;Parent=PHRA102_1.1;locus_tag=KRP23_1
Phyram_PR-102_s0001	AUGUSTUS	CDS	133386	133391	1	+	0	ID=PHRA102_1.1-cds1;Parent=PHRA102_1.1;locus_tag=KRP23_1;product=Hypoxanthine-guanine phosphoribosyltransferase
Phyram_PR-102_s0001	AUGUSTUS	CDS	133454	134116	1	+	0	ID=PHRA102_1.1-cds2;Parent=PHRA102_1.1;locus_tag=KRP23_1;product=Hypoxanthine-guanine phosphoribosyltransferase
Phyram_PR-102_s0001	AUGUSTUS	intron	133392	133453	.	+	.	ID=PHRA102_1.1-intron1;Parent=PHRA102_1.1;locus_tag=KRP23_1
Phyram_PR-102_s0001	AUGUSTUS	start_codon	133386	133388	.	+	0	ID=PHRA102_1.1-start_codon1;Parent=PHRA102_1.1;locus_tag=KRP23_1
Phyram_PR-102_s0001	AUGUSTUS	stop_codon	134114	134116	.	+	0	ID=PHRA102_1.1-stop_codon1;Parent=PHRA102_1.1;locus_tag=KRP23_1

And the first 10 lines of the AGP:

## agp-version 2.1
# AGP created by RagTag v2.1.0
Phyram_PR-102_s0001	1	152786	1	W	Phyram_PR-102_s0001_1_152786_+	1	152786	+
Phyram_PR-102_s0001	152787	381574	2	W	Phyram_PR-102_s0001_152787_381574_+	1	228788	+
Phyram_PR-102_s0001	381575	422454	3	W	Phyram_PR-102_s0001_381575_422454_+	1	40880	+
Phyram_PR-102_s0001	422455	556621	4	W	Phyram_PR-102_s0001_422455_556621_+	1	134167	+
Phyram_PR-102_s0001	556622	711649	5	W	Phyram_PR-102_s0001_556622_711649_+	1	155028	+
Phyram_PR-102_s0001	711650	727677	6	W	Phyram_PR-102_s0001_711650_727677_+	1	16028	+
Phyram_PR-102_s0001	727678	800993	7	W	Phyram_PR-102_s0001_727678_800993_+	1	73316	+
Phyram_PR-102_s0001	800994	1033585	8	W	Phyram_PR-102_s0001_800994_1033585_+	1	232592	+

Based on the code for updategff, I'm getting this error because the component IDs in column 6 of the AGP have no match in the GFF I provided, but I thought I was using this script for exactly its described purpose. Let me know if I'm reading the documentation wrong.
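A quick way to see which AGP column the GFF sequence IDs actually line up with is a small standalone check like the one below. This is a minimal sketch, not RagTag code: the function names are made up for illustration, and it assumes tab-separated GFF3 and AGP 2.x input in which gap lines carry `N` or `U` in column 5. The two sample lines are copied from the files shown above.

```python
def seqids_from_gff(gff_lines):
    """Collect sequence IDs (GFF column 1), skipping comments and blank lines."""
    ids = set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def agp_names(agp_lines):
    """Return (objects, components): AGP column 1 and, for non-gap lines, column 6."""
    objects, components = set(), set()
    for line in agp_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        objects.add(fields[0])
        if fields[4] not in ("N", "U"):  # gap lines have no component ID in column 6
            components.add(fields[5])
    return objects, components


def diagnose(gff_lines, agp_lines):
    """Report which GFF sequence IDs appear as AGP objects, components, or neither."""
    gff_ids = seqids_from_gff(gff_lines)
    objects, components = agp_names(agp_lines)
    return {
        "matches_objects": sorted(gff_ids & objects),
        "matches_components": sorted(gff_ids & components),
        "unmatched": sorted(gff_ids - objects - components),
    }


# One GFF line and one AGP line taken from the excerpts in this issue.
gff = [
    "##gff-version 3\n",
    "Phyram_PR-102_s0001\tAUGUSTUS\tgene\t133386\t134116\t.\t+\t.\tID=PHRA102_1\n",
]
agp = [
    "## agp-version 2.1\n",
    "Phyram_PR-102_s0001\t1\t152786\t1\tW\tPhyram_PR-102_s0001_1_152786_+\t1\t152786\t+\n",
]
report = diagnose(gff, agp)
print(report)
```

With these inputs the GFF sequence ID shows up under `matches_objects` and not under `matches_components`, which is consistent with the error: the annotation is in the coordinates of the AGP objects (column 1), not the components (column 6).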
Thanks for any help!

@Neato-Nick
Author

Neato-Nick commented Oct 13, 2022

I realized I needed to use the -c flag so that the tool looks up the GFF sequence IDs among the AGP objects instead of among the AGP components.
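To make the object-versus-component distinction concrete, here is a rough sketch of lifting a feature from object coordinates onto the component that contains it. It is an illustration only and not how RagTag implements `-c`; `load_components` and `lift_feature` are hypothetical names, and the sketch assumes every component is on the `+` strand and starts at position 1, as in the AGP excerpt above.

```python
import bisect


def load_components(agp_lines):
    """Index non-gap AGP lines by object: {object: [(obj_start, obj_end, component_id), ...]}."""
    by_object = {}
    for line in agp_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if f[4] in ("N", "U"):  # skip gap lines
            continue
        by_object.setdefault(f[0], []).append((int(f[1]), int(f[2]), f[5]))
    for spans in by_object.values():
        spans.sort()
    return by_object


def lift_feature(seqid, start, end, by_object):
    """Map a 1-based inclusive interval from object to component coordinates.

    Returns (component_id, new_start, new_end), or None when no single
    component fully contains the interval.
    """
    spans = by_object.get(seqid, [])
    starts = [s[0] for s in spans]
    i = bisect.bisect_right(starts, start) - 1  # last component starting at or before `start`
    if i < 0:
        return None
    obj_start, obj_end, component = spans[i]
    if end > obj_end:  # feature runs past the component (break point or gap)
        return None
    offset = obj_start - 1
    return component, start - offset, end - offset


# First two AGP lines from this issue.
agp = [
    "Phyram_PR-102_s0001\t1\t152786\t1\tW\tPhyram_PR-102_s0001_1_152786_+\t1\t152786\t+\n",
    "Phyram_PR-102_s0001\t152787\t381574\t2\tW\tPhyram_PR-102_s0001_152787_381574_+\t1\t228788\t+\n",
]
index = load_components(agp)

# The first gene in the GFF lies inside the first component, so its coordinates are unchanged.
first_gene = lift_feature("Phyram_PR-102_s0001", 133386, 134116, index)
# A made-up interval inside the second component is shifted by 152786.
shifted = lift_feature("Phyram_PR-102_s0001", 152800, 152900, index)
# A made-up interval straddling the break point cannot be placed on one component.
straddling = lift_feature("Phyram_PR-102_s0001", 152700, 152800, index)
print(first_gene, shifted, straddling)
```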

Then I found that updategff requires that the AGP contain no gaps. I got around that error by adding a new flag, -s, for when splitasm was used; current behavior is preserved because gaps are allowed only if -s is set.
I still don't have a complete fix, because some of the genes in the GFF overlap stretches of Ns that were removed by splitasm. I could either:

  • Change the ValueError to a warning and just drop any feature overlapping Ns
  • Add a --gff option to splitasm, as implemented in the correct module
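The first option could look roughly like the sketch below: collect the gap intervals from the AGP, warn about any feature that overlaps one, and keep the rest. This is only an illustration of the idea, not the change made in RagTag; the function names, the AGP lines, and the feature tuples are all made up for the example.

```python
import warnings


def gap_intervals(agp_lines):
    """Return {object: [(gap_start, gap_end), ...]} from AGP gap lines (column 5 is N or U)."""
    gaps = {}
    for line in agp_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if f[4] in ("N", "U"):
            gaps.setdefault(f[0], []).append((int(f[1]), int(f[2])))
    return gaps


def drop_gap_overlaps(features, gaps):
    """Keep features that touch no gap; warn about, then drop, the others.

    Each feature is (seqid, start, end, feature_id) with 1-based inclusive coordinates.
    """
    kept = []
    for seqid, start, end, feature_id in features:
        overlapping = [(gs, ge) for gs, ge in gaps.get(seqid, []) if start <= ge and end >= gs]
        if overlapping:
            warnings.warn(f"dropping {feature_id}: overlaps gap(s) {overlapping} on {seqid}")
            continue
        kept.append((seqid, start, end, feature_id))
    return kept


# Hypothetical AGP with a 100 bp gap between two components.
agp = [
    "scaffold_1\t1\t1000\t1\tW\tscaffold_1_part1\t1\t1000\t+\n",
    "scaffold_1\t1001\t1100\t2\tN\t100\tscaffold\tyes\talign_genus\n",
    "scaffold_1\t1101\t2000\t3\tW\tscaffold_1_part2\t1\t900\t+\n",
]
# Hypothetical features; geneB overlaps the gap at 1001-1100.
features = [
    ("scaffold_1", 200, 400, "geneA"),
    ("scaffold_1", 950, 1050, "geneB"),
    ("scaffold_1", 1200, 1500, "geneC"),
]
kept = drop_gap_overlaps(features, gap_intervals(agp))
print([f[3] for f in kept])
```

Dropping with a warning keeps the run going but silently shrinks the annotation, so the warning text should name each dropped feature; the second option avoids the loss by handling the GFF at split time.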

@Neato-Nick
Author

Implemented fixes in #135
