Newly seen in gpad files, qualifier, "NOT|", should be "NOT" #1461

Closed
dvklopfenstein opened this issue Mar 17, 2020 · 9 comments

@dvklopfenstein

Thank you for the great gene ontology annotations.

I am seeing many qualifiers which look like this:

NOT|

rather than what they were in the past:

NOT

This is causing our gpad reader to fail. I am going to change the reader so that it tolerates an unnecessary divider, but the large annotation files already take 15-30 seconds to load on my machine, and adding this extra check will extend that time.

It would be cleaner to omit the divider when there is only one qualifier.

I am seeing this in the goa_human.gpad downloaded on March 17, 2020.
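
For illustration, here is a minimal sketch of both sides of the problem: a tolerant reader that ignores empty tokens in the qualifier field, and a writer that never emits a trailing divider. The function names `parse_qualifiers` and `format_qualifiers` are hypothetical and are not taken from any existing gpad reader or from ontobio; the only assumption is that qualifiers are separated by `|`, as in the examples above.

```python
def parse_qualifiers(field):
    """Split a qualifier field on '|' and drop empty tokens.

    Hypothetical helper: tolerates a stray trailing divider,
    so 'NOT|' and 'NOT' both yield ['NOT'].
    """
    return [tok for tok in (part.strip() for part in field.split("|")) if tok]


def format_qualifiers(qualifiers):
    """Join qualifiers with '|', skipping empty entries.

    Hypothetical helper: a single qualifier is written without a divider.
    """
    return "|".join(q for q in qualifiers if q)


single = parse_qualifiers("NOT|")          # trailing divider is ignored
pair = parse_qualifiers("NOT|enables")     # two real qualifiers are kept
written = format_qualifiers(single)        # written back without a divider
```

The filtering is a single pass over a field that normally holds one or two tokens, so the added cost per line should be small, although I have not measured it on the full annotation files.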

@pgaudet
Contributor

pgaudet commented Apr 13, 2020

@dvklopfenstein Sorry about the late reply. Where did you download the files from?

Thanks, Pascale

@dvklopfenstein
Author

Thanks for looking into this. I should have added the download source when creating this issue. I still have the files from March 17 on my machine:

$ l goa_human.g*
-rw-r--r--+ 1 note2 note2 84323582 Mar 17 13:49 goa_human.gaf
-rw-r--r--+ 1 note2 note2 55388353 Mar 17 13:50 goa_human.gpad
-rw-r--r--+ 1 note2 note2 63116193 Mar 17 15:47 goa_human.gpa

It is interesting to see the goa_human.gpad is different from the goa_human.gpa file.

The goa_human.gpa files work fine and come from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa

The goa_human.gpad file, downloaded from http://current.geneontology.org/annotations, has the trailing divider (NOT|), which forced us to change our software so that we could read the file successfully.

@pgaudet
Contributor

pgaudet commented Apr 13, 2020

@kltm Can we fix this ASAP?

@pgaudet transferred this issue from geneontology/go-annotation Apr 13, 2020
@kltm
Member

kltm commented Apr 13, 2020

Talking to @dougli1sqrd; a fix is in progress.

@kltm
Member

kltm commented Apr 13, 2020

@pgaudet We may want to add this to the release notes for the last release.

@kltm
Member

kltm commented Apr 14, 2020

@dougli1sqrd Noting points of interest in:
/ontobio/model/association.py
/ontobio/io/assocwriter.py

@dvklopfenstein
Author

Just wanted to make sure you noticed that the contents of goa_human.gpa and goa_human.gpad are different. For example:

$ grep NOT goa_human.gpad | sort | grep Q9ULI4
UniProtKB       Q9ULI4  NOT|    GO:0003777      GO_REF:0000024  ECO:0000250     UniProtKB:Q52KG5                20091119        UniProt
UniProtKB       Q9ULI4  NOT|    GO:0007018      GO_REF:0000024  ECO:0000250     UniProtKB:Q52KG5                20091119        UniProt
UniProtKB       Q9ULI4  NOT|    GO:0007018      PMID:21873635   ECO:0000318     PANTHER:PTN000648413|PANTHER:PTN000649056               20170228        GO_Central
UniProtKB       Q9ULI4  NOT|    GO:0016887      GO_REF:0000024  ECO:0000250     UniProtKB:Q52KG5                20091119        UniProt
UniProtKB       Q9ULI4  NOT|    GO:0016887      PMID:21873635   ECO:0000318     PANTHER:PTN000648413|PANTHER:PTN000649056               20170228        GO_Central

$ grep NOT goa_human.gpa | sort | grep Q9ULI4
UniProtKB       Q9ULI4  NOT|enables     GO:0003777      GO_REF:0000024  ECO:0000250     UniProtKB:Q52KG5                20091119        UniProt         go_evidence=ISS
UniProtKB       Q9ULI4  NOT|enables     GO:0016887      GO_REF:0000024  ECO:0000250     UniProtKB:Q52KG5                20091119        UniProt         go_evidence=ISS
UniProtKB       Q9ULI4  NOT|enables     GO:0016887      PMID:21873635   ECO:0000318     PANTHER:PTN000648413|PANTHER:PTN000649056               20170228        GO_Central              go_evidence=IBA
UniProtKB       Q9ULI4  NOT|involved_in GO:0007018      GO_REF:0000024  ECO:0000250     UniProtKB:Q52KG5                20091119        UniProt         go_evidence=ISS
UniProtKB       Q9ULI4  NOT|involved_in GO:0007018      PMID:21873635   ECO:0000318     PANTHER:PTN000648413|PANTHER:PTN000649056               20170228        GO_Central              go_evidence=IBA

And the count of NOTs seen in the files is different too:

$ grep NOT goa_human.gpad | wc -l
1244

$ grep NOT goa_human.gpa | wc -l
1250
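
Since `grep NOT` matches the string anywhere on a line, a column-aware comparison may be more informative than the raw counts. Below is a rough sketch, assuming the files are tab-separated with the qualifier in the third column (as in the rows above) and that comment lines start with `!`. The function name `not_annotation_keys` and the choice of key columns are my own and only for illustration.

```python
def not_annotation_keys(lines):
    """Collect keys of annotations whose qualifier column contains NOT.

    Assumes tab-separated rows with the qualifier in column 3.
    The key is (DB, object ID, GO ID, reference, evidence code).
    """
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue  # not enough columns to build a key
        # Drop empty tokens so 'NOT|' and 'NOT|enables' are both recognized
        qualifiers = [q for q in cols[2].split("|") if q]
        if "NOT" in qualifiers:
            keys.add((cols[0], cols[1], cols[3], cols[4], cols[5]))
    return keys


# Small in-memory example built from one of the rows shown above
gpad_rows = ["\t".join(["UniProtKB", "Q9ULI4", "NOT|", "GO:0003777",
                        "GO_REF:0000024", "ECO:0000250"])]
gpa_rows = ["\t".join(["UniProtKB", "Q9ULI4", "NOT|enables", "GO:0003777",
                       "GO_REF:0000024", "ECO:0000250"])]
only_in_gpa = not_annotation_keys(gpa_rows) - not_annotation_keys(gpad_rows)
```

Passing open file handles for goa_human.gpa and goa_human.gpad instead of the in-memory lists would give the set of NOT annotations present in one file but not the other.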

@kltm
Member

kltm commented Apr 15, 2020

@dvklopfenstein Fixes should be out in tonight's snapshot, assuming the pipeline succeeds. If so, we'll proceed to attempting a new release.

@kltm
Member

kltm commented Apr 20, 2020

This now appears to be fixed in the current snapshot.

@kltm closed this as completed Apr 20, 2020