
Update EcoCyc with new resource information #1961

Closed
kltm opened this issue Feb 15, 2023 · 22 comments

Comments

@kltm
Member

kltm commented Feb 15, 2023

Update metadata and upstream resource sources for EcoCyc.

Tagging @pgaudet

@kltm
Member Author

kltm commented Feb 15, 2023

#1959

@kltm
Member Author

kltm commented Feb 15, 2023

@pgaudet I just wanted to make sure all of your EcoCyc changes are on #1959, or were there any elsewhere?

@kltm
Member Author

kltm commented Feb 15, 2023

@pgaudet While I don't think it's a deal breaker, it's always desirable if all files are made available as gz. If the upstream could supply that for their GPI, that would be just that much better.

@kltm
Member Author

kltm commented Feb 16, 2023

@pgaudet As of today, we can no longer run pipelines with the current metadata, because the configured

source: https://ecoliwiki.org/gaf/gene_association.ecocyc.gz

is now returning a 404 (it wasn't a few days ago).
I'm assuming that we need to update to #1959, or at least just the source, but I wanted to know if we could just take the whole thing.

@suzialeksander
Contributor

suzialeksander commented Feb 16, 2023

@lmoore207 do you have any information we may need at this time?

@lmoore207

lmoore207 commented Feb 17, 2023 via email

@k--r

k--r commented Feb 17, 2023

> @pgaudet While I don't think it's a deal breaker, it's always desirable if all files are made available as gz. If the upstream could supply that for their GPI, that would be just that much better.

OK, thanks for the feedback. I gzipped the ecocyc.gpi file; it is available in the same directory. I'll try to remember for the future.

@k--r

k--r commented Feb 17, 2023

Hi Suzi, I didn’t get the notification about this issue, so thanks for letting me know. I’ve included Markus on this reply as he has been working with Pascale about the GO terms from EcoCyc. I don’t know if he is on GitHub, but I’m hoping he can provide more information regarding this. Regards, Lisa

Hi Suzi and Lisa,
I'm just getting started on GitHub now, and this is only the second reply I've posted.
Pascale and I have been communicating about the ecocyc.gaf file and how to resolve some issues.

From: suzialeksander
Date: Thursday, February 16, 2023 at 3:21 PM
To: geneontology/go-site
Cc: Lisa Moore, Mention
Subject: [EXTERNAL] Re: [geneontology/go-site] Update EcoCyc with new resource information (Issue #1961)

@lmoore207 do you have any information? Specifically, is there an active replacement for this GAF at EcoCyc, and if so, can you provide a link?

Suzi, yes, the plan is that the EcoCyc group will directly submit the ecocyc.gaf. One change is that the older versions from EcoliWiki seem to have been converted to UniProt IDs, whereas the file will now contain EcoCyc IDs (for proteins, etc.). The file also includes things like RNAs, which seem to have been missing from previous files.

The current experimental version of the ecocyc.gaf, for release 26.5, can be found at:
https://www.ai.sri.com/~kr/go/latest/ecocyc.gaf.gz

@kltm
Member Author

kltm commented Feb 17, 2023

In discussion with @pgaudet, we've decided to use https://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/18.E_coli_MG1655.goa as a stopgap, so that we temporarily have E. coli data and the pipelines can tick over again while we sort out identifiers.

@kltm
Member Author

kltm commented Feb 21, 2023

@pgaudet Ugh, naturally the switch to the new file rather upset some of the watchdogs we have out looking for weirdness (i.e. ecocyc as a resource suddenly disappeared), so we lost a couple of snapshot loads. As another TODO note here: we may need to revert the change I'm making to the sanity checks.
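For illustration only: a watchdog of the kind mentioned above could compare annotation sources between two snapshots and flag any that vanish. This is a hypothetical sketch, not the go-site sanity checks; the function names are made up, and it assumes the relevant signal is GAF column 15 (assigned_by), which may not be what the real checks look at.

```shell
# Hypothetical sketch, not the actual go-site sanity checks.
# Print the distinct assigned_by values (GAF column 15), skipping "!" header lines.
# gzip -cdf decompresses .gz input and passes plain-text input through unchanged.
gaf_sources() {
  gzip -cdf "$1" | awk -F'\t' '!/^!/ && NF >= 15 { print $15 }' | LC_ALL=C sort -u
}

# Print sources that appear in snapshot $1 but not in snapshot $2.
missing_sources() {
  old_list=$(mktemp) && new_list=$(mktemp) || return 1
  gaf_sources "$1" > "$old_list"
  gaf_sources "$2" > "$new_list"
  LC_ALL=C comm -23 "$old_list" "$new_list"   # lines unique to the old list
  rm -f "$old_list" "$new_list"
}
```

Usage would be something like `missing_sources previous_snapshot.gaf.gz current_snapshot.gaf.gz` (file names hypothetical); any output means a source disappeared, which is exactly the case that fired here when EcoCyc was swapped for the GOA proteome file.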

@kltm
Member Author

kltm commented Feb 28, 2023

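After talking to @pgaudet, as a temporary fix for upstream format issues, I've added the missing "taxon:" prefix to the taxon column (column 13) locally with

awk -F'\t' 'BEGIN { OFS = FS } /^!/ { print; next } $13 !~ /^taxon:/ { $13 = "taxon:" $13 } { print }' 18.E_coli_MG1655.goa > ecoli.gaf

(header lines starting with ! pass through unchanged, and rows that already carry the prefix are left alone), and will point the pipeline at that file for the short term.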
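A quick way to confirm that a local fix like this covered every row is to count data rows whose taxon column still doesn't look like `taxon:<digits>`. This is a sketch: `bad_taxon_rows` is a hypothetical name, and it assumes column 13 holds either one taxon or two pipe-separated ones (e.g. `taxon:511145|taxon:562`), as in GAF 2.x.

```shell
# Hypothetical sketch: count GAF data rows whose taxon column (13) is not
# "taxon:<digits>" or two such values joined by "|".
bad_taxon_rows() {
  gzip -cdf "$1" | awk -F'\t' '
    /^!/ || NF == 0 { next }               # skip "!" header lines and blank lines
    {
      n = split($13, parts, "|")
      ok = (n == 1 || n == 2)              # one taxon, or two pipe-separated ones
      for (i = 1; i <= n; i++)
        if (parts[i] !~ /^taxon:[0-9]+$/) ok = 0
      if (!ok) bad++
    }
    END { print bad + 0 }'
}
```

Running `bad_taxon_rows ecoli.gaf` should print 0 once every row has been fixed.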

@kltm
Member Author

kltm commented Mar 21, 2023

@pgaudet Are the recent changes the final ones for this issue?

@suzialeksander
Contributor

suzialeksander commented Apr 5, 2023

Not quite. The downloads page is not displaying a file type in the File field:

[ecocyc.gaf] ()

@kltm has fixed this.

@kltm
Member Author

kltm commented Apr 13, 2023

After talking to @pgaudet, the current state is "good". Closing.

@kltm kltm closed this as completed Apr 13, 2023
@cmungall
Member

I'm just documenting the history here for posterity/transparency

I assume there are processes in place at UniProt to pull manual annotations from EcoCyc (but it looks like there are no new manual annotations from EcoCyc from 2023 or 2024).

@cmungall
Member

It looks like there is nothing in place to pull the latest annotations from EcoCyc into GOA

Annotations to cydC:
https://ecocyc.org/ECOLI/NEW-IMAGE?type=ENZYME&object=ABC-6-CPLX

This was recently shown to be a heme transporter

cydD involved_in GO:0035351 - heme transmembrane transport [Wu23]

This is not in
https://www.ebi.ac.uk/QuickGO/annotations?geneProductId=P23886

And hence not in GO Central annotations
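One rough way to spot gaps like this across the whole file is to diff (symbol, GO term) pairs between the EcoCyc GAF and a GOA file. This is only a sketch under stated assumptions: it matches on DB Object Symbol (column 3) because the two files use different identifiers (EcoCyc IDs vs UniProt accessions, as noted earlier in this thread), it skips NOT-qualified rows, and it ignores evidence codes and references. The function names are hypothetical.

```shell
# Hypothetical sketch: print "symbol<TAB>GO ID" pairs from a GAF (columns 3 and 5),
# skipping "!" header lines and rows whose qualifier (column 4) contains NOT.
gaf_pairs() {
  gzip -cdf "$1" |
    awk -F'\t' 'BEGIN { OFS = "\t" } !/^!/ && NF >= 5 && $4 !~ /NOT/ { print $3, $5 }' |
    LC_ALL=C sort -u
}

# Print pairs found in $1 (e.g. the EcoCyc GAF) but not in $2 (e.g. a GOA file).
missing_pairs() {
  pairs_a=$(mktemp) && pairs_b=$(mktemp) || return 1
  gaf_pairs "$1" > "$pairs_a"
  gaf_pairs "$2" > "$pairs_b"
  LC_ALL=C comm -23 "$pairs_a" "$pairs_b"   # lines unique to the first file
  rm -f "$pairs_a" "$pairs_b"
}
```

Symbol matching is crude (synonyms and case differences will produce noise), so the output is a list of candidates to check in QuickGO rather than a definitive diff.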

@pgaudet
Contributor

pgaudet commented Mar 3, 2024

Do you know when this annotation was made?

We can also ask @alexsign when the last EcoCyc load was; this is recorded in the GOA loads, see https://ftp.ebi.ac.uk/pub/contrib/goa/GO.annotation_sources

Also, we would need to know how often EcoCyc produces their GAF, but usually discrepancies are due to lags in the different release cycles.

@cmungall
Member

cmungall commented Mar 4, 2024

> Do you know when this annotation was made?

It was in the December 08, 2023 release
https://ecocyc.org/ecocyc/release-notes.shtml

> Also, we would need to know how often EcoCyc produces their GAF, but usually discrepancies are due to lags in the different release cycles.

We can ask them to create a GAF with every release.

Note that we have no manual annotations from EcoCyc from 2023 or 2024.

I think this is just a communication issue. I don't know if @alexsign uses a GitHub-tracked metadata file like we do to indicate the source URL for loading into Protein2GO/QuickGO, but I expect either that needs to be changed or we need to re-inform the EcoCyc team that we are in fact using this file.

@alexsign
Contributor

alexsign commented Mar 5, 2024

@cmungall I do use https://github.com/geneontology/go-site/blob/master/metadata/datasets/ for many import pipelines, but EcoCyc is not one of them. I use the GAF and GPI data at the URLs below:
https://www.ai.sri.com/~kr/go/latest/ecocyc.gaf.gz
https://www.ai.sri.com/~kr/go/latest/ecocyc.gpi.gz

@cmungall
Member

cmungall commented Mar 20, 2024

@alexsign it looks like we should follow up with the EcoCyc team. Those URLs don't look like permanent URLs as they point to a user's home directory. I'm guessing that this was set up as a one-off and not updated.

@k--r and @lmoore207 -- can you shine any light on this? Are these the URLs we should be using?

@cmungall cmungall reopened this Mar 20, 2024
@pgaudet
Contributor

pgaudet commented Mar 26, 2024

Pinged Markus and Lisa by mail.

@cmungall
Member

cmungall commented Jun 8, 2024

It looks like:

  1. https://www.ai.sri.com/~kr/go/latest/ecocyc.gaf.gz is now kept up to date with the latest annotations, I see annotations as recent as 20240522. I assume this will remain the canonical location and will be updated.
  2. Alex is picking these up and incorporating them into the UniProt annotations. I checked cydC (https://www.ebi.ac.uk/QuickGO/annotations?geneProductId=P23886), which now has the heme transporter annotations from EcoCyc dated 20231005 that were previously missing.
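The "annotations as recent as 20240522" check can be reproduced by taking the maximum of the GAF date column (column 14, YYYYMMDD). A minimal sketch with a hypothetical function name; it reads the .gz file directly.

```shell
# Hypothetical sketch: print the most recent annotation date (GAF column 14, YYYYMMDD).
latest_annotation_date() {
  gzip -cdf "$1" | awk -F'\t' '
    /^!/ { next }                                     # skip "!" header lines
    $14 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/ {
      if ($14 + 0 > max + 0) max = $14                # YYYYMMDD compares correctly as a number
    }
    END { print max }'
}
```

For example, after `curl -sSL -o ecocyc.gaf.gz https://www.ai.sri.com/~kr/go/latest/ecocyc.gaf.gz`, running `latest_annotation_date ecocyc.gaf.gz` gives a quick freshness check on the upstream file.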

Thanks everyone!

@cmungall cmungall closed this as completed Jun 8, 2024