Add reference genome set to GAF downloads #82

Open
kltm opened this issue Aug 30, 2023 · 5 comments
Assignees
Labels
Needs LA approval (Needs final approval from the Lead Architect), Needs PM approval (Needs final approval from the Project Manager), Needs tech doc

Comments

@kltm
Member

kltm commented Aug 30, 2023

Project link

https://github.com/orgs/geneontology/projects/144

Project description
  • the end goal is for the GAF download page (http://current.geneontology.org/products/pages/downloads.html) to offer GAFs for the 142 reference species, one file per species
  • the non-core GAFs will not go through the same QC pipeline as the core ones (MOD + goa human/cow/etc)
  • the implementation will simply take the filtered file provided by Alex (gcrp entries for 142 species; ftp://ftp.ebi.ac.uk/pub/contrib/goa/go_reference_species.gaf.gz) and bin the entries into separate files, one per species
  • a single static HTML file will be produced
    • this should ideally be pushed into GH, rather than the current routing
  • the UI implementation will abandon the paginated table in favor of a simple table with 142 rows on one page
  • the table should be sortable?
  • by default the sort order should prioritize the main curated species
  • the columns will be
    • species
    • count
    • link to download
    • potentially: primary source (ebigoa, mgi, ...)
  • there will not be separate rna/isoform files; these will be merged in. One file per species
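
The binning step described in the list above could be sketched roughly as follows. This is a minimal sketch, not the project's pipeline: it assumes the species is identified by the first taxon in GAF column 13 (any interacting taxon after `|` is ignored), that header lines start with `!`, and that malformed lines are skipped. The function name `bin_gaf_by_taxon` is hypothetical.

```python
from collections import defaultdict


def bin_gaf_by_taxon(lines):
    """Group GAF annotation lines by primary taxon (column 13).

    Returns (headers, bins): headers is the list of '!' comment lines,
    bins maps a taxon ID string such as "8355" to its annotation lines.
    """
    headers = []
    bins = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("!"):
            headers.append(line)
            continue
        cols = line.split("\t")
        if len(cols) < 13:
            # Assumption: lines with too few columns are skipped, not binned.
            continue
        # Column 13 may look like "taxon:A|taxon:B"; the first entry is
        # treated as the annotated species, the rest as interacting taxa.
        primary = cols[12].split("|")[0]
        taxon_id = primary.split(":", 1)[-1]
        bins[taxon_id].append(line)
    return headers, dict(bins)
```

To run this against the real download, the input could be read with `gzip.open(path, "rt")` and each bin written to its own file; the output naming scheme is left open here because the issue does not specify one.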
PI

Chris

Product owner (PO)

Seth

Technical lead (TL)

Seth

Other personnel (OP)

Suzi

Technical specs

TBD (template: https://docs.google.com/document/d/111UqtS3G0aJZpAijZYI3Da0t94OQpGePlPJsqZE4Tio/edit)

Other comments

A narrow subset of the broader #48

kltm added the Needs LA approval, Needs PM approval, Needs tech doc, Needs PI, Needs PO, and Needs TL labels on Aug 30, 2023
@kltm
Member Author

kltm commented Aug 30, 2023

As proposed by @cmungall at #48 (comment)

@kltm
Member Author

kltm commented Aug 30, 2023

@cmungall To clarify, what is your intent for the species column? Would we be sticking with the status quo for the current "core" species/resources and using resource shorthands, or doing our best to break them down by the information in the metadata? As a concrete example, what would be the value for species in the downloads for xenbase (obvs ignoring interacting taxon)?

sjcarbon@moiraine:/tmp$:) curl -s http://current.geneontology.org/annotations/xenbase.gaf.gz | zgrep -v '^!' | cut -f 13 | sort | uniq -c
 131361 taxon:8355
      1 taxon:8355|taxon:1280
      1 taxon:8355|taxon:1309
      1 taxon:8355|taxon:1313
      1 taxon:8355|taxon:1897064
      1 taxon:8355|taxon:303
      1 taxon:8355|taxon:4932
      1 taxon:8355|taxon:5476
      1 taxon:8355|taxon:562
      1 taxon:8355|taxon:90371
 174178 taxon:8364
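
One way to read the counts above is to treat the first taxon in column 13 as the annotated organism and anything after `|` as an interacting taxon. Under that assumption, a small sketch that collapses the column-13 values to primary-taxon counts (the function name is hypothetical; the numbers are copied from the output above):

```python
from collections import Counter


def primary_taxon_counts(col13_counts):
    """Collapse {column-13 value: count} to {primary taxon: count}."""
    totals = Counter()
    for value, count in col13_counts.items():
        primary = value.split("|")[0]  # drop any interacting taxon
        totals[primary] += count
    return dict(totals)


# Values copied from the xenbase output above.
xenbase = {
    "taxon:8355": 131361,
    "taxon:8355|taxon:1280": 1,
    "taxon:8355|taxon:1309": 1,
    "taxon:8355|taxon:1313": 1,
    "taxon:8355|taxon:1897064": 1,
    "taxon:8355|taxon:303": 1,
    "taxon:8355|taxon:4932": 1,
    "taxon:8355|taxon:5476": 1,
    "taxon:8355|taxon:562": 1,
    "taxon:8355|taxon:90371": 1,
    "taxon:8364": 174178,
}

print(primary_taxon_counts(xenbase))
# {'taxon:8355': 131370, 'taxon:8364': 174178}
```

Even with interacting taxa ignored, the xenbase file still covers two primary taxa, so a per-species breakdown would give it two rows rather than one.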

@kltm
Member Author

kltm commented Aug 30, 2023

@pgaudet commented that two tables might be an approach. Also looping in @suzialeksander.

@suzialeksander

Updating this after a meeting with @pgaudet @thomaspd and @suzialeksander.

Current plan:

  1. Keep the table at http://current.geneontology.org/products/pages/downloads.html. Possibly as soon as the 2024-03-21 release candidate is approved, the pig, cow, human, dog, and chicken will be combined into one file, but the downloads will remain.

  2. Add a new table on a new page. The table would have about 150 organisms (the 143 with IBAs, plus a few more that have >350 EXPs but no IBAs). Mockup:

[Screenshot of the table mockup, 2024-03-27]

Note that if this table doesn't need live annotation counts, we can do it easily in .md instead of html like the existing page.
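
As a rough illustration of the .md option, a static table could be generated once from a list of rows, with the counts frozen at generation time. The columns follow the ones proposed earlier in this issue (species, count, download link). The function name, the `core` flag used to put the main curated species first, and the example rows and URLs are all placeholders, not real data.

```python
def render_markdown_table(rows):
    """Render rows as a Markdown table: core species first, then by name."""
    ordered = sorted(rows, key=lambda r: (not r["core"], r["species"].lower()))
    out = [
        "| Species | Annotations | Download |",
        "| --- | ---: | --- |",
    ]
    for r in ordered:
        out.append(f"| {r['species']} | {r['count']} | [GAF]({r['url']}) |")
    return "\n".join(out)


# Placeholder rows; species names, counts, and URLs are made up.
example_rows = [
    {"species": "Species B", "count": 5, "url": "https://example.org/b.gaf.gz", "core": False},
    {"species": "Species C", "count": 7, "url": "https://example.org/c.gaf.gz", "core": True},
    {"species": "Species A", "count": 3, "url": "https://example.org/a.gaf.gz", "core": False},
]

print(render_markdown_table(example_rows))
```

A static table like this cannot be re-sorted by the reader, so the default order carries more weight than it would in the sortable HTML version.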

More details on the text for the new page are in the GDoc for the Guide to getting GO, annotations and GO-CAMs

@suzialeksander

The new table above now has a specific ticket at geneontology/geneontology.github.io#525

Projects
Status: Hopper
Development

No branches or pull requests

3 participants