Compute statistics and changes for both the GO ontology and annotations at every release and snapshot. The libraries/go-stats/ folder contains the scripts that are cloned and executed in the GO pipeline from go-site/scripts.
The changes to the statistics computed at each release are reported in the CHANGES.md file of this repository.
This is the Python package used to compute statistics over GO annotations. You can read more here.
- go-stats-summary.json: summary statistics
- aggregated-go-stats-summaries.json: summary statistics for all GO releases stored in Zenodo
- go-stats.json: detailed statistics
- go-stats-no-pb.json: detailed statistics (excluding direct annotation to protein binding)
- go-ontology-changes.json: changes in the ontology since the last release
- go-ontology-changes.tsv
- go-annotation-changes.json: changes in the annotations since the last release
- go-annotation-changes.tsv
- gocam-models.json: detailed list of models
- gocam-pmids.json: list of articles/references per model
- gocam-gps.json: list of gene products per model
- gocam-goterms: list of GO terms per model
The code checks the release date in the main pipeline (http://current.geneontology.org/metadata/release-date.json). When the date changes, it triggers a secondary pipeline by publishing a message to a specific SNS topic and updates the release date on the secondary pipeline.
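As a rough illustration of the release-date check, here is a minimal sketch. It assumes the metadata file is a JSON object with a `date` key, which is not confirmed here, and the function names `fetch_release_date` and `is_new_release` are hypothetical. Publishing the SNS message is left out.

```python
import json
from urllib.request import urlopen

RELEASE_DATE_URL = "http://current.geneontology.org/metadata/release-date.json"


def fetch_release_date(url=RELEASE_DATE_URL):
    """Download the metadata file and return its release date string."""
    with urlopen(url, timeout=30) as response:
        metadata = json.load(response)
    # Assumption: the release date is stored under a "date" key.
    return metadata["date"]


def is_new_release(last_seen_date, current_date):
    """Return True when the published date differs from the one last processed."""
    return current_date != last_seen_date
```

A caller would store the last processed date, compare it with `fetch_release_date()`, and only notify the secondary pipeline when `is_new_release` returns `True`.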
The code loads the GO OBO file (http://purl.obolibrary.org/obo/go.obo) and compares the terms of the new release to those of the most recent previous release.
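The comparison between two ontology releases can be pictured with the simplified sketch below. It is not the package's own parser: `parse_obo_terms` and `diff_terms` are hypothetical helpers that only read the `id` and `name` tags of `[Term]` stanzas and ignore obsoletion, relations, and definitions.

```python
def parse_obo_terms(obo_text):
    """Return {term_id: name} for every [Term] stanza in an OBO document."""
    terms = {}
    in_term = False
    term_id = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[len("id: "):]
            terms[term_id] = ""
        elif in_term and term_id and line.startswith("name: "):
            terms[term_id] = line[len("name: "):]
    return terms


def diff_terms(previous, current):
    """Compare two {term_id: name} maps and list created, removed, renamed IDs."""
    shared = set(previous) & set(current)
    return {
        "created": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "renamed": sorted(t for t in shared if previous[t] != current[t]),
    }
```

Calling `diff_terms(parse_obo_terms(old_text), parse_obo_terms(new_text))` gives a small dictionary that could be serialized to JSON or flattened to TSV.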
The code sends queries to GOLr to fetch statistics about the GO annotations (e.g. per aspect, per species, per group).
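Since GOLr is a Solr index, per-category counts of this kind are typically obtained with facet queries. The sketch below shows one way to build such a query and read its result; the base URL passed in, the helper names, and any field names used with it (for example `aspect` or `document_category`) are assumptions, not values taken from this package.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, facet_field, filters=()):
    """Build a Solr select URL that returns only facet counts for one field."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),          # no documents needed, only the counts
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.limit", "-1"),  # return every facet value
    ]
    params.extend(("fq", f) for f in filters)
    return base_url + "?" + urlencode(params)


def facet_counts(response, facet_field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))
```

The JSON body returned for the built URL can be decoded with `json.loads` and passed to `facet_counts` to get a mapping from each facet value to its number of annotations.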
The code computes a number of views over the GO-CAM data (e.g. models, gene products, GO terms) using the GO SPARQL endpoint.
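To give an idea of how a per-model view could be assembled, the sketch below groups rows of a standard SPARQL JSON result by model. The actual queries are not shown here, so the variable names (`model`, `gp`) and the helper `group_bindings` are hypothetical.

```python
from collections import defaultdict


def group_bindings(sparql_json, key_var, value_var):
    """Group the values of one SPARQL variable by the values of another."""
    grouped = defaultdict(set)
    for row in sparql_json["results"]["bindings"]:
        # Skip rows where either variable is unbound (e.g. OPTIONAL clauses).
        if key_var in row and value_var in row:
            grouped[row[key_var]["value"]].add(row[value_var]["value"])
    # Sort for stable, reproducible output files.
    return {key: sorted(values) for key, values in grouped.items()}
```

The same function can serve the gene product, PMID, and GO term views by changing `value_var`.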