Mapping ontology annotations to a slim (subset)
Given a GO slim file, and a current ontology (in one or more files), this script will map a gene association file (containing annotations to the full GO) to the terms in the GO slim.
The script can either create a new gene association file containing the most pertinent GO slim accessions, or run in count-mode, in which case it gives distinct gene product counts for each slim term.
The association file format is described here:
GO is a Directed Acyclic Graph (DAG), not a tree. This means that there is often more than one path from a GO term up to the root Gene_Ontology node; the path may intersect multiple terms in the slim ontology - which means that one annotation can map to multiple slim terms!
GO also uses multiple relations (object properties), and depending on which GO file you use with map2slim, different relations will be considered for slimming purposes. We recommend using the go-basic version of the ontology, which contains:
- subClassOf (is a)
- part of
- regulates (+ positively and negatively regulates)
You can also use the full version of GO and filter out those relationships you do not want to consider.
In a hypothetical example, blue circles show terms in the GO slim and yellow circles show terms in the full ontology. The full ontology subsumes the slim, so the blue terms are also in the ontology.
GO ID    MAPS TO SLIM ID    ALL SLIM ANCESTORS
=====    ===============    ==================
5        2+3                2,3,1
6        3 only             3,1
7        4 only             4,3,1
8        3 only             3,1
9        4 only             4,3,1
10       2+3                2,3,1
The 2nd column shows the most pertinent ID(s) in the slim (the direct mapping). The 3rd column shows all ancestors in the slim.
Note in particular the mapping of ID 9: although this has two paths to the root through the slim, via 3 and 4, 3 is discarded because it is redundant: 4 is more specific and already has 3 among its slim ancestors.
On the other hand, 10 maps to both 2 and 3 because these are both the first slim ID in the two valid paths to the root, and neither subsumes the other.
The algorithm used to map any one term in the full ontology is:
- find all valid paths through to the root node in the full ontology
- for each path, take the first slim term encountered in the path
- discard any redundant slim terms in this set, i.e. slim terms that are ancestors of other slim terms in the set
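The steps of the algorithm can be sketched in a few lines of Python. This is only an illustration, not the OWLTools implementation. The edges in PARENTS are an assumption: the figure is not reproduced here, so they were chosen to be consistent with the mapping table above. The names PARENTS, SLIM, paths_to_root, ancestors and map_to_slim are hypothetical.

```python
# Hypothetical DAG, child -> parents. These edges are an ASSUMPTION chosen
# to agree with the mapping table; they are not taken from the figure.
PARENTS = {
    1: [],        # root
    2: [1],
    3: [1],
    4: [3],
    5: [2, 3],
    6: [3],
    7: [4],
    8: [3],
    9: [4, 8],
    10: [5],
}
SLIM = {1, 2, 3, 4}  # terms that belong to the slim


def paths_to_root(term, parents):
    """Return every path from term up to a root, each as a list of terms."""
    if not parents[term]:
        return [[term]]
    return [[term] + rest
            for parent in parents[term]
            for rest in paths_to_root(parent, parents)]


def ancestors(term, parents):
    """Return all proper ancestors of term."""
    found = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents[node])
    return found


def map_to_slim(term, parents, slim):
    """Map one term to its most pertinent slim terms."""
    # Steps 1 and 2: take the first slim term met on each path to the root.
    first_hits = set()
    for path in paths_to_root(term, parents):
        for node in path:
            if node in slim:
                first_hits.add(node)
                break
    # Step 3: drop any hit that is an ancestor of another hit.
    redundant = set()
    for hit in first_hits:
        redundant |= ancestors(hit, parents) & first_hits
    return first_hits - redundant


for term in range(5, 11):
    print(term, sorted(map_to_slim(term, PARENTS, SLIM)))
```

With these assumed edges, term 9 collects the first hits 4 and 3 and keeps only 4, while term 10 keeps both 2 and 3, matching the table. Enumerating every path is fine for a toy graph; on a large ontology a traversal that stops at the first slim term on each branch would avoid the blow-up in the number of paths.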
Using OWLTools Command-line
OWLTools provides a dedicated option for map2slim (--map2slim). The general workflow is as follows:
Load the ontology
OWLTools can load local ontology files or PURLs.
Load the GAF
OWLTools expects Gene Annotation Files (GAFs) as local files; use --gaf followed by the file name, for example --gaf annotations.gaf
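There are two options to define the relevant subset:
- use existing subset: --subset followed by the subset name, for example --subset goslim_pombe
- use custom set of identifiers: --idfile followed by the id file, for example --idfile slim.terms. The id file is expected to contain a single identifier per line.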
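As a small illustration of the one-identifier-per-line layout, the sketch below builds the text of an id file from a list of GO identifiers. The helper name format_id_file is hypothetical, and the check that every identifier looks like GO: followed by seven digits is an assumption made for GO slims; it is not something OWLTools is stated to enforce.

```python
import re

# GO identifiers have the shape "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")


def format_id_file(identifiers):
    """Return id file text with one identifier per line (hypothetical helper)."""
    cleaned = [identifier.strip() for identifier in identifiers]
    bad = [identifier for identifier in cleaned if not GO_ID.match(identifier)]
    if bad:
        raise ValueError("not GO identifiers: " + ", ".join(bad))
    return "".join(identifier + "\n" for identifier in cleaned)


# The three GO root terms, used here only as placeholder slim content.
text = format_id_file(["GO:0008150", "GO:0003674", "GO:0005575"])
print(text, end="")
```

The returned text can then be written to a file such as slim.terms and passed with --idfile.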
Save modified GAF
Set the output file for the mapped annotations using --write-gaf followed by the file name, for example --write-gaf annotations.mapped.gaf
Example command lines:
- using a custom slim from an id file:
owltools go.obo --gaf annotations.gaf --map2slim --idfile slim.terms --write-gaf annotations.mapped.gaf
- using an existing slim
owltools go.obo --gaf annotations.gaf --map2slim --subset goslim_pombe --write-gaf annotations.mapped.gaf
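Once a mapped GAF has been written, distinct gene product counts per slim term (in the spirit of count-mode) can be derived with a short script. The sketch below is an illustration rather than an OWLTools feature. It relies only on the GAF layout: tab-separated columns, comment lines starting with "!", DB in column 1, DB Object ID in column 2 and GO ID in column 5. The function name count_gene_products and the sample rows are made up.

```python
from collections import defaultdict


def count_gene_products(gaf_lines):
    """Count distinct gene products (DB, DB Object ID) per GO ID."""
    products = defaultdict(set)
    for line in gaf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.split("\t")
        if len(cols) < 5:
            continue  # skip rows too short to carry a GO ID
        db, object_id, go_id = cols[0], cols[1], cols[4]
        products[go_id].add((db, object_id))
    return {go_id: len(ids) for go_id, ids in products.items()}


# Made-up rows for illustration; only the first five columns are filled in.
sample = [
    "!gaf-version: 2.1",
    "ExampleDB\tGP1\tgp1\t\tGO:0008150",
    "ExampleDB\tGP1\tgp1\t\tGO:0008150",
    "ExampleDB\tGP2\tgp2\t\tGO:0008150",
    "ExampleDB\tGP2\tgp2\t\tGO:0003674",
]
print(count_gene_products(sample))
```

To run it on real output, pass an open file handle instead of the sample list, for example the annotations.mapped.gaf file written by the commands above.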
General information about getting and using OWLTools can be found at https://github.com/owlcollab/owltools/wiki/Install-OWLTools