# Gene Ontology Network

Download the [Gene Ontology](http://geneontology.org/docs/download-ontology/) and convert it into a network graph for inspection, visualisation, and hypothesis generation.

## Downloads

Download the core Gene Ontology in [OBO](http://owlcollab.github.io/oboformat/doc/obo-syntax.html) format.

In [5]:
%%sh
# get the core gene ontology in OBO format
curl -O "http://current.geneontology.org/ontology/go.obo"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.3M  100 32.3M    0     0  8306k      0  0:00:03  0:00:03 --:--:-- 8304k
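For orientation, an OBO file is a sequence of stanzas such as `[Term]`, each made of `tag: value` lines. The sketch below is a deliberately tiny reader for that layout, written only to show what a parser pulls out of the file. `parse_term_stanzas` and `SAMPLE_OBO` are hypothetical names, the sample stanza is hand-written rather than copied from `go.obo`, and the reader ignores most of the real syntax (escapes, trailing modifiers, multi-line values).

```python
# Hand-written sample in OBO layout; not copied from go.obo.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Typedef]
id: part_of
name: part of
"""


def parse_term_stanzas(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    stanzas = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
        elif line and current is not None:
            tag, _, value = line.partition(": ")
            # Drop the trailing "! comment" that may follow a value.
            value = value.split(" ! ")[0].strip()
            current.setdefault(tag, []).append(value)
    return stanzas


terms_by_hand = parse_term_stanzas(SAMPLE_OBO)
print(terms_by_hand[0]["id"], terms_by_hand[0]["is_a"])
```

For real work a dedicated parser is the safer choice, since it handles the parts of the format this sketch skips.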


Download the human annotations in [GAF](http://geneontology.org/docs/go-annotation-file-gaf-format-2.2/) format.

In [10]:
%%sh
curl -O "http://current.geneontology.org/annotations/goa_human.gaf.gz"
curl -O "http://current.geneontology.org/annotations/goa_human_complex.gaf.gz"
curl -O "http://current.geneontology.org/annotations/goa_human_isoform.gaf.gz"
curl -O "http://current.geneontology.org/annotations/goa_human_rna.gaf.gz"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11.3M  100 11.3M    0     0  6478k      0  0:00:01  0:00:01 --:--:-- 6474k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 55972  100 55972    0     0   942k      0 --:--:-- --:--:-- --:--:--  942k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2566k  100 2566k    0     0  4526k      0 --:--:-- --:--:-- --:--:-- 4518k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  496k  100  496k    0     0  1737k      0 --:--:-- --:--:-- --:--:-- 1737k
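The annotation files are tab-delimited, with header lines that start with `!` and 17 columns per row in GAF 2.2. As a minimal sketch of how they could be read with the standard library, the function below yields one dict per annotation. `read_gaf`, `GAF_COLUMNS`, and the snake_case column spellings are this sketch's own choices, and the sample row is a made-up placeholder, not a real annotation.

```python
import csv
import io

# Column order follows the GAF 2.2 specification; the spellings are illustrative.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def read_gaf(handle):
    """Yield one dict per annotation row, skipping '!' header lines."""
    data_lines = (line for line in handle if not line.startswith("!"))
    # QUOTE_NONE keeps quote characters inside names from confusing the reader.
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            yield dict(zip(GAF_COLUMNS, row))


# Placeholder row (not a real annotation), used only to exercise the parser.
SAMPLE_GAF = (
    "!gaf-version: 2.2\n"
    + "\t".join([
        "UniProtKB", "X00000", "GENE1", "involved_in", "GO:0000001",
        "PMID:0000000", "IEA", "", "P", "example protein", "", "protein",
        "taxon:9606", "20200101", "UniProt", "", "",
    ])
    + "\n"
)

annotations = list(read_gaf(io.StringIO(SAMPLE_GAF)))
print(annotations[0]["db_object_symbol"], annotations[0]["go_id"])
```

To read a downloaded file, the same function should work on a handle opened in text mode with `gzip.open("goa_human.gaf.gz", "rt")`.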


## Ontology

Load the OBO file with `pronto`.

In [81]:
import pronto
import pandas as pd
go = pronto.Ontology("go.obo")

Keep only the non-obsolete terms.

In [182]:
terms = [term for term in go.terms() if not term.obsolete]

Get the GO id, name, and namespace of all terms and turn them into a data frame.

In [280]:

go_terms = pd.DataFrame([[term.id, term.name, term.namespace] for term in terms], columns=['id', 'name', 'namespace'])
go_terms

|  | id | name | namespace |
|---|---|---|---|
| 0 | GO:0000001 | mitochondrion inheritance | biological_process |
| 1 | GO:0000002 | mitochondrial genome maintenance | biological_process |
| 2 | GO:0000003 | reproduction | biological_process |
| 3 | GO:0000006 | high-affinity zinc transmembrane transporter a... | molecular_function |
| 4 | GO:0000007 | low-affinity zinc ion transmembrane transporte... | molecular_function |
| ... | ... | ... | ... |
| 43694 | GO:1905213 | negative regulation of mitotic chromosome cond... | biological_process |
| 43695 | GO:1905214 | regulation of RNA binding | biological_process |
| 43696 | GO:1905215 | negative regulation of RNA binding | biological_process |
| 43697 | GO:1905216 | positive regulation of RNA binding | biological_process |


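Get the `is_a` relationships by listing each term's direct superclasses (distance 1, excluding the term itself). Each row points from a term (`Source`) to one of its parents (`Target`).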

In [282]:
go_is_a = pd.concat([pd.DataFrame([[term.id, t.id, 'is_a'] for t in term.superclasses(1, with_self=False)], columns=['Source', 'Target', 'relationship']) for term in terms])
go_is_a

|  | Source | Target | relationship |
|---|---|---|---|
| 0 | GO:0000001 | GO:0048308 | is_a |
| 1 | GO:0000001 | GO:0048311 | is_a |
| 0 | GO:0000002 | GO:0007005 | is_a |
| 0 | GO:0000003 | GO:0008150 | is_a |
| 0 | GO:0000006 | GO:0005385 | is_a |
| ... | ... | ... | ... |
| 0 | GO:1905215 | GO:0051100 | is_a |
| 1 | GO:1905215 | GO:1905214 | is_a |
| 0 | GO:1905216 | GO:0051099 | is_a |
| 1 | GO:1905216 | GO:1905214 | is_a |
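Because each `is_a` row points from a child to a direct parent, all ancestors of a term can be found by following those edges upward repeatedly. The sketch below does this with plain Python on a handful of sample pairs. `ancestors` and `parents_of` are hypothetical names, and the `EX:0000001` parent is an invented placeholder added only so the sample has a second level.

```python
from collections import defaultdict

# (child, parent) pairs; all but the last are rows shown in the table above.
sample_is_a = [
    ("GO:0000001", "GO:0048308"),
    ("GO:0000001", "GO:0048311"),
    ("GO:1905215", "GO:0051100"),
    ("GO:1905215", "GO:1905214"),
    ("GO:1905214", "EX:0000001"),  # invented placeholder parent
]

# Index the edge list as child -> set of direct parents.
parents_of = defaultdict(set)
for child, parent in sample_is_a:
    parents_of[child].add(parent)


def ancestors(term_id, parents):
    """Return every term reachable from term_id by following is_a edges upward."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("GO:1905215", parents_of)))
```

With the notebook's data frame, the pairs could come from `zip(go_is_a['Source'], go_is_a['Target'])`. `pronto` can also return all superclasses of a term directly when `superclasses()` is called without a distance, which may be simpler if the ontology object is still in memory.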


Get the terms that have relationships other than `is_a` (for example `regulates`).

In [223]:
terms_with_relationships = [term for term in terms if len(term.relationships.items()) > 0]

Build a data frame for each relationship type of each term, then concatenate them.

In [309]:
# one data frame per (term, relationship type); column order matches go_is_a: Source, Target, relationship
rels_list = [[pd.DataFrame([[term.id, t.id, rel.name] for t in targets], columns=['Source', 'Target', 'relationship']) for rel, targets in term.relationships.items()] for term in terms_with_relationships]
# rels_list holds one inner list per term; flatten it so every relationship type is kept, not just the first
go_rels = pd.concat([frame for frames in rels_list for frame in frames])

Make a single data frame for all the relationships.

In [311]:
go_relationships = pd.concat([go_is_a, go_rels])
go_relationships

|  | Source | Target | relationship |
|---|---|---|---|
| 0 | GO:0000001 | GO:0048308 | is_a |
| 1 | GO:0000001 | GO:0048311 | is_a |
| 0 | GO:0000002 | GO:0007005 | is_a |
| 0 | GO:0000003 | GO:0008150 | is_a |
| 0 | GO:0000006 | GO:0005385 | is_a |
| ... | ... | ... | ... |
| 0 | GO:1905212 | GO:1990956 | positively regulates |
| 0 | GO:1905213 | GO:0007076 | negatively regulates |
| 0 | GO:1905214 | GO:0003723 | regulates |
| 0 | GO:1905215 | GO:0003723 | negatively regulates |
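Before exporting an edge list, it can help to check that `Source` and `Target` hold only GO identifiers and that the `relationship` column holds only relationship names. The sketch below is one way to do that with the standard library. `check_edges` is a hypothetical helper, the seven-digit `GO:` pattern is an assumption based on the identifiers shown in this notebook, and the last sample row is deliberately malformed to show what the check reports.

```python
import re
from collections import Counter

# Assumed identifier shape, based on the IDs shown in this notebook.
GO_ID = re.compile(r"^GO:\d{7}$")


def check_edges(edges):
    """Return (bad_rows, relationship_counts) for (source, target, relationship) triples."""
    bad_rows = [
        edge for edge in edges
        if not (GO_ID.match(edge[0]) and GO_ID.match(edge[1]))
    ]
    counts = Counter(relationship for _, _, relationship in edges)
    return bad_rows, counts


sample_edges = [
    ("GO:0000001", "GO:0048308", "is_a"),
    ("GO:0000001", "GO:0048311", "is_a"),
    ("GO:1905214", "GO:0003723", "regulates"),
    # Deliberately malformed: relationship name and target are swapped.
    ("GO:1905215", "negatively regulates", "GO:0003723"),
]

bad_rows, counts = check_edges(sample_edges)
print(len(bad_rows), dict(counts))
```

On the notebook's data, the triples could be produced with `go_relationships.itertuples(index=False, name=None)`. A GO identifier appearing as a key in the counts is a sign that two columns are in the wrong order.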


Save the terms and the relationships as CSV files.

In [313]:
go_terms.to_csv('go.terms.csv', index=False)
go_relationships.to_csv('go.relationships.csv', index=False)
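The relationships file is an edge list, so it can be turned into a graph structure with very little code. As a minimal sketch, the function below reads such a CSV into an adjacency mapping using only the standard library. `load_graph` is a hypothetical helper, it assumes the `Source`, `Target`, and `relationship` column headers written above, and the in-memory sample stands in for the real file.

```python
import csv
import io
from collections import defaultdict


def load_graph(handle):
    """Build {source: [(target, relationship), ...]} from an edge-list CSV."""
    graph = defaultdict(list)
    for row in csv.DictReader(handle):
        graph[row["Source"]].append((row["Target"], row["relationship"]))
    return dict(graph)


# Small stand-in for go.relationships.csv, using rows shown in this notebook.
SAMPLE_CSV = """\
Source,Target,relationship
GO:0000001,GO:0048308,is_a
GO:0000001,GO:0048311,is_a
GO:0000002,GO:0007005,is_a
GO:1905214,GO:0003723,regulates
"""

graph = load_graph(io.StringIO(SAMPLE_CSV))

# Out-degree per term: how many edges leave each source node.
out_degree = {source: len(edges) for source, edges in graph.items()}
print(out_degree)
```

For the exported file, the handle could be opened with `open("go.relationships.csv", newline="")`. A graph library would offer layout and analysis functions on top of this, but a plain mapping is often enough for a first inspection.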