# Gene Ontology (GO) Analysis

This notebook demonstrates how to work with Gene Ontology data in PerturbLab.

## Features
- Download and load GO ontology files
- Compute gene-gene similarity from GO annotations
- Build perturbation graphs for GEARS
- Query GO terms and gene annotations


In [1]:
from perturblab.data.resources import load_dataset
from perturblab.utils import read_obo
from perturblab.types import GeneVocab

# Download GO ontology file
go_path = load_dataset('go/go_basic')
print(f"GO file: {go_path}")

# Read GO ontology (returns tuple: (terms_list, dag))
terms, dag = read_obo(go_path)
print(f"\nLoaded {len(terms)} GO terms")
print(f"DAG has {dag.n_edges} edges")
print(f"\nExample term: {terms[0]['id']}")
print(f"Term name: {terms[0]['name']}")
print(f"Namespace: {terms[0]['namespace']}")


[perturblab] [INFO] Loading resource 'go_basic' from C:\Users\Administrator\.cache\perturblab\auto\go_basic...
GO file: C:\Users\Administrator\.cache\perturblab\auto\go_basic
[perturblab] [INFO] Parsing OBO file: C:\Users\Administrator\.cache\perturblab\auto\go_basic
[perturblab] [INFO] Parsed 39354 GO terms, 60096 edges (format: 1.2, version: releases/2025-10-10)

Loaded 39354 GO terms
DAG has 60096 edges

Example term: GO:0000001
Term name: mitochondrion inheritance
Namespace: biological_process
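
To make the structure of the parsed output more concrete, here is a minimal sketch of how an OBO file can be turned into a list of term dictionaries plus `is_a` edges, and then queried. It is not PerturbLab's `read_obo` implementation: the `parse_obo` helper and the abbreviated two-term excerpt are assumptions made for illustration, and only the `id`, `name`, and `namespace` keys are taken from the output above.

```python
from collections import Counter

# Abbreviated OBO 1.2 excerpt used purely for illustration (not the full file).
OBO_TEXT = """\
format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0048308
name: organelle inheritance
namespace: biological_process
"""


def parse_obo(text):
    """Hypothetical helper: collect [Term] stanzas and their is_a edges."""
    terms, edges = [], []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key == "is_a":
                # Drop the trailing "! human-readable name" comment.
                edges.append((current["id"], value.split(" ! ")[0]))
            elif key in ("id", "name", "namespace"):
                current[key] = value
    return terms, edges


terms, edges = parse_obo(OBO_TEXT)

# Query by GO ID and summarise terms per namespace.
by_id = {term["id"]: term for term in terms}
namespace_counts = Counter(term["namespace"] for term in terms)

print(by_id["GO:0000001"]["name"])
print(namespace_counts)
print(edges)
```

The same dictionary lookup and `Counter` pattern should work on the `terms` list returned by `read_obo`, as long as each entry exposes the keys printed above.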


## Compute Gene Similarity from GO Annotations


In [2]:
from perturblab.tools import compute_gene_similarity_from_go

# Example gene-to-GO mapping
gene_to_go = {
    'TP53': {'GO:0006915', 'GO:0006974', 'GO:0006281'},
    'BRCA1': {'GO:0006281', 'GO:0006974', 'GO:0000724'},
    'KRAS': {'GO:0007165', 'GO:0007264', 'GO:0008284'},
    'MYC': {'GO:0008284', 'GO:0006355', 'GO:0000122'},
}

# Compute similarity graph
similarity_graph = compute_gene_similarity_from_go(
    gene_to_go,
    similarity='jaccard',
    threshold=0.1,
    show_progress=True
)

# Each undirected edge is stored in both directions, so len() counts rows, not unique gene pairs
print(f"Similarity graph edges: {len(similarity_graph)}")
print(f"\nFirst 5 edges:")
print(similarity_graph.head())


[perturblab] [INFO] 🧬 Building gene similarity network from GO annotations
[perturblab] [INFO]    Genes: 4
[perturblab] [INFO]    GO terms: 9
[perturblab] [INFO]    Gene-GO edges: 12
[perturblab] [INFO] 🔄 Projecting bipartite graph: 4 source nodes, 9 target nodes
[perturblab] [INFO] 📊 Retrieving neighbors for all source nodes...
[perturblab] [INFO] 🧮 Computing pairwise similarities (method=jaccard, threshold=0.1)...


Computing similarities: 100%|██████████| 4/4 [00:00<?, ?it/s]

[perturblab] [INFO] ✅ Found 2 edges above threshold 0.1
[perturblab] [INFO] 📈 Created undirected graph: 2 unique edges, 4 total edges (undirected)
Similarity graph edges: 4

First 5 edges:
  source target  weight
0  BRCA1   TP53     0.5
1   KRAS    MYC     0.2
2   TP53  BRCA1     0.5
3    MYC   KRAS     0.2
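
The weights in this table can be checked by hand. The sketch below recomputes the Jaccard index, |A ∩ B| / |A ∪ B|, for every gene pair in the toy mapping using only the standard library. It is an independent re-derivation, not PerturbLab's implementation; the `jaccard` helper is hypothetical, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from itertools import combinations

gene_to_go = {
    'TP53': {'GO:0006915', 'GO:0006974', 'GO:0006281'},
    'BRCA1': {'GO:0006281', 'GO:0006974', 'GO:0000724'},
    'KRAS': {'GO:0007165', 'GO:0007264', 'GO:0008284'},
    'MYC': {'GO:0008284', 'GO:0006355', 'GO:0000122'},
}


def jaccard(a, b):
    """Hypothetical helper: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


threshold = 0.1  # assumed inclusive; the library's exact comparison is not shown here
edges = {}
for gene_a, gene_b in combinations(sorted(gene_to_go), 2):
    score = jaccard(gene_to_go[gene_a], gene_to_go[gene_b])
    if score >= threshold:
        edges[(gene_a, gene_b)] = score

for (gene_a, gene_b), score in edges.items():
    print(f"{gene_a} - {gene_b}: {score}")
```

TP53 and BRCA1 share two of four distinct terms (0.5), and KRAS and MYC share one of five (0.2). All other pairs share no terms and fall below the threshold, which is why only two unique edges appear.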





## Build Perturbation Graph for GEARS


In [3]:
from perturblab.methods.gears import build_perturbation_graph

# Create gene vocabulary
gene_vocab = GeneVocab(['TP53', 'BRCA1', 'KRAS', 'MYC', 'EGFR', 'BRCA2'])

# Build perturbation graph (automatically downloads GO annotations if needed)
pert_graph = build_perturbation_graph(
    gene_vocab,
    similarity='jaccard',
    threshold=0.1,
    num_workers=1,
    show_progress=True
)

print(f"Graph nodes: {pert_graph.n_nodes}")
print(f"Graph edges: {pert_graph.n_unique_edges}")
# In the run shown below, neighbors() returned integer node indices rather than gene symbols
print(f"\nNeighbors of TP53: {pert_graph.neighbors('TP53')}")
print(f"Edge weights: {pert_graph.get_weights('TP53')}")


[perturblab] [INFO] 🧬 Building GEARS perturbation graph
[perturblab] [INFO]    Using provided GeneVocab: 6 genes
[perturblab] [INFO]    📖 Loading GO annotations: gene2go_all.pkl
[perturblab] [INFO]    Total genes in GO database: 67,832
[perturblab] [INFO]    ✓ Genes with GO annotations: 6
[perturblab] [INFO]    🔄 Computing pairwise gene similarities...
[perturblab] [INFO] 🧬 Building gene similarity network from GO annotations
[perturblab] [INFO]    Genes: 6
[perturblab] [INFO]    GO terms: 461
[perturblab] [INFO]    Gene-GO edges: 566
[perturblab] [INFO] 🔄 Projecting bipartite graph: 6 source nodes, 461 target nodes
[perturblab] [INFO] 📊 Retrieving neighbors for all source nodes...
[perturblab] [INFO] 🧮 Computing pairwise similarities (method=jaccard, threshold=0.1)...


Computing similarities: 100%|██████████| 6/6 [00:00<00:00, 6001.87it/s]

[perturblab] [INFO] ✅ Found 2 edges above threshold 0.1
[perturblab] [INFO] 📈 Created undirected graph: 2 unique edges, 4 total edges (undirected)
[perturblab] [INFO]    🔧 Building graph structure...
[perturblab] [INFO] ✅ GEARS perturbation graph built successfully:
[perturblab] [INFO]    Nodes: 6
[perturblab] [INFO]    Edges: 2
[perturblab] [INFO]    Average degree: 0.7
[perturblab] [INFO]    Similarity: jaccard, threshold: 0.1
Graph nodes: 6
Graph edges: 2

Neighbors of TP53: [3]
Edge weights: [0.10727969]
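
The neighbor list above contains an integer rather than a gene symbol. A minimal sketch for reading it, assuming node indices follow the order of the list passed to `GeneVocab` (an assumption, not something this notebook confirms), is shown below using plain Python lists. It also reproduces the logged average degree from the node and edge counts.

```python
# Gene order as passed to GeneVocab above; index-to-gene alignment is assumed.
genes = ['TP53', 'BRCA1', 'KRAS', 'MYC', 'EGFR', 'BRCA2']

# Values copied from the printed output.
neighbor_ids = [3]
n_nodes, n_unique_edges = 6, 2

# Translate node indices back to gene symbols.
neighbor_names = [genes[i] for i in neighbor_ids]

# Every undirected edge contributes to the degree of two nodes.
average_degree = 2 * n_unique_edges / n_nodes

print(neighbor_names)
print(round(average_degree, 1))
```

Under that assumption, TP53's only neighbor is MYC, and 2 × 2 / 6 ≈ 0.67 rounds to the 0.7 reported in the log.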



