# Node Annotator API Exploration

This notebook explores the NCATS Translator Node Annotator API to understand:
1. Response structure and available annotation fields
2. How to parse annotations into TranslatorNode objects
3. What attributes are available for different node types
4. Performance and error handling for batch annotation

**API Documentation**: https://annotator.transltr.io/

In [1]:
import requests
import json
from pathlib import Path
import urllib.parse

API_URL = 'https://annotator.transltr.io/'

## 1. Check API Status

In [2]:
# Check API is running
response = requests.get(f'{API_URL}status', timeout=30)
print(f"Status: {response.status_code}")
print(response.json())

Status: 200
{'success': True}


## 2. Test Single CURIE Annotation

In [3]:
# Test with APOE gene
test_curie = "NCBIGene:348"

path = urllib.parse.urljoin(API_URL, 'curie')
response = requests.post(
    path, 
    json={'ids': [test_curie]},
    timeout=30
)
response.raise_for_status()

result = response.json()
print(f"Response keys: {result.keys()}")
print("\nAPOE annotation structure:")
print(json.dumps(result[test_curie], indent=2)[:3000])

Response keys: dict_keys(['NCBIGene:348'])

APOE annotation structure:
[
  {
    "query": "348",
    "HGNC": "613",
    "MIM": "107741",
    "_id": "348",
    "_score": 26.492334,
    "alias": [
      "AD2",
      "APO-E",
      "ApoE4",
      "LDLCQ5",
      "LPG"
    ],
    "go": {
      "BP": [
        {
          "evidence": "NAS",
          "gocategory": "BP",
          "id": "GO:0000302",
          "pubmed": 11743999,
          "qualifier": "involved_in",
          "term": "response to reactive oxygen species"
        },
        {
          "evidence": "IEA",
          "gocategory": "BP",
          "id": "GO:0001568",
          "qualifier": "acts_upstream_of_or_within",
          "term": "blood vessel development"
        },
        {
          "evidence": "IDA",
          "gocategory": "BP",
          "id": "GO:0001937",
          "pubmed": 9685360,
          "qualifier": "involved_in",
          "term": "negative regulation of endothelial cell proliferation"
        },
        

In [4]:
# Explore available fields
apoe_data = result[test_curie]
if isinstance(apoe_data, list) and apoe_data:
    apoe_data = apoe_data[0]  # Unwrap if list

print("Available fields:")
for key in sorted(apoe_data.keys()):
    value = apoe_data[key]
    value_type = type(value).__name__
    preview = str(value)[:80] if value is not None else "None"
    print(f"  {key}: ({value_type}) {preview}")

Available fields:
  HGNC: (str) 613
  MIM: (str) 107741
  _id: (str) 348
  _score: (float) 26.492334
  alias: (list) ['AD2', 'APO-E', 'ApoE4', 'LDLCQ5', 'LPG']
  go: (dict) {'BP': [{'evidence': 'NAS', 'gocategory': 'BP', 'id': 'GO:0000302', 'pubmed': 11
  interpro: (list) [{'desc': 'Apolipoprotein A/E', 'id': 'IPR000074', 'short_desc': 'ApoA_E'}, {'de
  name: (str) apolipoprotein E
  pharos: (dict) {'target_id': 7842, 'tdl': 'Tbio'}
  query: (str) 348
  summary: (str) The protein encoded by this gene is a major apoprotein of the chylomicron. It bi
  symbol: (str) APOE
  taxid: (int) 9606
  type_of_gene: (str) protein-coding
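The unwrap-if-list step in the cell above comes back in almost every later cell. A small helper could keep it in one place.

This is a sketch, and the name `first_hit` is hypothetical. It assumes, as the responses above suggest, that each CURIE maps either to a list of hits or directly to a dict, and that the first hit is the one to use.

```python
from typing import Any, Optional


def first_hit(value: Any) -> Optional[dict]:
    """Return the first annotation record for a CURIE, or None.

    Accepts either a list of hit dicts or a single dict. Empty values and
    records flagged with 'notfound' are treated as missing.
    """
    if isinstance(value, list):
        value = value[0] if value else None
    if not isinstance(value, dict) or not value:
        return None
    if value.get('notfound'):
        return None
    return value


# Example with the response shapes seen above
print(first_hit([{'_id': '348', 'symbol': 'APOE'}]))  # {'_id': '348', 'symbol': 'APOE'}
print(first_hit([]))  # None
```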


## 3. Test Batch Annotation with Multiple Node Types

In [5]:
# Test with diverse CURIEs
test_curies = [
    "NCBIGene:348",     # APOE gene
    "NCBIGene:351",     # APP gene  
    "NCBIGene:5663",    # PSEN1 gene
    "MONDO:0004975",    # Alzheimer's disease
    "GO:0006915",       # apoptotic process
    "CHEBI:15377",      # water (chemical)
    "UniProtKB:P02649", # APOE protein
]

path = urllib.parse.urljoin(API_URL, 'curie')
response = requests.post(
    path,
    json={'ids': test_curies},
    timeout=60
)
response.raise_for_status()

batch_result = response.json()
print(f"Results for {len(batch_result)} CURIEs:")
for curie in test_curies:
    if curie in batch_result:
        data = batch_result[curie]
        if isinstance(data, list) and data:
            data = data[0]
        # Preview only: keys are capped at 10, so "10 fields" means "10 or more"
        keys = list(data.keys())[:10] if isinstance(data, dict) else []
        print(f"  {curie}: {len(keys)} fields - {keys[:5]}...")
    else:
        print(f"  {curie}: NOT FOUND")

Results for 7 CURIEs:
  NCBIGene:348: 10 fields - ['query', 'HGNC', 'MIM', '_id', '_score']...
  NCBIGene:351: 10 fields - ['query', 'HGNC', 'MIM', '_id', '_score']...
  NCBIGene:5663: 10 fields - ['query', 'HGNC', 'MIM', '_id', '_score']...
  MONDO:0004975: 6 fields - ['query', '_id', '_score', 'disease_ontology', 'mondo']...
  GO:0006915: 0 fields - []...
  CHEBI:15377: 10 fields - ['query', '_id', '_ignored', '_score', 'aeolus']...
  UniProtKB:P02649: 10 fields - ['query', 'HGNC', 'MIM', '_id', '_score']...
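To answer the "which attributes exist for which node type" question more directly, one could group the returned top-level field names by CURIE prefix. The preview above caps each field list at 10. The sketch below collects the full set instead, and it still lists prefixes whose response was empty, like `GO:0006915` here.

The function name `fields_by_prefix` and the toy input are hypothetical.

```python
from collections import defaultdict


def fields_by_prefix(annotations: dict) -> dict:
    """Map each CURIE prefix to the union of top-level fields returned for it.

    Prefixes whose CURIEs came back empty still appear, with an empty set.
    """
    result = defaultdict(set)
    for curie, data in annotations.items():
        prefix = curie.split(':', 1)[0]
        if isinstance(data, list):  # unwrap the list of hits, if any
            data = data[0] if data else {}
        fields = set(data) if isinstance(data, dict) else set()
        result[prefix] |= fields
    return dict(result)


# Toy input shaped like the responses above
sample = {
    'NCBIGene:348': [{'query': '348', 'symbol': 'APOE', 'taxid': 9606}],
    'MONDO:0004975': [{'query': 'MONDO:0004975', 'mondo': {}}],
    'GO:0006915': {},
}
for prefix, fields in sorted(fields_by_prefix(sample).items()):
    print(prefix, sorted(fields))
```

Running this on `batch_result` would show the full field set per node type, rather than the first 10 keys of one record.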


## 4. TranslatorNode Classes

In [6]:
from dataclasses import dataclass
from typing import Any, Optional, List

@dataclass
class TranslatorAttribute:
    """Node annotation attribute."""
    attribute_type_id: str
    value: Any
    value_type_id: Optional[str] = None
    original_attribute_name: Optional[str] = None
    value_url: Optional[str] = None
    attribute_source: Optional[str] = None
    description: Optional[str] = None
    attributes: Optional[list] = None


@dataclass
class TranslatorNode:
    """Translator graph node."""
    curie: str
    label: Optional[str] = None
    types: Optional[List[str]] = None
    synonyms: Optional[List[str]] = None
    curie_synonyms: Optional[List[str]] = None
    attributes: Optional[List[TranslatorAttribute]] = None
    taxa: Optional[List[str]] = None

    @property
    def identifier(self):
        return self.curie

    @identifier.setter
    def identifier(self, i):
        self.curie = i

    @property
    def categories(self):
        return self.types

    @classmethod
    def from_dict(cls, data_dict: dict, return_synonyms=False):
        """Creates a TranslatorNode from a data dict."""
        if 'curie' not in data_dict:
            raise ValueError('Missing "curie" key')
        n = cls(data_dict['curie'])
        if 'label' in data_dict:
            n.label = data_dict['label']
        if 'types' in data_dict:
            n.types = [f"biolink:{ty}" if not ty.startswith('biolink:') else ty 
                       for ty in data_dict['types']]
        if 'taxa' in data_dict:
            n.taxa = data_dict['taxa']
        if return_synonyms:
            if 'synonyms' in data_dict:
                n.synonyms = data_dict['synonyms']
            elif 'names' in data_dict:
                n.synonyms = data_dict['names']
        return n

## 5. Parse Annotation Response to TranslatorNode

Store every attribute the API returns, dropping only the metadata keys `_id`, `_score`, `query`, and `notfound`. The UI then discovers which attributes are useful.

In [7]:
def parse_annotation_to_translator_node(curie: str, node_data) -> Optional[TranslatorNode]:
    """Parse Node Annotator response to TranslatorNode.
    
    Stores ALL attributes from the API without filtering.
    """
    if not node_data:
        return None
    
    # Unwrap list if needed (non-empty here, so [0] is safe)
    if isinstance(node_data, list):
        node_data = node_data[0]
    
    if not isinstance(node_data, dict):
        return None
    
    # Check for "not found" marker
    if node_data.get('notfound'):
        return None
    
    # Extract label (try multiple field names)
    label = (
        node_data.get('name') or 
        node_data.get('symbol') or 
        node_data.get('label') or
        curie
    )
    
    # Extract types/categories
    types = []
    if 'type_of_gene' in node_data:
        types.append(f"gene_type:{node_data['type_of_gene']}")
    
    # Extract synonyms
    synonyms = node_data.get('alias', [])
    if isinstance(synonyms, str):
        synonyms = [synonyms]
    
    # Extract taxa
    taxa = []
    if 'taxid' in node_data:
        taxa.append(f"NCBITaxon:{node_data['taxid']}")
    
    # Convert ALL fields to TranslatorAttributes (store everything)
    attributes = []
    skip_keys = {'_id', '_score', 'query', 'notfound'}  # Skip only metadata
    for key, value in node_data.items():
        if key not in skip_keys:
            attributes.append(TranslatorAttribute(
                attribute_type_id=key,
                value=value
            ))
    
    return TranslatorNode(
        curie=curie,
        label=label,
        types=types if types else None,
        synonyms=synonyms if synonyms else None,
        attributes=attributes if attributes else None,
        taxa=taxa if taxa else None,
    )

# Test parsing
for curie, data in batch_result.items():
    tn = parse_annotation_to_translator_node(curie, data)
    if tn:
        print(f"{tn.curie}:")
        print(f"  label: {tn.label}")
        print(f"  types: {tn.types}")
        print(f"  taxa: {tn.taxa}")
        print(f"  attributes: {len(tn.attributes) if tn.attributes else 0} fields")
        print()

NCBIGene:348:
  label: apolipoprotein E
  types: ['gene_type:protein-coding']
  taxa: ['NCBITaxon:9606']
  attributes: 11 fields

NCBIGene:351:
  label: amyloid beta precursor protein
  types: ['gene_type:protein-coding']
  taxa: ['NCBITaxon:9606']
  attributes: 11 fields

NCBIGene:5663:
  label: presenilin 1
  types: ['gene_type:protein-coding']
  taxa: ['NCBITaxon:9606']
  attributes: 11 fields

MONDO:0004975:
  label: MONDO:0004975
  types: None
  taxa: None
  attributes: 3 fields

CHEBI:15377:
  label: CHEBI:15377
  types: None
  taxa: None
  attributes: 12 fields

UniProtKB:P02649:
  label: apolipoprotein E
  types: ['gene_type:protein-coding']
  taxa: ['NCBITaxon:9606']
  attributes: 11 fields
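In the output above, `MONDO:0004975` and `CHEBI:15377` fall back to their CURIE as the label. Neither record has a usable top-level `name`, `symbol`, or `label` field. A fallback that also checks nested records might recover a readable label.

The nested paths used below (`mondo.label`, `chebi.name`) are assumptions about the disease and chemical record shapes. They are not verified in this notebook, so inspect `batch_result['MONDO:0004975']` before relying on them. `extract_label` is a hypothetical name.

```python
# Assumed nested (field, subfield) locations for labels; verify against real responses
NESTED_LABEL_PATHS = [('mondo', 'label'), ('chebi', 'name')]


def extract_label(node_data: dict, curie: str) -> str:
    """Pick a display label, trying top-level fields first, then nested records."""
    for key in ('name', 'symbol', 'label'):
        value = node_data.get(key)
        if isinstance(value, str) and value:
            return value
    for outer, inner in NESTED_LABEL_PATHS:
        nested = node_data.get(outer)
        if isinstance(nested, list):  # some records may hold a list of entries
            nested = nested[0] if nested else None
        if isinstance(nested, dict):
            value = nested.get(inner)
            if isinstance(value, str) and value:
                return value
    return curie  # same final fallback as parse_annotation_to_translator_node


print(extract_label({'mondo': {'label': 'Alzheimer disease'}}, 'MONDO:0004975'))
print(extract_label({'disease_ontology': {}}, 'MONDO:0004975'))
```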



## 6. Store Raw Attributes for Dynamic Filtering

Instead of pre-extracting specific "filterable" attributes, we store ALL attributes and let the UI discover which ones are filterable. An attribute qualifies when:
- its values are scalars (string, int, float, bool), and
- it has between 2 and 50 unique values across the annotated nodes.

In [8]:
def get_raw_annotations(translator_node: TranslatorNode) -> dict:
    """Get all raw annotations as a flat dict."""
    if not translator_node or not translator_node.attributes:
        return {}
    
    return {
        attr.attribute_type_id: attr.value
        for attr in translator_node.attributes
    }

# Show raw attributes for each node
print("Raw annotations per node:")
for curie, data in batch_result.items():
    tn = parse_annotation_to_translator_node(curie, data)
    if tn:
        raw = get_raw_annotations(tn)
        print(f"\n{curie} ({len(raw)} attributes):")
        for k, v in list(raw.items())[:10]:
            v_str = str(v)[:60]
            print(f"  {k}: {v_str}")

Raw annotations per node:

NCBIGene:348 (11 attributes):
  HGNC: 613
  MIM: 107741
  alias: ['AD2', 'APO-E', 'ApoE4', 'LDLCQ5', 'LPG']
  go: {'BP': [{'evidence': 'NAS', 'gocategory': 'BP', 'id': 'GO:00
  interpro: [{'desc': 'Apolipoprotein A/E', 'id': 'IPR000074', 'short_de
  name: apolipoprotein E
  pharos: {'target_id': 7842, 'tdl': 'Tbio'}
  summary: The protein encoded by this gene is a major apoprotein of th
  symbol: APOE
  taxid: 9606

NCBIGene:351 (11 attributes):
  HGNC: 620
  MIM: 104760
  alias: ['AAA', 'ABETA', 'ABPP', 'AD1', 'APPI', 'CTFgamma', 'CVAP', 
  go: {'BP': [{'evidence': 'IGI', 'gocategory': 'BP', 'id': 'GO:00
  interpro: [{'desc': 'Pancreatic trypsin inhibitor Kunitz domain', 'id'
  name: amyloid beta precursor protein
  pharos: {'target_id': 15398, 'tdl': 'Tclin'}
  summary: This gene encodes a cell surface receptor and transmembrane 
  symbol: APP
  taxid: 9606

NCBIGene:5663 (11 attributes):
  HGNC: 9508
  MIM: 104311
  alias: ['ACNINV3', 'AD3', 'CMD1U', 'FAD'
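`get_raw_annotations` keeps nested values as they are. As a result, scalars inside dicts are invisible to the scalar-only discovery step in the next section.

For example, `pharos` holds `{'target_id': ..., 'tdl': 'Tbio'}`. Its `tdl` value (Tbio, Tclin, ...) looks like exactly the kind of categorical value a dropdown could use.

Below is a minimal sketch that flattens nested dicts into dotted keys such as `pharos.tdl`. The function name and the dotted-key convention are assumptions, and lists are left untouched.

```python
def flatten_annotations(data: dict, prefix: str = '', sep: str = '.') -> dict:
    """Flatten nested dicts into a single level with dotted keys.

    Lists (e.g. alias, interpro, GO annotation records) are kept as-is,
    so only dict nesting is expanded. Empty dicts are kept as values.
    """
    flat = {}
    for key, value in data.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten_annotations(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


raw = {'symbol': 'APOE', 'pharos': {'target_id': 7842, 'tdl': 'Tbio'}}
print(flatten_annotations(raw))
# {'symbol': 'APOE', 'pharos.target_id': 7842, 'pharos.tdl': 'Tbio'}
```

Applying this to each node's raw annotations before discovery would let `pharos.tdl` be considered alongside top-level scalars.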

## 7. Discover Filterable Attributes Dynamically

Scan all annotations to find attributes whose scalar values take between 2 and 50 distinct values (the defaults of `min_values` and `max_values`).

In [9]:
def discover_filterable_attributes(annotations: dict, min_values=2, max_values=50) -> dict:
    """Discover attributes that are good for filtering.
    
    Returns attributes where:
    - Values are scalar (str, int, float, bool)
    - Number of unique values is between min_values and max_values
    """
    all_keys: dict[str, set] = {}
    
    for curie, data in annotations.items():
        tn = parse_annotation_to_translator_node(curie, data)
        if not tn or not tn.attributes:
            continue
        
        for attr in tn.attributes:
            value = attr.value
            # Only include scalar values
            if isinstance(value, (str, int, float, bool)):
                if attr.attribute_type_id not in all_keys:
                    all_keys[attr.attribute_type_id] = set()
                all_keys[attr.attribute_type_id].add(value)
    
    # Filter to attributes with reasonable number of unique values
    filterable = {}
    for key, values in all_keys.items():
        if min_values <= len(values) <= max_values:
            filterable[key] = sorted(str(v) for v in values)
    
    return filterable

# Discover filterable attributes from batch results
filterable = discover_filterable_attributes(batch_result)
print(f"Discovered {len(filterable)} filterable attributes:")
for key, values in filterable.items():
    print(f"  {key}: {values[:5]}{'...' if len(values) > 5 else ''} ({len(values)} values)")

Discovered 5 filterable attributes:
  HGNC: ['613', '620', '9508'] (3 values)
  MIM: ['104311', '104760', '107741'] (3 values)
  name: ['amyloid beta precursor protein', 'apolipoprotein E', 'presenilin 1'] (3 values)
  summary: ["Alzheimer's disease (AD) patients with an inherited form of the disease carry mutations in the presenilin proteins (PSEN1; PSEN2) or in the amyloid precursor protein (APP). These disease-linked mutations result in increased production of the longer form of amyloid-beta (main component of amyloid deposits found in AD brains). Presenilins are postulated to regulate APP processing through their effects on gamma-secretase, an enzyme that cleaves APP. Also, it is thought that the presenilins are involved in the cleavage of the Notch receptor, such that they either directly regulate gamma-secretase activity or themselves are protease enzymes. Several alternatively spliced transcript variants encoding different isoforms have been identified for this gene, the full-le
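The result above shows a weakness of counting unique values alone. `HGNC`, `MIM`, `name`, and `summary` pass because each gene contributes its own value. These fields identify nodes rather than group them, so a dropdown built from them would narrow the graph to a single gene per choice.

One possible refinement also caps the number of distinct values relative to the number of nodes carrying the attribute. In the sketch below, the function name and the `max_unique_ratio` parameter (default 0.5) are hypothetical. The input is a mapping of CURIE to a flat attribute dict, such as the output of `get_raw_annotations`.

```python
from collections import defaultdict

SCALAR_TYPES = (str, int, float, bool)


def discover_grouping_attributes(node_attrs: dict, min_values: int = 2,
                                 max_values: int = 50,
                                 max_unique_ratio: float = 0.5) -> dict:
    """Return {attribute: sorted values} for scalar attributes that group nodes.

    node_attrs maps each CURIE to a flat {attribute: value} dict. An attribute
    qualifies if it has between min_values and max_values distinct scalar
    values AND the number of distinct values is at most max_unique_ratio times
    the number of nodes carrying it, so identifier-like fields are dropped.
    """
    values_by_key = defaultdict(set)
    nodes_by_key = defaultdict(int)
    for attrs in node_attrs.values():
        for key, value in attrs.items():
            if isinstance(value, SCALAR_TYPES):
                values_by_key[key].add(value)
                nodes_by_key[key] += 1

    result = {}
    for key, values in values_by_key.items():
        n_unique = len(values)
        in_range = min_values <= n_unique <= max_values
        if in_range and n_unique <= max_unique_ratio * nodes_by_key[key]:
            result[key] = sorted(str(v) for v in values)
    return result


# Toy input: four nodes; HGNC differs on every node, type_of_gene repeats
toy = {
    'gene_a': {'HGNC': '1', 'type_of_gene': 'protein-coding', 'taxid': 9606},
    'gene_b': {'HGNC': '2', 'type_of_gene': 'protein-coding', 'taxid': 9606},
    'gene_c': {'HGNC': '3', 'type_of_gene': 'protein-coding', 'taxid': 9606},
    'gene_d': {'HGNC': '4', 'type_of_gene': 'ncRNA', 'taxid': 9606},
}
print(discover_grouping_attributes(toy))
```

The ratio threshold is a judgment call. On small samples it may reject genuinely categorical fields, so it is probably worth tuning against a larger annotated graph.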

## 8. Test with Real Graph from Cache

In [10]:
# Find latest cache file
cache_dir = Path("../data/cache")
cache_files = sorted(cache_dir.glob("tct_results_*.json"), reverse=True)
if cache_files:
    latest_cache = cache_files[0]
    print(f"Using cache: {latest_cache.name}")
else:
    print("No cache files found - run a query in the app first")
    latest_cache = None

Using cache: tct_results_20251217_091602.json


In [11]:
if latest_cache:
    with open(latest_cache) as f:
        cached_data = json.load(f)
    
    print(f"Cache keys: {cached_data.keys()}")
    
    # Extract unique node CURIEs from edges
    edges = cached_data.get('edges', [])
    node_curies = set()
    for edge in edges:
        node_curies.add(edge.get('subject'))
        node_curies.add(edge.get('object'))
    node_curies.discard(None)
    
    print(f"Found {len(node_curies)} unique node CURIEs")
    print(f"Sample CURIEs: {list(node_curies)[:10]}")

Cache keys: dict_keys(['query_id', 'input_genes', 'target_disease', 'edges', 'metadata', 'timestamp', 'apis_queried', 'apis_succeeded'])
Found 69 unique node CURIEs
Sample CURIEs: ['NCBIGene:6355', 'NCBIGene:969', 'NCBIGene:4283', 'NCBIGene:3565', 'CHEBI:4026', 'NCBIGene:6347', 'NCBIGene:1401', 'CHEBI:68610', 'MONDO:0005271', 'NCBIGene:3569']


## 9. Batch Annotation Performance Test

In [12]:
import time

def batch_annotate(curies: list, batch_size: int = 500):
    """Annotate CURIEs in batches; failed batches are reported and skipped."""
    curies = list(curies)  # accept sets too; convert once rather than per batch
    path = urllib.parse.urljoin(API_URL, 'curie')
    all_results = {}
    total_batches = (len(curies) + batch_size - 1) // batch_size
    
    for i in range(0, len(curies), batch_size):
        batch = curies[i:i + batch_size]
        batch_num = i // batch_size + 1
        
        start = time.time()
        try:
            response = requests.post(
                path,
                json={'ids': batch},
                timeout=120
            )
            response.raise_for_status()
            batch_json = response.json()  # local name avoids shadowing the global batch_result
            all_results.update(batch_json)
            duration = time.time() - start
            print(f"Batch {batch_num}/{total_batches}: {len(batch)} CURIEs, "
                  f"{len(batch_json)} results, {duration:.2f}s")
        except Exception as e:
            # Failed batches are only logged; their CURIEs are absent from the result
            duration = time.time() - start
            print(f"Batch {batch_num}/{total_batches}: FAILED after {duration:.2f}s - {e}")
    
    return all_results

# Test with cached nodes (if available)
if latest_cache and node_curies:
    # Limit to first 100 for testing
    test_nodes = list(node_curies)[:100]
    print(f"Testing batch annotation with {len(test_nodes)} nodes...\n")
    
    start_total = time.time()
    results = batch_annotate(test_nodes, batch_size=50)
    total_time = time.time() - start_total
    
    print(f"\nTotal: {len(results)} annotated in {total_time:.2f}s")
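    # Counts CURIEs with a response entry, including empty (not-found) entries,
    # so this measures response coverage rather than successful annotation
    print(f"Success rate: {len(results)/len(test_nodes)*100:.1f}%")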
    
    # Discover filterable attributes from real data
    filterable = discover_filterable_attributes(results)
    print(f"\nDiscovered {len(filterable)} filterable attributes from real data:")
    for key, values in list(filterable.items())[:10]:
        print(f"  {key}: {len(values)} unique values")

Testing batch annotation with 69 nodes...

Batch 1/2: 50 CURIEs, 50 results, 6.66s
Batch 2/2: 19 CURIEs, 19 results, 6.25s

Total: 69 annotated in 12.91s
Success rate: 100.0%

Discovered 1 filterable attributes from real data:
  alias: 3 unique values
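`batch_annotate` logs a failed batch and moves on. Transient errors (timeouts, 5xx responses) therefore silently drop CURIEs from the result. A retry wrapper with exponential backoff is one way to harden this.

In this run, per-request latency looked roughly fixed: about 6-7 s whether a batch held 19 or 50 CURIEs. That suggests fewer, larger batches are cheaper, though one run is thin evidence.

The sketch below is transport-agnostic, and its names and default delays are assumptions. `post_batch` is any callable that takes a list of CURIEs and returns the decoded JSON dict, or raises.

```python
import time
from typing import Callable, List, Optional


def annotate_with_retry(batch: List[str],
                        post_batch: Callable[[List[str]], dict],
                        max_attempts: int = 3,
                        base_delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Call post_batch(batch), retrying with exponential backoff on any exception.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    returns None if every attempt fails, so the caller can record the batch
    as failed instead of losing it silently.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return post_batch(batch)
        except Exception as exc:
            if attempt == max_attempts:
                print(f"Giving up after {attempt} attempts: {exc}")
                return None
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
    return None
```

In the notebook, `post_batch` could wrap the `requests.post(...)` call from `batch_annotate`. It should call `raise_for_status()` before `.json()` so that HTTP errors also trigger a retry. Injecting `sleep` keeps the function testable without actually waiting.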


## 10. GO Term Extraction

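The GO (Gene Ontology) data is nested. The `go` field maps each aspect to a list of annotation records, each with `id`, `term`, `evidence`, and so on. We need to extract the BP (Biological Process), MF (Molecular Function), and CC (Cellular Component) term names for filtering.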

In [13]:
# Look at the nested GO structure
apoe_data = batch_result.get('NCBIGene:348', [])
if isinstance(apoe_data, list) and apoe_data:
    apoe_data = apoe_data[0]

go_data = apoe_data.get('go', {})
print(f"GO data keys: {list(go_data.keys())}")
print()

# Count BP terms
bp_terms = go_data.get('BP', [])
# Note: this counts annotation records; the same term can appear more than once
print(f"APOE has {len(bp_terms)} GO:BP annotations")
print("\nFirst 10 GO:BP terms:")
for term in bp_terms[:10]:
    print(f"  {term.get('id')}: {term.get('term')}")

GO data keys: ['BP', 'MF']

APOE has 159 GO:BP annotations

First 10 GO:BP terms:
  GO:0000302: response to reactive oxygen species
  GO:0001568: blood vessel development
  GO:0001937: negative regulation of endothelial cell proliferation
  GO:0002021: response to dietary excess
  GO:0006629: lipid metabolic process
  GO:0006629: lipid metabolic process
  GO:0006631: fatty acid metabolic process
  GO:0006641: triglyceride metabolic process
  GO:0006641: triglyceride metabolic process
  GO:0006707: cholesterol catabolic process
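The listing above repeats `GO:0006629` and `GO:0006641`. The 159 entries are annotation records, and one term can carry several records, for example with different evidence codes or references. Collapsing them by GO ID before counting or displaying gives one row per term.

Below is a sketch with the hypothetical name `collapse_go_records`. It keeps the evidence codes seen for each term; the evidence codes in the toy records are illustrative.

```python
def collapse_go_records(records: list) -> dict:
    """Collapse GO annotation records into {go_id: {'term': ..., 'evidence': [...]}}.

    Records without an 'id' are skipped; evidence codes are de-duplicated
    and sorted so the output is stable.
    """
    collapsed: dict = {}
    for rec in records:
        if not isinstance(rec, dict) or 'id' not in rec:
            continue
        entry = collapsed.setdefault(rec['id'], {'term': rec.get('term'), 'evidence': set()})
        if rec.get('evidence'):
            entry['evidence'].add(rec['evidence'])
    for entry in collapsed.values():
        entry['evidence'] = sorted(entry['evidence'])
    return collapsed


# Toy records; evidence codes are illustrative
records = [
    {'id': 'GO:0006629', 'term': 'lipid metabolic process', 'evidence': 'IDA'},
    {'id': 'GO:0006629', 'term': 'lipid metabolic process', 'evidence': 'IEA'},
    {'id': 'GO:0001568', 'term': 'blood vessel development', 'evidence': 'IEA'},
]
print(collapse_go_records(records))
```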


In [14]:
def extract_go_terms(annotations: dict) -> dict:
    """Extract GO BP/MF/CC terms from all annotated nodes.
    
    Returns dict with go_bp, go_mf, go_cc keys mapping to sets of terms.
    """
    go_terms = {'go_bp': set(), 'go_mf': set(), 'go_cc': set()}
    
    for curie, data in annotations.items():
        if isinstance(data, list) and data:
            data = data[0]
        if not isinstance(data, dict):
            continue
            
        go_data = data.get('go', {})
        if not isinstance(go_data, dict):
            continue
        
        # Extract BP terms
        for term_entry in go_data.get('BP', []):
            if isinstance(term_entry, dict) and 'term' in term_entry:
                go_terms['go_bp'].add(term_entry['term'])
        
        # Extract MF terms
        for term_entry in go_data.get('MF', []):
            if isinstance(term_entry, dict) and 'term' in term_entry:
                go_terms['go_mf'].add(term_entry['term'])
        
        # Extract CC terms
        for term_entry in go_data.get('CC', []):
            if isinstance(term_entry, dict) and 'term' in term_entry:
                go_terms['go_cc'].add(term_entry['term'])
    
    return go_terms

# Test GO term extraction on our batch results
go_terms = extract_go_terms(batch_result)
print("GO terms extracted from batch results:")
print(f"  Biological Process (BP): {len(go_terms['go_bp'])} unique terms")
print(f"  Molecular Function (MF): {len(go_terms['go_mf'])} unique terms")
print(f"  Cellular Component (CC): {len(go_terms['go_cc'])} unique terms")
print()
print("Sample GO:BP terms:")
for term in sorted(go_terms['go_bp'])[:15]:
    print(f"  - {term}")

GO terms extracted from batch results:
  Biological Process (BP): 320 unique terms
  Molecular Function (MF): 60 unique terms
  Cellular Component (CC): 0 unique terms

Sample GO:BP terms:
  - AMPA glutamate receptor clustering
  - Cajal-Retzius cell differentiation
  - DNA damage response
  - G protein-coupled receptor signaling pathway
  - L-glutamate import across plasma membrane
  - NMDA glutamate receptor clustering
  - NMDA selective glutamate receptor signaling pathway
  - Notch receptor processing
  - Notch signaling pathway
  - T cell activation involved in immune response
  - T cell receptor signaling pathway
  - acylglycerol homeostasis
  - adenylate cyclase-activating G protein-coupled receptor signaling pathway
  - adenylate cyclase-inhibiting G protein-coupled receptor signaling pathway
  - adult locomotory behavior
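`extract_go_terms` pools terms across all nodes. That is enough to populate a term picker, but not to filter, since filtering needs to know which node carries which term.

It also assumes each aspect (`BP`, `MF`, `CC`) is a list. BioThings-style APIs can return a single-element list as a bare object. The string-valued `alias` that slipped through as a scalar in Section 9 hints that this annotator does so.

If that also applies to GO, a gene with exactly one MF annotation would lose that term in the loop above, because iterating a dict yields its keys. The sketch below normalizes that case and keeps terms per node, in the `go_bp`/`go_mf`/`go_cc` list shape described in the summary. The function names are hypothetical.

```python
from typing import Any, Dict, List

GO_ASPECTS = {'BP': 'go_bp', 'MF': 'go_mf', 'CC': 'go_cc'}


def as_list(value: Any) -> list:
    """Wrap a bare record in a list; pass lists through; map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def go_terms_by_node(annotations: dict) -> Dict[str, Dict[str, List[str]]]:
    """Return {curie: {'go_bp': [...], 'go_mf': [...], 'go_cc': [...]}}.

    Term names are de-duplicated per node and sorted; nodes without a usable
    'go' field get empty lists.
    """
    per_node = {}
    for curie, data in annotations.items():
        if isinstance(data, list):
            data = data[0] if data else {}
        go_data = data.get('go') if isinstance(data, dict) else None
        if not isinstance(go_data, dict):
            go_data = {}
        terms = {}
        for aspect, out_key in GO_ASPECTS.items():
            names = {rec['term'] for rec in as_list(go_data.get(aspect))
                     if isinstance(rec, dict) and 'term' in rec}
            terms[out_key] = sorted(names)
        per_node[curie] = terms
    return per_node


# Toy input: gene_a has MF as a bare dict instead of a list
toy = {
    'gene_a': [{'go': {'BP': [{'term': 'lipid transport'}, {'term': 'lipid transport'}],
                       'MF': {'term': 'lipid binding'}}}],
    'gene_b': {},
}
print(go_terms_by_node(toy))
```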


## 11. Test with Real Cached Data

Now let's test GO term extraction with the full cached graph data to see how many terms we get.

In [15]:
# Annotate all nodes from cached graph and extract GO terms
if latest_cache and node_curies:
    print(f"Annotating {len(node_curies)} nodes from cached graph...")
    all_results = batch_annotate(list(node_curies), batch_size=100)
    
    go_terms = extract_go_terms(all_results)
    print(f"\nGO terms across {len(node_curies)} nodes:")
    print(f"  Biological Process (BP): {len(go_terms['go_bp'])} unique terms")
    print(f"  Molecular Function (MF): {len(go_terms['go_mf'])} unique terms")
    print(f"  Cellular Component (CC): {len(go_terms['go_cc'])} unique terms")
    
    print("\n--- Sample GO:BP terms (useful for filtering) ---")
    bp_sample = sorted(go_terms['go_bp'])[:30]
    for i, term in enumerate(bp_sample):
        print(f"  {i+1}. {term}")

Annotating 69 nodes from cached graph...
Batch 1/1: 69 CURIEs, 69 results, 7.33s

GO terms across 69 nodes:
  Biological Process (BP): 1060 unique terms
  Molecular Function (MF): 161 unique terms
  Cellular Component (CC): 0 unique terms

--- Sample GO:BP terms (useful for filtering) ---
  1. AMPA glutamate receptor clustering
  2. B cell activation
  3. B cell activation involved in immune response
  4. B cell differentiation
  5. B cell proliferation
  6. ERK1 and ERK2 cascade
  7. Fc-gamma receptor signaling pathway involved in phagocytosis
  8. G protein-coupled receptor internalization
  9. G protein-coupled receptor signaling pathway
  10. G protein-coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger
  11. JNK cascade
  12. MAPK cascade
  13. NMDA glutamate receptor clustering
  14. Notch signaling pathway
  15. Peyer's patch development
  16. Ras protein signal transduction
  17. Rho protein signal transduction
  18. T cell activation
  19. T cell 
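With 1,060 distinct BP terms across 69 nodes, a multiselect may be easier to use if terms are ordered by how many nodes carry them. Broadly shared processes would then surface first.

This is a sketch with a hypothetical function name. It assumes per-node term lists shaped like `{curie: {'go_bp': [...], ...}}`.

```python
from collections import Counter


def rank_terms(per_node: dict, key: str = 'go_bp', top: int = 10) -> list:
    """Return [(term, node_count), ...] for the most widely shared terms.

    Each node counts once per term, even if the term is listed twice for it.
    Ties are broken alphabetically.
    """
    counts = Counter()
    for terms in per_node.values():
        counts.update(set(terms.get(key, [])))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Toy per-node term lists
toy = {
    'gene_a': {'go_bp': ['apoptotic process', 'lipid transport']},
    'gene_b': {'go_bp': ['apoptotic process']},
    'gene_c': {'go_bp': ['axon guidance', 'apoptotic process', 'lipid transport']},
}
print(rank_terms(toy, top=2))
```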

## 12. Summary

### Key Updates

The `NodeAnnotator` class now extracts GO terms from the nested API response:

1. **GO terms are stored as lists**: Each node's `node_annotations` dict includes:
   - `go_bp`: List of Biological Process term names
   - `go_mf`: List of Molecular Function term names
   - `go_cc`: List of Cellular Component term names

2. **Metadata separates attribute types**:
   - `filterable_attributes`: Scalar values with 2-50 unique values (dropdown-friendly)
   - `searchable_attributes`: List-valued attributes like GO terms (searchable multiselect)

3. **Filter logic handles lists**: When filtering by GO terms, a node matches if it has ANY of the selected terms (not all).
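The ANY-match rule in point 3 can be expressed as a set intersection. The sketch below is not the app's `NodeAnnotator` code, which isn't shown in this notebook.

It assumes `node_annotations` maps each CURIE to a dict holding the `go_bp`/`go_mf`/`go_cc` lists described above. `match_any` is a hypothetical name.

```python
def match_any(node_annotations: dict, key: str, selected: list) -> set:
    """Return CURIEs whose `key` list shares at least one term with `selected`.

    An empty selection matches every node, i.e. the filter is inactive.
    """
    wanted = set(selected)
    if not wanted:
        return set(node_annotations)
    return {
        curie for curie, ann in node_annotations.items()
        if wanted & set(ann.get(key, []))
    }


# Toy annotations
node_annotations = {
    'gene_a': {'go_bp': ['lipid metabolic process', 'response to reactive oxygen species']},
    'gene_b': {'go_bp': ['Notch signaling pathway']},
    'gene_c': {},
}
print(sorted(match_any(node_annotations, 'go_bp',
                       ['lipid metabolic process', 'Notch signaling pathway'])))
```

Treating an empty selection as "no filter" is a design choice. The alternative, matching nothing, would hide the whole graph until the user picks a term.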

### UI Implementation

The app.py UI now has two sections:
1. **GO Term Filters**: Searchable multiselect for BP/MF/CC terms
2. **Other Annotation Filters**: Dropdown-based filters for scalar attributes

### Example Usage

To filter for genes involved in "lipid metabolic process":
1. Open the "Annotation Filters" expander
2. In "Biological Process", type "lipid" to search
3. Select "lipid metabolic process"
4. The graph will show only nodes with that GO term (+ their 1-hop neighbors)
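Step 4 keeps matching nodes plus their 1-hop neighbors. The cache file stores an edge list in which each edge carries `subject` and `object` CURIEs, as read in Section 8.

Given that list, the expansion could look like the sketch below. The function name is hypothetical, and this is not the app's actual implementation.

```python
def expand_one_hop(seeds: set, edges: list) -> tuple:
    """Return (nodes, edges) for the seed nodes plus their direct neighbors.

    Edges are kept when at least one endpoint is a seed, so edges between two
    neighbors that are not seeds are dropped.
    """
    kept_edges = [e for e in edges
                  if e.get('subject') in seeds or e.get('object') in seeds]
    nodes = set(seeds)
    for e in kept_edges:
        nodes.update(x for x in (e.get('subject'), e.get('object')) if x is not None)
    return nodes, kept_edges


# Toy edge list
edges = [
    {'subject': 'A', 'object': 'B'},
    {'subject': 'B', 'object': 'C'},
    {'subject': 'D', 'object': 'A'},
]
nodes, kept = expand_one_hop({'A'}, edges)
print(sorted(nodes), len(kept))
```

Dropping neighbor-to-neighbor edges keeps the view centered on the matched nodes. Including them instead would show more context at the cost of a denser graph.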