# KETTLE → OntoGSN (pruned export)

This version improves on the previous notebook:

1. **Minimal TBox** – only classes/properties actually *used* in the population are kept in the output TTL; all other axioms from `ontogsn.owl` and `asce_mappings.ttl` are discarded.
2. **Dual typing** – every individual is explicitly typed as **`asce:Node`** *and* its corresponding `gsn:Class`.
3. **No duplicate class assertions** – each `(instance rdf:type gsn:Class)` is asserted once.

Run all cells and you’ll get a compact `kettle_populated.ttl`.


In [None]:
# If necessary, install rdflib
# !pip install rdflib

In [None]:
from pathlib import Path
import xml.etree.ElementTree as ET
from collections import defaultdict
import rdflib
from rdflib import Graph, Namespace, URIRef, Literal, RDF, RDFS, OWL

## 1 – File paths

In [None]:
AXML_FILE   = Path('KETTLE.axml') # your ASCE filepath here
MAPPING_TTL = Path('asce_mappings.ttl')
ONTOGSN_OWL = Path('../ontogsn.owl')          # optional (only for pruning metadata)
OUTPUT_TTL  = Path('kettle_populated.ttl')

assert AXML_FILE.exists() and MAPPING_TTL.exists(), 'Put the input files next to this notebook'

## 2 – Namespaces and lookup tables

In [None]:
EX      = Namespace('http://example.org/kettle/')       # individuals
ASCE    = Namespace('https://fortiss.github.io/OntoGSN/ontology/ascemappings#')
GSN     = Namespace('https://w3id.org/ontogsn/ontology#')

# Mapping: numeric code ➜ (ASCE class, GSN class)
NODE_TYPE_MAP = {
    '1':  (ASCE.Node, GSN.Goal),
    '2':  (ASCE.Node, GSN.Solution),
    '3':  (ASCE.Node, GSN.Strategy),
    '4':  (ASCE.Node, GSN.Assumption),
    '5':  (ASCE.Node, GSN.Justification),
    '6':  (ASCE.Node, GSN.Context),
    '7':  (ASCE.Node, GSN.Model),
    '8':  (ASCE.Node, GSN.Note),
    '9':  (ASCE.Node, GSN.Option),
    '10': (ASCE.Node, GSN.Caption),
}

LINK_TYPE_MAP = {
    '1': (ASCE.Link,  GSN.supportedBy),
    '2': (ASCE.Link,  GSN.inContextOf),
}

## 3 – Build the individuals graph

In [None]:
def inst(ref: str) -> URIRef:
    return EX[f'node_{ref}']

g_ind = Graph()
g_ind.bind('ex',  EX)
g_ind.bind('asce', ASCE)
g_ind.bind('gsn',  GSN)

tree = ET.parse(AXML_FILE)
root = tree.getroot()

ASCE_NODE_CLASS = ASCE.Node

# ---- Nodes ---------------------------------------------------------------
for node in root.findall('.//node'):
    ref = node.get('reference')
    if not ref:
        continue
    i = inst(ref)
    type_code = (node.findtext('type') or '').strip()
    asce_cls, gsn_cls = NODE_TYPE_MAP.get(type_code, (None, None))

    # dual typing
    g_ind.add((i, RDF.type, ASCE_NODE_CLASS))
    if asce_cls:
        g_ind.add((i, RDF.type, asce_cls))
    if gsn_cls:
        g_ind.add((i, RDF.type, gsn_cls))

    # rdfs:label / rdfs:comment
    title = node.findtext('user-title')
    if title:
        g_ind.add((i, RDFS.label, Literal(title)))
    desc = node.findtext('html-annotation')
    if desc:
        g_ind.add((i, RDFS.comment, Literal(desc)))

# ---- Links ---------------------------------------------------------------
for link in root.findall('.//link'):
    tcode = (link.findtext('type') or '').strip()
    asce_p, gsn_p = LINK_TYPE_MAP.get(tcode, (None, None))

    s_ref = (link.findtext('source-reference') or '').strip()
    o_ref = (link.findtext('destination-reference') or '').strip()
    if not s_ref or not o_ref:
        continue

    s = inst(s_ref)
    o = inst(o_ref)

    if asce_p:
        g_ind.add((s, asce_p, o))
    if gsn_p:
        g_ind.add((s, gsn_p, o))

print(f'Instance graph: {len(g_ind):,} triples')
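
To make the parsing logic above easier to follow, here is a sketch that runs the same element lookups on a tiny in-memory document. The XML fragment is an assumption reconstructed from the element and attribute names the code reads (`node`, `reference`, `type`, `user-title`, `html-annotation`, `link`, `source-reference`, `destination-reference`); the wrapper elements and all values are invented, and a real ASCE export may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical AXML-like fragment; only the element names queried by the
# notebook are taken from the source, everything else is illustrative.
SAMPLE_AXML = """
<network>
  <nodes>
    <node reference="1">
      <type>1</type>
      <user-title>G1</user-title>
      <html-annotation>The kettle is acceptably safe</html-annotation>
    </node>
    <node reference="2">
      <type> 2 </type>
      <user-title>Sn1</user-title>
    </node>
    <node><type>1</type></node>
  </nodes>
  <links>
    <link>
      <type>1</type>
      <source-reference>1</source-reference>
      <destination-reference>2</destination-reference>
    </link>
  </links>
</network>
""".strip()

root = ET.fromstring(SAMPLE_AXML)

# Nodes without a 'reference' attribute are skipped, as in the loop above
nodes = [
    (n.get('reference'), (n.findtext('type') or '').strip(), n.findtext('user-title'))
    for n in root.findall('.//node')
    if n.get('reference')
]

# Links are reduced to (type code, source reference, destination reference)
links = [
    (
        (l.findtext('type') or '').strip(),
        (l.findtext('source-reference') or '').strip(),
        (l.findtext('destination-reference') or '').strip(),
    )
    for l in root.findall('.//link')
]

print(nodes)
print(links)
```

The third `node` element has no `reference` attribute, so it is dropped, mirroring the `continue` in the node loop.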

## 4 – Determine which TBox terms are *used*

In [None]:
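# Terms from the RDF/RDFS/OWL vocabularies (rdf:type, rdfs:label, ...) occur in
# almost every ontology triple; treating them as "used" would make the filter
# in step 5 keep nearly the whole ontology, so they are excluded here.
BUILTIN_NS = (str(RDF), str(RDFS), str(OWL))

used_terms = set()

for s, p, o in g_ind:
    for term in (s, p, o):
        if isinstance(term, URIRef) and not str(term).startswith(BUILTIN_NS):
            used_terms.add(term)

print(f'Unique used URI resources (classes, properties, individuals): {len(used_terms)}')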

## 5 – Load ontologies, keep only axioms involving used terms

In [None]:
g_tbox = Graph()

def add_relevant(src_graph: Graph):
    """Copy only triples where any of S, P, O is in `used_terms`."""
    for s, p, o in src_graph:
        if s in used_terms or p in used_terms or o in used_terms:
            g_tbox.add((s, p, o))

# --- Mapping ontology ------------------------------------------------------
mapping_graph = Graph()
mapping_graph.parse(MAPPING_TTL, format='turtle')
add_relevant(mapping_graph)

# --- OntoGSN ontology ------------------------------------------------------
if ONTOGSN_OWL.exists():
    try:
        onto_graph = Graph()
        onto_graph.parse(ONTOGSN_OWL)   # auto-detects RDF/XML
        add_relevant(onto_graph)
    except Exception as e:
        print('Could not parse ontogsn.owl – skipped pruning from it:', e)

print(f'TBox after pruning: {len(g_tbox):,} triples')

## 6 – Merge and serialise

In [None]:
g_final = g_ind + g_tbox   # rdflib allows graph addition (union)

g_final.serialize(destination=OUTPUT_TTL, format='turtle')
print('Saved compact graph to', OUTPUT_TTL.resolve())

---

### Result

* **Individuals** carry *two* types each:  
  * `asce:Node` (generic)  
  * the mapped `gsn:` class (e.g. `gsn:Goal`)  

  Nodes whose type code is missing from `NODE_TYPE_MAP` are typed only as `asce:Node`.
* Only classes/properties that appear in the instance data are kept, along with any
  mapping axioms directly connecting them.

Feel free to inspect `kettle_populated.ttl` in an RDF editor or run SPARQL queries in‑notebook.
