[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/introduction/14_Ontology.ipynb)

# Ontology Generation

Welcome to the comprehensive guide on Semantica's Ontology Module. This module is the powerhouse for structuring your data into meaningful knowledge graphs, providing a complete 6-stage pipeline from raw data to validated OWL ontologies.

In this notebook, we will dive deep into:
1.  **The 6-Stage Generation Pipeline**: Understanding how Semantica transforms data into knowledge.
2.  **Core Components in Focus**: Detailed usage of `ClassInferrer`, `PropertyGenerator`, and `OntologyOptimizer`.
3.  **Visualization**: Exploring your ontology with interactive charts and hierarchies.
4.  **Advanced Usage**: Text-to-Ontology (LLM), Competency Questions, and Lifecycle Management.
5.  **Exporting & Interoperability**: Saving your work in standard formats like Turtle and RDF/XML.

**Documentation**: [API Reference](https://semantica.readthedocs.io/reference/ontology/)

## Getting Started

First, let's set up our environment and initialize the `OntologyEngine`. This engine is the unified entry point for all ontology operations.

In [None]:
!pip install -q semantica

In [None]:

from semantica.ontology import OntologyEngine, OntologyGenerator
from semantica.utils.logging import get_logger

# Initialize logger for visibility
logger = get_logger("ontology_guide")

# Initialize the Engine
# base_uri defines the namespace root for your ontology
engine = OntologyEngine(base_uri="https://docs.semantica.dev/ontology/", min_occurrences=1)

print("Ontology Engine initialized successfully!")

---

## The 6-Stage Generation Pipeline

Semantica generates ontologies through a 6-stage pipeline. This automated process takes raw entity and relationship data and produces an OWL ontology.

### The Stages:
1.  **Semantic Network Parsing**: Extracts raw concepts and connections from your inputs.
2.  **YAML-to-Definition**: Transforms concepts into structured class definitions.
3.  **Definition-to-Types**: Maps definitions to formal OWL types (e.g., `owl:Class`, `owl:ObjectProperty`).
4.  **Hierarchy Generation**: Builds a taxonomic structure (parent-child relationships) using `associatedWith` or linguistic patterns.
5.  **TTL Generation**: Serializes the in-memory structure into Turtle (TTL) format.
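
To make the later stages more concrete, here is a minimal plain-Python sketch of mapping definitions to OWL types and serializing them to Turtle. It is an illustration only, not Semantica's internal implementation: the `OWL_TYPES` table, the `to_turtle` helper, and the `https://example.org/ontology/` namespace are all assumptions introduced for this example.

```python
# Hypothetical, simplified stand-ins for the type-mapping and TTL stages.
# None of these names come from Semantica itself.
OWL_TYPES = {
    "class": "owl:Class",
    "object": "owl:ObjectProperty",
    "data": "owl:DatatypeProperty",
}

def to_turtle(base_uri, classes, properties):
    """Serialize simple class and property definitions into Turtle text."""
    lines = [
        f"@prefix : <{base_uri}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for cls in classes:
        stmt = f":{cls['name']} a {OWL_TYPES['class']}"
        if cls.get("subClassOf"):
            # Hierarchy information becomes an rdfs:subClassOf triple
            stmt += f" ;\n    rdfs:subClassOf :{cls['subClassOf']}"
        lines.append(stmt + " .")
    for prop in properties:
        lines.append(
            f":{prop['name']} a {OWL_TYPES[prop['type']]} ;\n"
            f"    rdfs:domain :{prop['domain']} ."
        )
    return "\n".join(lines)

sketch_ttl = to_turtle(
    "https://example.org/ontology/",
    classes=[{"name": "Person"}, {"name": "Manager", "subClassOf": "Person"}],
    properties=[{"name": "leads", "type": "object", "domain": "Person"}],
)
print(sketch_ttl)
```

The real pipeline handles far more (ranges, labels, validation), but the shape is the same: typed definitions in, Turtle triples out.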

Let's see this in action with some sample data.

In [None]:
# Sample Data: A simple corporate structure
entities = [
    {"id": "e1", "type": "Company", "name": "TechCorp", "founded": "2010"},
    {"id": "e2", "type": "Person", "name": "Alice", "role": "CEO"},
    {"id": "e3", "type": "Person", "name": "Bob", "role": "CTO"},
    {"id": "e4", "type": "Department", "name": "Engineering"},
    {"id": "e5", "type": "Project", "name": "Project Phoenix"},
    # Added entities to ensure robust class inference
    {"id": "e6", "type": "Company", "name": "TechSolutions", "founded": "2015"},
    {"id": "e7", "type": "Person", "name": "Charlie", "role": "Lead"},
    {"id": "e8", "type": "Person", "name": "Dave", "role": "Manager"},
    {"id": "e9", "type": "Department", "name": "Research"},
    {"id": "e10", "type": "Project", "name": "Project Orion"}
]

relationships = [
    {"source": "e2", "target": "e1", "type": "leads"},
    {"source": "e3", "target": "e4", "type": "manages"},
    {"source": "e4", "target": "e1", "type": "part_of"},
    {"source": "e3", "target": "e5", "type": "works_on"},
    # Additional relationships for new entities
    {"source": "e7", "target": "e6", "type": "leads"},
    {"source": "e8", "target": "e9", "type": "manages"},
    {"source": "e9", "target": "e6", "type": "part_of"},
    {"source": "e7", "target": "e10", "type": "works_on"}
]

data = {
    "entities": entities,
    "relationships": relationships
}

# Run the full pipeline
ontology = engine.from_data(data, name="CorporateOntology")

print(f"Generated Ontology: {ontology['name']}")
print(f"Classes Found: {len(ontology['classes'])}")
print(f"Properties Found: {len(ontology['properties'])}")

### Inspecting the Results

The generated `ontology` object is a rich dictionary containing all the inferred structure. Let's peek inside to see what Classes and Properties were created.

In [None]:
# Inspect Classes
print("--- Inferred Classes ---")
for cls in ontology['classes']:
    print(f"Class: {cls['name']}")
    print(f"  URI: {cls.get('uri')}")
    # Check if a hierarchy was inferred
    if cls.get('subClassOf'):
        print(f"  Parent: {cls['subClassOf']}")
    print("")

# Inspect Properties
print("--- Inferred Properties ---")
for prop in ontology['properties']:
    type_label = "Object Property" if prop['type'] == 'object' else "Data Property"
    print(f"{prop['name']} [{type_label}]")
    print(f"  Domain: {prop.get('domain')}")
    print(f"  Range: {prop.get('range')}")
    print("")

---

## Deep Dive: Component by Component

While `OntologyEngine` is great for one-shot generation, you often need fine-grained control. Let's look at the individual tools that power the engine.

### 1. `ClassInferrer`: Mastering Class Discovery

The `ClassInferrer` analyzes entities to find patterns. It can handle noise and only creates classes for types that appear frequently enough.

*   **`min_occurrences`**: Ignores types with fewer entities than this count.
*   **`build_hierarchy`**: Toggles automatic parent-child detection (passed to `infer_classes`, as in the cell below).
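
The thresholding idea behind `min_occurrences` can be sketched in a few lines of plain Python. This is a simplified illustration under the assumption that a class is kept when its type appears at least `min_occurrences` times; the `frequent_types` helper is hypothetical and is not part of Semantica.

```python
from collections import Counter

def frequent_types(entities, min_occurrences=2):
    """Return entity types seen at least `min_occurrences` times, with their counts."""
    counts = Counter(e["type"] for e in entities if "type" in e)
    return {t: n for t, n in counts.items() if n >= min_occurrences}

sample_entities = [
    {"type": "Manager", "name": "Dave"},
    {"type": "Manager", "name": "Eve"},
    {"type": "Employee", "name": "Frank"},
]

print(frequent_types(sample_entities, min_occurrences=2))  # {'Manager': 2}
print(frequent_types(sample_entities, min_occurrences=1))  # {'Manager': 2, 'Employee': 1}
```

A higher threshold filters out one-off or noisy types; a threshold of 1 keeps everything, which suits small demonstration datasets.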


In [None]:
from semantica.ontology import ClassInferrer

# Initialize inferrer with a threshold
# We set min_occurrences=1 here to capture everything in our small example
inferrer = ClassInferrer(min_occurrences=1)

raw_entities = [
    {"type": "Manager", "name": "Dave", "level": 5},
    {"type": "Manager", "name": "Eve", "level": 4},
    {"type": "Employee", "name": "Frank"}, # Only 1 employee
    {"type": "TemporaryWorker", "name": "Grace"} 
]

# Infer classes
classes = inferrer.infer_classes(raw_entities, build_hierarchy=True)

print(f"Inferred {len(classes)} classes from raw entities.")
for c in classes:
    print(f"- {c['name']} (Count: {c['entity_count']})")

### 2. `PropertyGenerator`: The Glue of the Ontology

Properties define relationships. Semantica distinguishes between:
*   **Object Properties**: Links between two entities (e.g., `leads` between Person and Company).
*   **Data Properties**: Attributes of an entity (e.g., `founded` year of a Company).

The `PropertyGenerator` automatically detects this distinction.
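
One plausible way to draw this distinction is sketched below in plain Python: relationships between entities become object properties, and remaining entity attributes become data properties. This is an assumption about the general approach, not a description of how `PropertyGenerator` works internally; `sketch_properties` and `RESERVED_KEYS` are hypothetical names.

```python
RESERVED_KEYS = {"id", "type"}  # assumption: structural keys, not attributes

def sketch_properties(entities, relationships):
    """Split inputs into object properties (links) and data properties (attributes)."""
    type_of = {e["id"]: e["type"] for e in entities}
    found = {}
    # Each relationship type links two entity types: an object property
    for rel in relationships:
        found[rel["type"]] = {
            "name": rel["type"],
            "type": "object",
            "domain": type_of.get(rel["source"]),
            "range": type_of.get(rel["target"]),
        }
    # Each remaining attribute holds a literal value: a data property.
    # For simplicity, the first entity type seen becomes the domain.
    for ent in entities:
        for key, value in ent.items():
            if key in RESERVED_KEYS or key in found:
                continue
            found[key] = {
                "name": key,
                "type": "data",
                "domain": ent["type"],
                "range": type(value).__name__,
            }
    return list(found.values())

sketch_props = sketch_properties(
    [
        {"id": "m1", "type": "Manager", "name": "Dave", "level": 5},
        {"id": "e1", "type": "Employee", "name": "Frank"},
    ],
    [{"source": "m1", "target": "e1", "type": "supervises"}],
)
for p in sketch_props:
    print(p["name"], p["type"], p["domain"], "->", p["range"])
```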

In [None]:
from semantica.ontology import PropertyGenerator

prop_gen = PropertyGenerator()

# We need the classes first to help property generation context
context_classes = classes  # reusing from previous step

# Let's define some relationships and attributes implicitly via entities
# Note: 'level' in Manager entities is a potential data property
complex_entities = [
    {"id": "m1", "type": "Manager", "name": "Dave", "level": 5},
    {"id": "e1", "type": "Employee", "name": "Frank"}
]
complex_relationships = [
    {"source": "m1", "target": "e1", "type": "supervises"} # Object property
]

properties = prop_gen.infer_properties(
    entities=complex_entities,
    relationships=complex_relationships,
    classes=context_classes
)

print("--- Property Types Identified ---")
for p in properties:
    print(f"Property: {p['name']}")
    print(f"  Type: {p['type']}")
    print(f"  Domain: {p['domain']} -> Range: {p['range']}")

### 3. `OntologyOptimizer`: Refining the Structure

Before finalizing, it's good practice to optimize. The optimizer removes redundancies and improves coherence, such as ensuring all classes have proper labels and valid URIs.
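
As a rough illustration of what redundancy removal can mean, the plain-Python sketch below drops classes with a repeated URI and fills in missing labels. It is a simplified stand-in, not the logic of `OntologyOptimizer`; the `dedupe_classes` helper and the example URIs are assumptions.

```python
def dedupe_classes(classes):
    """Keep the first class seen for each URI (falling back to the name)."""
    seen = set()
    unique = []
    for cls in classes:
        key = cls.get("uri") or cls["name"]
        if key in seen:
            continue  # redundant definition, skip it
        seen.add(key)
        # Ensure every surviving class carries a human-readable label
        unique.append({**cls, "label": cls.get("label") or cls["name"]})
    return unique

sketch_classes = [
    {"name": "Person", "uri": "https://example.org/ontology/Person"},
    {"name": "Person", "uri": "https://example.org/ontology/Person"},  # duplicate
    {"name": "Company", "uri": "https://example.org/ontology/Company"},
]
deduped = dedupe_classes(sketch_classes)
print(len(sketch_classes), "->", len(deduped))  # 3 -> 2
```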

In [None]:
from semantica.ontology import OntologyOptimizer

optimizer = OntologyOptimizer()

# Let's pretend we have a messy ontology dict
messy_ontology = {
    "classes": [
        {"name": "Person", "uri": "...Person"},
        {"name": "Person", "uri": "...Person"} # Duplicate!
    ],
    "properties": []
}

clean_ontology = optimizer.optimize_ontology(messy_ontology, remove_redundancy=True)

print(f"Original Classes: {len(messy_ontology['classes'])}")
print(f"Optimized Classes: {len(clean_ontology['classes'])}")

---

## Visualization

A picture is worth a thousand triples! The `OntologyVisualizer` lets you explore your ontology's structure interactively.

We can visualize:
*   **Class Hierarchies**: Tree diagrams of class inheritance.
*   **Structure Networks**: The full graph of classes and properties.
*   **Metrics Dashboards**: High-level stats at a glance.
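
When an interactive figure is not available (for example, in a plain terminal), a class hierarchy can also be inspected as indented text. The sketch below is a hypothetical helper, not part of `OntologyVisualizer`, and it assumes each class dictionary stores a single parent class name under `subClassOf`.

```python
from collections import defaultdict

def render_hierarchy(classes):
    """Return an indented text tree built from each class's `subClassOf` value."""
    names = {c["name"] for c in classes}
    children = defaultdict(list)
    roots = []
    for c in classes:
        parent = c.get("subClassOf")
        if parent in names:
            children[parent].append(c["name"])
        else:
            roots.append(c["name"])  # no known parent: treat as top-level

    lines = []

    def walk(name, depth):
        lines.append("  " * depth + name)
        for child in sorted(children[name]):
            walk(child, depth + 1)

    for root in sorted(roots):
        walk(root, 0)
    return "\n".join(lines)

tree_text = render_hierarchy([
    {"name": "Person"},
    {"name": "Manager", "subClassOf": "Person"},
    {"name": "Executive", "subClassOf": "Manager"},
    {"name": "Company"},
])
print(tree_text)
```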

In [None]:
from semantica.visualization import OntologyVisualizer

viz = OntologyVisualizer()

# 1. Interactive Class Hierarchy
# Returns a Plotly figure you can interact with
fig_hierarchy = viz.visualize_hierarchy(ontology, output="interactive")
if fig_hierarchy:
    fig_hierarchy.show()

# 2. Ontology Structure Network
# See how classes and properties connect
fig_structure = viz.visualize_structure(ontology, output="interactive")
if fig_structure:
    fig_structure.show()

# 3. Metrics Dashboard
# View counts, depths, and statistics
fig_metrics = viz.visualize_metrics(ontology, output="interactive")
if fig_metrics:
    fig_metrics.show()

---

## Advanced Usage: Lifecycle & AI

Enterprise ontologies are living artifacts. Semantica provides tools to manage their entire lifecycle and accelerate creation with AI.

### 1. Text-to-Ontology (LLM Integration)

Instead of manually creating entities, use the `LLMOntologyGenerator` to extract an ontology directly from text requirements or documents.

In [None]:
from semantica.ontology import LLMOntologyGenerator

try:
    # Note: Requires an API key in your environment variables
    llm_gen = LLMOntologyGenerator(provider="openai", model="gpt-4")

    text_description = """
    A University has many Departments. Each Department offers several Courses.
    Professors teach Courses and belong to a Department.
    Students enroll in Courses.
    """

    llm_ontology = llm_gen.generate_ontology_from_text(
        text=text_description,
        name="UniversityOntology"
    )

    print("Generated Classes:", [c['name'] for c in llm_ontology['classes']])
except Exception as exc:
    # Usually a missing API key or provider; show the error so other failures are not hidden
    print(f"Skipping LLM generation ({type(exc).__name__}): {exc}")

### 2. Test-Driven Design (Competency Questions)

Formalize your requirements as "Competency Questions" (CQs). The `CompetencyQuestionsManager` can check if your ontology contains the necessary terms to answer them.
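
To give a feel for what a term-based check involves, here is a deliberately naive plain-Python sketch: a question counts as answerable if one of its words matches a class or property name. The matching rule, the stopword list, and the `is_answerable` helper are all assumptions made for illustration; the actual `CompetencyQuestionsManager` may use a different strategy.

```python
import re

STOPWORDS = {"who", "which", "what", "does", "the", "are", "and", "for"}

def question_terms(question):
    """Lowercase words of three or more letters, minus common question words."""
    words = re.findall(r"[a-z]+", question.lower())
    return {w for w in words if len(w) >= 3 and w not in STOPWORDS}

def is_answerable(question, ontology):
    """True if any question word shares a prefix with a class or property name."""
    vocab = {c["name"].lower() for c in ontology["classes"]}
    vocab |= {p["name"].lower() for p in ontology["properties"]}
    return any(
        term.startswith(word) or word.startswith(term)
        for term in question_terms(question)
        for word in vocab
    )

sketch_ontology = {
    "classes": [{"name": "Person"}, {"name": "Company"}, {"name": "Project"}],
    "properties": [{"name": "leads"}, {"name": "manages"}],
}
print(is_answerable("Who leads TechCorp?", sketch_ontology))              # True
print(is_answerable("Which projects does Bob manage?", sketch_ontology))  # True
print(is_answerable("What is the budget?", sketch_ontology))              # False
```

A check like this only confirms that the vocabulary exists; it does not prove the ontology's structure can actually answer the question.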

In [None]:
from semantica.ontology import CompetencyQuestionsManager

cq_manager = CompetencyQuestionsManager()

# Define what our ontology SHOULD answer
cq_manager.add_question("Who leads TechCorp?", category="organizational")
cq_manager.add_question("Which projects does Bob manage?", category="operational")

# Validate our 'ontology' against these questions
validation_results = cq_manager.validate_ontology(ontology)

print(f"Answerable Questions: {validation_results['answerable']} / {validation_results['total_questions']}")
for q in cq_manager.questions:
    status = "✅" if q.answerable else "❌"
    print(f"{status} {q.question}")

### 3. Lifecycle Management (Versioning & Reuse)

Manage iterations with `VersionManager` and import external standards like FOAF or Dublin Core with `ReuseManager`.
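
When recording a new version, it helps to know what actually changed. The plain-Python sketch below compares class names between two ontology dictionaries to produce a change list that could be passed as `changes`. The `diff_class_names` helper is hypothetical and is not part of `VersionManager`.

```python
def diff_class_names(old, new):
    """Describe which class names were added or removed between two ontologies."""
    old_names = {c["name"] for c in old["classes"]}
    new_names = {c["name"] for c in new["classes"]}
    changes = [f"Added class {n}" for n in sorted(new_names - old_names)]
    changes += [f"Removed class {n}" for n in sorted(old_names - new_names)]
    return changes

ontology_v1 = {"classes": [{"name": "Person"}, {"name": "Department"}]}
ontology_v2 = {"classes": [{"name": "Person"}, {"name": "Project"}]}

print(diff_class_names(ontology_v1, ontology_v2))
# ['Added class Project', 'Removed class Department']
```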

In [None]:
from semantica.ontology import VersionManager, ReuseManager

# --- Versioning ---
v_manager = VersionManager(base_uri="https://example.org/ontology/")
v1 = v_manager.create_version("1.0", ontology, changes=["Initial creation"])
print(f"Created Version: {v1.version} at {v1.ontology_iri}")

# --- Reuse ---
reuse_manager = ReuseManager()

# Check if we can reuse FOAF
foaf_info = reuse_manager.research_ontology("http://xmlns.com/foaf/0.1/")
if foaf_info:
    print(f"Found standard ontology: {foaf_info['name']}")
    # We could now import this into our ontology
    # setdefault avoids a KeyError if the ontology has no 'imports' list yet
    ontology.setdefault('imports', []).append(foaf_info['uri'])

---

## Exporting Your Ontology

Once your ontology is built and validated, you'll want to save it. Semantica focuses on **Turtle (`.ttl`)** as the primary format, but supports others via `rdflib`.

You can export to a string or directly to a file.

In [None]:
# Get Turtle string representation
ttl_output = engine.to_owl(ontology, format="turtle")

print("--- Turtle Preview (First 500 chars) ---")
print(ttl_output[:500])
print("...")

# Save to file
output_path = "corporate_ontology.ttl"
engine.export_owl(ontology, path=output_path, format="turtle")
print(f"Successfully saved ontology to {output_path}")

## Summary

You have now mastered the essentials of Semantica's Ontology Module!

*   **Automated Generation**: Used the 6-stage pipeline to go from raw data to a structured ontology.
*   **Component Control**: Used `ClassInferrer` and `PropertyGenerator` for fine-tuned modeling.
*   **Visualization**: Explored the ontology structure interactively.
*   **Advanced Lifecycle**: Used AI generation, competency questions, and versioning.
*   **Export**: Serialized your knowledge graph for use in other semantic web tools.

**Next Steps**:
*   Try customizing the `NamespaceManager` to use your organization's URL.
*   Explore `OntologyEvaluator` for deeper quality metrics.
*   Feed the generated ontology into the **Knowledge Graph** module to start reasoning over your data!