## 🌐 Ontology Grounding with QUDT

Once a process schema is extracted, it can be semantically grounded using the [QUDT Ontologies](https://www.qudt.org/pages/HomePage.html) (Quantities, Units, Dimensions, and Data Types). Grounding links schema attributes to standardized scientific quantities and units, which supports interoperability and machine reasoning.
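
As a rough illustration of what grounding adds, the sketch below attaches QUDT identifiers to a single schema property. The property name `depositionTemperature` and the annotation keys `quantityKind` and `unit` are assumptions made for this example; the keys that schema-miner actually writes may differ.

```python
import json

# QUDT vocabulary namespaces for quantity kinds and units.
QUDT_QUANTITY_KIND = "http://qudt.org/vocab/quantitykind/"
QUDT_UNIT = "http://qudt.org/vocab/unit/"

# Hypothetical ungrounded property, as an extracted schema might contain it.
ungrounded = {
    "depositionTemperature": {
        "type": "number",
        "description": "Substrate temperature during deposition.",
    }
}

# The same property after grounding (annotation keys are illustrative only).
grounded = {
    "depositionTemperature": {
        **ungrounded["depositionTemperature"],
        "quantityKind": QUDT_QUANTITY_KIND + "Temperature",
        "unit": QUDT_UNIT + "DEG_C",
    }
}

print(json.dumps(grounded, indent=2))
```

The original `type` and `description` are kept; grounding only adds machine-readable links to a shared vocabulary.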

**Overview**
1. [Installing Schema_miner](#installing-schema-miner)
2. [Configuration](#configuration)
3. [Ontology Grounding with QUDT Using LLM Prompting](#prompt-grounding)
4. [Ontology Grounding with QUDT Using Agentic Workflow](#agentic-grounding)

### 🗂️ Installing Schema_miner <a id='installing-schema-miner'></a>

You can install schema-miner from TestPyPI with **pip**:

```bash
pip install -i https://test.pypi.org/simple/ schema-miner
```

Or directly from GitHub (latest development version):

```bash
pip install git+https://github.com/sciknoworg/schema-miner.git
```

### 🛠️ Configuration <a id='configuration'></a>

Before running schema-miner, configure your environment.

In [7]:
import warnings
warnings.filterwarnings("ignore")

In [8]:
# Configure logging
import logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)

In [None]:
from schema_miner.config.envConfig import EnvConfig
EnvConfig.OPENAI_api_key = '<insert-your-openai-key>'
EnvConfig.OPENAI_organization_id = '<insert-your-openai-organization-id>'
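
Pasting keys into a notebook cell makes them easy to commit by accident. One alternative is to read them from environment variables, as in the minimal sketch below. The helper `read_openai_credentials` is hypothetical (not part of schema-miner), and the variable names `OPENAI_API_KEY` and `OPENAI_ORG_ID` are assumptions; use whatever names your environment defines.

```python
import os


def read_openai_credentials(env=os.environ):
    """Return (api_key, organization_id) read from environment variables.

    The variable names OPENAI_API_KEY and OPENAI_ORG_ID are assumptions of
    this sketch, not names required by schema-miner.
    """
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # The organization ID is treated as optional here.
    return api_key, env.get("OPENAI_ORG_ID")


# Usage in this notebook (requires schema-miner to be installed):
# from schema_miner.config.envConfig import EnvConfig
# EnvConfig.OPENAI_api_key, EnvConfig.OPENAI_organization_id = read_openai_credentials()
```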

### 🧪 Process Setup

In this notebook, we demonstrate schema grounding on the domain of **Atomic Layer Deposition (ALD)**.  
This process involves alternating exposures of a substrate to chemical precursors, enabling precise thin-film growth at the atomic scale.

> 💡 To adapt this notebook to a new domain, replace the process name and description here and update the schema and results paths in the grounding cells below.

In [9]:
# Set the name and description of the process whose schema will be grounded
from schema_miner.config.processConfig import ProcessConfig
ProcessConfig.Process_name = "Atomic Layer Deposition"
ProcessConfig.Process_description = "An ALD process involves a series of controlled chemical reactions used to deposit thin films on a surface at an atomic level"
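
The file paths used in the grounding cells of this notebook embed the process name with spaces replaced by hyphens (`Atomic-Layer-Deposition`). When switching to another domain, a small helper can derive those paths consistently. This is a minimal sketch: `grounding_paths` is a hypothetical helper, not part of schema-miner, and the directory layout is simply copied from the paths that appear in this notebook.

```python
from pathlib import Path


def grounding_paths(process_name, llm_model_name, results_root=Path("../results")):
    """Build the input schema path and the two output directories for a process."""
    # "Atomic Layer Deposition" -> "Atomic-Layer-Deposition"
    slug = process_name.strip().replace(" ", "-")
    schema_kind = "experimental-schema"
    return {
        "process_schema": results_root / "stage-3" / slug / schema_kind / f"{llm_model_name}.json",
        "prompt_results": results_root / "qudt-grounded" / slug / schema_kind,
        "agentic_results": results_root / "agentic-qudt-grounded" / slug / schema_kind,
    }


paths = grounding_paths("Atomic Layer Deposition", "gpt-4o")
print(paths["process_schema"])
```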

### 🌐 Ontology Grounding with QUDT Using LLM Prompting <a id='prompt-grounding'></a>

In this approach, we ask the LLM directly to suggest ontology matches for schema properties from its prior knowledge.
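
Because these matches come from the model's recall rather than from a lookup, it can be worth checking that every suggested identifier at least falls inside a QUDT namespace before relying on it. The sketch below walks a grounded schema and reports IRI-like strings outside those namespaces. It is an illustrative check, not part of schema-miner; the sample schema and its keys are made up for the example, and a namespace match does not prove that the term actually exists in QUDT.

```python
QUDT_PREFIXES = (
    "http://qudt.org/vocab/",
    "http://qudt.org/schema/qudt/",
    "https://qudt.org/vocab/",
    "https://qudt.org/schema/qudt/",
)
# Keys that hold IRIs but are not ontology matches (JSON Schema's own metadata).
IGNORED_KEYS = {"$schema", "$id"}


def find_non_qudt_iris(node, path=""):
    """Yield (json_path, iri) for IRI-like strings outside the QUDT namespaces."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in IGNORED_KEYS:
                continue
            yield from find_non_qudt_iris(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_non_qudt_iris(item, f"{path}/{index}")
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        if not node.startswith(QUDT_PREFIXES):
            yield path, node


# Made-up grounded schema: one plausible QUDT IRI and one foreign IRI.
sample = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "properties": {
        "temperature": {"unit": "http://qudt.org/vocab/unit/DEG_C"},
        "pressure": {"unit": "http://example.org/units/Pascal"},
    },
}
suspicious = list(find_non_qudt_iris(sample))
print(suspicious)
```

Anything reported here deserves a manual look before the grounded schema is shared.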

In [10]:
import json
from pathlib import Path
from schema_miner.ontology_grounding.prompt_qudt_grounding import prompt_based_qudt_grounding

In [None]:
# Large Language Model (LLM) used for ontology grounding
llm_model_name = 'gpt-4o'

# Ground the schema with the QUDT ontology
process_schema = Path('../results/stage-3/Atomic-Layer-Deposition/experimental-schema/gpt-4o.json')
results_file_path = Path('../results/qudt-grounded/Atomic-Layer-Deposition/experimental-schema')
schema = prompt_based_qudt_grounding(llm_model_name, process_schema, results_file_path, save_schema=True)

In [12]:
# Display the final Process Schema
logging.info(f'{ProcessConfig.Process_name} Schema:\n{json.dumps(schema, indent = 2)}')

2025-09-17 16:01:49,696 - root - INFO - Atomic Layer Deposition Schema:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Atomic Layer Deposition Process",
  "type": "object",
  "properties": {
    "reactantSelection": {
      "type": "object",
      "description": "Details about the precursor and co-reactant selection.",
      "properties": {
        "precursor": {
          "type": "string",
          "description": "The chemical compound used as the precursor."
        },
        "coReactant": {
          "type": "string",
          "description": "The chemical compound used as the co-reactant."
        },
        "deliveryMethod": {
          "type": "string",
          "enum": [
            "vapor drawn",
            "carrier gas assisted",
            "bubbling"
          ],
          "description": "Method of delivering the precursor to the chamber."
        }
      },
      "required": [
        "precursor",
        "coReactant"
      ]
    },
    "chemicalC

### 🌐 Ontology Grounding with QUDT Using Agentic Workflow <a id='agentic-grounding'></a>

Here we use an agent-based pipeline to ground schema elements. The agent retrieves ontology candidates, reasons over matches, and validates outputs.
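
The three stages named above (retrieve candidates, reason over matches, validate outputs) can be pictured with a toy, LLM-free pipeline. This is a conceptual sketch only, not schema-miner's implementation: the in-memory catalogue, the string-similarity ranking via `difflib`, the 0.6 threshold, and all function names are assumptions chosen for illustration. A real agent would query the QUDT vocabulary and let the LLM perform the reasoning step.

```python
from difflib import SequenceMatcher

# Tiny stand-in catalogue: human-readable label -> QUDT unit IRI.
CATALOGUE = {
    "degree Celsius": "http://qudt.org/vocab/unit/DEG_C",
    "second": "http://qudt.org/vocab/unit/SEC",
    "pascal": "http://qudt.org/vocab/unit/PA",
    "nanometre": "http://qudt.org/vocab/unit/NanoM",
}


def retrieve(query, top_k=3):
    """Retrieve: rank catalogue labels by string similarity to the query."""
    scored = [
        (SequenceMatcher(None, query.lower(), label.lower()).ratio(), label)
        for label in CATALOGUE
    ]
    return sorted(scored, reverse=True)[:top_k]


def choose(candidates, min_score=0.6):
    """Reason: accept the best candidate only if it is similar enough."""
    if not candidates:
        return None
    score, label = candidates[0]
    return label if score >= min_score else None


def validate(label):
    """Validate: the chosen label must resolve to a QUDT unit IRI."""
    iri = CATALOGUE.get(label)
    return iri if iri and iri.startswith("http://qudt.org/vocab/unit/") else None


def ground(query):
    """Run retrieve -> choose -> validate; return an IRI or None."""
    label = choose(retrieve(query))
    return validate(label) if label else None


print(ground("degrees celsius"))
print(ground("colour"))
```

Returning `None` for a weak match, rather than the nearest candidate, mirrors the point of the validation stage: an ungrounded property is easier to spot and fix than a wrongly grounded one.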

In [13]:
from schema_miner.ontology_grounding.agentic_qudt_grounding import agentic_qudt_grounding

In [None]:
# Large Language Model (LLM) used for ontology grounding
llm_model_name = 'gpt-4o'

# Ground the schema with the QUDT ontology
process_schema = Path('../results/stage-3/Atomic-Layer-Deposition/experimental-schema/gpt-4o.json')
results_file_path = Path('../results/agentic-qudt-grounded/Atomic-Layer-Deposition/experimental-schema')
schema = agentic_qudt_grounding(llm_model_name, process_schema, results_file_path, save_schema=True)

In [15]:
# Display the final Process Schema
logging.info(f'{ProcessConfig.Process_name} Schema:\n{json.dumps(schema, indent = 2)}')

2025-09-17 16:17:55,645 - root - INFO - Atomic Layer Deposition Schema:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Atomic Layer Deposition Process",
  "type": "object",
  "properties": {
    "reactantSelection": {
      "type": "object",
      "description": "Details about the precursor and co-reactant selection.",
      "properties": {
        "precursor": {
          "type": "string",
          "description": "The chemical compound used as the precursor."
        },
        "coReactant": {
          "type": "object",
          "description": "The chemical compound used as the co-reactant.",
          "properties": {
            "quantityValue": {
              "type": "object",
              "description": "The measured or observed value of the physical quantity, expressed as a numeric value along with its associated unit.",
              "properties": {
                "numericValue": {
                  "type": "number",
                  "description": "