<a href="https://colab.research.google.com/github/suhtoo/SE-Pipeline/blob/main/SE_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Semantic Enrichment Pipeline with OpenBIM
<a name="semantic-enrichment-pipeline-with-openbim"></a>

This notebook demonstrates a case-study implementation of the Semantic Enrichment Pipeline, using an OpenBIM (IFC) model and the ASB-ING dataset in JSON format.


## Table of Contents
<a name="table-of-contents"></a>
1. [IFC to RDF](#ifc-to-rdf)
2. [JSON to RDF](#json-to-rdf)
3. [Graph Merging](#graph-merging)
4. [Ontology Linking](#ontology-linking)
5. [Graph Completion](#graph-completion)
6. [Querying and Validation](#querying-and-validation)

   [Visualisation](#visualisation)

### Install Dependencies and Create Directories

In [None]:
!pip install ifcopenshell pandas rdflib owlrl pyvis requests

Collecting ifcopenshell
  Downloading ifcopenshell-0.8.1.post1-py311-none-manylinux_2_31_x86_64.whl.metadata (11 kB)
Collecting rdflib
  Downloading rdflib-7.1.3-py3-none-any.whl.metadata (11 kB)
Collecting owlrl
  Downloading owlrl-7.1.3-py3-none-any.whl.metadata (3.6 kB)
Collecting pyvis
  Downloading pyvis-0.3.2-py3-none-any.whl.metadata (1.7 kB)
Collecting isodate (from ifcopenshell)
  Downloading isodate-0.7.2-py3-none-any.whl.metadata (11 kB)
Collecting lark (from ifcopenshell)
  Downloading lark-1.2.2-py3-none-any.whl.metadata (1.8 kB)
Collecting jedi>=0.16 (from ipython>=5.3.0->pyvis)
  Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)
Downloading ifcopenshell-0.8.1.post1-py311-none-manylinux_2_31_x86_64.whl (40.8 MB)
Downloading rdflib-7.1.3-py3-none-any.whl (564 kB)

The functions in this notebook use two default directories.
Upload the data sources (IFC files, JSON files, and the Excel mapping table) to the "data" folder. Unless another path is specified, function outputs are saved to the "output" folder. Create both directories manually or run the script below.

In [None]:
import os

# Create data folder if it doesn't exist
data_folder = "data"
if not os.path.exists(data_folder):
    os.makedirs(data_folder)
    print(f"Created data folder: {data_folder}")
else:
    print(f"Data folder '{data_folder}' already exists.")

# Create output folder if it doesn't exist
output_folder = "output"
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
    print(f"Created output folder: {output_folder}")
else:
    print(f"Output folder '{output_folder}' already exists.")

Created data folder: data
Created output folder: output


# 1. IFC to RDF

## Extract Attributes from IFC


The extracted attribute table can later be used to create the alignment file between IFC model elements and other datasets.

In [None]:
import ifcopenshell
import pandas as pd
from pathlib import Path

def ifc_extractor(ifc_file_path: str):
    """
    Extracts all available attributes from IFC elements and saves them to a CSV file in the 'output' folder.

    Args:
        ifc_file_path (str): Path to the IFC file.
    """

    output_path = Path("output", "ifc_attributes.csv") # Default output path in 'output' folder

    # Load the IFC file
    ifc_file = ifcopenshell.open(ifc_file_path)

    # Get all elements
    elements = ifc_file.by_type('IfcObject')

    # Initialize list to store element data
    elements_data = []

    for element in elements:
        # Basic element data
        element_data = {
            'ElementType': element.is_a(),
            'PredefinedType': element.PredefinedType if hasattr(element, 'PredefinedType') else None,
            'GlobalId': element.GlobalId,
            'id': element.id(),
            'Name': getattr(element, 'Name', None),
            'Description': getattr(element, 'Description', None),
            'ObjectType': getattr(element, 'ObjectType', None),
        }

        # Get property sets
        if element.IsDefinedBy:
            for definition in element.IsDefinedBy:
                if definition.is_a('IfcRelDefinesByProperties'):
                    pset = definition.RelatingPropertyDefinition
                    if pset.is_a('IfcPropertySet'):
                        for prop in pset.HasProperties:
                            if hasattr(prop, 'NominalValue') and prop.NominalValue is not None:
                                element_data[f"{pset.Name}_{prop.Name}"] = prop.NominalValue.wrappedValue

        elements_data.append(element_data)

    # Convert to DataFrame and save to CSV
    df = pd.DataFrame(elements_data)
    df.to_csv(output_path, index=False)
    print(f"{output_path} created and saved with {len(df)} elements")

    return df

# Execution (assign to a new name so the function is not overwritten)
# ifc_df = ifc_extractor("/content/data/UKA_UK_Aachen_IFC_02.ifc")


## Create IFC Instance Graph

Convert IFC file to RDF and save as TTL file.

In [None]:
# helper functions
import ifcopenshell
from rdflib import Graph, Namespace, URIRef, Literal
from pathlib import Path
import urllib.parse

def create_namespaces():
    """Create and return commonly used namespaces."""
    return {
        'INST': Namespace("http://ifc-instance.org/instances/"),
        'RDF': Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
        'RDFS': Namespace("http://www.w3.org/2000/01/rdf-schema#"),
        'OWL': Namespace("http://www.w3.org/2002/07/owl#"),
        'BSDD': Namespace("https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/"),
        'PROP': Namespace("https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/"),
        'BROT': Namespace("https://w3id.org/brot#"),
        'BRCOMP': Namespace("https://w3id.org/brcomp#"),
        'ASB': Namespace("http://asb-example.org/")

    }

def create_and_bind_graph():
    """Create a new graph and bind namespaces."""
    g = Graph()
    namespaces = create_namespaces()

    # Bind all namespaces
    for prefix, ns in namespaces.items():
        g.bind(prefix.lower(), ns)

    return g, namespaces

def clean_uri_string(s: str) -> str:
    """Clean string to make it URI-safe."""
    if not isinstance(s, str):
        return ""
    cleaned = s.replace(" ", "_").replace("/", "_").replace(".", "_")
    return urllib.parse.quote(cleaned)
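
As a quick check of what `clean_uri_string` produces, the sketch below repeats the helper so that it runs on its own and applies it to a few made-up names. The input strings are illustrative assumptions, not values taken from the case-study model.

```python
import urllib.parse

def clean_uri_string(s: str) -> str:
    """Same logic as the helper above, repeated so this example is self-contained."""
    if not isinstance(s, str):
        return ""
    cleaned = s.replace(" ", "_").replace("/", "_").replace(".", "_")
    return urllib.parse.quote(cleaned)

# Illustrative inputs (assumed, not taken from the IFC model)
print(clean_uri_string("Pier 1/2.a"))      # spaces, slashes and dots become underscores
print(clean_uri_string("Br\u00fccke"))     # non-ASCII characters are percent-encoded
print(clean_uri_string(None))              # non-string input yields an empty string
```

Percent-encoding keeps the resulting URIs valid, at the cost of less readable names for labels that contain umlauts.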

In [None]:
def ifc_rdf_converter(ifc_file_path: str):
    """
    Convert IFC file to RDF and save as TTL file in the 'output' folder.

    Args:
        ifc_file_path (str): Path to the IFC file.
    """

    output_file = Path("output", "ifc_graph.ttl")  # Default output path in 'output' folder

    # Create graph and get namespaces
    g, ns = create_and_bind_graph()

    def get_element_uri(element_id: int) -> URIRef:
        """Create URI for an element."""
        return URIRef(ns['INST'][f"{element_id}"])

    def process_relationship(relationship):
        """Process IFC relationship and add triples to graph."""
        rel_type = relationship.is_a()

        # Get common relating element
        relating_element = None
        if hasattr(relationship, 'RelatingElement'):
            relating_element = relationship.RelatingElement
        elif hasattr(relationship, 'RelatingStructure'):
            relating_element = relationship.RelatingStructure
        elif hasattr(relationship, 'RelatingObject'):
            relating_element = relationship.RelatingObject


        # Get common related elements
        related_elements = []
        if hasattr(relationship, 'RelatedElements'):
            related_elements.extend(relationship.RelatedElements)
        elif hasattr(relationship, 'RelatedElement'):
            related_elements.append(relationship.RelatedElement)
        elif hasattr(relationship, 'RelatedObjects'):
            related_elements.extend(relationship.RelatedObjects)
        elif hasattr(relationship, 'RelatedObject'):
            related_elements.append(relationship.RelatedObject)


        # Add relationship triples
        if relating_element and hasattr(relating_element, 'id'):
            relating_uri = get_element_uri(relating_element.id())
            for related_element in related_elements:
                if hasattr(related_element, 'id'):
                    related_uri = get_element_uri(related_element.id())
                    g.add((relating_uri, ns['INST'][rel_type], related_uri))

    # Load IFC file
    ifc_file = ifcopenshell.open(ifc_file_path)

    # Process elements
    for element in ifc_file.by_type('IfcProduct'):
        element_uri = get_element_uri(element.id())

        # Add element type
        element_type = element.is_a()
        if hasattr(element, "PredefinedType") and element.PredefinedType not in ["NOTDEFINED", "USERDEFINED", None, '*']:
            element_type = f"{element.is_a()}{element.PredefinedType}"
        g.add((element_uri, ns['RDF'].type, URIRef(ns['BSDD'][clean_uri_string(element_type)])))

        # Add basic attributes
        if element.GlobalId:
            g.add((element_uri, ns['PROP'].GlobalId, Literal(element.GlobalId)))

        if hasattr(element, 'Name') and element.Name:
            g.add((element_uri, ns['RDFS'].label, Literal(element.Name)))

        if hasattr(element, 'Description') and element.Description:
            g.add((element_uri, ns['RDFS'].comment, Literal(element.Description)))

        if hasattr(element, 'ObjectType') and element.ObjectType:
            g.add((element_uri, ns['PROP'].ObjectType, Literal(element.ObjectType)))

    # Process relationships
    for relationship in ifc_file.by_type('IfcRelationship'):
        process_relationship(relationship)

    g.serialize(destination=str(output_file), format="turtle")
    print(f"{output_file} created and saved with {len(g)} triples")
    print(f"Number of relationships: {len(ifc_file.by_type('IfcRelationship'))}")

    return g

# Execution
# ifc_graph = ifc_rdf_converter("/content/data/UKA_UK_Aachen_IFC_02.ifc")
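
The converter types each element by concatenating its IFC class with its `PredefinedType`, unless that type is unset or generic. The following stand-alone sketch mirrors that step without needing an IFC file. `bsdd_class_uri` is a hypothetical helper introduced only for illustration, and the class and type names are assumed examples.

```python
import urllib.parse

# Values the converter treats as "no specific predefined type"
IGNORED_PREDEFINED_TYPES = ("NOTDEFINED", "USERDEFINED", None, "*")
BSDD_BASE = "https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/"

def bsdd_class_uri(ifc_class: str, predefined_type=None) -> str:
    """Hypothetical helper mirroring how the converter builds the rdf:type URI."""
    name = ifc_class
    if predefined_type not in IGNORED_PREDEFINED_TYPES:
        name = f"{ifc_class}{predefined_type}"
    cleaned = name.replace(" ", "_").replace("/", "_").replace(".", "_")
    return BSDD_BASE + urllib.parse.quote(cleaned)

print(bsdd_class_uri("IfcBeam", "T_BEAM"))      # class and predefined type are joined
print(bsdd_class_uri("IfcWall", "NOTDEFINED"))  # generic type: class name only
```

Whether a concatenated name resolves to an existing bSDD class depends on the dictionary itself, so the generated URIs are worth spot-checking.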

# 2. JSON to RDF

## Replace 15-Digit Keys with Descriptions

In [None]:
import json
from rdflib import Graph
import re
from collections import defaultdict
import requests

def key_value_mapper(input_json, output_filepath=None):
    """
    Replaces 15-digit keys in the JSON data with their corresponding descriptions
    from an ontology.

    Args:
        input_json (str): Path to the input JSON file.
        output_filepath (str, optional): Path to save the mapped JSON data.
    """

    # Load the ontology that contains key-value descriptions
    g = Graph()
    g.parse("https://annegoebels.github.io/asb/oldkeys/ontology.ttl", format='ttl')


    # Create lookup dictionary
    lookup_dict = {}
    pattern = re.compile(r"#(\d{15})_(.+)$")  # Match 15-digit number and class name
    for s in g.subjects():
        s_str = str(s)
        match = pattern.search(s_str)
        if match:
            number, class_name = match.groups()
            if number not in lookup_dict:
                lookup_dict[number] = class_name

    # Load JSON data
    if isinstance(input_json, str):
        with open(input_json, 'r') as file:
            data = json.load(file)
    else:
        data = input_json

    # Initialize counters
    total_processed = 0
    matches_found = 0

    # Process each file's data
    for file_key, records in data.items():
        # Process each record in the file
        for record in records:
            # Process each field in the record
            for field_key, field_value in record.items():
                if isinstance(field_value, str) and len(field_value) == 15 and field_value.isdigit():
                    total_processed += 1
                    class_name = lookup_dict.get(field_value)
                    if class_name:
                        matches_found += 1
                        record[field_key] = class_name  # Replacement key with value

    # Default to "<input>_mapped.json" when no output path is given
    if output_filepath is None and isinstance(input_json, str):
        output_filepath = input_json.replace('.json', '_mapped.json')

    # Save the mapped data (skipped when the input was a dict and no path was given)
    if output_filepath is not None:
        with open(output_filepath, 'w') as file:
            json.dump(data, file, indent=2)

    # Print statistics
    print(f"\nStatistics Key-Value Map:")
    print(f"Total 15-digit values processed: {total_processed}")
    print(f"Matches found: {matches_found}")
    print(f"Number of files processed: {len(data)}\n")

    return data
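
The lookup step can be tried in isolation. The sketch below applies the same regular expression to two hypothetical subject URIs and then replaces a 15-digit value in a sample record. The URIs, field names, and values are assumptions for illustration; the real keys come from the ASB ontology loaded above.

```python
import re

# Same pattern as in key_value_mapper: "#", a 15-digit key, "_", then the class name
pattern = re.compile(r"#(\d{15})_(.+)$")

# Hypothetical subject URIs (assumed for illustration)
subjects = [
    "https://example.org/asb#123456789012345_Stahlbeton",
    "https://example.org/asb#SomeClassWithoutKey",
]

lookup = {}
for uri in subjects:
    match = pattern.search(uri)
    if match:
        number, class_name = match.groups()
        lookup.setdefault(number, class_name)  # keep the first name seen per key

# Hypothetical record: only the 15-digit value has an entry in the lookup
record = {"BAUSTOFF": "123456789012345", "LAENGE": "12.5"}
mapped = {key: lookup.get(value, value) for key, value in record.items()}
print(mapped)
```

Values without a lookup entry are left unchanged, which matches the behaviour of `key_value_mapper` for unmatched keys.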

## Extract Subject-Predicate-Object from JSON

In [None]:
import json
import os
import pandas as pd
import requests

def json_spo_extractor(json_file_path, output_path=None):
    """
    Extracts Subject-Predicate-Object triples from JSON data and saves them to a CSV file.

    Args:
        json_file_path (str): Path to the input JSON file.
        output_path (str, optional): Path to save the extracted SPO triples.
    """

    # Read JSON file
    with open(json_file_path) as file:
        data = json.load(file)

    # Define the ASBID key used by Bauwerk, Teilbauwerk and Bruecke
    asbid_mapping = {
        "ges_bw.csv": "BWNR",
        "teil_bw.csv": "ID_NR",
        "bruecke.csv": "REF_BRUCKE"
        }

    # List to store all rows
    rows = []

    # Process each CSV file
    for category, records in data.items():
        # Get subject
        subject = category

        # Determine which ASBID key to use using asbid_mapping dictionary
        asbid_key = asbid_mapping.get(category, 'IDENT') # Default to IDENT if not in mapping

        # Process each dictionary in the CSV
        for record in records:
            if isinstance(record, dict):
                # Get ASBID using appropriate key
                asbid = record.get(asbid_key, '')

                # Process each key-value pair
                for key, value in record.items():
                    # Skip placeholder values ("NaN", "***", " ***")
                    if value != "NaN" and value != "***" and value != " ***":
                        rows.append({
                            'ASBID': asbid,
                            'CATEGORY': subject,
                            'PROPERTY': key,
                            'OBJECT': value,
                        })

    # Create DataFrame and save to CSV
    df_SPO = pd.DataFrame(rows)

    # Reorder columns to match specified order
    df_SPO = df_SPO[['ASBID', 'CATEGORY', 'PROPERTY', 'OBJECT']]
    print(f"Total rows in SPO: {len(df_SPO)} \n")

    # Save if output path is provided, else default to 'output'
    if output_path is None:
        output_path = os.path.join("output",'asb_SPO.csv')

    df_SPO.to_csv(output_path, index=False)

    return df_SPO

# Execute
# df_SPO = json_spo_extractor("/content/data/extracted_B115_mapped.json")
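
To make the flattening step concrete, here is a minimal sketch of the same loop on an in-memory dictionary, using plain tuples instead of a DataFrame. The table `lager.csv`, the field names, and all values are hypothetical; only the `asbid_mapping` entries and the placeholder strings are taken from the function above.

```python
# Hypothetical miniature of the JSON structure: table name -> list of records
data = {
    "ges_bw.csv": [{"BWNR": "5309 512", "BW_NAME": "Example bridge", "BAUJAHR": "NaN"}],
    "lager.csv": [{"IDENT": "L-1", "ART": " ***"}],
}
asbid_mapping = {"ges_bw.csv": "BWNR", "teil_bw.csv": "ID_NR", "bruecke.csv": "REF_BRUCKE"}
placeholders = ("NaN", "***", " ***")

rows = []
for category, records in data.items():
    id_key = asbid_mapping.get(category, "IDENT")  # fall back to IDENT
    for record in records:
        asbid = record.get(id_key, "")
        for key, value in record.items():
            if value not in placeholders:  # drop placeholder values
                rows.append((asbid, category, key, value))

for row in rows:
    print(row)
```

Each remaining key-value pair becomes one (ASBID, CATEGORY, PROPERTY, OBJECT) row, so the identifier field itself also appears as a property of the record.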

## Map to a more descriptive subject name and map datatype for object value

In [None]:
import pandas as pd

def datatype_mapper(map_excel_path, df):
    """
    Maps datatypes and properties to more descriptive names using a mapping Excel file.

    Args:
        map_excel_path (str): Path to the mapping Excel file.
        df (pd.DataFrame): DataFrame containing the SPO triples.
    """

    # Read excel sheet
    map_csv = pd.read_excel(map_excel_path, sheet_name='MAPPING')

    # Create mapping dictionaries
    class_mapping = dict(zip(map_csv["Table"], map_csv["Table_Name"]))
    datatype_mapping = dict(zip(map_csv["Attribute"], map_csv["Datatype"]))
    # property_mapping = dict(zip(map_csv["Attribute"], map_csv["Attribute_FullText"]))

    # Replace CATEGORY values using class_mapping
    df['CATEGORY'] = df['CATEGORY'].replace(class_mapping)

    # Add a new column for datatype based on the PROPERTY column
    df['DATATYPE'] = df['PROPERTY'].map(datatype_mapping) # Map datatype based on 'PROPERTY' column

    # Replace PROPERTY values using property_mapping
    # df['PROPERTY'] = df['PROPERTY'].replace(property_mapping)

    #print(len(df["CATEGORY"].unique()))

    # Save the DataFrame to CSV, replacing existing file
    output_path = os.path.join("output", "asb_SPO.csv")  # Path to asb_SPO.csv in the output folder
    df.to_csv(output_path, index=False)  # Overwrite the existing file if it exists

    return df

# Execute
# datatype_mapper("/content/data/mapping.xlsx", df_SPO)

## Main SPO Extractor Function

In [None]:
def main_spo_extractor(input_json_file: str, mapping_table: str, output_spo_file=None):
    """
    Main function to extract and process SPO triples from a JSON file,
    apply key-value mapping and datatype mapping, and save to a CSV.

    Args:
        input_json_file (str): Path to the input JSON file.
        mapping_table (str): Path to the mapping table (Excel file).
        output_spo_file (str, optional): Path to save the extracted SPO triples.
    """

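    # 1. Key-Value Mapping (writes "<input>_mapped.json" next to the input file):
    mapped_json_file = input_json_file.replace('.json', '_mapped.json')
    key_value_mapper(input_json_file, mapped_json_file)

    # 2. JSON to SPO Extraction:
    # Read the mapped JSON so the replaced keys are carried into the triples
    asb_SPO = json_spo_extractor(mapped_json_file, output_path=output_spo_file)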

    # 3. Datatype Mapping:
    asb_SPO = datatype_mapper(mapping_table, asb_SPO)

    # 4. Save to CSV:
    if output_spo_file is None:
        output_path = os.path.join("output", "asb_SPO.csv")  # Default output path
    else:
        output_path = output_spo_file  # Use provided output path

    asb_SPO.to_csv(output_path, index=False)

    return asb_SPO

# Example Usage
# json_input = "/content/data/extracted_B115.json"
# mapping_table = "/content/data/mapping.xlsx"
# asb_df = main_spo_extractor(json_input, mapping_table)

## Generate RDF

In [None]:
# helper function to clean the strings
import pandas as pd
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS, XSD
import re
import unicodedata

def clean_string(text):
    if pd.isna(text):
        return text

    # Convert to string if not already
    text = str(text)

    # Replace German special characters
    replacements = {
        'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss',
        'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue'
    }
    for char, replacement in replacements.items():
        text = text.replace(char, replacement)

    # Remove accents
    text = ''.join(c for c in unicodedata.normalize('NFKD', text)
                  if not unicodedata.combining(c))

    # Replace spaces and special chars with underscore
    text = re.sub(r'[^a-zA-Z0-9]+', '_', text)

    # Remove leading/trailing underscores
    text = text.strip('_')

    return text


def get_xsd_datatype(datatype: str):
    """Map datatype string to XSD datatype."""
    datatype_mapping = {
        'double': XSD.double,
        'dateTime': XSD.dateTime,
        'year': XSD.gYear,  # XSD has no "year" datatype; gYear is the standard one
        'text': XSD.string,
        'int': XSD.integer,
        'boolean': XSD.boolean,
        'long': XSD.integer,
        'float': XSD.float,
        # Add more mappings as needed
    }
    return datatype_mapping.get(datatype, None)
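
The effect of the string cleaning is easiest to see on a few examples. The sketch below is a stand-alone variant of `clean_string` for plain strings (it leaves out the pandas NaN check), and the sample inputs are assumptions chosen for illustration.

```python
import re
import unicodedata

GERMAN_REPLACEMENTS = {
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue", "\u00df": "ss",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
}

def clean_string_sketch(text: str) -> str:
    """Stand-alone variant of clean_string for plain strings (no NaN handling)."""
    for char, replacement in GERMAN_REPLACEMENTS.items():
        text = text.replace(char, replacement)
    # Strip remaining accents by decomposing and dropping combining marks
    text = "".join(c for c in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(c))
    # Collapse every run of non-alphanumeric characters into one underscore
    return re.sub(r"[^a-zA-Z0-9]+", "_", text).strip("_")

print(clean_string_sketch("Gr\u00fcndung"))
print(clean_string_sketch("Fahrbahn\u00fcbergang (Typ 1)"))
print(clean_string_sketch("Caf\u00e9"))
```

Because the German replacements run before accent stripping, umlauts become two-letter transliterations rather than bare vowels.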


In [None]:
def json_rdf_converter(df, valid_categories):
    """
    Converts a DataFrame of SPO triples to an RDF graph and saves it as a Turtle file.

    Args:
        df (pd.DataFrame): DataFrame containing the SPO triples.
        valid_categories (list): List of valid categories for filtering the data.
    """

    # Define Namespace
    ASB = Namespace("http://asb-example.org/")

    # Create graph and bind namespaces
    g = Graph()
    g.bind("asb", ASB)

    # Clean the relevant columns
    df['CATEGORY'] = df['CATEGORY'].apply(clean_string)
    df['PROPERTY'] = df['PROPERTY'].apply(clean_string)

    for _, row in df.iterrows():
        # Get the subject from the row and clean it
        category = row['CATEGORY']

        # Check if the class is in the list of valid_categories
        if category not in valid_categories:
            continue

        # Create RDF triples
        subject = URIRef(ASB[f"{clean_string(row['ASBID'])}"])
        predicate = URIRef(ASB[row['PROPERTY']])

        # Use the value in 'OBJECT' column directly
        if pd.notna(row['OBJECT']):
            datatype = row['DATATYPE'] if isinstance(row['DATATYPE'], str) else None

            if datatype:  # If datatype is present, create a Literal with datatype
                xsd_datatype = get_xsd_datatype(datatype) if datatype else None
                obj = Literal(row['OBJECT'], datatype=xsd_datatype) if xsd_datatype else Literal(row['OBJECT'])
            else:  # If datatype is empty, create a resource with 'asb:' prefix
                obj = URIRef(ASB[clean_string(row['OBJECT'])])
        else:
            continue

        # Type the subject by its category, then add the property triple
        g.add((subject, RDF.type, URIRef(ASB[f"ASB_Pset_{row['CATEGORY']}"])))
        if obj:  # objects that evaluate as falsy are skipped
            g.add((subject, predicate, obj))

    # Serialize the graph to the default output file
    output_file = os.path.join("output", "asb_graph.ttl")  # Default output path
    g.serialize(format='turtle', destination=output_file)
    print(f"{output_file} created and saved with {len(g)} triples")

    return g

'''
# Execution
valid_categories = ['Bauwerk', 'Teilbauwerk', 'Bruecke',
                   'BelagAbdichtung', 'Ausstattungen', 'fhrb_bel.csv', 'Gruendung', 'Kappe', 'Lager',
                   'Leitung', 'Schutzeinrichtungen', 'StatischesSystem_Tragfaehigkeit', 'Vorspannung',
                   'Fahrbahnuebergang', 'Feld']

graph = json_rdf_converter(df_SPO, valid_categories)


# Print numbers of triples
print(f"Number of triples: {len(graph)}")

# Print numbers of instances
instances = set()
for s, p, o in graph:
    instances.add(s)
print(f"Number of instances: {len(instances)}") '''


# 3. Graph Merging

In [None]:
def graph_mapper(instance_graph_data, mapping_file):
    """
    Links IFC instances to ASB properties using SPARQL queries.

    Args:
        instance_graph_data (str or rdflib.Graph): Path to the instance graph
                                                 file in Turtle format or an
                                                 rdflib.Graph object.
        mapping_file (str): Path to the mapping Excel file.
    """

    # Create graph and get namespaces
    g, ns = create_and_bind_graph()

    # Load instance graph
    if isinstance(instance_graph_data, str):
        # Input is a file path, parse the TTL file
        g.parse(instance_graph_data, format='turtle')
    elif isinstance(instance_graph_data, Graph):
        # Input is an rdflib.Graph object, use it directly
        g = instance_graph_data
    else:
        raise TypeError("Invalid input type. Expected str (file path) or rdflib.Graph object.")

    # Load mapping file
    mappings = pd.read_excel(mapping_file, sheet_name="instance-alignment")

    # Process mappings
    for _, row in mappings.iterrows():
        # Skip if either field is empty or NaN
        if pd.isna(row.ifc_instance_id) or pd.isna(row.asb_instance_id):
            continue

        # Split and clean IFC instances
        ifc_instances = [inst.strip() for inst in str(row.ifc_instance_id).split(',') if inst.strip()]

        # Split and clean ASB instances
        asb_instances = [inst.strip() for inst in str(row.asb_instance_id).split(',') if inst.strip()]

        # Create mappings between all combinations of IFC and ASB instances
        for ifc_instance in ifc_instances:
            # Skip empty IFC instances
            if not ifc_instance:
                continue

            # Create SPARQL query for each IFC instance
            for asb_instance in asb_instances:
                # Skip empty ASB instances
                if not asb_instance:
                    continue

                # Create and execute SPARQL query for each combination
                query = f"""
                CONSTRUCT {{
                    ?inst <{ns['INST']}hasAsbPset> <{ns['ASB']}{asb_instance}>
                }}
                WHERE {{
                    ?inst a ?type .
                    FILTER(STRENDS(STR(?inst), "{ifc_instance}"))
                }}
                """

                # Execute query and add results to graph
                results = g.query(query)
                for triple in results:
                    g.add(triple)

    # Add inst:hasAsbPset a owl:ObjectProperty
    g.add((ns['INST']['hasAsbPset'], RDF.type, ns['OWL'].ObjectProperty))

    # Serialize the graph to the default output file
    # output_file = os.path.join("output", "mapped_graph.ttl")  # Default output path
    # g.serialize(format='turtle', destination=output_file)
    print(f"Mapped graph created with {len(g)} triples")

    return g

# Usage
# mapped_graph = graph_mapper("/content/output/ifc_graph.ttl", "/content/data/mapping.xlsx")
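
Two details of `graph_mapper` are worth illustrating without a graph: how a mapping row with comma-separated ids expands into pairs, and how a suffix comparison such as `STRENDS` behaves. The sketch below uses only the standard library. The ids are hypothetical, and `split_ids` is a helper name introduced here for illustration.

```python
from itertools import product

def split_ids(cell) -> list:
    """Split a comma-separated mapping cell into trimmed, non-empty ids."""
    return [part.strip() for part in str(cell).split(",") if part.strip()]

# Hypothetical mapping row: two IFC ids linked to two ASB ids
ifc_ids = split_ids("12, 340")
asb_ids = split_ids("A1,A2 ,")
pairs = list(product(ifc_ids, asb_ids))  # every IFC id is linked to every ASB id
print(pairs)

# A suffix test also matches longer ids that end with the same digits
instance_uris = [
    "http://ifc-instance.org/instances/12",
    "http://ifc-instance.org/instances/112",
]
matches = [uri for uri in instance_uris if uri.endswith("12")]
print(matches)
```

If the ids in the mapping sheet are not unique as suffixes, comparing against the full instance URI may be the safer choice.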

In [None]:
from rdflib import Graph

def graphs_merger(input_graphs: list, output_file: str = None):
    """
    Merges multiple RDF graphs into a single graph.

    Args:
        input_graphs (list): A list of file paths (str) or Graph objects.
        output_file (str, optional): Path to save the merged graph. Defaults to None.
    """

    # Create a new graph for the merged content
    merged_graph = Graph()

    for input_graph in input_graphs:
        if isinstance(input_graph, str):
            # Input is a file path
            try:
                graph = Graph()  # Create a temporary graph to load the file
                graph.parse(input_graph, format="turtle")
                merged_graph += graph
                print(f"Loaded {len(graph)} triples from {input_graph}")
            except Exception as e:
                print(f"Error loading {input_graph}: {e}")
        elif isinstance(input_graph, Graph):
            # Input is a Graph object
            merged_graph += input_graph
            print(f"Loaded {len(input_graph)} triples.")
        else:
            print(f"Skipping invalid input: {input_graph}")

    # Copy namespace bindings from all input graphs
    bound_prefixes = {prefix for prefix, _ in merged_graph.namespaces()}
    for input_graph in input_graphs:
        if isinstance(input_graph, Graph):
            source = input_graph
        elif isinstance(input_graph, str):
            source = Graph()
            try:
                source.parse(input_graph, format="turtle")
            except Exception as e:
                print(f"Error loading namespaces from {input_graph}: {e}")
                continue
        else:
            continue
        for prefix, namespace in source.namespaces():
            if prefix not in bound_prefixes:
                merged_graph.bind(prefix, namespace)
                bound_prefixes.add(prefix)

    # Serialize to the given path, or to the default output file
    if output_file is None:
        output_file = os.path.join("output", "merged_graph.ttl")
    merged_graph.serialize(format='turtle', destination=output_file)
    print(f"{output_file} created and saved with {len(merged_graph)} triples.")

    return merged_graph

# 4. Ontology Linking

In [None]:
def onto_mapper(instance_graph_data, mapping_csv):
    """
    Links an IFC instance graph with external ontologies using mapping definitions.

    Args:
        instance_graph_data (str or rdflib.Graph): Path to the instance graph file
                                                 in Turtle format or an rdflib.Graph object
        mapping_csv (str): Path to the mapping Excel file

    Returns:
        rdflib.Graph: The enriched graph with ontology linkages
    """
    # Create graph and get namespaces
    g, ns = create_and_bind_graph()

    # Load instance data
    if isinstance(instance_graph_data, str):
        # Input is a file path, parse the Turtle file
        g.parse(instance_graph_data, format='turtle')
    elif isinstance(instance_graph_data, Graph):
        # Input is an rdflib.Graph object, use it directly
        g = instance_graph_data
    else:
        raise TypeError("Invalid input type. Expected str (file path) or rdflib.Graph object.")

    # Load mapping file
    mappings = pd.read_excel(mapping_csv, sheet_name="instance-alignment")

    # Process each mapping row
    for _, row in mappings.iterrows():
        # We only proceed if IFC class, ontology prefix, and ontology class are present
        if pd.notna(row.ifc_class) and pd.notna(row.ontology_prefix) and pd.notna(row.ontology_class):
            # Build SPARQL update to map the IFC class to the ontology class
            query = f"""
            INSERT {{
                bsdd:{row.ifc_class} owl:equivalentClass {row.ontology_prefix}:{row.ontology_class} .
            }}
            WHERE {{}}
            """
            g.update(query)
        else:
            print(f"Skipping row due to missing values: {row}")

    print(f"\nMapped Onto graph created with {len(g)} triples")
    return g


# Example usage
# linked_graph = onto_mapper("/content/output/merged_graph.ttl", "/content/data/mapping.xlsx")
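
To see what `onto_mapper` sends to the graph for one mapping row, the update text can be built and printed on its own. `build_equivalence_update` is a hypothetical helper for illustration, and the row values (`IfcBridge`, `brot`, `Bridge`) are assumed examples rather than entries from the actual mapping sheet.

```python
def build_equivalence_update(ifc_class: str, ontology_prefix: str, ontology_class: str) -> str:
    """Hypothetical helper: returns the update text built for one mapping row."""
    return (
        "INSERT {\n"
        f"    bsdd:{ifc_class} owl:equivalentClass {ontology_prefix}:{ontology_class} .\n"
        "}\n"
        "WHERE {}"
    )

# Assumed example row: ifc_class, ontology_prefix, ontology_class
update_text = build_equivalence_update("IfcBridge", "brot", "Bridge")
print(update_text)
```

The update relies on the `bsdd`, `owl`, and ontology prefixes being resolvable when it runs, so the graph it is applied to needs those prefixes bound.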

In [None]:
import requests

def onto_linker(graph_data, ontology_urls=None):
    """
    Links an RDF graph with external ontologies specified in `ontology_urls`.

    Args:
        graph_data (str or rdflib.Graph): Path to the RDF graph file (Turtle format)
                                        or an rdflib.Graph object.
        ontology_urls (list, optional): List of URLs to external ontologies. Defaults to None.

    Returns:
        rdflib.Graph: The linked RDF graph.
    """

    # Create graph and get namespaces
    g, ns = create_and_bind_graph()

    # Load the main graph
    if isinstance(graph_data, str):
        # Input is a file path, parse the Turtle file
        g.parse(graph_data, format="turtle")
    elif isinstance(graph_data, Graph):
        # Input is an rdflib.Graph object, use it directly
        g = graph_data
    else:
        raise TypeError("Invalid input type for graph_data. Expected str (file path) or rdflib.Graph object.")


    # Stop if ontology_urls is None
    if ontology_urls is None:
        print("No ontology URLs provided. Stopping execution.")
        return g  # Return the graph as is without linking

    # Load each ontology
    for url in ontology_urls:
        try:
            response = requests.get(url)
            response.raise_for_status()  # Raises an HTTPError for bad responses
            g.parse(data=response.text, format="turtle")
        except requests.exceptions.RequestException as e:
            print(f"Failed to load ontology from {url}. Error: {str(e)}")

    # Serialize the graph to the default output file
    output_file = os.path.join("output", "linked_graph.ttl")  # Default output path
    g.serialize(format='turtle', destination=output_file)
    print(f"{output_file} created and saved with {len(g)} triples")

    return g

'''
# Example usage
custom_urls = ["https://example.com/ontology1.ttl",
               "https://example.com/ontology2.ttl"
               ]
g = onto_linker("ifc_linked.ttl", custom_urls)'''


# 5. Graph Completion

In [None]:
from rdflib import Graph
from owlrl import DeductiveClosure, OWLRL_Semantics

def basic_reasoner(graph_path, output_path):
    """
    Performs OWL 2 RL reasoning on an RDF graph using owlrl.

    Args:
        graph_path (str): Path to the input RDF graph file (Turtle format).
        output_path (str): Path to save the inferred graph (Turtle format).
    """
    # Load the graph
    g = Graph()
    g.parse(graph_path, format="turtle")

    # Create a deductive closure object
    dc = DeductiveClosure(OWLRL_Semantics)

    # Expand the graph with inferences
    dc.expand(g)

    # Save the inferred graph
    g.serialize(destination=output_path, format="turtle")

    print(f"Inferred graph saved to: {output_path}")
    print(f"Total triples after inference: {len(g)}")


# Example usage
# graph_path = "/content/output/linked_graph.ttl"  # Path to your input graph
# output_path = "/content/output/inferred_graph.ttl"  # Path to save the inferred graph
# basic_reasoner(graph_path, output_path)
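
To give a feel for what `dc.expand(g)` adds, the sketch below applies a single OWL 2 RL rule (`cax-sco`: if `x` has type `C1` and `C1` is a subclass of `C2`, then `x` also has type `C2`) to plain Python tuples until no new triples appear. It is a simplified stand-in, not the `owlrl` implementation, and the `ex:` names and the helper `expand_types` are hypothetical.

```python
# Toy forward chaining over (subject, predicate, object) tuples.
# Only one OWL 2 RL rule (cax-sco) is implemented; owlrl applies many more.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def expand_types(triples):
    """Return a new set with rdf:type triples inferred via rdfs:subClassOf."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        typed = [(s, o) for s, p, o in closure if p == TYPE]
        for inst, cls in typed:
            for sub, sup in subclass_pairs:
                inferred = (inst, TYPE, sup)
                if cls == sub and inferred not in closure:
                    closure.add(inferred)
                    changed = True
    return closure

# Hypothetical example data (names are illustrative only)
asserted = {
    ("ex:pier_1", TYPE, "ex:Pier"),
    ("ex:Pier", SUBCLASS, "ex:SubStructureComponent"),
    ("ex:SubStructureComponent", SUBCLASS, "ex:Component"),
}
inferred = expand_types(asserted)
new_triples = sorted(inferred - asserted)
print(new_triples)
```

Before expansion, a query for instances of `ex:SubStructureComponent` finds nothing, because only the `ex:Pier` type is asserted. After expansion the instance carries both superclass types, which is the effect the reasoning step relies on.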

# Visualisation

In [48]:
from rdflib import Graph, URIRef, BNode, Literal
from pyvis.network import Network
import colorsys
import hashlib
import json  # Needed for json.dumps() when applying the vis.js options

def visualize_rdf(input_data, output_html="graph.html", custom_options=None):
    """
    RDF graph visualization with interactive features.

    Parameters:
    input_data (str or Graph): Path to TTL file or rdflib.Graph object
    output_html (str): Path to save the generated HTML file
    custom_options (dict): Optional custom vis.js options to override defaults
    """
    # Initialize RDF Graph
    g = Graph()

    # Check input type and load data accordingly
    if isinstance(input_data, str):
        g.parse(input_data, format='turtle')
    elif isinstance(input_data, Graph):
        g = input_data
    else:
        raise TypeError("Invalid input type. Expected str (file path) or rdflib.Graph object.")

    # Generate consistent, visually pleasing colors for namespaces
    def generate_color(prefix, saturation=0.7, value=0.95):
        """Generate a consistent color for a prefix"""
        hash_value = int(hashlib.md5(prefix.encode('utf-8')).hexdigest(), 16)
        hue = hash_value % 360 / 360.0
        rgb = colorsys.hsv_to_rgb(hue, saturation, value)
        return '#{:02x}{:02x}{:02x}'.format(int(rgb[0]*255), int(rgb[1]*255), int(rgb[2]*255))

    # Extract namespace prefixes and assign colors
    namespace_colors = {}
    prefix_color_mapping = {
        'inst': '#1E3A8A',    # Deep blue
        'asb': '#BE123C',     # Rich red
        'bsdd': '#065F46',    # Forest green
        'prop': '#047857',    # Emerald green
        'rdf': '#7E22CE',     # Purple
        'rdfs': '#6D28D9',    # Indigo
        'owl': '#8B5CF6',     # Violet
        'xsd': '#0284C7',     # Sky blue
    }

    # Assign colors based on prefix or generate new ones
    for prefix, namespace in g.namespaces():
        if prefix in prefix_color_mapping:
            namespace_colors[str(namespace)] = prefix_color_mapping[prefix]
        else:
            namespace_colors[str(namespace)] = generate_color(prefix)

    # Enhanced node and edge styling
    node_styles = {
        "URIRef": {
            "shape": "dot",
            "size": 20,
            "font": {"size": 16, "face": "Helvetica", "strokeWidth": 0, "color": "#333333"},
            "shadow": {"enabled": True, "size": 5, "x": 3, "y": 3, "color": "rgba(0,0,0,0.2)"}
        },
        "BNode": {
            "shape": "hexagon",
            "size": 15,
            "color": "#9CA3AF",
            "font": {"size": 14, "face": "Helvetica", "color": "#4B5563"}
        },
        "Literal": {
            "shape": "box",
            "size": 10,
            "color": "#D1D5DB",
            "font": {"size": 14, "face": "Helvetica", "color": "#1F2937"}
        }
    }

    # Create collections for nodes and edges
    nodes = {}
    edges = []

    # Process triples to extract nodes and edges
    for s, p, o in g:
        # Handle subject node
        s_id = str(s)
        s_namespace = next((ns for ns in namespace_colors if s_id.startswith(ns)), None)

        if isinstance(s, URIRef):
            # For URIs, use the last segment as the label
            s_label = s_id.split('/')[-1].split('#')[-1]
            s_type = "URIRef"
            s_color = namespace_colors.get(s_namespace, generate_color(s_id))
        elif isinstance(s, BNode):
            s_label = f"_:{s_id[-4:]}"  # Shortened blank node label
            s_type = "BNode"
            s_color = node_styles["BNode"]["color"]
        else:
            s_label = str(s)[:30] + ("..." if len(str(s)) > 30 else "")
            s_type = "Literal"
            s_color = node_styles["Literal"]["color"]

        nodes[s_id] = {
            "label": s_label,
            "title": s_id,  # Full URI on hover
            "type": s_type,
            "color": s_color,
            **node_styles[s_type]
        }

        # Handle object node with similar logic
        o_id = str(o)
        o_namespace = next((ns for ns in namespace_colors if o_id.startswith(ns)), None)

        if isinstance(o, URIRef):
            o_label = o_id.split('/')[-1].split('#')[-1]
            o_type = "URIRef"
            o_color = namespace_colors.get(o_namespace, generate_color(o_id))
        elif isinstance(o, BNode):
            o_label = f"_:{o_id[-4:]}"
            o_type = "BNode"
            o_color = node_styles["BNode"]["color"]
        elif isinstance(o, Literal):
            # Truncate long literals
            o_value = str(o)
            o_label = (o_value[:30] + "...") if len(o_value) > 30 else o_value
            o_type = "Literal"
            o_color = node_styles["Literal"]["color"]
        else:
            o_label = str(o)[:30] + ("..." if len(str(o)) > 30 else "")
            o_type = "Literal"
            o_color = node_styles["Literal"]["color"]

        nodes[o_id] = {
            "label": o_label,
            "title": o_id,  # Full value on hover
            "type": o_type,
            "color": o_color,
            **node_styles[o_type]
        }

        # Handle predicate (relationship)
        p_id = str(p)
        p_label = p_id.split('/')[-1].split('#')[-1]

        # Add edge
        edges.append({
            "from": s_id,
            "to": o_id,
            "label": p_label,
            "title": p_id,  # Full predicate URI on hover
            "font": {"size": 12, "align": "middle", "background": "white"},
            "color": {"color": "#64748B", "opacity": 0.8},
            "smooth": {"type": "curvedCW", "roundness": 0.2},
            "arrows": {"to": {"enabled": True, "scaleFactor": 0.5}}
        })

    # Create network with better defaults
    net = Network(
        height="850px",
        width="100%",
        bgcolor="#FFFFFF",  # White background
        font_color="#1F2937",
        directed=True,
        notebook=False,
        select_menu=False,
        filter_menu=False,
        neighborhood_highlight=True,  # Highlight connected nodes on click
        cdn_resources="remote"  # Use CDN for better loading
    )

    # Default heading
    net.heading = f"RDF Graph Visualization ({len(nodes)} nodes, {len(edges)} relationships)"

    # Add nodes to network
    for node_id, node_data in nodes.items():
        style_data = {k: v for k, v in node_data.items() if k not in ["label", "title", "type"]}
        net.add_node(
            node_id,
            label=node_data["label"],
            title=node_data["title"],
            **style_data
        )

    # Add edges to network
    for edge_data in edges:
        net.add_edge(
            edge_data["from"],
            edge_data["to"],
            title=edge_data["title"],
            label=edge_data["label"],
            **{k: v for k, v in edge_data.items() if k not in ["from", "to", "label", "title"]}
        )

    # Default enhanced physics and interaction options
    default_options = {
        "physics": {
            "enabled": True,
            "solver": "forceAtlas2Based",
            "forceAtlas2Based": {
                "gravitationalConstant": -75,
                "centralGravity": 0.01,
                "springLength": 150,
                "springConstant": 0.05,
                "damping": 0.09
            },
            "minVelocity": 0.75,
            "stabilization": {
                "enabled": True,
                "iterations": 100,
                "updateInterval": 10
            }
        },
        "layout": {
            "randomSeed": 42,
            "improvedLayout": True
        },
        "interaction": {
            "hover": True,
            "hoverConnectedEdges": True,
            "selectConnectedEdges": True,
            "multiselect": True,
            "dragNodes": True,
            "hideEdgesOnDrag": False,
            "hideNodesOnDrag": False,
            "navigationButtons": True,
            "keyboard": {
                "enabled": True,
                "speed": {"x": 10, "y": 10, "zoom": 0.1}
            },
            "zoomView": True
        },
        "edges": {
            "smooth": {"type": "dynamic"},
            "length": 250,
            "color": {"inherit": "both"},
            "selectionWidth": 3
        },
        "groups": {
            # Add namespace-based groups for legend
            **{prefix: {"color": color} for prefix, color in prefix_color_mapping.items()}
        }
    }

    # Merge with any custom options provided
    if custom_options:
        # Helper function to recursively merge dictionaries
        def deep_merge(d1, d2):
            for k in d2:
                if k in d1 and isinstance(d1[k], dict) and isinstance(d2[k], dict):
                    deep_merge(d1[k], d2[k])
                else:
                    d1[k] = d2[k]
            return d1

        options = deep_merge(default_options, custom_options)
    else:
        options = default_options

    # Apply options to network
    net.set_options(json.dumps(options))

    # Add legend with HTML
    legend_html = """
    <div style="position: absolute; top: 10px; right: 10px; background: rgba(255, 255, 255, 0.8);
                padding: 10px; border-radius: 5px; border: 1px solid #ddd; z-index: 100; max-width: 250px;">
        <h3 style="margin-top: 0; font-family: Helvetica;">Legend</h3>
        <div style="display: flex; flex-direction: column; gap: 5px;">
    """

    # Add legend items for node types
    legend_html += f"""
        <div style="display: flex; align-items: center; gap: 5px;">
            <div style="width: 15px; height: 15px; border-radius: 50%; background: {node_styles['URIRef']['color'] if 'color' in node_styles['URIRef'] else '#1E3A8A'};"></div>
            <span style="font-family: Helvetica; font-size: 12px;">URI Reference</span>
        </div>
        <div style="display: flex; align-items: center; gap: 5px;">
            <div style="width: 15px; height: 15px; background: {node_styles['BNode']['color']}; clip-path: polygon(50% 0%, 100% 25%, 100% 75%, 50% 100%, 0% 75%, 0% 25%);"></div>
            <span style="font-family: Helvetica; font-size: 12px;">Blank Node</span>
        </div>
        <div style="display: flex; align-items: center; gap: 5px;">
            <div style="width: 15px; height: 15px; background: {node_styles['Literal']['color']}; border-radius: 2px;"></div>
            <span style="font-family: Helvetica; font-size: 12px;">Literal</span>
        </div>
    """

    # Add namespace prefixes to legend
    legend_html += "<h4 style='margin-bottom: 5px; font-family: Helvetica; font-size: 14px;'>Namespaces</h4>"
    for prefix, color in prefix_color_mapping.items():
        legend_html += f"""
            <div style="display: flex; align-items: center; gap: 5px;">
                <div style="width: 15px; height: 15px; border-radius: 50%; background: {color};"></div>
                <span style="font-family: Helvetica; font-size: 12px;">{prefix}</span>
            </div>
        """

    legend_html += """
        </div>
    </div>
    """

    # Add controls help tooltip
    controls_html = """
    <div style="position: absolute; bottom: 10px; left: 10px; background: rgba(255, 255, 255, 0.8);
                padding: 10px; border-radius: 5px; border: 1px solid #ddd; z-index: 100; font-family: Helvetica;">
        <h3 style="margin-top: 0; font-size: 14px;">Controls</h3>
        <ul style="margin: 0; padding-left: 20px; font-size: 12px;">
            <li>Click node to highlight connections</li>
            <li>Drag to move nodes</li>
            <li>Scroll to zoom</li>
            <li>Hold Ctrl to select multiple nodes</li>
        </ul>
    </div>
    """

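    # Generate the HTML first, then inject the legend and controls.
    # save_graph() regenerates net.html, so changes made to it beforehand are lost.
    html = net.generate_html(notebook=False)
    net.html = html.replace("</body>", f"{legend_html}{controls_html}</body>")

    # Save the graph
    with open(output_html, "w", encoding="utf-8") as f:
        f.write(net.html)
    print(f"Visualization saved as {output_html}")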

    return net

'''
# Example of how to use with custom options:
custom_options = {
    "physics": {
        "forceAtlas2Based": {"improvedLayout": True}
        },
    "layout": {"circular": {"scale": 300}}
     }
visualize_rdf("/content/output/ifc_graph.ttl", "/content/output/ifc_graph.html", custom_options)'''



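
The label and color logic inside `visualize_rdf` can be tried in isolation. The sketch below repeats the two steps with only the standard library: the label is the last path or fragment segment of the URI, and the fallback color is derived from an MD5 hash of the prefix, so the same prefix always maps to the same color. The helper names `short_label` and `prefix_color` are hypothetical.

```python
import colorsys
import hashlib

def short_label(uri):
    """Last segment after '/' and then after '#', as used for node labels."""
    return uri.split('/')[-1].split('#')[-1]

def prefix_color(prefix, saturation=0.7, value=0.95):
    """Deterministic hex color: MD5 hash -> hue, fixed saturation and value."""
    hash_value = int(hashlib.md5(prefix.encode('utf-8')).hexdigest(), 16)
    hue = hash_value % 360 / 360.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return '#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255))

print(short_label("http://ifc-instance.org/instances/43395"))
print(short_label("https://w3id.org/brcomp#Pier"))
print(prefix_color("brcomp") == prefix_color("brcomp"))
```

Note that a URI ending in `/` yields an empty label with this approach, so such nodes are identifiable only through the hover title.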

# Execution

This section runs the complete pipeline by calling the functions defined above in sequence.

In [None]:
''' IFC to RDF conversion '''

# Input Data
ifc_file = "/content/data/UKA_UK_Aachen_IFC_02.ifc"

# Process IFC and generate IFC Graph
ifc_attr = ifc_extractor(ifc_file)  # optional
ifc_graph = ifc_rdf_converter(ifc_file)

output/ifc_attributes.csv created and saved with 41 attributes
output/ifc_graph.ttl created and saved with 202 triples
Number of relationships: 81


In [None]:
''' JSON/ASB data preprocessing and conversion to RDF '''

# Input Data
json_file = "/content/data/extracted_B115.json"
mapping_table = "/content/data/mapping.xlsx"
valid_categories = ['Bauwerk', 'Teilbauwerk', 'Bruecke',
                   'BelagAbdichtung', 'Ausstattungen', 'fhrb_bel.csv', 'Gruendung', 'Kappe', 'Lager',
                   'Leitung', 'Schutzeinrichtungen', 'StatischesSystem_Tragfaehigkeit', 'Vorspannung',
                   'Fahrbahnuebergang', 'Feld']


# Process JSON/ASB data, create SPO and generate ASB Graph
asb_df = main_spo_extractor(json_file,mapping_table)
asb_graph = json_rdf_converter(asb_df, valid_categories)


Statistics Key-Value Map:
Total 15-digit values processed: 5776
Matches found: 5018
Number of files processed: 62

Total rows in SPO: 29464 

output/asb_graph.ttl created and saved with 554 triples


In [None]:
''' Map and Merge instance graphs (IFC and ASB) '''

# Map IFC instances and ASB instances
mapped_graph = graph_mapper(ifc_graph, mapping_table)

# Input Data
input_graphs = [mapped_graph, asb_graph] # add more graphs as needed

# Merge instance graphs and generate a new graph
merged_graph = graphs_merger(input_graphs)

Mapped graph created with 242 triples
Loaded 242 triples.
Loaded 554 triples.
output/merged_graph.ttl created and saved with  796 triples .
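
The counts above add up exactly: 242 + 554 = 796. An RDF graph is a set of triples, so a merge that is a plain union (assumed here for `graphs_merger`) keeps only one copy of any triple present in both inputs. Matching totals therefore indicate that the two instance graphs share no identical triples. A small illustration with Python sets, using made-up triples:

```python
# Triples modelled as tuples; the identifiers are illustrative only.
graph_a = {
    ("inst:1", "rdf:type", "bsdd:IfcBridgePartPIER"),
    ("inst:1", "inst:hasAsbPset", "asb:P1"),
}
graph_b = {
    ("asb:P1", "rdf:type", "asb:ASB_Pset_Feld"),
    ("inst:1", "inst:hasAsbPset", "asb:P1"),  # also present in graph_a
}

merged = graph_a | graph_b
shared = graph_a & graph_b

# |A ∪ B| = |A| + |B| - |A ∩ B|
print(len(merged), len(graph_a) + len(graph_b) - len(shared))
```

In this toy case the merged size is smaller than the sum of the inputs because one triple is shared.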


In [None]:
''' Map and Link merged graph with ontologies '''

# Map IFC classes and Ontology concepts
mapped_onto_graph = onto_mapper(merged_graph, mapping_table)

# Input Data
onto_urls = ["https://alhakam.github.io/brcomp/ontology.ttl",
            "https://alhakam.github.io/brot/ontology.ttl"
               ] # add more links as needed

# Link merge graphs and ontology classes and generate a new linked graph
linked_graph = onto_linker(mapped_onto_graph, onto_urls)


Mapped Onto graph created with 806 triples
output/linked_graph.ttlh created and saved with 1340 triples


In [None]:
''' Apply basic reasoning by OWL 2 RL '''

graph_path = "/content/output/linked_graph.ttl"  # Path to input graph
output_path = "/content/output/enriched_graph.ttl"  # Path to save the inferred graph

# Function to generate enriched graph from linked graph
basic_reasoner(graph_path, output_path)

Inferred graph saved to: /content/output/enriched_graph.ttl
Total triples after inference: 3230


In [51]:
# visualisation
custom_options = {
    "physics": {
        "forceAtlas2Based": {"improvedLayout": True}
        },
    "layout": {"circular": {"scale": 300}}
     }
visualize_rdf("/content/output/ifc_graph.ttl", "/content/output/ifc_graph.html", custom_options)
visualize_rdf("/content/output/asb_graph.ttl", "/content/output/asb_graph.html", custom_options)
visualize_rdf("/content/output/merged_graph.ttl", "/content/output/merged_graph.html", custom_options)
visualize_rdf("/content/output/linked_graph.ttl", "/content/output/linked_graph.html", custom_options)
visualize_rdf("/content/output/enriched_graph.ttl", "/content/output/enriched_graph.html", custom_options)

Visualization saved as /content/output/ifc_graph.html
Visualization saved as /content/output/asb_graph.html
Visualization saved as /content/output/merged_graph.html
Visualization saved as /content/output/linked_graph.html
Visualization saved as /content/output/enriched_graph.html


<class 'pyvis.network.Network'> |N|=993 |E|=3,230
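
`visualize_rdf` merges `custom_options` into its defaults recursively, so a nested override replaces only the keys it names and leaves sibling settings untouched. The sketch below reproduces that merge on a reduced, illustrative defaults dictionary.

```python
def deep_merge(d1, d2):
    """Recursively merge d2 into d1 (d1 is modified in place and returned)."""
    for k in d2:
        if k in d1 and isinstance(d1[k], dict) and isinstance(d2[k], dict):
            deep_merge(d1[k], d2[k])
        else:
            d1[k] = d2[k]
    return d1

# Reduced copy of the defaults, for illustration only
defaults = {
    "physics": {
        "solver": "forceAtlas2Based",
        "forceAtlas2Based": {"springLength": 150, "damping": 0.09},
    },
    "layout": {"randomSeed": 42},
}
overrides = {"physics": {"forceAtlas2Based": {"springLength": 300}}}

options = deep_merge(defaults, overrides)
print(options["physics"]["forceAtlas2Based"])
print(options["physics"]["solver"], options["layout"])
```

Only `springLength` changes; `damping`, `solver` and `layout` keep their default values. Keys that do not exist in the defaults are simply added, so a misspelled or unsupported option is passed through without a warning.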

# 6. Querying and Validation

### Querying instance graphs

In [None]:
''' Which object types are contained in a specific spatial container? '''

# Load the RDF graph
ifc_graph = Graph()
ifc_graph.parse("/content/output/ifc_graph.ttl", format="turtle")

# Query the graph
query = """
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?subjType ?objType
WHERE {
    ?subj inst:IfcRelContainedInSpatialStructure ?obj .
    ?subj a ?subjType .
    ?obj a ?objType .
}
"""

results = ifc_graph.query(query)

# Print the results
for row in results:
    print(f"{row.subjType} contains {row.objType}")

https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgeGIRDER contains https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcRailing
https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgeGIRDER contains https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartDECK
https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgeGIRDER contains https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartPIER
https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgeGIRDER contains https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartFOUNDATION
https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgeGIRDER contains https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartABUTMENT
https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgeGIRDER contain

In [None]:
''' Extract the IDs of elements that are instances of the IFC type IfcBridgePartABUTMENT. '''

# Load the RDF graph
ifc_graph = Graph()
ifc_graph.parse("/content/output/ifc_graph.ttl", format="turtle")

# Query the graph
query = """
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?element ?name
WHERE {
    ?element a bsdd:IfcBridgePartABUTMENT .
    ?element rdfs:label ?name .
}
"""

results = ifc_graph.query(query)

# Print the results
print("Elements that are instances of IfcBridgePartABUTMENT:")
for row in results:
    print(f"{row.element}: {row.name}")

Elements that are instances of IfcBridgePartABUTMENT:
http://ifc-instance.org/instances/43395: Widerlager_Ost:Widerlager_Ost:2618100
http://ifc-instance.org/instances/43741: Widerlager_West:Widerlager_West:2656902


In [36]:
''' Query the ASB graph for the number of associated elements of each ASB property set, here for property sets of type ASB_Pset_Feld (bridge support structures). '''

# Load the RDF graph
asb_graph = Graph()
asb_graph.parse("/content/output/asb_graph.ttl", format="turtle")

# Query the graph
query = """
PREFIX asb: <http://asb-example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?pset ?elements WHERE {
    ?pset a asb:ASB_Pset_Feld .
    ?pset asb:ANZAHL_ST ?elements .
}
"""

results = asb_graph.query(query)

# Print the results
for row in results:
    print(f"Pset {row.pset} is associated with {row.elements} bridge components.")

Pset http://asb-example.org/4B0W1WZO is associated with 1 bridge components.
Pset http://asb-example.org/4B0W1WZP is associated with 3 bridge components.
Pset http://asb-example.org/4B0W1WZQ is associated with 3 bridge components.
Pset http://asb-example.org/4B0W1WZR is associated with 1 bridge components.
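
Every query in this section repeats the same `PREFIX` block, and a missing declaration is easy to overlook. One option, sketched below, is to keep the prefixes in a single dictionary and prepend them to each query body. The helper `with_prefixes` is hypothetical; the namespace URIs are the ones used in the queries of this notebook.

```python
# Namespace URIs copied from the queries in this notebook
PREFIXES = {
    "asb": "http://asb-example.org/",
    "bsdd": "https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/",
    "inst": "http://ifc-instance.org/instances/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def with_prefixes(body, prefixes=PREFIXES):
    """Prepend one 'PREFIX name: <uri>' line per entry to a SPARQL query body."""
    header = "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items())
    return f"{header}\n\n{body.strip()}\n"

query = with_prefixes("""
SELECT ?element ?name
WHERE {
    ?element a bsdd:IfcBridgePartABUTMENT .
    ?element rdfs:label ?name .
}
""")
print(query)
```

The resulting string can be passed to `Graph.query` exactly like the hand-written queries. `Graph.query` in rdflib also accepts an `initNs` mapping, which serves the same purpose without string manipulation.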


### Querying merged graph

In [None]:
'''Extract associated ASB pset of IfcBridgePartABUTMENT and IfcBridgePartPIER .'''

# Load the graph
g = Graph()
g.parse("/content/output/merged_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX asb: <http://asb-example.org/>
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?ifcInst ?psetInstance ?psetClass
WHERE {
  {?ifcInst a bsdd:IfcBridgePartABUTMENT .} UNION {?ifcInst a bsdd:IfcBridgePartPIER .}
  ?ifcInst inst:hasAsbPset ?psetInstance .
  ?psetInstance a ?psetClass .
}
"""

# Execute the query
results = g.query(query)

# Print the results
for row in results:
    print(f"IFC Instance: {row.ifcInst} has ASB Pset: {row.psetInstance} .")
    # print(f"IFC Instance: {row.ifcInst} has ASB Pset: {row.psetInstance} that is of type {row.psetClass}.")

IFC Instance: http://ifc-instance.org/instances/43395 has ASB Pset: http://asb-example.org/4B0W1WZO .
IFC Instance: http://ifc-instance.org/instances/43741 has ASB Pset: http://asb-example.org/4B0W1WZR .
IFC Instance: http://ifc-instance.org/instances/42796 has ASB Pset: http://asb-example.org/4B0W1WZP .
IFC Instance: http://ifc-instance.org/instances/42819 has ASB Pset: http://asb-example.org/4B0W1WZP .
IFC Instance: http://ifc-instance.org/instances/42841 has ASB Pset: http://asb-example.org/4B0W1WZP .
IFC Instance: http://ifc-instance.org/instances/42863 has ASB Pset: http://asb-example.org/4B0W1WZQ .
IFC Instance: http://ifc-instance.org/instances/42885 has ASB Pset: http://asb-example.org/4B0W1WZQ .
IFC Instance: http://ifc-instance.org/instances/42894 has ASB Pset: http://asb-example.org/4B0W1WZQ .


In [None]:
'''Extract ASB properties of instances of the IFC type IfcBridgePartABUTMENT. '''

from collections import defaultdict

# Load the graph
g = Graph()
g.parse("/content/output/merged_graph.ttl", format="turtle")

# Define the SPARQL query
query_1 = """
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX asb: <http://asb-example.org/>
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?ifcType ?psetInstance ?psetClass ?property ?value
WHERE {
  ?ifcType a bsdd:IfcBridgePartABUTMENT .
  ?ifcType inst:hasAsbPset ?psetInstance .
  ?psetInstance a ?psetClass .
  ?psetInstance ?property ?value .
}
"""

# Execute the query
results = g.query(query_1)

# Group results by psetClass and then by psetInstance
pset_data = defaultdict(lambda: defaultdict(dict))
for row in results:
    pset_data[row.psetClass][row.psetInstance][row.property] = row.value

# Print the first psetInstance and its values for each psetClass
print("bsdd:IfcBridgePartABUTMENT has the following properties: ")
for psetClass, instances_data in pset_data.items():
    print(f"Pset Category: {psetClass}")
    if instances_data:
        first_instance = next(iter(instances_data))  # Get the first psetInstance
        print(f"Pset ID: {first_instance}")
        for prop, value in instances_data[first_instance].items():  # 'prop' avoids shadowing the built-in 'property'
            print(f"{prop}: {value}")
        print()
    else:
        print("No instances found for this psetClass.")

bsdd:IfcBridgePartABUTMENT has the following properties: 
Pset Category: http://asb-example.org/ASB_Pset_Feld
Pset ID: http://asb-example.org/4B0W1WZO
http://www.w3.org/1999/02/22-rdf-syntax-ns#type: http://asb-example.org/ASB_Pset_Feld
http://asb-example.org/AMT: 533
http://asb-example.org/ANZAHL_ST: 1
http://asb-example.org/ART: http://asb-example.org/40011100000000
http://asb-example.org/BEARBEITER: CODEKERK
http://asb-example.org/BEARB_DAT: 2002-08-27T08:34:54.999000
http://asb-example.org/BEMERKUNG: Widerlager 1
***
http://asb-example.org/BWNR: http://asb-example.org/B115
http://asb-example.org/FELDNR: http://asb-example.org/0
http://asb-example.org/FELD_NR: http://asb-example.org/1
http://asb-example.org/IDENT: http://asb-example.org/4B0W1WZO
http://asb-example.org/ID_NR: http://asb-example.org/B115_0
http://asb-example.org/REF_BRUCKE: http://asb-example.org/4B0W1X9V
http://asb-example.org/REF_FELDER: http://asb-example.org/4B0W1X9W
http://asb-example.org/SCHIFF_OEF: http://asb-

### Querying linked graph

In [None]:
'''Extract ASB pset of Abutment or Pier. '''

# Load the graph
g = Graph()
g.parse("/content/output/linked_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX asb: <http://asb-example.org/>
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?inst ?ifcClass ?pset
WHERE {
  ?inst inst:hasAsbPset ?pset .
  ?inst a ?ifcClass .
  {?ifcClass owl:equivalentClass brcomp:Abutment .} UNION {?ifcClass owl:equivalentClass brcomp:Pier .}
}
"""
# Execute the query
results = g.query(query)

for row in results:
  print(f"IFC Instance: {row.inst} has ASB Pset: {row.pset}.")

IFC Instance: http://ifc-instance.org/instances/42796 has ASB Pset: http://asb-example.org/4B0W1WZP.
IFC Instance: http://ifc-instance.org/instances/42819 has ASB Pset: http://asb-example.org/4B0W1WZP.
IFC Instance: http://ifc-instance.org/instances/42841 has ASB Pset: http://asb-example.org/4B0W1WZP.
IFC Instance: http://ifc-instance.org/instances/42863 has ASB Pset: http://asb-example.org/4B0W1WZQ.
IFC Instance: http://ifc-instance.org/instances/42885 has ASB Pset: http://asb-example.org/4B0W1WZQ.
IFC Instance: http://ifc-instance.org/instances/42894 has ASB Pset: http://asb-example.org/4B0W1WZQ.
IFC Instance: http://ifc-instance.org/instances/43395 has ASB Pset: http://asb-example.org/4B0W1WZO.
IFC Instance: http://ifc-instance.org/instances/43741 has ASB Pset: http://asb-example.org/4B0W1WZR.


In [None]:
''' Extract substructure components. The expected result contains Abutments, Piers and Foundations.
    Because no inference engine has been applied to the linked graph yet, these associations cannot be made and the query returns no results. '''

# Load the graph
g = Graph()
g.parse("/content/output/linked_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?inst ?ifcClass
WHERE {
  {?inst a brot:SubStructure .} UNION {?inst a brcomp:SubStructureComponent .}
  ?inst a ?ifcClass .

  FILTER(STRSTARTS(STR(?ifcClass), "https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/"))
}
"""

# Execute the query
results = g.query(query)

# Print the results
print("IFC Types that are brot:SubStructure or brcomp:SubStructureComponent:")
for row in results:
    print(f"{row.inst} - {row.ifcClass}")

IFC Types that are brot:SubStructure or brcomp:SubStructureComponent:


### Querying enriched graph

In [None]:
'''Extract substructure components. The expected result contains Abutments, Piers and Foundations.
   Running the same query as above on the enriched graph returns the expected results (limited to 10 rows here).'''

# Load the graph
g = Graph()
g.parse("/content/output/enriched_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?inst ?ifcClass
WHERE {
  {?inst a brot:SubStructure .} UNION {?inst a brcomp:SubStructureComponent .}
  ?inst a ?ifcClass .

  FILTER(STRSTARTS(STR(?ifcClass), "https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/"))
} LIMIT 10
"""

# Execute the query
results = g.query(query)

# Print the results
print("IFC Types that are brot:SubStructure or brcomp:SubStructureComponent:")
for row in results:
    print(f"{row.inst} - {row.ifcClass}")

IFC Types that are brot:SubStructure or brcomp:SubStructureComponent:
http://ifc-instance.org/instances/42796 - https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePart
http://ifc-instance.org/instances/42796 - https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartPIER
http://ifc-instance.org/instances/42819 - https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePart
http://ifc-instance.org/instances/42819 - https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartPIER
http://ifc-instance.org/instances/42841 - https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePart
http://ifc-instance.org/instances/42841 - https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartPIER
http://ifc-instance.org/instances/42863 - https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePart
http://ifc-instance.org/instances/42863 - h

In [None]:
'''Count the number of instances of each substructure element type. '''

# Load the graph
g = Graph()
g.parse("/content/output/enriched_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?ifcClass (COUNT(?inst) AS ?instance_count)
WHERE {
  {?inst a brot:SubStructure .} UNION {?inst a brcomp:SubStructureComponent .}
  ?inst a ?ifcClass .
  FILTER(STRSTARTS(STR(?ifcClass), "https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/"))
}
GROUP BY ?ifcClass
"""

# Execute the query
results = g.query(query)

# Print the results
print("Instance count per ifcClass:")
for row in results:
    print(f"Type: {row.ifcClass}, Instance Count: {row.instance_count}")

Instance count per ifcClass:
Type: https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePart, Instance Count: 24
Type: https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartPIER, Instance Count: 12
Type: https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartFOUNDATION, Instance Count: 10
Type: https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartABUTMENT, Instance Count: 2


In [None]:
''' Extract all ASB psets associated with each instance of the substructure elements.'''

# Load the graph
g = Graph()
g.parse("/content/output/enriched_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX inst: <http://ifc-instance.org/instances/>

SELECT ?asb ?asbType ?inst ?brcompClass
WHERE {
  ?inst a ?brcompClass .
  ?brcompClass rdfs:subClassOf* brcomp:SubStructureComponent .
  ?inst inst:hasAsbPset ?asb .
  ?asb a ?asbType .
}
ORDER BY ?brcompClass
"""

# Execute the query
results = g.query(query)

# Process and print the results
current_brcomp_class = None  # Keep track of the current brcomp class
for row in results:
    if row.brcompClass != current_brcomp_class:
        print(f"\nBrcomp Class: {row.brcompClass}")  # Print brcomp class only when it changes
        current_brcomp_class = row.brcompClass
    print(f"  Instance: {row.inst}, ASB Pset: {row.asb} & {row.asbType}")  # Print the instance, its ASB pset, and the pset type


Brcomp Class: https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartABUTMENT
  Instance: http://ifc-instance.org/instances/43395, ASB Pset: http://asb-example.org/4B0W1WZO & http://asb-example.org/ASB_Pset_Feld
  Instance: http://ifc-instance.org/instances/43741, ASB Pset: http://asb-example.org/4B0W1WZR & http://asb-example.org/ASB_Pset_Feld

Brcomp Class: https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgePartFOUNDATION
  Instance: http://ifc-instance.org/instances/42952, ASB Pset: http://asb-example.org/4B0W1WX8 & http://asb-example.org/ASB_Pset_Gruendung
  Instance: http://ifc-instance.org/instances/42976, ASB Pset: http://asb-example.org/4B0W1WX8 & http://asb-example.org/ASB_Pset_Gruendung
  Instance: http://ifc-instance.org/instances/43032, ASB Pset: http://asb-example.org/4B0W1WVO & http://asb-example.org/ASB_Pset_BelagAbdichtung
  Instance: http://ifc-instance.org/instances/43032, ASB Pset: http://asb-example.org/4B0W1WX8 

In [None]:
'''Extract all ASB properties associated with abutments.'''

# Load the graph
g = Graph()
g.parse("/content/output/enriched_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX asb: <http://asb-example.org/>

SELECT ?inst ?asbPset ?property ?value
WHERE {
  ?inst a/rdfs:subClassOf* brcomp:Abutment .
  ?inst inst:hasAsbPset ?asbPset .
  ?asbPset ?property ?value .
  FILTER(?property != rdf:type) .
}
ORDER BY ?inst ?asbPset
"""

# Execute the query
results = g.query(query)

# Process and print the results
current_inst = None
current_asb_pset = None
for row in results:
    if row.inst != current_inst:
        print(f"\nInstance: {row.inst}")
        current_inst = row.inst
        current_asb_pset = None  # Reset current_asb_pset when instance changes

    if row.asbPset != current_asb_pset:
        print(f"  ASB Pset: {row.asbPset}")
        current_asb_pset = row.asbPset

    print(f"    Property: {row.property}, Value: {row.value}")


Instance: http://ifc-instance.org/instances/43395
  ASB Pset: http://asb-example.org/4B0W1WZO
    Property: http://asb-example.org/AMT, Value: 533
    Property: http://asb-example.org/ANZAHL_ST, Value: 1
    Property: http://asb-example.org/ART, Value: http://asb-example.org/40011100000000
    Property: http://asb-example.org/BEARBEITER, Value: CODEKERK
    Property: http://asb-example.org/BEARB_DAT, Value: 2002-08-27T08:34:54.999000
    Property: http://asb-example.org/BEMERKUNG, Value: Widerlager 1
***
    Property: http://asb-example.org/BWNR, Value: http://asb-example.org/B115
    Property: http://asb-example.org/FELDNR, Value: http://asb-example.org/0
    Property: http://asb-example.org/FELD_NR, Value: http://asb-example.org/1
    Property: http://asb-example.org/IDENT, Value: http://asb-example.org/4B0W1WZO
    Property: http://asb-example.org/ID_NR, Value: http://asb-example.org/B115_0
    Property: http://asb-example.org/REF_BRUCKE, Value: http://asb-example.org/4B0W1X9V
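
The printing loop above tracks the current instance and pset by hand. If the results are needed for further processing rather than display, one option is to nest them in dictionaries first. This is a sketch only: `group_properties` is a hypothetical helper, and the sample rows use plain strings where the real query returns rdflib terms.

```python
from collections import defaultdict


def group_properties(rows):
    """Nest (inst, pset, property, value) rows as inst -> pset -> property -> [values]."""
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for inst, pset, prop, value in rows:
        # A list per property keeps repeated properties instead of overwriting them
        nested[inst][pset][prop].append(value)
    # Convert to plain dicts so the result prints and serialises cleanly
    return {
        inst: {pset: dict(props) for pset, props in psets.items()}
        for inst, psets in nested.items()
    }


ASB = "http://asb-example.org/"
INST = "http://ifc-instance.org/instances/"

# A few rows copied from the output above, as plain strings
sample_rows = [
    (INST + "43395", ASB + "4B0W1WZO", ASB + "AMT", "533"),
    (INST + "43395", ASB + "4B0W1WZO", ASB + "ANZAHL_ST", "1"),
    (INST + "43395", ASB + "4B0W1WZO", ASB + "BEARBEITER", "CODEKERK"),
]

grouped = group_properties(sample_rows)
for inst, psets in grouped.items():
    for pset, props in psets.items():
        print(inst, pset, len(props), "properties")
```

rdflib result rows unpack like tuples, so `group_properties(results)` should work on the query result directly, provided the SELECT keeps the four variables in this order.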
   

# Extracting Single Entity

In [37]:
from rdflib import Graph, Literal, URIRef

def extract_direct_connections(
    input_file: str,
    target_object: str,
    output_file: str,
    include_inverse: bool = True
) -> None:
    """
    Extract triples that directly connect to a specific object from a TTL file.
    Preserves namespace bindings from input to output.

    Args:
        input_file: Path to input TTL file
        target_object: URI or literal of the target object
        output_file: Path to output TTL file
        include_inverse: Whether to include triples where target is subject
    """
    # Create new graph with namespaces
    output_graph, namespaces = create_and_bind_graph()

    # Load the input graph
    input_graph = Graph()
    input_graph.parse(input_file, format="turtle")

    # Copy namespace bindings from input graph
    for prefix, namespace in input_graph.namespaces():
        if prefix not in [p.lower() for p in namespaces.keys()]:
            output_graph.bind(prefix, namespace)

    # Wrap the target as an rdflib term: URIs become URIRef, anything else a plain Literal
    # (a bare Python string matches no triple and cannot be added to a graph)
    target = URIRef(target_object) if target_object.startswith("http") else Literal(target_object)

    # Get direct triples where target is the object
    for s, p, o in input_graph.triples((None, None, target)):
        output_graph.add((s, p, o))

    if include_inverse:
        # Get direct triples where target is the subject
        for s, p, o in input_graph.triples((target, None, None)):
            output_graph.add((s, p, o))

    # Save the output graph with preserved namespaces
    output_graph.serialize(
        destination=output_file,
        format="turtle"
    )

    # Print statistics
    print(f"Original graph: {len(input_graph)} triples")
    print(f"Extracted graph: {len(output_graph)} triples")
    print(f"Preserved namespaces: {len(list(output_graph.namespaces()))} namespaces")



# Example usage
extract_direct_connections(
    input_file="/content/output/ifc_graph.ttl",
    target_object="http://ifc-instance.org/instances/43395",
    output_file="/content/output/singleElement_ifc.ttl",
    include_inverse=True,  # Include triples where target is the subject
)

extract_direct_connections(
    input_file="/content/output/merged_graph.ttl",
    target_object="http://ifc-instance.org/instances/43395",
    output_file="/content/output/singleElement_merged.ttl",
    include_inverse=True,  # Include triples where target is the subject
)

extract_direct_connections(
    input_file="/content/output/linked_graph.ttl",
    target_object="http://ifc-instance.org/instances/43395",
    output_file="/content/output/singleElement_linked.ttl",
    include_inverse=True,  # Include triples where target is the subject
)

extract_direct_connections(
    input_file="/content/output/enriched_graph.ttl",
    target_object="http://ifc-instance.org/instances/43395",
    output_file="/content/output/singleElement_enriched.ttl",
    include_inverse=True,  # Include triples where target is the subject
)


Original graph: 202 triples
Extracted graph: 5 triples
Preserved namespaces: 35 namespaces
Original graph: 796 triples
Extracted graph: 6 triples
Preserved namespaces: 35 namespaces
Original graph: 1340 triples
Extracted graph: 6 triples
Preserved namespaces: 37 namespaces
Original graph: 3230 triples
Extracted graph: 12 triples
Preserved namespaces: 37 namespaces


In [40]:
# Visualise the extracted single-element graphs
custom_options = {
    "physics": {
        "barnesHut": {
            "gravitationalConstant": -2000,
            "centralGravity": 0.3,
            "springLength": 95,
            "springConstant": 0.04,
        }
    }
}

visualize_rdf("/content/output/singleElement_ifc.ttl", "/content/output/singleElement_ifc.html", custom_options)
# visualize_rdf("/content/output/singleElement_merged.ttl", "/content/output/singleElement_merged.html", custom_options)
# visualize_rdf("/content/output/singleElement_linked.ttl", "/content/output/singleElement_linked.html", custom_options)
visualize_rdf("/content/output/singleElement_enriched.ttl", "/content/output/singleElement_enriched.html", custom_options)

Visualization saved as /content/output/singleElement_ifc.html
Visualization saved as /content/output/singleElement_enriched.html


<class 'pyvis.network.Network'> |N|=12 |E|=12