<a href="https://colab.research.google.com/github/suhtoo/SE-Pipeline/blob/main/SE_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Semantic Enrichment Pipeline with OpenBIM
This notebook demonstrates the case-study implementation of the Semantic Enrichment Pipeline in OpenBIM (IFC).

## Install Dependencies and Create Directories

In [19]:
!pip install ifcopenshell pandas rdflib owlrl pyvis requests



In [2]:
# Removing all the folder content
# !rm -rf output/*

The functions in this notebook use two default directories.
Upload the data sources (IFC files, JSON files, and the Excel file containing the mapping table) to the "data" folder. Function outputs are saved to the "output" folder unless another path is specified. Create the directories manually or run the script below.

In [3]:
import os

# Create data folder if it doesn't exist
data_folder = "data"
if not os.path.exists(data_folder):
    os.makedirs(data_folder)
    print(f"Created data folder: {data_folder}")
else:
    print(f"Data folder '{data_folder}' already exists.")

# Create output folder if it doesn't exist
output_folder = "output"
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
    print(f"Created output folder: {output_folder}")
else:
    print(f"Output folder '{output_folder}' already exists.")

Created data folder: data
Created output folder: output


# 1. IFC to RDF

### Extract all available attributes from IFC elements and save to CSV (optional)

The resulting CSV can later be used to build the alignment file between IFC model elements and other datasets.

In [22]:
import ifcopenshell
import pandas as pd
from pathlib import Path

def ifc_extractor(ifc_file_path: str):
    """
    Extracts all available attributes from IFC elements and saves them to a CSV file in the 'output' folder.

    Args:
        ifc_file_path (str): Path to the IFC file.
    """

    output_path = Path("output", "ifc_attributes.csv") # Default output path in 'output' folder

    # Load the IFC file
    ifc_file = ifcopenshell.open(ifc_file_path)

    # Get all elements
    elements = ifc_file.by_type('IfcObject')

    # Initialize list to store element data
    elements_data = []

    for element in elements:
        # Basic element data
        element_data = {
            'ElementType': element.is_a(),
            'PredefinedType': element.PredefinedType if hasattr(element, 'PredefinedType') else None,
            'GlobalId': element.GlobalId,
            'id': element.id(),
            'Name': getattr(element, 'Name', None),
            'Description': getattr(element, 'Description', None),
            'ObjectType': getattr(element, 'ObjectType', None),
        }

        # Get property sets
        if element.IsDefinedBy:
            for definition in element.IsDefinedBy:
                if definition.is_a('IfcRelDefinesByProperties'):
                    pset = definition.RelatingPropertyDefinition
                    if pset.is_a('IfcPropertySet'):
                        for prop in pset.HasProperties:
                            if hasattr(prop, 'NominalValue') and prop.NominalValue is not None:
                                element_data[f"{pset.Name}_{prop.Name}"] = prop.NominalValue.wrappedValue

        elements_data.append(element_data)

    # Convert to DataFrame and save to CSV
    df = pd.DataFrame(elements_data)
    df.to_csv(output_path, index=False)
    print(f"{output_path} created and saved with {len(df)} elements")

    return df

# Assign the result to a new name so the function itself is not overwritten
# ifc_df = ifc_extractor("/content/data/UKA_UK_Aachen_IFC_02.ifc")
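
The column naming used for property sets can be looked at in isolation. The sketch below uses plain dictionaries as hypothetical stand-ins for IFC entities, so it needs neither an IFC file nor `ifcopenshell`. The helper name `flatten_psets`, the property set `Pset_BeamCommon`, and all sample values are illustrative assumptions, not data from the case study.

```python
def flatten_psets(base: dict, psets: dict) -> dict:
    """Merge property sets into one flat row using '<PsetName>_<PropName>' keys."""
    row = dict(base)
    for pset_name, props in psets.items():
        for prop_name, value in props.items():
            if value is not None:  # mirrors the NominalValue check in ifc_extractor
                row[f"{pset_name}_{prop_name}"] = value
    return row

# Hypothetical element with one property set
base = {"ElementType": "IfcBeam", "GlobalId": "0abc", "Name": "Girder 1"}
psets = {"Pset_BeamCommon": {"Span": 12.5, "Reference": None, "IsExternal": False}}

row = flatten_psets(base, psets)
print(row)
```

Each element contributes only the properties it actually carries, so the resulting table is sparse: when the rows are turned into a DataFrame, pandas fills the columns an element lacks with NaN.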


## Extract RDF from IFC

Convert IFC file to RDF and save as TTL file.

In [23]:
# helper functions
import ifcopenshell
from rdflib import Graph, Namespace, URIRef, Literal
from pathlib import Path
import urllib.parse

def create_namespaces():
    """Create and return commonly used namespaces."""
    return {
        'INST': Namespace("http://ifc-instance.org/instances/"),
        'RDF': Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
        'RDFS': Namespace("http://www.w3.org/2000/01/rdf-schema#"),
        'OWL': Namespace("http://www.w3.org/2002/07/owl#"),
        'BSDD': Namespace("https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/"),
        'PROP': Namespace("https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/"),
        'BROT': Namespace("https://w3id.org/brot#"),
        'BRCOMP': Namespace("https://w3id.org/brcomp#"),
        'ASB': Namespace("http://asb-example.org/")

    }

def create_and_bind_graph():
    """Create a new graph and bind namespaces."""
    g = Graph()
    namespaces = create_namespaces()

    # Bind all namespaces
    for prefix, ns in namespaces.items():
        g.bind(prefix.lower(), ns)

    return g, namespaces

def clean_uri_string(s: str) -> str:
    """Clean string to make it URI-safe."""
    if not isinstance(s, str):
        return ""
    cleaned = s.replace(" ", "_").replace("/", "_").replace(".", "_")
    return urllib.parse.quote(cleaned)
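
To see what `clean_uri_string` does to typical names, the helper is repeated below so the snippet runs on its own. The input strings are made-up examples, not values from the case-study model.

```python
import urllib.parse

def clean_uri_string(s: str) -> str:
    """Same logic as the helper above, repeated so the snippet is self-contained."""
    if not isinstance(s, str):
        return ""
    cleaned = s.replace(" ", "_").replace("/", "_").replace(".", "_")
    return urllib.parse.quote(cleaned)

print(clean_uri_string("IfcBeam"))      # already URI-safe, returned unchanged
print(clean_uri_string("Pier 1/A.2"))   # spaces, slashes and dots become underscores
print(clean_uri_string("Brücke"))       # non-ASCII characters are percent-encoded
print(repr(clean_uri_string(None)))     # non-string input yields an empty string
```

Note that several different inputs can collapse to the same cleaned string (for example "A B" and "A/B"), so the function is suited to building readable local names rather than guaranteed-unique identifiers.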

In [24]:
def ifc_rdf_converter(ifc_file_path: str):
    """
    Convert IFC file to RDF and save as TTL file in the 'output' folder.

    Args:
        ifc_file_path (str): Path to the IFC file.
    """

    output_file = Path("output", "ifc_graph.ttl")  # Default output path in 'output' folder

    # Create graph and get namespaces
    g, ns = create_and_bind_graph()

    def get_element_uri(element_id: int) -> URIRef:
        """Create URI for an element."""
        return URIRef(ns['INST'][f"{element_id}"])

    def process_relationship(relationship):
        """Process IFC relationship and add triples to graph."""
        rel_type = relationship.is_a()

        # Get common relating element
        relating_element = None
        if hasattr(relationship, 'RelatingElement'):
            relating_element = relationship.RelatingElement
        elif hasattr(relationship, 'RelatingStructure'):
            relating_element = relationship.RelatingStructure
        elif hasattr(relationship, 'RelatingObject'):
            relating_element = relationship.RelatingObject


        # Get common related elements
        related_elements = []
        if hasattr(relationship, 'RelatedElements'):
            related_elements.extend(relationship.RelatedElements)
        elif hasattr(relationship, 'RelatedElement'):
            related_elements.append(relationship.RelatedElement)
        elif hasattr(relationship, 'RelatedObjects'):
            related_elements.extend(relationship.RelatedObjects)
        elif hasattr(relationship, 'RelatedObject'):
            related_elements.append(relationship.RelatedObject)


        # Add relationship triples
        if relating_element and hasattr(relating_element, 'id'):
            relating_uri = get_element_uri(relating_element.id())
            for related_element in related_elements:
                if hasattr(related_element, 'id'):
                    related_uri = get_element_uri(related_element.id())
                    g.add((relating_uri, ns['INST'][rel_type], related_uri))

    # Load IFC file
    ifc_file = ifcopenshell.open(ifc_file_path)

    # Process elements
    for element in ifc_file.by_type('IfcProduct'):
        element_uri = get_element_uri(element.id())

        # Add element type
        element_type = element.is_a()
        if hasattr(element, "PredefinedType") and element.PredefinedType not in ["NOTDEFINED", "USERDEFINED", None, '*']:
            element_type = f"{element.is_a()}{element.PredefinedType}"
        g.add((element_uri, ns['RDF'].type, URIRef(ns['BSDD'][clean_uri_string(element_type)])))

        # Add basic attributes
        if element.GlobalId:
            g.add((element_uri, ns['PROP'].GlobalId, Literal(element.GlobalId)))

        if hasattr(element, 'Name') and element.Name:
            g.add((element_uri, ns['RDFS'].label, Literal(element.Name)))

        if hasattr(element, 'Description') and element.Description:
            g.add((element_uri, ns['RDFS'].comment, Literal(element.Description)))

        if hasattr(element, 'ObjectType') and element.ObjectType:
            g.add((element_uri, ns['PROP'].ObjectType, Literal(element.ObjectType)))

    # Process relationships
    for relationship in ifc_file.by_type('IfcRelationship'):
        process_relationship(relationship)

    g.serialize(destination=str(output_file), format="turtle")
    print(f"{output_file} created and saved with {len(g)} triples")

    return g

# Execution
# ifc_graph = ifc_rdf_converter("/content/data/UKA_UK_Aachen_IFC_02.ifc")
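
The attribute lookup inside `process_relationship` is easier to follow on small stand-in objects. The sketch below is a simplified restatement, not the converter itself: `resolve_ends` is a hypothetical helper, and the `SimpleNamespace` objects with integer ids only imitate the attribute names of `IfcRelAggregates`, `IfcRelConnectsElements` and `IfcRelDefinesByProperties`.

```python
from types import SimpleNamespace

RELATING_ATTRS = ("RelatingElement", "RelatingStructure", "RelatingObject")
RELATED_ATTRS = ("RelatedElements", "RelatedElement", "RelatedObjects", "RelatedObject")

def resolve_ends(rel):
    """Return (relating, [related, ...]) using the first attribute that exists."""
    relating = next((getattr(rel, a) for a in RELATING_ATTRS if hasattr(rel, a)), None)
    related = []
    for attr in RELATED_ATTRS:
        if hasattr(rel, attr):
            value = getattr(rel, attr)
            # Plural attributes hold a sequence, singular ones a single entity
            related = list(value) if isinstance(value, (list, tuple)) else [value]
            break
    return relating, related

# Stand-in for an aggregation: one whole, two parts
aggregates = SimpleNamespace(RelatingObject=10, RelatedObjects=(11, 12))
# Stand-in for an element-to-element connection
connects = SimpleNamespace(RelatingElement=20, RelatedElement=21)
# Stand-in for a property assignment: no Relating* attribute from the list above
defines = SimpleNamespace(RelatingPropertyDefinition=30, RelatedObjects=(31,))

print(resolve_ends(aggregates))
print(resolve_ends(connects))
print(resolve_ends(defines))
```

The last case shows a consequence of the lookup order: a relationship whose relating side is stored under a different attribute name yields no relating element, so the converter adds no triple for it.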

# 2. JSON to RDF

## Replace 15-digit key with description.

In [25]:
import json
from rdflib import Graph
import re
from collections import defaultdict
import requests

def key_value_mapper(input_json, output_filepath=None):
    """
    Replaces 15-digit keys in the JSON data with their corresponding descriptions
    from an ontology.

    Args:
        input_json (str): Path to the input JSON file.
        output_filepath (str, optional): Path to save the mapped JSON data.
    """

    # Load the ontology that contains key-value descriptions
    g = Graph()
    g.parse("https://annegoebels.github.io/asb/oldkeys/ontology.ttl", format='ttl')

    '''
    # Download the Turtle file
    url = "https://annegoebels.github.io/asb/newkeys/ontology.ttl" # Add filename to the url
    response = requests.get(url)
    response.raise_for_status()  # Raise an exception if download fails

    # Parse the downloaded content
    g.parse(data=response.content, format='turtle') '''


    # Create lookup dictionary
    lookup_dict = {}
    pattern = re.compile(r"#(\d{15})_(.+)$")  # Match 15-digit number and class name
    for s in g.subjects():
        s_str = str(s)
        match = pattern.search(s_str)
        if match:
            number, class_name = match.groups()
            if number not in lookup_dict:
                lookup_dict[number] = class_name

    # Load JSON data
    if isinstance(input_json, str):
        with open(input_json, 'r') as file:
            data = json.load(file)
    else:
        data = input_json

    # Initialize counters
    total_processed = 0
    matches_found = 0

    # Process each file's data
    for file_key, records in data.items():
        # Process each record in the file
        for record in records:
            # Process each field in the record
            for field_key, field_value in record.items():
                if isinstance(field_value, str) and len(field_value) == 15 and field_value.isdigit():
                    total_processed += 1
                    class_name = lookup_dict.get(field_value)
                    if class_name:
                        matches_found += 1
                        record[field_key] = class_name  # Replacement key with value

    # Save the mapped data; default to '<input>_mapped.json' next to the input file
    if output_filepath is None and isinstance(input_json, str):
        output_filepath = input_json.replace('.json', '_mapped.json')

    # Nothing is written when a dict was passed in without an output path
    if output_filepath is not None:
        with open(output_filepath, 'w') as file:
            json.dump(data, file, indent=2)

    # Print statistics
    print(f"\nStatistics Key-Value Map:")
    print(f"Total 15-digit values processed: {total_processed}")
    print(f"Matches found: {matches_found}")
    print(f"Number of files processed: {len(data)}\n")

    return data
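
The lookup step of `key_value_mapper` can be tried without downloading the ontology. In the sketch below the subject URIs, the field names `BAUSTOFF` and `ART`, and all key values are hypothetical. Only the regular expression is taken from the function above.

```python
import re

# Hypothetical subject URIs in the '<base>#<15 digits>_<ClassName>' shape the mapper expects
subjects = [
    "https://example.org/asb#123456789012345_Spannbeton",
    "https://example.org/asb#123456789012345_Duplicate",  # same key: the first one is kept
    "https://example.org/asb#Bauwerk",                     # no 15-digit key: ignored
]

pattern = re.compile(r"#(\d{15})_(.+)$")
lookup = {}
for uri in subjects:
    match = pattern.search(uri)
    if match:
        number, class_name = match.groups()
        lookup.setdefault(number, class_name)  # keep the first description per key

# Hypothetical record: one known key, one ordinary value, one unknown key
record = {"BAUSTOFF": "123456789012345", "BWNR": "5309701", "ART": "999999999999999"}
mapped = {field: lookup.get(value, value) for field, value in record.items()}
print(mapped)
```

Values without a match are left untouched, which is why the function reports both the number of 15-digit values processed and the number of matches found. In the real ontology the iteration order of subjects is not guaranteed, so which description is kept for a duplicated key may vary between runs.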

## Extract Subject-Predicate-Object from JSON

In [26]:
import json
import os
import pandas as pd

def json_spo_extractor(json_file_path, output_path=None):
    """
    Extracts Subject-Predicate-Object triples from JSON data and saves them to a CSV file.

    Args:
        json_file_path (str): Path to the input JSON file.
        output_path (str, optional): Path to save the extracted SPO triples.
    """

    # Read JSON file
    with open(json_file_path) as file:
        data = json.load(file)

    # Define the ASBID field used for Bauwerk, Teilbauwerk and Bruecke
    asbid_mapping = {
        "ges_bw.csv": "BWNR",
        "teil_bw.csv": "ID_NR",
        "bruecke.csv": "REF_BRUCKE"
        }

    # List to store all rows
    rows = []

    # Process each CSV file
    for category, records in data.items():
        # Get subject
        subject = category

        # Determine which ASBID key to use using asbid_mapping dictionary
        asbid_key = asbid_mapping.get(category, 'IDENT') # Default to IDENT if not in mapping

        # Process each dictionary in the CSV
        for record in records:
            if isinstance(record, dict):
                # Get ASBID using appropriate key
                asbid = record.get(asbid_key, '')

                # Process each key-value pair
                for key, value in record.items():
                    # Skip placeholder values ("NaN", "***", " ***")
                    if value != "NaN" and value != "***" and value != " ***":
                        rows.append({
                            'ASBID': asbid,
                            'CATEGORY': subject,
                            'PROPERTY': key,
                            'OBJECT': value,
                        })

    # Create DataFrame and save to CSV
    df_SPO = pd.DataFrame(rows)

    # Reorder columns to match specified order
    df_SPO = df_SPO[['ASBID', 'CATEGORY', 'PROPERTY', 'OBJECT']]
    print(f"Total rows in SPO: {len(df_SPO)} \n")

    # Save if output path is provided, else default to 'output'
    if output_path is None:
        output_path = os.path.join("output",'asb_SPO.csv')

    df_SPO.to_csv(output_path, index=False)

    return df_SPO

# Execute
# df_SPO = json_spo_extractor("/content/data/extracted_B115_mapped.json")
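
The flattening performed by `json_spo_extractor` can be reproduced on a small in-memory dictionary. The sketch below uses tuples instead of a DataFrame. The table name `lager.csv`, the field names other than `BWNR` and `IDENT`, and all values are hypothetical.

```python
# Hypothetical input in the shape json_spo_extractor expects:
# {table name: [record, record, ...]}
data = {
    "ges_bw.csv": [{"BWNR": "5309701", "BW_NAME": "Example bridge", "BAUJAHR": "NaN"}],
    "lager.csv": [{"IDENT": "L-01", "ART": "Elastomer", "NOTE": "***"}],
}

asbid_mapping = {"ges_bw.csv": "BWNR", "teil_bw.csv": "ID_NR", "bruecke.csv": "REF_BRUCKE"}
placeholders = ("NaN", "***", " ***")  # values treated as "no data"

rows = []
for category, records in data.items():
    id_key = asbid_mapping.get(category, "IDENT")  # fall back to IDENT
    for record in records:
        asbid = record.get(id_key, "")
        for key, value in record.items():
            if value not in placeholders:
                rows.append((asbid, category, key, value))

for row in rows:
    print(row)
```

Every remaining field becomes one row, including the identifier field itself, so each record also produces a row such as `('5309701', 'ges_bw.csv', 'BWNR', '5309701')`.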

## Map to a more descriptive subject name and map datatype for object value

In [27]:
import pandas as pd

def datatype_mapper(map_excel_path, df):
    """
    Maps datatypes and properties to more descriptive names using a mapping Excel file.

    Args:
        map_excel_path (str): Path to the mapping Excel file.
        df (pd.DataFrame): DataFrame containing the SPO triples.
    """

    # Read excel sheet
    map_csv = pd.read_excel(map_excel_path, sheet_name='MAPPING')

    # Create mapping dictionaries
    class_mapping = dict(zip(map_csv["Table"], map_csv["Table_Name"]))
    datatype_mapping = dict(zip(map_csv["Attribute"], map_csv["Datatype"]))
    # property_mapping = dict(zip(map_csv["Attribute"], map_csv["Attribute_FullText"]))

    # Replace CATEGORY values (table names) with descriptive names using class_mapping
    df['CATEGORY'] = df['CATEGORY'].replace(class_mapping)

    # Add a new column for datatype based on the PROPERTY column
    df['DATATYPE'] = df['PROPERTY'].map(datatype_mapping) # Map datatype based on 'PROPERTY' column

    # Replace PROPERTY values using property_mapping
    # df['PROPERTY'] = df['PROPERTY'].replace(property_mapping)

    #print(len(df["CATEGORY"].unique()))

    # Save the DataFrame to CSV, replacing existing file
    output_path = os.path.join("output", "asb_SPO.csv")  # Path to asb_SPO.csv in the output folder
    df.to_csv(output_path, index=False)  # Overwrite the existing file if it exists

    return df

# Execute
# datatype_mapper("/content/data/mapping.xlsx", df_SPO)

## Main SPO Extractor Function

In [28]:
def main_spo_extractor(input_json_file: str, mapping_table: str, output_spo_file=None):
    """
    Main function to extract and process SPO triples from a JSON file,
    apply key-value mapping and datatype mapping, and save to a CSV.

    Args:
        input_json_file (str): Path to the input JSON file.
        mapping_table (str): Path to the mapping table (Excel file).
        output_spo_file (str, optional): Path to save the extracted SPO triples.
    """

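    # 1. Key-Value Mapping (writes '<input>_mapped.json' next to the input file):
    mapped_json_file = input_json_file.replace('.json', '_mapped.json')
    key_value_mapper(input_json_file, output_filepath=mapped_json_file)

    # 2. JSON to SPO Extraction:
    # Read the mapped file so that the replaced key descriptions are used
    asb_SPO = json_spo_extractor(mapped_json_file, output_path=output_spo_file)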

    # 3. Datatype Mapping:
    asb_SPO = datatype_mapper(mapping_table, asb_SPO)

    # 4. Save to CSV:
    if output_spo_file is None:
        output_path = os.path.join("output", "asb_SPO.csv")  # Default output path
    else:
        output_path = output_spo_file  # Use provided output path

    asb_SPO.to_csv(output_path, index=False)

    return asb_SPO

# Example Usage
# json_input = "/content/data/extracted_B115.json"
# mapping_table = "/content/data/mapping.xlsx"
# asb_df = main_spo_extractor(json_input, mapping_table)

## Generate RDF

In [29]:
# helper function to clean the strings
import pandas as pd
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS, XSD
import re
import unicodedata

def clean_string(text):
    if pd.isna(text):
        return text

    # Convert to string if not already
    text = str(text)

    # Replace German special characters
    replacements = {
        'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss',
        'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue'
    }
    for char, replacement in replacements.items():
        text = text.replace(char, replacement)

    # Remove accents
    text = ''.join(c for c in unicodedata.normalize('NFKD', text)
                  if not unicodedata.combining(c))

    # Replace spaces and special chars with underscore
    text = re.sub(r'[^a-zA-Z0-9]+', '_', text)

    # Remove leading/trailing underscores
    text = text.strip('_')

    return text


def get_xsd_datatype(datatype: str):
    """Map datatype string to XSD datatype."""
    datatype_mapping = {
        'double': XSD.double,
        'dateTime': XSD.dateTime,
        'year': XSD.gYear,  # XML Schema has no 'year' type; gYear is the standard one
        'text': XSD.string,
        'int': XSD.integer,
        'boolean': XSD.boolean,
        'long': XSD.integer,
        'float': XSD.float,
        # Add more mappings as needed
    }
    return datatype_mapping.get(datatype, None)
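
The order of the steps in `clean_string` matters: the German replacements run before accent stripping. The sketch below is a pandas-free restatement written for illustration; `to_identifier` and its `transliterate_first` switch are hypothetical and exist only to compare the two orders.

```python
import re
import unicodedata

GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}

def strip_accents(text: str) -> str:
    """Drop combining marks after NFKD decomposition (for example é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def to_identifier(text: str, transliterate_first: bool = True) -> str:
    """Turn free text into an ASCII identifier with underscores as separators."""
    if transliterate_first:
        for char, replacement in GERMAN.items():
            text = text.replace(char, replacement)
    text = strip_accents(text)
    return re.sub(r"[^a-zA-Z0-9]+", "_", text).strip("_")

name = "Fahrbahnübergang / Straße"
print(to_identifier(name))                             # German spelling preserved
print(to_identifier(name, transliterate_first=False))  # umlaut flattened, ß lost
```

Without the replacement table, accent stripping alone reduces "ü" to "u" and leaves "ß" to be replaced by an underscore, so names would no longer match spellings such as `Fahrbahnuebergang` used in `valid_categories`.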


In [30]:
def json_rdf_converter(df, valid_categories):
    """
    Converts a DataFrame of SPO triples to an RDF graph and saves it as a Turtle file.

    Args:
        df (pd.DataFrame): DataFrame containing the SPO triples.
        valid_categories (list): List of valid categories for filtering the data.
    """

    # Define Namespace
    ASB = Namespace("http://asb-example.org/")

    # Create graph and bind namespaces
    g = Graph()
    g.bind("asb", ASB)

    # Clean the relevant columns
    df['CATEGORY'] = df['CATEGORY'].apply(clean_string)
    df['PROPERTY'] = df['PROPERTY'].apply(clean_string)

    for _, row in df.iterrows():
        # Get the subject from the row and clean it
        category = row['CATEGORY']

        # Check if the class is in the list of valid_categories
        if category not in valid_categories:
            continue

        # Create RDF triples
        subject = URIRef(ASB[f"{clean_string(row['ASBID'])}"])
        predicate = URIRef(ASB[row['PROPERTY']])

        # Use the value in 'OBJECT' column directly
        if pd.notna(row['OBJECT']):
            datatype = row['DATATYPE'] if isinstance(row['DATATYPE'], str) else None

            if datatype:  # If datatype is present, create a Literal with datatype
                xsd_datatype = get_xsd_datatype(datatype)
                obj = Literal(row['OBJECT'], datatype=xsd_datatype) if xsd_datatype else Literal(row['OBJECT'])
            else:  # If datatype is empty, create a resource with 'asb:' prefix
                obj = URIRef(ASB[clean_string(row['OBJECT'])])
        else:
            continue

        # Type the subject by its category and add the property triple.
        # The triple is added unconditionally: rdflib literals such as 0 or
        # False are falsy, so a truthiness check would silently drop them.
        g.add((subject, RDF.type, URIRef(ASB[f"ASB_Pset_{row['CATEGORY']}"])))
        g.add((subject, predicate, obj))

    # Serialize the graph to the default output file
    output_file = os.path.join("output", "asb_graph.ttl")  # Default output path
    g.serialize(format='turtle', destination=output_file)
    print(f"{output_file} created and saved with {len(g)} triples")

    return g

'''
# Execution
valid_categories = ['Bauwerk', 'Teilbauwerk', 'Bruecke',
                   'BelagAbdichtung', 'Ausstattungen', 'fhrb_bel.csv', 'Gruendung', 'Kappe', 'Lager',
                   'Leitung', 'Schutzeinrichtungen', 'StatischesSystem_Tragfaehigkeit', 'Vorspannung',
                   'Fahrbahnuebergang', 'Feld']

graph = json_rdf_converter(df_SPO, valid_categories)


# Print numbers of triples
print(f"Number of triples: {len(graph)}")

# Print numbers of instances
instances = set()
for s, p, o in graph:
    instances.add(s)
print(f"Number of instances: {len(instances)}") '''

# 4. Graph Merging

In [31]:
def graph_mapper(instance_graph_data, mapping_file):
    """
    Links IFC instances to ASB properties using SPARQL queries.

    Args:
        instance_graph_data (str or rdflib.Graph): Path to the instance graph
                                                 file in Turtle format or an
                                                 rdflib.Graph object.
        mapping_file (str): Path to the mapping Excel file.
    """

    # Create graph and get namespaces
    g, ns = create_and_bind_graph()

    # Load instance graph
    if isinstance(instance_graph_data, str):
        # Input is a file path, parse the TTL file
        g.parse(instance_graph_data, format='turtle')
    elif isinstance(instance_graph_data, Graph):
        # Input is an rdflib.Graph object, use it directly
        g = instance_graph_data
    else:
        raise TypeError("Invalid input type. Expected str (file path) or rdflib.Graph object.")

    # Load mapping file
    mappings = pd.read_excel(mapping_file, sheet_name="instance-alignment")

    # Process mappings
    for _, row in mappings.iterrows():
        # Skip if either field is empty or NaN
        if pd.isna(row.ifc_instance_id) or pd.isna(row.asb_instance_id):
            continue

        # Split and clean IFC instances
        ifc_instances = [inst.strip() for inst in str(row.ifc_instance_id).split(',') if inst.strip()]

        # Split and clean ASB instances
        asb_instances = [inst.strip() for inst in str(row.asb_instance_id).split(',') if inst.strip()]

        # Create mappings between all combinations of IFC and ASB instances
        for ifc_instance in ifc_instances:
            # Skip empty IFC instances
            if not ifc_instance:
                continue

            # Create SPARQL query for each IFC instance
            for asb_instance in asb_instances:
                # Skip empty ASB instances
                if not asb_instance:
                    continue

                # Create and execute SPARQL query for each combination
                query = f"""
                CONSTRUCT {{
                    ?inst <{ns['INST']}hasAsbPset> <{ns['ASB']}{asb_instance}>
                }}
                WHERE {{
                    ?inst a ?type .
                    FILTER(STRENDS(STR(?inst), "{ifc_instance}"))
                }}
                """

                # Execute query and add results to graph
                results = g.query(query)
                for triple in results:
                    g.add(triple)

    # Serialize the graph to the default output file
    # output_file = os.path.join("output", "mapped_graph.ttl")  # Default output path
    # g.serialize(format='turtle', destination=output_file)
    print(f"Mapped graph created with {len(g)} triples")

    return g

# Usage
# mapped_graph = graph_mapper("/content/output/ifc_graph.ttl", "/content/data/mapping.xlsx")
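
Two details of `graph_mapper` are worth trying on plain strings: how comma-separated cells expand into IFC/ASB pairs, and how suffix matching with `STRENDS` behaves. In the sketch below, `split_ids`, the cell contents and the instance ids are hypothetical. Only the base URI is taken from `create_namespaces`.

```python
from itertools import product

def split_ids(cell) -> list:
    """Split a comma-separated spreadsheet cell into trimmed, non-empty ids."""
    return [part.strip() for part in str(cell).split(",") if part.strip()]

# Hypothetical row of the 'instance-alignment' sheet
ifc_cell, asb_cell = "123, 456,", "5309701_Lager"
pairs = list(product(split_ids(ifc_cell), split_ids(asb_cell)))
print(pairs)

# Suffix matching in the style of STRENDS(STR(?inst), "<id>")
base = "http://ifc-instance.org/instances/"
uris = [base + "123", base + "5123"]
loose = [u for u in uris if u.endswith("123")]    # also matches .../5123
strict = [u for u in uris if u.endswith("/123")]  # anchored at the path separator
print(len(loose), len(strict))
```

If instance ids in a model share a suffix, a plain suffix filter may link more instances than intended. Anchoring the suffix at the preceding `/` is one possible way to tighten the match. This is an observation about string matching, not a reported problem with the case-study data.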

In [32]:
from rdflib import Graph

def graphs_merger(input_graphs: list, output_file: str = None):
    """
    Merges multiple RDF graphs into a single graph.

    Args:
        input_graphs (list): A list of file paths (str) or Graph objects.
        output_file (str, optional): Path to save the merged graph. Defaults to None.
    """

    # Create a new graph for the merged content
    merged_graph = Graph()

    loaded_graphs = []  # Graphs that were merged, kept to copy their prefixes

    for input_graph in input_graphs:
        if isinstance(input_graph, str):
            # Input is a file path: load it into a temporary graph
            try:
                graph = Graph()
                graph.parse(input_graph, format="turtle")
            except Exception as e:
                print(f"Error loading {input_graph}: {e}")
                continue
            print(f"Loaded {len(graph)} triples from {input_graph}")
        elif isinstance(input_graph, Graph):
            # Input is a Graph object
            graph = input_graph
            print(f"Loaded {len(graph)} triples.")
        else:
            print(f"Skipping invalid input: {input_graph}")
            continue
        merged_graph += graph
        loaded_graphs.append(graph)

    # Copy namespace bindings that the merged graph does not define yet
    known_prefixes = {prefix for prefix, _ in merged_graph.namespaces()}
    for graph in loaded_graphs:
        for prefix, namespace in graph.namespaces():
            if prefix not in known_prefixes:
                merged_graph.bind(prefix, namespace)
                known_prefixes.add(prefix)

    # Serialize to the given path, or to the default output file
    if output_file is None:
        output_file = os.path.join("output", "merged_graph.ttl")
    merged_graph.serialize(format='turtle', destination=output_file)
    print(f"{output_file} created and saved with {len(merged_graph)} triples.")

    return merged_graph

# 5. Ontology Linking

In [33]:
def onto_mapper(instance_graph_data, mapping_csv):
    """
    Links an IFC instance graph with external ontologies using mapping definitions.

    Args:
        instance_graph_data (str or rdflib.Graph): Path to the instance graph file
                                                 in Turtle format or an rdflib.Graph object.
        mapping_csv (str): Path to the mapping Excel file

    Returns:
        rdflib.Graph: The enriched graph with ontology linkages
    """
    # Create graph and get namespaces
    g, ns = create_and_bind_graph()

    # Load instance data
    if isinstance(instance_graph_data, str):
        # Input is a file path, parse the Turtle file
        g.parse(instance_graph_data, format='turtle')
    elif isinstance(instance_graph_data, Graph):
        # Input is an rdflib.Graph object, use it directly
        g = instance_graph_data
    else:
        raise TypeError("Invalid input type. Expected str (file path) or rdflib.Graph object.")

    # Load mapping file
    mappings = pd.read_excel(mapping_csv, sheet_name="instance-alignment")

    # Process each mapping
    for _, row in mappings.iterrows():
        # Check if necessary columns have valid values before constructing the query
        if pd.notna(row.ontology_prefix) and pd.notna(row.ontology_class) and pd.notna(row.ifc_class):
            query = f"""
            INSERT {{
                ?inst owl:equivalentClass {row.ontology_prefix}:{row.ontology_class} .
            }}
            WHERE {{
                ?inst a bsdd:{row.ifc_class} .
            }}
            """
            g.update(query)
        else:
            print(f"Skipping row due to missing values: {row}")

    # Serialize the graph to the default output file
    # output_file = os.path.join("output", "mapped_onto_graph.ttl")  # Default output path
    # g.serialize(format='turtle', destination=output_file)
    print(f"\nMapped Onto graph created with {len(g)} triples")

    return g

# Example usage
# linked_graph = onto_mapper("/content/output/merged_graph.ttl", "/content/data/mapping.xlsx")

In [34]:
import requests

def onto_linker(graph_data, ontology_urls=None):
    """
    Links an RDF graph with external ontologies specified in `ontology_urls`.

    Args:
        graph_data (str or rdflib.Graph): Path to the RDF graph file (Turtle format)
                                        or an rdflib.Graph object.
        ontology_urls (list, optional): List of URLs to external ontologies. Defaults to None.

    Returns:
        rdflib.Graph: The linked RDF graph.
    """

    # Create graph and get namespaces
    g, ns = create_and_bind_graph()

    # Load the main graph
    if isinstance(graph_data, str):
        # Input is a file path, parse the Turtle file
        g.parse(graph_data, format="turtle")
    elif isinstance(graph_data, Graph):
        # Input is an rdflib.Graph object, use it directly
        g = graph_data
    else:
        raise TypeError("Invalid input type for graph_data. Expected str (file path) or rdflib.Graph object.")


    # Stop if ontology_urls is None
    if ontology_urls is None:
        print("No ontology URLs provided. Stopping execution.")
        return g  # Return the graph as is without linking

    # Load each ontology
    for url in ontology_urls:
        try:
            response = requests.get(url)
            response.raise_for_status()  # Raises an HTTPError for bad responses
            g.parse(data=response.text, format="turtle")
        except requests.exceptions.RequestException as e:
            print(f"Failed to load ontology from {url}. Error: {str(e)}")

    # Serialize the graph to the default output file
    output_file = os.path.join("output", "linked_graph.ttl")  # Default output path
    g.serialize(format='turtle', destination=output_file)
    print(f"{output_file} created and saved with {len(g)} triples")

    return g

'''
# Example usage
custom_urls = ["https://example.com/ontology1.ttl",
               "https://example.com/ontology2.ttl"
               ]
g = onto_linker("ifc_linked.ttl", custom_urls)'''

# 6. Graph Completion

In [35]:
from rdflib import Graph
from owlrl import DeductiveClosure, OWLRL_Semantics

def basic_reasoner(graph_path, output_path):
    """
    Performs OWL 2 RL reasoning on an RDF graph using owlrl.

    Args:
        graph_path (str): Path to the input RDF graph file (Turtle format).
        output_path (str): Path to save the inferred graph (Turtle format).
    """
    # Load the graph
    g = Graph()
    g.parse(graph_path, format="turtle")

    # Create a deductive closure object
    dc = DeductiveClosure(OWLRL_Semantics)

    # Expand the graph with inferences
    dc.expand(g)

    # Save the inferred graph
    g.serialize(destination=output_path, format="turtle")

    print(f"Inferred graph saved to: {output_path}")
    print(f"Total triples after inference: {len(g)}")


# Example usage
# graph_path = "/content/output/linked_graph.ttl"  # Path to your input graph
# output_path = "/content/output/inferred_graph.ttl"  # Path to save the inferred graph
# basic_reasoner(graph_path, output_path)

# Visualisation

In [27]:
from rdflib import Graph, URIRef, BNode
from pyvis.network import Network

def visualize_rdf(input_data, output_html="graph.html"):
    """
    Parses an RDF Turtle file and generates an interactive visualization as an HTML file,
    incorporating namespace prefixes and color-coding nodes based on their namespaces.

    Parameters:
    input_data (str or rdflib.Graph): Path to the input Turtle (.ttl) file, or an already loaded graph.
    output_html (str): Path to save the generated HTML file.
    """
    # Initialize RDF Graph
    g = Graph()

    # Check input type and load data accordingly
    if isinstance(input_data, str):
        # Input is a file path, parse the TTL file
        g.parse(input_data, format='turtle')
    elif isinstance(input_data, Graph):
        # Input is an rdflib.Graph object, use it directly
        g = input_data
    else:
        raise TypeError("Invalid input type. Expected str (file path) or rdflib.Graph object.")

    # Extract namespace prefixes and assign colors
    namespace_colors = {}
    prefix_color_mapping = {
        'inst': '#003366',
        'asb': '#FF6B6B',
        'bsdd': '#006D5B',
        'prop': '#228B22'
    }
    default_namespace_color = "#e3c598"  # Light tan for undefined prefixes

    # Assign colors based on prefix
    for prefix, namespace in g.namespaces():
        if prefix in prefix_color_mapping:
            namespace_colors[str(namespace)] = prefix_color_mapping[prefix]
        else:
            namespace_colors[str(namespace)] = default_namespace_color

    # Create sets for nodes and edges
    nodes = {}
    edges = []

    # Default colors and sizes
    default_colors = {"URIRef": "#e3c598", "BNode": "lightgray", "Literal": "#2b2b2b"}
    node_size = {"URIRef": 15, "BNode": 10, "Literal": 10}

    # Process triples
    for s, p, o in g:
        # Handle subject
        s_id = str(s)
        s_namespace = next((ns for ns in namespace_colors if s_id.startswith(ns)), None)
        s_color = namespace_colors.get(s_namespace, default_colors["URIRef"])

        if isinstance(s, URIRef):
            s_label = s_id.split('/')[-1].split('#')[-1]
            s_type = "URIRef"
        elif isinstance(s, BNode):
            s_label = f"BNode:{s_id}"
            s_type = "BNode"
            s_color = default_colors["BNode"]
        else:
            s_label = s_id
            s_type = "Literal"
            s_color = default_colors["Literal"]
        nodes[s_id] = {"label": s_label, "type": s_type, "color": s_color}

        # Handle object
        o_id = str(o)
        o_namespace = next((ns for ns in namespace_colors if o_id.startswith(ns)), None)
        o_color = namespace_colors.get(o_namespace, default_colors["URIRef"])

        if isinstance(o, URIRef):
            o_label = o_id.split('/')[-1].split('#')[-1]
            o_type = "URIRef"
        elif isinstance(o, BNode):
            o_label = f"BNode:{o_id}"
            o_type = "BNode"
            o_color = default_colors["BNode"]
        else:
            o_label = str(o)
            o_type = "Literal"
            o_color = default_colors["Literal"]
        nodes[o_id] = {"label": o_label, "type": o_type, "color": o_color}

        # Handle predicate
        p_label = str(p).split('/')[-1].split('#')[-1]
        edges.append((s_id, o_id, p_label))

    # Create network
    net = Network(
        notebook=False,
        height="3000px",
        width="200%",
        bgcolor="#ffffff",
        font_color="black",
        select_menu=False,
        filter_menu=False,
        layout=False
    )

    # Add nodes with namespace-based colors
    for node_id, node_data in nodes.items():
        net.add_node(
            node_id,
            label=node_data["label"],
            title=node_id,
            color=node_data["color"],
            size=node_size[node_data["type"]],
            font={'size': 20, 'strokeWidth': 0}
        )

    # Add edges with labels and styles
    for s, o, p in edges:
        net.add_edge(
            s, o,
            label=p,  # Edge label
            title=p,  # Tooltip on hover
            font={"size": 20, "color": "#4a7ebb", "align": "middle"},
            inherit = True,
            color="#1f4c7a",
            arrows="to",
            length=100
        )

    # Configure physics settings for better layout
    net.toggle_physics(True)
    net.set_options("""
    {
      "physics": {
        "forceAtlas2Based": {
          "gravitationalConstant": -50,
          "centralGravity": 0.005,
          "springLength": 100
        },
        "minVelocity": 0.75,
        "solver": "forceAtlas2Based"
      },
      "layout": {
            "randomSeed": 42,
            "improvedLayout": true,
            "hierarchical": {"enabled": false, "direction": "UD","sortMethod": "hubsize"}
      },
      "interaction": {
        "hover": true,
        "navigationButtons": true
      },
      "nodes": {
        "font": {
          "size": 30,
          "face": "arial"
          }
        }
      }
    """)

    # Save graph to file
    net.save_graph(output_html)
    print(f"Visualization saved as {output_html}")

# Example usage:
# visualize_rdf("/content/output/merged_graph.ttl", "knowledge_graph.html")
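
The label and color logic of `visualize_rdf` can be tried in isolation. The sketch below is a simplified variant under one assumption: when several namespaces prefix the same URI (the `bsdd` and `prop` namespaces used in this notebook overlap), the longest one should win. The helper names `short_label` and `color_for` and the sample property URI are hypothetical.

```python
def short_label(uri):
    """Keep the part after the last '/' and then after the last '#'."""
    return uri.split('/')[-1].split('#')[-1]

def color_for(uri, namespace_colors, default="#e3c598"):
    """Pick the color of the longest namespace that prefixes the URI."""
    matches = [ns for ns in namespace_colors if uri.startswith(ns)]
    if not matches:
        return default
    return namespace_colors[max(matches, key=len)]

# Namespaces and colors taken from this notebook; the URI below is illustrative
namespace_colors = {
    "https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/": "#006D5B",
    "https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/": "#228B22",
}
uri = "https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/Name"

print(short_label(uri), color_for(uri, namespace_colors))
print(short_label("https://w3id.org/brot#Bridge"))
```

Taking the first matching namespace instead of the longest would make the color of `prop` nodes depend on the order in which the prefixes were bound.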


Before visualizing the project TTL file, make sure it is encoded as UTF-8: it may contain German characters (ä, ö, ü, ß, etc.) that some programs cannot read correctly. For now, these problematic characters can be ignored; they can be cleaned up properly later.
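
One possible way to normalize the encoding is sketched below. It assumes that a file which is not valid UTF-8 was written in Windows-1252 (`cp1252`); that fallback, the helper name `ensure_utf8`, and the demo file names are assumptions rather than part of the pipeline.

```python
import tempfile
from pathlib import Path

def ensure_utf8(src_path, dst_path, fallback_encoding="cp1252"):
    """Rewrite a text file as UTF-8.

    The bytes are first decoded as UTF-8; if that fails, the assumed
    fallback encoding is used and undecodable bytes are replaced.
    """
    raw = Path(src_path).read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(fallback_encoding, errors="replace")
    Path(dst_path).write_text(text, encoding="utf-8")
    return text

# Demo on a temporary file written in cp1252 (illustrative content)
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "legacy.ttl"
    dst = Path(tmp) / "clean.ttl"
    src.write_bytes('ex:b ex:label "Brücke" .'.encode("cp1252"))
    ensure_utf8(src, dst)
    result = dst.read_text(encoding="utf-8")
print(result)
```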

# Execution

In [36]:
''' IFC to RDF conversion '''

# Input Data
ifc_file = "/content/data/UKA_UK_Aachen_IFC_02.ifc"

# Process IFC and generate IFC Graph
ifc_attr = ifc_extractor(ifc_file)  # optional
ifc_graph = ifc_rdf_converter(ifc_file)

# Visualisation
# visualize_rdf(ifc_graph, "/content/output/ifc_graph.html")

output/ifc_attributes.csv created and saved with 41 attributes
output/ifc_graph.ttl created and saved with 202 triples


In [37]:
''' JSON/ASB data preprocessing and conversion to RDF '''

# Input Data
json_file = "/content/data/extracted_B115.json"
mapping_table = "/content/data/mapping.xlsx"
valid_categories = ['Bauwerk', 'Teilbauwerk', 'Bruecke',
                   'BelagAbdichtung', 'Ausstattungen', 'fhrb_bel.csv', 'Gruendung', 'Kappe', 'Lager',
                   'Leitung', 'Schutzeinrichtungen', 'StatischesSystem_Tragfaehigkeit', 'Vorspannung',
                   'Fahrbahnuebergang', 'Feld']


# Process JSON/ASB data, create SPO and generate ASB Graph
asb_df = main_spo_extractor(json_file,mapping_table)
asb_graph = json_rdf_converter(asb_df, valid_categories)

# Visualisation
# visualize_rdf(asb_graph, "/content/output/asb_graph.html")


Statistics Key-Value Map:
Total 15-digit values processed: 5776
Matches found: 5018
Number of files processed: 62

Total rows in SPO: 29464 

output/asb_graph.ttl created and saved with 554 triples


In [38]:
''' Map and Merge instance graphs (IFC and ASB) '''

# Map IFC instances and ASB instances
mapped_graph = graph_mapper(ifc_graph, mapping_table)

# Input Data
input_graphs = [mapped_graph, asb_graph] # add more graphs as needed

# Merge instance graphs and generate a new graph
merged_graph = graphs_merger(input_graphs)

# Visualisation
# visualize_rdf(merged_graph, "/content/output/merged_graph.html")

Mapped graph created with 241 triples
Loaded 241 triples.
Loaded 554 triples.
output/merged_graph.ttl created and saved with  795 triples .


In [39]:
''' Map and Link merged graph with ontologies '''

# Map IFC classes and Ontology concepts
mapped_onto_graph = onto_mapper(merged_graph, mapping_table)

# Input Data
onto_urls = ["https://alhakam.github.io/brcomp/ontology.ttl",
            "https://alhakam.github.io/brot/ontology.ttl"
               ] # add more links as needed

# Link merge graphs and ontology classes and generate a new linked graph
linked_graph = onto_linker(mapped_onto_graph, onto_urls)

# Visualisation
# visualize_rdf(linked_graph, "/content/output/linked_graph.html")


Mapped Onto graph created with 837 triples
output/linked_graph.ttlh created and saved with 1371 triples


In [40]:
''' Apply basic reasoning by OWL 2 RL '''

graph_path = "/content/output/linked_graph.ttl"  # Path to input graph
output_path = "/content/output/enriched_graph.ttl"  # Path to save the inferred graph

# Function to generate enriched graph from linked graph
basic_reasoner(graph_path, output_path)

# Visualisation
# visualize_rdf(output_path, "/content/output/enriched_graph.html")

Inferred graph saved to: /content/output/enriched_graph.ttl
Total triples after inference: 5115


# Querying and Validation

In [44]:
import pandas as pd
from rdflib import Graph

### Querying instance graphs

In [129]:
# Query the IFC graph for aggregation relations (IfcRelAggregates) and the types of the related elements.

# Load the RDF graph
ifc_graph = Graph()
ifc_graph.parse("/content/output/ifc_graph.ttl", format="turtle")

# Query the graph
query1 = """
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?subj ?subjType ?obj ?objType
WHERE {
    ?subj inst:IfcRelAggregates ?obj .
    ?subj a ?subjType .
    ?obj a ?objType .
}
"""

results_1 = ifc_graph.query(query1)

# Print the results
for row in results_1:
    print(f"{row.subjType} aggregates to {row.objType}")

https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcSite aggregates to https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/IfcBridgeGIRDER


In [98]:
# Query IFC graph to extract elements that are of bSDD IfcAbutment.

# Load the RDF graph
ifc_graph = Graph()
ifc_graph.parse("/content/output/ifc_graph.ttl", format="turtle")

# Query the graph
query1 = """
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?element WHERE {
    ?element a bsdd:IfcBridgePartABUTMENT .

}
"""

results_1 = ifc_graph.query(query1)

# Print the results
for row in results_1:
    print(f"{row.element} is bsdd:IfcBridgePartAbutment")

http://ifc-instance.org/instances/43395 is bsdd:IfcBridgePartAbutment
http://ifc-instance.org/instances/43741 is bsdd:IfcBridgePartAbutment


In [103]:
# Query the ASB graph for the number of elements associated with each ASB property set (here ASB_Pset_Feld, bridge support structures).

# Load the RDF graph
asb_graph = Graph()
asb_graph.parse("/content/output/asb_graph.ttl", format="turtle")

# Query the graph
query2 = """
PREFIX asb: <http://asb-example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?pset ?elements WHERE {
    ?pset a asb:ASB_Pset_Feld .
    ?pset asb:ANZAHL_ST ?elements .
}
"""

results_2 = asb_graph.query(query2)

# Print the results
for row in results_2:
    print(f"Pset {row.pset} is associated with {row.elements} bridge components.")

Pset http://asb-example.org/4B0W1WZO is associated with 1 bridge components.
Pset http://asb-example.org/4B0W1WZP is associated with 3 bridge components.
Pset http://asb-example.org/4B0W1WZQ is associated with 3 bridge components.
Pset http://asb-example.org/4B0W1WZR is associated with 1 bridge components.


### Querying merged graph

In [115]:


# Load the graph
g = Graph()
g.parse("/content/output/merged_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX asb: <http://asb-example.org/>
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?ifcInst ?ifcClass ?psetInstance ?psetClass ?property ?value
WHERE {
  ?ifcInst a bsdd:IfcBridgePartABUTMENT .
  ?ifcInst inst:hasAsbPset ?psetInstance .
  ?psetInstance a ?psetClass .
}
"""

# Execute the query
results = g.query(query)

# Print the results
for row in results:
    print(f"IFC Instance: {row.ifcInst} has ASB Pset: {row.psetInstance} that is of type {row.psetClass}.")

IFC Instance: http://ifc-instance.org/instances/43395 has ASB Pset: http://asb-example.org/4B0W1WZO that is of type http://asb-example.org/ASB_Pset_Feld.
IFC Instance: http://ifc-instance.org/instances/43741 has ASB Pset: http://asb-example.org/4B0W1WZR that is of type http://asb-example.org/ASB_Pset_Feld.


In [119]:
# Extract properties of IfcBridgePartAbutment type
from collections import defaultdict

# Load the graph
g = Graph()
g.parse("/content/output/merged_graph.ttl", format="turtle")

# Define the SPARQL query
query_1 = """
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX asb: <http://asb-example.org/>
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?ifcType ?psetInstance ?psetClass ?property ?value
WHERE {
  ?ifcType a bsdd:IfcBridgePartABUTMENT .
  ?ifcType inst:hasAsbPset ?psetInstance .
  ?psetInstance a ?psetClass .
  ?psetInstance ?property ?value .
}
"""

# Execute the query
results = g.query(query_1)

# Group results by psetClass and then by psetInstance
pset_data = defaultdict(lambda: defaultdict(dict))
for row in results:
    pset_data[row.psetClass][row.psetInstance][row.property] = row.value

# Print the first psetInstance and its values for each psetClass
print("bsdd:IfcBridgePartABUTMENT has the following properties: ")
for psetClass, instances_data in pset_data.items():
    print(f"Pset Category: {psetClass}")
    if instances_data:
        first_instance = next(iter(instances_data))  # Get the first psetInstance
        print(f"Pset ID: {first_instance}")
        for prop_uri, value in instances_data[first_instance].items():  # avoid shadowing the built-in `property`
            print(f"{prop_uri}: {value}")
        print()
    else:
        print("No instances found for this psetClass.")

bsdd:IfcBridgePartABUTMENT has the following properties: 
Pset Category: http://asb-example.org/ASB_Pset_Feld
Pset ID: http://asb-example.org/4B0W1WZO
http://www.w3.org/1999/02/22-rdf-syntax-ns#type: http://asb-example.org/ASB_Pset_Feld
http://asb-example.org/AMT: 533
http://asb-example.org/ANZAHL_ST: 1
http://asb-example.org/ART: http://asb-example.org/40011100000000
http://asb-example.org/BEARBEITER: CODEKERK
http://asb-example.org/BEARB_DAT: 2002-08-27T08:34:54.999000
http://asb-example.org/BEMERKUNG: Widerlager 1
***
http://asb-example.org/BWNR: http://asb-example.org/B115
http://asb-example.org/FELDNR: http://asb-example.org/0
http://asb-example.org/FELD_NR: http://asb-example.org/1
http://asb-example.org/IDENT: http://asb-example.org/4B0W1WZO
http://asb-example.org/ID_NR: http://asb-example.org/B115_0
http://asb-example.org/REF_BRUCKE: http://asb-example.org/4B0W1X9V
http://asb-example.org/REF_FELDER: http://asb-example.org/4B0W1X9W
http://asb-example.org/SCHIFF_OEF: http://asb-
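
The full URIs in the output above are hard to scan. A small helper that shortens them to prefixed names can make such listings more readable; the sketch below is one possible approach, with `compact` and `PREFIXES` being hypothetical names and the prefix table limited to two namespaces used in this notebook.

```python
# Prefix table (an assumption: extend it with the namespaces you need)
PREFIXES = {
    "asb": "http://asb-example.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

def compact(value, prefixes=PREFIXES):
    """Shorten a URI to prefix:local form; return other values unchanged."""
    best = None
    for prefix, ns in prefixes.items():
        # Prefer the longest matching namespace
        if value.startswith(ns) and (best is None or len(ns) > len(best[1])):
            best = (prefix, ns)
    if best is None:
        return value
    return f"{best[0]}:{value[len(best[1]):]}"

print(compact("http://asb-example.org/AMT"))
print(compact("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
print(compact("533"))
```

In the printing loop, `compact(str(prop_uri))` and `compact(str(value))` could then replace the raw values.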

### Querying linked graph

In [135]:
# Load the graph
g = Graph()
g.parse("/content/output/linked_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX asb: <http://asb-example.org/>
PREFIX bsdd: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/>
PREFIX inst: <http://ifc-instance.org/instances/>
PREFIX prop: <https://identifier.buildingsmart.org/uri/buildingsmart/ifc/4.3/class/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?ifcType ?brcompClass ?brotClass
WHERE {
  {?ifcType owl:equivalentClass ?brotClass} UNION {?brotClass owl:equivalentClass ?ifcType} .
  {?ifcType owl:equivalentClass ?brcompClass} UNION {?brcompClass owl:equivalentClass ?ifcType}
} LIMIT 5
"""
# Execute the query
results = g.query(query)

# Print the results
print("Query Results:")
for row in results:
    print("-" * 20)  # Separator for each row
    if row.ifcType:
        print(f"IFC Type: {row.ifcType}")
    if row.brcompClass:
        print(f"brcomp Class: {row.brcompClass}")
    if row.brotClass:
        print(f"brot Class: {row.brotClass}")
print("-" * 20)  # Final sep

Query Results:
--------------------
IFC Type: http://ifc-instance.org/instances/117644
brcomp Class: https://w3id.org/brcomp#Railing
brot Class: https://w3id.org/brcomp#Railing
--------------------
IFC Type: http://ifc-instance.org/instances/80795
brcomp Class: https://w3id.org/brcomp#Railing
brot Class: https://w3id.org/brcomp#Railing
--------------------
IFC Type: http://ifc-instance.org/instances/32
brcomp Class: https://w3id.org/brot#Bridge
brot Class: https://w3id.org/brot#Bridge
--------------------
IFC Type: http://ifc-instance.org/instances/35
brcomp Class: https://w3id.org/brot#Site
brot Class: https://w3id.org/brot#Site
--------------------
IFC Type: http://ifc-instance.org/instances/42750
brcomp Class: https://w3id.org/brcomp#Cap
brot Class: https://w3id.org/brcomp#Cap
--------------------


In [162]:
# Load the graph
g = Graph()
g.parse("/content/output/linked_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?ifcElement
WHERE {
  {
    ?ifcElement rdf:type brot:SubStructure .
  } UNION {
    ?ifcElement rdf:type brcomp:SubStructureComponent .
  }
}
"""

# Execute the query
results = g.query(query)

# Print the results
print("IFC Types that are brot:SubStructure or brcomp:SubStructureComponent:")
for row in results:
    print(row.ifcElement)  # the query selects ?ifcElement

IFC Types that are brot:SubStructure or brcomp:SubStructureComponent:


### Querying enriched graph

In [163]:
# Load the graph
g = Graph()
g.parse("/content/output/enriched_graph.ttl", format="turtle")

# Define the SPARQL query
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?ifcElement
WHERE {
  {
    ?ifcElement rdf:type/rdfs:subClassOf* brot:SubStructure .
  } UNION {
    ?ifcElement rdf:type/rdfs:subClassOf* brcomp:SubStructureComponent .
  }
}
"""

# Execute the query
results = g.query(query)

# Print the results
print("IFC Types that are brot:SubStructure or brcomp:SubStructureComponent:")
for row in results:
    print(row.ifcElement)  # the query selects ?ifcElement

IFC Types that are brot:SubStructure or brcomp:SubStructureComponent:


In [164]:
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX brot: <https://w3id.org/brot#>
PREFIX brcomp: <https://w3id.org/brcomp#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?ifcElement
WHERE {
  {
    ?ifcElement owl:equivalentClass ?ontologyClass .
    ?ontologyClass rdfs:subClassOf* brot:SubStructure .
  } UNION {
    ?ifcElement owl:equivalentClass ?ontologyClass .
    ?ontologyClass rdfs:subClassOf* brcomp:SubStructureComponent .
  }

  # Optional: Add this if you want only elements from the 'inst' namespace
  FILTER(STRSTARTS(STR(?ifcElement), "http://ifc-instance.org/instances/"))
}
"""

# Load both graphs and execute the query
linked_graph = Graph()
linked_graph.parse("/content/output/linked_graph.ttl", format="turtle")
linked_graph_results = linked_graph.query(query)

enriched_graph = Graph()
enriched_graph.parse("/content/output/enriched_graph.ttl", format="turtle")
enriched_graph_results = enriched_graph.query(query)

# Print the results
print("Results from linked_graph:")
for row in linked_graph_results:
    print(row.ifcElement)

print("\nResults from enriched_graph:")
for row in enriched_graph_results:
    print(row.ifcElement)

Results from linked_graph:
http://ifc-instance.org/instances/43395
http://ifc-instance.org/instances/43741
http://ifc-instance.org/instances/42952
http://ifc-instance.org/instances/42976
http://ifc-instance.org/instances/43032
http://ifc-instance.org/instances/43048
http://ifc-instance.org/instances/43067
http://ifc-instance.org/instances/43090
http://ifc-instance.org/instances/43411
http://ifc-instance.org/instances/43757
http://ifc-instance.org/instances/43776
http://ifc-instance.org/instances/43792
http://ifc-instance.org/instances/42796
http://ifc-instance.org/instances/42819
http://ifc-instance.org/instances/42841
http://ifc-instance.org/instances/42863
http://ifc-instance.org/instances/42885
http://ifc-instance.org/instances/42894
http://ifc-instance.org/instances/42916
http://ifc-instance.org/instances/42925
http://ifc-instance.org/instances/42934
http://ifc-instance.org/instances/42999
http://ifc-instance.org/instances/43008
http://ifc-instance.org/instances/43017

Results from
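
Rather than comparing two long listings by eye, the two result sets can be compared directly. The sketch below is a minimal example using plain Python sets; `compare_results` is a hypothetical helper and the sample URIs are illustrative, not actual query results.

```python
def compare_results(before, after):
    """Split two collections of result values into shared and unique parts."""
    before, after = set(before), set(after)
    return {
        "only_before": sorted(before - after),
        "only_after": sorted(after - before),
        "shared": sorted(before & after),
    }

# Illustrative values only
linked = ["http://ifc-instance.org/instances/1", "http://ifc-instance.org/instances/2"]
enriched = ["http://ifc-instance.org/instances/1", "http://ifc-instance.org/instances/2",
            "http://ifc-instance.org/instances/3"]

diff = compare_results(linked, enriched)
print(f"Added by reasoning: {diff['only_after']}")

# With the query results from the cell above, the inputs could be built as:
# compare_results((str(r.ifcElement) for r in linked_graph_results),
#                 (str(r.ifcElement) for r in enriched_graph_results))
```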

# Testing

In [82]:
import json

with open("data/tables_from_book.json", "r", encoding="utf-8") as file:
    data = json.load(file)

# Since 'data' is a list, iterate directly through it
for item in data:
    # Assuming each 'item' is a dictionary, you can access its elements
    for key, value in item.items():
        print(f"{key}: {value}")

UI: 020061000000000
Pflicht 
		 bei Straßenbauverwaltung: 020061100000000
Pflicht 
		 bei Kreis: 020061200000000
Pflicht 
		 bei Gemeinde: 020061300000000
Pflicht 
		 bei Bezirk: 020061400000000
Pflicht 
		 bei WSV oder Dritten***: 020061500000000
Pflicht 
		 bei DB AG: 020061510000000
Pflicht 
		 bei Wasser- und Schifffahrtsverwaltung des Bundes: 020061520000000
Pflicht 
		 bei örtlichem Nahverkehrsunternehmen: 020061530000000
Pflicht 
		 bei Sonstigen: 020061540000000
RWE 
		 Power AG: 020061541000000
Privater 
		 Eigentümer: 020061542000000
Pflicht 
		 bei Betreibergesellschaft LKW-Maut: 020061550000000
Pflicht 
		 bei Betreibergesellschaft/Konzessionsnehmer: 020061560000000
Bundesländer 
		 01 - 08 ***: 020061561000000
01 
		 Schleswig - Holstein: 020061561100000
Bund: 020061561110000
Land: 020061561120000
02 
		 Freie u. Hansestadt Hamburg: 020061561200000
Bund: 020061561210000
Land: 020061561220000
03 
		 Niedersachsen: 020061561300000
Bund: 020061561310000
Land: 020061561320000


In [94]:
import json
import pandas as pd

def transform_to_df(json_file):
    """
    Transforms the data in the JSON file into a pandas DataFrame
    with columns "keys" and "description".

    Args:
        json_file (str): Path to the JSON file (tables_from_book.json).

    Returns:
        pd.DataFrame: The transformed DataFrame.
    """

    with open(json_file, 'r', encoding="utf-8") as file:
        data = json.load(file)

    # Create a list to store the rows for the DataFrame
    rows = []
    ui_value = None  # Stays None until a "UI" entry has been seen
    for item in data:
        for key, value in item.items():
            if key == "UI":
                ui_value = value  # Store the "UI" value (key)
            else:
                rows.append({"keys": ui_value, "description": key})  # Add row with key and description

    # Create the DataFrame using the list of rows
    df = pd.DataFrame(rows)
    return df

# Example usage
json_file = "data/tables_from_book.json"
df = transform_to_df(json_file)
df.to_csv("tables_from_book.csv", index=False)
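
The keys printed earlier contain embedded line breaks and tabs (for example `Pflicht`, a newline, two tabs, then `bei Kreis`), which would end up in the `description` column. A possible cleanup step, sketched here with a hypothetical helper `normalize_key`, collapses every run of whitespace into a single space.

```python
def normalize_key(key):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    return " ".join(key.split())

raw_key = "Pflicht \n\t\t bei Kreis"
print(normalize_key(raw_key))

# Possible use on the DataFrame built above:
# df["description"] = df["description"].map(normalize_key)
```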