# RDFSolve: AOP-wiki RDF - Complete Analysis

This notebook demonstrates VoID generation with full count aggregations:
1. Setting up an endpoint and graph
2. Generating comprehensive VoID descriptions using CONSTRUCT queries with COUNT aggregations
3. Extracting detailed schema from the VoID description
4. Analyzing the results as DataFrame and JSON


In [11]:
import pandas as pd
from rdfsolve.rdfsolve import RDFSolver
from rdfsolve.void_parser import VoidParser, generate_void_from_endpoint
import warnings
warnings.filterwarnings('ignore')

## Step 1: Configure Dataset Parameters

We'll configure the AOP-Wiki RDF dataset with its SPARQL endpoint and metadata.

In [12]:
# AOPWIKI configuration
endpoint_url = "https://aopwiki.rdf.bigcat-bioinformatics.org/sparql/"
dataset_name = "aopwikirdf_complete"
void_iri = "http://aopwiki.org/"
graph_uri = "http://aopwiki.org/"  # Specify the correct graph URI
working_path = "."

print(f"Dataset: {dataset_name}")
print(f"Endpoint: {endpoint_url}")
print(f"VoID IRI: {void_iri}")
print(f"Graph URI: {graph_uri}")
print(f"Mode: Complete (with COUNT aggregations)")

Dataset: aopwikirdf_complete
Endpoint: https://aopwiki.rdf.bigcat-bioinformatics.org/sparql/
VoID IRI: http://aopwiki.org/
Graph URI: http://aopwiki.org/
Mode: Complete (with COUNT aggregations)
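
Before running the heavier queries, it can help to confirm that the configured graph is actually reachable. The sketch below is not part of RDFSolve: `build_ask_request` and `graph_has_triples` are hypothetical helper names, and it assumes the endpoint accepts standard SPARQL-protocol GET requests and can return JSON results.

```python
import json
import urllib.parse
import urllib.request


def build_ask_request(endpoint_url: str, graph_uri: str) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request asking whether the graph holds any triple."""
    query = f"ASK {{ GRAPH <{graph_uri}> {{ ?s ?p ?o }} }}"
    url = f"{endpoint_url}?{urllib.parse.urlencode({'query': query})}"
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def graph_has_triples(endpoint_url: str, graph_uri: str, timeout: float = 30.0) -> bool:
    """Send the ASK query and return the boolean from the JSON result (needs network)."""
    request = build_ask_request(endpoint_url, graph_uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response)["boolean"])
```

In the notebook this would be called as `graph_has_triples(endpoint_url, graph_uri)`; a `False` result would suggest that the graph URI is wrong or empty.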


## Step 2: Initialize RDFSolver

Create an RDFSolver instance with our configuration.

In [13]:
try:
    # Initialize RDFSolver with our configuration
    solver = RDFSolver(
        endpoint=endpoint_url,
        path=working_path,
        void_iri=void_iri,
        dataset_name=dataset_name
    )
    
    print("RDFSolver initialized successfully")
    print(f"Endpoint: {solver.endpoint}")
    print(f"Dataset: {solver.dataset_name}")
    
except Exception as e:
    print(f"Error: {e}")

RDFSolver initialized successfully
Endpoint: https://aopwiki.rdf.bigcat-bioinformatics.org/sparql/
Dataset: aopwikirdf_complete


## Step 3: Generate Complete VoID Description

Generate VoID with full COUNT aggregations. This provides complete statistics but takes longer to execute.

Three CONSTRUCT queries get the partitions for classes, properties, and datatypes from the specified graph with complete count information.
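
To make the idea concrete, here is a rough sketch of what a class-partition CONSTRUCT query with a COUNT aggregation could look like. It is an illustration only and not necessarily the query RDFSolve issues: the function name `build_class_partition_query` and the partition IRI scheme are assumptions, while `void:classPartition`, `void:class`, and `void:entities` come from the VoID vocabulary.

```python
VOID_PREFIX = "http://rdfs.org/ns/void#"


def build_class_partition_query(graph_uri: str, void_iri: str) -> str:
    """Return a CONSTRUCT query emitting one void:classPartition per class."""
    return f"""
PREFIX void: <{VOID_PREFIX}>
CONSTRUCT {{
  <{void_iri}> void:classPartition ?partition .
  ?partition void:class ?class ;
             void:entities ?count .
}}
WHERE {{
  {{
    # Aggregates cannot appear in a CONSTRUCT template,
    # so the COUNT is computed in a sub-SELECT.
    SELECT ?class (COUNT(DISTINCT ?s) AS ?count)
    WHERE {{ GRAPH <{graph_uri}> {{ ?s a ?class }} }}
    GROUP BY ?class
  }}
  # Hypothetical partition IRI scheme: hash of the class IRI under void_iri.
  BIND(IRI(CONCAT("{void_iri}", "class_partition_", MD5(STR(?class)))) AS ?partition)
}}
""".strip()


print(build_class_partition_query("http://aopwiki.org/", "http://aopwiki.org/"))
```

The sub-SELECT is where the aggregation cost sits, which is presumably why the counted mode takes longer than one without counts.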

In [14]:
try:    
    # Generate VoID using CONSTRUCT query approach with full counts
    print("Generating complete VoID with COUNT aggregations...")
    
    void_graph = solver.void_generator(
        graph_uri=graph_uri,
        output_file=f"{dataset_name}_void.ttl",
        counts=True  # Full count aggregations
    )
    
    print(f"Graph contains {len(void_graph)} triples")
    print(f"Saved to: {dataset_name}_void.ttl")
    
except Exception as e:
    print(f"Error: {e}")

Generating complete VoID with COUNT aggregations...
Generating VoID from endpoint: https://aopwiki.rdf.bigcat-bioinformatics.org/sparql/
Using graph URI: http://aopwiki.org/
Starting query: class_partitions
Finished query: class_partitions (took 0.08s)
Starting query: property_partitions
Finished query: property_partitions (took 0.17s)
Starting query: datatype_partitions
Finished query: datatype_partitions (took 0.65s)
VoID description saved to aopwikirdf_complete_void.ttl
VoID generation completed successfully
Graph contains 961 triples
Saved to: aopwikirdf_complete_void.ttl


## Step 4: Extract Schema from Complete VoID

Calling `solver.extract_schema()` returns a `VoidParser`, which extracts the schema structure from the generated VoID description.

In [15]:
try:
    # Extract schema
    parser = solver.extract_schema()

    # Get schema as DataFrame
    schema_df = parser.to_schema(filter_void_nodes=True)

    print(f"Total schema triples: {len(schema_df)}")
    print(f"Unique classes: {schema_df['subject_class'].nunique()}")
    print(f"Unique properties: {schema_df['property'].nunique()}")
    
except Exception as e:
    print(f"Schema extraction failed: {e}")

Total schema triples: 240
Unique classes: 26
Unique properties: 65


## Step 5: Schema Visualization

Display a sample of the extracted schema, filtering out generic classes.

In [16]:
# Display schema sample (excluding generic classes)
display(schema_df[~schema_df.object_class.isin(["Class", "Resource"])].head(10))

| | subject_class | subject_uri | property | property_uri | object_class | object_uri |
|---|---|---|---|---|---|---|
| 0 | KeyEvent | http://aopkb.org/aop_ontology#KeyEvent | PATO_0001241 | http://purl.obolibrary.org/obo/PATO_0001241 | GO_0008150 | http://purl.obolibrary.org/obo/GO_0008150 |
| 1 | KeyEvent | http://aopkb.org/aop_ontology#KeyEvent | PATO_0001241 | http://purl.obolibrary.org/obo/PATO_0001241 | PATO_0001241 | http://purl.obolibrary.org/obo/PATO_0001241 |
| 2 | KeyEvent | http://aopkb.org/aop_ontology#KeyEvent | PATO_0001241 | http://purl.obolibrary.org/obo/PATO_0001241 | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 3 | KeyEvent | http://aopkb.org/aop_ontology#KeyEvent | PATO_0001241 | http://purl.obolibrary.org/obo/PATO_0001241 | OrganContext | http://aopkb.org/aop_ontology#OrganContext |
| 4 | KeyEvent | http://aopkb.org/aop_ontology#KeyEvent | PATO_0001241 | http://purl.obolibrary.org/obo/PATO_0001241 | CellTypeContext | http://aopkb.org/aop_ontology#CellTypeContext |
| 5 | AdverseOutcomePathway | http://aopkb.org/aop_ontology#AdverseOutcomePa... | created | http://purl.org/dc/terms/created | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 6 | KeyEventRelationship | http://aopkb.org/aop_ontology#KeyEventRelation... | created | http://purl.org/dc/terms/created | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 7 | C54571 | http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus... | created | http://purl.org/dc/terms/created | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 9 | KeyEventRelationship | http://aopkb.org/aop_ontology#KeyEventRelation... | modified | http://purl.org/dc/terms/modified | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 10 | AdverseOutcomePathway | http://aopkb.org/aop_ontology#AdverseOutcomePa... | modified | http://purl.org/dc/terms/modified | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |


## Step 6: Analyze the AOP-Wiki RDF `KeyEvent` Class

Examine the `KeyEvent` class as an example of detailed analysis:

In [17]:
try:
    # Focus on the KeyEvent class
    ke_schema = schema_df[schema_df['subject_class'] == 'KeyEvent']
    
    print("KeyEvent Analysis (Complete Mode):")
    print(f"Properties found: {len(ke_schema)}")
    
    if len(ke_schema) > 0:
        print("\nKeyEvent Properties:")
        for _, row in ke_schema.head(15).iterrows():
            print(f"  {row['property']:20} -> {row['object_class']}")
        
        # Look for database cross-references (bdb*)
        bdb_props = ke_schema[ke_schema['property'].str.contains('bdb', na=False)]
        if len(bdb_props) > 0:
            print("\nDatabase Cross-References (bdb*):")
            print(f"Found {len(bdb_props)} bdb properties")
            for _, row in bdb_props.iterrows():
                print(f"  {row['property']:15} -> {row['object_class']}")
        else:
            print("\nNo bdb* properties found in KeyEvent")
    else:
        print("\nKeyEvent class not found in schema")
        print("Available classes:")
        for cls in schema_df['subject_class'].unique()[:10]:
            print(f"  - {cls}")
            
except Exception as e:
    print(f"KeyEvent analysis failed: {e}")

KeyEvent Analysis (Complete Mode):
Properties found: 35

KeyEvent Properties:
  PATO_0001241         -> GO_0008150
  PATO_0001241         -> PATO_0001241
  PATO_0001241         -> Literal
  PATO_0001241         -> OrganContext
  PATO_0001241         -> CellTypeContext
  label                -> Literal
  identifier           -> KeyEvent
  source               -> Literal
  PATO_0000047         -> Literal
  title                -> Literal
  alternative          -> Literal
  isPartOf             -> AdverseOutcomePathway
  CellTypeContext      -> CellTypeContext
  CellTypeContext      -> OrganContext
  CellTypeContext      -> PATO_0001241

No bdb* properties found in KeyEvent
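
Note that the count above is a number of schema rows, that is, property and object-class pairs, so a property such as `PATO_0001241` appears several times. A minimal sketch of grouping the rows per property follows; `properties_by_name` is a hypothetical helper, and it works on plain dictionaries so that it could be fed `schema_df.to_dict("records")` in the notebook.

```python
from collections import defaultdict


def properties_by_name(records, subject_class):
    """Map each property of `subject_class` to the sorted object classes it points to."""
    grouped = defaultdict(set)
    for row in records:
        if row["subject_class"] == subject_class:
            grouped[row["property"]].add(row["object_class"])
    return {prop: sorted(objs) for prop, objs in grouped.items()}


# A few rows copied from the output above; the real input would be
# schema_df.to_dict("records").
sample = [
    {"subject_class": "KeyEvent", "property": "PATO_0001241", "object_class": "GO_0008150"},
    {"subject_class": "KeyEvent", "property": "PATO_0001241", "object_class": "Literal"},
    {"subject_class": "KeyEvent", "property": "isPartOf", "object_class": "AdverseOutcomePathway"},
    {"subject_class": "AdverseOutcomePathway", "property": "created", "object_class": "Literal"},
]
summary = properties_by_name(sample, "KeyEvent")
for prop, objs in summary.items():
    print(f"{prop}: {', '.join(objs)}")
```

Grouping this way separates the number of distinct properties from the number of rows, which the flat listing conflates.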


## Step 7: Export Complete Schema

Export the complete schema as JSON and CSV files with detailed statistics.

In [18]:
try:
    # Export as JSON
    print("Generating JSON schema (complete mode)...")
    schema_json = parser.to_json(filter_void_nodes=True)
    
    print("Complete JSON export completed")
    print(f"Total triples: {schema_json['metadata']['total_triples']}")
    print(f"Classes: {len(schema_json['metadata']['classes'])}")
    print(f"Properties: {len(schema_json['metadata']['properties'])}")
    print(f"Object types: {len(schema_json['metadata']['objects'])}")
    
    # Save JSON to file
    import json
    with open(f"{dataset_name}_schema.json", "w") as f:
        json.dump(schema_json, f, indent=2)
    print(f"\nComplete JSON schema saved to: {dataset_name}_schema.json")
    
    # Export as CSV
    schema_df.to_csv(f"{dataset_name}_schema.csv", index=False)
    print(f"Complete CSV schema saved to: {dataset_name}_schema.csv")
    
except Exception as e:
    print(f"Export failed: {e}")

Generating JSON schema (complete mode)...
Complete JSON export completed
Total triples: 240
Classes: 26
Properties: 65
Object types: 28

Complete JSON schema saved to: aopwikirdf_complete_schema.json
Complete CSV schema saved to: aopwikirdf_complete_schema.csv
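
As a quick check on the exported CSV, the rows can be read back and tallied per subject class with the standard library alone. This is a sketch rather than part of RDFSolve: `count_rows_per_class` is a hypothetical helper, and the in-memory sample stands in for the real file and omits the URI columns for brevity.

```python
import csv
import io
from collections import Counter


def count_rows_per_class(handle):
    """Count schema rows per subject class from an open CSV file."""
    return Counter(row["subject_class"] for row in csv.DictReader(handle))


# Stand-in for the exported file, using rows shown earlier in this notebook.
# With the real export: open(f"{dataset_name}_schema.csv", newline="")
sample_csv = io.StringIO(
    "subject_class,property,object_class\n"
    "KeyEvent,PATO_0001241,GO_0008150\n"
    "KeyEvent,PATO_0001241,Literal\n"
    "AdverseOutcomePathway,created,Literal\n"
)
counts = count_rows_per_class(sample_csv)
print(counts.most_common())
```

On the real file, the sum of all counts should match the total reported in the JSON metadata if both exports describe the same schema.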


## Step 8: JSON-LD Export

Export AOP-Wiki RDF data as JSON-LD with ontology-specific context.

In [19]:
# Export AOP-Wiki RDF as JSON-LD
print("Exporting AOP-Wiki RDF as JSON-LD...")

# AOP-Wiki specific context with biological ontologies

# Export schema (VoID might be very large for complete dataset)
print("Exporting AOP-Wiki schema...")
schema_jsonld = solver.export_schema_jsonld(
    output_file="aopwikirdf_complete_schema.jsonld", 
    indent=2,
    filter_void_nodes=True
)

print(f"Schema exported to: aopwikirdf_complete_schema.jsonld")
print(f"Schema size: {len(schema_jsonld)} characters")

# Optionally export VoID (comment out if too large)
try:
    print("\nAttempting VoID export (may be large)...")
    void_jsonld = solver.export_void_jsonld(
        output_file="aopwikirdf_complete_void.jsonld",
        indent=2
    )
    print(f"VoID exported to: aopwikirdf_complete_void.jsonld")
    print(f"VoID size: {len(void_jsonld)} characters")
except Exception as e:
    print(f"VoID export failed: {e}")

# Show schema preview
print(f"\nSchema Preview:")
print(schema_jsonld[:400] + "..." if len(schema_jsonld) > 400 else schema_jsonld)

Exporting AOP-Wiki RDF as JSON-LD...
Exporting AOP-Wiki schema...
Schema JSON-LD exported to: aopwikirdf_complete_schema.jsonld
Schema exported to: aopwikirdf_complete_schema.jsonld
Schema size: 18208 characters

Attempting VoID export (may be large)...
JSON-LD exported to: aopwikirdf_complete_void.jsonld
VoID exported to: aopwikirdf_complete_void.jsonld
VoID size: 77097 characters

Schema Preview:
{
  "@context": {
    "@context": {
      "aopo": "http://aopkb.org/aop_ontology#",
      "brick": "https://brickschema.org/schema/Brick#",
      "csvw": "http://www.w3.org/ns/csvw#",
      "dc": "http://purl.org/dc/elements/1.1/",
      "dcam": "http://purl.org/dc/dcam/",
      "dcat": "http://www.w3.org/ns/dcat#",
      "dcmitype": "http://purl.org/dc/dcmitype/",
      "dcterms": "http://purl.or...

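The preview shows an `"@context"` key nested inside another `"@context"`, so code that reads the prefix map needs to unwrap one extra level. The following is a small sketch of how that could be handled; `extract_prefixes` is a hypothetical helper, and it assumes the exported string parses as a single JSON object.

```python
def extract_prefixes(document: dict) -> dict:
    """Return prefix -> namespace pairs, unwrapping a nested "@context" if present."""
    context = document.get("@context", {})
    # Unwrap while the context itself contains another "@context" object.
    while isinstance(context, dict) and isinstance(context.get("@context"), dict):
        context = context["@context"]
    if not isinstance(context, dict):
        return {}
    return {
        key: value
        for key, value in context.items()
        if isinstance(value, str) and not key.startswith("@")
    }


# Two complete entries copied from the preview above.
preview = {
    "@context": {
        "@context": {
            "aopo": "http://aopkb.org/aop_ontology#",
            "dcat": "http://www.w3.org/ns/dcat#",
        }
    }
}
prefixes = extract_prefixes(preview)
print(prefixes)
```

In the notebook, the input would presumably be `json.loads(schema_jsonld)`, since `schema_jsonld` is handled as a string in the cell above.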