# RDFSolve: DrugBank Drugs Analysis

This notebook analyzes the DrugBank Drugs graph using RDFSolve:
- **Graph URI**: http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/
- **SPARQL Endpoint**: https://idsm.elixir-czech.cz/sparql/endpoint/idsm
- **Dataset**: DrugBank Drugs

The steps below generate a VoID description of the graph, extract its schema, and export the results as CSV, JSON, and JSON-LD.

In [1]:
import pandas as pd
from rdfsolve.rdfsolve import RDFSolver
from rdfsolve.void_parser import VoidParser
import warnings
warnings.filterwarnings('ignore')

## Step 1: Configure Dataset Parameters

In [2]:
# DrugBank Drugs configuration
endpoint_url = "https://idsm.elixir-czech.cz/sparql/endpoint/idsm"
graph_uri = "http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/"
void_iri = "http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/"
dataset_name = "drugbank_drugs"
working_path = ".."

print(f"Dataset: {dataset_name}")
print(f"Endpoint: {endpoint_url}")
print(f"Graph URI: {graph_uri}")
print(f"VoID IRI: {void_iri}")

Dataset: drugbank_drugs
Endpoint: https://idsm.elixir-czech.cz/sparql/endpoint/idsm
Graph URI: http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/
VoID IRI: http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/
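
Every later step depends on these URLs, so a quick sanity check before any network call can catch typos early. The following is a minimal sketch using only the standard library; `is_absolute_http_url` is a hypothetical helper and not part of RDFSolve.

```python
from urllib.parse import urlparse


def is_absolute_http_url(value: str) -> bool:
    """Return True if value is an absolute http(s) URL with a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Values copied from the configuration cell above.
config = {
    "endpoint_url": "https://idsm.elixir-czech.cz/sparql/endpoint/idsm",
    "graph_uri": "http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/",
    "void_iri": "http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/",
}

# Collect the names of any settings that fail the check.
invalid = [name for name, url in config.items() if not is_absolute_http_url(url)]
print("Invalid settings:", invalid or "none")
```

The check is purely syntactic: it does not confirm that the endpoint is reachable or that the graph exists.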


## Step 2: Initialize RDFSolver

In [3]:
try:
    solver = RDFSolver(
        endpoint=endpoint_url,
        path=working_path,
        void_iri=void_iri,
        dataset_name=dataset_name
    )
    
    print("RDFSolver initialized successfully")
    print(f"Endpoint: {solver.endpoint}")
    print(f"Dataset: {solver.dataset_name}")
    
except Exception as e:
    print(f"Error: {e}")

RDFSolver initialized successfully
Endpoint: https://idsm.elixir-czech.cz/sparql/endpoint/idsm
Dataset: drugbank_drugs


## Step 3: Generate VoID Description

In [4]:
try:
    print("Generating VoID description...")
    
    void_graph = solver.void_generator(
        graph_uri=graph_uri,
        output_file=f"{dataset_name}_void.ttl",
        counts=True
    )
    
    print("VoID generation completed!")
    print(f"Graph contains {len(void_graph)} triples")
    print(f"Saved to: {dataset_name}_void.ttl")
    
except Exception as e:
    print(f"VoID generation failed: {e}")

Generating VoID description...
Generating VoID from endpoint: https://idsm.elixir-czech.cz/sparql/endpoint/idsm
Using graph URI: http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/
Starting query: class_partitions
Finished query: class_partitions (took 0.11s)
Starting query: property_partitions
Finished query: property_partitions (took 0.14s)
Starting query: datatype_partitions
Finished query: datatype_partitions (took 0.14s)
VoID description saved to drugbank_drugs_void.ttl
VoID generation completed successfully
VoID generation completed!
Graph contains 24 triples
Saved to: drugbank_drugs_void.ttl
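
The log names three queries (`class_partitions`, `property_partitions`, `datatype_partitions`) but does not show their text. As an illustration only, a class-partition count amounts to grouping typed instances by class within the named graph. The sketch below builds such a query string. It is an assumption about the general shape, not the query RDFSolve actually sends, and `class_partition_query` is a hypothetical helper.

```python
def class_partition_query(graph_uri: str) -> str:
    """Build a SPARQL query counting distinct instances per class in one named graph."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in graph_uri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"Not a valid IRI for use in SPARQL: {graph_uri!r}")
    return (
        "SELECT ?class (COUNT(DISTINCT ?s) AS ?instances)\n"
        "WHERE {\n"
        f"  GRAPH <{graph_uri}> {{ ?s a ?class }}\n"
        "}\n"
        "GROUP BY ?class\n"
        "ORDER BY DESC(?instances)"
    )


query = class_partition_query(
    "http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugs/"
)
print(query)
```

Running the printed query against the endpoint by hand is one way to cross-check the counts stored in the generated VoID file.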


## Step 4: Extract Schema

In [5]:
try:
    print("Extracting schema from VoID...")
    parser = VoidParser(void_graph)
    
    schema_df = parser.to_schema(filter_void_nodes=True)
    
    print("Schema extraction completed")
    print(f"Total schema triples: {len(schema_df)}")
    print(f"Unique classes: {schema_df['subject_class'].nunique()}")
    print(f"Unique properties: {schema_df['property'].nunique()}")
    
except Exception as e:
    print(f"Schema extraction failed: {e}")

Extracting schema from VoID...
Schema extraction completed
Total schema triples: 5
Unique classes: 1
Unique properties: 5


## Step 5: Schema Visualization

In [6]:
# Display schema sample
if 'schema_df' in locals():
    print("Schema Sample (first 10 rows):")
    display(schema_df.head(10))
    
    print("\nTop 10 Classes by Property Count:")
    class_counts = schema_df['subject_class'].value_counts().head(10)
    for cls, count in class_counts.items():
        print(f"  {cls}: {count} properties")

Schema Sample (first 10 rows):


| | subject_class | subject_uri | property | property_uri | object_class | object_uri |
|---|---|---|---|---|---|---|
| 0 | SIO_011120 | http://semanticscience.org/resource/SIO_011120 | SIO_000011 | http://semanticscience.org/resource/SIO_000011 | Resource | http://www.w3.org/2000/01/rdf-schema#Resource |
| 1 | SIO_011120 | http://semanticscience.org/resource/SIO_011120 | SIO_000300 | http://semanticscience.org/resource/SIO_000300 | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 2 | SIO_011120 | http://semanticscience.org/resource/SIO_011120 | is-attribute-of | http://semanticscience.org/resource/is-attribu... | Resource | http://www.w3.org/2000/01/rdf-schema#Resource |
| 3 | SIO_011120 | http://semanticscience.org/resource/SIO_011120 | type | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | Resource | http://www.w3.org/2000/01/rdf-schema#Resource |
| 4 | SIO_011120 | http://semanticscience.org/resource/SIO_011120 | has-value | http://semanticscience.org/resource/has-value | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |



Top 10 Classes by Property Count:
  SIO_011120: 5 properties
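
The short labels in the `subject_class`, `property`, and `object_class` columns look like the final segment of each URI, taken after the last `#` or `/`. A minimal sketch of that convention follows. `local_name` is a hypothetical helper, and it may not match how `VoidParser` derives its labels in every case.

```python
def local_name(uri: str) -> str:
    """Return the part of a URI after the last '#' or '/'."""
    # Ignore trailing separators so namespace-style URIs still yield a name.
    trimmed = uri.rstrip("#/")
    cut = max(trimmed.rfind("#"), trimmed.rfind("/"))
    return trimmed[cut + 1:]


# URIs copied from the schema sample above.
for uri in (
    "http://semanticscience.org/resource/SIO_011120",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://www.w3.org/2000/01/rdf-schema#Literal",
):
    print(f"{local_name(uri):<12} {uri}")
```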


## Step 6: Domain-Specific Analysis

TODO: Add DrugBank Drugs-specific analysis.

In [7]:
# TODO: Implement drugbank_drugs-specific analysis
print("TODO: Add drugbank_drugs analysis")

TODO: Add drugbank_drugs analysis
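
One possible starting point for this step, offered as a sketch rather than the intended analysis, is to separate properties that point to resources from those that carry literal values. The rows below are copied from the schema sample in Step 5, and `group_by_object_class` is a hypothetical helper.

```python
from collections import defaultdict

# (property, object_class) pairs copied from the Step 5 schema sample.
schema_rows = [
    ("SIO_000011", "Resource"),
    ("SIO_000300", "Literal"),
    ("is-attribute-of", "Resource"),
    ("type", "Resource"),
    ("has-value", "Literal"),
]


def group_by_object_class(rows):
    """Map each object class to the sorted list of properties that target it."""
    groups = defaultdict(list)
    for prop, object_class in rows:
        groups[object_class].append(prop)
    return {cls: sorted(props) for cls, props in groups.items()}


groups = group_by_object_class(schema_rows)
for cls, props in sorted(groups.items()):
    print(f"{cls}: {', '.join(props)}")
```

With the real DataFrame, `schema_df.groupby("object_class")["property"].apply(list)` gives a comparable grouping.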


## Step 7: Export Results

In [8]:
import json

try:
    # Both objects come from Step 4; skip the export if that step did not run.
    if 'parser' in locals() and 'schema_df' in locals():
        # Export as JSON
        schema_json = parser.to_json(filter_void_nodes=True)

        with open(f"{dataset_name}_schema.json", "w") as f:
            json.dump(schema_json, f, indent=2)

        # Export as CSV
        schema_df.to_csv(f"{dataset_name}_schema.csv", index=False)

        print("Results exported:")
        print(f"  - {dataset_name}_void.ttl")
        print(f"  - {dataset_name}_schema.json")
        print(f"  - {dataset_name}_schema.csv")

except Exception as e:
    print(f"Export failed: {e}")

Results exported:
  - drugbank_drugs_void.ttl
  - drugbank_drugs_schema.json
  - drugbank_drugs_schema.csv
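
The summary above lists file names rather than checking the files themselves. A sketch of such a check, using only the standard library, is shown here. `missing_files` is a hypothetical helper, and the demonstration runs in a temporary directory so it does not depend on the notebook's working path.

```python
from pathlib import Path
import tempfile


def missing_files(names, base_dir="."):
    """Return the names that do not exist, or are empty, under base_dir."""
    base = Path(base_dir)
    return [
        name for name in names
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


# Self-contained demonstration: only the CSV is created, so the JSON is reported.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "drugbank_drugs_schema.csv").write_text("subject_class\n")
    expected = ["drugbank_drugs_schema.csv", "drugbank_drugs_schema.json"]
    demo_missing = missing_files(expected, tmp)
    print("Missing or empty:", demo_missing)
```

In the notebook itself, the call would take the three exported file names and whichever directory they were actually written to.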


## JSON-LD Export

Export the VoID description and extracted schema as JSON-LD for semantic web integration.

In [9]:
# Export complete VoID description as JSON-LD
print("Exporting VoID description as JSON-LD...")

# NEW: Automatic prefix extraction from VoID
# The export methods now automatically extract prefixes from the VoID file
# and use them as JSON-LD context (no manual context needed!)

print("Using automatic prefix extraction from VoID...")

void_jsonld = solver.export_void_jsonld(
    output_file="drugbank_drugs_void.jsonld",
    # context parameter omitted = automatic prefix extraction
    indent=2
)

print("VoID exported to: drugbank_drugs_void.jsonld")
print(f"JSON-LD size: {len(void_jsonld)} characters")

# Show the prefixes that were automatically extracted.
# Note: the leading underscore marks this as a private helper by convention.
prefixes = solver._extract_prefixes_from_void()
print(f"Auto-extracted prefixes: {', '.join(sorted(prefixes.keys()))}")

# Show preview, truncated to 400 characters
print("\nJSON-LD Preview (first 400 chars):")
print((void_jsonld[:400] + "...") if len(void_jsonld) > 400 else void_jsonld)

Exporting VoID description as JSON-LD...
Using automatic prefix extraction from VoID...
JSON-LD exported to: drugbank_drugs_void.jsonld
VoID exported to: drugbank_drugs_void.jsonld
JSON-LD size: 4316 characters
Auto-extracted prefixes: brick, csvw, dc, dcam, dcat, dcmitype, dcterms, doap, foaf, geo, ns1, odrl, org, owl, prof, prov, qb, rdf, rdfs, schema, sh, skos, sosa, ssn, time, vann, void, void-ext, wgs, xml, xsd

JSON-LD Preview (first 400 chars):
{
  "@context": {
    "@context": {
      "brick": "https://brickschema.org/schema/Brick#",
      "csvw": "http://www.w3.org/ns/csvw#",
      "dc": "http://purl.org/dc/elements/1.1/",
      "dcam": "http://purl.org/dc/dcam/",
      "dcat": "http://www.w3.org/ns/dcat#",
      "dcmitype": "http://purl.org/dc/dcmitype/",
      "dcterms": "http://purl.org/dc/terms/",
      "doap": "http://usefulinc.co...

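In the preview above, the prefix map sits under two nested `@context` keys. If a consumer expects the prefixes directly under the top-level `@context`, the extra level can be unwrapped after loading. This is a sketch using only the `json` module. `unwrap_context` is a hypothetical helper, the sample document is a small stand-in, and whether the nesting is intended by RDFSolve is not something this notebook establishes.

```python
import json


def unwrap_context(doc: dict) -> dict:
    """Return a copy of doc whose '@context' is not wrapped in another '@context'."""
    result = dict(doc)
    context = result.get("@context")
    # Peel off layers of the form {"@context": {...}} that have no sibling keys.
    while isinstance(context, dict) and set(context) == {"@context"}:
        context = context["@context"]
    if context is not None:
        result["@context"] = context
    return result


# Small stand-in document shaped like the preview; "@graph" is illustrative only.
sample = json.loads("""
{
  "@context": {
    "@context": {
      "dc": "http://purl.org/dc/elements/1.1/",
      "dcat": "http://www.w3.org/ns/dcat#"
    }
  },
  "@graph": []
}
""")

flat = unwrap_context(sample)
print(json.dumps(flat["@context"], indent=2))
```

The helper returns a shallow copy and leaves the input untouched, so the original export can still be compared against the flattened version.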

In [10]:
# Export extracted schema as JSON-LD
print("\nExporting extracted schema as JSON-LD...")

# Automatic context vs manual context comparison
print("Automatic context (using VoID prefixes):")

# First export with automatic context
schema_jsonld = solver.export_schema_jsonld(
    output_file="drugbank_drugs_schema_auto.jsonld",
    # No context = automatic extraction from VoID
    indent=2,
    filter_void_nodes=True
)

print("Auto-context schema: drugbank_drugs_schema_auto.jsonld")

# Now with manual context for comparison
print("\nManual context (custom specification):")
drugbank_context = {
    "@context": {
        "void": "http://rdfs.org/ns/void#",
        "drugbank": "https://identifiers.org/drugbank:",
        "sio": "http://semanticscience.org/resource/",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
    }
}

manual_schema_jsonld = solver.export_schema_jsonld(
    output_file="drugbank_drugs_schema_manual.jsonld",
    context=drugbank_context,
    indent=2,
    filter_void_nodes=True
)

print("Manual-context schema: drugbank_drugs_schema_manual.jsonld")

# Show comparison
print("\nExport Summary:")
print("- Auto VoID JSON-LD: drugbank_drugs_void.jsonld")
print(f"- Auto Schema JSON-LD: drugbank_drugs_schema_auto.jsonld ({len(schema_jsonld)} chars)")
print(f"- Manual Schema JSON-LD: drugbank_drugs_schema_manual.jsonld ({len(manual_schema_jsonld)} chars)")
print("- Legacy CSV: drugbank_drugs_schema.csv")
print("- Legacy JSON: drugbank_drugs_schema.json")
print("\nAutomatic prefix extraction = zero configuration JSON-LD!")


Exporting extracted schema as JSON-LD...
Automatic context (using VoID prefixes):
Schema JSON-LD exported to: drugbank_drugs_schema_auto.jsonld
Auto-context schema: drugbank_drugs_schema_auto.jsonld

Manual context (custom specification):
Schema JSON-LD exported to: drugbank_drugs_schema_manual.jsonld
Manual-context schema: drugbank_drugs_schema_manual.jsonld

Export Summary:
- Auto VoID JSON-LD: drugbank_drugs_void.jsonld
- Auto Schema JSON-LD: drugbank_drugs_schema_auto.jsonld (1945 chars)
- Manual Schema JSON-LD: drugbank_drugs_schema_manual.jsonld (584 chars)
- Legacy CSV: drugbank_drugs_schema.csv
- Legacy JSON: drugbank_drugs_schema.json

Automatic prefix extraction = zero configuration JSON-LD!
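
The automatic export is more than three times the size of the manual one (1945 versus 584 characters). The notebook does not say why, but the long list of auto-extracted prefixes is a plausible contributor. As a sketch, a context can be trimmed to the prefixes whose namespace actually occurs in the schema URIs. `prune_prefixes` is a hypothetical helper, and the sample data is abridged from earlier cells.

```python
def prune_prefixes(prefixes: dict, uris: list) -> dict:
    """Keep only prefixes whose namespace is the start of at least one URI."""
    return {
        prefix: namespace
        for prefix, namespace in prefixes.items()
        if any(uri.startswith(namespace) for uri in uris)
    }


# Abridged prefix map: two entries from the preview, three from the manual context.
prefixes = {
    "brick": "https://brickschema.org/schema/Brick#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sio": "http://semanticscience.org/resource/",
}
# URIs copied from the Step 5 schema sample.
schema_uris = [
    "http://semanticscience.org/resource/SIO_011120",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://www.w3.org/2000/01/rdf-schema#Literal",
]

used = prune_prefixes(prefixes, schema_uris)
print(sorted(used))
```

The pruned map could then be passed as the `context` argument, in the same way as the manual context above.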
