# RDFSolve: ChEBI Analysis

This notebook analyzes the ChEBI graph using RDFSolve:
- **Graph URI**: http://rdf.ebi.ac.uk/dataset/chebi
- **SPARQL Endpoint**: https://idsm.elixir-czech.cz/sparql/endpoint/idsm
- **Dataset**: ChEBI (Chemical Entities of Biological Interest)

The steps below generate a VoID description of the graph, extract its schema, and export the results.

In [10]:
import warnings

import pandas as pd
from rdfsolve.rdfsolve import RDFSolver
from rdfsolve.void_parser import VoidParser

# Silence all warnings to keep cell output readable; this also hides deprecation notices.
warnings.filterwarnings('ignore')

## Step 1: Configure Dataset Parameters

In [11]:
# ChEBI configuration
endpoint_url = "https://idsm.elixir-czech.cz/sparql/endpoint/idsm"
graph_uri = "http://rdf.ebi.ac.uk/dataset/chebi"
void_iri = "http://rdf.ebi.ac.uk/dataset/chebi"
dataset_name = "chebi"
working_path = "."

print(f"Dataset: {dataset_name}")
print(f"Endpoint: {endpoint_url}")
print(f"Graph URI: {graph_uri}")
print(f"VoID IRI: {void_iri}")

Dataset: chebi
Endpoint: https://idsm.elixir-czech.cz/sparql/endpoint/idsm
Graph URI: http://rdf.ebi.ac.uk/dataset/chebi
VoID IRI: http://rdf.ebi.ac.uk/dataset/chebi
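
Before any query is sent, it can help to confirm that the configured endpoint and graph values are well-formed absolute IRIs. The following is a minimal sketch using only the standard library; `is_absolute_http_iri` is a hypothetical helper written for this illustration and is not part of RDFSolve.

```python
from urllib.parse import urlparse


def is_absolute_http_iri(value: str) -> bool:
    """Return True if value is an absolute http(s) IRI with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Same values as the configuration cell above.
config = {
    "endpoint_url": "https://idsm.elixir-czech.cz/sparql/endpoint/idsm",
    "graph_uri": "http://rdf.ebi.ac.uk/dataset/chebi",
    "void_iri": "http://rdf.ebi.ac.uk/dataset/chebi",
}

# Collect the names of any settings that fail the check.
invalid = [name for name, value in config.items() if not is_absolute_http_iri(value)]
print("Invalid settings:", invalid or "none")
```

This only checks syntax; it does not contact the endpoint or confirm that the graph exists.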


## Step 2: Initialize RDFSolver

In [12]:
try:
    solver = RDFSolver(
        endpoint=endpoint_url,
        path=working_path,
        void_iri=void_iri,
        dataset_name=dataset_name
    )
    
    print("RDFSolver initialized successfully")
    print(f"Endpoint: {solver.endpoint}")
    print(f"Dataset: {solver.dataset_name}")
    
except Exception as e:
    print(f"Error: {e}")

RDFSolver initialized successfully
Endpoint: https://idsm.elixir-czech.cz/sparql/endpoint/idsm
Dataset: chebi


## Step 3: Generate VoID Description

In [None]:
try:
    print("Generating VoID description...")
    
    void_graph = solver.void_generator(
        graph_uri=graph_uri,
        output_file=f"{dataset_name}_void.ttl",
        counts=True
    )
    
    print("VoID generation completed!")
    print(f"Graph contains {len(void_graph)} triples")
    print(f"Saved to: {dataset_name}_void.ttl")
    
except Exception as e:
    print(f"VoID generation failed: {e}")

Generating VoID description...
Generating VoID from endpoint: https://idsm.elixir-czech.cz/sparql/endpoint/idsm
Using graph URI: http://rdf.ebi.ac.uk/dataset/chebi
Starting query: class_partitions


Finished query: class_partitions (took 0.28s)
Starting query: property_partitions
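
The log above names a `class_partitions` query and a `property_partitions` query. The queries RDFSolve actually sends are not shown in this notebook, so the following is only an illustrative sketch of what a per-class instance count restricted to one named graph could look like. `class_partition_query` is a hypothetical helper, and the query text is an assumption rather than RDFSolve's own.

```python
def class_partition_query(graph_uri: str) -> str:
    """Build a SPARQL query that counts distinct instances per class in one named graph."""
    return f"""
SELECT ?class (COUNT(DISTINCT ?s) AS ?entities)
WHERE {{
  GRAPH <{graph_uri}> {{
    ?s a ?class .
  }}
}}
GROUP BY ?class
ORDER BY DESC(?entities)
""".strip()


query = class_partition_query("http://rdf.ebi.ac.uk/dataset/chebi")
print(query)
```

Each result row of such a query corresponds to what VoID describes as a class partition, with the class recorded under `void:class` and the count under `void:entities`.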


## Step 4: Extract Schema

In [None]:
try:
    print("Extracting schema from VoID...")
    parser = VoidParser(void_graph)
    
    schema_df = parser.to_schema(filter_void_nodes=True)
    
    print("Schema extraction completed")
    print(f"Total schema triples: {len(schema_df)}")
    print(f"Unique classes: {schema_df['subject_class'].nunique()}")
    print(f"Unique properties: {schema_df['property'].nunique()}")
    
except Exception as e:
    print(f"Schema extraction failed: {e}")

Extracting schema from VoID...
Schema extraction completed
Total schema triples: 40
Unique classes: 4
Unique properties: 33


## Step 5: Schema Visualization

In [None]:
# Display schema sample
if 'schema_df' in locals():
    print("Schema Sample (first 10 rows):")
    display(schema_df.head(10))
    
    print("\nTop 10 Classes by Property Count:")
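    # value_counts() counts schema rows per class, so a property that appears with
    # several object classes is counted once per row.
    class_counts = schema_df['subject_class'].value_counts().head(10)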
    for cls, count in class_counts.items():
        print(f"  {cls}: {count} properties")

Schema Sample (first 10 rows):


| | subject_class | subject_uri | property | property_uri | object_class | object_uri |
|---|---|---|---|---|---|---|
| 0 | Class | http://www.w3.org/2002/07/owl#Class | hasRelatedSynonym | http://www.geneontology.org/formats/oboInOwl#h... | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 1 | Class | http://www.w3.org/2002/07/owl#Class | type | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | Resource | http://www.w3.org/2000/01/rdf-schema#Resource |
| 2 | SIO_011120 | http://semanticscience.org/resource/SIO_011120 | type | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | Resource | http://www.w3.org/2000/01/rdf-schema#Resource |
| 3 | Axiom | http://www.w3.org/2002/07/owl#Axiom | type | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | Resource | http://www.w3.org/2000/01/rdf-schema#Resource |
| 4 | Restriction | http://www.w3.org/2002/07/owl#Restriction | type | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | Resource | http://www.w3.org/2000/01/rdf-schema#Resource |
| 5 | SIO_011120 | http://semanticscience.org/resource/SIO_011120 | has-value | http://semanticscience.org/resource/has-value | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 6 | Class | http://www.w3.org/2002/07/owl#Class | hasDbXref | http://www.geneontology.org/formats/oboInOwl#h... | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 7 | Axiom | http://www.w3.org/2002/07/owl#Axiom | hasDbXref | http://www.geneontology.org/formats/oboInOwl#h... | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 8 | Class | http://www.w3.org/2002/07/owl#Class | label | http://www.w3.org/2000/01/rdf-schema#label | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |
| 9 | Class | http://www.w3.org/2002/07/owl#Class | monoisotopicmass | http://purl.obolibrary.org/obo/chebi/monoisoto... | Literal | http://www.w3.org/2000/01/rdf-schema#Literal |



Top 10 Classes by Property Count:
  Class: 23 properties
  SIO_011120: 7 properties
  Axiom: 7 properties
  Restriction: 3 properties
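
The per-class figures above sum to 23 + 7 + 7 + 3 = 40, which is the total number of schema triples reported in Step 4, while only 33 distinct properties were reported. The counts are therefore rows per class, not distinct properties per class. The sketch below reproduces that distinction with the standard library, using the (class, property) pairs from the 10-row sample above as stand-in data; it does not touch the real `schema_df`.

```python
from collections import Counter

# (subject_class, property) pairs copied from the 10-row sample above.
sample_rows = [
    ("Class", "hasRelatedSynonym"),
    ("Class", "type"),
    ("SIO_011120", "type"),
    ("Axiom", "type"),
    ("Restriction", "type"),
    ("SIO_011120", "has-value"),
    ("Class", "hasDbXref"),
    ("Axiom", "hasDbXref"),
    ("Class", "label"),
    ("Class", "monoisotopicmass"),
]

# Rows per class: the quantity value_counts() reports for subject_class.
rows_per_class = Counter(cls for cls, _ in sample_rows)

# Distinct properties overall: the quantity nunique() reports for property.
distinct_properties = {prop for _, prop in sample_rows}

print("Rows per class:", dict(rows_per_class))
print("Total rows:", sum(rows_per_class.values()))
print("Distinct properties:", len(distinct_properties))
```

Because `type` and `hasDbXref` are shared by several classes, the number of distinct properties is smaller than the number of rows.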


## Step 6: Domain-Specific Analysis

TODO: Add ChEBI-specific analysis.

In [None]:
# TODO: Implement ChEBI-specific analysis
# - Chemical entity classification
# - Molecular structure properties
# - Chemical role analysis
print("TODO: Add ChEBI analysis")

TODO: Add ChEBI analysis


## Step 7: Export Results

In [None]:
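import json

try:
    # Both objects are created in Step 4; skip the export if that step did not run.
    if 'parser' in locals() and 'schema_df' in locals():
        # Export as JSON
        schema_json = parser.to_json(filter_void_nodes=True)
        with open(f"{dataset_name}_schema.json", "w") as f:
            json.dump(schema_json, f, indent=2)

        # Export as CSV
        schema_df.to_csv(f"{dataset_name}_schema.csv", index=False)

        # The VoID Turtle file was already written in Step 3; it is listed for completeness.
        print("Results exported:")
        print(f"  - {dataset_name}_void.ttl")
        print(f"  - {dataset_name}_schema.json")
        print(f"  - {dataset_name}_schema.csv")
    else:
        print("Nothing to export: run Steps 3 and 4 first")

except Exception as e:
    print(f"Export failed: {e}")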

Results exported:
  - chebi_void.ttl
  - chebi_schema.json
  - chebi_schema.csv
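
To check an export, the CSV can be read back and its header compared with the expected columns. The sketch below assumes the exported file carries the six columns shown in the Step 5 sample. It writes a one-row stand-in file to a temporary directory so that it runs without the real export, and `read_schema_csv` is a hypothetical helper, not an RDFSolve function.

```python
import csv
import tempfile
from pathlib import Path

# Column names as displayed in the schema sample (assumed to match the export).
EXPECTED_COLUMNS = [
    "subject_class", "subject_uri",
    "property", "property_uri",
    "object_class", "object_uri",
]


def read_schema_csv(path):
    """Read a schema CSV and fail early if the header is not the expected one."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            raise ValueError(f"Unexpected columns: {reader.fieldnames}")
        return list(reader)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "chebi_schema.csv"
    # Stand-in export: one row copied from the sample (row 8).
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=EXPECTED_COLUMNS)
        writer.writeheader()
        writer.writerow({
            "subject_class": "Class",
            "subject_uri": "http://www.w3.org/2002/07/owl#Class",
            "property": "label",
            "property_uri": "http://www.w3.org/2000/01/rdf-schema#label",
            "object_class": "Literal",
            "object_uri": "http://www.w3.org/2000/01/rdf-schema#Literal",
        })
    rows = read_schema_csv(path)

print(f"Read {len(rows)} row(s); first property: {rows[0]['property']}")
```

Pointing `read_schema_csv` at the real `chebi_schema.csv` would apply the same header check to the actual export.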


## JSON-LD Export

RDFSolve can also export the VoID description and the extracted schema as JSON-LD for semantic web integration.

In [None]:
# Export the RDF schema as JSON-LD.
# Two methods are available:
# 1. export_void_jsonld() - Complete VoID description
# 2. export_schema_jsonld() - Extracted schema only

# Example JSON-LD export with custom context
custom_context = {
    "@context": {
        "void": "http://rdfs.org/ns/void#",
        "chebi": "http://purl.obolibrary.org/obo/CHEBI_",
        "sio": "http://semanticscience.org/resource/",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
    }
}

print("JSON-LD Export Options:")
print("1. Complete VoID description:")
print("   solver.export_void_jsonld('chebi_void.jsonld', context=custom_context)")
print()
print("2. Schema only:")  
print("   solver.export_schema_jsonld('chebi_schema.jsonld', context=custom_context)")
print()
print("Uncomment the lines below to export:")

# Uncomment to export:
# void_jsonld = solver.export_void_jsonld("chebi_void.jsonld", context=custom_context)
# schema_jsonld = solver.export_schema_jsonld("chebi_schema.jsonld", context=custom_context)
# print("JSON-LD files exported for semantic web integration!")

JSON-LD Export Options:
1. Complete VoID description:
   solver.export_void_jsonld('chebi_void.jsonld', context=custom_context)

2. Schema only:
   solver.export_schema_jsonld('chebi_schema.jsonld', context=custom_context)

Uncomment the lines below to export:
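
The role of the custom context is to let long IRIs be written as short prefixed names. The sketch below illustrates the idea with plain string prefix matching against some of the namespaces defined above. It is not the JSON-LD compaction algorithm, which applies additional rules, and `compact_iri` is a hypothetical helper rather than an RDFSolve function.

```python
custom_context = {
    "@context": {
        "void": "http://rdfs.org/ns/void#",
        "sio": "http://semanticscience.org/resource/",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    }
}


def compact_iri(iri: str, context: dict) -> str:
    """Shorten an IRI to prefix:local if a namespace in the context matches."""
    prefixes = context["@context"]
    # Try the longest namespace first so the most specific prefix wins.
    for prefix, namespace in sorted(
        prefixes.items(), key=lambda item: len(item[1]), reverse=True
    ):
        if iri.startswith(namespace) and len(iri) > len(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    # No namespace matched: leave the IRI unchanged.
    return iri


for iri in [
    "http://rdfs.org/ns/void#Dataset",
    "http://semanticscience.org/resource/SIO_011120",
    "http://www.w3.org/2000/01/rdf-schema#Literal",
    "http://www.w3.org/2002/07/owl#Class",
]:
    print(compact_iri(iri, custom_context))
```

The last IRI is left unchanged because the context defines no `owl` prefix; adding one would shorten the `owl:Class` entries that dominate the schema.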
