# RDFSolve: PubChem Gene Analysis

This notebook analyzes the PubChem Gene graph with RDFSolve, using the following settings:
- **Graph URI**: http://rdf.ncbi.nlm.nih.gov/pubchem/gene
- **SPARQL Endpoint**: https://idsm.elixir-czech.cz/sparql/endpoint/idsm
- **Dataset**: PubChem Gene

The goal is to explore the structure and schema of the PubChem Gene dataset.

In [None]:
import warnings

import pandas as pd
from rdfsolve.rdfsolve import RDFSolver
from rdfsolve.void_parser import VoidParser

# Silence all warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

## Step 1: Configure Dataset Parameters

In [None]:
# PubChem Gene configuration
endpoint_url = "https://idsm.elixir-czech.cz/sparql/endpoint/idsm"
graph_uri = "http://rdf.ncbi.nlm.nih.gov/pubchem/gene"
void_iri = "http://rdf.ncbi.nlm.nih.gov/pubchem/gene"
dataset_name = "pubchem_gene"
working_path = "."

print(f"Dataset: {dataset_name}")
print(f"Endpoint: {endpoint_url}")
print(f"Graph URI: {graph_uri}")
print(f"VoID IRI: {void_iri}")
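Before running the solver, it can help to confirm that the configured graph is actually present on the endpoint. The sketch below does not use RDFSolve; it only builds a request according to the SPARQL 1.1 Protocol (`query` parameter on a GET request). The helper names `build_ask_query` and `build_request_url` are hypothetical, and whether the IDSM endpoint accepts this exact request form is an assumption.

```python
from urllib.parse import urlencode


def build_ask_query(graph_uri: str) -> str:
    """Return a SPARQL ASK query that is true if the graph holds any triple."""
    return f"ASK {{ GRAPH <{graph_uri}> {{ ?s ?p ?o }} }}"


def build_request_url(endpoint_url: str, query: str) -> str:
    """Encode the query as the `query` parameter of a GET request."""
    return f"{endpoint_url}?{urlencode({'query': query})}"


# Same values as in the configuration cell
endpoint_url = "https://idsm.elixir-czech.cz/sparql/endpoint/idsm"
graph_uri = "http://rdf.ncbi.nlm.nih.gov/pubchem/gene"

ask_query = build_ask_query(graph_uri)
request_url = build_request_url(endpoint_url, ask_query)

print(ask_query)
print(request_url)
```

To send the request, the URL could be passed to `urllib.request.urlopen` with an `Accept: application/sparql-results+json` header. An ASK query is used here instead of a full count because it is typically much cheaper on large graphs.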

## Step 6: Domain-Specific Analysis

TODO: Add PubChem Gene-specific analysis.

In [None]:
# TODO: Implement gene-specific analysis
# - Gene function classification
# - Organism distribution
# - Gene-compound interactions
print("TODO: Add gene analysis")
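Each of the planned analyses would likely come down to running a grouped SPARQL query and tabulating the result. As a starting point, the sketch below flattens a document in the standard SPARQL 1.1 Query Results JSON format into a pandas DataFrame. The helper name `bindings_to_frame` is hypothetical, and the payload is a made-up illustration with `example.org` values, not real PubChem Gene data.

```python
import pandas as pd


def bindings_to_frame(results: dict) -> pd.DataFrame:
    """Flatten a SPARQL 1.1 JSON results document into a DataFrame."""
    columns = results["head"]["vars"]
    rows = [
        # Unbound variables are simply absent from a binding, so map them to None
        {var: (binding[var]["value"] if var in binding else None) for var in columns}
        for binding in results["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=columns)


# Illustrative payload only (NOT real PubChem Gene data)
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
sample_results = {
    "head": {"vars": ["class", "count"]},
    "results": {
        "bindings": [
            {
                "class": {"type": "uri", "value": "http://example.org/ClassA"},
                "count": {"type": "literal", "value": "3", "datatype": XSD_INTEGER},
            },
            {
                "class": {"type": "uri", "value": "http://example.org/ClassB"},
                "count": {"type": "literal", "value": "1", "datatype": XSD_INTEGER},
            },
        ]
    },
}

frame = bindings_to_frame(sample_results)
# All values arrive as strings, so numeric columns need an explicit cast
frame["count"] = frame["count"].astype(int)
print(frame.sort_values("count", ascending=False))
```

The same helper would work for any SELECT query, for example one that groups genes by organism, once the relevant predicates in the PubChem Gene graph have been identified.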

## JSON-LD Export

Export the VoID description and schema as JSON-LD with automatic prefix extraction.

In [None]:
# Export PubChem Gene data as JSON-LD (automatic prefix extraction)
# Note: `solver` must be the RDFSolver instance created in an earlier step
print("Exporting PubChem Gene VoID and Schema as JSON-LD...")

# Export complete VoID with automatic context
void_jsonld = solver.export_void_jsonld(
    output_file="pubchem_gene_void.jsonld",
    indent=2
)

# Export schema only with automatic context
schema_jsonld = solver.export_schema_jsonld(
    output_file="pubchem_gene_schema.jsonld",
    indent=2,
    filter_void_nodes=True
)

print("Exported files:")
print(f"  - pubchem_gene_void.jsonld ({len(void_jsonld)} chars)")
print(f"  - pubchem_gene_schema.jsonld ({len(schema_jsonld)} chars)")

# Show automatically extracted prefixes
# (this calls a private method, which may change between RDFSolve versions)
prefixes = solver._extract_prefixes_from_void()
print(f"\nAuto-extracted prefixes: {', '.join(sorted(prefixes.keys()))}")

print("\nSchema Preview:")
# Truncate long output to the first 300 characters
print((schema_jsonld[:300] + "...") if len(schema_jsonld) > 300 else schema_jsonld)
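The prefix listing in the cell above relies on a method with a leading underscore. An alternative that depends only on the JSON-LD format itself is to read the `@context` of the exported document. The sketch below assumes the export is a JSON string with a top-level `@context` object mapping prefixes to namespace IRIs; that is an assumption about RDFSolve's output, and the helper name `context_prefixes` and the sample document are hypothetical.

```python
import json


def context_prefixes(jsonld_text: str) -> dict:
    """Return prefix -> IRI pairs from a top-level @context object."""
    document = json.loads(jsonld_text)
    context = document.get("@context", {}) if isinstance(document, dict) else {}
    if not isinstance(context, dict):
        # @context may also be a URL or a list; this sketch does not handle those
        return {}
    # Keep plain string mappings and skip keywords such as @vocab.
    # Note: string mappings can also be term aliases, not only prefixes.
    return {
        key: value
        for key, value in context.items()
        if isinstance(value, str) and not key.startswith("@")
    }


# Illustrative document only (not actual RDFSolve output)
sample_jsonld = json.dumps(
    {
        "@context": {
            "void": "http://rdfs.org/ns/void#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "@vocab": "http://example.org/",
        },
        "@graph": [],
    }
)

context_map = context_prefixes(sample_jsonld)
print(", ".join(sorted(context_map)))
```

In the notebook, `context_prefixes(schema_jsonld)` could be used in the same way, provided the exported string has the structure assumed here.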