# SPARQL Reader Demo

This notebook demonstrates how to query public SPARQL endpoints and load the results into Spark DataFrames using the `SPARQLReader` connector.


## Overview
1. Create or reuse a local Spark session.
2. Configure `SPARQLReader` to issue a query against the Wikidata SPARQL service.
3. Inspect the resulting DataFrame and explore optional features such as metadata capture and type coercion.

> ℹ️ The [Wikidata Query Service](https://query.wikidata.org/) enforces rate limits and expects a descriptive `User-Agent`. Adjust paging or sampling to stay within published usage guidelines.
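
One way to keep a larger extraction within those limits is to split it into pages. The sketch below uses a hypothetical helper, `paged_queries`, which is not part of `SPARQLReader`. It assumes the base query has a stable `ORDER BY` and no `LIMIT` or `OFFSET` of its own.

```python
from typing import Iterator


def paged_queries(base_query: str, page_size: int, max_pages: int) -> Iterator[str]:
    """Yield copies of base_query with LIMIT/OFFSET appended (hypothetical helper).

    Assumes base_query ends with an ORDER BY clause and has no LIMIT/OFFSET,
    so that consecutive pages do not overlap or skip rows.
    """
    if page_size <= 0 or max_pages <= 0:
        raise ValueError("page_size and max_pages must be positive")
    body = base_query.strip()
    for page in range(max_pages):
        yield f"{body}\nLIMIT {page_size}\nOFFSET {page * page_size}"


# Illustrative base query; the prefixes are predefined on the Wikidata endpoint.
base = """
SELECT ?pokemon ?pokedexNumber WHERE {
  ?pokemon wdt:P31 wd:Q3966183 ;
           wdt:P1685 ?pokedexNumber .
}
ORDER BY ?pokedexNumber
"""

pages = list(paged_queries(base, page_size=100, max_pages=3))
print(pages[1].splitlines()[-2:])
```

Each generated string could be passed as the `query` entry of a reader config, ideally with a short pause between requests.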


In [None]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SPARQLReaderDemo").master("local[*]").getOrCreate()

In [None]:
from spark_fuse.io.sparql import SPARQLReader

reader = SPARQLReader()
endpoint = "https://query.wikidata.org/sparql"

sample_query = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?pokemon ?pokemonLabel ?pokedexNumber WHERE {
  ?pokemon wdt:P31 wd:Q3966183 .
  ?pokemon wdt:P1685 ?pokedexNumber .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ASC(?pokedexNumber)
LIMIT 10
""".strip()

config = {
    "query": sample_query,
    "request_type": "POST",         # POST handles longer queries more reliably
    "include_metadata": True,
    "metadata_suffix": "__",
    "coerce_types": True,
    "headers": {
        # Replace with a contact address to comply with Wikidata Query Service policy
        "User-Agent": "spark-fuse-sparql-demo/1.0 (contact@example.com)",
    },
    "params": {
        "format": "json",
    },
}

pokemon_df = reader.read(spark, endpoint, source_config=config)
pokemon_df.printSchema()

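# If the query returned no rows, substitute a small static sample so the rest of the demo still runs.
if pokemon_df.rdd.isEmpty():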
    print("No Pokémon results returned. Falling back to a small static sample.")
    fallback_rows = [
        {
            "pokemon": "bulbasaur",
            "pokemonLabel": "Bulbasaur",
            "pokedexNumber": "001",
        },
        {
            "pokemon": "charmander",
            "pokemonLabel": "Charmander",
            "pokedexNumber": "004",
        },
        {
            "pokemon": "squirtle",
            "pokemonLabel": "Squirtle",
            "pokedexNumber": "007",
        },
    ]
    pokemon_df = spark.createDataFrame(fallback_rows, schema=pokemon_df.schema)

pokemon_df.show(5, truncate=False)
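
To make the `include_metadata`, `metadata_suffix`, and `coerce_types` options more concrete, here is a minimal pure-Python sketch of how a SPARQL 1.1 JSON result could be flattened into rows. It is not the connector's implementation. The `flatten` and `coerce` helpers, the `<variable>__type` column naming, and the sample payload are all assumptions made for illustration.

```python
from typing import Any, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


def coerce(term: dict[str, Any]) -> Any:
    """Convert one SPARQL JSON term to a Python value using its XSD datatype."""
    value = term["value"]
    datatype = term.get("datatype", "")
    if datatype in (XSD + "integer", XSD + "int", XSD + "long"):
        return int(value)
    if datatype in (XSD + "decimal", XSD + "double", XSD + "float"):
        return float(value)
    if datatype == XSD + "boolean":
        return value in ("true", "1")
    return value  # URIs, plain literals, and unknown datatypes stay strings


def flatten(payload: dict[str, Any], metadata_suffix: str = "__") -> list[dict[str, Any]]:
    """Turn SELECT bindings into flat rows plus one assumed metadata column per variable."""
    rows = []
    for binding in payload["results"]["bindings"]:
        row: dict[str, Any] = {}
        for var in payload["head"]["vars"]:
            term: Optional[dict[str, Any]] = binding.get(var)  # unbound variables are absent
            row[var] = coerce(term) if term else None
            row[f"{var}{metadata_suffix}type"] = term["type"] if term else None
        rows.append(row)
    return rows


# Made-up payload in the standard SPARQL JSON results shape (not real Wikidata output).
sample = {
    "head": {"vars": ["pokemon", "pokedexNumber"]},
    "results": {
        "bindings": [
            {
                "pokemon": {"type": "uri", "value": "http://example.org/entity/1"},
                "pokedexNumber": {"type": "literal", "value": "1", "datatype": XSD + "integer"},
            }
        ]
    },
}

rows = flatten(sample)
print(rows[0])
```

The actual column names and coercion rules produced by `SPARQLReader` may differ; `pokemon_df.printSchema()` shows what the connector really emits.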

### Boolean queries (ASK)

`SPARQLReader` also supports boolean responses returned by `ASK` queries. The result is a single-row DataFrame with a `boolean` column.
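
For context, the SPARQL 1.1 JSON results format represents an `ASK` answer as an object with a top-level `boolean` member instead of a `results` section. The sketch below parses such a response with the standard library only. `parse_ask` is a hypothetical helper, and the raw string is a hand-written example, not captured endpoint output.

```python
import json


def parse_ask(body: str) -> bool:
    """Extract the answer from a SPARQL JSON ASK response (hypothetical helper)."""
    payload = json.loads(body)
    if "boolean" not in payload:
        raise ValueError("response has no 'boolean' member; not an ASK result")
    return bool(payload["boolean"])


# Hand-written example of the ASK response shape.
raw = '{"head": {}, "boolean": true}'
answer = parse_ask(raw)
print(answer)
```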


In [None]:
ask_query = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

ASK WHERE {
  wd:Q3966183 wdt:P31 wd:Q1656682
}
""".strip()

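# Here the endpoint and query are passed together as a mapping,
# while request options stay in source_config.
ask_df = reader.read(
    spark,
    {"endpoint": endpoint, "query": ask_query},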
    source_config={
        "request_type": "POST",
        "headers": {
            "User-Agent": "spark-fuse-sparql-demo/1.0 (https://github.com/kevinsames/spark-fuse)",
        },
    },
)
ask_df.show()

In [None]:
# Stop the Spark session to release local resources.
spark.stop()