# REST API Reader Demo

This notebook demonstrates how to ingest a paginated REST endpoint using the `RestAPIReader` connector. We will rely on the public [PokeAPI](https://pokeapi.co/) service to retrieve Pokémon metadata.

## Overview
1. Create or reuse a local Spark session.
2. Configure `RestAPIReader` to page through `https://pokeapi.co/api/v2/pokemon`.
3. Inspect the resulting Spark DataFrame and perform a couple of lightweight transforms.

> ℹ️ PokeAPI is a free community API with rate limits. Keep the pagination bounds modest
> and avoid running the notebook repeatedly in tight loops.
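
To make the pagination idea concrete, here is a minimal sketch of "follow the `next` link" paging in plain Python. It is not the `RestAPIReader` implementation: `iter_records`, `FAKE_PAGES`, and the `page-N` keys are hypothetical names used only for illustration, and the HTTP call is replaced by an in-memory lookup so nothing hits the network. The payload shape (a `results` list plus a `next` link) mirrors what the PokeAPI list endpoint returns.

```python
from typing import Callable, Dict, Iterator, Optional

# Hypothetical in-memory stand-in for the HTTP endpoint: each "URL" maps to a
# response shaped like a PokeAPI list payload ('results' plus a 'next' link).
FAKE_PAGES: Dict[str, dict] = {
    "page-1": {"results": [{"name": "a"}, {"name": "b"}], "next": "page-2"},
    "page-2": {"results": [{"name": "c"}, {"name": "d"}], "next": "page-3"},
    "page-3": {"results": [{"name": "e"}], "next": None},
}


def iter_records(
    start_url: str,
    fetch: Callable[[str], dict],
    records_field: str = "results",
    next_field: str = "next",
    max_pages: int = 3,
) -> Iterator[dict]:
    """Yield records page by page, following the 'next' link until it is
    missing or max_pages responses have been read."""
    url: Optional[str] = start_url
    pages_read = 0
    while url and pages_read < max_pages:
        payload = fetch(url)                       # one request per page
        yield from payload.get(records_field, [])  # unwrap the record list
        url = payload.get(next_field)              # None ends the loop
        pages_read += 1


# Cap the crawl at two pages, the same role 'max_pages' plays in a reader config.
names = [r["name"] for r in iter_records("page-1", FAKE_PAGES.__getitem__, max_pages=2)]
print(names)
```

The `max_pages` cap is what keeps a crawl polite toward a rate-limited service: the loop stops after a fixed number of requests even when the API still advertises a `next` link.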

In [None]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("RestAPIReaderDemo").master("local[*]").getOrCreate()


In [None]:
from spark_fuse.io.rest_api import RestAPIReader

reader = RestAPIReader()
base_url = "https://pokeapi.co/api/v2/pokemon"

config = {
    'pagination': {
        'mode': 'response',
        'field': 'next',  # follow the 'next' link returned by each response
        'max_pages': 3,   # 3 pages x 20 Pokémon each
    },
    'params': {'limit': 20},  # page size
    'records_field': 'results',
    'infer_schema': True,
    'parallelism': 2,
    'headers': {'User-Agent': 'spark-fuse-rest-demo/1.0'},
}

pokemon_df = reader.read(spark, base_url, source_config=config)
pokemon_df.printSchema()
pokemon_df.show(5, truncate=False)


In [None]:
from pyspark.sql import functions as F

# Extract the numeric ID from each resource URL, cast it to int, and sort numerically.
pokemon_enriched_df = (
    pokemon_df
    .withColumn('pokemon_id', F.regexp_extract('url', r'/pokemon/(\d+)/', 1).cast('int'))
    .orderBy('pokemon_id')
)

pokemon_enriched_df.select('pokemon_id', 'name').show(10, truncate=False)
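
The ID extraction above can be checked without Spark. The sketch below applies the same regular expression with Python's `re` module to a sample resource URL. `pokemon_id_from_url` and the sample URLs are hypothetical and used only for illustration. One difference to keep in mind: Spark's `regexp_extract` returns an empty string when nothing matches, whereas this helper returns `None`.

```python
import re
from typing import Optional

# Same pattern as in the Spark transform: capture the digits after '/pokemon/'.
ID_PATTERN = re.compile(r"/pokemon/(\d+)/")


def pokemon_id_from_url(url: str) -> Optional[int]:
    """Return the numeric ID embedded in a resource URL, or None if absent."""
    match = ID_PATTERN.search(url)
    return int(match.group(1)) if match else None


urls = [
    "https://pokeapi.co/api/v2/pokemon/10/",
    "https://pokeapi.co/api/v2/pokemon/2/",
    "https://pokeapi.co/api/v2/pokemon/1/",
]

# Sorting by the integer ID gives 1, 2, 10; sorting the raw digit strings
# would place '10' before '2', which is why the cast to int matters.
sorted_ids = sorted(pokemon_id_from_url(u) for u in urls)
print(sorted_ids)
```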


In [None]:
spark.stop()
