ozefe/thaumiel

thaumiel

thaumiel mascot (SCP-3000, Anantashesha) generated by Google's Nano Banana 2


A typed, ergonomic, read-only async Python client for the SCP Foundation Wiki's Crom GraphQL API.

thaumiel wraps Crom's GraphQL endpoint in a small, fully-typed surface: fetch pages, filter and sort them with a Python DSL, page through results without touching cursors, and budget your rate-limit quota — all async, all type-checked.

Features

  • Fully typed: Frozen Pydantic v2 models (Page, Author, Attribution, ...), checked under pyright strict.
  • Ergonomic filter DSL: Build server-side filters with Python operators: (F.rating >= 100) & (F.tag == "scp"). Illegal filters raise at build time, not at the server.
  • Automatic pagination: pages() is an async iterator that follows Crom's cursors for you; fetch_page_batch() exposes them when you want manual control.
  • Costly-field provenance: Opt into expensive fields per call, and tell "not requested" apart from "server returned null" via page.requested(...).
  • Quota estimation: estimate_* predicts a call's point cost before you spend it.
  • Typed errors and optional retry: A ThaumielError hierarchy plus a configurable RetryPolicy with exponential backoff.

Installation

Requires Python 3.14+.

pip install thaumiel

Quickstart

import asyncio

from thaumiel import AsyncClient


async def main() -> None:
    async with AsyncClient() as client:
        # Crom stores SCP wiki URLs with the http:// scheme.
        page = await client.page("http://scp-wiki.wikidot.com/scp-173")
        if page is None:
            return

        print(page.title, page.rating)
        print(page.tags[:3])


asyncio.run(main())

Output:

SCP-173 10752.0
('autonomous', 'ectoentropic', 'euclid')

page() returns None (not an exception) when nothing matches, and takes either a url or a wikidot_id.

Filtering, sorting, and listing

pages() streams every match, following pagination automatically. Combine F accessors into a filter and pass a Sort:

import asyncio

from thaumiel import AsyncClient, F, Sort, SortKey


async def main() -> None:
    # Highest-rated SCP articles on the English wiki.
    query = F.url.starts_with("http://scp-wiki.wikidot.com") & (F.tag == "scp")
    async with AsyncClient() as client:
        shown = 0
        async for page in client.pages(
            filter=query, sort=Sort.by(SortKey.RATING), page_size=5
        ):
            print(f"{page.rating:>6.0f}  {page.title}")
            shown += 1
            if shown == 5:
                break


asyncio.run(main())

Output:

 10752  SCP-173
  7145  ●●|●●●●●|●●|●
  5544  SCP-049
  5240  SCP-____-J
  4790  SCP-096

Count matches without fetching them:

await client.count_pages(F.tag == "scp")   # -> 69916

Need the cursor yourself (checkpointing, UI paging)? fetch_page_batch() returns one PageBatch with .pages, .end_cursor, and .has_next_page.

The filter DSL

Each F accessor exposes only the operators its field supports; an unsupported operator or a wrong-typed value raises InvalidPredicateError immediately.

| Accessor | Field type | Operators |
| --- | --- | --- |
| F.url | prefix string | == != .starts_with() |
| F.title | string (case-insensitive) | == != .eq_lower() .neq_lower() .starts_with() .starts_with_lower() |
| F.author | string (case-insensitive) | same as F.title; matches an attribution's display name |
| F.category | string | == != |
| F.rating | int | == != < <= > >= |
| F.created_at | datetime | == != < <= > >= |
| F.is_hidden, F.is_user_page | bool | == != |
| F.tag | tag set | == (has) != (lacks) .all_of() .any_of() .none_of() |

Combine predicates with & (and), | (or), and ~ (not).

Warning

Python's bitwise operators & | ~ bind tighter than comparisons such as == and >=, so without parentheses a combinator grabs its neighboring operands before the comparisons are evaluated. Parenthesize every comparison:

(F.rating >= 100) & (F.tag == "scp")   # correct
F.rating >= 100 & F.tag == "scp"       # WRONG: parsed as F.rating >= (100 & F.tag) == "scp"
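
The misparse comes from Python's own grammar, where & has higher precedence than comparison operators. One way to confirm it without importing thaumiel is to parse both spellings with the standard library's ast module; the expressions are only parsed here, never evaluated, so F does not need to exist.

```python
import ast

wrong = ast.parse('F.rating >= 100 & F.tag == "scp"', mode="eval").body
right = ast.parse('(F.rating >= 100) & (F.tag == "scp")', mode="eval").body

# Unparenthesized: one chained comparison whose middle operand is `100 & F.tag`.
print(type(wrong).__name__, [type(op).__name__ for op in wrong.ops])
print(ast.unparse(wrong.comparators[0]))

# Parenthesized: a single `&` joining two independent comparisons.
print(type(right).__name__, type(right.op).__name__)
print(type(right.left).__name__, type(right.right).__name__)
```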

A predicate lowers to Crom's GraphQL input only when a request is issued, but you can inspect it:

(F.rating >= 100).compile().model_dump(by_alias=True, exclude_unset=True)
# {'onWikidotPage': {'rating': {'gte': 100}}}
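
To make "raises at build time, lowers only on request" concrete, here is a toy filter DSL in the same spirit. It is not thaumiel's implementation: `IntField`, `Cmp`, `And`, `InvalidPredicate`, and the `"and"` key in the lowered dict are all invented for this sketch, and the lowered shape is not Crom's schema.

```python
from dataclasses import dataclass


class InvalidPredicate(TypeError):
    """Hypothetical error for this sketch; thaumiel raises InvalidPredicateError."""


@dataclass(frozen=True)
class Cmp:
    """One comparison, stored as data until something asks for its lowered form."""

    field: str
    op: str
    value: int

    def __and__(self, other: "Cmp | And") -> "And":
        return And((self, other))

    def lower(self) -> dict:
        return {self.field: {self.op: self.value}}


@dataclass(frozen=True)
class And:
    parts: tuple

    def __and__(self, other: "Cmp | And") -> "And":
        return And((*self.parts, other))

    def lower(self) -> dict:
        # The "and" key is an invented shape, not Crom's actual schema.
        return {"and": [part.lower() for part in self.parts]}


class IntField:
    """Exposes only the operators defined here and validates values eagerly."""

    def __init__(self, name: str) -> None:
        self.name = name

    def _build(self, op: str, value: object) -> Cmp:
        if isinstance(value, bool) or not isinstance(value, int):
            raise InvalidPredicate(
                f"{self.name} expects an int, got {type(value).__name__}"
            )
        return Cmp(self.name, op, value)

    def __ge__(self, value: object) -> Cmp:
        return self._build("gte", value)

    def __lt__(self, value: object) -> Cmp:
        return self._build("lt", value)


rating = IntField("rating")
pred = (rating >= 100) & (rating < 500)  # nothing is lowered yet
lowered = pred.lower()                   # lowering happens only on request

error = None
try:
    rating >= "high"                     # wrong-typed value fails immediately
except InvalidPredicate as exc:
    error = str(exc)

print(lowered)
print(error)
```

Keeping predicates as plain immutable data is what allows both behaviors: validation runs in the operator method, while the translation to a wire format can wait until a request needs it.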

Costly fields and quota

Some fields cost extra rate-limit points and are opt-in per call. A field you don't request stays None; some can be None even when requested, so page.requested(...) disambiguates.

from thaumiel import CostlyField

page = await client.page(
    "http://scp-wiki.wikidot.com/scp-173",
    source=True,
    attributions=True,
)

print(len(page.source))                       # 1680
print(page.summary)                           # None
print(page.requested(CostlyField.SUMMARY))    # False  — we never asked for it
credit = page.attributions[0]
print(credit.type.value, credit.user_display_name)   # AUTHOR Moto42

Crom meters usage in points (reported via the x-ratelimit-remaining header; the ceiling is 300000). Estimate before you spend — costly fields in pages() are billed per page:

from thaumiel import estimate_count, estimate_page, estimate_pages

estimate_page(source=True, attributions=True)   # 4
estimate_count()                                # 2
estimate_pages(page_size=100, source=True)      # 200
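
An estimate becomes useful once it is turned into a budget. The helper below is a hypothetical sketch, not part of thaumiel: it assumes the 200-point figure above is the cost of one 100-page request, and that the remaining quota has been read from the x-ratelimit-remaining header by some other means.

```python
def affordable_batches(remaining_points: int, cost_per_batch: int, reserve: int = 0) -> int:
    """How many batches fit in the remaining quota while keeping `reserve` points."""
    if cost_per_batch <= 0:
        raise ValueError("cost_per_batch must be positive")
    return max(remaining_points - reserve, 0) // cost_per_batch


# Full quota of 300000 points, 200 points per batch, 10000 points held back.
batches = affordable_batches(300_000, 200, reserve=10_000)
max_pages = batches * 100  # page_size=100 in the estimate above

print(batches, max_pages)
```

Holding back a reserve leaves headroom for other calls that share the same quota.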

Errors and retries

Every error subclasses ThaumielError:

from thaumiel import GraphQLError, RateLimitError, TransportError

try:
    page = await client.page(url)
except RateLimitError as exc:      # HTTP 429 (a subclass of TransportError)
    ...
except TransportError as exc:      # other HTTP/network failure; .status_code, .cause
    ...
except GraphQLError as exc:        # query-level errors; .errors
    ...

Every call is read-only and idempotent, so retrying is safe. RetryPolicy backs off exponentially on rate limits (and optionally on 5xx):

from thaumiel import AsyncClient, RetryPolicy

policy = RetryPolicy(max_attempts=4, backoff=0.5)
async with AsyncClient() as client:
    # Pass a factory, not a coroutine: a retry needs a fresh awaitable.
    page = await policy.run(lambda: client.page("http://scp-wiki.wikidot.com/scp-173"))
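
The factory requirement follows from how coroutines work: a coroutine object can be awaited only once, so each attempt needs a new one. The sketch below shows one plausible shape for such a retry loop. `run_with_retry`, `Transient`, and the `backoff * 2**n` delay schedule are assumptions for illustration, not RetryPolicy's actual internals.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class Transient(Exception):
    """Hypothetical stand-in for a retryable failure such as a rate limit."""


async def run_with_retry(
    factory: Callable[[], Awaitable[T]], max_attempts: int = 4, backoff: float = 0.5
) -> T:
    """Call factory() once per attempt, sleeping backoff * 2**n between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await factory()  # a fresh awaitable for every attempt
        except Transient:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            await asyncio.sleep(backoff * 2**attempt)
    raise RuntimeError("unreachable: the loop always returns or raises")


calls = 0


async def flaky() -> str:
    """Fails on the first two calls, then succeeds."""
    global calls
    calls += 1
    if calls < 3:
        raise Transient(f"attempt {calls} failed")
    return "ok"


async def reuse_same_coroutine() -> str:
    """Awaiting one coroutine object twice raises RuntimeError."""
    coro = flaky()
    await coro
    try:
        await coro
    except RuntimeError as exc:
        return str(exc)
    return "no error"


result = asyncio.run(run_with_retry(flaky, backoff=0.001))
attempts_used = calls
reuse_message = asyncio.run(reuse_same_coroutine())
print(result, attempts_used, reuse_message)
```

Passing `flaky` itself (or a lambda) rather than `flaky()` is the whole point: the loop decides when to create the next coroutine.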

Configuration

from thaumiel import AsyncClient

client = AsyncClient(
    user_agent="my-app/1.0 (me@example.com)",   # good Crom etiquette
    timeout=30.0,
)

For full control — connection limits, event hooks, observing quota headers — inject your own httpx.AsyncClient. thaumiel will not close a client it did not create:

import httpx
from thaumiel import AsyncClient

http = httpx.AsyncClient(headers={"User-Agent": "my-app/1.0"})
client = AsyncClient(http_client=http)
# ... use client ...
await http.aclose()   # you own it; you close it

More end-to-end scripts live in examples/.

Limitations

  • Read-only: thaumiel offers no writes.
  • Async only: There is no synchronous client.
  • Wikidot pages only: pages() skips non-Wikidot nodes (e.g. RuFoundation), so it can yield fewer rows than count_pages reports for the same filter.
  • Curated filter surface: Only the fields in the table above are filterable, and some support equality only.
  • Quota-bound: Requests cost points against Crom's quota; budget with estimate_*.
  • Alpha: While on 0.x, the public API may change before 1.0.

Development

See .github/CONTRIBUTING.md.

License

MIT — see LICENSE.
