
Releases: spiraldb/raincloud

v0.1.0 - initial public release

06 May 03:50


Initial public release.

Raincloud is a client-reproducible pipeline for building a curated catalog
of public datasets as analytics-ready Parquet + Vortex files. See
README.md for the user-facing overview,
AGENTS.md for the architecture, and
SKILLS.md for procedural playbooks.

This release bundles:

  • The 7-stage build pipeline (fetch → extract → parse → transform → write
    → validate → convert), plus the opt-in hydrate stage.
  • 249 dataset specs across 5 families (direct, kaggle-upstream,
    nyc-tlc, public-bi, uci).
  • 24 named transform handlers covering CSV / Parquet / JSONL / XML / PBF /
    custom-format upstreams plus streaming variants for memory-constrained
    shapes.
  • A read-only Textual TUI for browsing the catalog
    (python -m scripts.pipeline.browse, requires --extra tui).
  • Per-dataset Vortex conversion via the convert.vortex flag.
  • Apache License 2.0, with SPDX file headers on all Python sources.
  • Governance: SECURITY.md, CONTRIBUTING.md, CODE_OF_CONDUCT.md
    (Contributor Covenant 2.1), DISCLAIMER.md (AS IS posture, content
    and license disclaimers, dataset-removal reporting), and
    HYDRATING.md (policy for the optional hydrate stage).
  • Tooling: ruff lint (rules E, F, W, I) + GitHub Actions CI
    (.github/workflows/ci.yml) running lint, manifest validation, and
    pytest on every push to develop and every PR targeting develop.
  • Dataset-removal issue template
    (.github/ISSUE_TEMPLATE/dataset-removal.yml): a structured form for
    the removal-reporting channel that DISCLAIMER.md points readers to.
  • Pull-request template (.github/pull_request_template.md) prompting
    for a summary, a test-plan checklist covering the standard pre-PR gate,
    and change-type tags.
  • CITATION.cff: GitHub-native citation metadata; surfaces the "Cite
    this repository" button in the repo sidebar with BibTeX / APA / Chicago
    exports.
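The 7-stage pipeline listed above runs its core stages in a fixed order. The sketch below shows that ordering as a simple stage runner. It is not Raincloud's code: `run_pipeline`, `make_stub`, and the "context dict in, context dict out" stage contract are hypothetical. The opt-in hydrate stage is left out because these notes do not say where it sits in the sequence.

```python
from typing import Callable

# Core stages in the order given in the v0.1.0 release notes.
STAGES: tuple[str, ...] = (
    "fetch", "extract", "parse", "transform", "write", "validate", "convert",
)

# Hypothetical contract: each stage takes a context dict and returns a new one.
Stage = Callable[[dict], dict]


def run_pipeline(handlers: dict[str, Stage], context: dict) -> dict:
    """Run every core stage in STAGES order, threading `context` through.

    The opt-in hydrate stage is deliberately omitted: the release notes
    do not say where it sits relative to the seven core stages.
    """
    missing = [name for name in STAGES if name not in handlers]
    if missing:
        raise KeyError(f"no handler for stage(s): {', '.join(missing)}")
    for name in STAGES:
        context = handlers[name](context)
    return context


def make_stub(name: str) -> Stage:
    # Stub stage that only records that it ran.
    def stage(ctx: dict) -> dict:
        return {**ctx, "trace": ctx["trace"] + [name]}
    return stage


result = run_pipeline({n: make_stub(n) for n in STAGES}, {"trace": []})
print(" -> ".join(result["trace"]))
# fetch -> extract -> parse -> transform -> write -> validate -> convert
```

The runner checks for missing handlers before it runs any stage. This way, an incomplete configuration fails immediately instead of after a partial run has already fetched or written data.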
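The notes describe 24 *named* transform handlers, some with streaming variants for memory-constrained inputs. One common way to organize a set like that is a name-keyed registry, as sketched below. Treat this as an illustration only. The `TRANSFORMS` dict, the `transform` decorator, and the `csv` / `csv_streaming` handler names are hypothetical, and Raincloud's actual handler interface may look quite different.

```python
import csv
from typing import Callable, Iterable, Iterator

# Hypothetical registry mapping handler names to transform functions.
TRANSFORMS: dict[str, Callable] = {}


def transform(name: str) -> Callable[[Callable], Callable]:
    """Decorator registering a transform handler under a unique name."""
    def register(fn: Callable) -> Callable:
        if name in TRANSFORMS:
            raise ValueError(f"duplicate transform handler: {name}")
        TRANSFORMS[name] = fn
        return fn
    return register


@transform("csv")
def csv_rows(lines: Iterable[str]) -> list[list[str]]:
    # Eager variant: materializes every parsed row in memory at once.
    return list(csv.reader(lines))


@transform("csv_streaming")
def csv_rows_streaming(lines: Iterable[str]) -> Iterator[list[str]]:
    # Streaming variant: yields one parsed row at a time, so the handler
    # itself holds no more than the current row.
    yield from csv.reader(lines)


lines = ["a,b", "1,2", '"x,y",3']
print(TRANSFORMS["csv"](lines))
# [['a', 'b'], ['1', '2'], ['x,y', '3']]
```

Both variants produce the same rows. The only difference is whether the caller receives a list or a lazy iterator, so a spec could choose the streaming name for large upstreams without any change to downstream logic.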
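The release notes state that all Python sources carry SPDX file headers for the Apache License 2.0. A small check like the one below could enforce that rule, for example as an extra CI step. This is a hypothetical helper (`files_missing_spdx` is not part of the release). It assumes the header uses the standard `SPDX-License-Identifier: Apache-2.0` tag somewhere in the first few lines of each file.

```python
from itertools import islice
from pathlib import Path

# SPDX short-form license tag; "Apache-2.0" is the SPDX identifier for
# the Apache License 2.0.
SPDX_TAG = "SPDX-License-Identifier: Apache-2.0"


def files_missing_spdx(root: Path, head_lines: int = 5) -> list[Path]:
    """Return .py files under `root` whose first `head_lines` lines lack SPDX_TAG."""
    missing = []
    for path in sorted(root.rglob("*.py")):
        with path.open(encoding="utf-8") as fh:
            # Only the top of the file is inspected, where headers live.
            head = list(islice(fh, head_lines))
        if not any(SPDX_TAG in line for line in head):
            missing.append(path)
    return missing
```

Calling `files_missing_spdx(Path("scripts"))` returns an empty list when every file is tagged. The `head_lines` window leaves room for a shebang or a copyright line above the tag.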
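Vortex conversion is enabled per dataset through the `convert.vortex` flag. The sketch below reads that flag from an already-parsed spec. The nesting as `spec["convert"]["vortex"]` and the default of "off" when the flag is absent are both assumptions, since the release notes do not specify either one. `vortex_enabled` is a hypothetical name.

```python
from typing import Any, Mapping


def vortex_enabled(spec: Mapping[str, Any]) -> bool:
    """Read the dotted convert.vortex flag from a parsed dataset spec.

    Assumes the flag is nested as spec["convert"]["vortex"] and that a
    missing flag means "do not convert"; neither is confirmed upstream.
    """
    convert = spec.get("convert") or {}
    flag = convert.get("vortex", False)
    if not isinstance(flag, bool):
        # Reject strings like "yes" rather than guessing their meaning.
        raise TypeError(f"convert.vortex must be a bool, got {type(flag).__name__}")
    return flag
```

Rejecting non-boolean values is a deliberate choice. It makes a typo in a spec surface as an error instead of silently enabling or skipping conversion.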