Parquet output requires pip install pyarrow (not auto-installed) #6

@sqllocks

Issue

On a fresh, clean-machine install (sqllocks-spindle v2.13.0 via pip install sqllocks-spindle, Python 3.12), running:

spindle generate retail --scale small --seed 42 --output ./demo-data/ --format parquet

…fails with:

ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
A suitable version of pyarrow or fastparquet is required for parquet support.

Workaround: pip install pyarrow — then it works perfectly.
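A pre-flight check along these lines could turn the failure into an actionable message before any data is generated. This is only a sketch: `find_parquet_engine` and `require_parquet_engine` are hypothetical helper names, not part of Spindle, and the check only tests that a module is importable, not that its version is one pandas accepts.

```python
import importlib.util

# Engine names taken from the ImportError text above.
PARQUET_ENGINES = ("pyarrow", "fastparquet")


def find_parquet_engine(candidates=PARQUET_ENGINES):
    """Return the first importable top-level module name, or None if none is installed."""
    for name in candidates:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def require_parquet_engine(candidates=PARQUET_ENGINES):
    """Return an available engine name, or raise ImportError with an install hint."""
    engine = find_parquet_engine(candidates)
    if engine is None:
        raise ImportError(
            "Parquet output needs one of: " + ", ".join(candidates)
            + ". Try: pip install pyarrow"
        )
    return engine
```

Calling `require_parquet_engine()` at the point where `--format parquet` is parsed would surface the hint immediately, rather than after generation has already run.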

Why this matters

Spindle is pitched at Microsoft Fabric / Lakehouse, where data is stored as Delta/Parquet. The README quick-start mentions Lakehouse, Warehouse, etc., but the default install can't write Parquet without an extra step.

Recommended fixes (pick one)

  1. Add pyarrow to core install_requires — simplest, but adds ~50MB to the install
  2. Add a [fabric] or [parquet] extras_require — users can opt-in via pip install sqllocks-spindle[fabric]
  3. Document clearly in README — explicit "for parquet output, also install pyarrow"

Leaning toward option 2, which keeps the core install small but makes the Fabric story one command:

pip install sqllocks-spindle[fabric]
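For illustration, the extras approach (the second fix in the list above) might look roughly like this in a setuptools-based setup.py. This assumes the project uses setup.py; if it is configured through pyproject.toml, the equivalent would live under `[project.optional-dependencies]`. The variable names are hypothetical and no version pin is implied.

```python
# Hypothetical fragment for setup.py; names are illustrative only.
PARQUET_DEPS = ["pyarrow"]

EXTRAS_REQUIRE = {
    # pip install sqllocks-spindle[parquet]
    "parquet": PARQUET_DEPS,
    # pip install sqllocks-spindle[fabric]
    "fabric": PARQUET_DEPS,
}

# In setup.py this dict would be passed through unchanged, e.g.:
#   from setuptools import setup
#   setup(name="sqllocks-spindle", extras_require=EXTRAS_REQUIRE)
```

Pointing both extras at the same list keeps the two names in sync while leaving pyarrow out of the core dependencies.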

Test repro

Fresh venv, Python 3.12.10, Windows. Full session captured in tracker.

Friction context

Found during the Step 10 launch-pre-flight test (clean-machine install). End-to-end time from venv creation to working CSV output was 22 seconds. Parquet was the only blocker.

Labels: bug, documentation, enhancement
