fix: bulk submissions extract writes no parquet despite success report (#43) #57

Merged: joce merged 1 commit into main from fix/43-submissions-parquet on May 25, 2026
joce (Owner) commented May 25, 2026

Summary

Fixes #43.

  • Add a defensive post-condition to the bulk submissions / companyfacts --extract-parquet handlers so they refuse to emit an rc=0 JSON envelope claiming a parquet_path that the writer did not actually produce. Instead, they surface a clear EdgarqError with rc=3 and a "re-run with --refresh" hint.
  • Strengthen the existing CLI test to assert the reported path is a real, non-empty file on disk.
  • Add a phantom-path regression test that mocks extract_to_parquet to return a non-existent path — fails on the pre-fix handler, passes after.
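
The strengthened assertion described in the list above could look roughly like the helper below. This is a minimal sketch, not the code from this PR: the helper name `assert_reported_parquet_exists` is hypothetical, and it assumes the JSON envelope exposes `parquet_path` as a top-level key.

```python
import json
from pathlib import Path


def assert_reported_parquet_exists(envelope_json: str) -> Path:
    """Check that the envelope's parquet_path names a real, non-empty file.

    Assumption: the envelope is a JSON object with a top-level
    "parquet_path" key. The real envelope layout may differ.
    """
    envelope = json.loads(envelope_json)
    path = Path(envelope["parquet_path"])
    # A path string alone proves nothing; the file must be on disk.
    assert path.is_file(), f"reported parquet_path does not exist: {path}"
    # A zero-byte file would also be useless to downstream readers.
    assert path.stat().st_size > 0, f"reported parquet_path is empty: {path}"
    return path
```

Checking the size as well as existence guards against a writer that creates the file but never flushes any data into it.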

Test plan

  • uv run pytest tests/commands_impl/test_bulk.py tests/test_bulk.py — 157 passed
  • uv run pytest tests/ --ignore=tests/integration — 1114 passed
  • uv run tox -e py3.10,py3.11,py3.12,py3.13,py3.14 — all green (individually; combined run hit transient resource contention on Windows)
  • npx cspell --no-must-find-files --no-gitignore src/edgarq/commands_impl/bulk.py tests/commands_impl/test_bulk.py — clean
  • Live smoke (bulk submissions --extract-parquet --workers 1 against the real ~975K-CIK SEC dump in an isolated EDGARQ_BULK_HOME): produced _parquet/companies.parquet successfully — the on-disk invariant the new guard enforces.

Fixes #43

Add a defensive post-condition to the bulk submissions/companyfacts
``--extract-parquet`` handlers so that the CLI refuses to emit an
``rc=0`` JSON envelope with a ``parquet_path`` that points at a file
the writer did not actually produce.  Issue #43 documented a
"reported success, no parquet on disk" failure mode that bypassed
every downstream ``--prefer local`` consumer (SQL views, lookup,
screen).  The new guard re-checks ``out_path.exists()`` (submissions)
or ``shard_dir.glob('*.parquet')`` (companyfacts) after the extract
call returns and surfaces a clear ``EdgarqError`` with rc=3 instead.
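
As a rough illustration of the two post-conditions named above, the sketch below re-checks the output after the extract call returns. It is written under assumptions: the ``EdgarqError`` class here is a local stand-in (the real constructor and its ``rc``/hint handling may differ), and the function names ``ensure_submissions_parquet`` and ``ensure_companyfacts_shards`` are hypothetical.

```python
from pathlib import Path


class EdgarqError(Exception):
    """Local stand-in for the project's error type (assumed shape)."""

    def __init__(self, message: str, rc: int = 3, hint: str = "") -> None:
        super().__init__(message)
        self.rc = rc
        self.hint = hint


def ensure_submissions_parquet(out_path: Path) -> Path:
    """Submissions variant: the single output file must exist."""
    if not out_path.exists():
        raise EdgarqError(
            f"extract reported success but {out_path} was not written",
            rc=3,
            hint="re-run with --refresh",
        )
    return out_path


def ensure_companyfacts_shards(shard_dir: Path) -> list[Path]:
    """Companyfacts variant: at least one parquet shard must exist."""
    shards = sorted(shard_dir.glob("*.parquet"))
    if not shards:
        raise EdgarqError(
            f"extract reported success but {shard_dir} holds no parquet shards",
            rc=3,
            hint="re-run with --refresh",
        )
    return shards
```

A handler would call the matching check before building its ``rc=0`` envelope, so a missing file turns into an error rather than a false success report.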

Strengthens the existing CLI test to assert the reported path is a
real, non-empty file on disk, and adds a phantom-path regression
test that mocks ``extract_to_parquet`` to return a path that does
not exist — this test fails on the pre-fix handler and passes after.
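
The phantom-path test can be sketched as follows. This is an illustrative reconstruction, not the test from this PR: ``bulk`` is a hypothetical stand-in namespace for the module that owns ``extract_to_parquet``, ``handle_extract`` is a simplified stand-in for the CLI handler, and ``EdgarqError`` is redefined locally with an assumed ``rc`` attribute.

```python
from pathlib import Path
from types import SimpleNamespace
from unittest import mock


class EdgarqError(Exception):
    """Local stand-in for the project's error type (assumed shape)."""

    rc = 3


# Hypothetical stand-in for the module that owns the real writer.
bulk = SimpleNamespace(extract_to_parquet=lambda out_path: out_path)


def handle_extract(out_path: Path) -> dict:
    """Simplified handler: run the extract, then verify the output exists."""
    reported = Path(bulk.extract_to_parquet(out_path))
    if not reported.exists():
        raise EdgarqError(f"{reported} was not written; re-run with --refresh")
    return {"rc": 0, "parquet_path": str(reported)}


def test_phantom_path_is_rejected(tmp_path: Path) -> None:
    """The writer claims a path that was never created; the handler must fail."""
    phantom = tmp_path / "_parquet" / "companies.parquet"
    with mock.patch.object(bulk, "extract_to_parquet", return_value=phantom):
        try:
            handle_extract(phantom)
        except EdgarqError as exc:
            assert exc.rc == 3
        else:
            raise AssertionError("handler accepted a phantom path")
```

Without the ``exists()`` check in ``handle_extract``, the mocked writer's return value would flow straight into an ``rc=0`` envelope, which is the pre-fix behaviour the test is meant to catch.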

codecov-commenter commented May 25, 2026

Codecov Report

❌ Patch coverage is 62.50000% with 3 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
src/edgarq/commands_impl/bulk.py | 62.50% | 2 Missing and 1 partial ⚠️

joce merged commit acf8e06 into main on May 25, 2026
14 checks passed
joce deleted the fix/43-submissions-parquet branch on May 25, 2026 11:58
Development

Successfully merging this pull request may close these issues.

bulk submissions --extract-parquet reports success but writes no parquet file