fix: bulk submissions extract writes no parquet despite success report (#43) #57
Merged
Fixes #43.

Add a defensive post-condition to the bulk submissions/companyfacts ``--extract-parquet`` handlers so that the CLI refuses to emit a ``rc=0`` JSON envelope with a ``parquet_path`` that points at a file the writer did not actually produce. Issue #43 documented a "reported success, no parquet on disk" failure mode that broke every downstream ``--prefer local`` consumer (SQL views, lookup, screen). The new guard re-checks ``out_path.exists()`` (submissions) or ``shard_dir.glob('*.parquet')`` (companyfacts) after the extract call returns and, if nothing was written, raises a clear ``EdgarqError`` with rc=3 instead of reporting success.

Strengthens the existing CLI test to assert the reported path is a real, non-empty file on disk, and adds a phantom-path regression test that mocks ``extract_to_parquet`` to return a path that does not exist. This test fails on the pre-fix handler and passes after.
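The post-condition described above can be sketched roughly as follows. This is a minimal illustration, not the code in ``bulk.py``: the helper names ``ensure_parquet_written`` and ``ensure_shards_written`` are hypothetical, and ``EdgarqError`` is a stand-in class that only assumes the real error type carries a return code.

```python
from pathlib import Path


class EdgarqError(Exception):
    """Stand-in for the project's error type (assumed to carry an rc)."""

    def __init__(self, message: str, rc: int = 1) -> None:
        super().__init__(message)
        self.rc = rc


def ensure_parquet_written(out_path: Path) -> Path:
    """Submissions-style check: the single reported file must exist on disk."""
    if not out_path.exists():
        raise EdgarqError(
            f"extract reported {out_path} but no parquet was written; "
            "re-run with --refresh",
            rc=3,
        )
    return out_path


def ensure_shards_written(shard_dir: Path) -> list[Path]:
    """Companyfacts-style check: at least one *.parquet shard must exist."""
    shards = sorted(shard_dir.glob("*.parquet"))
    if not shards:
        raise EdgarqError(
            f"extract reported success but {shard_dir} holds no parquet shards; "
            "re-run with --refresh",
            rc=3,
        )
    return shards
```

A handler would call one of these after the extract returns and before building the ``rc=0`` envelope, so a missing file becomes an error rather than a success report.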
Summary
Fixes #43.
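- Add a defensive post-condition to the bulk submissions/companyfacts ``--extract-parquet`` handlers so they refuse to emit a ``rc=0`` JSON envelope claiming a ``parquet_path`` that the writer did not actually produce. Instead, surface a clear ``EdgarqError`` with rc=3 and a "re-run with --refresh" hint.
- Strengthen the existing CLI test to assert the reported path is a real, non-empty file on disk, and add a phantom-path regression test that mocks ``extract_to_parquet`` to return a non-existent path; it fails on the pre-fix handler and passes after.

Test plan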
- ``uv run pytest tests/commands_impl/test_bulk.py tests/test_bulk.py``: 157 passed
- ``uv run pytest tests/ --ignore=tests/integration``: 1114 passed
- ``uv run tox -e py3.10,py3.11,py3.12,py3.13,py3.14``: all green (individually; combined run hit transient resource contention on Windows)
- ``npx cspell --no-must-find-files --no-gitignore src/edgarq/commands_impl/bulk.py tests/commands_impl/test_bulk.py``: clean
- Manual run (``bulk submissions --extract-parquet --workers 1`` against the real ~975K-CIK SEC dump in an isolated ``EDGARQ_BULK_HOME``): produced ``_parquet/companies.parquet`` successfully, which is the on-disk invariant the new guard enforces.
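For readers who want to see the shape of the phantom-path regression test mentioned in the summary, here is a self-contained sketch. It is not the test in ``tests/commands_impl/test_bulk.py``: ``handle_extract`` is a hypothetical handler that takes the extract function as a parameter (the real test mocks ``extract_to_parquet``), ``EdgarqError`` is a stand-in class, and the envelope keys are assumed from the description.

```python
import json
from pathlib import Path
from unittest import mock


class EdgarqError(Exception):
    """Stand-in for the project's error type (assumed to carry an rc)."""

    def __init__(self, message: str, rc: int = 1) -> None:
        super().__init__(message)
        self.rc = rc


def handle_extract(out_path: Path, extract) -> str:
    """Hypothetical handler: run the extract, verify the file, then report."""
    written = Path(extract(out_path))
    if not written.exists():
        raise EdgarqError(
            f"no parquet at {written}; re-run with --refresh", rc=3
        )
    return json.dumps({"rc": 0, "parquet_path": str(written)})


def test_phantom_path_is_rejected(tmp_path: Path) -> None:
    # The mocked extract claims success but never writes the file.
    phantom = tmp_path / "_parquet" / "companies.parquet"
    fake_extract = mock.Mock(return_value=phantom)
    try:
        handle_extract(phantom, fake_extract)
    except EdgarqError as exc:
        assert exc.rc == 3
    else:
        raise AssertionError("handler reported success for a missing file")
    fake_extract.assert_called_once_with(phantom)
```

Without the ``exists()`` check in the handler, the mocked path would flow straight into the ``rc=0`` envelope, which is the behavior the regression test is meant to catch.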