feat: 15 - verbose mode#17
Conversation
joefrost01
left a comment
There was a problem hiding this comment.
Review of feat: 15 - verbose mode
Overall this is solid work — the VerboseLogger struct is clean, stderr-only output is correct, timestamp format matches spec, format_count is well-tested, and stdout data output is untouched. However, there's one issue that needs fixing before merge:
1. "Accumulation complete" is logged in the wrong position (must fix)
File: src/query_pipeline.rs:274-278
The "Accumulation complete" log appears after post-sql, masking, lineage, and limit have all been applied. Per the spec, it should appear immediately after the per-file processing loop and before post-sql.
The spec example makes this clear — the counts are different:
[00:12.4] Accumulation complete: 48,721 rows ← pre-transformation total
[00:12.5] Applying post-sql...
[00:12.8] Post-sql complete: 1,204 rows ← post-transformation total
Currently the code queries SELECT COUNT(*) FROM temp_results after limit/post-sql/etc. have already mutated the table, so "Accumulation complete" would report e.g. 1,204 instead of 48,721. This is semantically misleading.
Fix: Move the row_count query + "Accumulation complete" log to immediately after the file processing loop (after the processed == 0 check, before the post_sql block). Keep the existing later row_count query for validation/output where it is.
2. Hyphens instead of em-dashes (minor)
The spec consistently uses em-dashes (—) in log messages:
Loading ref table: regions (ref/regions.parquet) — 250 rows
[1/47] data/2024-01/sales.csv — 1,423 rows matched
The implementation uses regular hyphens (-). See the format! calls in the ref-table loop (~line 163) and per-file success/skip logging (~lines 209, 224).
3. Missing "Exclude" log point (minor, can defer)
The spec lists Excluded {n} files (only if excludes matched) but no such log exists. If the exclude count isn't readily available from resolve_files, this could be a follow-up.
Everything else checks out: pipe safety, dry-run early exit before logging, per-file [i/total] indexing, conditional logging for masking/lineage/limit/validation, and the "Done" summary. Nice format_count implementation with tests.
Not LGTM yet — issue #1 needs to be addressed. After that fix (and the optional em-dash cleanup), this is ready to merge pending CI.
joefrost01
left a comment
There was a problem hiding this comment.
Review: feat/15-verbose-mode vs specs/15-verbose-mode.md
Overall this is solid work — the VerboseLogger struct, timestamp format, format_count, and pipeline integration are all clean. A few spec deviations need fixing before merge:
1. Em dash vs hyphen in log messages
The spec uses — (em dash) as the separator in several log messages. The implementation uses - (hyphen) instead.
Spec:
[00:00.1] Loading ref table: regions (ref/regions.parquet) — 250 rows
[00:00.2] [1/47] data/2024-01/sales.csv — 1,423 rows matched
[00:00.4] [3/47] data/2024-02/sales.csv — SKIPPED (...)
Actual (in diff):
"Loading ref table: {} ({}) - {} rows"
"[{}/{}] {} - {} rows matched"
"[{}/{}] {} - SKIPPED ({})"These should use — (em dash) to match the spec format.
2. Missing "Exclude" log point
The spec requires:
Excluded {n} files(only if excludes matched)
This log point is absent from the implementation. FileResolver::resolve() currently consumes the exclude count internally, so surfacing it may require returning metadata from the resolver. If this is deferred, it should be tracked as a follow-up.
3. indicatif dependency
The spec lists indicatif = "0.17" under Dependencies. It isn't added in this PR. The spec notes it's for future progress bar use, so this may be intentional — but worth confirming whether it should be added now per the spec or deferred.
Everything else looks correct: timestamp format, all other log points present, thousands-separator formatting, pipe safety (stderr only), closure refactor to capture row counts, and the format_count test. Nice work on keeping the behavioral changes minimal.
Verdict: Not ready to merge. The em dash formatting is a clear spec deviation that should be fixed. The exclude log point and indicatif dependency should be addressed or explicitly deferred.
joefrost01
left a comment
There was a problem hiding this comment.
Re-review: All prior findings resolved ✓
Verified the latest commit (74bf7c2) addresses all three issues from the prior review:
- Em-dash format — Log messages now use
—(em dash) matching the spec (query_pipeline.rs:167,218,231) - Excluded files log —
"Excluded {n} files"is now emitted when excludes match (query_pipeline.rs:131) - indicatif dependency —
indicatif = "0.17"added toCargo.toml
LGTM — ready to merge once CI passes.
What problem are you trying to solve?
--verboseexisted as a flag but did not emit structured progress logs across pipeline stages. Users could not observe stage timing, per-file outcomes, or row-count progression without adding ad-hoc prints.What does this PR change?
This adds a timestamped verbose logger in the query pipeline and logs key execution milestones to stderr (start, file resolution, schema mode, ref loads, per-file success/skip, accumulation, post-sql, masking, lineage, limit, validation, count, output/profile/fingerprint, done). It also adds thousands-separator row formatting and tests for number formatting.
Does this change align with DESIGN.md?
Yes. Output data paths remain unchanged, and verbose logs stay on stderr so stdout output remains pipe-safe. Core execution order and semantics are preserved.
What alternatives did you consider?
I considered introducing a separate logging module and dependency-heavy progress UI first, but that added complexity before behavior was validated. A focused in-pipeline logger with clear timestamp formatting delivered the spec behavior with minimal risk.
Does this PR contain multiple unrelated changes?
No. All changes are scoped to verbose mode behavior for feature 15.
Existing PRs
Testing
cargo testpassescargo clippypasses with no warningscargo fmthas been runformat_count_inserts_thousands_separatorsEvaluation
Human review