Skip to content

fix: address review findings round 3 (hy-ug1uj)#53

Merged
philcunliffe merged 2 commits into
integration/cache-lifecyclefrom
polecat/hy-ug1uj
May 27, 2026
Merged

fix: address review findings round 3 (hy-ug1uj)#53
philcunliffe merged 2 commits into
integration/cache-lifecyclefrom
polecat/hy-ug1uj

Conversation

@philcunliffe
Copy link
Copy Markdown
Contributor

Summary

Merge target

polecat/hy-ug1ujintegration/cache-lifecycle

🤖 Auto-merged by refinery

philcunliffe and others added 2 commits May 26, 2026 18:56
…y-ug1uj)

1. Spool cleanup: remove malformed-file continue that caused permanent
   spool accumulation — files are now cleaned up regardless of malformed rows.
2. metadataVersion: propagate real version through commitRowStream instead
   of hardcoding empty string, fixing ExportMarker contract violation.
3. INTERNAL_FIELDS: wire up filtering in query-facing storage.readRows
   and storage.dataSourceForTable so _hyp_cache_row_id and
   _hyp_cache_batch_id are hidden from query output.
4. Replace @typedef with .d.ts interfaces in streaming-reader.js and
   format-iceberg/maintenance.js per CLAUDE.md.
5. Replace inline import() types with @import declarations in
   maintenance.js, dataset.js, and test files per CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ntionConfig

Inline import('./maintenance.js').ExportRetentionConfig caused a typecheck
failure because maintenance.js only consumes that type — it does not
re-export it. Import from the canonical ./types.d.ts instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@philcunliffe philcunliffe merged commit 430d953 into integration/cache-lifecycle May 27, 2026
6 checks passed
@philcunliffe philcunliffe deleted the polecat/hy-ug1uj branch May 27, 2026 02:08
philcunliffe added a commit that referenced this pull request May 27, 2026
* chore: seed integration/cache-lifecycle for feature flow

* hy-arjts: Streaming spool reader (#42)

* feat: streaming spool reader with batching, resume, and internal fields (hy-arjts)

Replace the readFile-based spool flush with a streaming JSONL reader that
handles large backlogs without loading them into memory. Key changes:

- Stream complete lines only; partial trailing lines stay in the buffer
- Batch rows by 128 MiB estimated payload or 100,000 rows (whichever first)
- Decorate each row with _hyp_cache_row_id (SHA-256 dedup key) and
  _hyp_cache_batch_id (flush batch identifier)
- Persist flush progress per rotated spool file so restarts resume from
  the last successfully flushed offset
- Malformed JSON lines are counted and skipped without aborting the batch

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: remove dead batchByteLen and batchStartOffset (hy-arjts)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* hy-qyq1p: Partition layout + storage APIs (#43)

* feat: partition layout + storage APIs (hy-qyq1p)

Introduce per-source/day cache partitions with cursor-based epoch
tracking:

- appendRowsToPartition(dataset, segments, columns, rows) writes into
  the current epoch's Iceberg table and updates cursor.json
- discoverCachePartitions(scope) enumerates logical partitions with
  scope filtering (datasets, date range)
- resolveClientName() fallback chain for partition key derivation
- client_name column added to ai_gateway_messages schema
- Existing appendRows compat path unchanged; new APIs added alongside

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: partition storage APIs and client_name column (hy-qyq1p)

Unit tests for:
- appendRowsToPartition: single/multi-segment, epoch creation, cursor
  updates, empty rows short-circuit
- discoverCachePartitions: empty cache, single/multi-date discovery,
  scope filtering by datasets/date/range
- resolveClientName: full fallback chain including empty string skipping
- cursor read/write roundtrip

Update ai-gateway schema assertion for new client_name column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Feature cache-lifecycle: Flush + append integration (hy-omtz3) (#44)

* hy-omtz3: Flush + append integration

Wire the streaming spool reader into the partition-aware flush pipeline
so each batch from streamFlushFile is routed through appendRowsToPartition,
landing rows in the correct per-source/day partition (client=X/date=Y).

Cache storage read methods (tableExists, readRows, dataSourceForTable)
now resolve through the epoch directory layer so queries work with both
the old flat layout and the new partitioned layout.

Update @hypaware/format-iceberg to stream rows through target-sized
batch commits via commitRowStream instead of draining all rows into
memory before a single commitBatch call.

Smoke test covers batching, partition routing, and row count integrity.
10 new unit tests for resolvePartitionDate and resolvePartitionSegments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: remove dead _spool access that broke typecheck

The smoke test referenced kernel.storage._spool which does not exist on
ExtendedQueryStorageService. The block was a no-op (declared flushCallCount
was never read, the if-body was an empty comment), so removing it fixes CI
typecheck without behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* hy-3u4uy: Legacy compat + migration (#46)

- discoverCachePartitions detects legacy Iceberg tables (no cursor.json)
  and marks them with legacy: true; skips .retired/ directories
- ai-gateway dataset discovers all partitions (legacy + new per-client/day)
  and unions them into a single data source for seamless query results
- hyp query maintain [--force] migrates legacy partitions into the new
  per-source/day layout and retires old paths; idempotent
- gateway_claude_capture and gateway_codex_capture smokes verify rows
  land in client=*/date=* partitions with client_name populated
- Unit tests for legacy discovery, .retired skipping, and migration

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* hy-xg0v8: Maintenance engine + CLI + config (#48)

* feat: cache maintenance engine — snapshot expiration, compaction, CLI, daemon integration (hy-xg0v8)

Add maintainCache API with two maintenance operations:
- Snapshot expiration: keeps current + min 10 recent + all within 24h
- Compaction: streams rows into new epoch with row-id dedup when partition
  exceeds 32 data files, avg file size < 32 MiB, or metadata > 64 MiB

New CLI commands:
- hyp query maintain [dataset] [--dry-run] [--force] [--compact-only] [--expire-only]
- Enhanced hyp query status with partition-level stats (files, snapshots, metadata bytes)

Config surface at query.cache.maintenance with all tunable thresholds.
Daemon runs bounded maintenance on a configurable interval (default 60 min,
30s budget per tick). Retired epochs get a 24h grace period before cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: resolve 7 typecheck errors in cache maintenance and CLI

Narrow JSDoc union type after error guard in runQueryMaintain,
fix bigint/number array type in expireSnapshots, and use
ColumnSpec['type'] instead of missing ColumnType export.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: narrow parsed union before property access in runQueryMaintain

Move destructuring with JSDoc type assertion above the compactOnly/expireOnly
guard so TS no longer sees the error branch of the union on lines 812-815.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* hy-jbxou: Export maintenance — snapshot expiration + compaction status (#50)

* feat: export maintenance — snapshot expiration + compaction status (hy-jbxou)

- Add format-iceberg/src/maintenance.js with snapshot expiration for
  blob-store-backed Iceberg export tables (same retention defaults as
  cache maintenance: keep current + 10 recent, expire after 24h).
- Run best-effort snapshot expiration after each successful export
  commit in table-format.js.
- Add `hyp sink maintain [instance] [--dry-run]` CLI command that
  discovers exported datasets and runs snapshot expiration.  Reports
  `compaction_unsupported` because icebird V1 has no rewrite-data-files
  or delete-data-files primitive.
- Store blobStore on ExtendedSinkHandle for table-format sinks so the
  CLI can access it.
- Update iceberg_export_local_fs smoke: second batch export, maintenance
  assertions, verify compaction status and post-maintenance readability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: BigInt serialization in commitRowStream + update smoke span count (hy-jbxou)

commitRowStream used JSON.stringify to estimate batch byte size, which
throws on rows containing BigInt values. Use a BigInt-safe replacer.

Update the iceberg_export_local_fs smoke to expect 2 export_batch spans
now that the test exercises two batches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: resolve unbound `config` reference in exportDataset snapshot expiration

`config` was defined in `buildSink()` scope but referenced inside the
standalone `exportDataset()` function. Use `ctx.sinkInstanceConfig?.maintenance`
which is the same value and is available via the ctx parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* hy-c7pml: Fix dual-review findings on PR #41 (cache-lifecycle) (#51)

1. Add smoke_name/smoke_step/DEV_RUN_ID observability attributes to
   cache_lifecycle_maintenance smoke, matching the cache_roundtrip pattern.
2. Convert compactPartition from unbounded in-memory collect to batched
   streaming writes (10k rows per batch), preventing OOM on large partitions.
3. Surface malformedCount from streaming-reader through spool flush pipeline;
   throw on corruption to restore the data-integrity invariant that the
   original readFlushChunks enforced.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): address all 7 review findings from PR #41 dual-review (hy-nsc6k) (#52)

1. sinkInstanceConfig.maintenance: extract maintenance config in buildSink
   and pass through to exportDataset via the untyped config object
2. retirePartition: move inside rows.length > 0 guard so empty legacy
   partitions are not irrecoverably retired
3. shutdown: track maintenanceInFlight promise and await it during shutdown
   to prevent orphan data files from abandoned mid-write operations
4. compactPartition: use inferColumnType from migrate.js instead of
   hardcoding all columns as STRING
5. spool malformed-line: replace throw with continue so remaining spool
   files are still processed (malformed files retained for inspection)
6. ExportMarker.dataFiles: accumulate dataFiles across batches in
   commitRowStream and propagate to the marker write
7. retention DEFAULTS: export SNAPSHOT_RETENTION_DEFAULTS from cache
   maintenance and import in format-iceberg to eliminate dual source of truth

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address review findings round 3 (hy-ug1uj) (#53)

* fix: address all 5 review findings from PR #41 dual-review round 3 (hy-ug1uj)

1. Spool cleanup: remove malformed-file continue that caused permanent
   spool accumulation — files are now cleaned up regardless of malformed rows.
2. metadataVersion: propagate real version through commitRowStream instead
   of hardcoding empty string, fixing ExportMarker contract violation.
3. INTERNAL_FIELDS: wire up filtering in query-facing storage.readRows
   and storage.dataSourceForTable so _hyp_cache_row_id and
   _hyp_cache_batch_id are hidden from query output.
4. Replace @typedef with .d.ts interfaces in streaming-reader.js and
   format-iceberg/maintenance.js per CLAUDE.md.
5. Replace inline import() types with @import declarations in
   maintenance.js, dataset.js, and test files per CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(types): replace inline import() types with @import for ExportRetentionConfig

Inline import('./maintenance.js').ExportRetentionConfig caused a typecheck
failure because maintenance.js only consumes that type — it does not
re-export it. Import from the canonical ./types.d.ts instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address round 4 dual-review findings (codex-driven)

Fixes all 6 findings from the round 4 Claude + Codex dual review:

- C1: compactPartition now updates cursor and retires old epoch on zero-row dedup
- C2: appendRowsToPartition propagates return value from partition.js
- X1: Internal field filtering now strips both columns and cells to maintain alignment
- X2: createDataSource passes scope through to re-discovery to prevent data leaks
- X3: Maintenance ticks guarded against overlap with inFlight flag
- X4: Resume offset only advances after batch is durably flushed

Co-Authored-By: Codex (gpt-5.3-codex) <noreply@openai.com>

* fix: resolve typecheck and test failures from codex fix round

- storage.appendRowsToPartition: keep Promise<void> return type to
  match kernel interface (span still records metrics internally)
- storage.dataSourceForTable: internal field filtering now outputs
  AsyncCells (Record<string, AsyncCell>) not array, preserving type
  compatibility with squirreling's AsyncRow
- cache-storage.test: remove return-value assertions (void), test
  filtering via column/resolved exclusion instead
- ai-gateway-dataset.test: add required limit to QueryScope, use
  resolved-only assertion path

Typecheck clean, 490/490 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: dry-run must not perform real migration or epoch cleanup

- Skip migrateLegacyPartitions when --dry-run is set in hyp query maintain
- Skip cleanRetiredEpochs when dryRun is true in maintainCache

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: feature-launch <feature-launch@gas.city>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Codex (gpt-5.3-codex) <noreply@openai.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant