Add EdgeCache encoding to BulkEdgeEncoder of codec-java #249

Merged
em3s merged 3 commits into main from feat/bulk-cache-encoding on Apr 24, 2026

Contributor

@zipdoki zipdoki commented Apr 15, 2026

Summary

Extend BulkEdgeEncoder.bulkEncodeAll to also emit EdgeCache records during bulk load. Previously only HashEdge/IndexedEdge/CounterEdge rows were produced, so bulk-loaded data was missing from cache-backed (multi-hop) queries for INDEXED and MULTI_EDGE labels.

Emitted rows are byte-compatible with what the V3 EdgeCacheRecordMapper writes:

  • key: xxhash32(src) | directedSource | labelId | EDGE_CACHE(-6) | direction | cacheCode(int32)
  • qualifier: cacheValues... | directedTarget
  • value: ts | (propertyHashKey, propertyValue)...

The IN-direction src/tgt swap in BytesKeyValueEdgeEncoder / StringKeyFieldValueEdgeEncoder mirrors V3 EdgeMutationStrategy.MultiEdge. Fixes the behavior gap noted in #37.
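
To make the key layout and the IN-direction swap concrete, here is a minimal sketch of how such a row key could be assembled. Everything beyond the field order and the EDGE_CACHE(-6) type code is an assumption: the class and method names are hypothetical, ints are written as 4-byte big-endian values, vertex ids are UTF-8 strings, direction is a single byte (OUT=0, IN=1), and the hash is passed in precomputed instead of being computed with xxhash32. The real codec-java encoders may differ in all of these details.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the codec-java implementation.
class EdgeCacheKeySketch {
    // Type code for EdgeCache rows, as stated in the PR summary.
    static final byte EDGE_CACHE_TYPE = -6;
    // Assumed single-byte direction codes (not confirmed by the PR).
    static final byte OUT = 0;
    static final byte IN = 1;

    // For IN-direction rows the endpoints are swapped, so the row is keyed by the target.
    static String directedSource(String src, String tgt, byte direction) {
        return direction == IN ? tgt : src;
    }

    static String directedTarget(String src, String tgt, byte direction) {
        return direction == IN ? src : tgt;
    }

    // Layout: hash | directedSource | labelId | EDGE_CACHE(-6) | direction | cacheCode(int32)
    static byte[] rowKey(int sourceHash, String directedSource, int labelId,
                         byte direction, int cacheCode) {
        byte[] source = directedSource.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + source.length + 4 + 1 + 1 + 4);
        buf.putInt(sourceHash)
           .put(source)
           .putInt(labelId)
           .put(EDGE_CACHE_TYPE)
           .put(direction)
           .putInt(cacheCode);
        return buf.array();
    }

    public static void main(String[] args) {
        String src = "user1", tgt = "item9";
        String keyVertex = directedSource(src, tgt, IN);
        // String.hashCode() is only a stand-in for xxhash32 here.
        byte[] key = rowKey(keyVertex.hashCode(), keyVertex, 7, IN, 42);
        System.out.println(keyVertex + " -> " + key.length + " bytes");
    }
}
```

Passing the hash in as a parameter keeps the sketch free of third-party dependencies; a real encoder would also need an unambiguous (for example length-prefixed or order-preserving) encoding for the variable-length source id.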

Test plan

  • ./gradlew :codec-java:build — full codec-java compile; catches type/import breakage from the new Cache DTO, LabelDTO.caches field, EncodedEdgeType.EDGE_CACHE_TYPE(-6), and the EdgeEncoder.encodeCacheEdge / encodeAllCacheEdges additions.
  • ./gradlew :codec-java:test --tests "*BulkEdgeEncoderTests*" — INDEXED label encoding across BOTH / OUT / IN directions with total row-count assertions, HASH / inactive-edge negative cases (no cache rows even when caches is set), and the backward-compat path where a label JSON without a caches key still deserializes and emits no cache rows.
  • ./gradlew :codec-java:test --tests "*MultiEdgeBulkEdgeEncoderTests*" — MULTI_EDGE label; verifies the synthetic outEdge=(src, id) / inEdge=(id, tgt) from the bulk path are reused by the cache encoder and emit 2 cache rows (OUT/IN). Also covers MULTI_EDGE JSON without caches.
  • ./gradlew :core:test --tests "*V2MultiEdgeBulkLoadTest*" — end-to-end round-trip: V2 bytes produced by BulkEdgeEncoder are decoded via V3 EdgeCacheRecordMapper.Decoder (testEdgeCacheOut/In), proving byte compatibility with V3's wire format. Also regresses state/indexed/counter paths to ensure the cache-row addition did not disturb them.
  • Manual multi-hop cache-backed query after bulk load — confirms bulk-loaded edges surface in cache-path multi-hop queries.
  • Grep audit for getCaches() / entity.caches under codec-java/ and engine/ — confirms no production caller expects a non-null LabelDTO.caches after the null-guard removal (BulkEdgeEncoder is the only Java consumer and already does caches != null && !caches.isEmpty(); Kotlin LabelEntity.caches is a separate class with a Kotlin emptyList() default and is unaffected).
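
The negative cases and the null guard in the plan above amount to a small eligibility check. The following is a sketch under stated assumptions: the `LabelType` enum, the `emitsCacheRows` helper, and the representation of caches as a list of strings are all hypothetical. Only the `caches != null && !caches.isEmpty()` guard and the rule that HASH labels and inactive edges produce no cache rows come from the PR description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the checks described in the test plan.
class CacheRowEligibilitySketch {
    enum LabelType { HASH, INDEXED, MULTI_EDGE }

    // Cache rows are emitted only for active edges on INDEXED or MULTI_EDGE labels
    // that declare at least one cache. A null list models label JSON without a "caches" key.
    static boolean emitsCacheRows(LabelType type, boolean active, List<String> caches) {
        boolean hasCaches = caches != null && !caches.isEmpty();
        boolean cacheableType = type == LabelType.INDEXED || type == LabelType.MULTI_EDGE;
        return active && cacheableType && hasCaches;
    }

    public static void main(String[] args) {
        List<String> caches = Arrays.asList("recent");
        System.out.println(emitsCacheRows(LabelType.INDEXED, true, caches));    // true
        System.out.println(emitsCacheRows(LabelType.HASH, true, caches));       // false
        System.out.println(emitsCacheRows(LabelType.INDEXED, false, caches));   // false
        System.out.println(emitsCacheRows(LabelType.MULTI_EDGE, true, null));   // false
    }
}
```

Keeping the null check on the consumer side, as in this sketch, is what lets a label JSON without a caches key deserialize and simply emit no cache rows.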

Emit EdgeCache rows during bulk load for INDEXED and MULTI_EDGE labels. The wire format matches the V3 EdgeCacheRecordMapper byte layout so that bulk-loaded data is visible to cache-backed multi-hop queries. Verified end-to-end by a V3 Decoder round-trip in V2MultiEdgeBulkLoadTest.
@zipdoki requested a review from em3s on April 15, 2026 04:29
@zipdoki self-assigned this on Apr 15, 2026
@dosubot (bot) added the size:L (this PR changes 100-499 lines, ignoring generated files) and enhancement (new feature or request) labels on Apr 15, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

em3s commented Apr 17, 2026

Note: although the migration to V3 is in progress, this PR is a necessary V2 update that brings the latest features to the production system currently running.

@zipdoki
BTW, was this tested against HBase? Mutation and simple query are being implemented, so you can test against those.

Contributor

em3s commented Apr 17, 2026

@zipdoki

Need to check whether the LabelDTO change impacts any of its usages.

If the above and HBase testing are confirmed, I'll go ahead with an optimistic merge.

@em3s changed the title from "feat(codec-java): add EdgeCache encoding to BulkEdgeEncoder" to "Add EdgeCache encoding to BulkEdgeEncoder of codec-java" on Apr 17, 2026
@em3s removed their request for review on April 17, 2026 06:17
Contributor Author

zipdoki commented Apr 17, 2026

@em3s
Will test these two and report back:

  • HBase (mutation + simple query)
  • LabelDTO usage impact

@zipdoki requested a review from em3s as a code owner on April 23, 2026 03:50
@em3s removed their request for review on April 23, 2026 06:52
@em3s modified the milestones: v0.6.0, v0.5.0, v0.4.0 on Apr 23, 2026
Contributor Author

zipdoki commented Apr 24, 2026

@em3s
Testing is complete via bulk load (get / scan / count all passed). However, there was an issue with LabelDTO worth noting.

The error occurs not in encoding, but when deserializing the Table Schema response. caches is a constructor property in LabelDTO (1.0.14), and ObjectMapper has FAIL_ON_NULL_CREATOR_PROPERTIES=true, so deserialization fails when caches is absent from the response. Root cause is a version mismatch: the Actionbase server is still on 1.0.13 (no caches in Table Schema) while the client was already bumped to 1.0.14.

The proper fix is to add caches to the server-side Table Schema. Since the server is deployed first, this should be a non-issue after the server update. For now, I set FAIL_ON_NULL_CREATOR_PROPERTIES=false as a workaround to complete testing.

Contributor

em3s commented Apr 24, 2026

@zipdoki Thanks — confirmations look good.

Moving this to the v0.3.0 milestone so the bulk-load cache fix ships in 0.3.0 rather than 0.4.0. Related changes landing in 0.3.0:

Upgrade order is server first, then clients — the deserialization issue you hit disappears as long as that order holds, so we don't need the FAIL_ON_NULL_CREATOR_PROPERTIES=false workaround in shipped code.

I'll proceed with the optimistic merge here once the server-side Table Schema PR is merged.

@em3s modified the milestones: v0.4.0, v0.3.0 on Apr 24, 2026
Contributor

em3s commented Apr 24, 2026

Quick correction on the earlier comment — after verifying on main, the server already includes caches in the Table Schema response. LabelEntity.caches: List<Cache> landed via #226 and GET /graph/v2/service/{service}/label/{label} returns LabelEntity directly, so Jackson serializes the field by default. No functional server-side change is needed for 0.3.0.

Opened #272 to add a regression test that locks in the caches JSON serialization contract as a safety net. Once that merges, I'll proceed with the optimistic merge here.


Update: Merged #272

Contributor

em3s commented Apr 24, 2026

@zipdoki Optimistic Merging.

Expecting codec-java:0.0.1-SNAPSHOT (current main) to become 0.0.1 alongside Actionbase 0.3.0 (shipping next Monday) — if any issue surfaces, a follow-up PR would be appreciated.

@em3s merged commit aeb19fb into main on Apr 24, 2026
7 checks passed
Contributor

em3s commented Apr 24, 2026

And huge thanks for the work here, @zipdoki — running bulk load against real HBase and chasing down the LabelDTO root-cause aren't easy lifts.

Contributor

em3s commented Apr 24, 2026

Update: 0.0.1-SNAPSHOT was renumbered to 0.1.0-SNAPSHOT via #274 — the release coordinate will be codec-java:0.1.0.
