Add EdgeCache encoding to BulkEdgeEncoder of codec-java #249
Emit EdgeCache rows during bulk load for INDEXED and MULTI_EDGE labels. The wire format matches the V3 EdgeCacheRecordMapper byte layout so that bulk-loaded data is visible to cache-backed multi-hop queries. Verified end-to-end by a V3 Decoder round-trip in V2MultiEdgeBulkLoadTest.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
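The PR summary gives the EdgeCache layout as key `xxhash32(src) | directedSource | labelId | EDGE_CACHE(-6) | direction | cacheCode(int32)`, qualifier `cacheValues... | directedTarget`, and value `ts | (propertyHashKey, propertyValue)...`. The following is a minimal Java sketch of how one cell could be assembled in that field order. Only the field order, `EDGE_CACHE(-6)`, and the int32 `cacheCode` come from the PR. Everything else is a hypothetical assumption: the names `CacheCell` and `EdgeCacheLayoutSketch`, an int32 `labelId`, a one-byte `direction`, an int64 `ts`, int32 property hash keys, big-endian order (`ByteBuffer`'s default), and raw UTF-8 ids. The hash is passed in rather than implementing xxhash32. Because ids are concatenated without length prefixes, this output is not decodable; the real V3 `EdgeCacheRecordMapper` presumably uses its own id encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.ToIntFunction;

/** One EdgeCache cell (row key, column qualifier, value); names are hypothetical. */
record CacheCell(byte[] rowKey, byte[] qualifier, byte[] value) {}

final class EdgeCacheLayoutSketch {
    /** EDGE_CACHE type code, -6 per the PR; all other widths below are assumptions. */
    static final byte EDGE_CACHE_TYPE = -6;

    private EdgeCacheLayoutSketch() {}

    static CacheCell encode(
            ToIntFunction<byte[]> hash32,        // stand-in for xxhash32 (not implemented here)
            String directedSource,
            String directedTarget,
            int labelId,                         // assumed int32
            byte direction,                      // assumed single byte
            int cacheCode,                       // int32 per the PR
            byte[] cacheValues,                  // pre-encoded cache field values
            long ts,                             // assumed int64
            Map<Integer, byte[]> properties) {   // propertyHashKey (assumed int32) -> encoded value
        byte[] src = directedSource.getBytes(StandardCharsets.UTF_8);
        byte[] tgt = directedTarget.getBytes(StandardCharsets.UTF_8);

        // Row key: hash32(src) | directedSource | labelId | EDGE_CACHE(-6) | direction | cacheCode
        ByteBuffer key = ByteBuffer.allocate(4 + src.length + 4 + 1 + 1 + 4);
        key.putInt(hash32.applyAsInt(src))
           .put(src)
           .putInt(labelId)
           .put(EDGE_CACHE_TYPE)
           .put(direction)
           .putInt(cacheCode);

        // Qualifier: cacheValues... | directedTarget
        ByteBuffer qualifier = ByteBuffer.allocate(cacheValues.length + tgt.length);
        qualifier.put(cacheValues).put(tgt);

        // Value: ts | (propertyHashKey, propertyValue)... in the map's iteration order
        ByteArrayOutputStream value = new ByteArrayOutputStream();
        value.writeBytes(ByteBuffer.allocate(Long.BYTES).putLong(ts).array());
        for (Map.Entry<Integer, byte[]> p : properties.entrySet()) {
            value.writeBytes(ByteBuffer.allocate(Integer.BYTES).putInt(p.getKey()).array());
            value.writeBytes(p.getValue());
        }
        return new CacheCell(key.array(), qualifier.array(), value.toByteArray());
    }
}
```

Calling it with a constant stand-in hash, `encode(b -> 0x01020304, "a", "bc", 42, (byte) 0, 5, new byte[] {9, 9}, 1L, Map.of(7, new byte[] {1, 2}))`, produces a 15-byte key (4 + 1 + 4 + 1 + 1 + 4), a 4-byte qualifier, and a 14-byte value. Putting `directedSource` right after the hash prefix keeps all cache rows for one vertex adjacent, which is what a multi-hop read would scan.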
Note: although the migration to V3 is in progress, this PR is a necessary update to V2 to bring the latest features to what is currently running in production. @zipdoki
We need to check whether the LabelDTO change affects any of its usages. Once that and the HBase testing are confirmed, I'll go ahead with an optimistic merge.
@em3s
@em3s The error occurs not during encoding but when deserializing the Table Schema response. The proper fix is to add …
@zipdoki Thanks, the confirmations look good. Moving this to the v0.3.0 milestone so the bulk-load cache fix ships in 0.3.0 rather than 0.4.0. Related changes landing in 0.3.0:
Upgrade order is server first, then clients. As long as that order holds, the deserialization issue you hit disappears, so we don't need the …
I'll proceed with the optimistic merge here once the server-side Table Schema PR is merged.
Quick correction on the earlier comment: after verifying on …
Opened #272 to add a regression test that locks in the …
Update: Merged #272
@zipdoki Optimistic merging. Expecting …
And huge thanks for the work here, @zipdoki. Running bulk load against real HBase and chasing down the LabelDTO root cause were not easy lifts.
Update:
Summary
Extend `BulkEdgeEncoder.bulkEncodeAll` to also emit EdgeCache records during bulk load. Previously only HashEdge/IndexedEdge/CounterEdge rows were produced, so bulk-loaded data was missing from cache-backed (multi-hop) queries for INDEXED and MULTI_EDGE labels. Emitted rows are byte-compatible with what the V3 `EdgeCacheRecordMapper` writes:

- key layout: `xxhash32(src) | directedSource | labelId | EDGE_CACHE(-6) | direction | cacheCode(int32)`
- qualifier: `cacheValues... | directedTarget`
- value: `ts | (propertyHashKey, propertyValue)...`

The IN-direction src/tgt swap in `BytesKeyValueEdgeEncoder` / `StringKeyFieldValueEdgeEncoder` mirrors V3 `EdgeMutationStrategy.MultiEdge`. Fixes the behavior gap noted in #37.

Test plan

- `./gradlew :codec-java:build`: full codec-java compile; catches type/import breakage from the new `CacheDTO`, the `LabelDTO.caches` field, `EncodedEdgeType.EDGE_CACHE_TYPE` (-6), and the `EdgeEncoder.encodeCacheEdge` / `encodeAllCacheEdges` additions.
- `./gradlew :codec-java:test --tests "*BulkEdgeEncoderTests*"`: INDEXED label encoding across BOTH / OUT / IN directions with total row-count assertions; HASH and inactive-edge negative cases (no cache rows even when `caches` is set); and the backward-compatibility path, where a label JSON without a `caches` key still deserializes and emits no cache rows.
- `./gradlew :codec-java:test --tests "*MultiEdgeBulkEdgeEncoderTests*"`: MULTI_EDGE label; verifies that the synthetic `outEdge=(src, id)` / `inEdge=(id, tgt)` built by the bulk path are reused by the cache encoder and emit 2 cache rows (OUT/IN). Also covers MULTI_EDGE JSON without `caches`.
- `./gradlew :core:test --tests "*V2MultiEdgeBulkLoadTest*"`: end-to-end round-trip. V2 bytes produced by `BulkEdgeEncoder` are decoded via V3 `EdgeCacheRecordMapper.Decoder` (`testEdgeCacheOut/In`), proving byte compatibility with V3's wire format. Also regression-tests the state/indexed/counter paths to ensure the cache-row addition did not disturb them.
- `getCaches()` / `entity.caches` under `codec-java/` and `engine/`: confirms no production caller expects a non-null `LabelDTO.caches` after the null-guard removal. `BulkEdgeEncoder` is the only Java consumer and already checks `caches != null && !caches.isEmpty()`; Kotlin `LabelEntity.caches` is a separate class with a Kotlin `emptyList()` default and is unaffected.
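The eligibility rules and row counts these tests exercise can be modeled roughly as follows. This is a hypothetical sketch, not the `BulkEdgeEncoder` code: the enums, `CacheSpec`, `CacheRow`, and `CacheRowPlanner` are invented for illustration. It also assumes one row per direction for a single declared cache, that an OUT-only label emits only the OUT row (and an IN-only label only the IN row), and that the MULTI_EDGE IN row applies the same src/tgt swap to `inEdge=(id, tgt)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; the real codec-java types differ.
enum LabelSchema { HASH, INDEXED, MULTI_EDGE }

enum LabelDirection { OUT, IN, BOTH }

enum RowDirection { OUT, IN }

record CacheSpec(int cacheCode) {}

/** One EdgeCache row: the vertex it is keyed under, what it points at, and its direction. */
record CacheRow(String directedSource, String directedTarget, RowDirection direction) {}

final class CacheRowPlanner {
    private CacheRowPlanner() {}

    /** Only active INDEXED/MULTI_EDGE edges on labels with a non-empty caches list emit rows. */
    static boolean emitsCacheRows(LabelSchema schema, boolean active, List<CacheSpec> caches) {
        boolean cacheCapable = schema == LabelSchema.INDEXED || schema == LabelSchema.MULTI_EDGE;
        // caches == null models a label JSON without a "caches" key.
        return cacheCapable && active && caches != null && !caches.isEmpty();
    }

    /** INDEXED: OUT row keyed by src; IN row keyed by tgt (src/tgt swapped). */
    static List<CacheRow> indexedRows(String src, String tgt, LabelDirection dir) {
        List<CacheRow> rows = new ArrayList<>();
        if (dir != LabelDirection.IN) {
            rows.add(new CacheRow(src, tgt, RowDirection.OUT));
        }
        if (dir != LabelDirection.OUT) {
            rows.add(new CacheRow(tgt, src, RowDirection.IN));
        }
        return rows;
    }

    /** MULTI_EDGE: reuse outEdge=(src, id) for OUT and inEdge=(id, tgt), swapped, for IN. */
    static List<CacheRow> multiEdgeRows(String src, String id, String tgt) {
        return List.of(
                new CacheRow(src, id, RowDirection.OUT),
                new CacheRow(tgt, id, RowDirection.IN));
    }
}
```

Under these assumptions, a BOTH-direction INDEXED edge yields two rows and a MULTI_EDGE edge yields the two OUT/IN rows the MULTI_EDGE test counts. A null `caches` list short-circuits to no rows, matching the backward-compatibility JSON case.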