[BatchUpdate 2/3] feat: add batch SQL layer for multi-aspect upsert and batch deletion#607
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #607 +/- ##
============================================
+ Coverage 65.43% 65.56% +0.12%
- Complexity 1749 1759 +10
============================================
Files 144 144
Lines 6813 6847 +34
Branches 826 829 +3
============================================
+ Hits 4458 4489 +31
- Misses 1993 1996 +3
Partials 362 362
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import static com.linkedin.metadata.dao.BaseReadDAO.LATEST_VERSION;
nit: it's not used in this pr, but i see it's used in the large pr containing all the changes (#598) so it should be good
/**
 * Helper method to build the ON DUPLICATE KEY UPDATE clause for batchUpsert() method.
 * This clause always updates the row and clears deleted_ts (UPSERT semantics).
@jphui could you help confirm the behavior of batchUpsert() on duplicate key (row exists)?
In that case, buildOnDuplicateKeyForUpsert() appends ON DUPLICATE KEY UPDATE <aspect columns>, lastmodifiedon = :lastmodifiedon, deleted_ts = NULL; so lastmodifiedby is not updated? Should we update lastmodifiedby as well?
The existing single-aspect path uses the same pattern:
ON DUPLICATE KEY UPDATE %s = :metadata, lastmodifiedon = :lastmodifiedon, deleted_ts = NULL;
lastmodifiedby is not updated on duplicate key there either, so the batch upsert is consistent with the existing behavior. It's not a new bug we introduced; it's pre-existing.
The lastmodifiedby is baked into each aspect's JSON blob (via AuditedAspect.setLastmodifiedby), so the actual actor info is preserved inside the aspect column. The top-level lastmodifiedby column just doesn't get refreshed on update, which has been the case all along.
In short, this keeps the batch path consistent with existing behavior.
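For reference, the clause shape discussed in this thread can be sketched as below. This is only an illustration, not the code in this PR: the class name, the `:aspectN` bind-parameter naming, and the `a_status` / `a_foo` column names are hypothetical. It shows that `lastmodifiedby` never appears in the update list.

```java
import java.util.List;

// Hypothetical sketch; the PR's real helper may differ in signature and parameter naming.
final class UpsertClauseSketch {

  // Builds an ON DUPLICATE KEY UPDATE clause that refreshes every aspect column,
  // bumps lastmodifiedon, and clears deleted_ts. lastmodifiedby is intentionally absent.
  static String buildOnDuplicateKeyForUpsert(List<String> aspectColumns) {
    StringBuilder sb = new StringBuilder("ON DUPLICATE KEY UPDATE ");
    for (int i = 0; i < aspectColumns.size(); i++) {
      // Column names are assumed to come from a trusted schema cache, never from user input.
      sb.append(aspectColumns.get(i)).append(" = :aspect").append(i).append(", ");
    }
    sb.append("lastmodifiedon = :lastmodifiedon, deleted_ts = NULL;");
    return sb.toString();
  }
}
```

If the top-level `lastmodifiedby` column should be refreshed in a follow-up, it would be one more assignment in this clause, applied to both the single-aspect and batch paths to keep them aligned.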
…h deletion

Batch upsert:
- EbeanLocalAccess.batchUpsert(): single INSERT...ON DUPLICATE KEY UPDATE for all aspects
- prepareMultiColumnInsert(): shared helper consolidating create() and batchUpsert() SQL building
- buildOnDuplicateKeyForCreate/Upsert(): split ON DUPLICATE KEY clause generation
- IEbeanLocalAccess: batchUpsert() interface addition

Batch deletion:
- readDeletionInfoBatch(): single SELECT for deletion-eligible URNs
- batchSoftDeleteAssets(): guarded UPDATE with defense-in-depth WHERE clauses
- EntityDeletionInfo: new @Value @Builder data class in dao-api
- SQLStatementUtils: createReadDeletionInfoByUrnsSql, createBatchSoftDeleteAssetSql
- EBeanDAOUtils: convertSqlRowsToEntityDeletionInfoMap, toEntityDeletionInfo

Instrumentation:
- InstrumentedEbeanLocalAccess: decorator recording per-operation latency/errors
- BaseDaoBenchmarkMetrics / NoOpDaoBenchmarkMetrics: metrics interface + no-op impl

Supporting:
- SchemaValidatorUtil.getColumns(): expose column cache for batch operations
- Status.pdl, FooAsset.pdl: test models for integration tests
- Comprehensive integration tests against embedded MariaDB

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
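As an illustration of the instrumentation item in the commit message above, a decorator that records per-operation latency and errors could look roughly like this. The interfaces `SketchLocalAccess` and `SketchMetrics` are hypothetical stand-ins, not the real IEbeanLocalAccess or BaseDaoBenchmarkMetrics signatures.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the access interface; the real one has many more methods.
interface SketchLocalAccess {
  int batchUpsert(String urn, List<String> aspects);
}

// Hypothetical stand-in for the metrics interface.
interface SketchMetrics {
  void recordLatency(String operation, long nanos);
  void recordError(String operation);
}

// Decorator: delegates every call and records latency (always) and errors (on failure).
final class InstrumentedAccessSketch implements SketchLocalAccess {
  private final SketchLocalAccess delegate;
  private final SketchMetrics metrics;

  InstrumentedAccessSketch(SketchLocalAccess delegate, SketchMetrics metrics) {
    this.delegate = delegate;
    this.metrics = metrics;
  }

  @Override
  public int batchUpsert(String urn, List<String> aspects) {
    return instrument("batchUpsert", () -> delegate.batchUpsert(urn, aspects));
  }

  private <T> T instrument(String operation, Supplier<T> call) {
    long start = System.nanoTime();
    try {
      return call.get();
    } catch (RuntimeException e) {
      metrics.recordError(operation);
      throw e; // rethrow so callers see the original failure
    } finally {
      metrics.recordLatency(operation, System.nanoTime() - start);
    }
  }
}
```

A no-op metrics implementation in this shape would simply leave both methods empty, so the decorator can stay wired in even when benchmarking is off.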
b01f5d4 to f26ca81
Overview
Extracted from #598 for easier review (part 2 of 4, depends on #606). This PR contains only the SQL-level batch operations, with no orchestration logic in BaseLocalDAO or EbeanLocalDAO.
Note that upon inspection, this PR's changes are identical to the relevant sections of #598 😎
Changes
1. EbeanLocalAccess.batchUpsert(): multi-aspect upsert in a single SQL call
- INSERT ... ON DUPLICATE KEY UPDATE that writes all aspects for a URN at once
- Always updates the row and clears deleted_ts
- Unlike create(), which throws DuplicateKeyException on non-soft-deleted duplicates
2. prepareMultiColumnInsert(): shared SQL builder
- Extracts the SQL building from create() into a reusable helper
- create() now calls this helper instead of inline SQL construction. Output SQL is identical: pure extract-method refactor.
3. buildOnDuplicateKeyForCreate() / buildOnDuplicateKeyForUpsert(): split clause generation
- create: sets aspects + conditionally throws via CAST('DuplicateKeyException' AS UNSIGNED)
- upsert: sets aspects + lastmodifiedon + deleted_ts = NULL
- DELETED_TS_DUPLICATE_KEY_CHECK renamed to ON_DUPLICATE_KEY_UPDATE for clarity
4. IEbeanLocalAccess + InstrumentedEbeanLocalAccess: interface and instrumentation
- Adds batchUpsert() to the interface
- InstrumentedEbeanLocalAccess delegates with latency/error tracking
5. EbeanLocalAccessTest: integration tests against embedded MariaDB
- testBatchUpsertMultipleAspects: inserts two aspects in one call, verifies both persisted
- testBatchUpsertOverwritesExistingValues: upserts over existing data, verifies update
- testBatchUpsertClearsDeletedTs: soft-delete then upsert, verifies deleted_ts cleared
- testBatchUpsertSingleAspect: single-aspect degenerate case
- testBatchUpsertPreservesIngestionTrackingContext: verifies emitter/emitTime round-trip
Testing
- EbeanLocalAccessTest and InstrumentedEbeanLocalAccessTest tests pass
- create() produces identical SQL output via the shared helper
Checklist
- ./gradlew :dao-api:compileJava :dao-impl:ebean-dao:compileJava
- ./gradlew :dao-impl:ebean-dao:test --tests EbeanLocalAccessTest --tests InstrumentedEbeanLocalAccessTest
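
To make the create/upsert split in items 1 to 3 concrete, here is a rough sketch of a shared multi-column INSERT builder with a mode-specific ON DUPLICATE KEY clause. It is not the code in this PR: the class, the `Mode` enum, the table and column names, the `:aspectN` parameters, and in particular the `IF(deleted_ts IS NULL, ...)` wrapper around the CAST guard are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a shared builder; names and exact SQL shape are assumptions.
final class MultiColumnInsertSketch {
  enum Mode { CREATE, UPSERT }

  static String prepareMultiColumnInsert(String table, List<String> aspectColumns, Mode mode) {
    List<String> columns = new ArrayList<>();
    List<String> params = new ArrayList<>();
    List<String> updates = new ArrayList<>();

    // Fixed audit columns written on every insert.
    columns.add("urn");            params.add(":urn");
    columns.add("lastmodifiedon"); params.add(":lastmodifiedon");
    columns.add("lastmodifiedby"); params.add(":lastmodifiedby");

    // One column and one bind parameter per aspect, so all aspects go in a single statement.
    for (int i = 0; i < aspectColumns.size(); i++) {
      String param = ":aspect" + i;
      columns.add(aspectColumns.get(i));
      params.add(param);
      updates.add(aspectColumns.get(i) + " = " + param);
    }

    if (mode == Mode.UPSERT) {
      // Upsert: always overwrite and revive a soft-deleted row.
      updates.add("lastmodifiedon = :lastmodifiedon");
      updates.add("deleted_ts = NULL");
    } else {
      // Create: assumed guard that makes the statement fail when a live row already exists
      // (the invalid CAST errors out under strict SQL mode) and revives a soft-deleted one.
      updates.add("deleted_ts = IF(deleted_ts IS NULL, CAST('DuplicateKeyException' AS UNSIGNED), NULL)");
    }

    // Identifiers are concatenated, so they must come from a trusted schema, not user input.
    return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES ("
        + String.join(", ", params) + ") ON DUPLICATE KEY UPDATE " + String.join(", ", updates) + ";";
  }
}
```

Keeping the column and parameter assembly in one place is what lets create() and batchUpsert() differ only in the trailing clause.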