Skip to content

[BatchUpdate 2/3] feat: add batch SQL layer for multi-aspect upsert and batch deletion#607

Merged
jphui merged 1 commit into
masterfrom
jhui/batch-sql-layer
Apr 7, 2026
Merged

[BatchUpdate 2/3] feat: add batch SQL layer for multi-aspect upsert and batch deletion#607
jphui merged 1 commit into
masterfrom
jhui/batch-sql-layer

Conversation

@jphui
Copy link
Copy Markdown
Contributor

@jphui jphui commented Apr 6, 2026

Overview

Extracted from #598 for easier review (part 2 of 4, depends on #606). This PR contains only the SQL-level batch operations — no orchestration logic in BaseLocalDAO or EbeanLocalDAO.

Note that upon inspection, the comparison between this PR and the relevant sections of 598 are identical 😎

Changes

1. EbeanLocalAccess.batchUpsert() — multi-aspect upsert in a single SQL call

  • Generates one INSERT ... ON DUPLICATE KEY UPDATE that writes all aspects for a URN at once
  • Uses UPSERT semantics: always updates if exists, clears deleted_ts
  • Contrast with create() which throws DuplicateKeyException on non-soft-deleted duplicates

2. prepareMultiColumnInsert() — shared SQL builder

  • Extracted common logic from create() into a reusable helper
  • Handles: INSERT INTO + VALUES clause building, audit stamp extraction, aspect parameter binding, URN extraction
  • Caller provides only the ON DUPLICATE KEY clause, eliminating ~80 lines of duplication
  • Note: create() now calls this helper instead of inline SQL construction. Output SQL is identical — pure extract-method refactor.

3. buildOnDuplicateKeyForCreate() / buildOnDuplicateKeyForUpsert() — split clause generation

  • create: sets aspects + conditionally throws via CAST('DuplicateKeyException' AS UNSIGNED)
  • upsert: sets aspects + lastmodifiedon + deleted_ts = NULL
  • DELETED_TS_DUPLICATE_KEY_CHECK renamed to ON_DUPLICATE_KEY_UPDATE for clarity

4. IEbeanLocalAccess + InstrumentedEbeanLocalAccess — interface and instrumentation

  • Added batchUpsert() to the interface
  • InstrumentedEbeanLocalAccess delegates with latency/error tracking

5. EbeanLocalAccessTest — integration tests against embedded MariaDB

  • testBatchUpsertMultipleAspects: inserts two aspects in one call, verifies both persisted
  • testBatchUpsertOverwritesExistingValues: upserts over existing data, verifies update
  • testBatchUpsertClearsDeletedTs: soft-delete then upsert, verifies deleted_ts cleared
  • testBatchUpsertSingleAspect: single-aspect degenerate case
  • testBatchUpsertPreservesIngestionTrackingContext: verifies emitter/emitTime round-trip

Testing

  • All EbeanLocalAccessTest and InstrumentedEbeanLocalAccessTest tests pass
  • Existing tests unaffected — create() produces identical SQL output via the shared helper
  • *** The FULL E2E Testing suite from the original mega PR still applies here too

Checklist

  • Compiles with ./gradlew :dao-api:compileJava :dao-impl:ebean-dao:compileJava
  • Tests pass with ./gradlew :dao-impl:ebean-dao:test --tests EbeanLocalAccessTest --tests InstrumentedEbeanLocalAccessTest
  • No breaking changes to public interfaces
  • Files are byte-identical to corresponding code in (feat) Transactional Multi-Aspect Update #598

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 82.02247% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.56%. Comparing base (fb9fa85) to head (f26ca81).

Files with missing lines Patch % Lines
...va/com/linkedin/metadata/dao/EbeanLocalAccess.java 83.90% 6 Missing and 8 partials ⚠️
...din/metadata/dao/InstrumentedEbeanLocalAccess.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #607      +/-   ##
============================================
+ Coverage     65.43%   65.56%   +0.12%     
- Complexity     1749     1759      +10     
============================================
  Files           144      144              
  Lines          6813     6847      +34     
  Branches        826      829       +3     
============================================
+ Hits           4458     4489      +31     
- Misses         1993     1996       +3     
  Partials        362      362              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jphui jphui changed the title feat(ebean-dao): add batch SQL layer for multi-aspect upsert and batc… feat: add batch SQL layer for multi-aspect upsert and batch deletion Apr 6, 2026
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import static com.linkedin.metadata.dao.BaseReadDAO.LATEST_VERSION;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's not used in this pr, but i see it's used in the large pr containing all the changes (#598) so it should be good


/**
* Helper method to build the ON DUPLICATE KEY UPDATE clause for batchUpsert() method.
* This clause always updates the row and clears deleted_ts (UPSERT semantics).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jphui could you help confirm the behavior of batchUpsert() on duplicate key (row exists)?
in that case, this buildOnDuplicateKeyForUpsert() appends ON DUPLICATE KEY UPDATE <aspect columns>, lastmodifiedon = :lastmodifiedon, deleted_ts = NULL; and lastmodifiedby is not updated? should we update lastmodifiedby as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ON DUPLICATE KEY UPDATE %s = :metadata, lastmodifiedon = :lastmodifiedon, deleted_ts = NULL;

Same pattern — lastmodifiedby is not updated on duplicate key in the existing single-aspect path either. So the batch upsert is consistent with the existing behavior. It's not a
new bug we introduced — it's pre-existing.

The lastmodifiedby is baked into each aspect's JSON blob (via AuditedAspect.setLastmodifiedby), so the actual actor info is preserved inside the aspect column. The top-level
lastmodifiedby column just doesn't get refreshed on update, which has been the case all along.

This is basically just keeping behavior consistent with existing behavior.

…h deletion

Batch upsert:
- EbeanLocalAccess.batchUpsert(): single INSERT...ON DUPLICATE KEY UPDATE for all aspects
- prepareMultiColumnInsert(): shared helper consolidating create() and batchUpsert() SQL building
- buildOnDuplicateKeyForCreate/Upsert(): split ON DUPLICATE KEY clause generation
- IEbeanLocalAccess: batchUpsert() interface addition

Batch deletion:
- readDeletionInfoBatch(): single SELECT for deletion-eligible URNs
- batchSoftDeleteAssets(): guarded UPDATE with defense-in-depth WHERE clauses
- EntityDeletionInfo: new @value @builder data class in dao-api
- SQLStatementUtils: createReadDeletionInfoByUrnsSql, createBatchSoftDeleteAssetSql
- EBeanDAOUtils: convertSqlRowsToEntityDeletionInfoMap, toEntityDeletionInfo

Instrumentation:
- InstrumentedEbeanLocalAccess: decorator recording per-operation latency/errors
- BaseDaoBenchmarkMetrics / NoOpDaoBenchmarkMetrics: metrics interface + no-op impl

Supporting:
- SchemaValidatorUtil.getColumns(): expose column cache for batch operations
- Status.pdl, FooAsset.pdl: test models for integration tests
- Comprehensive integration tests against embedded MariaDB

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jphui jphui force-pushed the jhui/batch-sql-layer branch from b01f5d4 to f26ca81 Compare April 7, 2026 20:13
Copy link
Copy Markdown
Contributor

@chmeng0 chmeng0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jphui jphui merged commit a08aa98 into master Apr 7, 2026
2 checks passed
@jphui jphui changed the title feat: add batch SQL layer for multi-aspect upsert and batch deletion [BatchUpdate 2/3] feat: add batch SQL layer for multi-aspect upsert and batch deletion Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants