Reset data chunk id counter for import process #2697

inv-jishnu · 2025-05-26T11:22:30Z

Description

This PR introduces a change to reset the data chunk ID counter to 0 at the beginning of each import process.
The fix addresses the issue reported in DLT-16709.

In the API case, since the JVM is not restarted between import processes, the counter would previously continue from the last value, causing the data chunk IDs in a new import to not start from 0. This update ensures the counter is reset at the start of each import so that data chunk IDs always begin at 0.

Related issues and/or PRs

NA

Changes made

Set data chunk id counter to 0 at the beginning of import process.

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes.
I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
Tests (unit, integration, etc.) have been added for the changes.
My changes generate no new warnings.
Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

NA

Release notes

NA

…counter

Copilot

Pull Request Overview

This PR addresses the issue [DLT-16709] by resetting the data chunk ID counter to 0 at the beginning of each import process so that new imports always start with data chunk ID 0. Key changes include updating three import processors (JSON, JSON Lines, and CSV) to perform the counter reset and adding corresponding tests to verify the behavior.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessorTest.java	Added tests to verify that the counter is reset for both STORAGE and TRANSACTION modes for JSON Lines imports.
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonImportProcessorTest.java	Added tests confirming the reset behavior for JSON imports across both modes.
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/CsvImportProcessorTest.java	Introduced tests to check the reset of the counter for CSV imports in both STORAGE and TRANSACTION modes.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessor.java	Modified the processing method to reset the data chunk counter prior to processing.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/JsonImportProcessor.java	Incorporated counter reset logic at the start of processing.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/CsvImportProcessor.java	Added counter reset code at the beginning of the processing method.

Copilot · 2025-05-29T01:24:36Z

...st/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessorTest.java

+  @Test
+  void test_importProcessWithStorage_runTwice_CheckDataChunkId() {
+    params =
+        ImportProcessorParams.builder()
+            .scalarDbMode(ScalarDbMode.STORAGE)


[nitpick] The test cases that verify the data chunk ID reset are very similar across STORAGE and TRANSACTION modes. Consider consolidating these into parameterized tests to reduce repetition and improve maintainability.

Suggested change

@Test

void test_importProcessWithStorage_runTwice_CheckDataChunkId() {

params =

ImportProcessorParams.builder()

.scalarDbMode(ScalarDbMode.STORAGE)

@ParameterizedTest

@MethodSource("provideScalarDbModes")

void test_importProcess_runTwice_CheckDataChunkId(ScalarDbMode scalarDbMode) {

params =

ImportProcessorParams.builder()

.scalarDbMode(scalarDbMode)

Copilot · 2025-05-29T01:24:36Z

...c/main/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessor.java

+    // Since the JVM is not restarted between API calls (as in a web application’s API server),
+    // failing to reset the counter would cause the next import to continue from the previous data
+    // chunk ID.
+    dataChunkIdCounter.set(0);


[nitpick] Similar counter reset logic is present in JsonImportProcessor and CsvImportProcessor. Consider refactoring this common behavior into a shared utility method or base class to promote consistency and simplify future updates.

komamitsu · 2025-05-29T04:27:18Z

...ore/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/CsvImportProcessor.java

+    // Since the JVM is not restarted between API calls (as in a web application’s API server),
+    // failing to reset the counter would cause the next import to continue from the previous data
+    // chunk ID.
+    dataChunkIdCounter.set(0);


I guess the root problem is this counter is static. It's okay for one-shot CLI command, though. What if the API server receives multiple data-load requests simultaneously?

@komamitsu san,
You are right. It is better to remove static than the change I added. I have removed the changes and also removed static from dataChunkIdCounter object.

…counter

inv-jishnu · 2025-05-29T06:49:50Z

@komamitsu san,
I have made changes based on feedback.
Please take a look at this again when you get a chance.
Thank you.

komamitsu

LGTM! 👍

ypeckstadt

LGTM. Thank you.

brfrn169

LGTM! Thank you!

Torch3333

LGTM, thank you!

data chunk id counter value reset

2324b55

inv-jishnu self-assigned this May 26, 2025

inv-jishnu requested a review from a team as a code owner May 26, 2025 11:22

inv-jishnu requested review from Torch3333, brfrn169, feeblefakie and komamitsu May 26, 2025 11:22

inv-jishnu added the bugfix label May 26, 2025

inv-jishnu marked this pull request as draft May 26, 2025 11:22

inv-jishnu removed request for Torch3333, brfrn169, feeblefakie and komamitsu May 26, 2025 11:22

spotless applied

530ba82

Torch3333 removed the request for review from a team May 28, 2025 05:28

inv-jishnu and others added 3 commits May 28, 2025 13:52

Merge branch 'master' into feat/data-loader/core-reset-data-chunk-id-…

341b733

…counter

Unit tests updated

fd5cece

Fixed spotbugs issue

baa5a26

ypeckstadt marked this pull request as ready for review May 29, 2025 01:23

ypeckstadt requested review from a team, Torch3333, brfrn169, Copilot, feeblefakie, komamitsu and ypeckstadt and removed request for a team May 29, 2025 01:23

Copilot AI reviewed May 29, 2025

View reviewed changes

komamitsu reviewed May 29, 2025

View reviewed changes

Changes

9ad1583

Merge branch 'master' into feat/data-loader/core-reset-data-chunk-id-…

4d27fd4

…counter

inv-jishnu requested a review from komamitsu May 29, 2025 06:48

komamitsu approved these changes May 30, 2025

View reviewed changes

ypeckstadt approved these changes May 30, 2025

View reviewed changes

brfrn169 approved these changes May 30, 2025

View reviewed changes

Torch3333 approved these changes May 30, 2025

View reviewed changes

ypeckstadt merged commit 7b62ad5 into master May 31, 2025
55 checks passed

ypeckstadt deleted the feat/data-loader/core-reset-data-chunk-id-counter branch May 31, 2025 04:39

feeblefakie pushed a commit that referenced this pull request May 31, 2025

Reset data chunk id counter for import process (#2697)

35a5e6f

feeblefakie mentioned this pull request May 31, 2025

Backport to branch(3.15) : Reset data chunk id counter for import process #2712

Merged

feeblefakie pushed a commit that referenced this pull request May 31, 2025

Reset data chunk id counter for import process (#2697)

8aa1988

feeblefakie mentioned this pull request May 31, 2025

Backport to branch(3.14) : Reset data chunk id counter for import process #2713

Merged

feeblefakie pushed a commit that referenced this pull request May 31, 2025

Reset data chunk id counter for import process (#2697)

f21c551

feeblefakie mentioned this pull request May 31, 2025

Backport to branch(3.13) : Reset data chunk id counter for import process #2714

Merged

feeblefakie pushed a commit that referenced this pull request May 31, 2025

Reset data chunk id counter for import process (#2697)

deb9dba

feeblefakie mentioned this pull request May 31, 2025

Backport to branch(3.12) : Reset data chunk id counter for import process #2715

Merged

feeblefakie pushed a commit that referenced this pull request May 31, 2025

Reset data chunk id counter for import process (#2697)

3877a38

feeblefakie mentioned this pull request May 31, 2025

Backport to branch(3) : Reset data chunk id counter for import process #2716

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reset data chunk id counter for import process #2697

Reset data chunk id counter for import process #2697

Uh oh!

inv-jishnu commented May 26, 2025 •

edited by ypeckstadt

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI May 29, 2025

Uh oh!

Copilot AI May 29, 2025

Uh oh!

komamitsu May 29, 2025

Uh oh!

inv-jishnu May 29, 2025

Uh oh!

inv-jishnu commented May 29, 2025

Uh oh!

komamitsu left a comment

Uh oh!

ypeckstadt left a comment

Uh oh!

brfrn169 left a comment

Uh oh!

Torch3333 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Reset data chunk id counter for import process #2697

Reset data chunk id counter for import process #2697

Uh oh!

Conversation

inv-jishnu commented May 26, 2025 • edited by ypeckstadt Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related issues and/or PRs

Changes made

Checklist

Additional notes (optional)

Release notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI May 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI May 29, 2025

Choose a reason for hiding this comment

Uh oh!

komamitsu May 29, 2025

Choose a reason for hiding this comment

Uh oh!

inv-jishnu May 29, 2025

Choose a reason for hiding this comment

Uh oh!

inv-jishnu commented May 29, 2025

Uh oh!

komamitsu left a comment

Choose a reason for hiding this comment

Uh oh!

ypeckstadt left a comment

Choose a reason for hiding this comment

Uh oh!

brfrn169 left a comment

Choose a reason for hiding this comment

Uh oh!

Torch3333 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

inv-jishnu commented May 26, 2025 •

edited by ypeckstadt

Loading