Skip to content

Conversation

@inv-jishnu
Copy link
Contributor

@inv-jishnu inv-jishnu commented May 26, 2025

Description

This PR introduces a change to reset the data chunk ID counter to 0 at the beginning of each import process.
The fix addresses the issue reported in DLT-16709.

In the API case, since the JVM is not restarted between import processes, the counter would previously continue from the last value, causing the data chunk IDs in a new import to not start from 0. This update ensures the counter is reset at the start of each import so that data chunk IDs always begin at 0.

Related issues and/or PRs

NA

Changes made

Set data chunk id counter to 0 at the beginning of import process.

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

NA

Release notes

NA

@inv-jishnu inv-jishnu self-assigned this May 26, 2025
@inv-jishnu inv-jishnu requested a review from a team as a code owner May 26, 2025 11:22
@inv-jishnu inv-jishnu marked this pull request as draft May 26, 2025 11:22
@Torch3333 Torch3333 removed the request for review from a team May 28, 2025 05:28
@ypeckstadt ypeckstadt marked this pull request as ready for review May 29, 2025 01:23
@ypeckstadt ypeckstadt requested review from a team, Torch3333, brfrn169, Copilot, feeblefakie, komamitsu and ypeckstadt and removed request for a team May 29, 2025 01:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses the issue [DLT-16709] by resetting the data chunk ID counter to 0 at the beginning of each import process so that new imports always start with data chunk ID 0. Key changes include updating three import processors (JSON, JSON Lines, and CSV) to perform the counter reset and adding corresponding tests to verify the behavior.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessorTest.java Added tests to verify that the counter is reset for both STORAGE and TRANSACTION modes for JSON Lines imports.
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonImportProcessorTest.java Added tests confirming the reset behavior for JSON imports across both modes.
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/CsvImportProcessorTest.java Introduced tests to check the reset of the counter for CSV imports in both STORAGE and TRANSACTION modes.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessor.java Modified the processing method to reset the data chunk counter prior to processing.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/JsonImportProcessor.java Incorporated counter reset logic at the start of processing.
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/CsvImportProcessor.java Added counter reset code at the beginning of the processing method.

Comment on lines 119 to 123
@Test
void test_importProcessWithStorage_runTwice_CheckDataChunkId() {
params =
ImportProcessorParams.builder()
.scalarDbMode(ScalarDbMode.STORAGE)
Copy link

Copilot AI May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The test cases that verify the data chunk ID reset are very similar across STORAGE and TRANSACTION modes. Consider consolidating these into parameterized tests to reduce repetition and improve maintainability.

Suggested change
@Test
void test_importProcessWithStorage_runTwice_CheckDataChunkId() {
params =
ImportProcessorParams.builder()
.scalarDbMode(ScalarDbMode.STORAGE)
@ParameterizedTest
@MethodSource("provideScalarDbModes")
void test_importProcess_runTwice_CheckDataChunkId(ScalarDbMode scalarDbMode) {
params =
ImportProcessorParams.builder()
.scalarDbMode(scalarDbMode)

Copilot uses AI. Check for mistakes.
// Since the JVM is not restarted between API calls (as in a web application’s API server),
// failing to reset the counter would cause the next import to continue from the previous data
// chunk ID.
dataChunkIdCounter.set(0);
Copy link

Copilot AI May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Similar counter reset logic is present in JsonImportProcessor and CsvImportProcessor. Consider refactoring this common behavior into a shared utility method or base class to promote consistency and simplify future updates.

Copilot uses AI. Check for mistakes.
// Since the JVM is not restarted between API calls (as in a web application’s API server),
// failing to reset the counter would cause the next import to continue from the previous data
// chunk ID.
dataChunkIdCounter.set(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the root problem is this counter is static. It's okay for one-shot CLI command, though. What if the API server receives multiple data-load requests simultaneously?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@komamitsu san,
You are right. It is better to remove static than the change I added. I have removed the changes and also removed static from dataChunkIdCounter object.

@inv-jishnu inv-jishnu requested a review from komamitsu May 29, 2025 06:48
@inv-jishnu
Copy link
Contributor Author

@komamitsu san,
I have made changes based on feedback.
Please take a look at this again when you get a chance.
Thank you.

Copy link
Contributor

@komamitsu komamitsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

Copy link
Contributor

@ypeckstadt ypeckstadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you.

Copy link
Collaborator

@brfrn169 brfrn169 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Copy link
Contributor

@Torch3333 Torch3333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@ypeckstadt ypeckstadt merged commit 7b62ad5 into master May 31, 2025
55 checks passed
@ypeckstadt ypeckstadt deleted the feat/data-loader/core-reset-data-chunk-id-counter branch May 31, 2025 04:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants