Reset data chunk id counter for import process #2697
Pull Request Overview
This PR addresses the issue [DLT-16709] by resetting the data chunk ID counter to 0 at the beginning of each import process so that new imports always start with data chunk ID 0. Key changes include updating three import processors (JSON, JSON Lines, and CSV) to perform the counter reset and adding corresponding tests to verify the behavior.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessorTest.java | Added tests to verify that the counter is reset for both STORAGE and TRANSACTION modes for JSON Lines imports. |
| data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonImportProcessorTest.java | Added tests confirming the reset behavior for JSON imports across both modes. |
| data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/CsvImportProcessorTest.java | Introduced tests to check the reset of the counter for CSV imports in both STORAGE and TRANSACTION modes. |
| data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessor.java | Modified the processing method to reset the data chunk counter prior to processing. |
| data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/JsonImportProcessor.java | Incorporated counter reset logic at the start of processing. |
| data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/processor/CsvImportProcessor.java | Added counter reset code at the beginning of the processing method. |
```java
@Test
void test_importProcessWithStorage_runTwice_CheckDataChunkId() {
  params =
      ImportProcessorParams.builder()
          .scalarDbMode(ScalarDbMode.STORAGE)
```
Copilot (AI), May 29, 2025:
[nitpick] The test cases that verify the data chunk ID reset are very similar across STORAGE and TRANSACTION modes. Consider consolidating these into parameterized tests to reduce repetition and improve maintainability.
```diff
-@Test
-void test_importProcessWithStorage_runTwice_CheckDataChunkId() {
-  params =
-      ImportProcessorParams.builder()
-          .scalarDbMode(ScalarDbMode.STORAGE)
+@ParameterizedTest
+@MethodSource("provideScalarDbModes")
+void test_importProcess_runTwice_CheckDataChunkId(ScalarDbMode scalarDbMode) {
+  params =
+      ImportProcessorParams.builder()
+          .scalarDbMode(scalarDbMode)
```
```java
// Since the JVM is not restarted between API calls (as in a web application’s API server),
// failing to reset the counter would cause the next import to continue from the previous data
// chunk ID.
dataChunkIdCounter.set(0);
```
Copilot (AI), May 29, 2025:
[nitpick] Similar counter reset logic is present in JsonImportProcessor and CsvImportProcessor. Consider refactoring this common behavior into a shared utility method or base class to promote consistency and simplify future updates.
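A minimal sketch of that suggestion, assuming a template-method base class. The names `BaseImportProcessorSketch` and `FixedSizeChunkProcessor` and the fixed-size chunking rule are hypothetical, not the data loader's actual API. The counter stays `static` here only to mirror the code at this point in the review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical base class illustrating the refactoring; not ScalarDB code.
abstract class BaseImportProcessorSketch {
  // Static, mirroring the PR at this stage of the review.
  protected static final AtomicInteger dataChunkIdCounter = new AtomicInteger(0);

  // Template method: the reset is written once here instead of in every subclass.
  public final List<Integer> process(List<String> records) {
    dataChunkIdCounter.set(0);
    return assignChunkIds(records);
  }

  // Format-specific part (CSV, JSON, JSON Lines) supplied by subclasses.
  protected abstract List<Integer> assignChunkIds(List<String> records);
}

// Hypothetical subclass: groups every `chunkSize` records into one data chunk.
class FixedSizeChunkProcessor extends BaseImportProcessorSketch {
  private final int chunkSize;

  FixedSizeChunkProcessor(int chunkSize) {
    this.chunkSize = chunkSize;
  }

  @Override
  protected List<Integer> assignChunkIds(List<String> records) {
    List<Integer> ids = new ArrayList<>();
    // One chunk ID per group of up to chunkSize records.
    for (int start = 0; start < records.size(); start += chunkSize) {
      ids.add(dataChunkIdCounter.getAndIncrement());
    }
    return ids;
  }
}
```

With a chunk size of 2, a five-record import yields chunk IDs 0, 1, 2, and a following three-record import on the same processor yields 0, 1, because the base class resets the counter before delegating to the subclass.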
I guess the root problem is that this counter is static. That's okay for a one-shot CLI command, though. What if the API server receives multiple data-load requests simultaneously?
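The concern can be illustrated with a simulation of two imports sharing one static counter in the same JVM. `SharedCounterInterleaving` is hypothetical, and the steps run in a fixed order on a single thread rather than on real concurrent threads, so the result is reproducible; it assumes each import resets the counter at its start, as in the change under review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of two imports (A and B) served by the same JVM.
// Steps run in a fixed order on one thread so the outcome is reproducible.
class SharedCounterInterleaving {
  // Static: a single counter shared by every import in the JVM.
  static final AtomicInteger dataChunkIdCounter = new AtomicInteger(0);

  // Returns the chunk IDs seen by import A and import B, in that order.
  static List<List<Integer>> run() {
    List<Integer> importA = new ArrayList<>();
    List<Integer> importB = new ArrayList<>();

    dataChunkIdCounter.set(0);                         // A starts and resets the counter
    importA.add(dataChunkIdCounter.getAndIncrement()); // A gets 0
    importA.add(dataChunkIdCounter.getAndIncrement()); // A gets 1
    dataChunkIdCounter.set(0);                         // B starts and resets A's counter too
    importB.add(dataChunkIdCounter.getAndIncrement()); // B gets 0
    importA.add(dataChunkIdCounter.getAndIncrement()); // A gets 1 again: duplicate
    importB.add(dataChunkIdCounter.getAndIncrement()); // B gets 2: 1 is skipped

    return Arrays.asList(importA, importB);
  }
}
```

Import A ends up with IDs `[0, 1, 1]` (a duplicate) and import B with `[0, 2]` (a gap), so resetting a static counter at the start of each import does not make simultaneous imports safe.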
@komamitsu san,
You are right. Removing `static` is a better fix than the change I added, so I have reverted that change and removed `static` from the `dataChunkIdCounter` declaration.
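A minimal sketch of the revised approach, assuming one processor instance per import and an `AtomicInteger` counter; `InstanceCounterProcessor` and `nextDataChunkId` are hypothetical names. Because each instance owns its counter, a new import starts at 0 without an explicit reset, and two imports running at the same time cannot affect each other's IDs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical processor after the fix: the counter is an instance field,
// so it is created fresh with each processor and never shared between imports.
class InstanceCounterProcessor {
  private final AtomicInteger dataChunkIdCounter = new AtomicInteger(0);

  // Returns the next data chunk ID for this processor's import only.
  int nextDataChunkId() {
    return dataChunkIdCounter.getAndIncrement();
  }
}
```

With two instances `a` and `b` called in an interleaved order, `a` yields 0, 1, 2 and `b` yields 0, 1, with no duplicates or gaps within either import. Keeping `AtomicInteger` rather than a plain `int` still matters if a single import assigns chunk IDs from more than one thread.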
komamitsu left a comment:
LGTM! 👍
ypeckstadt left a comment:
LGTM. Thank you.
brfrn169 left a comment:
LGTM! Thank you!
Torch3333 left a comment:
LGTM, thank you!
Description
This PR introduces a change to reset the data chunk ID counter to 0 at the beginning of each import process.
The fix addresses the issue reported in DLT-16709.
When the data loader is used through an API, the JVM is not restarted between import processes, so the counter previously continued from its last value and the data chunk IDs of a new import did not start from 0. This update resets the counter at the start of each import so that data chunk IDs always begin at 0.
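A minimal sketch of the behavior described above, assuming the counter is a `static AtomicInteger` as in the code under review. `StaticCounterProcessor` and its `resetOnStart` flag are hypothetical and model only chunk ID assignment, not the real import processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of an import processor; only chunk ID assignment is modeled.
class StaticCounterProcessor {
  // Static: one counter shared by all instances for the lifetime of the JVM.
  private static final AtomicInteger dataChunkIdCounter = new AtomicInteger(0);

  private final boolean resetOnStart;

  StaticCounterProcessor(boolean resetOnStart) {
    this.resetOnStart = resetOnStart;
  }

  // Simulates one import that produces chunkCount data chunks and
  // returns the data chunk IDs assigned to them, in order.
  List<Integer> process(int chunkCount) {
    if (resetOnStart) {
      // The reset added by this PR: every import starts from chunk ID 0.
      dataChunkIdCounter.set(0);
    }
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < chunkCount; i++) {
      ids.add(dataChunkIdCounter.getAndIncrement());
    }
    return ids;
  }
}
```

Run twice in the same JVM without the reset, a three-chunk import followed by a two-chunk import gives the second import IDs 3 and 4; with the reset, the second import gets 0 and 1.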
Related issues and/or PRs
NA
Changes made
Set the data chunk ID counter to 0 at the beginning of each import process.
Checklist
Additional notes (optional)
NA
Release notes
NA