Add the import process implementation for data loader #2462
komamitsu
left a comment
Left a minor comment. But other than that, LGTM! 👍
@@ -0,0 +1,2 @@
transaction.batch.thread.pool.size=16
import.data.chunk.queue.size=256
\ No newline at end of file
[minor] A file should end with a newline https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
I have added a newline at the end of the file.
Thank you.
private ImportDataChunkStatus processDataChunkWithTransactions(
    ImportDataChunk dataChunk, int transactionBatchSize, int numCores) {
We no longer use numCores. Could you please remove it?
Instant startTime = Instant.now();
AtomicInteger successCount = new AtomicInteger(0);
AtomicInteger failureCount = new AtomicInteger(0);
ExecutorService recordExecutor = Executors.newFixedThreadPool(numCores);
We should not use the number of cores here for the same reason mentioned in #2462 (comment).
 * <p>This class reads properties from a {@code config.properties} file located in the classpath.
 */
public class ConfigUtil {
  public static final String CONFIG_PROPERTIES = "config.properties";
So, are we using a fixed configuration file? Could you please explain why we don’t use command-line arguments for the queue size and thread pool size?
Anyway, if we’re using a file for the configurations, I think we should pass the configuration file name via command-line arguments.
I think we should pass the configuration file name via command-line arguments.
👍
@brfrn169 san, @komamitsu san,
Thank you for the suggestions.
I have updated the code to use command-line arguments to configure the queue size (a new parameter) and the thread pool size. A parameter for the thread pool size already existed, but I had not used it because the initial data loader changes relied on the number of cores, so I had left that as it was. I have completely removed configuration through the properties file: asking users to add a properties file for just two parameters while the rest are passed as arguments seemed confusing. I have also removed numCores from the method arguments and now use the value directly, as instructed.
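To illustrate the approach discussed in this thread, here is a minimal sketch of reading both sizes from command-line arguments and using them to size a fixed thread pool and a bounded queue. It is not the code in this PR: the class `ImportOptionsSketch` and the option names `--thread-pool-size` and `--data-chunk-queue-size` are hypothetical, and the defaults simply reuse the values from the properties file shown earlier in the diff.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the class and option names are illustrative only.
final class ImportOptionsSketch {
  final int threadPoolSize;
  final int dataChunkQueueSize;

  ImportOptionsSketch(int threadPoolSize, int dataChunkQueueSize) {
    if (threadPoolSize <= 0 || dataChunkQueueSize <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    this.threadPoolSize = threadPoolSize;
    this.dataChunkQueueSize = dataChunkQueueSize;
  }

  // Parses "--name value" pairs; unknown options are rejected.
  static ImportOptionsSketch parse(String[] args) {
    if (args.length % 2 != 0) {
      throw new IllegalArgumentException("every option needs a value");
    }
    int threads = 16; // assumed default, taken from the removed properties file
    int queue = 256; // assumed default, taken from the removed properties file
    for (int i = 0; i < args.length; i += 2) {
      switch (args[i]) {
        case "--thread-pool-size":
          threads = Integer.parseInt(args[i + 1]);
          break;
        case "--data-chunk-queue-size":
          queue = Integer.parseInt(args[i + 1]);
          break;
        default:
          throw new IllegalArgumentException("Unknown option: " + args[i]);
      }
    }
    return new ImportOptionsSketch(threads, queue);
  }

  // The pool size comes from configuration, not from the number of cores.
  ExecutorService newRecordExecutor() {
    return Executors.newFixedThreadPool(threadPoolSize);
  }

  // A bounded queue blocks producers on put() once the configured capacity is reached.
  <T> BlockingQueue<T> newDataChunkQueue() {
    return new ArrayBlockingQueue<>(dataChunkQueueSize);
  }
}
```

With this shape, all tuning values arrive the same way as the other arguments, so users do not need a separate properties file.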
brfrn169
left a comment
LGTM! Thank you!
ypeckstadt
left a comment
LGTM. Thank you.
Torch3333
left a comment
LGTM, thank you!
Co-authored-by: Peckstadt Yves <peckstadt.yves@gmail.com>
Description
In this PR, I have added import processes that handle the import file based on its format, along with the related DTOs and utility files.
Related issues and/or PRs
Please review this PR once the PRs below have been reviewed and merged, and the master branch with those changes has been merged into this branch.
Some more information on data chunk and transaction size
The data chunk size and transaction size are introduced in the new changes. The data chunk size is used to split the input file into data chunks of the specified size. If the ScalarDB mode is transaction, the records in each data chunk are processed as transactions: the records are further split into groups based on the transaction size, and each group is processed as a single transaction.
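For illustration, the two-level split described above could be sketched as follows. This is a minimal sketch under simplifying assumptions, not the code in this PR: `ChunkSplitSketch`, `partition`, and `splitIntoChunksAndBatches` are hypothetical names, and the records are modeled as a plain in-memory list rather than being read from an import file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the PR: it only demonstrates the two-level split.
final class ChunkSplitSketch {

  // Splits a list into consecutive sublists of at most `size` elements.
  static <T> List<List<T>> partition(List<T> items, int size) {
    if (size <= 0) {
      throw new IllegalArgumentException("size must be positive");
    }
    List<List<T>> parts = new ArrayList<>();
    for (int start = 0; start < items.size(); start += size) {
      int end = Math.min(start + size, items.size());
      parts.add(new ArrayList<>(items.subList(start, end)));
    }
    return parts;
  }

  // Level 1: split the records into data chunks of `dataChunkSize`.
  // Level 2: split each data chunk into batches of `transactionBatchSize`;
  // each inner list stands for the records handled in one transaction.
  static <T> List<List<List<T>>> splitIntoChunksAndBatches(
      List<T> records, int dataChunkSize, int transactionBatchSize) {
    List<List<List<T>>> result = new ArrayList<>();
    for (List<T> chunk : partition(records, dataChunkSize)) {
      result.add(partition(chunk, transactionBatchSize));
    }
    return result;
  }
}
```

For example, seven records with a data chunk size of 5 and a transaction size of 2 yield two data chunks: the first holds three transaction batches (2, 2, and 1 records) and the second holds one batch of 2 records.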
Changes made
Added classes to process the import source file based on the file format, along with the related DTOs and utility classes.
Checklist
Additional notes (optional)
Road map for merging the remaining data loader core files. Current status:
General
Export
Import
Release notes
NA