Add the import process implementation for data loader #2462
Merged
130 commits
753618b
Util classes for data loader
inv-jishnu 8d39d02
Fix spotbug issue
inv-jishnu bf94c49
Removed error message and added core error
inv-jishnu 47be388
Applied spotless
inv-jishnu 913eb1c
Fixed unit test failures
inv-jishnu 1f204b8
Merge branch 'master' into feat/data-loader/utils
ypeckstadt 6cfa83a
Basic data import enum and exception
inv-jishnu d381b2b
Removed exception class for now
inv-jishnu 67f2474
Added DECIMAL_FORMAT
inv-jishnu 14e3593
Path util class updated
inv-jishnu a096d51
Feedback changes
inv-jishnu dbf1940
Merge branch 'master' into feat/data-loader/utils
ypeckstadt cd8add9
Merge branch 'master' into feat/data-loader/utils
ypeckstadt 52890c8
Changes
inv-jishnu 5114639
Merge branch 'master' into feat/data-loader/import-data-1
inv-jishnu 4f9cd75
Merge branch 'feat/data-loader/utils' into feat/data-loader/scaladb-dao
inv-jishnu 1997eb8
Added ScalarDB Dao
inv-jishnu 91e6310
Merge branch 'master' into feat/data-loader/scaladb-dao
inv-jishnu 8a7338b
Remove unnecessary files
inv-jishnu 2b52eeb
Initial commit [skip ci]
inv-jishnu e206073
Changes
inv-jishnu 26d3144
Changes
inv-jishnu b86487d
spotbugs exclude
inv-jishnu 818a2b4
spotbugs exclude -2
inv-jishnu 90c4105
Added a file [skip ci]
inv-jishnu 3d5d3e0
Added unit test files [skip ci]
inv-jishnu 6495202
Spotbug fixes
inv-jishnu 90abd9e
Removed use of List.of to fix CI error
inv-jishnu ba2b3dd
Merged changes from master after resolving conflict
inv-jishnu b1b811b
Merge branch 'master' into feat/data-loader/metadata-service
inv-jishnu 30db988
Applied spotless
inv-jishnu e9bb004
Added export options validator
inv-jishnu 03324e1
Minor change in test
inv-jishnu d6aaf85
Applied spotless on CoreError
inv-jishnu 4439dea
Make constructor private and improve javadocs
ypeckstadt ccb1ace
Improve javadocs
ypeckstadt a374f1a
Add private constructor to TableMetadataUtil
ypeckstadt a65c9b5
Apply spotless fix
ypeckstadt b3279ba
Fix the validation for partition and clustering keys
ypeckstadt 78a8170
Fix spotless format
ypeckstadt acedabe
Partial feedback changes
inv-jishnu bf31a01
Data chunk and task result enums and dtos
inv-jishnu 57cd330
Spotless applied
inv-jishnu 7a39564
Changes
inv-jishnu a95a858
Resolved conflicts and merged latest changes from main
inv-jishnu 093cb1d
Merge branch 'feat/data-loader/scaladb-dao' into feat/data-loader/imp…
inv-jishnu bfebd95
Merge branch 'feat/data-loader/metadata-service' into feat/data-loade…
inv-jishnu fd1c186
Control file files
inv-jishnu e2cc6ac
Added task files and dtos
inv-jishnu 8c75b79
Fix unit test failure
inv-jishnu 98618aa
Fix spot bugs failure
inv-jishnu c05286d
Merge branch 'feat/data-loader/scaladb-dao' into feat/data-loader/exp…
inv-jishnu 0d3f79e
Export tasks added
inv-jishnu 2365460
Merge branch 'feat/data-loader/metadata-service' into feat/data-loade…
inv-jishnu 95022a9
Initial commit [skip ci]
inv-jishnu be4583c
Added transaction batch dtos
inv-jishnu 89fea78
Added changes
inv-jishnu 29a8c25
Fix spot less issue
inv-jishnu 45adc95
Merge branch 'master' into feat/data-loader/export-tasks
inv-jishnu 5568a7b
Merge branch 'feat/data-loader/import-utils-dtos' into feat/data-load…
inv-jishnu 67dcb06
Merge branch 'master' into feat/data-loader/scaladb-dao
ypeckstadt 2b58dcb
Initial commit
inv-jishnu cebb543
Merge branch 'master' into feat/data-loader/control-file
ypeckstadt 8ecb39c
Changes -1
inv-jishnu c7ba6c8
Description added
inv-jishnu a566ef2
Code updated to support java 8
inv-jishnu f6c54ec
Updated test code to remove warning
inv-jishnu b92758c
Merged latest changes from main after resolving conflicts
inv-jishnu ee252d2
Added import manager
inv-jishnu 90c4830
Changes added
inv-jishnu 39c43de
Removed scalardb manager file
inv-jishnu 3fe30a3
Merge branch 'master' into feat/data-loader/scaladb-dao
inv-jishnu 4df4acd
Removed wildcard import
inv-jishnu 53cd523
Merge branch 'master' into feat/data-loader/scaladb-dao
inv-jishnu f4f253e
Changes
inv-jishnu 6d43bdc
Merge branch 'master' into feat/data-loader/scaladb-dao
inv-jishnu 9c4ae23
Resolved conflicts and merge latest changes from master
inv-jishnu c9d01cb
Added default case in switch to resolve spotbugs warning
inv-jishnu 50be8fd
Merge branch 'master' into feat/data-loader/import-process
inv-jishnu 9224c7b
Merge branch 'master' into feat/data-loader/import-task
inv-jishnu 5e61fd1
Merge branch 'master' into feat/data-loader/control-file
inv-jishnu f024670
Merge branch 'feat/data-loader/scaladb-dao' into feat/data-loader/con…
inv-jishnu d453e6c
Change wildcard imports
inv-jishnu 39ecefe
Merge changes from master after resolving conflicts
inv-jishnu 0984d51
Resolved conflicts and merged latest changes from main
inv-jishnu aadf3e1
Resolved conflicts and merged latest changes from master
inv-jishnu 6998b68
Changes
inv-jishnu 3c79ab6
Merged changes from master after resolving conflicts
inv-jishnu 3ff03d9
Merge branch 'master' into feat/data-loader/export-tasks
inv-jishnu f3fb8d8
Merge branch 'master' into feat/data-loader/control-file
inv-jishnu 6dd2ce2
Resolved conflicts and merged changes from branch feat/data-loader/co…
inv-jishnu 1996865
Reverted new line removal
inv-jishnu 8a1e6b9
Merge branch 'feat/data-loader/import-task' into feat/data-loader/imp…
inv-jishnu 3187847
Changes to util function calls
inv-jishnu da2e241
Merge export tasks branch after resolving conflicts
inv-jishnu 7d7ec91
Revert "Merge export tasks branch after resolving conflicts"
inv-jishnu 31094b1
Resolved conflicts and merged latest changes from main
inv-jishnu aebcef6
Removing unwanted changes [skip ci]
inv-jishnu b5134b1
Changes
inv-jishnu 285f51d
Java doc minor change [skip ci]
inv-jishnu 01ce7d3
Merged changes from feat/data-loader/import-task after resolving conf…
inv-jishnu 9cadea4
Constant file added
inv-jishnu 40fde36
Changes from master merged after resolving conflicts
inv-jishnu 0dd2956
Error messages and adding java docs
inv-jishnu 9d3ffb1
Resolved conflicts and merged latest changes from master
inv-jishnu 868d9b5
Column util correction
inv-jishnu e4cd7fe
Minor corrections
inv-jishnu d0a73a3
Changes
inv-jishnu bffa85b
gradle change reverted
inv-jishnu 328afe5
Resolved conflicts and merged changes from master
inv-jishnu adc7e56
Spotless applied
inv-jishnu 5b61876
Fixed unit test
inv-jishnu b9842be
Reverted try-catch changes
inv-jishnu 16ae46d
Optimizations
inv-jishnu 6b2536e
Error message changes and further optimizations
inv-jishnu 6aea83c
Improve javadocs for the data loader import process
ypeckstadt 851b691
Changes added
inv-jishnu c835730
Removed unused test util methods [skip ci]
inv-jishnu ff87a9a
Merge branch 'master' into feat/data-loader/import-process
inv-jishnu 8f7adc8
Fixed spotbugs test issues
inv-jishnu 3aff018
reader data updated [skip ci]
inv-jishnu 35a758f
Merge branch 'master' into feat/data-loader/import-process
inv-jishnu 24bfa37
Changes
inv-jishnu 05ac8ff
Merge branch 'master' into feat/data-loader/import-process
inv-jishnu d9f239c
Thread executor changes
inv-jishnu 723bd51
Changed few values to be configurable
inv-jishnu 450aaea
Added new line
inv-jishnu aeaa08f
reverted config utils and add CLI options
inv-jishnu 44bf503
Updated tests
inv-jishnu a5c0b91
Removed explicit passing of thread size and use it directly
inv-jishnu
61 changes: 61 additions & 0 deletions
...ader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/ImportEventListener.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| package com.scalar.db.dataloader.core.dataimport; | ||
|
|
||
| import com.scalar.db.dataloader.core.dataimport.datachunk.ImportDataChunkStatus; | ||
| import com.scalar.db.dataloader.core.dataimport.task.result.ImportTaskResult; | ||
| import com.scalar.db.dataloader.core.dataimport.transactionbatch.ImportTransactionBatchResult; | ||
| import com.scalar.db.dataloader.core.dataimport.transactionbatch.ImportTransactionBatchStatus; | ||
|
|
||
| /** | ||
| * Listener interface for monitoring import events during the data loading process. Implementations | ||
| * can use this to track progress and handle various stages of the import process. | ||
| */ | ||
| public interface ImportEventListener { | ||
|
|
||
| /** | ||
| * Called when processing of a data chunk begins. | ||
| * | ||
| * @param status the current status of the data chunk being processed | ||
| */ | ||
| void onDataChunkStarted(ImportDataChunkStatus status); | ||
|
|
||
| /** | ||
| * Updates or adds new status information for a data chunk. | ||
| * | ||
| * @param status the updated status information for the data chunk | ||
| */ | ||
| void addOrUpdateDataChunkStatus(ImportDataChunkStatus status); | ||
|
|
||
| /** | ||
| * Called when processing of a data chunk is completed. | ||
| * | ||
| * @param status the final status of the completed data chunk | ||
| */ | ||
| void onDataChunkCompleted(ImportDataChunkStatus status); | ||
|
|
||
| /** | ||
| * Called when all data chunks have been processed. This indicates that the entire chunked import | ||
| * process is complete. | ||
| */ | ||
| void onAllDataChunksCompleted(); | ||
|
|
||
| /** | ||
| * Called when processing of a transaction batch begins. | ||
| * | ||
| * @param batchStatus the initial status of the transaction batch | ||
| */ | ||
| void onTransactionBatchStarted(ImportTransactionBatchStatus batchStatus); | ||
|
|
||
| /** | ||
| * Called when processing of a transaction batch is completed. | ||
| * | ||
| * @param batchResult the result of the completed transaction batch | ||
| */ | ||
| void onTransactionBatchCompleted(ImportTransactionBatchResult batchResult); | ||
|
|
||
| /** | ||
| * Called when an import task is completed. | ||
| * | ||
| * @param taskResult the result of the completed import task | ||
| */ | ||
| void onTaskComplete(ImportTaskResult taskResult); | ||
| } |
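As a rough illustration of how a consumer might use these callbacks, the sketch below counts chunk events. It is not code from this PR: `ChunkStatus`, `ChunkEventListener`, `CountingListener` and `ListenerDemo` are hypothetical stand-ins invented so the example compiles on its own, and only the chunk-related callbacks of `ImportEventListener` are modelled. The use of atomics assumes that callbacks may arrive from more than one thread, which the diff does not state explicitly.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Small demo entry point showing one plausible call order.
final class ListenerDemo {
  public static void main(String[] args) {
    CountingListener listener = new CountingListener();
    listener.onDataChunkStarted(new ChunkStatus(0));
    listener.onDataChunkCompleted(new ChunkStatus(0));
    listener.onAllDataChunksCompleted();
    System.out.println(
        listener.completedCount() + " of " + listener.startedCount() + " chunks completed");
  }
}

// Hypothetical stand-in for ImportDataChunkStatus; only an id is modelled.
final class ChunkStatus {
  final int dataChunkId;

  ChunkStatus(int dataChunkId) {
    this.dataChunkId = dataChunkId;
  }
}

// Trimmed-down stand-in for ImportEventListener (chunk callbacks only).
interface ChunkEventListener {
  void onDataChunkStarted(ChunkStatus status);

  void onDataChunkCompleted(ChunkStatus status);

  void onAllDataChunksCompleted();
}

// Counts chunk events. Atomics and a volatile flag are used on the assumption
// that callbacks can be invoked from several worker threads.
final class CountingListener implements ChunkEventListener {
  private final AtomicInteger started = new AtomicInteger();
  private final AtomicInteger completed = new AtomicInteger();
  private volatile boolean finished;

  @Override
  public void onDataChunkStarted(ChunkStatus status) {
    started.incrementAndGet();
  }

  @Override
  public void onDataChunkCompleted(ChunkStatus status) {
    completed.incrementAndGet();
  }

  @Override
  public void onAllDataChunksCompleted() {
    finished = true;
  }

  int startedCount() {
    return started.get();
  }

  int completedCount() {
    return completed.get();
  }

  boolean isFinished() {
    return finished;
  }
}
```

A listener of this shape could, for example, back a progress indicator in a CLI; in the real code it would implement every method of `ImportEventListener` and be registered through `ImportManager.addListener`.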
183 changes: 183 additions & 0 deletions
data-loader/core/src/main/java/com/scalar/db/dataloader/core/dataimport/ImportManager.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,183 @@ | ||
| package com.scalar.db.dataloader.core.dataimport; | ||
|
|
||
| import com.scalar.db.api.DistributedStorage; | ||
| import com.scalar.db.api.DistributedTransactionManager; | ||
| import com.scalar.db.api.TableMetadata; | ||
| import com.scalar.db.dataloader.core.ScalarDBMode; | ||
| import com.scalar.db.dataloader.core.dataimport.dao.ScalarDBDao; | ||
| import com.scalar.db.dataloader.core.dataimport.datachunk.ImportDataChunkStatus; | ||
| import com.scalar.db.dataloader.core.dataimport.processor.ImportProcessor; | ||
| import com.scalar.db.dataloader.core.dataimport.processor.ImportProcessorFactory; | ||
| import com.scalar.db.dataloader.core.dataimport.processor.ImportProcessorParams; | ||
| import com.scalar.db.dataloader.core.dataimport.processor.TableColumnDataTypes; | ||
| import com.scalar.db.dataloader.core.dataimport.task.result.ImportTaskResult; | ||
| import com.scalar.db.dataloader.core.dataimport.transactionbatch.ImportTransactionBatchResult; | ||
| import com.scalar.db.dataloader.core.dataimport.transactionbatch.ImportTransactionBatchStatus; | ||
| import java.io.BufferedReader; | ||
| import java.util.ArrayList; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import java.util.concurrent.ConcurrentHashMap; | ||
| import lombok.AllArgsConstructor; | ||
| import lombok.NonNull; | ||
|
|
||
| /** | ||
| * Manages the data import process and coordinates event handling between the import processor and | ||
| * listeners. This class implements {@link ImportEventListener} to receive events from the processor | ||
| * and relay them to registered listeners. | ||
| * | ||
| * <p>The import process involves: | ||
| * | ||
| * <ul> | ||
| * <li>Reading data from an input file | ||
| * <li>Processing the data in configurable chunk sizes | ||
| * <li>Managing database transactions in batches | ||
| * <li>Notifying listeners of various import events | ||
| * </ul> | ||
| */ | ||
| @AllArgsConstructor | ||
| public class ImportManager implements ImportEventListener { | ||
|
|
||
| @NonNull private final Map<String, TableMetadata> tableMetadata; | ||
| @NonNull private final BufferedReader importFileReader; | ||
| @NonNull private final ImportOptions importOptions; | ||
| private final ImportProcessorFactory importProcessorFactory; | ||
| private final List<ImportEventListener> listeners = new ArrayList<>(); | ||
| private final ScalarDBMode scalarDBMode; | ||
| private final DistributedStorage distributedStorage; | ||
| private final DistributedTransactionManager distributedTransactionManager; | ||
| private final ConcurrentHashMap<Integer, ImportDataChunkStatus> importDataChunkStatusMap = | ||
| new ConcurrentHashMap<>(); | ||
|
|
||
| /** | ||
| * Starts the import process using the configured parameters. | ||
| * | ||
| * <p>If the data chunk size in {@link ImportOptions} is set to 0, the entire file will be | ||
| * processed as a single chunk. Otherwise, the file will be processed in chunks of the specified | ||
| * size. | ||
| * | ||
| * @return a map of {@link ImportDataChunkStatus} objects containing the status of each processed | ||
| * chunk | ||
| */ | ||
| public ConcurrentHashMap<Integer, ImportDataChunkStatus> startImport() { | ||
| ImportProcessorParams params = | ||
| ImportProcessorParams.builder() | ||
| .scalarDBMode(scalarDBMode) | ||
| .importOptions(importOptions) | ||
| .tableMetadataByTableName(tableMetadata) | ||
| .dao(new ScalarDBDao()) | ||
| .distributedTransactionManager(distributedTransactionManager) | ||
| .distributedStorage(distributedStorage) | ||
| .tableColumnDataTypes(getTableColumnDataTypes()) | ||
| .build(); | ||
| ImportProcessor processor = importProcessorFactory.createImportProcessor(params); | ||
| processor.addListener(this); | ||
| // If the data chunk size is 0, then process the entire file in a single data chunk | ||
| int dataChunkSize = | ||
| importOptions.getDataChunkSize() == 0 | ||
| ? Integer.MAX_VALUE | ||
| : importOptions.getDataChunkSize(); | ||
| return processor.process( | ||
| dataChunkSize, importOptions.getTransactionBatchSize(), importFileReader); | ||
| } | ||
|
|
||
| /** | ||
| * Registers a new listener to receive import events. | ||
| * | ||
| * @param listener the listener to add | ||
| * @throws IllegalArgumentException if the listener is null | ||
| */ | ||
| public void addListener(ImportEventListener listener) { | ||
| listeners.add(listener); | ||
| } | ||
|
|
||
| /** | ||
| * Removes a previously registered listener. | ||
| * | ||
| * @param listener the listener to remove | ||
| */ | ||
| public void removeListener(ImportEventListener listener) { | ||
| listeners.remove(listener); | ||
| } | ||
|
|
||
| /** {@inheritDoc} Forwards the event to all registered listeners. */ | ||
| @Override | ||
| public void onDataChunkStarted(ImportDataChunkStatus status) { | ||
| for (ImportEventListener listener : listeners) { | ||
| listener.onDataChunkStarted(status); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * {@inheritDoc} Updates or adds the status of a data chunk in the status map. This method is | ||
| * thread-safe. | ||
| */ | ||
| @Override | ||
| public void addOrUpdateDataChunkStatus(ImportDataChunkStatus status) { | ||
| importDataChunkStatusMap.put(status.getDataChunkId(), status); | ||
| } | ||
|
|
||
| /** {@inheritDoc} Forwards the event to all registered listeners. */ | ||
| @Override | ||
| public void onDataChunkCompleted(ImportDataChunkStatus status) { | ||
| for (ImportEventListener listener : listeners) { | ||
| listener.onDataChunkCompleted(status); | ||
| } | ||
| } | ||
|
|
||
| /** {@inheritDoc} Forwards the event to all registered listeners. */ | ||
| @Override | ||
| public void onTransactionBatchStarted(ImportTransactionBatchStatus status) { | ||
| for (ImportEventListener listener : listeners) { | ||
| listener.onTransactionBatchStarted(status); | ||
| } | ||
| } | ||
|
|
||
| /** {@inheritDoc} Forwards the event to all registered listeners. */ | ||
| @Override | ||
| public void onTransactionBatchCompleted(ImportTransactionBatchResult batchResult) { | ||
| for (ImportEventListener listener : listeners) { | ||
| listener.onTransactionBatchCompleted(batchResult); | ||
| } | ||
| } | ||
|
|
||
| /** {@inheritDoc} Forwards the event to all registered listeners. */ | ||
| @Override | ||
| public void onTaskComplete(ImportTaskResult taskResult) { | ||
| for (ImportEventListener listener : listeners) { | ||
| listener.onTaskComplete(taskResult); | ||
| } | ||
| } | ||
|
|
||
| /** {@inheritDoc} Forwards the event to all registered listeners. */ | ||
| @Override | ||
| public void onAllDataChunksCompleted() { | ||
| for (ImportEventListener listener : listeners) { | ||
| listener.onAllDataChunksCompleted(); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Returns the current map of import data chunk status objects. | ||
| * | ||
| * @return a map of {@link ImportDataChunkStatus} objects | ||
| */ | ||
| public ConcurrentHashMap<Integer, ImportDataChunkStatus> getImportDataChunkStatus() { | ||
| return importDataChunkStatusMap; | ||
| } | ||
|
|
||
| /** | ||
| * Creates and returns a mapping of table column data types from the table metadata. | ||
| * | ||
| * @return a {@link TableColumnDataTypes} object containing the column data types for all tables | ||
| */ | ||
| public TableColumnDataTypes getTableColumnDataTypes() { | ||
| TableColumnDataTypes tableColumnDataTypes = new TableColumnDataTypes(); | ||
| tableMetadata.forEach( | ||
| (name, metadata) -> | ||
| metadata | ||
| .getColumnDataTypes() | ||
| .forEach((k, v) -> tableColumnDataTypes.addColumnDataType(name, k, v))); | ||
| return tableColumnDataTypes; | ||
| } | ||
| } | ||
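The behaviour of `ImportManager` shown above can be condensed into a small standalone sketch: a chunk size of 0 is mapped to `Integer.MAX_VALUE`, events are relayed to every registered listener, and chunk statuses are kept in a concurrent map keyed by chunk id. The names `RelaySketch`, `ChunkListener` and `effectiveChunkSize`, and the use of a plain `String` as the status value, are assumptions made for illustration and do not appear in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal listener; the real ImportEventListener has more callbacks.
interface ChunkListener {
  void onDataChunkCompleted(int dataChunkId);
}

// Simplified model of the relay role played by ImportManager.
final class RelaySketch implements ChunkListener {
  private final List<ChunkListener> listeners = new ArrayList<>();
  private final Map<Integer, String> statusByChunkId = new ConcurrentHashMap<>();

  // Mirrors the rule in startImport(): 0 means "process the whole file as one chunk".
  static int effectiveChunkSize(int configuredSize) {
    return configuredSize == 0 ? Integer.MAX_VALUE : configuredSize;
  }

  void addListener(ChunkListener listener) {
    listeners.add(listener);
  }

  void removeListener(ChunkListener listener) {
    listeners.remove(listener);
  }

  // The latest status for a given chunk id replaces any earlier one.
  void addOrUpdateStatus(int dataChunkId, String status) {
    statusByChunkId.put(dataChunkId, status);
  }

  // Forwards the event to every registered listener, in registration order.
  @Override
  public void onDataChunkCompleted(int dataChunkId) {
    for (ChunkListener listener : listeners) {
      listener.onDataChunkCompleted(dataChunkId);
    }
  }

  Map<Integer, String> statuses() {
    return statusByChunkId;
  }
}
```

Note that the listener list in this sketch, like the one in the diff, is a plain `ArrayList`. If listeners were added or removed while events are being relayed from another thread, a concurrent collection such as `CopyOnWriteArrayList` might be a safer choice; whether that situation arises in the data loader is not something the diff settles.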
It looks like I missed this in the previous PR review, but we should rename this to `ScalarDbMode`. We should also rename `ScalarDBDao` and `ScalarDBDaoException` accordingly. Could you please handle this in a separate PR?
I will create another PR for the changes mentioned.
Thank you.
@brfrn169 san,
I have created PR #2582 to address this. I will mark it ready for review once this PR is merged.
Thank you.