PUT_SYNC does not work with S3 #129
We experienced the same issue with operation.PUT.
This work was done a while ago, but I dropped the ball on getting it reviewed: #134. The work still needs to be rebased (a lot has moved since then) and tested.
We have brainstormed several potential solutions to this issue. The current plan is to implement #2 first, and subsequently #1, which will be the most efficient option. #4 is already hypothetically working thanks to acnewton's pull request from 2020 (#134), and will be the fallback option if #1 or #2 don't work out.

1. Multi-read from S3 -> Multi-write to iRODS
2. Single stream/read from S3 -> Multi-write to iRODS
3. Multi-read from S3 -> Single stream/write to iRODS
4. Single stream/read from S3 -> Single stream/write to iRODS
5. Download/Upload
6. Register -> Replicate -> Trim
Added the functionality to transfer the data from an object in S3 to an object in iRODS, rather than only registering the file in place, using the Minio library to do the transfer. It is also possible to append to a file by setting an offset. The md5 hash is calculated during streaming, and the final calculated hash is compared to the ETag header of the S3 object. Note that this ETag is not always the md5 checksum: when the file has been uploaded via multipart upload, it is not. A more general way of comparing checksums will be necessary.
- Added CLI option for the Amazon S3 multipart upload file chunk size used for calculating the checksum
- Changes based on PR review: cleaned up code and some error handling
- Reverted behavior for the null operation and changed wording for multipart
- Reworded the multipart option and added a TODO for no-op
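For the checksum comparison mentioned above: the ETag S3 reports for a multipart-uploaded object typically follows a well-known convention rather than being the plain MD5 of the data. The sketch below reproduces that convention (MD5 of the concatenated per-part MD5 digests, suffixed with the part count). Note this convention is widely observed but is not a guaranteed S3 API contract, so it is a heuristic, not a definitive verification method:

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute the ETag S3 typically reports for a multipart upload:
    MD5 of the concatenated per-part MD5 digests, plus '-<part count>'."""
    digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    return hashlib.md5(b"".join(digests)).hexdigest() + "-" + str(len(digests))
```

Given the chunk size used for the original upload (hence the CLI option), the locally computed multipart ETag can be compared against the header S3 returns.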
The multi-stream solution will be handled in the work for #207. If the initial solution is complete, I think this can be closed, @avkrishnamurthy.
What we want
Sync data from AWS S3 bucket to iRODS using the PUT_SYNC operation.
What we did
```shell
python -m irods_capability_automated_ingest.irods_sync start \
    --ignore_cache \
    --event_handler 'irods_capability_automated_ingest.examples.sync' \
    --synchronous --progress \
    --s3_keypair aws-s3-keypair \
    --s3_region_name eu-west-1 \
    --log_filename /home/irods/log/test.log \
    --log_level DEBUG \
    /bucket-name /tempZone/home/rods/target
```

Output:

```
[Elapsed Time: 0:00:01] |####################################################################################################################################################| (Time: 0:00:01)
count: 1 tasks: ------ failures: 1 retries: ------
(rodssync) 13:29 myhost:/home/irods
```
What we expected
Successful synchronization: file copied from S3 bucket to iRODS target collection.
What we got
An error in the log stating that the file cannot be found locally.
Possible solution
It seems that the functionality to first download/stream the data from S3 is missing from the upload_file() and sync_file() functions in irods_capability_automated_ingest/sync_irods.py.