Google Cloud Storage | CRC32c of large files. #25

Closed
pguedes17 opened this issue Oct 2, 2019 · 3 comments
Labels
api: storage Issues related to the googleapis/java-storage API. needs more info This issue needs more information from the customer to proceed. type: question Request for information or clarification. Not an issue.

Comments

@pguedes17

Hello,

I'm developing a service to upload files between 1GB and 2GB to Google Cloud Storage.

Is there a way to calculate the CRC32C hash while I'm writing the stream to the WriteChannel?

The problem is that I always need to calculate the hash of the whole file before creating the BlobInfo with the calculated hash.

Here is my current code.

Hash

Thanks everyone!

@JesseLovelace JesseLovelace removed their assignment Oct 18, 2019
@fazunenko

I think the way you upload files is the right way.
Because Blob and BlobInfo instances are immutable, you cannot modify the CRC32C value of an existing object. So, to enable data validation, you need to provide the CRC32C hash before you open a WriteChannel. A hash calculated after the WriteChannel is opened can't be used for validation.

To speed up the CRC32C calculation for your local files, I would recommend increasing the buffer size. 1 KB is very small. On my local machine, going from 1 KB to 128 KB reduced the hash calculation time from 50 seconds to 4. With a 1 MB buffer the calculation took 3 seconds. Experimenting with the size will let you find the best value for your setup.
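As a rough illustration of the buffer-size advice, here is a minimal sketch that streams data through a checksum with a caller-chosen buffer. It assumes Java 9 or later and uses the JDK's `java.util.zip.CRC32C` instead of the hasher from the original code; the class and method names (`Crc32cSketch`, `crc32c`, `toBase64`) are hypothetical. The base64 step assumes the usual Cloud Storage convention of encoding the checksum's four bytes in big-endian order.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.CRC32C;

class Crc32cSketch {
    // Hypothetical helper: streams the input through CRC32C with the given buffer size.
    static long crc32c(InputStream in, int bufferSize) throws IOException {
        CRC32C crc = new CRC32C();
        byte[] buffer = new byte[bufferSize];
        int read;
        while ((read = in.read(buffer)) != -1) {
            // Hash exactly the bytes returned by this read call.
            crc.update(buffer, 0, read);
        }
        return crc.getValue();
    }

    // Encodes the 32-bit checksum as base64 over its 4 big-endian bytes.
    static String toBase64(long crcValue) {
        byte[] bytes = ByteBuffer.allocate(4).putInt((int) crcValue).array();
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a large file; a real caller would open a file stream.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        long value = crc32c(new ByteArrayInputStream(data), 128 * 1024);
        System.out.println(Long.toHexString(value) + " " + toBase64(value));
    }
}
```

For a real file, the stream could come from `Files.newInputStream(path)`, and the resulting base64 string is the kind of value that `BlobInfo.Builder.setCrc32c` expects. Timing the call with a few buffer sizes (for example 1 KB, 128 KB, 1 MB) is an easy way to repeat the measurement described above on your own machine.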

I hope my answer will help.
Thanks

@fazunenko

I also noticed a bug in your CRC calculation code.

Instead of:
hasher = hasher.putBytes(ByteBuffer.wrap(buffer, 0, limit-1));

it should be:
hasher = hasher.putBytes(ByteBuffer.wrap(buffer, 0, limit));

The third argument of ByteBuffer.wrap is a length, not an end index, so passing limit-1 leaves the last byte of each chunk out of the hash.
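To make the off-by-one concrete, here is a small sketch that shows how many bytes each `ByteBuffer.wrap` call actually exposes. It is only an illustration: the class and method names are hypothetical, `limit` is assumed to be the number of bytes returned by the read call, and the JDK's `java.util.zip.CRC32C` (Java 9+) stands in for the hasher used in the original code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

class ChunkLengthSketch {
    // Hashes the first `length` bytes of `chunk` through a ByteBuffer view.
    static long crcOf(byte[] chunk, int length) {
        CRC32C crc = new CRC32C();
        crc.update(ByteBuffer.wrap(chunk, 0, length));
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] chunk = "123456789".getBytes(StandardCharsets.US_ASCII);
        int limit = chunk.length; // assumed: bytes returned by the read call

        // wrap(array, offset, length) exposes exactly `length` bytes.
        System.out.println(ByteBuffer.wrap(chunk, 0, limit).remaining());     // 9
        System.out.println(ByteBuffer.wrap(chunk, 0, limit - 1).remaining()); // 8

        // The two checksums cover different inputs, so they should not be compared
        // as if they described the same chunk.
        System.out.println(Long.toHexString(crcOf(chunk, limit)));
        System.out.println(Long.toHexString(crcOf(chunk, limit - 1)));
    }
}
```

Because the shorter buffer silently skips one byte per chunk, the final hash would describe different data from what is uploaded, and server-side validation would be expected to reject it.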

@athakor athakor transferred this issue from googleapis/google-cloud-java Jan 1, 2020
@dmitry-fa dmitry-fa assigned dmitry-fa and unassigned fazunenko Jan 15, 2020
@dmitry-fa
Contributor

@Tiozi I hope the question is answered. If you need more details, please provide more info.

6 participants