Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s3: fix for corrupted cache on multipart uploads #1867

Merged
merged 8 commits into from Apr 15, 2019

Conversation

4 participants
@olveirap
Copy link
Contributor

commented Apr 10, 2019

Added a check for when the external output of a script to s3 is multipart, and setting the multipart chunksize to the correct one to recover the same ETag.

Also set the multipart threshold to ContentLength + 1 of the object if it's single part to avoid the copy to do it in multiple parts.

Added an exception for when the output object's ETag doesn't match the one copied to the cache.

Patch for s3 in #1410. Multipart uploads of variable size not supported

Show resolved Hide resolved dvc/remote/s3.py Outdated
@pared

This comment has been minimized.

Copy link
Collaborator

commented Apr 11, 2019

I think some test would be appropriate for this one, but I guess we need test s3 instance. Am I right on this one @efiop ?

@efiop

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

@pared Indeed, we've agreed that I'll add a test for it, since our s3 testing is not transparent and requires access to our s3 account. I'll send one ASAP.

olveirap and others added some commits Apr 6, 2019

s3: fixed wrong etag when copying multipart objects
The etag of multipart objects depends of the number of parts, when copying to the cache we should do so in the same number of parts that the original object was moved/uploaded in.

Fixes part of #1410
@pared

pared approved these changes Apr 12, 2019

s3: use multipart copy to preserve etags
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
test: add tests for etag preservation on s3
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
@efiop

efiop approved these changes Apr 15, 2019

efiop added some commits Apr 15, 2019

test: requirements: use dev version moto
Specifically because of this spulec/moto#1941

Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
test: requirements: install dev moto without -e
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
test: stop using moto
Turned out to be quite buggy.

Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>

@efiop efiop merged commit 75b4d0e into iterative:master Apr 15, 2019

3 checks passed

Travis CI - Pull Request Build Passed
Details
codeclimate Approved by Ruslan Kuprieiev.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details

@efiop efiop self-assigned this Apr 15, 2019

@efiop efiop added this to In progress in Weekly tasks via automation Apr 15, 2019

@efiop

This comment has been minimized.

Copy link
Member

commented Apr 15, 2019

@olveirap Adjusted your patch to support multiparts with variable chunk size and added a test for that to our test suit. We will release this during this week. Thank you so much! 🙂

rpip added a commit to rpip/dvc that referenced this pull request Apr 16, 2019

Merge branch 'master' into 1864-display-locked-stages
* master: (45 commits)
  s3: fix for corrupted cache on multipart uploads (iterative#1867)
  Update PULL_REQUEST_TEMPLATE.md
  Create PULL_REQUEST_TEMPLATE.md
  Delete pull_request_template.md
  github: add PR template with a small checklist
  ignore: introduce .dvcignore, apply when searching for stage files
  add: dont use 'file' as var name during iteration
  test: move corrupted hardlink cache test test_repro -> test_run
  test_repro: make cache use hardlink
  cache: make reflink, copy default links
  diff: cli: add documentation link
  Update README.rst
  tests: be paranoic with caplog
  dvc: bump to 0.35.7
  dvc: deprecate --remove-outs
  logger: use exception instead of error
  version: show when the release version is dirty
  'till'=>'until' in `b_ref` arg. help text
  Adds space after period. (See previous commit.)
  Changes ',' for '.' in `-t` help text.
  ...

@efiop efiop moved this from In progress to Done in Weekly tasks Apr 16, 2019

@olveirap olveirap deleted the olveirap:fix-1410 branch Apr 16, 2019

@efiop

This comment has been minimized.

Copy link
Member

commented May 12, 2019

@olveirap Just a heads up: 0.40.0 with this fix is out. 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.