Description
DVC version: 0.81.3, Ubuntu 18.04 LTS, pip install, Python 3.7.4
When specifying a remote object on S3 as a cached output, an ETag mismatch error is sometimes raised during the caching stage. For example, observe the following debug output:
DEBUG: Removing s3://XXXXX/artifacts/test/train/models/model.01.ckpt.data-00000-of-00002
DEBUG: Created 'copy': s3://XXXXX/s3cache/ee/28f5ee86fc9aaca1aca65e64abde58 -> s3://XXXXX/artifacts/test/train/models/model.01.ckpt.data-00000-of-00002
DEBUG: cache 's3://XXXXX/s3cache/82/ed1912264aec805954939a0c84b5ff' expected '82ed1912264aec805954939a0c84b5ff' actual 'None'
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(429,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
ERROR: failed to run command - ETag mismatch detected when copying file to cache! (expected: '82ed1912264aec805954939a0c84b5ff', actual: '284b1e7130c3689baad589f9de093810-11')

In this example, s3://XXXXX/artifacts/test/train/models/model.01.ckpt.data-00000-of-00002 is 81.4 MB. When DVC attempts to cache the file, it produces a multipart copy with a mismatched ETag.
I believe this is because s3.copy performs a managed transfer that automatically switches to a multipart copy once the file exceeds the default multipart threshold (8 MB). A multipart object gets a composite ETag (the MD5 of the concatenated part MD5 digests, suffixed with the part count, hence the '-11' above) rather than the plain MD5 checksum that DVC expects:

Line 172 in 482473a:

s3.copy(source, to_info.bucket, to_info.path, ExtraArgs=extra_args)
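For reference, here is a minimal sketch (mine, not DVC code) of how the two ETag flavors are computed; the 8 MB part size is an assumption matching boto3's default multipart_chunksize:

# Sketch: a single-part object's ETag is the MD5 of the whole file,
# while a multipart object's ETag is the MD5 of the concatenated
# per-part MD5 digests, suffixed with the part count.
import hashlib

def plain_etag(path):
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            md5.update(chunk)
    return md5.hexdigest()

def multipart_etag(path, part_size=8 * 1024 * 1024):
    digests = []
    with open(path, "rb") as f:
        for part in iter(lambda: f.read(part_size), b""):
            digests.append(hashlib.md5(part).digest())
    return "{}-{}".format(hashlib.md5(b"".join(digests)).hexdigest(), len(digests))

With 8 MB parts, an 81.4 MB file is split into 11 parts, which matches the '-11' suffix in the ETag from the error above.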
I've patched the problem locally with the following change:
from boto3.s3.transfer import TransferConfig

s3.copy(
    source, to_info.bucket, to_info.path, ExtraArgs=extra_args,
    # Raise the multipart threshold to 1 GiB so files below that size
    # are copied in a single request and keep a plain-MD5 ETag.
    Config=TransferConfig(multipart_threshold=1024 ** 3),
)

There are probably more elegant ways to avoid a multipart copy. For instance, https://stackoverflow.com/a/38058798 suggests using put_object.
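For what it's worth, since this is a server-side copy rather than an upload, another option might be the low-level copy_object call, which always copies in a single request (it is limited to objects up to 5 GB) and so leaves the destination with a plain-MD5 ETag. A rough sketch against the boto3 client API; the bucket and key names are placeholders, not DVC's actual values:

# Hypothetical alternative: a single-request, server-side copy via the
# low-level client API. copy_object never splits the object into parts,
# so the destination keeps a plain-MD5 ETag (objects up to 5 GB only).
import boto3

client = boto3.client("s3")
client.copy_object(
    CopySource={"Bucket": "source-bucket", "Key": "path/to/cache/object"},
    Bucket="dest-bucket",
    Key="path/to/output/object",
)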