adding external data on S3 fails #3341
Comments
@shcheklein It is probably some kind of ACL issue; I don't think the problem is on the DVC side.
Closing for now: we've suggested an alternative approach, and there is not enough information to pinpoint the issue any further.
I missed some details; reopening.
Was able to reproduce the part with "Removing ...":
then:
cache after that:
When you run it a second time:
We need to check why |
I don't see a reasonable explanation for these problems so far; we would need much more information about the config
and
@shcheklein, @efiop, do we have more information regarding the original issue?
@skshetry That is pretty much all we have. If it doesn't make sense or isn't reproducible, feel free to close it.
There are two problems, @skshetry. I kept them together here in case they are related. I was able to reproduce one of them, as I mentioned (at least I don't see a reasonable explanation for why it is "Removing" stuff). This part:
might be p1 if it's confirmed. As for the failing HEAD operation and a few other problems, we definitely need to do more research and try similar scenarios. Is this part p1? I don't know; I would need to dig for an hour or two to say anything. If we go down that path, I would try versioned buckets first and contact the guy on Discord.
We move the file from storage to the cache (i.e. copy it and then remove it from storage), and then relink it from the cache back to storage. Lines 426 to 428 in 682275d
As this happens on You can see at the following frames (top - bottom depth, see last two lines):
It can also be verified with a quick Log for single file upload
Log for folder uploads
➜ dvc add s3://dvc-temp/saugat/storage/data -v
2020-03-13 15:52:45,052 DEBUG: PRAGMA user_version;
2020-03-13 15:52:45,052 DEBUG: fetched: [(3,)]
2020-03-13 15:52:45,053 DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
2020-03-13 15:52:45,053 DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
2020-03-13 15:52:45,053 DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
2020-03-13 15:52:45,054 DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
2020-03-13 15:52:45,054 DEBUG: PRAGMA user_version = 3;
2020-03-13 15:52:51,487 DEBUG: cache 's3://dvc-temp/saugat/tmp-1/cache/63/4763d85ae67510ffd8dfad284fa239.dir' expected '634763d85ae67510ffd8dfad284fa239.dir' actual 'None'
2020-03-13 15:52:51,488 WARNING: Output 's3://dvc-temp/saugat/storage/data' of 'data.dvc' changed because it is 'not in cache'
2020-03-13 15:53:04,810 DEBUG: Uploading '../../../../tmp/tmpj44owyz_' to 's3://dvc-temp/saugat/tmp-1/cache/.VPNaHQGJEdCKTyPxbbZ4gp.tmp'
2020-03-13 15:53:08,628 DEBUG: cache 's3://dvc-temp/saugat/tmp-1/cache/63/4763d85ae67510ffd8dfad284fa239.dir' expected '634763d85ae67510ffd8dfad284fa239.dir' actual 'None'
2020-03-13 15:53:12,151 DEBUG: Removing s3://dvc-temp/saugat/tmp-1/cache/.VPNaHQGJEdCKTyPxbbZ4gp.tmp
2020-03-13 15:53:13,452 DEBUG: {'s3://dvc-temp/saugat/storage/data': 'modified'}
2020-03-13 15:53:18,024 DEBUG: Uploading '../../../../tmp/tmp2i8sptd7' to 's3://dvc-temp/saugat/tmp-1/cache/.BhpWSzEfnS49dm9AcrjzDg.tmp'
2020-03-13 15:53:20,826 DEBUG: cache 's3://dvc-temp/saugat/tmp-1/cache/63/4763d85ae67510ffd8dfad284fa239.dir' expected '634763d85ae67510ffd8dfad284fa239.dir' actual '634763d85ae67510ffd8dfad284fa239'
2020-03-13 15:53:20,827 DEBUG: Computed stage 'data.dvc' md5: 'd09e4e34bb5c2ad7858adb8e2436bff2'
2020-03-13 15:53:21,186 DEBUG: Saving 's3://dvc-temp/saugat/storage/data' to 's3://dvc-temp/saugat/tmp-1/cache/63/4763d85ae67510ffd8dfad284fa239.dir'.
2020-03-13 15:53:24,127 DEBUG: cache 's3://dvc-temp/saugat/tmp-1/cache/d3/b07384d113edec49eaa6238ad5ff00' expected 'd3b07384d113edec49eaa6238ad5ff00' actual 'None'
2020-03-13 15:53:25,864 DEBUG: Removing s3://dvc-temp/saugat/storage/data/foo
2020-03-13 15:53:30,758 DEBUG: Created 'copy': s3://dvc-temp/saugat/tmp-1/cache/d3/b07384d113edec49eaa6238ad5ff00 -> s3://dvc-temp/saugat/storage/data/foo
2020-03-13 15:53:31,008 DEBUG: Saving information to 'data.dvc'.
100% Add|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|1/1 [00:45, 45.97s/file]
To track the changes with git, run:
git add data.dvc
2020-03-13 15:53:31,026 DEBUG: SELECT count from state_info WHERE rowid=?
2020-03-13 15:53:31,026 DEBUG: fetched: [(0,)]
2020-03-13 15:53:31,027 DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
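For readers skimming the log, the order of operations it shows (save to cache, "Removing" the original, then "Created 'copy'" back to the original path) can be modelled with a toy in-memory store. This is only an illustrative sketch, not DVC's actual code: `FakeStore`, `save_by_move`, and the keys used here are hypothetical names made up for the example.

```python
# Toy in-memory model of an object store. NOT DVC's real implementation;
# every name in this block is a hypothetical stand-in for illustration.
class FakeStore:
    def __init__(self):
        self.objects = {}  # key -> bytes
        self.log = []      # operations, in the order they were issued

    def copy(self, src, dst):
        self.objects[dst] = self.objects[src]
        self.log.append(("copy", src, dst))

    def remove(self, key):
        del self.objects[key]
        self.log.append(("remove", key))


def save_by_move(store, path, cache_path):
    """Move `path` into the cache, then re-create it by copying back."""
    store.copy(path, cache_path)  # 1. copy the data into the cache
    store.remove(path)            # 2. remove the original ("Removing ...")
    store.copy(cache_path, path)  # 3. link back from cache ("Created 'copy'")


store = FakeStore()
store.objects["storage/data/foo"] = b"example"
save_by_move(store, "storage/data/foo", "cache/d3/example-hash")

for op in store.log:
    print(op)
```

Between steps 2 and 3 the original key does not exist at all, so any failure or stale read in that window leaves the output missing, and the data is copied twice even when everything succeeds.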
Thanks @skshetry, great research!!
@skshetry thank you for nailing this down! I’d appreciate it if you could provide more details on a possible fix and its complexity.
Similar problem - https://discordapp.com/channels/485586884165107732/485596304961962003/733450578138759268 ... and we do an extra copy, which is not nice
@shcheklein Yes, I think eventual consistency is to blame for that too. Our S3 tests have been flaky for quite a while now (probably not these days as we migrated to Another thing to note is that the following is not guaranteed in S3 (emphasized), which I think we do quite often for external outputs:
If we can confirm that, then wrapping at least certain operations in @Retry should help. (And also remove that extra copy after all! :))
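As a rough illustration of the retry idea, here is a minimal, self-contained sketch of a retry decorator. It is an assumption about what such a wrapper could look like, not the decorator DVC actually uses: `retry`, `NotVisibleYet`, and `head_object` are hypothetical names, and the simulated failure merely stands in for a read that has not become visible yet.

```python
import functools
import time


def retry(attempts=3, delay=0.0, exceptions=(Exception,)):
    """Retry the wrapped call up to `attempts` times on `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay * attempt)  # simple linear backoff
        return wrapper
    return decorator


class NotVisibleYet(Exception):
    """Hypothetical error for an object that is not readable yet."""


calls = {"n": 0}


@retry(attempts=3, delay=0.0, exceptions=(NotVisibleYet,))
def head_object():
    # Simulated operation: fails twice, then succeeds on the third call.
    calls["n"] += 1
    if calls["n"] < 3:
        raise NotVisibleYet("object not visible yet")
    return {"ETag": "example"}


result = head_object()
print(result, "after", calls["n"], "calls")
```

Retrying only papers over a transient inconsistency window; it does nothing about the extra copy, which would still need to be removed separately.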
External outputs are no longer supported.
To keep track of the problem:
https://discordapp.com/channels/485586884165107732/485596304961962003/678284389230182430
more context here:
https://discordapp.com/channels/485586884165107732/485596304961962003/678261668723032075