-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
import: still creates .dvc file even when operation fails #9785
import: still creates .dvc file even when operation fails #9785
Comments
See https://discord.com/channels/485586884165107732/485596304961962003/1135482055048826950 for more context. |
I think we do want to create a Creating the Right now the
on failed downloads. |
Related to #9482 |
Looks like this was introduced in 3aa9f51. Before that, it will fail like:
@farhanhubble We can create the |
@efiop So we need to suppress the errors for studio but seems like we are suppressing them for dvc cli in cases where we shouldn't? |
@dberenbaum so if I do a two step import:
|
No, sorry for the confusion @farhanhubble. Currently, it's working like this: dvc import --no-download # -> Creates a .dvc with empty outs
dvc update # -> Corrupts the .dvc outs on failed download The previous expected behavior was: dvc import --no-download # -> Creates a .dvc with empty outs
dvc update # -> Leaves .dvc untouched on failed download |
It's coming from here: Lines 60 to 64 in b235487
That throws an error but it gets suppressed ( Edit: I found that it's coming from here: Line 291 in b235487
Edit 2: Even if we remove that suppression, we still catch the exception here and fail to raise it 🤦 : Lines 367 to 369 in b235487
|
Looks like it's ultimately coming from iterative/dvc-data#409. Was this for studio @efiop? Can we handle these exceptions more carefully and raise them? |
@dberenbaum Yeah, that's what we've discussed today partially. I'll introduce a mechanism for that shortly. |
Let's add a test for this please. |
I've reverted for now so we don't have to wait, but will introduce a proper onerror mechanism to that later. |
The way I see it. Revert is also a fix of this issue (it's a detail how specifically we made a change), but we want to have a safeguard to avoid this in the future, especially when we implement the |
@shcheklein I wasn't replying to you, we just posted at the same time, sorry for the confusion. The test should likely be in datafs. I'll reopen this when handling iterative/dvc-data#413 if need be. |
Update: it's a bigger problem that's not really about error handling at all. We are corrupting the cache, which is probably at the root of a lot of the recent issues. $ dvc import git@github.com:dberenbaum/dataset2.git cats-dogs
Importing 'cats-dogs (git@github.com:dberenbaum/dataset2.git)' -> 'cats-dogs'
WARNING: 'cats-dogs' is empty.
$ cat cats-dogs.dvc
md5: 80b00298143632f1990d8979ea98a437
frozen: true
deps:
- path: cats-dogs
repo:
url: git@github.com:dberenbaum/dataset2.git
rev_lock: 00db5587e8b4ee2f78421523c7db7434727e4774
outs:
- md5: d751713988987e9331980363e24189ce.dir
size: 0
nfiles: 0
hash: md5
path: cats-dogs
$ cat .dvc/cache/files/md5/d7/51713988987e9331980363e24189ce.dir
[]% Edit: so we aren't getting any s3 errors here because dvc thinks it's an empty directory. |
I guess it's still related since the workflow is:
|
@dberenbaum Great catch! That indeed answers it. The error should mitigate that. I'm looking into solving it properly. |
This is actually the same underlying issue for #9651 and #9786 |
@daavoo Let's keep open for now to track the different use cases for testing and to double check that each scenario works as expected with the fix. One question I have so far is how we can handle caches that have already been corrupted in this way? |
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes iterative#413 Related iterative/dvc#9785
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes iterative#413 Related iterative/dvc#9785
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes iterative#413 Related iterative/dvc#9785
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes iterative#413 Related iterative/dvc#9785
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes iterative#413 Related iterative/dvc#9785
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes iterative#413 Related iterative/dvc#9785
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes iterative#413 Related iterative/dvc#9785
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes iterative#413 Related iterative/dvc#9785
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes iterative#413 Related iterative/dvc#9785
Also fixes legacy TreeError/DataIndexError mess and doesn't mark directories that we've failed to load as loaded in cache, preventing cache corruption. Kudos @dberenbaum for investigation. Note that behaviour in iterative/dvc#9785 is not fixed by this, because it is more related to how we treat this error in datafs, dvcfs and also how fsspec's `fs.walk()` ignores broken directories. So we will look into maybe raising `DataIndexDirError` instead of treating it as a broken directory and ignoring in `datafs/dvcfs.ls`. Fixes #413 Related iterative/dvc#9785
This isn't really corrupting cache. (DVC is not generating a .dir file that is supposed to contain files and then skipping files the in that .dir)
This is backwards. The issue is that we are getting s3 errors (in dvc-data), but those errors were being suppressed and dvc-data was just returning that the directory was empty (instead of re-raising the errors to DVC). The error handling has already been fixed in dvc-data by @efiop, and once the corresponding changes are made in DVC all of the import-related symptoms for this underlying issue should be resolved without the user needing to do anything. After the error handling is fixed, |
We were corrupting the "index cache" in |
Bug Report
Description
Even when
dvc import
fails, it still creates the.dvc
file.Reproduce
Without AWS sandbox credentials set, try
dvc import -v git@github.com:dberenbaum/dataset.git cats-dogs
. Acats-dogs.dvc
file is generated.Expected
No
cats-dogs.dvc
file to be created.The text was updated successfully, but these errors were encountered: