Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The whole metrics show fails if a single dir with metrics is missing #10018

Closed
shcheklein opened this issue Oct 12, 2023 · 2 comments · Fixed by #10375
Closed

The whole metrics show fails if a single dir with metrics is missing #10018

shcheklein opened this issue Oct 12, 2023 · 2 comments · Fixed by #10375
Assignees
Labels
A: experiments Related to dvc exp A: metrics Related to dvc metrics bug Did we break something?

Comments

@shcheklein
Copy link
Member

Take the recent example-get-started-experiments. Change in dvc.lock md5 for results/train directory that among other things has a metrics.json specified in it.

Run dvc metrics show --json:

WARNING: failed to load metrics in revision '', failed to load directory ('results', 'train')
DVC failed to load some metrics for following revisions: ''.
{
  "": {
    "error": {
      "type": "DataIndexDirError",
      "msg": "failed to load directory ('results', 'train')"
    }
  }
}

While there other metrics, and only one of them should be failing, not the whole command / collection.

It's throwing this exception:

2023-10-12 15:29:34,081 DEBUG: failed to load directory ('results', 'train'): https://remote.dvc.org/get-started-pools/files/md5/9e/33e5beef5ce777574de8a1f9aa6c71.dir: 404, message='Not Found', url=URL('https://s3-us-east-2.amazonaws.com/dvc-public/remote/get-started-pools/files/md5/9e/33e5beef5ce777574de8a1f9aa6c71.dir')
Traceback (most recent call last):
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/implementations/http.py", line 417, in _info
    await _file_info(
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/implementations/http.py", line 837, in _file_info
    r.raise_for_status()
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1005, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found', url=URL('https://s3-us-east-2.amazonaws.com/dvc-public/remote/get-started-pools/files/md5/9e/33e5beef5ce777574de8a1f9aa6c71.dir')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_data/index/index.py", line 552, in _load_from_storage
    _load_from_object_storage(trie, entry, storage)
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_data/index/index.py", line 488, in _load_from_object_storage
    obj = Tree.load(storage.odb, root_entry.hash_info, hash_name=storage.odb.hash_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_data/hashfile/tree.py", line 193, in load
    with obj.fs.open(obj.path, "r") as fobj:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 228, in open
    return self.fs.open(path, mode=mode, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/spec.py", line 1229, in open
    self.open(
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/spec.py", line 1241, in open
    f = self._open(
        ^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/implementations/http.py", line 356, in _open
    size = size or self.info(path, **kwargs)["size"]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/asyn.py", line 121, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/asyn.py", line 106, in sync
    raise return_result
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/asyn.py", line 61, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/implementations/http.py", line 430, in _info
    raise FileNotFoundError(url) from exc
FileNotFoundError: https://remote.dvc.org/get-started-pools/files/md5/9e/33e5beef5ce777574de8a1f9aa6c71.dir

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc/repo/experiments/collect.py", line 66, in collect_rev
    data = _collect_rev(
           ^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc/repo/experiments/collect.py", line 98, in _collect_rev
    return SerializableExp.from_repo(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc/repo/experiments/serialize.py", line 57, in from_repo
    params = _gather_params(repo, deps_only=param_deps, on_error="return")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc/repo/params/show.py", line 128, in _gather_params
    files_keypaths = _collect_params(
                     ^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc/repo/params/show.py", line 80, in _collect_params
    ret.update({file: _params for file in expand_paths(fs, [repo_path])})
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc/repo/params/show.py", line 80, in <dictcomp>
    ret.update({file: _params for file in expand_paths(fs, [repo_path])})
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc/utils/__init__.py", line 423, in expand_paths
    if fs.isdir(path):
       ^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 191, in isdir
    return self.fs.isdir(path)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/fsspec/spec.py", line 689, in isdir
    return self.info(path)["type"] == "directory"
           ^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc/fs/dvc.py", line 371, in info
    return self._info(key, path, ignore_subrepos=ignore_subrepos)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc/fs/dvc.py", line 381, in _info
    dvc_info = dvc_fs.fs.index.info(subkey)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_data/index/index.py", line 477, in info
    entry = self[key]
            ~~~~^^^^^
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_data/index/index.py", line 621, in __getitem__
    self._load(dir_key, dir_entry)
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_data/index/index.py", line 647, in _load
    self.onerror(entry, exc)
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_data/index/index.py", line 579, in _onerror
    raise exc
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_data/index/index.py", line 645, in _load
    _load_from_storage(self._trie, entry, self.storage_map[key])
  File "/Users/ivan/Projects/studio/.venv/lib/python3.11/site-packages/dvc_data/index/index.py", line 567, in _load_from_storage
    raise DataIndexDirError(f"failed to load directory {entry.key}") from last_exc
dvc_data.index.index.DataIndexDirError: failed to load directory ('results', 'train')

In Studio it's causing an unexpected error. In VS Code it also means we show the whole commit broken or even worse.

DVC Version

DVC version: 3.25.0 (pip)
-------------------------
Platform: Python 3.11.4 on macOS-13.3.1-arm64-arm-64bit
Subprojects:
	dvc_data = 2.18.1
	dvc_objects = 1.0.1
	dvc_render = 0.6.0
	dvc_task = 0.3.0
	scmrepo = 1.3.1
Supports:
	azure (adlfs = 2023.8.0, knack = 0.11.0, azure-identity = 1.14.0),
	gdrive (pydrive2 = 1.17.0),
	gs (gcsfs = 2023.6.0),
	http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2023.6.0, boto3 = 1.26.76),
	ssh (sshfs = 2023.7.0)
Config:
	Global: /Users/ivan/Library/Application Support/dvc
	System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/55ddcdf6b41fadae3bcb27bf424ab29b
@shcheklein shcheklein added bug Did we break something? A: experiments Related to dvc exp A: metrics Related to dvc metrics labels Oct 12, 2023
@shcheklein
Copy link
Member Author

Getting the same for the gdrive fixture (fixing customer's issue).

Screenshot 2023-10-15 at 9 48 34 AM Screenshot 2023-10-15 at 9 48 55 AM

@efiop efiop self-assigned this Oct 24, 2023
@dberenbaum dberenbaum added the p1-important Important, aka current backlog of things to do label Feb 6, 2024
@dberenbaum dberenbaum removed the p1-important Important, aka current backlog of things to do label Mar 5, 2024
@skshetry
Copy link
Member

This is kinda tricky to fix correctly in the DataFileSystem.

We know it's a directory due to .dir in the hash, so isdir should return True. But when you go to list it, it will fail since there is no actual .dir file.

The ideal fix here would be to try to load .dir file during isdir() calls, and raise FileNotFoundError if they are not found. But that might make everything slower, I assume.
That fix would be in dvc_data.index.info()/dvc_data.index.ls().

On dvc side, we do need to protect ourselves from random errors, so we need to be careful when expanding paths. We can never be sure what we'll get during path expansion, so guarding makes sense to me.

@skshetry skshetry self-assigned this Mar 27, 2024
skshetry added a commit to skshetry/dvc that referenced this issue Mar 27, 2024
Closes iterative#10018.

We have to be careful when expanding paths. This PR fall backs to assume it
as a file when either of `isdir` and `find` raises error due to some mishaps.
The consequence of this assumption happens when we try to read the file.
In that case, if the directory exists, and we happen to try to open it
as a file, dvc will raise `IsADirectoryError`, but that is handled correctly
and won't fail catastrophically like this did.
skshetry added a commit to skshetry/dvc that referenced this issue Mar 27, 2024
Closes iterative#10018.

We have to be careful when expanding paths. This PR fall backs to assume it
as a file when either of `isdir` and `find` raises error due to some mishaps.
The consequence of this assumption happens when we try to read the file.
In that case, if the directory exists, and we happen to try to open it
as a file, dvc will raise `IsADirectoryError`, but that is handled correctly
and won't fail catastrophically like this did.
skshetry added a commit to skshetry/dvc that referenced this issue Mar 27, 2024
Closes iterative#10018.

We have to be careful when expanding paths. This PR fall backs to assume it
as a file when either of `isdir` and `find` raises error due to some mishaps.
The consequence of this assumption happens when we try to read the file.
In that case, if the directory exists, and we happen to try to open it
as a file, dvc will raise `IsADirectoryError`, but that is handled correctly
and won't fail catastrophically like this did.
skshetry added a commit to skshetry/dvc that referenced this issue Mar 27, 2024
Closes iterative#10018.

We have to be careful when expanding paths. This PR fall backs to assume it
as a file when either of `isdir` and `find` raises error due to some mishaps.
The consequence of this assumption happens when we try to read the file.
In that case, if the directory exists, and we happen to try to open it
as a file, dvc will raise `IsADirectoryError`, but that is handled correctly
and won't fail catastrophically like this did.
skshetry added a commit to skshetry/dvc that referenced this issue Mar 27, 2024
Closes iterative#10018.

We have to be careful when expanding paths. This PR fall backs to assume it
as a file when either of `isdir` and `find` raises error due to some mishaps.
The consequence of this assumption happens when we try to read the file.
In that case, if the directory exists, and we happen to try to open it
as a file, dvc will raise `IsADirectoryError`, but that is handled correctly
and won't fail catastrophically like this did.
skshetry added a commit to skshetry/dvc that referenced this issue Mar 27, 2024
Closes iterative#10018.

We have to be careful when expanding paths. This PR fall backs to assume it
as a file when either of `isdir` and `find` raises error due to some mishaps.
The consequence of this assumption happens when we try to read the file.
In that case, if the directory exists, and we happen to try to open it
as a file, dvc will raise `IsADirectoryError`, but that is handled correctly
and won't fail catastrophically like this did.
skshetry added a commit to skshetry/dvc that referenced this issue Mar 27, 2024
Closes iterative#10018.

We have to be careful when expanding paths. This PR fall backs to assume it
as a file when either of `isdir` and `find` raises error due to some mishaps.
The consequence of this assumption happens when we try to read the file.
In that case, if the directory exists, and we happen to try to open it
as a file, dvc will raise `IsADirectoryError`, but that is handled correctly
and won't fail catastrophically like this did.
skshetry added a commit that referenced this issue Mar 28, 2024
params/metrics: assume file when dir expansion raises error

Closes #10018.

We have to be careful when expanding paths. This PR fall backs to assume it
as a file when either of `isdir` and `find` raises error due to some mishaps.
The consequence of this assumption happens when we try to read the file.
In that case, if the directory exists, and we happen to try to open it
as a file, dvc will raise `IsADirectoryError`, but that is handled correctly
and won't fail catastrophically like this did.
BradyJ27 pushed a commit to BradyJ27/dvc that referenced this issue Apr 22, 2024
params/metrics: assume file when dir expansion raises error

Closes iterative#10018.

We have to be careful when expanding paths. This PR fall backs to assume it
as a file when either of `isdir` and `find` raises error due to some mishaps.
The consequence of this assumption happens when we try to read the file.
In that case, if the directory exists, and we happen to try to open it
as a file, dvc will raise `IsADirectoryError`, but that is handled correctly
and won't fail catastrophically like this did.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: experiments Related to dvc exp A: metrics Related to dvc metrics bug Did we break something?
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants