Closed as not planned
Description
When calling dvc.api.open on a file that is present on the remote but not locally (nor in the cache), the call fails with the following exception if the provided path is absolute (it works fine if the path is relative to the DVC repository):
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /usr/local/lib/python3.9/site-packages/dvc/fs/data.py:125, in _DataFileSystem.info(self, path, **kwargs)
    124 try:
--> 125     outs = list(self.repo.index.tree.iteritems(key))  # noqa: B301
    126 except KeyError as exc:
File /usr/local/lib/python3.9/site-packages/dvc_data/objects/tree.py:106, in Tree.iteritems(self, prefix)
    104         self._load(key, meta, hash_info)
--> 106 for key, (meta, hash_info) in self._trie.iteritems(**kwargs):
    107     self._load(key, meta, hash_info)
File /usr/local/lib/python3.9/site-packages/pygtrie.py:718, in Trie.iteritems(self, prefix, shallow)
    678 """Yields all nodes with associated values with given prefix.
    679
    680 Only nodes with values are output.  For example::
   (...)
    716     KeyError: If ``prefix`` does not match any node.
    717 """
--> 718 node, _ = self._get_node(prefix)
    719 for path, value in node.iterate(list(self.__path_from_key(prefix)),
    720                                 shallow, self._iteritems):
File /usr/local/lib/python3.9/site-packages/pygtrie.py:630, in Trie._get_node(self, key)
    629 if node is None:
--> 630     raise KeyError(key)
    631 trace.append((step, node))
KeyError: ('local', '/', 'repo', 'path', 'to', 'target.ext')
The above exception was the direct cause of the following exception:
FileNotFoundError                         Traceback (most recent call last)
File /usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py:505, in Repo.open_by_relpath(self, path, remote, mode, encoding)
    504 try:
--> 505     with fs.open(
    506         fs_path,
    507         mode=mode,
    508         encoding=encoding,
    509         remote=remote,
    510     ) as fobj:
    511         yield fobj
File /usr/local/lib/python3.9/site-packages/dvc_objects/fs/base.py:191, in FileSystem.open(self, path, mode, **kwargs)
    190     kwargs.pop("encoding", None)
--> 191 return self.fs.open(path, mode=mode, **kwargs)
File /usr/local/lib/python3.9/site-packages/dvc/fs/data.py:88, in _DataFileSystem.open(self, path, mode, encoding, **kwargs)
     85 def open(  # type: ignore
     86     self, path: str, mode="r", encoding=None, **kwargs
     87 ):  # pylint: disable=arguments-renamed, arguments-differ
---> 88     fs, fspath = self._get_fs_path(path, **kwargs)
     89     return fs.open(fspath, mode=mode, encoding=encoding)
File /usr/local/lib/python3.9/site-packages/dvc/fs/data.py:65, in _DataFileSystem._get_fs_path(self, path, remote)
     63 from dvc.config import NoRemoteError
---> 65 info = self.info(path)
     66 if info["type"] == "directory":
File /usr/local/lib/python3.9/site-packages/dvc/fs/data.py:127, in _DataFileSystem.info(self, path, **kwargs)
    126 except KeyError as exc:
--> 127     raise FileNotFoundError from exc
    129 ret = {
    130     "type": "file",
    131     "size": 0,
   (...)
    135     "name": path,
    136 }
FileNotFoundError:
The above exception was the direct cause of the following exception:
FileMissingError                          Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 with open('/repo/path/to/target.ext') as file:
      2     print(file.read(10))
File /usr/local/lib/python3.9/contextlib.py:119, in _GeneratorContextManager.__enter__(self)
    117 del self.args, self.kwds, self.func
    118 try:
--> 119     return next(self.gen)
    120 except StopIteration:
    121     raise RuntimeError("generator didn't yield") from None
File /usr/local/lib/python3.9/site-packages/dvc/api/data.py:198, in _open(path, repo, rev, remote, mode, encoding)
    196 def _open(path, repo=None, rev=None, remote=None, mode="r", encoding=None):
    197     with Repo.open(repo, rev=rev, subrepos=True, uninitialized=True) as _repo:
--> 198         with _repo.open_by_relpath(
    199             path, remote=remote, mode=mode, encoding=encoding
    200         ) as fd:
    201             yield fd
File /usr/local/lib/python3.9/contextlib.py:119, in _GeneratorContextManager.__enter__(self)
    117 del self.args, self.kwds, self.func
    118 try:
--> 119     return next(self.gen)
    120 except StopIteration:
    121     raise RuntimeError("generator didn't yield") from None
File /usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py:513, in Repo.open_by_relpath(self, path, remote, mode, encoding)
    511         yield fobj
    512 except FileNotFoundError as exc:
--> 513     raise FileMissingError(path) from exc
    514 except IsADirectoryError as exc:
    515     raise DvcIsADirectoryError(f"'{path}' is a directory") from exc
FileMissingError: Can't find '/repo/path/to/target.ext' neither locally nor on remote
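The KeyError at the bottom of the chain suggests why only absolute paths fail: the lookup key is just the workspace name followed by the raw path components, so an absolute path carries `'/'` and `'repo'` components that a repo-relative index entry never has. The sketch below is a hypothetical stand-in (a plain dict instead of DVC's pygtrie-backed tree; `INDEX`, `index_key` and `lookup` are made-up names) that reproduces the same key shape, assuming the index stores entries under repo-relative keys, as the working relative-path case implies.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the repo index: entries keyed by
# ("local", *repo-relative path parts). Assumption, not DVC's real structure.
INDEX = {("local", "path", "to", "target.ext"): "md5-of-target"}


def index_key(path: str) -> tuple:
    # Workspace name followed by the path components, with no attempt to
    # strip the repository root; this yields the key seen in the traceback.
    return ("local", *PurePosixPath(path).parts)


def lookup(path: str) -> str:
    key = index_key(path)
    try:
        return INDEX[key]
    except KeyError as exc:
        # Same translation as _DataFileSystem.info: KeyError -> FileNotFoundError
        raise FileNotFoundError(path) from exc
```

With this model, `lookup("path/to/target.ext")` succeeds, while `lookup("/repo/path/to/target.ext")` builds the key `('local', '/', 'repo', 'path', 'to', 'target.ext')` and raises, mirroring the chain above.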
Reproduce
Create a repository (e.g. at /repo) with an S3 remote; add a file (/repo/path/to/target.ext) with some lorem ipsum content; track it with DVC; push it to the remote; delete it locally (including from the cache); then
from dvc.api import open
with open('/repo/path/to/target.ext') as file:
    pass
should raise, while
from dvc.api import open
with open('path/to/target.ext') as file:
    pass
should work as expected (assuming the working directory is /repo).
Expected
Relative paths should be interpreted relative to the current working directory, and absolute paths as the paths they describe.
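Until absolute paths are handled, a caller-side workaround is to normalize the path first. The sketch below implements the expected semantics described above: relative paths are resolved against the working directory, absolute paths are used as given, and the result is converted to the repo-relative form that already works. `to_repo_relpath` is a hypothetical helper, not part of `dvc.api`.

```python
import os
from pathlib import PurePath
from typing import Optional


def to_repo_relpath(path: str, repo_root: str, cwd: Optional[str] = None) -> str:
    """Hypothetical helper: map a user-supplied path to a repo-relative one.

    Relative paths are resolved against `cwd` (default: the process's
    current working directory); absolute paths are used as given.
    """
    cwd = os.getcwd() if cwd is None else cwd
    # os.path.join discards `cwd` when `path` is already absolute.
    abs_path = os.path.normpath(os.path.join(cwd, path))
    root = os.path.normpath(os.path.abspath(repo_root))
    rel = os.path.relpath(abs_path, root)
    # Reject paths that escape the repository (e.g. "../etc/passwd").
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{path!r} is outside the repository {repo_root!r}")
    # Use forward slashes, matching the relative form used in the repro.
    return PurePath(rel).as_posix()
```

Usage, presumably: `dvc.api.open(to_repo_relpath("/repo/path/to/target.ext", "/repo"), repo="/repo")`, which passes `path/to/target.ext` to DVC just as the working relative-path call does.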
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.13.0 (pip)
---------------------------------
Platform: Python 3.9.13 on Linux-5.10.104-linuxkit-x86_64-with-glibc2.31
Supports:
	webhdfs (fsspec = 2022.5.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.5.2),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.5.2),
	s3 (s3fs = 2022.5.0, boto3 = 1.21.21),
	ssh (sshfs = 2022.6.0)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: s3
Workspace directory: overlay on overlay
Repo: dvc (no_scm)