Status: Closed
Labels: A: api, awaiting response, bug
Description
Bug Report
Issue name
dvc.api.read: RuntimeError when reading file from a large repo
Description
I am getting the following exception when trying to read a file from a large repo:
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/dashboard/onboarding_model.py", line 120, in get_train_data
    dvc_buffer = dvc.api.read(
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/api.py", line 88, in read
    with open(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/api.py", line 75, in _open
    with Repo.open(repo, rev=rev, subrepos=True, uninitialized=True) as _repo:
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/external_repo.py", line 32, in external_repo
    path = _cached_clone(url, rev, for_write=for_write)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/external_repo.py", line 152, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev, for_write=for_write)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/funcy/flow.py", line 274, in wrap_with
    return call()
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/external_repo.py", line 216, in _clone_default_branch
    git = Git.clone(url, clone_path)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/scm/git/__init__.py", line 117, in clone
    backend.clone(url, to_path, **kwargs)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/scm/git/backend/gitpython.py", line 181, in clone
    tmp_repo = clone_from()
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/git/repo/base.py", line 1148, in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/git/repo/base.py", line 1078, in _clone
    handle_process_output(proc, None, to_progress_instance(progress).new_message_handler(),
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/git/cmd.py", line 151, in handle_process_output
    raise RuntimeError(f"Thread join() timed out in cmd.handle_process_output(). Timeout={timeout} seconds")
RuntimeError: Thread join() timed out in cmd.handle_process_output(). Timeout=10.0 seconds
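The last frame hints at the mechanism: judging only from the error message, GitPython's `handle_process_output()` appears to read the `git clone` output on helper threads, call `join()` on them with a timeout, and raise if a thread is still running afterwards. The sketch below is a simplified illustration of that pattern under this assumption, not GitPython's actual code; `join_or_raise` and `slow_reader` are hypothetical names.

```python
import threading
import time


def join_or_raise(threads, timeout):
    """Join each thread, raising if it is still alive after `timeout` seconds.

    Hypothetical simplification of the check implied by the error message.
    """
    for t in threads:
        # join() returns after `timeout` seconds even if the thread is still running.
        t.join(timeout=timeout)
        if t.is_alive():
            raise RuntimeError(
                f"Thread join() timed out in cmd.handle_process_output(). Timeout={timeout} seconds"
            )


def slow_reader():
    # Stands in for a thread that is still consuming output from a long `git clone`.
    time.sleep(0.5)


if __name__ == "__main__":
    t = threading.Thread(target=slow_reader, daemon=True)
    t.start()
    try:
        join_or_raise([t], timeout=0.1)
    except RuntimeError as exc:
        print(exc)
```

With a small repo the clone finishes well within the timeout, so the threads are done before `join()` returns; with a large repo they may not be, which would match the behavior described here.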
I have done something similar before with a much smaller repo, and there were no problems. It looks like the 10-second timeout in GitPython's handle_process_output() is exceeded while the repo is being cloned.
Is it possible to customize the timeout or avoid cloning the entire repo?
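One possible workaround, sketched under assumptions rather than offered as an official DVC recipe: clone the repo yourself with a shallow `git clone --depth 1` (this runs the git CLI directly and fetches only the latest commit), then pass the local clone path to `dvc.api.read` through its `repo` argument. This assumes `git` is on `PATH`, that the data can be pulled from the repo's configured DVC remote (gs here) with your credentials, and that DVC opens an existing local repository in place instead of cloning it again. `shallow_clone_cmd` and `read_from_shallow_clone` are hypothetical helper names.

```python
import subprocess
import tempfile
from typing import List, Optional, Union


def shallow_clone_cmd(url: str, dest: str, rev: Optional[str] = None) -> List[str]:
    """Build a `git clone` command that fetches only the latest commit."""
    cmd = ["git", "clone", "--depth", "1"]
    if rev is not None:
        # --branch accepts a branch or tag name, not an arbitrary commit SHA.
        cmd += ["--branch", rev]
    cmd += [url, dest]
    return cmd


def read_from_shallow_clone(
    path: str, url: str, rev: Optional[str] = None, mode: str = "r"
) -> Union[str, bytes]:
    """Hypothetical helper: shallow-clone `url`, then read `path` via dvc.api.read."""
    import dvc.api  # imported lazily so shallow_clone_cmd works without DVC installed

    with tempfile.TemporaryDirectory() as tmp:
        # git clone accepts an existing directory as long as it is empty.
        subprocess.run(shallow_clone_cmd(url, tmp, rev), check=True)
        # Point DVC at the local clone; the revision is already checked out.
        return dvc.api.read(path, repo=tmp, mode=mode)
```

Because the clone lives in a temporary directory, it is deleted after every call; for repeated reads, cloning once and reusing the directory would avoid downloading the repo each time.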
Environment information
Output of dvc doctor:
DVC version: 2.7.3 (pip)
---------------------------------
Platform: Python 3.8.6 on macOS-10.16-x86_64-i386-64bit
Supports:
	gs (gcsfs = 2021.8.1),
	hdfs (pyarrow = 5.0.0),
	http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
	https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: gs
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git
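The `dvc doctor` output does not list the GitPython version, even though GitPython is the library raising the error in the traceback. A minimal sketch for collecting it alongside the DVC version, using the standard-library `importlib.metadata` (Python 3.8+); `package_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def package_versions(names: Iterable[str]) -> Dict[str, str]:
    """Return the installed version of each distribution, or 'not installed'."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


if __name__ == "__main__":
    for name, ver in package_versions(["dvc", "GitPython"]).items():
        print(f"{name}: {ver}")
```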
efiop