
git: timeout when cloning a large git repo #6663

@RadionBik

Bug Report

Issue name

dvc.api.read: RuntimeError when reading a file from a large repo

Description

I get the following exception when trying to read a file from a large repo:

  File "/Users/radion/ANNA/anna-evidence-doc-classifier/dashboard/onboarding_model.py", line 120, in get_train_data
    dvc_buffer = dvc.api.read(
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/api.py", line 88, in read
    with open(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/api.py", line 75, in _open
    with Repo.open(repo, rev=rev, subrepos=True, uninitialized=True) as _repo:
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/external_repo.py", line 32, in external_repo
    path = _cached_clone(url, rev, for_write=for_write)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/external_repo.py", line 152, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev, for_write=for_write)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/funcy/flow.py", line 274, in wrap_with
    return call()
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/external_repo.py", line 216, in _clone_default_branch
    git = Git.clone(url, clone_path)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/scm/git/__init__.py", line 117, in clone
    backend.clone(url, to_path, **kwargs)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/dvc/scm/git/backend/gitpython.py", line 181, in clone
    tmp_repo = clone_from()
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/git/repo/base.py", line 1148, in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/git/repo/base.py", line 1078, in _clone
    handle_process_output(proc, None, to_progress_instance(progress).new_message_handler(),
  File "/Users/radion/ANNA/anna-evidence-doc-classifier/venv/lib/python3.8/site-packages/git/cmd.py", line 151, in handle_process_output
    raise RuntimeError(f"Thread join() timed out in cmd.handle_process_output(). Timeout={timeout} seconds")
RuntimeError: Thread join() timed out in cmd.handle_process_output(). Timeout=10.0 seconds

I have done the same thing before with a much smaller repo without any problems. It looks like the call exceeds the 10-second timeout during the cloning stage.
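To make the suspected failure mode concrete, here is a simplified, hypothetical model of the pattern the traceback points at: the child process's output is drained on a helper thread, and that thread is joined with a fixed timeout. `run_and_drain` is an illustrative assumption written with the standard library only; it is not GitPython's `handle_process_output`. A slow child process stands in for a long `git clone`: with a short timeout the join gives up and a `RuntimeError` is raised, while with no timeout the call simply waits for the child to finish.

```python
import subprocess
import sys
import threading


def run_and_drain(cmd, join_timeout):
    """Run cmd, read its stderr on a helper thread, then join that thread.

    Hypothetical, simplified model of the pattern named in the traceback;
    not GitPython's actual implementation.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    lines = []

    def pump():
        # Blocks until the child closes stderr, i.e. usually until it exits.
        for line in proc.stderr:
            lines.append(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    try:
        # A finite timeout caps how long we wait for the child's output to end.
        reader.join(timeout=join_timeout)
        if reader.is_alive():
            proc.kill()
            raise RuntimeError(
                f"Thread join() timed out. Timeout={join_timeout} seconds"
            )
        return lines
    finally:
        # Reap the child, let the reader hit EOF, then release the pipe.
        proc.wait()
        reader.join()
        proc.stderr.close()


# A child that takes ~1 s stands in for a slow clone of a large repo.
SLOW = [sys.executable, "-c", "import time; time.sleep(1)"]

try:
    run_and_drain(SLOW, join_timeout=0.2)
except RuntimeError as exc:
    print(exc)  # prints: Thread join() timed out. Timeout=0.2 seconds

# Without a timeout the same call waits for the child and succeeds.
print(run_and_drain(SLOW, join_timeout=None))
```

The model suggests why a small repo works and a large one does not: the same fixed join timeout is fine as long as the child finishes quickly enough.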

Is it possible to customize the timeout or avoid cloning the entire repo?
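One possible way to keep DVC from cloning the repo itself is to make a shallow clone yourself and pass the local path as `repo`, since `dvc.api.read` accepts a filesystem path as well as a URL. The sketch below assumes that `git` is on `PATH`, that the DVC remote is reachable with credentials already configured, and that DVC opens an existing local repo in place rather than cloning it. `shallow_clone_cmd` and `read_via_local_clone` are hypothetical helpers, not part of DVC's API.

```python
import subprocess


def shallow_clone_cmd(url, dest, rev=None):
    """Build a depth-1 `git clone` command (hypothetical helper).

    `rev` must be a branch or tag name: `git clone --branch` does not
    accept an arbitrary commit SHA.
    """
    cmd = ["git", "clone", "--depth", "1"]
    if rev is not None:
        cmd += ["--branch", rev]
    return cmd + [url, dest]


def read_via_local_clone(path, url, dest, rev=None):
    """Shallow-clone `url` into `dest`, then read `path` from that checkout.

    Assumes git is installed and the DVC remote is accessible; hypothetical
    helper, not part of dvc.api.
    """
    import dvc.api  # imported lazily so the command builder works without DVC

    subprocess.run(shallow_clone_cmd(url, dest, rev), check=True)
    # `repo` may be a local filesystem path; the checkout is already at `rev`,
    # so no `rev` is passed here and DVC reads from the workspace.
    return dvc.api.read(path, repo=dest)
```

A depth-1 clone transfers only the latest commit of the chosen branch or tag, which is all a workspace read needs. It does not change any timeout inside GitPython; it only moves the clone out of DVC's code path.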

Environment information

Output of dvc doctor:

DVC version: 2.7.3 (pip)
---------------------------------
Platform: Python 3.8.6 on macOS-10.16-x86_64-i386-64bit
Supports:
        gs (gcsfs = 2021.8.1),
        hdfs (pyarrow = 5.0.0),
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: gs
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git

Labels: A: api, awaiting response, bug