Skip to content

DVC Pull Ignores identityfile in the SSH config. #6508

@Erotemic

Description

@Erotemic

Bug Report

Potentially related to #1965 #6225 and this may also be a Paramiko bug based on what I've readl.

The underlying issue is that a dvc pull does not seem to support the identityfile in my $HOME/.ssh/config.

My ssh config has a entry that looks like this:

Host bigbox bigbox.kitware.com
    HostName bigbox.kitware.com
    Port 22
    User jon.crall
    identityfile ~/.ssh/id_kitware_ed25519
    ForwardX11 yes
    ForwardX11Trusted yes

My .dvc/cache looks like:

[cache]
    type = "reflink,symlink,hardlink,copy"
    shared = group
    protected = true
[core]
    remote = bigbox
    check_update = False
['remote "bigbox"']
    url = ssh://bigbox.kitware.com/data/dvc-caches/crall_dvc
    port = 22

And I'm able to ssh bigbox just fine. But when I attempted to run dvc pull -r bigbox I got a permission error.

Traceback (most recent call last):
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/dvc/objects/db/base.py", line 443, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 93, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/funcy/objects.py", line 50, in __get__
    return prop.__get__(instance, type)
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/dvc/fs/ssh.py", line 114, in fs
    return _SSHFileSystem(**self.fs_args)
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/fsspec/spec.py", line 75, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/sshfs/spec.py", line 61, in __init__
    self._client, self._pool = self.connect(
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/fsspec/asyn.py", line 88, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/fsspec/asyn.py", line 69, in sync
    raise result[0]
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/joncrall/.pyenv/versions/3.8.6/lib/python3.8/asyncio/tasks.py", line 491, in wait_for
    return fut.result()
  File "/home/joncrall/.pyenv/versions/3.8.6/envs/pyenv3.8.6/lib/python3.8/site-packages/sshfs/utils.py", line 29, in wrapper
    raise PermissionError(exc.reason) from exc
PermissionError: Permission denied

I ended up digging and printing out some info in the objects/db/base.py file and printed:

  0% Querying remote cache|                                                                                                                              |0/1 [00:00<?,     ?file/s]path_infos = [URLInfo: 'ssh://bigbox.kitware.com/data/dvc-caches/crall_dvc/97/12707710626642715101298436624963.dir']
path_info = URLInfo: 'ssh://bigbox.kitware.com/data/dvc-caches/crall_dvc/97/12707710626642715101298436624963.dir'
self.fs = <dvc.fs.ssh.SSHFileSystem object at 0x7fc71425b790>

I stepped into this function with a breakpoint and looked at self.fs.__dict__ and saw:

In [5]: self.__dict__
Out[5]: 
{'jobs': 128,
 'hash_jobs': 4,
 '_config': {'host': 'bigbox.kitware.com',
  'port': 22,
  'tmp_dir': '/media/joncrall/raid/home/joncrall/data/dvc-repos/crall_dvc/.dvc/tmp'},
 'fs_args': {'skip_instance_cache': True,
  'host': 'bigbox.kitware.com',
  'username': 'jon.crall',
  'port': 22,
  'password': None,
  'client_keys': ['/home/joncrall/.ssh/id_personal_ed25519'],
  'timeout': 1800,
  'encryption_algs': ['aes128-gcm@openssh.com',
   'aes256-ctr',
   'aes192-ctr',
   'aes128-ctr'],
  'compression_algs': None,
  'gss_auth': False,
  'agent_forwarding': True,
  'proxy_command': None,
  'known_hosts': None},
 'CAN_TRAVERSE': True}

The main issue being that client_keys was set to /home/joncrall/.ssh/id_personal_ed25519, which is incorrect based on the config.

I was able to work around this by adding my id_personal_ed25519.pub to authorized keys on bigbox, but this is not ideal.

It might be useful to add some debugging info when -v is passed to DVC that prints out some of this SSHFileSystem information, because if I saw this wrong identity file two weeks ago, it would have saved me a ton of time.

Metadata

Metadata

Assignees

Labels

awaiting responsewe are waiting for your reply, please respond! :)fs: sshRelated to the SSH filesystem

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions