Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use paramiko.SSHClient in SFTPDownloader (enables SSH agent authentication) #368

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

jstorrs
Copy link

@jstorrs jstorrs commented Jun 22, 2023

These are changes I made to SFTPDownloader in order to support using keys from the agent. This seems to work for me, but I'm new to both pooch and paramiko (and I'm a little overwhelmed by the documentation of both projects), so I'm not sure whether I broke anything or maybe this already worked and I just failed to figure out how to do this.

With this change the following is able to use a key that's loaded into my agent to authenticate the download:

import pooch

downloader = pooch.SFTPDownloader(username='user', allow_agent=True)

DATASET = pooch.create(
    path=pooch.os_cache("dataset"),
    base_url="sftp://hostname.domain/datasets/",
    downloader=downloader,
    registry = {
        "dataset_001.zip": "sha256:62faa9f0b06887acf71d67e5791c039c2009362acb800f8e31bed2cef835481f",
    },
)

DATASET.fetch("dataset_001.zip")

I think this works on *nix/MacOS systems. I'm unsure whether paramiko's SSHClient.load_system_host_keys() works on Windows.

@welcome
Copy link

welcome bot commented Jun 22, 2023

💖 Thank you for opening your first pull request in this repository! 💖

A few things to keep in mind:

No matter what, we are really grateful that you put in the effort to do this!

@jstorrs
Copy link
Author

jstorrs commented Jun 26, 2023

Removed the allow_agent flag. allow_agent=True is paramiko's default so migrating to using the SSHClient class is sufficient to enable SSH agent use. So the example now just looks like this:

import pooch

downloader = pooch.SFTPDownloader(username='user')

DATASET = pooch.create(
    path=pooch.os_cache("dataset"),
    base_url="sftp://hostname.domain/datasets/",
    downloader=downloader,
    registry = {
        "dataset_001.zip": "sha256:62faa9f0b06887acf71d67e5791c039c2009362acb800f8e31bed2cef835481f",
    },
)

DATASET.fetch("dataset_001.zip")

It would be nice if pooch understood URIs such as "sftp://user@hostname.domain:port/datasets/" (it was unexpected that it didn't "just work") but that is not strictly needed for this PR.

@jstorrs jstorrs marked this pull request as ready for review June 26, 2023 16:23
@jstorrs jstorrs changed the title Add allow_agent parameter to SFTPDownloader to support ssh-agent authentication Use paramiko.SSHClient in SFTPDownloader (enables SSH agent authentication) Jun 26, 2023
@leouieda
Copy link
Member

leouieda commented Oct 3, 2023

Hi @jstorrs thank you for the pull request! Sorry it was left for so long. I'm slowly working though my back log and will get to this as soon as I can.

@leouieda
Copy link
Member

@jstorrs had a look at this today and the changes seem reasonable but I'm not very familiar with SFTP (this was implemented by someone else). The tests seem to be failing with the use of the SSHClient because of a missing known_hosts entry: https://github.com/fatiando/pooch/actions/runs/6405819568/job/17902218195?pr=368#step:13:498

The Zenodo failures are being addressed in #373 so don't worry about those. Do you think you can get this working with the old tests? I have no idea how we could test the desired behavior (using the keys), though. Any thoughts on this?

@jstorrs
Copy link
Author

jstorrs commented Oct 20, 2023

Thanks! I understand that error and will figure out how to remedy it. I definitely have a known_hosts file setup and would not have even thought to delete it to test.

Nothing immediately comes to mind about how to test it, though. I'll see if I can find/think of anything.

@jstorrs
Copy link
Author

jstorrs commented Nov 22, 2023

Okay, I think I have started to trace this error down. Where I am at currently is that I can't figure out how to get both password and private key auth to work with paramiko's SSHClient. I think the problem you were describing is fixed by adding client.set_missing_host_key_policy(paramiko.AutoAddPolicy()) but in the interim I think I've run into bigger problems with the authentication flow in paramiko which makes password auth fail if an SSH agent is available.

To work on this I've extracted the code and the test into this script:

import paramiko

paramiko.util.log_to_file("paramiko.log")
    
print("Testing Transport")
connection = paramiko.Transport(sock=('test.rebex.net', 22))
connection.connect(username='demo', password='password')
sftp = paramiko.SFTPClient.from_transport(connection)
sftp.get('/pub/example/pocketftp.png', 'transport.png')
connection.close()

print("Testing SSHClient")
client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(
    hostname='test.rebex.net',
    port=22,
    username='demo',
    password='password',
)
sftp = client.open_sftp()
sftp.get_channel().settimeout = None
sftp.get('/pub/example/pocketftp.png', 'sshclient.png')
client.close()

What seems to happen here is that if an SSH agent is found then SSHClient will fail to authenticate (basically in the log publickey authentication fails but it doesn't continue on to try password authentication). Looking through paramiko's SSHClient the logic of connect() seems to not have that fallback.

It seems like the authentication logic has been a known to be problematic in paramiko for quite some time and there's recently a new way to define the authentication strategy using new AuthStrategy classes (https://docs.paramiko.org/en/3.3/api/auth.html). The documentation of this isn't particularly clear to me at this time. They basically point at fabric's experimental OpenSSHAuthStrategy. I haven't been able to work this part out yet so it's sort of stuck until I can understand better what's going on.

I think emulating OpenSSH's authentication strategy is a probably ideal, but it's also more than we need. But I'm not sure why that's all in fabric rather than paramiko... pooch doesn't depend on fabric and it seems like massive overkill to add fabric just to get publickey+password auth to coexist. There's got to be an easier way to first try publickey and fallback onto password auth... I have to be missing something.

Seems to be similar to paramiko/paramiko#391 see also paramiko/paramiko#387

@jstorrs jstorrs marked this pull request as draft November 22, 2023 22:14
@leouieda
Copy link
Member

Hi @jstorrs thanks for keeping us updated! Sounds like you went through quite the rabbit hole for this PR 🙂

You're right that we wouldn't want to pick up another optional dependency for this to work and I'd be very hesitant to host a lot of code related to this in Pooch itself, since that would mean extra burden to maintain (and something I know very little about).

No rush with this and if you do figure out a way, let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants