Description
Hello,
When fs.size() is called on fsspec.implementations.sftp.SFTPFileSystem with a fully qualified path (including sftp://servername), the protocol prefix is not stripped. The literal "sftp://servername/folder/file.csv" gets queried on the server, and the call fails with a file-not-found error.
The issue stems from SFTPFileSystem overriding info() here: https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/sftp.py#L95 which hides fsspec.spec.AbstractFileSystem's info() defined here: https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L671
As a result, the protocol-stripping logic here https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L688 is no longer called.
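
To illustrate the mechanism, here is a minimal, self-contained sketch. It does not use fsspec itself: `BaseFS`, `OverridingFS`, `_lookup`, and the in-memory file table are hypothetical stand-ins, assuming only that the base `info()` strips the protocol and the overriding `info()` does not.

```python
class BaseFS:
    """Toy stand-in for a base filesystem class (not fsspec's actual code)."""

    protocol = "sftp"

    @classmethod
    def _strip_protocol(cls, path):
        # Turn "sftp://host/folder/file.csv" into "/folder/file.csv".
        prefix = cls.protocol + "://"
        if path.startswith(prefix):
            _host, _sep, tail = path[len(prefix):].partition("/")
            return "/" + tail
        return path

    def _lookup(self, path):
        # Hypothetical backend: a single file known by its server-side path.
        files = {"/folder/file.csv": 42}
        if path not in files:
            raise FileNotFoundError(path)
        return {"name": path, "size": files[path]}

    def info(self, path):
        # The base implementation strips the protocol before querying.
        return self._lookup(self._strip_protocol(path))

    def size(self, path):
        # size() relies on info(), so it inherits whatever info() does.
        return self.info(path)["size"]


class OverridingFS(BaseFS):
    def info(self, path):
        # The override skips the stripping step, so the raw URL is queried.
        return self._lookup(path)


url = "sftp://servername/folder/file.csv"
print(BaseFS().size(url))  # the stripped path is found

try:
    OverridingFS().size(url)
except FileNotFoundError as exc:
    print("not found:", exc)
```

In this sketch, calling `self._strip_protocol(path)` at the top of the overriding `info()` would restore the base behaviour. Whether that is the right fix for SFTPFileSystem is for the maintainers to decide.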
We encountered this behaviour when using fsspec from within DuckDB, which always passes the fully qualified path in every call. We also believe that this will happen when exists(), checksum(), sizes(), isdir(), isfile(), ukey(), or stat() is called.
Is this behaviour expected? Are these filesystem calls not supposed to strip the protocol, leaving it to users to pass paths without it? If so, I would raise the issue with DuckDB instead.
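
If stripping turns out to be the caller's responsibility, a caller-side workaround could look roughly like the sketch below. `strip_sftp_prefix` is a hypothetical helper built on the standard library only; it is not part of fsspec or DuckDB.

```python
from urllib.parse import urlsplit


def strip_sftp_prefix(path: str) -> str:
    """Return the server-side path for an sftp:// URL; leave other input as-is.

    Caveat: urlsplit treats '?' and '#' as query/fragment delimiters, so file
    names containing those characters would need different handling.
    """
    parts = urlsplit(path)
    if parts.scheme == "sftp":
        return parts.path
    return path


print(strip_sftp_prefix("sftp://servername/folder/file.csv"))
print(strip_sftp_prefix("/folder/file.csv"))
```

The result could then be passed to fs.size() and the other calls listed above in place of the fully qualified path.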