Add ctime/mtime to list of expected values in info #526
Comments
Marked as "good first issue" because this should be simple per implementation, but there are quite a few implementations to go through.
A list of filesystems and their info keys
I collected some information about the keys each filesystem returns in its info dict.
AbstractFileSystem: "name", "size", "type" (filesystem_spec/fsspec/spec.py Lines 669 to 670 in 2a8e0ee)
arrow: "name", "size", "type", "mtime" (datetime | float | None) (filesystem_spec/fsspec/implementations/arrow.py Lines 101 to 118 in 2a8e0ee; https://arrow.apache.org/docs/python/generated/pyarrow.fs.FileInfo.html#pyarrow.fs.FileInfo)
dask: returns whatever the remote fs returns. (filesystem_spec/fsspec/implementations/dask.py Lines 93 to 97 in 2a8e0ee)
data: "name", "size", "type", "mimetype" (filesystem_spec/fsspec/implementations/data.py Lines 31 to 35 in 2a8e0ee)
dbfs: "name", "size", "type" (filesystem_spec/fsspec/implementations/dbfs.py Lines 84 to 90 in 2a8e0ee)
dirfs: returns whatever the remote fs returns. (filesystem_spec/fsspec/implementations/dirfs.py Lines 233 to 241 in 2a8e0ee)
ftp: "name", "size", "type", "modify", "unix.owner", "unix.group", "unix.mode", and others returned via filesystem_spec/fsspec/implementations/ftp.py Lines 100 to 118 in 2a8e0ee (see also filesystem_spec/fsspec/implementations/ftp.py Lines 370 to 384 in 2a8e0ee)
git: "name", "size", "type", "hex", "mode" # mode is octal str, hex is str? (filesystem_spec/fsspec/implementations/git.py Lines 90 to 96 in 2a8e0ee)
github: "name", "size", "type", "sha", "mode" # mode is octal str, sha is str (filesystem_spec/fsspec/implementations/github.py Lines 167 to 178 in 2a8e0ee)
http: "name", "size", "type", "mimetype", "ETag", "Content-MD5", "Digest" (filesystem_spec/fsspec/implementations/http.py Lines 190 to 194 in 2a8e0ee; filesystem_spec/fsspec/implementations/http.py Lines 838 to 856 in 2a8e0ee)
jupyter: "name", "size", "type", "last_modified", "created", "format", "mimetype", "writable" (filesystem_spec/fsspec/implementations/jupyter.py Lines 47 to 57 in 2a8e0ee)
example: {
"name": "slurm-22382538.out",
"last_modified": "2024-02-09T13:03:30.773865Z",
"created": "2024-02-09T13:03:30.773865Z",
"format": null,
"mimetype": null,
"size": 2896,
"writable": true,
"type": "file"
}
libarchive: "name", "size", "type", "created", "mode", "uid", "gid", "mtime" (filesystem_spec/fsspec/implementations/libarchive.py Lines 165 to 172 in 2a8e0ee)
libarchive mappings: filesystem_spec/fsspec/implementations/libarchive.py Lines 145 to 153 in 2a8e0ee
local: "name", "size", "type", "created", "isLink", "mode", "uid", "gid", "mtime", "ino", "nlink", "destination" (filesystem_spec/fsspec/implementations/local.py Lines 97 to 112 in 2a8e0ee)
memory: "name", "size", "type", "created" (filesystem_spec/fsspec/implementations/memory.py Lines 41 to 47 in 2a8e0ee)
reference: "name", "size", "type" (filesystem_spec/fsspec/implementations/reference.py Lines 224 to 235 in 2a8e0ee)
sftp: "name", "size", "type", "uid", "gid", "time", "mtime" (filesystem_spec/fsspec/implementations/sftp.py Lines 108 to 120 in 2a8e0ee)
smb: "name", "size", "type", "uid", "gid", "time", "mtime" (filesystem_spec/fsspec/implementations/smb.py Lines 168 to 176 in 2a8e0ee)
tar: "name", "size", "type", "mode", "uid", "gid", "mtime", "chksum", "linkname", "uname", "gname", "devmajor", "devminor" (filesystem_spec/fsspec/implementations/tar.py Lines 112 to 116 in 2a8e0ee)
example: _ = {
'name': 'somefile.md',
'mode': 420,
'uid': 501,
'gid': 20,
'size': 382,
'mtime': 1707314187,
'chksum': 8314,
'type': 'file',
'linkname': '',
'uname': 'andreaspoehlmann',
'gname': 'staff',
'devmajor': 0,
'devminor': 0
}
webhdfs: "name", "size", "type", "accessTime", "blockSize", "group", "modificationTime", "owner", "pathSuffix", "permission", "replication" (filesystem_spec/fsspec/implementations/webhdfs.py Lines 266 to 270 in 2a8e0ee; https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#FileStatus)
zip: "name", "size", "type" (filesystem_spec/fsspec/implementations/zip.py Lines 100 to 104 in 2a8e0ee)
adlfs: "name", "size", "type", "metadata", "creation_time", "deleted", "deleted_time", "last_modified", "content_time", "content_settings", "remaining_retention_days", "archive_status", "last_accessed_on", "etag", "tags", "tag_count", "version_id", "is_current_version" (https://github.com/fsspec/adlfs/blob/576fb7a6a53a55375b4458c09e5bb571d945d410/adlfs/spec.py#L49-L67)
gcsfs: https://cloud.google.com/storage/docs/json_api/v1/objects#resource
s3fs: "name", "size", "type", "StorageClass", "VersionId", "ContentType", "ETag", "LastModified"
alluxio: "name", "size", "type", "last_modification_time_ms"
wandb: "name", "size", "type", "md5", "mimetype"
oci: "name", "size", "type", "etag", "md5", "timeCreated", "timeModified", "storageTier", "archivalState"
asynclocal: same as local
gdrive: "name", "size", "type", and others returned via ??? (https://developers.google.com/drive/api/reference/rest/v3/files#File)
dropbox: "name", "size", "type", and all public attributes from FileMetadata (https://dropbox-sdk-python.readthedocs.io/en/latest/api/files.html#dropbox.files.FileMetadata)
oss: "name", "size", "type", "LastModified"
webdav: "name", "size", "type" and others returned via
_ = {
'name': '/',
'href': '/',
'size': None,
'created': datetime.datetime(2024, 2, 9, 14, 40, 9, tzinfo=tzutc()),
'modified': datetime.datetime(2024, 2, 9, 14, 40, 9, tzinfo=datetime.timezone.utc),
'content_language': None,
'content_type': None,
'etag': None,
'type': 'directory',
'display_name': 'test_storage_options0'
}
dvc: "name", "size", "type", "md5", "md5-dos2unix", "dvc_info", "isdvc", "isout", "fs_info", "isexec", "repo" (https://github.com/iterative/dvc/blob/953ae56536f03d915f396cd6cafd89aaa54fafc5/dvc/fs/dvc.py#L41-L69)
root: "name", "size", "type"
box: "name", "size", "type", "id", "modified_at", "created_at"
lakefs: "name", "size", "type", "content-type", "checksum", "mtime"
Thank you, @ap--, that is very useful. Also worth adding that some backends that don't really have directories will make fake info dicts for those directories, typically with Your list makes it sound like any FS could do with a
Yes, that would be a great step towards standardizing the info_dict. AbstractFileSystem could even have a default implementation that tries various aliases for getting mtime (and potentially others), as well as conversions to the standard datatype (i.e. like this). For completeness, I'm cross-referencing barneygale/pathlib-abc#3. I started looking into this because I need to convert info_dicts into an os.stat_result-compatible type for universal_pathlib.
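A minimal sketch of what such an alias lookup could look like, written as a standalone function rather than as part of AbstractFileSystem. The function name `normalized_mtime`, the alias tables, and the unit assumed for each key are all hypothetical; the key names come from the list above, but each backend's actual value type would need to be checked before relying on this.

```python
from datetime import datetime, timezone

# Hypothetical alias tables. The key names come from the list in this thread;
# the value types and units assumed here are NOT verified per backend.
_MTIME_ALIASES = (
    "mtime",          # e.g. local, sftp, tar: assumed seconds or datetime
    "modified",       # e.g. webdav: datetime
    "last_modified",  # e.g. jupyter, adlfs: ISO 8601 string or datetime
    "LastModified",   # e.g. s3fs, oss: assumed datetime
    "modified_at",    # e.g. box
    "timeModified",   # e.g. oci
)
_MTIME_MS_ALIASES = (
    "modificationTime",           # webhdfs: assumed milliseconds
    "last_modification_time_ms",  # alluxio: assumed milliseconds
)


def _to_timestamp(value):
    """Convert a datetime, number, or ISO 8601 string to float seconds."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes are treated as UTC.
            value = value.replace(tzinfo=timezone.utc)
        return value.timestamp()
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # Older Python versions reject a trailing "Z" in fromisoformat.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        return _to_timestamp(parsed)
    raise TypeError(f"unsupported mtime value: {value!r}")


def normalized_mtime(info):
    """Return the modification time as float seconds since the epoch, or None."""
    for key in _MTIME_ALIASES:
        value = info.get(key)
        if value is not None:
            return _to_timestamp(value)
    for key in _MTIME_MS_ALIASES:
        value = info.get(key)
        if value is not None:
            return _to_timestamp(value) / 1000.0
    return None
```

Returning None when no alias matches keeps the helper usable for backends such as zip or reference, which only report "name", "size", and "type".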
While you're at it, the nanoseconds instead of float times would be good. https://docs.python.org/3/library/os.html#os.stat_result.st_mtime_ns
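To illustrate the nanosecond suggestion, here is a small sketch that converts a datetime to an integer nanosecond count using only integer arithmetic, so no precision is lost through a float. The helper name `datetime_to_ns` is hypothetical, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_ns(dt):
    """Return integer nanoseconds since the Unix epoch for a datetime."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - _EPOCH
    # timedelta stores days, seconds, and microseconds as exact integers.
    microseconds = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return microseconds * 1_000
```

Since datetime only resolves microseconds, the last three digits of the result are always zero; a backend that reports true nanosecond times would have to pass its integer value through directly.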
Created and/or modified time is returned in the file info of most backends. We should endeavour to surface these in the file info dict with a common format (datetime.datetime? unix timestamp?) and key names. e.g.,