-
Notifications
You must be signed in to change notification settings - Fork 250
[query][qob][hailtop/fs] Expose modification and creation time. #12571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I also just implemented RFC3339 parsing because Google's timestamps are not compliant with ISO8601. |
@@ -101,7 +101,7 @@ def name(self): | |||
|
|||
def close(self): | |||
self.aws.close() | |||
async_to_blocking(self.cm.__aexit__()) | |||
async_to_blocking(self.cm.__aexit__(None, None, None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random other bug I found while I was in here.
Fixes hail-is#12540. CHANGELOG: When using Query-on-Batch, hl.hadoop* methods now properly support creation and modification time. Creation time is supported by modern Linuxes but only through a new statx API which is not exposed by the Python standard library. There is a 0.1 version library from 2021 which exposes statx including the "birth time". I chose to raise an exception for now. Each cloud does support a "modification time" but it generally refers to changes to metadata or is just the creation time: - https://cloud.google.com/storage/docs/json_api/v1/objects#resource (see updated) - https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_ResponseSyntax ("Last-Modified", is always creation time) - https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html#API_HeadObject_ResponseSyntax same as above - https://learn.microsoft.com/en-us/rest/api/storageservices/get-blob (see Last-Modified and x-ms-creation-time)
tests and lints resolved |
def time_modified(self) -> datetime.datetime: | ||
'''The time the object was last modified in seconds since the epoch, UTC. | ||
|
||
Not all clouds expose a modification time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you state what happens if the cloud doesn't expose a modification time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change the docs. In reality, every cloud supports modification time but it is just the creation time. The LocalFS does not support creation time except in very new *nix kernels.
@@ -76,6 +87,17 @@ def __init__(self, item: Dict[str, Any]): | |||
async def size(self) -> int: | |||
return self._item['Size'] | |||
|
|||
def time_created(self) -> datetime.datetime: | |||
# https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_ResponseSyntax | |||
# Misleading name: LastModified is creation time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to explicitly convert the result from the cloud provider into a Datetime object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazon's Python API returns a datetime.datetime
, mypy verifies this too.
|
||
if is_local: | ||
try: | ||
status.time_created() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make sure we get the test for time_modified for the Local case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed
@@ -76,6 +87,17 @@ def __init__(self, item: Dict[str, Any]): | |||
async def size(self) -> int: | |||
return self._item['Size'] | |||
|
|||
def time_created(self) -> datetime.datetime: | |||
# https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_ResponseSyntax | |||
# Misleading name: LastModified is creation time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazon's Python API returns a datetime.datetime
, mypy verifies this too.
def time_modified(self) -> datetime.datetime: | ||
'''The time the object was last modified in seconds since the epoch, UTC. | ||
|
||
Not all clouds expose a modification time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change the docs. In reality, every cloud supports modification time but it is just the creation time. The LocalFS does not support creation time except in very new *nix kernels.
|
||
if is_local: | ||
try: | ||
status.time_created() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
Fixes #12540.
CHANGELOG: When using Query-on-Batch, hl.hadoop* methods now properly support creation and modification time.
Creation time is supported by modern Linuxes but only through a new statx API which is not exposed by the Python standard library. There is a 0.1 version library from 2021 which exposes statx including the "birth time". I chose to raise an exception for now.
Each cloud does support a "modification time" but it generally refers to changes to metadata or is just the creation time: