Skip to content

[query][qob][hailtop/fs] Expose modification and creation time. #12571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 4, 2023

Conversation

danking
Copy link
Contributor

@danking danking commented Dec 27, 2022

Fixes #12540.

CHANGELOG: When using Query-on-Batch, hl.hadoop* methods now properly support creation and modification time.

Creation time is supported by modern Linuxes but only through a new statx API which is not exposed by the Python standard library. There is a 0.1 version library from 2021 which exposes statx including the "birth time". I chose to raise an exception for now.

Each cloud does support a "modification time" but it generally refers to changes to metadata or is just the creation time:

@danking
Copy link
Contributor Author

danking commented Dec 27, 2022

I also just implemented RFC3339 parsing because Google's timestamps are not compliant with ISO8601.

@@ -101,7 +101,7 @@ def name(self):

def close(self):
self.aws.close()
async_to_blocking(self.cm.__aexit__())
async_to_blocking(self.cm.__aexit__(None, None, None))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random other bug I found while I was in here.

Fixes hail-is#12540.

CHANGELOG: When using Query-on-Batch, hl.hadoop* methods now properly support creation and modification time.

Creation time is supported by modern Linuxes but only through a new statx API which is not
exposed by the Python standard library. There is a 0.1 version library from 2021 which exposes
statx including the "birth time". I chose to raise an exception for now.

Each cloud does support a "modification time" but it generally refers to changes to metadata or is just the creation time:
- https://cloud.google.com/storage/docs/json_api/v1/objects#resource (see updated)
- https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_ResponseSyntax ("Last-Modified", is always creation time)
- https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html#API_HeadObject_ResponseSyntax same as above
- https://learn.microsoft.com/en-us/rest/api/storageservices/get-blob (see Last-Modified and x-ms-creation-time)
@danking
Copy link
Contributor Author

danking commented Jan 19, 2023

tests and lints resolved

jigold
jigold previously requested changes Jan 20, 2023
def time_modified(self) -> datetime.datetime:
'''The time the object was last modified in seconds since the epoch, UTC.

Not all clouds expose a modification time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you state what happens if the cloud doesn't expose a modification time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change the docs. In reality, every cloud supports modification time but it is just the creation time. The LocalFS does not support creation time except in very new *nix kernels.

@@ -76,6 +87,17 @@ def __init__(self, item: Dict[str, Any]):
async def size(self) -> int:
return self._item['Size']

def time_created(self) -> datetime.datetime:
# https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_ResponseSyntax
# Misleading name: LastModified is creation time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to explicitly convert the result from the cloud provider into a Datetime object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazon's Python API returns a datetime.datetime, mypy verifies this too.


if is_local:
try:
status.time_created()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure we get the test for time_modified for the Local case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Copy link
Contributor Author

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

@@ -76,6 +87,17 @@ def __init__(self, item: Dict[str, Any]):
async def size(self) -> int:
return self._item['Size']

def time_created(self) -> datetime.datetime:
# https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_ResponseSyntax
# Misleading name: LastModified is creation time.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazon's Python API returns a datetime.datetime, mypy verifies this too.

def time_modified(self) -> datetime.datetime:
'''The time the object was last modified in seconds since the epoch, UTC.

Not all clouds expose a modification time.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change the docs. In reality, every cloud supports modification time but it is just the creation time. The LocalFS does not support creation time except in very new *nix kernels.


if is_local:
try:
status.time_created()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@danking danking merged commit 36e9cd4 into hail-is:main Feb 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hl.hadoop_stat no longer works for requester pays bucket
2 participants