New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-28636: Make data ID expansion in ingest sometimes optional. #502
Conversation
python/lsst/daf/butler/_butler.py
Outdated
dataIds=groupForType.keys(), | ||
run=run, | ||
# TODO: ask Datastore if it needs expansion instead. | ||
expand=(transfer not in ("direct", None)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would possibly be better to ask datastore the question as to whether it needs expanded dataIds or not given the transfer mode, rather than assuming in butler level but I'm sure we can sort this out later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about I add this to base Datastore
and use it here and in obs_base
:
def needs_expanded_data_ids(
self,
transfer: Optional[str],
entity: Optional[Union[DatasetRef, DatasetType, StorageClass]] = None,
) -> bool:
"""Test whether this datastore needs expanded data IDs to ingest the given entity.
Parameters
----------
transfer : `str` or `None`
Transfer mode for ingest.
entity, optional
Object representing what will be ingested. If not provided (or less informative),
`True` may be returned even if expanded data IDs aren't necessary.
Returns
-------
needed : `bool`
If `True`, expanded data IDs may be needed. `False` only if expansion
definitely isn't necessary.
"""
return True
and use (transfer not in ("direct", None))
as the initial implementation in FileDatastore
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work. I like how you added in a representative entity
just in case we need to look up in a datastore config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also added implementations for InMemoryDatastore
(return False
) and ChainedDatastore
, and the latter is worth a look - I don't think it's actually able to forward that entity
argument on to the child datastores while still responding conservatively, which is a bit unfortunate but (I think) not a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chained implementation is fine I think.
e18be87
to
ecc5fb9
Compare
ecc5fb9
to
4618c79
Compare
This would be better with a bit more work, but it already speeds up gen2to3 conversion (with direct or in-place) transfers a lot.