New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-16819: Add shims for Gen2 APIs with Gen3 implementations. #70
Conversation
python/lsst/pipe/base/shims.py
Outdated
A Generation 3 data ID that identifies the dataset. | ||
immediate : `bool` | ||
This option is provided for compatibility with | ||
`lsst.daf.persistence.Butler`, but must be `True`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it must be true, might I suggest that have the function argument, but just ignore it below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose in this case that might be better, just because it's possible there is code with immediate=False
that doesn't really care whether it gets a proxy back. But I do think we want to check the values of the other not-actually-supported options in this commit, just to make errors earlier and friendlier.
datasetType : `str` | ||
Name of the Gen2 dataset type. | ||
dataId : `dict` or `~lsst.daf.butler.DataId`, optional | ||
A Generation 3 data ID that identifies the dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a generation 3 dataId if we are making a shim? Is there no way to automatically convert to a gen 3 one? ( I suspect I know the answer, but am asking to prompt you if you had any cleaver ideas)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is infrastructure for translating data IDs in the gen2convert
tooling, and if necessary we could try to use that here. But I wanted to start with a shim that didn't do that, and only add that functionality if turned out to be necessary (and then maybe only implement translation for a small subset of keys). And I'm hoping we can get away without doing any translation by instead adding (i.e.) if butler.GENERATION > 2
blocks to just a few Tasks
instead.
python/lsst/pipe/base/shims.py
Outdated
---------- | ||
butlerSubset : `ShimButlerSubset` | ||
ButlerSubset shim instance. | ||
Name of the dataset type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is not clear what the "Name of the dataset type" applies to in the doc string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, dropped a line or something. Will fix.
"""This is a Generation 2 shim for a Generation3 Butler. | ||
""" | ||
|
||
def __init__(self, butlerSubset, dataId): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we taking a subset here, it seems like it is not really used in lue of the datasetType. Is it just because of the API of the old DataRef?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just the old API - we almost always use DataRef
as a combination of a dataset type and a data ID, but it originated as the thing iterated over by a ButlerSubset
. One relic of that is that it's moderately common to see dataRef.butlerSubset.butler
, and I wanted to support that.
python/lsst/pipe/base/shims.py
Outdated
def __init__(self, butler, datasetType, dataIds): | ||
self.butler = butler | ||
self.datasetType = datasetType | ||
self._dataIds = list(dataIds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are going to have iter defined in the same class as it is iterating over, then I think that _dataIds should be constructed as a tuple instead of a list. This will prevent someone from messing with the _dataIds while an existing iterator is outstanding. Unless of course this is needed for backwards compatibility were people abuse modifying the private variable. The other option is to have iter construct and return an object that has its own list pointing to each of the elements.
1ad1177
to
cff2171
Compare
cff2171
to
d57b176
Compare
No description provided.