New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-35741: Add InMemoryDatasetHandle #268
Conversation
Codecov Report
@@ Coverage Diff @@
## main #268 +/- ##
==========================================
- Coverage 81.64% 81.54% -0.11%
==========================================
Files 57 57
Lines 5966 5938 -28
Branches 1222 1221 -1
==========================================
- Hits 4871 4842 -29
Misses 869 869
- Partials 226 227 +1
Continue to review full report at Codecov.
|
52f1e80
to
65a59ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine as-is (just minor line comments on the PR), but I think it's worth thinking about where this should land on the spectrum of "easy to construct" (where the PR leans now) vs. "behaves more like a real DeferredDatasetHandle
". We don't have an ABC that defines what interface PipelineTasks
can expect of these handles, just the concrete implementation, and that actually exposes a LimitedButler
and DatasetRef
, too. Exposing the butler was probably a mistake (and probably mine, as I know @natelust has historically been much more careful about preventing task access to things than I have been), but it's quite possible it was an "intentional" mistake that was actually a pragmatic-at-the-time solution to a real problem, like trying to get at a data ID packer from deep inside some subtask. And while I can't come up with a good reason for a task to get the DatasetRef
if they've already got the data ID - they should only be referring to the dataset type via their own connection name - it also doesn't seem too onerous to demand that an InMemoryDatasetHandle
user provide one, and in doing so sidestep all of your storage class inference concerns (though I'm not really bothered by the logic there now).
So, after typing all of that out, I think we probably do want to leave this as it is - but maybe we should RFC deprecating those public butler
and ref
attributes on DeferredDatasetHandle
, so we can spot any existing usage and make sure no new usage develops.
I admit I had failed to notice that I was wondering if we had a need for an abstract base class and I'm still wondering if this code should be in daf_butler. |
This is like a DeferredDatasetHandle but allows you to create it without a butler so that Task.run() methods can be more easily tested.
2c4c2fb
to
75b2d78
Compare
This is like a DeferredDatasetHandle but allows you to create
it without a butler so that Task.run() methods can be more
easily tested.
Requires lsst/daf_butler#719
Checklist
doc/changes