DM-35741: Add InMemoryDatasetHandle #268

Merged
merged 2 commits into from Aug 1, 2022
@timj timj commented Jul 28, 2022

This is like a DeferredDatasetHandle but allows you to create
it without a butler so that Task.run() methods can be more
easily tested.
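
As a rough illustration of the idea, here is a minimal sketch of a butler-free handle and a toy `run()` that consumes it. `SketchInMemoryHandle` and `run` are hypothetical stand-ins written for this example, not the class added in this PR; the only assumption carried over is that a handle exposes a `get()` method, as `DeferredDatasetHandle` does. The sketch ignores components, parameters, storage classes, and data IDs.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SketchInMemoryHandle:
    """Hypothetical stand-in for a handle that needs no butler."""

    in_memory_dataset: Any

    def get(self) -> Any:
        # Return the wrapped object directly; no butler or datastore is involved.
        return self.in_memory_dataset


def run(catalog_handle) -> int:
    """Toy run() method that defers loading its input until it is needed."""
    catalog = catalog_handle.get()
    return len(catalog)


# In a unit test, wrap an object you already have and call run() directly.
handle = SketchInMemoryHandle([1.0, 2.5, 4.0])
n_rows = run(handle)
```

The point of the design is that the test never has to construct a repository: anything with a compatible `get()` can be passed to `run()`.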

Requires lsst/daf_butler#719

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes


codecov bot commented Jul 28, 2022

Codecov Report

Merging #268 (f8932a4) into main (9602fcb) will decrease coverage by 0.10%.
The diff coverage is n/a.

❗ Current head f8932a4 differs from pull request most recent head 4dcd4d3. Consider uploading reports for the commit 4dcd4d3 to get more accurate results

@@            Coverage Diff             @@
##             main     #268      +/-   ##
==========================================
- Coverage   81.64%   81.54%   -0.11%     
==========================================
  Files          57       57              
  Lines        5966     5938      -28     
  Branches     1222     1221       -1     
==========================================
- Hits         4871     4842      -29     
  Misses        869      869              
- Partials      226      227       +1     
Impacted Files Coverage Δ
python/lsst/pipe/base/butlerQuantumContext.py 78.49% <0.00%> (-1.51%) ⬇️
tests/test_pipelineTask.py 96.55% <0.00%> (-0.53%) ⬇️


@timj timj requested a review from TallJimbo July 28, 2022 22:21
@timj timj force-pushed the tickets/DM-35741 branch 3 times, most recently from 52f1e80 to 65a59ee Compare August 1, 2022 15:17

@TallJimbo TallJimbo left a comment

I think this is fine as-is (just minor line comments on the PR), but it's worth thinking about where this should land on the spectrum between "easy to construct" (where the PR leans now) and "behaves more like a real DeferredDatasetHandle". We don't have an ABC that defines what interface PipelineTasks can expect of these handles, just the concrete implementation, and that implementation also exposes a LimitedButler and a DatasetRef.

Exposing the butler was probably a mistake (and probably mine, as I know @natelust has historically been much more careful than I have about preventing task access to things), but it's quite possible it was an "intentional" mistake: a pragmatic-at-the-time solution to a real problem, like trying to get at a data ID packer from deep inside some subtask. I can't come up with a good reason for a task to get the DatasetRef if it already has the data ID, since tasks should only refer to the dataset type via their own connection name. Still, it doesn't seem too onerous to demand that an InMemoryDatasetHandle user provide one, which would sidestep all of your storage class inference concerns (though I'm not really bothered by the logic there now).

So, after typing all of that out, I think we probably do want to leave this as it is - but maybe we should RFC deprecating those public butler and ref attributes on DeferredDatasetHandle, so we can spot any existing usage and make sure no new usage develops.
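
For what the deprecation could look like in practice, here is a minimal sketch, assuming the attributes are turned into properties that warn on access. `SketchDeferredHandle` is a hypothetical class for illustration only, not the real `DeferredDatasetHandle`, and the warning category and message are assumptions.

```python
import warnings


class SketchDeferredHandle:
    """Hypothetical handle whose public butler/ref attributes are deprecated."""

    def __init__(self, butler, ref):
        self._butler = butler
        self._ref = ref

    @property
    def butler(self):
        # Warn on every access so existing usage shows up in test logs.
        warnings.warn(
            "The butler attribute is deprecated.", FutureWarning, stacklevel=2
        )
        return self._butler

    @property
    def ref(self):
        warnings.warn(
            "The ref attribute is deprecated.", FutureWarning, stacklevel=2
        )
        return self._ref


# Record warnings to confirm that attribute access is detected.
handle = SketchDeferredHandle(butler="fake-butler", ref="fake-ref")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = handle.butler
```

Running a test suite with such warnings promoted to errors would be one way to spot existing callers.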

timj commented Aug 1, 2022

I admit I had failed to notice that .ref and .butler are public interfaces for DeferredDatasetHandle. I would be interested in knowing if anyone is using them. Maybe I can run a test with them as internal parameters and see if anything breaks. Making people always pass in a butler seems to go against making this thing easy to use if you just want to call a .run() method with some objects you already have.

I was wondering if we had a need for an abstract base class and I'm still wondering if this code should be in daf_butler.
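
One lightweight alternative to a full abstract base class would be a structural interface. The sketch below assumes the only thing a task needs from a handle is `get()`; `DatasetHandleLike` and `ListHandle` are hypothetical names invented for this example and do not exist in pipe_base or daf_butler.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DatasetHandleLike(Protocol):
    """Hypothetical minimal interface a task could rely on."""

    def get(self) -> Any:
        """Return the dataset this handle refers to."""
        ...


class ListHandle:
    """Toy handle that satisfies the protocol without inheriting from it."""

    def __init__(self, data):
        self._data = data

    def get(self):
        return self._data


# isinstance() on a runtime-checkable protocol only checks that get() exists.
is_handle = isinstance(ListHandle([1, 2]), DatasetHandleLike)
not_handle = isinstance([1, 2], DatasetHandleLike)
```

With this approach neither handle class would need a common parent, which may matter if the two end up living in different packages.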

@timj timj force-pushed the tickets/DM-35741 branch 2 times, most recently from 2c4c2fb to 75b2d78 Compare August 1, 2022 21:03
@timj timj merged commit cb57194 into main Aug 1, 2022
@timj timj deleted the tickets/DM-35741 branch August 1, 2022 21:08