DM-35741: Add InMemoryDatasetHandle #268

timj · 2022-07-28T20:47:20Z

This is like a DeferredDatasetHandle but allows you to create
it without a butler so that Task.run() methods can be more
easily tested.
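
As a rough illustration of the idea (not the code added by this PR), the sketch below uses a hypothetical stand-in class, `SketchInMemoryHandle`, that wraps an object already in memory and returns it from `get()`. The class name, its fields, and the toy `run()` function are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class SketchInMemoryHandle:
    """Hypothetical stand-in: wraps an object that is already in memory."""

    inMemoryDataset: Any
    dataId: dict = field(default_factory=dict)

    def get(self) -> Any:
        # A butler-backed deferred handle would read the dataset on demand;
        # this sketch simply returns the wrapped object.
        return self.inMemoryDataset


def run(catalogHandle) -> int:
    """Toy stand-in for a Task.run() method that accepts a handle."""
    catalog = catalogHandle.get()
    return len(catalog)


# No butler or data repository is needed to exercise run().
handle = SketchInMemoryHandle([1, 2, 3], dataId={"visit": 42})
result = run(handle)
```

The point of the design is that the code under test only calls `get()`, so a test can hand it any object directly.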

Requires lsst/daf_butler#719

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov · 2022-07-28T21:24:06Z

Codecov Report

Merging #268 (f8932a4) into main (9602fcb) will decrease coverage by 0.10%.
The diff coverage is n/a.

❗ Current head f8932a4 differs from pull request most recent head 4dcd4d3. Consider uploading reports for the commit 4dcd4d3 to get more accurate results

@@            Coverage Diff             @@
##             main     #268      +/-   ##
==========================================
- Coverage   81.64%   81.54%   -0.11%     
==========================================
  Files          57       57              
  Lines        5966     5938      -28     
  Branches     1222     1221       -1     
==========================================
- Hits         4871     4842      -29     
  Misses        869      869              
- Partials      226      227       +1

Impacted Files	Coverage Δ
python/lsst/pipe/base/butlerQuantumContext.py	`78.49% <0.00%> (-1.51%)`	⬇️
tests/test_pipelineTask.py	`96.55% <0.00%> (-0.53%)`	⬇️

TallJimbo

I think this is fine as-is (just minor line comments on the PR), but I think it's worth thinking about where this should land on the spectrum of "easy to construct" (where the PR leans now) vs. "behaves more like a real DeferredDatasetHandle". We don't have an ABC that defines what interface PipelineTasks can expect of these handles, just the concrete implementation, and that actually exposes a LimitedButler and DatasetRef, too. Exposing the butler was probably a mistake (and probably mine, as I know @natelust has historically been much more careful about preventing task access to things than I have been), but it's quite possible it was an "intentional" mistake that was actually a pragmatic-at-the-time solution to a real problem, like trying to get at a data ID packer from deep inside some subtask.

And while I can't come up with a good reason for a task to get the DatasetRef if they've already got the data ID - they should only be referring to the dataset type via their own connection name - it also doesn't seem too onerous to demand that an InMemoryDatasetHandle user provide one, and in doing so sidestep all of your storage class inference concerns (though I'm not really bothered by the logic there now).
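
For illustration only, the missing interface definition mentioned above could be sketched with a structural `typing.Protocol`. Everything here is hypothetical: `DatasetHandleLike`, `ButlerBackedHandle`, and `InMemoryHandle` are names invented for the example, and the protocol deliberately contains only `get()`, on the assumption that this is all a task should rely on.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DatasetHandleLike(Protocol):
    """Hypothetical minimal interface that a task could depend on."""

    def get(self) -> Any:
        ...


class ButlerBackedHandle:
    """Toy handle that loads lazily through a callable (stands in for a butler)."""

    def __init__(self, loader):
        self._loader = loader

    def get(self):
        return self._loader()


class InMemoryHandle:
    """Toy handle that returns an object it already holds."""

    def __init__(self, obj):
        self._obj = obj

    def get(self):
        return self._obj


# Neither class inherits from the protocol; both satisfy it structurally.
handles = [ButlerBackedHandle(lambda: "from disk"), InMemoryHandle("from memory")]
values = [h.get() for h in handles if isinstance(h, DatasetHandleLike)]
```

Keeping the butler and ref out of such an interface is what would let both kinds of handle be interchangeable from a task's point of view.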

So, after typing all of that out, I think we probably do want to leave this as it is - but maybe we should RFC deprecating those public butler and ref attributes on DeferredDatasetHandle, so we can spot any existing usage and make sure no new usage develops.
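
One way such a deprecation could look, purely as a sketch: the public attributes become properties that warn on access, so existing usage shows up in test logs. `SketchDeferredHandle` is a hypothetical class, not the real DeferredDatasetHandle, and the warning text and category are assumptions.

```python
import warnings


class SketchDeferredHandle:
    """Hypothetical handle whose public butler/ref attributes are deprecated."""

    def __init__(self, butler, ref):
        self._butler = butler
        self._ref = ref

    @property
    def butler(self):
        # Warn on every access so that callers can be found and migrated.
        warnings.warn("The 'butler' attribute is deprecated.", FutureWarning, stacklevel=2)
        return self._butler

    @property
    def ref(self):
        warnings.warn("The 'ref' attribute is deprecated.", FutureWarning, stacklevel=2)
        return self._ref


handle = SketchDeferredHandle(butler="fake-butler", ref="fake-ref")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _ = handle.butler
    _ = handle.ref
caught_categories = [w.category for w in caught]
```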

python/lsst/pipe/base/_dataset_handle.py

timj · 2022-08-01T16:29:48Z

I admit I had failed to notice that .ref and .butler are public interfaces for DeferredDatasetHandle. I would be interested in knowing if anyone is using them. Maybe I can run a test with them as internal parameters and see if anything breaks. Making people always pass in a butler seems to go against making this thing easy to use if you just want to call a .run() method with some objects you already have.

I was wondering if we had a need for an abstract base class and I'm still wondering if this code should be in daf_butler.

timj force-pushed the tickets/DM-35741 branch from ec9cd28 to 13556c8 Compare July 28, 2022 21:14

timj requested a review from TallJimbo July 28, 2022 22:21

timj force-pushed the tickets/DM-35741 branch 3 times, most recently from 52f1e80 to 65a59ee Compare August 1, 2022 15:17

TallJimbo approved these changes Aug 1, 2022

timj force-pushed the tickets/DM-35741 branch from 65a59ee to 4dcd4d3 Compare August 1, 2022 20:04

timj added 2 commits August 1, 2022 13:47

Add InMemoryDatasetHandle

63f3124

This is like a DeferredDatasetHandle but allows you to create it without a butler so that Task.run() methods can be more easily tested.

Add news fragment

75b2d78

timj force-pushed the tickets/DM-35741 branch 2 times, most recently from 2c4c2fb to 75b2d78 Compare August 1, 2022 21:03

timj merged commit cb57194 into main Aug 1, 2022

timj deleted the tickets/DM-35741 branch August 1, 2022 21:08
