DM-38454: butler mode for ModelPackages #21
Conversation
Force-pushed from 61618c4 to dc72b2b.
the Butler repo.
with a custom formatter.
E.g. the collection "pretrained_models/dummy" will be where the pretrained weights for the model package "dummy" will be stored.
This storage mode allows the pretrained weights for a model package to be loaded from the butler and passed down to the task, just like in other typical LSST tasks.
Force-pushed from 5e483af to cc2c627.
Force-pushed from dde8a98 to ff074fc.
My only big-picture concern is that I'm not sure you're getting much from the `StorageAdapterBase` class and its subclasses, unless it's all just backwards compatibility and this just isn't the right ticket to remove them. Ideally I'd like to get this down to two modes of operation:

- If you call `RBTransiNetTask.run` directly from Python and pass the payload to use as an argument, you can load it however you'd like (and this is what I'd expect all or most unit tests to do, and possibly a lot of development work, to the extent that's done in interactive Python sessions).
- If you run that task as a `PipelineTask`, the execution framework loads the payload from the butler and passes it to `run` for you.
I do think there's probably a need for a concrete/final storage utility class as a place to put methods like `ingest` and low-level I/O code, but I don't see a need for an abstraction layer here (again, except as a way to do backwards compatibility from the `rbClassifier_data` git-package approach, which would be totally legitimate).

I also suspect that if you weren't trying to fit this into the `StorageAdapterBase` scheme, a more natural `NNModelPackagePayload` might actually hold the architecture/weights/metadata instead of a raw `BytesIO` - but I think you've got the necessary levels of indirection here to change that in the future without breaking anything, so it's not something we need to resolve on this ticket.
I'm hitting "Approve" now because I think all the little things that should get addressed before merge are straightforward and hopefully uncontroversial, and I'm happy to defer any big-picture issues to other tickets.
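The component-bearing payload suggested above could look roughly like the following sketch. The class and field names here are illustrative assumptions, not part of the actual `meas_transiNet` API:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of a payload that holds the unpacked components
# (architecture, weights, metadata) instead of a raw BytesIO blob.
# All names are illustrative, not the real meas_transiNet API.
@dataclass
class ComponentPayload:
    architecture: Optional[bytes] = None
    weights: Optional[bytes] = None
    metadata: Optional[dict] = None
```

With a shape like this, callers could access the pieces directly instead of re-parsing an opaque blob.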
```python
payload = NNModelPackagePayload()
with open(self.fileDescriptor.location.path, "rb") as f:
    payload.bytes = BytesIO(f.read())
return payload
```
Since this formatter can read from and write to bytes, you should probably implement `can_read_bytes`, `toBytes`, and `fromBytes`, too. These will be preferred (when `can_read_bytes` is `True`) when reading from object stores, for efficiency.
Just added very simple implementations of `fromBytes` and `toBytes` -- the base implementation of `can_read_bytes` should suffice? I don't know a simple/standard way of testing them, though.
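One simple option for testing them is a round-trip unit test. The sketch below uses stand-in stubs for the payload and the two hooks, so every name in it is an assumption rather than the real API:

```python
import unittest
from io import BytesIO

# Stand-ins for the real payload and formatter hooks, so the test
# sketch is self-contained.
class PayloadStub:
    def __init__(self, data: bytes = b"") -> None:
        self.bytes = BytesIO(data)

def toBytes(payload: PayloadStub) -> bytes:
    return payload.bytes.getvalue()

def fromBytes(data: bytes) -> PayloadStub:
    return PayloadStub(data)

class RoundTripTestCase(unittest.TestCase):
    """A round trip through toBytes/fromBytes should preserve the blob."""

    def test_round_trip(self):
        original = PayloadStub(b"model package blob")
        restored = fromBytes(toBytes(original))
        self.assertEqual(restored.bytes.getvalue(), b"model package blob")
```

Against the real formatter, the same pattern applies with the actual classes substituted in.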
python/lsst/meas/transiNet/modelPackages/storageAdapterButler.py (outdated; resolved)
python/lsst/meas/transiNet/modelPackages/storageAdapterButlerHybrid.py (outdated; resolved)
```python
# needed (e.g. in butler mode).
self.model_package_name = task.config.modelPackageName or 'N/A'

self.package_storage_mode = task.config.modelPackageStorageMode
```
Better to pass these (and anything else you get from `task`) directly to `__init__` rather than passing the task instance, to avoid introducing a circular interface dependency between the task and this class.
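The suggested refactor might look roughly like this sketch; the class name and call site below are illustrative, not the actual code:

```python
# Sketch: the adapter takes the config values it needs directly,
# instead of holding on to the whole task instance.  Names are
# illustrative.
class StorageAdapterSketch:
    def __init__(self, model_package_name, package_storage_mode):
        self.model_package_name = model_package_name or 'N/A'
        self.package_storage_mode = package_storage_mode

# Hypothetical call site inside the task:
# adapter = StorageAdapterSketch(self.config.modelPackageName,
#                                self.config.modelPackageStorageMode)
```

This keeps the adapter testable on its own, without constructing a task at all.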
```python
results = self.butler.registry.queryDatasets(
    StorageAdapterButler.dataset_type_name,
    collections=f'{StorageAdapterButler.packages_parent_collection}/{self.model_package_name}')
payload = self.butler.get(list(results)[0])
self.from_payload(payload)
```
I don't understand why this isn't done inside the formatter. How does `from_payload` know that what it gets back is a zip file? This is still subverting the role of the butler. Why can't the unzip go inside the formatter, with `NNModelPackagePayload` holding the three components in it?
Recapping two separate pairwise discussions: I agree, but as long as there's only one file extension (i.e. `zip`) in play, we can make that change on a separate ticket, and I could imagine that working better in terms of transitioning from the old way to load these.
I'd have some comments mainly re. how this matches (or not) the initial design, and how much we want to add to/remove from abstraction layers, etc. But yes, let's keep it for another ticket.
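A hedged sketch of the direction discussed in this thread: the formatter unpacks the zip archive itself, so the payload can carry the individual components rather than an opaque blob. The function name and component layout are assumptions for illustration:

```python
import io
import zipfile

# Illustrative helper: unpack a zip archive (as the formatter might do)
# into a {member name: bytes} mapping, so downstream code never sees
# the raw archive.
def read_components(blob: bytes) -> dict:
    """Unpack a zip archive into a name -> bytes mapping."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```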
Force-pushed from 4361fbc to 9388f53.
- This way we can also load a model/module which is already loaded as a file-like object.
- from a butler repository.
- for ingesting new model packages.
- using a temporary (but real) butler repository.
- Plus some minor fixes.
- to be passed back and forth between the formatter and storageAdapterButler.
- (only for _real_ runs of pipelines)
Force-pushed from 987aa95 to b429e04.
Co-authored-by: Jim Bosch <talljimbo@gmail.com>
Force-pushed from b429e04 to f3f6e60.
The commits covered by this PR span a large time range. Therefore, my first guess is that looking at the changes as a whole is easier than checking the commits one by one. But that might be just my taste.