DM-42220: Incorporate ModelPackage Butler datasets into Prompt Processing #125

kfindeisen · 2024-02-20T20:26:12Z

This PR adds support for machine learning models (specifically, the pretrainedModelPackage dataset type) to prep_butler. Applicable models have already been added to /repo/embargo and our integration testing repos.

A model is quite large (hundreds of MB), but because the same model is used regardless of data ID, it should be a one-time overhead for the pod. This commit also chains the dummy model to the test repo's DECam/defaults, something that should have been done when it was added.

kfindeisen added 3 commits February 20, 2024 12:32

Include pretrained_models in dev repo generation.

02a0f26

Use set, not list, as internal type in prep_butler.

6557cb9

The main thing we want to do with the datasetRef collections is test for membership, for which sets are ideal.
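
A minimal sketch of why a set suits this kind of membership test. `FakeRef` and `missing_refs` are hypothetical names invented for illustration, not the actual `prep_butler` code; the only assumption carried over is that dataset refs are hashable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRef:
    """Hypothetical stand-in for a Butler DatasetRef; only hashability matters here."""
    dataset_type: str
    data_id: str


def missing_refs(wanted, already_present):
    """Return the refs in ``wanted`` that are not in ``already_present``.

    Converting to a set makes each ``in`` check O(1) on average,
    instead of a linear scan over a list.
    """
    present = set(already_present)
    return {ref for ref in wanted if ref not in present}


# Illustrative values only; the data IDs are made up.
wanted = [FakeRef("pretrainedModelPackage", ""), FakeRef("skyMap", "example_skymap")]
local = {FakeRef("skyMap", "example_skymap")}
to_transfer = missing_refs(wanted, local)
```

A set also drops duplicate refs for free, which a list would need an explicit pass to do.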

Fix formatting in MiddlewareInterface method docs.

92c66b9

kfindeisen force-pushed the tickets/DM-42220 branch from 5c492ac to 2243087 February 20, 2024 20:34

kfindeisen mentioned this pull request Feb 20, 2024

DM-42220: Incorporate ModelPackage Butler datasets into Prompt Processing lsst/ap_pipe#164

Merged

kfindeisen requested a review from hsinfang February 20, 2024 22:15

hsinfang approved these changes Feb 20, 2024

python/activator/middleware_interface.py (review comment; outdated, resolved)

kfindeisen added 3 commits February 20, 2024 16:08

Standardize handling of collection names in MWI.

12f3db4

Previously, standard collection names were represented partly by constants and partly by methods. Making them all attributes/properties, with the same naming convention, provides a more uniform API.
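
The general pattern can be sketched as follows. `CollectionNames` and its member names are hypothetical and are not the real `MiddlewareInterface` API; only the `DECam/defaults` style of name comes from this PR. Fixed names become plain attributes and derived names become read-only properties, so callers access both the same way.

```python
class CollectionNames:
    """Hypothetical holder for standard collection names (illustration only)."""

    def __init__(self, instrument_name):
        self._instrument = instrument_name
        # A name that never varies can be a plain attribute.
        self.skymaps = "skymaps"

    @property
    def instrument_defaults(self):
        """A name derived from instance state, exposed without call syntax."""
        return f"{self._instrument}/defaults"


names = CollectionNames("DECam")
# Both kinds of name are read identically, with no parentheses.
collections = [names.skymaps, names.instrument_defaults]
```

Because a property has no setter by default, assigning to `names.instrument_defaults` raises `AttributeError`, which keeps derived names consistent with the instrument.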

Allow real-bogus in test pipelines.

aab641c

Now that the model is loaded in prep_butler (in both unit and integration tests), we can build a graph for the complete pipeline without getting missing dataset errors.

kfindeisen force-pushed the tickets/DM-42220 branch from 2243087 to aab641c February 21, 2024 00:15

kfindeisen merged commit 6ef6df2 into main Feb 21, 2024
6 checks passed

kfindeisen deleted the tickets/DM-42220 branch February 21, 2024 01:18
