Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38454: butler mode for ModelPackages #21

Merged
merged 25 commits into from Dec 15, 2023
Merged

DM-38454: butler mode for ModelPackages #21

merged 25 commits into from Dec 15, 2023

Conversation

NimSed
Copy link
Contributor

@NimSed NimSed commented Dec 4, 2023

The commits covered by this PR cover a large time span:

  • Those from months ago, implemented the "butler-hybrid" mode.
  • The recent ones convert the "butler-hybrid" to "butler" mode.
    Therefore, my first guess is that looking at the changes as a whole is easier, as compared to checking the commits one by one. But that might be just my taste.

@NimSed NimSed requested a review from TallJimbo December 4, 2023 21:45
@NimSed NimSed force-pushed the tickets/DM-38454 branch 3 times, most recently from 61618c4 to dc72b2b Compare December 12, 2023 02:44
E.g. the collection "pretrained_models/dummy" will be where the
pretrained weights for the model package "dummy" will be stored.
This storage mode allows butler to load pretrained weights
from butler and pass them down to the task just like other
typical LSST tasks.
@NimSed NimSed force-pushed the tickets/DM-38454 branch 6 times, most recently from dde8a98 to ff074fc Compare December 12, 2023 04:10
Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only big-picture concern is that I'm not sure you're getting much from the StorageAdapterBase class and its subclasses, unless it's all just backwards compatibility and this just isn't the right ticket to remove them. Ideally I'd like to get this down to two modes of operation:

  • If you call RBTransiNetTask.run directly from Python and pass the payload to use as an argument, you can load it however you'd like (and this is what I'd expect all or most unit tests to do, and possibly a lot of development work, to the extent that's done in interactive Python sessions).

  • If you run that task as a PipelineTask, the execution framework loads the payload from the butler and passes it to run for you.

I do think there's probably a need for a concrete/final storage utility class as a place to put methods like ingest and low-level I/O code, but I don't see a need for an abstraction layer here (again, except as a way to do backwards compatibility from the rbClassifier_data git-package approach, which would be totally legitimate).

I also suspect that if you weren't trying to fit this into the StorageAdapterBase scheme, a more natural NNModelPackagePayload might actually hold the architecture/weights/metadata instead of a raw BytesIO - but I think you've got the necessary levels of indirection here to change that in the future without breaking anything, so it's not something we need to resolve on this ticket.

I'm hitting "Approve" now because I think all the little things that should get addressed before merge are straightforward and hopefully controversial, and I'm happy to defer any big-picture issues to other tickets.

python/lsst/meas/transiNet/modelPackages/formatters.py Outdated Show resolved Hide resolved
python/lsst/meas/transiNet/modelPackages/formatters.py Outdated Show resolved Hide resolved
python/lsst/meas/transiNet/modelPackages/formatters.py Outdated Show resolved Hide resolved
python/lsst/meas/transiNet/modelPackages/formatters.py Outdated Show resolved Hide resolved
payload = NNModelPackagePayload()
with open(self.fileDescriptor.location.path, "rb") as f:
payload.bytes = BytesIO(f.read())
return payload
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this formatter can read from and write to bytes, you should probably implement can_read_bytes, toBytes, and fromBytes, too. These will be preferred (when can_read_bytes is True) when reading from object stores for efficiency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added very simple implementations of fromBytes and toBytes -- the base implementation of can_read_bytes should suffice?

I don't know a simple/standard way for testing them though.

python/lsst/meas/transiNet/modelPackages/utils.py Outdated Show resolved Hide resolved
# needed (e.g. in butler mode).
self.model_package_name = task.config.modelPackageName or 'N/A'

self.package_storage_mode = task.config.modelPackageStorageMode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to pass these (and anything else you get from task) directly to __init__ rather than passing task instance, to avoid introducing a circular interface dependency between the task and this class.

python/lsst/meas/transiNet/rbTransiNetTask.py Outdated Show resolved Hide resolved
results = self.butler.registry.queryDatasets(StorageAdapterButler.dataset_type_name,
collections=f'{StorageAdapterButler.packages_parent_collection}/{self.model_package_name}') # noqa: E501
payload = self.butler.get(list(results)[0])
self.from_payload(payload)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this isn't done inside the formatter. How does from_payload know it's a zip file it gets back? This is still subverting the role of butler. Why can't the unzip go inside the formatter and NNModelPayload have the three components in it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recapping two separate pairwise discussions: I agree, but as long as there's only one file extension (i.e. zip) in play, we can make that change on a separate ticket and I could imagine that working better in terms of transitioning from the old way to load these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have some comments mainly re. how this matches (or not) the initial design, and how much we want to add to/remove from abstraction layers, etc. But yes, let's keep it for another ticket.

@NimSed NimSed force-pushed the tickets/DM-38454 branch 3 times, most recently from 4361fbc to 9388f53 Compare December 12, 2023 22:22
@NimSed NimSed force-pushed the tickets/DM-38454 branch 2 times, most recently from 987aa95 to b429e04 Compare December 13, 2023 00:57
NimSed and others added 3 commits December 12, 2023 17:06
Co-authored-by: Jim Bosch <talljimbo@gmail.com>
Co-authored-by: Jim Bosch <talljimbo@gmail.com>
@NimSed NimSed merged commit 1be1135 into main Dec 15, 2023
2 checks passed
@NimSed NimSed deleted the tickets/DM-38454 branch December 15, 2023 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants