
Conversation

kfindeisen (Member)

This PR does a lot of refactoring of upload.py to make it easier to edit and more flexible with its inputs, then adds partial support for "observing" preloaded raw files. Because some of the raw metadata is hardcoded into the get_samples function, this script will break if any other raws are uploaded to rubin-prompt-proto-unobserved.

Because of the heavy refactoring on the first part of the ticket, I recommend reviewing this PR one or a few commits at a time; the net diff combines all the changes into one big rewrite.

@kfindeisen kfindeisen requested a review from parejkoj March 17, 2022 18:46

@ktlim ktlim (Contributor) left a comment


I'm not exactly sure why the uploader needs to be so dynamic, but I'll let John figure that out. The rest looks fine. At some point we need to straighten out what's a group and what's a visit, but the distinction only matters in specialized cases.

ra=hsc_metadata[exposure_id]["ra"],
dec=hsc_metadata[exposure_id]["dec"],
kind="SURVEY",
)

Contributor

We're going to need to add rot here and also generate a fixed or random one above.
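
A minimal sketch of that suggestion, assuming a ``rot`` keyword in degrees; the ``Visit`` stand-in and the ``make_visit`` helper below are illustrative only, not the actual activator.Visit definition or the script's code:

    # Sketch only: generate a fixed or random rotator angle alongside the
    # ra/dec lookup and pass it through as ``rot``.
    import dataclasses
    import random

    @dataclasses.dataclass(frozen=True)
    class Visit:  # stand-in for the real Visit type
        ra: float
        dec: float
        rot: float
        kind: str

    def make_visit(exposure_id, hsc_metadata):
        return Visit(
            ra=hsc_metadata[exposure_id]["ra"],
            dec=hsc_metadata[exposure_id]["dec"],
            rot=random.uniform(0.0, 360.0),  # or a hardcoded angle
            kind="SURVEY",
        )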

@kfindeisen (Member, Author)

I'm not exactly sure why the uploader needs to be so dynamic

Because, for me, modifying the existing code was easier than trying to figure out brand-new behavior (e.g., CLI) from scratch.

The client that posts ``next_visit`` messages.
bucket : `google.cloud.storage.Bucket`
The bucket to which to transfer the raws, once observed.
visit_infos : `set` [`activator.Visit`]

Contributor

I'm a bit nervous calling these "visit_infos", since VisitInfo is a different thing that doesn't align with this very well (though maybe we should make it do so?). Why not just visits?

Member Author

Because they're not visits either (in any of the senses of the word)! 😭

A name like "_info" feels natural to me, because these are blocks of metadata rather than identifiers (like visit in the Butler context). I'm willing to be flexible on the first part, though.


# TODO: may be cleaner to use a functor object than to depend on
# closures for the bucket and data.
def upload_dummy(visit, snap_id):

Contributor

Maybe upload_faked_files instead of dummy? Or does that not even upload files at all, so just upload_faked_visits_no_files (which is too long, but...)

Member Author

This does, in fact, upload a file -- most of the contents of -main were uploaded using this code (pre-refactor, of course).
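
For reference, the functor-object alternative mentioned in the TODO above might look roughly like this; the class name, the blob path, and the ``visit.group`` attribute are assumptions for illustration, not the script's actual behavior:

    class DummyUploader:
        """Callable that uploads a placeholder raw for one snap of one visit.

        Holding the bucket and the file as instance state replaces the
        closure over module-level variables.
        """

        def __init__(self, bucket, filename):
            self.bucket = bucket      # a google.cloud.storage.Bucket
            self.filename = filename  # local path of the file to upload

        def __call__(self, visit, snap_id):
            # Same signature as upload_dummy, so it can be swapped in
            # wherever the closure was passed; the blob path is illustrative.
            blob = self.bucket.blob(f"{visit.group}/{snap_id}.fits")
            blob.upload_from_filename(self.filename)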

instrument : `str`
The short name of the active instrument.
date : `str`
The current date in YYYYMMDD format.

Contributor

Not a datetime or astropy date?

Member Author

No. This is part of the original group generation code, and I don't want to touch it because then I'd need to come up with a different scheme for avoiding collisions between uploaded files.

This format is assumed by several other files.
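
For concreteness, a date string in that format can be produced with standard-library formatting; whether the script uses UTC or local time is not shown here, so treat the timezone choice below as an assumption:

    from datetime import datetime, timezone

    # YYYYMMDD, e.g. "20220317"; per the discussion above, the string feeds
    # into group generation to keep uploads from colliding.
    date = datetime.now(timezone.utc).strftime("%Y%m%d")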

This change moves the detector-handling logic from process_group to main, where it will be easier to customize for real test datasets.

This version of the code has a hardcoded dependency on specific HSC files, which will need to be removed later.

The uploaded files weren't conforming to the Prompt Processing path schema. Without this fix, we would need to delete old files from the main bucket on every pass.

Current HSC raws use detectors 50 and 51 only.
@kfindeisen kfindeisen merged commit a6b8322 into main Mar 17, 2022
@kfindeisen kfindeisen deleted the tickets/DM-33970 branch March 17, 2022 22:14