-
Notifications
You must be signed in to change notification settings - Fork 0
DM-37387: A tester script using LATISS data #88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
106f5fd
to
b0d1e5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few questions about code organization/duplication.
python/tester/upload.py
Outdated
raise NotImplementedError | ||
else: | ||
# HSC-style | ||
group = str(int(group_base) + i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the description of this group as "HSC-style". The groups generated by upload.py
don't match any real instrument, so they should work with LATISS exactly as well as with HSC. Is there something in the application code that tries to parse the group ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, there is nothing "HSC" about it. The group IDs are all made up.
I re-worked some group ID handling and here no longer has the if-else
------- | ||
group_id : `str` | ||
A group ID in the LSST style. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to factor the HSC group ID code into a similar form, for ease of organization? The existing make_exposure_id
and make_hsc_id
are one model for doing so.
I'd suggest at least renaming the functions; right now it's not obvious that make_group_id
is LSST-specific or that make_hsc_id
is for exposure IDs only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and done.
As now (the original) make_group_id
works for both instruments, I renamed it just make_group
hoping that's less confusing with make_*_id
(for exposure id). Also all these are about handling group IDs: make_group
, decode_group
, increment_group
, get_last_group
.
python/tester/upload.py
Outdated
if instrument in _LSST_CAMERA_LIST: | ||
raise NotImplementedError | ||
last_seq_num = get_last_seq_num(dest_bucket, instrument, date) | ||
raw_pool = get_samples_lsst(src_bucket, instrument) | ||
seq_num = last_seq_num + random.randrange(10, 19) | ||
_log.debug(f"Last seq_num {last_seq_num}; new seq_num base {seq_num}") | ||
new_group_base = make_group_id(date, seq_num) | ||
else: | ||
last_group = get_last_group(dest_bucket, instrument, date) | ||
_log.info(f"Last group {last_group}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this block could be factored into its own function (from dest_bucket, instrument, date
to raw_pool, new_group_base
)? It wouldn't reduce the number of imports, but it would make the data flow clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The group incrementation has been factored out, and this part here only get raw_pool
now. I think it's clearer now.
python/tester/upload.py
Outdated
|
||
def get_samples(bucket, instrument): | ||
def get_samples_non_lsst(bucket, instrument): | ||
"""Return any predefined raw exposures for a given instrument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"for an instrument that doesn't enforce filename conventions"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some filename conventions are still assumed but the conventions are different for LSST and non-LSST. I rephrased this and added some more docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that for HSC, we were free to choose whatever path was convenient for PP. We couldn't do that for LATISS because the Summit(?) team put their foot down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you meant.
I think the added notes has clarified that.
# Assume that the unobserved bucket uses the same filename scheme as | ||
# the observed bucket. | ||
m = re.match(LSST_REGEXP, blob.key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still a useful rule for LSST images? What would happen if the unobserved files followed KTL's informative path scheme, but were uploaded to the "correct" paths in the destination bucket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we don't have to follow the same convention as in the LSST's raw file bucket, and we could code in information in the path name, organize the files whatever way, etc. As long as files are loaded in get_samples_lsst
it should work, and the convention isn't really needed there. We can make that work. But, I think it might be less confusing if all LSST raws follow the same convention and can use utilities in activator/raw.py
in the same way. Making dev
as similar to prod
as possible seems a plus. Because metadata are read from the adjacent json, adding new "unobserved" files is very easy. We can drop more files (fits and json) into the unobserved folder and then can run the test with more visits right away.
python/tester/upload.py
Outdated
day_obs, seq_num = decode_group_id(visit.groupId) | ||
exposure_num = LsstBaseTranslator.compute_exposure_id(day_obs, seq_num) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why couldn't this be implemented as the LATISS case of make_exposure_id
(after fixing its assumption that the group ID can be cast to int)? This seems to be the main thing preventing the two branches of upload_from_pool
from being merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the very good idea. This is now shared in upload_from_pool
. It still needs to handle the json for LSST cameras only.
python/tester/upload.py
Outdated
for (key, header) in [("DAYOBS", day_obs), | ||
("SEQNUM", seq_num), | ||
("OBSID", obs_id)]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could make_exposure_id
return a mapping instead of an individual key and value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make_exposure_id
has been modified and it now returns the exposure id plus a set of key-value pairs that should be in the header to be consistent.
451391a
to
ccefd75
Compare
A number had been used as the group ID in the HSC tester for convenience, but the group ID in the nextVisit events is a string. Currently a timestamp-like string is used in the nextVisit events from LSST summit. This commit makes some "group" to be a string, as a preparation step to make the upload.py script work with LSST data.
While the group ID in the existing HSC testers is an number and directly convertable to an integer, the group ID from LSST summit is a string. This commit factors out some group ID handling as a preparation step to make the upload.py script work with LSST data.
For HSC, only one header key is relevant to the exposure ID. But more than one header can be relevants for other instruments. Refactor this in preparation for LATISS for which multiple header keys need to be set to mock an exposure ID.
LSST data have a different file pattern and the as-is upload.py can not work with them. Be more explicit on that. This is a preparation step to make the script work with LSST data.
For an exposure by LSST instruments, there is no direct relationship between its group ID and its exposure number or filename. For the conveience in testers, make up a group ID convention from which the exposure number, day_obs, and seq_num can be deduced.
Unlike the HSC tester which has an incremental group ID number, this LATISS tester uses an incremental exposure sequence number.
The group ID of HSC testers has been number-like (e.g. 2023012300001). This changes HSC's group ID implementation to use a timestamp-like group ID, same as those used in LATISS testers.
These 10 files s3://rubin-pp-users/unobserved/HSC/ are used in upload.py HSC tester. As the group IDs in HSC testers have changed, change the file paths to be consistent. Practically this is not necessary because the unobserved visits' group IDs do not matter to the generated group IDs. But this is done just so to be consistent.
The location and permission of the raw HSC files on USDF have changed and the script no longer works. Update so it can still work.
ccefd75
to
6f3b2bf
Compare
No description provided.