Skip to content

Conversation

hsinfang
Copy link
Collaborator

@hsinfang hsinfang commented Oct 5, 2023

No description provided.

@hsinfang hsinfang force-pushed the tickets/DM-37387 branch 3 times, most recently from 106f5fd to b0d1e5c Compare October 9, 2023 21:01
Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few questions about code organization/duplication.

raise NotImplementedError
else:
# HSC-style
group = str(int(group_base) + i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the description of this group as "HSC-style". The groups generated by upload.py don't match any real instrument, so they should work with LATISS exactly as well as with HSC. Is there something in the application code that tries to parse the group ID?

Copy link
Collaborator Author

@hsinfang hsinfang Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, there is nothing "HSC" about it. The group IDs are all made up.

I re-worked some group ID handling and here no longer has the if-else

-------
group_id : `str`
A group ID in the LSST style.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to factor the HSC group ID code into a similar form, for ease of organization? The existing make_exposure_id and make_hsc_id are one model for doing so.

I'd suggest at least renaming the functions; right now it's not obvious that make_group_id is LSST-specific or that make_hsc_id is for exposure IDs only.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and done.

As now (the original) make_group_id works for both instruments, I renamed it just make_group hoping that's less confusing with make_*_id (for exposure id). Also all these are about handling group IDs: make_group, decode_group, increment_group, get_last_group.

Comment on lines 120 to 128
if instrument in _LSST_CAMERA_LIST:
raise NotImplementedError
last_seq_num = get_last_seq_num(dest_bucket, instrument, date)
raw_pool = get_samples_lsst(src_bucket, instrument)
seq_num = last_seq_num + random.randrange(10, 19)
_log.debug(f"Last seq_num {last_seq_num}; new seq_num base {seq_num}")
new_group_base = make_group_id(date, seq_num)
else:
last_group = get_last_group(dest_bucket, instrument, date)
_log.info(f"Last group {last_group}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this block could be factored into its own function (from dest_bucket, instrument, date to raw_pool, new_group_base)? It wouldn't reduce the number of imports, but it would make the data flow clearer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The group incrementation has been factored out, and this part here only get raw_pool now. I think it's clearer now.


def get_samples(bucket, instrument):
def get_samples_non_lsst(bucket, instrument):
"""Return any predefined raw exposures for a given instrument.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"for an instrument that doesn't enforce filename conventions"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some filename conventions are still assumed but the conventions are different for LSST and non-LSST. I rephrased this and added some more docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that for HSC, we were free to choose whatever path was convenient for PP. We couldn't do that for LATISS because the Summit(?) team put their foot down.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you meant.

I think the added notes has clarified that.

Comment on lines +255 to +246
# Assume that the unobserved bucket uses the same filename scheme as
# the observed bucket.
m = re.match(LSST_REGEXP, blob.key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still a useful rule for LSST images? What would happen if the unobserved files followed KTL's informative path scheme, but were uploaded to the "correct" paths in the destination bucket?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed we don't have to follow the same convention as in the LSST's raw file bucket, and we could code in information in the path name, organize the files whatever way, etc.  As long as files are loaded in get_samples_lsst it should work, and the convention isn't really needed there. We can make that work. But, I think it might be less confusing if all LSST raws follow the same convention and can use utilities in activator/raw.py in the same way. Making dev as similar to prod as possible seems a plus. Because metadata are read from the adjacent json, adding new "unobserved" files is very easy.  We can drop more files (fits and json) into the unobserved folder and then can run the test with more visits right away. 

Comment on lines 349 to 350
day_obs, seq_num = decode_group_id(visit.groupId)
exposure_num = LsstBaseTranslator.compute_exposure_id(day_obs, seq_num)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why couldn't this be implemented as the LATISS case of make_exposure_id (after fixing its assumption that the group ID can be cast to int)? This seems to be the main thing preventing the two branches of upload_from_pool from being merged.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the very good idea. This is now shared in upload_from_pool. It still needs to handle the json for LSST cameras only.

Comment on lines 361 to 363
for (key, header) in [("DAYOBS", day_obs),
("SEQNUM", seq_num),
("OBSID", obs_id)]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could make_exposure_id return a mapping instead of an individual key and value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make_exposure_id has been modified and it now returns the exposure id plus a set of key-value pairs that should be in the header to be consistent.

@hsinfang hsinfang force-pushed the tickets/DM-37387 branch 8 times, most recently from 451391a to ccefd75 Compare October 17, 2023 23:23
A number had been used as the group ID in the HSC tester
for convenience, but the group ID in the nextVisit events
is a string. Currently a timestamp-like string is used in
the nextVisit events from LSST summit.

This commit makes some "group" to be a string, as a preparation
step to make the upload.py script work with LSST data.
While the group ID in the existing HSC testers is an
number and directly convertable to an integer, the
group ID from LSST summit is a string.  This commit
factors out some group ID handling as a preparation step
to make the upload.py script work with LSST data.
For HSC, only one header key is relevant to the exposure ID.
But more than one header can be relevants for other instruments.
Refactor this in preparation for LATISS for which multiple header
keys need to be set to mock an exposure ID.
LSST data have a different file pattern and the as-is upload.py
can not work with them. Be more explicit on that. This is a
preparation step to make the script work with LSST data.
For an exposure by LSST instruments, there is no direct relationship
between its group ID and its exposure number or filename.  For the
conveience in testers, make up a group ID convention from which
the exposure number, day_obs, and seq_num can be deduced.
Unlike the HSC tester which has an incremental group ID number,
this LATISS tester uses an incremental exposure sequence number.
The group ID of HSC testers has been number-like (e.g. 2023012300001).
This changes HSC's group ID implementation to use a timestamp-like
group ID, same as those used in LATISS testers.
These 10 files s3://rubin-pp-users/unobserved/HSC/
are used in upload.py HSC tester.

As the group IDs in HSC testers have changed, change
the file paths to be consistent.  Practically this
is not necessary because the unobserved visits'
group IDs do not matter to the generated group IDs.
But this is done just so to be consistent.
The location and permission of the raw HSC files on USDF
have changed and the script no longer works.  Update so
it can still work.
@hsinfang hsinfang merged commit 3ee2790 into main Oct 18, 2023
@hsinfang hsinfang deleted the tickets/DM-37387 branch October 18, 2023 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants