DM-37387: A tester script using LATISS data #88

hsinfang · 2023-10-05T21:23:47Z

No description provided.

kfindeisen

Looks good, just a few questions about code organization/duplication.

kfindeisen · 2023-10-10T18:25:39Z

python/tester/upload.py

+            raise NotImplementedError
+        else:
+            # HSC-style
+            group = str(int(group_base) + i)


I don't understand the description of this group as "HSC-style". The groups generated by upload.py don't match any real instrument, so they should work with LATISS exactly as well as with HSC. Is there something in the application code that tries to parse the group ID?

You are right, there is nothing "HSC" about it. The group IDs are all made up.

I re-worked some group ID handling and here no longer has the if-else

kfindeisen · 2023-10-10T19:41:29Z

python/tester/utils.py

+    -------
+    group_id : `str`
+        A group ID in the LSST style.
+    """


Would it be possible to factor the HSC group ID code into a similar form, for ease of organization? The existing make_exposure_id and make_hsc_id are one model for doing so.

I'd suggest at least renaming the functions; right now it's not obvious that make_group_id is LSST-specific or that make_hsc_id is for exposure IDs only.

Yes and done.

As now (the original) make_group_id works for both instruments, I renamed it just make_group hoping that's less confusing with make_*_id (for exposure id). Also all these are about handling group IDs: make_group, decode_group, increment_group, get_last_group.

kfindeisen · 2023-10-10T21:05:07Z

python/tester/upload.py

    if instrument in _LSST_CAMERA_LIST:
-        raise NotImplementedError
+        last_seq_num = get_last_seq_num(dest_bucket, instrument, date)
+        raw_pool = get_samples_lsst(src_bucket, instrument)
+        seq_num = last_seq_num + random.randrange(10, 19)
+        _log.debug(f"Last seq_num {last_seq_num}; new seq_num base {seq_num}")
+        new_group_base = make_group_id(date, seq_num)
    else:
        last_group = get_last_group(dest_bucket, instrument, date)
        _log.info(f"Last group {last_group}")


Perhaps this block could be factored into its own function (from dest_bucket, instrument, date to raw_pool, new_group_base)? It wouldn't reduce the number of imports, but it would make the data flow clearer.

The group incrementation has been factored out, and this part here only get raw_pool now. I think it's clearer now.

kfindeisen · 2023-10-10T21:45:47Z

python/tester/upload.py


-def get_samples(bucket, instrument):
+def get_samples_non_lsst(bucket, instrument):
    """Return any predefined raw exposures for a given instrument.


"for an instrument that doesn't enforce filename conventions"?

Some filename conventions are still assumed but the conventions are different for LSST and non-LSST. I rephrased this and added some more docs.

I meant that for HSC, we were free to choose whatever path was convenient for PP. We couldn't do that for LATISS because the Summit(?) team put their foot down.

Ah, I see what you meant.

I think the added notes has clarified that.

kfindeisen · 2023-10-10T22:08:09Z

python/tester/upload.py

+        # Assume that the unobserved bucket uses the same filename scheme as
+        # the observed bucket.
+        m = re.match(LSST_REGEXP, blob.key)


Is this still a useful rule for LSST images? What would happen if the unobserved files followed KTL's informative path scheme, but were uploaded to the "correct" paths in the destination bucket?

Indeed we don't have to follow the same convention as in the LSST's raw file bucket, and we could code in information in the path name, organize the files whatever way, etc. As long as files are loaded in get_samples_lsst it should work, and the convention isn't really needed there. We can make that work. But, I think it might be less confusing if all LSST raws follow the same convention and can use utilities in activator/raw.py in the same way. Making dev as similar to prod as possible seems a plus. Because metadata are read from the adjacent json, adding new "unobserved" files is very easy. We can drop more files (fits and json) into the unobserved folder and then can run the test with more visits right away.

kfindeisen · 2023-10-10T23:08:01Z

python/tester/upload.py

+                day_obs, seq_num = decode_group_id(visit.groupId)
+                exposure_num = LsstBaseTranslator.compute_exposure_id(day_obs, seq_num)


Why couldn't this be implemented as the LATISS case of make_exposure_id (after fixing its assumption that the group ID can be cast to int)? This seems to be the main thing preventing the two branches of upload_from_pool from being merged.

Thank you for the very good idea. This is now shared in upload_from_pool. It still needs to handle the json for LSST cameras only.

kfindeisen · 2023-10-10T23:16:39Z

python/tester/upload.py

+                    for (key, header) in [("DAYOBS", day_obs),
+                                          ("SEQNUM", seq_num),
+                                          ("OBSID", obs_id)]:


Could make_exposure_id return a mapping instead of an individual key and value?

make_exposure_id has been modified and it now returns the exposure id plus a set of key-value pairs that should be in the header to be consistent.

A number had been used as the group ID in the HSC tester for convenience, but the group ID in the nextVisit events is a string. Currently a timestamp-like string is used in the nextVisit events from LSST summit. This commit makes some "group" to be a string, as a preparation step to make the upload.py script work with LSST data.

While the group ID in the existing HSC testers is an number and directly convertable to an integer, the group ID from LSST summit is a string. This commit factors out some group ID handling as a preparation step to make the upload.py script work with LSST data.

For HSC, only one header key is relevant to the exposure ID. But more than one header can be relevants for other instruments. Refactor this in preparation for LATISS for which multiple header keys need to be set to mock an exposure ID.

LSST data have a different file pattern and the as-is upload.py can not work with them. Be more explicit on that. This is a preparation step to make the script work with LSST data.

For an exposure by LSST instruments, there is no direct relationship between its group ID and its exposure number or filename. For the conveience in testers, make up a group ID convention from which the exposure number, day_obs, and seq_num can be deduced.

Unlike the HSC tester which has an incremental group ID number, this LATISS tester uses an incremental exposure sequence number.

The group ID of HSC testers has been number-like (e.g. 2023012300001). This changes HSC's group ID implementation to use a timestamp-like group ID, same as those used in LATISS testers.

These 10 files s3://rubin-pp-users/unobserved/HSC/ are used in upload.py HSC tester. As the group IDs in HSC testers have changed, change the file paths to be consistent. Practically this is not necessary because the unobserved visits' group IDs do not matter to the generated group IDs. But this is done just so to be consistent.

The location and permission of the raw HSC files on USDF have changed and the script no longer works. Update so it can still work.

hsinfang force-pushed the tickets/DM-37387 branch 3 times, most recently from 106f5fd to b0d1e5c Compare October 9, 2023 21:01

kfindeisen approved these changes Oct 11, 2023

View reviewed changes

hsinfang force-pushed the tickets/DM-37387 branch 8 times, most recently from 451391a to ccefd75 Compare October 17, 2023 23:23

hsinfang added 10 commits October 18, 2023 15:09

Refactor make_exposure_id used in HSC testers

693caa8

For HSC, only one header key is relevant to the exposure ID. But more than one header can be relevants for other instruments. Refactor this in preparation for LATISS for which multiple header keys need to be set to mock an exposure ID.

Raise NotImplementedError if upload.py is called with LSST camaras

fb809e7

LSST data have a different file pattern and the as-is upload.py can not work with them. Be more explicit on that. This is a preparation step to make the script work with LSST data.

Support upload.py to run with LATISS data

daa0b74

Unlike the HSC tester which has an incremental group ID number, this LATISS tester uses an incremental exposure sequence number.

Read metadata from json sidecar instead of hardcoding

141ac6c

Use timestamp-like group ID for HSC

1a516c4

The group ID of HSC testers has been number-like (e.g. 2023012300001). This changes HSC's group ID implementation to use a timestamp-like group ID, same as those used in LATISS testers.

Update the script prompt_prototype_upload_raws.sh to be runnable

6f3b2bf

The location and permission of the raw HSC files on USDF have changed and the script no longer works. Update so it can still work.

hsinfang force-pushed the tickets/DM-37387 branch from ccefd75 to 6f3b2bf Compare October 18, 2023 22:13

hsinfang merged commit 3ee2790 into main Oct 18, 2023

hsinfang deleted the tickets/DM-37387 branch October 18, 2023 22:16

		day_obs, seq_num = decode_group_id(visit.groupId)
		exposure_num = LsstBaseTranslator.compute_exposure_id(day_obs, seq_num)

DM-37387: A tester script using LATISS data #88

DM-37387: A tester script using LATISS data #88

Uh oh!

Conversation

hsinfang commented Oct 5, 2023

Uh oh!

kfindeisen left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hsinfang Oct 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hsinfang Oct 13, 2023 •

edited

Loading