-
Notifications
You must be signed in to change notification settings - Fork 0
DM-37070: tester script using HSC-RC2 data at /repo/main #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4af50d7
to
2fad95a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for doing this, and for cleaning up the code in the process! However, I'm worried that retrieveArtifacts
may mix up the files and data IDs. Could you check this?
python/tester/util.py
Outdated
Notes | ||
----- | ||
The current implementation may overflow if more than ~60 calls to upload.py | ||
are done on the same day. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This note is a bit inappropriate if this function is in a shared library. Unfortunately, I'm not sure how to express it in more general terms, since the original calculation included the random 10-19 jump added in upload.py:main
, and the ratio of jumps to visits is much more variable in upload_hsc_rc2.py
.
I'm worried that this implementation will overflow once we're able to call upload_hsc_rc2.py
with large numbers of visits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rephrased it in 8624177. I think it reads a bit better now.
Indeed once we scale up more and call upload_hsc_rc2.py
with a larger group number, this will overflow. We might want a future ticket for that. I think a longer string in the header should be fine if there aren't better ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we really are limited to 8 digits: https://github.com/lsst/astro_metadata_translator/blob/main/python/astro_metadata_translator/translators/hsc.py#L221 will reject any longer string as invalid, and the old-style header format actually has fewer possible values. 😞
I suspect we'll have to sacrifice either having a visit ID that looks like the group ID, or having a date-based group ID. Both are useful features for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see an existing ticket, so I've filed DM-37364 for addressing the overflow issue.
python/tester/upload_hsc_rc2.py
Outdated
for detector_id in range(112): | ||
visit = Visit(instrument="HSC", | ||
detector=detector_id, | ||
group=group_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break activator
, which expects that Visit.group
is a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
group_id
is now a string.
python/tester/upload_hsc_rc2.py
Outdated
with tempfile.TemporaryDirectory() as temp_dir: | ||
transferred = butler.retrieveArtifacts( | ||
refs, transfer="copy", preserve_path=False, destination=temp_dir) | ||
for artifact, ref in zip(transferred, refs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this operation is safe; the documentation for retrieveArtifacts
says that there may be more artifacts than refs, and that the order doesn't match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this! I rewrote this without the assumption of same length here.
I'm not sure the current replace_header_key
can work if a raw dataset has multiple artifacts; more info is needed about how the artifacts are like. As this known input (HSC-RC2 on a known filesystem) has a one-to-one relationship between raw dataset and file, I'm just going to verify that and continue, and throw an error if more than one artifact is obtained for one dataset.
eea629c
to
8df2c97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking care of the retrieveArtifacts
issue! Just a few minor comments.
5aed71f
to
8371a6d
Compare
This basically reverts commit 888efa7
This tester script uses the HSC-RC2 dataset at USDF /repo/main
8371a6d
to
8624177
Compare
No description provided.