Collaborator

@hsinfang hsinfang commented Dec 1, 2022

No description provided.

@hsinfang hsinfang force-pushed the tickets/DM-37070 branch 3 times, most recently from 4af50d7 to 2fad95a, on December 6, 2022 22:34
Member

@kfindeisen kfindeisen left a comment

Thank you for doing this, and for cleaning up the code in the process! However, I'm worried that retrieveArtifacts may mix up the files and data IDs. Could you check this?

Comment on lines 125 to 128
Notes
-----
The current implementation may overflow if more than ~60 calls to upload.py
are done on the same day.
Member

@kfindeisen kfindeisen Dec 7, 2022

This note is a bit inappropriate if this function is in a shared library. Unfortunately, I'm not sure how to express it in more general terms, since the original calculation included the random 10-19 jump added in upload.py:main, and the ratio of jumps to visits is much more variable in upload_hsc_rc2.py.

I'm worried that this implementation will overflow once we're able to call upload_hsc_rc2.py with large numbers of visits.

Collaborator Author

I rephrased it in 8624177. I think it reads a bit better now.

Indeed, once we scale up more and call upload_hsc_rc2.py with a larger group number, this will overflow. We might want a future ticket for that. I think a longer string in the header should be fine if there aren't better ideas.

Member

@kfindeisen kfindeisen Dec 15, 2022

Unfortunately, we really are limited to 8 digits: https://github.com/lsst/astro_metadata_translator/blob/main/python/astro_metadata_translator/translators/hsc.py#L221 will reject any longer string as invalid, and the old-style header format actually has fewer possible values. 😞

I suspect we'll have to sacrifice either having a visit ID that looks like the group ID, or having a date-based group ID. Both are useful features for debugging.
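
To make the 8-digit constraint concrete, here is a minimal sketch of a formatter that fails loudly on overflow instead of emitting an ID that would be rejected downstream. The function name `format_eight_digit_id` and the constant `MAX_DIGITS` are hypothetical, chosen for illustration; this is not the ID scheme used in this PR or the translator's own validation logic.

```python
MAX_DIGITS = 8  # assumption: the digit limit described in the comment above


def format_eight_digit_id(value: int) -> str:
    """Zero-pad ``value`` to eight digits, refusing values that do not fit.

    Hypothetical helper for illustration only.
    """
    if value < 0 or value >= 10 ** MAX_DIGITS:
        raise ValueError(f"{value} does not fit in {MAX_DIGITS} digits")
    return f"{value:0{MAX_DIGITS}d}"
```

A check like this would turn a silent overflow into an immediate error at upload time, whichever ID scheme is eventually chosen.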

Member

@kfindeisen kfindeisen Dec 15, 2022

I didn't see an existing ticket, so I've filed DM-37364 for addressing the overflow issue.

for detector_id in range(112):
    visit = Visit(instrument="HSC",
                  detector=detector_id,
                  group=group_id,
Member

This will break activator, which expects that Visit.group is a string.

Collaborator Author

group_id is now a string.

with tempfile.TemporaryDirectory() as temp_dir:
    transferred = butler.retrieveArtifacts(
        refs, transfer="copy", preserve_path=False, destination=temp_dir)
    for artifact, ref in zip(transferred, refs):
Member

I don't think this operation is safe; the documentation for retrieveArtifacts says that there may be more artifacts than refs, and that the order doesn't match.

Collaborator Author

Thanks for catching this! I rewrote this so that it no longer assumes the two lists have the same length.
I'm not sure the current replace_header_key can work if a raw dataset has multiple artifacts; more information is needed about what the artifacts look like. As this known input (HSC-RC2 on a known filesystem) has a one-to-one relationship between raw dataset and file, I'm just going to verify that and continue, and throw an error if more than one artifact is obtained for one dataset.
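
One way to avoid depending on the order or length of the returned list is to retrieve artifacts one dataset at a time and check the count. The sketch below illustrates that idea and is not necessarily the code merged in this PR. `retrieve_one_artifact_each` is a hypothetical helper, and `retrieve` is a hypothetical callable standing in for the Butler call so that the example runs on its own.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def retrieve_one_artifact_each(
    refs: Iterable[Hashable],
    retrieve: Callable[[List[Hashable]], List[str]],
) -> Dict[Hashable, str]:
    """Map each ref to its single artifact, retrieving one ref at a time.

    ``retrieve`` is a stand-in (an assumption for this sketch) for a call
    that takes a list of refs and returns the artifacts it transferred.
    """
    artifact_by_ref = {}
    for ref in refs:
        # Asking for a single ref makes the ref-to-artifact pairing explicit.
        artifacts = retrieve([ref])
        if len(artifacts) != 1:
            raise RuntimeError(
                f"Expected exactly 1 artifact for {ref!r}, got {len(artifacts)}"
            )
        artifact_by_ref[ref] = artifacts[0]
    return artifact_by_ref
```

With a real Butler, `retrieve` could wrap the call from the snippet under review, e.g. `lambda r: butler.retrieveArtifacts(r, transfer="copy", preserve_path=False, destination=temp_dir)`. Issuing one call per dataset may be slower than a single bulk call, which is the trade-off for an unambiguous pairing.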

@hsinfang hsinfang force-pushed the tickets/DM-37070 branch 3 times, most recently from eea629c to 8df2c97, on December 13, 2022 00:18
Member

@kfindeisen kfindeisen left a comment

Thanks for taking care of the retrieveArtifacts issue! Just a few minor comments.

@hsinfang hsinfang force-pushed the tickets/DM-37070 branch 5 times, most recently from 5aed71f to 8371a6d, on December 14, 2022 23:56
@hsinfang hsinfang merged commit aa9cc53 into main Dec 15, 2022
@hsinfang hsinfang deleted the tickets/DM-37070 branch December 15, 2022 00:24