Skip to content

Conversation

kfindeisen
Copy link
Member

This PR adds a regression test for the bug where HSC exposure IDs were being generated with values of 30,000,000+, and rewrites the ID generation algorithm to confine them in the allowed range.

@kfindeisen kfindeisen marked this pull request as ready for review January 12, 2023 00:14
@kfindeisen kfindeisen requested a review from ktlim January 12, 2023 18:29
self.assertEqual(last_group, int(20110101) * 100_000)

def test_exposure_id_hsc(self):
group = "2023011100026"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just a number, avoiding the later int()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get a lot of bugs related to integers being assigned as groups and then passed around like that, so I've insisted on other reviews that we always be strict with the types there. I could hardly make an exception for myself, especially since part of the problem is that we tend to think of group IDs as integers when formally they're not.

Broad queries for camera now always fail, so I picked the skymap as
another "simple" dataset from the test data.
This table can be used by exposure ID generation algorithms to ensure
that their output IDs are always valid.
@kfindeisen
Copy link
Member Author

@ktlim, I implemented the revised generator (thanks for catching the 1.75-year caveat!). I'm still testing, but so far output looks reasonable:

DEBUG:lsst.activator.activator:Received Snap(instrument='HSC', detector=0, group='2023011200017', snap=0, exp_id=1120017, filter='HSC-G')

Can you take another look and let me know if the code meets with your approval?

Copy link
Contributor

@ktlim ktlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks awesome — except the commit comment, which says that monotonicity is sacrificed when it isn't now!

The new ID generation algorithm is guaranteed to return an HSC ID in
the correct range (0-21,474,800), and is less likely to have collisions
from over-uploading. It is only valid until September 2024, but we hope
that will be enough.
While we shouldn't need refresh for the local repo, because there is
only one Butler for it, the central repo has one Butler per worker, and
any individual worker may be long- or short-lived. While it's not a
proper synchronization API, call `central_butler.registry.refresh`
before operations that read from or write to the central repo.
@kfindeisen kfindeisen merged commit bc39f18 into main Jan 12, 2023
@kfindeisen kfindeisen deleted the tickets/DM-37547 branch January 12, 2023 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants