DM-31924: new high-level interface for making IDs for catalogs and their rows #239

TallJimbo · 2023-04-18T17:12:20Z

No description provided.

parejkoj

How is the test coverage of the new id classes? I'm having trouble figuring out how much of the new code would be executed by the tests you added, and whether the tests are actually independent of the code itself.

You say "integers" throughout this: is that intentionally "number of bits" generic, or do you mean 32-bit integers?

I think this could use some narrative documentation on how to use it, probably in doc/lsst.meas.base/config/.

python/lsst/meas/base/_id_generator.py

parejkoj · 2023-04-21T19:05:53Z

python/lsst/meas/base/_id_generator.py

+"""Default release ID to embed in catalog IDs.
+
+This can be changed globally to avoid having to override individual task
+configs to set the release ID.


Is the intention then that this number gets bumped up for the pipelines release that is going to be used to generate the next data release? That procedure should get documented in the "how to make a release" docs, so that we're ready for it in the future.

It's a stretch to say I have a plan for that right now and I don't want to claim to have invented one, though that is more or less I how I envisioned it working (or at least that's better than trying to override this in config in all relevant tasks).

Fair enough: do you think it's worth a short RFC to describe your idea for this process, and so we can think about other places we'll have to intentionally bump version numbers for releases? At least to have more than you and me being aware of this setting.

Oh, and a terrible thought: do we need to worry about point releases for data releases? I hope not!

I think I'll add a quick note to RFC-917 when adopting it, so (as you said) what's present now has a bit more visibility. I've got too many other things that need my attention right now to spend any more time on fleshing this scheme out.

parejkoj · 2023-04-21T19:38:24Z

python/lsst/meas/base/_id_generator.py

+class BaseIdGeneratorConfig(Config):
+    """Base class for configuration of `IdGenerator` instances.
+
+    This class is abstract (it cannot use `abc.ABCMeta` for technical reasons),


What are the technical reasons? If you can't make it an explicit abc, the class docstring probably needs to not which methods need to be implemented in the children.

Metaclass conflict; pex.config also has a custom metaclass.

python/lsst/meas/base/_id_generator.py

parejkoj · 2023-04-21T21:21:21Z

python/lsst/meas/base/_id_generator.py

+        self.n_data_ids = 1 << self.packer.maxBits
+        upper_bits = (self.n_releases - 1).bit_length() + self.packer.maxBits
+        self.counter_bits = IdFactory.computeReservedFromMaxBits(upper_bits)
+        self.n_counters = 1 << self.counter_bits


The class docstring should probably note these three attributes, even though this is intended to be private.

What do you mean? I don't think of attribute docs as going in the class docstring at all; the class Parameters is for __init__ arguments, and these are not.

Sorry about that: I see them as init=False fields now, with docstrings, so that's fine.

python/lsst/meas/base/_id_generator.py

tests/test_id_generator.py

parejkoj · 2023-04-22T00:58:38Z

python/lsst/meas/base/_id_generator.py

+    def __str__(self) -> str:
+        """Return a human-readable representation of the data ID (or a note
+        about its absence) for use in log and error messages.
+        """


Is the dataId the only thing we want to include in these str, or should it say something about the number of bits, too?

I was envisioning str here as being really for logs and exceptions about the catalog, not logs and exceptions about the IdGenerator itself. There are a lot of tasks whose run methods take an idGenerator and that effectively abstracts over whether there is a data ID at all.

We discussed this point on one of the other PRs; using the id generator to access a stringified dataId seems like a bit of an abuse of the system. Maybe just note in the IdGenerator docs that we use it as such?

parejkoj · 2023-04-22T01:30:09Z

python/lsst/meas/base/_id_generator.py

+    @property
+    def catalog_id(self) -> int:
+        """The integer identifier for the full catalog with this data ID, not
+        just one of its rows (`int`).


As I dig into the uses of this, I am surprised that exposureIdInfo.expId is replaced with idGenerator.catalog_id. Thinking about it, the latter makes sense (you're using it for catalog-y things, not exposure-y things), but it might surprise others when they go to look at it ("why doesn't the new packer have an exposure_id I can get?"). Probably just worth being clear about this change when you announce the merging of this ticket.

TallJimbo

How is the test coverage of the new id classes? I'm having trouble figuring out how much of the new code would be executed by the tests you added, and whether the tests are actually independent of the code itself.

Coverage of the new file is 89%; what's missing is mostly just the deprecated things (I've gotten accustomed to using coverage to help me write tests, too).

You say "integers" throughout this: is that intentionally "number of bits" generic, or do you mean 32-bit integers?

I specifically mean 64-bit signed integers for the full row ID, since that's what afw.table uses (FITS says we can't have uint64, and various project-level docs says 64 bits).

TallJimbo · 2023-04-24T20:57:15Z

python/lsst/meas/base/_id_generator.py

+"""Default release ID to embed in catalog IDs.
+
+This can be changed globally to avoid having to override individual task
+configs to set the release ID.


It's a stretch to say I have a plan for that right now and I don't want to claim to have invented one, though that is more or less I how I envisioned it working (or at least that's better than trying to override this in config in all relevant tasks).

TallJimbo · 2023-04-24T21:03:48Z

python/lsst/meas/base/_id_generator.py

+class BaseIdGeneratorConfig(Config):
+    """Base class for configuration of `IdGenerator` instances.
+
+    This class is abstract (it cannot use `abc.ABCMeta` for technical reasons),


Metaclass conflict; pex.config also has a custom metaclass.

python/lsst/meas/base/_id_generator.py

TallJimbo · 2023-04-24T21:07:51Z

python/lsst/meas/base/_id_generator.py

+        raise NotImplementedError("Method is abstract.")
+
+
+class ExposureIdGeneratorConfig(BaseIdGeneratorConfig):


There's no all-encompassing name here, since InstrumentId wouldn't be specific enough to resolve "exposure" vs "visit". But I'm happy to renaming this to DetectorExposureIdGeneratorConfig (and also rename the visit one to DetectorVisitIdGeneratorConfig) if you think that's better.

TallJimbo · 2023-04-24T21:09:09Z

python/lsst/meas/base/_id_generator.py

+        return self.packer.apply(data_id)
+
+
+class IdGenerator:


It is intended to be a base class but it is not an abstract one; the base class is a concrete trivial implementation that's useful for unit tests or in other contexts where the task is being run with no data ID.

python/lsst/meas/base/_id_generator.py

TallJimbo · 2023-04-25T16:26:23Z

python/lsst/meas/base/_id_generator.py

+                f"{k}={v!r}" for k, v in kwargs.items()
+            ]
+            raise ValueError(
+                f"Integer range from numpy.arange({arg_terms}) has values that are not "


I tried to come with a way to do that that doesn't potentially print a huge array, but couldn't come up with one that didn't involve trying to reimplement np.arange arg-handling.

python/lsst/meas/base/_id_generator.py

TallJimbo · 2023-04-25T16:29:46Z

python/lsst/meas/base/_id_generator.py

+        self.n_data_ids = 1 << self.packer.maxBits
+        upper_bits = (self.n_releases - 1).bit_length() + self.packer.maxBits
+        self.counter_bits = IdFactory.computeReservedFromMaxBits(upper_bits)
+        self.n_counters = 1 << self.counter_bits


What do you mean? I don't think of attribute docs as going in the class docstring at all; the class Parameters is for __init__ arguments, and these are not.

tests/test_id_generator.py

parejkoj

Thanks for the added narrative doc; a few comments on formatting there.

parejkoj · 2023-04-26T16:16:08Z

doc/lsst.meas.base/generating_source_and_object_ids.rst

+
+- `SkyMapIdGeneratorConfig` for ``{tract, patch}`` or ``{tract, patch, band}}`` data IDs (most object tables);
+- `DetectorVisitIdGeneratorConfig` for ``{visit, detector}`` data IDs (most source tables);
+- `DetectorExposureIdGeneratorConfig` for ``{exposure, detector}`` data IDs (rare, at least in the default LSST data model).


I think these all need the full namespace in order to get built and linked.

The

.. py:currentmodule:: lsst.meas.base

at the top of the file takes care of this.

Oh, that's cool! I wasn't aware of that. I'll have to remember it for my own docs.

parejkoj · 2023-04-26T16:16:25Z

doc/lsst.meas.base/generating_source_and_object_ids.rst

+- `DetectorVisitIdGeneratorConfig` for ``{visit, detector}`` data IDs (most source tables);
+- `DetectorExposureIdGeneratorConfig` for ``{exposure, detector}`` data IDs (rare, at least in the default LSST data model).
+
+These have a `~BaseIdGeneratorConfig.make_field` method that can be used to define the config field with minimal boilerplate.


I think the full namespace is necessary here, too? And in the paragraph below.

(see above)

This wraps (and slightly reimplements some of) lsst.afw.table.IdFactory, and replaces lsst.obs.base.ExposureIdInfo.

These tasks do not really need need to generate IDs, as they take an input catalog of some sort and use the IDs from that (forcedPhotCcd does also make its own IDs, but I don't think they're useful and I don't think we have room for them in the final schema). But fixing that is something best taken into account when replacing the tasks rather than something we'd ever benefit from addressing in the current ones. I've replaced/renamed some arguments to generateMeasCat without deprecation as I'm pretty sure those methods are effectively private even though we didn't have the foresight to give them a leading underscore.

This better reflects most real-world usage.

parejkoj reviewed Apr 21, 2023

View reviewed changes

parejkoj reviewed Apr 22, 2023

View reviewed changes

TallJimbo commented Apr 25, 2023

View reviewed changes

parejkoj reviewed Apr 26, 2023

View reviewed changes

TallJimbo added 4 commits April 26, 2023 14:08

Add new helper class for generating Catalog IDs.

e041996

This wraps (and slightly reimplements some of) lsst.afw.table.IdFactory, and replaces lsst.obs.base.ExposureIdInfo.

Add overview doc page on generating catalog row IDs.

711e998

Switch IdGenerator example to PipelineTask/runQuantum.

bf7d0f9

This better reflects most real-world usage.

TallJimbo force-pushed the tickets/DM-31924 branch from a75724b to bf7d0f9 Compare April 26, 2023 18:08

TallJimbo merged commit dc3ce67 into main Apr 27, 2023
1 check passed

TallJimbo deleted the tickets/DM-31924 branch April 27, 2023 02:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-31924: new high-level interface for making IDs for catalogs and their rows #239

DM-31924: new high-level interface for making IDs for catalogs and their rows #239

TallJimbo commented Apr 18, 2023

parejkoj left a comment

parejkoj Apr 21, 2023

TallJimbo Apr 24, 2023

parejkoj Apr 26, 2023

parejkoj Apr 26, 2023

TallJimbo Apr 26, 2023

parejkoj Apr 21, 2023

TallJimbo Apr 24, 2023

parejkoj Apr 21, 2023

TallJimbo Apr 25, 2023

parejkoj Apr 26, 2023

parejkoj Apr 22, 2023

TallJimbo Apr 25, 2023

parejkoj Apr 26, 2023

parejkoj Apr 22, 2023

TallJimbo left a comment

TallJimbo Apr 24, 2023

TallJimbo Apr 24, 2023

TallJimbo Apr 24, 2023

TallJimbo Apr 24, 2023

TallJimbo Apr 25, 2023

TallJimbo Apr 25, 2023

parejkoj left a comment

parejkoj Apr 26, 2023

TallJimbo Apr 26, 2023

parejkoj Apr 26, 2023

parejkoj Apr 26, 2023

TallJimbo Apr 26, 2023

		raise NotImplementedError("Method is abstract.")


		class ExposureIdGeneratorConfig(BaseIdGeneratorConfig):

DM-31924: new high-level interface for making IDs for catalogs and their rows #239

DM-31924: new high-level interface for making IDs for catalogs and their rows #239

Conversation

TallJimbo commented Apr 18, 2023

parejkoj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TallJimbo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

parejkoj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment