DM-27390: replace DimensionGraph.{encode,decode} #415

TallJimbo · 2020-10-31T20:13:08Z

No description provided.

andy-slac

Looks OK, one minor comment.

andy-slac · 2020-11-02T16:37:48Z

python/lsst/daf/butler/core/dimensions/_graph.py

+        hasher = hashlib.blake2b(digest_size=self.DIGEST_SIZE//2)
+        for name in self.required.names:
+            hasher.update(name.encode("ascii"))
+        return hasher.hexdigest()


I do not trust hashes in general because of potential collisions. Here a collision is possible because the names are fed to the hasher back to back with no delimiter, so ["a", "b"] hashes the same as ["ab"]. Adding a non-word separator character (e.g. TAB) after each name would avoid this.
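To illustrate the collision described above, here is a minimal sketch. The function names and the `DIGEST_SIZE` value are hypothetical stand-ins, not the PR's actual code; only the `hashlib.blake2b` usage mirrors the diff.

```python
import hashlib

# Hypothetical stand-in for DimensionGraph.DIGEST_SIZE (value assumed).
DIGEST_SIZE = 32


def digest_concatenated(names):
    """Hash names back to back, as in the diff under review."""
    hasher = hashlib.blake2b(digest_size=DIGEST_SIZE // 2)
    for name in names:
        hasher.update(name.encode("ascii"))
    return hasher.hexdigest()


def digest_separated(names):
    """Hash names with a TAB after each one, as the review suggests."""
    hasher = hashlib.blake2b(digest_size=DIGEST_SIZE // 2)
    for name in names:
        hasher.update(name.encode("ascii"))
        hasher.update(b"\t")  # delimiter that cannot appear inside a name
    return hasher.hexdigest()


# Without a separator both inputs feed the bytes b"ab" to the hasher.
print(digest_concatenated(["a", "b"]) == digest_concatenated(["ab"]))
# With a separator the inputs are b"a\tb\t" and b"ab\t".
print(digest_separated(["a", "b"]) == digest_separated(["ab"]))
```

The separator only removes this structural ambiguity. Ordinary hash collisions remain possible in principle, just very unlikely.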

I'll at least add that separator; good idea.

I also considered just using an autoincrement integer for the ID (so there would be no DimensionGraph method to get it). The problem with that is that we'd have to load essentially the entire table from the database in order to see if a particular graph already has an ID, though that's more about code complexity than performance because the table should always be small. I also liked the idea of making the ID deterministic across different data repositories, but I don't have a concrete reason why that needs to be the case. Do you think an autoincrement ID would be better here?

A deterministic ID would be nice to have, but I guess it's hard to construct or to keep stable. If we do not need this ID outside the database relation context, then autoincrement is probably a better match. That table is small, so it is just a single query/round trip to the database; I would not worry about performance, and the code complexity should not be terrible.

I've added another commit that uses autoincrement integer IDs instead; please take a look. Once the review is finished I'll squash commits and probably just remove DimensionGraph.digest.
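For readers following the thread, the autoincrement approach discussed above could look roughly like the sketch below. It uses the standard-library `sqlite3` module. The table names (`graph_key`, `graph_definition`), the function names, and the in-memory cache are all assumptions made for illustration and are not taken from the actual commit; only the `dimension_name` column name appears in the diff.

```python
import sqlite3


def create_tables(conn):
    # Hypothetical schema: one row per graph, plus one row per (graph, dimension).
    conn.execute("CREATE TABLE graph_key (id INTEGER PRIMARY KEY AUTOINCREMENT)")
    conn.execute(
        "CREATE TABLE graph_definition ("
        "graph_id INTEGER NOT NULL REFERENCES graph_key (id), "
        "dimension_name TEXT NOT NULL, "
        "PRIMARY KEY (graph_id, dimension_name))"
    )


def load_graph_ids(conn):
    """Load the whole (small) table in one query, keyed by the set of names."""
    groups = {}
    rows = conn.execute(
        "SELECT k.id, d.dimension_name FROM graph_key k "
        "LEFT JOIN graph_definition d ON d.graph_id = k.id"
    )
    for graph_id, name in rows:
        names = groups.setdefault(graph_id, set())
        if name is not None:  # an empty graph has a key row but no definition rows
            names.add(name)
    return {frozenset(names): graph_id for graph_id, names in groups.items()}


def save_graph(conn, cache, names):
    """Return the ID for this set of names, inserting it if it is new."""
    key = frozenset(names)
    if key in cache:
        return cache[key]
    graph_id = conn.execute("INSERT INTO graph_key DEFAULT VALUES").lastrowid
    conn.executemany(
        "INSERT INTO graph_definition (graph_id, dimension_name) VALUES (?, ?)",
        [(graph_id, name) for name in sorted(key)],
    )
    cache[key] = graph_id
    return graph_id


conn = sqlite3.connect(":memory:")
create_tables(conn)
cache = load_graph_ids(conn)
first = save_graph(conn, cache, ["instrument", "visit"])
again = save_graph(conn, cache, ["visit", "instrument"])  # same set, same ID
```

The cost is the one noted in the thread: the full table has to be read up front to know whether a graph already has an ID, which is acceptable only because the table stays small.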

andy-slac · 2020-11-02T16:47:13Z

python/lsst/daf/butler/registry/dimensions/static.py

+            ddl.FieldSpec(
+                name="dimension_name",
+                dtype=sqlalchemy.String,
+                length=64,


We should probably have defined a bunch of constants for these magic 64s too.

Are we sure they don't get converted to TEXT...?

At this point I'd be more inclined to just use sqlalchemy.Text and leave the length blank. So many of them are arbitrary, or at best "guess and check".

(messages crossed) I think this does get translated to TEXT, but that's controlled by different code, so the magic number here is still a bit problematic.

My main point is that if you needed it to be exactly 64, that probably isn't happening. The threshold for automatic conversion to TEXT is >32. You can use TEXT explicitly in the type if you want it to be TEXT. You can't, though, require a long varchar to actually be a varchar.

andy-slac

Looks good, couple of minor comments.

andy-slac · 2020-11-02T22:00:48Z

python/lsst/daf/butler/registry/dimensions/static.py

@@ -50,6 +53,33 @@
 _VERSION = VersionTuple(5, 0, 0)


+def _makeDimensionGraphTableSpecs() -> ddl.TableSpec:


Is this still used?

I thought I'd deleted that. I certainly will now.

andy-slac · 2020-11-02T22:08:18Z

python/lsst/daf/butler/registry/dimensions/static.py

+        -------
+        graph : `DimensionGraph`
+            Retrieved graph.
+        """


Maybe add Raises section to docstring?
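As a sketch of what that request might look like in numpydoc style: the function below is a hypothetical stand-in, and both its signature and the choice of `KeyError` are assumptions, since the diff excerpt does not show the method's signature or which exception it raises.

```python
def load_dimension_graph(key, saved):
    """Retrieve a previously saved set of dimension names.

    Parameters
    ----------
    key : `int`
        Identifier under which the graph was saved.
    saved : `dict` [ `int`, `frozenset` ]
        Hypothetical in-memory stand-in for the database table.

    Returns
    -------
    graph : `frozenset` [ `str` ]
        Retrieved graph.

    Raises
    ------
    KeyError
        Raised if no graph was saved under ``key``.
    """
    try:
        return saved[key]
    except KeyError:
        # Re-raise with a message that names the missing key.
        raise KeyError(f"No dimension graph saved with key {key}.") from None
```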

The dataset record storage classes were already being passed a DimensionRecordStorageManager at construction, so passing the DimensionUniverse to their methods (mostly added very recently on DM-27251) was just unnecessary. We'll soon want to pass DimensionRecordStorageManager to CollectionManager at construction for other reasons, so I've done that here to use the same pattern of passing dimensions once early across the board.

TallJimbo force-pushed the tickets/DM-27033 branch from f976944 to a8ab6b1 Compare October 31, 2020 20:20

TallJimbo force-pushed the tickets/DM-27390 branch 2 times, most recently from 6b292c6 to 95bba04 Compare November 2, 2020 14:19

andy-slac reviewed Nov 2, 2020

andy-slac approved these changes Nov 2, 2020

TallJimbo force-pushed the tickets/DM-27390 branch from 455d209 to 1657740 Compare November 3, 2020 01:20

TallJimbo added 5 commits November 2, 2020 21:20

Add support for saving DimensionGraph definitions to database.

d399055

Pass DimensionRecordStorage manager to DatasetRecordStorageManager.

34a09a1

Switch to using save/loadDimensionGraph instead of encode/decode.

b9bad56

Remove DimensionGraph.{encode, decode}.

074aa29

TallJimbo force-pushed the tickets/DM-27390 branch from 1657740 to b9daf2d Compare November 3, 2020 02:20

TallJimbo merged commit bf031c4 into tickets/DM-27033 Nov 3, 2020

TallJimbo deleted the tickets/DM-27390 branch November 3, 2020 04:53

TallJimbo added a commit that referenced this pull request Nov 4, 2020

Merge pull request #415 from lsst/tickets/DM-27390

3144165

TallJimbo commented Oct 31, 2020

andy-slac left a comment

andy-slac Nov 2, 2020

TallJimbo Nov 2, 2020

andy-slac Nov 2, 2020

TallJimbo Nov 2, 2020

andy-slac Nov 2, 2020

timj Nov 2, 2020

TallJimbo Nov 2, 2020

TallJimbo Nov 2, 2020

timj Nov 2, 2020

andy-slac left a comment

andy-slac Nov 2, 2020

TallJimbo Nov 2, 2020

andy-slac Nov 2, 2020
