DM-30266 Modify serialization of some objects #573

natelust · 2021-09-14T20:39:46Z

Add/Modify serialization of some objects to support a new serialized
format of QuantumGraphs. In particular this introduces a new method
on some objects to support direct construction if the inputs are
already trusted, skipping validation steps.

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

TallJimbo

I've got some pretty big concerns here, and I'm not actually sure it's worth trying to resolve them on this branch - originally this ticket was just going to be about making quantum node IDs into UUIDs to make other Quantum serialization changes easier, but:

this now overlaps enough with DM-30332 that I'd really like to at least try to reconcile the branches before merging, to see if we've solved some problems in fundamentally different ways;
the hash()-keyed DimensionRecord normalization here is broken, but that is the problem that DM-30332 aims to solve much more rigorously;
if we can possibly avoid setting the precedent of adding these direct methods to all of our serialization structures, I think it'll pay off in the long run.

I'd also like @timj to sign off on the serialization changes first, if we do plan to merge this soon; this system is more in his domain than mine (note that he may also object to some of the things I'm doing on DM-30332).

So, where does this leave us? I was hoping not to get back to DM-30332 until after some big query-system changes, because QG serialization and state stuff in general is a lower priority from a high-level perspective than QG generation stuff right now. But this ticket is mostly done, and there's a lot of content on the DM-30332 branches that I don't want to get too stale, either. And if I can get DM-30332 (or at least the parts that are basically done) out for review soon, we could rebase this branch on that and delegate to it for the DimensionRecord normalization logic instead of fixing it independently.

There's also the big catch that I didn't try to avoid pydantic validation on DM-30332, and the fact that @natelust has done so here makes me worry that DM-30332 could push us off a performance cliff. Maybe I can learn enough from his work here to fix my branch in that respect, but I'm also worried that just turning off validation sort of defeats the purpose of using pydantic, and it might be a sign of bigger problems to come. I've been wondering for a while now whether QG is going to be the thing that finally pushes us into adding some compiled-language code to the middleware; doing graph algorithms with hundreds of thousands of nodes in Python just strikes me as bonkers, period.

Thoughts on how to proceed welcome.
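
As an illustration of the hash()-keyed normalization concern raised above: the comment does not say which failure was meant, but one general hazard of keying a lookup table on `hash(obj)` is that distinct objects can share a hash. The sketch below assumes a hypothetical `Record` class (not the daf_butler `DimensionRecord`) with a deliberately coarse `__hash__` so the collision is reproducible.

```python
class Record:
    """Hypothetical stand-in for a dimension record (not the daf_butler class)."""

    def __init__(self, element: str, data_id: int) -> None:
        self.element = element
        self.data_id = data_id

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Record):
            return NotImplemented
        return (self.element, self.data_id) == (other.element, other.data_id)

    def __hash__(self) -> int:
        # Deliberately coarse so the collision is reproducible; real hashes
        # can collide too, just less predictably.
        return hash(self.element)


def normalize_by_hash(records):
    """Deduplicate using hash(record) as the key (the fragile approach)."""
    table = {}
    for record in records:
        table.setdefault(hash(record), record)
    return table


def normalize_by_value(records):
    """Deduplicate using the record itself as the key, so __eq__ breaks ties."""
    table = {}
    for record in records:
        table.setdefault(record, record)
    return table


records = [Record("visit", 1), Record("visit", 2), Record("visit", 1)]
by_hash = normalize_by_hash(records)
by_value = normalize_by_value(records)
print(len(by_hash), len(by_value))
```

Keying on the hash alone silently drops the second distinct record, while keying on the object lets equality resolve the collision. String hashes in CPython are also randomized per process by default, so raw hash values are a poor choice for anything written to disk.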

python/lsst/daf/butler/core/datasets/ref.py

TallJimbo · 2021-09-16T19:38:18Z

python/lsst/daf/butler/core/datasets/ref.py

+        This method should only be called when the inputs are trusted.
+        """
+        node = SerializedDatasetRef.__new__(cls)
+        setter = object.__setattr__


This degree of poking at class internals worries me a lot; we seem to be assuming a lot about pydantic implementation details. If we can't use construct instead of adding these methods, could we at least use it inside the direct implementations?

@natelust can you respond to @TallJimbo 's question here please?

He and I discussed this out of band. I'm not thrilled, but I don't see a great alternative and I'm mostly disappointed in pydantic. Hopefully we can figure out a way to clean it up later.
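
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of what a direct-style constructor does, assuming a plain Python class as a stand-in. `SerializedThing` and its fields are hypothetical; the real classes on this branch are pydantic models, and this sketch does not reproduce pydantic's internals.

```python
class SerializedThing:
    """Hypothetical validated container; not pydantic, not SerializedDatasetRef."""

    def __init__(self, id, run):
        # Normal path: validate and coerce every input.
        if not isinstance(run, str) or not run:
            raise ValueError("run must be a non-empty string")
        self.id = int(id)
        self.run = run

    @classmethod
    def direct(cls, *, id, run):
        """Build an instance without validation; only for trusted inputs."""
        node = cls.__new__(cls)      # allocate without running __init__
        setter = object.__setattr__  # bypass any custom __setattr__
        setter(node, "id", id)
        setter(node, "run", run)
        return node


checked = SerializedThing(id="7", run="run1")        # "7" is coerced to 7
trusted = SerializedThing.direct(id=7, run="run1")   # no checks performed
unchecked = SerializedThing.direct(id=7, run="")     # invalid, but accepted

try:
    SerializedThing(id=7, run="")
    rejected = False
except ValueError:
    rejected = True
```

The trade-off is visible in the last two cases: the validating constructor rejects the empty `run`, while `direct` accepts it, which is why the docstring in the diff restricts the method to trusted inputs.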

codecov · 2021-11-15T17:54:48Z

Codecov Report

Merging #573 (0ec53dc) into main (1184064) will decrease coverage by 0.59%.
The diff coverage is 20.00%.

❗ Current head 0ec53dc differs from pull request most recent head 08a9926. Consider uploading reports for the commit 08a9926 to get more accurate results

@@            Coverage Diff             @@
##             main     #573      +/-   ##
==========================================
- Coverage   83.85%   83.26%   -0.60%     
==========================================
  Files         234      242       +8     
  Lines       29781    30896    +1115     
  Branches     4929     4636     -293     
==========================================
+ Hits        24974    25725     +751     
- Misses       3674     3968     +294     
- Partials     1133     1203      +70

| Impacted Files | Coverage Δ |
|---|---|
| python/lsst/daf/butler/core/dimensions/_records.py | `77.18% <15.38%> (-5.91%)` ⬇️ |
| python/lsst/daf/butler/core/quantum.py | `32.82% <16.41%> (-34.87%)` ⬇️ |
| python/lsst/daf/butler/core/datasets/type.py | `77.83% <18.18%> (-3.42%)` ⬇️ |
| python/lsst/daf/butler/core/datasets/ref.py | `75.00% <18.75%> (-5.25%)` ⬇️ |
| ...hon/lsst/daf/butler/core/dimensions/_coordinate.py | `81.98% <25.00%> (-1.46%)` ⬇️ |
| python/lsst/daf/butler/core/dimensions/_graph.py | `79.87% <33.33%> (-1.77%)` ⬇️ |
| python/lsst/daf/butler/core/ddl.py | `81.15% <58.33%> (-1.63%)` ⬇️ |
| ...ython/lsst/daf/butler/registry/dimensions/table.py | `84.21% <0.00%> (-0.21%)` ⬇️ |
| python/lsst/daf/butler/_butler.py | `78.76% <0.00%> (-0.18%)` ⬇️ |
| python/lsst/daf/butler/registries/sql.py | `81.54% <0.00%> (-0.04%)` ⬇️ |

... and 26 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

timj · 2021-11-15T18:20:18Z

There don't seem to be any tests that use the serialization direct methods. This seems to be something that daf_butler should be testing rather than expecting pipe_base to do it.
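
A minimal sketch of the kind of test being asked for, assuming a hypothetical `SerializedPoint` model with a validating constructor and a `direct` shortcut (neither is a daf_butler class). It checks that trusted construction yields the same object as the validated path, and documents that `direct` skips the checks.

```python
import unittest


class SerializedPoint:
    """Hypothetical model with a validating constructor and a direct() shortcut."""

    def __init__(self, x, y):
        # Validation/coercion path: float() raises on bad input.
        self.x = float(x)
        self.y = float(y)

    @classmethod
    def direct(cls, *, x, y):
        # Trusted path: allocate and set attributes with no checks.
        node = cls.__new__(cls)
        object.__setattr__(node, "x", x)
        object.__setattr__(node, "y", y)
        return node

    def __eq__(self, other):
        if not isinstance(other, SerializedPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


class DirectConstructionTestCase(unittest.TestCase):
    def test_direct_matches_validated(self):
        validated = SerializedPoint(1, 2)
        trusted = SerializedPoint.direct(x=1.0, y=2.0)
        self.assertEqual(validated, trusted)

    def test_direct_skips_validation(self):
        # Bad input is accepted silently by direct() ...
        bad = SerializedPoint.direct(x="not a number", y=None)
        self.assertEqual(bad.x, "not a number")
        # ... but rejected by the validating constructor.
        with self.assertRaises(ValueError):
            SerializedPoint("not a number", 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DirectConstructionTestCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real test module the last two lines would normally be replaced by the usual `unittest.main()` guard; the explicit runner is used here only so the sketch can be executed inline.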

natelust · 2021-12-09T01:49:34Z

@timj or @TallJimbo can one of you look this over once more, and/or mark it reviewed? One or two small things (lint problem in pipe_base) and I hope tomorrow is the day.

TallJimbo requested changes Sep 20, 2021

TallJimbo reviewed Sep 21, 2021

python/lsst/daf/butler/core/ddl.py (outdated)

TallJimbo reviewed Sep 21, 2021

python/lsst/daf/butler/core/dimensions/_records.py (outdated)

TallJimbo reviewed Sep 21, 2021

python/lsst/daf/butler/core/dimensions/_records.py

natelust force-pushed the tickets/DM-30266 branch from fbf806d to 953ce5e Compare November 15, 2021 17:41

natelust force-pushed the tickets/DM-30266 branch from 953ce5e to c7cc0d0 Compare November 15, 2021 21:30

natelust force-pushed the tickets/DM-30266 branch 7 times, most recently from c604def to 85eaba5 Compare December 8, 2021 19:03

Modify serialization of some objects

08a9926

natelust force-pushed the tickets/DM-30266 branch from 85eaba5 to 08a9926 Compare December 12, 2021 20:02

natelust merged commit ad64451 into main Dec 12, 2021

natelust deleted the tickets/DM-30266 branch December 12, 2021 21:45
