Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-30266 Modify serialization of some objects #573

Merged
merged 1 commit into from Dec 12, 2021
Merged

Conversation

natelust
Copy link
Contributor

Add/Modify serialization of some objects to support a new serialized
format of QuantumGraphs. In particular this introduces a new method
on some objects to support direct construction if the inputs are
already trusted, skipping validation steps.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got some pretty big concerns here, and I'm not actually sure it's worth trying to resolve them on this branch - originally this ticket was just going to be about making quantum node IDs into UUIDs to make other Quantum serialization changes easier, but:

  • this now overlaps enough with DM-30332 that I'd really like to at least try to reconcile the branches before merging, to see if we've solved some problems in fundamentally different ways;
  • the hash()-keyed DimensionRecord normalization here is broken, but that is the problem that DM-30332 aims to solve much more rigorously;
  • if we can possibly avoid setting the precedent of adding these direct methods to all of our serialization structures, I think I'll pay off in the long run.

I'd also like @timj to sign off on the serialization changes first, if we do plan to merge this soon; this system is more in his domain than mine (note that he may also object to some of the things I'm doing on DM-30332).

So, where does this leave us? I was hoping not to get back to DM-30332 until after some big query-system changes, because QG serialiization and state stuff in general is a lower priority from a high-level perspective than QG generation stuff right now. But this ticket is mostly done, and there's a lot of content on the DM-30332 branches that I don't want to get too stale, either. And if I can get DM-30332 (or at least the parts that are basically done) out for review soon, we could rebase this branch on that and delegate to it for the DimensionRecord normalization logic instead of fixing it independently.

There's also the big catch that I didn't try to avoid pydantic validation on DM-30332, and the fact that @natelust has done so here makes me worry that DM-30332 could push us off a performance cliff. Maybe I can learn enough from his work here to fix my branch in that respect, but I'm also worried that just turning off validation sort of defeats the purpose of using pydantic, and it might be a sign of bigger problems to come. I'm been wondering for a while now whether QG is going to be the thing that finally pushes us into adding some compiled-language code to the middleware; doing graph algorithms with hundreds of thousands of nodes in Python just strikes me as bonkers, period.

Thoughts on how to proceed welcome.

python/lsst/daf/butler/core/datasets/ref.py Outdated Show resolved Hide resolved
This method should only be called when the inputs are trusted.
"""
node = SerializedDatasetRef.__new__(cls)
setter = object.__setattr__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This degree of poking at class internals worries me a lot; we seem to be assuming a lot about pydantic implementation details. If we can't use construct instead of adding these methods, could we at least use it inside the direct implementations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@natelust can you respond to @TallJimbo 's question here please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

He and I discussed this out of band. I'm not thrilled, but I don't see a great alternative and I'm mostly disappointed in pydantic. Hopefully we can figure out a way to clean it up later.

python/lsst/daf/butler/core/dimensions/_records.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/quantum.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/dimensions/_records.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/dimensions/_graph.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/dimensions/_coordinate.py Outdated Show resolved Hide resolved
python/lsst/daf/butler/core/quantum.py Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 15, 2021

Codecov Report

Merging #573 (0ec53dc) into main (1184064) will decrease coverage by 0.59%.
The diff coverage is 20.00%.

❗ Current head 0ec53dc differs from pull request most recent head 08a9926. Consider uploading reports for the commit 08a9926 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #573      +/-   ##
==========================================
- Coverage   83.85%   83.26%   -0.60%     
==========================================
  Files         234      242       +8     
  Lines       29781    30896    +1115     
  Branches     4929     4636     -293     
==========================================
+ Hits        24974    25725     +751     
- Misses       3674     3968     +294     
- Partials     1133     1203      +70     
Impacted Files Coverage Δ
python/lsst/daf/butler/core/dimensions/_records.py 77.18% <15.38%> (-5.91%) ⬇️
python/lsst/daf/butler/core/quantum.py 32.82% <16.41%> (-34.87%) ⬇️
python/lsst/daf/butler/core/datasets/type.py 77.83% <18.18%> (-3.42%) ⬇️
python/lsst/daf/butler/core/datasets/ref.py 75.00% <18.75%> (-5.25%) ⬇️
...hon/lsst/daf/butler/core/dimensions/_coordinate.py 81.98% <25.00%> (-1.46%) ⬇️
python/lsst/daf/butler/core/dimensions/_graph.py 79.87% <33.33%> (-1.77%) ⬇️
python/lsst/daf/butler/core/ddl.py 81.15% <58.33%> (-1.63%) ⬇️
...ython/lsst/daf/butler/registry/dimensions/table.py 84.21% <0.00%> (-0.21%) ⬇️
python/lsst/daf/butler/_butler.py 78.76% <0.00%> (-0.18%) ⬇️
python/lsst/daf/butler/registries/sql.py 81.54% <0.00%> (-0.04%) ⬇️
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1184064...08a9926. Read the comment docs.

@timj
Copy link
Member

timj commented Nov 15, 2021

There don't seem to be any tests that use the serialization direct methods. This seems to be something that daf_butler should be testing rather than expecting pipe_base to do it.

@natelust
Copy link
Contributor Author

natelust commented Dec 9, 2021

@timj or @timj can one of you look this over once more, and or mark it reviewed? One or two small things (lint problem in pipe_base) and I hope tomorrow is the day.

Add/Modify serialization of some objects to support a new serialized
format of QuantumGraphs. In particular this introduces a new method
on some objects to support direct construction if the inputs are
already trusted, skipping validation steps.
@natelust natelust merged commit ad64451 into main Dec 12, 2021
@natelust natelust deleted the tickets/DM-30266 branch December 12, 2021 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants