Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] core: cursor hooks API and implementation #56748

Closed
wants to merge 1 commit into from

Conversation

rco-odoo
Copy link
Member

Python 3.8 changed the equality rules for bound methods to be based on
the identity of the receiver (__self__) rather than its equality.
This means that in 3.7, methods from different instances will compare
(and hash) equal, thereby landing in the same map "slot", but that isn't
the case in 3.8.

While it's usually not relevant, it's an issue for GroupCalls which is
indexed by a function: in 3.7, that being a method from recordsets
comparing equal will deduplicate them, but not anymore in 3.8, leading
to duplicated callbacks (exactly the thing GroupCalls aims to avoid).

Also, the API of GroupCalls turned out to be unusual and weird. The
bug above is fixed by using a plain list for callbacks, thereby avoiding
comparisons between registered functions. The API is now:

callbacks.add(func)     # add func to callbacks
callbacks.run()         # run all callbacks in addition order
callbacks.clear()       # remove all callbacks

In order to handle aggregated data, the callbacks object provides a
dictionary callbacks.data that any callback function can freely use.
For the sake of consistency, the callbacks.data dict is automatically
cleared upon execution of callbacks.

Discovered by @william-andre

Related to #56583

References:

@rco-odoo
Copy link
Member Author

@antonylesuisse @odony @tde-banana-odoo here is the refactoring of the cursor hooks.
I find the plumbing in mail.thread more tricky than before, and there is apparently a double subscription of the hook (see the build). I will try to figure out what's wrong.

@robodoo robodoo added ☐ legal/cla CI 🤖 Robodoo has seen passing statuses and removed ☑ legal/cla labels Aug 31, 2020
@odony
Copy link
Contributor

odony commented Aug 31, 2020

@antonylesuisse @odony @tde-banana-odoo here is the refactoring of the cursor hooks.
I find the plumbing in mail.thread more tricky than before

@rco-odoo More tricky? How so? It is more declarative and thus verbose, but also much more readable without needing to understand mysterious things like the types parameters or the return value of add.

This really seems like a verbose and error-prone API to use. Especially with the idea of checking if the data dict is falsy then adding a callback, as that seems easy to get wrong (e.g. if you just need a callback with no additional data).

@xmo-odoo How so? Where do you see chances for error? To me it's much less error-prone, on the contrary, because all the concepts are very simple. It's just a particular use case of storing state in the storage. How could you get it wrong, considering you won't even be trying that unless you're exactly looking for a way to store state in some transactional storage? And if you do, whatever you want to do is up to you, you don't have to reproduce this verbatim, it's just a transactional storage, like the HTTP session.

Keep in mind that the cases where we want accumulate parameters are exceptional. Most developers will just be looking for ways to hook one-shot callables, and the API must be ultra simple for those, with zero surprises.

Your proposition seems like it has most of the drawbacks we want to get rid of in the current one:

  • The API looks weird for simple use cases, because the signature of add() leaks all the machinery for "exceptional cases" and you'll need to understand them in the doc even though you most likely don't need it
  • What should "key" be when you only have a one-shot callable?
  • Why would an "add_callback()" method return something that you need to handle?
  • Why would you need to repeatedly call "add_callback()" to obtain that "shared storage"?

Having a free-form storage is less formal but it means:

  • Developers understand all the concepts of the API immediately, there's nothing weird to learn
  • The general case of one-shot callable has a super simple API: add(func)
  • No need to use the storage or even know it exists, unless you somehow need it
  • The lifecycle of the storage is still obvious because it's namespaced
  • This /can/ be used as a generic transactional storage for other reasons, because it's flexible

@rco-odoo
Copy link
Member Author

This really seems like a verbose and error-prone API to use. Especially with the idea of checking if the data dict is falsy then adding a callback, as that seems easy to get wrong (e.g. if you just need a callback with no additional data).

@xmo-odoo How so? Where do you see chances for error?

My first attempt at adapting the mail tracking code seemed pretty easy and straightforward. And it was incorrect: I introduced a subtle bug (see the 2nd commit).

Both you and @xmo-odoo have proposed some code to register the callback, and all those proposals missed a point. The original callback was for a particular model (bound method self.browse()._finalize_tracking), so there was one callback per model. Your proposals were doing one callback for all models, and this requires adaptation of the callback. I tried both options, and the "one callback per model" felt simpler to me.

Moreover, five lines of boilerplate on two data structures doesn't feel right to me, for something that conceptually is nothing more than a setdefault. Note that I may have been biased in my initial API by the fact that I regularly use defaultdict.

Keep in mind that the cases where we want accumulate parameters are exceptional. Most developers will just be looking for ways to hook one-shot callables, and the API must be ultra simple for those, with zero surprises.

I totally agree with that.

Your proposition seems like it has most of the drawbacks we want to get rid of in the current one:

  • The API looks weird for simple use cases, because the signature of add() leaks all the machinery for "exceptional cases" and you'll need to understand them in the doc even though you most likely don't need it
  • What should "key" be when you only have a one-shot callable?
  • Why would an "add_callback()" method return something that you need to handle?
  • Why would you need to repeatedly call "add_callback()" to obtain that "shared storage"?

Having a free-form storage is less formal but it means:

  • Developers understand all the concepts of the API immediately, there's nothing weird to learn
  • The general case of one-shot callable has a super simple API: add(func)
  • No need to use the storage or even know it exists, unless you somehow need it
  • The lifecycle of the storage is still obvious because it's namespaced
  • This /can/ be used as a generic transactional storage for other reasons, because it's flexible

I have the feeling that the callback implementation is not robust enough. Imagine that you register a callback for some field tracking. When the callback is called, a tracking message is created. Imagine this causes a field depending on the record's messages to be recomputed and stored. Now what if the computed field is tracked as well? You need to register the callback once more, or update the aggregation for another callback. Although this looks a tricky use-case, it is possible and I don't see how to forbid it a priori. So we should handle it correctly.

@xmo-odoo
Copy link
Collaborator

xmo-odoo commented Aug 31, 2020

@xmo-odoo How so? Where do you see chances for error? To me it's much less error-prone, on the contrary, because all the concepts are very simple.

That the concepts are simple doesn't mean using or combining them also is. Both C and Go are example of this, both bill themselves as "simple" and both have footguns all around.

Here consider this original code

    mail_tracking = self.env.cr.precommit.data.setdefault('mail_tracking', {})
    initial_values = mail_tracking.setdefault(self._name, {})
    if not initial_values:
        self.env.cr.precommit.add(self._finalize_tracking)

This is the pattern which would get copy/pasted everywhere, and if initial_values doesn't get filled the callback will be added repeatedly, despite that supposedly being the deduplication pattern.

It's just a particular use case of storing state in the storage. How could you get it wrong, considering you won't even be trying that unless you're exactly looking for a way to store state in some transactional storage? And if you do, whatever you want to do is up to you, you don't have to reproduce this verbatim, it's just a transactional storage, like the HTTP session.

But it's not transactional, because the callback and the associated data are not bound together, yet the original code makes one depend on the other.

Keep in mind that the cases where we want accumulate parameters are exceptional. Most developers will just be looking for ways to hook one-shot callables, and the API must be ultra simple for those, with zero surprises.

My question (/ assumption) is not about accumulating parameter but rather about deduplicating callbacks, and to me hook.add(key, callback) looks way simpler than having to mess with data to know if you should register your callback especially if you don't even need associated data.

Now if deduplicating callbacks easily, safely and by default is not the point, then carry on, but I still believe this should be supported by a reliable method properly encoding the pattern.

  • Why would an "add_callback()" method return something that you need to handle?

It doesn't? In my version add_callback is the renaming of the existing add aka it just enqueues a function.

  • Why would you need to repeatedly call "add_callback()" to obtain that "shared storage"?

You don't? add_callback is not how you get the shared storage, and data would still be available.

Having a free-form storage is less formal but it means:

  • Developers understand all the concepts of the API immediately, there's nothing weird to learn

And then they get the usage wrong when they need to do anything beyond trivial.

  • The general case of one-shot callable has a super simple API: add(func)

And the case of a deduplicated one-shot callable is a complete mess.

  • No need to use the storage or even know it exists, unless you somehow need it

You don't need to use the storage or even know it exists either in the version I propose. The one thing you do need to know is the deduplication key, which if the assumption is that we don't care for deduplication then fair enough?

  • The lifecycle of the storage is still obvious because it's namespaced

Except the storage is not namespaced: all callbacks share the same global dict. Instead the developer may namespace said storage by hand.

@odony
Copy link
Contributor

odony commented Aug 31, 2020

My first attempt at adapting the mail tracking code seemed pretty easy and straightforward. And it was incorrect: I introduced a subtle bug (see the 2nd commit).

Was it the problem with an empty dict being falsy? That seems like a pretty common mistake developers make everywhere, no? Conceptually this can't be much simpler, can it? "Is my state storage already there, or do I initialize it?"

Both you and @xmo-odoo have proposed some code to register the callback, and all those proposals missed a point. The original callback was for a particular model (bound method self.browse()._finalize_tracking), so there was one callback per model. Your proposals were doing one callback for all models, and this requires adaptation of the callback. I tried both options, and the "one callback per model" felt simpler to me.

I actually preferred the simplicity of a single global callback, because it means the callback can do smarter things, like choosing the order in which models are processed, or some batch optimizations. Even if it costs another level of indirection in the data structure and code to access it. But if this version works for you, fine.

Moreover, five lines of boilerplate on two data structures doesn't feel right to me, for something that conceptually is nothing more than a setdefault. Note that I may have been biased in my initial API by the fact that I regularly use defaultdict.

That's a bias I don't share. I prefer 5 lines of very straightforward boilerplate code (that we only need once for this exceptional case), rather than 2 lines of obscure code that require much prior knowledge in order to understand / maintain them.
Developers who are very familiar with a language tend to overuse advanced constructs in order to write clever one-liners that actually hurt readability for everyone else. This is similar, in essence.
For instance you could define Callbacks.data = defaultdict(dict) and save a bit of boilerplate - but that would hurt the readability of the API and raise the level of surprise, wouldn't it?

I have the feeling that the callback implementation is not robust enough. Imagine that you register a callback for some field tracking. When the callback is called, a tracking message is created. Imagine this causes a field depending on the record's messages to be recomputed and stored. Now what if the computed field is tracked as well? You need to register the callback once more, or update the aggregation for another callback. Although this looks a tricky use-case, it is possible and I don't see how to forbid it a priori. So we should handle it correctly.

Basically, running a commit hook could recursively trigger the registration of another hook - with no obvious notion of when or whether that new hook would run, or whether the recursion would end. I don't see how you can prevent that in a generic manner - it depends on the business rules. None of the API variations we discussed would fix that, right?

I think making callbacks magically workaround all possible corner cases is out of scope. We want to have something very simple to understand and reason with, and leave the exceptions and corner cases situations for the developers to handle. If the concepts are simple, they will have no trouble solving their issues with the tools provided by the API. For mail.thread tracking we don't have that problem, and this is already an advanced case, right?

@odony
Copy link
Contributor

odony commented Aug 31, 2020

This is the pattern which would get copy/pasted everywhere, and if initial_values doesn't get filled the callback will be added repeatedly, despite that supposedly being the deduplication pattern.

Sorry but no, that's a common mistake developers can make anywhere when lazily initializing a dict, and nothing specific to this particular API.

Now if deduplicating callbacks easily, safely and by default is not the point, then carry on, but I still believe this should be supported by a reliable method properly encoding the pattern.

Exactly! Deduplicating callbacks is not even a thing we want to address. The API is about registering callbacks, that's it. No notion of "deduplication" should ever appear in the API because that's not a problem we expect developers to face most of the time.

  • Why would an "add_callback()" method return something that you need to handle?

It doesn't? In my version add_callback is the renaming of the existing add aka it just enqueues a function.

Wut? Wasn't your example all about changing the signature to hook.add(key, hookfunc) -> data, with your code sample chaining add().setdefault()??

@xmo-odoo
Copy link
Collaborator

This is the pattern which would get copy/pasted everywhere, and if initial_values doesn't get filled the callback will be added repeatedly, despite that supposedly being the deduplication pattern.

Sorry but no, that's a common mistake developers can make anywhere when lazily initializing a dict, and nothing specific to this particular API.

There is a big difference here, which is that deduplicating callbacks relies on the data dict being used properly.

Now if deduplicating callbacks easily, safely and by default is not the point, then carry on, but I still believe this should be supported by a reliable method properly encoding the pattern.

Exactly! Deduplicating callbacks is not even a thing we want to address. The API is about registering callbacks, that's it. No notion of "deduplication" should ever appear in the API because that's not a problem we expect developers to face most of the time.

But deduplicating callbacks was the entire point of the original change. If we don't want to support that we can just rollback the entire thing, there's nothing to add or change.

  • Why would an "add_callback()" method return something that you need to handle?

It doesn't? In my version add_callback is the renaming of the existing add aka it just enqueues a function.

Wut? Wasn't your example all about changing the signature to hook.add(key, hookfunc) -> data, with your code sample chaining add().setdefault()??

That was the "full-blown" version for replacing the deduplicated-call-with-data of the PR. If you just want to add a callback you can just hook.add(key, hookfunc), you never have to care about what it returns.

@robodoo robodoo added ☐ ci/runbot and removed CI 🤖 Robodoo has seen passing statuses ☑ ci/runbot labels Aug 31, 2020
@robodoo robodoo added ☐ ci/runbot CI 🤖 Robodoo has seen passing statuses and removed CI 🤖 Robodoo has seen passing statuses ☑ ci/runbot labels Sep 1, 2020
@rco-odoo
Copy link
Member Author

rco-odoo commented Sep 2, 2020

By the way, what about deprecating cr.after in the upcoming Odoo 14?

Why even keep it since the current change breaks half of what it does without any warning?

The broken part (post-rollback) is not used at all in our codebase, while the other part (post-commit) is.

@xmo-odoo
Copy link
Collaborator

xmo-odoo commented Sep 2, 2020

The broken part (post-rollback) is not used at all in our codebase, while the other part (post-commit) is.

Sure but we don't know what's used in other codebases, so if we don't care we can just fix the places where we use post-commit and remove the entire thing, doesn't make much difference.

@rco-odoo
Copy link
Member Author

rco-odoo commented Sep 2, 2020

@robodoo r+

@robodoo robodoo added the r+ 👌 label Sep 2, 2020
robodoo pushed a commit that referenced this pull request Sep 2, 2020
Python 3.8 changed the equality rules for bound methods to be based on
the *identity* of the receiver (`__self__`) rather than its *equality*.
This means that in 3.7, methods from different instances will compare
(and hash) equal, thereby landing in the same map "slot", but that isn't
the case in 3.8.

While it's usually not relevant, it's an issue for `GroupCalls` which is
indexed by a function: in 3.7, that being a method from recordsets
comparing equal will deduplicate them, but not anymore in 3.8, leading
to duplicated callbacks (exactly the thing GroupCalls aims to avoid).

Also, the API of `GroupCalls` turned out to be unusual and weird.  The
bug above is fixed by using a plain list for callbacks, thereby avoiding
comparisons between registered functions.  The API is now:

    callbacks.add(func)     # add func to callbacks
    callbacks.run()         # run all callbacks in addition order
    callbacks.clear()       # remove all callbacks

In order to handle aggregated data, the `callbacks` object provides a
dictionary `callbacks.data` that any callback function can freely use.
For the sake of consistency, the `callbacks.data` dict is automatically
cleared upon execution of callbacks.

Discovered by @william-andre

Related to #56583

References:

* https://bugs.python.org/issue1617161
* python/cpython#7848
* https://docs.python.org/3/whatsnew/changelog.html#python-3-8-0-alpha-1
  (no direct link because individual entries are not linkable, look for
  bpo-1617161)

closes #56748

Related: odoo/enterprise#12763
Signed-off-by: Raphael Collet (rco) <rco@openerp.com>
@robodoo robodoo closed this Sep 2, 2020
@robodoo robodoo temporarily deployed to merge September 2, 2020 13:52 Inactive
@william-andre
Copy link
Contributor

No more popcorn for me 😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI 🤖 Robodoo has seen passing statuses
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants