New flush strategies => Metadata on SQL #144747 (Closed)
I'm sorry, @ryv-odoo: you are not allowed to override this status.
@robodoo rebase-ff
Merge method set to rebase and fast-forward.
@robodoo override=ci/style
@ryv-odoo linked pull request(s) odoo/documentation#7194, odoo/enterprise#53380, odoo/upgrade#5469 not ready. Linked PRs are not staged until all of them are ready.
Because we need to move the group field check to _field_to_sql() in the next commit, and because 'mail.thread' overrides _flush_search(), we move _leaf_to_sql() to the BaseModel class, rename it to _condition_to_sql(), and modify its signature to provide a better API. It becomes a nice hook for changing the behavior of domain translation in the SQL WHERE clause.
Rationale
=========
The ORM holds pending row updates as long as it can, because the second update on a row in a transaction is much slower than the first one. We therefore need to flush pending updates that may affect the result of an SQL query. Initially, the consistency mechanism within the ORM was to call _flush_search() before creating the query itself. The purpose of this method was to flush all fields used in the search domain (including the ir.rule domain), the order, and extra fields (used for _read_group()). This solution has some drawbacks:
- Because expression (the class that translates a domain into a WHERE clause) evolved, there are already inconsistencies between the fields used in the WHERE clause and the fields flushed by _flush_search().
- The method _search() returns a lazy query object, and we execute it only when needed, or add it as a subquery. But _flush_search() is not lazy and forces the necessary updates directly, even if the query is never executed (or is executed later, which can lead to inconsistencies).
- Tracking the fields used in an SQL expression makes some APIs heavier than necessary. For example, each _read_group*() hook always needs to return a list of fields along with the SQL expression. Also, if we want to extend _leaf_to_sql() for related fields, we would need to track the fields used in the paths (i.e. add this information to the return value and then change every use of it).

Change
======
The idea is to embed the information about which field was used to create an SQL expression, and to use that information just before executing the query. We add a _metadata attribute to the SQL object that holds the field used to create it. This attribute is set only by _field_to_sql(), which we want to be the only method that generates SQL from a field name. This metadata is flushed just before the query is executed.
Since we always need the current env to flush with the right context/user, we cannot flush automatically in cr.execute(). Instead, we add a method execute_query() on the environment itself. This method is currently used almost only by the ORM, and it only accepts (and asserts) an SQL object as the query. In the long run, this method may become the mandatory way to execute SQL, in order to enforce security (combined with the pylint check) and avoid SQL injection.
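To make the mechanism concrete, here is a minimal sketch of the idea, not Odoo's actual code: every SQL fragment carries the fields it reads in a _metadata attribute, composition propagates that metadata, and an execute_query() helper on the environment flushes exactly those fields just before running the query. All names and signatures here are illustrative.

```python
# Toy sketch (not Odoo's real implementation) of SQL objects carrying
# flush metadata, and an environment method that flushes before executing.

class SQL:
    def __init__(self, code, params=(), metadata=()):
        self.code = code
        self.params = tuple(params)
        self._metadata = tuple(metadata)  # (model, field) pairs to flush

def sql_join(sep, parts):
    """Compose fragments; metadata from every part is propagated."""
    return SQL(
        sep.join(p.code for p in parts),
        [v for p in parts for v in p.params],
        [m for p in parts for m in p._metadata],
    )

def field_to_sql(model, field):
    """The single entry point turning a field name into SQL,
    recording which field the expression depends on."""
    return SQL(f'"{model}"."{field}"', metadata=[(model, field)])

class Environment:
    """Hypothetical environment exposing an execute_query() helper."""
    def __init__(self):
        self.flushed = []

    def _flush_fields(self, metadata):
        # Stand-in for the real flushing of pending row updates.
        for model, field in dict.fromkeys(metadata):
            self.flushed.append((model, field))

    def execute_query(self, query):
        assert isinstance(query, SQL), "only SQL objects are accepted"
        self._flush_fields(query._metadata)  # flush just before executing
        # ... a real cursor would now run query.code with query.params ...
        return query.code

env = Environment()
where = sql_join(" AND ", [
    sql_join(" = ", [field_to_sql("res_partner", "name"), SQL("%s", ["Odoo"])]),
    sql_join(" = ", [field_to_sql("res_partner", "active"), SQL("%s", [True])]),
])
env.execute_query(where)
print(env.flushed)  # [('res_partner', 'name'), ('res_partner', 'active')]
```

Only the fields actually referenced by the composed fragments are flushed, which is the accuracy gain described above.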
_flush() has been simplified because some parts became unnecessary with the new flush mechanism, which is much more accurate.
We no longer need to manually return the fields used in an SQL expression along the flushing path. The metadata is glued to the SQL expression generated by _field_to_sql().
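A hedged before/after sketch of a groupby hook (hypothetical names, not Odoo's real signatures) shows why carrying metadata on the SQL object makes the API lighter than returning a field list alongside the SQL:

```python
# Hypothetical before/after sketch of a _read_group-style hook.

class SQL:
    def __init__(self, code, metadata=()):
        self.code = code
        self._metadata = tuple(metadata)

# Before: every hook had to return the fields to flush alongside the SQL,
# and every caller had to thread that list through.
def read_group_groupby_old(model, fname):
    sql = f'"{model}"."{fname}"'
    fields_to_flush = [(model, fname)]
    return sql, fields_to_flush

# After: the SQL object itself carries the flush information,
# so the hook returns a single value.
def read_group_groupby_new(model, fname):
    return SQL(f'"{model}"."{fname}"', metadata=[(model, fname)])

sql = read_group_groupby_new("sale_order", "partner_id")
print(sql.code)       # "sale_order"."partner_id"
print(sql._metadata)  # (('sale_order', 'partner_id'),)
```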
Because `_search` doesn't flush anymore, some usages need to be changed to flush correctly. A lot of them exist to implement a specific behavior when searching on `analytic_distribution`; we rewrote all of them using search()/_read_group() and by overriding _leaf_to_sql()/_read_group_groupby().
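The overriding approach can be sketched as follows. This is a toy example, not Odoo's actual code: the method names, signatures, and the use of the PostgreSQL jsonb `?` (key-exists) operator for `analytic_distribution` are illustrative assumptions.

```python
# Toy sketch of customizing domain translation for a JSON(B) field such
# as analytic_distribution, instead of hand-building SQL at call sites.
# All names/signatures are illustrative, not Odoo's real API.

class BaseModel:
    def _leaf_to_sql(self, alias, field, operator, value):
        # Default translation of one domain condition.
        return f'"{alias}"."{field}" {operator} %s', [value]

class AccountMoveLine(BaseModel):
    def _leaf_to_sql(self, alias, field, operator, value):
        if field == "analytic_distribution" and operator == "=":
            # Match lines whose JSONB distribution mentions this
            # analytic account id as a key.
            return f'"{alias}"."{field}" ? %s', [str(value)]
        return super()._leaf_to_sql(alias, field, operator, value)

sql, params = AccountMoveLine()._leaf_to_sql(
    "account_move_line", "analytic_distribution", "=", 42)
print(sql)     # "account_move_line"."analytic_distribution" ? %s
print(params)  # ['42']
```

The benefit is that callers can simply use search()/_read_group(); the custom SQL lives in one override, where the new metadata-based flushing applies uniformly.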
@robodoo r+
robodoo pushed five commits that referenced this pull request on Jan 15, 2024, carrying the commit messages above; each is marked "Part-of: #144747", and the last one closes #144747 (Related: odoo/documentation#7194, odoo/upgrade#5469, odoo/enterprise#53380; Signed-off-by: Rémy Voet (ryv) <ryv@odoo.com>).

robodoo pushed a commit to odoo/documentation that referenced this pull request on Jan 15, 2024: closes #7194 (Related: odoo/upgrade#5469, odoo/odoo#144747, odoo/enterprise#53380; Signed-off-by: Rémy Voet (ryv) <ryv@odoo.com>).

vin-odoo pushed the same commit to odoo/documentation on Jan 24, 2024.
New flush strategy
Rationale
The ORM holds the pending row updates as long as it can, because the second update on a row in a transaction is much slower than the first one. Then we need to flush pending updates that may affect the result of an SQL query. Initially, the consistency mechanism within the ORM was to call _flush_search() before creating the query itself. The purpose of this method was to flush all fields used in the search domain (including the ir.rule domain), the order, and extra fields (used for _read_group()). This solution has some drawbacks:
- Because expression (the class that translates a domain into a WHERE clause) evolved, there are already inconsistencies between the fields used in the WHERE clause and the fields flushed by _flush_search().
- The method _search() returns a lazy Query object, and we execute it only when needed, or add it as a subquery. But _flush_search() is not lazy and will flush updates directly even if the query isn't executed in the end (or is executed later, which can lead to inconsistencies).
- Tracking the fields used in SQL expressions makes some APIs heavier than necessary. For example, each _read_group*() hook always needs to return a list of fields along with the SQL expression. Also, if we want to extend _leaf_to_sql() for related fields, we would need to track the fields used in the path (i.e., add this information to the return value and then change every use of it). The problem with related fields is that the extra fields may belong to other models, which isn't covered by the existing API.

Solution
The idea is to integrate the information about which field was used to create an SQL expression and use that information just before executing the query. We add an attribute _metadata to the SQL object that holds the field used to create it. This metadata attribute is only set by the method _field_to_sql(), which we want to be the only method that generates SQL from a field name.

This metadata is flushed just before the query is executed. Since we always need the current environment to flush with the right context/user, we cannot flush automatically from cr.execute(). Instead, we add a method get_select_result (WIP: name needs to be changed) on the Environment object. This method is currently used almost only by the ORM itself, and it only accepts (and asserts) an SQL object as the query. This method may become a mandatory way to execute SQL in order to enforce security (combined with the pylint check) and avoid SQL injection in the long run.

Security note
We move the group security check on fields from _flush_search() to _field_to_sql(). We also removed the security check from _read_group() (because it is now done in _field_to_sql()); see https://github.com/odoo/odoo/pull/146438/files.

Some results
The new way to flush is much more accurate, and generally flushes fewer fields. In an analysis of 1590 search queries, we previously flushed 4520 fields; now only 4079 fields (-9.75 %). Here are some examples of issues and inefficiencies in the previous version:
Example 1: unnecessary flush of the 'active' field when auto_join=True
In the previous version, we flushed:
{'ir.property': ['company_id', 'res_id', 'fields_id'], 'res.company': ['sequence', 'name', 'active']}
Now:
{'ir.property': ['company_id', 'res_id', 'fields_id'], 'res.company': ['sequence', 'name']}
We no longer flush the active field of res.company for no reason.

Example 2: missing flush of 'parent_path' when the 'child_of' operator is used
In the previous version, we flushed:
{'stock.location': ['location_id'], 'stock.quant': ['in_date', 'location_id', 'product_id']}
Now:
{'stock.location': ['parent_path'], 'stock.quant': ['in_date', 'location_id', 'product_id']}
location_id of 'stock.location' is not used in the query, but parent_path is: child_of wasn't taken into account correctly in _flush_search(). (It actually didn't matter much for parent_path, because it is updated directly in the database; but if we ever want to compute it in a stored field, we now can.)

Example 3: too many flushes due to a stored related field
In the previous version, we flushed:
{'stock.location': ['scrap_location'], 'stock.move': ['picking_id', 'scrapped', 'location_dest_id', 'company_id']}
Now:
{'stock.move': ['picking_id', 'scrapped', 'company_id']}
We flushed all the fields inside the related definition (location_dest_id + scrapped), but this is already managed by modified() (which will flag scrapped for recomputation if necessary).

Example 4: related flush is no longer delegated to _flush + a missing flush
Previously flushed fields:
{'mrp.bom': ['sequence', 'product_id', 'active', 'product_tmpl_id'], 'product.product': ['priority', 'default_code', 'name', 'active']}, but actually 'product.template': ['priority', 'name'] was indirectly flushed by odoo/odoo/models.py line 6285 (at 1df6f29).
Now:
{'mrp.bom': ['sequence', 'product_id', 'active', 'product_tmpl_id'], 'product.product': ['product_tmpl_id', 'default_code'], 'product.template': ['priority', 'name']}
product_tmpl_id of 'product.product' wasn't flushed correctly in the previous version.

Alternative solution
Flushing directly in _leaf_to_sql() would make the solution much simpler. But because the flushing operations aren't batched, this may result in extra SQL queries: https://runbot.odoo.com/runbot/batch/1254238/build/55003227 (due to recomputation of stored fields). Also, the second drawback above (the laziness of _search()) is not covered at all.

Other related changes
- Because we need to move the group field check to _field_to_sql(), and because mail.thread overrides _flush_search(), we move the method _leaf_to_sql() to the BaseModel class. It is also a nice hook for changing the behavior of domain translation in the SQL WHERE clause.
- Since _search() doesn't flush anymore, some usages need to be changed to flush correctly. A lot of them implement a specific behavior when searching on analytic_distribution; we rewrote all of them using search() and by overriding _leaf_to_sql(), as in the previous case.
- _flush() has been simplified because some parts became unnecessary with the new flushing mechanism, which is much more accurate.

https://github.com/odoo/enterprise/pull/53380
https://github.com/odoo/upgrade/pull/5469
odoo/documentation#7194
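As an aside on Example 2 above: a minimal sketch of how a child_of condition can be translated through a materialized parent_path column. The schema and helper below are assumptions for illustration, not Odoo's code; parent_path is taken to store the ids from root to the record as a slash-terminated string (e.g. "1/4/"), so "descendant of record 4" becomes a prefix match.

```python
# Illustrative sketch (assumed schema, not Odoo's code) of translating
# a child_of condition with a parent_path column, as in Example 2.

def child_of_to_sql(alias, parent_path_of_ancestor):
    # The WHERE clause only reads parent_path, which is why the new
    # flush strategy flushes parent_path rather than location_id.
    return (f'"{alias}"."parent_path" LIKE %s',
            [parent_path_of_ancestor + "%"])

sql, params = child_of_to_sql("stock_location", "1/4/")
print(sql)     # "stock_location"."parent_path" LIKE %s
print(params)  # ['1/4/%']
```

This illustrates why the field the query actually depends on is parent_path, and why flushing location_id instead (as the old _flush_search() did) was both unnecessary and insufficient.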