New flush strategies => Metadata on SQL #144747

ryv-odoo · 2023-12-04T11:12:56Z

New flush strategy

Rationale

The ORM holds the pending row updates as long as it can, because the second update on a row in a transaction is much slower than the first one. Then we need to flush pending updates that may affect the result of an SQL query. Initially, the consistency mechanism within the ORM was to call _flush_search() before creating the query itself. The purpose of this method was to flush all fields used in the search domain (including the ir.rule domain), the order, and extra fields (used for _read_group()). This current solution has some drawbacks:

Because expression (the class that translates a domain into a WHERE clause) evolved, there are already inconsistencies between fields used in the WHERE clause and the fields flushed by _flush_search().
The method _search() returns a lazy Query object, and we execute it only when needed or add it as a subquery. But _flush_search() is not lazy and will flush updates directly even if the query isn't executed at the end (or later, which can lead to some inconsistencies).
Tracking fields used in SQL expression makes some API heavier than necessary. For example, for each _read_group*() hook, we always need to return a list of fields along with the SQL expression. Also, if we want to extend _leaf_to_sql() for related fields, we will need to track the fields used in the path (i.e., add this information in the return value and then change each use of it). The problem with related fields is that the extra fields may belong to other models, which isn't covered by the existing API.

Solution

The idea is to integrate the information about what field was used to create an SQL expression and use that information just before executing the query. We then add an attribute _metadata to the SQL object that holds the field used to create it. This metadata attribute is only set by method _field_to_sql(), which we want to be the only method to generate SQL from a field name.

These metadata is flushed just before the query is executed. Since we always need the current environment to flush with the right context/user, we cannot flush automatically from cr.execute(). Instead, we add a method get_select_result (WIP: name needs to be changed) on the Environment object. This method is currently almost only used by the ORM itself now, and it only accepts (and asserts) an SQL object as query. This method may become a mandatory way to execute SQL in order to enforce security (combined with the pylint check) and avoid SQL injection in the long run.

Security note

We move the group security check of fields from _flush_search() to _field_to_sql(). Also removed the security check from _read_group() (because it is now done in _field_to_sql()); see https://github.com/odoo/odoo/pull/146438/files.

Some results

The new way to flush is much more accurate, and generally flushes less fields.
In an analysis of 1590 search queries, before we flushed 4520 fields, and now only 4079 fields (-9.75 %).
Also here are some examples to show issues/inefficiency with the previous version:

Example 1: Unnecessary flush of 'active' field when auto_join=True

SELECT "ir_property"."id" 
FROM "ir_property" LEFT JOIN "res_company" AS "ir_property__company_id" ON ("ir_property"."company_id" = "ir_property__company_id"."id") 
WHERE (("ir_property"."res_id" IS NULL AND ("ir_property"."fields_id" = %s)) AND (("ir_property"."company_id" IN %s) OR "ir_property"."company_id" IS NULL)) 
ORDER BY "ir_property__company_id"."sequence"  , "ir_property__company_id"."name"   
LIMIT %s

In the previous version, we flushed:
{'ir.property': ['company_id', 'res_id', 'fields_id'], 'res.company': ['sequence', 'name', 'active']}
Now:
{'ir.property': ['company_id', 'res_id', 'fields_id'], 'res.company': ['sequence', 'name']}.

We don't flush active of res.company field for no reason anymore.

Example 2 : 'parent_path' missing flush when 'child_of' operator is used

SELECT "stock_quant"."id" 
FROM "stock_quant" 
WHERE (("stock_quant"."location_id" IN (SELECT "stock_location"."id" FROM "stock_location" WHERE ("stock_location"."parent_path"::text LIKE %s))) AND ("stock_quant"."product_id" = %s)) 
ORDER BY "stock_quant"."in_date" DESC , "stock_quant"."id" DESC

In the previous version, we flushed:
{'stock.location': ['location_id'], 'stock.quant': ['in_date', 'location_id', 'product_id']}
Now:
{'stock.location': ['parent_path'], 'stock.quant': ['in_date', 'location_id', 'product_id']}
location_id of 'stock.location' is not used in the query but the parent_path is. child_of wasn't not take in account correctly in _flush_search(). (Actually it didn't matter too much for parent_path, because it is updated directly in DB, right ? But if we want to modify it in a compute store, we now can.)

Example 3: too many flush due to related store field

SELECT "stock_move"."id"
FROM "stock_move"
WHERE (("stock_move"."picking_id" = %s) AND ("stock_move"."scrapped" = TRUE)) AND ("stock_move"."company_id" IN %s)

In the previous version, we flushed:
{'stock.location': ['scrap_location'], 'stock.move': ['picking_id', 'scrapped', 'location_dest_id', 'company_id']}
Now:
{'stock.move': ['picking_id', 'scrapped', 'company_id']}
We flushed all fields inside a related definition (location_dest_id + scrapped), but actually it is already manage by modified (will flag scrapped to recompute if necessary).

Example 4: related flush isn't delegated to `_flush` anymore + missing flush

SELECT "mrp_bom"."id" 
FROM "mrp_bom" 
    LEFT JOIN "product_product" AS "mrp_bom__product_id" ON ("mrp_bom"."product_id" = "mrp_bom__product_id"."id") 
    LEFT JOIN "product_template" AS "mrp_bom__product_id__product_tmpl_id" ON ("mrp_bom__product_id"."product_tmpl_id" = "mrp_bom__product_id__product_tmpl_id"."id") 
WHERE ((("mrp_bom"."product_id" IN %s) OR ("mrp_bom"."product_id" IS NULL AND ("mrp_bom"."product_tmpl_id" IN %s))) AND ("mrp_bom"."active" = TRUE)) 
ORDER BY "mrp_bom"."sequence"  , "mrp_bom__product_id__product_tmpl_id"."priority" DESC , "mrp_bom__product_id"."default_code"  , "mrp_bom__product_id__product_tmpl_id"."name"->>%s  , "mrp_bom__product_id"."id"  , "mrp_bom"."id"   LIMIT %s

Previous flushed fields:
{'mrp.bom': ['sequence', 'product_id', 'active', 'product_tmpl_id'], 'product.product': ['priority', 'default_code', 'name', 'active']}, but actually 'product.template': ['priority', 'name'] was indirectly flushed by

odoo/odoo/models.py

Line 6285 in 1df6f29

model_fields[field.related_field.model_name].append(field.related_field)

(this complexity has been removed in this PR).
Now:

{'mrp.bom': ['sequence', 'product_id', 'active', 'product_tmpl_id'],  'product.product': ['product_tmpl_id', 'default_code'], 'product.template': ['priority', 'name']}

. product_tmpl_id of 'product.product' wasn't flushed correctly in the previous version.

Alternative solution

Flush field directly when we call _leaf_to_sql(), it makes the solution much easier. But because the flushing operations aren't batched, this may result in adding extra SQL queries: https://runbot.odoo.com/runbot/batch/1254238/build/55003227 (due to recompute of store field). Also the second point is not covered at all.

Other related changes

Because we moved the group field check to _field_to_sql() and because mail.thread overrides _flush_search(), we need to move method _leaf_to_sql() to class BaseModel. It is also a nice hook to change the behavior of domain translation in the SQL where clause.
Because _search() doesn't flush anymore, some usage needs to be changed to flush correctly. A lot of them are there for having a specific behavior when searching on analytic_distribution. We rewrote all of them by using search() and overriding _leaf_to_sql(), like in the previous case.
Method _flush() has been simplified because some parts became unnecessary with the new flushing mechanism, which is much more accurate.

https://github.com/odoo/enterprise/pull/53380
https://github.com/odoo/upgrade/pull/5469
odoo/documentation#7194

robodoo · 2023-12-04T11:13:20Z

.

robodoo · 2024-01-13T07:59:21Z

I'm sorry, @ryv-odoo: you are not allowed to override this status.

ryv-odoo · 2024-01-13T08:01:08Z

@robodoo rebase-ff

robodoo · 2024-01-13T08:01:41Z

Merge method set to rebase and fast-forward.

rco-odoo · 2024-01-13T09:13:12Z

@robodoo override=ci/style

robodoo · 2024-01-13T10:00:36Z

@ryv-odoo linked pull request(s) odoo/documentation#7194, odoo/enterprise#53380, odoo/upgrade#5469 not ready. Linked PRs are not staged until all of them are ready.

Because we need to move the group field check to _field_to_sql() in next the commit and because 'mail.thread' overrides _flush_search(), we need to move of _leaf_to_sql() to the BaseModel class, rename as _condition_to_sql() and modify its signature to have a better API. It becomes a nice hook to change the behavior of domain translation in the SQL where clause.

Rationale ========= The ORM holds the pending row updates as long as it can, because the second update on a row in a transaction is much slower than the first one. Then we need to flush pending updates that may affect the result of an SQL query. Initially, the consistency mechanism within the ORM was to call _flush_search() before creating the query itself. The purpose of this method was to flush all fields used in the search domain (including the ir.rule domain), the order, and extra fields (used for _read_group()). This current solution has some drawbacks: - Because expression (the class that translates a domain into a WHERE clause) evolved, there are already inconsistencies between fields used in the WHERE clause and the fields flushed by _flush_search(). - The method _search() returns a lazy query object, and we execute it only when needed or add it as a subquery. But _flush_search() is not lazy and will force necessary updates directly even if the query isn't executed at the end (or later, which can lead to some inconsistencies). - Tracking field used in SQL expression makes some API heavier than necessary. For example, for each _read_group*() hooks, we always need to return a list of fields along with the SQL expression. Also, if we want to extend _leaf_to_sql() for related fields, we will need to track fields used in the paths (i.e. add this information in the return value and then change each use of it). Change ====== The idea is to integrate the information about what field was used to create an SQL expression and use that information just before executing the query. We then add a _metadata attribute to the SQL object that holds the field used to create it. This metadata attribute is used only by _field_to_sql(), which we want to be the only method to generate SQL from a field name. These metadata is flushed just before the query is executed. Since we always need the current env to flush with the right context/user, we cannot flush automatically flush in cr.execute(). Instead, we add a method execute_query() on the environment itself. This method is currently almost only used by the ORM itself now and it only accepts (and asserts) SQL object as query. This method may become a mandatory way to execute SQL in order to enforce security (combined with the pylint check) and avoid SQL injection in the long run.

_flush() has been simplified because some parts became unnecessary with the new flush mechanism, which is much more accurate.

We don't need to manually returns fields used in a SQL expression with the way to flush. Metadata is glue to the SQL expression generate by _field_to_sql().

Because `_search` doesn't flush, the some usage needs to be changed to flush correctly. A lot of then are there for have specific behavior when searching on `analytic_distribution`. rewrite all of then with using search()/_read_group() and by overriding _leaf_to_sql()/ _read_group_groupby().

ryv-odoo · 2024-01-15T11:18:39Z

@robodoo r+

Because we need to move the group field check to _field_to_sql() in next the commit and because 'mail.thread' overrides _flush_search(), we need to move of _leaf_to_sql() to the BaseModel class, rename as _condition_to_sql() and modify its signature to have a better API. It becomes a nice hook to change the behavior of domain translation in the SQL where clause. Part-of: #144747

Rationale ========= The ORM holds the pending row updates as long as it can, because the second update on a row in a transaction is much slower than the first one. Then we need to flush pending updates that may affect the result of an SQL query. Initially, the consistency mechanism within the ORM was to call _flush_search() before creating the query itself. The purpose of this method was to flush all fields used in the search domain (including the ir.rule domain), the order, and extra fields (used for _read_group()). This current solution has some drawbacks: - Because expression (the class that translates a domain into a WHERE clause) evolved, there are already inconsistencies between fields used in the WHERE clause and the fields flushed by _flush_search(). - The method _search() returns a lazy query object, and we execute it only when needed or add it as a subquery. But _flush_search() is not lazy and will force necessary updates directly even if the query isn't executed at the end (or later, which can lead to some inconsistencies). - Tracking field used in SQL expression makes some API heavier than necessary. For example, for each _read_group*() hooks, we always need to return a list of fields along with the SQL expression. Also, if we want to extend _leaf_to_sql() for related fields, we will need to track fields used in the paths (i.e. add this information in the return value and then change each use of it). Change ====== The idea is to integrate the information about what field was used to create an SQL expression and use that information just before executing the query. We then add a _metadata attribute to the SQL object that holds the field used to create it. This metadata attribute is used only by _field_to_sql(), which we want to be the only method to generate SQL from a field name. These metadata is flushed just before the query is executed. Since we always need the current env to flush with the right context/user, we cannot flush automatically flush in cr.execute(). Instead, we add a method execute_query() on the environment itself. This method is currently almost only used by the ORM itself now and it only accepts (and asserts) SQL object as query. This method may become a mandatory way to execute SQL in order to enforce security (combined with the pylint check) and avoid SQL injection in the long run. Part-of: #144747

_flush() has been simplified because some parts became unnecessary with the new flush mechanism, which is much more accurate. Part-of: #144747

We don't need to manually returns fields used in a SQL expression with the way to flush. Metadata is glue to the SQL expression generate by _field_to_sql(). Part-of: #144747

Because `_search` doesn't flush, the some usage needs to be changed to flush correctly. A lot of then are there for have specific behavior when searching on `analytic_distribution`. rewrite all of then with using search()/_read_group() and by overriding _leaf_to_sql()/ _read_group_groupby(). closes #144747 Related: odoo/documentation#7194 Related: odoo/upgrade#5469 Related: odoo/enterprise#53380 Signed-off-by: Rémy Voet (ryv) <ryv@odoo.com>

closes #7194 Related: odoo/upgrade#5469 Related: odoo/odoo#144747 Related: odoo/enterprise#53380 Signed-off-by: Rémy Voet (ryv) <ryv@odoo.com>

ryv-odoo force-pushed the master-metadata-sql-ryv branch 2 times, most recently from e249fec to 74340d4 Compare December 4, 2023 11:33

C3POdoo added the RD research & development, internal work label Dec 4, 2023

ryv-odoo force-pushed the master-metadata-sql-ryv branch 10 times, most recently from 1b6025b to f7d8410 Compare December 8, 2023 13:33

ryv-odoo force-pushed the master-metadata-sql-ryv branch 8 times, most recently from 62178c1 to b012848 Compare December 18, 2023 08:43

ryv-odoo force-pushed the master-metadata-sql-ryv branch 6 times, most recently from 5fdd271 to 9b94112 Compare December 26, 2023 14:36

ryv-odoo marked this pull request as ready for review December 26, 2023 14:36

C3POdoo requested a review from a team December 26, 2023 14:39

ryv-odoo force-pushed the master-metadata-sql-ryv branch from c6efbb5 to a855f21 Compare January 15, 2024 08:53

ryv-odoo added 5 commits January 15, 2024 10:53

[REF] core: simplify BaseModel._flush()

ddfc2aa

_flush() has been simplified because some parts became unnecessary with the new flush mechanism, which is much more accurate.

[IMP] core: adapt _read_group*() methods with SQL._metadata

80e6542

We don't need to manually returns fields used in a SQL expression with the way to flush. Metadata is glue to the SQL expression generate by _field_to_sql().

ryv-odoo force-pushed the master-metadata-sql-ryv branch from a855f21 to 7703da4 Compare January 15, 2024 09:53

robodoo pushed a commit that referenced this pull request Jan 15, 2024

[REF] core: simplify BaseModel._flush()

becb45d

_flush() has been simplified because some parts became unnecessary with the new flush mechanism, which is much more accurate. Part-of: #144747

robodoo temporarily deployed to merge January 15, 2024 14:35 Inactive

robodoo closed this Jan 15, 2024

robodoo added the 17.1 label Jan 15, 2024

fw-bot deleted the master-metadata-sql-ryv branch January 29, 2024 14:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New flush strategies => Metadata on SQL #144747

New flush strategies => Metadata on SQL #144747

ryv-odoo commented Dec 4, 2023 •

edited by rco-odoo

Loading

robodoo commented Dec 4, 2023 •

edited

Loading

robodoo commented Jan 13, 2024

ryv-odoo commented Jan 13, 2024

robodoo commented Jan 13, 2024

rco-odoo commented Jan 13, 2024

robodoo commented Jan 13, 2024

ryv-odoo commented Jan 15, 2024

New flush strategies => Metadata on SQL #144747

New flush strategies => Metadata on SQL #144747

Conversation

ryv-odoo commented Dec 4, 2023 • edited by rco-odoo Loading

New flush strategy

Rationale

Solution

Security note

Some results

Example 1: Unnecessary flush of 'active' field when auto_join=True

Example 2 : 'parent_path' missing flush when 'child_of' operator is used

Example 3: too many flush due to related store field

Example 4: related flush isn't delegated to _flush anymore + missing flush

Alternative solution

Other related changes

robodoo commented Dec 4, 2023 • edited Loading

robodoo commented Jan 13, 2024

ryv-odoo commented Jan 13, 2024

robodoo commented Jan 13, 2024

rco-odoo commented Jan 13, 2024

robodoo commented Jan 13, 2024

ryv-odoo commented Jan 15, 2024

ryv-odoo commented Dec 4, 2023 •

edited by rco-odoo

Loading

Example 4: related flush isn't delegated to `_flush` anymore + missing flush

robodoo commented Dec 4, 2023 •

edited

Loading