Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New flush strategies => Metadata on SQL #144747

Closed
wants to merge 5 commits into from

Conversation

ryv-odoo
Copy link
Contributor

@ryv-odoo ryv-odoo commented Dec 4, 2023

New flush strategy

Rationale

The ORM holds the pending row updates as long as it can, because the second update on a row in a transaction is much slower than the first one. Then we need to flush pending updates that may affect the result of an SQL query. Initially, the consistency mechanism within the ORM was to call _flush_search() before creating the query itself. The purpose of this method was to flush all fields used in the search domain (including the ir.rule domain), the order, and extra fields (used for _read_group()). This current solution has some drawbacks:

  • Because expression (the class that translates a domain into a WHERE clause) evolved, there are already inconsistencies between fields used in the WHERE clause and the fields flushed by _flush_search().
  • The method _search() returns a lazy Query object, and we execute it only when needed or add it as a subquery. But _flush_search() is not lazy and will flush updates directly even if the query isn't executed at the end (or later, which can lead to some inconsistencies).
  • Tracking fields used in SQL expression makes some API heavier than necessary. For example, for each _read_group*() hook, we always need to return a list of fields along with the SQL expression. Also, if we want to extend _leaf_to_sql() for related fields, we will need to track the fields used in the path (i.e., add this information in the return value and then change each use of it). The problem with related fields is that the extra fields may belong to other models, which isn't covered by the existing API.

Solution

The idea is to integrate the information about what field was used to create an SQL expression and use that information just before executing the query. We then add an attribute _metadata to the SQL object that holds the field used to create it. This metadata attribute is only set by method _field_to_sql(), which we want to be the only method to generate SQL from a field name.

These metadata is flushed just before the query is executed. Since we always need the current environment to flush with the right context/user, we cannot flush automatically from cr.execute(). Instead, we add a method get_select_result (WIP: name needs to be changed) on the Environment object. This method is currently almost only used by the ORM itself now, and it only accepts (and asserts) an SQL object as query. This method may become a mandatory way to execute SQL in order to enforce security (combined with the pylint check) and avoid SQL injection in the long run.

Security note

We move the group security check of fields from _flush_search() to _field_to_sql(). Also removed the security check from _read_group() (because it is now done in _field_to_sql()); see https://github.com/odoo/odoo/pull/146438/files.

Some results

The new way to flush is much more accurate, and generally flushes less fields.
In an analysis of 1590 search queries, before we flushed 4520 fields, and now only 4079 fields (-9.75 %).
Also here are some examples to show issues/inefficiency with the previous version:

Example 1: Unnecessary flush of 'active' field when auto_join=True
SELECT "ir_property"."id" 
FROM "ir_property" LEFT JOIN "res_company" AS "ir_property__company_id" ON ("ir_property"."company_id" = "ir_property__company_id"."id") 
WHERE (("ir_property"."res_id" IS NULL AND ("ir_property"."fields_id" = %s)) AND (("ir_property"."company_id" IN %s) OR "ir_property"."company_id" IS NULL)) 
ORDER BY "ir_property__company_id"."sequence"  , "ir_property__company_id"."name"   
LIMIT %s

In the previous version, we flushed:
{'ir.property': ['company_id', 'res_id', 'fields_id'], 'res.company': ['sequence', 'name', 'active']}
Now:
{'ir.property': ['company_id', 'res_id', 'fields_id'], 'res.company': ['sequence', 'name']}.

We don't flush active of res.company field for no reason anymore.

Example 2 : 'parent_path' missing flush when 'child_of' operator is used
SELECT "stock_quant"."id" 
FROM "stock_quant" 
WHERE (("stock_quant"."location_id" IN (SELECT "stock_location"."id" FROM "stock_location" WHERE ("stock_location"."parent_path"::text LIKE %s))) AND ("stock_quant"."product_id" = %s)) 
ORDER BY "stock_quant"."in_date" DESC , "stock_quant"."id" DESC 

In the previous version, we flushed:
{'stock.location': ['location_id'], 'stock.quant': ['in_date', 'location_id', 'product_id']}
Now:
{'stock.location': ['parent_path'], 'stock.quant': ['in_date', 'location_id', 'product_id']}
location_id of 'stock.location' is not used in the query but the parent_path is. child_of wasn't not take in account correctly in _flush_search(). (Actually it didn't matter too much for parent_path, because it is updated directly in DB, right ? But if we want to modify it in a compute store, we now can.)

Example 3: too many flush due to related store field
SELECT "stock_move"."id"
FROM "stock_move"
WHERE (("stock_move"."picking_id" = %s) AND ("stock_move"."scrapped" = TRUE)) AND ("stock_move"."company_id" IN %s)

In the previous version, we flushed:
{'stock.location': ['scrap_location'], 'stock.move': ['picking_id', 'scrapped', 'location_dest_id', 'company_id']}
Now:
{'stock.move': ['picking_id', 'scrapped', 'company_id']}
We flushed all fields inside a related definition (location_dest_id + scrapped), but actually it is already manage by modified (will flag scrapped to recompute if necessary).

Example 4: related flush isn't delegated to _flush anymore + missing flush
SELECT "mrp_bom"."id" 
FROM "mrp_bom" 
    LEFT JOIN "product_product" AS "mrp_bom__product_id" ON ("mrp_bom"."product_id" = "mrp_bom__product_id"."id") 
    LEFT JOIN "product_template" AS "mrp_bom__product_id__product_tmpl_id" ON ("mrp_bom__product_id"."product_tmpl_id" = "mrp_bom__product_id__product_tmpl_id"."id") 
WHERE ((("mrp_bom"."product_id" IN %s) OR ("mrp_bom"."product_id" IS NULL AND ("mrp_bom"."product_tmpl_id" IN %s))) AND ("mrp_bom"."active" = TRUE)) 
ORDER BY "mrp_bom"."sequence"  , "mrp_bom__product_id__product_tmpl_id"."priority" DESC , "mrp_bom__product_id"."default_code"  , "mrp_bom__product_id__product_tmpl_id"."name"->>%s  , "mrp_bom__product_id"."id"  , "mrp_bom"."id"   LIMIT %s

Previous flushed fields:
{'mrp.bom': ['sequence', 'product_id', 'active', 'product_tmpl_id'], 'product.product': ['priority', 'default_code', 'name', 'active']}, but actually 'product.template': ['priority', 'name'] was indirectly flushed by

model_fields[field.related_field.model_name].append(field.related_field)
(this complexity has been removed in this PR).
Now:
{'mrp.bom': ['sequence', 'product_id', 'active', 'product_tmpl_id'], 'product.product': ['product_tmpl_id', 'default_code'], 'product.template': ['priority', 'name']}. product_tmpl_id of 'product.product' wasn't flushed correctly in the previous version.

Alternative solution

  • Flush field directly when we call _leaf_to_sql(), it makes the solution much easier. But because the flushing operations aren't batched, this may result in adding extra SQL queries: https://runbot.odoo.com/runbot/batch/1254238/build/55003227 (due to recompute of store field). Also the second point is not covered at all.

Other related changes

  • Because we moved the group field check to _field_to_sql() and because mail.thread overrides _flush_search(), we need to move method _leaf_to_sql() to class BaseModel. It is also a nice hook to change the behavior of domain translation in the SQL where clause.
  • Because _search() doesn't flush anymore, some usage needs to be changed to flush correctly. A lot of them are there for having a specific behavior when searching on analytic_distribution. We rewrote all of them by using search() and overriding _leaf_to_sql(), like in the previous case.
  • Method _flush() has been simplified because some parts became unnecessary with the new flushing mechanism, which is much more accurate.

https://github.com/odoo/enterprise/pull/53380
https://github.com/odoo/upgrade/pull/5469
odoo/documentation#7194

@robodoo
Copy link
Contributor

robodoo commented Dec 4, 2023

Pull request status dashboard.

@ryv-odoo ryv-odoo force-pushed the master-metadata-sql-ryv branch 2 times, most recently from e249fec to 74340d4 Compare December 4, 2023 11:33
@C3POdoo C3POdoo added the RD research & development, internal work label Dec 4, 2023
@ryv-odoo ryv-odoo force-pushed the master-metadata-sql-ryv branch 10 times, most recently from 1b6025b to f7d8410 Compare December 8, 2023 13:33
@ryv-odoo ryv-odoo force-pushed the master-metadata-sql-ryv branch 8 times, most recently from 62178c1 to b012848 Compare December 18, 2023 08:43
@ryv-odoo ryv-odoo force-pushed the master-metadata-sql-ryv branch 6 times, most recently from 5fdd271 to 9b94112 Compare December 26, 2023 14:36
@ryv-odoo ryv-odoo marked this pull request as ready for review December 26, 2023 14:36
@C3POdoo C3POdoo requested a review from a team December 26, 2023 14:39
@robodoo
Copy link
Contributor

robodoo commented Jan 13, 2024

I'm sorry, @ryv-odoo: you are not allowed to override this status.

@ryv-odoo
Copy link
Contributor Author

@robodoo rebase-ff

@robodoo
Copy link
Contributor

robodoo commented Jan 13, 2024

Merge method set to rebase and fast-forward.

@rco-odoo
Copy link
Member

@robodoo override=ci/style

@robodoo
Copy link
Contributor

robodoo commented Jan 13, 2024

@ryv-odoo linked pull request(s) odoo/documentation#7194, odoo/enterprise#53380, odoo/upgrade#5469 not ready. Linked PRs are not staged until all of them are ready.

Because we need to move the group field check to _field_to_sql() in next
the commit and because 'mail.thread' overrides _flush_search(), we need
to move of _leaf_to_sql() to the BaseModel class, rename as
_condition_to_sql() and modify its signature to have a better API.
It becomes a nice hook to change the behavior of domain translation in
the SQL where clause.
Rationale
=========

The ORM holds the pending row updates as long as it can, because the
second update on a row in a transaction is much slower than the first
one.  Then we need to flush pending updates that may affect the result
of an SQL query.  Initially, the consistency mechanism within the ORM
was to call _flush_search() before creating the query itself. The
purpose of this method was to flush all fields used in the search domain
(including the ir.rule domain), the order, and extra fields (used for
_read_group()).  This current solution has some drawbacks:

- Because expression (the class that translates a domain into a WHERE
  clause) evolved, there are already inconsistencies between fields used
  in the WHERE clause and the fields flushed by _flush_search().
- The method _search() returns a lazy query object, and we execute it
  only when needed or add it as a subquery. But _flush_search() is not
  lazy and will force necessary updates directly even if the query isn't
  executed at the end (or later, which can lead to some inconsistencies).
- Tracking field used in SQL expression makes some API heavier than
  necessary. For example, for each _read_group*() hooks, we always need
  to return a list of fields along with the SQL expression.  Also, if we
  want to extend _leaf_to_sql() for related fields, we will need to
  track fields used in the paths (i.e. add this information in the
  return value and then change each use of it).

Change
======

The idea is to integrate the information about what field was used to
create an SQL expression and use that information just before executing
the query. We then add a _metadata attribute to the SQL object that
holds the field used to create it. This metadata attribute is used
only by _field_to_sql(), which we want to be the only method to generate
SQL from a field name.

These metadata is flushed just before the query is executed.  Since we
always need the current env to flush with the right context/user, we
cannot flush automatically flush in cr.execute().  Instead, we add a
method execute_query() on the environment itself.  This method is
currently almost only used by the ORM itself now and it only accepts
(and asserts) SQL object as query.  This method may become a mandatory
way to execute SQL in order to enforce security (combined with the
pylint check) and avoid SQL injection in the long run.
_flush() has been simplified because some parts became unnecessary
with the new flush mechanism, which is much more accurate.
We don't need to manually returns fields used in a SQL expression with
the way to flush. Metadata is glue to the SQL expression generate by
_field_to_sql().
Because `_search` doesn't flush, the some usage needs to be changed to
flush correctly. A lot of then are there for have specific behavior when
searching on `analytic_distribution`. rewrite all of then with using
search()/_read_group() and by overriding _leaf_to_sql()/
_read_group_groupby().
@ryv-odoo
Copy link
Contributor Author

@robodoo r+

robodoo pushed a commit that referenced this pull request Jan 15, 2024
Because we need to move the group field check to _field_to_sql() in next
the commit and because 'mail.thread' overrides _flush_search(), we need
to move of _leaf_to_sql() to the BaseModel class, rename as
_condition_to_sql() and modify its signature to have a better API.
It becomes a nice hook to change the behavior of domain translation in
the SQL where clause.

Part-of: #144747
robodoo pushed a commit that referenced this pull request Jan 15, 2024
Rationale
=========

The ORM holds the pending row updates as long as it can, because the
second update on a row in a transaction is much slower than the first
one.  Then we need to flush pending updates that may affect the result
of an SQL query.  Initially, the consistency mechanism within the ORM
was to call _flush_search() before creating the query itself. The
purpose of this method was to flush all fields used in the search domain
(including the ir.rule domain), the order, and extra fields (used for
_read_group()).  This current solution has some drawbacks:

- Because expression (the class that translates a domain into a WHERE
  clause) evolved, there are already inconsistencies between fields used
  in the WHERE clause and the fields flushed by _flush_search().
- The method _search() returns a lazy query object, and we execute it
  only when needed or add it as a subquery. But _flush_search() is not
  lazy and will force necessary updates directly even if the query isn't
  executed at the end (or later, which can lead to some inconsistencies).
- Tracking field used in SQL expression makes some API heavier than
  necessary. For example, for each _read_group*() hooks, we always need
  to return a list of fields along with the SQL expression.  Also, if we
  want to extend _leaf_to_sql() for related fields, we will need to
  track fields used in the paths (i.e. add this information in the
  return value and then change each use of it).

Change
======

The idea is to integrate the information about what field was used to
create an SQL expression and use that information just before executing
the query. We then add a _metadata attribute to the SQL object that
holds the field used to create it. This metadata attribute is used
only by _field_to_sql(), which we want to be the only method to generate
SQL from a field name.

These metadata is flushed just before the query is executed.  Since we
always need the current env to flush with the right context/user, we
cannot flush automatically flush in cr.execute().  Instead, we add a
method execute_query() on the environment itself.  This method is
currently almost only used by the ORM itself now and it only accepts
(and asserts) SQL object as query.  This method may become a mandatory
way to execute SQL in order to enforce security (combined with the
pylint check) and avoid SQL injection in the long run.

Part-of: #144747
robodoo pushed a commit that referenced this pull request Jan 15, 2024
_flush() has been simplified because some parts became unnecessary
with the new flush mechanism, which is much more accurate.

Part-of: #144747
robodoo pushed a commit that referenced this pull request Jan 15, 2024
We don't need to manually returns fields used in a SQL expression with
the way to flush. Metadata is glue to the SQL expression generate by
_field_to_sql().

Part-of: #144747
robodoo pushed a commit that referenced this pull request Jan 15, 2024
Because `_search` doesn't flush, the some usage needs to be changed to
flush correctly. A lot of then are there for have specific behavior when
searching on `analytic_distribution`. rewrite all of then with using
search()/_read_group() and by overriding _leaf_to_sql()/
_read_group_groupby().

closes #144747

Related: odoo/documentation#7194
Related: odoo/upgrade#5469
Related: odoo/enterprise#53380
Signed-off-by: Rémy Voet (ryv) <ryv@odoo.com>
robodoo pushed a commit to odoo/documentation that referenced this pull request Jan 15, 2024
closes #7194

Related: odoo/upgrade#5469
Related: odoo/odoo#144747
Related: odoo/enterprise#53380
Signed-off-by: Rémy Voet (ryv) <ryv@odoo.com>
@robodoo robodoo closed this Jan 15, 2024
@robodoo robodoo added the 17.1 label Jan 15, 2024
vin-odoo pushed a commit to odoo/documentation that referenced this pull request Jan 24, 2024
closes #7194

Related: odoo/upgrade#5469
Related: odoo/odoo#144747
Related: odoo/enterprise#53380
Signed-off-by: Rémy Voet (ryv) <ryv@odoo.com>
vin-odoo pushed a commit to odoo/documentation that referenced this pull request Jan 24, 2024
closes #7194

Related: odoo/upgrade#5469
Related: odoo/odoo#144747
Related: odoo/enterprise#53380
Signed-off-by: Rémy Voet (ryv) <ryv@odoo.com>
@fw-bot fw-bot deleted the master-metadata-sql-ryv branch January 29, 2024 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
17.1 RD research & development, internal work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants