[FIX] orm: in recompute_fields, avoid memory error from postgresql #322
|
Testing this now, we'll know in ~6h |
|
Tests with the currently affected DB look good (on PG17). The last fixup addresses PG12 to PG15, which require an alias for aggregates on a subquery. |
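The alias requirement mentioned here can be illustrated with a small sketch. PostgreSQL releases before 16 reject a subquery in `FROM` that has no alias, so a wrapper that counts the rows of an arbitrary query should always name the subquery. The helper name `count_query`, the alias `_sub`, and the sample table are hypothetical, and `sqlite3` stands in for PostgreSQL only so the snippet runs without a server; it is not the code from this PR.

```python
import sqlite3


def count_query(query):
    """Wrap `query` so its rows can be counted; the subquery is always aliased."""
    # The explicit "AS _sub" is what PostgreSQL versions before 16 require.
    return "SELECT count(*) FROM ({}) AS _sub".format(query)


# Stand-in database (assumption: sqlite3 replaces PostgreSQL for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO sample (id) VALUES (?)", [(i,) for i in range(1, 6)])

wrapped = count_query("SELECT id FROM sample WHERE id > 2")
total = conn.execute(wrapped).fetchall()[0][0]
```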
7c490a9 to
ee6740b
Compare
|
Pushed a fixup to take this case into account: https://runbot.odoo.com/runbot/build/89536248 |
3e2c407 to
0eafd6a
Compare
71d2b20 to
fbe98ac
Compare
7ad9bac to
0be7109
Compare
|
upgradeci retry with always only account hr_recruitment_extract hr_expense sale_stock sale_subscription crm l10n_it_edi sale purchase mrp |
342ad22 to
3a35af4
Compare
src/util/orm.py
Outdated
| yield id_ | ||
| def get_ids(): | ||
| with named_cursor(cr, itersize=2**20) as ncr: | ||
| ncr.execute("SELECT * FROM _upgrade_rf") |
Make it fail if the query is wrong. Avoid id-injection to the ORM.
| ncr.execute("SELECT * FROM _upgrade_rf") | |
| ncr.execute("SELECT id::int FROM _upgrade_rf") |
I wouldn't cast it to int. On some (really) huge tables, it's not excluded that the id had to be promoted to bigint.
This was optional. But note that we could get `select name from table` as the query. Whether this is an issue for the ORM depends on the running version... The later error could be harder to understand than a type cast error.
If we really want to do that, I would have it run a normal `CREATE TABLE` first. There we can specify the column type and create the PK all in one go.
We don't need to :) I just wanted a quick/simple way to avoid the issue. We could cast to bigint as well, but it may have some perf impact.
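The "create the table first" idea from this thread can be sketched as follows: declaring the id column with a type up front makes a wrong query (for example one returning names) fail at insert time rather than later inside the ORM. This is an illustration under assumptions, not the code that was merged. The function name `fill_id_table` is hypothetical, `sqlite3` stands in for PostgreSQL so the snippet is runnable, and in PostgreSQL the column would presumably be declared `bigint` to cover the promoted-id case raised above.

```python
import sqlite3


def fill_id_table(conn, query):
    """Create the id table with an explicit type and PK, then fill it from `query`."""
    # Assumption: in PostgreSQL this would be a bigint column, not INTEGER.
    conn.execute("CREATE TEMP TABLE _upgrade_rf (id INTEGER PRIMARY KEY)")
    try:
        conn.execute("INSERT INTO _upgrade_rf (id) " + query)
    except sqlite3.Error:
        # A query returning non-integer values fails here, close to its cause.
        conn.execute("DROP TABLE _upgrade_rf")
        raise
```

With this shape, a query such as `SELECT name FROM some_table` raises a database error as soon as the table is filled, which is the early failure the review comment was after.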
c19f6e0 to
16b6057
Compare
|
Last fixup is to fix: |
59d723c to
27827d3
Compare
🎉 Test on big DB is still running, results should come in before tomorrow. EDIT: bundle tested successfully on big DB. @KangOl |
27827d3 to
c9b3b39
Compare
|
Pushed a small simplification that came to mind while applying the same idea to iter_browse |
|
I think your last patch is leaving behind the table _upgrade_rf when count is zero. |
Oh, well. I forgot (to think - thanks for doing it for me) about the early return ... too bad then. Changing it back. |
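The leftover-table problem discussed here is a general one with early returns. A minimal sketch of one way to guard against it is a `try`/`finally` around everything that runs after the table is created. The function name `recompute_ids` is hypothetical, the real `recompute_fields` code is not shown in this conversation, and `sqlite3` is used as a stand-in for PostgreSQL.

```python
import sqlite3


def recompute_ids(conn, query):
    """Return the ids selected by `query`, dropping the scratch table on every path."""
    conn.execute("CREATE TEMP TABLE _upgrade_rf AS " + query)
    try:
        # fetchall() exhausts the cursor so nothing is still reading the table.
        count = conn.execute("SELECT count(*) FROM _upgrade_rf").fetchall()[0][0]
        if not count:
            return []  # early return: the finally clause still drops the table
        return [row[0] for row in conn.execute("SELECT * FROM _upgrade_rf")]
    finally:
        conn.execute("DROP TABLE _upgrade_rf")
```

Calling the function twice in a row only works if the first call cleaned up after itself, which makes the zero-count path easy to check.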
c9b3b39 to
27827d3
Compare
improves 13adede

Queries run through client-side cursors make PostgreSQL materialize the whole result immediately (which is why `cr.rowcount` is always available right after `execute` in this case). With server-side cursors (named cursors), on the other hand, tuples are materialized only when they are fetched. This is why running the `query` for ids through the client-side cursor, just to be able to access `cr.rowcount`, can cause an out-of-memory exception from PostgreSQL.

We fix this by wrapping the query in a `CREATE TABLE AS` statement that inserts the returned ids into a temporary table. We then use a named_cursor to fetch ids from this table in chunks, server-side.

Another approach would have been to wrap the query in a `SELECT count(*)` query and run it once to get the `count`. The approach using `CREATE TABLE AS` was chosen over that solution to support queries that include DML statements (e.g. `UPDATE ... RETURNING`) that affect the results of the compute, as it allows us to run the query on the main (client) cursor while still using a named_cursor to fetch the ids memory-efficiently.
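The pattern described in the commit message (run the query once into a scratch table, read the count from the table, then stream the ids in chunks) can be sketched roughly as below. This is not the merged implementation: `sqlite3` and `fetchmany` stand in for PostgreSQL and the `named_cursor` helper so that the example runs without a server, and the name `ids_in_chunks` and its `chunk_size` parameter are hypothetical.

```python
import sqlite3


def ids_in_chunks(conn, query, chunk_size=2):
    """Materialize the ids from `query` once, then stream them chunk by chunk.

    Returns (count, iterator over lists of ids).
    """
    # The query runs exactly once, on the main connection.
    conn.execute("CREATE TEMP TABLE _upgrade_rf AS " + query)
    count = conn.execute("SELECT count(*) FROM _upgrade_rf").fetchall()[0][0]
    if not count:
        conn.execute("DROP TABLE _upgrade_rf")
        return 0, iter(())

    def chunks():
        reader = conn.cursor()  # stand-in for a server-side (named) cursor
        try:
            reader.execute("SELECT * FROM _upgrade_rf")
            while True:
                rows = reader.fetchmany(chunk_size)
                if not rows:
                    break
                yield [row[0] for row in rows]
        finally:
            reader.close()
            conn.execute("DROP TABLE _upgrade_rf")

    return count, chunks()
```

The count is known before any id is fetched, which is the role `cr.rowcount` played previously, while only one chunk of ids is held in Python at a time.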
27827d3 to
d66aa37
Compare
|
Updated the commit message. From my side this is ready, and it is still the same as what Alvaro already approved. |
KangOl
left a comment
@robodoo r+
