[FIX] core: reduce memory use of BaseModel._flush() #162442

Draft
wants to merge 1 commit into base: 17.0 from 17.0-fix_model_flush_mem_use-cawo
Conversation

@cawo-odoo (Contributor) commented Apr 18, 2024

Motivation: MemoryError exceptions when a large number of records on the same model have dirty fields. This often happens during upgrades.

In the current implementation, the cached data is re-arranged in multiple steps using local data structures. The most problematic one is `id_vals[record.id][field.name]`, because it creates a dictionary per dirty record, with a potentially long field name (think studio fields) as key. For thousands of records, this quickly accumulates to tens or even hundreds of MiB of RAM.
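
To make the cost concrete, here is a small, self-contained illustration of that kind of intermediate structure. This is not the actual odoo/models.py code; the field names and values are made up:

```python
# Hypothetical illustration of the intermediate structure described above,
# NOT the actual odoo/models.py code. Field names and values are made up.
import sys

dirty_fields = ["x_studio_delivery_instructions", "x_studio_internal_notes"]
dirty_ids = range(50_000)

# One inner dict per dirty record, each repeating the (potentially long)
# field name as a key.
id_vals = {}
for field_name in dirty_fields:
    for record_id in dirty_ids:
        id_vals.setdefault(record_id, {})[field_name] = "cached value"

# Rough lower bound: the size of the inner dicts alone, ignoring keys and values.
overhead = sum(sys.getsizeof(vals) for vals in id_vals.values())
print(f"~{overhead / 2**20:.1f} MiB just for the per-record dicts")
```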

The idea of this patch is:

  1. Collect all dirty ids for all dirty fields on the model. This does not cost additional memory, since the list of ids per field will be pop()'ed from the cache.
  2. Walk over fields and ids, collecting all fields and values of each id in the same loop and consuming objects from the walked structure, while directly building the updates dictionary, without creating intermediate data structures.

This way, the _flush method only consumes a marginal amount of memory compared to the memory already consumed by the cache.
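
A minimal sketch of that idea, using a plain dict as a stand-in for the ORM cache; the names and shapes here are illustrative only, the real change lives in odoo/models.py:

```python
# Minimal sketch of the idea above, using a plain dict as a stand-in for the
# ORM cache; this is NOT the actual odoo/models.py code or API.
cache = {  # {field_name: {record_id: cached_value}}
    "name": {1: "A", 2: "B", 3: "C"},
    "x_studio_note": {1: "foo", 2: "bar", 3: "baz"},
}

# Step 1: collect all dirty ids per dirty field; the id collections come from
# existing tracking structures, so this costs no additional memory.
dirty_fields = list(cache)
dirty_ids = sorted({rid for field in dirty_fields for rid in cache[field]})

# Step 2: walk ids and fields in one loop, consuming the cached values and
# building the rows for the UPDATE directly, without an intermediate
# {record_id: {field_name: value}} mapping that repeats field names per record.
columns = [cache[field] for field in dirty_fields]
rows = [
    tuple(column.pop(record_id, None) for column in columns) + (record_id,)
    for record_id in dirty_ids
]

# The rows could then feed a single parameterized statement, e.g.:
#   UPDATE my_model SET name = %s, x_studio_note = %s WHERE id = %s
print(rows)
```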

Profiling a flush of 50k records with both versions of the code shows that the reduction in space complexity does not increase the runtime. As one can see, in both versions the vast majority of the time is spent in the SQL operation:
(screenshot: profiling results for both versions of the code)
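
For reference, a rough way to reproduce such a measurement from an Odoo shell. This is a hedged sketch: `env`, the model and the field written are placeholders, and tracemalloc is simply the standard-library tracer, not necessarily the profiler used for the screenshot above:

```python
# Hedged reproduction sketch, meant to be run in an Odoo shell where `env` is
# available; model and field names are placeholders.
import time
import tracemalloc

records = env["res.partner"].search([], limit=50_000)
records.write({"comment": "make these records dirty"})  # values land in the cache

tracemalloc.start()
start = time.perf_counter()
records.flush_model()        # triggers BaseModel._flush() for this model
elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"flush: {elapsed:.2f}s, peak Python allocations during flush: {peak / 2**20:.1f} MiB")
```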

@robodoo (Contributor) commented Apr 18, 2024

@C3POdoo added the RD (research & development, internal work) label on Apr 18, 2024
@cawo-odoo force-pushed the 17.0-fix_model_flush_mem_use-cawo branch 2 times, most recently from 28648c2 to 759bcbb on April 25, 2024 14:44
Review comment on odoo/models.py (outdated, resolved)
@rco-odoo (Member) commented

@cawo-odoo I just pushed a commit with a proposed simplification.
It still fixes the main memory issue, but without all the detailed little optimizations.
I believe all those low-level optimizations are overall useless, and only make the algorithm harder to understand.
Can you please give it a try?

Updated commit message:

Motivation: MemoryError exceptions when a large number of records on the same
model have dirty fields. This often happens during upgrades.

In the current implementation, the cached data is re-arranged in multiple steps
using local data structures. The most problematic is
`id_vals[record.id][field.name]`, because it creates a dictionary with a
potentially long field name (think studio fields) as key for each dirty record.
For thousands of records, this quickly accumulates to 10s or even 100s of MiB
in RAM.

The idea of this patch is:
1. collect all dirty ids for all dirty fields on the model. This does not cost
   additional memory, since the OrderedSet of ids per field will be pop()'ed
   from the cache.
2. Walk over fields and ids collecting all fields and values of each id in the
   same loop, directly building the "updates", without creating the
   intermediate data structure.
This way, the _flush method only consumes a small amount of memory compared to
the memory already consumed by the cache.

Profiling a flush of 50k records with both versions of the code has shown that
we can save this memory without increasing the runtime.

Co-authored-by: Raphael Collet <rco@odoo.com>
@cawo-odoo force-pushed the 17.0-fix_model_flush_mem_use-cawo branch from 588e726 to c415b25 on April 29, 2024 09:30