Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] base, base_address_extended: optimisation of partners import #31876

Closed

Conversation

Projects
None yet
7 participants
@xmo-odoo
Copy link
Collaborator

commented Mar 15, 2019

If bulk-importing clients with a parent_id set, there's a ton of extra /
post-processing work which is not batched and "de-optimises" the
batching added to import.

High-level comparison of performing a "Test Import" on a synthetic test
file of 10000 partners (all with a parent_id set to a pre-existing
record) went from ~30mn to ~8mn on my machine.

Fixes #31549

@robodoo robodoo added the CI 🤖 label Mar 15, 2019

@C3POdoo C3POdoo added the RD label Mar 15, 2019

@xmo-odoo xmo-odoo force-pushed the odoo-dev:12.0-partners-import-perf-parent-id-xmo branch Mar 15, 2019

@robodoo robodoo removed the CI 🤖 label Mar 15, 2019

@xmo-odoo xmo-odoo changed the title WIP optimisation of partners import (if partner_id) Optimisation of partners import (if parent_id) Mar 15, 2019

@xmo-odoo xmo-odoo force-pushed the odoo-dev:12.0-partners-import-perf-parent-id-xmo branch to b6995a2 Mar 15, 2019

@xmo-odoo

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 15, 2019

@KangOl the with recursive is not complex at all, but would you mind checking it anyway in case I've missed something?

@KangOl

This comment has been minimized.

Copy link
Contributor

commented Mar 15, 2019

The query looks correct.

@xmo-odoo xmo-odoo force-pushed the odoo-dev:12.0-partners-import-perf-parent-id-xmo branch 2 times, most recently from 360de09 to f9705d4 Mar 19, 2019

@robodoo robodoo added the CI 🤖 label Mar 19, 2019

@xmo-odoo xmo-odoo force-pushed the odoo-dev:12.0-partners-import-perf-parent-id-xmo branch from f9705d4 to 50cba22 Mar 19, 2019

@robodoo robodoo added CI 🤖 and removed CI 🤖 labels Mar 19, 2019

@xmo-odoo xmo-odoo force-pushed the odoo-dev:12.0-partners-import-perf-parent-id-xmo branch from 50cba22 to 79bfc35 Mar 22, 2019

@robodoo robodoo removed the CI 🤖 label Mar 22, 2019

FROM res_partner p, cpid
WHERE NOT is_company and p.parent_id = cpid.id
)
SELECT * FROM cpid WHERE id =ANY(%s)

This comment has been minimized.

Copy link
@odony

odony Mar 22, 2019

Contributor

On a sample database with 1.5M partner records, this query takes 7 full seconds to execute even if self.ids is a singleton, because it works top-down -> Explain plan.
Perhaps this can be avoided with a bottom-up recursion instead?

This comment has been minimized.

Copy link
@moylop260

This comment has been minimized.

Copy link
@KangOl

KangOl Mar 25, 2019

Contributor

@moylop260 nope, this bug has been resolved in PostgreSQL 9.3 (see update at the end of the article).

We can accelerate the query by only computing value on the tree of self.ids

WITH RECURSIVE tree(id,parent_id,is_company) AS (
    SELECT id, parent_id, is_company
      FROM res_partner
     WHERE id = ANY(%s)
     UNION
    SELECT p.id, p.parent_id, p.is_company
      FROM res_partner p, tree t
     WHERE p.id = t.parent_id
),
cpid(id, commercial_partner_id) AS (
    SELECT id, id
      FROM tree
     WHERE is_company
        OR parent_id IS NULL
     UNION
    SELECT p.id, cpid.commercial_partner_id
     FROM tree p, cpid
    WHERE NOT is_company
      AND p.parent_id = cpid.id
)
SELECT * FROM cpid WHERE id = ANY(%s);

This comment has been minimized.

Copy link
@xmo-odoo

xmo-odoo Apr 1, 2019

Author Collaborator

@odony can you check your test case with @KangOl's query? (it doesn't seem to have a significant negative impact on the test case file so works for me).

This comment has been minimized.

Copy link
@odony

odony Apr 1, 2019

Contributor

@xmo-odoo the bottom-up query from @KangOl works much better indeed, but it still needs to be slightly modified to consider cases where is_company is NULL (as was the case with the previous implementation, BTW).
This NULL case should be included in the test cases too, if they did not catch it.

Here's a proposition of modified version with a single CTE. It covers the NULL is_company case and seems to be marginally faster (12ms instead of 67ms on 500 IDS in my 1.5M test database).
For a single ID both versions run in less than 0.5 ms.

WITH RECURSIVE cpid(id, parent_id, commercial_partner_id, final) AS (
    SELECT id, parent_id, id,       
           ((is_company IS NOT NULL AND is_company) OR
            parent_id IS NULL) as final
      FROM res_partner                                         
     WHERE id = ANY(%s)
  UNION                                   
    SELECT cpid.id, p.parent_id, p.id,
           ((is_company IS NOT NULL AND is_company) OR
            p.parent_id IS NULL) as final
      FROM res_partner p JOIN cpid ON (cpid.parent_id = p.id)
     WHERE NOT cpid.final
)                
SELECT cpid.id, cpid.commercial_partner_id
  FROM cpid
 WHERE final AND id = ANY(%s);

This comment has been minimized.

Copy link
@xmo-odoo

xmo-odoo Apr 2, 2019

Author Collaborator

Looking at it more, and after writing some tests, the special handling for is_company being null seems completely unnecessary in that version: since we're checking for is_company (and then final) being true, them being null or false makes no difference. Was probably an issue with @KangOl's version as it checks specifically for not is_company.

(though going with "explicit is better than implicit", I guess leaving it in doesn't hurt and is clearer)

This comment has been minimized.

Copy link
@odony

odony Apr 2, 2019

Contributor

@odony how about coalesce(is_company, false) instead of is_company is not null and is_company?

Sure, equivalent and even shorter :)

WITH RECURSIVE cpid(id, parent_id, commercial_partner_id, final) AS (
    SELECT id, parent_id, id,       
           (COALESCE(is_company, false) OR parent_id IS NULL) as final
      FROM res_partner                                         
     WHERE id = ANY(%s)
  UNION                                   
    SELECT cpid.id, p.parent_id, p.id,
           (COALESCE(is_company, false) OR p.parent_id IS NULL) as final
      FROM res_partner p JOIN cpid ON (cpid.parent_id = p.id)
     WHERE NOT cpid.final
)                
SELECT cpid.id, cpid.commercial_partner_id
  FROM cpid
 WHERE final AND id = ANY(%s);

This comment has been minimized.

Copy link
@odony

odony Apr 2, 2019

Contributor

Looking at it more, and after writing some tests, the special handling for is_company being null seems completely unnecessary in that version: since we're checking for is_company (and then final) being true, them being null or false makes no difference. Was probably an issue with @KangOl's version as it checks specifically for not is_company.

I don't think we can spare the COALESCE because we'd still get unknown branches in the comparison when parent_id is set, and thus skip result rows.

Let's say company_id is NULL and parent_id = 3:

SELECT (NULL OR 3 IS NULL) as final;
 final  
--------
 <NULL>
(1 row)

Do your tests include a row where is_company is NULL and parent_id is set? ;-)

This comment has been minimized.

Copy link
@xmo-odoo

xmo-odoo Apr 2, 2019

Author Collaborator

Let's say company_id is NULL and parent_id = 3:

This yields final=null instead of final=false, but once through the where the result's the same, and null or true returns true.

Do your tests include a row where is_company is NULL and parent_id is set? ;-)

Yup.

This comment has been minimized.

Copy link
@odony

odony Apr 3, 2019

Contributor

Let's say company_id is NULL and parent_id = 3:

This yields final=null instead of final=false, but once through the where the result's the same, and null or true returns true.

Well, the WHERE NOT cpid.final inside the recursive branch will fail to pick up a NULL, and the final result row will be missing entirely. You must doing something different ;-)

Here is an example where one result line is missing when there is no COALESCE:

# select id, parent_id, is_company from res_partner where id in (4823, 62413, 25525);
  id   | parent_id | is_company 
-------+-----------+------------
  4823 |     62413 | <NULL>
 25525 |    <NULL> | f
 62413 |    <NULL> | t
(3 rows)

openerp=# WITH RECURSIVE cpid(id, parent_id, commercial_partner_id, final) AS (
    SELECT id, parent_id, id,       
           (is_company OR parent_id IS NULL) as final
      FROM res_partner                                         
     WHERE id = ANY(ARRAY[4823,25525])
  UNION                                   
    SELECT cpid.id, p.parent_id, p.id,
           (is_company OR p.parent_id IS NULL) as final
      FROM res_partner p JOIN cpid ON (cpid.parent_id = p.id)
     WHERE NOT cpid.final
)                
SELECT cpid.id, cpid.commercial_partner_id
  FROM cpid
 WHERE final AND id = ANY(ARRAY[4823,25525]);
  id   | commercial_partner_id 
-------+-----------------------
 25525 |                 25525
(1 row)

The above is fixed by adding the necessary COALESCE, either when computing final or when testing final in the recursive WHERE:

openerp=# WITH RECURSIVE cpid(id, parent_id, commercial_partner_id, final) AS (
    SELECT id, parent_id, id,       
           (is_company OR parent_id IS NULL) as final
      FROM res_partner                                         
     WHERE id = ANY(ARRAY[4823,25525])
  UNION                                   
    SELECT cpid.id, p.parent_id, p.id,
           (is_company OR p.parent_id IS NULL) as final
      FROM res_partner p JOIN cpid ON (cpid.parent_id = p.id)
     WHERE NOT (COALESCE(cpid.final, false))
)                
SELECT cpid.id, cpid.commercial_partner_id
  FROM cpid
 WHERE final AND id = ANY(ARRAY[4823,25525]);
  id   | commercial_partner_id 
-------+-----------------------
 25525 |                 25525
  4823 |                 62413
(2 rows)

@robodoo robodoo added the CI 🤖 label Mar 22, 2019

@xmo-odoo xmo-odoo force-pushed the odoo-dev:12.0-partners-import-perf-parent-id-xmo branch from 79bfc35 to 51072ae Apr 1, 2019

@robodoo robodoo added CI 🤖 and removed CI 🤖 labels Apr 1, 2019

@xmo-odoo xmo-odoo force-pushed the odoo-dev:12.0-partners-import-perf-parent-id-xmo branch from 51072ae to 8051ac7 Apr 2, 2019

@robodoo robodoo removed the CI 🤖 label Apr 2, 2019

@KangOl

This comment has been minimized.

Copy link
Contributor

commented Apr 2, 2019

Do not commit the Profiler (or do it in another PR)

@robodoo robodoo added the CI 🤖 label Apr 2, 2019

xmo-odoo added some commits Mar 14, 2019

[IMP] base_address_partner: optimise transfer betwen parent and child
Mostly a concern during bulk import of partners with parents.

First remove syncing extended fields, that seems completely
unnecessary since the street gets sync'd and will get split through the
normal process (updating the sub-fields), by also syncing the split
fields we're redundantly calling _set_steet and rewriting the address
we just sync'd.

Second if write()ing both the country and street, the override would
then go and re-write the street based on what had *just* been split
into sub-street fields, which could waste a lot of time during the
import post-process as address fields get moved back and forth between
parents and children leading to *lots* of writing both country and
address together, we're talking:

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10002/2    0.111    0.000  344.390  172.195 res_partner.py:181(write)
[...]
      2    0.455    0.228  165.264   82.632 base_address_extended.py:41(_set_street)

My initial instinct was to just add the country_id to _set_street and
remove the write override but it would break
88ff6be: if the country alone is
updated on a partner, we do want to re-format the "unified" address
based on individual fields and the new country's format.

Note: it might be that the logic would be more sensible by inversing
the entire thing, such that the split fields are the proper
source (regular stored fields) and street is converted to a computed
field instead, but that'd be a larger model change. Seems like it'd
make more sense though, at least given what the modules attempts to
do (OTOH the module probably fails
https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/
in just about all the ways).
[IMP] base: try to batch update of children address & commercial fields
With profiling enabled, importing 10k partners, with all of them
having the same parent (though not with all of them having the same
is_company setting) lowers sync / post-process time from ~550s to
~330s.

Put it in an override to _load_records_create so it's only active for
imports (which is the original report & test case), use a context key
to avoid going the post-processing work in create.
[IMP] base: batching of _compute_commercial_partner
Should correctly handle / fallback to regular code when processing
non-stored partners.

Perf difference according to cprofile on my machine:

original python:
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.134    0.134  205.380  205.380 res_partner.py:277(_compute_commercial_partner)
SQLized
     1    0.118    0.118   67.239   67.239 res_partner.py:280(_compute_commercial_partner)

most of the time leftover seems to be in modified_draft

@xmo-odoo xmo-odoo force-pushed the odoo-dev:12.0-partners-import-perf-parent-id-xmo branch from 8051ac7 to 0851524 Apr 2, 2019

@robodoo robodoo added CI 🤖 and removed CI 🤖 labels Apr 2, 2019

@xmo-odoo

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 4, 2019

robodoo r+

@robodoo robodoo added the r+ 👌 label Apr 4, 2019

@robodoo

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

Because this PR has multiple commits, I need to know how to merge it:

  • merge to merge directly, using the PR as merge commit message
  • rebase-merge to rebase and merge, using the PR as merge commit message
  • rebase-ff to rebase and fast-forward
@odony

odony approved these changes Apr 4, 2019

Copy link
Contributor

left a comment

Thanks 👍
You might want to reword the PR desc, as we should use the merge method. Some of the commit messages should probably also mention that they fix #31549.

@xmo-odoo xmo-odoo changed the title Optimisation of partners import (if parent_id) [FIX] base, base_address_extended: optimisation of partners import Apr 4, 2019

@xmo-odoo

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 4, 2019

robodoo rebase-merge

@robodoo

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

Merge method set to rebase and merge, using the PR as merge commit message

@robodoo robodoo added the merging 👷 label Apr 4, 2019

@robodoo robodoo closed this in 2510217 Apr 4, 2019

@robodoo robodoo added merged 🎉 and removed merging 👷 labels Apr 4, 2019

@robodoo

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2019

Merged, thanks!

@nim-odoo nim-odoo deleted the odoo-dev:12.0-partners-import-perf-parent-id-xmo branch Apr 4, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.