Improving data source synchronization performance #1186

@Jylhis

The member synchronization process is currently quite slow for our use case, and I would like to improve it.

Expected Behavior

Synchronization should be as fast as possible, preferably with a maximum wait time of 1 hour in the worst case.

Current Behavior

Currently, synchronizing 100 000 members (60k adds and 40k updates) takes ~10 min.

Possible Solution

Running a profiler on the sync process shows that the majority of the sync time is spent executing database queries. The sync process appears to issue 4 queries per member update and 7 queries per member for remove and add.

I pushed a draft PR (#1184) in which I moved a couple of queries out of loops, do some of the checks in code instead, and group some of the queries in transactions. Here are the benchmarks I ran with the changes:
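To illustrate the general idea (this is not the code from the PR), here is a minimal sketch in Python using the standard `sqlite3` module. The `subscriber` table, its columns, and the `sync_members` function are hypothetical names invented for this example. It fetches the existing members with one query outside the loop, computes the differences in code, and writes all changes inside a single transaction.

```python
import sqlite3


def sync_members(conn, list_name, source_emails):
    """Sync one list to `source_emails`; return (added, removed) counts.

    Hypothetical schema: subscriber(list_name, email).
    """
    source = set(source_emails)

    # One SELECT outside the loop instead of an existence check per member.
    rows = conn.execute(
        "SELECT email FROM subscriber WHERE list_name = ?", (list_name,)
    )
    existing = {email for (email,) in rows}

    # The membership checks happen in code, not in the database.
    to_add = sorted(source - existing)
    to_remove = sorted(existing - source)

    # One transaction for all writes: commits on success, rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO subscriber (list_name, email) VALUES (?, ?)",
            [(list_name, email) for email in to_add],
        )
        conn.executemany(
            "DELETE FROM subscriber WHERE list_name = ? AND email = ?",
            [(list_name, email) for email in to_remove],
        )
    return len(to_add), len(to_remove)


# Small in-memory demo with made-up addresses.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriber ("
    "list_name TEXT, email TEXT, PRIMARY KEY (list_name, email))"
)
conn.executemany(
    "INSERT INTO subscriber VALUES (?, ?)",
    [("demo", "a@example.org"), ("demo", "b@example.org")],
)
conn.commit()

result = sync_members(conn, "demo", ["b@example.org", "c@example.org"])
remaining = [
    email for (email,) in conn.execute(
        "SELECT email FROM subscriber ORDER BY email"
    )
]
```

The trade-off in a sketch like this is memory: the full set of existing members is held in the process while the diff is computed, which may or may not be acceptable for very large lists.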

add 20 000, remove 20 000, update 20 000 members

| | Before | After |
|---|---|---|
| Time (sec) | 263 | 67.2 |
| Calls to do_prepared_query | 240 032 | 120 032 |

add 60 000, update 40 000 members

| | Before | After |
|---|---|---|
| Time (sec) | 569 | 141 |
| Calls to do_prepared_query | 580 032 | 260 032 |
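As a quick sanity check on these figures, the arithmetic below compares the measured call count of the second benchmark with the per-member query counts mentioned earlier. It assumes that "7 queries" applies to each added member and "4 queries" to each updated member. That reading is an assumption, although the numbers of the second benchmark are consistent with it. The variable names are only illustrative.

```python
# Figures copied from the second benchmark (add 60 000, update 40 000).
added, updated = 60_000, 40_000
members = added + updated

# Assumed per-member cost before the change: 7 per add, 4 per update.
predicted_before = added * 7 + updated * 4
measured_before = 580_032
measured_after = 260_032

# Calls not explained by the per-member cost (possibly fixed setup queries).
overhead = measured_before - predicted_before

# Average number of queries per member after the change.
per_member_after = measured_after / members

# Wall-clock speedups for both benchmarks (before / after, in seconds).
speedup_first = 263 / 67.2
speedup_second = 569 / 141

print(predicted_before, overhead)
print(round(per_member_after, 2))
print(round(speedup_first, 2), round(speedup_second, 2))
```

Under that assumption the prediction is 580 000 calls, leaving 32 unexplained calls, and the draft changes come out roughly four times faster in both benchmarks.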

Context

We need to synchronize around 4 million memberships in total across all lists. We would like to keep our list memberships in sync with our data source as closely as possible while minimizing the wait time for each sync.
