Speed up delta_apply jobs by using an indexed column for searching #646

Merged
suricactus merged 2 commits into master from QF-2663-slow-json-query on May 10, 2023


suricactus (Collaborator)

Filtering by `content__clientId` causes very slow queries. In plain SQL we can index a specific field inside the JSON, for example:

```sql
CREATE INDEX ON core_delta(((content->>'clientId')::uuid));
```

However, it is not easy to get the Django ORM to generate a query that uses this index.
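The difficulty comes from how expression indexes work: the planner only uses one when the query contains the same expression the index was built on. The sketch below illustrates this with Python's built-in `sqlite3` module, using SQLite's `json_extract()` as a stand-in for PostgreSQL's `->>` operator (an assumption made to keep the example self-contained; the project itself runs on PostgreSQL). The table mirrors `core_delta`; the index name `core_delta_client_expr` and the helper `query_plan` are hypothetical.

```python
import json
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN output joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable plan description is the last column of each row.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_delta (id INTEGER PRIMARY KEY, content TEXT)")
# Expression index on a single JSON key; roughly the SQLite counterpart of
# CREATE INDEX ON core_delta(((content->>'clientId')::uuid)) in PostgreSQL.
conn.execute(
    "CREATE INDEX core_delta_client_expr "
    "ON core_delta (json_extract(content, '$.clientId'))"
)
client_id = "11111111-1111-1111-1111-111111111111"
conn.execute(
    "INSERT INTO core_delta (content) VALUES (?)",
    (json.dumps({"clientId": client_id}),),
)

# The WHERE clause repeats the indexed expression exactly,
# so the planner can look the value up in the index.
matching = query_plan(
    conn,
    "SELECT * FROM core_delta WHERE json_extract(content, '$.clientId') = ?",
    (client_id,),
)
# Logically the same filter, but written as a different expression;
# it no longer matches the index definition, so the table is scanned.
mismatching = query_plan(
    conn,
    "SELECT * FROM core_delta "
    "WHERE CAST(json_extract(content, '$.clientId') AS TEXT) = ?",
    (client_id,),
)
print(matching)
print(mismatching)
```

The first plan should be a `SEARCH` through `core_delta_client_expr`, the second a full `SCAN`. PostgreSQL applies the same matching rule to expression indexes, so a lookup on `content__clientId` only benefits if the SQL the ORM generates happens to contain the indexed expression.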

Alternatively, we can create an index directly on the JSON value:

```sql
CREATE INDEX ON core_delta((content->'clientId'));
```

However, this makes the index 5x bigger.

In summary, we chose to add a new indexed column holding the same data as the JSON field, because it is both cheaper in storage and faster.
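A minimal sketch of that approach, again using SQLite through Python's `sqlite3` as a stand-in for PostgreSQL and Django. The column name `client_id`, the index name and the helper functions are hypothetical, not taken from the merged commit. The JSON payload stays as it is; every write also copies `clientId` into a plain indexed column, a one-off backfill covers rows that existed before the column, and lookups filter on the column instead of on the JSON.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# `client_id` is a hypothetical name for the denormalized copy of
# content["clientId"]; the JSON payload itself is left unchanged.
conn.execute(
    "CREATE TABLE core_delta ("
    "id INTEGER PRIMARY KEY, content TEXT NOT NULL, client_id TEXT)"
)
# A plain B-tree index on a plain column.
conn.execute("CREATE INDEX core_delta_client_id ON core_delta (client_id)")


def insert_delta(conn: sqlite3.Connection, content: dict) -> int:
    """Store the JSON payload and copy clientId into the indexed column."""
    cur = conn.execute(
        "INSERT INTO core_delta (content, client_id) VALUES (?, ?)",
        (json.dumps(content), content.get("clientId")),
    )
    return cur.lastrowid


def backfill_client_id(conn: sqlite3.Connection) -> None:
    """One-off update for rows written before the column existed."""
    conn.execute(
        "UPDATE core_delta SET client_id = json_extract(content, '$.clientId') "
        "WHERE client_id IS NULL"
    )


def deltas_for_client(conn: sqlite3.Connection, client_id: str) -> list:
    """Look deltas up by the indexed column instead of the JSON field."""
    rows = conn.execute(
        "SELECT id FROM core_delta WHERE client_id = ? ORDER BY id",
        (client_id,),
    ).fetchall()
    return [row[0] for row in rows]


client_a = "aaaaaaaa-0000-0000-0000-000000000001"
client_b = "bbbbbbbb-0000-0000-0000-000000000002"
insert_delta(conn, {"clientId": client_a})  # id 1
insert_delta(conn, {"clientId": client_b})  # id 2
# A legacy row that only carries the value inside the JSON (id 3).
conn.execute(
    "INSERT INTO core_delta (content) VALUES (?)",
    (json.dumps({"clientId": client_a}),),
)
backfill_client_id(conn)

print(deltas_for_client(conn, client_a))  # [1, 3]

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM core_delta WHERE client_id = ?",
    (client_a,),
).fetchall()
print(plan[-1][-1])  # plan text should name the core_delta_client_id index
```

Because the lookup is a plain equality on a plain column, an ORM filter on that column maps directly onto the B-tree index, with no need to reproduce a JSON expression in the generated SQL. In Django terms this would presumably be a field such as `UUIDField(null=True, db_index=True)` filled on save and by a data migration (an assumption about the shape, not the actual change). The trade-off is that both copies of the value must be kept in sync on every write.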

Also:
- Move `before_docker_run` so that it runs before the job status is set to `STARTED`.

suricactus added the `bug` (Something isn't working) and `patch` (Requires patch version change) labels on May 9, 2023
opengisch deleted a comment from duke-nyuki on May 9, 2023
suricactus requested a review from faebebin on May 9, 2023, 22:27
suricactus force-pushed the QF-2663-slow-json-query branch 2 times, most recently from 24ed6c6 to 28f7efb on May 9, 2023, 23:16
Base automatically changed from QF-2676-always-upload-delta-prevent-job to master on May 10, 2023, 09:00

faebebin (Member) left a comment


LGTM

Tried it locally; everything works.
BTW, I think there are probably still many places with this kind of performance issue in filtering (and sorting), mostly in the admin. Good to have this as an example.

suricactus merged commit 57eb510 into master on May 10, 2023
4 checks passed
suricactus deleted the QF-2663-slow-json-query branch on May 10, 2023, 10:21
3 participants