Optimize segment SQL when segment subqueries are used #21016

Merged
sgiehl merged 34 commits into 5.x-dev from 20467-segment-not-in-optimizations on Jan 4, 2024

Conversation

diosmosis
Member

@diosmosis diosmosis commented Jul 14, 2023

Description:

Fixes #20467

Changes:

  • Refactor segment parsing and related code to use an expression tree (as a nested array) for the intermediate data structures. The first level of the array represents AND-ed expressions, and the second level represents OR-ed expressions. This makes the expression easier to manipulate before it is converted to SQL and simplifies the related code.
  • Merge adjacent NOT IN segment subqueries so that fewer exist in the segment SQL overall. Segment subqueries that are next to each other in a single OR expression chain are merged, and segment subqueries that are not part of an OR expression chain but are part of the overall AND expression chain are merged as well.
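
To illustrate the two changes above, here is a minimal sketch in Python (not the PR's PHP code). It models the expression tree as a nested list, with AND-ed groups on the outer level and OR-ed leaves on the inner level, and merges adjacent NOT IN subqueries on the AND level only. The class names, function names, and the table and column names used with them are hypothetical, and the merge rule shown (combining the subquery conditions with OR) is an assumption about one way such a merge can preserve meaning, not a description of Matomo's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Cond:
    """A plain condition rendered directly into the WHERE clause (hypothetical)."""
    sql: str


@dataclass(frozen=True)
class NotInSubquery:
    """Stands for: idvisit NOT IN (SELECT idvisit FROM <table> WHERE <where>)."""
    table: str
    where: str


Leaf = Union[Cond, NotInSubquery]
# Outer list: AND-ed groups. Inner list: OR-ed leaves.
Tree = List[List[Leaf]]


def is_lone_subquery(group: List[Leaf]) -> bool:
    """True for a group that is a single NOT IN subquery (not part of an OR chain)."""
    return len(group) == 1 and isinstance(group[0], NotInSubquery)


def merge_and_level(tree: Tree) -> Tree:
    """Merge adjacent lone NOT IN subqueries on the AND level.

    Assumption: "v NOT IN (rows matching A) AND v NOT IN (rows matching B)"
    can be written as "v NOT IN (rows matching A OR B)" when both
    subqueries select the same column from the same table.
    """
    merged: Tree = []
    for group in tree:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and is_lone_subquery(prev)
            and is_lone_subquery(group)
            and prev[0].table == group[0].table
        ):
            where = f"({prev[0].where}) OR ({group[0].where})"
            merged[-1] = [NotInSubquery(prev[0].table, where)]
        else:
            merged.append(list(group))
    return merged


def to_sql(tree: Tree) -> str:
    """Render the tree: OR inside each group, AND between groups."""
    def leaf_sql(leaf: Leaf) -> str:
        if isinstance(leaf, NotInSubquery):
            return (
                f"idvisit NOT IN (SELECT idvisit FROM {leaf.table} "
                f"WHERE {leaf.where})"
            )
        return leaf.sql

    return " AND ".join(
        "(" + " OR ".join(leaf_sql(leaf) for leaf in group) + ")"
        for group in tree
    )
```

With a tree that holds one OR group followed by two lone NOT IN subqueries on the same table, `to_sql(merge_and_level(tree))` contains a single NOT IN subquery where `to_sql(tree)` contains two.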

Review

@diosmosis diosmosis marked this pull request as draft July 14, 2023 02:42
diosmosis added a commit to matomo-org/plugin-CustomVariables that referenced this pull request Jul 17, 2023
@diosmosis diosmosis force-pushed the 20467-segment-not-in-optimizations branch from 05cd479 to 25cbc21 Compare July 25, 2023 18:46
@diosmosis diosmosis added the Needs Review PRs that need a code review label Jul 25, 2023
@diosmosis diosmosis marked this pull request as ready for review July 25, 2023 18:49
@diosmosis diosmosis force-pushed the 20467-segment-not-in-optimizations branch from 25cbc21 to a349dd2 Compare July 25, 2023 18:50
@diosmosis diosmosis removed the Needs Review PRs that need a code review label Jul 25, 2023
@diosmosis diosmosis added the Needs Review PRs that need a code review label Jul 25, 2023
@mattab mattab added this to the 5.1.0 milestone Jul 28, 2023
@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label Aug 5, 2023
@diosmosis diosmosis added the Do not close PRs with this label won't be marked as stale by the Close Stale Issues action label Aug 5, 2023
@github-actions github-actions bot removed the Stale The label used by the Close Stale Issues action label Aug 6, 2023
@github-actions github-actions bot added Stale The label used by the Close Stale Issues action and removed Stale The label used by the Close Stale Issues action labels Aug 14, 2023
@github-actions github-actions bot added Stale The label used by the Close Stale Issues action and removed Stale The label used by the Close Stale Issues action labels Aug 22, 2023
@github-actions github-actions bot added Stale The label used by the Close Stale Issues action and removed Stale The label used by the Close Stale Issues action labels Aug 30, 2023
@sgiehl sgiehl removed the Needs Review PRs that need a code review label Dec 8, 2023
@bx80 bx80 added the Needs Review PRs that need a code review label Dec 11, 2023
@bx80 bx80 changed the title optimize segment SQL when segment subqueries are used Optimize segment SQL when segment subqueries are used Dec 11, 2023
@michalkleiner
Contributor

michalkleiner commented Dec 11, 2023

I've looked through the change and can't confidently approve it on its merits. I could only approve it on the basis that the tests are passing, but the tests were adjusted as well, so I would need more time or would defer to someone else.

@bx80
Contributor

bx80 commented Dec 12, 2023

@sgiehl Now that the urlencoding issue is resolved, could you take another look at this PR when you get a chance? Thanks! 🙂

Member

@sgiehl sgiehl left a comment

In general the code looks fine to me. But as mentioned before, it's really hard to be sure that it won't introduce regressions in the resulting data.
To minimize the risk I did some more local testing by creating a bunch of fake visits for a certain day. Then I created various segments and added some logging to record the number of rows inserted into the temporary segment table. Then I ran archiving once without the config flag and once with it, and compared the numbers. In my local tests the numbers were the same. We could nevertheless do some similar testing with bigger data sets, or maybe compare the resulting visit IDs directly to ensure there are no differences.
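
A minimal sketch of the stricter check suggested above, comparing visit IDs rather than row counts, since two runs can insert the same number of rows while selecting different visits. The function name and the idea of passing in two ID lists (one per archiving run) are assumptions made for illustration; how the IDs are collected from the temporary segment table is left out.

```python
from typing import Dict, Iterable, List, Union


def compare_visit_ids(
    baseline: Iterable[int], optimized: Iterable[int]
) -> Dict[str, Union[List[int], bool]]:
    """Compare visit IDs from two archiving runs (hypothetical helper).

    baseline:  IDs selected without the optimization
    optimized: IDs selected with the optimization
    """
    baseline_set, optimized_set = set(baseline), set(optimized)
    return {
        # Visits the optimized query lost.
        "missing": sorted(baseline_set - optimized_set),
        # Visits the optimized query wrongly gained.
        "extra": sorted(optimized_set - baseline_set),
        # What a count-only comparison would report.
        "same_count": len(baseline_set) == len(optimized_set),
    }
```

For example, comparing `[1, 2, 3]` with `[1, 2, 4]` reports matching counts but also one missing and one extra visit, which a count-only check would not catch.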

Contributor

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions github-actions bot added Stale The label used by the Close Stale Issues action and removed Stale The label used by the Close Stale Issues action labels Dec 21, 2023
@bx80
Contributor

bx80 commented Dec 21, 2023

I've reverted the feature flag change and merged in Michal's improved parsing code. If somebody wants to give this a last quick check over then it should be ready to merge into 5.x-dev 🏁

Contributor

@michalkleiner michalkleiner left a comment

Ben and Stefan tested this extensively and we have done some more tests in the cloud environment without any obvious issues, so I guess that's as much as we can do at this stage.

@sgiehl sgiehl modified the milestones: 5.1.0, 5.0.1 Jan 2, 2024
@sgiehl sgiehl merged commit b0c7a6b into 5.x-dev Jan 4, 2024
24 of 25 checks passed
@sgiehl sgiehl deleted the 20467-segment-not-in-optimizations branch January 4, 2024 15:27
sgiehl added a commit to matomo-org/plugin-CustomVariables that referenced this pull request Jan 5, 2024
* fix build for matomo-org/matomo#21016

* improve test

---------

Co-authored-by: diosmosis <diosmosis@users.noreply.github.com>
Co-authored-by: Ben <ben.burgess@innocraft.com>
Co-authored-by: Michal Kleiner <michal@innocraft.com>
Labels
  • c: Performance - For when we could improve the performance / speed of Matomo.
  • Do not close - PRs with this label won't be marked as stale by the Close Stale Issues action
  • Major - Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
  • Needs Review - PRs that need a code review
Development

Successfully merging this pull request may close these issues.

Improve performance of archiving queries when a segment uses "Not contains" or "not equals"
6 participants