Optimize segment SQL when segment subqueries are used #21016

diosmosis · 2023-07-14T02:42:37Z

Description:

Fixes #20467

Changes:

Refactor segment parsing and related code to use an expression tree (as a nested array) for the intermediate data structures. The first level of the array holds AND-ed expressions and the second level holds OR-ed expressions. This makes it easier to manipulate the expression before converting it to SQL, and it simplifies the related code.
Merge adjacent NOT IN segment subqueries so that fewer of them appear in the final segment SQL. Segment subqueries that are next to each other in a single OR expression chain are merged, and segment subqueries that are not part of an OR expression chain but sit directly in the overall AND expression chain are also merged.
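A minimal sketch of the AND-of-ORs tree described above, written in Python rather than Matomo's PHP. It assumes Matomo's segment syntax, where `;` joins conditions with AND and `,` joins them with OR (so OR binds tighter), and it ignores escaped operator characters. `parse_segment` and `tree_to_sql` are hypothetical names, not functions from this PR.

```python
def parse_segment(segment: str) -> list[list[str]]:
    """Split a segment string into an AND-of-ORs tree (hypothetical helper).

    Outer list: groups joined with AND (';').
    Inner lists: conditions joined with OR (',').
    Escaped operator characters are not handled in this sketch.
    """
    return [group.split(",") for group in segment.split(";") if group]


def tree_to_sql(tree: list[list[str]], condition_to_sql) -> str:
    """Render the tree as SQL: OR inside each group, AND between groups."""
    and_parts = []
    for or_group in tree:
        or_sql = " OR ".join(condition_to_sql(c) for c in or_group)
        # Parenthesise multi-condition OR groups so AND does not split them.
        and_parts.append(f"({or_sql})" if len(or_group) > 1 else or_sql)
    return " AND ".join(and_parts)


# Usage: condition_to_sql is a stand-in; str just echoes each condition.
tree = parse_segment("browserCode==FF;visitCount>1,referrerType==search")
print(tree)
# [['browserCode==FF'], ['visitCount>1', 'referrerType==search']]
print(tree_to_sql(tree, str))
# browserCode==FF AND (visitCount>1 OR referrerType==search)
```

Because the tree is plain nested lists, a later pass can reorder or merge entries before any SQL is generated.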

Review

michalkleiner · 2023-12-11T11:13:02Z

I've looked through the change, but I can't confidently approve it on its merits. I could only approve it on the basis that the tests are passing, but the tests were adjusted as well, so I would need more time or would defer to someone else.

bx80 · 2023-12-12T02:53:11Z

@sgiehl Now that the urlencoding issue is resolved, could you take another look at reviewing this PR when you get a chance? Thanks! 🙂

sgiehl

In general the code looks fine to me. But as mentioned before, it's really hard to be sure it doesn't introduce any regressions in the resulting data.
To minimize the risk I did some more local testing by creating a bunch of fake visits for a certain day. Then I created various segments and added some logging to get the number of rows inserted into the temporary segment table. I then ran archiving once without the config flag and once with it and compared the numbers. For my local tests the numbers were the same. We could nevertheless do some similar testing with bigger data sets, or even compare the resulting visit ids directly to ensure there are no differences.
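A small Python sketch of why comparing visit ids is a stronger check than comparing row counts: two archiving runs can insert the same number of rows into the temporary segment table while selecting different visits. `compare_segment_results` is a hypothetical helper; how the ids are collected from each run (with and without the config flag) is left out.

```python
def compare_segment_results(old_ids: list[int], new_ids: list[int]) -> dict:
    """Compare the visit ids selected by two runs of the same segment."""
    old, new = set(old_ids), set(new_ids)
    return {
        "row_counts": (len(old_ids), len(new_ids)),
        "only_old": sorted(old - new),  # visits the new SQL no longer selects
        "only_new": sorted(new - old),  # visits only the new SQL selects
        "identical": old == new,
    }


# Same row count, different visits: a count-only check would miss this.
result = compare_segment_results([1, 2, 3], [1, 2, 4])
print(result)
# {'row_counts': (3, 3), 'only_old': [3], 'only_new': [4], 'identical': False}
```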

github-actions · 2023-12-21T01:45:35Z

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

bx80 · 2023-12-21T04:00:46Z

I've reverted the feature flag change and merged in Michal's improved parsing code. If somebody wants to give this a last quick check over then it should be ready to merge into 5.x-dev 🏁

michalkleiner

Ben and Stefan tested this extensively and we have done some more tests in the cloud environment without any obvious issues, so I guess that's as much as we can do at this stage.

* fix build for matomo-org/matomo#21016
* improve test

---------

Co-authored-by: diosmosis <diosmosis@users.noreply.github.com>
Co-authored-by: Ben <ben.burgess@innocraft.com>
Co-authored-by: Michal Kleiner <michal@innocraft.com>

diosmosis marked this pull request as draft July 14, 2023 02:42

diosmosis added a commit to matomo-org/plugin-CustomVariables that referenced this pull request Jul 17, 2023

fix build for matomo-org/matomo#21016

539b702

diosmosis force-pushed the 20467-segment-not-in-optimizations branch from 05cd479 to 25cbc21 July 25, 2023 18:46

diosmosis added the Needs Review label (PRs that need a code review) Jul 25, 2023

diosmosis marked this pull request as ready for review July 25, 2023 18:49

diosmosis added 6 commits July 25, 2023 11:50

refactor segment parsing to use a tree structure for logical operators internally instead of flat structure (this is to make re-ordering easier)

8bdf897

first optimization: merge subquery segment conditions together when they are within the same OR sequence or are alone within the same AND sequence

6c5b97b
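A minimal Python sketch of the merge rules in this commit, under explicit assumptions: each `idvisit NOT IN (subquery)` condition is modelled as a hypothetical `NotIn` node holding the subquery's result as a set of visit ids, so the sketch checks the set algebra rather than the SQL the PR actually generates, and SQL NULL semantics for NOT IN are ignored. Adjacent NOT INs in one OR chain combine by intersection (NOT IN A OR NOT IN B is NOT IN the intersection of A and B), and NOT INs that stand alone in the AND chain combine by union (NOT IN A AND NOT IN B is NOT IN the union of A and B).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotIn:
    """Hypothetical node for 'idvisit NOT IN (subquery)'.

    The subquery's result is modelled as a frozenset of visit ids.
    """
    ids: frozenset


def merge_or_group(group: list) -> list:
    """Merge adjacent NotIn nodes within one OR chain.

    NOT IN A OR NOT IN B  ==  NOT IN (A intersect B)
    """
    merged = []
    for cond in group:
        if merged and isinstance(cond, NotIn) and isinstance(merged[-1], NotIn):
            merged[-1] = NotIn(merged[-1].ids & cond.ids)
        else:
            merged.append(cond)
    return merged


def merge_tree(tree: list) -> list:
    """Merge OR chains first, then every lone NotIn in the AND chain.

    NOT IN A AND NOT IN B  ==  NOT IN (A union B)
    """
    tree = [merge_or_group(group) for group in tree]

    def is_lone_not_in(group):
        return len(group) == 1 and isinstance(group[0], NotIn)

    lone = [group[0] for group in tree if is_lone_not_in(group)]
    if len(lone) < 2:
        return tree
    rest = [group for group in tree if not is_lone_not_in(group)]
    return rest + [[NotIn(frozenset().union(*(n.ids for n in lone)))]]


def matches(tree: list, visit_id: int, plain: dict) -> bool:
    """Evaluate the tree for one visit; plain maps other conditions to id sets."""
    def holds(cond):
        if isinstance(cond, NotIn):
            return visit_id not in cond.ids
        return visit_id in plain[cond]
    return all(any(holds(c) for c in group) for group in tree)


# Usage: four NOT IN subqueries collapse into one, with the same matching visits.
plain = {"browser_ff": {1, 2, 3, 4, 5, 6}}
before = [
    [NotIn(frozenset({1, 2})), NotIn(frozenset({2, 3}))],
    [NotIn(frozenset({4}))],
    [NotIn(frozenset({5}))],
    ["browser_ff"],
]
after = merge_tree(before)
print([v for v in range(10) if matches(before, v, plain)])  # [1, 3, 6]
print([v for v in range(10) if matches(after, v, plain)])   # [1, 3, 6]
```

AND is commutative, so this sketch moves the merged NOT IN to the end of the AND chain; the OR-chain merge only touches neighbours, matching the "adjacent" wording above.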

remove todo

e0403eb

fix test and adjust segmentformatter to use new segment expression tree structure

0d492ac

update expected test output

88c13f5

fix CustomVariables plugin

a349dd2

diosmosis force-pushed the 20467-segment-not-in-optimizations branch from 25cbc21 to a349dd2 July 25, 2023 18:50

diosmosis removed the Needs Review label (PRs that need a code review) Jul 25, 2023

diosmosis added 2 commits July 25, 2023 12:20

remove code redundancy and add some docs

62fc00f

convert backslash escaped operator characters to urlencoded characters so they are still supported

270916d
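A minimal Python sketch of the idea in this commit, assuming the operator characters in question are the logical separators `;` (AND) and `,` (OR): a backslash-escaped separator is percent-encoded before the segment is split, so the split no longer sees it, while backslashes before any other character are kept (the behaviour the later "Rework parseTree" commit aims for). `encode_escaped_operators` is a hypothetical name, and an escaped backslash directly before a separator is not handled.

```python
import re

# Assumption: only ';' (AND) and ',' (OR) count as operator characters here.
ESCAPED_OPERATOR = re.compile(r"\\([;,])")


def encode_escaped_operators(segment: str) -> str:
    r"""Replace '\;' with '%3B' and '\,' with '%2C'; leave other backslashes alone."""
    return ESCAPED_OPERATOR.sub(lambda m: "%{:02X}".format(ord(m.group(1))), segment)


# The value 'a,b' survives the split as 'a%2Cb'; the backslash in 'c\d' is kept.
encoded = encode_escaped_operators(r"pageTitle==a\,b;pageUrl=@c\d")
print(encoded)             # pageTitle==a%2Cb;pageUrl=@c\d
print(encoded.split(";"))  # ['pageTitle==a%2Cb', 'pageUrl=@c\\d']
```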

diosmosis added the Needs Review label (PRs that need a code review) Jul 25, 2023

mattab added this to the 5.1.0 milestone Jul 28, 2023

github-actions bot added the Stale label (used by the Close Stale Issues action) Aug 5, 2023

diosmosis added the Do not close label (PRs with this label won't be marked as stale by the Close Stale Issues action) Aug 5, 2023

github-actions bot removed the Stale label Aug 6, 2023

github-actions bot added and removed the Stale label Aug 14, 2023

github-actions bot added and removed the Stale label Aug 22, 2023

github-actions bot added and removed the Stale label Aug 30, 2023

Merge branch '5.x-dev' into 20467-segment-not-in-optimizations

6d4ed7a

sgiehl removed the Needs Review label (PRs that need a code review) Dec 8, 2023

bx80 added 4 commits December 11, 2023 22:06

Rework parseTree to avoid removing non-escaping backslashes

b923eef

Update submodule

8ac6943

Merge branch '5.x-dev' into 20467-segment-not-in-optimizations

b8ceb49

Update tests (minor whitespace change)

e774da9

bx80 added the Needs Review label (PRs that need a code review) Dec 11, 2023

bx80 requested review from sgiehl and michalkleiner December 11, 2023 10:20

bx80 changed the title ~~optimize segment SQL when segment subqueries are used~~ Optimize segment SQL when segment subqueries are used Dec 11, 2023

Temporarily reworked new behaviour to be behind a feature flag for additional testing

abd9a8b

sgiehl approved these changes Dec 13, 2023

github-actions bot added and removed the Stale label Dec 21, 2023

bx80 and others added 3 commits December 21, 2023 16:53

Revert feature flag

4745a99

Merge branch '5.x-dev' into 20467-segment-not-in-optimizations

8b4601f

Use regular expressions to parse segment string into segment logical tree (#21669)

Co-authored-by: Ben Burgess <88810029+bx80@users.noreply.github.com>

48fae99
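One way regular expressions can split a segment into the AND-of-ORs tree while respecting escapes, sketched in Python under assumptions: a separator counts only when it is not preceded by a backslash (a negative lookbehind), escaped separators stay inside the condition for later decoding, and an escaped backslash directly before a separator is not handled. The patterns and `parse_tree` here are illustrative, not the ones from #21669.

```python
import re

# A ';' (AND) or ',' (OR) separates conditions only when no backslash precedes it.
AND_SEPARATOR = re.compile(r"(?<!\\);")
OR_SEPARATOR = re.compile(r"(?<!\\),")


def parse_tree(segment: str) -> list[list[str]]:
    """Hypothetical parser: outer list is AND-ed, inner lists are OR-ed."""
    return [OR_SEPARATOR.split(group) for group in AND_SEPARATOR.split(segment) if group]


print(parse_tree(r"pageTitle==a\;b;browserCode==FF,browserCode==CH"))
# [['pageTitle==a\\;b'], ['browserCode==FF', 'browserCode==CH']]
```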

sgiehl added 2 commits December 21, 2023 12:42

Merge branch '5.x-dev' into 20467-segment-not-in-optimizations

bf0f777

fix cs

438200a

michalkleiner approved these changes Dec 22, 2023

Merge branch '5.x-dev' into 20467-segment-not-in-optimizations

51ba0b0

sgiehl modified the milestones: 5.1.0, 5.0.1 Jan 2, 2024

sgiehl merged commit b0c7a6b into 5.x-dev Jan 4, 2024
24 of 25 checks passed

sgiehl deleted the 20467-segment-not-in-optimizations branch January 4, 2024 15:27
