Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve page conversion attribution performance with pre-calculated field #20526

Merged
merged 99 commits into from
May 25, 2023

Conversation

bx80
Copy link
Contributor

@bx80 bx80 commented Mar 28, 2023

Description:

Fixes #20375

  • Adds a migration to create a new log_conversion.pageviews_before column (> 4.14.0 installs will already have this).
  • New conversions will now write the number of pageviews in the visit prior to the conversion into this field.
  • Adds a new console command core:calculate-conversion-pages to populate the new field for historic data.
  • The migration will automatically attempt to populate the new field for conversions the last 48hrs, providing the host is not *.matomo.cloud and there are less than 10,000 conversions in that period.
  • The pages / goals archiving query has been adjusted to use the new field instead of an expensive sub query.

Review

@bx80 bx80 added c: Performance For when we could improve the performance / speed of Matomo. Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. labels Mar 28, 2023
@bx80 bx80 added this to the 5.0.0 milestone Mar 28, 2023
@bx80 bx80 self-assigned this Mar 28, 2023
core/Tracker/Model.php Outdated Show resolved Hide resolved
core/Updates/5.0.0-b1.php Outdated Show resolved Hide resolved
core/Tracker/Model.php Outdated Show resolved Hide resolved
@bx80 bx80 added Needs Review PRs that need a code review and removed Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. labels Apr 14, 2023
@bx80 bx80 requested a review from sgiehl April 14, 2023 05:33
Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments. Otherwise I was wondering if the current approach of simply replacing the archiving logic with a one based on the new column together with providing a command to recalculate the columns is suitable for us.

An alternative would be to keep the old possibility to calculate the values and decide whether to use the new or the old method after checking if the new column has values for all dates within the requested date range or not.
This would remove the requirements of having a command to update old datasets.

Using the current approach would otherwise mean that we would need to run the command for all cloud installs even if archives are already built. Otherwise we would have incorrect numbers after invalidating certain archives or if e.g. a new segment or custom report is created that triggers archiving for date ranges before the update.

@tsteur Which solution would you prefer?

plugins/Goals/Columns/PageviewsBefore.php Outdated Show resolved Hide resolved
core/Tracker/Visitor.php Outdated Show resolved Hide resolved
core/Updates/5.0.0-b1.php Outdated Show resolved Hide resolved
core/Updates/5.0.0-b1.php Outdated Show resolved Hide resolved
core/Updates/5.0.0-b1.php Outdated Show resolved Hide resolved
@tsteur
Copy link
Member

tsteur commented Apr 16, 2023

An alternative would be to keep the old possibility to calculate the values and decide whether to use the new or the old method after checking if the new column has values for all dates within the requested date range or not.
This would remove the requirements of having a command to update old datasets.

Using the current approach would otherwise mean that we would need to run the command for all cloud installs even if archives are already built. Otherwise we would have incorrect numbers after invalidating certain archives or if e.g. a new segment or custom report is created that triggers archiving for date ranges before the update.

On the cloud it shouldn't cause any issues unless someone invalidated historical data and this should be rare and is ok if then the data wouldn't show up correctly. All other data would be already archived using the old logic and the new logic would then start for newly tracked data. We may just have to populate the data for the last 24 hours to ensure the current day is correct. It would also apply to new segments/custom reports but we're only archiving 1 month back on Cloud so this wouldn't be too concerning and the report is potentially also not viewed too often. In the worst case we'd rather invalidate the data and re-archive it if someone asks.

I'm thinking checking all the column values be quite time consuming on the database and may take quite a bit of time. It would only apply for Cloud for invalidated data so it wouldn't be needed. On the Cloud we'd rather not have that extra query.

@bx80 bx80 force-pushed the m20375-pages-before-field branch 3 times, most recently from b374ea1 to 8604d9f Compare April 18, 2023 21:40
@bx80 bx80 force-pushed the m20375-pages-before-field branch from 8604d9f to 93bcbb9 Compare April 25, 2023 22:53
@github-actions
Copy link
Contributor

github-actions bot commented May 8, 2023

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label May 8, 2023
bx80 added 12 commits May 8, 2023 14:15
…, Visitor class and populate via the VisitRecognizer. Adjust referrer attribution and tests to use the immutable properties.
…ulate field on insert conversion, console command to calculate history
…version insert, calculate value from other fields instead of using a query, VisitInfo to provide access to original visit values.
@bx80 bx80 requested a review from sgiehl May 17, 2023 05:18
@bx80 bx80 requested a review from sgiehl May 18, 2023 23:24
@sgiehl
Copy link
Member

sgiehl commented May 19, 2023

@bx80 The system tests on PHP 7.2 are currently running into a segfault. I'm able to reproduce that segfault on my vm.
It seems caused by any change in this PR, as the tests are passing on 5.x-dev. I'll check which test is causing it so we can further investigate...

core/Updates/5.0.0-b1.php Outdated Show resolved Hide resolved
@sgiehl
Copy link
Member

sgiehl commented May 19, 2023

The segfault occurs within the test BackwardsCompatibility1XTest. That test insets a database dump of Matomo 1.x and then tries to perform a database update here:

$result = Fixture::updateDatabase();

This segfaults when it tries to perform the custom migration you have added. Commenting that part out lets the tests pass again. Don't have time right now to investigate that further...

@bx80
Copy link
Contributor Author

bx80 commented May 22, 2023

@sgiehl I was able to recreate the segfault locally with PHP 7.4.33, strangely it doesn''t occur at all with with PHP 8.2.5.

It seems to be caused by the custom migration using the plugins/Goals/Model/getActiveGoals() method, directly retrieving the goal ids with a simple query doesn't produce the segfault. I can't see any obvious reason why using that method would cause the error.

As part of debugging this I've also reworked the custom migration to more closely match the 4.0.0 custom migration by generating an array of BoundSql objects and then using the $updater->executeMigrations() method instead of executing the queries directly.

Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some more comments.
In relation to this PR we should maybe create a new FAQ to explain what to do if the page goal metrics aren't shown for older dates when e.g. a new segment is created. Running a command won't be something people might expect.
Not sure if that is something we might also need to mention somewhere in the release blog post.

plugins/Goals/Commands/CalculateConversionPages.php Outdated Show resolved Hide resolved
core/Updates/5.0.0-b1.php Show resolved Hide resolved
@bx80 bx80 requested a review from sgiehl May 24, 2023 04:25
Copy link
Member

@sgiehl sgiehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a last look through the code and imho it looks fine now.
Added a last minor suggestions, which I will apply myself before merging it.

@sgiehl sgiehl merged commit 20a7c0e into 5.x-dev May 25, 2023
@sgiehl sgiehl deleted the m20375-pages-before-field branch May 25, 2023 13:48
@sgiehl sgiehl removed the Needs Review PRs that need a code review label May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve page conversion attribution performance with pre-calculated field
3 participants