Campaign summary statistics (performance boost and date range specificity) #6651

Open
wants to merge 43 commits into
base: staging
from

Conversation

@heathdutton
Member

heathdutton commented Sep 27, 2018

| Q | A |
| --- | --- |
| Bug fix? | |
| New feature? | Yes |
| Automated tests included? | |
| Related user documentation PR URL | |
| Related developer documentation PR URL | |
| Issues addressed (#s or URLs) | |
| BC breaks? | No |
| Deprecations? | No |

Description:

TL;DR Make big campaigns practical.

If you have a campaign with 20 events and 1,700 leads a day, that's over a million new rows per month (20 × 1,700 × 30 = 1,020,000), and if the campaign is still active, the queries on the campaign view page will take several minutes on a good day.

Even with various recent improvements, the problem keeps getting worse as you scale up. This PR aims to fix the problem permanently by summarizing historical campaign data by hour (as services like Google Analytics do). The view will behave exactly the same (same charts, same tabular data, no delays), but will have drastically fewer rows to crunch through to generate the data (no joins, just one flat table with some sums).
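To illustrate the idea of summarizing by hour, here is a minimal sketch in Python. It is not the PR's PHP code: the row shape, the `summarize` and `totals_per_event` helpers, and the in-memory `Counter` standing in for the summary table are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw event log rows: (event_id, lead_id, executed_at).
raw_log = [
    (1, 101, datetime(2018, 9, 27, 10, 5)),
    (1, 102, datetime(2018, 9, 27, 10, 40)),
    (1, 103, datetime(2018, 9, 27, 11, 15)),
    (2, 101, datetime(2018, 9, 27, 10, 6)),
]


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def summarize(rows):
    """Count executions per (event_id, hour) pair."""
    summary = Counter()
    for event_id, _lead_id, executed_at in rows:
        summary[(event_id, hour_bucket(executed_at))] += 1
    return summary


def totals_per_event(summary):
    """Sum the hourly counts back into one total per event."""
    totals = Counter()
    for (event_id, _hour), count in summary.items():
        totals[event_id] += count
    return totals


summary = summarize(raw_log)
totals = totals_per_event(summary)

# Row counts for the example in the description (20 events, 1,700 leads/day).
EVENTS, LEADS_PER_DAY, DAYS = 20, 1700, 30
raw_rows_per_month = EVENTS * LEADS_PER_DAY * DAYS  # one row per lead per event
summary_rows_per_month = EVENTS * 24 * DAYS  # at most one row per event per hour
```

Under these assumptions the per-event totals come out the same whether they are read from the raw log or from the hourly summary, while the summary holds at most 14,400 rows per month instead of 1,020,000.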

A console command, mautic:campaign:summarize, is provided to perform an optional one-time backfill of the summary data. This might be needed by someone who wants to turn on campaign summary data and needs it to be retroactive. The command runs backward through time (current data is usually more pertinent), filling in missing summary data until it's done. It handles about 10.5 million campaign event log entries per hour in my testing. It's not super speedy with 200+ million events, but it's a one-time thing. This command can also be used to true up the data if you are experiencing exceptions within your campaign cron tasks.
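The backward-running backfill can be sketched roughly as follows. This is a Python illustration only, not the command's actual implementation: the `backfill` function, the `(event_id, executed_at)` row shape, and the dict standing in for the summary table are assumptions.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def backfill(raw_log, summary, newest, oldest):
    """Walk hour by hour from `newest` back to `oldest`, filling gaps.

    raw_log: list of (event_id, executed_at) tuples (hypothetical shape).
    summary: dict mapping hour start -> {event_id: count}; mutated in place.
    Returns the number of hours that were built.
    """
    built = 0
    hour = newest.replace(minute=0, second=0, microsecond=0)
    while hour >= oldest:
        if hour not in summary:  # only fill hours that have no summary yet
            counts = {}
            # A real implementation would query the log by time range
            # instead of scanning every row for every hour.
            for event_id, executed_at in raw_log:
                if hour <= executed_at < hour + HOUR:
                    counts[event_id] = counts.get(event_id, 0) + 1
            summary[hour] = counts
            built += 1
        hour -= HOUR  # step backward: the most recent data is handled first
    return built


raw_log = [
    (1, datetime(2018, 9, 27, 9, 30)),
    (1, datetime(2018, 9, 27, 11, 10)),
    (2, datetime(2018, 9, 27, 11, 45)),
]
# The 10:00 hour is already summarized, so the backfill should skip it.
summary = {datetime(2018, 9, 27, 10, 0): {}}
built = backfill(
    raw_log,
    summary,
    newest=datetime(2018, 9, 27, 11, 59),
    oldest=datetime(2018, 9, 27, 9, 0),
)
```

Working in one-hour steps keeps each unit of work small, and skipping hours that already have a summary is what makes the run safe to interrupt and resume in this sketch.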

I've merged in and refactored work from #6021 (Optionally display campaign data by date range) for a number of reasons. This allows you to enable "Use date range" so that all data on the Campaign view respects the date range provided. This makes the view faster and more useful for long-running campaigns. This also adds support for query caching (for whoever may use that feature by modifying their config).

We've been running a variation of this PR (as a patch) in production via mautic-eb since Sept 2018. It's pretty critical for us, as we have campaigns with ~28 million leads in them. Viewing such a campaign without this is literally impossible.

Steps to test this PR:

  1. Go to https://mautibox.com/6651/s/config/edit and open Campaign Settings.
  2. Enable "Use summary statistics" and "Use date range".
  3. Progress a contact through any campaign.
  4. Go to the campaign https://mautibox.com/6651/s/campaigns and view the Campaign Statistics chart. It should display the events exactly as you would expect.
  5. View the Campaign Actions/Conditions/Decisions tabs (if there are any) and they should show the same results they always have as well.
  6. Pick a date range with no contacts and hit apply. You'll see the campaign Actions/Conditions/Decisions/Contacts tabs disappear or contain no content. These tabs now respect the date range provided.
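
The date range behavior in the last step can be pictured with a small Python sketch. It assumes a hypothetical summary keyed by `(event_id, hour)`; the `counts_in_range` helper is invented for illustration and is not part of the PR.

```python
from datetime import datetime


def counts_in_range(summary, date_from, date_to):
    """Sum hourly counts per event for hours within [date_from, date_to]."""
    totals = {}
    for (event_id, hour), count in summary.items():
        if date_from <= hour <= date_to:
            totals[event_id] = totals.get(event_id, 0) + count
    return totals


# Hypothetical hourly summary: (event_id, hour start) -> execution count.
summary = {
    (1, datetime(2018, 9, 27, 10)): 2,
    (1, datetime(2018, 9, 27, 11)): 1,
    (2, datetime(2018, 9, 27, 10)): 1,
}

# A range that covers the activity returns the per-event totals.
with_data = counts_in_range(
    summary, datetime(2018, 9, 27, 0), datetime(2018, 9, 27, 23, 59)
)

# A range with no activity returns nothing, which corresponds to the
# tabs disappearing or showing no content.
without_data = counts_in_range(
    summary, datetime(2018, 8, 1), datetime(2018, 8, 31)
)
```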
@heathdutton heathdutton changed the title from "Fix campaign view performance (WIP)" to "Campaign summary statistics (WIP)" Oct 1, 2018
heathdutton added 3 commits Oct 2, 2018
Before this, the count query was still being run, which in many cases is
a huge overhead when there are millions of leads.
We already have the lead count, and that extra lead count costs us many
seconds to query. In addition, supporting a dynamic orderBy allows us
to avoid a tmp table file sort on millions of rows while getting
effectively the same intended result (the most recent leads added
to the campaign) in 3ms vs 89s.
@heathdutton

Member Author

heathdutton commented Oct 2, 2018

In my testing just now, I took a page that used to take 10-15 minutes to load down to ~8s. I'd consider that an improvement. If we merge in (and refactor) #6021 and load contacts via AJAX, we're talking about a complete transformation.

heathdutton added 6 commits Oct 2, 2018
After talking with John, I realized that if there is a SQL error at
high load, this would be one way to remedy the issue by cron.
This merges in and refactors #6021 and includes some basic query
caching for those who enable APCu or similar at the ORM level.
heathdutton added 2 commits Oct 9, 2018
@heathdutton heathdutton changed the title from "Campaign summary statistics (WIP)" to "Campaign summary statistics (performance boost and date range specificity)" Oct 9, 2018
@heathdutton heathdutton removed the WIP label Oct 10, 2018
heathdutton added 8 commits Oct 10, 2018
Something recently dropped this.
Summarizing per day was taking upwards of 1.2hrs per day in production
while the data is in motion, which can cause a table lock. Summarizing
by hour is slower overall, but safer.
…re/campaign-summary
@heathdutton heathdutton added this to Code Review (2 required) in Mautic 2 Dec 6, 2018
@heathdutton heathdutton removed this from Code Review (2 required) in Mautic 2 Dec 6, 2018
heathdutton added 2 commits Dec 10, 2018
@imihandstand

imihandstand commented Dec 12, 2018

Sorry for the question, I'm very new to GitHub: What is the easiest way to implement this code change? Do I have to manually change all 19 files on my server, or is there a simpler way? Thank you!

@heathdutton

Member Author

heathdutton commented Dec 12, 2018

@imihandstand if you are using a recent version like 2.14.2+, you may be able to apply this as a git patch. Go into your Mautic folder (not in production) and run:

    curl -L https://github.com/mautic/mautic/pull/6651.diff | git apply -v

That will apply these code changes on top of your Mautic install.
If you just want to test this code you can go to http://mautibox.com/6651

If you want to be on the safe side, wait till version 2.16.0 which will likely have this in it.

heathdutton added 2 commits Mar 26, 2019
# Conflicts:
#	app/bundles/CampaignBundle/Controller/CampaignController.php
#	app/bundles/CampaignBundle/Entity/LeadEventLogRepository.php

We may need later refactoring for pending counts.
@heathdutton

Member Author

heathdutton commented Mar 26, 2019

I've resolved conflicts, but note that pending counts are recalculated as of 2.15.1 beta (which is good). Giving pending figures is impractical when filtering by date, so I've disabled them for now when summary/dates are enabled. I may revisit this in the future, but if not, just keep in mind that pending counts aren't supported with this optional feature.

@heathdutton heathdutton requested a review from kuzmany Mar 26, 2019
scottshipman and others added 8 commits Apr 9, 2019
This will allow one to use this as an hourly cron job via --rebuild to
correct data should one have locks on mass inserts.
[ENG-890] save summaries at end of kickoff executioner or when count …
4 participants