Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Archiving is taking too much time for no apparent reason #20457

Closed
candaj2 opened this issue Mar 13, 2023 · 4 comments
Closed

Archiving is taking too much time for no apparent reason #20457

candaj2 opened this issue Mar 13, 2023 · 4 comments
Labels
wontfix If you can reproduce this issue, please reopen the issue or create a new one describing it.

Comments

@candaj2
Copy link

candaj2 commented Mar 13, 2023

Hello,

We are trying to speed up our archiving process who can take up to 36+ hours to execute.
We have one site, in particular which is taking itself 32+ hours, without any apparent reason.

Some data about our matomo instance:
On premise
Version: 3.14.1
php version: 7.1.5

1008 websites
daily numbers:
~ 601K visits
~ 3M pageviews
~ 3.8M actions

Mysql:
CPU & Memory: 12CPU 64Gb
version: 5.7.18
Site (ondisk): 1.7T

The website in question itself is around 10.6K visits for 29.5K pageviews. Not that much in the total.
Here's the logs from the archiving process for that particular website:

INFO [2023-03-07 11:08:44] 27509 - tracking data found for website id XXX since 2023-03-07 07:09:26 UTC (since the last successful archiving)
INFO [2023-03-07 11:08:45] 27509 Will pre-process for website id = XXX, period = day, date = last2
INFO [2023-03-07 11:08:45] 27509 - pre-processing all visits
INFO [2023-03-07 17:16:16] 27509 Archived website id = XXX, period = day, 0 segments, 17574 visits in last 3 days, 5674 visits today, Time elapsed: 22051.280s
INFO [2023-03-07 17:16:16] 27509 Will pre-process for website id = XXX, period = week, date = last2
INFO [2023-03-07 17:16:16] 27509 - pre-processing all visits
INFO [2023-03-07 22:05:26] 27509 Archived website id = XXX, period = week, 0 segments, 73281 visits in last 2 weeks, 21717 visits this week, Time elapsed: 17350.437s
INFO [2023-03-07 22:05:27] 27509 Will pre-process for website id = XXX, period = month, date = last2
INFO [2023-03-07 22:05:27] 27509 - pre-processing all visits
INFO [2023-03-08 04:03:27] 27509 Archived website id = XXX, period = month, 0 segments, 282077 visits in last 2 months, 52470 visits this month, Time elapsed: 21481.344s
INFO [2023-03-08 04:03:28] 27509 - Will invalidate archived reports for 2023-03-07 for following websites ids: XXXX,XXX,XX
INFO [2023-03-08 04:03:30] 27509 Will pre-process for website id = XXX, period = year, date = last2
INFO [2023-03-08 04:03:30] 27509 - pre-processing all visits
INFO [2023-03-08 08:23:21] 27509 Archived website id = XXX, period = year, 0 segments, 0 visits in last 3 years, 0 visits this year, Time elapsed: 15593.580s
INFO [2023-03-08 08:23:22] 27509 Will pre-process for website id = XXX, period = range, date = last7
INFO [2023-03-08 08:23:22] 27509 - pre-processing all visits
INFO [2023-03-08 14:14:02] 27509 Archived website id = XXX, period = range, 0 segments, 44061 visits in last 7 ranges, 44061 visits this range, Time elapsed: 21041.141s
INFO [2023-03-08 14:14:03] 27509 Will pre-process for website id = XXX, period = range, date = previous7
INFO [2023-03-08 14:14:03] 27509 - pre-processing all visits
INFO [2023-03-08 14:14:23] 27509 Archived website id = XXX, period = range, 0 segments, 52530 visits in previous 7 ranges, 52530 visits this range, Time elapsed: 20.967s
INFO [2023-03-08 14:14:24] 27509 Will pre-process for website id = XXX, period = range, date = previous30
INFO [2023-03-08 14:14:24] 27509 - pre-processing all visits
INFO [2023-03-08 14:14:53] 27509 Archived website id = XXX, period = range, 0 segments, 245985 visits in previous 30 ranges, 245985 visits this range, Time elapsed: 30.017s
INFO [2023-03-08 14:14:54] 27509 Will pre-process for website id = XXX, period = range, date = last12
INFO [2023-03-08 14:14:54] 27509 - pre-processing all visits
INFO [2023-03-08 19:33:04] 27509 Archived website id = XXX, period = range, 0 segments, 0 visits in last 30 ranges, 0 visits this range, Time elapsed: 19091.357s
INFO [2023-03-08 19:33:04] 27509 Archived website id = XXX, 8 API requests, Time elapsed: 116660.142s [212/1003 done]

The archiving process on this one took 143483.826s (~40h) + ~1h of scheduled task.

My underlying question is: What is, for a website that does not seems that big, is taking so much time ?
I'm trying to parallelize the archiving process for that website (and some others) does not impact the others but I need to know what is taking time in the processing in order to separate it automatically.

Don't hesitate to ask for other metrics if you have any idea.

@candaj2 candaj2 added Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. To Triage An issue awaiting triage by a Matomo core team member labels Mar 13, 2023
@tsteur
Copy link
Member

tsteur commented Mar 13, 2023

Quick input on this one. It may help to run OPTIMIZE TABLE matomo_log_visit and similarly on all other ...log_ tables such as log_action and log_link_visit_action. Depending on your table prefix your table names may start with matomo_ or piwik_ or no prefix it all for example.

In your config/config.ini.php if you ever tweaked it you may want to check there isn't any setting configured starting with always_archive_data_. If you have tweaked other config settings this may be good to know as well. You may want to share the information of Matomo's system report see https://matomo.org/faq/troubleshooting/how-do-i-find-and-copy-the-system-check-in-matomo-on-premise/

@candaj2
Copy link
Author

candaj2 commented Mar 14, 2023

Hello @tsteur
Thank you for your answer ! I'll take a look to the OPTIMIZE TABLE, you are right, it will not harm to do it.

For the config, in matomo 3 we cannot copy as simply as in matomo 4 but no tweaked as been done except on the sample rate for session recording.
On system report everything is green flaged except for "shell_exec" which is disabled for security purposes by our company. Is the shell_exec really help archiving faster ?

I'll add also that the VM on which the archiving runs is dedicated to that process with plainty of room and memory and php memory_limit is high enough to handle almost everything (51Gb)

@michalkleiner
Copy link
Contributor

Hello @candaj2,
did you have any luck speeding things up through trying to optimize the db tables?
Any chance to get any extra logging information from MySQL if there's any slow queries or performance spikes or other profiling information?

@michalkleiner michalkleiner added Waiting for user feedback Indicates the Matomo team is waiting for feedback from the author or other users. and removed To Triage An issue awaiting triage by a Matomo core team member labels May 2, 2023
@sgiehl
Copy link
Member

sgiehl commented Oct 9, 2023

Closing, as there hadn't been any feedback in months.

@sgiehl sgiehl closed this as not planned Won't fix, can't repro, duplicate, stale Oct 9, 2023
@sgiehl sgiehl added answered For when a question was asked and we referred to forum or answered it. wontfix If you can reproduce this issue, please reopen the issue or create a new one describing it. and removed Waiting for user feedback Indicates the Matomo team is waiting for feedback from the author or other users. Potential Bug Something that might be a bug, but needs validation and confirmation it can be reproduced. answered For when a question was asked and we referred to forum or answered it. labels Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix If you can reproduce this issue, please reopen the issue or create a new one describing it.
Projects
None yet
Development

No branches or pull requests

4 participants