You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
invalidFor issues or pull requests that are no longer relevant to Matomo core.TaskIndicates an issue is neither a feature nor a bug and it's purely a "technical" change.
On the next archiver run we create the date parameter when archiving based on lastRunKey=time(). However, the problem is that in step 1 potentially not all periods were actually archived. Many could have been skipped because we are for example already archiving a different period (when we want to archive for example month, but a week period is currently being processed, than we skip month but AFAIK still return $success = true).
This could lead to randomly some periods being skipped in some edge conditions. Like it might use &period=day&date=last2 when it was actually last finished correctly only 3 days ago and then maybe not fully process 3 days ago.
I haven't actually debugged this, but seeing the code there I very much would expect this behaviour and this was introduced by recent changes we added this/last year where we started skipping some periods for example if another period is in progress. We might skip entire segments etc.
We archive max 1 segment at once for example (or 2 or 3 ...)
This means we execute segment 1
1 hour later we want to launch the next segment archive
We need to check if some other period is in progress/segment and shortly before we launch that segment abort that request... currently we would still return $success=true AFAIK when it should be $success=false
A possible problem is that the more segments you have for example, the more likely it is that at some point a segment or period will be skipped, therefore $success = false and therefore on every archive run we do not update the lastRunKey leading eventually to things like last100 or last200 making archiving very slow.
I reckon best be to remove all logic around lastRunKey ideally but not sure what better logic to use and what is feasible.
instead of detecting one lastProcessedTime for all periods and dates, ideally we would detect the lastProcessedTime on demand shortly before starting the archiving of a period/segment... by default we would use last3 or so (or whatever the default is) and if any periods were invalidated since then, we would need to adjust behaviour and use better date or launch multiple archive queries. Instead of last52 we could eg use last3 and then another query where we specifically archive the invalidated date.
I know I'm not explaining it well, it's a difficult topic. Happy to provide more insights.
The text was updated successfully, but these errors were encountered:
invalidFor issues or pull requests that are no longer relevant to Matomo core.TaskIndicates an issue is neither a feature nor a bug and it's purely a "technical" change.
We are using the lastRunKey https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L280-L284 to determine when an archiver was last run for day or period.
We basically:
On the next archiver run we create the
date
parameter when archiving based onlastRunKey=time()
. However, the problem is that in step 1 potentially not all periods were actually archived. Many could have been skipped because we are for example already archiving a different period (when we want to archive for example month, but a week period is currently being processed, than we skip month but AFAIK still return $success = true).This could lead to randomly some periods being skipped in some edge conditions. Like it might use
&period=day&date=last2
when it was actually last finished correctly only 3 days ago and then maybe not fully process 3 days ago.I haven't actually debugged this, but seeing the code there I very much would expect this behaviour and this was introduced by recent changes we added this/last year where we started skipping some periods for example if another period is in progress. We might skip entire segments etc.
From what I see we already return
$success= false
here https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L1006-L1008 which should prevent the lastRunKey from being written, but this seems not yet implemented for long running segment archives when dynamically aborting a request shortly before it is supposed to be executed in https://github.com/matomo-org/matomo/blob/3.12.0/core/CronArchive.php#L1013-L1017This code is triggered when
A possible problem is that the more segments you have for example, the more likely it is that at some point a segment or period will be skipped, therefore
$success = false
and therefore on every archive run we do not update thelastRunKey
leading eventually to things likelast100
orlast200
making archiving very slow.I reckon best be to remove all logic around
lastRunKey
ideally but not sure what better logic to use and what is feasible.This is also related to https://github.com/matomo-org/matomo/pull/14937/files#r343304194 and #15117
instead of detecting one
lastProcessedTime
for all periods and dates, ideally we would detect thelastProcessedTime
on demand shortly before starting the archiving of a period/segment... by default we would uselast3
or so (or whatever the default is) and if any periods were invalidated since then, we would need to adjust behaviour and use better date or launch multiple archive queries. Instead of last52 we could eg uselast3
and then another query where we specifically archive the invalidated date.I know I'm not explaining it well, it's a difficult topic. Happy to provide more insights.
The text was updated successfully, but these errors were encountered: