Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some periods may not be archived at all #9468

Closed
quba opened this Issue Jan 4, 2016 · 17 comments

Comments

Projects
None yet
3 participants
@quba
Copy link
Contributor

commented Jan 4, 2016

In one of recent Piwik versions, a feature was introduced that checks if there are visits for given site ID between now and last archiving time, e.g.
- tracking data found for website id 52 (between 2016-01-04 00:36:23 and 2016-01-04 02:35:47)

But in case a site tracked visits only for a short period of time, it's possible that archiving won't run for a period bigger than day and this will lead to empty archives after those temporary are deleted.

I remember that in the past there was also a check that forced archiving for sites that reached midnight in their timezone. Currently it seems like this check is missing.

@mattab

This comment has been minimized.

Copy link
Member

commented Jan 15, 2016

Hi @quba thanks for the report. Do you maybe have an idea how to reproduce this issue?

@mattab mattab added this to the 2.16.1 milestone Jan 15, 2016

@mattab mattab added the Bug label Jan 15, 2016

@quba

This comment has been minimized.

Copy link
Contributor Author

commented Jan 15, 2016

E.g. track a few visit today some time, e.g. 4 p.m. and then stop tracking. The temporary archives will be purged tomorrow and no archiving will start tomorrow as there won't be new visits between last archiving and tomorrow.

@quba

This comment has been minimized.

Copy link
Contributor Author

commented Mar 14, 2016

Reoccured today. The workaround is to add --force-all-websites parameter to the archiving script.

@tsteur

This comment has been minimized.

Copy link
Member

commented Apr 7, 2016

Do you maybe have the log output of that archiving?

@tsteur

This comment has been minimized.

Copy link
Member

commented Apr 7, 2016

Also are there multiple archiver running at the same time?

@tsteur

This comment has been minimized.

Copy link
Member

commented Apr 7, 2016

Can you also post all options that are used to run the archiver?

@quba

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2016

I remember that in the past there was also a check that forced archiving for sites that reached midnight in their timezone. Currently, it seems like this check is missing.

Everything's fine with this archiving (minimum set of params, archiving once per hour). Data is there, but only temporary archives (deleted after some period of time). If after midnight there are no new visits, there's no possibility to run archiving and to create valid archives.

@tsteur

This comment has been minimized.

Copy link
Member

commented Apr 7, 2016

Can you answer the other question? There are still checks whether a website has been processed since midnight but thinking it could be eg related to having multiple archives running at the same time.

@tsteur

This comment has been minimized.

Copy link
Member

commented Apr 7, 2016

I couldn't really reproduce it so far I would say but it's hard to reproduce in general. I basically set the time when what ran etc hard in the code to reproduce it a bit more easily.

The only thing I noticed is that https://github.com/piwik/piwik/blame/2.16.0/core/CronArchive.php#L1219 here the array_diff should be maybe an array_intersect. $websiteDayHasFinishedSinceLastRun returns a list of all websites that have finished since last run. In my example all websites were finished since last run so $websiteDayHasFinishedSinceLastRun = array(1,2,3, ..., 100);. I wanted to archive all websites so $websiteIds = array(1,2,3,..., 100);. The diff was an empty array and it resulted in $this->websiteDayHasFinishedSinceLastRun = array() but I believe it should have been $this->websiteDayHasFinishedSinceLastRun = array(1,2,3,..., 100);. Not sure if the current behaviour is expected or not. I will try to make this change and see what the tests say. If this is buggy, quite a lot should actually change in terms of what will be reprocessed when...

tsteur added a commit that referenced this issue Apr 7, 2016

@mattab

This comment has been minimized.

Copy link
Member

commented Apr 8, 2016

The only thing I noticed is that https://github.com/piwik/piwik/blame/2.16.0/core/CronArchive.php#L1219 here the array_diff should be maybe an array_intersect

Great find Thomas. Bug was here since the beginning 4+ years ago 0508f2c

@tsteur

This comment has been minimized.

Copy link
Member

commented Apr 8, 2016

I issued a PR. Possible that it fixes the issue but not 100% sure

@quba

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2016

Can you answer the other question?

Which one? I don't have access to archiving logs anymore. There's only one archiving process. Minimum set of options.

Just to make sure, I'll try to write down the use case (to confirm that this bug fix will cover this one).

  • (archiving) no visits for siteID 1 - archiving skipped
  • new visit for siteID 1
  • (archiving) new visits for siteID 1 - archiving done, visits visible on graphs, archives marked as temporary (not closed period)
  • (archiving) no visits for siteID 1 - archiving skipped
  • (archiving) no visits for siteID 1 - archiving skipped
  • midnight
  • (archiving) no visits for siteID 1 - archiving skipped, scheduled task removes temporary archives
  • no visits on graphs for the previous day
@tsteur

This comment has been minimized.

Copy link
Member

commented Apr 8, 2016

I was especially wondering whether there are multiple archivers running at the same time. The description sounds like the fix in #10022 might help

@quba

This comment has been minimized.

Copy link
Contributor Author

commented Apr 11, 2016

Great find Thomas. Bug was here since the beginning 4+ years ago 0508f2c

I can see that it's merged now. If it was there since the beginning, maybe it's worth adding a test to not regress?

@quba

This comment has been minimized.

Copy link
Contributor Author

commented Apr 11, 2016

Additionally, the change was merged, not added to the changelog, available in 2.16.1. I think I don't get it really...

@mattab

This comment has been minimized.

Copy link
Member

commented Apr 11, 2016

@quba sorry we simply forgot to close this issue which should now be fixed!

Yes we should add a test. we discussed it on friday and decided the CronArchiver should be refactored to allow easier testing. In general, this code needs some love and a set of tests. I'll create an issue and ref this one.

@mattab mattab closed this Apr 11, 2016

@mattab mattab modified the milestones: 2.16.1, 2.16.x (LTS) Apr 11, 2016

@mattab

This comment has been minimized.

Copy link
Member

commented Apr 11, 2016

just added #9468 Some periods may not be archived at all [by @tsteur] to the changelog, thanks for the feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.