Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some periods may not be archived at all #9468

Closed
quba opened this issue Jan 4, 2016 · 17 comments
Closed

Some periods may not be archived at all #9468

quba opened this issue Jan 4, 2016 · 17 comments
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Milestone

Comments

@quba
Copy link
Contributor

quba commented Jan 4, 2016

In one of recent Piwik versions, a feature was introduced that checks if there are visits for given site ID between now and last archiving time, e.g.
- tracking data found for website id 52 (between 2016-01-04 00:36:23 and 2016-01-04 02:35:47)

But in case a site tracked visits only for a short period of time, it's possible that archiving won't run for a period bigger than day and this will lead to empty archives after those temporary are deleted.

I remember that in the past there was also a check that forced archiving for sites that reached midnight in their timezone. Currently it seems like this check is missing.

@mattab
Copy link
Member

mattab commented Jan 15, 2016

Hi @quba thanks for the report. Do you maybe have an idea how to reproduce this issue?

@mattab mattab added this to the 2.16.1 milestone Jan 15, 2016
@mattab mattab added the Bug For errors / faults / flaws / inconsistencies etc. label Jan 15, 2016
@quba
Copy link
Contributor Author

quba commented Jan 15, 2016

E.g. track a few visit today some time, e.g. 4 p.m. and then stop tracking. The temporary archives will be purged tomorrow and no archiving will start tomorrow as there won't be new visits between last archiving and tomorrow.

@quba
Copy link
Contributor Author

quba commented Mar 14, 2016

Reoccured today. The workaround is to add --force-all-websites parameter to the archiving script.

@tsteur
Copy link
Member

tsteur commented Apr 7, 2016

Do you maybe have the log output of that archiving?

@tsteur
Copy link
Member

tsteur commented Apr 7, 2016

Also are there multiple archiver running at the same time?

@tsteur
Copy link
Member

tsteur commented Apr 7, 2016

Can you also post all options that are used to run the archiver?

@quba
Copy link
Contributor Author

quba commented Apr 7, 2016

I remember that in the past there was also a check that forced archiving for sites that reached midnight in their timezone. Currently, it seems like this check is missing.

Everything's fine with this archiving (minimum set of params, archiving once per hour). Data is there, but only temporary archives (deleted after some period of time). If after midnight there are no new visits, there's no possibility to run archiving and to create valid archives.

@tsteur
Copy link
Member

tsteur commented Apr 7, 2016

Can you answer the other question? There are still checks whether a website has been processed since midnight but thinking it could be eg related to having multiple archives running at the same time.

@tsteur
Copy link
Member

tsteur commented Apr 7, 2016

I couldn't really reproduce it so far I would say but it's hard to reproduce in general. I basically set the time when what ran etc hard in the code to reproduce it a bit more easily.

The only thing I noticed is that https://github.com/piwik/piwik/blame/2.16.0/core/CronArchive.php#L1219 here the array_diff should be maybe an array_intersect. $websiteDayHasFinishedSinceLastRun returns a list of all websites that have finished since last run. In my example all websites were finished since last run so $websiteDayHasFinishedSinceLastRun = array(1,2,3, ..., 100);. I wanted to archive all websites so $websiteIds = array(1,2,3,..., 100);. The diff was an empty array and it resulted in $this->websiteDayHasFinishedSinceLastRun = array() but I believe it should have been $this->websiteDayHasFinishedSinceLastRun = array(1,2,3,..., 100);. Not sure if the current behaviour is expected or not. I will try to make this change and see what the tests say. If this is buggy, quite a lot should actually change in terms of what will be reprocessed when...

tsteur added a commit that referenced this issue Apr 7, 2016
@mattab
Copy link
Member

mattab commented Apr 8, 2016

The only thing I noticed is that https://github.com/piwik/piwik/blame/2.16.0/core/CronArchive.php#L1219 here the array_diff should be maybe an array_intersect

Great find Thomas. Bug was here since the beginning 4+ years ago 0508f2c

@tsteur
Copy link
Member

tsteur commented Apr 8, 2016

I issued a PR. Possible that it fixes the issue but not 100% sure

@quba
Copy link
Contributor Author

quba commented Apr 8, 2016

Can you answer the other question?

Which one? I don't have access to archiving logs anymore. There's only one archiving process. Minimum set of options.

Just to make sure, I'll try to write down the use case (to confirm that this bug fix will cover this one).

  • (archiving) no visits for siteID 1 - archiving skipped
  • new visit for siteID 1
  • (archiving) new visits for siteID 1 - archiving done, visits visible on graphs, archives marked as temporary (not closed period)
  • (archiving) no visits for siteID 1 - archiving skipped
  • (archiving) no visits for siteID 1 - archiving skipped
  • midnight
  • (archiving) no visits for siteID 1 - archiving skipped, scheduled task removes temporary archives
  • no visits on graphs for the previous day

@tsteur
Copy link
Member

tsteur commented Apr 8, 2016

I was especially wondering whether there are multiple archivers running at the same time. The description sounds like the fix in #10022 might help

@quba
Copy link
Contributor Author

quba commented Apr 11, 2016

Great find Thomas. Bug was here since the beginning 4+ years ago 0508f2c

I can see that it's merged now. If it was there since the beginning, maybe it's worth adding a test to not regress?

@quba
Copy link
Contributor Author

quba commented Apr 11, 2016

Additionally, the change was merged, not added to the changelog, available in 2.16.1. I think I don't get it really...

@mattab
Copy link
Member

mattab commented Apr 11, 2016

@quba sorry we simply forgot to close this issue which should now be fixed!

Yes we should add a test. we discussed it on friday and decided the CronArchiver should be refactored to allow easier testing. In general, this code needs some love and a set of tests. I'll create an issue and ref this one.

@mattab mattab closed this as completed Apr 11, 2016
@mattab mattab modified the milestones: 2.16.1, 2.16.x (LTS) Apr 11, 2016
@mattab
Copy link
Member

mattab commented Apr 11, 2016

just added #9468 Some periods may not be archived at all [by @tsteur] to the changelog, thanks for the feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Projects
None yet
Development

No branches or pull requests

3 participants