sum_daily_nb_uniq_visitors calculations incorrect for some ranges in many API methods #4377
The sum_daily_nb_uniq_visitors value is incorrect for certain date ranges when calling API methods with period=range. I've found this issue in the UserCountry, DeviceDetection, UserSettings, and Provider methods. I suspect it exists in more, but my tests have only covered those so far.
To reproduce from demo.piwik.org:
Referencing the results for Germany in the following UserCountry reports:
2013-11-01 to 2013-11-30 returns nb_visits = 5380, sum_daily_nb_uniq_visitors = 4759
Add one day -> 2013-11-01 to 2013-12-01:
returns an empty result set!
Add another day -> 2013-11-01 to 2013-12-02:
nb_visits = 5696, sum_daily_nb_uniq_visitors = 289 (!?)
Clearly, the second API call returning nothing is a problem. In the third, adding two days to the first range increased the visit count by a believable amount, but the unique visitor total dropped dramatically from 4759 to only 289. That is impossible: extending a range only adds days, so a sum of daily unique visitors cannot decrease.
I've also found another issue that may be related, which is detailed in the following post.
The two issues don't produce bad results in the same pattern. Still, both involve irregular visit numbers in API calls using period=range over certain date ranges, so there may be a connection.
To reproduce the nb_visits error, look at the nb_visits value for Germany in the following links...
From Nov 1 to Nov 8, it reports 1490 visits.
From Nov 1 to Nov 15, it reports 1470 visits.
From Nov 1 to Nov 25, it reports 543 visits.
From Nov 1 to Nov 30, it reports 5380 visits.
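Both sets of figures break the same invariant: for ranges that start on the same day, a longer range contains every day of a shorter one, so additive totals such as nb_visits or a sum of daily unique visitors can only stay the same or grow. A minimal Python sketch that flags where a series of reported totals shrinks; the function name is hypothetical, and the values are only the ones quoted in this report.

```python
from datetime import date


def non_monotonic_ranges(reports):
    """Return consecutive pairs where a longer range reports a smaller total.

    `reports` is a list of (end_date, value) tuples for ranges that all start
    on the same day, so each range contains every shorter one. Hypothetical
    helper for sanity-checking API output, not part of Piwik.
    """
    ordered = sorted(reports)  # sort by end date
    problems = []
    for (prev_end, prev_val), (end, val) in zip(ordered, ordered[1:]):
        if val < prev_val:
            problems.append((prev_end, prev_val, end, val))
    return problems


# nb_visits for Germany, ranges starting 2013-11-01 (figures from this report)
nb_visits = [
    (date(2013, 11, 8), 1490),
    (date(2013, 11, 15), 1470),
    (date(2013, 11, 25), 543),
    (date(2013, 11, 30), 5380),
]

# sum_daily_nb_uniq_visitors for Germany, same start date
sum_daily_uniq = [
    (date(2013, 11, 30), 4759),
    (date(2013, 12, 2), 289),
]

for name, series in [("nb_visits", nb_visits),
                     ("sum_daily_nb_uniq_visitors", sum_daily_uniq)]:
    for prev_end, prev_val, end, val in non_monotonic_ranges(series):
        print(f"{name}: range ending {end} ({val}) is smaller than "
              f"range ending {prev_end} ({prev_val})")
```

On these figures it flags the Nov 8 to Nov 15 and Nov 15 to Nov 25 steps for nb_visits, and the Nov 30 to Dec 2 step for sum_daily_nb_uniq_visitors.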
As of 2.0.2, the second sum_daily_nb_uniq_visitors example no longer returns an empty result set. Instead, it returns a value that is certainly incorrect. For ranges such as 2013-11-01 to 2013-12-01 and 2013-11-01 to 2013-12-02, the value returned appears to be the sum for the December dates only, as if the November values were ignored entirely.
(re-writing here my post on http://forum.piwik.org/read.php?2,110025,110045 with more details )
I sometimes experience the very same problem on several of my websites tracked with Piwik 1.12, and it happened again just now while testing 2.1-rc3.
The workaround I use when I see the problem on a specific period is to run an "invalidateArchivedReports" operation:
and then re-launch archive.php:
sudo -u apache php /.../misc/cron/archive.php --url=http://... --force-idsites=61 --force-all-periods
Note: the erratic metrics I saw just now were not sum_daily_nb_uniq_visitors but visits and actions. The unique visitors metric was unchanged after the workaround, so in this case the number of visits was lower than the number of unique visitors because the visit count was wrong.
It appears that the secondary issue I reported in comment 3 has been resolved as of 2.1. The original issue, however, is still open. The results differ somewhat from what I described in the original post, but the sum_daily_nb_uniq_visitors values are still wrong.
In 9e86c79: refs #4377 make sure metrics like sum_daily_nb_uniq_visitors (which are renamed after aggregation) are summed correctly. If the period is, for instance, 2014-04-01,2014-05-01, we sum two periods: the month of April 2014 and May 1st. The month's dataTable already contains the renamed column (as it was aggregated before), whereas the May 1st dataTable does not contain the renamed column but the original one. Therefore the two columns cannot be summed, and the original column overwrites the value of the renamed column, meaning sum_daily_nb_uniq_visitors is in this case always the value of May 1st.
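The mechanism in this commit message can be illustrated with a small Python sketch. It is not Piwik's code (Piwik is written in PHP and its DataTable classes work differently); the rename map, the function names and the numbers are hypothetical and chosen only to show the effect. The buggy version sums columns by their current names and renames afterwards, so the single day's original column overwrites the month's already-renamed total. The fixed version, one plausible approach that may differ from the actual commit, maps every column to its aggregated name before summing.

```python
from collections import Counter

# Hypothetical map: column name in a single-day table -> name after aggregation.
RENAMED_AFTER_AGGREGATION = {"nb_uniq_visitors": "sum_daily_nb_uniq_visitors"}


def aggregate_buggy(rows):
    """Sum by current column name, then rename (reproduces the reported bug)."""
    total = Counter()
    for row in rows:
        total.update(row)  # adds values column by column
    for old, new in RENAMED_AFTER_AGGREGATION.items():
        if old in total:
            # Overwrites, rather than adds to, the already-renamed month total.
            total[new] = total.pop(old)
    return dict(total)


def aggregate_fixed(rows):
    """Normalise every column to its aggregated name first, then sum."""
    total = Counter()
    for row in rows:
        for col, val in row.items():
            total[RENAMED_AFTER_AGGREGATION.get(col, col)] += val
    return dict(total)


# Illustrative values only, not real Piwik data.
april_month = {"nb_visits": 100, "sum_daily_nb_uniq_visitors": 90}  # already aggregated
may_1 = {"nb_visits": 5, "nb_uniq_visitors": 4}                     # single day

print(aggregate_buggy([april_month, may_1]))
print(aggregate_fixed([april_month, may_1]))
```

With these inputs the buggy version reports sum_daily_nb_uniq_visitors = 4 (May 1st only), while the fixed version reports 94. This matches the earlier observation that ranges ending a day or two into December appeared to count only the December days.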