Edge case: each page is a new visit #1916

mattab opened this Issue Dec 23, 2010 · 13 comments


Piwik Open Source Analytics member

When the cookie is somehow read-only, old timestamps will be read and a new visit generated on every pageview for these buggy requests. This could be caused by an Adblock-type extension blocking writes to the cookie but still passing it along with the request.


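$host = "piwik-domain.com";

// Replay a tracker request that always carries the same, never-updated cookie.
// HTTP header lines end with \r\n, and an empty line closes the header block.
$request = "GET /piwik.php?idsite=2&rec=1&url=http%3A%2F%2Fwww.domain.de%2F&res=1280x1024&h=7&m=57&s=51&cookie=1&urlref=http%3A%2F%2Fwww.domain.de%2F&rand=0.6439636907182041&pdf=1&qt=1&realp=0&wma=1&dir=0&fla=1&java=1&gears=0&ag=1&action_name=Some%20Action HTTP/1.1\r\n"
    . "Host: $host\r\n"
    . "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv: Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)\r\n"
    . "Connection: close\r\n"
    . "Referer: http://www.refdomain.de/somepage\r\n"
    . "Cookie: piwik_visitor=[INSERT COOKIE DATA]\r\n"
    . "\r\n";

$fsock_fp = fsockopen($host, 80, $errno, $errstr, 10);
if (!$fsock_fp) {
    die("Connection failed: $errstr ($errno)");
}
fwrite($fsock_fp, $request);

// Print the request followed by the raw tracker response.
echo '<pre>';
echo $request;
while (!feof($fsock_fp)) {
    echo fgets($fsock_fp, 128);
}
echo '</pre>';
fclose($fsock_fp);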


Piwik Open Source Analytics member

Maybe a solution would be to consolidate the visits at the beginning of archiving: deleting all visits from the same visitor that happen within 30min ranges.
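
The consolidation idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the array keys (`visitor`, `first_action`, `last_action`, `actions`) and the function name `consolidateVisits` are hypothetical and do not correspond to Piwik's actual schema or archiving code.

```php
<?php
// Hypothetical sketch: merge visits from the same visitor when a visit starts
// within 30 minutes of the previous visit's last action. Keys are illustrative.
const VISIT_GAP_SECONDS = 1800;

function consolidateVisits(array $visits): array
{
    // Sort by visitor, then by start time, so related visits are adjacent.
    usort($visits, function (array $a, array $b): int {
        return [$a['visitor'], $a['first_action']] <=> [$b['visitor'], $b['first_action']];
    });

    $merged = [];
    foreach ($visits as $visit) {
        $last = count($merged) - 1;
        if ($last >= 0
            && $merged[$last]['visitor'] === $visit['visitor']
            && $visit['first_action'] - $merged[$last]['last_action'] <= VISIT_GAP_SECONDS
        ) {
            // Same visitor, inside the window: fold into the previous visit.
            $merged[$last]['last_action'] = max($merged[$last]['last_action'], $visit['last_action']);
            $merged[$last]['actions'] += $visit['actions'];
        } else {
            $merged[] = $visit;
        }
    }
    return $merged;
}

$visits = [
    ['visitor' => 'a', 'first_action' => 0,    'last_action' => 0,    'actions' => 1],
    ['visitor' => 'a', 'first_action' => 60,   'last_action' => 60,   'actions' => 1],
    ['visitor' => 'a', 'first_action' => 7200, 'last_action' => 7200, 'actions' => 1],
    ['visitor' => 'b', 'first_action' => 30,   'last_action' => 30,   'actions' => 1],
];
$result = consolidateVisits($visits);
echo count($result), " visits after consolidation\n"; // prints "3 visits after consolidation"
```

Here the two pageviews of visitor `a` one minute apart collapse into a single visit with two actions, while the pageview two hours later stays a separate visit.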


We should be able to fix this in #409.


It's possible this is caused by bots (e.g., web scrapers). On the initial request, the bot saves cookies to its cookie jar, and on subsequent requests, sends the cookies without updating the cookie jar.

Another possibility is that the Tracker has gotten slower, and that this is a duplicate of #1108, experiencing the race condition where:

  • user browses page A, sending cookie X1
  • before server can respond with updated cookie X2, user browses page B, resending cookie X1

We can mitigate this by calling $this->end() before Piwik_Common::runScheduledTasks().

When we implement #409, we'll only be sending idcookie, so $this->end() can be called even sooner, e.g., as soon as we've confirmed it's a returning visitor. (This will also improve perceived tracker responsiveness.)
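
To make the race concrete, here is a minimal model of a cookie-only decision. It assumes the cookie carries the last-action timestamp and that a visit expires after 30 minutes; the function name `startsNewVisit` and the constant are hypothetical and not taken from the Tracker code.

```php
<?php
// Hypothetical model of a cookie-only "is this a new visit?" decision.
const VISIT_TIMEOUT = 1800; // 30 minutes, in seconds

// A request starts a new visit when the cookie's last-action time is too old.
function startsNewVisit(int $cookieLastAction, int $now): bool
{
    return ($now - $cookieLastAction) > VISIT_TIMEOUT;
}

$staleCookie = 0;   // cookie X1: last action recorded long ago
$now = 10000;       // current server time

// Pages A and B are both requested before the updated cookie X2 reaches
// the browser, so both requests still carry X1.
$newVisits = 0;
foreach ([$now, $now + 2] as $requestTime) {
    if (startsNewVisit($staleCookie, $requestTime)) {
        $newVisits++;
    }
}
echo "new visits with stale cookie: $newVisits\n"; // prints 2

// Once X2 (last action = $now) is stored, the next pageview continues the visit.
var_dump(startsNewVisit($now, $now + 2)); // bool(false)
```

The sooner the updated cookie is sent back, the smaller the window in which a second request can still carry X1, which is the point of responding before the scheduled tasks run.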


Do you still have problems reproducing this issue?
I am willing to give you ssh access to my server to analyse this live on an affected machine.

Piwik Open Source Analytics member

awesome, I can replicate it, so it's OK. Stay tuned...

vipsoft, I'm going to force the tracker to check the cookie value on each request. This adds overhead compared to the current algorithm, but that's the price to pay for accuracy when bad data is coming in.

Then we'll be pretty close to having a 1st party cookie only, since the code will be based on the unique ID.

Piwik Open Source Analytics member

This could also be triggered in the following use case:

  • go to homepage,
  • before Piwik loads (and with a more than 30min old piwik cookie)...
  • ... middle click and open many other pages

Each piwik request will receive a page view with the old cookie until the new one is set in the browser cookie jar.

Piwik Open Source Analytics member

(In [3634]) Fixes #1916
Now always checking in the DB if we saw the visitor earlier. The cookie also becomes much smaller.
Renamed the setting enable_detect_unique_visitor_using_settings to trust_visitors_cookies, as the logic is now different; it should only be enabled on an intranet where the IP is the same for all users.
This will also help get the 1st party cookie implemented. Refs #409
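
As a rough illustration of the "always check server-side" approach (not the code from [3634]), the sketch below keeps the last action time per visitor ID on the server, so the cookie only needs to carry the ID. The in-memory array stands in for the visit log table, and `recordPageview` is a hypothetical name.

```php
<?php
// Hypothetical sketch: the array stands in for a database lookup keyed by
// visitor ID; the function name is illustrative, not Piwik's API.
const VISIT_TIMEOUT = 1800; // 30 minutes, in seconds

// Returns true when this pageview starts a new visit.
function recordPageview(array &$lastActionByVisitor, string $visitorId, int $now): bool
{
    $isNewVisit = !isset($lastActionByVisitor[$visitorId])
        || ($now - $lastActionByVisitor[$visitorId]) > VISIT_TIMEOUT;

    // The server-side record is updated on every request, so a stale or
    // read-only cookie cannot make an ongoing visit look expired.
    $lastActionByVisitor[$visitorId] = $now;
    return $isNewVisit;
}

$store = [];
$results = [];
foreach ([1000, 1002, 1500, 9000] as $t) {
    $results[] = recordPageview($store, 'abc123', $t);
}
var_dump($results); // true, false, false, true
```

Only the first pageview and the one arriving more than 30 minutes after the previous action open a new visit, regardless of what timestamp the browser sends back.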


Can you provide a patch or will you release a new update soon?

Piwik Open Source Analytics member

Please try the new beta at: http://builds.piwik.org/piwik-1.1.2b1.zip

let me know if it fixes the issue completely :)


Thanks matt.

I installed the version, let's see what happens. I will report later today whether it worked out.

FYI: I got a JS Alert when I first opened the page :)

There is no/bad markup for form tag

Dunno if this has something to do with Piwik. However it just appeared once, now it's gone even on page reload.


Matt: Seems to work like a charm with 1.1.2b1! Great work, thanks for your fast help.

I guess the wrong counts cannot be undone in db, right? So my daily (doesn't really matter) but also weekly and monthly data is not usable for analysis anymore!?

Or might there be a way to re-parse the data of the last day?


Replying to vipsoft:

Another possibility is that the Tracker has gotten slower, and that this is a duplicate of #1108, experiencing the race condition where:

  • user browses page A, sending cookie X1
  • before server can respond with updated cookie X2, user browses page B, resending cookie X1

What happened to us yesterday and today supports this hypothesis:
After the upgrade to 1.1, the issue appeared (visit miscount). Maybe the tracker code got slower because, indeed, our Piwik server load increased.
After the upgrade to 1.1.2b1, the issue disappeared.
The issue caused the most severe spikes on sites with the most returning visitors, and on sites with a high number of actions / high action frequency (tracked ajax requests, for example).

I second awesome's question: is there a way to rebuild visits and repair yesterday's stats? (We can code something and contribute it if you give us some hints.)

Piwik Open Source Analytics member

I haven't tested (WARNING) but a query like this might work:

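-- Keep each visitor's first visit (lowest idvisit) on $THE_DATE, delete the rest.
-- Uses a join on a derived table because MySQL does not allow a plain
-- subquery on the table being deleted from.
DELETE v FROM piwik_log_visit v
JOIN (
    SELECT visitor_idcookie, MIN(idvisit) AS first_idvisit
    FROM piwik_log_visit
    WHERE visit_server_date = $THE_DATE
    GROUP BY visitor_idcookie
    HAVING COUNT(*) > 1
) f ON f.visitor_idcookie = v.visitor_idcookie
WHERE v.visit_server_date = $THE_DATE
  AND v.idvisit > f.first_idvisit;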

This will delete all visits from visitors beyond their first visit on $THE_DATE and therefore keep only one visit per visitor on that day.

Please test on a test dataset before applying to your real one (or use a copy of the table).

@mattab mattab added this to the Piwik 1.2 milestone Jul 8, 2014
@mattab mattab self-assigned this Jul 8, 2014
This issue was closed.