Edge case: each page is a new visit #1916

Closed
mattab opened this Issue · 13 comments

3 participants: Matthieu Aubry, Anthon Pang, Anonymous Piwik user
Matthieu Aubry
Owner

When the cookie is somehow read-only, old timestamps are read from it and a new visit is generated on every pageview for these buggy requests. This could be caused by an Adblock-type extension that blocks writes to the cookie but still sends it with the request.


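<?php

// Replays a tracking request with a fixed (stale) visitor cookie to reproduce the issue.
$host = "piwik-domain.com";

// HTTP header lines end with CRLF; an empty line terminates the headers.
$request = implode("\r\n", array(
    "GET /piwik.php?idsite=2&rec=1&url=http%3A%2F%2Fwww.domain.de%2F&res=1280x1024&h=7&m=57&s=51&cookie=1&urlref=http%3A%2F%2Fwww.domain.de%2F&rand=0.6439636907182041&pdf=1&qt=1&realp=0&wma=1&dir=0&fla=1&java=1&gears=0&ag=1&action_name=Some%20Action HTTP/1.1",
    "Host: $host",
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)",
    "Connection: close",
    "Referer: http://www.refdomain.de/somepage",
    "Cookie: piwik_visitor=[INSERT COOKIE DATA]",
    "",
    "",
));

$fsock_fp = fsockopen($host, 80, $errno, $errstr, 10);
if ($fsock_fp === false) {
    // Stop early instead of writing to an invalid handle.
    die("Could not connect to $host: $errstr ($errno)");
}
fwrite($fsock_fp, $request);

// Print the request followed by the raw response.
echo '<pre>';
echo htmlspecialchars($request);
while (($line = fgets($fsock_fp, 128)) !== false) {
    echo htmlspecialchars($line);
}
echo '</pre>';

fclose($fsock_fp);

?>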
Matthieu Aubry
Owner

Maybe a solution would be to consolidate visits at the beginning of archiving, by deleting visits from the same visitor that fall within the same 30-minute range.
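
A minimal sketch of that consolidation idea, not Piwik's archiving code. The function name `consolidateVisits`, the flat list of visit start times for a single visitor, and the rule "drop a visit that starts within 30 minutes of the previous record" are all assumptions made for illustration:

```php
<?php
// Hypothetical helper, not part of Piwik: collapses the visits of ONE visitor
// when a visit starts within $window seconds of the previous recorded visit.
function consolidateVisits(array $visitStartTimes, $window = 1800)
{
    sort($visitStartTimes);
    $kept = array();
    $previous = null;
    foreach ($visitStartTimes as $start) {
        // Keep a visit only if the visitor was inactive for more than $window seconds.
        if ($previous === null || $start - $previous > $window) {
            $kept[] = $start;
        }
        $previous = $start;
    }
    return $kept;
}

// Five spurious "visits" within ten minutes, then a genuine return two hours later.
$visits = array(0, 20, 95, 300, 600, 7200);
print_r(consolidateVisits($visits)); // keeps the visits starting at 0 and 7200
```

In a real cleanup the page views attached to the dropped visits would also need to be reassigned to the visit that is kept; the sketch ignores that.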

Anthon Pang
Collaborator

We should be able to fix this in #409.

Anthon Pang
Collaborator

It's possible this is caused by bots (e.g., web scrapers). On the initial request, the bot saves cookies to its cookie jar, and on subsequent requests, sends the cookies without updating the cookie jar.

Another possibility is that the Tracker has gotten slower, and that this is a duplicate of #1108, experiencing the race condition where:

  • user browses page A, sending cookie X1
  • before server can respond with updated cookie X2, user browses page B, resending cookie X1

We can mitigate this by calling $this->end() before Piwik_Common::runScheduledTasks().

When we implement #409, we'll only be sending idcookie, so $this->end() can be called even sooner, e.g., as soon as we've confirmed it's a returning visitor. (This will also improve perceived tracker responsiveness.)

Anonymous Piwik user

Are you still having trouble reproducing this issue?
I am willing to give you ssh access to my server to analyse this live on an affected machine.

Matthieu Aubry
Owner

awesome, I can replicate it, so that's OK. Stay tuned...

vipsoft, I'm going to force the tracker to check the cookie value on each request. This adds overhead compared to the current algorithm, but that's the price to pay for accuracy when bad data is coming in.

Then we'll be pretty close to having first-party cookies only, since the code will be based on the unique ID.

Matthieu Aubry
Owner

This could also be triggered in the following use case:

  • go to homepage,
  • before Piwik loads (and with a more than 30min old piwik cookie)...
  • ... middle click and open many other pages

Each Piwik request is sent with the old cookie, so each page view is recorded as a new visit, until the new cookie is set in the browser's cookie jar.
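
The effect can be sketched with a simplified model. This is not the tracker's code: `isNewVisit`, the 30-minute visit length, and a cookie reduced to a single last-action timestamp are assumptions for illustration.

```php
<?php
// Hypothetical model: the tracker trusts the last-action time carried in the cookie.
function isNewVisit($cookieLastAction, $now, $visitLength = 1800)
{
    return ($now - $cookieLastAction) > $visitLength;
}

$staleCookie = 1000;                  // last action stored in the cookie (seconds)
$requests = array(5000, 5002, 5003);  // three tabs opened within a few seconds

$newVisits = 0;
foreach ($requests as $now) {
    // The browser has not stored the refreshed cookie yet, so every request
    // still carries the stale timestamp.
    if (isNewVisit($staleCookie, $now)) {
        $newVisits++;
    }
}
echo $newVisits, "\n"; // prints 3: every request starts a new visit
```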

Matthieu Aubry
Owner

(In [3634]) Fixes #1916
Now always checking in the DB if we saw the visitor earlier. The cookie also becomes much smaller.
Renamed the setting enable_detect_unique_visitor_using_settings to trust_visitors_cookies, since the logic is now different; it should only be enabled on an intranet where the IP is the same for all users.
This will also help get first-party cookies implemented. Refs #409
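
A rough sketch of the idea behind this change, under stated assumptions: an in-memory array stands in for the visit log table, and `handleRequest`, the visitor ID `abc123`, and the 30-minute visit length are hypothetical, not Piwik's actual implementation.

```php
<?php
// Hypothetical model: the last-action time is looked up server-side by visitor ID,
// so a stale timestamp in the cookie can no longer trigger a new visit.
function handleRequest(array &$lastSeen, $visitorId, $now, $visitLength = 1800)
{
    $isNew = !isset($lastSeen[$visitorId])
        || ($now - $lastSeen[$visitorId]) > $visitLength;
    $lastSeen[$visitorId] = $now; // stands in for updating the visit row in the DB
    return $isNew;
}

$lastSeen = array('abc123' => 1000);  // visitor last seen long ago
$newVisits = 0;
foreach (array(5000, 5002, 5003) as $now) {
    if (handleRequest($lastSeen, 'abc123', $now)) {
        $newVisits++;
    }
}
echo $newVisits, "\n"; // prints 1: only the first request starts a new visit
```

The cookie only needs to carry the visitor ID in this model, which is consistent with the cookie becoming much smaller.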

Anonymous Piwik user

Can you provide a patch or will you release a new update soon?

Matthieu Aubry
Owner

Please try the new beta at: http://builds.piwik.org/piwik-1.1.2b1.zip

let me know if it fixes the issue completely :)

Anonymous Piwik user

Thanks matt.

I installed the version, let's see what happens. I will report later today whether it worked.

FYI: I got a JS Alert when I first opened the page :)

There is no/bad markup for form tag

Dunno if this has something to do with Piwik. However it just appeared once, now it's gone even on page reload.

Anonymous Piwik user

Matt: Seems to work like a charm with 1.1.2b1! Great work, thanks for your fast help.

I guess the wrong counts cannot be undone in the DB, right? So my daily data (which doesn't really matter) but also my weekly and monthly data are no longer usable for analysis!?

Or might there be a way to re-parse the data of the last day?

Anonymous Piwik user

Replying to vipsoft:

Another possibility is that the Tracker has gotten slower, and that this is a duplicate of #1108, experiencing the race condition where:

  • user browses page A, sending cookie X1
  • before server can respond with updated cookie X2, user browses page B, resending cookie X1

What happened to us yesterday and today supports this hypothesis:
After upgrading to 1.1, the issue appeared (visit miscount). Maybe the tracker code got slower, because our Piwik server load did increase.
After upgrading to 1.1.2b1, the issue disappeared.
The issue caused the most severe spikes on sites with the most returning visitors, and on sites with a high number of actions or a high action frequency (tracked AJAX requests, for example).

I second awesome's question: is there a way to rebuild visits and repair yesterday's stats? (We can code something and contribute it if you give us some hints.)

Matthieu Aubry
Owner

I haven't tested (WARNING) but a query like this might work:

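-- Keeps the earliest visit per visitor on $THE_DATE and deletes the others.
-- Assumes idvisit is the auto-increment primary key, so MIN(idvisit) is the first visit.
DELETE FROM piwik_log_visit
WHERE visit_server_date = $THE_DATE
AND idvisit NOT IN (
    -- The derived table lets MySQL read from the same table it is deleting from.
    SELECT idvisit FROM (
        SELECT MIN(idvisit) AS idvisit
        FROM piwik_log_visit
        WHERE visit_server_date = $THE_DATE
        GROUP BY visitor_idcookie
    ) AS first_visits
)

This will delete all visits from visitors beyond their first visit on $THE_DATE and therefore keep only one visit per visitor on that day.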

Please test on a test dataset before applying to your real one (or use on a copy of the table)

mattab (Matthieu Aubry) added this to the Piwik 1.2 milestone
mattab (Matthieu Aubry) self-assigned this
This issue was closed.