Implement first party cookie in Piwik #409

Closed
mattab opened this Issue · 70 comments

4 participants

Matthieu Aubry Anonymous Piwik user Anthon Pang samgabriel
Matthieu Aubry
Owner

Currently Piwik uses several third party cookies. We want Piwik to create, by default, 1st party cookies only. This is mainly for privacy reasons, but also for better accuracy in counting unique visitors (1st party cookies are accepted more often and deleted less often by users).

This ticket is a requirement for #134 and #1984

Keywords: scalability, cookie, 1st party cookie

Anonymous Piwik user

+1 for this

Any news? We have Piwik deployed to track widget views (LOTS of hits from different domains) and we are forced to increase the header size in Apache...

Anonymous Piwik user

Same issue here. I have already had to increase the allowed header size in nginx twice, with just a couple thousand sites.

Matthieu Aubry
Owner

This is planned to be fixed before Piwik 1.0, which means in the next 2 months. If you can help with implementation or testing, please let us know. This is definitely a high priority issue.

Anonymous Piwik user

I would love to help with testing

Matthieu Aubry
Owner

We should do the quick fix solution for 1.0, ensuring we store the data of the most recent websites, up to a reasonable limit (1kb?). If a cookie is 200b on average, we could still store 5 sites without failing as it does now.

We could then do the scalable long term solution post 1.0.

Matthieu Aubry
Owner

The goal would be to slightly update the Cookie mechanism in Tracker to have it store a total max of 1kb, discarding older tracking cookies.
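
A minimal sketch of the idea, not the actual Tracker code (which is PHP): given per-site cookie entries ordered oldest first, keep only the newest ones that fit the budget. The helper name `trimToBudget`, the entry shape and the rough `name=value;` size accounting are assumptions made for illustration.

```javascript
// Hypothetical helper: keep the newest entries whose combined size fits maxBytes.
// entries: array of { name, value }, ordered oldest first.
function trimToBudget(entries, maxBytes) {
    var kept = [];
    var total = 0;
    // Walk from newest to oldest so the most recent sites survive.
    for (var i = entries.length - 1; i >= 0; i--) {
        // Rough size of "name=value;" (assumes single-byte characters).
        var size = entries[i].name.length + entries[i].value.length + 2;
        if (total + size > maxBytes) {
            break; // this entry and everything older is discarded
        }
        total += size;
        kept.unshift(entries[i]);
    }
    return kept;
}
```

With a 1024 byte budget and entries of roughly 200 bytes each, this keeps the five most recent sites, matching the estimate above.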

Anthon Pang
Collaborator

Long term solution should also look at the race condition in #1107 and the multi-site "ignore" cookie in #1376.

Matthieu Aubry
Owner

I will implement the quick fix..

Matthieu Aubry
Owner

(In [2777]) Refs #409

  • Quick fixes; ensuring tracking cookies never exceed 1k. It was surprisingly simple to implement, nice...
  • also adding small test failure script in misc/
Anonymous Piwik user

Any news on when Piwik is going to support 1st party cookies?

3rd party cookies are less well accepted, not only by browsers but also by people.
I think it'll be good for stats, for Piwik PR and Piwik acceptance to switch over.

thanks!

Matthieu Aubry
Owner

When implemented, we should also have the PiwikTracker api class set the 1st party cookie forwarded from the piwik server response.

Anthon Pang
Collaborator

In [3544], I added core/Tracker/Cookie.php to encapsulate the ignore_cookie. But it too suffers from the third-party cookie issue.

Anthon Pang
Collaborator

Replying to matt:

When implemented, we should also have the PiwikTracker api class set the 1st party cookie forwarded from the piwik server response.

The first-party "cookie" will actually be a UUID (not necessarily rfc4122 compliant) generated by piwik.js and passed to piwik.php via a new parameter. Any allowed third-party cookies will continue to be signed and sent via the Cookie: header.
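
As a rough illustration only, a 16-character hex identifier (64 bits) could be produced on the client as sketched here. The function name `generateVisitorId` is hypothetical and `Math.random()` is used purely for brevity; it is not a strong source of randomness, and the generator in piwik.js may well work differently.

```javascript
// Hypothetical sketch: build a 16-character lowercase hex visitor ID (64 bits).
function generateVisitorId() {
    var hex = '0123456789abcdef';
    var id = '';
    for (var i = 0; i < 16; i++) {
        // Pick one hex digit (4 bits) per iteration.
        id += hex.charAt(Math.floor(Math.random() * 16));
    }
    return id;
}
```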

The tracker session table will map first and third party visitor id_cookies (plus idsite to act as indices) to rows that contain the former cookie store.

Matthieu Aubry
Owner

Use cases for this feature:

  • User tracks one main domain name
    • standard use case, there is only one set of cookies
  • User tracks domain name AND many subdomains within one Piwik website
    • cookies are shared across all subdomains, via a call to setCookieDomain()
  • User tracks domain name in one Piwik website, and other subdomains in other Piwik websites
    • cookies are NOT shared across subdomains when setCookieDomain() is not called
  • User tracks one domain name under several Piwik websites (ie. separate sections in separate Piwik websites)
    • cookies are NOT shared if setCookiePath() was called with the path to set the cookie to. Similar to GA
  • User tracks one domain name, but specific pages are different Piwik websites - for example when tracking a 'user page' on a social network type website. If the URL is not in a sub-directory, then first party cookies will be shared across all websites. If we had cookies for each page, then we would quickly overflow the cookie limit (assuming visitors view many user pages). This use case is not supported in Piwik.

  • User tracks several domain names, inside one Piwik website - This use case is not covered in this proposal: cookies will NOT be shared across domains. This is what GA's setAllowLinker feature does, but we are OK not implementing this at this stage.
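
For illustration, the subdomain and sub-directory use cases above might be configured as sketched here. The method names come from this ticket; the array form of `_paq.push()` and the `/shop` path are assumptions. The two settings are shown together for brevity and are independent of each other.

```javascript
var _paq = _paq || [];

// One Piwik website covering example.org and all of its subdomains:
// share the cookies across subdomains.
_paq.push(['setCookieDomain', '.example.org']);

// A section tracked as its own Piwik website: restrict its cookies to that path.
// '/shop' is a placeholder.
_paq.push(['setCookiePath', '/shop']);

_paq.push(['trackPageView']);
```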

Requirements piwik.js

  • New cookie _pk_id
    • Valid 2 years after the latest page view
    • Contains a 64-bit UUID generated when the cookie is created. How to build a good random UUID? Keeping the first 16 characters of an md5 hash would work well (we need 16 characters, not only 8, since it is a hex string)
    • Contains timestamp of cookie creation date, in UTC and seconds: Math.round(new Date().getTime() / 1000). This will be used to process 'Days to conversion' for goal conversions.
    • Contains visits count, initially 1 (updated when _pk_ses is created)
    • Contains timestamp of last page view of the last visit before this visit. This is used to process "Days since last visit" #583 and "Days to purchase" #2031
  • New cookie _pk_ses
    • Valid 30 minutes after the latest page view
    • Contains no data
    • Every time _pk_ses is created, increase _pk_id visits counter by 1. This will be used to report "Visits to conversion"
  • New cookie _pk_ref
    • Valid 6 months, from date of creation.
    • Contains ref URL, truncated at 1024b
    • Contains time at which ref URL was set
    • The referer URL set in this cookie depends on first/last referer attribution. Also, a direct entry will always be overwritten by non direct referers. Pseudo code:
  IF the visit is new (ie. there was no cookie _pk_ses when track* was called initially)
  AND there is a referer URL which domain is not the current domain, or any subdomain set in setDomainNames
  AND (_pk_ref is empty // if _pk_ref cookie is not set, we always set it
       OR setConversionAttributionFirstReferer == false // if _pk_ref cookie is already set, but overwrite the value since we want to attribute last known referer
       OR _pk_ref is set AND hostname of _pk_ref URL is the current domain, OR any subdomain // the _pk_ref was set to a referer, but as we evaluate this URL again now, it seems this URL does not fit the spec. This could happen if a _pk_ref URL was set earlier, and then user updated website to setDomainNames(..). We want to improve visitors cookies data in this case.
       )
  THEN update _pk_ref with current referer URL truncated at 1k
    • To test a URL hostname, we can simply use JS .indexOf as it will do the job nicely and be easier to maintain than parsing URLs properly
  • All new cookies must be as space efficient as possible, ie.
    • no named index for 'array-like' cookies, just use a . separator for values
    • record as little info as possible, and always truncate user input data
  • All 1st party cookies are sent along with each request to piwik.php
    • &_id=UUID_IN_PK_ID
    • &_idts=UUID_CREATED_TIMESTAMP_IN_PK_ID
    • &_idvc=VISITS_COUNT_IN_PK_ID
    • &_idn=1
      • If _pk_id was created on this page, set _idn=1, otherwise set _idn=0. This means idnew, ie. 'new visitor' (or 'returning visitor')
    • &_ref=ENCODED_URL_IN_CONTENT_PK_REF
    • &_ses=1
    • &_viewts=TIMESTAMP_OF_LAST_PAGE_VIEW_OF_LAST_VISIT
    • &_refts=TIMESTAMP_OF_REFERRAL_URL
  • Cookies should work on 'localhost' or 'intranet' host names (but JS cookies need a proper domain name to be set)
  • API
    • setCookieDomain() - '.example.org' to set to all subdomains as well
    • setCookieNamePrefix() - to change _pk to something else
    • setCookiePath() - sets the path on which to set the cookie. Useful to track a specific section of a website separately from the main website (unique visitors, referer attributions, etc.).
    • setVisitorCookieTimeout() - to change the default of 2 years
    • setConversionAttributionFirstReferer() - by default, we attribute the last referer set for a visit (call setConversionAttributionFirstReferer(false) in constructor) but if called by the user, we would attribute a conversion to the first referer set in a past visit
    • getVisitorId() - returns the 16 characters ID from the cookie (without the visit count & other info)
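
The _pk_ref pseudo code in the requirements above can be turned into a runnable sketch along these lines. The function names and the shape of the `state` object are hypothetical, and the host comparison uses an exact suffix match rather than the `.indexOf` test suggested in the spec, so treat it as one possible reading and not as the piwik.js implementation.

```javascript
// Hypothetical helper: true when host is one of the tracked domains or a subdomain of one.
function isSameSite(host, domains) {
    for (var i = 0; i < domains.length; i++) {
        var d = domains[i];
        if (host === d || host.slice(-(d.length + 1)) === '.' + d) {
            return true;
        }
    }
    return false;
}

// state: { isNewVisit, referrerHost, storedRefHost, attributeFirst, domains }
function shouldUpdateReferrer(state) {
    // Only the first request of a visit (no _pk_ses yet) may touch _pk_ref.
    if (!state.isNewVisit) {
        return false;
    }
    // Direct entries and internal referrers never overwrite the cookie.
    if (!state.referrerHost || isSameSite(state.referrerHost, state.domains)) {
        return false;
    }
    return !state.storedRefHost                          // _pk_ref is empty
        || !state.attributeFirst                         // last-referrer attribution
        || isSameSite(state.storedRefHost, state.domains); // stored value no longer fits the spec
}
```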

Requirements piwik.php

  • Update code to get the various new parameters and use them in Tracker
  • Allow to use third party cookies with a setting. If enabled, Piwik will use 1st party AND 3rd party cookies. [Tracker] use_third_party_cookies = 0 by default
  • add log_conversion.days_to_conversion that counts days to conversion trusting the js timestamp (better than nothing)
  • add log_conversion.visits_to_conversion that counts visits until conversion
  • delete from schema log_conversion.referer_idvisit since it is unused
  • Add new report in Piwik "Days to Conversion"
  • Add new report in Piwik "Visits to Conversion"
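
As a sketch of the arithmetic behind days_to_conversion, assuming the value is derived from the client-supplied _idts timestamp and the conversion time, both in seconds. The real computation lives in the PHP tracker; the function name and the choice to round down and clamp negative values are assumptions made here.

```javascript
var SECONDS_PER_DAY = 86400;

// Whole days between cookie creation (_idts) and the conversion, both in seconds.
function daysToConversion(idts, conversionTs) {
    var elapsed = conversionTs - idts;
    // Client clocks can be wrong, so a negative difference is treated as 0 days.
    return elapsed > 0 ? Math.floor(elapsed / SECONDS_PER_DAY) : 0;
}
```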

Documentation:

  • Add doc of new public JS API functions in the JS doc

Ideas for V2

  • Set the 1st party cookies in PiwikTracker so that this is consistent with piwik.js
  • A concern is that cookie jar size will be potentially large because of _pk_ref containing full ref URL, ie. around 1k (since we truncate URL at 1k). A fix for this would be to do a basic parsing of the referer in piwik.js (like GA and other WA tools do). For example parsing keywords of top 50 search engines. I think we don't need to do this in V1 since it is really too much effort / QA, but worth keeping in mind for a future improvement.
Matthieu Aubry
Owner

Also I think the piwik_ignore cookie should stay 3rd party (and signed), to avoid abuse.

Matthieu Aubry
Owner

(In [3634]) Fixes #1916
Now always checking in the DB if we saw the visitor earlier. The cookie also becomes much smaller.
Renamed the setting enable_detect_unique_visitor_using_settings to trust_visitors_cookies, as the logic is different; it should only be enabled on an intranet where the IP is the same for all users.
This will also help getting 1st party cookie implemented Refs #409

Matthieu Aubry
Owner

Also we need to think about subdomains tracking and first party cookies. How does GA handle this for example? see for reference: http://www.roirevolution.com/blog/2011/01/google_analytics_subdomain_tracking.php

and http://www.dannytalk.com/how-to-track-sub-domains-cross-domains-in-google-analytics/

Matthieu Aubry
Owner
  • the first party cookie needs a domain name to be set - how is it going to work on intranet 'http://localhost', 'http://intranet' ?
Anthon Pang
Collaborator

matt: do you still want this one? It doesn't appear in the request. To manage this on the client, requires also keeping track of the timestamp for the most recent page view of the current visit.

  • Contains timestamp of last page view of the last visit before this visit. This is used to process "Days since last visit" #583 and "Days to purchase" #2031
Anthon Pang
Collaborator

Because of this condition:

AND there is a referer URL which domain is not the current domain, or any subdomain set in setDomainNames

_pk_ref will never contain a referer for the current domain or subdomain; so, this expression will never be true:

OR _pk_ref is set AND hostname of _pk_ref URL is the current domain, OR any subdomain
Anthon Pang
Collaborator

re: comment:38 - oops, I didn't scroll all the way to the right to read your comment; got it

The timestamp in comment:37 is still an open question.

Also, you mention that _pk_ref "Contains time at which ref URL was set", but this timestamp doesn't appear in the request either. (If I store this in the cookie, I need to change the delimiter, as the referrer may contain '.')

Matthieu Aubry
Owner

To manage this on the client, requires also keeping track of the timestamp for the most recent page view of the current visit.

OK that's right, this timestamp can also be saved in the cookie (ie. _pk_ses cookie?)

"Contains time at which ref URL was set", but this timestamp doesn't appear in the request either. (If I store this in the cookie, I need to change the delimeter, as the referrer may contain '.')

What do you mean by "it doesn't appear in the request"? I mean, the _pk_ref must contain the URL as well as the client timestamp when the cookie was last updated with a ref URL.

Thx

Anthon Pang
Collaborator

I mean your specification doesn't show any parameters in the request to piwik.php for these timestamps.

&_viewts=TIMESTAMP_OF_LAST_PAGE_VIEW_OF_LAST_VISIT
&_refts=TIMESTAMP_OF_REFERRAL

If I understand _pk_ses correctly, the timestamp of the most recent page view (cvts) would have to instead be stored in _pk_id.

Matthieu Aubry
Owner

Indeed, I have now updated the request to add these 2 timestamps.

Also I'm not sure what I meant by: &_ses=1 in the URL... ? maybe this is not useful.

Anthon Pang
Collaborator

Maybe if _ses=0, the server should use third-party cookies?

Anthon Pang
Collaborator

(In [3783]) refs #409 - first party cookies

  • API changes:
    • added: setCookieNamePrefix(cookieNamePrefix)
    • added: setCookieDomain(domain)
    • added: setCookiePath(path)
    • added: setVisitorCookieTimeout(timeout) - defaults to 2 years since last page view
    • added: setSessionCookieTimeout(timeout) - defaults to 30 minutes since last activity
    • added: setReferralCookieTimeout(timeout) - defaults to 6 months from the first visit
    • added: setConversionAttributionFirstReferer(enable)
    • added: getVisitorId()
      • for asynchronous tracking, use:
    var visitorId;

    _paq.push(function () {
        visitorId = this.getVisitorId();
    });
  • Cookie notes:
    • The default cookie path is '/'. This might be viewed as a potentially insecure default because it allows cookies to be shared across directories on the same domain. (Again, see the social network example.) This is, unfortunately, a necessity. If we leave the path blank, the behaviour is undefined (i.e., browser or browser-version dependent). For example, earlier versions of Firefox would default to '/'; later versions default to the origin path.
    • I was hoping to avoid this, but I added a hash to the cookie content similar to GA's setAllowHash(). This is needed for two reasons:
      1. Cookies are uniquely identified by the tuple (key,domain,path). Hashing only the domain is a bug. (See "social network website" use case.)
      2. There's a long-standing cookie+subdomain bug in Firefox (Gecko) dating back to 1.0 that leaks cookies from "example.com" (not ".example.com") to "xyz.example.com". @see https://bugzilla.mozilla.org/show_bug.cgi?id=363872
  • changed internal setCookie() method to take expiry time in milliseconds (was days)
  • removed internal dropCookie() method as it was never used

    @todo Missing unit tests and cross browser testing
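
To illustrate the cookie note above, here is a sketch of deriving a cookie name from both the cookie domain and the cookie path, so that two paths on the same domain get distinct cookies. The 32-bit FNV-1a hash is a stand-in chosen to keep the example self-contained (the commit's unit tests mention sha1()), and the helper names and name layout are assumptions.

```javascript
// Stand-in 32-bit FNV-1a hash over the string's character codes (not the hash piwik.js uses).
function fnv1a(str) {
    var h = 0x811c9dc5;
    for (var i = 0; i < str.length; i++) {
        h ^= str.charCodeAt(i);
        h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 unsigned bits
    }
    return h;
}

// Hypothetical helper: the name depends on (base name, domain, path), not on the domain alone.
function getCookieName(prefix, baseName, domain, path) {
    var hash = ('0000000' + fnv1a(domain + path).toString(16)).slice(-8);
    return prefix + baseName + '.' + hash;
}
```

Calling `getCookieName('_pk_', 'id', 'example.org', '/')` and `getCookieName('_pk_', 'id', 'example.org', '/user/alice')` yields two different names, which is the point of hashing more than the domain.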

refs #739 - piwik.js improvements

  • jslint 2011-01-09
  • new unit tests (integrated jslint, is_a functions, sha1(), utf8_encode(), etc)
  • use ECMAScript String.substring() instead of non-standard (although widely supported) String.substr()
  • implement domainFixup() so "example.com" and "example.com." are equivalent

  • API changes:

    • added: killFrame() - a frame buster
    • added: redirectFile( url ) - redirect if browsing off-line, aka file: buster; url is where to redirect to
    • added: setHeartBeatTimer( delay ) - send heart beat 'delay' milliseconds after initial trackPageView(); set to 0 to disable
    • removed: piwik_log() - legacy tracking code; see trackLink()
    • removed: piwik_track() - legacy tracking code; see trackPageView()
    • removed: setDownloadClass() - deprecated; see setDownloadClasses()
    • removed: setLinkClass() - deprecated; see setLinkClasses()

refs #752 - track middle mouse button clicks (via mousedown+mouseup pseudo-click handler); defaults to tracking true "clicks"

  • API changes:
    • modified: addListener( element, enablePseudoClickHandler = false )
    • modified: enableLinkTracking( enablePseudoClickHandler = false )

refs #1984 - custom variables vs custom data

@todo These are just stubs.

  • API changes:

    • added: setCustomVar(slotId, key, value, opt_scope) - scope is 1 (visitor), 2 (session), 3 (page)
    • added: getCustomVar(slotId)
    • added: deleteCustomVar(slotId)
  • API changes for consistency:

    • added: setCustomVar(slotId, obj, opt_scope)
    • added: setCustomData(key, value)
    • for the equivalent of deleteCustomData(), use:
    tracker.setCustomData(null);
Anthon Pang
Collaborator

(In [3784]) refs #409 - use getCookieName() in hasCookies() test

Anthon Pang
Collaborator

Mark as fixed. Future commits to #1984.

Matthieu Aubry
Owner

I still have to do some work :)

  • Requirements piwik.php
  • Integration testing

Also,

Anthon Pang
Collaborator

ok. on my todo list.

Matthieu Aubry
Owner

JS code review

  • great commit, the Piwik JS api is now very much excellent and full featured.

Questions/feedback

  • Are ref URLs encoded by default? in the cases where: it comes from the browser itself, OR when it was set via setReferrerUrl ?
  • If ref URL can contain a space (ie. sometimes not encoded), it will record a bogus cookie - should ref.split(' '); be ref.split(' ', limit = 1) ?
  • Referrer url doesn't seem to be truncated at 1k, important for keeping cookie space in control
  • Running the new JS for the first time, I see in the http request:
_ref    undefined
_refts  undefined
_viewts undefined

I think these should be set only when they have a value
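
One way to achieve this, sketched with a hypothetical `buildQuery` helper (not the actual piwik.js request builder), is to skip parameters that have no value when assembling the query string:

```javascript
// Hypothetical helper: serialize only the parameters that actually have a value.
function buildQuery(params) {
    var parts = [];
    for (var key in params) {
        if (Object.prototype.hasOwnProperty.call(params, key)
                && params[key] !== undefined
                && params[key] !== null
                && params[key] !== '') {
            parts.push(key + '=' + encodeURIComponent(params[key]));
        }
    }
    return parts.join('&');
}
```

On a first visit, where `_ref`, `_refts` and `_viewts` are still undefined, the request then carries only parameters such as `_id` and `_idvc`.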

  • can all cookie timeout methods take seconds as input? this is less risky (if they enter the timeout in seconds but expects milliseconds, things will break), but also more consistent/user friendly
  • getVisitorId() returns undefined (visitorId not set)
    • I looked at the cookie after some testing, and noticed the last field of 'id' cookie is undefined: PREFIXid.1fffd42e=fb6f5c3ec259b00e.1295573291.1.1295573291.undefined;
  • I don't think we need enableServerCookies(): enabling 3rd party cookies will be done in server side via config setting, will the client side have a use?

Pending more items as well docs

  • pending unit tests covering new functions and as much code coverage as possible
  • pending the run of these unit tests on most browsers to check errors are not triggered (most important) and check that cookies / requests are set correctly (to avoid an error such as #1962)
Matthieu Aubry
Owner
  • For compatibility with https pages, the cookie secure flag should be set automatically based on the current URL protocol (in setCookie())
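
A minimal sketch of that behaviour, assuming a hypothetical `buildCookieString` helper that receives the page protocol as an argument; the internal setCookie() in piwik.js is not shown in this thread and may be structured differently.

```javascript
// Hypothetical helper: build the string assigned to document.cookie.
// opts: { expires: Date (optional), path: string (optional), domain: string (optional) }
function buildCookieString(name, value, opts, protocol) {
    var parts = [name + '=' + encodeURIComponent(value)];
    if (opts.expires) {
        parts.push('expires=' + opts.expires.toUTCString());
    }
    parts.push('path=' + (opts.path || '/'));
    if (opts.domain) {
        parts.push('domain=' + opts.domain);
    }
    // Only mark the cookie secure when the page itself was loaded over https.
    if (protocol === 'https:') {
        parts.push('secure');
    }
    return parts.join(';');
}
```

In a browser this would be used as `document.cookie = buildCookieString('_pk_ses', '*', { path: '/' }, document.location.protocol);`.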
Anthon Pang
Collaborator

Replying to matt:

  • Are ref URLs encoded by default? in the cases where: it comes from the browser itself, OR when it was set via setReferrerUrl ?

Browser-dependent. We have to encode it in case it isn't.

  • If ref URL can contain a space (ie. sometimes not encoded), it will record a bogus cookie - should ref.split(' '); be ref.split(' ', limit = 1) ?

Good point. I've changed it to use limit=1 and '.' as a separator (consistent with id).

  • Referrer url doesn't seem to be truncated at 1k, important for keeping cookie space in control

No, it isn't. The spec is 4K. The actual limit is browser dependent, and also subject to server configuration limits.

  • Running the new JS for the first time, I see in the http request:

I'll fix that.

  • can all cookie timeout methods take seconds as input? this is less risky (if they enter the timeout in seconds but expects milliseconds, things will break), but also more consistent/user friendly

This is for consistency with GA.

  • getVisitorId() returns undefined (visitorId not set)

I'll fix that.

  • I looked at the cookie after some testing, and noticed the last field of 'id' cookie is undefined: PREFIXid.1fffd42e=fb6f5c3ec259b00e.1295573291.1.1295573291.undefined;

Same bug as running JS for the first time.

  • I don't think we need enableServerCookies(): enabling 3rd party cookies will be done in server side via config setting, will the client side have a use?

Another analytics tool offers a thirdParty setting via JS. Removed for now.

Replying to matt:

  • For compatibility with https pages, the cookie secure flag should be set automatically based on the current URL protocol (in setCookie())

Ok.

Anthon Pang
Collaborator

(In [3789]) refs #409 - remove enableServerCookies(); fix bugs found in matt's review

Anthon Pang
Collaborator

(In [3794]) refs #409 - set secure flag in cookies per comment:51

Anthon Pang
Collaborator

(In [3797]) refs #409 - rename setConversionAttributionFirstReferer to setConversionAttributionFirstReferrer for correctness/consistency, i.e., referrer/referral

Anthon Pang
Collaborator

(In [3814]) refs #409 - reorg js unit tests

Anthon Pang
Collaborator

(In [3817]) refs #409 - added setDoNotTrack(bool); updated jslint to 2011-01-26

Anthon Pang
Collaborator

(In [3818]) refs #409 - small optimization to r3817

Anthon Pang
Collaborator

_ref is showing up undefined in my logs; I'll fix this and add some more unit tests (tomorrow?)

Matthieu Aubry
Owner

Replying to vipsoft:

Replying to matt:

  • Are ref URLs encoded by default? in the cases where: it comes from the browser itself, OR when it was set via setReferrerUrl ?

Browser-dependent. We have to encode it in case it isn't.

OK, should JS ensure all URLs are encoded before working on them?

  • Referrer url doesn't seem to be truncated at 1k, important for keeping cookie space in control

No, it isn't. The spec is 4K. The actual limit is browser dependent, and also subject to server configuration limits.

A cookie that is too big is not desirable, as it will show up in every HTTP request and slow the page load, plus it could cause other problems with cookie space.

We must truncate at some length, maybe 2k?

  • can all cookie timeout methods take seconds as input? this is less risky (if they enter the timeout in seconds but expects milliseconds, things will break), but also more consistent/user friendly

This is for consistency with GA.

OK, I vote for using seconds as ms doesn't make sense in this case. Let's not follow GA API since it will cause user errors (and we have already a few differences anyway)

OK for other modifications, good stuff. Is there anything still open apart from the points above?

Anthon Pang
Collaborator

We already assume URLs are decoded when working on them. Values are decoded by getCookie; conversely, values are encoded by setCookie and sendRequest. I don't see any need to change this.

This isn't a problem that we need to solve. Users may want to be aware of potential limits, but they shouldn't be artificially constrained. Tracking requests are sent asynchronously, and shouldn't affect page load time. Loading piwik.js (minified at 14K), when it isn't in the cache, has more impact on page load times.

I'll change the API methods to expect seconds, but we should do so for all methods. For setLinkTrackingTimer() this will be a compat-buster.
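
A sketch of what "expect seconds" could look like, assuming the public setter converts once and the internals keep working in milliseconds. The `Tracker` factory and the `getVisitorCookieExpiry` accessor are hypothetical and exist only to make the example runnable.

```javascript
// Hypothetical tracker fragment: the public API takes seconds, internals use milliseconds.
function Tracker() {
    var visitorCookieTimeoutMs = 63072000000; // 2 years (2 * 365 days), in milliseconds

    return {
        setVisitorCookieTimeout: function (seconds) {
            visitorCookieTimeoutMs = seconds * 1000;
        },
        // Illustration only: expiry date for a cookie written at nowMs.
        getVisitorCookieExpiry: function (nowMs) {
            return new Date(nowMs + visitorCookieTimeoutMs);
        }
    };
}
```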

As an observation, when Piwik is on the same domain as the site being tracked, first party cookies will be sent in the Cookie: header, in addition to being in the tracking request. Some ideas would be to (a) leave this as is, (b) add a method to disable first party cookies, or (c) detect when the site being tracked and tracker are on the same domain and in this case, shorten the request string by excluding the cookie values.

Anthon Pang
Collaborator

(In [3846]) refs #409 - fix _ref=undefined bug caused by split('.', 1); also external API methods now expect seconds, and convert to milliseconds internally
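
For context on the split('.', 1) bug: in JavaScript the limit argument of `String.prototype.split` caps the number of returned pieces and discards the rest, unlike PHP's explode(), which keeps the remainder in the last element. The sketch below shows the difference; the cookie layout (timestamp, then URL) and the `splitFirst` helper are assumptions, not the code from the commit.

```javascript
// Assumed layout for illustration: "<timestamp>.<referrer URL>".
var cookieValue = '1295573291.http://example.org/a.b';

// With a limit of 1, only the first piece is returned; the URL is lost.
var limited = cookieValue.split('.', 1); // ['1295573291']

// Splitting on the first separator only keeps the remainder intact.
function splitFirst(str, sep) {
    var i = str.indexOf(sep);
    return i === -1 ? [str] : [str.slice(0, i), str.slice(i + sep.length)];
}

var pieces = splitFirst(cookieValue, '.'); // ['1295573291', 'http://example.org/a.b']
```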

Anthon Pang
Collaborator

Replying to vipsoft:

I'll change the API methods to expect seconds, but we should do so for all methods. For setLinkTrackingTimer() this will be a compat-buster.

Done.

As an observation, when Piwik is on the same domain as the site being tracked, first party cookies will be sent in the Cookie: header, in addition to being in the tracking request. Some ideas would be to (a) leave this as is, (b) add a method to disable first party cookies, or (c) detect when the site being tracked and tracker are on the same domain and in this case, shorten the request string by excluding the cookie values.

The problem with (c) is that the cookies are unsigned, so the server discards the value.

Anthon Pang
Collaborator

(d) detect when the site being tracked and tracker are on the same domain, and in this case, automatically disable first party cookies

Anthon Pang
Collaborator

for (b) and (d), cvar would be an exception.

Anthon Pang
Collaborator

fwiw I think the redundancy in the Cookie: header is a low priority -- it isn't a problem we need to solve now.

Matthieu Aubry
Owner

setLinkTrackingTimer is fine in milliseconds, since it requires this precision (which is not needed/desired for cookie timeouts). We can clarify what parameter we expect in the documentation and in the parameter names. I vote for revert as introducing an API change in the documented method at this stage is not possible - thoughts?

My concern with cookie sizes was purely around slowing down the whole website experience, since 1st party cookies are in the cookie headers. So with a 2k cookie, fetching 10 images and 5 other resources will cause an overhead of 2k * 15 = 30k of data transmitted over HTTP, which could result in a worse user experience. I still think we must truncate to 1 or 2k, but agreed that this should be documented and maybe could be changed via a new setConversionReferrerUrlTruncation() or something similar.
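
The estimate and the proposed truncation can be sketched as follows. Both helper names are hypothetical, and the calculation assumes every same-domain request carries the full cookie.

```javascript
// Hypothetical helper: cap the referrer URL before it is stored in the cookie.
function truncateReferrer(url, maxLength) {
    return url.length > maxLength ? url.slice(0, maxLength) : url;
}

// Extra bytes sent when every request to the domain carries the cookie.
function cookieOverheadBytes(cookieBytes, requestCount) {
    return cookieBytes * requestCount;
}
```

`cookieOverheadBytes(2048, 15)` gives 30720 bytes, the 30k figure above; truncating at 1024 halves it.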

Anthon Pang
Collaborator

(In [3852]) refs #409 - revert API change to setLinkTrackingTimer()

Anthon Pang
Collaborator

Since the conversion referral URL is set (if needed) at the beginning of a new session and used (currently) at most once per visit, one idea would be to store this server side. This would minimize the cookie size and transmission overhead; the tradeoff is executing some extra (albeit infrequent) SQL on the server.

Anthon Pang
Collaborator

There's also a small privacy/security issue with storing the referral URL in a cookie.

  • It's persistent (unlike document.referrer).
  • May be targeted by a browsing history hijack.
  • It could be used for competitive intelligence by third-parties. (e.g., Microsoft's Customer Experience Improvement Program)
Matthieu Aubry
Owner

vipsoft, I updated my comment about the visitor log table new feature, see #1434 - I think it would be best to go this way in the future indeed. Just more overhead for more features :)

Anthon Pang
Collaborator

Ok. Hopefully it won't take as long as it did this ticket... ;)

(The space/transmission overhead gets worse when there are multiple trackers on the same page, using different cookie name prefixes.)

Anthon Pang
Collaborator

(In [3868]) refs #409 - add back legacy tracking; update jslint

Matthieu Aubry
Owner

(In [3888]) Refs #409

  • Deprecated setting, moved to JS API instead
; if set to 0, any goal conversion will be credited to the last more recent non empty referer. 
; when set to 1, the first ever referer used to reach the website will be used
use_first_referer_to_determine_goal_referer = 0
  • New setting to allow using 3rd party cookies for visitor ID cookie only
; Piwik uses first party cookies by default. If set to 1, 
; the visit ID cookie will be set on the Piwik server domain as well
; this is useful when you want to do cross websites analysis 
use_third_party_cookies = 0
  • Tracker uses 1st party cookie values for Goals referrer attribution
  • removed log_conversion.referer_idvisit field, unused
Matthieu Aubry
Owner

(In [3892]) Refs #409

  • Adding new metrics: Visit count, Days since first visit, Days since last visit, these are new fields in the table
  • The new Reports will be done in 1.3
  • Reading the timestamps and visit count from the 1st party cookie
  • Fixing tests that are using the 1st party cookies (added also tests for the 3rd party cookie use case)
Matthieu Aubry
Owner

(In [3893]) Refs #409 Disabling getVisitorId() for now as it doesn't work when called before track* (the object should init the uuid member before getRequest())

Would be nice to have though, to make it trivial to get the visitorId from piwik into other systems (Salesforce, Form fill), and then also allow querying the Live! API to fetch data about this visitor.

Matthieu Aubry
Owner

I think all outstanding points, apart from JS tests and JS Doc, are in trunk and working?

Anthon Pang
Collaborator

We can't reliably retrieve an existing uuid until the cookie domain, path, and prefix are definite. If we pre-initialize it and then re-read the cookie each time domain, path, or prefix is changed, then the side effect is that the uuid may differ depending on when getVisitorId() is called.

Vote to either re-enable the as-implemented behaviour or remove this feature entirely.

Matthieu Aubry
Owner

My idea was to have getVisitorId() call a loadIdCookie or similar, that would only pre-load this cookie so we can read it. The user should call getVisitorId() once all setCookie* methods have been called, but should not have to call it after track*, since the ID might be needed before the tracking request has been sent (eg. when submitting a form in the page and wanting to attach the Piwik ID).

Anthon Pang
Collaborator

(In [3939]) refs #409:

  • always use Crockford's JSON module (renamed to JSON2) to workaround broken "native implementations"
  • add JSON unit tests
  • revert [and 3900; rewrite getVisitorId() per comment:80
  • refactor browser feature detection for fingerprinting (used to generate uuid)
  • setDomains() now takes either '*.domain' or '.domain'
  • Safari emits warnings for Content-Length and Connection as "unsafe headers" in XHR POST request

refs #1984:

  • partially revert [3882] in order for the unit tests to run
  • fix inconsistency in getCustomVariable() depending on whether it is loaded from memory or from a cookie

refs #2078 Webkit bug ("Failed to load resource") when link target is the current window/tab

  • requires further discussion because the workaround may not be desirable behavior, i.e.,
if ((new RegExp('WebKit')).test(navigatorAlias.userAgent)
    && (!sourceElement.target.length || sourceElement.target === '_self')
    && linkType === 'link')
{
    // open outlink in a new window
    sourceElement.target = '_blank';
}
Anthon Pang
Collaborator

(In [3960]) refs #409 - add site ID to cookie name; shorten domain hash to 16 bits (4 hexit characters)

This is a hybrid between the previous implementation and what I proposed.

  • Adding idsite to the cookie name means subdomains that track using different site IDs can still use/share subdomain cookies
  • Keeping the domain hash in the cookie name will make it easier in future to delete invalid cookies (integrity check)
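
A sketch of the resulting naming scheme, assuming the site ID and the shortened hash are appended to the base name in that order; the helper name and the exact layout are assumptions, not taken from the commit.

```javascript
// Hypothetical helper: cookie name = prefix + base name + site ID + first 4 hex chars of the domain hash.
function getSiteCookieName(prefix, baseName, idSite, domainHash) {
    return prefix + baseName + '.' + idSite + '.' + domainHash.slice(0, 4);
}

var idCookieSite1 = getSiteCookieName('_pk_', 'id', 1, '1fffd42e'); // '_pk_id.1.1fff'
var idCookieSite2 = getSiteCookieName('_pk_', 'id', 2, '1fffd42e'); // '_pk_id.2.1fff'
```

Each site ID tracked on the same domain therefore gets its own set of cookies.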

Decided not to auto-set www.example.com's cookie domain=.example.com -- as the convenience introduces side-effects, and I have a feeling will be more trouble than beneficial. Will continue to leave it to the user to explicitly set the cookie domain. Users should be advised to redirect example.com to www.example.com (or vice-versa) to:

a) avoid separate cookies between the two domains, and
b) improve SEO. (Google for "seo www vs no www".)

samgabriel

Revision [3960] is causing a lot of breakage on our sites. We track multiple site IDs using the same domain name. On each request we call trackPageView twice, once for each site ID. The new mechanism of adding the site ID to the cookie name is causing the headers to overflow the server buffers, leading to numerous errors on our server.

I believe the goal of this change was to be able to track different site IDs for the subdomains. But if that is a requirement of the application, then the application should do so by calling trackPageView two or three times.

The current implementation leads to an endless increase in the number of cookies as the user moves from one site to the next, which is what is happening on our side.

Anthon Pang
Collaborator

Sam: it depends how you use piwik.js. (fyi the reason for the hash is mentioned in comment:44)

Are you using two sites ids across the entire site?

Or many more? eg one site-wide, and another that varies/depends on some area of the website? In this scenario, you should use setCookiePath()

Can you see if the TrackSiteByUrl plugin can be adapted for your environment?

samgabriel

I looked at comment:44 the only thing I can see regarding the relevance would be the social network example. Unfortunately I couldn't find the description of this use case.

In our setup, we are using the same URL for all the various sites so cookie paths are not going to work.

Regarding the site IDs, we have one site ID that is site wide and another one based on the client. We have thousands of clients that we track. We developed our own plugin that creates the site during our own account creation process, retrieves the site ID and embeds it in the DB for tracking.

One thing to note here is if you think about it abstractly, you have one user one browser. Can that user really have multiple identities, referral URLs ..etc based on the siteId????!!! I think this fix is trying to do something Piwik shouldn't be responsible for.

On a side note, referral URLs can be monstrous in size. Adding them to cookies can be a real pain on the server. If you already have them in the DB based on the visitor ID/site ID, should they really be in the cookie as well?

Anthon Pang
Collaborator

The hash addresses the subdomain cookie leak problem in Firefox.

Each tracker instance can point to a different Piwik server. If you're using cookie domains and/or paths, then it is possible for the cookie contents to be different.

samgabriel

But if that is case then you can create the hash based on the piwik domain url instead of based on the siteId.

I still don't understand how adding the site Ids will fix the FF issue. Wouldn't the cookies still leak to the subdomains?

Regarding different siteIds values for subdomains. I might be wrong here but if there are two cookies with the same name if the subdomain is set for one of them and you visited the subdomain, wouldn't that return the one that has the subdomain set?

Anthon Pang
Collaborator

the hash is only on cookie domain and path

in any case, I think you're focussing too much on the hash

the bigger picture is that your visitors are amassing many, large cookies. What you expect/want is one client side cookie with server-side storage for the bulk of the cookie contents, that can somehow be mapped to one or more tracking site IDs. This wasn't part of the scope of this ticket, so it isn't something Piwik does right now. I'll create a new ticket for this feature request and we'll figure it out from there.

Matthieu Aubry
Owner

See the new ticket at #2680

Matthieu Aubry
Owner

See also #2211 piwik.js: Cross domain tracking

Matthieu Aubry mattab added this to the Piwik 1.2 milestone
Matthieu Aubry mattab self-assigned this
This issue was closed.