Implement first party cookie in Piwik #409

Closed
mattab opened this Issue Nov 4, 2008 · 70 comments

4 participants
@mattab
Member

mattab commented Nov 4, 2008

Currently Piwik is using several third party cookies. We want Piwik to create, by default, 1st party cookies only. This is mainly for privacy reasons, but also for better accuracy in counting unique visitors (1st party cookies are more often accepted and less often deleted by users).

This ticket is a requirement for #134 and #1984

Keywords: scalability, cookie, 1st party cookie

@anonymous-piwik-user

anonymous-piwik-user commented Jun 16, 2010

+1 for this

Any news? We have Piwik deployed to track widget views (LOTS of hits from different domains) and we are forced to increase the header size in Apache...

@anonymous-piwik-user

anonymous-piwik-user commented Jun 21, 2010

same issue here. I already had to increase allowed header size in nginx 2 times with just a couple thousand sites.

@mattab

Member Author

mattab commented Jun 21, 2010

This is planned to be fixed before Piwik 1.0, which means in the next 2 months. If you can help with implementation or testing, please let us know. This is definitely a high priority issue.

@anonymous-piwik-user

anonymous-piwik-user commented Jun 21, 2010

I would love to help with testing

@mattab

Member Author

mattab commented Jul 20, 2010

We should do the quick fix solution for 1.0, ensuring we store the most recent websites' data, up to a reasonable limit (1 KB?). If a cookie takes on average 200 bytes, we could still store 5 sites without failing as it does now.

We could then do the scalable long term solution post 1.0.

@mattab

Member Author

mattab commented Jul 20, 2010

The goal would be to slightly update the Cookie mechanism in Tracker to have it store a total max of 1kb, discarding older tracking cookies.
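
A minimal sketch of this quick fix, assuming the third-party cookie holds one serialized entry per tracked site together with the time it was last written. The names `trimCookieEntries` and `MAX_COOKIE_BYTES` and the entry shape are hypothetical and not taken from the Piwik Tracker code (which is PHP); JavaScript is used here only for illustration.

```javascript
// Hypothetical limit from the discussion above: 1 KB for all tracking data.
const MAX_COOKIE_BYTES = 1024;

// entries: [{ idsite: number, value: string, lastWritten: number }]
// Keeps the most recently written entries whose combined size fits the limit.
function trimCookieEntries(entries, maxBytes = MAX_COOKIE_BYTES) {
    // Newest first, without mutating the caller's array.
    const newestFirst = [...entries].sort((a, b) => b.lastWritten - a.lastWritten);
    const kept = [];
    let used = 0;
    for (const entry of newestFirst) {
        const size = entry.value.length; // assumes single-byte (ASCII) values
        if (used + size > maxBytes) {
            break; // this entry and every older one are discarded
        }
        kept.push(entry);
        used += size;
    }
    return kept;
}
```

With 200-byte entries this keeps the five most recently written sites, which matches the estimate given in the previous comment.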

@robocoder

Contributor

robocoder commented Jul 27, 2010

Long term solution should also look at the race condition in #1107 and the multi-site "ignore" cookie in #1376.

@mattab

Member Author

mattab commented Jul 29, 2010

I will implement the quick fix..

@mattab

Member Author

mattab commented Jul 29, 2010

(In [2777]) Refs #409

  • Quick fixes: ensuring tracking cookies never exceed 1k. It was surprisingly simple to implement, nice...
  • also adding small test failure script in misc/
@anonymous-piwik-user

anonymous-piwik-user commented Nov 23, 2010

Any news on when Piwik is going to support 1st party cookies?

3rd party cookies are less well accepted, not only by browsers but also by people.
I think it'll be good for stats, for Piwik PR and Piwik acceptance to switch over.

thanks!

@mattab

Member Author

mattab commented Dec 22, 2010

When implemented, we should also have the PiwikTracker api class set the 1st party cookie forwarded from the piwik server response.

@robocoder

Contributor

robocoder commented Dec 30, 2010

In [3544], I added core/Tracker/Cookie.php to encapsulate the ignore_cookie. But it too suffers from the third-party cookie issue.

@robocoder

Contributor

robocoder commented Dec 30, 2010

Replying to matt:

When implemented, we should also have the PiwikTracker api class set the 1st party cookie forwarded from the piwik server response.

The first-party "cookie" will actually be a UUID (not necessarily rfc4122 compliant) generated by piwik.js and passed to piwik.php via a new parameter. Any allowed third-party cookies will continue to be signed and sent via the Cookie: header.

The tracker session table will map first and third party visitor id_cookies (plus idsite to act as indices) to rows that contain the former cookie store.

@mattab

Member Author

mattab commented Jan 5, 2011

Use cases for this feature:

  • User tracks one main domain name
    • standard use case, there is only one set of cookie
  • User tracks domain name AND many subdomains within one Piwik website
    • cookies are shared across all subdomains, via a call to setCookieDomain()
  • User tracks domain name in one Piwik website, and other subdomains in other Piwik websites
    • cookies are NOT shared across subdomains when setCookieDomain() is not called
  • User tracks one domain name under several Piwik websites (i.e. separate sections in separate Piwik websites)
    • cookies are NOT shared if setCookiePath() was called with the path to set the cookie to. Similar to GA
  • User tracks one domain name, but specific pages are different Piwik websites - for example when tracking a 'user page' on a social network type website. If the URL is not in a sub-directory, then first party cookies will be shared across all websites. If we had cookies for each page, then we would quickly overflow the cookie limit (assuming visitors view many user pages). This use case is not supported in Piwik.
  • User tracks several domain names inside one Piwik website - This use case is not covered in this proposal: cookies will NOT be shared across domains. This is what the GA setAllowLinker feature does, but we are OK not implementing this at this stage.

Requirements piwik.js

  • New cookie _pk_id
    • Valid 2 years after the latest page view
    • Contains a 64-bit UUID generated when the cookie is created. How to build a good random UUID? Keeping the first 16 characters of an md5 hex digest would work well (16 characters are needed, not just 8, since it is a hex string)
    • Contains the timestamp of the cookie creation date, in UTC and in seconds: Math.round(new Date().getTime() / 1000). This will be used to process 'Days to conversion' for goal conversions.
    • Contains visits count, initially 1 (updated when _pk_ses is created)
    • Contains timestamp of last page view of the last visit before this visit. This is used to process "Days since last visit" #583 and "Days to purchase" #2031
  • New cookie _pk_ses
    • Valid 30 minutes after the latest page view
    • Contains no data
    • Every time _pk_ses is created, increase _pk_id visits counter by 1. This will be used to report "Visits to conversion"
  • New cookie _pk_ref
    • Valid 6 months, from date of creation.
    • Contains ref URL, truncated at 1024b
    • Contains time at which ref URL was set
    • The referer URL set in this cookie depends on first/last referer attribution. Also, a direct entry will always be overwritten by non-direct referers. Pseudo code:
      IF the visit is new (i.e. there was no cookie _pk_ses when track* was called initially)
      AND there is a referer URL whose domain is not the current domain, or any subdomain set in setDomainNames
      AND (_pk_ref is empty // if the _pk_ref cookie is not set, we always set it
           OR setConversionAttributionFirstReferer == false // the _pk_ref cookie is already set, but we overwrite the value since we want to attribute the last known referer
           OR _pk_ref is set AND hostname of _pk_ref URL is the current domain, OR any subdomain // the _pk_ref was set to a referer, but as we evaluate this URL again now, it no longer fits the spec. This could happen if a _pk_ref URL was set earlier, and the user then updated the website to call setDomainNames(..). We want to improve visitors' cookie data in this case.
          )
      THEN update _pk_ref with the current referer URL, truncated to 1k
    • To test a URL hostname, we can simply use JS .indexOf, as it will do the job nicely and be easier to maintain than parsing URLs properly
  • All new cookies must be as space efficient as possible, i.e.
    • no named index for 'array-like' cookies, just use a . separator for values
    • record as little info as possible, and always truncate user input data
  • All 1st party cookies are sent along with each request to piwik.php
    • &_id=UUID_IN_PK_ID
    • &_idts=UUID_CREATED_TIMESTAMP_IN_PK_ID
    • &_idvc=VISITS_COUNT_IN_PK_ID
    • &_idn=1
      • If _pk_id was created on this page, set _idn=1, otherwise set _idn=0. This means idnew, i.e. 'new visitor' (or 'returning visitor')
    • &_ref=ENCODED_URL_IN_CONTENT_PK_REF
    • &_ses=1
    • &_viewts=TIMESTAMP_OF_LAST_PAGE_VIEW_OF_LAST_VISIT
    • &_refts=TIMESTAMP_OF_REFERRAL_URL
  • Cookies should work on 'localhost' or 'intranet' host names (but JS cookies need a proper domain name to be set)
  • API
    • setCookieDomain() - '.example.org' to set to all subdomains as well
    • setCookieNamePrefix() - to change _pk to something else
    • setCookiePath() - sets the path on which to set the cookie. Useful to track a specific section of a website separately from the main website (unique visitors, referer attributions, etc.).
    • setVisitorCookieTimeout() - to change the default of 2 years
    • setConversionAttributionFirstReferer() - by default, we attribute a conversion to the last referer set for a visit (as if setConversionAttributionFirstReferer(false) were called in the constructor), but if called by the user, we would attribute a conversion to the first referer set in a past visit
    • getVisitorId() - returns the 16-character ID from the cookie (without the visit count & other info)
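
The referer attribution pseudo code in the list above could be sketched in JavaScript roughly as follows. This is only an illustration under stated assumptions: the function names and the shape of the `state` object are hypothetical, and the host check compares suffixes rather than using the `.indexOf` shortcut suggested above.

```javascript
// Returns true when `host` is one of `domains` or a subdomain of one of them.
function isLocalHost(host, domains) {
    const h = host.toLowerCase();
    return domains.some(function (domain) {
        const d = domain.replace(/^\./, '').toLowerCase();
        return h === d || h.endsWith('.' + d);
    });
}

// state (all fields hypothetical):
//   isNewVisit             - no _pk_ses cookie existed when tracking started
//   referrerHost           - hostname of the current referer URL, '' if none
//   domains                - current domain plus any set via setDomainNames
//   storedReferrerHost     - hostname of the URL in _pk_ref, null if unset
//   attributeFirstReferrer - value of setConversionAttributionFirstReferer
function shouldUpdateReferrer(state) {
    if (!state.isNewVisit) {
        return false;
    }
    if (!state.referrerHost || isLocalHost(state.referrerHost, state.domains)) {
        return false; // direct entry or internal referer never overwrites
    }
    return state.storedReferrerHost === null        // nothing stored yet
        || !state.attributeFirstReferrer            // last-referer attribution
        || isLocalHost(state.storedReferrerHost, state.domains); // stale entry
}
```

When the function returns true, the caller would write the current referer URL, truncated to 1k, into _pk_ref.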

Requirements piwik.php

  • Update code to get the various new parameters and use them in Tracker
  • Allow to use third party cookies with a setting. If enabled, Piwik will use 1st party AND 3rd party cookies. [Tracker] use_third_party_cookies = 0 by default
  • add log_conversion.days_to_conversion that counts days to conversion trusting the js timestamp (better than nothing)
  • add log_conversion.visits_to_conversion that counts visits until conversion
  • delete from schema log_conversion.referer_idvisit since it is unused
  • Add new report in Piwik "Days to Conversion"
  • Add new report in Piwik "Visits to Conversion"

Documentation:

  • Add doc of new public JS API functions in the JS doc

Ideas for V2

  • Set the 1st party cookies in PiwikTracker so that this is consistent with piwik.js
  • A concern is that cookie jar size will be potentially large because of _pk_ref containing full ref URL, ie. around 1k (since we truncate URL at 1k).
    A fix for this would be to do a basic parsing of the referer in piwik.js (like GA and other WA tools do). For example parsing keywords of top 50 search engines.
    I think we don't need to do this in V1 since it is really too much effort / QA, but worth keeping in mind for a future improvement.
@mattab

Member Author

mattab commented Jan 5, 2011

Also I think the piwik_ignore cookie should stay 3rd party (and signed), to avoid abuse.

@mattab

Member Author

mattab commented Jan 5, 2011

(In [3634]) Fixes #1916
Now always checking in the DB if we saw the visitor earlier. The cookie also becomes much smaller.
Renamed the setting enable_detect_unique_visitor_using_settings to trust_visitors_cookies, as the logic is different; it should only be enabled on an intranet where the IP is the same for all users.
This will also help getting 1st party cookie implemented Refs #409

@mattab

Member Author

mattab commented Jan 5, 2011

Also we need to think about subdomains tracking and first party cookies. How does GA handle this for example? see for reference: http://www.roirevolution.com/blog/2011/01/google_analytics_subdomain_tracking.php

and http://www.dannytalk.com/how-to-track-sub-domains-cross-domains-in-google-analytics/

@mattab

Member Author

mattab commented Jan 14, 2011

@robocoder

Contributor

robocoder commented Jan 19, 2011

matt: do you still want this one? It doesn't appear in the request. To manage this on the client, requires also keeping track of the timestamp for the most recent page view of the current visit.

  • Contains timestamp of last page view of the last visit before this visit. This is used to process "Days since last visit" #583 and "Days to purchase" #2031
@robocoder

Contributor

robocoder commented Jan 19, 2011

Because of this condition:

AND there is a referer URL which domain is not the current domain, or any subdomain set in setDomainNames

_pk_ref will never contain a referer for the current domain or subdomain; so, this expression will never be true:

OR _pk_ref is set AND hostname of _pk_ref URL is the current domain, OR any subdomain
@robocoder

Contributor

robocoder commented Jan 19, 2011

re: comment:38 - oops, I didn't scroll all the way to the right to read your comment; got it

The timestamp in comment:37 is still an open question.

Also, you mention that _pk_ref "Contains time at which ref URL was set", but this timestamp doesn't appear in the request either. (If I store this in the cookie, I need to change the delimiter, as the referrer may contain '.')

@mattab

Member Author

mattab commented Jan 19, 2011

To manage this on the client, requires also keeping track of the timestamp for the most recent page view of the current visit.

OK that's right, this timestamp can also be saved in the cookie (ie. _pk_ses cookie?)

"Contains time at which ref URL was set", but this timestamp doesn't appear in the request either. (If I store this in the cookie, I need to change the delimiter, as the referrer may contain '.')

what do you mean by "it doesnt appear in the request"? I mean, the _pk_ref must contain the URL as well as the client timestamp when the cookie was last updated with a ref URL.

Thx

@robocoder

Contributor

robocoder commented Jan 19, 2011

I mean your specification doesn't show any parameters in the request to piwik.php for these timestamps.

&_viewts=TIMESTAMP_OF_LAST_PAGE_VIEW_OF_LAST_VISIT
&_refts=TIMESTAMP_OF_REFERRAL

If I understand _pk_ses correctly, the timestamp of the most recent page view (cvts) would have to instead be stored in _pk_id.
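
The '.'-separated _pk_id layout and the request parameters discussed in this thread could be sketched as below. This is an assumption-laden illustration, not the piwik.js implementation: the function names and object fields are hypothetical, and only the four _pk_id fields listed in the requirements are serialized.

```javascript
// Serialize the visitor cookie fields with a '.' separator (no named keys).
function serializeVisitorCookie(v) {
    return [v.uuid, v.createdTs, v.visitCount, v.lastVisitTs].join('.');
}

// Inverse of serializeVisitorCookie; timestamps and counters become numbers.
function parseVisitorCookie(value) {
    const parts = value.split('.');
    return {
        uuid: parts[0],
        createdTs: Number(parts[1]),
        visitCount: Number(parts[2]),
        lastVisitTs: Number(parts[3])
    };
}

// Build the first-party parameters appended to the piwik.php request.
// referrer is { url, setTs } or null when _pk_ref is not set.
function buildFirstPartyParams(visitor, isNewVisitor, referrer) {
    const pairs = [
        ['_id', visitor.uuid],
        ['_idts', visitor.createdTs],
        ['_idvc', visitor.visitCount],
        ['_idn', isNewVisitor ? 1 : 0],
        ['_viewts', visitor.lastVisitTs]
    ];
    if (referrer) {
        pairs.push(['_ref', referrer.url.slice(0, 1024)]); // truncate to 1k
        pairs.push(['_refts', referrer.setTs]);
    }
    return pairs.map(function (pair) {
        return '&' + pair[0] + '=' + encodeURIComponent(pair[1]);
    }).join('');
}
```

Because the referrer URL may itself contain '.', this sketch keeps it out of the '.'-separated value; _pk_ref would need a different delimiter or encoding, as noted above.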

@mattab

Member Author

mattab commented Jan 19, 2011

Indeed, I have now updated the request to add these 2 timestamps.

Also I'm not sure what I meant by: &_ses=1 in the URL... ? maybe this is not useful.

@robocoder

Contributor

robocoder commented Jan 19, 2011

Maybe if _ses=0, the server should use third-party cookies?

@robocoder

Contributor

robocoder commented Jan 19, 2011

(In [3783]) refs #409 - first party cookies

  • API changes:
    • added: setCookieNamePrefix(cookieNamePrefix)
    • added: setCookieDomain(domain)
    • added: setCookiePath(path)
    • added: setVisitorCookieTimeout(timeout) - defaults to 2 years since last page view
    • added: setSessionCookieTimeout(timeout) - defaults to 30 minutes since last activity
    • added: setReferralCookieTimeout(timeout) - defaults to 6 months from the first visit
    • added: setConversionAttributionFirstReferer(enable)
    • added: getVisitorId()
      • for asynchronous tracking, use:
    var visitorId;

    _paq.push(function () {
        visitorId = this.getVisitorId();
    });
  • Cookie notes:

    • The default cookie path is '/'. This might be viewed as a potentially insecure default because it allows cookies to be shared across directories on the same domain. (Again, see the social network example.) This is, unfortunately, a necessity. If we leave the path blank, the behaviour is undefined (i.e., browser or browser-version dependent). For example, earlier versions of Firefox would default to '/'; later versions default to the origin path.
    • I was hoping to avoid this, but I added a hash to the cookie content similar to GA's setAllowHash(). This is needed for two reasons:
      1. Cookies are uniquely identified by the tuple (key,domain,path). Hashing only the domain is a bug. (See "social network website" use case.)
      2. There's a long-standing cookie+subdomain bug in Firefox (Gecko) dating back to 1.0 that leaks cookies from "example.com" (not ".example.com") to "xyz.example.com". @see https://bugzilla.mozilla.org/show_bug.cgi?id=363872
  • changed internal setCookie() method to take expiry time in milliseconds (was days)

  • removed internal dropCookie() method as it was never used

    @todo Missing unit tests and cross browser testing
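
To illustrate the cookie notes above, here is a minimal sketch of deriving a cookie name from a hash of the cookie domain and path. Everything in it is an assumption made for illustration: `getCookieName` here is a hypothetical stand-in with its own signature, the 4-character suffix length is arbitrary, and a small FNV-1a style hash replaces whatever hash the commit actually uses.

```javascript
// Stand-in string hash (32-bit FNV-1a over UTF-16 code units), hex encoded.
// Assumption: the real implementation may use a different hash function.
function fnv1a(text) {
    let hash = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0; // keep an unsigned 32-bit value
    }
    return hash.toString(16).padStart(8, '0');
}

// Builds e.g. "_pk_id.<4 hex chars>" so that cookies set for different
// (domain, path) pairs get different names; path defaults to '/'.
function getCookieName(prefix, baseName, domain, path) {
    const suffix = fnv1a((domain || '') + (path || '/')).slice(0, 4);
    return prefix + baseName + '.' + suffix;
}
```

Different (domain, path) pairs will normally produce different suffixes, although a short suffix like this one can collide.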

refs #739 - piwik.js improvements

  • jslint 2011-01-09
  • new unit tests (integrated jslint, is_a functions, sha1(), utf8_encode(), etc)
  • use ECMAScript String.substring() instead of non-standard (although widely supported) String.substr()
  • implement domainFixup() so "example.com" and "example.com." are equivalent
  • API changes:
    • added: killFrame() - a frame buster
    • added: redirectFile( url ) - redirect if browsing off-line, aka file: buster; url is where to redirect to
    • added: setHeartBeatTimer( delay ) - send heart beat 'delay' milliseconds after initial trackPageView(); set to 0 to disable
    • removed: piwik_log() - legacy tracking code; see trackLink()
    • removed: piwik_track() - legacy tracking code; see trackPageView()
    • removed: setDownloadClass() - deprecated; see setDownloadClasses()
    • removed: setLinkClass() - deprecated; see setLinkClasses()
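
As an illustration of the domainFixup() item above, a minimal sketch that treats "example.com" and "example.com." as equivalent by dropping one trailing dot. The body is an assumption; the committed function may do more than this.

```javascript
// Normalise a cookie domain: drop one trailing dot so that
// "example.com." and "example.com" compare equal.
function domainFixup(domain) {
    const length = domain.length;
    if (length > 0 && domain.charAt(length - 1) === '.') {
        // String.substring() is used rather than the non-standard substr()
        return domain.substring(0, length - 1);
    }
    return domain;
}
```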

refs #752 - track middle mouse button clicks (via mousedown+mouseup pseudo-click handler); defaults to tracking true "clicks"

  • API changes:
    • modified: addListener( element, enablePseudoClickHandler = false )
    • modified: enableLinkTracking( enablePseudoClickHandler = false )

refs #1984 - custom variables vs custom data

@todo These are just stubs.

  • API changes:
    • added: setCustomVar(slotId, key, value, opt_scope) - scope is 1 (visitor), 2 (session), 3 (page)
    • added: getCustomVar(slotId)
    • added: deleteCustomVar(slotId)
  • API changes for consistency:
    • added: setCustomVar(slotId, obj, opt_scope)
    • added: setCustomData(key, value)
    • for the equivalent of deleteCustomData(), use:
    tracker.setCustomData(null);
@robocoder

Contributor

robocoder commented Jan 19, 2011

(In [3784]) refs #409 - use getCookieName() in hasCookies() test

@robocoder

Contributor

robocoder commented Jan 19, 2011

Mark as fixed. Future commits to #1984.

@mattab

Member Author

mattab commented Jan 19, 2011

I still have to do some work :)

  • Requirements piwik.php
  • Integration testing

Also,

@robocoder

Contributor

robocoder commented Jan 19, 2011

ok. on my todo list.

@samgabriel

samgabriel commented Sep 22, 2011

the revision [3960] is leading to a lot of breaking on our sites. We track multiple siteIds using the same domain name. On each request we call trackPageView twice once for each siteid. The new mechanism of adding the site id to the cookie name is causing the headers to overflow the server buffers. Leading to numerous errors on our server.

The goal of this change, I believe, was to be able to track different site IDs for the subdomains. But if that is a requirement of the application, then the application should do so by calling trackPageView two or three times.

The current implementation would lead to an endless increase in the number of cookies as the user moves from one site to the next. which is what is happening on our side.

@robocoder

Contributor

robocoder commented Sep 23, 2011

Sam: it depends how you use piwik.js. (fyi the reason for the hash is mentioned in comment:44)

Are you using two sites ids across the entire site?

Or many more? eg one site-wide, and another that varies/depends on some area of the website? In this scenario, you should use setCookiePath()

Can you see if the TrackSiteByUrl plugin can be adapted for your environment?

@samgabriel

samgabriel commented Sep 26, 2011

I looked at comment:44 the only thing I can see regarding the relevance would be the social network example. Unfortunately I couldn't find the description of this use case.

In our setup, we are using the same URL for all the various sites so cookie paths are not going to work.

Regarding the site ids we have one siteId that is site wide and another one based on the client. We have thousands of clients that we track. we developed our own plugin that creates the site during our own account creation process, retrieves the site id and embed that into the db for tracking.

One thing to note here is if you think about it abstractly, you have one user one browser. Can that user really have multiple identities, referral URLs ..etc based on the siteId????!!! I think this fix is trying to do something Piwik shouldn't be responsible for.

This is on a side note, but referral URLs can be monstrous in size. adding them to cookies can be a real pain on the server. if you already have them in the db based on the visitor id/siteId should they really be in the cookie as well?

@robocoder

Contributor

robocoder commented Sep 26, 2011

The hash addresses the subdomain cookie leak problem in Firefox.

Each tracker instance can point to a different Piwik server. If you're using cookie domains and/or paths, then it is possible for the cookie contents to be different.

@samgabriel

samgabriel commented Sep 26, 2011

But if that is the case, then you can create the hash based on the Piwik domain URL instead of the siteId.

I still don't understand how adding the site Ids will fix the FF issue. Wouldn't the cookies still leak to the subdomains?

Regarding different siteIds values for subdomains. I might be wrong here but if there are two cookies with the same name if the subdomain is set for one of them and you visited the subdomain, wouldn't that return the one that has the subdomain set?

@robocoder

Contributor

robocoder commented Sep 26, 2011

the hash is only on cookie domain and path

in any case, I think you're focussing too much on the hash

the bigger picture is that your visitors are amassing many, large cookies. What you expect/want is one client side cookie with server-side storage for the bulk of the cookie contents, that can somehow be mapped to one or more tracking site IDs. This wasn't part of the scope of this ticket, so it isn't something Piwik does right now. I'll create a new ticket for this feature request and we'll figure it out from there.
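
A rough sketch of the idea described above: the browser keeps only a small visitor ID cookie, and the bulk of the data lives server-side, keyed by visitor ID and site ID. This is not something Piwik does in this ticket; the class name, method, and row shape are hypothetical, and an in-memory Map stands in for a database table.

```javascript
// In-memory stand-in for a server-side table keyed by (visitorId, idsite).
class VisitorStore {
    constructor() {
        this.rows = new Map();
    }

    // Record a visit and return the stored row for this visitor and site.
    recordVisit(visitorId, idsite, referrerUrl) {
        const key = visitorId + '|' + idsite;
        const row = this.rows.get(key) || { visitCount: 0, referrerUrl: null };
        row.visitCount += 1;
        if (referrerUrl) {
            row.referrerUrl = referrerUrl; // kept server-side, not in a cookie
        }
        this.rows.set(key, row);
        return row;
    }
}
```

With this layout, tracking the same visitor under many site IDs adds rows on the server instead of adding cookies to every request.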

@mattab

Member Author

mattab commented Sep 27, 2011

See the new ticket at #2680

@mattab

Member Author

mattab commented Jan 16, 2014

See also #2211 piwik.js: Cross domain tracking

@mattab mattab added this to the Piwik 1.2 milestone Jul 8, 2014

@mattab mattab self-assigned this Jul 8, 2014

This issue was closed.
