New website setting: Only track visits and actions when the action URL starts with one of the above URLs #588

Closed
anonymous-piwik-user opened this Issue Mar 5, 2009 · 29 comments

anonymous-piwik-user commented Mar 5, 2009

Piwik statistics can be distorted by copying the JavaScript code to third-party sites.
In “Websites Management” you can add new sites with their URLs. But anyone can copy your JavaScript code to their own site and manipulate your statistics.

Piwik needs to be updated with a function that defines domains that are allowed to be counted.

anonymous-piwik-user commented Mar 5, 2009

Will this option only be visible if you log in as an admin? Other users who only need to check stats should be given a separate login without admin credentials, so they cannot reach site management.

Users with view access are not shown the JavaScript tracking code in the Piwik front end.

Contributor

robocoder commented Mar 6, 2009

Piwik relies on information sent by the browser. Whatever we do on the server, there is some implicit trust that what the client sends is not malicious.

Server side filtering might incur the performance penalty concern raised in ticket #9. Redesignating this ticket as a plugin feature request.

A benefit of the current implementation is that discrepancies in one’s stats may help to identify copyright violations or malicious activity to be blocked.

Contributor

robocoder commented Mar 7, 2009

Requirements:
- UI to enter domain name(s) for this site, e.g., example.com, www.example.com, example.subhosting.com, subhosting.com/example/
- Tracker: Filter out URLs which don’t match domain names for this site
- Tracker: cache the list of URLs in the cache/tracker/* array via the Common.fetchWebsiteAttributes hook

See also related: #2375

Member

mattab commented Mar 11, 2009

There is now a mechanism that caches site data in files to be loaded by the piwik.php tracker code, so this check would not add a lookup at tracking time.

On the UI side we already ask for multiple URL aliases for the website. We could simply add a checkbox (disabled by default): “Exclude all visits that do not load the Piwik code from one of these URLs”.

I agree with vipsoft's suggestion of reporting malicious activity, but not in V1.

Member

mattab commented Jan 17, 2011

Also, the HTTP Referer should be checked: it should be non-empty and match one of the known domain URLs.

Contributor

robocoder commented Jan 26, 2011

The basic check is on the url parameter in the request.

The Referer check has to be separately enabled/disabled to accommodate use cases, such as:

  • when the visited page is https, but the tracker is http (in which case, the Referer is empty)
  • to mitigate undercounting visits when user agents block the Referer via add-on / privacy setting
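
For illustration, the two independently switchable checks described above might look like the following Python sketch. This is not Piwik's implementation: `SiteSettings`, `should_track`, and both flag names are hypothetical, and hosts are compared by exact match to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Optional, Set
from urllib.parse import urlsplit


@dataclass
class SiteSettings:
    """Hypothetical per-website settings; names are illustrative only."""
    allowed_hosts: Set[str] = field(default_factory=set)
    check_url: bool = True        # basic check on the `url` request parameter
    check_referer: bool = False   # stricter check, enabled separately


def host_of(url: Optional[str]) -> str:
    """Return the lower-cased host of a URL, or '' if missing or unparsable."""
    if not url:
        return ""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def should_track(settings: SiteSettings,
                 url_param: Optional[str],
                 referer_header: Optional[str]) -> bool:
    """Apply each enabled check; an empty Referer fails only if that check is on."""
    if settings.check_url and host_of(url_param) not in settings.allowed_hosts:
        return False
    if settings.check_referer and host_of(referer_header) not in settings.allowed_hosts:
        return False
    return True
```

With `check_referer` left off, a request from an https page to an http tracker (empty Referer) is still counted as long as its `url` parameter matches.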

Member

mattab commented Aug 7, 2012

See related #2375 which may be done at the same time

gaumondp commented Mar 14, 2014

I can't think of a more important feature than this one.

Data integrity is way more important than anything I can think of.

Dali

jasonbukowski commented Mar 28, 2014

Agreed. There are several upcoming features/fixes I am looking forward to, but I can't help asking myself how important new functionality can be when the underlying data is so openly exposed to corruption by any malicious third party.

anonymous-piwik-user commented May 21, 2014

What's the status of this issue?
Is it already integrated into the current Piwik, or is it still possible for anyone to copy and paste the tracking JS code and spam the Piwik tracking DB?

anonymous-piwik-user commented May 28, 2014

I think not having an option to exclude these spammy third-party sites is a major issue.
This option should have been available a long time ago!

It would be really important to have this implemented.

Member

mattab commented May 30, 2014

No, it is not a major issue. There are also about 400 other open tickets. If you need this implemented soon, please consider sponsoring this development.

Contributor

mnapoli commented Mar 20, 2015

There are several things to consider I guess:

  1. Prevent a third party from tracking visits on a third-party website:
    • in that case we check that the tracked URL is in the authorized domain list
  2. Prevent a third party from recording fake visits on the tracked website: e.g. I could spam the demo to record 1000 fake visits on the piwik.org homepage
    • here it's much more difficult because I see no way to differentiate between a normal tracker request and a malicious one. The referrer can be faked, for example. One way would be to issue a token in piwik.js, but we would have to make that token limited in time and unique for each piwik.js. But then again, I can visit piwik.org, grab the token and spam demo.piwik.org with fake visits using that token…

I don't see a lot of value in fixing 1 if 2 is not fixed.
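
To make the weakness of the token idea concrete, here is a rough Python sketch of a time-limited signed token. Everything in it is an assumption for illustration (the `SECRET`, the function names, the 300-second lifetime, the token layout); nothing like it is claimed to exist in piwik.js.

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical key, known only to the server


def _sign(site_id: int, issued_at: int) -> str:
    message = f"{site_id}:{issued_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def issue_token(site_id: int, issued_at: int) -> str:
    """Token that would be embedded in the served tracking script."""
    return f"{issued_at}:{_sign(site_id, issued_at)}"


def is_token_valid(site_id: int, token: str, now: int,
                   max_age_seconds: int = 300) -> bool:
    """Accept a token only if the signature matches and it has not expired."""
    try:
        issued_at_text, signature = token.split(":", 1)
        issued_at = int(issued_at_text)
    except ValueError:
        return False
    if not hmac.compare_digest(signature, _sign(site_id, issued_at)):
        return False
    return 0 <= now - issued_at <= max_age_seconds


# Anyone who loads the page once can replay the same token until it expires.
grabbed = issue_token(site_id=1, issued_at=1000)
replayed = [is_token_valid(1, grabbed, now=1000 + i) for i in range(3)]
```

The token rejects forgeries and stale values, but, as the comment points out, a token copied from a real page view stays valid for every request sent inside its lifetime.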

Member

mattab commented Mar 20, 2015

There is value in (1) because sometimes it is simple human error: the wrong tracking code (or the wrong idsite) is set, and wrong data is recorded and shown in the UI. When this happens there is currently no way for users to filter out the traffic, which creates bad data through no fault of their own. Once fixed, it will just be a matter of enabling a per-website setting, "Only record data from the website URLs" (disabled by default).

I've noticed that quite a few users have asked for this feature in the forums in the past...

Maybe we could release it as an open source plugin on the Marketplace? It would be a nice use case for a plugin that adds a new per-website setting, and would make this easy for developers (maybe we should also wait for the Admin screen redesign in #7492 first).

gaumondp commented Mar 20, 2015

@mnapoli maybe for point 2, adding a maximum number of actions per visitor within a certain duration would help?

Someone using HTTrack to download a whole site will generate as many hits as you have pages. As a webmaster, I'm not too happy to see 200+ actions from a single user in 15 minutes when you have 10,000 pages... But right now I live with it.

I really don't know the complexity behind the general idea but let's be frank, fake visits are nasty.
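
As a rough idea of what such a cap could look like, here is a minimal sliding-window limiter in Python. It is only a sketch: the class name, the in-memory storage, and the way a visitor is identified are all assumptions, and a real tracker would need shared, persistent state.

```python
from collections import defaultdict, deque


class ActionRateLimiter:
    """Allow at most `max_actions` per visitor within `window_seconds`."""

    def __init__(self, max_actions: int, window_seconds: float) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        # visitor id -> timestamps of that visitor's recently accepted actions
        self._seen = defaultdict(deque)

    def allow(self, visitor_id: str, now: float) -> bool:
        timestamps = self._seen[visitor_id]
        # Forget actions that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_actions:
            return False  # over the cap: this action would be dropped
        timestamps.append(now)
        return True


# The figures from the comment above: 200 actions in 15 minutes.
limiter = ActionRateLimiter(max_actions=200, window_seconds=15 * 60)
```

Each call to `limiter.allow(visitor_id, now)` returns `False` once a visitor exceeds the cap, and starts returning `True` again as older actions age out of the window.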

mattab modified the milestones: Short term, Mid term (Mar 25, 2015)

barbushin self-assigned this (Jul 8, 2015)

Contributor

barbushin commented Jul 8, 2015

I read all the comments, and I have two ideas for how we could implement it:

  1. Like this
    image
  2. Or in more compact way
    image

I think a good place for this option is Admin - Manage Websites - Edit Site Form.

Guys, what do you think?

Member

mattab commented Jul 9, 2015

@barbushin we already have the feature to exclude users based on user agents, see Administration > Websites:

user agent exclude

We don't need a feature to exclude visits based on referrer.

I think this feature could be simply done as a new checkbox, on a per-website basis, something like this:

new feature

what do you think?

Notes:

JonasDoebertin commented Jul 9, 2015

@mattab That looks amazing! 👍

Contributor

barbushin commented Jul 9, 2015

@mattab That's nice to keep it simple and easy to use, but what about subdomains?

Member

mattab commented Jul 9, 2015

Good point. I reckon we should allow all subdomains automatically as well, and rename the checkbox/inline help to clarify this: "Only track visits for actions on any of the website URLs: $URLs_here (and their subdomains)".
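
A sketch of what "allow all subdomains automatically" could mean in practice, written in Python for illustration. The function names are hypothetical and the matching rule (exact host, or any host ending in a dot plus the configured host) is an assumption about the intended behaviour.

```python
from typing import Iterable
from urllib.parse import urlsplit


def host_matches(host: str, site_host: str) -> bool:
    """True if `host` is `site_host` itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    site_host = site_host.lower().rstrip(".")
    if not host or not site_host:
        return False
    # The leading dot prevents "notexample.com" from matching "example.com".
    return host == site_host or host.endswith("." + site_host)


def is_allowed(action_url: str, site_urls: Iterable[str]) -> bool:
    """Check the action URL's host against every configured website URL."""
    host = urlsplit(action_url).hostname or ""
    return any(
        host_matches(host, urlsplit(site_url).hostname or "")
        for site_url in site_urls
    )
```

Comparing whole host labels rather than raw string suffixes matters here: a plain `endswith("example.com")` would also accept unrelated domains that merely end in the same characters.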

Contributor

barbushin commented Jul 9, 2015

@mattab And what if somebody runs multiple site instances on one domain, like dev.piwik.com and stage.piwik.com, with piwik.com for production, and doesn't want to set up different Piwik integration options for the different environments?

JonasDoebertin commented Jul 9, 2015

@barbushin Isn't it always a good practice to include your tracking code based on your environment? My local, dev or staging sites usually don't load the tracking code at all.

Contributor

barbushin commented Jul 9, 2015

@JonasDoebertin Of course it's not a good practice :) But how can we be sure that everybody is as smart as you?

Member

mattab commented Jul 9, 2015

@barbushin I think for our MVP version we can track all subdomains (to KISS / Keep It Simple & Stupid). If users ask for the possibility to "not track" subdomains, we could revisit our product vision?

JonasDoebertin commented Jul 9, 2015

@barbushin We can't. But this is something you already have to think about all the time (with nearly any other analytics service, as well).

Contributor

barbushin commented Jul 9, 2015

@mattab Okay, so all we need is to add a checkbox "Only track visits for actions on any website URLs". But I'm not sure I clearly understand what it means. Is it a validation that the referer equals one of the listed URLs?

Member

mattab commented Jul 9, 2015

I think the validation should check &url= tracking API parameter (not urlref) and check that this is part of one of the URLs set for this website.
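
Roughly, checking `url` rather than `urlref` could look like the Python sketch below. It is an illustration only: the function names are hypothetical, hosts are compared exactly, and rejecting requests with no `url` parameter is an assumption, not something decided in this thread.

```python
from typing import Iterable
from urllib.parse import parse_qs, urlsplit


def host_of(url: str) -> str:
    """Lower-cased host of a URL, or '' if it has none or cannot be parsed."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def is_request_allowed(query_string: str, site_urls: Iterable[str]) -> bool:
    """Validate the tracked page URL (`url`), ignoring the referrer (`urlref`)."""
    values = parse_qs(query_string).get("url")
    if not values:
        return False  # assumption: a request without `url` is rejected
    allowed_hosts = {host_of(site_url) for site_url in site_urls} - {""}
    return host_of(values[0]) in allowed_hosts
```

For example, a request with `url=http%3A%2F%2Fexample.com%2Fpage` passes for a site configured with `http://example.com`, whatever its `urlref` says, while a request whose `url` points at another host fails even if `urlref` names the configured site.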

barbushin removed their assignment (Jul 23, 2015)

diosmosis modified the milestones: 2.15.0, Short term (Aug 26, 2015)

diosmosis closed this in #8345 (Aug 26, 2015)

diosmosis added a commit that referenced this issue Aug 26, 2015

Merge pull request #8345 from piwik/588_urls_whitelist_2
Fixes #588, add option to ignore actions w/ URLs that are not for the website during tracking.

mattab changed the title from "New admin setting: whitelist website URLs or hosts allowed to tracked visits" to "New website setting: Only track visits and actions when the action URL starts with one of the above URLs" (Oct 13, 2015)

Member

mattab commented Nov 22, 2015

Member

tsteur commented Dec 7, 2015

I just issued PR #9358 to no longer match subdomains, as this was not mentioned in the UI and was unclear. On the other hand, if any of these URLs specifies a path, we will now also check whether that path is actually present.
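
The stricter rule described above might be sketched like this in Python. This is not the code from PR #9358: the function name is hypothetical, and treating "the path is present" as a prefix match on whole path segments is an assumption.

```python
from urllib.parse import urlsplit


def matches_site_url(action_url: str, site_url: str) -> bool:
    """Exact host match, plus a path-prefix match if the site URL has a path."""
    action = urlsplit(action_url)
    site = urlsplit(site_url)
    action_host = (action.hostname or "").lower()
    site_host = (site.hostname or "").lower()
    if not site_host or action_host != site_host:
        return False  # subdomains no longer match implicitly
    site_path = site.path.rstrip("/")
    if not site_path:
        return True   # no path configured: any page on the host is fine
    # Match whole segments so "/example" does not accept "/examples".
    return action.path == site_path or action.path.startswith(site_path + "/")
```

With a configured URL such as `http://subhosting.com/example/`, a page under `/example/` on that host is accepted, while `www.subhosting.com` or a page under `/other/` is not.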
