Adds data-exclude support and localStorage.plausible_ignore support #489

Vigasaurus · 2020-12-19T02:08:02Z

Changes

Adds support in the plausible.js script for excluding specific routes from being counted, as well as for preventing all Plausible events if the localStorage flag plausible_ignore is present.

@ukutaht This isn't technically done yet; I wanted your feedback on the methodology and usability before finalizing it. I'm also unsure what the p.js script is for and when it's used (I found no meaningful reference to it), so I don't know whether my changes need to be added to it.

Once we can agree on the format/the method - I can write up the docs/changelog as needed.

The basic premise is that we have two new client-side exclusions, similar to the existing localhost and phantom checks.
First, localStorage.plausible_ignore: if this exists at all (i.e. is truthy in its string representation), all Plausible events are prevented.
Second, if a string is passed to data-exclude in the format data-exclude="/tou, /rule/*, /how-to-play, /*/admin, */secret, */priv/*", pages that match those routes are excluded. In this example, the routes are parsed as shown in the table below. I decided to use this pseudo-regex format to make it more usable and easier to understand for end users; lmk if you'd prefer plain regex or something else.

| data-exclude input | Parsed RegExp |
|---|---|
| `/tou` | `/^\/tou\/?$/` |
| `/rule/*` | `/^\/rule\/.*\/?$/` |
| `/how-to-play` | `/^\/how-to-play\/?$/` |
| `/*/admin` | `/^\/.*\/admin\/?$/` |
| `*/secret` | `/^.*\/secret\/?$/` |
| `*/priv/*` | `/^.*\/priv\/.*\/?$/` |
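The parsing described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `parseExclusions` and `isExcluded` are hypothetical names, the attribute value is assumed to be comma-separated, and rules are not regex-escaped, so characters such as `.` or `?` in a rule keep their regex meaning.

```javascript
// Hypothetical helper: turn a data-exclude attribute value into RegExps,
// following the first-version rules in the table ("*" becomes ".*",
// an optional trailing slash is appended, other metacharacters unescaped).
function parseExclusions(attr) {
  return attr
    .split(",")
    .map((rule) => rule.trim())
    .filter((rule) => rule.length > 0)
    .map((rule) => new RegExp("^" + rule.replace(/\*/g, ".*") + "/?$"));
}

// A pathname is excluded if any compiled rule matches it.
function isExcluded(pathname, rules) {
  return rules.some((re) => re.test(pathname));
}

const rules = parseExclusions(
  "/tou, /rule/*, /how-to-play, /*/admin, */secret, */priv/*"
);

console.log(isExcluded("/tou/", rules));       // true
console.log(isExcluded("/home/admin", rules)); // true
console.log(isExcluded("/x/priv/y", rules));   // true
console.log(isExcluded("/tournament", rules)); // false
```

Note that `/tou` does not exclude `/tournament`, because every rule is anchored at both ends.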

Also, a side note: it seems like the plausible script has crept above 1KB, sitting at 1268B (1.24 KiB = 1.27 KB) on master, and with this update it grows to 1591B (1.55 KiB = 1.59 KB). I don't think it matters much, since we're adding features to get to this size, but I figure some branding/copy on the landing page might need updating to match.

Tests

This PR does not require tests

Changelog

Entry has been added to changelog

Documentation

Docs have been updated
Adds docs for client-side exclusions docs#45

ukutaht · 2020-12-21T14:03:21Z

Looks good overall @Vigasaurus

p.js is deprecated but kept around to support some early users. No need to worry about it. If someone using that script wants access to new features, they are welcome to use the new plausible.js script.

> First, localStorage.plausible_ignore - if this exists at all (i.e. is truthy in its string representation), all plausible events are prevented.

I haven't tested yet, but my only worry is someone setting localStorage.plausible_ignore = false to disable it and getting confused because that evaluates to a truthy value in our script. Not sure how localStorage serialization works, but setting localStorage.plausible_ignore = false should make the script count again.
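On the serialization question: Web Storage stores every value as a string, so assigning `false` actually stores `"false"`, which is truthy. The sketch below models that coercion with `String()` so it runs outside a browser, and contrasts a plain truthiness check with a stricter comparison against `"true"`. The helper names are hypothetical, and the stricter check is one possible fix, not necessarily what the script ended up doing.

```javascript
// Web Storage keeps values as strings, so `localStorage.plausible_ignore = false`
// stores "false". String() models that coercion so this runs outside a browser.
function storedAs(value) {
  return String(value);
}

// Truthiness check: any non-empty string (including "false") opts out.
function naiveIgnore(raw) {
  return Boolean(raw);
}

// Stricter, hypothetical check: only the exact string "true" opts out;
// an unset key (null from getItem, undefined from property access) does not.
function strictIgnore(raw) {
  return raw === "true";
}

console.log(naiveIgnore(storedAs(false)));  // true (the surprising case)
console.log(strictIgnore(storedAs(false))); // false
console.log(strictIgnore(storedAs(true)));  // true
console.log(strictIgnore(null));            // false
```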

> I decided to use a format like this in pseudo-regex to make it a bit more usable/easy to understand for an end user - lmk if you'd prefer just a straight regex or something.

I think it's great, much preferred to regular Regex syntax and I think using stars in URLs is very widely accepted/supported.

> Also, a side note: it seems like the plausible script has crept above 1KB, sitting at 1268B (1.24 KiB = 1.27 KB) on master, and with this update it grows to 1591B (1.55 KiB = 1.59 KB). I don't think it matters much, since we're adding features to get to this size, but I figure some branding/copy on the landing page might need updating to match.

This is the raw size but we advertise the size with Brotli compression which is what you actually get over the wire. With compression it's probably not over 1KB yet.

Since adding page exclusions requires one to re-deploy their script anyways, I think we should add it as an extra script option that doesn't get included in the standard script. See how I've done hashMode and outboundLinks.

Vigasaurus · 2020-12-21T19:12:00Z

Coolio, thanks for the detailed feedback on all the PRs :D I'll get going on all the comments some time today or tomorrow 😀

I'm unsure about the formatting decision in tracker.ex - lmk.

This will certainly have conflicts with my other PRs related to the tracking scripts right now; I'll make one extra PR after both are done to ensure they're consolidated into the compiled scripts.

Vigasaurus · 2020-12-22T00:25:35Z

@ukutaht This should be ready to go - I'm happy with most everything here, but I'm unsure about the formatting in tracker.ex and the very long array in the map - do lmk if you'd prefer some different formatting there.

I'll get started on the docs update too.

ukutaht · 2020-12-22T09:27:23Z

Will review later today but wanted to jot down a thought so I'll remember to test later on.

What's the behaviour of the pseudo-regex with trailing slashes and subdirectories? What I mean is:

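| Exclusion rule | Actual URL | Excluded? |
|---|---|---|
| `/page` | `/page` | Yes |
| `/page` | `/page/` | Yes (?) |
| `/page/` | `/page/` | Yes |
| `/page/` | `/page` | No (?) |
| `/page` | `/page/nested` | No |
| `/page/` | `/page/nested` | No |
| `/page/*` | `/page/nested` | Yes |
| `/page/*` | `/page` | No (?) |
| `/page/*` | `/page/` | No (?) |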

The 'Excluded' column is what I intuitively expect to happen, question marks mean that I am not 100% certain this is the expected/correct behaviour. However it is implemented, it should be well documented and as easy to understand as possible.

Vigasaurus · 2020-12-22T09:42:09Z

Yeah, this is definitely a sticking point for me too, as far as documentation is concerned. A technical explanation of what the regex becomes would be best for power users, but would likely confuse others. I think I explained it pretty well in plausible/docs#45, but it could definitely use more detail if you think it's unclear (maybe this table you've laid out can help, actually).

Honestly, I think the biggest point of confusion will be trailing slashes, so maybe a note saying "Don't put any trailing slashes in the rules! They're added automatically!" or something to that effect would help. Alternatively, we could strip any trailing slash provided in a rule and replace it with our own optional one.
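The second option (stripping a user-supplied trailing slash and appending an optional one) could look roughly like this. It's a sketch that reuses this version's `*` to `.*` rule; `compileRule` is a hypothetical name, and the root rule `/` is deliberately left alone so it still matches only the home page.

```javascript
// Hypothetical variant of the rule compiler: drop any trailing slash the
// user wrote, then append our own optional one, so "/page" and "/page/"
// behave identically.
function compileRule(rule) {
  let body = rule.trim();
  // Keep a lone "/" (the root path) intact.
  if (body.length > 1 && body.endsWith("/")) {
    body = body.slice(0, -1);
  }
  return new RegExp("^" + body.replace(/\*/g, ".*") + "/?$");
}

console.log(compileRule("/page/").test("/page"));  // true
console.log(compileRule("/page").test("/page/"));  // true
console.log(compileRule("/").test("/"));           // true
```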

| Exclusion rule | Actual URL | Excluded? | Actually excluded? |
|---|---|---|---|
| `/page` | `/page` | Yes | Yes |
| `/page` | `/page/` | Yes (?) | Yes |
| `/page/` | `/page/` | Yes | Yes |
| `/page/` | `/page` | No (?) | No |
| `/page` | `/page/nested` | No | No |
| `/page/` | `/page/nested` | No | No |
| `/page/*` | `/page/nested` | Yes | Yes |
| `/page/*` | `/page` | No (?) | No |
| `/page/*` | `/page/` | No (?) | Yes (!) |

Same table as yours; the last column is whether the current implementation excludes it.

So most of them work how you'd expect; the last one is the only one that doesn't, and that's mostly just a decision to be made here. Do we allow zero-length *'s or not? I'd say yes, but it's up to you (it's a single-character fix if not).
Generally, I'd say keeping the trailing slash fully optional, as it is right now, is best: some browsers always add it, some always strip it, and others leave it unchanged, so making it optional makes a rule most likely to match when it should. But again, up to you.

The way I tested these was by extracting the regex constructor from the script and running it manually. Here's what it became, if you want to try it:

```javascript
"actual_pathname_to_test".match(new RegExp('^' + "excluded-string".trim().replace(/\*/g, '.*') + '\/?$'))
```

So I tested your examples by doing, for example:

```javascript
"/page/nested".match(new RegExp('^' + "/page/*".trim().replace(/\*/g, '.*') + '\/?$'))
```

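To reproduce the "Actually excluded?" column in one go, the same construction can be wrapped in a small function and run over every row of the table. A minimal sketch; `matchesV1` is a hypothetical name, and `RegExp.prototype.test` replaces `String.prototype.match` since only a yes/no is needed.

```javascript
// Same construction as the one-liner above: "*" becomes ".*" and an
// optional trailing slash is appended before anchoring at the end.
function matchesV1(pathname, rule) {
  const re = new RegExp("^" + rule.trim().replace(/\*/g, ".*") + "/?$");
  return re.test(pathname);
}

// [exclusion rule, actual URL] pairs, in the order of the table above.
const rows = [
  ["/page", "/page"],
  ["/page", "/page/"],
  ["/page/", "/page/"],
  ["/page/", "/page"],
  ["/page", "/page/nested"],
  ["/page/", "/page/nested"],
  ["/page/*", "/page/nested"],
  ["/page/*", "/page"],
  ["/page/*", "/page/"],
];

const results = rows.map(([rule, path]) => (matchesV1(path, rule) ? "Yes" : "No"));
console.log(results.join(", "));
// Yes, Yes, Yes, No, No, No, Yes, No, Yes
```

The last row comes out "Yes" because `.*` is allowed to match an empty string, which is exactly the zero-length `*` question.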
Vigasaurus · 2020-12-29T07:51:11Z

I'm actually going to make a change here tomorrow (which is why I'm moving this to draft) to make the docs and the explanation simpler: the * in a provided pattern won't match a /, so /page/* will only match /page/nested and not /page/nested/doublenested/etc. Similarly, /how-to-* will match /how-to-play and /how-to-cook but not /how-to-cook/recipe. From some light research, this is more consistent with what stars in URLs usually mean elsewhere.

If this format works well, I'll eventually add the same matching to page goals if wanted (for stuff like WooCommerce's /order/confirmation/<order_id>).

Vigasaurus · 2021-01-04T07:10:49Z

This is now updated so that * doesn't match / or whitespace. I'm still updating the docs and will post some updated examples here as well. Once both are done I'll move this off draft, but it's otherwise done.

Vigasaurus · 2021-01-04T08:06:26Z

> Honestly, I think the biggest confusion point will be trailing slashes

I added a note in the docs saying that trailing slashes are added automatically and are not needed in data-exclude.

Double Nesting

I've changed it so that a * in data-exclude cannot match a / or a whitespace character in the pathname. As such, a pattern like /admin/* now only matches /admin/1 and similar, but not /admin/1/edit or /admin.

> the last one is the only one which doesn't and that's mostly just a decision to be made here. Do we allow zero-length *'s or no - I'd say yes, but up to you (it's a single-character fix if not).

More research found that zero-length asterisks in URL patterns of this format are non-standard, so I've changed * to match one or more characters.

Updated page path testing regex:

```javascript
"actual_pathname_to_test".match(new RegExp('^' + "split-excluded-string".trim().replace(/\*/g, '[^\\s\/]+') + '\/?$'))
```

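The updated construction can be exercised the same way. In this sketch (`matchesV2` is a hypothetical name), `*` becomes `[^\s/]+`, so it must match at least one character and can't cross a `/` or match whitespace, which gives the `/admin/*` and `/how-to-*` behaviour described above.

```javascript
// Second version: "*" matches one or more characters that are neither
// whitespace nor "/", so it stays within a single path segment.
function matchesV2(pathname, rule) {
  const body = rule.trim().replace(/\*/g, "[^\\s/]+");
  return new RegExp("^" + body + "/?$").test(pathname);
}

console.log(matchesV2("/admin/1", "/admin/*"));             // true
console.log(matchesV2("/admin/1/edit", "/admin/*"));        // false
console.log(matchesV2("/admin", "/admin/*"));               // false
console.log(matchesV2("/how-to-cook", "/how-to-*"));        // true
console.log(matchesV2("/how-to-cook/recipe", "/how-to-*")); // false
```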
Overall, I added much more detail in plausible/docs#45, and an option for tables in the docs.

Please do let me know your thoughts :D

ukutaht · 2021-01-20T15:01:50Z

Again sorry about the super long wait on this :)

Overall this is in a great place. Good job on the documentation side as well.

> I've changed it so that a * in data-exclude cannot match a / or a whitespace character in the pathname. As such, a pattern like /admin/* now only matches /admin/1 and similar, but not /admin/1/edit or /admin.

Sounds good. I think we should use bash glob as the main inspiration for the syntax here. It's well known. In glob, a regular asterisk does not match a forward slash as you've done here. But I think we should add a way to exclude a whole subdirectory.

In glob, this can be done with /admin/**. What do you think about adding ** as a way to match any character, including forward slash?

Also, in glob both single and double asterisks can match zero characters. So in glob /page* matches /page, whereas in this implementation it doesn't seem to. Is there a good reason to require a character when * is specified?

> If this format works well, I'll eventually add the same matching to page goals if wanted (for stuff like WooCommerce's /order/confirmation/<order_id>).

Love it. The same syntax could also be used to roll up pageviews in regular reporting (i.e. in the Top Pages report). I know many people want to group pageviews with similar URLs.

Vigasaurus · 2021-01-20T16:06:03Z

Love the ideas. I think the double glob mechanic would definitely be useful too (and should be easy to implement, since it's just the old implementation tacked on).

I have no particular issues with either implementation as far as zero-length goes - I think it just comes down to consistency and documentation.

I'll make the changes 👍

Zero Length is allowed
Double Glob
Docs update for the above
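Under the semantics agreed on above (glob-style, with `**` crossing slashes and both `*` and `**` allowed to match nothing), one way to compile a rule is to split on `**` first and only then replace single stars, so the single-star replacement never sees the `.*` produced for `**`. This is a sketch with a hypothetical `globToRegExp` name, not necessarily how the merged script does it.

```javascript
// Hypothetical glob compiler following the semantics agreed above:
//   "**" matches any run of characters, including "/" (possibly empty)
//   "*"  matches any run of non-slash, non-whitespace characters (possibly empty)
function globToRegExp(pattern) {
  const body = pattern
    .trim()
    .split("**")
    .map((part) => part.replace(/\*/g, "[^\\s/]*"))
    .join(".*");
  return new RegExp("^" + body + "/?$");
}

const matches = (pattern, path) => globToRegExp(pattern).test(path);

console.log(matches("/page*", "/page"));            // true
console.log(matches("/admin/*", "/admin/1/edit"));  // false
console.log(matches("/admin/**", "/admin/1/edit")); // true
console.log(matches("/**", "/"));                   // true
```

With these rules `/**` matches every path, including the root.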

ukutaht

Looks good, requested a few changes. Once they're done I will merge this.

lib/plausible_web/plugs/tracker.ex

tracker/src/plausible.js

metmarkosaric · 2021-01-26T12:26:40Z

Got a question on Twitter: they're using the same script on both the main domain and a subdomain but want to exclude / so the two home pages don't get counted together. Any thoughts?

https://twitter.com/tim_nolet/status/1354040020018077698

Vigasaurus · 2021-01-26T12:34:02Z

I'm a tiny bit confused by the use case/end goal, but can confirm that "/**" would exclude any path.

If you have more details about what they're trying to exclude/the subdomain structure, I'm sure this could support it to an extent.

metmarkosaric · 2021-01-26T12:36:20Z

thanks @Vigasaurus! i've asked them to clarify. btw if you exclude a specific page, does that mean that any custom goals on that specific page won't be counted?

Vigasaurus · 2021-01-26T12:39:13Z

Nope, the exclusion only happens for triggering pageviews - outbound links and other custom events still fire.
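In other words, the exclusion check is gated on the event name. A minimal sketch of that gating, assuming exclusion rules have already been compiled to `RegExp`s; `shouldSend` and the `Signup` event name are hypothetical.

```javascript
// Exclusion rules only suppress pageviews; any other event
// (custom goals, outbound-link clicks) is sent regardless of the path.
function shouldSend(eventName, pathname, excludeRules) {
  if (eventName !== "pageview") return true;
  return !excludeRules.some((re) => re.test(pathname));
}

// Equivalent of an "/admin/*" rule under the final glob semantics.
const adminRules = [/^\/admin\/[^\s\/]*\/?$/];

console.log(shouldSend("pageview", "/admin/1", adminRules)); // false
console.log(shouldSend("Signup", "/admin/1", adminRules));   // true
console.log(shouldSend("pageview", "/pricing", adminRules)); // true
```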

Vigasaurus · 2021-01-26T12:41:36Z

Also, reading more of the Twitter thread, the use case makes sense to me. I think excluding "/**" is definitely the way to do it for them if they want all the marketing pages on www (or no subdomain) to track normally; then on the app subdomain they run the full exclusion and fire custom events as needed.

metmarkosaric · 2021-01-26T12:43:10Z

thanks! so by excluding "/**", no visitors / pageviews will be counted from either the main domain or the subdomain?

Vigasaurus · 2021-01-26T12:44:45Z

Well, I assume they have separate HTML outputs for each subdomain, so they'd likely only put the exclude on one subdomain's HTML; the main domain's HTML (the marketing site) shouldn't have any exclusion.

metmarkosaric · 2021-01-26T12:46:33Z

ahh yes! i see. makes sense to me now. thanks!

Vigasaurus · 2021-01-26T12:47:48Z

Sweet! Feel free to ping me here or on the Twitter thread if they have other concerns/questions :D

metmarkosaric · 2021-01-26T12:50:40Z

they responded with more details so i think it's more clear now: https://twitter.com/tim_nolet/status/1354048423742341120

last question: if they do as described will the referral source still show for custom event conversions?

Vigasaurus · 2021-01-26T12:53:43Z

Mm, I don't think so, because from my understanding the referral source for custom events comes from linking the entry page and referrer for the entire session to the event's session, and if there are no pageviews there are no entry pages or referrers. But to be honest I haven't looked much at how the referrer and session logic works, so Uku would know better/more concretely.

metmarkosaric · 2021-01-26T12:54:17Z

ok cool, thanks!

ukutaht · 2021-01-26T13:12:48Z

I think the referrer stuff and everything should work normally. In their setup the normal funnel seems to be:

Generate some pageviews on www.domain.com
Signup on app.domain.com

As long as they're sending the stats to the same dashboard, the session should stay intact and the referrer source for the initial pageview on www.domain.com will be visible in the dashboard.

metmarkosaric · 2021-01-26T13:13:55Z

perfect! thanks for confirming!

Vigasaurus · 2021-01-26T13:18:28Z

Oh yeah, that makes sense! I guess the generated user hash would line up too, since the data-domain would be the same.

I guess the breakdown would happen if someone was referred directly to app.domain.com/signup, since that visit wouldn't get any referrer recorded on load, but that's probably a bit unlikely and could be planned around from a subdomain design standpoint.

…lausible#489)

* Adds data-exclude support and plausible_ignore support
* Splits exclusion into separate script option
* localStorage parsing upgrades
* Additional script type additions: I'm unsure about the formatting decision in tracker.ex - lmk.
* Adds new compiled files: This will certainly have conflicts with my other PRs related to the tracking scripts right now, I'll make one extra PR after both are done to ensure they're consolidated into the compiled scripts.
* Moves localStorage blocker out of special script
* Changelog
* Second thoughts on localStorage exclusion
* Updates `*` to not match `/` or whitespace
* Fix formatting
* Removes zero-length asterisks
* Adds support for double glob, zero-length replacements
* Update to reduce size+allow localStorage exclude

Co-authored-by: Uku Taht <Uku.taht@gmail.com>

ukutaht · 2021-03-19T11:15:38Z

A (potential) bug raised by one of our clients:
Exclusion rule: /l/*
Actual page path: /l/some-page-here

Expected to match but this is the result:

```
"/l/testing".match(new RegExp("/l/*".trim().replace(/\*\*/g, '.*').replace(/[^\.]\*/g, '[^\\s\/]*') + '\/?$'))
<- null
```

@Vigasaurus any idea what's going on here?

Vigasaurus · 2021-03-19T14:37:47Z

@ukutaht Ah yeah, that's actually a pretty big bug, which I caught when I was doing pageview goal globs but completely forgot to fix in the tracker script. Notice here that the regex for the second replace was appropriately changed to a negative lookbehind instead of the negated character class, but I didn't make this change in the tracker. The appropriate new version in the tracker would be:

```
"/l/testing".match(new RegExp("/l/*".trim().replace(/\*\*/g, '.*').replace(/(?<!\.)\*/g, '[^\\s\/]*') + '\/?$'))
<- Array [ "/l/testing" ]
```

which is the correct result. Notice the swap from .replace(/[^\.]\*/g, '[^\\s\/]*') to .replace(/(?<!\.)\*/g, '[^\\s\/]*'): the old character class also consumed the character before the *, so /l/* lost its second slash. Will PR shortly.
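Printing the intermediate pattern makes the bug visible. The sketch below compares the buggy replacement with a capture-group variant that puts the consumed character back via `$1`. A capture group is one lookbehind-free alternative, which matters because Safari did not support regex lookbehind until version 16.4. This is not necessarily what the follow-up PR shipped, and unlike the lookbehind version it leaves a `*` at the very start of a rule unreplaced. Function names are hypothetical.

```javascript
// Tracker-style rule compilation, split into the two replace steps.
// Buggy: [^\.] consumes the character before "*" and the replacement drops it.
function buggyBody(rule) {
  return rule.trim().replace(/\*\*/g, ".*").replace(/[^\.]\*/g, "[^\\s/]*");
}

// Capture-group variant: "$1" puts the consumed character back.
// Note: a "*" at the very start of a rule has no preceding character,
// so this variant leaves it unreplaced.
function fixedBody(rule) {
  return rule.trim().replace(/\*\*/g, ".*").replace(/([^\.])\*/g, "$1[^\\s/]*");
}

const matchesBody = (body, path) => new RegExp("^" + body + "/?$").test(path);

console.log(buggyBody("/l/*"));                            // /l[^\s/]*
console.log(fixedBody("/l/*"));                            // /l/[^\s/]*
console.log(matchesBody(buggyBody("/l/*"), "/l/testing")); // false
console.log(matchesBody(fixedBody("/l/*"), "/l/testing")); // true
```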

Adds data-exclude support and plausible_ignore support

b1b88d8

Vigasaurus marked this pull request as draft December 19, 2020 10:10

Vigasaurus mentioned this pull request Dec 21, 2020

Allows middle-click and ctrl+click outbound clicks to properly open as user intended #494

Merged


Vigasaurus added 4 commits December 21, 2020 17:51

Splits exclusion into separate script option

06bb3ad

localStorage parsing upgrades

52ef3a8

Additional script type additions

0437264

I'm unsure about the formatting decision in tracker.ex - lmk.

Adds new compiled files

e8e7d3b

This will certainly have conflicts with my other PRs related to the tracking scripts right now, I'll make one extra PR after both are done to ensure they're consolidated into the compiled scripts.

Vigasaurus marked this pull request as ready for review December 22, 2020 00:24

Vigasaurus added 3 commits December 21, 2020 23:12

Moves localStorage blocker out of special script

9807073

Changelog

5e35580

Second thoughts on localStorage exclusion

0e8dfb0

Vigasaurus mentioned this pull request Dec 22, 2020

Adds docs for client-side exclusions plausible/docs#45

Merged

Merge branch 'master' into 469-exclusions

123c0c0

Vigasaurus marked this pull request as draft December 29, 2020 07:51

Vigasaurus added 3 commits January 4, 2021 01:01

Updates * to not match / or whitespace

9472758

Merge branch 'master' into 469-exclusions

7dc2c76

Fix formatting

96b00ed

Removes zero-length asterisks

755e86c

Vigasaurus marked this pull request as ready for review January 4, 2021 08:06

Adds support for double glob, zero-length replacements

a754abb

ukutaht requested changes Jan 21, 2021


lib/plausible_web/plugs/tracker.ex (resolved)

tracker/src/plausible.js (outdated, resolved)

tracker/src/plausible.js (outdated, resolved)

Vigasaurus added 2 commits January 21, 2021 15:47

Update to reduce size+allow localStorage exclude

bf53142

Merge branch 'master' into 469-exclusions

1b9243b

ukutaht approved these changes Jan 22, 2021


Merge branch 'master' into 469-exclusions

23363e1

ukutaht merged commit fb7a3fe into plausible:master Jan 22, 2021

jonnitto mentioned this pull request Jan 24, 2021

New: Add exclusions option CarbonPackages/Carbon.Plausible#3

Merged

Vigasaurus mentioned this pull request Mar 19, 2021

Fixes exclude regex eating one character before single * #861

Merged
