Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adds 'custom-location' script extension (in response to #136) #1398

Closed
wants to merge 3 commits into from

Conversation

EmilePerron
Copy link
Contributor

@EmilePerron EmilePerron commented Oct 20, 2021

Changes

This PR gives users the possibility of specifying a custom tracking location for their pages via a new custom-location script extension.

The main advantage of this addition is that it allows users to aggregate and/or redact URLs to improve the privacy and usefulness of their analytics. This helps to resolve the issues mentioned in #136.

As discussed with @ukutaht in #136, the implementation of this feature works as follows:

If you want to provide a plain value (for ex. using the templating language from a server-generated page) you can use the standard configuration syntax:

<script data-domain="my-domain.com" data-location="https://my-domain.com/my-custom-location" src="https://plausible.io/js/plausible.custom-location.js"></script>

And if you want to generate it client-side:

<script>window.myLocationGetter = function() { return location.href + 'foo';  }</script>
<script data-domain="my-domain.com" data-get-location="myLocationGetter" src="https://plausible.io/js/plausible.custom-location.js"></script>

The contract would be that data-get-location will provide the name of the function and if it exists of window, it will get executed and the return value used as the location value.

Tests

  • This PR does not require tests

Changelog

  • Entry has been added to changelog

Documentation

  • Docs have been updated

The documentation PR is here: plausible/docs#136

@EmilePerron
Copy link
Contributor Author

I implemented the basic feature and tested it locally, and it works fine.

I'll be adding the documentation and changelog for this soon,

However, I do have a few things I'd like to hear your thoughts about:

How would you prefer to have this interact with the exclusions extension?
My thoughts is that the exclusions should still be checking the page's regular location (as it currently does), not its custom location (if one is provided). My thought process behind this is that checking for exclusion on the custom location would require usage of URL, which:

  • would add weight to the script;
  • would prevent users from using characters that are not necessarily URL friendly, but that are likely to be used, such as curly braces (ex.: https://dummy.site/project/{id}/settings);
  • would require an alternative when combined with compat, which would make the script much heavier (and likely more complex).
    Plus, it could require an additional change to make for users who already have exclusions set up, and who would like to add custom locations (if the new custom locations don't match their existing URL exclusion patterns).

Are there tests for the tracker?
I looked around and couldn't find any tests for the tracking scripts, so I'm wondering if I should write tests for this or not.

I'll wait for your thoughts on these two things before I complete this.

Cheers!

@EmilePerron
Copy link
Contributor Author

I added the changelog entry as well as the documentation for this new feature (in the following draft PR plausible/docs#136).

The only thing left would be to define how this should interact with exclusions, as mentioned in my previous comment.

@ukutaht any thoughts?

@ukutaht
Copy link
Contributor

ukutaht commented Oct 26, 2021

@EmilePerron The implementation looks good. I think not interacting with exclusions in any way is perfectly fine as well. Simple is good :)

I'm really sorry we didn't drill into this conversation before, but are there any benefits to doing this on the client side other than the fact that it's more convenient to implement at the moment?

Next year I plan to add a message queue which will give us the option to process incoming data based on user-defined rules. So in theory this could all be done from your site settings AND have the aggregation apply backwards with the tick of a button.

Technically it's a lot more difficult to accomplish this on the backend but I feel like that's a better way to do it for the user. Keeping in mind that the majority are non-technical.

Looking at the code and the docs, I started doubting whether doing adding the option client-side is the right move if we're going to deprecate this some time next year. We can never delete stuff, everything we put out there has to be supported until the end of time.

Even worse, we might never implement the best user experience because there's a 'good enough' workaround to do it if you're a developer.

Now, maybe there's a situation where doing the aggregation on the client side makes sense and I'm all ears about that possibility. Maybe if someone has personal data in the URL and doesn't want it to hit our server ever? This would be an extremely niche use-case and honestly someone who puts personal data in the URL probably doesn't care about privacy in the first place.

What are your thoughts about this?

Again, I feel very guilty about you putting this effort in and me bringing this up so late and I'm sorry about that. Supporting the whole project has been difficult lately and I've grown a lot more vigilant about protecting our support time. I'm concerned about putting temporary workarounds in place that have to be supported forever. I hope this provides some perspective.

@EmilePerron
Copy link
Contributor Author

@ukutaht Hey there!

No worries, I understand that maintaining a long-term project, especially one with a large commercial user base, is not something that can easily be done by a few people. Delays and accidental omissions are to be expected.

I agree that setting up aggregation/redaction from the site settings in Plausible would be much more convenient and user friendly. If there are already plans in place for a feature that would make that possible, I say we can close this MR and we can indeed wait for this upcoming implementation. It'll be much neater, and way more accessible.

As for personal information in the URLs, I was mostly referring to "semi-private" information, such as project or user IDs that could theoretically be used along with other metrics to work our way back to a specific user in the app, but I'm not sure how deep privacy laws go on that type of subject. But even in theory, that only applies to people who already have access to lots of data in the app, so I don't think that's really a concern/liability on Plausible's side.

TL;DR: I agree, let's scrap this MR (and the associated documentation MR) and go for the better feature. No hard feelings 😄

@metmarkosaric
Copy link
Contributor

thanks for your contribution @EmilePerron, appreciate it!

about user friendliness: i'm a non technical user so had a look through the instructions to see if i would be able to follow them on my own site (or whether i could help someone that emails us asking for help with this feature):

option 1: seems straightforward. basically i put a different Plausible snippet on pages i want to "rewrite" and in the data-location attribute i write the url i want this specific page to be displayed as in Plausible in Top Pages report. do i understand that correctly?

the main issue i see is how to add the custom-location snippet on only some pages while keeping the default snippet on all of the other pages on the site. this approach we haven't had until now. on CMS such as wordpress you place the same snippet in one place and it goes to all pages automatically and i don't know how to change that from page to page. but perhaps wordpress and other major CMS users are not the audience for this feature so it might be ok without going into this too much!?

option 2: if i understand correctly i could place that second line on all of my pages instead of the default snippet and then just add the first line on pages i want to "rewrite"? is that what "If the function is not defined, Plausible will fallback on using the current page's URL" means?

and then on pages i want to "rewrite" in addition to adding that first line, i change the myLocationGetter in the first line to what i want this page to be displayed as in Plausible in Top Pages report? is that correct?

the following part of the docs i'm not sure i understand how they fit especially as it looks like you already need to insert the specific code on all of the pages you want to "rewrite" and cannot just do this in one place for multiple pages:

All entries must begin with a /, and should not include the trailing slash as we account for this automatically.

  • Asterisks (*) expand to any stretch (of length >=0) of the page path and can be on either end or in the middle of any entry, but cannot be in the place of slashes.
  • Double asterisks (**) expand to any stretch (of length >=0) of the page path, can be on either end or in the middle of any entry, and can represent any characters, even slashes.

Also below this line there are no examples:

See below for examples of common page use cases and how they would function.

@ukutaht
Copy link
Contributor

ukutaht commented Oct 27, 2021

@EmilePerron For context, we discussed this with Marko and decided that if this is doable for him, it's probably doable for most of our users (he's not a developer but has experience running a wordpress site). I also re-read some of the discussion in the thread and many people were interested in a frontend solution to this as well.

So we're not quite ready to scrap this PR. If the developer/user experience is acceptable, we will merge this. We also have a need for the same feature on our live demo.

In terms of UX, I think the custom location thing sounded better than it actually is. It clearly wasn't easy to use for Marko. I have an alternative idea: adding some sort of route syntax would cover >99% use cases where URL aggregation is desirable. So common patterns like:

/users/1
/users/2
etc

Could be aggregated simply by providing:

<script data-url-patterns="/users/:id" src="plausible.url-patterns.js" ... ></script>

This sort of extension would automatically group the URLs into a single aggregated form. Any pageviews that match the pattern would be sent to the backend literally as /users/:id.

More examples:

/users/1/settings
/users/2/settings

<script data-url-patterns="/users/:id/settings" src="plausible.url-patterns.js" ... ></script>

Multiple rules could be combined with a comma for example:

<script data-url-patterns="/users/:id, /sites/:site_id/memberships/:id" src="plausible.url-patterns.js" ... ></script>

We already have something similar where patterns are matched on the frontend with exclusions: https://plausible.io/docs/excluding-pages.

Thoughts? @metmarkosaric @EmilePerron

@metmarkosaric
Copy link
Contributor

making it work like the exclusion extension sounds better to me! exclusion extension may not be the most popular feature we have but i don't remember anyone ever having an issue trying to figure out how to use it

@EmilePerron
Copy link
Contributor Author

@ukutaht @metmarkosaric The data-url-patterns option does sound like a more user-friendly approach for simpler use cases.

However, it lacks some of the leeway that the two options in place provide.

Also, in the case of a large website or a web app, specifying every possible URL pattern that needs to be aggregated in the data-url-patterns attribute could quickly get unwieldy and hard to maintain. In these scenarios, the more technical options currently implemented in the PR would likely be more appropriate.

My personal opinion is that the URL pattern option you mention, which works like the exclusion extension, should indeed be added and emphasized in the documentation as the best method for 99% of users. The two other options should still be left in, but could be clearly marked as being "advanced options" for technical users with many different URLs to aggregate.

I'm not usually a fan of having multiple different ways of implementing the same feature, but in this case I think each option has its own target audience and use cases.

Let me know what you think!

P.S.
As for the docs, it's my bad - I used another similar documentation page to get started and must've forgotten to remove some of the original content. I'll get that fixed.

@metmarkosaric
Copy link
Contributor

hmm what are the use cases that are not covered by the simple option but are covered by the advanced one? Plausible is focused on being simple so having a simple and advanced option of a very niche feature doesn't sound ideal. if we end up adding things like that in the future we risk ending up like the Google Analytics monster which we don't want

@EmilePerron
Copy link
Contributor Author

@metmarkosaric I'm mostly thinking of web apps and large websites with user-restricted sections.

For example, the project for which I want this feature is a small web app MVP at the moment.
But already, to aggregate URLs with the data-url-patterns, I would have the following Plausible script:

<script data-url-patterns="/project/:id/, /project/:id/checklist, /project/:id/testing, /project/:id/settings, /project/:id/settings/automated-testing, /team/:id, /team/:id/settings, /team/:id/leave" data-domain="yourdomain.com" src="https://plausible.io/js/plausible.aggregate-urls.js"></script>

It's not horrible, but it's not exactly ideal either:

  • If I change a section's URL, I must always remember to update the Plausible snippet as well.
  • Over the next year, for my web app, I expect the number of URL patterns will double (due to new features and sections being added). It'll therefore get more unwieldy and difficult to maintain overtime. And I'm just a small creator, with a fairly small application.

These are not major pain points, but I expect that they would likely be much more annoying for larger projects. And both of these pain points could be alleviated or completely resolved with a very simple technical implementation of data-location or data-get-location. Larger websites and apps of the sort usually have at least one developer behind them, so the technical knowledge isn't really a problem for them.

Still, I understand that your goal is to keep Plausible as simple as possible.

I'm voicing my point of view on this, but I absolutely understand if you decide to implement solely the simpler version of this feature. In any case, I'd be happy to help and to be able to use it in my projects.

Cheers!

@metmarkosaric
Copy link
Contributor

thanks for explaining @EmilePerron! i think as the simple method doesn't restrict any use case, it's fine to have it as the only option. it's best not to start allowing simple/advanced options for different features and always stick to the simplest one that most of our users will be able to understand/use. we already have an example with the exclusion script working this exact way and nobody has had issues with it, so i say we continue with this approach

@EmilePerron
Copy link
Contributor Author

Alright, simplest solution it is! I'll get to work on implementing this as soon as I get some time :)

@metmarkosaric
Copy link
Contributor

there's no hurry @EmilePerron and thanks very much for your contribution!

@ukutaht
Copy link
Contributor

ukutaht commented Nov 1, 2021

@EmilePerron I did just have an idea how to override the URL on the client side. I'm not sure how I didn't think of this before. I recently added a manual script extension that doesn't trigger pageviews automatically. So all we need is a way for the plausible function to accept a parameter to override the URL like so:

<script defer data-domain="yourdomain.com" src="https://plausible.io/js/plausible.manual.js"></script>
<script>
// define the `plausible` function to manually trigger events
window.plausible = window.plausible || function() { (window.plausible.q = window.plausible.q || []).push(arguments) }

// Trigger pageview with a custom location
plausible('pageview'. {u: '/my/custom/location'})

// Alternatively, use a function
plausible('pageview'. {u: getMyLocation()})
</script>

After writing this, it seems like the obvious way to accomplish a custom url. What do you think?

We could start with this and see what people think. It does require development skill but adding something more streamlined on top of this would not invalidate being able to override the url of an event. It's just a very basic and natural feature so it doesn't become an ultimatum between the two approaches we've discussed so far.

@EmilePerron
Copy link
Contributor Author

@ukutaht Great catch!

This definitely fills the need for a custom URL feature for larger websites and applications!

Plus, as you say, it doesn't conflict with the more streamlined option of data-url-patterns, so it's a win-win.

Should I still get to work on the data-url-patterns? Or would you prefer simply documenting this URL overriding option with the manual extension and waiting to see if there's still a demand for the simpler option afterwards?

@ukutaht
Copy link
Contributor

ukutaht commented Nov 2, 2021

Should I still get to work on the data-url-patterns? Or would you prefer simply documenting this URL overriding option with the manual extension and waiting to see if there's still a demand for the simpler option afterwards?

Overriding the url variable is not currently possible so you if you want to help you can implement and document that :) Otherwise I'm not sure where I can fit it into on my todo list.

I don't think we need to start with the patterns yet. Let's see if there's demand once we explain this option,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants