This repository has been archived by the owner on Feb 16, 2021. It is now read-only.

Plugin to clean URLs #7

Closed
wooorm opened this issue Feb 22, 2020 · 7 comments
Labels
🧘 status/waiting This may go somewhere but needs more information

Comments

@wooorm
Member

wooorm commented Feb 22, 2020

Something like https://github.com/Smile4ever/Neat-URL

Would be nice to run this on user-generated content, e.g., on the unified explore page where we render readmes, to remove tracking for our users.

@wooorm wooorm added the 🧘 status/waiting This may go somewhere but needs more information label Feb 22, 2020
@ericdmoore

@wooorm Did you think this might be:

  a) removing all query params?
  b) removing only query params that match a "removal list"?
  c) removing all query params, except for domains that are on a "leave alone list"?

@wooorm
Member Author

wooorm commented Oct 15, 2020

I was thinking of b)! to remove pesky analytics and such. Although, that might come with c) too.
a) would be easier, but probably breaks stuff because too much is removed? 🤔
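A minimal sketch of how the three options could differ in practice, assuming the WHATWG `URL` API is available. The names `Strategy`, `CleanOptions`, and `cleanUrl` are hypothetical and not part of any existing rehype plugin.

```typescript
// Hypothetical names throughout: nothing here is an existing rehype API.
type Strategy = 'all' | 'remove-list' | 'all-except-kept-domains';

interface CleanOptions {
  strategy: Strategy;
  remove?: string[]; // parameter names to strip, used by 'remove-list' (option b)
  keepDomains?: string[]; // hostnames left untouched, used by option c
}

function cleanUrl(input: string, options: CleanOptions): string {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    // Relative or malformed URLs are returned unchanged.
    return input;
  }

  if (options.strategy === 'all') {
    // a) drop the whole query string
    url.search = '';
  } else if (options.strategy === 'remove-list') {
    // b) drop only the listed parameters
    for (const name of options.remove ?? []) {
      url.searchParams.delete(name);
    }
  } else if (!(options.keepDomains ?? []).includes(url.hostname)) {
    // c) drop the whole query string unless the host is on the keep list
    url.search = '';
  }

  return url.toString();
}
```

With option a), a link such as `https://example.com/page?utm_source=x&id=1` also loses `id`, which is the "too much is removed" risk mentioned above. Option b) keeps `id` and removes only `utm_source`.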

@ericdmoore

Background

I was thinking of using this as an exercise in building a plugin. Just wanted some direction on implementation ideas if you had given it any thought.

Overview

I agree on using both a "keep" and a "remove" list. Since the "bad actors" list can grow without limit, I'd suggest starting with the "leave alone" domains list, which seems more stable. That list would likely start with things like the URL patterns from shields.io, etc.

The removal list might start with:

  • utm_source
  • utm_medium
  • utm_campaign
  • utm_term
  • utm_content
  • plus other easily researched query-string-based tracking params

Would you recommend adding anything to vfile.messages or vfile.info for "could not clean up this link" messages? I am still learning where the conventional place for something like that is.

@wooorm
Member Author

wooorm commented Oct 15, 2020

Nice: fun, and useful!

If I were to do it, I’d look at Neat URL, linked above. E.g., they seem to have the patterns! Interestingly, it seems they’re not checking domains, just stripping params?

I don’t think you should add messages to vfile by default. My use case was to run it on a ton of user-generated content.
But you could do it, maybe optionally? But then I’m wondering: what use is the message that some GA stuff was removed? Do you expect false positives?
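One possible shape for this, as a sketch only: walk a hast-like tree, strip the `utm_*` parameters listed earlier in the thread from `a` elements, and report changes only when a callback is passed. `Node`, `stripTracking`, `cleanLinks`, and `onClean` are hypothetical names; a real plugin would likely use the `hast` typings and a tree-walking utility, and could forward the callback to vfile messages if it wanted them.

```typescript
// Minimal hast-like node shape, declared locally to keep the sketch self-contained.
interface Node {
  type: string;
  tagName?: string;
  properties?: Record<string, unknown>;
  children?: Node[];
}

// Tracking parameters taken from the list earlier in this thread.
const trackingParams = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];

// Hypothetical helper: returns the cleaned URL, or null when nothing
// was removed or the value cannot be parsed as an absolute URL.
function stripTracking(href: string): string | null {
  let url: URL;
  try {
    url = new URL(href);
  } catch {
    return null;
  }
  let changed = false;
  for (const name of trackingParams) {
    if (url.searchParams.has(name)) {
      url.searchParams.delete(name);
      changed = true;
    }
  }
  return changed ? url.toString() : null;
}

// Hypothetical transformer: mutates the tree in place.
// `onClean` is optional, so reporting stays off by default.
function cleanLinks(tree: Node, onClean?: (before: string, after: string) => void): void {
  const props = tree.properties;
  if (tree.type === 'element' && tree.tagName === 'a' && props) {
    const href = props['href'];
    if (typeof href === 'string') {
      const cleaned = stripTracking(href);
      if (cleaned !== null) {
        props['href'] = cleaned;
        if (onClean) onClean(href, cleaned);
      }
    }
  }
  for (const child of tree.children ?? []) {
    cleanLinks(child, onClean);
  }
}
```

Calling `cleanLinks(tree)` without a callback fits the bulk user-generated-content case, while passing one lets a caller decide whether a removed parameter is worth a message.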

@wooorm
Member Author

wooorm commented Oct 15, 2020

Oh wait, I guess their @ syntax optionally includes a domain: cvid@bing.com removes that parameter on Bing addresses!
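A rough sketch of how a `param@domain` rule like this could be interpreted. It is based only on the `cvid@bing.com` example above, not on Neat URL's actual parser; `Rule`, `parseRule`, `matchesDomain`, and `applyRules` are hypothetical names, and matching subdomains such as `www.bing.com` is an assumption.

```typescript
// A rule is a parameter name, optionally scoped to a domain: "cvid@bing.com".
interface Rule {
  param: string;
  domain?: string;
}

function parseRule(rule: string): Rule {
  const at = rule.indexOf('@');
  return at === -1
    ? {param: rule}
    : {param: rule.slice(0, at), domain: rule.slice(at + 1)};
}

// Assumption: a rule for "bing.com" also applies to subdomains like "www.bing.com".
function matchesDomain(hostname: string, domain: string): boolean {
  return hostname === domain || hostname.endsWith('.' + domain);
}

function applyRules(input: string, rules: string[]): string {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    // Relative or malformed URLs are returned unchanged.
    return input;
  }
  for (const rule of rules.map(parseRule)) {
    // Unscoped rules apply everywhere; scoped rules only on a matching host.
    if (rule.domain === undefined || matchesDomain(url.hostname, rule.domain)) {
      url.searchParams.delete(rule.param);
    }
  }
  return url.toString();
}
```

With the rules `['cvid@bing.com', 'utm_source']`, `cvid` is removed from Bing links only, while `utm_source` is removed from any link.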

@ericdmoore

Thanks. Great nudge.

@ChristianMurphy
Member

Thanks for starting the discussion @wooorm !
We're in the process of unifying ideas with discussions; see unifiedjs/collective#44.
If you'd like to continue this thread or start a new one, https://github.com/rehypejs/rehype/discussions/categories/ideas will be the home for ideas going forward.


3 participants