
Remove trackers embedded in URLs #47

Closed
wants to merge 6 commits into from


Boruch-Baum
Contributor

w3m--url-strip-queries and its associated defcustom
w3m-strip-queries-alist allow users to remove bogus queries from URLs.
These queries don't query anything; instead, they send tracking data
about the user and/or the user's behavior over the network.

The default is pre-configured with a rather tame but common example
that tells websites that the user is arriving there from a newsfeed
instead of directly or via a search engine.
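The mechanism described above can be sketched roughly as follows. This is a minimal, hypothetical reimplementation, not the emacs-w3m source: the name `my-strip-queries` is made up, and it assumes each alist element is a list `(URL-REGEXP QUERY-REGEXP)`, as in the configuration example later in this thread, with every match of QUERY-REGEXP deleted. The final tidying of leftover `?`/`&` separators is also an assumption about the desired output.

```elisp
;; Hypothetical sketch, not the emacs-w3m implementation.
;; ALIST elements are assumed to be (URL-REGEXP QUERY-REGEXP).
(defun my-strip-queries (url alist)
  "Return URL with tracking queries removed according to ALIST."
  (dolist (elem alist)
    ;; Only touch URLs selected by this element's URL-REGEXP.
    (when (string-match-p (car elem) url)
      ;; Delete every match of QUERY-REGEXP (literal empty replacement).
      (setq url (replace-regexp-in-string (cadr elem) "" url t t))))
  ;; Assumed tidying step: "?&" -> "?", "&&" -> "&", and drop a
  ;; trailing "?" or "&" left behind by the deletions.
  (setq url (replace-regexp-in-string "[?]&+" "?" url t t))
  (setq url (replace-regexp-in-string "&&+" "&" url t t))
  (replace-regexp-in-string "[?&]+\\'" "" url t t))

(my-strip-queries "https://example.com/story?utm_source=rss&id=42"
                  '(("^https?://" "&?utm_source=[^&]+")))
;; => "https://example.com/story?id=42"
```

URLs whose scheme does not match URL-REGEXP (for example an `ftp://` URL with the same query) pass through unchanged.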

Honestly, I thought this had already been implemented somewhere, but I
couldn't find it, and it was easy to write...

@Boruch-Baum
Contributor Author

With the default settings, the feature can be tested by having an RSS reader use emacs-w3m to open an article on Slashdot.

@emacs-w3m-notifier

emacs-w3m-notifier commented Jun 14, 2019 via email

@Boruch-Baum
Contributor Author

You're quite welcome, Vladimir.

Here are some of my notes for possible future related work. If you (or
anyone else reading this) know how to un-obfuscate or de-shorten URLs,
please let me know.

  • Modify the cdr spec of the data structure's elements to allow for a
    function instead of or in addition to a regex string. This would
    allow for features such as logging or replacing occurrences of a
    query, instead of just deleting a query.

    • General purpose logging could be used to collect tracker queries
      to be acted upon in the future.
  • Address other forms of embedded trackers:

  • Collect more common tracker queries for use in the default data
    structure.

    • My personal internet usage has limited exposure, so maybe post to
      the mailing list asking for volunteers to supply additional
      regexes.
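The first idea above, letting an element carry a function instead of or in addition to a regexp, might look something like the following. Everything here is hypothetical: the names `my-apply-query-spec`, `my-log-queries` and `my-tracker-log` are invented for illustration, and the element shape `(URL-REGEXP ACTION...)` is just one possible design, not an existing emacs-w3m format.

```elisp
;; Hypothetical extended format: (URL-REGEXP ACTION...), where each
;; ACTION is a regexp (its matches are deleted) or a function that
;; takes the URL and returns a new URL.
(defvar my-tracker-log nil
  "Query strings recorded by `my-log-queries' (hypothetical).")

(defun my-log-queries (url)
  "Record the query part of URL in `my-tracker-log' and return URL."
  (when (string-match "[?]\\([^#]*\\)" url)
    (push (match-string 1 url) my-tracker-log))
  url)

(defun my-apply-query-spec (url alist)
  "Apply the actions of every ALIST element whose regexp matches URL."
  (dolist (elem alist url)
    (when (string-match-p (car elem) url)
      (dolist (action (cdr elem))
        (setq url (if (functionp action)
                      (funcall action url)          ; e.g. log or rewrite
                    (replace-regexp-in-string action "" url t t)))))))

;; Log the raw query first, then delete the tracker.
(my-apply-query-spec "https://example.com/a?id=1&utm_source=rss"
                     '(("^https?://" my-log-queries "&?utm_source=[^&]+")))
;; => "https://example.com/a?id=1"
;; my-tracker-log is now ("id=1&utm_source=rss")
```

Keeping regexps and functions in the same action list means existing regexp-only entries would keep working, while a logging function can be dropped in front of them to collect candidate trackers.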

(note: this is a test of using magit forge M-x forge-create-post to
respond to a github comment from within emacs)

@Boruch-Baum
Contributor Author

Here are two more regexes, this time from an email referrer with an
explicit ID token:

(setq w3m-strip-queries-alist
  '(("^https?://.*" "&?utm_source=[^&]+")
    ("^https?://.*" "&?utm_medium=[^&]+")
    ("^https?://.*" "&?utm_campaign=[^&]+")
    ("^https?://.*" "&?email_source=[^&]+")
    ("^https?://.*" "&?email_token=[^&]+")))
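To see what these five patterns leave behind, here is a standalone sketch that applies them in order to a made-up email-referrer URL. The helper `my-apply-patterns` and the sample URLs are hypothetical, and the `^https?://.*` check is skipped because every entry uses it.

```elisp
;; Hypothetical helper: delete every match of each query pattern above,
;; in order, using a literal empty replacement.
(defun my-apply-patterns (url)
  (dolist (re '("&?utm_source=[^&]+"
                "&?utm_medium=[^&]+"
                "&?utm_campaign=[^&]+"
                "&?email_source=[^&]+"
                "&?email_token=[^&]+")
              url)
    (setq url (replace-regexp-in-string re "" url t t))))

(my-apply-patterns
 "https://example.com/post?email_source=digest&id=7&email_token=abc123")
;; => "https://example.com/post?&id=7"
```

Because the optional `&` only consumes a separator *before* the tracker, a tracker that is the first parameter leaves `?&` behind, which servers generally tolerate but is not a clean URL. Also, `[^&]+` stops only at `&`, so `...&utm_source=rss#comments` loses its `#comments` fragment along with the tracker; `[^&#]+` would avoid that.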

+ Strip them at the first opportunity

+ The process handler functions were calling the function
  w3m-w3m-dump-extra and using its return values instead of
  respecting our stripped URLs, so the stripping must be performed
  again for each call to w3m-w3m-dump-extra.
@yamaoka
Contributor

yamaoka commented Jul 3, 2019

Merged to the modernize201906 branch. Thanks.

@yamaoka
Contributor

yamaoka commented Jul 8, 2019

Closing, as this has been merged into the git master.

@yamaoka yamaoka closed this Jul 8, 2019