Removes various usually-irrelevant URLs and pieces (params, fragments)

spekulatius/untrashurls

untrashurls: Untrashes URL strings

Removes various usually irrelevant pieces from URLs read from a file or from STDIN:

  • http://....:80/ => http://..../
  • https://....:443/ => https://..../
  • ?utm_* => /dev/null
  • ?att_* => /dev/null

The final result will automatically be deduplicated.
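
As a rough illustration of the rules above, the sketch below approximates them with `sed` and `sort` alone. It is not how untrashurls itself works: the real script relies on trurl, and a regex is a much cruder URL parser. The function name `untrash_sketch` is made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative stand-in for the cleanup rules, NOT the real untrashurls.
# Assumes one absolute http(s) URL per line on STDIN.
untrash_sketch() {
  sed -E \
    -e 's#^(http://[^/:?]+):80(/|\?|$)#\1\2#' \
    -e 's#^(https://[^/:?]+):443(/|\?|$)#\1\2#' \
    -e ':a' \
    -e 's#([?&])(utm|att)_[^&]*&#\1#' \
    -e 'ta' \
    -e 's#[?&](utm|att)_[^&]*$##' \
  | LC_ALL=C sort -u
}
# Rules 1 and 2 drop the default port for the scheme.
# The :a ... ta loop removes utm_*/att_* parameters that are followed by
# another parameter; the last rule removes one left at the end of the query.
# sort -u deduplicates the cleaned lines.
```

For example, piping `https://example.com:443/a?utm_source=x&id=1` through `untrash_sketch` yields `https://example.com/a?id=1`.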

Usage

cat ./lots-of-urls | untrashurls

or

untrashurls --file ./lots-of-urls

filter-static-urls: The urinteresting-mode

Inspired by Tomnomnom's urinteresting, a helper script is included to produce similar output:

cat ./lots-of-urls | untrashurls | filter-static-urls

or

untrashurls --file ./lots-of-urls | filter-static-urls

This drops any static assets. For the full list of dropped extensions, check the script.

More detailed checks are provided by urinteresting.
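
The idea behind the static-asset filter can be sketched with a single `grep`. The extension list below is an assumption chosen for illustration and will not match the list in the actual filter-static-urls script; the function name `filter_static_sketch` is likewise hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: drops lines whose path ends in a common static-asset
# extension, optionally followed by a query string or fragment.
# The extension list is an assumption, not the script's real list.
filter_static_sketch() {
  grep -Eiv '\.(css|js|png|jpe?g|gif|svg|ico|woff2?|ttf)([?#]|$)'
}
```

With this sketch, `https://example.com/app.js` and `https://example.com/logo.PNG?v=2` are dropped, while `https://example.com/api/users.json` is kept because `.json` is not in the list.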

Installation

You'll need trurl to run untrashurls. You can build and install trurl yourself:

git clone git@github.com:curl/trurl.git /tmp/trurl
cd /tmp/trurl
make
mkdir -p "$HOME/.local/bin"
mv /tmp/trurl/trurl "$HOME/.local/bin/"
cd -
rm -rf /tmp/trurl
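
After installing, it may be worth confirming that trurl is reachable, since untrashurls cannot run without it. The snippet below is a minimal sketch that assumes the directory you moved the binary into is on your `PATH`; `have_tool` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the named command can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool trurl; then
  echo "trurl found: $(command -v trurl)"
else
  echo "trurl not found on PATH" >&2
fi
```

If the check fails, add the install directory to `PATH` or move the binary to a directory that is already on it.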
