gwern/archive-text-urls

Parse freeform text files looking for plausible URLs to archive (abandoned)

This repository was archived by its owner on Aug 15, 2023, and is now read-only.
It occurred to me once that it might be neat to have a CLI tool which would parse a text file for strings like "http://" and extract the longest valid URL starting at each one. For example, given "see http://www.google.com ", the correct URL can be recovered simply by starting at "http" and consuming characters until the first space, since a literal space is invalid in a URL unless escaped as "%20". So I did a little work on such a tool. It didn't work well.
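The naive approach described above can be sketched in a few lines of Haskell: split the text on whitespace and keep tokens that begin with a URL scheme. This is only an illustration of the idea, not code from the abandoned tool; the function name and the scheme list are my own assumptions.

```haskell
-- Hypothetical sketch of the naive "eat until whitespace" extractor.
import Data.List (isPrefixOf)

-- Split on whitespace and keep tokens that look like URLs.
-- Relies on the fact that a literal space is invalid in a URL,
-- so `words` terminates each candidate at the right place.
extractUrls :: String -> [String]
extractUrls = filter isUrl . words
  where
    isUrl w = any (`isPrefixOf` w) ["http://", "https://"]
```

This works on clean input like the example, but it is exactly the approach that "didn't work well" in practice: URLs followed by punctuation ("see http://www.google.com.") or wrapped in brackets come out mangled.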

My ultimate solution was to realize that I only cared about the URLs in my Markdown files, and to sit down and write a Pandoc script to parse the Markdown and extract URLs. See <http://www.gwern.net/haskell/link-extractor.hs>.
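The Pandoc approach might look something like the following sketch: parse the Markdown into Pandoc's AST, then walk it with `query`, collecting the target of every `Link` inline. This is my reconstruction under the pandoc library's API, not the actual contents of link-extractor.hs, which may differ.

```haskell
-- Hypothetical sketch: extract link targets from a Markdown file on stdin.
import Text.Pandoc
import Text.Pandoc.Walk (query)
import qualified Data.Text as T
import qualified Data.Text.IO as T

-- Collect the URL out of every Link inline in the document.
extractLinks :: Pandoc -> [T.Text]
extractLinks = query go
  where
    go (Link _ _ (url, _)) = [url]
    go _                   = []

main :: IO ()
main = do
  txt <- T.getContents
  doc <- runIOorExplode (readMarkdown def txt)
  mapM_ T.putStrLn (extractLinks doc)
```

Because the Markdown parser has already resolved what is and is not a link, this sidesteps all the delimiter-guessing that made the freeform-text approach unreliable.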
