
I started with this:

```sh
#!/bin/sh

while read -r url; do
  curl -sSfL -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36" "$url" >/dev/null 2>&1 || echo "$url"
done
```

But suffice it to say, it wasn't finishing any time soon (my urls file is big). Instead of mucking with a process pool in shell (or even reaching for GNU parallel), I took the chance to survey and learn the Rust async I/O and HTTP stack. I don't think it's fully free of pain points yet.
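For flavor, here is a minimal sketch of the core idea, assuming tokio, reqwest, and futures; the crate choices, timeout, and concurrency level are illustrative, not necessarily what this repo actually uses. Requests run concurrently through a buffered stream, which also yields results in input order:

```rust
// A minimal sketch, not this repo's actual implementation:
// check URLs concurrently, print the faulty ones in input order.
use std::time::Duration;

use futures::stream::{self, StreamExt};
use tokio::io::{AsyncBufReadExt, BufReader};

#[tokio::main]
async fn main() {
    // Timeout and concurrency are arbitrary illustrative choices.
    let client = reqwest::Client::builder()
        .timeout(Duration::from_secs(10))
        .build()
        .expect("failed to build HTTP client");

    // Collect URLs, one per line, from stdin.
    let mut lines = BufReader::new(tokio::io::stdin()).lines();
    let mut urls = Vec::new();
    while let Ok(Some(line)) = lines.next_line().await {
        urls.push(line);
    }

    // `buffered(64)` drives up to 64 requests at once but yields
    // results in input order, so the output order is preserved.
    let mut results = stream::iter(urls)
        .map(|url| {
            let client = client.clone();
            async move {
                let ok = match client.get(&url).send().await {
                    Ok(resp) => resp.status() == reqwest::StatusCode::OK,
                    Err(_) => false,
                };
                (url, ok)
            }
        })
        .buffered(64);

    // Print only the faulty URLs.
    while let Some((url, ok)) = results.next().await {
        if !ok {
            println!("{}", url);
        }
    }
}
```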

Anyway, this takes a bulk list of URLs on stdin and prints only those that are faulty (e.g. timeouts, or anything other than a 200 OK response).

```sh
cat urls | link-checker | tee faulty
```

Original order is preserved in the output despite the inherent asynchronicity, so to instead get back the links that are valid, one could use comm from coreutils, e.g.:

```sh
cat urls | link-checker > faulty
comm -23 --nocheck-order urls faulty > valid
```
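A caveat: even with `--nocheck-order` suppressing the warning, comm's merge still assumes lexicographically sorted input, so on an unsorted urls file the subtraction can misfire. A sketch of an order-independent alternative, using grep's fixed-string, whole-line matching:

```sh
# Keep only the lines of urls that do not appear verbatim in faulty.
grep -Fxv -f faulty urls > valid
```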
