rberenguel/urls-to-epub-with-pandoc


Convert some (particular) URLs to an EPUB

I wanted to read several of the pages in the examples category of the High Scalability blog, enough of them that I felt a bit sick when I started opening tabs.

Since I recently got a Kindle Paperwhite that I'm really liking (here's another recent EPUB-related project for it), I thought: can't I just convert these pages into a book?

The process

  • A list of URLs is available in the file urls, which is parsed via…
  • The doit.awk script (name inspiration), by running ./doit.awk urls (you need gawk).

The script, in turn:

  • Reads URLs line by line from the urls file;
  • Creates a neat orderable filename based on each URL;
  • Creates a pandoc command to fetch from the URL into the filename;
  • Runs a clean.awk script that purges anything in the markdown file outside the body of the post;
  • Sleeps 15 seconds to avoid being nasty to the destination server;
  • Creates an EPUB from the cleaned-up markdowns sorted in lexicographical order.
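
As a rough illustration of the steps above, here is a minimal dry-run sketch. It is not the actual doit.awk: the function name plan_commands, the NNN-slug.md naming scheme, the book.epub output name and the exact pandoc flags are all assumptions made for the example. It prints the commands it would run instead of executing them, and it leaves out the clean.awk step because that one is site-specific.

```shell
# Hypothetical dry run: print the commands rather than executing them.
# Reads URLs (one per line) from the given files, or from stdin.
plan_commands() {
  awk '
    # Skip blank lines so they do not consume a sequence number.
    NF == 0 { next }
    {
      url = $0
      # Use the last non-empty path segment as a slug (assumed scheme).
      n = split(url, parts, "/")
      slug = parts[n]
      if (slug == "") slug = parts[n - 1]
      # Drop a trailing extension such as .html
      sub(/\.[A-Za-z0-9]+$/, "", slug)
      # Zero-pad the counter so lexicographic order matches the order in urls.
      file = sprintf("%03d-%s.md", ++count, slug)
      # \047 is a single quote, used to quote the URL in the printed command.
      printf "pandoc -f html -t markdown -o %s \047%s\047\n", file, url
      # Be polite to the destination server between fetches.
      print "sleep 15"
    }
    END {
      # One final command: build the EPUB from the sorted markdown files.
      if (count > 0) print "pandoc -o book.epub [0-9][0-9][0-9]-*.md"
    }
  ' "$@"
}
```

Running plan_commands urls prints one pandoc line and one sleep line per URL, followed by a single EPUB-building line. Swapping the printf and print calls for system() calls would turn the plan into the real thing.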

Why did you use AWK to do this?

I like AWK, and iterating through "arrays" in bash scripts or makefiles is a pain (imagine Captain Kirk screaming xargs instead of Khan), whereas iterating through lines in AWK and running a command on each is natural. For brevity I have skipped checking for errors in the pandoc commands (system does return the command's exit status, though). But seriously, AWK is very convenient when you need something quick and dirty and you know bash is going to be a pain.
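
If you do want the error checking, a sketch along these lines could work. It is not part of doit.awk: run_checked is a hypothetical helper that treats each input line as a command, runs it with system, and uses the return value to count failures.

```shell
# Hypothetical helper: run each input line as a command and report failures.
run_checked() {
  awk '
    NF == 0 { next }
    {
      # system() returns the exit status of the command it ran.
      status = system($0)
      if (status != 0) {
        printf "command failed (status %d): %s\n", status, $0 > "/dev/stderr"
        failures++
      }
    }
    # Exit non-zero if any command failed, so callers can react.
    END { exit (failures > 0) }
  ' "$@"
}
```

For example, printf 'true\nfalse\n' | run_checked reports the failing false on stderr and exits with status 1.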

Some caveats

The parsing in clean.awk is tied specifically to the formatting of High Scalability. If you want to use your own URLs, comment out all the system commands in doit.awk (# starts a comment in AWK), run the pandoc extraction manually and check what the markdown looks like, then adjust the parsing accordingly.
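
To give an idea of what "parse accordingly" can look like, here is a minimal sketch of one common approach: keep only the lines between two marker lines. It is not the actual clean.awk. The function name extract_body and both markers are hypothetical, and you would pick the markers after inspecting the markdown pandoc produces for your site.

```shell
# Hypothetical cleaner: print only the lines strictly between the first
# line starting with $1 and the next line starting with $2.
# Reads markdown on stdin; the marker lines themselves are dropped.
extract_body() {
  awk -v first="$1" -v last="$2" '
    # Stop printing when the closing marker begins a line.
    index($0, last) == 1  { inside = 0 }
    # Print everything while inside the body.
    inside                { print }
    # Start printing after the opening marker line.
    index($0, first) == 1 { inside = 1 }
  '
}
```

Usage would be something like extract_body '# Post title' 'Reader Comments' < raw.md > clean.md, where both marker strings and file names are placeholders.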

I have only checked the first 3-4 posts for consistency in the generated markdown/EPUB. Since they were OK, my expectation is that all of them are OK, which is as good as it gets until I read them all. Caveat emptor.
