I wanted to read several of the pages in the Examples category of the High Scalability blog, enough that I felt a bit sick when I started opening tabs. Since I recently got a Kindle Paperwhite I'm really liking (here's another recent EPUB-related project for it), I thought: can't I just convert these pages into a book?
- A list of URLs is available in the file `urls`, which is parsed via…
- …the `doit.awk` script (name inspiration), by running `./doit.awk urls` (you need gawk).

The script, in turn:

- Reads URLs line by line from the `urls` file;
- Creates a neat, orderable filename based on each URL;
- Creates a `pandoc` command to fetch from the URL into the filename;
- Runs a `clean.awk` script that purges anything in the markdown file outside the body of the post;
- Sleeps 15 seconds to avoid being nasty to the destination server;
- Creates an EPUB from the cleaned-up markdowns, sorted in lexicographical order.
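To give an idea of the shape of such a driver, here is a minimal sketch of the fetch loop in dry-run form: it prints the commands instead of executing them, and the `pandoc` flags and the filename scheme are my assumptions, not necessarily what `doit.awk` actually does.

```shell
# Dry-run sketch of a doit.awk-style driver: reads URLs from stdin,
# builds an orderable filename per URL, and prints the pandoc command
# the real script would pass to system() before sleeping 15 seconds.
printf '%s\n' 'https://highscalability.com/example-post/' |
awk '{
    url = $0
    slug = url
    gsub(/^https?:\/\//, "", slug)    # drop the scheme
    gsub(/[^A-Za-z0-9]+/, "-", slug)  # everything else becomes dashes
    # Zero-padded line number keeps lexicographic order == input order.
    file = sprintf("%03d-%s.md", NR, slug)
    printf "pandoc -f html -t markdown -o %s %s\n", file, url
}'
```

Replacing the final `printf` with `system(cmd)` (after building `cmd` with `sprintf`) and adding `system("sleep 15")` turns the dry run into a real fetcher.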
I like AWK, and iterating through "arrays" in bash scripts or makefiles is a pain (imagine Captain Kirk screaming `xargs` instead of Khan), whereas iterating through lines in AWK and running a command is natural. For brevity I have skipped checking for errors in the `pandoc` commands (you can get the exit code from `system` as a return value, though), but seriously, AWK is very convenient when you have something quick and dirty where you know `bash` is going to be a pain.
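That error check, had I bothered, would be a one-liner: `system()` returns the exit status of the command it runs. A toy example (not from the post's scripts), with `false` standing in for a failing `pandoc` invocation:

```shell
awk 'BEGIN {
    # system() returns the exit status of the spawned command,
    # so a failed pandoc run could be caught like this:
    status = system("false")   # stand-in for a pandoc command
    if (status != 0)
        print "command failed with status " status
}'
```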
The parsing in `clean.awk` is tied specifically to High Scalability's formatting: if you want to use your own URLs, comment out all the `system` commands in `doit.awk` (`#` starts a comment in AWK), run the `pandoc` extraction manually and check what the markdown looks like, then adjust the parsing accordingly.
I have only checked the first 3-4 posts for consistency in the generated markdown/EPUB. Since they were OK, my expectation is that all of them are, which is as good as it gets until I read them all. Caveat emptor.