Download, convert and organize Gutenberg books for eBook Readers

README

This is a set of Python scripts which download all
Dutch ebooks from Project Gutenberg, rename them to
human-readable filenames, format them so they display well
on my ebook reader, and toss them into subdirectories for
easier navigation.

Written by Michiel Overtoom, motoom@xs4all.nl

How to use:

- Run bulkdownload.py to download the raw texts from a mirror of Project Gutenberg's eBook archive.
- Run gutenberg.py to reformat and rename the raw texts.
- Run toss.py to distribute them over subdirectories.
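
The three steps above could be chained with a small driver script. This is only a sketch: it assumes the scripts sit in the current directory and run without command-line arguments, which this README does not state. The names run_pipeline and build_commands are made up for this example and are not part of the repository.

```python
import subprocess
import sys

# Script names are taken from the steps above; the order matters.
PIPELINE = ["bulkdownload.py", "gutenberg.py", "toss.py"]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one command line per script, using the current interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order and stop at the first failure.

    Assumption: the scripts need no arguments. check=True makes
    subprocess.run raise CalledProcessError on a non-zero exit code,
    so a failed download is never followed by a reformat step.
    """
    for command in build_commands(scripts):
        print("Running", command[1])
        runner(command, check=True)
```

Calling run_pipeline() from the directory that holds the scripts runs all three steps in sequence.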

After that, upload them to your eBook reader, and enjoy!

In March 2016 I reworked this program, because scraping Gutenberg's main
web site is no longer allowed. This newer version:

- downloads from a mirror instead of scraping Gutenberg's main web site
- lets you specify the language
- detects the input encoding more reliably
- outputs UTF-8 encoded text files
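
To illustrate the last two points, here is one simple way to guess an input encoding and write UTF-8 output. It is a sketch only, not necessarily what gutenberg.py does: the candidate list and the function names decode_text and convert_to_utf8 are assumptions made for this example.

```python
from pathlib import Path

# Assumed candidate encodings, tried in order. UTF-8 goes first because
# it rejects most byte sequences that are not valid UTF-8.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def decode_text(raw, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes raw."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reached when a custom candidate list contains no match;
    # latin-1 maps every byte to a character, so this cannot fail.
    return raw.decode("latin-1"), "latin-1"


def convert_to_utf8(source, target):
    """Read source with a guessed encoding and write target as UTF-8."""
    text, encoding = decode_text(Path(source).read_bytes())
    Path(target).write_text(text, encoding="utf-8")
    return encoding
```

Trying encodings in a fixed order is crude: a file in some other 8-bit encoding will still decode as cp1252 or latin-1, just with the wrong characters. A statistical detector would handle such files better, at the cost of an extra dependency.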