This is a set of Python scripts which download all Dutch ebooks from Project Gutenberg, rename them to human-readable filenames, format them so they display well on my ebook reader, and toss them into subdirectories for easier navigation.

Written by Michiel Overtoom, firstname.lastname@example.org

How to use:

- Run bulkdownload.py to download the raw texts from a mirror of Project Gutenberg's eBook archive.
- Run gutenberg.py to reformat and rename the raw texts.
- Run toss.py to distribute them over subdirectories.

After that, upload them to your eBook reader, and enjoy!

In March 2016 I reworked this program since it's no longer allowed to scrape from Gutenberg's main web site. This newer version:

- downloads from a mirror instead of scraping from Gutenberg's main web site
- lets you specify the language
- has better input encoding detection
- outputs UTF-8 encoded text files
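To illustrate the encoding detection and UTF-8 output steps, here is a minimal sketch of one common approach: try a short list of candidate encodings in order and fall back to Latin-1. This is not the code from gutenberg.py; the names decode_text, convert_to_utf8 and CANDIDATE_ENCODINGS are hypothetical, and the choice of candidate encodings is an assumption about what older Gutenberg texts typically use.

```python
# Hypothetical sketch, not taken from gutenberg.py.
# Assumption: raw texts are either UTF-8 (possibly with a BOM) or a
# Western European single-byte encoding.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252")


def decode_text(raw: bytes) -> str:
    """Decode raw bytes by trying each candidate encoding in turn."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte value to a character, so this never fails.
    return raw.decode("latin-1")


def convert_to_utf8(src_path: str, dst_path: str) -> None:
    """Read a raw text file and write it back out as UTF-8."""
    with open(src_path, "rb") as src:
        text = decode_text(src.read())
    # Write bytes directly so line endings are left exactly as they were.
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-8"))
```

Trying UTF-8 first matters because it is strict: text in a single-byte encoding that contains accented letters will almost always fail to decode as UTF-8, whereas cp1252 would accept nearly any byte sequence and silently produce garbage.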