Skip to content

othellodesutter/archiveripper

 
 

Repository files navigation

Archive.org Ripper

This script lets you download books page-by-page from archive.org in the event that there is no PDF link. After the script has downloaded all the pages separately, it stitches the images to one pdf file. Any book with a <14 day loan period is like this, as you can see:

Credentials

The script needs your login credentials to borrow the book, then it will run on its own using your session. If you plan on using the script more than once, you can store your email and password in a config.py file (in the same directory) with the following structure:

config = {
    'email': 'your email',
    'password': 'your password'
}

Do not use this program in an illegal manner. Thanks!

Screenshots

Current bugs

  • When downloading books with multiple hundreds of pages, after a while, the downloaded images are only 4Kb and are not the usual image format (UnidentifiedImageError). I have to write a function which detects this and start over from the first page where the download went wrong.

Planned Features

  • Apply OCR on the stitched PDF
  • Searching for books instead of inputting id directly
  • GUI
  • Option to convert epub

About

Download borrowed books from archive.org

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%