Archive.org Ripper

This script lets you download books page by page from archive.org when no PDF link is available. After downloading all the pages separately, it stitches the images into a single PDF file. Any book with a loan period of less than 14 days is like this.
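The stitching step can be sketched roughly as follows. This is a minimal illustration, not the code in stitcher.py: it assumes the pages were saved as image files that Pillow can open, and the function name `stitch_pages` and its parameters are hypothetical.

```python
from pathlib import Path

from PIL import Image


def stitch_pages(image_paths, pdf_path):
    """Combine page images, in the given order, into one PDF file."""
    paths = [Path(p) for p in image_paths]
    if not paths:
        raise ValueError("no page images to stitch")

    # PDF output has no alpha channel, so convert every page to RGB first.
    pages = [Image.open(p).convert("RGB") for p in paths]
    first, rest = pages[0], pages[1:]

    # save_all=True together with append_images writes one PDF page per image.
    first.save(pdf_path, "PDF", save_all=True, append_images=rest)
    return Path(pdf_path)
```

The order of `image_paths` decides the page order, so the caller would need to sort the downloaded files by page number before passing them in.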

Credentials

The script needs your login credentials to borrow the book; after that, it runs on its own using your session. If you plan to use the script more than once, you can store your email and password in a config.py file (in the same directory) with the following structure:

config = {
    'email': 'your email',
    'password': 'your password'
}
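A script could read this file along the following lines. This is only a sketch: the function name `load_credentials` is hypothetical, and falling back to an interactive prompt when config.py is missing is an assumption about the behaviour, not something taken from the script itself.

```python
import importlib
from getpass import getpass


def load_credentials(module_name="config", ask=input, ask_secret=getpass):
    """Return (email, password) from config.py, or prompt if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
        config = module.config
        return config["email"], config["password"]
    except (ImportError, AttributeError, KeyError):
        # No usable config file: fall back to asking interactively (assumed behaviour).
        return ask("Email: "), ask_secret("Password: ")
```

Calling `load_credentials()` from the directory that contains config.py returns the stored pair; without the file it asks for the email and password, and `getpass` keeps the password from being echoed to the terminal.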

Do not use this program in an illegal manner. Thanks!

Screenshots

Current bugs

When downloading books with several hundred pages, after a while the downloaded images are only 4 KB in size and are not in a valid image format (UnidentifiedImageError). I have to write a function that detects this and starts over from the first page where the download went wrong.
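One possible shape for such a check is sketched below. It is not the planned implementation: it assumes the pages are saved as JPEG or PNG files and inspects only the leading bytes of each file instead of decoding it with Pillow, and the names `looks_like_image` and `first_bad_page` are hypothetical.

```python
# Leading bytes of the two formats assumed here (JPEG and PNG).
IMAGE_SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n")


def looks_like_image(path):
    """Return True if the file starts with a known JPEG or PNG signature."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    return header.startswith(IMAGE_SIGNATURES)


def first_bad_page(page_paths):
    """Return the index of the first page that is not an image, or None."""
    for index, path in enumerate(page_paths):
        if not looks_like_image(path):
            return index
    return None
```

If `first_bad_page` returns an index, the download could be resumed from that page. A signature check is cheap but shallow; fully decoding each file with Pillow would catch truncated images as well.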

Planned Features

Apply OCR on the stitched PDF
Search for books instead of entering the ID directly
GUI
Option to convert epub

