This is a scraper that saves the contents of the online textbook "Introductory Chemistry: A Foundation", 8th edition, as PDFs. The PDFs generated by this tool are intended for personal, non-commercial use only, by people who already have access to the online textbook ;).
Before starting, please note that it can take upwards of 30 minutes for the scraper to complete. It is not an instant process.
- Download and install PhantomJS for your operating system. You can also install it with npm if you prefer. As root:
npm i -g phantomjs-prebuilt
- Download or clone this repo (click "Clone or download", then "Download ZIP"). If you downloaded the zip, extract it before use.
- Run
phantomjs main.js yourusername@example.com pAs5w0rD true
from the cloned directory to begin scraping. Replace the example username and password with your own. Remove the trailing true if you do not want answers shown by default.
Note: If you see "JSON Parse error" messages during scraping, do not worry. This is a bug in MindTap and does not affect the scraping process.
After following these steps, a separate PDF file for every "page" in the online textbook will be located in the pdfs directory.
Sometimes the program will hang at the table of contents step, or all the PDFs will be copies of the chapter 1 table of contents. If either of these happens, just restart the script.
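If you would rather not watch for a hang yourself, the restart can be automated. The following is a minimal sketch in Python, not part of this repo: `run_with_restarts` is a hypothetical helper that reruns a command when it exceeds a time limit or exits with a non-zero code. It assumes a hang shows up as the process simply never finishing, and it cannot detect the case where all PDFs are copies of the chapter 1 table of contents, so check the output yourself.

```python
import subprocess
import sys


def run_with_restarts(command, timeout_seconds, max_attempts=3):
    """Run `command`, restarting it on timeout or non-zero exit.

    Returns the 1-based number of the attempt that succeeded,
    or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child process if the timeout expires.
            result = subprocess.run(command, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: timed out, restarting", file=sys.stderr)
            continue
        if result.returncode == 0:
            return attempt
        # Assumption: a failed run reports a non-zero exit code.
        print(
            f"attempt {attempt}: exit code {result.returncode}, restarting",
            file=sys.stderr,
        )
    return None
```

For example, `run_with_restarts(["phantomjs", "main.js", "yourusername@example.com", "pAs5w0rD", "true"], timeout_seconds=90 * 60)` would give each attempt 90 minutes. That limit is a guess based on the roughly 30-minute run time mentioned above, so adjust it to your connection.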
You can use whatever tool you want to combine the PDFs, but if you're on Linux (it might work on macOS too), an easy way is to run the following command from the pdfs directory (make sure Ghostscript is installed):
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dFastWebView -dPDFSETTINGS=/screen -sOutputFile=../introductory-chemistry-a-foundation.pdf $(ls | sort -V)
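The `sort -V` part of that command puts the files in numeric order, so that a page numbered 10 comes after a page numbered 2. If your `sort` lacks `-V`, the same ordering can be sketched in Python. This is an illustration only: `natural_key`, `build_gs_command`, and `merge_pdfs` are hypothetical helpers, and the example file names are assumptions, since the names the scraper actually produces are not described here. The Ghostscript flags are copied from the command above.

```python
import re
import subprocess
from pathlib import Path


def natural_key(name):
    """Sort key that compares runs of digits as integers, not as text."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capture group alternates text (even index) and digits (odd index).
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def build_gs_command(pdf_names, output):
    """Build the Ghostscript argument list with inputs in natural order."""
    ordered = sorted(pdf_names, key=natural_key)
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
        "-dFastWebView", "-dPDFSETTINGS=/screen",
        f"-sOutputFile={output}", *ordered,
    ]


def merge_pdfs(directory, output):
    """Merge every .pdf in `directory` into `output` (requires gs on PATH)."""
    names = [p.name for p in Path(directory).glob("*.pdf")]
    # Run from inside the directory so the bare file names resolve.
    subprocess.run(build_gs_command(names, output), cwd=directory, check=True)
```

Calling `merge_pdfs("pdfs", "../introductory-chemistry-a-foundation.pdf")` from the repo directory should write the combined file next to the pdfs directory, as the shell command does. Passing the names as a list also avoids problems with spaces in file names.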