Extract documents list from Amazon Kindle webpage and save into a
How to use:
- Download (or build)
- Navigate to the Amazon "Manage Your Content and Devices" page
- Set "Show" to "Docs"
- Scroll down until you reach the end of your list (or see the Show More button)
- Save the HTML (File -> Save Page As..., using Complete Webpage). Replace the default filename with a simple one, e.g. 1.
- If more docs are pending, press the Show More button at the bottom of the page and repeat from Step 4
- When all pages have been saved, open a command line and run the conversion:
    Jareks-MBP:Downloads jhartman$ java -jar KindleLibrary.jar 1.htm
    Amazon book list extractor
    Elements found:400
    Saving 1.html
    Saving 1.txt
    Saving 1.xml
    Done!
Pass the HTML files saved earlier as arguments.
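The sample run suggests a simple naming pattern: the input 1.htm produces 1.html, 1.txt and 1.xml, that is, the input's base name with three fixed extensions. The sketch below reproduces only that pattern, which makes it easier to predict which files a run may overwrite. The class and method names (`OutputNames`, `baseName`, `outputNames`) are hypothetical and are not taken from KindleLibrary itself. The assumption that every input follows the same pattern rests on this single example.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates the naming pattern seen in the sample run,
// not the actual KindleLibrary implementation.
class OutputNames {

    // Strip any directory part and the last extension: "Downloads/1.htm" -> "1".
    static String baseName(String fileName) {
        String name = Paths.get(fileName).getFileName().toString();
        int dot = name.lastIndexOf('.');
        // dot > 0 keeps names without an extension (and dot-files) unchanged.
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Assumed output set, based on the "Saving ..." lines of the sample run.
    static List<String> outputNames(String inputFile) {
        String base = baseName(inputFile);
        List<String> names = new ArrayList<>();
        for (String ext : new String[] {"html", "txt", "xml"}) {
            names.add(base + "." + ext);
        }
        return names;
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "1.htm";
        for (String name : outputNames(input)) {
            System.out.println("Saving " + name);
        }
    }
}
```

If the pattern holds, saving a page as 1.html rather than 1.htm means the generated 1.html has the same name as the saved input, so a distinct name or a copy of the original may be worth keeping.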
Example of output
xml looks as below
Libraries & References
- jsoup Java HTML Parser
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.