HTML cleaner for Rheinwerk (ex-Galileo) openbooks
This is a tool for cleaning up Rheinwerk openbooks (formerly known as Galileo openbooks) before converting them to EPUB or PDF format.
Current state of development: v1.2.0-SNAPSHOT is feature complete, i.e. it can download, MD5-verify, unpack and convert all 37 openbooks available at release time.
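To illustrate the MD5 verification step mentioned above, here is a minimal sketch using only built-in Java classes. It is not the openbook cleaner's actual code: the class name `Md5Check` and its methods are hypothetical, and it assumes the expected checksum is available as a hex string.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: names are hypothetical, not taken from the openbook cleaner sources.
class Md5Check {

    // Reads the stream to its end and returns the MD5 digest as 32 lowercase hex characters.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Compares a file's MD5 checksum with an expected hex string, ignoring case and surrounding whitespace.
    static boolean matches(Path archive, String expectedMd5) throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(archive)) {
            return md5Hex(in).equalsIgnoreCase(expectedMd5.trim());
        }
    }
}
```

A downloaded archive could then be checked with something like `Md5Check.matches(Paths.get("book.zip"), expectedHash)` before it is unpacked. The file name and variable here are placeholders.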
History: If you want to know details about what has changed in which version, please take a look at the change log.
Download: A precompiled, executable JAR file is available here.
Dependencies: Openbook cleaner was developed in Java 7. It also uses a few open source libraries:
- jsoup 1.8.3 for parsing the "dirty" openbook HTML, selecting and editing DOM elements, removing navigation elements, ads and other types of clutter, and finally writing a clean, pretty-printed HTML document back to disk
- JOpt Simple 4.9 for parsing command-line parameters and showing a help page (usage info)
- Apache Commons Compress 1.10 for unzipping downloaded openbook archives. Note: Once Java 7 is available on macOS, this library might be removed again in favour of the built-in Java classes.
- XStream 1.4.8 for parsing the config.xml file containing openbook metadata
- AspectJ 1.8.13 for cross-cutting concerns like logging, timing, tracing which are not part of the main application logic. This helps to keep the core code clean and free from scattered code addressing secondary concerns.
- IDE: I originally started developing this project with Eclipse but have switched to IntelliJ IDEA, which I personally prefer because of its superior Maven support. On the other hand, Eclipse has better AspectJ integration, so if you want to change any of the aspect code, you might want to use Eclipse anyway.
- Git support is needed in your IDE of choice (or at least from the command line) if you want to interact with the source code repository and not just download a ZIP archive from GitHub.
- Maven is used for dependency management and the whole build and packaging cycle. Any Maven 3 version should be safe, but I recommend using the latest stable version. It is totally up to you whether you build from the command line or via IDE integration. In IntelliJ IDEA you should install the original Maven plugins; for Eclipse you need m2e and also the AspectJ Maven Configurator (can be installed from http://dist.springsource.org/release/AJDT/configurator/).
- AspectJ support is available for both Eclipse (AJDT, AspectJ Development Tools) and IntelliJ IDEA. I do not know about Netbeans or other IDEs though. So please make sure to install the corresponding IDE plugins for AspectJ support if you want to edit the aspect code comfortably. But this is optional, because Maven can still build the project, fetching all necessary dependencies including AspectJ.
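As a side note on the built-in Java classes mentioned for the unzipping step in the list above, this is a rough sketch of how an archive could be unpacked with `java.util.zip` alone. It is only an illustration under my own assumptions, not the project's code (which uses Apache Commons Compress), and the class name `UnzipSketch` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Illustrative only: not taken from the openbook cleaner sources.
class UnzipSketch {

    // Extracts all entries of a ZIP archive into targetDir, creating directories as needed.
    static void unzip(Path archive, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../evil" that would end up outside the target directory.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies only the current entry; the ZIP stream itself stays open.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

The sketch needs Java 7 because of `java.nio.file` and try-with-resources.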
Because I might later want to use this Git repository as a refactoring showcase for my developer workshops, I am going to refactor step by step, documenting progress in small, fine-grained Git changesets, so that I can review the evolution of the code with others later on.
As you can see, I am mostly doing this little project for myself, but I like to share the results and receive some user feedback. I hope the openbook cleaner is useful to you. Enjoy! :-)