To run the program:
Note: Python 3 is required.
Architecture: The program is in four parts:
- Load the Word document (
- Convert it to extremely rough HTML+CSS (
- Apply a series of transformations, ranging from minor tweaks to
very fancy algorithms, to the HTML (
- Dump the resulting HTML document (
Most of the interesting work, and most of the bugs, are in
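
The four stages above can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions: every function name here (`load_document`, `convert_to_rough_html`, `apply_fixups`, `dump_html`, and the two toy fixups) is hypothetical, and the loader returns canned data rather than parsing a real Word file.

```python
from typing import Callable, Dict, List

# All names below are hypothetical stand-ins, not the program's real modules.

def load_document(path: str) -> Dict[str, List[str]]:
    """Stage 1 stand-in: pretend to parse a Word document into paragraphs."""
    return {"paragraphs": ["Hello", "", "World"]}

def convert_to_rough_html(doc: Dict[str, List[str]]) -> str:
    """Stage 2 stand-in: emit very rough HTML, one <p> per paragraph."""
    return "".join(f"<p>{text}</p>" for text in doc["paragraphs"])

def strip_empty_paragraphs(html: str) -> str:
    """A toy 'minor tweak' fixup."""
    return html.replace("<p></p>", "")

def wrap_in_body(html: str) -> str:
    """Another toy fixup."""
    return f"<body>{html}</body>"

# Stage 3 is an ordered list of transformations applied one after another.
FIXUPS: List[Callable[[str], str]] = [strip_empty_paragraphs, wrap_in_body]

def apply_fixups(html: str, fixups: List[Callable[[str], str]]) -> str:
    for fixup in fixups:
        html = fixup(html)
    return html

def dump_html(html: str) -> str:
    """Stage 4 stand-in: serialize the final document."""
    return "<!DOCTYPE html>\n<html>" + html + "</html>\n"

def run(path: str) -> str:
    doc = load_document(path)
    rough = convert_to_rough_html(doc)
    fixed = apply_fixups(rough, FIXUPS)
    return dump_html(fixed)
```

Keeping the fixups as an ordered list of small functions is one plausible way to organize the third stage; the real program may be structured differently.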
Fragility: The script is quite sensitive to the input document: it raises an exception and gives up if the document isn't exactly as expected. It's been hard to balance (a) being "liberal in what you accept" against (b) making sure that fixups do not break silently, but instead get the user's attention, when the input document changes in unexpected ways.
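
One way to get this "fail loudly" behaviour is for each fixup to check its expectations before changing anything. This is a sketch, not the script's actual code: `FixupError` and `replace_exactly_once` are hypothetical names, and the fixup is assumed to work on the HTML as a plain string.

```python
class FixupError(Exception):
    """Hypothetical error type: the input did not look the way a fixup expected."""

def replace_exactly_once(html: str, old: str, new: str) -> str:
    """Replace `old` with `new`, but only if `old` occurs exactly once.

    Zero matches usually means the input document changed; several matches
    mean the pattern is too loose. Either way, stop instead of guessing.
    """
    count = html.count(old)
    if count != 1:
        raise FixupError(
            f"expected exactly one occurrence of {old!r}, found {count}"
        )
    return html.replace(old, new)
```

For example, `replace_exactly_once("<p>Draft</p>", "Draft", "Final")` returns `"<p>Final</p>"`, while the same call on a document that no longer contains `"Draft"` raises `FixupError` rather than passing the document through unchanged.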
Debugging: If a directory named _fixup_log exists under the current directory, the script dumps the whole halfway-transformed document to a file in that directory after each fixup.
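
The logging behaviour described above could look roughly like the sketch below. Only the directory name `_fixup_log` comes from this README; the function names, the `log_dir` parameter, and the numbered file naming scheme are assumptions made for illustration.

```python
import os
from typing import Callable, List, Optional

LOG_DIR = "_fixup_log"  # directory name from the README; checked relative to the cwd

def log_fixup_result(index: int, name: str, html: str,
                     log_dir: str = LOG_DIR) -> Optional[str]:
    """Write the intermediate document to log_dir, if that directory exists.

    The file naming scheme (001-name.html) is a guess, not the script's own.
    """
    if not os.path.isdir(log_dir):
        return None  # logging is opt-in: no directory, no dump
    path = os.path.join(log_dir, f"{index:03d}-{name}.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path

def apply_fixups(html: str, fixups: List[Callable[[str], str]],
                 log_dir: str = LOG_DIR) -> str:
    """Run each fixup in order, dumping the document after every step."""
    for index, fixup in enumerate(fixups, start=1):
        html = fixup(html)
        log_fixup_result(index, fixup.__name__, html, log_dir)
    return html
```

With dumps numbered in application order, diffing two consecutive files shows exactly what a single fixup changed.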