Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Browse files

cl-typesetting:

Added Klaus Weidner's (X)HTML to pdf converter in the contrib directory.

git-svn-id: http://www.fractalconcept.com:8000/public/open-source/cl-typesetting@86 9d29c65d-f3d6-0310-ab0c-b43ff62e96ec
  • Loading branch information...
commit 1bcda5526d2e19d5b5dda3fec80e67bb996d4cc5 1 parent f3983cf
marc authored
BIN  contrib/xhtml-renderer/fop.jpg
View
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
265 contrib/xhtml-renderer/html-sample.html
View
@@ -0,0 +1,265 @@
+<html><head>
+ <title>HTML sample file</title></head>
+ <body>
+ <a name="tomjones">
+ </a><h1><a name="tomjones">Sample text from Henry Fielding's <cite>Tom Jones</cite></a></h1>
+<a name="tomjones"> </a><p><a name="tomjones">Here's a short excerpt from a great 18th Century British
+ novel. Enjoy!</a></p>
+<a name="tomjones"> </a><h2><a name="tomjones"><b>Book I.</b> Containing as Much of the Birth of the Foundling
+ as Is Necessary or Proper to Acquaint the Reader with in the
+ Beginning of This History</a></h2>
+<a name="tomjones"> </a><h3><a name="tomjones"><b>Chapter VII.</b> Containing Such Grave Matter, That the Reader
+ Cannot Laugh Once Through the Whole Chapter, Unless Peradventure He
+ Should Laugh at the Author</a></h3>
+<a name="tomjones"> </a><p><a name="tomjones">WHEN Jenny appeared, Mr. Allworthy took her into his study, and
+ spoke to her as follows: "You know, child, it is in my power as a
+ magistrate, to punish you very rigorously for what you have done;
+ and you will, perhaps, be the more apt to fear I should execute that
+ power, because you have in a manner laid your sins at my door.</a></p>
+<a name="tomjones"> </a><p><a name="tomjones">"But, perhaps, this is one reason which hath determined me to act
+ in a milder manner with you: for, as no private resentment should
+ ever influence a magistrate, I will be so far from considering your
+ having deposited the infant in my house as an aggravation of your
+ offence, that I will suppose, in your favour, this to have proceeded
+ from a natural affection to your child, since you might have some
+ hopes to see it thus better provided for than was in the power of
+ yourself, or its wicked father, to provide for it. I should indeed
+ have been highly offended with you had you exposed the little wretch
+ in the manner of some inhuman mothers, who seem no less to have
+ abandoned their humanity, than to have parted with their chastity.
+ It is the other part of your offence, therefore, upon which I intend
+ to admonish you, I mean the violation of your chastity; -- a crime,
+ however lightly it may be treated by debauched persons, very heinous
+ in itself, and very dreadful in its consequences....</a></p>
+<a name="tomjones"> </a><h1><a name="tomjones">Sample text to illustrate HTML elements</a></h1>
+<a name="tomjones"> </a><p><a name="tomjones">Here are some HTML elements that should put FOP through its paces.</a></p>
+<a name="tomjones"> </a><h2><a name="tomjones">Basic HTML formatting</a></h2>
+<a name="tomjones"> </a><p><a name="tomjones"><i>Now</i> is the time for all good <b>men and women</b> to come
+ to the aid of <u>the party</u>. The <tt>quick brown fox</tt> jumped
+ over the <strong>lazy dog</strong>. Every <em>good boy</em> deserves
+ fudge. Jackdaws <strike>love</strike> like my big sphinx of quartz.</a></p>
+<a name="tomjones"> </a><h2><a name="tomjones">Lists</a></h2>
+<a name="tomjones"> </a><p><a name="tomjones">The previous section featured a number of text effects:</a></p>
+<a name="tomjones"> </a><ul>
+<a name="tomjones"> <li><b>Bold</b> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ <li><u>Underlined</u> text</li>
+ <li><tt>Teletype</tt> text</li>
+ </a></ul>
+<a name="tomjones"> </a><p><a name="tomjones">Here they are again, ranked according to how I like 'em:</a></p>
+<a name="tomjones"> </a><ol>
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><p><a name="tomjones">Finally, let's define these things in a definition list, just to
+ have something else to write about.</a></p>
+<a name="tomjones"> </a><dl>
+<a name="tomjones"> <dt><b>Bold</b></dt>
+ <dd>Text written in a <b>thicker font</b>.</dd>
+ <dt><em>Emphasized</em></dt>
+ <dd><em>Emphasized text</em>, usually written in an italicized font.</dd>
+ <dt><i>Italic</i></dt>
+ <dd>Text written in an <i>italicized font</i>.</dd>
+ <dt><strike>Strikethrough</strike></dt>
+ <dd>Text with a <strike>line drawn through it</strike>.</dd>
+ <dt><strong>Strong</strong></dt>
+ <dd><strong>Strongly emphasized</strong> text, usually written in
+ a bold font.</dd>
+ <dt><tt>Teletype</tt></dt>
+ <dd>Text written in a <tt>monospaced</tt> font.</dd>
+ <dt><u>Underlined</u></dt>
+ <dd>Text with a <u>line drawn under it</u>.</dd>
+ <dd>Here's a second definition of the term, just to test the stylesheet.
+ The second and any subsequent definitions under the same term should
+ appear a half-line below the previous definition.</dd>
+ </a></dl>
+<a name="tomjones"> </a><p><a name="tomjones">This lovely document was produced by the Apache XML Project's FOP:</a></p>
+<a name="tomjones"> <img src="fop.jpg" height="50" width="150">
+ </a><h2><a name="tomjones">More lists</a></h2>
+<a name="tomjones"> </a><p><a name="tomjones">Here are some advanced lists. This one uses uppercase Roman numerals:</a></p>
+<a name="tomjones"> </a><ol type="I">
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><p><a name="tomjones">This list uses lowercase Roman numerals starting at 17:</a></p>
+<a name="tomjones"> </a><ol start="17" type="i">
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><p><a name="tomjones">This one uses lowercase alpha characters and starts at 30:</a></p>
+<a name="tomjones"> </a><ol start="30" type="a">
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><p><a name="tomjones">This list uses uppercase alpha characters and starts at 12:</a></p>
+<a name="tomjones"> </a><ol start="12" type="A">
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text and a sublist:
+ <ol type="a">
+ <li>An item</li>
+ <li>Another item
+ <ul>
+ <li>&lt;ul&gt; item a</li>
+ <li>&lt;ul&gt; item b</li>
+ <li>&lt;ul&gt; list item c, which contains two &lt;hr&gt;s
+ and an embedded list
+ <hr>
+ <ol start="37">
+ <li>Deeply nested item one</li>
+ <li>Deeply nested item two</li>
+ <li><a href="#tomjones">The excerpt from <cite>Tom
+ Jones</cite></a> is a link to an earlier section of
+ this document.</li>
+ </ol>
+ <hr>
+ </li>
+ <li>&lt;ul&gt; item d</li>
+ </ul>
+ </li>
+ <li>Yet another item</li>
+ <li>Notice that these items (and in fact this whole list) are
+ indented from the start of the other list items. Notice also
+ that the text wraps the way you'd think it would, using the
+ settings of the internal list, not the external list.</li>
+ <li>Our final item</li>
+ </ol>
+ </li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><h1><a name="tomjones">Tables</a></h1>
+<a name="tomjones"> </a><p><a name="tomjones">Mapping HTML table tags to XSL-FO tables has some difficulties.
+ The biggest problems are supporting the <code>cols</code> attribute
+ of the <code>&lt;table&gt;</code> element, and supporting the
+ <code>rowspan</code> and <code>colspan</code> attributes of the
+ <code>&lt;td&gt;</code> element. Here's a table that illustrates
+ all of the things we support:</a></p>
+<a name="tomjones"> </a><table border="1" cols="200">
+ <tbody><tr>
+ <th>State</th>
+ <th>Abbr</th>
+ </tr>
+ <tr>
+ <td>North Carolina</td>
+ <td>NC</td>
+ </tr>
+ <tr>
+ <td>California</td>
+ <td>CA</td>
+ </tr>
+ <tr>
+ <td>Tennessee</td>
+ <td>TN</td>
+ </tr>
+ <tr>
+ <td rowspan="2">Texas <br><i>and</i> <br>Connecticut</td>
+ <td>TX</td>
+ </tr>
+ <tr>
+ <td>CT</td>
+ </tr>
+ <tr>
+ <td colspan="2" align="right">That's all!</td>
+ </tr>
+ </tbody></table>
+<a name="tomjones"> </a><h1><a name="tomjones">More HTML we support</a></h1>
+<a name="tomjones"> </a><p><a name="tomjones">This section goes through more HTML tags **NED: AGAIN** we support.</a></p>
+<a name="tomjones"> </a><h2><a name="tomjones">Anchor tags</a></h2>
+<a name="tomjones"> </a><a name="anchors"> </a>
+ <p>Supporting links is very important to us here at
+ <a href="http://www.ibm.com/developerWorks">developerWorks</a>.
+ This sample document contains both internal and external links; if
+ you don't believe me, just read the excerpt from <a href="#tomjones">
+ <cite>Tom Jones</cite></a> earlier in this document. </p>
+ <p>This is <em>not</em> my address:</p>
+ <address>
+ Mrs. Mary McGoon
+ <br>
+ 901 Main Street
+ <br>
+ Kenosha, WI 38492
+ </address>
+ <p>Now for a paragraph with <b>boldfaced text</b>, <big>big text,
+ <big>bigger text, <big>biggest text,</big></big></big> and
+ <br>three <br>line <br>breaks.</p>
+ <blockquote>
+ When in the Course of human events, it becomes necessary
+ for one people to dissolve the political bands which have
+ connected them with another, and to assume among the powers
+ of the earth, the separate and equal station to which the
+ Laws of Nature and of Nature's God entitle them, a decent
+ respect to the opinions of mankind requires that they should
+ declare the causes which impel them to the separation.
+ </blockquote>
+ <center>
+ <font color="red" face="sans-serif" size="+2">
+ This text is big and centered <br>
+ so it will stand out.
+ </font>
+ </center>
+ <h1>An &lt;h1&gt;</h1>
+ <p>Blah blah blah</p>
+ <h2>An &lt;h2&gt;</h2>
+ <p>Blah blah blah</p>
+ <h3>An &lt;h3&gt;</h3>
+ <p>Blah blah blah</p>
+ <h4>An &lt;h4&gt;</h4>
+ <p>Blah blah blah</p>
+ <h5>An &lt;h5&gt;</h5>
+ <p>Blah blah blah</p>
+ <h6>An &lt;h6&gt;</h6>
+ <p>Blah blah blah</p>
+ <p><nobr>Now here's a really, really, really long sentence that's
+ coded with the &lt;nobr&gt; tag. This should run on and on and
+ on and on and eventually it should run all the way off the page
+ and into the void.</nobr> This text appears after the &lt;nobr&gt; tag.</p>
+ <h1>A short code listing</h1>
+ <p>Here's a simple Java program, formatted with the &lt;pre&gt; element:</p>
+ <pre>public class Sample
+{
+ public static void main(String [] args)
+ {
+ System.out.println("Hello, World!");
+
+ for (int i = 0; i &lt; 5; i++)
+ {
+ System.out.print("How");
+ System.out.print("dy! ");
+ }
+
+ System.out.println();
+ }
+}
+ </pre>
+ <h2>More HTML elements</h2>
+ <p>This paragraph tests out the <samp>sample element
+ (&lt;samp&gt;)</samp>, <small>small text (&lt;small&gt;)</small>,
+ <sub>sub</sub>script text, <sup>super</sup>script text, a
+ <kbd>keyboard</kbd> command, and a <var>variable</var> name.</p>
+ </body></html>
83 contrib/xhtml-renderer/html2pdf
View
@@ -0,0 +1,83 @@
+#!/bin/sh
+#
+# Convert HTML documents to PDF
+#
+# Copyright (C) 2004 Klaus Weidner <kweidner@pobox.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+# Configure this to point to the location of the saved memory image.
+# Generate it as follows:
+#
+# clisp -x "(asdf::oos 'asdf:load-op :xml-render) (tt::save-image)"
+# gzip -9 clisp-xml-render.mem
+# mv clisp-xml-render.mem.gz ~/lisp/images/clisp
+#
+IMAGE="$HOME/lisp/images/clisp/clisp-xml-render.mem.gz"
+
+# Location of GNU CLISP binary
+#CLISP=/usr/lib/clisp/full/lisp.run
+CLISP=clisp
+
+# WARNING: creates fixed-name temp files in current working directory.
+# Don't use it if current dir is writable for untrusted users.
+
+# Run through W3C "tidy" utility to clean up noncompliant HTML and
+# convert to XHTML. See http://tidy.sourceforge.net/
+#
+# Not needed if input is already valid XHTML. Comment out the next
+# line if you don't want to use it.
+[ -z "$TIDY" ] && TIDY=$(which tidy)
+
+# Optional: clisp generates uncompressed PDF. Use the "PDF Toolkit"
+# (pdftk) to compress it. See http://www.accesspdf.com/pdftk/
+#
+# Comment out the next line if you don't want to use it.
+# FIXME: pdftk fails on output generated by v66 cl-pdf ?!
+#[ -z "$PDFTK" ] && PDFTK=$(which pdftk)
+
+### End of user configurable section
+
+Usage () {
+ echo "Usage: $(basename $0) FILE.html
+Creates FILE.pdf in current working directory." >&2
+ exit 1
+}
+
+[ $# -eq 1 ] || Usage
+
+IN="$1"
+OUT=$(basename "$IN" .html).pdf
+
+if [ -x "$TIDY" ]
+then
+ XML=$(basename "$IN").tmp.xhtml
+ "$TIDY" --quiet yes --show-warnings 0 -wrap 0 -asxhtml "$IN" > "$XML"
+else
+ XML="$IN"
+fi
+
+# Do the conversion
+$CLISP -q -q -M $IMAGE -- "$XML" "$OUT"
+
+[ -x "$TIDY" ] && rm -f "$XML"
+
+[ -x "$PDFTK" ] && {
+ "$PDFTK" "$OUT" output "$OUT.new" compress && mv "$OUT.new" "$OUT"
+}
3  contrib/xhtml-renderer/readme.txt
View
@@ -0,0 +1,3 @@
+
+Klaus Weidner's XHTML renderer.
+
31 contrib/xhtml-renderer/xml-render.asd
View
@@ -0,0 +1,31 @@
+;;;; -*- Mode: LISP; Syntax: ANSI-Common-Lisp; Base: 10 -*-
+
+;; How to use this:
+;;
+;; Get Marc Battyani's "cl-typesetting" and "cl-pdf" packages:
+;; http://www.fractalconcept.com/asp/html/cl-typesetting.html
+;;
+;; and Miles Egan's xmls parser:
+;; http://common-lisp.net/project/xmls/
+;;
+;; Then load this package and use as follows:
+;; (tt::xhtml-to-pdf "everything.html" "/tmp/output.pdf")
+;;
+;; If you have clisp, you may want to use the included shell script
+;; "html2pdf" for command line use. Read the script comments for more details.
+
+(in-package :asdf)
+
+(defsystem :xml-render
+ :name "xml-render"
+ :author "Klaus Weidner <klaus@atsec.com>"
+ :version "2.1.1"
+ :maintainer "Klaus Weidner <klaus@atsec.com>"
+ :licence "BSD like license"
+ :description "none"
+ :long-description ""
+ :perform (load-op :after (op xml-render)
+ (pushnew :xml-render cl:*features*))
+ :components ((:file "xml-xform"))
+ :depends-on (:cl-typesetting
+ :xmls))
577 contrib/xhtml-renderer/xml-xform.lisp
View
@@ -0,0 +1,577 @@
+(in-package typeset)
+
+;; (asdf::use :cl-typesetting :xmls)
+
+;; For prettier printing of XHTML output at the REPL, use with:
+;; (setf (readtable-case *readtable*) :invert)
+
+(defun char-invert-case (c)
+ (if (upper-case-p c)
+ (char-downcase c)
+ (char-upcase c)))
+
+(defun invert-if-single-case (s)
+ (if (and (some #'upper-case-p s)
+ (some #'lower-case-p s))
+ s
+ (map 'string #'char-invert-case s)))
+
+(defun xml-make-keyword (s)
+ (if (symbolp s)
+ s
+ (intern (invert-if-single-case (substitute #\! #\: s))
+ "KEYWORD")))
+
+;; XML trees must be of the form (elem attr . content), with attr
+;; being a list of (key . value) conses, and content being a list of
+;; strings and XML trees. There should be no consecutive strings or
+;; empty strings in the content list.
+;;
+;; Example: <a href="http://example.com" title="Example">a <b>bold</b> link</a>
+;; ==>
+;; (:a ((:href . "http://example.com") (:title . "Example")) "a " (:b () "bold") " link")
+
+;; Accessor functions
+(defun xml-elem (tree) (car tree))
+(defun xml-attr (tree) (cadr tree))
+(defun xml-clst (tree) (cddr tree))
+
+(defun xml-attr-get (attr key)
+ (cdr (assoc key attr)))
+
+(defun xml-xform (elem-xform tree &optional parents)
+ "Recursively transform XML tree depth-first by calling the supplied
+elem-xform function on each node."
+ (declare (function elem-xform))
+ (let ((clst (mapcar (lambda (c)
+ (if (consp c)
+ ;; recurse into content
+ (xml-xform elem-xform c
+ (cons c parents))
+ c))
+ (xml-clst tree))))
+ (funcall elem-xform (nconc (list (xml-elem tree)
+ (xml-attr tree))
+ clst)
+ parents)))
+
+(defun xml-collapse-sxml-namespace (node parents)
+ "Remove namespace information from XML tree, and use keyword symbols
+for elements and attributes. Example: (\"foo\" . \"http://namespace\")
+=> :foo"
+ (declare (ignorable parents))
+ (let ((elem (xml-elem node))
+ (attr (xml-attr node))
+ (clst (xml-clst node)))
+ (declare (ignorable elem attr clst))
+ (nconc (list (if (consp elem)
+ (xml-make-keyword (car elem))
+ (xml-make-keyword elem))
+ (mapcar (lambda (a)
+ (cons (if (consp (car a))
+ (xml-make-keyword (cdar a))
+ (xml-make-keyword (car a)))
+ (cdr a)))
+ attr))
+ clst)))
+
+(defun string-collapse-whitespace (string)
+ (do* ((chars (map 'list #'identity (substitute #\Space #\NewLine string))
+ (cdr chars))
+ (c (car chars)
+ (car chars))
+ (new nil))
+ ((null chars) (map 'string #'identity (nreverse new)))
+ (unless (and (eql c #\Space) (eql (car new) #\Space))
+ (push c new))))
+
+(defun verbatim-p (parents)
+ (find-if (lambda (p)
+ (or (member (xml-elem p) '(:pre :ins :del))
+ ;; FIXME: assumes that if the attribute was
+ ;; specified, that it has the value "false"...
+ (assoc :white-space-collapse (xml-attr p))))
+ parents))
+
+(defun remove-spaces (clst)
+ (remove-if (lambda (c)
+ (or (null c)
+ (equal c " ")))
+ clst))
+
+(defun remove-leading-space (clst)
+ (if (equal " " (car clst))
+ (cdr clst)
+ clst))
+
+(defun xml-collapse-whitespace (node parents)
+ (declare (ignorable parents))
+ (let ((elem (xml-elem node))
+ (attr (xml-attr node))
+ (clst (xml-clst node)))
+ (declare (ignorable elem attr clst))
+ (nconc (list elem attr)
+ (if (verbatim-p (cons node parents))
+ clst ;; keep unmodified for this element
+ (remove-leading-space
+ (mapcar (lambda (c)
+ (if (stringp c)
+ (string-collapse-whitespace c)
+ c))
+ clst))))))
+
+
+(defun attr-list-to-assoc (node parents)
+ "convert sxml (attr val) list to (attr . val) conses"
+ (declare (ignorable parents))
+ (let ((elem (xml-elem node))
+ (attr (xml-attr node))
+ (clst (xml-clst node)))
+ (declare (ignorable elem attr clst))
+ (nconc (list elem (mapcar (lambda (a)
+ (cons (car a)
+ (cadr a)))
+ attr))
+ clst)))
+
+(defun xml-extract-text (tree)
+ "Extract text strings from XML file."
+ (let ((clst (xml-clst tree))
+ (strings nil))
+ (dolist (c clst)
+ (cond ((stringp c)
+ (push c strings))
+ ((consp c)
+ (push (xml-extract-text c) strings))))
+ (apply #'concatenate 'string (reverse strings))))
+
+;; Note: load-xml-file can't handle non-XML files.
+;;
+;; The following command is useful to convert legacy HTML
+;; to parseable XMTHL:
+;;
+;; tidy -wrap 0 -asxhtml SLES-security-guide.html
+
+(defun load-xml-file (file)
+ ;;setq xmls::*entities*
+ ;;(adjoin '("AElig;" #\?) xmls::*entities* :test #'equal))
+ ;;setq xmls::*entities*
+ ;;(adjoin '("sect;" #\#) xmls::*entities* :test #'equal))
+ ;;setq xmls::*entities*
+ ;;(adjoin '("nbsp;" #\Space) xmls::*entities* :test #'equal))
+ (with-open-file (s file)
+ (let ((xml (xmls:parse s :compress-whitespace nil)))
+ (xml-xform #'attr-list-to-assoc xml))))
+
+(defun flatten-mostly (tree)
+ "Similar to flatten, but keep the last level of list structure intact."
+ (let ((acc nil))
+ (labels ((rec (tree)
+ (cond ((null tree) nil)
+ ((or (atom tree)
+ (atom (car tree)))
+ (push tree acc))
+ (t (rec (car tree))
+ (rec (cdr tree))))))
+ (rec tree)
+ (nreverse acc))))
+
+(defun xml-subtrees (path tree)
+ "Returns list of all subtrees matching path spec.
+ Example: (xml-subtrees '(:body :h1) tree)"
+ ;; FIXME: This should be simpler...
+ (flatten-mostly
+ (cond ((null tree) nil)
+ ((null path) tree)
+ (t (mapcar (lambda (c)
+ (xml-subtrees (cdr path) c))
+ (remove-if-not (lambda (c)
+ (and (consp c)
+ (eq (xml-elem c)
+ (car path))))
+ (xml-clst tree)))))))
+
+(defun xml-subtree (path tree)
+ "Returns first subtree matching path spec.
+ Example: (xml-subtrees tree '(:body :h1))"
+ (car (xml-subtrees path tree)))
+
+(defun xhtml-get-body (tree)
+ "Extract the body from an XHTML file."
+ (xml-subtree tree '(:html :body)))
+
+(defmacro appendq (var &rest lists)
+ `(setf ,var (append ,var ,@lists)))
+
+(defmacro append1 (var &rest elems)
+ `(setf ,var (append ,var (list ,@elems))))
+
+;; misc utilities
+
+(defun remove-if-not-elems (elst clst)
+ (remove-if-not (lambda (c)
+ (and (consp c)
+ (member elst (xml-elem c))))
+ clst))
+
+(defun remove-if-not-elem (elem clst)
+ (remove-if-not (lambda (c)
+ (and (consp c)
+ (eq elem (xml-elem c))))
+ clst))
+
+;;(defun xmls::resolve-entity (ent)
+;; "Resolves the xml entity ENT to a character. Numeric entities are
+;;converted using CODE-CHAR, which only works in implementations that
+;;internally encode strings in US-ASCII, ISO-8859-1 or UCS."
+;; (declare (type simple-base-string ent))
+;; (or (and (>= (length ent) 2)
+;; (char= (char ent 0) #\#)
+;; (code-char
+;; (min 255
+;; (if (char= (char ent 1) #\x)
+;; (parse-integer ent :start 2 :end (- (length ent) 1) :radix 16)
+;; (parse-integer ent :start 1 :end (- (length ent) 1))))))
+;; (second (assoc ent xmls::*entities* :test #'string=))
+;; (warn "Unable to resolve entity ~S" ent)
+;; #\?))
+
+(defun table-cell-p (c)
+ (and (consp c)
+ (eq (car c) 'cell)))
+
+(defun column-count (rows)
+ (iter (for row in rows)
+ (maximize (count-if #'table-cell-p row))))
+
+
+(defun calculate-column-widths (colspec rows)
+ (print colspec)
+ (mapcar (lambda (l)
+ (declare (ignorable l))
+ (/ 420
+ (column-count rows)))
+ rows))
+
+;; The XHTML style sheet
+
+(defvar *font-normal* "Times-Roman")
+(defvar *font-bold* "Times-Bold")
+(defvar *font-italic* "Times-Italic")
+(defvar *font-bold-italic* "Times-BoldItalic")
+(defvar *font-monospace* "Courier")
+
+(defun typeset-elem-xform (node parents)
+ (let ((elem (xml-elem node))
+ (attr (xml-attr node))
+ (clst (xml-clst node)))
+
+ ;; Deal with each element recursively.
+ (case elem
+ ((:html) `(with-style () ,@clst))
+
+ ((:head) `(set-contextual-variable :title
+ ,(xml-extract-text (xml-subtree '(:title) node))))
+
+ ;; need to preserve :title for :head to work on, due to
+ ;; depth-first search
+ ((:title) node)
+
+ ;; tricky elements that involve cross-reference handling
+
+ ((:body)
+ (if (> *toc-depth* 0)
+ (let ((toc (remove-if #'null (make-toc))))
+ (setf *chapter-nums* nil)
+ (setq *chapters* nil)
+ `(with-style (:font *font-normal* :font-size 10)
+ (set-contextual-variable :header-enabled t)
+ (set-contextual-variable :footer-enabled t)
+ (mark-ref-point '(:chapter 0) :data "Table of Contents")
+ ,@toc
+ :fresh-page
+ ,@clst
+ (mark-ref-point "DocumentEnd")))
+ `(with-style (:font *font-normal* :font-size 10)
+ ,@clst
+ (mark-ref-point "DocumentEnd"))))
+
+ ((:a)
+ ;; FIXME: make links clickable
+ (let ((name (xml-attr-get attr :name))
+ (href (xml-attr-get attr :href))
+ (out nil))
+ (if name (append1 out `(mark-ref-point ,name)))
+ (appendq out clst)
+ (if href
+ (append1 out
+ (if (eql #\# (aref href 0))
+ `(put-string (format nil " (page ~D)"
+ (find-ref-point-page-number ,(subseq href 1))))
+ `(with-style ()
+ " ("
+ (with-style (:color :blue)
+ (put-string ,href))
+ ")"))))
+ `(with-style () ,@out)))
+
+ ((:h1)
+ `(with-style ()
+ :fresh-page
+ (paragraph (:font "Helvetica-Bold" :font-size 20
+ :top-margin 14 :bottom-margin 10)
+ (apply #'mark-ref-point ',(chp-ref 0 (xml-extract-text node)))
+ ,@clst)))
+
+ ((:h2)
+ `(paragraph (:font "Helvetica-BoldOblique"
+ :font-size 18 :top-margin 10 :bottom-margin 8)
+ (apply #'mark-ref-point ',(chp-ref 1 (xml-extract-text node)))
+ ,@clst))
+
+ ((:h3)
+ `(paragraph (:font "Helvetica-Bold" :font-size 16
+ :top-margin 10 :bottom-margin 8)
+ (apply #'mark-ref-point ',(chp-ref 2 (xml-extract-text node)))
+ ,@clst))
+
+ ((:h4)
+ `(paragraph (:font "Helvetica-BoldOblique" :font-size 14
+ :top-margin 10 :bottom-margin 8)
+ (apply #'mark-ref-point ',(chp-ref 3 (xml-extract-text node)))
+ ,@clst))
+
+ ((:h5)
+ `(paragraph (:font "Helvetica-Bold" :font-size 12
+ :top-margin 10 :bottom-margin 8)
+ (apply #'mark-ref-point ',(chp-ref 4 (xml-extract-text node)))
+ ,@clst))
+
+ ((:h6)
+ `(paragraph (:font "Helvetica-BoldOblique" :font-size 12
+ :top-margin 10 :bottom-margin 8)
+ (apply #'mark-ref-point ',(chp-ref 5 (xml-extract-text node)))
+ ,@clst))
+
+ ((:p)
+ `(paragraph (:font *font-normal* :font-size 10
+ :top-margin 3 :bottom-margin 4) ,@clst))
+
+ ;; Table support is currently very limited
+ ((:table) `(table (:col-widths
+ ',(calculate-column-widths (xml-attr-get attr :cols)
+ clst))
+ ,@clst))
+
+ ((:tr) `(row () ,@clst))
+
+ ((:td :th) (let* ((col-span (or (xml-attr-get attr :colspan) "1"))
+ (row-span (or (xml-attr-get attr :rowspan) "1"))
+ (align-s (xml-attr-get attr :align))
+ (align (cond ((equal align-s "right") :right)
+ ((equal align-s "center") :center)
+ (t :left))))
+ `(cell (:col-span ,(parse-integer col-span)
+ :row-span ,(parse-integer row-span))
+ (paragraph (:h-align ,align) ,@clst))))
+
+ ;; Ordered lists are a bit tricky, need to handle the item
+ ;; numbering correctly. The following should support most
+ ;; interesting parts of the XHTML spec.
+
+ ((:ul)
+ ;; FIXME: support different bullet styles
+ `(itemize (:item-fmt "- "
+ :text-style (:top-margin 3 :bottom-margin 4))
+ ,@(remove-if-not-elem 'item clst)))
+
+ ((:ol)
+ (let* ((first (or (xml-attr-get attr :start) "1"))
+ (type (xml-attr-get attr :type))
+ (fmt (cond ((equal type "I") "~@R ")
+ ((equal type "i") "~(~@R~) ")
+ ((equal type "A") "~/tt::alpha-item/. ")
+ ((equal type "a") "~:/tt::alpha-item/. ")
+ (t "~D. "))))
+ `(itemize (:item-fmt ,fmt
+ :start-from ,(parse-integer first)
+ :text-style (:top-margin 3 :bottom-margin 4))
+ ,@(remove-if-not-elem 'item clst))))
+
+ ((:li)
+ `(item () ,@clst))
+
+ ;; most elements are straightforward transformations
+
+ ((:dl)
+ `(with-style () ,@clst))
+
+ ((:dt)
+ `(paragraph (:font *font-bold* :bottom-margin 0)
+ ,@clst))
+
+ ((:dd)
+ `(paragraph (:top-margin 0 :left-margin 20 :bottom-margin 7)
+ ,@clst))
+
+ ((:center)
+ `(paragraph (:font *font-normal* :font-size 10
+ :top-margin 3 :bottom-margin 4
+ :h-align :center) ,@clst))
+
+ ((:blockquote)
+ `(paragraph (:font *font-normal* :font-size 10
+ :top-margin 3 :bottom-margin 4
+ :left-margin 20 :right-margin 20) ,@clst))
+
+ ((:pre :code)
+ `(with-style (:font *font-monospace* :font-size 9 :bottom-margin 0)
+ ,@(mapcar (lambda (c)
+ `(verbatim ,c))
+ clst)))
+
+ ((:nobr)
+ `(with-style () (hbox () ,@clst)))
+
+ ((:br)
+ :eol)
+
+ ((:div :span)
+ `(with-style () ,@clst))
+
+ ((:i :em :var :address)
+ ;; FIXME: can't handle bold-italic
+ `(with-style (:font *font-italic*) ,@clst))
+
+ ((:b :strong)
+ ;; FIXME: can't handle bold-italic
+ `(with-style (:font *font-bold*) ,@clst))
+
+ ((:tt :kbd :samp)
+ `(with-style (:font *font-monospace*) ,@clst))
+
+ ((:big)
+ `(with-style (:font-size (* *font-size* 1.2)) ,@clst))
+
+ ((:small)
+ `(with-style (:font-size (/ *font-size* 1.2)) ,@clst))
+
+ ((:cite)
+ `(with-style () ,@clst))
+
+ ((:sub)
+ `(with-subscript (:font-size (* 0.75 *font-size*)) ,@clst))
+
+ ((:sup)
+ `(with-superscript (:font-size (* 0.75 *font-size*)) ,@clst))
+
+ ((:u)
+ `(with-style (:post-decoration #'decoration-underline) ,@clst))
+
+ ((:strike)
+ `(with-style (:post-decoration #'decoration-strikethrough) ,@clst))
+
+ ((:hr)
+ `(hrule :dy 0.5))
+
+ ;; change bar support
+
+ ((:ins)
+ `(with-style (:pre-decoration
+ #'decoration-green-background)
+ (change-start-insert)
+ ,@(if (verbatim-p parents)
+ (mapcar (lambda (c) `(verbatim ,c)) clst)
+ clst)
+ (change-end)))
+
+ ((:del)
+ `(with-style (:post-decoration
+ #'decoration-strikethrough)
+ (change-start-delete)
+ ,@(if (verbatim-p parents)
+ (mapcar (lambda (c) `(verbatim ,c)) clst)
+ clst)
+ (change-end)))
+
+ ;; non-standard extension: unnested change start/stop markers.
+ ;;
+ ;; They need to be used pairwise (in tree depth-first order),
+ ;; but do NOT need to be properly nested in relation to other
+ ;; XHTML elements. This makes it much easier to generate diffs
+ ;; with a non-XML-aware tool such as wdiff.
+ ;;
+ ;; Example:
+ ;; - This is <b>text</b>
+ ;; + This is some <b>bold text</b>
+ ;; ->
+ ;; This is <ins-start />some <b>bold <ins-end />text</b>
+
+ ((:ins-start)
+ `(set-style (:pre-decoration
+ #'decoration-green-background)
+ (change-start-insert)))
+
+ ((:del-start)
+ `(set-style (:post-decoration
+ #'decoration-strikethrough)
+ (change-start-delete)))
+
+ ((:ins-end :del-end)
+ `(set-style (:pre-decoration :none
+ :post-decoration :none)
+ (change-end)))
+
+ ;; Unknown item: insert bright and ugly complaint
+
+ (otherwise
+ `(with-style (:color :red)
+ "[Unsupported: " ,(symbol-name elem) "]")))))
+
+;;; high-level functions
+
+(defun xhtml-to-typeset (input)
+ "Read XML input file and transform to typesetting instructions"
+ ;; First some cleanup on the input XML file
+ (let ((tree (xml-xform #'xml-collapse-whitespace
+ (xml-xform #'xml-collapse-sxml-namespace
+ (load-xml-file input)))))
+ ;; Generate table of contents
+ #-(and) (setq *chapters* (mapcar (lambda (h)
+ (xml-extract-text h))
+ (xml-subtrees '(:body :h1)
+ tree)))
+ ;; The tree-to-tree transform
+ (xml-xform #'typeset-elem-xform tree)))
+
+(defun xhtml-to-pdf (input output)
+ (typeset::render-document (xhtml-to-typeset input)
+ :file output
+ :twosided *twosided*))
+
+;; following sections help in building a command line tool (based on
+;; clisp) to convert HTML to PDF
+
+#+clisp
+(defun save-image ()
+ (ext:gc)
+ (ext:saveinitmem "clisp-xml-render.mem"
+ :init-function #'tt::run
+ :start-package (find-package :tt)))
+
+;; gzip -9 clisp-xml-render.mem && mv clisp-xml-render.mem.gz ~/lisp/images/clisp/
+;; ship with /usr/lib/clisp/full/lisp.run binary
+
+#+clisp
+(defun run ()
+ (let ((args ext:*ARGS*))
+ (when (equal "-x" (first args))
+ (eval (read-from-string (second args)))
+ (setq args (cddr args)))
+ (if (eql 2 (length args))
+ (apply #'xhtml-to-pdf args)
+ (format *error-output* "~&Usage: html2pdf INPUT.html OUTPUT.pdf")))
+ (ext:exit))
+
+;; Test case:
+;; (tt::xhtml-to-pdf "everything.html" "/tmp/output.pdf")
Please sign in to comment.
Something went wrong with that request. Please try again.