
cl-typesetting:

Added Klaus Weidner's (X)HTML to PDF converter to the contrib directory.

git-svn-id: http://www.fractalconcept.com:8000/public/open-source/cl-typesetting@86 9d29c65d-f3d6-0310-ab0c-b43ff62e96ec
commit 1bcda5526d2e19d5b5dda3fec80e67bb996d4cc5 (1 parent: f3983cf), committed by marc on Feb 24, 2005
@@ -0,0 +1,265 @@
+<html><head>
+ <title>HTML sample file</title></head>
+ <body>
+ <a name="tomjones">
+ </a><h1><a name="tomjones">Sample text from Henry Fielding's <cite>Tom Jones</cite></a></h1>
+<a name="tomjones"> </a><p><a name="tomjones">Here's a short excerpt from a great 18th Century British
+ novel. Enjoy!</a></p>
+<a name="tomjones"> </a><h2><a name="tomjones"><b>Book I.</b> Containing as Much of the Birth of the Foundling
+ as Is Necessary or Proper to Acquaint the Reader with in the
+ Beginning of This History</a></h2>
+<a name="tomjones"> </a><h3><a name="tomjones"><b>Chapter VII.</b> Containing Such Grave Matter, That the Reader
+ Cannot Laugh Once Through the Whole Chapter, Unless Peradventure He
+ Should Laugh at the Author</a></h3>
+<a name="tomjones"> </a><p><a name="tomjones">WHEN Jenny appeared, Mr. Allworthy took her into his study, and
+ spoke to her as follows: "You know, child, it is in my power as a
+ magistrate, to punish you very rigorously for what you have done;
+ and you will, perhaps, be the more apt to fear I should execute that
+ power, because you have in a manner laid your sins at my door.</a></p>
+<a name="tomjones"> </a><p><a name="tomjones">"But, perhaps, this is one reason which hath determined me to act
+ in a milder manner with you: for, as no private resentment should
+ ever influence a magistrate, I will be so far from considering your
+ having deposited the infant in my house as an aggravation of your
+ offence, that I will suppose, in your favour, this to have proceeded
+ from a natural affection to your child, since you might have some
+ hopes to see it thus better provided for than was in the power of
+ yourself, or its wicked father, to provide for it. I should indeed
+ have been highly offended with you had you exposed the little wretch
+ in the manner of some inhuman mothers, who seem no less to have
+ abandoned their humanity, than to have parted with their chastity.
+ It is the other part of your offence, therefore, upon which I intend
+ to admonish you, I mean the violation of your chastity; -- a crime,
+ however lightly it may be treated by debauched persons, very heinous
+ in itself, and very dreadful in its consequences....</a></p>
+<a name="tomjones"> </a><h1><a name="tomjones">Sample text to illustrate HTML elements</a></h1>
+<a name="tomjones"> </a><p><a name="tomjones">Here are some HTML elements that should put FOP through its paces.</a></p>
+<a name="tomjones"> </a><h2><a name="tomjones">Basic HTML formatting</a></h2>
+<a name="tomjones"> </a><p><a name="tomjones"><i>Now</i> is the time for all good <b>men and women</b> to come
+ to the aid of <u>the party</u>. The <tt>quick brown fox</tt> jumped
+ over the <strong>lazy dog</strong>. Every <em>good boy</em> deserves
+ fudge. Jackdaws <strike>love</strike> like my big sphinx of quartz.</a></p>
+<a name="tomjones"> </a><h2><a name="tomjones">Lists</a></h2>
+<a name="tomjones"> </a><p><a name="tomjones">The previous section featured a number of text effects:</a></p>
+<a name="tomjones"> </a><ul>
+<a name="tomjones"> <li><b>Bold</b> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ <li><u>Underlined</u> text</li>
+ <li><tt>Teletype</tt> text</li>
+ </a></ul>
+<a name="tomjones"> </a><p><a name="tomjones">Here they are again, ranked according to how I like 'em:</a></p>
+<a name="tomjones"> </a><ol>
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><p><a name="tomjones">Finally, let's define these things in a definition list, just to
+ have something else to write about.</a></p>
+<a name="tomjones"> </a><dl>
+<a name="tomjones"> <dt><b>Bold</b></dt>
+ <dd>Text written in a <b>thicker font</b>.</dd>
+ <dt><em>Emphasized</em></dt>
+ <dd><em>Emphasized text</em>, usually written in an italicized font.</dd>
+ <dt><i>Italic</i></dt>
+ <dd>Text written in an <i>italicized font</i>.</dd>
+ <dt><strike>Strikethrough</strike></dt>
+ <dd>Text with a <strike>line drawn through it</strike>.</dd>
+ <dt><strong>Strong</strong></dt>
+ <dd><strong>Strongly emphasized</strong> text, usually written in
+ a bold font.</dd>
+ <dt><tt>Teletype</tt></dt>
+ <dd>Text written in a <tt>monospaced</tt> font.</dd>
+ <dt><u>Underlined</u></dt>
+ <dd>Text with a <u>line drawn under it</u>.</dd>
+ <dd>Here's a second definition of the term, just to test the stylesheet.
+ The second and any subsequent definitions under the same term should
+ appear a half-line below the previous definition.</dd>
+ </a></dl>
+<a name="tomjones"> </a><p><a name="tomjones">This lovely document was produced by the Apache XML Project's FOP:</a></p>
+<a name="tomjones"> <img src="fop.jpg" height="50" width="150">
+ </a><h2><a name="tomjones">More lists</a></h2>
+<a name="tomjones"> </a><p><a name="tomjones">Here are some advanced lists. This one uses uppercase Roman numerals:</a></p>
+<a name="tomjones"> </a><ol type="I">
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><p><a name="tomjones">This list uses lowercase Roman numerals starting at 17:</a></p>
+<a name="tomjones"> </a><ol start="17" type="i">
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><p><a name="tomjones">This one uses lowercase alpha characters and starts at 30:</a></p>
+<a name="tomjones"> </a><ol start="30" type="a">
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text</li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><p><a name="tomjones">This list uses uppercase alpha characters and starts at 12:</a></p>
+<a name="tomjones"> </a><ol start="12" type="A">
+<a name="tomjones"> <li><u>Underlined</u> text</li>
+ <li><strike>Strikethrough</strike> text</li>
+ <li><i>Italicized</i> text</li>
+ <li><b>Bold</b> text</li>
+ <li><tt>Teletype</tt> text and a sublist:
+ <ol type="a">
+ <li>An item</li>
+ <li>Another item
+ <ul>
+ <li>&lt;ul&gt; item a</li>
+ <li>&lt;ul&gt; item b</li>
+ <li>&lt;ul&gt; list item c, which contains two &lt;hr&gt;s
+ and an embedded list
+ <hr>
+ <ol start="37">
+ <li>Deeply nested item one</li>
+ <li>Deeply nested item two</li>
+ <li><a href="#tomjones">The excerpt from <cite>Tom
+ Jones</cite></a> is a link to an earlier section of
+ this document.</li>
+ </ol>
+ <hr>
+ </li>
+ <li>&lt;ul&gt; item d</li>
+ </ul>
+ </li>
+ <li>Yet another item</li>
+ <li>Notice that these items (and in fact this whole list) are
+ indented from the start of the other list items. Notice also
+ that the text wraps the way you'd think it would, using the
+ settings of the internal list, not the external list.</li>
+ <li>Our final item</li>
+ </ol>
+ </li>
+ <li><em>Emphasized</em> text</li>
+ <li><strong>Strongly emphasized</strong> text</li>
+ </a></ol>
+<a name="tomjones"> </a><h1><a name="tomjones">Tables</a></h1>
+<a name="tomjones"> </a><p><a name="tomjones">Mapping HTML table tags to XSL-FO tables has some difficulties.
+ The biggest problems are supporting the <code>cols</code> attribute
+ of the <code>&lt;table&gt;</code> element, and supporting the
+ <code>rowspan</code> and <code>colspan</code> attributes of the
+ <code>&lt;td&gt;</code> element. Here's a table that illustrates
+ all of the things we support:</a></p>
+<a name="tomjones"> </a><table border="1" cols="200">
+ <tbody><tr>
+ <th>State</th>
+ <th>Abbr</th>
+ </tr>
+ <tr>
+ <td>North Carolina</td>
+ <td>NC</td>
+ </tr>
+ <tr>
+ <td>California</td>
+ <td>CA</td>
+ </tr>
+ <tr>
+ <td>Tennessee</td>
+ <td>TN</td>
+ </tr>
+ <tr>
+ <td rowspan="2">Texas <br><i>and</i> <br>Connecticut</td>
+ <td>TX</td>
+ </tr>
+ <tr>
+ <td>CT</td>
+ </tr>
+ <tr>
+ <td colspan="2" align="right">That's all!</td>
+ </tr>
+ </tbody></table>
+<a name="tomjones"> </a><h1><a name="tomjones">More HTML we support</a></h1>
+<a name="tomjones"> </a><p><a name="tomjones">This section goes through more of the HTML tags we support.</a></p>
+<a name="tomjones"> </a><h2><a name="tomjones">Anchor tags</a></h2>
+<a name="tomjones"> </a><a name="anchors"> </a>
+ <p>Supporting links is very important to us here at
+ <a href="http://www.ibm.com/developerWorks">developerWorks</a>.
+ This sample document contains both internal and external links; if
+ you don't believe me, just read the excerpt from <a href="#tomjones">
+ <cite>Tom Jones</cite></a> earlier in this document. </p>
+ <p>This is <em>not</em> my address:</p>
+ <address>
+ Mrs. Mary McGoon
+ <br>
+ 901 Main Street
+ <br>
+ Kenosha, WI 38492
+ </address>
+ <p>Now for a paragraph with <b>boldfaced text</b>, <big>big text,
+ <big>bigger text, <big>biggest text,</big></big></big> and
+ <br>three <br>line <br>breaks.</p>
+ <blockquote>
+ When in the Course of human events, it becomes necessary
+ for one people to dissolve the political bands which have
+ connected them with another, and to assume among the powers
+ of the earth, the separate and equal station to which the
+ Laws of Nature and of Nature's God entitle them, a decent
+ respect to the opinions of mankind requires that they should
+ declare the causes which impel them to the separation.
+ </blockquote>
+ <center>
+ <font color="red" face="sans-serif" size="+2">
+ This text is big and centered <br>
+ so it will stand out.
+ </font>
+ </center>
+ <h1>An &lt;h1&gt;</h1>
+ <p>Blah blah blah</p>
+ <h2>An &lt;h2&gt;</h2>
+ <p>Blah blah blah</p>
+ <h3>An &lt;h3&gt;</h3>
+ <p>Blah blah blah</p>
+ <h4>An &lt;h4&gt;</h4>
+ <p>Blah blah blah</p>
+ <h5>An &lt;h5&gt;</h5>
+ <p>Blah blah blah</p>
+ <h6>An &lt;h6&gt;</h6>
+ <p>Blah blah blah</p>
+ <p><nobr>Now here's a really, really, really long sentence that's
+ coded with the &lt;nobr&gt; tag. This should run on and on and
+ on and on and eventually it should run all the way off the page
+ and into the void.</nobr> This text appears after the &lt;nobr&gt; tag.</p>
+ <h1>A short code listing</h1>
+ <p>Here's a simple Java program, formatted with the &lt;pre&gt; element:</p>
+ <pre>public class Sample
+{
+  public static void main(String [] args)
+  {
+    System.out.println("Hello, World!");
+
+    for (int i = 0; i &lt; 5; i++)
+    {
+      System.out.print("How");
+      System.out.print("dy! ");
+    }
+
+    System.out.println();
+  }
+}
+ </pre>
+ <h2>More HTML elements</h2>
+ <p>This paragraph tests out the <samp>sample element
+ (&lt;samp&gt;)</samp>, <small>small text (&lt;small&gt;)</small>,
+ <sub>sub</sub>script text, <sup>super</sup>script text, a
+ <kbd>keyboard</kbd> command, and a <var>variable</var> name.</p>
+ </body></html>
@@ -0,0 +1,83 @@
+#!/bin/sh
+#
+# Convert HTML documents to PDF
+#
+# Copyright (C) 2004 Klaus Weidner <kweidner@pobox.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+# Configure this to point to the location of the saved memory image.
+# Generate it as follows:
+#
+# clisp -x "(asdf::oos 'asdf:load-op :xml-render) (tt::save-image)"
+# gzip -9 clisp-xml-render.mem
+# mv clisp-xml-render.mem.gz ~/lisp/images/clisp
+#
+IMAGE="$HOME/lisp/images/clisp/clisp-xml-render.mem.gz"
+
+# Location of GNU CLISP binary
+#CLISP=/usr/lib/clisp/full/lisp.run
+CLISP=clisp
+
+# WARNING: creates a fixed-name temp file in the current working directory.
+# Don't use this script if the current directory is writable by untrusted users.
+
+# Run through W3C "tidy" utility to clean up noncompliant HTML and
+# convert to XHTML. See http://tidy.sourceforge.net/
+#
+# Not needed if input is already valid XHTML. Comment out the next
+# line if you don't want to use it.
+[ -z "$TIDY" ] && TIDY=$(which tidy)
+
+# Optional: clisp generates uncompressed PDF. Use the "PDF Toolkit"
+# (pdftk) to compress it. See http://www.accesspdf.com/pdftk/
+#
+# Comment out the next line if you don't want to use it.
+# FIXME: pdftk fails on output generated by v66 cl-pdf ?!
+#[ -z "$PDFTK" ] && PDFTK=$(which pdftk)
+
+### End of user configurable section
+
+Usage () {
+ echo "Usage: $(basename "$0") FILE.html
+Creates FILE.pdf in current working directory." >&2
+ exit 1
+}
+
+[ $# -eq 1 ] || Usage
+
+IN="$1"
+OUT=$(basename "$IN" .html).pdf
+
+if [ -x "$TIDY" ]
+then
+ XML=$(basename "$IN").tmp.xhtml
+ "$TIDY" --quiet yes --show-warnings 0 -wrap 0 -asxhtml "$IN" > "$XML"
+else
+ XML="$IN"
+fi
+
+# Do the conversion
+$CLISP -q -q -M "$IMAGE" -- "$XML" "$OUT"
+
+[ -x "$TIDY" ] && rm -f "$XML"
+
+[ -x "$PDFTK" ] && {
+ "$PDFTK" "$OUT" output "$OUT.new" compress && mv "$OUT.new" "$OUT"
+}
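The script's own WARNING says the tidy step writes a fixed-name temp file into the current directory. Below is a minimal sketch of how the same front end could avoid that by using mktemp(1) plus an EXIT trap for cleanup. It is not part of this commit. It assumes mktemp is available (it is on GNU and BSD systems, though POSIX does not require it) and that CLISP, IMAGE and TIDY are configured as in the script above. The names pdf_name and html2pdf_main are hypothetical helpers introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical variant of the html2pdf front end (a sketch, not the
# committed script). The tidy output goes to a private temp file made
# by mktemp(1) instead of a fixed-name file in the current directory.
# CLISP, IMAGE and TIDY are assumed to be set as in the original script;
# pdf_name and html2pdf_main are hypothetical helper names.

# Output name: strip any directory and a trailing ".html", then add ".pdf".
# As in the original, the PDF is written to the current working directory.
pdf_name () {
    echo "$(basename "$1" .html).pdf"
}

html2pdf_main () {
    IN="$1"
    OUT=$(pdf_name "$IN")

    if [ -x "$TIDY" ]; then
        # mktemp creates a new, uniquely named file with owner-only permissions
        XML=$(mktemp "${TMPDIR:-/tmp}/html2pdf.XXXXXX") || exit 1
        # Delete the temp file when the script exits, including on failure
        trap 'rm -f "$XML"' EXIT
        "$TIDY" --quiet yes --show-warnings 0 -wrap 0 -asxhtml "$IN" > "$XML"
    else
        # No tidy available: assume the input is already valid XHTML
        XML="$IN"
    fi

    # $CLISP is left unquoted, as in the original, so it may carry options
    $CLISP -q -q -M "$IMAGE" -- "$XML" "$OUT"
}
```

Like the original, pdf_name strips only a ".html" suffix, so an input named notes.xhtml would produce notes.xhtml.pdf.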
@@ -0,0 +1,3 @@
+
+Klaus Weidner's XHTML renderer.
+
@@ -0,0 +1,31 @@
+;;;; -*- Mode: LISP; Syntax: ANSI-Common-Lisp; Base: 10 -*-
+
+;; How to use this:
+;;
+;; Get Marc Battyani's "cl-typesetting" and "cl-pdf" packages:
+;; http://www.fractalconcept.com/asp/html/cl-typesetting.html
+;;
+;; and Miles Egan's xmls parser:
+;; http://common-lisp.net/project/xmls/
+;;
+;; Then load this package and use as follows:
+;; (tt::xhtml-to-pdf "everything.html" "/tmp/output.pdf")
+;;
+;; If you have clisp, you may want to use the included shell script
+;; "html2pdf" for command line use. Read the script comments for more details.
+
+(in-package :asdf)
+
+(defsystem :xml-render
+ :name "xml-render"
+ :author "Klaus Weidner <klaus@atsec.com>"
+ :version "2.1.1"
+ :maintainer "Klaus Weidner <klaus@atsec.com>"
+ :licence "BSD like license"
+ :description "XHTML to PDF renderer based on cl-typesetting"
+ :long-description ""
+ :perform (load-op :after (op xml-render)
+ (pushnew :xml-render cl:*features*))
+ :components ((:file "xml-xform"))
+ :depends-on (:cl-typesetting
+ :xmls))