WeasyPrint converts web documents (HTML, CSS, ...) to PDF.

See the documentation at http://weasyprint.org/


Dependencies
------------

These are listed in setup.py and are installed automatically if you use
easy_install or pip:

 * html5lib
 * lxml
 * cssutils
 * Attest

These are not listed in setup.py because they are either not on PyPI or tricky
to compile. You need to install them manually:

 * PyCairo
 * PyGTK
 * python-rsvg
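
Since these three packages are not installed automatically, it can help to
check that they are importable before running WeasyPrint. The following is a
minimal sketch, not part of WeasyPrint itself. The import names (``cairo``,
``pango``, ``rsvg``) are assumptions about how the bindings are usually
exposed, and ``find_missing`` is a hypothetical helper:

```python
import importlib

# Assumed import names for the manually installed bindings; adjust them
# if your distribution exposes the modules under different names.
MANUAL_DEPENDENCIES = {
    "PyCairo": "cairo",
    "PyGTK (Pango bindings)": "pango",
    "python-rsvg": "rsvg",
}


def find_missing(dependencies):
    """Return the sorted names of dependencies that cannot be imported."""
    missing = []
    for name, module in sorted(dependencies.items()):
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(MANUAL_DEPENDENCIES)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All manually installed dependencies were found.")
```

Anything reported as missing has to be installed through your system's
package manager or built from source.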

About the PyGTK dependency
--------------------------

WeasyPrint does not use GTK+, but it uses Pango for text rendering and rsvg for
SVG rendering. Both of them can work without GTK+, but their Python
bindings either are part of PyGTK (for Pango) or depend on PyGTK (for rsvg).

If GObject introspection someday covers all of Pango, rsvg and cairo,
we can switch to it and drop the PyGTK dependency.

Standards conformance
---------------------

WeasyPrint strives for web standards conformance. For some standards, however,
conformance is only as good as that of the libraries we use:

 * HTML parsing (turning bytes into a DOM tree): we currently use lxml.html
   (see below)
 * CSS parsing: cssutils
 * CSS selectors: lxml.cssselect (conforms to CSS3 with some exceptions,
   see http://lxml.de/cssselect.html#limitations)
 * SVG: rsvg

The area where WeasyPrint has only itself to blame for conformance problems is
the graphical rendering and layout of documents, that is, all of CSS except
syntax and selectors.

Inline SVG
----------

SVG, even when inlined in the HTML document, is rendered by the rsvg library
independently of the rest of the document. In CSS speak, we consider it to be
a “replaced element”.

HTML parsing
------------

We use lxml to parse HTML into an object tree. lxml’s own parser is very fast,
but lxml can optionally use the html5lib parser instead. html5lib implements
the HTML5 parsing algorithm, so it should give better results on broken HTML,
though “they all parse pretty-good HTML the same.” [1]

[1] http://stackoverflow.com/questions/2676872/how-to-parse-malformed-html-in-python-using-standard-libraries/2680724#2680724

lxml vs ElementTree
-------------------

lxml implements the ElementTree API, so some programs can use either library.
However, we need lxml.cssselect, which does not exist in ElementTree.
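
To illustrate the shared API, here is a minimal sketch of the common fallback
pattern that lets a program run on either library. It is not taken from
WeasyPrint, and ``paragraph_texts`` is a hypothetical helper used only for
this example:

```python
try:
    # Prefer lxml when it is installed.
    from lxml import etree
except ImportError:
    # Fall back to the standard library, which offers the same basic API.
    import xml.etree.ElementTree as etree


def paragraph_texts(markup):
    """Return the text of every <p> element found in well-formed markup."""
    root = etree.fromstring(markup)
    return [p.text for p in root.iter("p")]


if __name__ == "__main__":
    print(paragraph_texts("<body><p>Hello</p><p>World</p></body>"))
```

This only works for code that stays within the common subset. CSS selector
matching through lxml.cssselect has no ElementTree counterpart, so WeasyPrint
cannot use this kind of fallback.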
