Fetching contributors…
Cannot retrieve contributors at this time
83 lines (48 sloc) 2.01 KB
Why lxml?
.. contents::
1 Motto
2 Aims
"the thrills without the strangeness"
To explain the motto:
"Programming with libxml2 is like the thrilling embrace of an exotic stranger.
It seems to have the potential to fulfill your wildest dreams, but there's a
nagging voice somewhere in your head warning you that you're about to get
screwed in the worst way." (`a quote by Mark Pilgrim`_)
Mark Pilgrim was describing in particular the experience a Python programmer
has when dealing with libxml2. The default Python bindings of libxml2 are
fast, thrilling, powerful, and your code might fail in some horrible way that
you really shouldn't have to worry about when writing Python code. lxml
combines the power of libxml2 with the ease of use of Python.
.. _`a quote by Mark Pilgrim`:
The C libraries libxml2_ and libxslt_ have huge benefits:
* Standards-compliant XML support.
* Support for (broken) HTML.
* Full-featured.
* Actively maintained by XML experts.
* fast. fast! FAST!
.. _libxml2:
.. _libxslt:
These libraries already ship with Python bindings, but these Python bindings
mimic the C-level interface. This yields a number of problems:
* very low level and C-ish (not Pythonic).
* underdocumented and huge, you get lost in them.
* UTF-8 in API, instead of Python unicode strings.
* Can easily cause segfaults from Python.
* Require manual memory management!
lxml is a new Python binding for libxml2 and libxslt, completely independent
from these existing Python bindings. Its aims:
* Pythonic API.
* Documented.
* Use Python unicode strings in API.
* Safe (no segfaults).
* No manual memory management!
lxml aims to provide a Pythonic API by following as much as possible the
`ElementTree API`_. We're trying to avoid inventing too many new APIs, or you
having to learn new things -- XML is complicated enough.
.. _`ElementTree API`: