1968238 Add README and LICENSE files to trunk
jgraham.cantab authored
html5lib is a pure-Python library for parsing HTML. It is designed to
conform to the Web Applications 1.0 specification, which has
formalized the error handling algorithms of popular web browsers.

= Installation =

html5lib is packaged with distutils. To install it, use:
  $ python setup.py install

= Tests =

You may wish to check that your installation succeeded by running the
test suite. All the tests can be run by invoking runtests.py in the
tests/ directory.

= Usage =

Simple usage follows this pattern:

  import html5lib

  # Open the HTML file to parse
  f = open("mydocument.html")

  # Parse the file into a document tree
  parser = html5lib.HTMLParser()
  document = parser.parse(f)

By default, the returned document is a simple DOM-like structure which
can be navigated using the .parent and .childNodes attributes on each
element.
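
As a rough illustration of navigating such a structure, the sketch below
walks a tree downwards through .childNodes and climbs back up through
.parent. The Node class, its appendChild method, and the walk and
ancestors helpers are hypothetical stand-ins written for this example,
not html5lib's own classes; only the two attribute names come from the
text above.

```python
class Node(object):
    """Hypothetical stand-in for a DOM-like node (not html5lib's class)."""

    def __init__(self, name):
        self.name = name
        self.parent = None      # link to the enclosing node, None at the root
        self.childNodes = []    # child nodes in document order

    def appendChild(self, node):
        """Attach node as the last child and return it."""
        node.parent = self
        self.childNodes.append(node)
        return node


def walk(node, depth=0):
    """Yield (depth, node) pairs for node and its descendants, in document order."""
    yield depth, node
    for child in node.childNodes:
        for item in walk(child, depth + 1):
            yield item


def ancestors(node):
    """Return the names of all ancestors of node, nearest first."""
    names = []
    while node.parent is not None:
        node = node.parent
        names.append(node.name)
    return names


# Build a tiny tree by hand: html -> body -> p
html = Node("html")
body = html.appendChild(Node("body"))
p = body.appendChild(Node("p"))

# Print the tree, indenting each node by its depth
for depth, node in walk(html):
    print("  " * depth + node.name)
```

With a real parsed document, the same two loops would presumably start
from the object returned by parser.parse() instead of the hand-built
html node.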

It is also possible to generate an ElementTree tree. This requires the
use of the "tree" argument to the parser:

  from html5lib.treebuilders import etree
  parser = html5lib.HTMLParser(tree=etree.TreeBuilder)
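
Assuming the tree produced this way behaves like a standard ElementTree
tree, it can be explored with the usual ElementTree calls. The sketch
below does not call html5lib; it builds a small tree by hand with the
standard library's xml.etree.ElementTree purely to show those calls. The
element names, text, and URL are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hand-built stand-in for a parsed document (not produced by html5lib).
html = ET.Element("html")
body = ET.SubElement(html, "body")
p = ET.SubElement(body, "p")
p.text = "Hello"
a = ET.SubElement(p, "a", {"href": "http://example.com/"})
a.text = "link"

# iter() visits the element itself and all descendants in document order.
tags = [el.tag for el in html.iter()]
print(tags)

# find() takes a simple path relative to the element it is called on.
link = html.find("body/p/a")
print(link.get("href"), link.text)
```

Tag names in a tree built by the parser may differ from the bare names
used here, so paths passed to find() should be checked against the
actual output.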

Intrepid users may write their own treebuilder implementations; see
help(html5lib.treebuilders) for more information.

More documentation is available in the docstrings.

= Bugs =

Please report any bugs on the issue tracker:
http://code.google.com/p/html5lib/issues/list

= Get Involved =

Contributions to code or documentation are actively encouraged. Submit
patches to the issue tracker or discuss changes on IRC in the #whatwg
channel on freenode.net.