Objective-C wrapper for HTML parser of libxml2

This HTML parser gives Objective-C access to libxml2 on Mac OS X (Leopard and higher) and iOS.
An optional category provides XPath support.
libxml2 is very fast; to keep overhead low, all recursive tasks are implemented as C functions.
The naming is similar to NSXMLDocument (which is not available on iOS).
Unlike NSXMLDocument, HTMLDocument does not inherit from HTMLNode, there is no HTMLElement class,
and you can neither create new documents nor change nodes.

All methods that take no parameters and return a value or object are declared as read-only properties, so they can be used with dot syntax.

Objective-C classes:

HTMLDocument
HTMLNode

Optional category of HTMLNode for XPath support:

HTMLNode+XPath

How to use:

• Add the class files and the (optional) category files to your project
• Add libxml2.dylib to frameworks (Link Binary With Libraries)
• Add $SDKROOT/usr/include/libxml2 to target -> Build Settings -> Header Search Paths
• Add -lxml2 to target -> Build Settings -> Other Linker Flags
• Import the HTMLDocument.h and (if needed) HTMLNode+XPath.h header files


HTMLDocument:

Create an HTMLDocument with one of these init methods:

- (id)initWithData:(NSData *)data encoding:(NSStringEncoding )encoding error:(NSError **)error; // designated initializer
- (id)initWithContentsOfURL:(NSURL *)url encoding:(NSStringEncoding )encoding error:(NSError **)error;
- (id)initWithHTMLString:(NSString *)string encoding:(NSStringEncoding )encoding error:(NSError **)error;

The corresponding initializer methods without the encoding parameter assume UTF-8 encoding.
For each initializer method there is also a convenience class method:

+ (HTMLDocument *)documentWith…

Get the root node (actually the <html> node) or the <body> node of the document with:

@property (readonly) HTMLNode *rootNode
@property (readonly) HTMLNode *body
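
As a rough sketch of the steps above, the following parses an in-memory HTML string and reads both nodes. It uses only the initializer and the two properties listed in this section. The helper name DocumentHasBody, the sample markup in the usage line, the use of ARC, and the expectation that the initializer returns nil and fills in the error on failure (the usual Cocoa convention) are assumptions, not statements from this README.

```objc
#import <Foundation/Foundation.h>
#import "HTMLDocument.h"

// Hypothetical helper (not part of the library): parses an HTML string and
// reports whether a <body> node was found.
// Assumes ARC and that a failed init returns nil, as is usual in Cocoa.
static BOOL DocumentHasBody(NSString *html)
{
    NSError *error = nil;
    HTMLDocument *document = [[HTMLDocument alloc] initWithHTMLString:html
                                                             encoding:NSUTF8StringEncoding
                                                                error:&error];
    if (document == nil) {
        NSLog(@"Parsing failed: %@", error);
        return NO;
    }

    HTMLNode *root = document.rootNode;  // the <html> node
    HTMLNode *body = document.body;      // the <body> node
    NSLog(@"root: %@, body: %@", root, body);
    return (body != nil);
}
```

A call such as DocumentHasBody(@"<html><body><p>Hello</p></body></html>") should then report a body node. Without ARC, the document would additionally have to be released once it is no longer needed.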


HTMLNode:

In HTMLNode, methods with the following prefixes search for node(s) only among the direct children of the current node:

- (HTMLNode *)child…
- (NSArray *)children…

Methods with these prefixes perform a deep search within all descendants of the current node:

- (HTMLNode *)descendant…
- (NSArray *)descendants…

The corresponding methods that search all descendants with XPath have the prefixes:

- (HTMLNode *)node…
- (NSArray *)nodes…

The generic methods for searching with a custom XPath query are:

- (HTMLNode *)nodeForXPath:(NSString *)query error:(NSError **)error;
- (NSArray *)nodesForXPath:(NSString *)query error:(NSError **)error;
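
To illustrate the generic XPath methods, here is a minimal sketch that counts the nodes matching a query below the root node. Only nodesForXPath:error:, rootNode and the initializer come from this README. The helper name CountNodesForXPath, the NSNotFound return convention, the use of ARC, and the handling of a nil result as a failure are assumptions made for the example.

```objc
#import <Foundation/Foundation.h>
#import "HTMLDocument.h"
#import "HTMLNode+XPath.h"

// Hypothetical helper (not part of the library): returns the number of nodes
// matching an XPath query, or NSNotFound if parsing or the query fails.
// Assumes ARC; treating a nil array as failure is an assumption of this sketch.
static NSUInteger CountNodesForXPath(NSString *html, NSString *query)
{
    NSError *error = nil;
    HTMLDocument *document = [[HTMLDocument alloc] initWithHTMLString:html
                                                             encoding:NSUTF8StringEncoding
                                                                error:&error];
    if (document == nil) {
        NSLog(@"Parsing failed: %@", error);
        return NSNotFound;
    }

    // Messaging a nil rootNode returns nil, so that case also ends up below.
    NSArray *matches = [document.rootNode nodesForXPath:query error:&error];
    if (matches == nil) {
        NSLog(@"XPath query failed: %@", error);
        return NSNotFound;
    }
    return [matches count];
}
```

For example, CountNodesForXPath(html, @"//a") would count every <a> element in the parsed markup. Use nodeForXPath:error: instead when only a single match is of interest.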

There are many further methods to search by tag name, attribute name and attribute value.

You can obtain the stringValue of the current text node or the textContent of all descendant text nodes,
as well as the node's integerValue, doubleValue (optionally with a given locale identifier) and dateValue for a format string (optionally with a given time zone).
By default, returned string values are trimmed of whitespace and newline characters. The methods starting with raw return the unfiltered values.
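
The value accessors can be combined with an XPath lookup, as in the hedged sketch below. It assumes that textContent is exposed as a read-only property under exactly that name, as the description above and the dot-syntax note suggest; check HTMLNode.h for the actual declarations. The helper name FirstMatchText and the use of ARC are likewise assumptions.

```objc
#import <Foundation/Foundation.h>
#import "HTMLDocument.h"
#import "HTMLNode+XPath.h"

// Hypothetical helper (not part of the library): returns the text content of
// the first node matching an XPath query, or nil if nothing was found.
// Assumes ARC and that textContent is a property of HTMLNode.
static NSString *FirstMatchText(NSString *html, NSString *query)
{
    NSError *error = nil;
    HTMLDocument *document = [[HTMLDocument alloc] initWithHTMLString:html
                                                             encoding:NSUTF8StringEncoding
                                                                error:&error];
    if (document == nil) {
        NSLog(@"Parsing failed: %@", error);
        return nil;
    }

    HTMLNode *node = [document.rootNode nodeForXPath:query error:&error];
    if (node == nil) {
        NSLog(@"No match for %@: %@", query, error);
        return nil;
    }
    return node.textContent;  // text of all descendant text nodes
}
```

The numeric accessors should work the same way, for example node.integerValue in place of node.textContent, if they are declared under the names given above.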


© 2011 Stefan Klieme