A simple Objective-C block-based wrapper for libxml2.



A simple Objective-C block-based wrapper for the libxml2 SAX XML and HTML parsers.

This project contains the MirrorXML library, along with a couple of examples that use it: MXHTMLToAttributedString and MXFeed (a simple RSS parser).

The advantage of this library is that you get a SAX parser, but you can still match patterns with basic XPath-style selectors. Handlers for each parsed item are written as Objective-C blocks. Handlers can be nested, which makes your code 'mirror' the structure of the XML file.

#### To use MirrorXML in your own code:

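Make sure to link the libxml2 library into your project. (In Xcode this usually also means adding $(SDKROOT)/usr/include/libxml2 to the header search paths.)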

Import MirrorXML.h, which includes references to the rest of the necessary header files.

#### Parsing XML/HTML

Generally, first you create an MXParser/MXHTMLParser instance each time you start to parse a new file.

  • Then you assign it a new instance of MXHandlerList.
  • Then you assign an array of MXHandler instances to the MXHandlerList.
  • Then you call parseDataChunk: on the parser instance until you run out of data.
  • Then you call dataFinished.

MXHandler instances are initialized with a pattern to match. This can either be an instance of MXPattern, or a string (for convenience). MXPattern instances are immutable and can be re-used.

Patterns support a subset of XPath features. (Basically they don't support lookahead or relative positioning selectors.)

A pattern can simply be the name of an XML tag you wish to match in the current context. E.g. @"title".

A pattern can be a fully specified path. E.g. @"/rss/channel/title".

Use two slashes at the beginning to match anywhere. E.g. @"//uuid".

A star selects any type of element. E.g. @"/rss/channel/item/*".

A pipe character acts like an 'or' operator. E.g. @"//b|//strong".
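
To make the selector rules above concrete, here is a small standalone C sketch that checks an element's absolute path against a pattern. It is not MirrorXML's implementation and does not use libxml2. The function path_matches and its helpers are hypothetical names. It covers only full paths, a leading //, the * wildcard and | alternatives. A bare name is treated as relative to the document root, and a // in the middle of a pattern is treated like a single slash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 32

typedef struct { const char *start; size_t len; } step_t;

/* Splits "/a/b/c" into steps. Returns the step count, or -1 if too many. */
static int split_steps(const char *s, size_t n, step_t *out) {
    int count = 0;
    size_t i = 0;
    while (i < n) {
        while (i < n && s[i] == '/') i++;          /* skip separators */
        if (i >= n) break;
        size_t begin = i;
        while (i < n && s[i] != '/') i++;
        if (count == MAX_STEPS) return -1;
        out[count].start = s + begin;
        out[count].len = i - begin;
        count++;
    }
    return count;
}

/* A "*" step matches any element name; otherwise names must be equal. */
static bool step_matches(step_t pat, step_t elem) {
    if (pat.len == 1 && pat.start[0] == '*') return true;
    return pat.len == elem.len && memcmp(pat.start, elem.start, pat.len) == 0;
}

/* Matches one alternative (no '|') against an absolute element path. */
static bool match_alternative(const char *pat, size_t pat_len, const char *path) {
    bool anywhere = pat_len >= 2 && pat[0] == '/' && pat[1] == '/';
    step_t psteps[MAX_STEPS], esteps[MAX_STEPS];
    int np = split_steps(pat, pat_len, psteps);
    int ne = split_steps(path, strlen(path), esteps);
    if (np <= 0 || ne < np) return false;
    if (!anywhere && ne != np) return false;       /* full path: same depth */
    int offset = ne - np;                          /* compare trailing steps */
    for (int i = 0; i < np; i++)
        if (!step_matches(psteps[i], esteps[offset + i])) return false;
    return true;
}

/* Returns true if any '|'-separated alternative matches the path. */
bool path_matches(const char *pattern, const char *path) {
    assert(pattern != NULL && path != NULL);
    const char *p = pattern;
    for (;;) {
        const char *bar = strchr(p, '|');
        size_t len = bar ? (size_t)(bar - p) : strlen(p);
        if (match_alternative(p, len, path)) return true;
        if (bar == NULL) return false;
        p = bar + 1;
    }
}
```

For instance, path_matches("//b|//strong", "/html/body/p/strong") is true, while path_matches("/rss/channel/title", "/rss/channel/item/title") is false because a fully specified path must match at the same depth.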

Namespaces are also supported: pass a dictionary when creating the pattern or handler, where the keys are namespace abbreviations and the values are the full namespace references. E.g. @"//fh:updated" along with @{@"fh":@"com.mynamespace.updated"}.

Each MXHandler instance can reference several blocks. They are called when the parser matches the pattern associated with the MXHandler.

Entry handler block: An entry handler block is called after all the attributes are parsed, and before any sub elements or text are parsed within the matched node. The entry handler block must return an array of new MXHandlers, or nil.

Any MXHandlers you return from the entry handler are only used while the parser is in the current XML node. Once it exits this node, they are dropped. The root of the patterns specified for these "sub-handlers" is relative to the current XML node, not the entire document.

MXHandler -initRootExit: This creates a special MXHandler that is called when the current node is finished. It is useful because it can be returned from within an entry handler block. As such, it will retain references to any local objects you created in the entry handler block. See MXHTMLToAttributedString for examples of how this is used.

Exit handler block: This is called after the entire node is parsed, including text and other nodes. The MXElement parameter will have its text property set with any text found in the node during parsing.

OK, that's all for now. If there is further interest in this library, I'll improve this documentation.