Fast XML parser for Lua

xmllpegparser

xmllpegparser is a fast XML parser that uses the LPeg library.

  1. Installation
  2. Test
  3. xmllpegparser API
    1. Document structure (default parser)
    2. Parser structure
    3. Visitor structure
    4. Default parser limitations
  4. Licence

Installation

luarocks install --local https://raw.githubusercontent.com/jonathanpoelen/lua-xmllpegparser/master/xmllpegparser-2.1-2.rockspec

# or in your local directory lua-xmllpegparser

luarocks make --local xmllpegparser-2.1-2.rockspec

Test

Run ./example.lua.

./example.lua xmlfile [replaceentities]

replaceentities: any value. Its presence alone enables entity replacement.

xmllpegparser API

  • xmllpegparser.parse(xmlstring[, visitorOrsubEntities[, visitorInitArgs...]]):
    Returns a tuple: the document table, then an error string or nil (see visitor.finish).
    If subEntities is true, entities are replaced and a tentities member is added to the document table.
  • xmllpegparser.parseFile(filename[, visitorOrsubEntities[, visitorInitArgs...]]):
    Returns a tuple: the document table, then a file error or a document error.
  • xmllpegparser.defaultEntityTable():
    Returns the default entity table ( { quot='"', ... }).
  • xmllpegparser.createEntityTable(docEntities[, resultEntities]):
    Creates an entity table from the document entity table. Returns resultEntities.
  • xmllpegparser.mkReplaceEntities(entityTable_or_func):
    Returns an LPeg context that replaces entities: str = ctx:match(str).
  • xmllpegparser.replaceEntities(s, entityTable_or_func):
    Returns the string s with its entities replaced.
  • xmllpegparser.parser(visitor[, safeVisitor:bool]):
    Returns a parser. If all visitor functions return nil (except accuattr, init and finish), then safeVisitor may be true and the parser will optimize the calls to the visitor.
  • xmllpegparser.lazyParser(visitorCreator):
    Returns a parser.
    xmllpegparser.parser(visitorCreator()) is used on the first call of myparser.parse(...).
  • xmllpegparser.mkVisitor(evalEntities:bool, defaultEntities:table|function|nil, withoutPosition):
    If not defaultEntities and evalEntities then defaultEntities = defaultEntityTable.
    If withoutPosition, then pos parameter does not exist for the visitor functions except for finish.
  • xmllpegparser.treeParser:
    The default parser used by xmllpegparser.parse(s, false)
  • xmllpegparser.treeParserWithReplacedEntities:
    The default parser used by xmllpegparser.parse(s, true)
  • xmllpegparser.treeParserWithoutPos:
    Parser without pos parameter
  • xmllpegparser.treeParserWithoutPosWithReplacedEntities:
    Parser without pos parameter and with entities replaced
  • xmllpegparser.enableWithoutPosParser([bool]):
    Makes the treeParserWithoutPos* versions the default parsers.
    enableWithoutPosParser(false) is equivalent to setDefaultParsers().
    Returns the previous parsers.
  • xmllpegparser.setDefaultParsers(parser, parserWithReplacedEntities|bool|nil):
    If parserWithReplacedEntities == true, then parserWithReplacedEntities = parser.
    A nil or false value restores the default parser.
    Returns the previous parsers.
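
As a minimal sketch of the entry point described above, the following parses a small string with the default parser and entity replacement enabled. It assumes xmllpegparser and LPeg are installed, which is why the require is guarded with pcall. The sample XML is made up for illustration.

```lua
-- Guarded require: the sketch still runs (and reports) if the module is absent.
local ok, xmllpegparser = pcall(require, 'xmllpegparser')

-- Illustrative input, not taken from the project.
local xml = '<root><item id="1">a &amp; b</item></root>'

local doc, err
if ok then
  -- Second argument true: use the default parser that replaces entities.
  doc, err = xmllpegparser.parse(xml, true)
  if err then
    io.stderr:write('parse error: ', tostring(err), '\n')
  else
    -- Top-level nodes live in doc.children (see the document structure).
    local root = doc.children[1]
    print('root tag:', root.tag)
  end
else
  print('xmllpegparser is not installed')
end
```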

Document structure (default parser)

-- pos member = index of string
document = {
  children = {
    { pos=integer, parent=table or nil, text=string[, cdata=true] } or
    { pos=integer, parent=table or nil, tag=string, attrs={ { name=string, value=string }, ... }, children={ ... } },
    ...
  },
  bad = { children={ ... } }, -- present if more nodes are closed than opened. parent always refers to bad
  preprocessor = { { pos=integer, tag=string, attrs={ { name=string, value=string }, ... } }, ... },
  error = string, -- if error
  lastpos = numeric, -- last known position of parse()
  entities = { { pos=integer, name=string, value=string }, ... },
  tentities = { name=value, ... } -- only if subEntities = true
}
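
To illustrate how such a document can be traversed, here is a small depth-first walk. It runs on a hand-built table that follows the shape described above rather than on real parser output, so the pos values are illustrative only and attrs is left empty. The walk helper is a hypothetical name, not part of the library.

```lua
-- Hand-built stand-in for a parsed '<root>hello <b>world</b></root>'.
local document = {
  children = {
    { pos = 1, tag = 'root', attrs = {}, children = {
      { pos = 7, text = 'hello ' },
      { pos = 13, tag = 'b', attrs = {}, children = {
        { pos = 16, text = 'world' },
      } },
    } },
  },
}

-- Depth-first walk: calls fn(node, depth) for every node below `parent`.
local function walk(parent, fn, depth)
  depth = depth or 0
  for _, node in ipairs(parent.children or {}) do
    fn(node, depth)
    -- Only tag nodes have children; text nodes are leaves.
    if node.tag then walk(node, fn, depth + 1) end
  end
end

local tags, texts = {}, {}
walk(document, function(node)
  if node.tag then
    tags[#tags + 1] = node.tag
  else
    texts[#texts + 1] = node.text
  end
end)

print(table.concat(tags, ','))  --> root,b
print(table.concat(texts))      --> hello world
```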

Parser structure

{
  parse = function(xmlstring, visitorInitArgs...) ... end,
  parseFile = function(filename, visitorInitArgs...) ... end,
  __call = function(xmlstring, visitorInitArgs...) ... end,
}
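
A parser object with this structure can presumably be used either through its parse member or by calling it directly. The sketch below tries both on xmllpegparser.treeParser. It assumes the module is installed and that the __call entry behaves like parse, as the structure above suggests.

```lua
local ok, xmllpegparser = pcall(require, 'xmllpegparser')

local doc1, err1, doc2, err2
if ok then
  local parser = xmllpegparser.treeParser
  -- Plain function member: note the dot, not a colon.
  doc1, err1 = parser.parse('<a>text</a>')
  -- Same parse through the __call entry (assumed equivalent to parse).
  doc2, err2 = parser('<a>text</a>')
  print(doc1.children[1].tag, doc2.children[1].tag)
else
  print('xmllpegparser is not installed')
end
```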

Visitor structure

Each member is optional.

{
  withPos = bool, -- indicates whether the pos parameter is passed to the functions (except `finish`)
  init = function(...), -- called before parsing, returns the position of the beginning of the match or nil
  finish = function(err, pos, xmlstring), -- called after parsing
  proc = function(pos, name, attrs), -- <?...?>
  entity = function(pos, name, value),
  doctype = function(pos, name, cat, path), -- called after all addEntity
  accuattr = function(table, name, value), -- `table` is an accumulator that will be passed on as tag.attrs.
                                           -- If `nil` and `tag` is not `nil`, a default accumulator is used.
                                           -- If `false`, the accumulator is disabled.
                                           -- (`tag(pos, name, accuattr(accuattr({}, attr1, value1), attr2, value2))`)
  tag = function(pos, name, attrs), -- for a new tag (`<a>` or `<a/>`)
  open = function(), -- only for an open node (`<a>`), called after `tag`.
  close = function(name),
  text = function(pos, text),
  cdata = function(pos, text), -- if nil, `text` is used instead
  comment = function(str)
}
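
As a rough illustration of a custom visitor, the sketch below counts how often each tag name occurs. It sets withPos = true so that the signatures match those listed above, and it assumes that whatever finish returns becomes the result of parse (as the API section hints with "see visitor.finish"). The manual dry run only simulates the calls a parser would presumably make for `<a><b/><b/></a>`. The real run is guarded because it needs the library.

```lua
local counts = {}

local visitor = {
  withPos = true,
  -- Reset state before each parse; returns nil.
  init = function() counts = {} end,
  -- Called for every new tag (<a> or <a/>).
  tag = function(pos, name, attrs)
    counts[name] = (counts[name] or 0) + 1
  end,
  -- Hand the counters back as the parse result.
  finish = function(err, pos, xmlstring)
    return counts, err
  end,
}

-- Manual dry run with illustrative positions, no library needed.
visitor.init()
visitor.tag(1, 'a', {})
visitor.tag(4, 'b', {})
visitor.tag(8, 'b', {})
local dry = visitor.finish(nil, 16, '<a><b/><b/></a>')
print(dry.a, dry.b)

-- Real run, only if xmllpegparser is available.
local ok, xmllpegparser = pcall(require, 'xmllpegparser')
if ok then
  local parser = xmllpegparser.parser(visitor)
  local result, err = parser.parse('<a><b/><b/></a>')
  if not err then print(result.a, result.b) end
end
```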

Default parser limitations

  • Non-validating
  • No DTD support
  • Ignores processing instructions
  • Ignores DOCTYPE, parses only ENTITY
  • If several attributes have the same name (which the XML specification forbids in a well-formed document), only the last is kept.
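
If duplicate attributes need to be preserved, one possible workaround is a custom accuattr in a visitor. The sketch below is an assumption-based example, not the library's default accumulator: accuattrKeepAll is a hypothetical name, and it appends every name/value pair in order instead of overwriting. The fold is done by hand here, in the nested form shown in the visitor structure.

```lua
-- Hypothetical accumulator: keeps every attribute, duplicates included.
local function accuattrKeepAll(attrs, name, value)
  attrs[#attrs + 1] = { name = name, value = value }
  -- Return the accumulator so it can be passed to the next call.
  return attrs
end

-- Hand-made fold for something like <x id="1" id="2"/>.
local attrs = accuattrKeepAll(accuattrKeepAll({}, 'id', '1'), 'id', '2')

for i, attr in ipairs(attrs) do
  print(i, attr.name, attr.value)
end

-- It would be plugged into a visitor next to `tag`, for example:
local visitor = {
  withPos = true,
  accuattr = accuattrKeepAll,
  tag = function(pos, name, attrs) --[[ inspect attrs here ]] end,
}
```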

Licence

MIT license
