| The code herein contains LPeg [1] routines for parsing some common data | |
| formats. The current formats are: | |
| abnf | |
| The core ruleset from RFC-5234. These rules are used often in RFCs. | |
| ascii | |
| ascii.char | |
| ascii.control | |
| ascii.ctrl | |
| Match a single ASCII character. The top level module will match | |
| both graphical characters and control characters. The "ascii.char" | |
| module only matches the graphical characters; the "ascii.control" | |
| module only matches the control codes. The "ascii.ctrl" module will | |
| return the name of a control character, or nil if not a control character. | |
| iso | |
| iso.char | |
| iso.control | |
| iso.ctrl | |
| Match a single ISO character. The top level module will match | |
| graphical characters, control characters and control sequences. The | |
| "iso.char" module only matches the ISO graphical characters; the | |
| "iso.control" module matches only control characters (for example, | |
| <ESC>E or \133). The "iso.ctrl" module will return the name of the | |
| control character, plus any associated data as appropriate. For example: | |
| <ESC>[32;40m | |
| will return a name of "SGR" and a two element array containing 32 | |
| and 40. | |
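To make the shape of that result concrete, here is a minimal pure-Lua sketch that pulls the same name and parameter list out of a CSI sequence. It is not how the module works (the module is written in LPeg); parse_csi and the one-entry CSI_NAMES table are hypothetical names introduced only for this illustration.

```lua
-- Hypothetical stand-in for illustration; the real "iso.ctrl" module is LPeg-based.
local CSI_NAMES = { m = "SGR" } -- only the final byte used in the example above

local function parse_csi(seq)
  -- ESC [ <parameters> <final byte>
  local params, final = seq:match("^\27%[([%d;]*)([@-~])$")
  if not params then
    return nil
  end
  local list = {}
  for n in params:gmatch("%d+") do
    list[#list + 1] = tonumber(n)
  end
  -- fall back to the raw final byte when the name is not in the table
  return CSI_NAMES[final] or final, list
end

local name, data = parse_csi("\27[32;40m")
print(name, data[1], data[2])
```

The sketch only recognizes numeric parameters separated by semicolons; other control sequences would need their own handling.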
| NOTE: These modules only deal with the ISO defined characters, and | |
| will NOT match those defined by ASCII. To match a graphical character | |
| that matches both ASCII and ISO: | |
| char = require "org.conman.parsers.ascii.char" | |
| + require "org.conman.parsers.iso.char" | |
| email | |
| Parses email headers as defined in: | |
| RFC-0822 Standard for the Format of ARPA Internet Text Messages | |
| RFC-1036 Standard for Interchange of USENET Messages | |
| RFC-2045 Multipurpose Internet Mail Extensions I | |
| RFC-2046 Multipurpose Internet Mail Extensions II | |
| RFC-2047 Multipurpose Internet Mail Extensions III | |
| RFC-2048 Multipurpose Internet Mail Extensions IV | |
| RFC-2369 The Use of URLs as Meta-Syntax for Core Mail | |
| List Commands and their Transport through | |
| Message Header Fields | |
| RFC-2822 Internet Message Format | |
| RFC-2919 A Structured Field and Namespace for the Identification of Mailing Lists | |
| RFC-5064 The Archived-At Message Header Field | |
| RFC-5322 Internet Message Format | |
| Headers are returned in a Lua table. For example, the following | |
| headers: | |
| Return-Path: <sean@conman.org> | |
| Received: from brevard.conman.org (brevard.conman.org | |
| [66.252.224.242]) | |
| by mail.example.com (Postfix) | |
| with ESMTP id 538562EA5D07 | |
| for <sherlock@example.com>; | |
| Fri, 28 Dec 2012 21:40:11 -0500 | |
| From: Sean Conner <sean@conman.org> | |
| To: Sherlock Holmes <sherlock@example.com>, | |
| the-scooby-gang: Fred <fred@example.net>, | |
| Daphne <daphne@example.net>, | |
| Velma <velma@example.net>, | |
| Shaggy <shaggy@example.net>, | |
| Scobby-Doo <scooby@example.net>;, | |
| The Batman <batman@example.org> | |
| Subject: I know who did it! | |
| Date: Fri, 28 Dec 2012 21:40:11 -0500 | |
| Message-ID: <1234.5678.90abcd@conman.org> | |
| will return the following Lua table: | |
| { | |
| received = | |
| { | |
| [1] = | |
| { | |
| with = "ESMTP", | |
| from = "brevard.conman.org", | |
| id = "538562EA5D07", | |
| when = | |
| { | |
| min = 40.000000, | |
| zone = -18000.000000, | |
| day = 28.000000, | |
| month = 12.000000, | |
| year = 2012.000000, | |
| sec = 11.000000, | |
| hour = 21.000000, | |
| weekday = "Fri", | |
| }, | |
| for = | |
| { | |
| address = "sherlock@example.com", | |
| }, | |
| by = "mail.example.com", | |
| }, | |
| }, | |
| to = | |
| { | |
| [1] = | |
| { | |
| name = "Sherlock Holmes", | |
| address = "sherlock@example.com", | |
| }, | |
| [2] = | |
| { | |
| ['the-scooby-gang'] = | |
| { | |
| [1] = | |
| { | |
| name = "Fred", | |
| address = "fred@example.net", | |
| }, | |
| [2] = | |
| { | |
| name = "Daphne", | |
| address = "daphne@example.net", | |
| }, | |
| [3] = | |
| { | |
| name = "Velma", | |
| address = "velma@example.net", | |
| }, | |
| [4] = | |
| { | |
| name = "Shaggy", | |
| address = "shaggy@example.net", | |
| }, | |
| [5] = | |
| { | |
| name = "Scobby-Doo", | |
| address = "scooby@example.net", | |
| }, | |
| }, | |
| }, | |
| [3] = | |
| { | |
| name = "The Batman", | |
| address = "batman@example.org", | |
| }, | |
| }, | |
| from = | |
| { | |
| [1] = | |
| { | |
| name = "Sean Conner", | |
| address = "sean@conman.org", | |
| }, | |
| }, | |
| date = | |
| { | |
| min = 40.000000, | |
| zone = -18000.000000, | |
| day = 28.000000, | |
| month = 12.000000, | |
| year = 2012.000000, | |
| sec = 11.000000, | |
| hour = 21.000000, | |
| weekday = "Fri", | |
| }, | |
| return_path = | |
| { | |
| [1] = | |
| { | |
| address = "sean@conman.org", | |
| }, | |
| }, | |
| message_id = "1234.5678.90abcd@conman.org", | |
| subject = "I know who did it!", | |
| } | |
| The only fields not supported are the Resent-* fields; they are | |
| rarely used and their semantics are particularly hard to support via | |
| parsing alone. These fields, as well as any other fields not | |
| otherwise understood or parsable, will end up in a field called | |
| 'generic', keyed by the raw header name. | |
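Address lists mix plain mailboxes with named groups, so code that consumes the table has to handle both shapes. The following is a small sketch, working on a trimmed copy of the "to" field shown above; flatten is a hypothetical helper and not part of the parser.

```lua
-- A trimmed copy of the "to" field from the table above.
local to = {
  { name = "Sherlock Holmes", address = "sherlock@example.com" },
  { ["the-scooby-gang"] = {
      { name = "Fred",   address = "fred@example.net" },
      { name = "Daphne", address = "daphne@example.net" },
  } },
  { name = "The Batman", address = "batman@example.org" },
}

-- Hypothetical helper: collect every address, descending into groups.
local function flatten(list, out)
  out = out or {}
  for _, entry in ipairs(list) do
    if entry.address then
      out[#out + 1] = entry.address
    else
      -- a group: one key (the group name) mapping to a list of mailboxes
      for _, members in pairs(entry) do
        flatten(members, out)
      end
    end
  end
  return out
end

local all = flatten(to)
print(table.concat(all, ", "))
```

Testing for the "address" key is the simplest way to tell a mailbox from a group, since a group entry carries only its name as a key.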
| json | |
| Implements a JSON parser. It requires some additional modules [2] | |
| to run. This will parse a JSON file into a Lua table. The full | |
| grammar is supported, but it expects the input to be valid UTF-8. | |
| A JSON null value will be converted to nil. If you don't want this | |
| behavior, define a global variable called "null" to be the value | |
| you want for a JSON null. | |
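The reason to care: a nil cannot be stored in a Lua table, so a key whose JSON value is null simply vanishes. The sketch below shows the difference a sentinel makes. The last part is an assumption, not confirmed by this README: it supposes the module loads as "org.conman.parsers.json", is an LPeg pattern used with :match() like the url modules, and sees a global defined before it is loaded.

```lua
-- Why nil is lossy: assigning nil to a key leaves no key behind.
local lossy = { a = nil }
print(next(lossy))

-- A unique sentinel survives in a table. Per the text above, the parser
-- uses the global "null" for JSON nulls when that global is defined.
null = setmetatable({}, { __tostring = function() return "null" end })

local kept = { a = null }
print(kept.a == null)

-- Assumption: module name and :match() usage mirror the url examples in
-- this README. The block is skipped if the module cannot be loaded.
local ok, json = pcall(require, "org.conman.parsers.json")
if ok then
  local data = json:match('{ "a" : null }')
  print(data and data.a)
end
```

Any unique value works as the sentinel; a table with a __tostring metamethod just prints more readably.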
| jsons | |
| Another implementation of a JSON parser. This one "streams" the | |
| input, meaning it will handle large JSON files the other one won't, | |
| and is a drop-in replacement. You can also pass in a function that | |
| will return more data so you can actually "stream" data into the | |
| parser. | |
| ip | |
| Provides two LPeg patterns, IPv4 and IPv6, which parse and convert | |
| said addresses directly into their network-order binary formats. | |
| ip-text | |
| Provides two LPeg patterns, IPv4 and IPv6, which parse and return said | |
| addresses as text, unlike the ip module above. | |
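For IPv4, "network-order binary" means four bytes with the most significant octet first. The sketch below illustrates the difference between the two result shapes in plain Lua, assuming the binary form is a Lua string of raw bytes. ipv4_to_binary is a hypothetical helper, not a function from either module.

```lua
-- Hypothetical helper: dotted-quad text to a 4-byte, network-order string.
local function ipv4_to_binary(text)
  local a, b, c, d = text:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
  if not a then
    return nil
  end
  a, b, c, d = tonumber(a), tonumber(b), tonumber(c), tonumber(d)
  if a > 255 or b > 255 or c > 255 or d > 255 then
    return nil
  end
  return string.char(a, b, c, d)
end

local text   = "192.0.2.1"            -- the kind of value "ip-text" returns
local binary = ipv4_to_binary(text)   -- the kind of value "ip" returns
print(#text, #binary)
```

The binary form is what socket-level APIs tend to want; the text form is easier to log or compare by eye.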
| ini | |
| Provides an INI file parser that returns a Lua table from an INI | |
| file. A sample INI file such as: | |
| ; we allow "default" values | |
| default = ok | |
| [section1] | |
| var1 = foo | |
| var2 = 12,23,34,54,44 | |
| VAR3 = "var3=foo",33,44,55 | |
| var2 = apple | |
| Var4 = 55 | |
| [section2] | |
| # another comment | |
| ; and so is this one | |
| var1 = foo bar baz ; this is a comment | |
| var2 = "foo bar baz ; this is not a comment" | |
| [section1] | |
| var4=this is a test | |
| var5= this is also a test | |
| var2 = pear | |
| var3 = 88,99 | |
| will result in a Lua table of: | |
| { | |
| section1 = | |
| { | |
| var1 = "foo", | |
| var5 = "this is also a test", | |
| var4 = | |
| { | |
| [1] = "55", | |
| [2] = "this is a test", | |
| }, | |
| var3 = | |
| { | |
| [1] = "var3=foo", | |
| [2] = "33", | |
| [3] = "44", | |
| [4] = "55", | |
| [5] = "88", | |
| [6] = "99", | |
| }, | |
| var2 = | |
| { | |
| [1] = "12", | |
| [2] = "23", | |
| [3] = "34", | |
| [4] = "54", | |
| [5] = "44", | |
| [6] = "apple", | |
| [7] = "pear", | |
| }, | |
| }, | |
| default = "ok", | |
| section2 = | |
| { | |
| var1 = "foo bar baz ", | |
| var2 = "foo bar baz ; this is not a comment", | |
| }, | |
| } | |
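The sample shows three rules worth spelling out: key names are folded to lower case, comma-separated values become a list, and a key that appears again (even after its section is reopened) appends to what is already there. The following is a sketch of that merge step only, inferred from the output above; add_values is a hypothetical helper and does not parse INI text.

```lua
-- Hypothetical helper: merge a list of already-split values into a section.
local function add_values(section, key, values)
  key = key:lower()                      -- "VAR3" and "var3" are the same key
  local old = section[key]
  if old == nil and #values == 1 then
    section[key] = values[1]             -- a lone value stays a plain string
    return
  end
  local list
  if type(old) == "table" then
    list = old
  elseif old ~= nil then
    list = { old }                       -- promote the earlier string to a list
  else
    list = {}
  end
  for _, v in ipairs(values) do
    list[#list + 1] = v
  end
  section[key] = list
end

local section1 = {}
add_values(section1, "var1", { "foo" })
add_values(section1, "var2", { "12", "23", "34", "54", "44" })
add_values(section1, "var2", { "apple" })
add_values(section1, "Var4", { "55" })
add_values(section1, "var4", { "this is a test" })
add_values(section1, "var2", { "pear" })
print(section1.var1, #section1.var2, section1.var4[2])
```

A consequence for callers: a key may hold either a string or a list, so code reading the table should check the type before indexing.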
| strftime | |
| Parses the format string for strftime() (or os.date() for Lua) and | |
| returns an LPeg expression that can parse that format, with the | |
| exceptions of "%c", "%x" and "%X" (all system specific formats). | |
| For example, the format, "%A, %d %B %Y @ %H:%M:%S" will return the | |
| LPeg expression to parse this: | |
| Monday, 02 July 2018 @ 16:02:48 | |
| into | |
| { | |
| min = 2.000000, | |
| wday = 2.000000, | |
| day = 2.000000, | |
| month = 7.000000, | |
| sec = 48.000000, | |
| hour = 16.000000, | |
| year = 2018.000000, | |
| } | |
| This will even work with other locales, such as "se_NO.UTF-8", | |
| which will allow LPeg to parse: | |
| vuossárga, 02 suoidnemánu 2018 @ 16:02:48 | |
| into | |
| { | |
| min = 2,000000, | |
| wday = 2,000000, | |
| day = 2,000000, | |
| month = 7,000000, | |
| sec = 48,000000, | |
| hour = 16,000000, | |
| year = 2018,000000, | |
| } | |
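The field names in the result are the ones os.time() reads, so the table can be turned into a timestamp directly. This is a short sketch using the values from the first example above, written out by hand rather than produced by the parser.

```lua
-- Values copied from the first example above.
local parsed = {
  year = 2018, month = 7, day = 2,
  hour = 16, min = 2, sec = 48,
  wday = 2,
}

local stamp = os.time(parsed)   -- fields are interpreted as local time
print(os.date("%Y-%m-%d %H:%M:%S", stamp))
```

os.time() does not use the wday field on input, so it is harmless to leave it in the table.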
| url | |
| Parses URLs per RFC-3986. By default, it will handle the following | |
| URL types: | |
| http: | |
| https: | |
| file: | |
| ftp: | |
| Given the following URL: | |
| http://www.conman.org/people/spc/index.cgi?one=1%3F&two=2&three=3#target1 | |
| It will be broken down into a Lua table as follows: | |
| { | |
| scheme = "http", | |
| host = "www.conman.org", | |
| port = 80, | |
| path = "/people/spc/index.cgi", | |
| query = "one=1%3F&two=2&three=3", | |
| fragment = "target1", | |
| } | |
| Other URLs can be parsed, but a URL like: | |
| mailto:sean@conman.org?cc=fred@example.com,velma@example.net&subject=Current%20Mystery | |
| will be broken down as: | |
| { | |
| scheme = "mailto", | |
| path = "sean@conman.org", | |
| query = "cc=fred@example.com,velma@example.net&subject=Current%20Mystery", | |
| } | |
| which may require more parsing than provided here. | |
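In both examples the query comes back as a raw, still percent-encoded string. The sketch below shows one way to split it further in plain Lua. unescape and parse_query are hypothetical helpers, not part of the url module; the sketch does not treat "+" as a space, and a repeated key overwrites the earlier value.

```lua
-- Hypothetical helper: decode %XX escapes.
local function unescape(s)
  return (s:gsub("%%(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end))
end

-- Hypothetical helper: split "a=1&b=2" into a table of decoded pairs.
local function parse_query(query)
  local out = {}
  for pair in query:gmatch("[^&]+") do
    local key, value = pair:match("^([^=]*)=?(.*)$")
    out[unescape(key)] = unescape(value)
  end
  return out
end

local q = parse_query("one=1%3F&two=2&three=3")
print(q.one, q.two, q.three)
```

Splitting on "&" before decoding matters: decoding first would turn an escaped "%26" into a separator.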
| url.gopher | |
| Parses "gopher:" URLs per RFC-4266. Given this URL: | |
| gopher://gopher.conman.org/0foobar%09search%20String%09plus | |
| it will be broken down as: | |
| { | |
| scheme = "gopher", | |
| host = "gopher.conman.org", | |
| port = 70.000000, | |
| type = "file", | |
| selector = "foobar", | |
| search = "search String", | |
| plus = "plus", | |
| } | |
| If you need to parse other URLs in addition to "gopher:" types, | |
| you can do: | |
| gopher = require "org.conman.parsers.url.gopher" | |
| url = require "org.conman.parsers.url" | |
| url = gopher + url | |
| info = url:match(my_url) | |
| url.siptel | |
| Parses "sip:" and "sips:" URIs per RFC-3261. | |
| Parses "tel:" URIs per RFC-3966. | |
| Examples: | |
| sip = require "org.conman.parsers.url.sip" | |
| u = sip:match [[sip:annc@example.com;play=file://fs.example.net//clips/my-intro.dvi;content-type=video/mpeg%3bencode%d3314M-25/625-50]] | |
| results in: | |
| { | |
| host = "example.com", | |
| port = 5060.000000, | |
| user = "annc", | |
| scheme = "sip", | |
| parameters = | |
| { | |
| play = "file://fs.example.net//clips/my-intro.dvi", | |
| ["content-type"] = "video/mpeg%3bencode%d3314M-25/625-50", | |
| }, | |
| } | |
| and | |
| u = sip:match [[sip:+1-(555)-555-1212;ext=1234@example.net;user=phone]] | |
| results in: | |
| { | |
| host = "example.net", | |
| port = 5060.000000, | |
| user = | |
| { | |
| number = "15555551212", | |
| global = true, | |
| parameters = | |
| { | |
| ext = "1234", | |
| }, | |
| }, | |
| scheme = "sip", | |
| parameters = | |
| { | |
| user = "phone", | |
| }, | |
| } | |
| and | |
| tel = require "org.conman.parsers.url.tel" | |
| u = tel:match "tel:+1-(555)-555-1212;ext=1234" | |
| results in: | |
| { | |
| scheme = "tel", | |
| number = "15555551212", | |
| global = true, | |
| parameters = | |
| { | |
| ext = "1234", | |
| }, | |
| } | |
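In both the "sip:" and "tel:" results, the number has lost its visual separators and the leading "+" has become the global flag. The following plain-Lua sketch shows that normalization, inferred from the output above. normalize_tel is a hypothetical helper, it only handles decimal digits, and returning global = false for a local number is an assumption (the README does not show that case).

```lua
-- Hypothetical helper: strip separators and note whether the number is global.
local function normalize_tel(text)
  local global = text:sub(1, 1) == "+"    -- a leading "+" marks a global number
  local number = text:gsub("[^%d]", "")   -- drop "+", "-", "(", ")" and the like
  return { number = number, global = global }
end

local t = normalize_tel("+1-(555)-555-1212")
print(t.number, t.global)
```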
| If you need to parse other URLs in addition to these types, | |
| you can do: | |
| siptel = require "org.conman.parsers.url.siptel" | |
| url = require "org.conman.parsers.url" | |
| url = siptel + url | |
| info = url:match(my_url) | |
| soundex.lua | |
| Implements the Soundex algorithm. | |
| [1] http://www.inf.puc-rio.br/~roberto/lpeg/ | |
| [2] https://github.com/spc476/lua-conmanorg |