Parsing common data formats via LPeg

The code herein contains LPeg [1] routines for parsing some common data
formats.  The current formats are:

abnf

	The core ruleset from RFC-5234.  These rules are referenced by many
	other RFCs.
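
	For example (a sketch only; the module path and the rule names
	ALPHA, DIGIT and CRLF are assumptions based on the RFC-5234 core
	rule names):

		lpeg = require "lpeg"
		abnf = require "org.conman.parsers.abnf" -- assumed module path

		-- a toy rule built from the core rules: one or more letters,
		-- then optional digits, terminated by CRLF
		token = lpeg.C(abnf.ALPHA^1 * abnf.DIGIT^0) * abnf.CRLF
		print(token:match "Widget42\r\n") --> Widget42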

email

	Parses email headers as defined in:

		RFC-0822	Internet Message Format
		RFC-1036	Standard for Interchange of USENET Messages
		RFC-2045	Multipurpose Internet Mail Extensions I
		RFC-2046	Multipurpose Internet Mail Extensions II
		RFC-2047	Multipurpose Internet Mail Extensions III
		RFC-2048	Multipurpose Internet Mail Extensions IV
		RFC-2369	The Use of URLs as Meta-Syntax for Core Mail 
				List Commands and their Transport through 
				Message Header Fields
		RFC-2822	Internet Message Format	
		RFC-2919	A Structured Field and Namespace for the Identification of Mailing Lists
		RFC-5064	The Archived-At Message Header Field
		RFC-5322	Internet Message Format

	Headers are returned in a Lua table.  For example, the following
	headers:

		Return-Path: <sean@conman.org>
		Received: from brevard.conman.org (brevard.conman.org 
			[66.252.224.242])
			by mail.example.com (Postfix) 
			with ESMTP id 538562EA5D07
			for <sherlock@example.com>; 
			Fri, 28 Dec 2012 21:40:11 -0500
		From: Sean Conner <sean@conman.org>
		To: Sherlock Holmes <sherlock@example.com>,
			the-scooby-gang: Fred <fred@example.net>,
				Daphne <daphne@example.net>,
				Velma <velma@example.net>,
				Shaggy <shaggy@example.net>,
				Scooby-Doo <scooby@example.net>;,
			The Batman <batman@example.org>
		Subject: I know who did it!
		Date: Fri, 28 Dec 2012 21:40:11 -0500
		Message-ID: <1234.5678.90abcd@conman.org>

	will return the following Lua table:

		{
		  received =
		  {
		    [1] =
		    {
		      with = "ESMTP",
		      from = "brevard.conman.org",
		      id = "538562EA5D07",
		      when =
		      {
		        min = 40.000000,
		        zone = -18000.000000,
		        day = 28.000000,
		        month = 12.000000,
		        year = 2012.000000,
		        sec = 11.000000,
		        hour = 21.000000,
		        weekday = "Fri",
		      },
		      ['for'] =
		      {
		        address = "sherlock@example.com",
		      },
		      by = "mail.example.com",
		    },
		  },
		  to =
		  {
		    [1] =
		    {
		      name = "Sherlock Holmes",
		      address = "sherlock@example.com",
		    },
		    [2] =
		    {
		      ['the-scooby-gang'] =
		      {
		        [1] =
		        {
		          name = "Fred",
		          address = "fred@example.net",
		        },
		        [2] =
		        {
		          name = "Daphne",
		          address = "daphne@example.net",
		        },
		        [3] =
		        {
		          name = "Velma",
		          address = "velma@example.net",
		        },
		        [4] =
		        {
		          name = "Shaggy",
		          address = "shaggy@example.net",
		        },
		        [5] =
		        {
		          name = "Scobby-Doo",
		          address = "scooby@example.net",
		        },
		      },
		    },
		    [3] =
		    {
		      name = "The Batman",
		      address = "batman@example.org",
		    },
		  },
		  from =
		  {
		    [1] =
		    {
		      name = "Sean Conner",
		      address = "sean@conman.org",
		    },
		  },
		  date =
		  {
		    min = 40.000000,
		    zone = -18000.000000,
		    day = 28.000000,
		    month = 12.000000,
		    year = 2012.000000,
		    sec = 11.000000,
		    hour = 21.000000,
		    weekday = "Fri",
		  },
		  return_path =
		  {
		    [1] =
		    {
		      address = "sean@conman.org",
		    },
		  },
		  message_id = "1234.5678.90abcd@conman.org",
		  subject = "I know who did it!",
		}

	The only fields not supported are the Resent-* fields; they are
	rarely used and their semantics are particularly hard to capture
	through parsing alone.  These fields, as well as any other fields
	not otherwise understood or parsable, end up in a field called
	'generic', with the key being the raw header name.
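
	A minimal usage sketch (the module path is assumed, as is the
	convention that the returned LPeg pattern is matched directly
	against the raw header text):

		email = require "org.conman.parsers.email" -- assumed module path

		raw = "From: Sean Conner <sean@conman.org>\r\n"
		   .. "Subject: I know who did it!\r\n"
		   .. "\r\n"

		headers = email:match(raw)
		print(headers.subject)         --> I know who did it!
		print(headers.from[1].address) --> sean@conman.org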

json

	Implements a JSON parser.  It requires some additional modules [2]
	to run.  It parses JSON text into a Lua table.  The full grammar
	is supported, but the input is expected to be valid UTF-8.

	A JSON null value is converted to nil.  If you don't want this
	behavior, define a global variable called "null" to be the value
	you want used for a JSON null.
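
	A minimal usage sketch (the module path is assumed; the sentinel
	table used for "null" is purely illustrative):

		json = require "org.conman.parsers.json" -- assumed module path

		-- keep JSON nulls visible instead of letting them become nil
		null = setmetatable({},{ __tostring = function() return "null" end })

		data = json:match '{ "name" : "Sherlock" , "arch-enemy" : null }'
		print(data.name)          --> Sherlock
		print(data['arch-enemy']) --> null   (the sentinel defined above)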

jsons

	Another implementation of a JSON parser.  This one "streams" the
	input, meaning it can handle large JSON files that the other one
	can't, and it is a drop-in replacement.  You can also pass in a
	function that returns more data, so data can actually be "streamed"
	into the parser.
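
	A streaming sketch (hedged: the exact convention for the reader
	function is an assumption, and the file name is hypothetical):

		jsons = require "org.conman.parsers.jsons" -- assumed module path

		input = io.open("big.json","rb")  -- hypothetical large file
		data  = jsons:match(function()
		  return input:read(8192)         -- next chunk; nil at end of file
		end)
		input:close()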

ip

	Provides two LPeg patterns, IPv4 and IPv6, which parse said
	addresses and convert them directly into their network-order
	binary forms.
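
	For example (a sketch; the module path is assumed):

		ip = require "org.conman.parsers.ip" -- assumed module path

		addr = ip.IPv4:match "192.168.1.1"
		print(#addr)          --> 4             (four octets, network order)
		print(addr:byte(1,4)) --> 192  168  1  1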

ip-text

	Provides two LPeg patterns, IPv4 and IPv6, which parse said
	addresses and return them as text, unlike the ip module above.
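
	For example (a sketch; the module path is assumed):

		iptext = require "org.conman.parsers.ip-text" -- assumed module path
		any    = iptext.IPv4 + iptext.IPv6

		print(any:match "127.0.0.1") --> 127.0.0.1
		print(any:match "fc00::1")   --> fc00::1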

ini

	Provides an INI file parser that returns a Lua table from an INI
	file.  A sample INI file such as:

		; we allow "default" values

		default = ok

		[section1]

		var1 = foo
		var2 = 12,23,34,54,44
		VAR3 = "var3=foo",33,44,55
		var2 = apple
		Var4 = 55

		[section2]
			# another comment
			; and so is this one
		
			var1 = foo bar baz ; this is a comment
			var2 = "foo bar baz ; this is not a comment"

		[section1]

			var4=this is a test
			var5= this is also a test
			var2 = pear
			var3 = 88,99


	will result in a Lua table of:

		{
		  section1 =
		  {
		    var1 = "foo",
		    var5 = "this is also a test",
		    var4 =
		    {
		      [1] = "55",
		      [2] = "this is a test",
		    },
		    var3 =
		    {
		      [1] = "var3=foo",
		      [2] = "33",
		      [3] = "44",
		      [4] = "55",
		      [5] = "88",
		      [6] = "99",
		    },
		    var2 =
		    {
		      [1] = "12",
		      [2] = "23",
		      [3] = "34",
		      [4] = "54",
		      [5] = "44",
		      [6] = "apple",
		      [7] = "pear",
		    },
		  },
		  default = "ok",
		  section2 =
		  {
		    var1 = "foo bar baz ",
		    var2 = "foo bar baz ; this is not a comment",
		  },
		}
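
	A minimal usage sketch (the file name is hypothetical; the pattern
	is assumed to be matched against the full text of the file):

		ini = require "org.conman.parsers.ini" -- assumed module path

		f    = io.open("sample.ini","r")  -- hypothetical file holding the sample above
		conf = ini:match(f:read("*a"))
		f:close()

		print(conf.default)       --> ok
		print(conf.section2.var2) --> foo bar baz ; this is not a comment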

url.url

	Parses URLs per RFC-3986.  By default, it will handle the following
	URL types:

		http:
		https:
		file:
		ftp:

	Given the following URL:

		http://www.conman.org/people/spc/index.cgi?one=1%3F&two=2&three=3#target1

	It will be broken down into a Lua table as follows:

		{
		  host = "www.conman.org",
		  path =
		  {
		    [1] = "people",
		    [2] = "spc",
		    [3] = "index.cgi",
		    root = true,
		  },
		  query =
		  {
		    one = "1?",
		    three = "3",
		    two = "2",
		  },
		  scheme = "http",
		  fragment = "target1",
		  port = 80.000000,
		}
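
	For instance, the table above can be obtained with:

		url  = require "org.conman.parsers.url.url"
		info = url:match "http://www.conman.org/people/spc/index.cgi?one=1%3F&two=2&three=3#target1"

		print(info.host)      --> www.conman.org
		print(info.query.one) --> 1?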

	The path is broken down into its individual segments by default.
	No normalization is done, so a path like:

		/foo/../bar/baz/../../snafu/./fubar

	will be returned as:

		{
		  [1] = "foo",
		  [2] = "..",
		  [3] = "bar",
		  [4] = "baz",
		  [5] = "..",
		  [6] = "..",
		  [7] = "snafu",
		  [8] = ".",
		  [9] = "fubar"
		}
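
	If normalized segments are needed, a small helper can be applied to
	the returned path table (a sketch, not part of this library):

		-- collapse "." and ".." segments of a parsed path table
		function normalize(path)
		  local out = { root = path.root }
		  for _,seg in ipairs(path) do
		    if seg == ".." then
		      if #out > 0 then table.remove(out) end
		    elseif seg ~= "." then
		      out[#out + 1] = seg
		    end
		  end
		  return out
		end

		-- applied to the table above: { [1] = "snafu", [2] = "fubar" }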

	Other URLs can be parsed, but a URL like:

		mailto:sean@conman.org?cc=fred@example.com,velma@example.net&subject=Current%20Mystery

	will be broken down as:

		{
		  scheme = "mailto",
		  path   =
		  {
		    [1] = "sean@conman.org",
		  },
		  query =
		  {
		    cc      = "fred@example.com,velma@example.net;subject",
		    subject = "Current Mystery",
		  },
		}

	which may require more parsing than provided here.

url.gopher

	Parses "gopher:" URLs per RFC-4266.  Given this URL:

		gopher://gopher.conman.org/0foobar%09search%20String%09plus

	it will be broken down as:

		{
		  host     = "gopher.conman.org",
		  type     = "0",
		  search   = "search String",
		  scheme   = "gopher",
		  plus     = "plus",
		  port     = 70.000000,
		  selector = "foobar",
		}

	If you need to parse other URLs in addition to "gopher:" types,
	you can do:

		gopher = require "org.conman.parsers.url.gopher"
		url    = require "org.conman.parsers.url.url"
		
		url  = gopher + url
		info = url:match(my_url)	

url.sip

	Parses "sip:" URIs per RFC-3261.  Examples:

		sip = require "org.conman.parsers.url.sip"

		u = sip:match [[sip:annc@example.com;play=file://fs.example.net//clips/my-intro.dvi;content-type=video/mpeg%3bencode%d3314M-25/625-50]]

	results in:

		{
		  host       = "example.com",
		  port       = 5060.000000,
		  user       = "annc",
		  scheme     = "sip",
		  parameters =
		  {
		    play             = "file://fs.example.net//clips/my-intro.dvi",
		    ["content-type"] = "video/mpeg%3bencode%d3314M-25/625-50",
		  },
		}
	
	and 

		u = sip:match [[sip:+1-(555)-555-1212;ext=1234@example.net;user=phone]]

	results in:

		{
		  host = "example.net",
		  port = 5060.000000,
		  user =
		  {
		    number     = "15555551212",
		    global     = true,
		    parameters =
		    {
		      ext = "1234",
		    },
		  },
		  scheme     = "sip",
		  parameters =
		  {
		    user = "phone",
		  },
		}

	If you need to parse other URLs in addition to "sip:" types,
	you can do:

		sip = require "org.conman.parsers.url.sip"
		url = require "org.conman.parsers.url.url"

		url  = sip + url
		info = url:match(my_url)

url.tel

	Parses "tel:" URIs per RFC-3966.  Example:

		tel = require "org.conman.parsers.url.tel"

		u = tel:match "tel:+1-(555)-555-1212;ext=1234"

	results in:

		{
		  scheme = "tel",
		  number = "15555551212",
		  global = true,
		  parameters =
		  {
		    ext = "1234",
		  },
		}

	If you need to parse other URLs in addition to "tel:" types,
	you can do:

		tel = require "org.conman.parsers.url.tel"
		url = require "org.conman.parsers.url.url"

		url  = tel.tel + url
		info = url:match(my_url)

	NOTE:   Unlike the other modules, this one returns a table instead
		of an LPeg expression, due to some other requirements.

[1]	http://www.inf.puc-rio.br/~roberto/lpeg/

[2]	https://github.com/spc476/lua-conmanorg