spc476/LPeg-Parsers
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
master
Could not load branches
Nothing to show
Could not load tags
Nothing to show
{{ refName }}
default
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code
-
Clone
Use Git or checkout with SVN using the web URL.
Work fast with our official CLI. Learn more.
- Open with GitHub Desktop
- Download ZIP
Sign In Required
Please sign in to use Codespaces.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching Xcode
If nothing happens, download Xcode and try again.
Launching Visual Studio Code
Your codespace will open once ready.
There was a problem preparing your codespace, please try again.
Latest commit
I know why people keep adding ANSI codes to text files---it's because they want to be colorful. But they don't realize that this is a possible attack vector (from setting the foreground and background to the same color, to resizing the window, to possibly sending (or redefining a key to send) "rm -rf *" to the shell). I want none of that.
c2f69b0
Git stats
Files
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
The code herein contains LPeg [1] routines for parsing some common data
formats. The current formats are:
abnf
The core ruleset from RFC-5234. These rules are used often in RFCs.
ascii
ascii.char
ascii.control
ascii.ctrl
Match a single ASCII character. The top level module will match
both graphical charaters and control characters. The "ascii.char"
module only matches the graphical characters; the "ascii.control"
module only matches the control codes. The "ascii.ctrl" will return
the name of a control character, or nil if not a control character.
iso
iso.char
iso.control
iso.ctrl
Match a single ISO character. The top level module will match both
graphical, control characters and control sequences. The "iso.char"
only matches the ISO graphical characters; the iso.control" module
mathces only control characters (for example, <ESC>E or \133). The
"iso.ctrl" will return the name of the control character, plus any
associated data as appropriate. For example:
<ESC>[32;40m
will return a name of "SGR" and a two element array containing 32
and 40.
NOTE: These modules only deal with the ISO defined characters, and
will NOT match those defined by ASCII. To match a graphical character
that matches both ASCII and ISO:
char = require "org.conman.parsers.ascii.char
+ require "org.conman.parsers.iso.char
email
Parses email headers as defined in:
RFC-0822 Internet Message Format
RFC-1036 Standard for Interchange of USENET Messages
RFC-2045 Multipurpose Internet Mail Extensions I
RFC-2046 Multipurpose Internet Mail Extensions II
RFC-2047 Multipurpose Internet Mail Extensions III
RFC-2048 Multipurpose Internet Mail Extensions IV
RFC-2369 The Use of URLs as Meta-Syntax for Core Mail
List Commands and their Transport through
Message Header Fields
RFC-2822 Internet Message Format
RFC-2919 A Structured Field and Namespace for the Identification of Mailing Lists
RFC-5064 The Archived-At Message Header Field
RFC-5322 Internet Message Format
Headers are returned in a Lua table. For example, the following
headers:
Return-Path: <sean@conman.org>
Received: from brevard.conman.org (brevard.conman.org
[66.252.224.242])
by mail.example.com (Postfix)
with ESMTP id 538562EA5D07
for <sherlock@example.com>;
Fri, 28 Dec 2012 21:40:11 -0500
From: Sean Conner <sean@conman.org>
To: Sherlock Holmes <sherlock@example.com>,
the-scooby-gang: Fred <fred@example.net>,
Daphne <daphne@example.net>,
Velma <velma@example.net>,
Shaggy <shaggy@example.net>,
Scobby-Doo <scooby@example.net>;,
The Batman <batman@example.org>
Subject: I know who did it!
Date: Fri, 28 Dec 2012 21:40:11 -0500
Message-ID: <1234.5678.90abcd@conman.org>
Will return the following Lua table:
{
received =
{
[1] =
{
with = "ESMTP",
from = "brevard.conman.org",
id = "538562EA5D07",
when =
{
min = 0.000000,
zone = -18000.000000,
day = 28.000000,
month = 12.000000,
year = 2012.000000,
sec = 1.000000,
hour = 1.000000,
weekday = "Fri",
},
for =
{
address = "sherlock@example.com",
},
by = "mail.example.com",
},
},
to =
{
[1] =
{
name = "Sherlock Holmes",
address = "sherlock@example.com",
},
[2] =
{
['the-scooby-gang'] =
{
[1] =
{
name = "Fred",
address = "fred@example.net",
},
[2] =
{
name = "Daphne",
address = "daphne@example.net",
},
[3] =
{
name = "Velma",
address = "velma@example.net",
},
[4] =
{
name = "Shaggy",
address = "shaggy@example.net",
},
[5] =
{
name = "Scobby-Doo",
address = "scooby@example.net",
},
},
},
[3] =
{
name = "The Batman",
address = "batman@example.org",
},
},
from =
{
[1] =
{
name = "Sean Conner",
address = "sean@conman.org",
},
},
date =
{
min = 0.000000,
zone = -18000.000000,
day = 28.000000,
month = 12.000000,
year = 2012.000000,
sec = 1.000000,
hour = 1.000000,
weekday = "Fri",
},
return_path =
{
[1] =
{
address = "sean@conman.org",
},
},
message_id = "1234.5678.90abcd@conman.org",
subject = "I know who did it!",
}
The only fields not supported are the Resent-* fields; they are
rarely used and the semantics are particularly hard to support via
parsing only. These fields, as well as any other fields not
otherwise understood or parsable will end up on a field called
'generic' with the key being the raw header name.
json
Implements a JSON parser. It requires some additional modules [2]
to run. This will parse a JSON file into a Lua table. The full
grammar is supported, but it expects the input to be valid UTF-8.
A JSON null value will be converted to nil. If you won't want this
behavior, define a global variable called "null" to be the value
you want for a JSON null.
jsons
Another implementation of a JSON parser. This one "streams" the
input, meaning it will handle large JSON files the other one won't,
and is a drop in replacement. You can also pass in a function that
will return more data so you can actually "stream" data into the
parser.
ip
Provides two LPeg patterns, IPv4 and IPv6 which parse and convert
said addresses directly into their network-order binary formats.
ip-text
Provides two LPeg patterns, IPv4 and IPv6 which parse and return said
addresses as text, unlike the ip module above.
ini
Provides a INI file parser that returns a Lua table from a INI
file. A sample INI file such as:
; we allow "default" values
default = ok
[section1]
var1 = foo
var2 = 12,23,34,54,44
VAR3 = "var3=foo",33,44,55
var2 = apple
Var4 = 55
[section2]
# another comment
; and so is this one
var1 = foo bar baz ; this is a comment
var2 = "foo bar baz ; this is not a comment"
[section1]
var4=this is a test
var5= this is also a test
var2 = pear
var3 = 88,99
will result in a Lua table of:
{
section1 =
{
var1 = "foo",
var5 = "this is also a test",
var4 =
{
[1] = "55",
[2] = "this is a test",
},
var3 =
{
[1] = "var3=foo",
[2] = "33",
[3] = "44",
[4] = "55",
[5] = "88",
[6] = "99",
},
var2 =
{
[1] = "12",
[2] = "23",
[3] = "34",
[4] = "54",
[5] = "44",
[6] = "apple",
[7] = "pear",
},
},
default = "ok",
section2 =
{
var1 = "foo bar baz ",
var2 = "foo bar baz ; this is not a comment",
},
}
strftime
Parses the format string for strftime() (or os.date() for Lua) and
returns an LPeg expression that can parse that format, with the
exceptions of "%c", "%x" and "%X" (all system specific formats).
For example, the format, "%A, %d %B %Y @ %H:%M:%S" will return the
LPeg expression to parse this:
Monday, 02 July 2018 @ 16:02:48
into
{
min = 2.000000,
wday = 2.000000,
day = 2.000000,
month = 7.000000,
sec = 48.000000,
hour = 16.000000,
year = 2018.000000,
}
This will even work with other locales, such as "se_NO.UTF-8",
which will allow LPeg to parse:
vuossárga, 02 suoidnemánu 2018 @ 16:02:48
into
{
min = 2,000000,
wday = 2,000000,
day = 2,000000,
month = 7,000000,
sec = 48,000000,
hour = 16,000000,
year = 2018,000000,
}
url
Parses URLs per RFC-3986. By default, it will handle the following
URL types:
http:
https:
file:
ftp:
Given the following URL:
http://www.conman.org/people/spc/index.cgi?one=1%3F&two=2&three=3#target1
It will be broken down into a Lua table as follows:
{
scheme = "http",
host = "www.conman.org",
port = 80,
path = "/people/spc/index.cgi",
query = "one=1%3F&two=2&three=3",
fragment = "target1",
}
Other URLs can be parsed, but a URL like:
mailto:sean@conman.org?cc=fred@example.com,velma@example.net&subject=Current%20Mystery
will be broken down as:
{
scheme = "mailto",
path = "sean@conman.org",
query = "cc=fred@example.com,velma@example.net&subject=Current%20Mystery",
}
which may require more parsing than provided here.
url.gopher
Parses "gopher:" URLs per RFC-4266. Given this URL:
gopher://gopher.conman.org/0foobar%09search%20String%09plus
it will be broken down as:
{
scheme = "gopher",
host = "gopher.conman.org",
port = 70.000000,
type = "file",
selector = "foobar",
search = "search String",
plus = "plus",
}
If you need to parse other URLs in addition to "gopher:" types,
you can do:
gopher = require "org.conman.parsers.url.gopher"
url = require "org.conman.parsers.url"
url = gopher + url
info = url:match(my_url)
url.siptel
Parses "sip:" and "sips:" URIs per RFC-3261.
Parses "tel:" URIs per RFC-3966.
Examples:
sip = require "org.conman.parsers.url.sip"
u = sip:match [[sip:annc@example.com;play=file://fs.example.net//clips/my-intro.dvi;content-type=video/mpeg%3bencode%d3314M-25/625-50]]
results in:
{
host = "example.com",
port = 5060.000000,
user = "annc",
scheme = "sip",
parameters =
{
play = "file://fs.example.net//clips/my-intro.dvi",
["content-type"] = "video/mpeg%3bencode%d3314M-25/625-50",
},
}
and
u = sip:match [[sip:+1-(555)-555-1212;ext=1234@example.net;user=phone]]
results in:
{
host = "example.net",
port = 5060.000000,
user =
{
number = "15555551212",
global = true,
parameters =
{
ext = "1234",
},
},
scheme = "sip",
parameters =
{
user = "phone",
},
}
and
tel = require "org.conman.parsers.url.tel"
u = tel:match "tel:+1-(555)-555-1212;ext=1234"
results in:
{
scheme = "tel",
number = "15555551212",
global = true,
parameters =
{
ext = "1234",
},
}
If you need to parse other URLs in addition to these types,
you can do:
siptel = require "org.conman.parsers.url.sip"
url = require "org.conman.parsers.url"
url = siptel + url
info = url:match(my_url)
soundex.lua
Implements the Soundex algorithm.
[1] http://www.inf.puc-rio.br/~roberto/lpeg/
[2] https://github.com/spc476/lua-conmanorg
About
Parsing common data formats via LPeg
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published