GitHub - mrlamm/penman: PENMAN notation (e.g. AMR) in Python

This module models graphs encoded in the PENMAN notation (e.g., AMR). It may be used as a Python library or as a script. It does not include any of the concept inventory or text-generation capabilities of the PENMAN project.

Features

Serialization between graphs and either PENMAN notation or triple conjunctions is provided by the PENMANCodec class's encode(), decode(), and iterdecode() methods. Module-level functions provide a convenient interface to this class:

encode(g) - serialize graph g and return the string
decode(s) - deserialize s and return the graph
load(f) - return all graphs in file f
loads(s) - return all graphs in string s
dump(gs, f) - serialize all graphs in gs and write to file f
dumps(gs) - serialize all graphs in gs and return the string

Passing triples=True to the above functions does (de)serialization to/from conjunctions of triples. The indent parameter of encode(), dump(), and dumps() changes how PENMAN-serialized graphs are indented (by default, they are adaptively indented to line up with their containing node). Deserialized Graph objects may be inspected and queried for their variables (nonterminal node identifiers), triples, etc. For more information, please consult the documentation, and see the example below.
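To make "conjunction of triples" concrete, here is a minimal standalone sketch that does not use the library. It assumes a surface form of relation(source, target) terms joined by ^; this README does not show the codec's actual output, so treat that format and the helper name format_triples as illustrative assumptions.

```python
def format_triples(triples):
    """Join (source, relation, target) triples into one conjunction string.

    The 'relation(source, target) ^' layout is an assumption made for this
    sketch, not a statement of what the codec emits.
    """
    return ' ^\n'.join(
        '{}({}, {})'.format(relation, source, target)
        for source, relation, target in triples
    )


# Triples for the graph (b / bark :ARG0 (d / dog))
example = [
    ('b', 'instance', 'bark'),
    ('d', 'instance', 'dog'),
    ('b', 'ARG0', 'd'),
]
print(format_triples(example))
```

Each term carries the same three fields as a Triple, so no information is lost relative to the nested PENMAN form apart from the choice of top node and the layout.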

Library Usage

>>> import penman
>>> g = penman.decode('(b / bark :ARG0 (d / dog))')
>>> g.triples()
[Triple(source='b', relation='instance', target='bark'), Triple(source='d', relation='instance', target='dog'), Triple(source='b', relation='ARG0', target='d')]
>>> print(penman.encode(g))
(b / bark
   :ARG0 (d / dog))
>>> print(penman.encode(g, top='d', indent=6))
(d / dog
      :ARG0-of (b / bark))
>>> print(penman.encode(g, indent=False))
(b / bark :ARG0 (d / dog))
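In the top='d' example above, the edge is printed as :ARG0-of because it is traversed from target to source. A naive standalone sketch of that inversion follows; the helper names invert and reorient are hypothetical and this is not the library's implementation.

```python
def invert(relation):
    """Toggle a trailing '-of' on a relation name.

    Naive on purpose: some AMR relation names end in '-of' without being
    inverses, and a real codec would need to special-case them.
    """
    if relation.endswith('-of'):
        return relation[:-len('-of')]
    return relation + '-of'


def reorient(triple):
    """Return the triple as seen from its target node."""
    source, relation, target = triple
    return (target, invert(relation), source)


print(reorient(('b', 'ARG0', 'd')))
```

Applying reorient twice returns the original triple, which is why the same graph can be serialized from either top without changing its set of triples.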

Script Usage

$ python penman.py --help
Penman

An API and utility for working with graphs in PENMAN notation.

Usage: penman.py [-h|--help] [-V|--version] [options]

Options:
  -h, --help                display this help and exit
  -V, --version             display the version and exit
  -i FILE, --input FILE     read graphs from FILE instead of stdin
  -o FILE, --output FILE    write output to FILE instead of stdout
  -t, --triples             print graphs as triple conjunctions
  --indent N                indent N spaces per level ("no" for no newlines)
  --amr                     use AMR codec instead of generic PENMAN one

$ python penman.py <<< "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go :ARG0 b))"
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go
            :ARG0 b))

Requirements

Python 2.7 or 3.3+
docopt (for script usage)

PENMAN Notation

The PENMAN project was a large effort at natural language generation, and what I'm calling "PENMAN notation" is more accurately "Sentence Plan Language" (SPL; [Kasper 1989]), but I'll stick with "PENMAN notation" because it may be a more familiar name to modern users and it also sounds less specific to sentence representations, e.g., in case someone wants to use the format to encode arbitrary graphs.

This module expands the notation slightly to allow for untyped nodes (e.g., (x)) and anonymous relations (e.g., (x : (y))). It is also very permissive for the form of node identifiers (and other atoms). A PEG* definition for the notation is given below (for simplicity, whitespace is not explicitly included; assume all nonterminals can be surrounded by /\s+/):

Start    <- Node
Node     <- '(' NodeData ')'
NodeData <- Variable ('/' NodeType)? Edge*
NodeType <- Atom
Variable <- Atom
Edge     <- Relation Value
Relation <- /:[^\s(),]*/
Value    <- Node | Atom
Atom     <- String | Float | Integer | Symbol
String   <- /"[^"\\]*(?:\\.[^"\\]*)*"/
Float    <- /[-+]?(((\d+\.\d*|\.\d+)([eE][-+]?\d+)?)|\d+[eE][-+]?\d+)/
Integer  <- /[-+]?\d+/
Symbol   <- /[^\s()\/,]+/

* Note: I use | above for ordered-choice instead of / so that / can be used to surround regular expressions.
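The ordered choice in the Atom rule can be tried out with a short standalone sketch. The regular expressions are copied from the grammar above; the function name classify_atom is hypothetical, and the sketch assumes the token has already been separated from its surroundings, which is why it matches the whole token rather than a prefix. It uses re.fullmatch, so it needs Python 3.4 or later.

```python
import re

# Patterns copied from the PEG above, in the order given by the Atom rule.
ATOM_PATTERNS = [
    ('String', re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')),
    ('Float', re.compile(
        r'[-+]?(((\d+\.\d*|\.\d+)([eE][-+]?\d+)?)|\d+[eE][-+]?\d+)')),
    ('Integer', re.compile(r'[-+]?\d+')),
    ('Symbol', re.compile(r'[^\s()\/,]+')),
]


def classify_atom(token):
    """Return the first Atom alternative that matches the whole token.

    Returns None when no alternative matches (e.g., the token contains '/').
    """
    for name, pattern in ATOM_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return None


for token in ['"a string"', '-1.5e3', '42', 'want-01', 'a/b']:
    print(token, classify_atom(token))
```

The order matters: every Integer and Float would also match Symbol, so Symbol has to come last for numeric tokens to be recognized as numbers.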

A more restricted variant of the grammar for AMR might make the ('/' NodeType) group required, and NodeTypes (maybe renamed "Concepts") could be given as a disjunction of allowed names. Similarly, Relations could be a disjunction of allowed names and possible inversions, or otherwise require at least one character after :. It might also restrict Variables to a form like /[a-z]+\d*/ and also restrict Atom values in some way. The included AMRCodec employs most of these restrictions and raises DecodeErrors for graphs it deems invalid. See also Nathan Schneider's PEG for AMR.
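Two of the restrictions suggested above can be sketched as standalone checks. The patterns are hypothetical readings of that paragraph (variables of the form /[a-z]+\d*/, relations with at least one character after the colon); they are not taken from the included AMRCodec, and the function names are made up for illustration. re.fullmatch requires Python 3.4 or later.

```python
import re

# Hypothetical restricted patterns; not the ones AMRCodec uses.
VARIABLE_RE = re.compile(r'[a-z]+\d*')
# Same character class as Relation in the PEG, but '+' instead of '*'.
RELATION_RE = re.compile(r':[^\s(),]+')


def is_amr_variable(token):
    """True if the token is lowercase letters followed by optional digits."""
    return VARIABLE_RE.fullmatch(token) is not None


def is_nonempty_relation(token):
    """True if the token is ':' followed by at least one allowed character."""
    return RELATION_RE.fullmatch(token) is not None


print(is_amr_variable('b2'), is_amr_variable('2b'))
print(is_nonempty_relation(':ARG0'), is_nonempty_relation(':'))
```

With the second check in place, the anonymous relation allowed by the generic grammar, as in (x : (y)), would be rejected.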

Disclaimer

This project is not affiliated with ISI, the PENMAN project, or the AMR project.
