Skip to content

Parse CL syntactic constructs in s-expression form (possibly represented as e.g. CSTs) and construct syntax trees

License

Notifications You must be signed in to change notification settings

scymtym/s-expression-syntax

Repository files navigation

s-expression-syntax README

Introduction

The s-expression-syntax library provides declarative rules for parsing the various kinds of s-expression syntax used in Common Lisp:

  • Special operators
  • Lambda lists
  • Declarations
  • Type specifiers
  • Standard macros
    • ~cl:loop~ (work in progress)
  • Feature expressions
  • Format control strings (work in progress)

Note that this library is not a code walker but it could be used to implement a code walker.

Concepts

Explaining things in more detail requires a little bit of terminology:

Expression
A data structure which conceptually consists of atoms and conses (possibly meant for evaluation in which case the expression is a form). In contrast to the usual definition, expressions processed by this library may be represented as arbitrary objects which are interpreted as atoms and conses according to rules defined by the client using the library. That said, processing expressions using the usual interpretation of atoms and conses is of course supported. Cons expressions can contain other expressions which are sub-expressions of the respective containing expression.
Syntax Description
An object which describes the syntax of one kind of expression. For example, there is a syntax description for the special operator cl:let which describes (the infinite set) of well-formed let expressions. Besides recognizing well-formed expressions of a certain kind, a syntax description also contains descriptions of the parts that can appear within well-formed expressions of that kind. For example, for cl:let, the parts are
  • zero or more bindings, in turn consisting of
    • a variable name
    • an optional initial value form
  • zero or more declarations
  • and zero or more body forms.

More precisely, a syntax description consists of a name, a list of parts and a parser which consumes expressions conforming to the described syntax and produces a parse result tree.

Part
An object which describes a sub-structure of the syntax described by a particular syntax description. For example, the syntax description for cl:let has a part for declaration expressions contained in let expressions.

More precisely, a part consists of a name, a cardinality, which is either one, zero-or-one, one-or-more or zero-or-more, and an evaluation which is roughly “evaluated”, “not evaluated” or “composed of evaluated and non-evaluated parts”.

For example, cl:defun has the following parts

NameCardinalityEvaluation
nameonenot evaluated
lambda listonecomposed of evaluated and non-evaluated parts
documentation stringzero-or-onenot evaluated
declarationzero-or-morenot evaluated
body formszero-or-moreevaluated

The “composed of evaluated and non-evaluated parts” evaluation of the lambda list is due to the fact that lambda lists contain unevaluated variable names and lambda list keywords as well as, possibly, evaluated initialization forms for optional and keyword parameters and &aux variables.

Result Tree
A tree that is result of parsing an input expression. Each node in the tree corresponds to a sub-expression of the input expression (multiple nodes may correspond to the same sub-expression). A node is characterized by its kind, its initargs and its relations to other nodes.

The result tree could possibly be called an Abstract Syntax Tree (AST), but we avoid that term because the result directly reflects the structure of the input expression.

We may call a result tree a partial result tree if it contains unparsed sub-expressions of the input expression in some of its leaf nodes.

Tutorial

The most common use of this library probably is turning a given expression into an AST. This process happens in multiple steps

  1. Determine an appropriate syntax description for parsing the expression. For example, the expression (locally (declare …) 1 (+ a b) 3) must be parsed using the syntax description for the special operator cl:locally.
  2. Apply the obtained syntax description in conjunction with a parse result builder to obtain a partial (see 3.) AST for the expression.
  3. Optionally parse evaluated sub-expressions recursively. In the above example (declare …) is a sub-expression that is not evaluated while 1, (+ a b) and 3 are sub-expressions that are evaluated. The latter are not automatically parsed and thus must be recursively processed in the way described here in order to obtain a fully parsed AST. A complete AST can generally only be produced by consulting an environment as well as interleaving parsing with macroexpansion and is therefore out of scope for this library.

The following code performs steps 1. and 2. and prints the resulting (partially parsed) AST in a human-readable form. Note how the list builder of the architecture.builder-protocol system is passed in the parse call and later used to destructure the AST node node by calling the functions node-relations and node-relation.

(let* ((expression '(defun foo (a &optional (b 2))
                      (declare (type integer a))
                      (declare (type integer b))
                      (format t "~S" a)
                      (list a b)))
       (syntax     (s-expression-syntax:find-syntax 'defun))
       ;; Alternatively, determine the appropriate syntax description
       ;; for EXPRESSION automatically:
       ;; (syntax     (s-expression-syntax:classify t expression))
       (builder    'list)
       (node       (s-expression-syntax:parse builder syntax expression)))
  (flet ((describe-sub-expression (sub-expression relation-args)
           (format t "~2@T-> ~S~%~
                      ~2@T   evaluation: ~S~%"
                   sub-expression (getf relation-args :evaluation))))
   (loop :for relation    :in (architecture.builder-protocol:node-relations builder node)
         :for part-name   = (find-symbol (symbol-name (first relation))
                                         (find-package "S-EXPRESSION-SYNTAX"))
         :for part        = (s-expression-syntax:find-part part-name syntax)
         :for cardinality = (s-expression-syntax:cardinality part)
         :for (sub-expression evaluation)
            = (multiple-value-list (architecture.builder-protocol:node-relation
                                    builder relation node))
         :do  (format t "~A (~A)~%" part-name cardinality)
              (ecase (s-expression-syntax:cardinality part)
                ((1) (describe-sub-expression sub-expression evaluation))
                ((*) (mapc #'describe-sub-expression sub-expression evaluation))))))

Evaluating the code results in the following output which illustrates the four parts of the defun expression: name, lambda-list, declaration and form. The latter two have a cardinality of *, so multiple child nodes may be related to the parent node through the relation in question. In this example, both relations contain two child nodes: two declarations and two body forms.

NAME (1)
  -> (:FUNCTION-NAME NIL :NAME FOO :SOURCE FOO)
     evaluation: NIL
LAMBDA-LIST (1)
  -> (:ORDINARY-LAMBDA-LIST
      ((:REQUIRED . *)
       (((:REQUIRED-PARAMETER
          ((:NAME . 1)
           (((:VARIABLE-NAME NIL :NAME A :SOURCE A) :EVALUATION NIL)))
          :SOURCE A)))
       (:OPTIONAL . *)
       (((:OPTIONAL-PARAMETER
          ((:NAME . 1) (((:VARIABLE-NAME NIL :NAME B :SOURCE B)))
           (:DEFAULT . 1)
           (((:UNPARSED NIL :EXPRESSION 2 :CONTEXT :FORM :SOURCE 2) :EVALUATION
             T)))
          :SOURCE (B 2))
         :EVALUATION :COMPOUND)))
      :SOURCE (A &OPTIONAL (B 2)))
     evaluation: :COMPOUND
DECLARATION (*)
  -> (:DECLARATION
      ((:ARGUMENT . *)
       (((:ATOMIC-TYPE-SPECIFIER
          ((:NAME . 1) (((:TYPE-NAME NIL :NAME INTEGER :SOURCE INTEGER))))
          :SOURCE INTEGER))
        ((:VARIABLE-NAME NIL :NAME A :SOURCE A))))
      :KIND TYPE :SOURCE (TYPE INTEGER A))
     evaluation: NIL
  -> (:DECLARATION
      ((:ARGUMENT . *)
       (((:ATOMIC-TYPE-SPECIFIER
          ((:NAME . 1) (((:TYPE-NAME NIL :NAME INTEGER :SOURCE INTEGER))))
          :SOURCE INTEGER))
        ((:VARIABLE-NAME NIL :NAME B :SOURCE B))))
      :KIND TYPE :SOURCE (TYPE INTEGER B))
     evaluation: NIL
FORM (*)
  -> (:UNPARSED NIL :EXPRESSION (FORMAT T "~S" A) :CONTEXT :FORM :SOURCE
      (FORMAT T "~S" A))
     evaluation: T
  -> (:UNPARSED NIL :EXPRESSION (LIST A B) :CONTEXT :FORM :SOURCE (LIST A B))
     evaluation: T

We can also focus on the overall tree structure and print the (partially parsed) AST as a tree. The following code again uses the architecture.builder-protocol system to destructure the AST, this time as part of a generic tree printer.

(let* ((expression '(defun foo (a &optional (b 2))
                      (declare (type integer a))
                      (declare (type integer b))
                      (format t "~S" a)
                      (list a b)))
       (builder    'list)
       (node       (s-expression-syntax:parse builder t expression)))
  (let ((*print-case* :downcase))
    (architecture.builder-protocol.print-tree:print-tree
     builder node *standard-output*)))

Note the unparsed leafs indicated by the unparsed node kind.

defun
│ source: (defun foo (a &optional (b 2))
│           (declare (type integer a))
│           (declare (type integer b))
│           (format t "~S" a)
│           (list a b))
├─name: function-name
│   name: foo
│   source: foo
├─lambda-list: ordinary-lambda-list
│ │ source: (a &optional (b 2))
│ ├─required: required-parameter
│ │ │ source: a
│ │ └─name: variable-name
│ │     name: a
│ │     source: a
│ └─optional: optional-parameter
│   │ source: (b 2)
│   ├─name: variable-name
│   │   name: b
│   │   source: b
│   └─default: unparsed
│       expression: 2
│       context: :form
│       source: 2
├─declaration: declaration
│ │ kind: type
│ │ source: (type integer a)
│ ├─argument: atomic-type-specifier
│ │ │ source: integer
│ │ └─name: type-name
│ │     name: integer
│ │     source: integer
│ └─argument: variable-name
│     name: a
│     source: a
├─declaration: declaration
│ │ kind: type
│ │ source: (type integer b)
│ ├─argument: atomic-type-specifier
│ │ │ source: integer
│ │ └─name: type-name
│ │     name: integer
│ │     source: integer
│ └─argument: variable-name
│     name: b
│     source: b
├─form: unparsed
│   expression: (format t "~S" a)
│   context: :form
│   source: (format t "~S" a)
└─form: unparsed
    expression: (list a b)
    context: :form
    source: (list a b)

Dictionary

Syntax Description Protocol

find-syntax NAME &KEY IF-DOES-NOT-EXIST

Return the syntax description named NAME, if any.

IF-DOES-NOT-EXIST controls the behavior in case a syntax description
named NAME does not exist. The following values are allowed:

#'ERROR

Signal an error if a syntax description named NAME does not exist.

OBJECT

Return OBJECT if a syntax description named NAME does not exist.
(setf find-syntax) NEW-VALUE NAME &KEY IF-DOES-NOT-EXIST

Set the syntax description associated with NAME to NEW-VALUE.

An existing association for NAME, if any, is replaced.

IF-DOES-NOT-EXISTS is accepted for parity with FIND-SYNTAX but
ignored.
ensure-syntax NAME CLASS &REST INITARGS

Associate NAME with a syntax description based on CLASS and INITARGS.

Return the new or updated syntax description object associated with
NAME.

If the database of syntax descriptions already contains a syntax
description for NAME, the existing syntax description object is
reinitialized with INITARGS.

If the database of syntax descriptions does not contain a syntax
description for NAME, a new association is created by making an
instance of CLASS, initializing it with INITARGS and registering the
new object for NAME.

Parser Protocol

classify CLIENT EXPRESSION

Classify EXPRESSION, possibly according to specialized behavior of CLIENT.

Return a syntax description object that roughly reflects the kind of
EXPRESSION. Note that a precise classification would have to take into
account aspects beyond the syntax, such as the environment, to, for
example, distinguish function and macro application or variable
references and symbol macro applications. It is always possible to
find an appropriate syntax description:

+ If EXPRESSION is a special form, this function returns the syntax
  description for the corresponding special operator.

+ If EXPRESSION is an application of a standard macro, this function
  returns the syntax description for that macro.

+ If EXPRESSION a list not covered by the above cases, this function
  returns the syntax description for a generic (that is, function or
  macro) application. Note that this case also covers invalid
  applications such as (1 2 3).

+ If EXPRESSION is a symbol but not a keyword, this function returns a
  syntax description for a variable reference.

+ If EXPRESSION is any object that is not covered by the above cases,
  this function returns a syntax description for a self-evaluating
  object.
parse CLIENT SYNTAX EXPRESSION

Parse EXPRESSION according to SYNTAX, possibly specialized to CLIENT.

SYNTAX is a designator for a syntax description:

+ If SYNTAX is `t', `classify' is applied to CLIENT and EXPRESSION to
  obtain an appropriate syntax description object.

+ If SYNTAX is any other symbol, `find-syntax' is called to obtain the
  syntax description named by SYNTAX. An error is signaled if SYNTAX
  does not name a syntax description.

+ Otherwise SYNTAX must be a syntax description object.

EXPRESSION is either one of the kinds of expressions that make up
Common Lisp programs (such as forms, type specifiers and declarations)
or a particular non-standard representation of such expressions which
is specific to CLIENT. For example, a client may choose to represent
every sub-expression contained in an expression as a standard object
in order to store additional information. If CLIENT employs such a
non-standard representation, the protocol named by symbols exported
from the `s-expression-syntax.expression-grammar' package has to be
implemented by defining appropriate methods.

If EXPRESSION does not conform to the syntax described by SYNTAX, an
error of type `invalid-syntax-error' is signaled.

If EXPRESSION does conform to the syntax described by SYNTAX, a parse
result that associates the parts of SYNTAX with sub-expressions of
EXPRESSION is returned. The type and structure of the return value
depends on CLIENT as the parse result is constructed using the builder
protocol with CLIENT as the builder.

About

Parse CL syntactic constructs in s-expression form (possibly represented as e.g. CSTs) and construct syntax trees

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published