pcase-powered position extractor and Lisp highlighter

gatchapin2/pospcase

Pospcase is a pcase-powered position extractor for Emacs

./code-image.png

About pospcase

pospcase is an Emacs Lisp package for extracting S-expression positions using pcase’s pattern-matching capability.

The package also contains a pretty advanced code highlighter for lisp-mode and emacs-lisp-mode, fully extensible by using pcase patterns.

Introduction

Emacs’s font-locking (highlighting) relies heavily on regular expressions (regexps).

There’s already an attempt at regexp-based extra highlighting for Lisp and Emacs Lisp (lisp-extra-font-lock).

When you read its code, say lisp-extra-font-lock-match-let, an anchored matcher for the keyword let (BTW, I use the word submatcher in this package, but I really mean anchored matcher), you’ll notice that parsing S-expressions with regexps is inherently awkward and error-prone (you’ll also notice that comments in between are not supported).

To reliably parse S-expressions with regexps, you basically have to call syntax-ppss and forward-comment everywhere in the pattern-matching part of your code. The resulting code is a quite unreadable hybrid of regexp matching, constant syntax-table checks, and handling of numerous corner cases.

Here’s an example of regexp-based S-expression parsing:

;;; match and collect all symbol positions within the same nested level
(let ((current-level (car (syntax-ppss)))
      (symbol-pat "\\_<\\(?:\\sw\\|\\s_\\)+\\_>")
      (skip-pat "\\(?:\\sw\\|\\s_\\|\\s \\)"))

  (if (zerop current-level)
      (error "The point is in the top level.")
    (backward-up-list)
    ;; skip the starting non-symbol chars
    (skip-chars-forward skip-pat)

    (cl-loop
     with result

     ;; return if the task ended or the `point' is out of scope
     when (or (eobp) (< (car (syntax-ppss)) current-level)) return result

     ;; actual matching
     do (progn
          (forward-comment most-positive-fixnum)
          (looking-at symbol-pat))

     ;; collect matched symbol's range in a cons pair
     if (= (car (syntax-ppss)) current-level)
     do (setq result (cons (cons (match-beginning 0) (match-end 0)) result))

     ;; skip uninteresting characters
     do (progn
          (goto-char (match-end 0))
          (skip-chars-forward skip-pat)))))

Next, code with a nearly identical result, but using pospcase-read, the reader function from this package:

;;; collect all symbol positions within the same nested level
(if (zerop (car (syntax-ppss)))
    (error "The point is in the top level.")
  (backward-up-list)

  (cl-loop for pair in (car (pospcase-read (point))) ; `car' because `cdr' contains
                                                     ; position data of entire list
           if (symbolp (car pair)) collect (cdr pair)))

Of course, I much prefer the latter code.

In addition, Emacs recently got a nice pattern-matching macro, pcase. Nowadays you can easily match and destructure any S-expression whenever you want.
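
For readers who haven’t used pcase yet, here is a minimal sketch of matching and destructuring with it. The function name my-describe-defun is made up for this illustration and is not part of pospcase.

```elisp
(require 'pcase)

(defun my-describe-defun (form)
  "Return (NAME ARGS) if FORM looks like (defun NAME ARGS . BODY), else nil.
Hypothetical helper, for illustration only."
  (pcase form
    ;; The backquote pattern binds NAME and ARGS; `_' ignores the body.
    (`(defun ,name ,args . ,_)
     (list name args))))
```

Calling it on a quoted defun form gives back the name and the argument list; any other form falls through all clauses and returns nil.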

So I decided it’s a good time to write a pcase-based position extractor and use it for a Lisp code highlighter.

How to Enable Code Highlighting

To activate pospcase-font-lock for lisp-mode and emacs-lisp-mode, add the following code to your init file:

(add-hook 'after-init-hook
          (lambda ()
            (add-to-list 'load-path
                         "/path/to/pospcase") ; change it appropriately
            (require 'pospcase-font-lock)
            (pospcase-font-lock-lisp-setup)))

The above code adds pospcase-font-lock-lisp-keywords-add to the hooks of the two Lisp major modes, so highlighting is activated in every future Emacs session. If you want to activate it right now, type M-x load-library RET pospcase-font-lock, then M-x pospcase-font-lock-lisp-setup, and lastly reactivate the major mode of each buffer you want highlighted, for example M-x emacs-lisp-mode if the current buffer is Emacs Lisp code.

Side note:

Currently pospcase-font-lock doesn’t have a minor mode, unlike lisp-extra-font-lock. The reason is that I designed the font-lock-specific functions so that I can easily make buffer-local keywords. That way I can work on different codebases simultaneously in a single Emacs session, with different highlighting rules for the same keywords in each codebase.

It may never have happened to you, but I’ve actually seen two codebases each define defun* with completely different meanings and different argument styles (one of them enforces type checks with -> (dash-greater-than) as an infix).

If you want to enforce the same highlighting rules across a single major mode, a minor mode is the best way to do it. But if your workflow involves numerous per-project highlighting rules, I feel there’s not much point in using a minor mode for it.

But by no means do I oppose writing your own minor mode if you find it useful. You can write one simply by using pospcase-font-lock-lisp-keywords-add and pospcase-font-lock-lisp-keywords-remove to toggle the keywords.
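
A rough sketch of such a minor mode might look like the following. It rests on assumptions: that pospcase-font-lock is already loaded, and that both functions can be called with no arguments in the current buffer (as they are when run from a mode hook). The mode name my-pospcase-highlight-mode is invented here.

```elisp
;; Sketch only: assumes pospcase-font-lock is loaded and that both
;; keyword functions take no arguments and act on the current buffer.
(define-minor-mode my-pospcase-highlight-mode
  "Toggle pospcase highlighting in the current buffer (hypothetical mode)."
  :lighter " Posp"
  (if my-pospcase-highlight-mode
      (pospcase-font-lock-lisp-keywords-add)
    (pospcase-font-lock-lisp-keywords-remove))
  ;; Ask font-lock to refontify the buffer with the updated keywords.
  (font-lock-flush))
```

If the two functions turn out to need arguments, adjust the calls accordingly.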

The First Example

pospcase was written because I wanted an easier way to add new highlighting rules, to fully embrace Lisp’s ultimate flexibility. The function pospcase-font-lock was written specifically for that purpose.

Suppose I want to add a highlighting rule for a new Common Lisp macro, mydefun. I can simply write:

(pospcase-font-lock
 'lisp-mode                             ; major-mode name

 '(`(mydefun ,name ,args . ,_))         ; `pcase' pattern to match

 ;; font specs
 '((heading-keyword .
                    (font-lock-keyword-face)) ; face of `mydefun' keyword

   (name
    . (font-lock-function-name-face))   ; face of new function `name'

   ((args . list/1)                     ; `args' is arbitrary length
                                        ; list of symbols and lists.

    . (font-lock-variable-name-face)))) ; face of every argument

Hopefully it’s straightforward enough. The most foreign part is list/1. To understand what it is, you have to understand submatchers, which I’ll explain in more detail later.

The symbol heading-keyword appeared out of nowhere. That’s because internally this pattern is translated to:

`((and 'mydefun heading-keyword) ,name ,args . ,_)

and heading-keyword is bound automatically. The pattern recurs so often (and is arguably less readable) that I took the liberty of making it a shortcut notation.
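
To see what that binding does, here is a small plain-pcase sketch, independent of pospcase. Note that in plain pcase the and pattern has to be unquoted with a comma inside the backquote; the function name my-match-mydefun is made up for the example.

```elisp
(require 'pcase)

(defun my-match-mydefun (form)
  "Return (HEADING-KEYWORD NAME ARGS) if FORM is a mydefun form, else nil.
Hypothetical helper, for illustration only."
  (pcase form
    ;; `(and 'mydefun heading-keyword)' matches the literal symbol
    ;; `mydefun' and binds it to the variable `heading-keyword'.
    (`(,(and 'mydefun heading-keyword) ,name ,args . ,_)
     (list heading-keyword name args))))
```

The point is that the matched keyword itself becomes available under a name, just like name and args, so a face can be attached to it.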

An Example Work Flow

Manual Editing

We are going to use $HOME/.emacs.d/pospcase-user.el to store custom highlighting rules. (If you don’t like it, change the following code appropriately.)

In pospcase-user.el, insert the following code:

;; -*- lexical-binding: t -*-
(require 'pospcase)

(let ((user-file load-file-name))
  (defun my-add-new-font-lock-keyword ()
    (interactive)
    (let* ((str
            (format "
(pospcase-font-lock
 '%s
 '(`(foo ,bar ,baz . ,_))
 '((heading-keyword . (font-lock-keyword-face))
   (bar . (font-lock-function-name-face))
   ((baz . list/1) . (font-lock-variable-name-face))))"
                    major-mode)))
      (find-file user-file)
      (goto-char (point-max))
      (insert str "\n")
      (backward-char (- (1+ (length str)) (string-match "foo" str))))))

From now on, whenever you encounter a new keyword that needs extra highlighting for maximum readability, you can just M-x my-add-new-font-lock-keyword and start writing a new keyword right away from the convenient cookie cutter (you can also write a snippet for Yasnippet if that’s your style).

If you are satisfied with the new keyword, save the buffer, evaluate the code with C-M-x, C-x C-e or whatever command you use, then go back to your project and reactivate the major mode, for example M-x lisp-mode for a Common Lisp buffer. Now you will see that the new font-lock rule is applied and the code is highlighted accordingly.

Buffer-Local Keywords

Lisp’s flexibility sometimes causes the unfortunate accident of two people choosing the exact same keyword for completely different purposes in their own codebases. Two different definitions mean two different highlighting rules. You need buffer-local keyword rules for them.

For example, the ASDF package system for Common Lisp defines defun* and uses it internally. To highlight the keyword, wrap your pospcase-font-lock statement like this:

(add-hook
 'lisp-mode-hook
 (lambda ()
   (when (and (buffer-file-name)
              (equal (file-name-nondirectory (buffer-file-name)) "asdf.lisp"))
 
     (pospcase-font-lock 'lisp-mode
                         '(`(defun* ,name ,args . ,_)
                           `(defgeneric* ,name ,args . ,_))
                         '((heading-keyword . (font-lock-keyword-face))
                           (name .
                                 (font-lock-function-name-face))
                           ((args . list/1)
                            .
                            ((pospcase-font-lock-variable-face-form
                              (match-string 3)))))
                         t))))          ; buffer-local-p

Writing a predicate to detect which codebase a file belongs to is sometimes tricky, and I can’t provide a universal solution to the problem. So be creative and invent your own method of codebase detection if your use case is more complex than the one above.
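
As one possible starting point, here is a sketch of a predicate that checks whether a file lives below a directory containing some marker file, using the built-in locate-dominating-file. The function name my-file-in-project-p and the idea of using a marker file are my assumptions, not something pospcase provides.

```elisp
(defun my-file-in-project-p (marker &optional file)
  "Return non-nil if FILE is below a directory that contains MARKER.
FILE defaults to the file visited by the current buffer.
Hypothetical helper, for illustration only."
  (let ((file (or file (buffer-file-name))))
    (and file
         ;; Walk up the directory tree looking for MARKER.
         (locate-dominating-file file marker)
         t)))
```

Inside a lisp-mode-hook lambda, a call such as (my-file-in-project-p "asdf.asd") could then replace the file-name comparison shown above.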

Add-Form

Everything I did in the previous section can be achieved without touching Emacs Lisp code, by using the command pospcase-addform. It doesn’t completely free you from tweaking code, but at least it gives you fancy forms to fill in.

To use the command, run M-x load-library RET pospcase-addform or evaluate the following code:

(require 'pospcase-addform)

Then M-x pospcase-addform.

./addform-image.png

If you read the previous section, you can surely guess what each field of the form represents. Add, delete or edit them to make your own highlighting rule.

Once done, click the Accept button. It automatically generates, in the user config file, code identical to what you wrote in the previous section. If you see no problem with the code, evaluate it, then reactivate the major mode to apply the new highlighting rule.

I like writing Lisp code. Manually writing font-lock rules in my config file has never bothered me. But sometimes tabbing through and tinkering with the form entries feels good. So if you like this style of editing, pospcase-addform is here for you.

A minor inconvenience is that you can’t comment your code within the forms. You have to add comments manually to the user config file afterwards.

Before Writing Your Own Font-Lock Keywords

Unfortunately, the current pospcase-font-lock design doesn’t let you simply write a pcase pattern and have Emacs instantly highlight the matching code. There are a few things you need to know before writing your own font-lock keywords.

This is largely due to my design decision to keep the implementation as straightforward as possible, even at the expense of introducing a new concept and shoving it in the user’s face.

So please bear with me and read the following subsections.

Submatchers

The macro pcase, which pospcase depends on heavily, is not particularly designed for pattern-matching S-expressions of arbitrary length. But that is exactly the feature we want. The obvious example is the argument list of a function declaration. To overcome the limitation, you have to choose an appropriate submatcher for each arbitrary-length list. So far, the following submatchers are implemented:

  • list/2
  • list/1
  • flet
  • destructuring
  • macrolet
  • defstruct (list/1)
  • parameter
  • loop (destructuring variant)
  • loop-and
  • setq

If you are a skilled programmer, maybe you can just skim the actual keyword declarations in pospcase-font-lock-lisp-setup and build your own keywords without any prior information. But I’m going to explain each of them below.

list/2

If you pair a pcase pattern variable with list/2 in the specs of pospcase-font-lock like this:

(args . list/2)

It means args is an arbitrary-length list whose elements are either symbols or lists of exactly two elements, like the argument list of defmethod:

(defmethod foo
  ((bar class1) (baz class2) qux quux)  ; <- this list
  body)

Yes, the strange /2 (slash two) at the end of the name is there to indicate that list/2 matches lists of length two (the notation is stolen from function arity notation).

list/1

This submatcher is almost the same as list/2 (the implementation is almost identical too). But unlike list/2, it matches the first element of a list of any length, like:

(foo (bar) (baz qux) (quux meow woof) (oink quak quaak quaaak))
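
To make the selection rule concrete, here is a sketch that mimics it on an already-read list instead of buffer positions. It only illustrates which symbols I understand list/1 to pick; it is not the real submatcher, and the name my-list/1-names is invented.

```elisp
(defun my-list/1-names (list)
  "Return the symbols a list/1-style rule would pick from LIST.
Plain symbols are kept; for sublists only the first element is kept.
Hypothetical helper working on S-expressions, not on buffer positions."
  (delq nil
        (mapcar (lambda (elt)
                  (cond ((symbolp elt) elt)
                        ((and (consp elt) (symbolp (car elt)))
                         (car elt))))
                list)))
```

Applied to the list above, it keeps foo, bar, baz, quux and oink and drops everything else.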

What happens when you want to match lists of exactly one element? Well, I haven’t yet encountered that use case, so I narrow-mindedly named it list/1 for ease of typing and readability. Should I rename it to something like list/1* (asterisk for arbitrary length)? Let me know how you feel.

flet

This submatcher, as the name indicates, is designed for matching the function list of flet.

It matches the car and cadr of each list. The cadr part matches an arbitrary-length list. When an element of that list is itself a list, only its first element is matched, as with list/1. Like:

((foo (bar) body)
 (baz (qux quux) body)
 (meow (woof (oink quak)) body))

destructuring

This submatcher matches every symbol of a list, no matter how deeply nested they are. Like:

(foo (bar (baz) ((qux))) (((quux))))

Of course, it’s used for matching destructuring-bind, defmacro, etc.
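
The rule can be sketched as a recursive walk over an already-read tree. Again, this only illustrates which symbols get picked; the real submatcher collects buffer positions, and my-destructuring-names is a made-up name.

```elisp
(defun my-destructuring-names (tree)
  "Return every non-nil symbol in TREE, depth-first, left to right.
Hypothetical helper working on S-expressions, not on buffer positions."
  (cond ((null tree) nil)
        ((symbolp tree) (list tree))
        ((consp tree)
         ;; Descend into both halves of the cons cell.
         (append (my-destructuring-names (car tree))
                 (my-destructuring-names (cdr tree))))))
```

For the nested list above, every symbol is collected regardless of how many parentheses surround it.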

macrolet

Similar to flet, but the cadr part is handled the same way as destructuring.

defstruct (list/1)

I was surprised when I realized I had to implement a submatcher specifically for defstruct. The uniqueness comes from its layout: the defstruct keyword, the structure name and a docstring are placed before the slots of the defined structure. As you know, the docstring is optional, so the submatcher has to check manually whether the current form has a docstring or not, then set a fence to exclude the unnecessary leading S-expressions from the matches.

Hopefully you won’t need to touch this submatcher for your highlighting needs, and won’t have occasion to deal with even stranger syntax in the wild.

parameter

The trickiest syntax is the parameter keywords of an argument list (or lambda list). When they appear in the middle of a list, they change the semantic meaning of what follows, and therefore the highlighting rules. The most notorious example is destructuring-bind:

(let ((meow 1) (woof 2))
  (destructuring-bind (foo (bar
                          &key (baz meow) (quux woof)))
    '(1 (2 :quux 4 :baz 3))

  (list foo bar baz quux)))

It matches keywords appearing in the middle of a list, then overwrites the original highlighting rule. So it’s very sensitive to the declaration order of font-lock keywords.

I really hope you don’t have to touch this submatcher at all.

(Secret note: this pattern is not even a pcase pattern, so its declaration is irregular or non-existent too. It is just a list of keywords. Don’t try to see it as a pattern in case you really need to use this submatcher.)

loop (destructuring variant)

This submatcher is for highlighting variables inside the notorious loop macro of Common Lisp. As crazy as it may sound, the loop macro does destructuring by default when binding a local variable. So more than simple regexp-based highlighting is needed.

I’m sure you won’t have to touch this submatcher in your lifetime (unless you are implementing loop macro version 2.0 or something).

loop-and

A variant of the loop submatcher, made just to support the loop macro’s and keyword.

setq

This submatcher is for highlighting variable names (plain symbol names only) assigned by setq and setf. As you know, setq and setf allow any number of variable-value pairs in a single form, like:

(setf foo 1
      bar 2
      baz 3)

Technically speaking, every symbol at an even position in the argument list (counting from zero) is going to be matched.
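
That rule can be sketched on an already-read form as follows. As with the other sketches here, it works on S-expressions rather than buffer positions, and my-setq-variables is a made-up name.

```elisp
(require 'cl-lib)

(defun my-setq-variables (form)
  "Return the plain symbols assigned in FORM, a (setq ...) or (setf ...) form.
Hypothetical helper working on S-expressions, not on buffer positions."
  ;; Step through the arguments two at a time: PLACE, then its value.
  (cl-loop for (place _value) on (cdr form) by #'cddr
           when (symbolp place) collect place))
```

Places that are not plain symbols, such as (car x) in a setf form, are skipped, in line with the "plain symbol names only" restriction.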

I suppose this submatcher is going to be used only by setq and setf.

Why Are My Patterns Not Supported?

The dirty secret of pospcase-font-lock is that it uses regexp search for the heading keyword. For example, from the pattern:

`(defun ,name ,args . ,_)

The keyword builder function pospcase-font-lock-build extracts the heading keyword defun and constructs a regexp pattern string:

"\\(?:(defun\\)\\_>\\s *"
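
Here is a small sketch of how such a regexp behaves when searching a buffer. The two helper functions are written for this illustration only; they mirror the regexp shown above but are not how pospcase-font-lock-build is actually implemented.

```elisp
(defun my-heading-keyword-regexp (keyword)
  "Return a regexp matching \"(KEYWORD\" plus any trailing whitespace.
KEYWORD is a symbol.  Hypothetical helper, for illustration only."
  (concat "\\(?:(" (regexp-quote (symbol-name keyword)) "\\)\\_>\\s *"))

(defun my-find-heading-keyword (keyword text)
  "Return the position just after the first \"(KEYWORD \" in TEXT, or nil."
  (with-temp-buffer
    ;; `\\_>' (end of symbol) depends on the syntax table in effect.
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert text)
    (goto-char (point-min))
    (re-search-forward (my-heading-keyword-regexp keyword) nil t)))
```

The end-of-symbol anchor is what keeps a search for defun from matching a longer symbol such as defuns.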

Technically it’s possible to search and jump within a buffer using pcase patterns. But I fear it’s going to be very costly.

Currently I have no plan to switch away from regexp-based heading keyword search. But it also means you have to write a pcase-pattern-to-regexp translator yourself if you want to use some specific pcase pattern for the heading keyword.

Please register your pattern using M-x customize-variable RET pospcase-stringify-heading-keyword-cases.

Sorry for the inconvenience.

Appendix: Technical Titbits

Data flow

Submatchers call pospcase-at and pospcase-read to parse S-expressions and get positional metadata.

pospcase-at returns cons cells of the form (start . end).

pospcase-read returns an S-expression tree in which each node is a cons cell of the form (sexp . (start . end)).

Submatchers either manually collect the (start . end) pairs of interest, or call pospcase, pospcase-at or pospcase-read repeatedly on the start position (car (start . end)) of each S-expression of interest and collect the results.

The collected (start . end) pairs are structured in pospcase--matches so that they are suitable for consumption by pospcase--iterator, like this:

(((start . end)              ; (match-string 1) of first (match-data)
  (start . end))             ; (match-string 2) of first (match-data)

 ((start . end)              ; (match-string 1) of second (match-data)
  (start . end)))            ; (match-string 2) of second (match-data)

pospcase--iterator sets the car of pospcase--matches as the match data using set-match-data.
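
As a sketch of that last step, the following turns one entry of the structure above (a list of (start . end) pairs) into the flat list that set-match-data expects. How pospcase itself fills in group 0 is an assumption on my part; here it simply spans all pairs. The function name my-pairs-to-match-data is invented.

```elisp
(require 'cl-lib)

(defun my-pairs-to-match-data (pairs)
  "Convert PAIRS, a list of (START . END), into a flat match-data list.
Group 0 spans all pairs (an assumption); groups 1..N are the pairs.
Hypothetical helper, for illustration only."
  (append (list (apply #'min (mapcar #'car pairs))
                (apply #'max (mapcar #'cdr pairs)))
          (cl-loop for (start . end) in pairs
                   append (list start end))))
```

After (set-match-data (my-pairs-to-match-data pairs)) in the buffer the positions came from, (match-string 1), (match-string 2) and so on return the corresponding text, which is exactly what font-lock face specs rely on.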

Quirks of Pospcase Font Lock

Iterator

Admittedly, pospcase-font-lock does something very weird. Here I’m talking about submatchers. As you can see, all of them call the pospcase--call-iterator macro. True to its name, the macro realizes the behavior of the iterator pattern (very crudely, using a global variable, pospcase--matches, as the placeholder for pre-collected data). I’m not very pleased with the implementation either. But I think making lambda functions dynamically for each iterator, then managing and dispatching them correctly for each call, is far more complex than the current implementation. And ultimately Emacs’s font-lock (and jit-lock) is single-threaded. So I decided it isn’t worth the trouble to implement a proper iterator.

You may ask why I had to implement an iterator in the first place. Well, clearly Emacs’s font-lock.el was written with regexp-based, crawler-like behavior in mind, and font-lock-add-keywords was designed accordingly. Lazy me just doesn’t want to reimplement everything from scratch. Obviously I’m misusing them. And this is why pospcase-font-lock needs its weird iterator.
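
The idea can be sketched in a few lines. font-lock calls a function matcher with one argument (the search limit), expects it to set the match data and return non-nil, and keeps calling it until it returns nil. The names my-matches and my-iterator are stand-ins invented for this sketch, not the actual pospcase internals.

```elisp
(defvar my-matches nil
  "Pre-collected list of match-data lists.
Hypothetical stand-in for `pospcase--matches'.")

(defun my-iterator (_limit)
  "Install the next entry of `my-matches' as the match data.
Return non-nil while entries remain, nil once they are exhausted."
  (when my-matches
    (set-match-data (pop my-matches))
    t))
```

So the expensive parsing happens once, up front, and each later call merely hands font-lock the next pre-computed result.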

Emacs-Lisp-fy

The thing is, Emacs Lisp doesn’t have reader macros. In the pospcase context, it means you can’t really use Emacs’s built-in reader, read-from-string, to parse Common Lisp S-expressions.

To circumvent the limitation, rather than really tackle it, pospcase--read-from-string does a quick hack using regexps to convert otherwise unparsable S-expressions into their Emacs Lisp counterparts as smoothly as possible.

It’s a simple text replacement rule, so don’t expect too much. If you experience a major problem and can’t think of any way around it, well, accept the code as unparsable and give up the fancy highlighting for that section.
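
To give a feel for this kind of hack, here is a sketch of one such replacement: blanking out the Common Lisp #p pathname prefix, which the Emacs reader doesn’t understand, while keeping the text the same length so positions stay valid. Whether pospcase--read-from-string uses this particular rule is an assumption; my-elispify is a made-up name.

```elisp
(defun my-elispify (string)
  "Blank out Common Lisp `#p' pathname prefixes in STRING.
The prefix is replaced by two spaces, so the length is preserved.
Hypothetical helper; a naive rule that also fires inside strings."
  (let ((case-fold-search t))           ; match both #p and #P
    (replace-regexp-in-string "#p\"" "  \"" string t t)))
```

After the replacement, read-from-string sees an ordinary string literal where the pathname used to be.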

Secretly using Regular Expression

pospcase-font-lock totally depends on pcase, but it still uses regexps for searching heading keywords. The reason I don’t use something like el-search is that I fear further degradation of performance, and I feel it’s overkill.

So far I have no use case for matching keywords in the middle of a form, so it’s not implemented. pospcase-font-lock purposely supports only heading keyword patterns.

Multiple Submatchers

See the defclass syntax:

`(defclass ,name ,supers ,slots . ,_)

Here supers is a list of superclasses, and slots is a list of the class’s variables.

pospcase-font-lock internally generates two keywords for defclass, equivalent to declaring the following:

(pospcase-font-lock 'lisp-mode
                    '(`(defclass ,name ,supers ,slots . ,_))
                    '((heading-keyword . (font-lock-keyword-face))
                      (name . (font-lock-type-face))
                      ((slots . list/1) . (font-lock-variable-name-face))))

(pospcase-font-lock 'lisp-mode
                    '(`(defclass ,name ,supers ,slots . ,_))
                    '((heading-keyword . (font-lock-keyword-face))
                      (name . (font-lock-type-face))
                      ((supers . list/1) . (font-lock-type-face))))

Note that pospcase-font-lock adds a new keyword at the start of the keyword list. In other words, the last added keyword is highlighted first, and the fontified property of the highlighted text is set to t. And since keywords are internally processed with the append flag, highlighting that is already applied is not overwritten by the rules that follow.
