NYC Elisp talk

r edited this page Jun 17, 2012 · 27 revisions

2012 ELisp NYC Talk -- What makes good (Elisp) Code?


Hi -

Thanks again for coming and thanks to Anthony and Andrew for making this happen. Thanks also to Google for providing the space and for all the good things it has done and continues to do. [Google Summer of Code]. But praising an organization tends to obscure the fact that it is made up of great individuals.

If you happened to be at the last talk I gave, I have made a couple of corrections and edits to the slides and talk text. Anthony has kindly made a video available. Down the line, I'll try to add annotations to that.

There were some good questions and some tangents we didn't explore. It was not my intention to avoid answering questions, but since they were tangents, we had to cut them off so as to be able to finish the prepared stuff. Some of the things I was thinking but couldn't say lest they lead to further digressions, and corrections of some inaccuracies, will eventually appear in the annotations.

What Makes Good Elisp Code?

On the LispNY mailing list I made a comment about comparing code from two people. I read a lot of code and occasionally I come across really well-written code. But what makes Elisp code good?

A glib answer is: the same things that make code in general good. I can't explain this completely here, so for now I want to focus on short functions and methods. Sometimes good code creates its own interesting world.

Let me say here, this is my own opinion and you might not agree. I can't prove that either of the bullet items makes good code and I don't have statistical evidence either.

Let me also add that it's a bad idea to get unnecessarily creative. Einstein didn't invent Relativity willy-nilly. He had admiration for Newtonian Physics and gave it up only after not being able to make it work.

Interesting Emacs C Source Code

Here is some of the low-level buffer code from the Emacs source file buffer.c. We'll look at the function buffer-live-p(). It and other functions in this file are generally pretty short. This is one characteristic of good code.

But there's something else I think interesting. Although the code is written in C, in fact it reads pretty much like Emacs Lisp code. Notice that the function doesn't start out as others might have in C:

Lisp_Object buffer_live_p(Lisp_Object object) { ...
   add_docstring("non nil if buffer OBJECT ...");

But rather:

DEFUN("buffer-live-p", ... 1, 1, 0, doc: /* non-nil if buffer OBJECT has not been killed... */)
(Lisp_Object object)
{
  return ((BUFFERP (object) && ! NILP (BVAR (XBUFFER (object), name)))
      ? Qt : Qnil);
}

And this is another sign of good code: it sometimes creates its own little world.

DEFUN is a C macro which has to be used for C functions that are visible in Emacs Lisp. The "1, 1" indicates the minimum and maximum number of arguments. And there is a place for the document string, which is an Emacs Lisp thing but not a C thing. The macro makes it impossible to forget to include information about the minimum and maximum number of arguments or the docstring.

Is this tasteful? I think it is an effective way to ensure those Lisp characteristics get inserted. It helps bridge the gap between C and Lisp.

But also notice:

C Code Reads Like Lisp Code!

Qt is the same as 't and Qnil is the same as nil. BUFFERP(object) is roughly like Lisp (bufferp object). Here is the code translated almost verbatim into Emacs Lisp:

  (defun buffer-live-p (object)
    "Return non-nil if OBJECT is a buffer which has not been killed.
  Value is nil if OBJECT is not a buffer or if it has been killed."
    (if (and (bufferp object) (not (null (BVAR object name))))
        t
      nil))

(BVAR object name) is the only thing that can't be done in Emacs Lisp. BVAR extracts the "name" field of the Emacs Lisp structure "object". There are structs in Emacs Lisp, but object uses a custom built-in Emacs Lisp C structure which was around before structs were available in Emacs Lisp. The Emacs Lisp version of this code could be shortened a bit:

   (defun buffer-live-p (object)
     "Return non-nil if OBJECT is a buffer which has not been killed.
   Value is nil if OBJECT is not a buffer or if it has been killed."
     (and (bufferp object) (BVAR object name)))

but because C is not Lisp you have to do things more along the lines of the first way, which isn't so bad.

Lisp-like code from Emacs 18.59

The above code comes from current Emacs sources. I went to the old Emacs archive site to pull down the oldest version of Emacs I could find. It doesn't have buffer-live-p(), but all of the good qualities I mention are still there. Here is buffer-list, some code from buffer.c:

   DEFUN ("buffer-list", Fbuffer_list, Sbuffer_list, 0, 0, 0,
         "Return a list of all buffers.")
     ()
   {
      return Fmapcar (Qcdr, Vbuffer_alist);
   }

We saw the Q prefix for "quote" before. The F prefix means "function", and the V prefix means "variable".

Note that doc strings in the newer format are comments, while in the older format they are strings just as in Emacs Lisp.

And the next function after that, get-buffer(), is basically:

      return Fcdr (Fassoc (name, Vbuffer_alist));
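To make the "reads like Lisp" point concrete, here is a rough Emacs Lisp rendering of those two C bodies. This is only a sketch: my-buffer-alist is a made-up stand-in for the C variable Vbuffer_alist, and plain symbols stand in for real buffer objects.

```elisp
;; Hypothetical stand-in for the C variable Vbuffer_alist.
;; Each entry is (NAME . BUFFER); symbols stand in for buffer objects.
(defvar my-buffer-alist
  '(("*scratch*" . scratch-buf)
    ("notes.txt" . notes-buf)))

(defun my-buffer-list ()
  "Lisp rendering of: return Fmapcar (Qcdr, Vbuffer_alist);"
  (mapcar #'cdr my-buffer-alist))

(defun my-get-buffer (name)
  "Lisp rendering of: return Fcdr (Fassoc (name, Vbuffer_alist));"
  (cdr (assoc name my-buffer-alist)))
```

The C and the Lisp line up call for call: Fmapcar is mapcar, Fcdr is cdr, Fassoc is assoc.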

So one wonders why write this in C at all? My guess is that getting a list of buffers or a buffer with a particular name is used a lot, so it had to be fast.

Now consider real Emacs Lisp code from simple.el from that early version of Emacs. You will see lots of short functions there: fundamental-mode, eval-expression, count-lines-region, count-lines, and so on.

So to sum up here: Lots of little functions. In a way this mirrors what I said last time about how I prefer lots of files with each file being short.

gud-def macro

Ok. So now to gud.el. Let's start with the one macro I came across there. One of the things gud.el has to do is translate an Emacs command into the specific command string for a specific debugger. For example, debuggers have a command to clear a breakpoint, and gud has to send the right string to the real debugger. It might be "clear" as it is in gdb, "B" in the Perl debugger, or "db" in xdb (although I'm not sure what that is). To do this gud uses a macro called gud-def. Here is the definition for removing a breakpoint in gdb:

  (gud-def gud-remove "clear %f:%l" "\C-d" "Remove breakpoint at current line")

which expands to:

   (progn
     (defalias 'gud-remove
       (lambda (arg)
         "Remove breakpoint at current line"
         (interactive "p")
         (if (not gud-running) (gud-call "clear %f:%l" arg))))
     (local-set-key "\C-c\C-d" 'gud-remove)
     (global-set-key (vconcat gud-key-prefix "\C-d") 'gud-remove))

Question: could this be done using a function?

The actual macro is a little more complicated than what is shown here; for example, the second argument doesn't have to be a string. gud-def allows this complication, which looks like feature-itis to me. Virtually all of the uses of gud-def are like the one that I've shown here.

I think this could be easily written as a function even with the additional (gratuitous) complexity.

Is efficiency a concern here? No, because these functions basically get created and defined once throughout the course of loading gud.
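To back up the claim that a function would do, here is a minimal sketch of a function counterpart. The name my-gud-def-fn is made up for illustration; it is not part of gud.el. It handles only the common case where the command is a string, and it relies on gud-running, gud-call and gud-key-prefix from gud.el.

```elisp
(require 'gud)  ; provides `gud-running', `gud-call' and `gud-key-prefix'

;; Hypothetical function counterpart of the `gud-def' macro.
;; Only the common case where CMD is a string is handled.
(defun my-gud-def-fn (func cmd key &optional doc)
  "Define command FUNC which sends CMD to the debugger, and bind it to KEY.
DOC, if given, becomes the docstring of FUNC."
  (defalias func
    ;; Build the lambda as data so CMD and DOC are spliced in directly.
    `(lambda (arg)
       ,@(if doc (list doc))
       (interactive "p")
       (if (not gud-running) (gud-call ,cmd arg))))
  (when key
    (local-set-key (concat "\C-c" key) func)
    (global-set-key (vconcat gud-key-prefix key) func))
  func)

;; Usage: note the function name is now quoted.  Like `gud-def',
;; this changes key bindings in the current buffer and globally.
(my-gud-def-fn 'my-gud-remove "clear %f:%l" "\C-d"
               "Remove breakpoint at current line")
```

The visible difference for the caller is the quote in front of the function name, which is about all the macro saves you.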

Problem with gud: gud-last-frame

Whether to use a macro for gud-def is probably a matter of taste. But gud-last-frame and gud-last-last-frame are not very good.

;; Where gud-display-frame should put the debugging arrow; a cons of
;; (filename . line-number).  This is set by the marker-filter, which scans
;; the debugger's output for indications of the current program counter.
(defvar gud-last-frame nil)

;; Used by gud-refresh, which should cause gud-display-frame to redisplay
;; the last frame, even if it's been called before and gud-last-frame has
;; been set to nil.
(defvar gud-last-last-frame nil)

Let's leave aside the fact that those comments aren't part of the doc-string for the variable. That is probably just a reflection of this being old and unmaintained code. Elsewhere you'll see a reference in comments to gud-tag-frame which doesn't exist.

Problems with gud-last-frame

Weird, if not incorrect, is the suggestion that a file and line number is the same as the current program counter. And using "frame" for "file and line number" is a bit of an exaggeration. Of course, "frame" can be a programming-language term to refer to a place during execution, and this kind of object does contain a location which includes a file name and line number. But programming frames typically have more information, such as a program counter, a pointer to the parent frame, and access to the evaluation environment. In programming languages which provide limited access to a frame, such as Python and Perl, the frame also holds the name of the function you are inside of. gud's definition of "frame" is a pale imitation of what programming language frames typically hold.

Finally, "frame" in Emacs came to mean something else. The only fault here was selecting a vague name. But a better term I think would have been "location"; that's what I use in emacs-dbgr. Emacs code, especially that written by rms, generally doesn't have the exaggeration, ambiguity, or weakness of thinking that is exhibited here.

No encapsulation: a bare cons node is used

Let's get back to what is needed. There is the location as reported by the debugger, and then gud needs to find a corresponding location in a buffer. But "marks" are what Emacs generally uses for locations. And although a cons node may fit two items like file and line number, one is now boxed in. Suppose I later want to add a column number? Do I then go to a list? Ok, but then I'd have to change all code from:

    (cdr gud-last-frame)

to:

    (cadr gud-last-frame)

If I had encapsulated this in accessors, only the accessor would need to change; the code that extracts the file name or line number would stay as it is.
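Here is a minimal sketch of what I mean by accessors. The my-loc- names are invented for this example and appear in neither gud nor emacs-dbgr.

```elisp
;; Hypothetical accessors; the `my-loc-' names are illustrative only.
;; The representation is now a list (FILENAME LINE COLUMN).  When it
;; was a cons (FILENAME . LINE), `my-loc-line' was simply (cdr loc).

(defun my-loc-make (filename line &optional column)
  "Return a location for FILENAME, LINE and optional COLUMN."
  (list filename line column))

(defun my-loc-filename (loc)
  "Return the file name stored in LOC."
  (car loc))

(defun my-loc-line (loc)
  "Return the line number stored in LOC."
  (nth 1 loc))

(defun my-loc-column (loc)
  "Return the column number stored in LOC, or nil if none was given."
  (nth 2 loc))
```

Callers write (my-loc-line loc) before and after the change of representation, so adding the column touches these few definitions and nothing else.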


As for gud-last-last-frame, just the name should be a hack warning. In emacs-dbgr I haven't needed to use this. I suspect that saving a previous copy of the location is a weakness of the overall logic.

Is a global variable

Another problem with the variables gud-last-frame and gud-last-last-frame is that they are global variables. Even though Emacs can manage lots of buffers and run lots of processes, you can't debug more than one program at a time. Perhaps this is not a serious limitation, but it is an unnecessary one. And it is one that people have occasionally noted as undesirable.

emacs-dbgr's location structure

The solution for this is actually very simple: make the variable buffer-local in the buffer of the process that is running the debugger.
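As a rough illustration, assuming a made-up variable my-dbg-last-loc rather than anything from gud or emacs-dbgr, two debugger buffers can each hold their own value:

```elisp
;; Hypothetical variable name, for illustration only.
(defvar my-dbg-last-loc nil
  "Most recent location reported by the debugger running in this buffer.")
(make-variable-buffer-local 'my-dbg-last-loc)

(defun my-dbg-last-loc-demo ()
  "Return the locations stored by two separate debugger buffers."
  (let ((gdb-buf (generate-new-buffer "*my-gdb*"))
        (pdb-buf (generate-new-buffer "*my-pdb*")))
    (unwind-protect
        (progn
          ;; Each `setq' only affects the buffer it runs in.
          (with-current-buffer gdb-buf
            (setq my-dbg-last-loc '("foo.c" . 10)))
          (with-current-buffer pdb-buf
            (setq my-dbg-last-loc '("bar.py" . 3)))
          (list (buffer-local-value 'my-dbg-last-loc gdb-buf)
                (buffer-local-value 'my-dbg-last-loc pdb-buf)))
      (kill-buffer gdb-buf)
      (kill-buffer pdb-buf))))
```

Neither session overwrites the other, which is exactly what a single global variable cannot give you.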

In emacs-dbgr I have a circular queue or a "ring" of positions which I use for saving the most recent positions seen.
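Emacs ships a ring package that does this kind of thing. The following is only a sketch of the idea using ring.el, with made-up location values; it is not the emacs-dbgr code.

```elisp
(require 'ring)

(defun my-loc-ring-demo ()
  "Keep only the two most recent locations; return them plus the ring length."
  (let ((locs (make-ring 2)))
    (ring-insert locs '("foo.c" . 10))
    (ring-insert locs '("foo.c" . 11))
    ;; The ring is full, so this insert drops the oldest entry.
    (ring-insert locs '("bar.c" . 5))
    ;; Index 0 is the most recently inserted item.
    (list (ring-ref locs 0) (ring-ref locs 1) (ring-length locs))))
```

With a ring, "the previous location" is just index 1, so there is no need for a second variable along the lines of gud-last-last-frame.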

Compare this with the struct used in emacs-dbgr for location. (From dbgr/common/loc.el):

(defstruct dbgr-loc
"Our own location type. Even though a mark contains a
file-name (via a buffer) and a line number (via an offset), we
want to save the values that were seen/requested originally."
   num           ;; If there is a number such as a breakpoint or frame
                 ;; number associated with this location this is set.
                 ;; nil otherwise.
   column-number ;; Column offset within line
   marker        ;; Position in source code
   cmd-marker    ;; Position in command process buffer
)

By making a distinction between the debugger's way and Emacs's way of reporting a location, the two locations don't actually have to have the same line number. When the buffer is edited, the marker is automatically adjusted by Emacs.
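A small sketch shows the two drifting apart. It assumes a cut-down, made-up struct my-dbg-loc with just two fields, and it uses cl-defstruct from cl-lib rather than the older defstruct shown above.

```elisp
(require 'cl-lib)

;; Hypothetical, cut-down location struct for illustration only.
(cl-defstruct my-dbg-loc
  line-number   ; line as originally reported by the debugger
  marker)       ; Emacs marker into the source buffer

(defun my-dbg-loc-demo ()
  "Return (REPORTED-LINE CURRENT-LINE) after the buffer has been edited."
  (with-temp-buffer
    (insert "one\ntwo\nthree\n")
    (goto-char (point-min))
    (forward-line 1)                    ; start of line 2
    (let ((loc (make-my-dbg-loc :line-number 2
                                :marker (point-marker))))
      ;; Edit the buffer: add a new first line.
      (goto-char (point-min))
      (insert "zero\n")
      (list (my-dbg-loc-line-number loc)
            (line-number-at-pos
             (marker-position (my-dbg-loc-marker loc)))))))
```

The debugger still thinks in terms of line 2, while the marker has followed the text to what is now line 3 of the buffer.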

Elisp Structure Issues

There are a few Common-Lisp structures in my code. Common Lisp allows a print hook for each structure, but Emacs Lisp doesn't support this. So I have written custom "describe" methods. Here is the one for the command buffer. [do that] Here is the one for a source buffer. [do that]

One last thing that bothered me when using structures is the awkward access. We saw the description of the debugger information kept in a command buffer.

(defstruct dbgr-cmdbuf-info
   bp-list ...

To retrieve the value of the bp-list field of variable dbgr-cmdbuf-info, we use:

(dbgr-cmdbuf-info-bp-list dbgr-cmdbuf-info)

The field "bp-list" is camouflaged by the dash that precedes it. And that is an awful lot of verbiage. To set the bp-list field even more verbiage is needed.

But the only variable that is a dbgr-cmdbuf-info struct is a variable called dbgr-cmdbuf-info, an Emacs buffer-local variable. So right now I have instead:

(dbgr-sget 'cmdbuf-info 'bp-list)

where dbgr-sget is a macro.

Does dbgr-sget need to be a macro? Strictly speaking, I don't think so. However, in contrast to gud-def, accessing a field inside a structure isn't done once over the course of loading the package; it's done over and over again. So here I think using a macro, and it is a simple one, makes sense. You may disagree though.
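For the curious, here is one way such a macro could be written. This is a sketch under my own assumptions, with a made-up my- prefix and a made-up struct; the real dbgr-sget may well differ. It relies on the struct and the buffer-local variable sharing one name, as described above.

```elisp
(require 'cl-lib)

;; Hypothetical struct and buffer-local variable sharing one name.
(cl-defstruct my-cmdbuf-info
  bp-list)      ; list of breakpoints known in this command buffer

(defvar my-cmdbuf-info nil
  "Debugger information for the current command buffer.")
(make-variable-buffer-local 'my-cmdbuf-info)

;; Hypothetical shorthand: (my-sget 'cmdbuf-info 'bp-list) stands for
;; (my-cmdbuf-info-bp-list my-cmdbuf-info).
(defmacro my-sget (struct-symbol field-symbol)
  "Return field FIELD-SYMBOL of the buffer-local struct my-STRUCT-SYMBOL."
  (let ((var (make-symbol "var"))
        (accessor (make-symbol "accessor")))
    `(let* ((,var (intern (concat "my-" (symbol-name ,struct-symbol))))
            (,accessor (intern (concat (symbol-name ,var) "-"
                                       (symbol-name ,field-symbol)))))
       ;; Call the generated accessor on the variable's current value.
       (funcall ,accessor (symbol-value ,var)))))
```

Inside a command buffer, (my-sget 'cmdbuf-info 'bp-list) then reads the same field as the long form, with the field name standing out on its own.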


In sum, I've tried to show some of the small, simple functions that Emacs Lisp uses at the base level, even when in C, which I think is good practice. These functions also make my slide presentation simple.

Also I've tried to show specific places where we can break out of the shackles of the environment. In this talk this was accomplished by using the DEFUN macro; in my last talk it was by internal linking.

Epilogue: Adding a new debugger

I had originally intended to show how to add a new debugger. I like the idea of having a debugger added in for this presentation. So I tried adding the stock Python debugger pdb. This was a pretty good choice since pdb deviates a little more from gdb-like things than my versions of the Python debuggers. And I'm not as familiar with it.

Overall I was pretty pleased with how quickly I hammered this out. I was also pleased by how I could take existing tests and customize them for the pdb strings and behavior. I see that this automatically checks out properly. However there was a little bit of automake and configure boilerplate that is not of much interest. I then considered doing what they do in cooking shows where they show preparing stuffing for a turkey, and skip to another turkey that has already been put on a plate and is waiting for its garnish. But I'm probably too much of a turkey to pull it off. So I'll stop here and take comments and questions.