NYC Lisp Elisp talk
2012 Lisp NYC Talk -- Scaling Elisp Code
- Development Requirements
- The Linking Problem
Thanks for coming, and thanks to Heow, Lisp NYC organizer, for inviting me. Thanks also to Brian and
meetup.com for offering this wonderful space.
I don't often give talks, but when I do it is generally on something important to me. This talk is about a personal style of software development that I have come to use in Emacs Lisp. It has made writing large Emacs Lisp programs tractable for me.
The name "Large-scale Software Development" may sound ominous. Don't let it put you off. It isn't all that difficult. What I hope to show is that really it's about writing little pieces and having a lego-like way to build them up (or "scale" them) to something large.
Feel free to stop me along the way to offer differing views, tools or thoughts. I wasn't planning on discussing much of the internals or even the user interface of the large program I am using as an example. But at the end I'm happy to delve more into that for those interested.
Five or so years back, I was exposed to this framework called "Ruby on Rails". A framework is a little bit like a religion: it makes some things easy to do if you go along with it. (And conversely if you don't, you can take a lot of flak from the community for not doing so; think Galileo.)
One of the tenets in Ruby on Rails is a rigorous approach to testing. I will go over how I do this in Emacs Lisp. Also important to me is building large programs from smaller, modular ones.
The overall plan is to show a little demo of a large piece of software in Emacs Lisp. Then I want to go a little into the development process I use.
So first let me show the code in action as it was a year and a half ago.
Now that you have some sense of what I'm working on let me totally switch to the other end of my spectrum -- the development environment I work in.
Here are the things that are important to me.
- Of course, I use GNU Emacs.
- I need to be able to break this program into small chunks or modules.
- Implication: there may be many files.
- I need to be able to run and debug each module in isolation.
- Implication: each module needs to have enough information to pull in whatever other modules it needs
- I want to reduce overhead in the development cycle. Implications:
- This means not requiring "compilation" or "link" steps
- It also means I don't want to have to "install" code to try internal modules I am interested in.
- I may have an "installed" version and a "development" version and I want to be able to run the "development" code with little overhead.
- I need to be able to test each module in isolation. Test modules
are still modules, so see 1..4 above. Implications:
- Because there are many files there are many tests
- Tests need to be able to be run interactively
- I need to be able to run all the tests in batch.
Sounds reasonable? Does this match your expectations?
The Linking Problem
Internal versus External
A (large) Emacs Lisp program uses modules from many places. Some modules reside inside the project and some reside outside. Furthermore there is a certain fluidity here: one may start with a module residing inside the project and then later decide to make it an independent project.
Example: the debugger front-end project has a module that maintains a circular ring of locations and the initial line numbers of the positions we have stopped at.
It so happens that something similar was recognized a long time ago in C, where I think they got it right. Consider the difference between:
which can also be written as:
which could legally be written as
but in practice is never written that way; it would be frowned on if it were.
The first #include instructs the compiler to look in the filesystem relative to where you are now (if the path isn't absolute), while the second says to consult an "include" path. Semantically, the first is used for referring to headers within a project, while the second generally refers to headers outside the project. What's true for headers in C is also true for modules in Ruby 1.9 and greater: require_relative and plain require work like the two kinds of #include above. In Perl, there is the CPAN module called rlib.
Load path is evil
In Emacs Lisp, the counterpart of Ruby's $LOAD_PATH global variable is called, well, load-path. (Ruby does borrow from Lisp.) But I think load-path is evil. Go into GNU Emacs and look at your load-path. Here's mine:
("/home/rocky/.emacs.d/elpa/epresent-0.1/" "/home/rocky/.emacs.d/elpa/haml-mode-3.0.14/" "/home/rocky/.emacs.d/elpa/rvm-1.1/" "/home/rocky/.emacs.d/elpa/test-case-mode-0.1/" "/home/rocky/.emacs.d/elpa/fringe-helper-0.1.1/" "/home/rocky/.emacs.d/elpa/yaml-mode-0.0.5/" "/usr/share/emacs/site-lisp/ruby1.8-elisp" "/usr/share/emacs/site-lisp/tuareg" "/home/rocky/elisp" "/usr/share/emacs-snapshot/site-lisp/remake" "/usr/share/emacs/site-lisp/mgp/" "/usr/share/emacs/23.1.50/site-lisp/auctex" "/usr/share/emacs-snapshot/site-lisp/anthy" "/usr/share/emacs-snapshot/site-lisp/mailcrypt" "/usr/share/emacs/site-lisp/autoconf" "/usr/share/emacs-snapshot/site-lisp/auctex" "/usr/share/emacs/site-lisp/auctex" "/usr/share/emacs/site-lisp/mailcrypt" "/usr/share/emacs-snapshot/site-lisp/ocaml-mode" "/etc/emacs-snapshot" "/etc/emacs" "/usr/local/share/emacs/23.1.50/site-lisp" "/usr/local/share/emacs/site-lisp" "/usr/local/share/emacs/site-lisp/dbgr" "/usr/local/share/emacs/site-lisp/dbgr/common" "/usr/local/share/emacs/site-lisp/dbgr/debugger" "/usr/local/share/emacs/site-lisp/dbgr/lang" "/usr/local/share/emacs/site-lisp/dbgr/common/buffer" "/usr/local/share/emacs/site-lisp/dbgr/common/init" "/usr/local/share/emacs/site-lisp/dbgr/debugger/bashdb" "/usr/local/share/emacs/site-lisp/dbgr/debugger/gdb" "/usr/local/share/emacs/site-lisp/dbgr/debugger/kshdb" "/usr/local/share/emacs/site-lisp/dbgr/debugger/perldb" "/usr/local/share/emacs/site-lisp/dbgr/debugger/pydbgr" "/usr/local/share/emacs/site-lisp/dbgr/debugger/rdebug" "/usr/local/share/emacs/site-lisp/dbgr/debugger/remake" "/usr/local/share/emacs/site-lisp/dbgr/debugger/trepan" "/usr/local/share/emacs/site-lisp/dbgr/debugger/trepan.pl" "/usr/local/share/emacs/site-lisp/dbgr/debugger/trepan8" "/usr/local/share/emacs/site-lisp/dbgr/debugger/trepanpl" "/usr/local/share/emacs/site-lisp/dbgr/debugger/trepanx" "/usr/local/share/emacs/site-lisp/dbgr/debugger/zshdb" "/usr/share/emacs/23.1.50/site-lisp" "/usr/share/emacs/23.1.50/site-lisp/anthy" 
"/usr/share/emacs/23.1.50/site-lisp/cmake-data" "/usr/share/emacs/23.1.50/site-lisp/global" "/usr/share/emacs/23.1.50/site-lisp/mailcrypt" "/usr/share/emacs/23.1.50/site-lisp/ocaml-mode" "/usr/share/emacs/23.1.50/site-lisp/remake" "/usr/share/emacs/site-lisp" "/usr/share/emacs/23.1.50/lisp" "/usr/share/emacs/23.1.50/lisp/url" "/usr/share/emacs/23.1.50/lisp/textmodes" "/usr/share/emacs/23.1.50/lisp/progmodes" "/usr/share/emacs/23.1.50/lisp/play" "/usr/share/emacs/23.1.50/lisp/org" "/usr/share/emacs/23.1.50/lisp/nxml" "/usr/share/emacs/23.1.50/lisp/net" "/usr/share/emacs/23.1.50/lisp/mh-e" "/usr/share/emacs/23.1.50/lisp/mail" "/usr/share/emacs/23.1.50/lisp/language" "/usr/share/emacs/23.1.50/lisp/international" "/usr/share/emacs/23.1.50/lisp/gnus" "/usr/share/emacs/23.1.50/lisp/eshell" "/usr/share/emacs/23.1.50/lisp/erc" "/usr/share/emacs/23.1.50/lisp/emulation" "/usr/share/emacs/23.1.50/lisp/emacs-lisp" "/usr/share/emacs/23.1.50/lisp/calendar" "/usr/share/emacs/23.1.50/lisp/calc" "/usr/share/emacs/23.1.50/lisp/obsolete" "/usr/share/emacs/23.1.50/leim")
This is an incomprehensible jumble of stuff, most of which I don't have a clue about. There are over 50 items in this list, so the number of possible orderings (50 factorial) is over 10 to the 63rd power. In theory many of these orderings should result in the same behavior, but semantically they are slightly different when packages do not stay inside their own namespace. GNU Emacs does not provide a package scoping mechanism to avoid namespace conflicts between packages.
If someone pressed me as to whether it is what I want, I'd have to give an opinion based on empirical use.
Given the complexity of load-path, it is fragile and insecure. Again, I have to hope that namespaces are distinct. [See math-add-abs-approx inside calc-arith.el.] An Emacs exploit might be to inject a directory into load-path; if it is put far enough down in the list you might not even see it printed when you eval load-path. By default, Emacs stops printing a list after a certain number of elements. (I invite people with GNU Emacs to go into *scratch* and see if your load-path variable is chopped off when printed.)
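One way to try this for yourself, as a small sketch: the forms below count the entries and then print each directory on its own line, so nothing is elided. `eval-expression-print-length` is the standard variable that limits how many list elements are shown when you evaluate an expression interactively.

```elisp
;; How many directories does Emacs search?
(length load-path)

;; Print every entry on its own line so that nothing gets elided.
(dolist (dir load-path)
  (princ (format "%s\n" dir)))

;; Alternatively, lift the limit on printed list elements for this session.
;; A value of nil means "no limit".
(setq eval-expression-print-length nil)
```

Evaluate each form in *scratch* with `C-j`. After the last form, evaluating `load-path` itself should show the whole list.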
But also judge this in light of my requirement that I want to be able to run a development version while there may be an installed version. load-path may be consulted by require when an explicit file name is not given. However, load can be given a specific file prefix, and that's what I want.
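To make the difference concrete, here is a minimal sketch. `locate-library` is a standard Emacs function that reports which file a bare library name resolves to after searching load-path. The development path is purely hypothetical, so substitute your own checkout.

```elisp
;; Ask Emacs which file a bare name resolves to via `load-path'.
;; "calc-arith" ships with GNU Emacs, so this normally returns a path.
(locate-library "calc-arith")

;; By contrast, `load' with an absolute file name does not search
;; `load-path' at all.  The path below is hypothetical.
(let ((dev-file (expand-file-name "~/src/emacs-dbgr/dbgr/common/loc.el")))
  (when (file-exists-p dev-file)
    (load dev-file)))
```

With the second form there is no question of which copy gets loaded: it is the development copy or nothing.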
The functions in the load-relative package allow me to do the kind of robust internal linking I want. The package uses the Emacs primitive load, where I always supply a file name rather than let it search for a file using load-path. Let me give an example from the Emacs debugger front-end project.
In its simplest form one can write:
(require 'load-relative)     ; pull in (load-relative)
(load-relative "my-module")
which simply loads my-module, which is assumed to be located in the same directory as the file that this command comes from. Underneath, the function __FILE__ is used. This name corresponds to the same use it has in C, Perl, and Ruby.
If you evaluate this inside an Emacs buffer, the file is the one that is associated with the buffer. You can also pass an optional symbol:
(load-relative "my-module" 'dbgr)
This just says that if "my-module" is not found nearby, but the file associated with provide 'dbgr can be found, then the directory that file was in is used as the starting point in a relative file search.
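The idea behind relative loading can be sketched in a few lines. This is not the actual implementation of load-relative. The `my-demo-` names are made up for illustration, and the sketch relies only on the standard variables `load-file-name` (set while a file is being loaded) and `buffer-file-name` (set when evaluating inside a buffer that visits a file).

```elisp
(defun my-demo-current-file ()
  "Return the file being loaded or, failing that, the current buffer's file.
A hypothetical stand-in for what `__FILE__' provides."
  (or load-file-name buffer-file-name))

(defun my-demo-load-relative (name)
  "Load NAME relative to the directory of the current file.
NAME is a file prefix such as \"my-module\" or \"../common/loc\"."
  (let ((file (my-demo-current-file)))
    (unless file
      (error "Cannot determine the current file"))
    ;; Expanding against the file's own directory yields an absolute
    ;; name, so `load' does not search `load-path'.
    (load (expand-file-name name (file-name-directory file)))))
```

Because the name handed to `load` is absolute, an installed copy of the same module elsewhere on load-path cannot shadow the development copy.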
Another slight variation is require-relative, which uses require underneath instead of load. And finally, here is an example of the form I mostly use.
(require-relative-list '("../../common/regexp" "../../common/loc" "../../common/init") "dbgr-")
Note here that I give a list of relative file prefixes. (You can leave off the trailing .el or .elc if this is not important.) So for example, I should find dbgr/common/regexp.el or the compiled version, which ends in .elc.
Also notice that I use require rather than load. Were I to use load, loading would be much slower because the same lower-level files would be loaded again and again. For testing, a forced (re)load of a file is what I want. But otherwise, if the lower-level file has already been loaded, I do not want to reload it.
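The difference between the two is easy to see in a small experiment. The sketch below writes a throwaway module (the names `my-demo-mod` and `my-demo-count` are made up) that counts how often it has been loaded, then pulls it in with require twice and with load once.

```elisp
(let* ((dir  (make-temp-file "require-demo" t))   ; scratch directory
       (file (expand-file-name "my-demo-mod.el" dir)))
  ;; Write a tiny hypothetical module that counts how often it is loaded.
  (with-temp-file file
    (insert "(defvar my-demo-count 0)\n"
            "(setq my-demo-count (1+ my-demo-count))\n"
            "(provide 'my-demo-mod)\n"))
  (require 'my-demo-mod file)   ; first time: the file is loaded
  (require 'my-demo-mod file)   ; feature already provided: nothing happens
  (load file nil t)             ; `load' always re-reads the file
  (symbol-value 'my-demo-count))
```

In a fresh Emacs session the form evaluates to 2: one load from the first require and one from the explicit load. The second require is a no-op because the feature is already on the `features` list.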
There are current schools of programming that suggest one writes tests or behaviors before one writes code. These schools are abbreviated TDD (Test-Driven Development) and BDD (Behavior-Driven Development).
I don't strictly follow this, but I do believe testing is very important. A number of basic test frameworks around are modeled on one for Java called JUnit.
I looked around for test frameworks for GNU Emacs. The GNU Emacs UnitTesting wiki has a list of them.
Coming back to the requirements again, I need to be able to test each module either interactively or in batch. At the time I started working on this, ert was a little deficient in running tests in batch mode. But I see now that this has been fixed. So I will eventually redo my tests.
However for this talk lemme stick with what I know best to show how testing works with my modifications to technomancy's unit test.
I currently have 39 test files for this one project, which is pretty small for my goal, given that there are about 90 Emacs Lisp files. Still, I have more test files than the entire Emacs 24 project has using ert in its "automated tests".
But now let's dive into a test. The emacs-dbgr project supports quite a few debuggers and programming languages. In order to do that, we need to be able to extract position information from the output produced by debuggers and programming languages. There are tables of regular expressions in support of this. And recall jwz's dictum:
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
In my situation, it is hard for me to imagine using anything but
some sort of regular expressions. And the unit tests make using regular expressions manageable. I'll use the test program which tests
gdb regular expressions. (I use
gdb since that is probably the most familiar debugger of any that I support.)
At the top of the file I have this:
(require 'test-unit)
(load-file "../dbgr/common/buffer/command.el")
(load-file "../dbgr/debugger/gdb/init.el")
The first line with
require pulls in my test code. Notice that the
second line uses a
load-file rather than some sort of
require. And also notice I specify the source code file rather than give Emacs a choice as to whether to use the
compiled version or not. Here, I always want to use the source code.
I could use
load-relative that I mentioned previously. Instead I am
using the Emacs primitive which has a restriction that the test has to
be run from the test directory. For testing, that's an okay
limitation. For general development the pattern of use is a little different.
Next in the file comes a call which clears any prior testing. After that I have some initialization, and each test is put in a context block which here is tagged "regexp-gdb".
Individual tests are run using
(assert-t (numberp (loc-match text)))
and wrapped in a specify form:
(specify "basic location" (assert-t (numberp (loc-match text))))
Finally at the end of the file I run the tests with:
which when evaluated runs the tests.
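For comparison, here is roughly what a test of the same shape might look like in ERT, the test library bundled with Emacs 24 and later. This is only a sketch: the regular expression and the `my-demo-` names are made up for illustration and are not taken from the project's regexp tables.

```elisp
(require 'ert)

;; Hypothetical pattern for lines like "/tmp/foo.c:42: ...".
;; It is not one of the project's real regular expressions.
(defconst my-demo-loc-regexp "^\\([^:\n]+\\):\\([0-9]+\\):"
  "Regexp matching a FILE:LINE: prefix.")

(defun my-demo-loc-parse (text)
  "Return (FILE LINE) parsed from TEXT, or nil if TEXT has no location."
  (when (string-match my-demo-loc-regexp text)
    (list (match-string 1 text)
          (string-to-number (match-string 2 text)))))

(ert-deftest my-demo-basic-location ()
  "A line carrying FILE:LINE: is recognized and its parts extracted."
  (should (equal (my-demo-loc-parse "/tmp/foo.c:42: some message")
                 '("/tmp/foo.c" 42))))

(ert-deftest my-demo-no-location ()
  "A line without a location yields nil."
  (should-not (my-demo-loc-parse "no location here")))
```

Interactively, `M-x ert RET t RET` runs every defined test. In batch mode the standard entry point is the function `ert-run-tests-batch-and-exit`.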
OK. So as I said, I can run this inside Emacs with
eval-current-buffer. And when I run this I get:
Running specs tagged "regexp-gdb":
....
0 problems in 4 specifications using 18 assertions. (0 seconds)
ert has slightly slicker looking output for an interactive run. But for my purposes, the above is fine. But what happens when there's a failure? OK. Let's introduce one to see.
I remove the "g" in "beg" in the
let's run again. I get:
Running specs tagged "regexp-gdb":
F
1 problem in 1 specification using 1 assertions. (0 seconds)
Context: traceback location matching
Specification: basic location
So I am told that the problem lies in "basic location". So now what I
want to show is how easy it is for me to track down the problem.
Basically I just eval lines in the file. The
are optional as are the setting of globals:
text. So look down for
basic-location and when I eval
(loc-match text) I get
nil back. If this isn't enough and I want
to debug into
loc-match() I can do that. I'll have to copy
M-x edebug-defun it.
Running tests in batch
To run all the tests I have them strung together in a GNU Makefile. I
wrote a version of GNU Make that adds a
--tasks option which shows you
what the "interesting" targets are. (If you are familiar with Ruby's
rake, it has a similar flag). Let me run that to find the target.
M-x compile
remake --tasks
CTAGS
ChangeLog
...
check
check-am
check-recursive
check-short # Run all tests without bloated output
...
I'll run this via GNU Make with
make check-short, and I get output like
make check-short
make check 2>&1 | ruby make-check-filter.rb
Running specs tagged "bp":
.
0 problems in 1 specification using 1 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-pydbgr":
.
0 problems in 1 specification using 8 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-rdebug":
.
0 problems in 1 specification using 10 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-trepan":
.
0 problems in 1 specification using 12 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-trepanx":
.
0 problems in 1 specification using 11 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-zshdb":
.
0 problems in 1 specification using 3 assertions. (1 seconds)
Running specs tagged "dbgr-buf-bt":
...
0 problems in 3 specifications using 3 assertions. (0 seconds)
Running specs tagged "dbgr-buf-cmd":
The number of dots just gives a running count of the number of specifications. Most of the specifications have one assertion.
I think I'll stop here and take questions or comments.