NYC Lisp talk

R. Bernstein edited this page Feb 27, 2015 · 25 revisions
##Introduction##

Hi --

Thanks for coming, and thanks to Heow for inviting me and Brian for offering this wonderful space.

I generally don't give talks, but when I do, it's on something important to me. This talk is a very personal take on a way of developing software in Emacs Lisp that has made large programs tractable for me.

The name "Large-scale Software Development" may sound ominous. Don't let it put you off. It isn't all that difficult. What I hope to show is that it's really about writing little pieces and having a Lego-like way to build them up (or "scale" them) into something large.

Feel free to stop me along the way to offer differing views, tools or thoughts. I wasn't planning on discussing much of the internals, or even the user interface, of the large program I am using as an example. But at the end I'm happy to delve into that for those interested.

Five or so years back, I was exposed to a framework called "Ruby on Rails". A framework is a little bit like a religion: it makes some things easy to do if you go along with it. (And conversely, if you don't, you can take a lot of flak for not doing so; ask Galileo.)

One of the tenets of Ruby on Rails is a rigorous approach to testing. I will go over how I do this in Emacs Lisp. Also important to me is building large programs from smaller modular ones.

The overall plan is to show a little demo of a large piece of software in Emacs Lisp, and then to go a little into the development process I use.

So first let me show the code in action as it was a year and a half ago.

Show DebConf 2010 Conference Lightning Talk.

Now that you have some sense of what I'm working on, let me totally switch to the other end of my spectrum: the development environment I work in.

##Requirements##

Here are the things that are important to me.

  1. Of course, I use GNU Emacs.
  2. I need to be able to break this program into small chunks or modules.
     • Implication: there may be many files.
  3. I need to be able to run and debug each module in isolation.
     • Implication: each module needs to have enough information to pull in whatever other modules it needs.
  4. I want to shorten the development loop. Implications:
     • This means reducing the amount of "compilation" time.
     • It means I don't want to have to "install" code to try modules I am interested in.
     • I may have an "installed" version and a "development" version, and I want to be able to run out of the "development" code with little overhead.
  5. I need to be able to test each module in isolation. Test modules are still modules, so 1 through 4 above apply to them too. Implications:
     • Because there are many files, there are many tests.
     • Tests need to be able to be run interactively and also as a suite in batch.

##The Linking Problem##

###Internal versus External###

A (large) Emacs Lisp program uses modules from many places. Some modules reside inside the project and some outside. Furthermore, there is a certain fluidity here: one may start with a module inside the project and later decide to make it an independent project.

Example: in the debugger front-end project, the linking method or the testing method described below might start out as an internal module.

It so happens that something similar was recognized a long time ago in C, where I think they got it right. Consider the difference between

#include "stdio.h"

which can also be written as

#include "./stdio.h"

or perhaps

#include "/tmp/stdio.h"

versus:

#include <stdio.h>

which could legally be written as

#include </usr/include/stdio.h>

but in practice is never written that way; it would be frowned on if it were.

The first #include tells the compiler to look for the file directly in the filesystem (relative to the directory of the including file if the path isn't absolute), while the second says to search an "include" path. Semantically, the first is used for referring to headers within a project, while the second is for headers outside the project. What's true for headers in C is also true for modules in Ruby 1.9 and later, which has require and require_relative: require_relative works like the quoted #include, and require like the angle-bracket one.

###`load-path` is Evil###

In Emacs Lisp, the counterpart of the include path in C or $LOAD_PATH in Ruby is called, well, load-path. (Ruby does borrow from Lisp.) But I think load-path is evil. Perhaps a necessary evil, but still evil. Go into Emacs and look at load-path. Here's mine:

  ("/home/rocky/.emacs.d/elpa/epresent-0.1/" "/home/rocky/.emacs.d/elpa/haml-mode-3.0.14/" "/home/rocky/.emacs.d/elpa/rvm-1.1/" "/home/rocky/.emacs.d/elpa/test-case-mode-0.1/" "/home/rocky/.emacs.d/elpa/fringe-helper-0.1.1/" "/home/rocky/.emacs.d/elpa/yaml-mode-0.0.5/" "/usr/share/emacs/site-lisp/ruby1.8-elisp" "/usr/share/emacs/site-lisp/tuareg" "/home/rocky/elisp" "/usr/share/emacs-snapshot/site-lisp/remake" "/usr/share/emacs/site-lisp/mgp/" "/usr/share/emacs/23.1.50/site-lisp/auctex" "/usr/share/emacs-snapshot/site-lisp/anthy" "/usr/share/emacs-snapshot/site-lisp/mailcrypt" "/usr/share/emacs/site-lisp/autoconf" "/usr/share/emacs-snapshot/site-lisp/auctex" "/usr/share/emacs/site-lisp/auctex" "/usr/share/emacs/site-lisp/mailcrypt" "/usr/share/emacs-snapshot/site-lisp/ocaml-mode" "/etc/emacs-snapshot" "/etc/emacs" "/usr/local/share/emacs/23.1.50/site-lisp" "/usr/local/share/emacs/site-lisp" "/usr/local/share/emacs/site-lisp/dbgr" "/usr/local/share/emacs/site-lisp/realgud/common" "/usr/local/share/emacs/site-lisp/realgud/debugger" "/usr/local/share/emacs/site-lisp/realgud/lang" "/usr/local/share/emacs/site-lisp/realgud/common/buffer" "/usr/local/share/emacs/site-lisp/realgud/common/init" "/usr/local/share/emacs/site-lisp/realgud/debugger/bashdb" "/usr/local/share/emacs/site-lisp/realgud/debugger/gdb" "/usr/local/share/emacs/site-lisp/realgud/debugger/kshdb" "/usr/local/share/emacs/site-lisp/realgud/debugger/perldb" "/usr/local/share/emacs/site-lisp/realgud/debugger/pydbgr" "/usr/local/share/emacs/site-lisp/realgud/debugger/rdebug" "/usr/local/share/emacs/site-lisp/realgud/debugger/remake" "/usr/local/share/emacs/site-lisp/realgud/debugger/trepan" "/usr/local/share/emacs/site-lisp/realgud/debugger/trepan.pl" "/usr/local/share/emacs/site-lisp/realgud/debugger/trepan8" "/usr/local/share/emacs/site-lisp/realgud/debugger/trepanpl" "/usr/local/share/emacs/site-lisp/realgud/debugger/trepanx" "/usr/local/share/emacs/site-lisp/realgud/debugger/zshdb" 
"/usr/share/emacs/23.1.50/site-lisp" "/usr/share/emacs/23.1.50/site-lisp/anthy" "/usr/share/emacs/23.1.50/site-lisp/cmake-data" "/usr/share/emacs/23.1.50/site-lisp/global" "/usr/share/emacs/23.1.50/site-lisp/mailcrypt" "/usr/share/emacs/23.1.50/site-lisp/ocaml-mode" "/usr/share/emacs/23.1.50/site-lisp/remake" "/usr/share/emacs/site-lisp" "/usr/share/emacs/23.1.50/lisp" "/usr/share/emacs/23.1.50/lisp/url" "/usr/share/emacs/23.1.50/lisp/textmodes" "/usr/share/emacs/23.1.50/lisp/progmodes" "/usr/share/emacs/23.1.50/lisp/play" "/usr/share/emacs/23.1.50/lisp/org" "/usr/share/emacs/23.1.50/lisp/nxml" "/usr/share/emacs/23.1.50/lisp/net" "/usr/share/emacs/23.1.50/lisp/mh-e" "/usr/share/emacs/23.1.50/lisp/mail" "/usr/share/emacs/23.1.50/lisp/language" "/usr/share/emacs/23.1.50/lisp/international" "/usr/share/emacs/23.1.50/lisp/gnus" "/usr/share/emacs/23.1.50/lisp/eshell" "/usr/share/emacs/23.1.50/lisp/erc" "/usr/share/emacs/23.1.50/lisp/emulation" "/usr/share/emacs/23.1.50/lisp/emacs-lisp" "/usr/share/emacs/23.1.50/lisp/calendar" "/usr/share/emacs/23.1.50/lisp/calc" "/usr/share/emacs/23.1.50/lisp/obsolete" "/usr/share/emacs/23.1.50/leim")

This is an incomprehensible jumble, most of which I don't have a clue about; if someone pressed me on whether it represents what I want, I'd have to answer based on empirical use.

Given its complexity, load-path is fragile and insecure. I have to hope that namespaces are distinct. [See match-possible-signs inside calc-arith.el] An Emacs exploit might be to inject a directory into load-path; if it is put far enough down the list, you might not even see it when you evaluate load-path, because by default Emacs abbreviates long lists when printing the result in the echo area (see eval-expression-print-length).
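One way to check whether a library name is ambiguous is to ask which directories on load-path hold a file by that name. The sketch below uses a hypothetical helper, `my-library-candidates` (not part of Emacs), and only looks at plain `.elc` and `.el` files. Emacs itself also provides the command `M-x list-load-path-shadows`, which reports such shadowing across all of load-path.

```elisp
;; Hypothetical helper, not part of Emacs: list every file on PATH
;; (default `load-path') that `load' could pick for library NAME.
(defun my-library-candidates (name &optional path)
  "Return the NAME.elc and NAME.el files found in the directories of PATH.
PATH defaults to `load-path'.  Results keep directory order, so the first
element is roughly what a plain (load NAME) would choose (ignoring
`load-prefer-newer', compressed files and modules)."
  (let (found)
    (dolist (dir (or path load-path))
      ;; Within one directory, `load' tries .elc before .el.
      (dolist (suffix '(".elc" ".el"))
        (let ((file (expand-file-name (concat name suffix) dir)))
          (when (file-readable-p file)
            (push file found)))))
    (nreverse found)))
```

Evaluating `(my-library-candidates "calc-arith")` returns more than one file only if that name is shadowed. To see all of load-path without abbreviation, `(dolist (dir load-path) (princ (format "%s\n" dir)))` prints one directory per line.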

Also judge this in light of my requirement to run a development version while an installed version may be around. load-path is consulted by load, load-library and require whenever they are given a relative name. However, load can also be given an absolute file name, in which case load-path is not searched, and that's what I want.

###Emacs `load-relative`###

I wrote emacs-load-relative to do the kind of robust internal linking I want. It uses the Emacs primitive load with a specific file name, rather than load-library, which searches for the file along load-path. Let me give a few examples from the Emacs debugger front-end project, emacs-dbgr.

In its simplest form one can write:

(require 'load-relative) ; pull in (load-relative)
(load-relative "my-module")

which simply loads my-module, assumed to be located in the same directory as the file containing this command. If you evaluate this inside an Emacs buffer, that file is the one associated with the buffer. You can also pass an optional symbol:

(load-relative "my-module" 'dbgr)

This says that if "my-module" can't be found relative to the current file, but the file that did (provide 'dbgr) can be found, the directory of that file is used as the starting point for the relative search.
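The mechanism can be approximated with a few primitives. What follows is an illustrative reimplementation, not the code of the load-relative package, and `my-load-relative` is a hypothetical name. It resolves FILE against the directory of the file currently being loaded (`load-file-name`) or visited (`buffer-file-name`), falls back to the directory of the file that provided FEATURE (looked up with `feature-file` from loadhist), and then calls load with an absolute name so that load-path is never consulted.

```elisp
(require 'loadhist) ; provides `feature-file'

;; Hypothetical sketch for illustration only; the real load-relative
;; package handles more situations than this.
(defun my-load-relative (file &optional feature)
  "Load FILE relative to the current source file, never via `load-path'.
Search the directory of the file being loaded or visited first; if FILE
is not there and FEATURE is loaded, search the directory of the file
that provided FEATURE."
  (let* ((here (or load-file-name buffer-file-name))
         (anchor (and feature (featurep feature) (feature-file feature)))
         (dirs (delq nil (list (and here (file-name-directory here))
                               (and anchor (file-name-directory anchor)))))
         ;; Try FILE with each load suffix (.elc, .el, ...), then as given.
         (found (locate-file file dirs (append (get-load-suffixes) '("")))))
    (unless found
      (error "Cannot find %s relative to %S" file dirs))
    ;; FOUND is absolute and already carries its suffix, so ask `load'
    ;; not to add suffixes; absolute names bypass `load-path' anyway.
    (load found nil t t)))
```

With this sketch, `(my-load-relative "my-module")` inside a file loads the sibling my-module.elc or my-module.el, whatever load-path happens to contain.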

Another slight variation is require-relative, which uses require underneath instead of load. Finally, here is an example of the form actually used most in my project:

Inside realgud/debugger/trepan.pl/init.el:

    (require-relative-list '("../../common/regexp"
			    "../../common/loc"
			    "../../common/init")
			    "dbgr-")

Note that the locations span different directories. So, for example, "../../common/regexp" should find realgud/common/regexp.el, or its compiled version, which ends in .elc.
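The list form can be sketched the same way. This is again a hypothetical illustration (`my-require-relative-list` is not the package's function), and it rests on an assumption about the example above: that the second argument is a prefix prepended to each file's base name to form the feature to require, so "../../common/regexp" with prefix "dbgr-" would require the feature dbgr-regexp.

```elisp
;; Hypothetical sketch, assuming feature name = PREFIX + file base name.
(defun my-require-relative-list (files &optional prefix)
  "Require each of FILES, resolving relative names against this file.
The feature required for \"dir/name\" is PREFIX followed by \"name\"."
  (let* ((here (or load-file-name buffer-file-name
                   (error "No current file to resolve against")))
         (base (file-name-directory here)))
    (dolist (file files)
      (let ((feature (intern (concat (or prefix "")
                                     (file-name-nondirectory file)))))
        ;; `require' loads the absolute file only if FEATURE is not yet
        ;; provided, and signals an error if the file fails to provide it.
        (require feature (expand-file-name file base))))))
```

The path arithmetic is plain `expand-file-name`: `(expand-file-name "../../common/regexp" "/src/realgud/debugger/trepan.pl/")` returns "/src/realgud/common/regexp".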

##Testing##

There are current schools of programming that suggest writing tests or behaviors before writing code. These are known as TDD (Test-Driven Development) and BDD (Behavior-Driven Development).

I don't strictly follow this, but I do believe testing is very important. A number of basic test frameworks are modeled on JUnit, a framework for Java.

I looked around for test frameworks for Emacs. http://www.emacswiki.org/emacs/UnitTesting has a list of them.

The first one I used is called elk-test. Then I found necromancy's test-unit; he apparently no longer works on it, and he pointed me to ert, which is now developed as part of Emacs.

Coming back to the requirements, I need to be able to test each module either interactively or in batch. When I started working on this, ert was a little deficient at running tests in batch mode. I see now that this has been fixed, so I will eventually redo my tests.
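For a rough idea of what such a redo could look like, here is a test in the spirit of the gdb regexp test discussed below, written with ert. The regexp `my-gdb-loc-regexp` and the function `my-gdb-loc-match` are simplified stand-ins invented for illustration, not realgud's actual patterns; they only recognize the "at FILE:LINE" tail that gdb prints in frame lines such as "#0  main () at hello.c:5".

```elisp
(require 'ert)

;; Simplified stand-in for a debugger location regexp (an assumption,
;; not realgud's pattern): group 1 is the file, group 2 the line.
(defconst my-gdb-loc-regexp "\\bat \\([^ \t\n:]+\\):\\([0-9]+\\)"
  "Match the \"at FILE:LINE\" part of a gdb frame line.")

(defun my-gdb-loc-match (text)
  "Return (FILE . LINE) parsed from TEXT, or nil if there is no location."
  (when (string-match my-gdb-loc-regexp text)
    (cons (match-string 1 text)
          (string-to-number (match-string 2 text)))))

(ert-deftest my-gdb-basic-location ()
  "A gdb frame line yields its file and line."
  (should (equal (my-gdb-loc-match "#0  main () at hello.c:5")
                 '("hello.c" . 5))))

(ert-deftest my-gdb-no-location ()
  "Text without a location yields nil."
  (should-not (my-gdb-loc-match "Starting program: /tmp/hello")))
```

Interactively, `M-x ert` runs the tests; in batch, `emacs -batch -l FILE -f ert-run-tests-batch-and-exit` runs them and sets the exit status.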

###Emacs `test-unit`###

Note: this module has been superseded by test-simple.

However, for this talk let me stick with what I know best and show how testing works with my modifications to necromancy's test-unit.

I currently have 39 test files for this one project, which falls well short of my goal, given that there are about 90 Emacs Lisp files. Still, I have more automated test files than the entire Emacs 24 project has in its ert-based "automated" tests.

But now let's dive into a test. The emacs-dbgr project supports quite a few debuggers and programming languages. To do that, we need to extract position information from the output produced by debuggers and programming languages. There are tables of regular expressions in support of this. And recall jwz's dictum:

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

In my situation, it is hard for me to imagine using anything but some sort of regular expressions. And unit tests make using regular expressions manageable. I'll use the test program that tests the gdb regular expressions. (I use gdb since it is probably the most familiar of the debuggers I support.)

At the top of the file I have this:

    (require 'test-unit)
    (load-file "../realgud/common/buffer/command.el")
    (load-file "../realgud/debugger/gdb/init.el")

The first line, with require, pulls in my test code. Notice that the next two lines use load-file. Also notice that I specify the source code file rather than giving Emacs a choice of whether to use the compiled version. Here, I always want to use the source code.

I could use the load-relative I mentioned previously. Instead, I am using the built-in load-file, which resolves a relative name against the current directory, so the test has to be run from the test directory. For testing, that's an okay limitation. For general development, I want to do better.
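If one wanted to keep load-file's always-use-the-source behavior but drop the run-from-the-test-directory restriction, a small helper could resolve names against the test file's own location. `my-test-load-source` is a hypothetical name, not part of test-unit or Emacs; the sketch relies on load-file-name, which Emacs binds while a file is being loaded, and on load's NOSUFFIX argument, which makes it load exactly the .el file named instead of preferring a compiled .elc.

```elisp
;; Hypothetical helper for test files (not part of test-unit or Emacs).
(defun my-test-load-source (relative-file)
  "Load source file RELATIVE-FILE, resolved against the calling test file.
Unlike (load-file RELATIVE-FILE), the result does not depend on
`default-directory', so the test can be run from any directory."
  (let* ((here (or load-file-name buffer-file-name
                   (error "No current file to resolve %s against"
                          relative-file)))
         (file (expand-file-name relative-file (file-name-directory here))))
    ;; NOSUFFIX = t: load exactly FILE (the .el source), never a .elc.
    (load file nil t t)))
```

A test file would then start with, say, `(my-test-load-source "../realgud/debugger/gdb/init.el")`.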

Next comes:

   (test-unit-clear-contexts)

which clears any prior tests. Next comes some initialization, and then the tests, grouped in a context block which here is tagged "regexp-gdb".

Finally at the end of the file I run the tests with:

   (test-unit "regexp-gdb")

OK. So, as I said, I can run this inside Emacs with M-x eval-current-buffer. When I run it, I get:

Running specs tagged "regexp-gdb":
....
0 problems in 4 specifications using 18 assertions. (0 seconds)

ert has slightly slicker-looking output for an interactive run, but for my purposes the above is fine. But what happens when there's a failure? Let's introduce one and see.

Let me remove the "g" in "beg" in ../realgud/debugger/gdb/init.el and run again. I get:

Running specs tagged "regexp-gdb":
F
1 problem in 1 specification using 1 assertions. (0 seconds)
Context: traceback location matching
Specification: basic location

So I am told that the problem lies in "basic location". What I want to show now is how easy it is to smoke out the problem. Basically, I just evaluate lines in the file. The load-file statements are optional, as are the settings of the globals: dbg-name, loc-pat realgud, and text. So I look down for "basic location", and when I evaluate (loc-match text) I get nil back. If this isn't enough and I want to debug into loc-match, I can do that too: I copy the definition of loc-match into a buffer and instrument it with M-x edebug-defun.
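To make the clear / context / run-by-tag structure of the test file concrete, here is a toy registry in the same spirit. It is emphatically not test-unit's implementation or API: `my-clear-contexts`, `my-add-spec` and `my-run-tagged` are hypothetical names, and each specification is simply a function that returns non-nil on success. Like the output above, it prints one character per specification and a summary line.

```elisp
;; Toy, hypothetical test registry: NOT test-unit's real API.
(defvar my-contexts nil
  "Alist of (TAG . SPECS); each SPEC is a cons (NAME . FUNCTION).")

(defun my-clear-contexts ()
  "Forget every registered specification."
  (setq my-contexts nil))

(defun my-add-spec (tag name fn)
  "Register FN as specification NAME in the context tagged TAG.
FN takes no arguments and returns non-nil when the check passes."
  (let ((context (assoc tag my-contexts)))
    (if context
        (setcdr context (append (cdr context) (list (cons name fn))))
      (push (list tag (cons name fn)) my-contexts))))

(defun my-run-tagged (tag)
  "Run the specifications tagged TAG and return the names that failed.
Print \".\" for each pass and \"F\" for each failure, then a summary."
  (let ((specs (cdr (assoc tag my-contexts)))
        (failures nil))
    (princ (format "Running specs tagged %S:\n" tag))
    (dolist (spec specs)
      ;; A spec fails if it returns nil or signals an error.
      (if (condition-case nil (funcall (cdr spec)) (error nil))
          (princ ".")
        (princ "F")
        (push (car spec) failures)))
    (princ (format "\n%d problems in %d specifications\n"
                   (length failures) (length specs)))
    (nreverse failures)))
```

Because each specification is an ordinary function, any one of them can still be evaluated by hand while hunting a failure, which is the property the walk-through above depends on.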

###Running Tests in Batch###

To run all the tests, I have them strung together in a GNU Makefile. I wrote a version of GNU Make, remake, that adds a --tasks option showing the "interesting" targets. (If you are familiar with Ruby's "rake", it has a similar flag.) Let me run that to find the target.

M-x `compile`
remake --tasks

CTAGS
ChangeLog
...
check
check-am
check-recursive
check-short	# Run all tests without bloated output
...

I'll run this via GNU Make with make check-short, and I get output like this:

make check-short
make check 2>&1  | ruby make-check-filter.rb
Running specs tagged "bp":
.
0 problems in 1 specification using 1 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-pydbgr":
.
0 problems in 1 specification using 8 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-rdebug":
.
0 problems in 1 specification using 10 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-trepan":
.
0 problems in 1 specification using 12 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-trepanx":
.
0 problems in 1 specification using 11 assertions. (0 seconds)
Running specs tagged "dbgr-buf-bt-zshdb":
.
0 problems in 1 specification using 3 assertions. (1 seconds)
Running specs tagged "dbgr-buf-bt":
...
0 problems in 3 specifications using 3 assertions. (0 seconds)
Running specs tagged "dbgr-buf-cmd":

The number of dots just gives a running count of the number of specifications.
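Here the Makefile drives the batch run. As a rough, hypothetical alternative written in Emacs Lisp itself, one could load every test file in a directory, record which ones signal an error, and turn that into an exit status. `my-run-test-files` and the test-*.el naming convention are assumptions for illustration. Note that it only notices errors signalled while loading, so a framework that reports failures without signalling would need its own check.

```elisp
;; Hypothetical batch runner; the test-*.el naming is an assumption.
(defun my-run-test-files (dir)
  "Load each test-*.el file in DIR and return the names of files that failed.
A file counts as failed if loading it signals an error."
  (let (failed)
    (dolist (file (directory-files dir t "\\`test-.*\\.el\\'"))
      (condition-case err
          ;; Bind `default-directory' so tests that use relative
          ;; `load-file' names behave as if run from DIR.
          (let ((default-directory (file-name-as-directory dir)))
            (load file nil t t))
        (error
         (message "FAILED %s: %s" (file-name-nondirectory file)
                  (error-message-string err))
         (push (file-name-nondirectory file) failed))))
    (nreverse failed)))
```

From a Makefile or shell, something like `emacs --batch -l run-tests.el --eval '(kill-emacs (if (my-run-test-files "test") 1 0))'` would make the exit status reflect the result, where run-tests.el is a hypothetical file holding the function above.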

I think I'll stop here and take questions or comments.