Run-time type checking infrastructure for whole programs

libcrunch is a runtime library, with some toolchain extensions (compiler
wrapper, etc.), for fast dynamic type checking in unsafe languages.
Currently that means mostly C, although languages are fairly pluggable
in the design.

"Dynamic type checking" mostly means checking pointer casts. There is
limited checking of other things like va_arg and union use; more to add
in due course. More recently, libcrunch has acquired a bounds checker
which promises to be faster and more usable than SoftBound. I'm hoping
it will become the first really practical on-by-default bounds checker,
but time will tell... it is still fairly immature and unevaluated. There
is no temporal memory checking at present, but a GC is on the cards.

All this is built on top of my other project, liballocs, which you
should build (and probably understand) first. Overall, the build looks
something like this.

$ git clone https://github.com/stephenrkell/liballocs.git
$ cat liballocs/README
(and follow those instructions, then...)
$ export LIBALLOCS=`pwd`/liballocs
$ git clone https://github.com/stephenrkell/libcrunch.git
$ cd libcrunch
$ make -jn                    # for your favourite n
$ make -C test                # if this succeeds, be amazed
$ frontend/c/bin/crunchcc -o hello /path/to/hello.c  # your code here
$ LD_PRELOAD=`pwd`/lib/libcrunch_preload.so ./hello  # marvel!

Tips for non-Debian or non-jessie users:

- You must have Dave Anderson's (ex-SGI) libdwarf, not elfutils's
  (libdw1) version. The libdwarfpp build will, by default, look for its
  dwarf.h and libdwarf.h in /usr/include. If libdwarf's headers are not
  there (some distros put them in /usr/include/libdwarf instead), set
  LIBDWARFPP_CONFIGURE_FLAGS to
  "--with-libdwarf-includes=/path/to/includes" so that liballocs's
  contrib build process will configure libdwarfpp appropriately.

- Some problems have been reported with gcc 5.x and later. I can't
  reproduce them right now myself, but for now the recommended gcc is
  the 4.9 series. Bug reports for build errors occurring on 5.x and
  later are welcome.

- Be careful of build skew with libelf. Again, there are two versions:
  libelf0 and libelf1. It doesn't much matter which you use, but you
  should use the same at all times.

- On *BSD: you must first install g++, and build boost 1.55 from source
  using it. Add the relevant prefix to CFLAGS, CXXFLAGS and LDFLAGS.
  This is for library/symbol reasons not compiler reasons: mixing
  libstdc++ and libc++ in one process doesn't work, and libc++fileno
  doesn't work with libc++ at present (relevant feature request: a
  fileno() overload for ofstream/ifstream objects). Note that currently,
  the liballocs runtime doesn't build or run on the BSDs; however, the
  tools should do.

- Changes with cxxabi: again, build skew with these can be problematic,
  especially if you're relying on a system-supplied build of some C++
  library such as libboost* -- since it needn't be built using the same
  ABI that your currently-installed C++ compiler is using. If you get
  link errors with C++ symbol names, chances are you have a mismatch of
  ABI. This is another reason to use g++ 4.9.x for everything (including
  your own build of boost, as appropriate), since it predates the new
  cxxabi.


Liballocs models programs during execution in terms of /typed
allocations/. It reifies data types, providing fast access to
per-allocation metadata. 

Libcrunch extends this with check functions, thereby allowing assertions
such as 

assert(__is_aU(p, &__uniqtype_Widget));

to assert that p points to a Widget, and so on. 

A compiler wrapper inserts these checks automatically at particular
points -- primarily at pointer casts, but also at certain uses of va_arg
(and some other places too obscure to get into here). The effect is to
provide clean error messages on bad pointer casts. Language-wise,
libcrunch slightly narrows standard C, such that all live, allocated
storage has a well-defined type at any moment (cf. C99 "effective type",
which is more liberal).
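
To make the idea of typed allocations and checked casts concrete, here is a
toy model in plain C. It assumes nothing about liballocs's real data
structures: each allocation carries a pointer to a type descriptor in a small
header, an is-a check compares that descriptor, and a cast macro performs the
check before converting. All names here (toy_uniqtype, toy_malloc, toy_is_a,
CHECKED_CAST) are hypothetical stand-ins for __is_aU and the
__uniqtype_* objects; a real checker also has to cope with interior pointers,
arrays, and stack and static storage, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much-simplified type descriptor (NOT liballocs's uniqtype). */
struct toy_uniqtype {
    const char *name;
    size_t size;
};

struct Widget { int id; double weight; };
struct Gadget { char tag[16]; };

/* One descriptor per type, in the spirit of __uniqtype_Widget. */
static const struct toy_uniqtype toy_uniqtype_Widget = { "Widget", sizeof (struct Widget) };
static const struct toy_uniqtype toy_uniqtype_Gadget = { "Gadget", sizeof (struct Gadget) };

/* Header stored just before each object, recording its type. The union
   with max_align_t keeps the object that follows suitably aligned. */
union toy_header {
    const struct toy_uniqtype *type;
    max_align_t align;
};

/* Allocate one object of type t and remember t as its type. */
static void *toy_malloc(const struct toy_uniqtype *t)
{
    union toy_header *h = malloc(sizeof *h + t->size);
    if (h == NULL)
        return NULL;
    h->type = t;
    return h + 1;
}

static void toy_free(void *p)
{
    if (p != NULL)
        free((union toy_header *) p - 1);
}

/* Is p the start of an object of type t? Null passes, since casting a
   null pointer is harmless. Only valid for pointers returned by
   toy_malloc; a real checker must find the allocation containing an
   arbitrary pointer. */
static int toy_is_a(const void *p, const struct toy_uniqtype *t)
{
    if (p == NULL)
        return 1;
    return ((const union toy_header *) p - 1)->type == t;
}

/* Roughly what an instrumented cast might become: check, then convert.
   p is evaluated twice. A real runtime would print a clean error
   message rather than abort via assert. */
#define CHECKED_CAST(T, uniq, p) \
    (assert(toy_is_a((p), &(uniq))), (T *) (p))
```

In this toy, a source-level cast such as (struct Widget *) p would be
rewritten to CHECKED_CAST(struct Widget, toy_uniqtype_Widget, p).
libcrunch's actual instrumentation is done with CIL and differs in detail.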

Instrumentation is currently done with CIL. There is also a clang
front-end which is less mature (lacks a bounds checker) but will appear
as a submodule in this tree shortly.

Below there is a slightly out-of-date list of some of the known
limitations. More positively, it really works (if you can build it),
supports lots of nasty C code really quite well, and usually only slows
execution by about 5--35% (more if you turn on bounds checking, though).
You can also run instrumented code without the library loaded, when the
slowdown is usually negligible (again, less so for bounds checks).



Usability quirks 

- requires manual identification of alloc functions (or rather,
    liballocs does)

- check-on-cast is too eager for some C programming styles
    ("trap pointer" mechanism in the works)

- higher-order (indirect, pointer-to-function) checks are slightly
    conservative (i.e. a few false positives are possible in these cases)

- plain crunchcc assumes memory-correct execution and checks only
    types (use crunchxcc to get bounds checking too); in both cases
    temporal correctness is assumed, i.e. use-after-free can break us


Limitations of metadata

- no metadata (debug info) for actual parameters passed in varargs
     (need to maintain a shadow stack for this; am working on it)

- no metadata (debug info) for address-taken temporaries
     (significant for C++, but not for C; needs compiler fixes)
     
- sizeof scraping is not completely reliable (but is really not bad)
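
Sizeof scraping, as understood here, means inferring the data type of a heap
allocation from the sizeof expression in the size argument at its allocation
site, since malloc itself returns untyped storage. The allocation sites below
are hypothetical and range from easy cases to one with nothing to scrape; the
wrapper case also ties in with the need to identify alloc functions mentioned
above.

```c
#include <assert.h>
#include <stdlib.h>

struct Widget { int id; double weight; };

/* Site 1: the type appears directly in a sizeof at the malloc call,
   so the allocation's type (an array of Widget) is easy to infer. */
struct Widget *make_widgets(size_t n)
{
    return malloc(n * sizeof (struct Widget));
}

/* Site 2: sizeof is applied to an expression; the type is still
   recoverable from the expression's static type (struct Widget). */
struct Widget *make_one(void)
{
    struct Widget *w = malloc(sizeof *w);
    return w;
}

/* Site 3: a hard-coded byte count, with no sizeof anywhere to scrape. */
void *make_buffer(void)
{
    return malloc(64);
}

/* Site 4: the sizeof sits at the caller of a wrapper, not at the malloc
   call. Inferring a type here presumably depends on my_alloc being
   recognised as an allocation function. */
void *my_alloc(size_t nbytes)
{
    return malloc(nbytes);
}

struct Widget *make_via_wrapper(void)
{
    return my_alloc(sizeof (struct Widget));
}
```

One would expect sites 1 and 2 to be typed reliably and site 3 to be left
without a precise type; these are expectations about the technique, not
measured libcrunch behaviour.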


Current performance quirks

- liballocs's trailer-based heap index implementation causes clownshoes