Tools and runtime for dynamic checking (mostly of C code, for now)
OCaml C Makefile C++ Python Shell Other
Switch branches/tags
Nothing to show
Clone or download

README

libcrunch is a system for fast dynamic type and bounds checking in
unsafe languages -- currently C, although languages are fairly pluggable
in the design.

It is somewhat inaccurately named, in that it is nowadays both a runtime
library and some toolchain extensions (compiler wrapper, linker plugin,
auxiliary tools).

"Dynamic type checking" mostly means checking pointer casts. There is
limited checking of other things like va_arg and union use; more to add
in due course.

Bounds checking means probably what you think it means. The innovation
of libcrunch is to do fine-grained bounds checking (sensitive to
subobjects, such as arrays-in-structs), over all allocators (static,
stack, heap and custom), with very few false positives. The key to doing
this is run-time type information and a run-time model of allocators.
Currently, bounds checking performs about the same as ASan, but does
finer-grained checking. It's also comparable to SoftBound, but doesn't
suffer the kind of false positives that fat-pointer systems do when they
lose track of bounds.

I have some plans for temporal checking too, including a garbage
collector (which masks errors) and a mostly-timely checker (which
catches errors), but nothing concrete yet.

The medium-term goal is a proof-of-concept implementation of C that is
dynamically safe... and runs most source code unmodified, is
binary-compatible even with uninstrumented code (albeit sacrificing
safety guarantees), and performs usably well (hopefully no worse than
half native speed, usually better).

To get good performance, I have some plans for exploiting hardware
assistance (various kinds of tagged memory that are springing up) and
also speculative/dynamic optimisations. Again, nothing concrete yet (but
feel free to ask).

All this is built on top of my other project, liballocs, which you
should build (and probably understand) first. In a nutshell, liballocs
provides the type information and other dynamic run-time services; its
goal is "Smalltalk-style dynamism for Unix processes".

Building is non-trivial... but you can do it! Overall, the build looks
something like this.

$ git clone https://github.com/stephenrkell/liballocs.git
$ cat liballocs/README
(and follow those instructions, then...)
$ export LIBALLOCS=`pwd`/liballocs
$ git clone https://github.com/stephenrkell/libcrunch.git
$ cd libcrunch
$ make -jn                    # for your favourite n
$ make -C test                # if this succeeds, be amazed
$ frontend/c/bin/crunchcc -o hello /path/to/hello.c  # your code here
$ LD_PRELOAD=`pwd`/lib/libcrunch_preload.so ./hello  # marvel!

Tips for non-Debian or non-jessie users:

- You must have Dave Anderson's (ex-SGI) libdwarf, not elfutils's
  (libdw1) version. The libdwarfpp build will, by default, look for its
  dwarf.h and libdwarf.h in /usr/include.  If this libdwarf's headers
  are not in /usr/include (some distros put them in
  /usr/include/libdwarf instead), set LIBDWARFPP_CONFIGURE_FLAGS to
  "--with-libdwarf-includes=/path/to/includes" so that liballocs's
  contrib build process will configure libdwarfpp appropriately.

- Some problems have been reported with gcc 5.x and later. See gcc bug
  78407. For now the recommended gcc is the 4.9 series, although 7.2.x
  fixes that bug and seems to work. Bug reports for build errors
  occurring on other versions are welcome.

- Be careful of build skew with libelf. Again, there are two versions:
  libelf0 and libelf1. It doesn't much matter which you use, but you
  should use the same at all times.

- On *BSD: you must first install g++, and build boost 1.55 from source
  using it. Add the relevant prefix to CFLAGS, CXXFLAGS and LDFLAGS.
  This is for library/symbol reasons not compiler reasons: mixing
  libstdc++ and libc++ in one process doesn't work, and libc++fileno
  doesn't work with libc++ at present (relevant feature request: a
  fileno() overload for ofstream/ifstream objects). Note that currently,
  the liballocs runtime doesn't build or run on the BSDs; however, the
  tools should do.

- Changes with cxxabi: again, build skew with these can be problematic,
  especially if you're relying on a system-supplied build of some C++
  library such as libboost* -- since it needn't be built using the same
  ABI that your currently-installed C++ compiler is using. If you get
  link errors with C++ symbol names, chances are you have a mismatch of
  ABI. This is another reason to use g++ 4.9.x for everything (including
  your own build of boost, as appropriate), since it predates the new
  cxxabi.


Liballocs models programs during execution in terms of /typed
allocations/. It reifies data types, providing fast access to
per-allocation metadata. 

Libcrunch extends this with check functions, thereby allowing assertions
such as 

assert(__is_aU(p, &__uniqtype_Widget));

to assert that p points to a Widget, and so on.

For bounds errors, libcrunch instruments /pointer derivation/. This
includes array indexing and pointer arithmetic, but not pointer
dereference which can safely proceed unchecked. Bad pointer uses are
caught and reported in a segfault handler.

A compiler wrapper inserts these checks (and some others) automatically
at particular points. The effect is to provide clean error messages on
bad pointer casts, bad pointer uses and other operations that would
otherwise be corrupting failure (undefined behaviour, in C).
Language-wise, libcrunch slightly narrows standard C, such that all
live, allocated storage has a well-defined type at any moment (cf. C99
"effective type" which is more liberal). This can be a source of false
positives in the quirkiest code; there are some mitigations.

Instrumentation is currently done with CIL. There is also a clang
front-end which is less mature (lacks a bounds checker) and currently
rather out-of-date, but will be revived at some point.

Type-checking usually only slows execution by about 5--35%. You can also
run type-check-instrumented code without the library loaded; in that
case the slowdown is usually minimal (a few percent at most).



Usability quirks 

- requires manual identification of alloc functions (or rather,
liballocs does)

- check-on-cast is too eager for some C programming styles
    ("trap pointer" mechanism for casts in the works; bounds checks
    already work this way)
    
- higher-order (indirect, pointer-to-function) checks are slightly 
    conservative
    (i.e. a few false positives are possible in these cases)

- plain crunchcc assumes memory-correct execution and checks only
    types (use crunchxcc for bounds checking too; 
    temporal correctness is assumed, i.e. use-after-free can break us)


Limitations of metadata

- no metadata (debug info) for actual parameters passed in varargs
     (need to maintain a shadow stack for this; am working on it)

- no metadata (debug info) for address-taken temporaries
     (significant for C++, but not for C; needs compiler fixes)
     
- sizeof scraping is not completely reliable (but is pretty good!)