Said test takes over 90 seconds on my Core 2 Duo, which is too long even for the pummel tests.
Use static buffers. Most clock ticks were spent in malloc() and free() instead of read() and write().

Fix measurements. Really fast runs would produce bogus results like:

  Wrote 1048576000 bytes in -0.731630s using 8192 byte buffers: -1366.811093mB/s
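For illustration only, here is a minimal sketch of the two ideas above. It is not the code from this change: the names `BUF_SIZE`, `buf`, `elapsed_seconds` and `throughput_mb_per_s` are hypothetical, and the cause of the negative timing is not stated in this message. The sketch assumes the usual culprit, namely that the seconds and sub-second fields of the two timestamps were not combined correctly.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical buffer size; 8192 is taken from the sample output above. */
#define BUF_SIZE 8192

/* Static buffer: allocated once for the whole run, so the hot loop does
 * no malloc()/free() per read() or write(). */
static char buf[BUF_SIZE];

/* Elapsed seconds between two readings. Both fields are subtracted and
 * then combined, so a tv_nsec that wrapped past a second boundary cannot
 * produce a negative interval on its own. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput in MB/s, with 1 MB = 1048576 bytes. Returns 0 when the
 * interval is too short (or invalid) to give a meaningful rate. */
static double throughput_mb_per_s(double bytes, double seconds)
{
    assert(bytes >= 0.0);
    if (seconds <= 0.0)
        return 0.0;
    return bytes / (1024.0 * 1024.0) / seconds;
}
```

In a real run the two readings would come from something like clock_gettime(CLOCK_MONOTONIC, ...) taken before and after the I/O loop, with `buf` passed to read() and write() on every iteration. A monotonic clock is suggested here because a wall clock can be stepped backwards during the run.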