* Use the new TSC functions to poll for new read data in a pipe for a short period of time on an SMP box. This greatly increases the odds that a pipe writer on one CPU can pipeline data to a reader on another CPU without having to use an IPI or tsleep/wakeup. For the pipe1 test this brings the synchronous communications path over a pipe (Awrite, Bread, Bwrite, Aread) down from 7uS to around 2uS. For the pipe2 test, polling greatly reduces the number of IPIs and improves bandwidth by a few hundred megabytes/sec. However, the old DELAY already did the same thing, so there is no net change for the pipe2 test.
* Add sysctl kern.pipe.delay, which defaults to 5000 nanoseconds (5uS). This is the maximum time a pipe reader will wait for additional data before falling back to tsleep/wakeup (and the related IPIs). pipe_delay may be set to 0 to disable the feature. A value of at least 3000 is recommended. Pipelining large buffers efficiently requires a higher value, up to around 8000.
* Allow kern.pipe.mpsafe to be set to 2, which adds a predictive wakeup when a writer is found to be stalled. This currently has no significant effect on operations due to token collisions.
* Add statistics kern.pipe.wblocked and kern.pipe.rblocked, counting the number of times a pipe blocks in "pipewr" or "piperd", respectively.
* Fix MP races in pipe_ioctl().
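The read-side polling described above amounts to a bounded spin followed by a fallback to blocking. The following is a minimal user-space sketch of that idea and not the DragonFly pipe code. It makes these assumptions:

* CLOCK_MONOTONIC stands in for the TSC.
* An atomic counter `avail` stands in for the pipe buffer's fill level.
* The names `sketch_pipe_delay`, `sketch_rblocked` and `sketch_poll_for_data()` are hypothetical. They only mirror the roles of kern.pipe.delay, kern.pipe.rblocked and the reader's wait path.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the kernel tunable and statistic. */
static long sketch_pipe_delay = 5000;   /* plays the role of kern.pipe.delay, in ns */
static atomic_ulong sketch_rblocked;    /* plays the role of kern.pipe.rblocked */

/* Monotonic nanosecond clock, used here in place of the TSC. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/*
 * Spin for up to sketch_pipe_delay ns waiting for *avail to become
 * non-zero. Returns 1 if data showed up, 0 if the caller must fall
 * back to a blocking sleep (tsleep in the kernel). A delay of 0
 * disables the spin entirely.
 */
static int sketch_poll_for_data(atomic_int *avail)
{
    if (atomic_load(avail) > 0)
        return 1;
    if (sketch_pipe_delay > 0) {
        int64_t target = now_ns() + sketch_pipe_delay;
        while (now_ns() < target) {
            if (atomic_load(avail) > 0)
                return 1;
        }
    }
    /* Polling window expired (or disabled): count a blocking read. */
    atomic_fetch_add(&sketch_rblocked, 1);
    return 0;
}
```

A caller that gets 0 back would go to sleep and rely on the writer's wakeup. A caller that gets 1 can copy the data immediately, so neither side pays for an IPI or a context switch.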
Add int64_t target = tsc_get_target(ns) and tsc_test_target(target). See the routines themselves for details. These functions are available when the system supports an extremely fine-grained counter such as a TSC, and may be used to generate finely tuned delays.
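The target/test pairing can be illustrated with a small sketch. The following are assumptions made for this example, not a description of the kernel routines:

* The names `sketch_counter()`, `sketch_get_target()`, `sketch_test_target()` and `sketch_delay()` are hypothetical.
* CLOCK_MONOTONIC stands in for the TSC.
* The return convention is 1 while the deadline is still ahead and 0 once it has passed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical user-space analogue of a fine-grained counter. A kernel
 * version would read the TSC and scale ns by the measured TSC frequency;
 * this counter already ticks in nanoseconds, so no scaling is needed.
 */
static int64_t sketch_counter(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Counter value that will be reached ns nanoseconds from now. */
static int64_t sketch_get_target(int ns)
{
    return sketch_counter() + ns;
}

/*
 * Return 1 while the target is still in the future, 0 once reached.
 * The difference is computed in unsigned arithmetic and then read as
 * signed, the usual wrap-tolerant idiom for free-running counters.
 */
static int sketch_test_target(int64_t target)
{
    int64_t diff = (int64_t)((uint64_t)target - (uint64_t)sketch_counter());
    return diff > 0;
}

/* Busy-wait for at least ns nanoseconds. */
static void sketch_delay(int ns)
{
    int64_t target = sketch_get_target(ns);
    while (sketch_test_target(target))
        ;   /* spin until the deadline passes */
}
```

Splitting the API into "compute a deadline once" and "cheaply test it in a loop" lets a caller interleave other checks, such as looking for new pipe data, with the timeout test.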
* Make pipe_read and pipe_write MPSAFE.
* Add a sysctl kern.pipe.mpsafe, which defaults to disabled. Set it to 1 to test the MPSAFE pipe code. The expectation is that it will be set to 1 for the release. Currently only pipe_read and pipe_write are MPSAFE.
* In mpsafe mode the new code also implements a streaming optimization to avoid unnecessary tsleep/wakeup/IPIs. If the reader and writer are operating on different CPUs, this feature results in more uniform performance across a wide range of block sizes.
* The new code currently does not use any page mapping optimizations. Page table overhead is fairly nasty on SMP, so for now we rely on CPU caches and do an extra copy. This means the code is, at least for now, tuned better for more recent CPUs and worse for older CPUs.

OLD pipe code: dwe = dwrite_enable, sfb = dwrite_sfbuf mode
NEW pipe code: mpsafe = 0 (takes the BGL) or 1 (does not use the BGL)

Tested using /usr/src/test/sysperf/pipe2.c; all results are in MBytes/sec:

                     8K   16K   32K   64K  128K  256K
                   ----  ----  ----  ----  ----  ----
OLD dwe=0          1193  1167  1555  1525  1473  1477
OLD dwe=1 sfb=0     856  1458  2307  2182  2275  2307
OLD dwe=1 sfb=1     955  1537  2300  2356  2363  2708
OLD dwe=1 sfb=2     939  1561  2367  2477  2379  2360
NEW mpsafe=0       1150  1369  1536  1591  1358  1270
NEW mpsafe=1       2133  2319  2375  2387  2396  2418
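The "extra copy" design can be pictured as a ring buffer. The writer copies data into it and the reader copies data out, with no page remapping. This is an illustrative, single-threaded sketch with hypothetical names (`struct sketch_pipe`, `sketch_pipe_write()`, `sketch_pipe_read()`, `SKETCH_PIPE_SIZE`). It is not the DragonFly implementation, and it omits the locking, memory barriers and sleep/wakeup logic that an MPSAFE pipe needs.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PIPE_SIZE 16384u   /* hypothetical size; must be a power of two */

/*
 * windex and rindex are free-running byte counters, so (windex - rindex)
 * is the amount of unread data even after the unsigned counters wrap.
 */
struct sketch_pipe {
    unsigned int windex;
    unsigned int rindex;
    char buf[SKETCH_PIPE_SIZE];
};

/* Copy up to len bytes from src into the buffer; returns bytes written. */
static size_t sketch_pipe_write(struct sketch_pipe *p, const char *src, size_t len)
{
    size_t space = SKETCH_PIPE_SIZE - (p->windex - p->rindex);
    size_t done = 0;

    if (len > space)
        len = space;
    while (done < len) {
        size_t off = p->windex & (SKETCH_PIPE_SIZE - 1);
        size_t chunk = SKETCH_PIPE_SIZE - off;   /* contiguous room before wrap */
        if (chunk > len - done)
            chunk = len - done;
        memcpy(p->buf + off, src + done, chunk);
        p->windex += (unsigned int)chunk;
        done += chunk;
    }
    return done;
}

/* Copy up to len bytes out of the buffer into dst; returns bytes read. */
static size_t sketch_pipe_read(struct sketch_pipe *p, char *dst, size_t len)
{
    size_t avail = p->windex - p->rindex;
    size_t done = 0;

    if (len > avail)
        len = avail;
    while (done < len) {
        size_t off = p->rindex & (SKETCH_PIPE_SIZE - 1);
        size_t chunk = SKETCH_PIPE_SIZE - off;   /* contiguous data before wrap */
        if (chunk > len - done)
            chunk = len - done;
        memcpy(dst + done, p->buf + off, chunk);
        p->rindex += (unsigned int)chunk;
        done += chunk;
    }
    return done;
}
```

Because the indices are free-running, "full" and "empty" are distinguished by their difference alone (SKETCH_PIPE_SIZE versus 0), so no slot has to be left unused.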
This means that from now on, what is allowed within a jail is defined solely by the function prison_priv_check().