raspi-hacks: Remove fakeclock, add e2fsck, libcofi_rpi, break out doinst.sh
idlemoor · Jul 10, 2012 · 1e8584b
1 parent d397b48
commit 1e8584b
Showing 12 changed files with 615 additions and 138 deletions.
diff --git a/raspi-hacks/README b/raspi-hacks/README
@@ -9,13 +9,22 @@ lines' entries s0, s1 and s2.  The Rasberry Pi's UART is /dev/ttyAMA0, not
 
 * Clock - the Raspberry Pi has no hardware clock and boots with the
 date/time set to 1970-01-01, so /etc/rc.d/rc.local will be modified to
-set the correct date/time from the network (using the 'sntp' command).
-Also, to provide a pragmatic approximation for the correct date/time
-during early boot and if the network is unavailable, /sbin/hwclock will be
-replaced by a shellscript which saves the date/time to disk on shutdown
-and restores it on startup.  This prevents the error 'Cannot access the
-Hardware Clock via any known method'.
+set the correct date/time from the network (using the 'sntp' command). 
+Additionally, the file /etc/e2fsck.conf will be created to stop e2fsck
+from erroring or requiring manual intervention when it encounters bad time
+stamps.  The fakeclock mechanism in a previous version of this package is
+no longer used, and the original hwclock command will be restored when you
+upgrade this package.  This reinstates the error 'Cannot access the
+Hardware Clock via any known method' :-)
 
 * Tuning - /etc/sysctl.conf will be created to tune vm.min_free_kbytes.
 This prevents the error 'smsc95xx 1-1.1:1.0: eth0: kevent 2 may have
 been dropped'.
+
+* libcofi_rpi - The library /usr/lib/libcofi_rpi.so contains teh_orph's
+replacement memcpy and memset functions.  These replacements have been
+reported to improve application performance.  They are disabled by default,
+but if you want to enable them, use this command and then log out and log
+in again:
+
+  chmod ugo+x /etc/profile.d/libcofi_rpi.{sh,csh}
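Note on the chmod step in this README hunk: it relies on the login shell sourcing only those /etc/profile.d scripts whose execute bit is set. The sketch below imitates that loop in a temporary directory, assuming the behaviour of Slackware's stock /etc/profile. The names load_profile_scripts and DEMO_PRELOAD are illustrative only and are not part of this package.

```shell
#!/bin/sh
# Sketch only: mimic a profile loop that sources executable scripts.
unset DEMO_PRELOAD
demo_dir="$(mktemp -d)"

# Stand-in for /etc/profile.d/libcofi_rpi.sh (uses a harmless demo variable
# instead of LD_PRELOAD so nothing is really preloaded).
printf 'export DEMO_PRELOAD=/usr/lib/libcofi_rpi.so\n' > "$demo_dir/libcofi_rpi.sh"

load_profile_scripts() {
  # Assumed behaviour: non-executable scripts are skipped silently.
  for script in "$1"/*.sh; do
    if [ -x "$script" ]; then
      . "$script"
    fi
  done
}

chmod ugo-x "$demo_dir/libcofi_rpi.sh"   # disabled state
load_profile_scripts "$demo_dir"
echo "disabled: ${DEMO_PRELOAD:-unset}"

chmod ugo+x "$demo_dir/libcofi_rpi.sh"   # enabled state
load_profile_scripts "$demo_dir"
echo "enabled: ${DEMO_PRELOAD:-unset}"

rm -rf "$demo_dir"
```

Under that assumption, setting the execute bit turns the preload on at the next login and clearing it turns the preload off again.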
diff --git a/raspi-hacks/doinst.sh b/raspi-hacks/doinst.sh
@@ -0,0 +1,24 @@
+config() {
+  NEW="$1"
+  OLD="$(dirname $NEW)/$(basename $NEW .new)"
+  # If there's no config file by that name, mv it over:
+  if [ ! -r $OLD ]; then
+    mv $NEW $OLD
+  elif [ "$(cat $OLD | md5sum)" = "$(cat $NEW | md5sum)" ]; then
+    # toss the redundant copy
+    rm $NEW
+  fi
+  # Otherwise, we leave the .new copy for the admin to consider...
+}
+
+preserve_perms() {
+  NEW="$1"
+  OLD="$(dirname $NEW)/$(basename $NEW .new)"
+  if [ -e $OLD ]; then
+    cp -a $OLD ${NEW}.incoming
+    cat $NEW > ${NEW}.incoming
+    mv ${NEW}.incoming $NEW
+  fi
+  config $NEW
+}
+
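The doinst.sh hunk shows only the helper definitions. As a minimal sketch of what config() does in each of its three cases, the block below runs it against throwaway files in a temporary directory. config() is repeated here (with quoting added) so the block is self-contained, and the file names one.conf, two.conf and three.conf are hypothetical.

```shell
#!/bin/sh
# Sketch only: exercise config() on scratch files, not on real /etc files.
config() {
  NEW="$1"
  OLD="$(dirname "$NEW")/$(basename "$NEW" .new)"
  if [ ! -r "$OLD" ]; then
    mv "$NEW" "$OLD"                      # no existing file: install the new one
  elif [ "$(md5sum < "$OLD")" = "$(md5sum < "$NEW")" ]; then
    rm "$NEW"                             # identical: drop the redundant copy
  fi
  # otherwise the .new file is left for the admin to review
}

work="$(mktemp -d)"

# Case 1: nothing installed yet.
echo 'a = 1' > "$work/one.conf.new"
config "$work/one.conf.new"

# Case 2: installed file is identical to the incoming one.
echo 'b = 2' > "$work/two.conf"
echo 'b = 2' > "$work/two.conf.new"
config "$work/two.conf.new"

# Case 3: installed file was edited locally.
echo 'c = 3' > "$work/three.conf"
echo 'c = 4' > "$work/three.conf.new"
config "$work/three.conf.new"

remaining="$(cd "$work" && echo *)"
echo "$remaining"
rm -rf "$work"
```

Only three.conf.new should survive alongside the three installed files, which is the case where the admin has to merge changes by hand.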
diff --git a/raspi-hacks/e2fsck.conf.new b/raspi-hacks/e2fsck.conf.new
@@ -0,0 +1,3 @@
+[options]
+        accept_time_fudge = 1
+        broken_system_clock = 1
diff --git a/raspi-hacks/fakeclock.sh b/raspi-hacks/fakeclock.sh
diff --git a/raspi-hacks/libcofi_rpi/Makefile b/raspi-hacks/libcofi_rpi/Makefile
@@ -0,0 +1,6 @@
+libcofi_rpi.so: memcpy.o memset.o
+	$(CC) -o libcofi_rpi.so -shared memcpy.o memset.o -g
+memset.o: memset.s
+	$(AS) memset.s -o memset.o -g
+memcpy.o: memcpy.s
+	$(AS) memcpy.s -o memcpy.o -g
diff --git a/raspi-hacks/libcofi_rpi/README.libcofi_rpi b/raspi-hacks/libcofi_rpi/README.libcofi_rpi
@@ -0,0 +1,59 @@
+copies-and-fills
+
+SUMMARY
+
+Replacement memcpy and memset functionality for the Raspberry Pi with the intention of gaining greater performance.
+Coding with an understanding of single-issue is important.
+
+Tested using a modified https://github.com/ssvb/ssvb-membench, from Siarhei Siamashka.
+The testing involves lots of random numbers, iterating through sizes and source/destination alignments.
+If you find a bug, please tell me!
+
+To use: define the environment variable, LD_PRELOAD=/full/path/to/libcofi_rpi.so, then run program.
+
+The inner loop of the misalignment path of memcpy is derived from the GNU libc ARM port. As a result "copies-and-fills" is licensed under the GNU Lesser General Public License version 2.1. See http://www.gnu.org/licenses/ for details.
+To see the original memcpy, browse it here: http://sourceware.org/git/?p=glibc-ports.git;a=blob;f=sysdeps/arm/memcpy.S;hb=HEAD
+
+Simon Hall
+
+NOTES
+
+memcpy:
+Can be found in memcpy.s.
+Compared to the generic libc memcpy, this one reaches performance parity at around 150-byte copies with any source/destination alignment and eventually gains 2-3x throughput, especially when the source buffer is uncached.
+When taking the libc source and enabling the pld path, it certainly does improve. However the source alignment option appears to do nothing for performance yet greatly increases the code complexity.
+In initial testing, some facts were found:
+- despite the increase in free registers, copies via VFP were slower at peak by ~25%
+- copying 32 bytes at a time with a single store-multiple gives the highest performance
+- getting the destination 32b aligned gives a much greater throughput versus 4b-alignment
+- some memcpys are of a fixed size, e.g. 1, 2, 4 or 8 bytes
+- byte transfers have a much worse performance than expected
+- for misaligned transfers, 32b-aligned stms are the way forward with mov/orr byte shuffling; byte copies give very poor performance
+
+The code deals with the special small sizes, then races to reach 32b alignment of the destination.
+We then test for misalignment with the source. If (source - destination) & 3 != 0, we use the misaligned path.
+For the aligned path, we iterate through the data, 32 bytes at a time. We then handle a word at a time, then a byte.
+For the misaligned path, we have to choose how misaligned we are - 1, 2, or 3 bytes. There is a custom path for each that does the appropriate shifts.
+
+The key to this is prefetch of the source array. Prefetch instructions must be far from the load instruction, as it appears the load/store pipe is busy for a while after a large load instruction is issued.
+
+Speeds of up to 680 MB/s have been achieved (effective 339 MB/s copy).
+
+memset:
+Can be found in memset.s.
+Compared to the generic libc memset, this quickly reaches performance parity at around 100 bytes with any alignment.
+On testing,
+- it appears 32-byte stores yield ~1000-1100 MB/s, but two sequential 16-byte stores can reach 1300-1400 MB/s
+- again 32b aligned destinations are good
+
+The code 4-byte aligns the destination with a byte writer, then 32-byte aligns it with a word writer.
+We then write the data as two 16-byte stores at a time, then write words, then bytes.
+No preload of destination data seems to be required.
+
+Speeds of up to 1390 MB/s have been achieved. This is ~7x faster than the libc version.
+
+VERSION HISTORY
+
+09/07/2012, minor updates
+01/07/2012, initial release
+
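The memcpy notes in README.libcofi_rpi describe a plan: reach 32-byte alignment of the destination, test (source - destination) & 3, then copy 32-byte blocks, words and bytes. The arithmetic behind that plan can be sketched in shell as follows. This is an illustration of the description only, not a transcription of memcpy.s: plan_copy and its output format are hypothetical, the addresses are made-up numbers, and the special handling of very small sizes is left out.

```shell
#!/bin/sh
# Sketch only: compute how a copy of len bytes would be split up.
plan_copy() {
  dst=$1
  src=$2
  len=$3

  # Bytes needed to bring the destination up to a 32-byte boundary.
  head=$(( (32 - dst % 32) % 32 ))
  if [ "$head" -gt "$len" ]; then
    head=$len
  fi
  rest=$(( len - head ))

  # Relative misalignment of source and destination, in bytes (0..3).
  skew=$(( (src - dst) & 3 ))

  if [ "$skew" -eq 0 ]; then
    blocks=$(( rest / 32 ))          # 32-byte store-multiple iterations
    words=$(( (rest % 32) / 4 ))     # leftover whole words
    bytes=$(( rest % 4 ))            # leftover single bytes
    echo "aligned: head=$head blocks=$blocks words=$words bytes=$bytes"
  else
    # One of three shift-and-merge paths, chosen by skew (1, 2 or 3).
    echo "misaligned by $skew: head=$head rest=$rest"
  fi
}

plan_copy 4100 8196 200
plan_copy 4100 8197 200
```

For a 200-byte copy to address 4100, 28 bytes go to alignment and the remaining 172 split into five 32-byte blocks plus three words when the source is word-compatible; shifting the source by one byte selects the 1-byte misaligned path instead.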
diff --git a/raspi-hacks/libcofi_rpi/libcofi_rpi.csh.new b/raspi-hacks/libcofi_rpi/libcofi_rpi.csh.new
@@ -0,0 +1,3 @@
+#!/bin/csh
+
+setenv LD_PRELOAD /usr/lib/libcofi_rpi.so
diff --git a/raspi-hacks/libcofi_rpi/libcofi_rpi.sh.new b/raspi-hacks/libcofi_rpi/libcofi_rpi.sh.new
@@ -0,0 +1,3 @@
+#!/bin/sh
+
+export LD_PRELOAD=/usr/lib/libcofi_rpi.so
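The two profile scripts set LD_PRELOAD for a whole login session. As README.libcofi_rpi notes, the variable can also be defined for a single program. A minimal sketch, assuming the library is installed at /usr/lib/libcofi_rpi.so as in this package and using ls purely as a stand-in program:

```shell
#!/bin/sh
# Sketch only: preload the library for one command instead of the session.
lib=/usr/lib/libcofi_rpi.so

if [ -r "$lib" ]; then
  # The assignment prefix applies to this one command only.
  LD_PRELOAD="$lib" ls / > /dev/null
  status="ran ls with $lib preloaded"
else
  status="$lib not installed, nothing preloaded"
fi
echo "$status"
```

This keeps the replacement memcpy and memset out of every other process, which may be convenient when comparing a program's performance with and without the library.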