Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Browse files

Initial commit of "hashtables".

  • Loading branch information...
commit d6928baac2f9833e3928e3fb639d762b91e6ca57 0 parents
@gregorycollins authored
9 .gitignore
@@ -0,0 +1,9 @@
+*~
+dist
+dist/**
+test/dist
+test/dist/**
+test/.hpc
+test/.hpc/**
+test/*.tix
+*.prof
28 LICENSE
@@ -0,0 +1,28 @@
+Copyright (c) 2011, Google, Inc.
+
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice,
+ this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+
+ * Neither the name of Google, Inc. nor the names of other contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
79 README.md
@@ -0,0 +1,79 @@
+This package provides a couple of different implementations of mutable hash
+tables in the ST monad, as well as a typeclass abstracting their common
+operations, and a set of wrappers to use the hash tables in the IO monad.
+
+**Quick start**: documentation for the hash table operations is provided in the
+`Data.HashTable.Class` module, and the IO wrappers are located in the
+`Data.HashTable.IO` module.
+
+This package currently contains three hash table implementations:
+
+ 1. `Data.HashTable.ST.Basic` contains a basic open-addressing hash table
+ using linear probing as the collision strategy. On a pure speed basis it
+ should currently be the fastest available Haskell hash table
+ implementation for lookups, although it has a higher memory overhead
+ than the other tables and can suffer from long delays when the table is
+ resized because all of the elements in the table need to be rehashed.
+
+ 2. `Data.HashTable.ST.Cuckoo` contains an implementation of "cuckoo hashing"
+ as introduced by Pagh and Rodler in 2001 (see
+ [http://en.wikipedia.org/wiki/Cuckoo\_hashing](http://en.wikipedia.org/wiki/Cuckoo_hashing)).
+ Cuckoo hashing has worst-case /O(1)/ lookups and can reach a high "load
+ factor", in which the table can perform acceptably well even when more
+ than 90% full. Randomized testing shows this implementation of cuckoo
+ hashing to be slightly faster on insert and slightly slower on lookup than
+ `Data.Hashtable.ST.Basic`, while being more space efficient by about a
+ half-word per key-value mapping. Cuckoo hashing, like the basic hash table
+ implementation using linear probing, can suffer from long delays when the
+ table is resized.
+
+ 3. `Data.HashTable.ST.Linear` contains a linear hash table (see
+ [http://en.wikipedia.org/wiki/Linear\_hashing](http://en.wikipedia.org/wiki/Linear_hashing)),
+ which trades some insert and lookup performance for higher space
+ efficiency and much shorter delays when expanding the table. In most
+ cases, benchmarks show this table to be currently slightly faster than
+ `Data.HashTable` from the Haskell base library.
+
+It is recommended to create a concrete type alias in your code when using this
+package, i.e.:
+
+ import qualified Data.HashTable.IO as H
+
+ type HashTable k v = H.BasicHashTable k v
+
+ foo :: IO (HashTable Int Int)
+ foo = do
+ ht <- H.new
+ H.insert ht 1 1
+ return ht
+
+Firstly, this makes it easy to switch to a different hash table implementation,
+and secondly, using a concrete type rather than leaving your functions abstract
+in the HashTable class should allow GHC to optimize away the typeclass
+dictionaries.
+
+This package accepts a couple of different cabal flags:
+
+ * `unsafe-tricks`, default **on**. If this flag is enabled, we use some
+ unsafe GHC-specific tricks to save indirections (namely `unsafeCoerce#` and
+ `reallyUnsafePtrEquality#`. These techniques rely on assumptions about the
+ behaviour of the GHC runtime system and, although they've been tested and
+ should be safe under normal conditions, are slightly dangerous. Caveat
+ emptor. In particular, these techniques are incompatible with HPC code
+ coverage reports.
+
+ * `sse41`, default /off/. If this flag is enabled, we use some SSE 4.1
+ instructions (see
+ [http://en.wikipedia.org/wiki/SSE4](http://en.wikipedia.org/wiki/SSE4),
+ first available on Intel Core 2 processors) to speed up cache-line searches
+ for cuckoo hashing.
+
+ * `bounds-checking`, default /off/. If this flag is enabled, array accesses
+ are bounds-checked.
+
+ * `debug`, default /off/. If turned on, we'll rudely spew debug output to
+ stdout.
+
+ * `portable`, default /off/. If this flag is enabled, we use only pure
+ Haskell code and try not to use unportable GHC extensions. Turning this
+ flag on forces `unsafe-tricks` and `sse41` *OFF*.
2  Setup.hs
@@ -0,0 +1,2 @@
+import Distribution.Simple
+main = defaultMain
471 cbits/cfuncs.c
@@ -0,0 +1,471 @@
+#include <stdint.h>
+
+
+#if defined(USE_SSE_4_1)
+#include <smmintrin.h>
+#endif
+
+
+#if defined(__GNUC__)
+#define PREFETCH_READ(x) (__builtin_prefetch(x, 0, 3))
+#define PREFETCH_WRITE(x) (__builtin_prefetch(x, 1, 3))
+#else
+#define PREFETCH_READ(x)
+#define PREFETCH_WRITE(x)
+#endif
+
+void prefetchCacheLine32_write(uint32_t* line, int start)
+{
+ PREFETCH_WRITE((void*)(&line[start]));
+}
+
+
+void prefetchCacheLine64_write(uint64_t* line, int start)
+{
+ PREFETCH_WRITE((void*)(&line[start]));
+}
+
+
+void prefetchCacheLine32_read(uint32_t* line, int start)
+{
+ PREFETCH_READ((void*)(&line[start]));
+}
+
+
+void prefetchCacheLine64_read(uint64_t* line, int start)
+{
+ PREFETCH_READ((void*)(&line[start]));
+}
+
+
+int forwardSearch32_2(uint32_t* array, int start, int end,
+ uint32_t x1, uint32_t x2) {
+ uint32_t* ep = array + end;
+ uint32_t* p = array + start;
+ while (1) {
+ if (p == ep) p = array;
+ if (*p == x1 || *p == x2) return p - array;
+ ++p;
+ }
+}
+
+
+int forwardSearch32_3(uint32_t* array, int start, int end,
+ uint32_t x1, uint32_t x2, uint32_t x3) {
+ uint32_t* ep = array + end;
+ uint32_t* p = array + start;
+ while (1) {
+ if (p == ep) p = array;
+ if (*p == x1 || *p == x2 || *p == x3) return p - array;
+ ++p;
+ }
+}
+
+
+int forwardSearch64_2(uint64_t* array, int start, int end,
+ uint64_t x1, uint64_t x2) {
+ uint64_t* ep = array + end;
+ uint64_t* p = array + start;
+ while (1) {
+ if (p == ep) p = array;
+ if (*p == x1 || *p == x2) return p - array;
+ ++p;
+ }
+}
+
+
+int forwardSearch64_3(uint64_t* array, int start, int end,
+ uint64_t x1, uint64_t x2, uint64_t x3) {
+ uint64_t* ep = array + end;
+ uint64_t* p = array + start;
+ while (1) {
+ if (p == ep) p = array;
+ if (*p == x1 || *p == x2 || *p == x3) return p - array;
+ ++p;
+ }
+}
+
+
+//----------------------------------------------------------------------------
+// cache line search functions
+// First: 32 bit
+
+inline int mask(int a, int b) { return -(a == b); }
+
+
+uint8_t deBruijnBitPositions[] = {
+ 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
+ 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
+};
+
+
+int firstBitSet(int a) {
+ int zeroCase = mask(0, a);
+ uint32_t x = (uint32_t) (a & -a);
+ x *= 0x077CB531;
+ x >>= 27;
+ return zeroCase | deBruijnBitPositions[x];
+}
+
+
+int32_t lineResult32(int m, int start) {
+ int p = firstBitSet(m);
+ int32_t mm = mask(p, -1);
+ return mm | (~mm & (start + p));
+}
+
+
+uint32_t lineMask32(uint32_t* array, int start, uint32_t value) {
+ uint32_t* p = array + start;
+ uint32_t m = 0;
+ int offset = start & 0xf;
+
+ switch (offset) {
+ case 0: m |= mask(*p++, value) & 0x1;
+ case 1: m |= mask(*p++, value) & 0x2;
+ case 2: m |= mask(*p++, value) & 0x4;
+ case 3: m |= mask(*p++, value) & 0x8;
+ case 4: m |= mask(*p++, value) & 0x10;
+ case 5: m |= mask(*p++, value) & 0x20;
+ case 6: m |= mask(*p++, value) & 0x40;
+ case 7: m |= mask(*p++, value) & 0x80;
+ case 8: m |= mask(*p++, value) & 0x100;
+ case 9: m |= mask(*p++, value) & 0x200;
+ case 10: m |= mask(*p++, value) & 0x400;
+ case 11: m |= mask(*p++, value) & 0x800;
+ case 12: m |= mask(*p++, value) & 0x1000;
+ case 13: m |= mask(*p++, value) & 0x2000;
+ case 14: m |= mask(*p++, value) & 0x4000;
+ case 15: m |= mask(*p++, value) & 0x8000;
+ }
+
+ return m >> offset;
+}
+
+
+int lineSearch32(uint32_t* array, int start, uint32_t value) {
+ uint32_t m = lineMask32(array, start, value);
+ return lineResult32((int)m, start);
+}
+
+
+uint32_t lineMask32_2(uint32_t* array, int start, uint32_t x1, uint32_t x2) {
+ uint32_t* p = array + start;
+ uint32_t m = 0;
+ int offset = start & 0xf;
+
+ switch (offset) {
+ case 0: m |= (mask(*p, x1) | mask(*p, x2)) & 0x1; ++p;
+ case 1: m |= (mask(*p, x1) | mask(*p, x2)) & 0x2; ++p;
+ case 2: m |= (mask(*p, x1) | mask(*p, x2)) & 0x4; ++p;
+ case 3: m |= (mask(*p, x1) | mask(*p, x2)) & 0x8; ++p;
+ case 4: m |= (mask(*p, x1) | mask(*p, x2)) & 0x10; ++p;
+ case 5: m |= (mask(*p, x1) | mask(*p, x2)) & 0x20; ++p;
+ case 6: m |= (mask(*p, x1) | mask(*p, x2)) & 0x40; ++p;
+ case 7: m |= (mask(*p, x1) | mask(*p, x2)) & 0x80; ++p;
+ case 8: m |= (mask(*p, x1) | mask(*p, x2)) & 0x100; ++p;
+ case 9: m |= (mask(*p, x1) | mask(*p, x2)) & 0x200; ++p;
+ case 10: m |= (mask(*p, x1) | mask(*p, x2)) & 0x400; ++p;
+ case 11: m |= (mask(*p, x1) | mask(*p, x2)) & 0x800; ++p;
+ case 12: m |= (mask(*p, x1) | mask(*p, x2)) & 0x1000; ++p;
+ case 13: m |= (mask(*p, x1) | mask(*p, x2)) & 0x2000; ++p;
+ case 14: m |= (mask(*p, x1) | mask(*p, x2)) & 0x4000; ++p;
+ case 15: m |= (mask(*p, x1) | mask(*p, x2)) & 0x8000; ++p;
+ }
+
+ return m >> offset;
+}
+
+
+int lineSearch32_2(uint32_t* array, int start, uint32_t x1, uint32_t x2) {
+ uint32_t m = lineMask32_2(array, start, x1, x2);
+ return lineResult32((int)m, start);
+}
+
+
+uint32_t lineMask32_3(uint32_t* array, int start,
+ uint32_t x1, uint32_t x2, uint32_t x3) {
+ uint32_t* p = array + start;
+ uint32_t m = 0;
+ int offset = start & 0xf;
+
+ switch (offset) {
+ case 0: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x1; ++p;
+ case 1: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x2; ++p;
+ case 2: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x4; ++p;
+ case 3: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x8; ++p;
+ case 4: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x10; ++p;
+ case 5: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x20; ++p;
+ case 6: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x40; ++p;
+ case 7: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x80; ++p;
+ case 8: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x100; ++p;
+ case 9: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x200; ++p;
+ case 10: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x400; ++p;
+ case 11: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x800; ++p;
+ case 12: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x1000; ++p;
+ case 13: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x2000; ++p;
+ case 14: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x4000; ++p;
+ case 15: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x8000; ++p;
+ }
+
+ return m >> offset;
+}
+
+
+int lineSearch32_3(uint32_t* array, int start,
+ uint32_t x1, uint32_t x2, uint32_t x3) {
+ uint32_t m = lineMask32_3(array, start, x1, x2, x3);
+ return lineResult32((int)m, start);
+}
+
+
+//----------------------------------------------------------------------------
+// Now: 64-bit. If USE_SSE_4_1 is on, we will use SSE4.1 SIMD instructions to
+// search the cache line super-efficiently.
+
+#if defined(USE_SSE_4_1)
+
+inline uint64_t mask_to_mask2(__m128i m) {
+ int mask16 = _mm_movemask_epi8(m);
+ // output of _mm_movemask_epi8 is a 16-bit word where bit i is 1 iff the
+ // most significant bit of byte i of the mask is 1
+ int m1 = mask16 & 0x1;
+ int m2 = (mask16 & 0x100) >> 7;
+ return (uint64_t) (m1 | m2);
+}
+
+
+inline uint64_t cmp_and_mask(__m128i val, __m128i x0) {
+ __m128i mask1 = _mm_cmpeq_epi64(val, x0);
+ return mask_to_mask2(mask1);
+}
+
+
+inline uint64_t cmp_and_mask_2(__m128i val, __m128i x0, __m128i x1) {
+ __m128i mask1 = _mm_cmpeq_epi64(val, x0);
+ __m128i mask2 = _mm_cmpeq_epi64(val, x1);
+ mask1 = _mm_or_si128(mask1, mask2);
+ return mask_to_mask2(mask1);
+}
+
+
+inline uint64_t cmp_and_mask_3(__m128i val, __m128i x0, __m128i x1,
+ __m128i x2) {
+ __m128i mask1 = _mm_cmpeq_epi64(val, x0);
+ __m128i mask2 = _mm_cmpeq_epi64(val, x1);
+ __m128i mask3 = _mm_cmpeq_epi64(val, x2);
+ mask1 = _mm_or_si128(mask1, mask2);
+ mask1 = _mm_or_si128(mask1, mask3);
+ return mask_to_mask2(mask1);
+}
+
+
+uint64_t lineMask64(uint64_t* array, int start0, uint64_t v1) {
+ int offset = start0 & 0x7;
+ int start = start0 & ~0x7;
+
+ __m128i* p = (__m128i*) (&array[start]);
+ __m128i x1 = _mm_cvtsi32_si128(0);
+ x1 = _mm_insert_epi64(x1, v1, 0);
+ x1 = _mm_insert_epi64(x1, v1, 1);
+ uint64_t dest_mask = 0;
+
+ // x1 contains two 64-bit copies of the value to look for
+
+ // words 0, 1
+ __m128i x = _mm_load_si128(p);
+ dest_mask = cmp_and_mask(x, x1);
+ p = (__m128i*) (&array[start+2]);
+
+ // words 2, 3
+ x = _mm_load_si128(p);
+ dest_mask |= (cmp_and_mask(x, x1) << 2);
+ p = (__m128i*) (&array[start+4]);
+
+ // words 4, 5
+ x = _mm_load_si128(p);
+ dest_mask |= (cmp_and_mask(x, x1) << 4);
+ p = (__m128i*) (&array[start+6]);
+
+ // words 6, 7
+ x = _mm_load_si128(p);
+ dest_mask |= (cmp_and_mask(x, x1) << 6);
+
+ return dest_mask >> offset;
+}
+
+
+uint64_t lineMask64_2(uint64_t* array, int start0, uint64_t v1, uint64_t v2) {
+ int offset = start0 & 0x7;
+ int start = start0 & ~0x7;
+
+ __m128i* p = (__m128i*) (&array[start]);
+ __m128i x1 = _mm_cvtsi32_si128(0);
+ x1 = _mm_insert_epi64(x1, v1, 0);
+ x1 = _mm_insert_epi64(x1, v1, 1);
+
+ __m128i x2 = _mm_cvtsi32_si128(0);
+ x2 = _mm_insert_epi64(x2, v2, 0);
+ x2 = _mm_insert_epi64(x2, v2, 1);
+
+ uint64_t dest_mask = 0;
+
+ // words 0, 1
+ __m128i x = _mm_load_si128(p);
+ dest_mask = cmp_and_mask_2(x, x1, x2);
+ p = (__m128i*) (&array[start+2]);
+
+ // words 2, 3
+ x = _mm_load_si128(p);
+ dest_mask |= (cmp_and_mask_2(x, x1, x2) << 2);
+ p = (__m128i*) (&array[start+4]);
+
+ // words 4, 5
+ x = _mm_load_si128(p);
+ dest_mask |= (cmp_and_mask_2(x, x1, x2) << 4);
+ p = (__m128i*) (&array[start+6]);
+
+ // words 6, 7
+ x = _mm_load_si128(p);
+ dest_mask |= (cmp_and_mask_2(x, x1, x2) << 6);
+
+ return dest_mask >> offset;
+}
+
+
+uint64_t lineMask64_3(uint64_t* array, int start0,
+ uint64_t v1, uint64_t v2, uint64_t v3) {
+ int offset = start0 & 0x7;
+ int start = start0 & ~0x7;
+
+ __m128i* p = (__m128i*) (&array[start]);
+ __m128i x1 = _mm_cvtsi32_si128(0);
+ x1 = _mm_insert_epi64(x1, v1, 0);
+ x1 = _mm_insert_epi64(x1, v1, 1);
+
+ __m128i x2 = _mm_cvtsi32_si128(0);
+ x2 = _mm_insert_epi64(x2, v2, 0);
+ x2 = _mm_insert_epi64(x2, v2, 1);
+
+ __m128i x3 = _mm_cvtsi32_si128(0);
+ x3 = _mm_insert_epi64(x3, v3, 0);
+ x3 = _mm_insert_epi64(x3, v3, 1);
+
+ uint64_t dest_mask = 0;
+
+ // words 0, 1
+ __m128i x = _mm_load_si128(p);
+ dest_mask = cmp_and_mask_3(x, x1, x2, x3);
+ p = (__m128i*) (&array[start+2]);
+
+ // words 2, 3
+ x = _mm_load_si128(p);
+ dest_mask |= (cmp_and_mask_3(x, x1, x2, x3) << 2);
+ p = (__m128i*) (&array[start+4]);
+
+ // words 4, 5
+ x = _mm_load_si128(p);
+ dest_mask |= (cmp_and_mask_3(x, x1, x2, x3) << 4);
+ p = (__m128i*) (&array[start+6]);
+
+ // words 6, 7
+ x = _mm_load_si128(p);
+ dest_mask |= (cmp_and_mask_3(x, x1, x2, x3) << 6);
+
+ return dest_mask >> offset;
+}
+
+
+#else
+
+
+
+uint64_t lineMask64(uint64_t* array, int start, uint64_t value) {
+ uint64_t* p = array + start;
+ uint64_t m = 0;
+ int offset = start & 0x7;
+
+ switch (offset) {
+ case 0: m |= mask(*p++, value) & 0x1;
+ case 1: m |= mask(*p++, value) & 0x2;
+ case 2: m |= mask(*p++, value) & 0x4;
+ case 3: m |= mask(*p++, value) & 0x8;
+ case 4: m |= mask(*p++, value) & 0x10;
+ case 5: m |= mask(*p++, value) & 0x20;
+ case 6: m |= mask(*p++, value) & 0x40;
+ case 7: m |= mask(*p++, value) & 0x80;
+ }
+
+ return m >> offset;
+}
+
+
+uint64_t lineMask64_2(uint64_t* array, int start, uint64_t x1, uint64_t x2) {
+ uint64_t* p = array + start;
+ uint64_t m = 0;
+ int offset = start & 0x7;
+
+ switch (offset) {
+ case 0: m |= (mask(*p, x1) | mask(*p, x2)) & 0x1; ++p;
+ case 1: m |= (mask(*p, x1) | mask(*p, x2)) & 0x2; ++p;
+ case 2: m |= (mask(*p, x1) | mask(*p, x2)) & 0x4; ++p;
+ case 3: m |= (mask(*p, x1) | mask(*p, x2)) & 0x8; ++p;
+ case 4: m |= (mask(*p, x1) | mask(*p, x2)) & 0x10; ++p;
+ case 5: m |= (mask(*p, x1) | mask(*p, x2)) & 0x20; ++p;
+ case 6: m |= (mask(*p, x1) | mask(*p, x2)) & 0x40; ++p;
+ case 7: m |= (mask(*p, x1) | mask(*p, x2)) & 0x80; ++p;
+ }
+
+ return m >> offset;
+}
+
+
+uint64_t lineMask64_3(uint64_t* array, int start,
+ uint64_t x1, uint64_t x2, uint64_t x3) {
+ uint64_t* p = array + start;
+ uint64_t m = 0;
+ int offset = start & 0x7;
+
+ switch (offset) {
+ case 0: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x1; ++p;
+ case 1: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x2; ++p;
+ case 2: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x4; ++p;
+ case 3: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x8; ++p;
+ case 4: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x10; ++p;
+ case 5: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x20; ++p;
+ case 6: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x40; ++p;
+ case 7: m |= (mask(*p, x1) | mask(*p, x2) | mask(*p, x3)) & 0x80; ++p;
+ }
+
+ return m >> offset;
+}
+
+
+#endif // USE_SSE_4_1
+
+
+int64_t lineResult64(int64_t m, int64_t start) {
+ int p = firstBitSet((int)m);
+ int64_t mm = (int64_t) mask(p, -1);
+ return mm | (~mm & (start + p));
+}
+
+
+int lineSearch64(uint64_t* array, int start, uint64_t value) {
+ uint64_t m = lineMask64(array, start, value);
+ return lineResult64((int)m, start);
+}
+
+
+int lineSearch64_2(uint64_t* array, int start, uint64_t x1, uint64_t x2) {
+ uint64_t m = lineMask64_2(array, start, x1, x2);
+ return lineResult64((int)m, start);
+}
+
+
+int lineSearch64_3(uint64_t* array, int start,
+ uint64_t x1, uint64_t x2, uint64_t x3) {
+ uint64_t m = lineMask64_3(array, start, x1, x2, x3);
+ return lineResult64((int)m, start);
+}
+
9 haddock.sh
@@ -0,0 +1,9 @@
+#!/bin/sh
+
+set -x
+
+rm -Rf dist/doc
+
+HADDOCK_OPTS='--html-location=http://hackage.haskell.org/packages/archive/$pkg/latest/doc/html'
+
+cabal haddock $HADDOCK_OPTS --hyperlink-source $@
191 hashtables.cabal
@@ -0,0 +1,191 @@
+Name: hashtables
+Version: 1.0.0.0
+Synopsis: Mutable hash tables in the ST monad
+Homepage: http://github.com/gregorycollins/hashtables
+License: BSD3
+License-file: LICENSE
+Author: Gregory Collins
+Maintainer: greg@gregorycollins.net
+Copyright: (c) 2011, Google, Inc.
+Category: Data
+Build-type: Simple
+Cabal-version: >= 1.8
+
+Description:
+ This package provides a couple of different implementations of mutable hash
+ tables in the ST monad, as well as a typeclass abstracting their common
+ operations, and a set of wrappers to use the hash tables in the IO monad.
+ .
+ /QUICK START/: documentation for the hash table operations is provided in the
+ "Data.HashTable.Class" module, and the IO wrappers (which most users will
+ probably prefer) are located in the "Data.HashTable.IO" module.
+ .
+ This package currently contains three hash table implementations:
+ .
+ 1. "Data.HashTable.ST.Basic" contains a basic open-addressing hash table
+ using linear probing as the collision strategy. On a pure speed basis it
+ should currently be the fastest available Haskell hash table
+ implementation for lookups, although it has a higher memory overhead
+ than the other tables and can suffer from long delays when the table is
+ resized because all of the elements in the table need to be rehashed.
+ .
+ 2. "Data.HashTable.ST.Cuckoo" contains an implementation of \"cuckoo
+ hashing\" as introduced by Pagh and Rodler in 2001 (see
+ <http://en.wikipedia.org/wiki/Cuckoo_hashing>). Cuckoo hashing has
+ worst-case /O(1)/ lookups and can reach a high \"load factor\", in which
+ the table can perform acceptably well even when more than 90% full.
+ Randomized testing shows this implementation of cuckoo hashing to be
+ slightly faster on insert and slightly slower on lookup than
+ "Data.Hashtable.ST.Basic", while being more space efficient by about a
+ half-word per key-value mapping. Cuckoo hashing, like the basic hash
+ table implementation using linear probing, can suffer from long delays
+ when the table is resized.
+ .
+ 3. "Data.HashTable.ST.Linear" contains a linear hash table (see
+ <http://en.wikipedia.org/wiki/Linear_hashing>), which trades some insert
+ and lookup performance for higher space efficiency and much shorter
+ delays when expanding the table. In most cases, benchmarks show this
+ table to be currently slightly faster than @Data.HashTable@ from the
+ Haskell base library.
+ .
+ It is recommended to create a concrete type alias in your code when using this
+ package, i.e.:
+ .
+ > import qualified Data.HashTable.IO as H
+ >
+ > type HashTable k v = H.BasicHashTable k v
+ >
+ > foo :: IO (HashTable Int Int)
+ > foo = do
+ > ht <- H.new
+ > H.insert ht 1 1
+ > return ht
+ .
+ Firstly, this makes it easy to switch to a different hash table implementation,
+ and secondly, using a concrete type rather than leaving your functions abstract
+ in the HashTable class should allow GHC to optimize away the typeclass
+ dictionaries.
+ .
+ This package accepts a couple of different cabal flags:
+ .
+ * @unsafe-tricks@, default /ON/. If this flag is enabled, we use some
+ unsafe GHC-specific tricks to save indirections (namely @unsafeCoerce#@
+ and @reallyUnsafePtrEquality#@. These techniques rely on assumptions
+ about the behaviour of the GHC runtime system and, although they've been
+ tested and should be safe under normal conditions, are slightly
+ dangerous. Caveat emptor. In particular, these techniques are
+ incompatible with HPC code coverage reports.
+ .
+ * @sse41@, default /OFF/. If this flag is enabled, we use some SSE 4.1
+ instructions (see <http://en.wikipedia.org/wiki/SSE4>, first available on
+ Intel Core 2 processors) to speed up cache-line searches for cuckoo
+ hashing.
+ .
+ * @bounds-checking@, default /OFF/. If this flag is enabled, array accesses
+ are bounds-checked.
+ .
+ * @debug@, default /OFF/. If turned on, we'll rudely spew debug output to
+ stdout.
+ .
+ * @portable@, default /OFF/. If this flag is enabled, we use only pure
+ Haskell code and try not to use unportable GHC extensions. Turning this
+ flag on forces @unsafe-tricks@ and @sse41@ /OFF/.
+ .
+ This package has been tested with GHC 7.0.3, on:
+ .
+ * a MacBook Pro running Snow Leopard with an Intel Core i5 processor,
+ running GHC 7.0.3 in 64-bit mode.
+ .
+ * an Arch Linux desktop with an AMD Phenom II X4 940 quad-core processor.
+ .
+ * a MacBook Pro running Snow Leopard with an Intel Core 2 Duo processor,
+ running GHC 6.12.3 in 32-bit mode.
+ .
+ Please send bug reports to
+ <https://github.com/gregorycollins/hashtables/issues>.
+
+Extra-Source-Files:
+ README.md,
+ haddock.sh,
+ test/compute-overhead/ComputeOverhead.hs,
+ test/hashtables-test.cabal,
+ test/runTestsAndCoverage.sh,
+ test/runTestsNoCoverage.sh,
+ test/suite/Data/HashTable/Test/Common.hs,
+ test/suite/TestSuite.hs
+
+
+------------------------------------------------------------------------------
+Flag unsafe-tricks
+ Description: turn on unsafe GHC tricks
+ Default: True
+
+Flag bounds-checking
+ Description: if on, use bounds-checking array accesses
+ Default: False
+
+Flag debug
+ Description: if on, spew debugging output to stdout
+ Default: False
+
+Flag sse41
+ Description: if on, use SSE 4.1 extensions to search cache lines very
+ efficiently. The portable flag forces this off.
+ Default: False
+
+Flag portable
+ Description: if on, use only pure Haskell code and no GHC extensions.
+ Default: False
+
+
+Library
+ hs-source-dirs: src
+
+ if !flag(portable)
+ C-sources: cbits/cfuncs.c
+
+ Exposed-modules: Data.HashTable.Class,
+ Data.HashTable.IO,
+ Data.HashTable.ST.Basic,
+ Data.HashTable.ST.Cuckoo,
+ Data.HashTable.ST.Linear
+
+ Other-modules: Data.HashTable.Internal.Array,
+ Data.HashTable.Internal.IntArray,
+ Data.HashTable.Internal.CacheLine,
+ Data.HashTable.Internal.CheapPseudoRandomBitStream,
+ Data.HashTable.Internal.UnsafeTricks,
+ Data.HashTable.Internal.Utils,
+ Data.HashTable.Internal.Linear.Bucket
+
+ Build-depends: base >= 4 && <5,
+ hashable >= 1.1 && <2,
+ primitive,
+ vector >= 0.7
+
+
+ if flag(portable)
+ cpp-options: -DNO_C_SEARCH
+
+ if !flag(portable) && flag(unsafe-tricks) && impl(ghc)
+ build-depends: ghc-prim
+ cpp-options = -DUNSAFETRICKS
+
+ if flag(debug)
+ cpp-options: -DDEBUG
+
+ if flag(bounds-checking)
+ cpp-options: -DBOUNDS_CHECKING
+
+ if flag(sse41) && !flag(portable)
+ cc-options: -DUSE_SSE_4_1 -msse4.1
+ cpp-options: -DUSE_SSE_4_1
+
+ ghc-prof-options: -prof -auto-all
+
+ if impl(ghc >= 6.12.0)
+ ghc-options: -Wall -fwarn-tabs -funbox-strict-fields -O2
+ -fno-warn-unused-do-bind
+ else
+ ghc-options: -Wall -fwarn-tabs -funbox-strict-fields -O2
+
115 src/Data/HashTable/Class.hs
@@ -0,0 +1,115 @@
+{-# LANGUAGE BangPatterns #-}
+
+-- | This module contains a 'HashTable' typeclass for the hash table
+-- implementations in this package. This allows you to provide functions which
+-- will work for any hash table implementation in this collection.
+--
+-- It is recommended to create a concrete type alias in your code when using this
+-- package, i.e.:
+--
+-- > import qualified Data.HashTable.IO as H
+-- >
+-- > type HashTable k v = H.BasicHashTable k v
+-- >
+-- > foo :: IO (HashTable Int Int)
+-- > foo = do
+-- > ht <- H.new
+-- > H.insert ht 1 1
+-- > return ht
+--
+-- or
+--
+-- > import qualified Data.HashTable.ST.Cuckoo as C
+-- > import qualified Data.HashTable.Class as H
+-- >
+-- > type HashTable s k v = C.HashTable s k v
+-- >
+-- > foo :: ST s (HashTable s k v)
+-- > foo = do
+-- > ht <- H.new
+-- > H.insert ht 1 1
+-- > return ht
+--
+-- Firstly, this makes it easy to switch to a different hash table
+-- implementation, and secondly, using a concrete type rather than leaving your
+-- functions abstract in the 'HashTable' class should allow GHC to optimize
+-- away the typeclass dictionaries.
+--
+-- Note that the functions in this typeclass are in the 'ST' monad; if you want
+-- hash tables in 'IO', use the convenience wrappers in "Data.HashTable.IO".
+--
+module Data.HashTable.Class
+ ( HashTable(..)
+ , fromList
+ , toList
+ ) where
+
+
+import Control.Monad.ST
+import Data.Hashable
+import Prelude hiding (mapM_)
+
+-- | A typeclass for hash tables in the 'ST' monad. The operations on these
+-- hash tables are typically both key- and value-strict.
+class HashTable h where
+ -- | Creates a new, default-sized hash table. /O(1)/.
+ new :: ST s (h s k v)
+
+ -- | Creates a new hash table sized to hold @n@ elements. /O(n)/.
+ newSized :: Int -> ST s (h s k v)
+
+ -- | Inserts a key/value mapping into a hash table, replacing any existing
+ -- mapping for that key.
+ --
+ -- /O(n)/ worst case, /O(1)/ amortized.
+ insert :: (Eq k, Hashable k) => h s k v -> k -> v -> ST s ()
+
+ -- | Deletes a key-value mapping from a hash table. /O(n)/ worst case,
+ -- /O(1)/ amortized.
+ delete :: (Eq k, Hashable k) => h s k v -> k -> ST s ()
+
+ -- | Looks up a key-value mapping in a hash table. /O(n)/ worst case,
+ -- (/O(1)/ for cuckoo hash), /O(1)/ amortized.
+ lookup :: (Eq k, Hashable k) => h s k v -> k -> ST s (Maybe v)
+
+ -- | A strict fold over the key-value records of a hash table in the 'ST'
+ -- monad. /O(n)/.
+ foldM :: (a -> (k,v) -> ST s a) -> a -> h s k v -> ST s a
+
+ -- | A side-effecting map over the key-value records of a hash
+ -- table. /O(n)/.
+ mapM_ :: ((k,v) -> ST s b) -> h s k v -> ST s ()
+
+ -- | Computes the overhead (in words) per key-value mapping. Used for
+ -- debugging, etc; time complexity depends on the underlying hash table
+ -- implementation. /O(n)/.
+ computeOverhead :: h s k v -> ST s Double
+
+
+------------------------------------------------------------------------------
+-- | Create a hash table from a list of key-value pairs. /O(n)/.
+fromList :: (HashTable h, Eq k, Hashable k) => [(k,v)] -> ST s (h s k v)
+fromList l = do
+ ht <- newSized (length l)
+ go ht l
+
+ where
+ go ht = go'
+ where
+ go' [] = return ht
+ go' ((!k,!v):xs) = do
+ insert ht k v
+ go' xs
+{-# INLINE fromList #-}
+
+
+------------------------------------------------------------------------------
+-- | Given a hash table, produce a list of key-value pairs. /O(n)/.
+toList :: (HashTable h) => h s k v -> ST s [(k,v)]
+toList ht = do
+ l <- foldM f [] ht
+ return l
+
+ where
+ f !l !t = return (t:l)
+{-# INLINE toList #-}
217 src/Data/HashTable/IO.hs
@@ -0,0 +1,217 @@
+{-# LANGUAGE BangPatterns #-}
+{-# LANGUAGE EmptyDataDecls #-}
+
+-- | This module provides wrappers in 'IO' around the functions from
+-- "Data.HashTable.Class".
+--
+-- This module exports three concrete hash table types, one for each hash table
+-- implementation in this package:
+--
+-- > type BasicHashTable k v = IOHashTable (B.HashTable) k v
+-- > type CuckooHashTable k v = IOHashTable (Cu.HashTable) k v
+-- > type LinearHashTable k v = IOHashTable (L.HashTable) k v
+--
+-- The 'IOHashTable' type can be thought of as a wrapper around a concrete
+-- hashtable type, which sets the 'ST' monad state type to 'PrimState' 'IO',
+-- a.k.a. 'RealWorld':
+--
+-- > type IOHashTable tabletype k v = tabletype (PrimState IO) k v
+--
+-- This module provides 'stToIO' wrappers around the hashtable functions (which
+-- are in 'ST') to make it convenient to use them in 'IO'. It is intended to be
+-- imported qualified and used with a user-defined type alias, i.e.:
+--
+-- > import qualified Data.HashTable.IO as H
+-- >
+-- > type HashTable k v = H.CuckooHashTable k v
+-- >
+-- > foo :: IO (HashTable Int Int)
+-- > foo = do
+-- > ht <- H.new
+-- > H.insert ht 1 1
+-- > return ht
+--
+-- Essentially, anywhere you see @'IOHashTable' h k v@ in the type signatures
+-- below, you can plug in any of @'BasicHashTable' k v@, @'CuckooHashTable' k
+-- v@, or @'LinearHashTable' k v@.
+--
+module Data.HashTable.IO
+ ( BasicHashTable
+ , CuckooHashTable
+ , LinearHashTable
+ , IOHashTable
+ , new
+ , newSized
+ , insert
+ , delete
+ , lookup
+ , fromList
+ , toList
+ , mapM_
+ , foldM
+ , computeOverhead
+ ) where
+
+
+------------------------------------------------------------------------------
+import Control.Monad.Primitive (PrimState)
+import Control.Monad.ST
+import Data.Hashable (Hashable)
+import qualified Data.HashTable.Class as C
+import Prelude hiding (lookup, mapM_)
+
+------------------------------------------------------------------------------
+import qualified Data.HashTable.ST.Basic as B
+import qualified Data.HashTable.ST.Cuckoo as Cu
+import qualified Data.HashTable.ST.Linear as L
+
+
+------------------------------------------------------------------------------
+-- | A type alias for a basic open addressing hash table using linear
+-- probing. See "Data.HashTable.ST.Basic".
+type BasicHashTable k v = IOHashTable (B.HashTable) k v
+
+-- | A type alias for the cuckoo hash table. See "Data.HashTable.ST.Cuckoo".
+type CuckooHashTable k v = IOHashTable (Cu.HashTable) k v
+
+-- | A type alias for the linear hash table. See "Data.HashTable.ST.Linear".
+type LinearHashTable k v = IOHashTable (L.HashTable) k v
+
+
+------------------------------------------------------------------------------
+-- | A type alias for our hash tables, which run in 'ST', to set the state
+-- token type to 'PrimState' 'IO' (aka 'RealWorld') so that we can use them in
+-- 'IO'.
+type IOHashTable tabletype k v = tabletype (PrimState IO) k v
+
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in "Data.HashTable.Class#v:new".
+new :: C.HashTable h => IO (IOHashTable h k v)
+new = stToIO C.new
+{-# INLINE new #-}
+{-# SPECIALIZE INLINE new :: IO (BasicHashTable k v) #-}
+{-# SPECIALIZE INLINE new :: IO (LinearHashTable k v) #-}
+{-# SPECIALIZE INLINE new :: IO (CuckooHashTable k v) #-}
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in
+-- "Data.HashTable.Class#v:newSized".
+newSized :: C.HashTable h => Int -> IO (IOHashTable h k v)
+newSized = stToIO . C.newSized
+{-# INLINE newSized #-}
+{-# SPECIALIZE INLINE newSized :: Int -> IO (BasicHashTable k v) #-}
+{-# SPECIALIZE INLINE newSized :: Int -> IO (LinearHashTable k v) #-}
+{-# SPECIALIZE INLINE newSized :: Int -> IO (CuckooHashTable k v) #-}
+
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in "Data.HashTable.Class#v:update".
+insert :: (C.HashTable h, Eq k, Hashable k) =>
+ IOHashTable h k v -> k -> v -> IO ()
+insert h k v = stToIO $ C.insert h k v
+{-# INLINE insert #-}
+{-# SPECIALIZE INLINE insert :: (Eq k, Hashable k) =>
+ BasicHashTable k v -> k -> v -> IO () #-}
+{-# SPECIALIZE INLINE insert :: (Eq k, Hashable k) =>
+ LinearHashTable k v -> k -> v -> IO () #-}
+{-# SPECIALIZE INLINE insert :: (Eq k, Hashable k) =>
+ CuckooHashTable k v -> k -> v -> IO () #-}
+
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in "Data.HashTable.Class#v:delete".
+delete :: (C.HashTable h, Eq k, Hashable k) =>
+ IOHashTable h k v -> k -> IO ()
+delete h k = stToIO $ C.delete h k
+{-# INLINE delete #-}
+{-# SPECIALIZE INLINE delete :: (Eq k, Hashable k) =>
+ BasicHashTable k v -> k -> IO () #-}
+{-# SPECIALIZE INLINE delete :: (Eq k, Hashable k) =>
+ LinearHashTable k v -> k -> IO () #-}
+{-# SPECIALIZE INLINE delete :: (Eq k, Hashable k) =>
+ CuckooHashTable k v -> k -> IO () #-}
+
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in "Data.HashTable.Class#v:lookup".
+lookup :: (C.HashTable h, Eq k, Hashable k) =>
+ IOHashTable h k v -> k -> IO (Maybe v)
+lookup h k = stToIO $ C.lookup h k
+{-# INLINE lookup #-}
+{-# SPECIALIZE INLINE lookup :: (Eq k, Hashable k) =>
+ BasicHashTable k v -> k -> IO (Maybe v) #-}
+{-# SPECIALIZE INLINE lookup :: (Eq k, Hashable k) =>
+ LinearHashTable k v -> k -> IO (Maybe v) #-}
+{-# SPECIALIZE INLINE lookup :: (Eq k, Hashable k) =>
+ CuckooHashTable k v -> k -> IO (Maybe v) #-}
+
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in
+-- "Data.HashTable.Class#v:fromList".
+fromList :: (C.HashTable h, Eq k, Hashable k) =>
+ [(k,v)] -> IO (IOHashTable h k v)
+fromList = stToIO . C.fromList
+{-# INLINE fromList #-}
+{-# SPECIALIZE INLINE fromList :: (Eq k, Hashable k) =>
+ [(k,v)] -> IO (BasicHashTable k v) #-}
+{-# SPECIALIZE INLINE fromList :: (Eq k, Hashable k) =>
+ [(k,v)] -> IO (LinearHashTable k v) #-}
+{-# SPECIALIZE INLINE fromList :: (Eq k, Hashable k) =>
+ [(k,v)] -> IO (CuckooHashTable k v) #-}
+
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in "Data.HashTable.Class#v:toList".
+toList :: (C.HashTable h, Eq k, Hashable k) =>
+ IOHashTable h k v -> IO [(k,v)]
+toList = stToIO . C.toList
+{-# INLINE toList #-}
+{-# SPECIALIZE INLINE toList :: (Eq k, Hashable k) =>
+ BasicHashTable k v -> IO [(k,v)] #-}
+{-# SPECIALIZE INLINE toList :: (Eq k, Hashable k) =>
+ LinearHashTable k v -> IO [(k,v)] #-}
+{-# SPECIALIZE INLINE toList :: (Eq k, Hashable k) =>
+ CuckooHashTable k v -> IO [(k,v)] #-}
+
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in "Data.HashTable.Class#v:foldM".
+foldM :: (C.HashTable h) =>
+ (a -> (k,v) -> IO a)
+ -> a
+ -> IOHashTable h k v -> IO a
+foldM f seed ht = stToIO $ C.foldM f' seed ht
+ where
+ f' !i !t = unsafeIOToST $ f i t
+{-# INLINE foldM #-}
+{-# SPECIALIZE INLINE foldM :: (a -> (k,v) -> IO a) -> a
+ -> BasicHashTable k v -> IO a #-}
+{-# SPECIALIZE INLINE foldM :: (a -> (k,v) -> IO a) -> a
+ -> LinearHashTable k v -> IO a #-}
+{-# SPECIALIZE INLINE foldM :: (a -> (k,v) -> IO a) -> a
+ -> CuckooHashTable k v -> IO a #-}
+
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in "Data.HashTable.Class#v:mapM_".
+mapM_ :: (C.HashTable h) => ((k,v) -> IO a) -> IOHashTable h k v -> IO ()
+mapM_ f ht = stToIO $ C.mapM_ f' ht
+ where
+ f' = unsafeIOToST . f
+{-# INLINE mapM_ #-}
+{-# SPECIALIZE INLINE mapM_ :: ((k,v) -> IO a) -> BasicHashTable k v
+ -> IO () #-}
+{-# SPECIALIZE INLINE mapM_ :: ((k,v) -> IO a) -> LinearHashTable k v
+ -> IO () #-}
+{-# SPECIALIZE INLINE mapM_ :: ((k,v) -> IO a) -> CuckooHashTable k v
+ -> IO () #-}
+
+
+------------------------------------------------------------------------------
+-- | See the documentation for this function in
+-- "Data.HashTable.Class#v:computeOverhead".
+computeOverhead :: (C.HashTable h) => IOHashTable h k v -> IO Double
+computeOverhead = stToIO . C.computeOverhead
+{-# INLINE computeOverhead #-}
45 src/Data/HashTable/Internal/Array.hs
@@ -0,0 +1,45 @@
+{-# LANGUAGE CPP #-}
+
+module Data.HashTable.Internal.Array
+ ( MutableArray
+ , newArray
+ , readArray
+ , writeArray
+ ) where
+
+
+import Control.Monad.ST
+#ifdef BOUNDS_CHECKING
+import qualified Data.Vector.Mutable as M
+import Data.Vector.Mutable (MVector)
+#else
+import qualified Data.Primitive.Array as M
+import Data.Primitive.Array (MutableArray)
+#endif
+
+
+#ifdef BOUNDS_CHECKING
+
+type MutableArray s a = MVector s a
+
+newArray :: Int -> a -> ST s (MutableArray s a)
+newArray = M.replicate
+
+readArray :: MutableArray s a -> Int -> ST s a
+readArray = M.read
+
+writeArray :: MutableArray s a -> Int -> a -> ST s ()
+writeArray = M.write
+
+#else
+
+newArray :: Int -> a -> ST s (MutableArray s a)
+newArray = M.newArray
+
+readArray :: MutableArray s a -> Int -> ST s a
+readArray = M.readArray
+
+writeArray :: MutableArray s a -> Int -> a -> ST s ()
+writeArray = M.writeArray
+
+#endif
843 src/Data/HashTable/Internal/CacheLine.hs
@@ -0,0 +1,843 @@
+{-# LANGUAGE BangPatterns #-}
+{-# LANGUAGE CPP #-}
+{-# LANGUAGE ForeignFunctionInterface #-}
+{-# LANGUAGE MagicHash #-}
+
+module Data.HashTable.Internal.CacheLine
+ ( cacheLineSearch
+ , cacheLineSearch2
+ , cacheLineSearch3
+ , forwardSearch2
+ , forwardSearch3
+ , isCacheLineAligned
+ , advanceByCacheLineSize
+ , prefetchRead
+ , prefetchWrite
+ , bl_abs#
+ , sign#
+ , mask#
+ , maskw#
+ ) where
+
+import Control.Monad.ST
+import Data.HashTable.Internal.IntArray (IntArray)
+import qualified Data.HashTable.Internal.IntArray as M
+
+#ifndef NO_C_SEARCH
+import Foreign.C.Types
+#else
+import Data.Bits
+import Data.Int
+import qualified Data.Vector.Unboxed as U
+import GHC.Int
+#endif
+
+import Data.HashTable.Internal.Utils
+import GHC.Exts
+
+
+{-# INLINE prefetchRead #-}
+{-# INLINE prefetchWrite #-}
+prefetchRead :: IntArray s -> Int -> ST s ()
+prefetchWrite :: IntArray s -> Int -> ST s ()
+
+#ifndef NO_C_SEARCH
+foreign import ccall unsafe "lineSearch32"
+ c_lineSearch32 :: Ptr a -> CInt -> CUInt -> IO Int
+
+foreign import ccall unsafe "lineSearch64"
+ c_lineSearch64 :: Ptr a -> CInt -> CULong -> IO Int
+
+foreign import ccall unsafe "lineSearch32_2"
+ c_lineSearch32_2 :: Ptr a -> CInt -> CUInt -> CUInt -> IO Int
+
+foreign import ccall unsafe "lineSearch64_2"
+ c_lineSearch64_2 :: Ptr a -> CInt -> CULong -> CULong -> IO Int
+
+foreign import ccall unsafe "lineSearch32_3"
+ c_lineSearch32_3 :: Ptr a -> CInt -> CUInt -> CUInt -> CUInt -> IO Int
+
+foreign import ccall unsafe "lineSearch64_3"
+ c_lineSearch64_3 :: Ptr a -> CInt -> CULong -> CULong -> CULong -> IO Int
+
+foreign import ccall unsafe "forwardSearch32_2"
+ c_forwardSearch32_2 :: Ptr a -> CInt -> CInt -> CUInt -> CUInt -> IO Int
+
+foreign import ccall unsafe "forwardSearch32_3"
+ c_forwardSearch32_3 :: Ptr a -> CInt -> CInt -> CUInt -> CUInt -> CUInt
+ -> IO Int
+
+foreign import ccall unsafe "forwardSearch64_2"
+ c_forwardSearch64_2 :: Ptr a -> CInt -> CInt -> CULong -> CULong -> IO Int
+
+foreign import ccall unsafe "forwardSearch64_3"
+ c_forwardSearch64_3 :: Ptr a -> CInt -> CInt -> CULong -> CULong -> CULong
+ -> IO Int
+
+foreign import ccall unsafe "prefetchCacheLine32_read"
+ prefetchCacheLine32_read :: Ptr a -> CInt -> IO ()
+
+foreign import ccall unsafe "prefetchCacheLine64_read"
+ prefetchCacheLine64_read :: Ptr a -> CInt -> IO ()
+
+foreign import ccall unsafe "prefetchCacheLine32_write"
+ prefetchCacheLine32_write :: Ptr a -> CInt -> IO ()
+
+foreign import ccall unsafe "prefetchCacheLine64_write"
+ prefetchCacheLine64_write :: Ptr a -> CInt -> IO ()
+
+
+fI :: (Num b, Integral a) => a -> b
+fI = fromIntegral
+
+
+prefetchRead a i = unsafeIOToST c
+ where
+ v = M.toPtr a
+ x = fI i
+ c32 = prefetchCacheLine32_read v x
+ c64 = prefetchCacheLine64_read v x
+ c = if wordSize == 32 then c32 else c64
+
+
+prefetchWrite a i = unsafeIOToST c
+ where
+ v = M.toPtr a
+ x = fI i
+ c32 = prefetchCacheLine32_write v x
+ c64 = prefetchCacheLine64_write v x
+ c = if wordSize == 32 then c32 else c64
+
+
+{-# INLINE forwardSearch2 #-}
+forwardSearch2 :: IntArray s -> Int -> Int -> Int -> Int -> ST s Int
+forwardSearch2 !vec !start !end !x1 !x2 =
+ unsafeIOToST c
+ where
+ c32 = c_forwardSearch32_2 (M.toPtr vec) (fI start) (fI end) (fI x1) (fI x2)
+ c64 = c_forwardSearch64_2 (M.toPtr vec) (fI start) (fI end) (fI x1) (fI x2)
+ c = if wordSize == 32 then c32 else c64
+
+
+{-# INLINE forwardSearch3 #-}
+forwardSearch3 :: IntArray s -> Int -> Int -> Int -> Int -> Int -> ST s Int
+forwardSearch3 !vec !start !end !x1 !x2 !x3 =
+ unsafeIOToST c
+ where
+ c32 = c_forwardSearch32_3 (M.toPtr vec) (fI start) (fI end)
+ (fI x1) (fI x2) (fI x3)
+ c64 = c_forwardSearch64_3 (M.toPtr vec) (fI start) (fI end)
+ (fI x1) (fI x2) (fI x3)
+ c = if wordSize == 32 then c32 else c64
+
+
+{-# INLINE lineSearch #-}
+lineSearch :: IntArray s -> Int -> Int -> ST s Int
+lineSearch !vec !start !value =
+ unsafeIOToST c
+ where
+ c32 = c_lineSearch32 (M.toPtr vec) (fI start) (fI value)
+ c64 = c_lineSearch64 (M.toPtr vec) (fI start) (fI value)
+ c = if wordSize == 32 then c32 else c64
+
+{-# INLINE lineSearch2 #-}
+lineSearch2 :: IntArray s -> Int -> Int -> Int -> ST s Int
+lineSearch2 !vec !start !x1 !x2 =
+ unsafeIOToST c
+ where
+ c32 = c_lineSearch32_2 (M.toPtr vec) (fI start) (fI x1) (fI x2)
+ c64 = c_lineSearch64_2 (M.toPtr vec) (fI start) (fI x1) (fI x2)
+ c = if wordSize == 32 then c32 else c64
+
+{-# INLINE lineSearch3 #-}
+lineSearch3 :: IntArray s -> Int -> Int -> Int -> Int -> ST s Int
+lineSearch3 !vec !start !x1 !x2 !x3 =
+ unsafeIOToST c
+ where
+ c32 = c_lineSearch32_3 (M.toPtr vec) (fI start) (fI x1) (fI x2) (fI x3)
+ c64 = c_lineSearch64_3 (M.toPtr vec) (fI start) (fI x1) (fI x2) (fI x3)
+ c = if wordSize == 32 then c32 else c64
+#endif
+
+{-# INLINE advanceByCacheLineSize #-}
+advanceByCacheLineSize :: Int -> Int -> Int
+advanceByCacheLineSize !(I# start0#) !(I# vecSize#) = out
+ where
+ !(I# clm#) = cacheLineIntMask
+ !clmm# = not# (int2Word# clm#)
+ !start# = word2Int# (clmm# `and#` int2Word# start0#)
+ !(I# nw#) = numWordsInCacheLine
+ !start'# = start# +# nw#
+ !s# = sign# (vecSize# -# start'# -# 1#)
+ !m# = not# (int2Word# s#)
+ !r# = int2Word# start'# `and#` m#
+ !out = I# (word2Int# r#)
+
+
+{-# INLINE isCacheLineAligned #-}
+isCacheLineAligned :: Int -> Bool
+isCacheLineAligned (I# x#) = r# ==# 0#
+ where
+ !(I# m#) = cacheLineIntMask
+ !mw# = int2Word# m#
+ !w# = int2Word# x#
+ !r# = word2Int# (mw# `and#` w#)
+
+
+{-# INLINE sign# #-}
+-- | Returns 0 if x is positive, -1 otherwise
+sign# :: Int# -> Int#
+sign# !x# = x# `uncheckedIShiftRA#` wordSizeMinus1#
+ where
+ !(I# wordSizeMinus1#) = wordSize-1
+
+
+{-# INLINE bl_abs# #-}
+-- | Abs of an integer, branchless
+bl_abs# :: Int# -> Int#
+bl_abs# !x# = word2Int# r#
+ where
+ !m# = sign# x#
+ !r# = (int2Word# (m# +# x#)) `xor#` int2Word# m#
+
+
+{-# INLINE mask# #-}
+-- | Returns 0xfff..fff (aka -1) if a# == b#, 0 otherwise.
+mask# :: Int# -> Int# -> Int#
+mask# !a# !b# = dest#
+ where
+ !d# = a# -# b#
+ !r# = bl_abs# d# -# 1#
+ !dest# = sign# r#
+
+
+{- note: this code should be:
+
+mask# :: Int# -> Int# -> Int#
+mask# !a# !b# = let !(I# z#) = fromEnum (a# ==# b#)
+ !q# = negateInt# z#
+ in q#
+
+but GHC doesn't properly optimize this as straight-line code at the moment.
+
+-}
+
+
+{-# INLINE maskw# #-}
+maskw# :: Int# -> Int# -> Word#
+maskw# !a# !b# = int2Word# (mask# a# b#)
+
+
+#ifdef NO_C_SEARCH
+prefetchRead _ _ = return ()
+prefetchWrite _ _ = return ()
+
+{-# INLINE forwardSearch2 #-}
+forwardSearch2 :: IntArray s -> Int -> Int -> Int -> Int -> ST s Int
+forwardSearch2 !vec !start !end !x1 !x2 = go start
+ where
+ next !i = let !j = i+1
+ in if j == end then 0 else j
+
+ go !i = do
+ h <- M.readArray vec i
+ if h == x1 || h == x2
+ then return i
+ else go $ next i
+
+
+{-# INLINE forwardSearch3 #-}
+forwardSearch3 :: IntArray s -> Int -> Int -> Int -> Int -> Int -> ST s Int
+forwardSearch3 !vec !start !end !x1 !x2 !x3 = go start
+ where
+ next !i = let !j = i+1
+ in if j == end then 0 else j
+
+ go !i = do
+ h <- M.readArray vec i
+ if h == x1 || h == x2 || h == x3
+ then return i
+ else go $ next i
+
+
+deBruijnBitPositions :: U.Vector Int8
+deBruijnBitPositions =
+ U.fromList [
+ 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
+ 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
+ ]
+
+
+{-# INLINE firstBitSet# #-}
+-- only works with 32-bit values -- ok for us here
+firstBitSet# :: Int# -> Int#
+firstBitSet# i# = word2Int# ((or# zeroCase# posw#))
+ where
+ !zeroCase# = int2Word# (mask# 0# i#)
+ !w# = int2Word# i#
+ !iLowest# = word2Int# (and# w# (int2Word# (negateInt# i#)))
+ !idxW# = uncheckedShiftRL#
+ (narrow32Word# (timesWord# (int2Word# iLowest#)
+ (int2Word# 0x077CB531#)))
+ 27#
+ !idx = I# (word2Int# idxW#)
+ !(I8# pos8#) = U.unsafeIndex deBruijnBitPositions idx
+ !posw# = int2Word# pos8#
+
+#endif
+
+
+------------------------------------------------------------------------------
+-- | Search through a mutable vector for a given int value, cache-line aligned.
+-- If the start index is cache-line aligned, and there is more than a
+-- cache-line's room between the start index and the end of the vector, we will
+-- search the cache-line all at once using an efficient branchless
+-- bit-twiddling technique. Otherwise, we will use a typical loop.
+--
+cacheLineSearch :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+cacheLineSearch !vec !start !value = do
+#ifdef NO_C_SEARCH
+ let !vlen = M.length vec
+ let !st1 = vlen - start
+ let !nvlen = numWordsInCacheLine - st1
+ let adv = (start + cacheLineIntMask) .&. complement cacheLineIntMask
+ let st2 = adv - start
+
+
+ if nvlen > 0 || not (isCacheLineAligned start)
+ then naiveSearch vec start (min st1 st2) value
+ else lineSearch vec start value
+#else
+ lineSearch vec start value
+#endif
+{-# INLINE cacheLineSearch #-}
+
+
+#ifdef NO_C_SEARCH
+-- | Search through a mutable vector for a given int value. The number of
+-- things to search for must be at most the number of things remaining in the
+-- vector.
+naiveSearch :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ number of things to search
+ -> Int -- ^ value to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+naiveSearch !vec !start !nThings !value = go start
+ where
+ !doneIdx = start + nThings
+
+ go !i | i >= doneIdx = return (-1)
+ | otherwise = do
+ x <- M.readArray vec i
+ if x == value then return i else go (i+1)
+{-# INLINE naiveSearch #-}
+
+
+lineResult# :: Word# -- ^ mask
+ -> Int -- ^ start value
+ -> Int
+lineResult# bitmask# (I# start#) = I# (word2Int# rv#)
+ where
+ !p# = firstBitSet# (word2Int# bitmask#)
+ !mm# = maskw# p# (-1#)
+ !nmm# = not# mm#
+ !rv# = mm# `or#` (nmm# `and#` (int2Word# (start# +# p#)))
+{-# INLINE lineResult# #-}
+
+
+lineSearch :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+lineSearch | wordSize == 32 = lineSearch32
+ | otherwise = lineSearch64
+{-# INLINE lineSearch #-}
+
+
+lineSearch64 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+lineSearch64 !vec !start !(I# v#) = do
+ (I# x1#) <- M.readArray vec $! start + 0
+ let !p1# = maskw# x1# v# `and#` int2Word# 0x1#
+
+ (I# x2#) <- M.readArray vec $! start + 1
+ let !p2# = p1# `or#` (maskw# x2# v# `and#` int2Word# 0x2#)
+
+ (I# x3#) <- M.readArray vec $! start + 2
+ let !p3# = p2# `or#` (maskw# x3# v# `and#` int2Word# 0x4#)
+
+ (I# x4#) <- M.readArray vec $! start + 3
+ let !p4# = p3# `or#` (maskw# x4# v# `and#` int2Word# 0x8#)
+
+ (I# x5#) <- M.readArray vec $! start + 4
+ let !p5# = p4# `or#` (maskw# x5# v# `and#` int2Word# 0x10#)
+
+ (I# x6#) <- M.readArray vec $! start + 5
+ let !p6# = p5# `or#` (maskw# x6# v# `and#` int2Word# 0x20#)
+
+ (I# x7#) <- M.readArray vec $! start + 6
+ let !p7# = p6# `or#` (maskw# x7# v# `and#` int2Word# 0x40#)
+
+ (I# x8#) <- M.readArray vec $! start + 7
+ let !p8# = p7# `or#` (maskw# x8# v# `and#` int2Word# 0x80#)
+
+ return $! lineResult# p8# start
+{-# INLINE lineSearch64 #-}
+
+
+
+lineSearch32 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+lineSearch32 !vec !start !(I# v#) = do
+ (I# x1#) <- M.readArray vec $! start + 0
+ let !p1# = maskw# x1# v# `and#` int2Word# 0x1#
+
+ (I# x2#) <- M.readArray vec $! start + 1
+ let !p2# = p1# `or#` (maskw# x2# v# `and#` int2Word# 0x2#)
+
+ (I# x3#) <- M.readArray vec $! start + 2
+ let !p3# = p2# `or#` (maskw# x3# v# `and#` int2Word# 0x4#)
+
+ (I# x4#) <- M.readArray vec $! start + 3
+ let !p4# = p3# `or#` (maskw# x4# v# `and#` int2Word# 0x8#)
+
+ (I# x5#) <- M.readArray vec $! start + 4
+ let !p5# = p4# `or#` (maskw# x5# v# `and#` int2Word# 0x10#)
+
+ (I# x6#) <- M.readArray vec $! start + 5
+ let !p6# = p5# `or#` (maskw# x6# v# `and#` int2Word# 0x20#)
+
+ (I# x7#) <- M.readArray vec $! start + 6
+ let !p7# = p6# `or#` (maskw# x7# v# `and#` int2Word# 0x40#)
+
+ (I# x8#) <- M.readArray vec $! start + 7
+ let !p8# = p7# `or#` (maskw# x8# v# `and#` int2Word# 0x80#)
+
+ (I# x9#) <- M.readArray vec $! start + 8
+ let !p9# = p8# `or#` (maskw# x9# v# `and#` int2Word# 0x100#)
+
+ (I# x10#) <- M.readArray vec $! start + 9
+ let !p10# = p9# `or#` (maskw# x10# v# `and#` int2Word# 0x200#)
+
+ (I# x11#) <- M.readArray vec $! start + 10
+ let !p11# = p10# `or#` (maskw# x11# v# `and#` int2Word# 0x400#)
+
+ (I# x12#) <- M.readArray vec $! start + 11
+ let !p12# = p11# `or#` (maskw# x12# v# `and#` int2Word# 0x800#)
+
+ (I# x13#) <- M.readArray vec $! start + 12
+ let !p13# = p12# `or#` (maskw# x13# v# `and#` int2Word# 0x1000#)
+
+ (I# x14#) <- M.readArray vec $! start + 13
+ let !p14# = p13# `or#` (maskw# x14# v# `and#` int2Word# 0x2000#)
+
+ (I# x15#) <- M.readArray vec $! start + 14
+ let !p15# = p14# `or#` (maskw# x15# v# `and#` int2Word# 0x4000#)
+
+ (I# x16#) <- M.readArray vec $! start + 15
+ let !p16# = p15# `or#` (maskw# x16# v# `and#` int2Word# 0x8000#)
+
+ return $! lineResult# p16# start
+{-# INLINE lineSearch32 #-}
+
+#endif
+
+------------------------------------------------------------------------------
+-- | Search through a mutable vector for one of two given int values,
+-- cache-line aligned. If the start index is cache-line aligned, and there is
+-- more than a cache-line's room between the start index and the end of the
+-- vector, we will search the cache-line all at once using an efficient
+-- branchless bit-twiddling technique. Otherwise, we will use a typical loop.
+--
+cacheLineSearch2 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+cacheLineSearch2 !vec !start !value !value2 = do
+#ifdef NO_C_SEARCH
+ let !vlen = M.length vec
+ let !st1 = vlen - start
+ let !nvlen = numWordsInCacheLine - st1
+ let adv = (start + cacheLineIntMask) .&. complement cacheLineIntMask
+ let st2 = adv - start
+
+ if nvlen > 0 || not (isCacheLineAligned start)
+ then naiveSearch2 vec start (min st1 st2) value value2
+ else lineSearch2 vec start value value2
+#else
+ lineSearch2 vec start value value2
+#endif
+{-# INLINE cacheLineSearch2 #-}
+
+
+#ifdef NO_C_SEARCH
+
+naiveSearch2 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ number of things to search
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+naiveSearch2 !vec !start !nThings !value1 !value2 = go start
+ where
+ !doneIdx = start + nThings
+
+ go !i | i >= doneIdx = return (-1)
+ | otherwise = do
+ x <- M.readArray vec i
+ if x == value1 || x == value2 then return i else go (i+1)
+{-# INLINE naiveSearch2 #-}
+
+
+lineSearch2 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+lineSearch2 | wordSize == 32 = lineSearch32_2
+ | otherwise = lineSearch64_2
+
+
+
+lineSearch64_2 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+lineSearch64_2 !vec !start !(I# v#) !(I# v2#) = do
+ (I# x1#) <- M.readArray vec $! start + 0
+ let !p1# = (maskw# x1# v# `or#` maskw# x1# v2#) `and#` int2Word# 0x1#
+
+ (I# x2#) <- M.readArray vec $! start + 1
+ let !p2# = p1# `or#` ((maskw# x2# v# `or#` maskw# x2# v2#)
+ `and#` int2Word# 0x2#)
+
+ (I# x3#) <- M.readArray vec $! start + 2
+ let !p3# = p2# `or#` ((maskw# x3# v# `or#` maskw# x3# v2#)
+ `and#` int2Word# 0x4#)
+
+ (I# x4#) <- M.readArray vec $! start + 3
+ let !p4# = p3# `or#` ((maskw# x4# v# `or#` maskw# x4# v2#)
+ `and#` int2Word# 0x8#)
+
+ (I# x5#) <- M.readArray vec $! start + 4
+ let !p5# = p4# `or#` ((maskw# x5# v# `or#` maskw# x5# v2#)
+ `and#` int2Word# 0x10#)
+
+ (I# x6#) <- M.readArray vec $! start + 5
+ let !p6# = p5# `or#` ((maskw# x6# v# `or#` maskw# x6# v2#)
+ `and#` int2Word# 0x20#)
+
+ (I# x7#) <- M.readArray vec $! start + 6
+ let !p7# = p6# `or#` ((maskw# x7# v# `or#` maskw# x7# v2#)
+ `and#` int2Word# 0x40#)
+
+ (I# x8#) <- M.readArray vec $! start + 7
+ let !p8# = p7# `or#` ((maskw# x8# v# `or#` maskw# x8# v2#)
+ `and#` int2Word# 0x80#)
+
+ return $! lineResult# p8# start
+{-# INLINE lineSearch64_2 #-}
+
+
+lineSearch32_2 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+lineSearch32_2 !vec !start !(I# v#) !(I# v2#) = do
+ (I# x1#) <- M.readArray vec $! start + 0
+ let !p1# = (maskw# x1# v# `or#` maskw# x1# v2#) `and#` int2Word# 0x1#
+
+ (I# x2#) <- M.readArray vec $! start + 1
+ let !p2# = p1# `or#` ((maskw# x2# v# `or#` maskw# x2# v2#)
+ `and#` int2Word# 0x2#)
+
+ (I# x3#) <- M.readArray vec $! start + 2
+ let !p3# = p2# `or#` ((maskw# x3# v# `or#` maskw# x3# v2#)
+ `and#` int2Word# 0x4#)
+
+ (I# x4#) <- M.readArray vec $! start + 3
+ let !p4# = p3# `or#` ((maskw# x4# v# `or#` maskw# x4# v2#)
+ `and#` int2Word# 0x8#)
+
+ (I# x5#) <- M.readArray vec $! start + 4
+ let !p5# = p4# `or#` ((maskw# x5# v# `or#` maskw# x5# v2#)
+ `and#` int2Word# 0x10#)
+
+ (I# x6#) <- M.readArray vec $! start + 5
+ let !p6# = p5# `or#` ((maskw# x6# v# `or#` maskw# x6# v2#)
+ `and#` int2Word# 0x20#)
+
+ (I# x7#) <- M.readArray vec $! start + 6
+ let !p7# = p6# `or#` ((maskw# x7# v# `or#` maskw# x7# v2#)
+ `and#` int2Word# 0x40#)
+
+ (I# x8#) <- M.readArray vec $! start + 7
+ let !p8# = p7# `or#` ((maskw# x8# v# `or#` maskw# x8# v2#)
+ `and#` int2Word# 0x80#)
+
+ (I# x9#) <- M.readArray vec $! start + 8
+ let !p9# = p8# `or#` ((maskw# x9# v# `or#` maskw# x9# v2#)
+ `and#` int2Word# 0x100#)
+
+ (I# x10#) <- M.readArray vec $! start + 9
+ let !p10# = p9# `or#` ((maskw# x10# v# `or#` maskw# x10# v2#)
+ `and#` int2Word# 0x200#)
+
+ (I# x11#) <- M.readArray vec $! start + 10
+ let !p11# = p10# `or#` ((maskw# x11# v# `or#` maskw# x11# v2#)
+ `and#` int2Word# 0x400#)
+
+ (I# x12#) <- M.readArray vec $! start + 11
+ let !p12# = p11# `or#` ((maskw# x12# v# `or#` maskw# x12# v2#)
+ `and#` int2Word# 0x800#)
+
+ (I# x13#) <- M.readArray vec $! start + 12
+ let !p13# = p12# `or#` ((maskw# x13# v# `or#` maskw# x13# v2#)
+ `and#` int2Word# 0x1000#)
+
+ (I# x14#) <- M.readArray vec $! start + 13
+ let !p14# = p13# `or#` ((maskw# x14# v# `or#` maskw# x14# v2#)
+ `and#` int2Word# 0x2000#)
+
+ (I# x15#) <- M.readArray vec $! start + 14
+ let !p15# = p14# `or#` ((maskw# x15# v# `or#` maskw# x15# v2#)
+ `and#` int2Word# 0x4000#)
+
+ (I# x16#) <- M.readArray vec $! start + 15
+ let !p16# = p15# `or#` ((maskw# x16# v# `or#` maskw# x16# v2#)
+ `and#` int2Word# 0x8000#)
+
+ return $! lineResult# p16# start
+{-# INLINE lineSearch32_2 #-}
+
+#endif
+
+
+------------------------------------------------------------------------------
+-- | Search through a mutable vector for one of three given int values,
+-- cache-line aligned. If the start index is cache-line aligned, and there is
+-- more than a cache-line's room between the start index and the end of the
+-- vector, we will search the cache-line all at once using an efficient
+-- branchless bit-twiddling technique. Otherwise, we will use a typical loop.
+--
+cacheLineSearch3 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> Int -- ^ value 3 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+cacheLineSearch3 !vec !start !value !value2 !value3 = do
+#ifdef NO_C_SEARCH
+ let !vlen = M.length vec
+ let !st1 = vlen - start
+ let !nvlen = numWordsInCacheLine - st1
+ let adv = (start + cacheLineIntMask) .&. complement cacheLineIntMask
+ let st2 = adv - start
+
+ if nvlen > 0 || not (isCacheLineAligned start)
+ then naiveSearch3 vec start (min st1 st2) value value2 value3
+ else lineSearch3 vec start value value2 value3
+#else
+ lineSearch3 vec start value value2 value3
+#endif
+{-# INLINE cacheLineSearch3 #-}
+
+
+#ifdef NO_C_SEARCH
+
+naiveSearch3 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ number of things to search
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> Int -- ^ value 3 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+naiveSearch3 !vec !start !nThings !value1 !value2 !value3 = go start
+ where
+ !doneIdx = start + nThings
+
+ go !i | i >= doneIdx = return (-1)
+ | otherwise = do
+ x <- M.readArray vec i
+ if x == value1 || x == value2 || x == value3
+ then return i
+ else go (i+1)
+{-# INLINE naiveSearch3 #-}
+
+
+lineSearch3 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> Int -- ^ value 3 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+lineSearch3 | wordSize == 32 = lineSearch32_3
+ | otherwise = lineSearch64_3
+
+
+
+lineSearch64_3 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> Int -- ^ value 3 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+lineSearch64_3 !vec !start !(I# v#) !(I# v2#) !(I# v3#) = do
+ (I# x1#) <- M.readArray vec $! start + 0
+ let !p1# = (maskw# x1# v# `or#` maskw# x1# v2# `or#` maskw# x1# v3#)
+ `and#` int2Word# 0x1#
+
+ (I# x2#) <- M.readArray vec $! start + 1
+ let !p2# = p1# `or#`
+ ((maskw# x2# v# `or#` maskw# x2# v2# `or#` maskw# x2# v3#)
+ `and#` int2Word# 0x2#)
+
+ (I# x3#) <- M.readArray vec $! start + 2
+ let !p3# = p2# `or#`
+ ((maskw# x3# v# `or#` maskw# x3# v2# `or#` maskw# x3# v3#)
+ `and#` int2Word# 0x4#)
+
+ (I# x4#) <- M.readArray vec $! start + 3
+ let !p4# = p3# `or#`
+ ((maskw# x4# v# `or#` maskw# x4# v2# `or#` maskw# x4# v3#)
+ `and#` int2Word# 0x8#)
+
+ (I# x5#) <- M.readArray vec $! start + 4
+ let !p5# = p4# `or#`
+ ((maskw# x5# v# `or#` maskw# x5# v2# `or#` maskw# x5# v3#)
+ `and#` int2Word# 0x10#)
+
+ (I# x6#) <- M.readArray vec $! start + 5
+ let !p6# = p5# `or#`
+ ((maskw# x6# v# `or#` maskw# x6# v2# `or#` maskw# x6# v3#)
+ `and#` int2Word# 0x20#)
+
+ (I# x7#) <- M.readArray vec $! start + 6
+ let !p7# = p6# `or#`
+ ((maskw# x7# v# `or#` maskw# x7# v2# `or#` maskw# x7# v3#)
+ `and#` int2Word# 0x40#)
+
+ (I# x8#) <- M.readArray vec $! start + 7
+ let !p8# = p7# `or#`
+ ((maskw# x8# v# `or#` maskw# x8# v2# `or#` maskw# x8# v3#)
+ `and#` int2Word# 0x80#)
+
+ return $! lineResult# p8# start
+{-# INLINE lineSearch64_3 #-}
+
+
+lineSearch32_3 :: IntArray s -- ^ vector to search
+ -> Int -- ^ start index
+ -> Int -- ^ value to search for
+ -> Int -- ^ value 2 to search for
+ -> Int -- ^ value 3 to search for
+ -> ST s Int -- ^ dest index where it can be found, or
+ -- \"-1\" if not found
+lineSearch32_3 !vec !start !(I# v#) !(I# v2#) !(I# v3#) = do
+ (I# x1#) <- M.readArray vec $! start + 0
+ let !p1# = (maskw# x1# v# `or#` maskw# x1# v2# `or#` maskw# x1# v3#)
+ `and#` int2Word# 0x1#
+
+ (I# x2#) <- M.readArray vec $! start + 1
+ let !p2# = p1# `or#`
+ ((maskw# x2# v# `or#` maskw# x2# v2# `or#` maskw# x2# v3#)
+ `and#` int2Word# 0x2#)
+
+ (I# x3#) <- M.readArray vec $! start + 2
+ let !p3# = p2# `or#`
+ ((maskw# x3# v# `or#` maskw# x3# v2# `or#` maskw# x3# v3#)
+ `and#` int2Word# 0x4#)
+
+ (I# x4#) <- M.readArray vec $! start + 3
+ let !p4# = p3# `or#`
+ ((maskw# x4# v# `or#` maskw# x4# v2# `or#` maskw# x4# v3#)
+ `and#` int2Word# 0x8#)
+
+ (I# x5#) <- M.readArray vec $! start + 4
+ let !p5# = p4# `or#`
+ ((maskw# x5# v# `or#` maskw# x5# v2# `or#` maskw# x5# v3#)
+ `and#` int2Word# 0x10#)
+
+ (I# x6#) <- M.readArray vec $! start + 5
+ let !p6# = p5# `or#`
+ ((maskw# x6# v# `or#` maskw# x6# v2# `or#` maskw# x6# v3#)
+ `and#` int2Word# 0x20#)
+
+ (I# x7#) <- M.readArray vec $! start + 6
+ let !p7# = p6# `or#`
+ ((maskw# x7# v# `or#` maskw# x7# v2# `or#` maskw# x7# v3#)
+ `and#` int2Word# 0x40#)
+
+ (I# x8#) <- M.readArray vec $! start + 7
+ let !p8# = p7# `or#`
+ ((maskw# x8# v# `or#` maskw# x8# v2# `or#` maskw# x8# v3#)
+ `and#` int2Word# 0x80#)
+
+ (I# x9#) <- M.readArray vec $! start + 8
+ let !p9# = p8# `or#`
+ ((maskw# x9# v# `or#` maskw# x9# v2# `or#` maskw# x9# v3#)
+ `and#` int2Word# 0x100#)
+
+ (I# x10#) <- M.readArray vec $! start + 9
+ let !p10# = p9# `or#`
+ ((maskw# x10# v# `or#` maskw# x10# v2# `or#` maskw# x10# v3#)
+ `and#` int2Word# 0x200#)
+
+ (I# x11#) <- M.readArray vec $! start + 10
+ let !p11# = p10# `or#`
+ ((maskw# x11# v# `or#` maskw# x11# v2# `or#` maskw# x11# v3#)
+ `and#` int2Word# 0x400#)
+
+ (I# x12#) <- M.readArray vec $! start + 11
+ let !p12# = p11# `or#`
+ ((maskw# x12# v# `or#` maskw# x12# v2# `or#` maskw# x12# v3#)
+ `and#` int2Word# 0x800#)
+
+ (I# x13#) <- M.readArray vec $! start + 12
+ let !p13# = p12# `or#`
+ ((maskw# x13# v# `or#` maskw# x13# v2# `or#` maskw# x13# v3#)
+ `and#` int2Word# 0x1000#)
+
+ (I# x14#) <- M.readArray vec $! start + 13
+ let !p14# = p13# `or#`
+ ((maskw# x14# v# `or#` maskw# x14# v2# `or#` maskw# x14# v3#)
+ `and#` int2Word# 0x2000#)
+
+ (I# x15#) <- M.readArray vec $! start + 14
+ let !p15# = p14# `or#`
+ ((maskw# x15# v# `or#` maskw# x15# v2# `or#` maskw# x15# v3#)
+ `and#` int2Word# 0x4000#)
+
+ (I# x16#) <- M.readArray vec $! start + 15
+ let !p16# = p15# `or#`
+ ((maskw# x16# v# `or#` maskw# x16# v2# `or#` maskw# x16# v3#)
+ `and#` int2Word# 0x8000#)
+
+ return $! lineResult# p16# start
+{-# INLINE lineSearch32_3 #-}
+
+#endif
122 src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs
@@ -0,0 +1,122 @@
+{-# LANGUAGE BangPatterns #-}
+
+module Data.HashTable.Internal.CheapPseudoRandomBitStream
+ ( BitStream
+ , newBitStream
+ , getNextBit
+ , getNBits
+ ) where
+
+import Control.Applicative
+import Control.Monad.ST
+import Data.Bits
+import Data.Int
+import Data.STRef
+import qualified Data.Vector.Unboxed as V
+import Data.Vector.Unboxed (Vector)
+
+import Data.HashTable.Internal.Utils
+
+
+------------------------------------------------------------------------------
+-- Chosen by fair dice roll. Guaranteed random. More importantly, there are an
+-- equal number of 0 and 1 bits in both of these vectors.
+random32s :: Vector Int32
+random32s = V.fromList [ 0xe293c315
+ , 0x82e2ff62
+ , 0xcb1ef9ae
+ , 0x78850172
+ , 0x551ee1ce
+ , 0x59d6bfd1
+ , 0xb717ec44
+ , 0xe7a3024e
+ , 0x02bb8976
+ , 0x87e2f94f
+ , 0xfa156372
+ , 0xe1325b17
+ , 0xe005642a
+ , 0xc8d02eb3
+ , 0xe90c0a87
+ , 0x4cb9e6e2
+ ]
+
+
+------------------------------------------------------------------------------
+random64s :: Vector Int64
+random64s = V.fromList [ 0x62ef447e007e8732
+ , 0x149d6acb499feef8
+ , 0xca7725f9b404fbf8
+ , 0x4b5dfad194e626a9
+ , 0x6d76f2868359491b
+ , 0x6b2284e3645dcc87
+ , 0x5b89b485013eaa16
+ , 0x6e2d4308250c435b
+ , 0xc31e641a659e0013
+ , 0xe237b85e9dc7276d
+ , 0x0b3bb7fa40d94f3f
+ , 0x4da446874d4ca023
+ , 0x69240623fedbd26b
+ , 0x76fb6810dcf894d3
+ , 0xa0da4e0ce57c8ea7
+ , 0xeb76b84453dc3873
+ ]
+
+
+------------------------------------------------------------------------------
+numRandoms :: Int
+numRandoms = 16
+
+
+------------------------------------------------------------------------------
+randoms :: Vector Int
+randoms | wordSize == 32 = V.map fromEnum random32s
+ | otherwise = V.map fromEnum random64s
+
+
+------------------------------------------------------------------------------
+data BitStream s = BitStream {
+ _curRandom :: !(STRef s Int)
+ , _bitsLeft :: !(STRef s Int)
+ , _randomPos :: !(STRef s Int)
+ }
+
+
+------------------------------------------------------------------------------
+newBitStream :: ST s (BitStream s)
+newBitStream =
+ unwrapMonad $
+ BitStream <$> (WrapMonad $ newSTRef $ V.unsafeIndex randoms 0)
+ <*> (WrapMonad $ newSTRef wordSize)
+ <*> (WrapMonad $ newSTRef 1)
+
+
+------------------------------------------------------------------------------
+getNextBit :: BitStream s -> ST s Int
+getNextBit = getNBits 1
+
+
+------------------------------------------------------------------------------
+getNBits :: Int -> BitStream s -> ST s Int
+getNBits nbits (BitStream crRef blRef rpRef) = do
+ !bl <- readSTRef blRef
+ if bl < nbits
+ then newWord
+ else nextBits bl
+
+ where
+ newWord = do
+ !rp <- readSTRef rpRef
+ let r = V.unsafeIndex randoms rp
+ writeSTRef blRef $! wordSize - nbits
+ writeSTRef rpRef $! if rp == (numRandoms-1) then 0 else rp + 1
+ extractBits r
+
+ extractBits r = do
+ let !b = r .&. ((1 `iShiftL` nbits) - 1)
+ writeSTRef crRef $! (r `iShiftRL` nbits)
+ return b
+
+ nextBits bl = do
+ !r <- readSTRef crRef
+ writeSTRef blRef $! bl - nbits
+ extractBits r
74 src/Data/HashTable/Internal/IntArray.hs
@@ -0,0 +1,74 @@
+{-# LANGUAGE BangPatterns #-}
+{-# LANGUAGE CPP #-}
+{-# LANGUAGE MagicHash #-}
+
+module Data.HashTable.Internal.IntArray
+ ( IntArray
+ , newArray
+ , readArray
+ , writeArray
+ , length
+ , toPtr
+ ) where
+
+import Control.Monad.ST
+import Data.Bits
+import qualified Data.Primitive.ByteArray as A
+import Data.Primitive.Types (Addr(..))
+import GHC.Exts
+import Prelude hiding (length)
+
+#ifdef BOUNDS_CHECKING
+#define BOUNDS_MSG(sz,i) concat [ "[", __FILE__, ":", \
+ show (__LINE__ :: Int), \
+ "] bounds check exceeded: ",\
+ "size was ", show (sz), " i was ", show (i) ]
+#define BOUNDS_CHECK(arr,i) let sz = (A.sizeofMutableByteArray (arr) \
+ `div` wordSizeInBytes) in \
+ if (i) < 0 || (i) >= sz \
+ then error (BOUNDS_MSG(sz,(i))) \
+ else return ()
+#else
+#define BOUNDS_CHECK(arr,i)
+#endif
+
+newtype IntArray s = IA (A.MutableByteArray s)
+
+
+wordSizeInBytes :: Int
+wordSizeInBytes = bitSize (0::Int) `div` 8
+
+
+-- | Cache line size, in bytes
+cacheLineSize :: Int
+cacheLineSize = 64
+
+
+newArray :: Int -> ST s (IntArray s)
+newArray n = do
+ let !sz = n * wordSizeInBytes
+ !arr <- A.newAlignedPinnedByteArray sz cacheLineSize
+ A.memsetByteArray arr 0 0 sz
+ return $! IA arr
+
+
+readArray :: IntArray s -> Int -> ST s Int
+readArray (IA a) idx = do
+ BOUNDS_CHECK(a,idx)
+ A.readByteArray a idx
+
+
+writeArray :: IntArray s -> Int -> Int -> ST s ()
+writeArray (IA a) idx val = do
+ BOUNDS_CHECK(a,idx)
+ A.writeByteArray a idx val
+
+
+length :: IntArray s -> Int
+length (IA a) = A.sizeofMutableByteArray a `div` wordSizeInBytes
+
+
+toPtr :: IntArray s -> Ptr a
+toPtr (IA a) = Ptr a#
+ where
+ !(Addr !a#) = A.mutableByteArrayContents a
355 src/Data/HashTable/Internal/Linear/Bucket.hs
@@ -0,0 +1,355 @@
+{-# LANGUAGE BangPatterns #-}
+{-# LANGUAGE CPP #-}
+
+module Data.HashTable.Internal.Linear.Bucket
+( Bucket,
+ newBucketArray,
+ newBucketSize,
+ emptyWithSize,
+ growBucketTo,
+ snoc,
+ size,
+ lookup,
+ delete,
+ toList,
+ fromList,
+ mapM_,
+ foldM,
+ expandBucketArray,
+ expandArray,
+ nelemsAndOverheadInWords,
+ bucketSplitSize
+) where
+
+
+------------------------------------------------------------------------------
+import qualified Control.Monad
+import Control.Monad hiding (mapM_, foldM)
+import Control.Monad.ST
+import Data.Maybe (fromMaybe)
+import Data.HashTable.Internal.Array
+import Data.STRef
+import Prelude hiding (lookup, mapM_)
+------------------------------------------------------------------------------
+import Data.HashTable.Internal.UnsafeTricks
+
+
+#ifdef DEBUG
+import System.IO
+#endif
+
+
+type Bucket s k v = Key (Bucket_ s k v)
+
+------------------------------------------------------------------------------
+data Bucket_ s k v = Bucket { _bucketSize :: {-# UNPACK #-} !Int
+ , _highwater :: {-# UNPACK #-} !(STRef s Int)
+ , _keys :: {-# UNPACK #-} !(MutableArray s k)
+ , _values :: {-# UNPACK #-} !(MutableArray s v)
+ }
+
+
+------------------------------------------------------------------------------
+bucketSplitSize :: Int
+bucketSplitSize = 16
+
+
+------------------------------------------------------------------------------
+newBucketArray :: Int -> ST s (MutableArray s (Bucket s k v))
+newBucketArray k = newArray k emptyRecord
+
+------------------------------------------------------------------------------
+nelemsAndOverheadInWords :: Bucket s k v -> ST s (Int,Int)
+nelemsAndOverheadInWords bKey = do
+ if (not $ keyIsEmpty bKey)
+ then do
+ !hw <- readSTRef hwRef
+ let !w = sz - hw
+ return (hw, constOverhead + 2*w)
+ else
+ return (0, 0)
+
+ where
+ constOverhead = 8
+ b = fromKey bKey
+ sz = _bucketSize b
+ hwRef = _highwater b
+
+
+------------------------------------------------------------------------------
+emptyWithSize :: Int -> ST s (Bucket s k v)
+emptyWithSize !sz = do
+ !keys <- newArray sz undefined
+ !values <- newArray sz undefined
+ !ref <- newSTRef 0
+
+ return $ toKey $ Bucket sz ref keys values
+
+
+------------------------------------------------------------------------------
+newBucketSize :: Int
+newBucketSize = 4
+
+
+------------------------------------------------------------------------------
+expandArray :: a -- ^ default value
+ -> Int -- ^ new size
+ -> Int -- ^ number of elements to copy
+ -> MutableArray s a -- ^ old array
+ -> ST s (MutableArray s a)
+expandArray def !sz !hw !arr = do
+ newArr <- newArray sz def
+ cp newArr
+
+ where
+ cp !newArr = go 0
+ where
+ go !i
+ | i >= hw = return newArr
+ | otherwise = do
+ readArray arr i >>= writeArray newArr i
+ go (i+1)
+
+
+------------------------------------------------------------------------------
+expandBucketArray :: Int
+ -> Int
+ -> MutableArray s (Bucket s k v)
+ -> ST s (MutableArray s (Bucket s k v))
+expandBucketArray = expandArray emptyRecord
+
+
+------------------------------------------------------------------------------
+growBucketTo :: Int -> Bucket s k v -> ST s (Bucket s k v)
+growBucketTo !sz bk | keyIsEmpty bk = emptyWithSize sz
+ | otherwise = do
+ if osz >= sz
+ then return bk
+ else do
+ hw <- readSTRef hwRef
+ k' <- expandArray undefined sz hw keys
+ v' <- expandArray undefined sz hw values
+ return $ toKey $ Bucket sz hwRef k' v'
+
+ where
+ bucket = fromKey bk
+ osz = _bucketSize bucket
+ hwRef = _highwater bucket
+ keys = _keys bucket
+ values = _values bucket
+
+
+------------------------------------------------------------------------------
+{-# INLINE snoc #-}
+-- Just return == new bucket object
+snoc :: Bucket s k v -> k -> v -> ST s (Int, Maybe (Bucket s k v))
+snoc bucket | keyIsEmpty bucket = mkNew
+ | otherwise = snoc' (fromKey bucket)
+ where
+ mkNew !k !v = do
+ debug "Bucket.snoc: mkNew"
+ keys <- newArray newBucketSize undefined
+ values <- newArray newBucketSize undefined
+
+ writeArray keys 0 k
+ writeArray values 0 v
+ ref <- newSTRef 1
+ return (1, Just $ toKey $ Bucket newBucketSize ref keys values)
+
+ snoc' (Bucket bsz hwRef keys values) !k !v =
+ readSTRef hwRef >>= check
+ where
+ check !hw
+ | hw < bsz = bump hw
+ | otherwise = spill hw
+
+ bump hw = do
+ debug $ "Bucket.snoc: bumping hw, bsz=" ++ show bsz ++ ", hw="
+ ++ show hw
+
+ writeArray keys hw k
+ writeArray values hw v
+ let !hw' = hw + 1
+ writeSTRef hwRef hw'
+ debug "Bucket.snoc: finished"
+ return (hw', Nothing)
+
+ doublingThreshold = bucketSplitSize `div` 2
+ growFactor = 1.5 :: Double
+ newSize z | z == 0 = newBucketSize
+ | z < doublingThreshold = z * 2
+ | otherwise = ceiling $ growFactor * fromIntegral z
+
+ spill !hw = do
+ let sz = newSize bsz
+ debug $ "Bucket.snoc: spilling, old size=" ++ show bsz ++ ", new size="
+ ++ show sz
+
+ bk <- growBucketTo sz bucket
+
+ debug "Bucket.snoc: spill finished, snoccing element"
+ let (Bucket _ hwRef' keys' values') = fromKey bk
+
+ let !hw' = hw+1
+ writeArray keys' hw k
+ writeArray values' hw v
+ writeSTRef hwRef' hw'
+
+ return (hw', Just bk)
+
+
+
+------------------------------------------------------------------------------
+{-# INLINE size #-}
+size :: Bucket s k v -> ST s Int
+size b | keyIsEmpty b = return 0
+ | otherwise = readSTRef $ _highwater $ fromKey b
+
+
+------------------------------------------------------------------------------
+-- note: search in reverse order! We prefer recently snoc'd keys.
+lookup :: (Eq k) => Bucket s k v -> k -> ST s (Maybe v)
+lookup bucketKey !k | keyIsEmpty bucketKey = return Nothing
+ | otherwise = lookup' $ fromKey bucketKey
+ where
+ lookup' (Bucket _ hwRef keys values) = do
+ hw <- readSTRef hwRef
+ go (hw-1)
+ where
+ go !i
+ | i < 0 = return Nothing
+ | otherwise = do
+ k' <- readArray keys i
+ if k == k'
+ then do
+ !v <- readArray values i
+ return $! Just v
+ else go (i-1)
+
+
+------------------------------------------------------------------------------
+{-#