Add a high-water-mark indicator to hash-table kv vectors
The high-water mark (HWM) lets many operations run in O(high-water-mark) time,
which is usually the same as O(count) and never more than O(size).
In particular, avoid scanning beyond the mark in MAPHASH, CLRHASH, rehash,
and the garbage collector's scavenging routines.
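
To illustrate the bounded scan, here is a minimal toy sketch, not SBCL's implementation. It assumes only the layout this commit describes: two overhead cells at the front, SIZE k/v pairs, and the high-water mark in the last element. The names MAKE-TOY-PAIRS, TOY-HWM, TOY-PUSH-PAIR and TOY-MAP-PAIRS are hypothetical, and the toy ignores removed entries and weak tables.

```lisp
;;; Toy model (hypothetical names), not SBCL code.
(defun make-toy-pairs (size)
  "Make a k/v vector for SIZE pairs: 2 overhead cells, the pairs, then the HWM."
  (let ((v (make-array (+ 3 (* 2 size)) :initial-element nil)))
    (setf (svref v (1- (length v))) 0)  ; HWM = 0: no pair in use yet
    v))

(defun toy-hwm (pairs)
  "Read the high-water mark, stored in the last element of PAIRS."
  (svref pairs (1- (length pairs))))

(defun toy-push-pair (pairs key value)
  "Store KEY and VALUE in the next never-used pair and raise the HWM."
  (let* ((hwm (1+ (toy-hwm pairs)))
         (i (* 2 hwm)))                 ; pair N occupies elements 2N and 2N+1
    (assert (< (1+ i) (1- (length pairs))) () "toy pairs vector is full")
    (setf (svref pairs i) key
          (svref pairs (1+ i)) value
          (svref pairs (1- (length pairs))) hwm)
    hwm))

(defun toy-map-pairs (function pairs)
  "Call FUNCTION on each key and value at or below the HWM, never beyond it."
  (let ((limit (1+ (* 2 (toy-hwm pairs))))) ; index of the last value in use
    (do ((i 3 (+ i 2)))                     ; I is the index of a value cell
        ((> i limit))
      (funcall function (svref pairs (1- i)) (svref pairs i)))))
```

With a 16-pair vector holding two entries, TOY-MAP-PAIRS touches elements 2 through 5 only, rather than all 35.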

Furthermore, avoid linking all cells into a freelist initially. The cells above
the HWM implicitly form a list in which the next cell is always the one just
beyond the high-water mark. Only cells below the high-water mark need to be
placed in an explicit freelist, which never happens in the absence of REMHASH.
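
The implicit freelist can be sketched as follows. This is a toy model under stated assumptions, not the code in this commit: TOY-TABLE, TOY-ALLOC and TOY-RELEASE are hypothetical names, the HWM is kept in a struct slot for brevity (the commit keeps it in the pairs vector), and the NEXT vector is zero-filled here only to keep the toy simple. It follows the rule from the commit that the free index may exceed the HWM by 1, or must be 0 once the HWM is at its maximum.

```lisp
;;; Toy model (hypothetical names), not SBCL code.
(defstruct (toy-table (:constructor %make-toy-table))
  size            ; capacity in pairs
  (hwm 0)         ; highest pair index handed out so far
  (next-free 1)   ; head of the free list; 0 means no free cell
  next)           ; explicit links, meaningful only at or below the HWM

(defun make-toy-table (size)
  (%make-toy-table :size size
                   :next (make-array (1+ size) :initial-element 0)))

(defun toy-alloc (table)
  "Return a free pair index, or NIL if the table is full."
  (let ((cell (toy-table-next-free table)))
    (cond ((zerop cell) nil)
          ((> cell (toy-table-hwm table))
           ;; Implicit part of the list: CELL is HWM+1 and its successor
           ;; is simply the next cell, so no link was ever stored for it.
           (setf (toy-table-hwm table) cell
                 (toy-table-next-free table)
                 (if (< cell (toy-table-size table)) (1+ cell) 0))
           cell)
          (t
           ;; Explicit part of the list: pop a previously released cell.
           (setf (toy-table-next-free table)
                 (svref (toy-table-next table) cell))
           cell))))

(defun toy-release (table cell)
  "Push CELL, which must be at or below the HWM, onto the explicit free list."
  (setf (svref (toy-table-next table) cell) (toy-table-next-free table)
        (toy-table-next-free table) cell))
```

In this sketch a table that never releases a cell never writes to the NEXT vector for freelist purposes, which mirrors the point that explicit links are only needed after REMHASH.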

Skip explicitly initializing the hash vector. Always store a hash
if applicable when a key is stored. Similarly, don't 0-init the NEXT vector.

Fix a suboptimality in immobile space core relocation on startup: it marked
a table as needing rehash if a value moved, not just if a key moved. In all
probability, most tables get so marked during save-lisp-and-die, but the bug
could have been evident if keys were non-pointers and values were pointers.

Minor touch-ups: rename hash-table-table to hash-table-pairs
and hash-table-number-entries to hash-table-%count.
snuglas committed Jul 4, 2019
1 parent 803fd5b commit 8bf84ad
Showing 11 changed files with 319 additions and 272 deletions.
48 changes: 32 additions & 16 deletions src/code/hash-table.lisp
@@ -16,21 +16,24 @@
 ;;; SIZE is always the exact number of K/V entries that can be stored,
 ;;; and can be any number, not necessarily a power of 2.

-;;;        ______________________________________
-;;; K/V    |                                    |
-;;; vector | * | * | K | V | K | V | .........  |
-;;;        +____________________________________+
-;;;                 <---        SIZE        -->
+;;;        __________________________________________
+;;; K/V    |                                    |   |
+;;; vector | * | * | K | V | K | V | .........  | * |
+;;;        +________________________________________+
+;;;                | <---        SIZE        -->|
 ;;;
 ;;;                  ^--- pair index 1 and so on
 ;;;

-;;; The length of TABLE (the K/V vector) is the specified :SIZE * 2
-;;; plus 2 cells of overhead. There is a minimum of 16 k/v pairs,
-;;; therefore a minimum physical length of 34 (including the overhead).
-;;; Pair index 0 is not for user data. We index cell pairs by a physical
-;;; index, not logical pair index. ("logical" pair 0 would occupy
-;;; physical vector elements 2 and 3 as if the overhead didn't exist.)
+;;; The length of PAIRS (the K/V vector) is the specified :SIZE * 2
+;;; plus 3 elements of overhead, 2 at the beginning and one at the end.
+;;; (It's slightly strange that extra cells are in two different places,
+;;; however there's a reason: we need an indicator for the end of a chain,
+;;; and/or unused bin, and we use 0 for that, which means that k/v pair 0
+;;; is unusable. But we can't keep indiscriminately adding overhead cells
+;;; to the front because that makes even more k/v pairs unusable,
+;;; whereas adding at the end doesn't cause any such problem)
+;;; Pair index 1 is the first pair that stores user data.

 ;;; The length of the HASH-VECTOR is in direct correspondence with the
 ;;; physical k/v cells, so that we can store a hash per key and not worry
@@ -47,6 +50,13 @@
 ;;; the hash vector, the NEXT vector is sized at 1 greater than minimally
 ;;; necessary, to avoid adding and subtracting 1 from a pair index.

+;;; The PAIRS vector has an odd length with the following overhead elements:
+;;;
+;;; [0] = backpointer to hash-table
+;;; [1] = rehash-due-to-GC indicator
+;;; ...
+;;; [length-1] = high-water-mark
+
 ;;; HASH-TABLE is implemented as a STRUCTURE-OBJECT.
 (sb-xc:deftype hash-table-index () '(unsigned-byte 32))
 (sb-xc:defstruct (hash-table (:copier nil)
@@ -58,7 +68,7 @@
 hash-fun
 rehash-size
 rehash-threshold
-table
+pairs
 index-vector
 next-vector
 hash-vector
@@ -68,8 +78,12 @@
   (getter #'error :type function :read-only t)
   (setter #'error :type function :read-only t)
   ;; The Key-Value pair vector.
-  (table nil :type simple-vector)
-  ;; The index vector. This may be larger than the hash size to help
+  ;; Note: this vector has a "high water mark" which resembles a fill
+  ;; pointer, but unlike a fill pointer, GC can ignore elements
+  ;; above the high water mark. If you store non-immediate data past
+  ;; that mark, you're sure to have problems.
+  (pairs nil :type simple-vector)
+  ;; The index vector. This may be larger than the capacity to help
   ;; reduce collisions.
   (index-vector nil :type (simple-array hash-table-index (*)))
   ;; This table parallels the KV vector, and is used to chain together
@@ -124,13 +138,15 @@
   ;; (CLHS says that these are all just "hints" and we're free to ignore)
   (rehash-threshold nil :type (single-float ($0.0) $1.0) :read-only t)
   ;; The current number of entries in the table.
-  (number-entries 0 :type index)
+  (%count 0 :type index)
   ;; This slot is used to link weak hash tables during GC. When the GC
   ;; isn't running it is always NIL.
   (next-weak-hash-table nil :type null)
   ;; Index into the Next vector chaining together free slots in the KV
   ;; vector.
-  (next-free-kv 0 :type index)
+  ;; This index is allowed to exceed the high-water-mark by 1 unless
+  ;; the HWM is at its maximum in which case this must be 0.
+  (next-free-kv 1 :type index)
   ;; List of values culled out during GC of weak hash table.
   (culled-values nil :type list)
   ;; For detecting concurrent accesses.
29 changes: 20 additions & 9 deletions src/code/maphash.lisp
@@ -20,15 +20,27 @@
 (define-symbol-macro +empty-ht-slot+ (make-unbound-marker))
 (defmacro empty-ht-slot-p (x) `(unbound-marker-p ,x))

+;;; like INDEX, but only up to half the maximum. Used by hash-table
+;;; code that does plenty to (aref v (* 2 i)) and (aref v (1+ (* 2 i))).
+(deftype index/2 () `(integer 0 (,(floor sb-xc:array-dimension-limit 2))))
+
+;;; The high water mark is an element of the pairs vector, and not
+;;; a slot in the table.
+;;; It is the number of logical pairs in use, so if HWM = 1, then
+;;; 1 pair is in use occupying physical element indices 2 and 3.
+(defmacro hash-table-pairs-hwm (pairs)
+  `(truly-the index/2 (svref ,pairs (1- (length ,pairs)))))
+
 (define-compiler-macro maphash (&whole form function-designator hash-table
                                 &environment env)
   (when (sb-c:policy env (> space speed))
     (return-from maphash form))
-  (with-unique-names (fun table size i kv-vector key value)
+  (with-unique-names (fun limit i kv-vector key value)
     `(let* ((,fun (%coerce-callable-to-fun ,function-designator))
-            (,table ,hash-table)
-            (,kv-vector (hash-table-table ,table))
-            (,size (* 2 (length (hash-table-next-vector ,table)))))
+            (,kv-vector (hash-table-pairs ,hash-table))
+            ;; The high water mark needs to be loaded only once due to the
+            ;; prohibition against adding keys during traversal.
+            (,limit (1+ (* 2 (hash-table-pairs-hwm ,kv-vector)))))
        ;; Regarding this TRULY-THE: in the theoretical edge case of the largest
        ;; possible NEXT-VECTOR, it is not really true that the I+2 is an index.
        ;; However, for all intents and purposes, it is an INDEX because if not,
@@ -38,7 +50,7 @@
        ;; And it doesn't matter anyway - the compiler uses unsigned word
        ;; arithmetic here on account of (* 2 length) exceeding a fixnum.
        (do ((,i 3 (truly-the index (+ ,i 2))))
-           ((>= ,i ,size))
+           ((> ,i ,limit))
          ;; We are running without locking or WITHOUT-GCING. For a weak
          ;; :VALUE hash table it's possible that the GC hit after KEY
          ;; was read and now the entry is gone. So check if either the
@@ -85,14 +97,13 @@ use eg. SB-EXT:WITH-LOCKED-HASH-TABLE to protect the WITH-HASH-TABLE-ITERATOR
 for."
   (let ((function (gensymify* name "-FUN")))
     `(let ((,function
-            (let* ((table ,hash-table)
-                   (kv-vector (hash-table-table table))
-                   (size (* 2 (length (hash-table-next-vector table))))
+            (let* ((kv-vector (hash-table-pairs ,hash-table))
+                   (limit (1+ (* 2 (hash-table-pairs-hwm kv-vector))))
                    (index 3))
               (declare (fixnum index))
               (flet ((,name ()
                        (loop
-                         (when (>= index size) (return nil))
+                         (when (> index limit) (return nil))
                          (let ((i index))
                            (incf (truly-the index index) 2)
                            (let ((value (aref kv-vector i)))