New map internals #2181

Merged 64 commits on Nov 17, 2022

gingerBill (Member) commented on Nov 11, 2022

A high-performance, cache-friendly, open-addressed Robin Hood hash map data structure for Odin, with various optimizations.

To make this map cache-friendly, it uses a novel strategy to ensure that keys and values are always cache-line aligned and that no single key or value of any type ever straddles a cache line. This makes lookups quick because the linear probe always addresses data in a cache-friendly way. It is enabled by a special meta-type called Map_Cell, which packs as many values of a given type as fit into a local array and adds internal padding to round the cell up to MAP_CACHE_LINE_SIZE. Another benefit of storing the internal data this way is that false sharing no longer occurs when using a map, enabling efficient concurrent access to the map data structure with minimal locking if desired.
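The cell layout described above can be sketched as plain offset arithmetic. This is a minimal illustration, not the runtime's code: `CACHE_LINE_SIZE` (assumed to be 64 here) and the helper `cell_offset` are hypothetical names introduced for the example.

```odin
package main

import "core:fmt"

// Assumed cache-line size for this sketch; the PR's constant is MAP_CACHE_LINE_SIZE.
CACHE_LINE_SIZE :: 64

// cell_offset is a hypothetical helper. It returns the byte offset of element
// `index` inside an array of cells, where each cell holds as many elements of
// `elem_size` bytes as fit in one cache line, followed by padding.
cell_offset :: proc(index, elem_size: uintptr) -> uintptr {
	if elem_size == 0 {
		// Zero-sized elements occupy no storage.
		return 0
	}
	if elem_size >= CACHE_LINE_SIZE {
		// One element per cell; the cell is rounded up to a multiple of the cache line.
		cell_size := (elem_size + CACHE_LINE_SIZE - 1) &~ uintptr(CACHE_LINE_SIZE - 1)
		return index * cell_size
	}
	elems_per_cell := uintptr(CACHE_LINE_SIZE) / elem_size
	cell := index / elems_per_cell // which cell the element lives in
	slot := index % elems_per_cell // position inside that cell
	return cell * CACHE_LINE_SIZE + slot * elem_size
}

main :: proc() {
	// 24-byte elements: two fit per 64-byte cell, leaving 16 bytes of padding,
	// so element 2 starts at offset 64 rather than 48.
	for i in 0..<4 {
		fmt.println(i, cell_offset(uintptr(i), 24))
	}
}
```

With 24-byte elements the offsets come out as 0, 24, 64, and 88, so no element crosses a 64-byte boundary.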

Notes:

  • Open-Addressed Robin Hood Hashing
  • SOA-based cache-friendly entries
  • Currently defaults to dynamic calls; add -use-static-map-calls to enable static calls
    • Which should be used and when?
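As a rough illustration of the first two notes, here is a generic Robin Hood insert and lookup over struct-of-arrays storage. It is a textbook sketch under stated assumptions, not the PR's implementation: the `Table` type, the fixed capacity of 8, and the procedures `probe_distance`, `insert`, and `lookup` are all hypothetical, hashes are supplied by the caller, deletion is not modelled, and the table is assumed never to fill up.

```odin
package main

import "core:fmt"

CAP  :: 8       // fixed power-of-two capacity, chosen only for this sketch
MASK :: CAP - 1

// SOA layout: hashes, keys, and values live in separate arrays.
Table :: struct {
	hashes: [CAP]u64, // 0 marks an empty slot
	keys:   [CAP]int,
	values: [CAP]int,
}

// How far `slot` is from the slot the hash would ideally occupy.
probe_distance :: proc(hash, slot: u64) -> u64 {
	return (slot + CAP - (hash & MASK)) & MASK
}

// Assumes hash != 0 and that the table is never completely full.
insert :: proc(t: ^Table, hash: u64, key, value: int) {
	h, k, v := hash, key, value
	pos := h & MASK
	dist: u64 = 0
	for {
		if t.hashes[pos] == 0 {
			t.hashes[pos], t.keys[pos], t.values[pos] = h, k, v
			return
		}
		if t.hashes[pos] == h && t.keys[pos] == k {
			t.values[pos] = v // key already present: update in place
			return
		}
		existing := probe_distance(t.hashes[pos], pos)
		if existing < dist {
			// Robin Hood step: the resident entry is closer to home than we are,
			// so take its slot and carry the displaced entry forward instead.
			h, t.hashes[pos] = t.hashes[pos], h
			k, t.keys[pos]   = t.keys[pos], k
			v, t.values[pos] = t.values[pos], v
			dist = existing
		}
		pos = (pos + 1) & MASK
		dist += 1
	}
}

lookup :: proc(t: ^Table, hash: u64, key: int) -> (value: int, ok: bool) {
	pos := hash & MASK
	dist: u64 = 0
	for {
		if t.hashes[pos] == 0 {
			return 0, false
		}
		if probe_distance(t.hashes[pos], pos) < dist {
			return 0, false // the key would have displaced this entry if present
		}
		if t.hashes[pos] == hash && t.keys[pos] == key {
			return t.values[pos], true
		}
		pos = (pos + 1) & MASK
		dist += 1
	}
}

main :: proc() {
	t: Table
	// Hashes 1, 9, and 17 all map to slot 1, which forces linear probing.
	insert(&t, 1, 100, 10)
	insert(&t, 9, 200, 20)
	insert(&t, 17, 300, 30)
	v, ok := lookup(&t, 9, 200)
	fmt.println(v, ok)
}
```

Because lookups compare the small hash array first, most probes never touch the key or value arrays at all, which is the main appeal of the SOA layout.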

This map implementation makes extensive use of uintptr to represent sizes, lengths, capacities, masks, pointers, offsets, and addresses. This avoids the expensive sign extension and masking that would be generated if values were cast between types all over. The only place regular ints show up is in the cap() and len() implementations.

// The raw, type-erased representation of a map.
//
// 32-bytes on 64-bit
// 16-bytes on 32-bit
Raw_Map :: struct {
	// A single allocation spanning all keys, values, and hashes.
	// {
	//   k: Map_Cell(K) * (capacity / ks_per_cell)
	//   v: Map_Cell(V) * (capacity / vs_per_cell)
	//   h: Map_Cell(H) * (capacity / hs_per_cell)
	// }
	//
	// The data is allocated assuming 64-byte alignment, meaning the address is
	// always a multiple of 64. This means we have 6 bits of zeros in the pointer
	// to store the capacity. We can store a value as large as 2^6-1 or 63 in
	// there. This conveniently is the maximum log2 capacity we can have for a map
	// as Odin uses signed integers to represent capacity.
	//
	// Since the hashes are backed by Map_Hash, which is just a 64-bit unsigned
	// integer, the cell structure for hashes is unnecessary because 64/8 is 8 and
	// requires no padding, meaning it can be indexed as a regular array of
	// Map_Hash directly, though for consistency's sake it's written as if it were
	// an array of Map_Cell(Map_Hash).
	data:      uintptr,   // 8-bytes on 64-bits, 4-bytes on 32-bits
	len:       int,       // 8-bytes on 64-bits, 4-bytes on 32-bits
	allocator: Allocator, // 16-bytes on 64-bits, 8-bytes on 32-bits
}
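The comment on `data` describes storing log2(capacity) in the six low bits of a 64-byte-aligned address. A minimal sketch of that packing follows; the names `CAP_BITS`, `CAP_MASK`, `pack_data`, `data_base`, `data_log2_cap`, and `data_cap` are hypothetical and chosen only for illustration.

```odin
package main

import "core:fmt"

CAP_BITS :: 6
CAP_MASK :: (1 << CAP_BITS) - 1 // 0b111111 == 63

// Combine a 64-byte-aligned base address with log2(capacity) in the low 6 bits.
pack_data :: proc(base, log2_cap: uintptr) -> uintptr {
	assert((base & CAP_MASK) == 0, "base must be 64-byte aligned")
	assert(log2_cap <= CAP_MASK, "log2 capacity must fit in 6 bits")
	return base | log2_cap
}

// Recover the aligned base address by clearing the low 6 bits.
data_base :: proc(data: uintptr) -> uintptr {
	return data &~ CAP_MASK
}

// Recover log2(capacity) from the low 6 bits.
data_log2_cap :: proc(data: uintptr) -> uintptr {
	return data & CAP_MASK
}

// In this sketch a zero `data` means "no allocation", hence zero capacity.
data_cap :: proc(data: uintptr) -> uintptr {
	if data == 0 {
		return 0
	}
	return uintptr(1) << data_log2_cap(data)
}

main :: proc() {
	data := pack_data(0x1000, 4) // base address 4096, capacity 2^4
	fmt.println(data_base(data), data_log2_cap(data), data_cap(data))
}
```

Packing the capacity this way is what lets the struct stay at three fields: no separate capacity word is needed next to `data`, `len`, and `allocator`.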

Benefits of this approach over the old approach

  • VERY fast map get (~3x faster with dynamic calls, ~5x with static calls)
  • VERY fast map set (~5x faster with dynamic calls, ~6x with static calls)
  • One allocation per map rather than two (previously hashes and entries)
    • Only requires alloc_non_zeroed and free internally
  • SOA keys, values, and hashes, allowing each to be loaded into a separate cache line
  • Entries are stored in a non-contiguous cell layout, which means no element straddles a cache line
  • Small header information (runtime.Map_Info) for dynamic calls
  • Allows for calling delete_key whilst iterating across the map

Issues with this approach compared to the old approach

  • Cannot resize in place; it must make a new map, copy the contents to the new map, and delete the old map
  • Non-trivial to iterate across in a for in loop
    • Due to the SOA layout, non-contiguous cell-layout, and non-valid buckets
    • Old implementation had all entries be contiguous and in a separate dynamic array
  • Requires extra scratch storage for the keys and values
    • When inserting, the scratch storage is used rather than relying on stack memory
  • A lot more complicated than the previous implementation

General differences

  • Hashing function has not been changed EXCEPT it has been sanitized
    • 0 is not a valid hash value any more, it will be set to 1 internally
    • Highest bit must be zero, as this bit is used as a flag to indicate that a bucket was deleted while still allowing probing
  • log2(capacity) is encoded into the data as the lowest 6 bits (allowing for a hypothetical maximum capacity of 2^63 elements)
    • Capacities can only be a power of two
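The hash rules in this list can be sketched as follows. This is an illustrative reading of the bullets, not the runtime's code: `TOMBSTONE_MASK` as written here, `sanitize_hash`, `is_empty`, `is_deleted`, and `desired_slot` are hypothetical names, and a 64-bit hash is assumed.

```odin
package main

import "core:fmt"

// Highest bit of a 64-bit hash, reserved as the "deleted" flag.
TOMBSTONE_MASK :: u64(1) << 63

// Clear the reserved top bit, then remap 0 to 1 so that 0 can mean "empty".
sanitize_hash :: proc(raw: u64) -> u64 {
	h := raw &~ TOMBSTONE_MASK
	if h == 0 {
		h = 1
	}
	return h
}

is_empty   :: proc(h: u64) -> bool { return h == 0 }
is_deleted :: proc(h: u64) -> bool { return (h & TOMBSTONE_MASK) != 0 }

// With a power-of-two capacity, reducing a hash to a slot index is a single mask.
desired_slot :: proc(h, log2_cap: u64) -> u64 {
	mask := (u64(1) << log2_cap) - 1
	return h & mask
}

main :: proc() {
	fmt.println(sanitize_hash(0), sanitize_hash(max(u64)), desired_slot(13, 3))
}
```

A deleted bucket keeps its hash with the top bit set, so a probe sequence can step over it instead of stopping as it would at an empty bucket.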

TODOs for Future PRs

  • Improved hash function
    • Use architecture specific instructions (e.g. AES on amd64) when possible
    • If not possible, fallback to a different hash
    • Currently using a (modified) fnv64a; something like wyhash might be better
      • wyhash is being used by other languages as a fallback, and appears to perform well with smhasher

gingerBill marked this pull request as ready for review on November 13, 2022, 23:47
map[K]struct{} works fine.
gingerBill merged commit 15bbdb2 into master on Nov 17, 2022
gingerBill deleted the map-dev branch on November 17, 2022, 16:04