Bug #16739204 IMPROVE THE INNODB HASH FUNCTION
Bug #23584861 INNODB ADAPTIVE HASH INDEX USES A BAD PARTITIONING ALGORITHM FOR THE REAL WORLD

The InnoDB hash and random number generator functions were improved in their most important properties.
All hash methods were changed to work on 64-bit values instead of a platform-dependent type. Some are based on ideas from Tomek Czajka's blog post https://sortingsearching.com/2020/05/21/hashing.html
The old `ut_hash_ulint` computed the hash using only a single XOR with a 32-bit value, so the hashes of multiples of 2^n all shared the same n lowest bits. It was replaced with a method that computes a 64-bit hash from a 64-bit value. Tabulation hashing is used as the main hashing algorithm. It gives a good random distribution for common input sequences, a property that is guarded by new unit tests.
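To illustrate the difference, here is a minimal sketch (not InnoDB's code) contrasting an XOR-with-a-mask hash with simple tabulation hashing. The mask value, the `TabulationHash` class, its seeding from `std::mt19937_64`, and the `buckets_used` helper are assumptions made for illustration. With keys that are multiples of 1024 and 64 buckets chosen by the low hash bits, the XOR hash sends every key to the same bucket, while tabulation hashing is expected to spread them across the buckets.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>

// Simplified model of the old scheme: XOR the key with one fixed mask.
// The mask value is arbitrary and is not InnoDB's constant.
uint64_t xor_mask_hash(uint64_t key) { return key ^ 0x2545F491ULL; }

// Simple tabulation hashing: split the 64-bit key into 8 bytes, look each
// byte up in its own table of random 64-bit words, and XOR the results.
class TabulationHash {
 public:
  explicit TabulationHash(uint64_t seed) {
    std::mt19937_64 gen(seed);
    for (auto &table : tables_) {
      for (auto &entry : table) entry = gen();
    }
  }

  uint64_t operator()(uint64_t key) const {
    uint64_t h = 0;
    for (int i = 0; i < 8; ++i) {
      h ^= tables_[i][(key >> (8 * i)) & 0xFF];
    }
    return h;
  }

 private:
  std::array<std::array<uint64_t, 256>, 8> tables_{};
};

// Counts how many of n_buckets (a power of two) receive at least one of the
// keys 0, stride, 2 * stride, ..., when the low hash bits pick the bucket.
template <typename Hash>
std::size_t buckets_used(const Hash &hash, uint64_t stride, uint64_t n_keys,
                         uint64_t n_buckets) {
  assert(n_buckets != 0 && (n_buckets & (n_buckets - 1)) == 0);
  std::set<uint64_t> used;
  for (uint64_t k = 0; k < n_keys; ++k) {
    used.insert(hash(k * stride) & (n_buckets - 1));
  }
  return used.size();
}
```

For example, `buckets_used(&xor_mask_hash, 1024, 4096, 64)` returns 1, because the 10 lowest bits of every key are zero and the XOR only copies the mask's low bits.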
The `ut_rnd_gen_ulint` function and its helper `ut_rnd_gen_next_ulint` generated numbers using 32-bit constants and many operations. This generator had a cycle of just 85000 values. It was replaced by a simple function that produces a full 64 bits of randomness using a new method for hashing 64-bit integers, resulting in a cycle of length 2^64.
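A counter-based generator is one way to obtain a 2^64-long cycle: if each output is a bijective 64-bit hash of an incrementing counter, outputs cannot repeat until the counter wraps. This sketch assumes the SplitMix64 finalizer as the bijective mixer (`mix64`) and a hypothetical `CounterRng` class; InnoDB's actual hash and generator differ.

```cpp
#include <cstdint>

// Bijective 64-bit mixer: the SplitMix64 finalizer, used here only as an
// example of an invertible 64-bit integer hash (InnoDB's own hash differs).
uint64_t mix64(uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Hypothetical counter-based generator: the state is a plain 64-bit counter
// and each output is mix64 of the next counter value. Because mix64 is a
// bijection, the output sequence has a period of exactly 2^64.
class CounterRng {
 public:
  explicit CounterRng(uint64_t seed) : counter_(seed) {}

  uint64_t next() { return mix64(++counter_); }

 private:
  uint64_t counter_;
};
```

Each step of `mix64` (an xor with a right shift, or a multiplication by an odd constant) is invertible modulo 2^64, which is what makes the whole function a bijection.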
The distribution of hashes of two 64-bit integers was improved for common sequences such as <i, i>, <i, constant> and <constant, i>. This property is guarded by a new unit test.
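A short sketch of why combining two 64-bit values needs care: a plain XOR maps every <i, i> pair to 0, while hashing the first component before mixing in the second avoids that collapse. `pair_hash`, `naive_pair_hash`, `buckets_hit` and the SplitMix64-based `mix64` are hypothetical stand-ins, not the functions added by this commit.

```cpp
#include <cstdint>

// SplitMix64 finalizer as a stand-in bijective 64-bit hash.
uint64_t mix64(uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Naive combination: XOR of the inputs. Every <i, i> pair hashes to 0.
uint64_t naive_pair_hash(uint64_t a, uint64_t b) { return a ^ b; }

// Hypothetical pair hash: hash the first value, XOR in the second, and mix
// again. Since mix64 is a bijection, <i, c> and <c, i> with a fixed c map
// distinct i to distinct results, and <i, i> no longer collapses to a
// single value.
uint64_t pair_hash(uint64_t a, uint64_t b) { return mix64(mix64(a) ^ b); }

// Counts how many of 64 buckets (low 6 hash bits) are hit by gen(0..n-1).
template <typename Gen>
unsigned buckets_hit(Gen gen, unsigned n) {
  bool seen[64] = {};
  unsigned count = 0;
  for (unsigned i = 0; i < n; ++i) {
    const uint64_t bucket = gen(i) & 63;
    if (!seen[bucket]) {
      seen[bucket] = true;
      ++count;
    }
  }
  return count;
}
```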
`ut_fold_binary` was renamed to `ut::hash_binary_ib` and is now used only for InnoDB's page checksum calculation with a non-default checksum algorithm. All other callers were switched to a new method, `ut::hash_binary`, which is based on the method for hashing two 64-bit integers. For buffers longer than 15 bytes it computes the hash using CRC32 (plus some additional hashing to obtain a 64-bit result). After recent improvements by Jakub Lopuszanski, CRC32 uses native CPU instructions and execution pipeline parallelism to reach very high speeds (multiple bytes per cycle on each core). Overall, the new method is much faster than the old one.
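The length-based dispatch can be sketched as below. This hypothetical `hash_bytes` is not the real `ut::hash_binary`: it assumes CRC-32C (Castagnoli) as the CRC variant, uses a slow bitwise CRC instead of the hardware instructions mentioned above, and widens the 32-bit CRC to 64 bits with an illustrative `pair_hash` built on the SplitMix64 finalizer. Only the 15-byte threshold comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in 64-bit mixer (SplitMix64 finalizer) and pair hash; both are
// illustrative, not InnoDB's functions.
uint64_t mix64(uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

uint64_t pair_hash(uint64_t a, uint64_t b) { return mix64(mix64(a) ^ b); }

// Bitwise CRC-32C (reflected polynomial 0x82F63B78). Slow, but it yields the
// same CRC-32C value that hardware-accelerated implementations produce.
uint32_t crc32c(const unsigned char *p, std::size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= p[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the dropped bit was 1.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Hypothetical hash_binary-style dispatch on the buffer length.
uint64_t hash_bytes(const void *data, std::size_t len, uint64_t seed) {
  assert(data != nullptr || len == 0);
  const auto *p = static_cast<const unsigned char *>(data);
  uint64_t h = pair_hash(seed, len);
  if (len > 15) {
    // Long buffer: CRC over all bytes, then mix it into a 64-bit result.
    return pair_hash(h, crc32c(p, len));
  }
  // Short buffer: fold up to two 8-byte chunks with pair_hash. memcpy makes
  // the chunk value depend on the platform's byte order.
  while (len > 0) {
    const std::size_t n = len < 8 ? len : 8;
    uint64_t chunk = 0;
    std::memcpy(&chunk, p, n);
    h = pair_hash(h, chunk);
    p += n;
    len -= n;
  }
  return h;
}
```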
The hash index is calculated from the fold value using a new utility class, `fast_modulo_t`, designed and implemented by Jakub. It performs a more costly precalculation once, and then allows fast modulo operations without using the very slow integer division CPU instructions. It is a little more than 3 times faster than calculating the modulo with division, and only 2 times slower than GCC's compile-time-optimized modulo by a constant. It also has a wrapper that allows calculations while the modulus is being changed concurrently, using the lock-free `Seq_lock`, also implemented by Jakub.
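One common way to get a division-free modulo by a divisor known only at runtime is a Barrett-style reduction: precompute m = floor((2^64 - 1) / d) once, estimate the quotient as the high 64 bits of a * m, and fix the remainder with at most one subtraction. This is an assumption about the general technique, not a description of the internals of `fast_modulo_t`; the class name `FastModulo` is hypothetical, and the sketch relies on the GCC/Clang `unsigned __int128` extension.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical division-free modulo by a divisor known only at runtime
// (Barrett-style reduction). Uses the GCC/Clang unsigned __int128 type.
class FastModulo {
 public:
  explicit FastModulo(uint64_t divisor) : divisor_(divisor) {
    assert(divisor != 0);
    // The only division: m = floor((2^64 - 1) / d), done once per divisor.
    inverse_ = UINT64_MAX / divisor;
  }

  uint64_t mod(uint64_t a) const {
    // Quotient estimate q = floor(a * m / 2^64). Since 2^64/d - 1 <= m <
    // 2^64/d, q is floor(a / d) or one less, so a - q * d lies in [0, 2d).
    const uint64_t q = static_cast<uint64_t>(
        (static_cast<unsigned __int128>(a) * inverse_) >> 64);
    uint64_t r = a - q * divisor_;
    if (r >= divisor_) r -= divisor_;  // at most one correction step
    return r;
  }

 private:
  uint64_t divisor_;
  uint64_t inverse_ = 0;
};
```

In a hash table, such an object would be built once when the table is sized and then reused for every fold value, so the single division is amortized over all lookups.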
A new header, `ut0math.h`, is added with strictly math-related functions. The old `ut_find_prime` was moved into it, and two new functions, `divide_128` and `multiply_uint64`, were added. It also adds the new utility classes `fast_modulo_t` and `mt_fast_modulo_t`.
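The concurrent wrapper described for `fast_modulo_t` can be sketched with a sequence lock (seqlock). This sketch assumes a single writer; readers load the divisor and its precomputed inverse, then retry if the sequence counter shows that a writer was active in between. The `ConcurrentModulo` class is hypothetical and is neither `mt_fast_modulo_t` nor `Seq_lock`; the fences follow the commonly used C++ seqlock pattern with relaxed atomic data fields, and the reduction again assumes `unsigned __int128` (GCC/Clang).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical seqlock-protected modulo: many readers, one writer.
class ConcurrentModulo {
 public:
  explicit ConcurrentModulo(uint64_t divisor) { set_divisor(divisor); }

  // Single writer assumed: an odd sequence number marks an update in progress.
  void set_divisor(uint64_t divisor) {
    assert(divisor != 0);
    const uint64_t s = seq_.load(std::memory_order_relaxed);
    seq_.store(s + 1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    divisor_.store(divisor, std::memory_order_relaxed);
    inverse_.store(UINT64_MAX / divisor, std::memory_order_relaxed);
    seq_.store(s + 2, std::memory_order_release);
  }

  // Lock-free reader: snapshot (divisor, inverse), retry if it may be torn.
  uint64_t mod(uint64_t a) const {
    for (;;) {
      const uint64_t s0 = seq_.load(std::memory_order_acquire);
      if (s0 & 1) continue;  // writer active
      const uint64_t d = divisor_.load(std::memory_order_relaxed);
      const uint64_t m = inverse_.load(std::memory_order_relaxed);
      std::atomic_thread_fence(std::memory_order_acquire);
      if (seq_.load(std::memory_order_relaxed) != s0) continue;  // changed
      // Barrett-style reduction on the consistent snapshot.
      const uint64_t q = static_cast<uint64_t>(
          (static_cast<unsigned __int128>(a) * m) >> 64);
      uint64_t r = a - q * d;
      if (r >= d) r -= d;
      return r;
    }
  }

 private:
  std::atomic<uint64_t> seq_{0};
  std::atomic<uint64_t> divisor_{1};
  std::atomic<uint64_t> inverse_{UINT64_MAX};
};
```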
`multiply_uint64`, which computes the 128-bit result of a 64-bit multiplication, is executed very frequently, so it has an x86 implementation based on intrinsics.
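For reference, this is what a function like `multiply_uint64` has to compute: the full 128-bit product of two 64-bit integers. The sketch shows a portable version built from 32-bit halves and a dispatching wrapper that uses the compiler's 128-bit type on GCC/Clang or the MSVC `_umul128` intrinsic on x64. The function names are hypothetical, and the actual InnoDB signature may differ.

```cpp
#include <cstdint>
#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#endif

// Portable 64 x 64 -> 128-bit multiplication from 32-bit halves.
// Returns the low 64 bits of the product and stores the high 64 bits in *hi.
uint64_t multiply_u64_portable(uint64_t a, uint64_t b, uint64_t *hi) {
  const uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
  const uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

  const uint64_t lo_lo = a_lo * b_lo;
  const uint64_t hi_lo = a_hi * b_lo;
  const uint64_t lo_hi = a_lo * b_hi;
  const uint64_t hi_hi = a_hi * b_hi;

  // Middle 32-bit column plus carries; this sum cannot overflow 64 bits.
  const uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
  *hi = hi_hi + (hi_lo >> 32) + (cross >> 32);
  return (cross << 32) | (lo_lo & 0xFFFFFFFFu);
}

// Fast path: a single hardware multiply where the compiler exposes one.
uint64_t multiply_u64(uint64_t a, uint64_t b, uint64_t *hi) {
#if defined(__SIZEOF_INT128__)
  const unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  *hi = static_cast<uint64_t>(p >> 64);
  return static_cast<uint64_t>(p);
#elif defined(_MSC_VER) && defined(_M_X64)
  return _umul128(a, b, hi);
#else
  return multiply_u64_portable(a, b, hi);
#endif
}
```

For example, multiplying `UINT64_MAX` by itself gives a low word of 1 and a high word of `UINT64_MAX - 1`, since (2^64 - 1)^2 = 2^128 - 2^65 + 1.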

Additional or more detailed changes:
`hash_create` and `hash_table_free` were removed. Hash initialization was moved to the `hash_table_t` constructor. Destruction is handled automatically by the default destructor, which frees the `cells` array because it is now a `unique_ptr`.
`hash_table_t` gets a new field, `n_cells_fast_modulo`, to allow fast calculation of the hash index from the fold value.
`btr_get_search_slot` was added; it is responsible for calculating the AHI part index using the global fast modulo structure `btr_ahi_parts_fast_modulo`.
`ut_rnd_interval` used to return results that excluded the value of `high`, contrary to its documentation.
New unit tests were added for many new or modified hashing and random number generation methods, ensuring a long generator cycle and a good random distribution of hashes.
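The `ut_rnd_interval` issue comes down to counting both endpoints when sizing the range. A minimal sketch, assuming a hypothetical `rnd_interval` that receives an already generated 64-bit random value `r` (the real function's signature may differ):

```cpp
#include <cassert>
#include <cstdint>

// Maps a 64-bit random value r into the closed interval [low, high].
// Computing low + r % (high - low) would never return high; the range size
// must count both endpoints, i.e. high - low + 1.
uint64_t rnd_interval(uint64_t r, uint64_t low, uint64_t high) {
  assert(low <= high);
  const uint64_t span = high - low + 1;  // wraps to 0 only for [0, 2^64 - 1]
  if (span == 0) return r;               // full range: every value is valid
  return low + r % span;
}
```

The modulo introduces a small bias whenever `span` does not divide 2^64; for the interval sizes typically used this is negligible, but it is a deliberate trade-off for speed.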

RB#27454

Change-Id: Icbc62d9e2ca44a702c18ea66e2ca048d9240494a
Marcin Babij committed Apr 6, 2022
1 parent 5558ecc commit b11a175
Showing 70 changed files with 3,129 additions and 1,302 deletions.
3 changes: 2 additions & 1 deletion storage/innobase/CMakeLists.txt
@@ -1,4 +1,4 @@
-# Copyright (c) 2006, 2021, Oracle and/or its affiliates.
+# Copyright (c) 2006, 2022, Oracle and/or its affiliates.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License, version 2.0,
@@ -235,6 +235,7 @@ SET(INNOBASE_SOURCES
usr/usr0sess.cc
ut/ut0dbg.cc
ut/ut0list.cc
+ut/ut0math.cc
ut/ut0mem.cc
ut/ut0new.cc
ut/ut0rbt.cc
2 changes: 1 addition & 1 deletion storage/innobase/btr/btr0cur.cc
@@ -1393,7 +1393,7 @@ void btr_cur_search_to_nth_level(

/* If the first or the last record of the page
or the same key value to the first record or last record,
-the another page might be choosen when BTR_CONT_MODIFY_TREE.
+the another page might be chosen when BTR_CONT_MODIFY_TREE.
So, the parent page should not released to avoiding deadlock
with blocking the another search with the same key value. */
if (!detected_same_key_root && lock_intention == BTR_INTENTION_BOTH &&
