[libc++] Add randomize unspecified stability in __hash_table
#105982
@llvm/pr-subscribers-libcxx
Author: Arvid Jonasson (arvidjonasson)
Changes
Adds functionality requested in #102303.
Full diff: https://github.com/llvm/llvm-project/pull/105982.diff
3 Files Affected:
diff --git a/libcxx/docs/DesignDocs/UnspecifiedBehaviorRandomization.rst b/libcxx/docs/DesignDocs/UnspecifiedBehaviorRandomization.rst
index 70278798ecf630..3e52a51684507e 100644
--- a/libcxx/docs/DesignDocs/UnspecifiedBehaviorRandomization.rst
+++ b/libcxx/docs/DesignDocs/UnspecifiedBehaviorRandomization.rst
@@ -82,5 +82,7 @@ Currently supported randomization
on the order of the remaining part
* ``std::nth_element``, there is no guarantee on the order from both sides of the
partition
+* ``std::unordered_{set,map}``, there is no guarantee on the order of the elements
+* ``std::unordered_{multiset,multimap}``, there is no guarantee on the order of the elements nor the order of equal elements
Patches welcome.
diff --git a/libcxx/include/__hash_table b/libcxx/include/__hash_table
index d5fbc92a3dfc4e..d6931a81d10a27 100644
--- a/libcxx/include/__hash_table
+++ b/libcxx/include/__hash_table
@@ -45,6 +45,11 @@
#include <limits>
#include <new> // __launder
+#ifdef _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY
+# include <__debug_utils/randomize_range.h>
+# include <__numeric/iota.h>
+#endif
+
#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
# pragma GCC system_header
#endif
@@ -980,6 +985,9 @@ private:
template <bool _UniqueKeys>
_LIBCPP_HIDE_FROM_ABI void __do_rehash(size_type __n);
+ template <bool _UniqueKeys>
+ _LIBCPP_HIDE_FROM_ABI void __debug_randomize_order();
+
template <class... _Args>
_LIBCPP_HIDE_FROM_ABI __node_holder __construct_node(_Args&&... __args);
@@ -1702,6 +1710,7 @@ void __hash_table<_Tp, _Hash, _Equal, _Alloc>::__rehash(size_type __n) _LIBCPP_D
template <class _Tp, class _Hash, class _Equal, class _Alloc>
template <bool _UniqueKeys>
void __hash_table<_Tp, _Hash, _Equal, _Alloc>::__do_rehash(size_type __nbc) {
+ __debug_randomize_order<_UniqueKeys>();
__pointer_allocator& __npa = __bucket_list_.get_deleter().__alloc();
__bucket_list_.reset(__nbc > 0 ? __pointer_alloc_traits::allocate(__npa, __nbc) : nullptr);
__bucket_list_.get_deleter().size() = __nbc;
@@ -1741,6 +1750,54 @@ void __hash_table<_Tp, _Hash, _Equal, _Alloc>::__do_rehash(size_type __nbc) {
}
}
+template <class _Tp, class _Hash, class _Equal, class _Alloc>
+template <bool _UniqueKeys>
+void __hash_table<_Tp, _Hash, _Equal, _Alloc>::__debug_randomize_order() {
+#ifdef _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY
+ size_type __total_nodes = size();
+ size_type __initialized_nodes = 0;
+
+ // Storage to handle non-assignable, non-default constructible __node_holder.
+ union __nh_storage {
+ __nh_storage() {}
+ ~__nh_storage() {}
+ __node_holder __nh;
+ };
+
+ auto __nh_storage_deleter = [&__initialized_nodes](__nh_storage* __p) {
+ for (size_type __i = 0; __i < __initialized_nodes; ++__i)
+ __p[__i].__nh.~__node_holder();
+ delete[] __p;
+ };
+
+ // Allocate storage for nodes and indices.
+ unique_ptr<__nh_storage[], decltype(__nh_storage_deleter)> __nodes(
+ new __nh_storage[__total_nodes], __nh_storage_deleter);
+ unique_ptr<size_type[]> __randomized_indices(new size_type[__total_nodes]);
+
+ // Move nodes into temporary storage.
+ for (; __initialized_nodes < __total_nodes; ++__initialized_nodes)
+ new (std::addressof(__nodes[__initialized_nodes].__nh)) __node_holder(remove(begin()));
+
+ // Randomize the order of indices.
+ std::iota(__randomized_indices.get(), __randomized_indices.get() + __total_nodes, size_type{0});
+ __debug_randomize_range<_ClassicAlgPolicy>(__randomized_indices.get(), __randomized_indices.get() + __total_nodes);
+
+ // Reinsert nodes into the hash table in randomized order.
+ for (size_type __i = 0; __i < __total_nodes; ++__i) {
+ __node_holder& __nh = __nodes[__randomized_indices[__i]].__nh;
+ __node_pointer __np = __nh->__upcast();
+ if _LIBCPP_CONSTEXPR_SINCE_CXX17 (_UniqueKeys) {
+ __node_insert_unique_perform(__np);
+ } else {
+ __next_pointer __pn = __node_insert_multi_prepare(__np->__hash(), __np->__get_value());
+ __node_insert_multi_perform(__np, __pn);
+ }
+ __nh.release();
+ }
+#endif
+}
+
template <class _Tp, class _Hash, class _Equal, class _Alloc>
template <class _Key>
typename __hash_table<_Tp, _Hash, _Equal, _Alloc>::iterator
diff --git a/libcxx/test/libcxx/containers/unord/hash_table_randomize_order.pass.cpp b/libcxx/test/libcxx/containers/unord/hash_table_randomize_order.pass.cpp
new file mode 100644
index 00000000000000..bec3c5d353f83f
--- /dev/null
+++ b/libcxx/test/libcxx/containers/unord/hash_table_randomize_order.pass.cpp
@@ -0,0 +1,79 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// Test std::unordered_{set,map,multiset,multimap} randomization
+
+// UNSUPPORTED: c++03
+// ADDITIONAL_COMPILE_FLAGS: -D_LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY
+
+#include <unordered_set>
+#include <unordered_map>
+#include <cassert>
+#include <vector>
+#include <algorithm>
+
+const int kSize = 128;
+
+template <typename T, typename F>
+T get_random(F get_value) {
+ T v;
+ v.reserve(kSize);
+ for (int i = 0; i < kSize; ++i) {
+ v.insert(get_value());
+ }
+ v.rehash(v.bucket_count() + 1);
+ return v;
+}
+
+template <typename T, typename F>
+T get_deterministic(F get_value) {
+ T v;
+ v.reserve(kSize);
+ for (int i = 0; i < kSize; ++i) {
+ v.insert(get_value());
+ }
+ return v;
+}
+
+template <typename T>
+struct RemoveConst {
+ using type = T;
+};
+
+template <typename T, typename U>
+struct RemoveConst<std::pair<const T, U>> {
+ using type = std::pair<T, U>;
+};
+
+template <typename T, typename F>
+void test_randomization(F get_value) {
+ T t1 = get_deterministic<T>(get_value), t2 = get_random<T>(get_value);
+
+ // Convert pair<const K, V> to pair<K, V> so it can be sorted
+ using U = typename RemoveConst<typename T::value_type>::type;
+
+ std::vector<U> t1v(t1.begin(), t1.end()), t2v(t2.begin(), t2.end());
+
+ assert(t1v != t2v);
+
+ std::sort(t1v.begin(), t1v.end());
+ std::sort(t2v.begin(), t2v.end());
+
+ assert(t1v == t2v);
+}
+
+int main(int, char**) {
+ int i = 0, j = 0;
+ test_randomization<std::unordered_set<int>>([i]() mutable { return i++; });
+ test_randomization<std::unordered_map<int, int>>([i, j]() mutable { return std::make_pair(i++, j++); });
+ test_randomization<std::unordered_multiset<int>>([i]() mutable { return i++ % 32; });
+ test_randomization<std::unordered_multimap<int, int>>([i, j]() mutable {
+ return std::make_pair(i++ % 32, j++);
+ });
+ return 0;
+}
|
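Once iteration order can change on a rehash, code that compares unordered containers by walking them in order stops being reliable. The test in the diff sidesteps this by sorting copies of both containers before comparing them. A minimal sketch of that idea follows; `sorted_snapshot` and `make_set` are hypothetical helper names used only for illustration, not part of libc++ or of this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of libc++): copies a container's elements
// into a vector and sorts them, so the result no longer depends on the
// container's unspecified iteration order.
template <class Container>
std::vector<typename Container::value_type> sorted_snapshot(const Container& c) {
  std::vector<typename Container::value_type> v(c.begin(), c.end());
  std::sort(v.begin(), v.end());
  return v;
}

// Hypothetical helper: builds a set holding 0..n-1. `force_rehash` mirrors
// get_random() in the test by requesting one more bucket after the inserts.
inline std::unordered_set<int> make_set(int n, bool force_rehash) {
  std::unordered_set<int> s;
  s.reserve(n);
  for (int i = 0; i < n; ++i)
    s.insert(i);
  if (force_rehash)
    s.rehash(s.bucket_count() + 1);
  return s;
}
```

Comparing `sorted_snapshot(a) == sorted_snapshot(b)` (or simply `a == b`, which is order-insensitive for unordered containers) should hold whether or not the randomization flag is set, whereas comparing the raw iteration sequences may not.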
✅ With the latest revision this PR passed the C/C++ code formatter. |
…set,map,multiset,multimap} under _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY - Adds randomization of element order during rehash in unordered containers when the _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY flag is set, similar to existing behavior in sort, nth_element, and partial_sort.
e60b49a to 83eb7ea
…nd_insertion_point()` from `__node_insert_multi_prepare()`. Use user defined allocator for temporary node and indices buffers in `__debug_randomize_order()`.
};

__nh_vec __nhv(size());
__index_vec __iv(size());
I'd ideally want to use unique_ptr<__node_holder[], decltype(deleter)> and unique_ptr<size_type[], decltype(deleter)> (instead of __nh_vec and __index_vec) while still being able to use the user-defined allocator. But min_pointer<T> is giving me problems (by not being convertible to T*). Is someone familiar with the matter?
Can you paste what the exact error is? It's probably something benign like a missing std::__to_address call.
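For context on the question above, here is a minimal sketch of an allocator-aware array buffer held by `unique_ptr`. It assumes a deleter that declares its own `pointer` type, so `unique_ptr` stores whatever pointer type the allocator hands out and no conversion to `T*` is required. `AllocArrayDeleter` and `make_buffer` are hypothetical names, not code from the PR, and the sketch does not reproduce the `min_pointer` error, which was not posted in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical deleter (not from the PR): returns an array to the allocator
// it came from. Declaring `pointer` makes unique_ptr store the allocator's
// own pointer type instead of a raw T*.
template <class Alloc>
struct AllocArrayDeleter {
  using traits  = std::allocator_traits<Alloc>;
  using pointer = typename traits::pointer;

  Alloc alloc;
  std::size_t capacity;    // number of elements allocated
  std::size_t constructed; // number of elements actually constructed so far

  void operator()(pointer p) {
    // Destroy only the constructed elements, in reverse order, then free.
    for (std::size_t i = constructed; i > 0; --i)
      traits::destroy(alloc, std::addressof(p[i - 1]));
    traits::deallocate(alloc, p, capacity);
  }
};

// Allocates n elements through `alloc` and value-initializes them one by one.
// The deleter's `constructed` counter is bumped after each construction, so
// an exception part-way through still cleans up exactly what was built.
template <class T, class Alloc>
std::unique_ptr<T[], AllocArrayDeleter<Alloc>> make_buffer(Alloc alloc, std::size_t n) {
  using D      = AllocArrayDeleter<Alloc>;
  using traits = std::allocator_traits<Alloc>;
  static_assert(std::is_same<typename traits::value_type, T>::value,
                "allocator must allocate T");

  std::unique_ptr<T[], D> buf(traits::allocate(alloc, n), D{alloc, n, 0});
  D& d = buf.get_deleter();
  for (; d.constructed < n; ++d.constructed)
    traits::construct(d.alloc, std::addressof(buf[d.constructed]));
  return buf;
}
```

The `constructed` counter plays the same role as `__initialized_nodes` in the earlier revision of the diff: the deleter only destroys what was actually built.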
}

private:
_LIBCPP_HIDE_FROM_ABI __next_pointer __node_insert_multi_find_insertion_point(size_t __cp_hash, value_type& __cp_val);
__node_insert_multi_find_insertion_point is used in __node_insert_multi_prepare and __debug_randomize_order. Is refactoring warranted or should I instead copy the "find insertion point" logic/code from __node_insert_multi_prepare?
I like that you extracted this into a separate function.
Can I get feedback on this approach?
There's something I don't understand about this approach. This is a naive question so bear with me. If you shuffle the elements in the hash table, how do you then manage to find them again based on their hash? Doesn't that mean the elements will be at a location that isn't correlated to their hash, meaning we'll basically devolve into a linear search whenever we look for an element?
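One way to read the diff with respect to this question: the nodes are not left at arbitrary positions. `__debug_randomize_order()` removes them and reinserts them through `__node_insert_unique_perform` or `__node_insert_multi_perform`, so each node should still end up in the bucket its hash selects, and only the order in which nodes are chained changes. The toy table below illustrates that property. `ToyTable` is a made-up separate-chaining structure for illustration only and does not mirror libc++'s actual layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Toy separate-chaining table for ints (illustration only; this is an
// assumption-laden model, not how libc++'s __hash_table is laid out).
struct ToyTable {
  std::vector<std::vector<int>> buckets;

  explicit ToyTable(std::size_t bucket_count) : buckets(bucket_count) {}

  // The "hash" is the key itself; the bucket is chosen by modulo.
  std::size_t bucket_of(int key) const {
    return static_cast<std::size_t>(key) % buckets.size();
  }

  void insert(int key) { buckets[bucket_of(key)].push_back(key); }

  // Lookup only scans the single chain selected by the key's hash.
  bool contains(int key) const {
    const std::vector<int>& chain = buckets[bucket_of(key)];
    return std::find(chain.begin(), chain.end(), key) != chain.end();
  }

  // Pull every element out, shuffle them, and reinsert through insert().
  // Because reinsertion goes through the normal path, every key lands back
  // in its own bucket; only the order inside each chain changes.
  void reinsert_shuffled(unsigned seed) {
    std::vector<int> all;
    for (std::vector<int>& chain : buckets) {
      all.insert(all.end(), chain.begin(), chain.end());
      chain.clear();
    }
    std::mt19937 rng(seed);
    std::shuffle(all.begin(), all.end(), rng);
    for (int key : all)
      insert(key);
  }
};
```

In this model, lookups after `reinsert_shuffled()` still inspect one chain per key, so the cost of a lookup is unchanged; what differs is the sequence a full traversal would produce.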
template <typename T, typename F>
T get_random(F get_value) {
-template <typename T, typename F>
-T get_random(F get_value) {
+template <typename Container, typename F>
+Container get_random(F get_value) {
T v;
v.reserve(kSize);
-T v;
-v.reserve(kSize);
+Container c;
+c.reserve(kSize);
Or something like that -- minor change but it makes things easier to read.
}
__nh.release();
}
#endif
-#endif
+#endif // defined(_LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY)
Sorry, I just want to clarify:
Lets say my I don't completely understand your implementation, but it looks clever 👍 . |
Gentle ping @arvidjonasson! |
Adds functionality requested in #102303.
Expands on functionality of https://libcxx.llvm.org/DesignDocs/UnspecifiedBehaviorRandomization.html.
- Adds randomization of element order during rehash in unordered containers (std::unordered_{set,map,multiset,multimap}) when the _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY flag is set, similar to existing behavior in std::sort, std::nth_element, and std::partial_sort.
- For std::unordered_{multiset,multimap}, equal ranges are shuffled and the order within equal ranges is shuffled.
- For std::unordered_{set,map}, the order of elements is shuffled.
- Adds libcxx/test/libcxx/containers/unord/hash_table_randomize_order.pass.cpp to assert that randomization works correctly.
Pseudo code for current approach:
Can I get feedback on this approach?
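As a rough user-level analogue of the approach in the diff (take every node out, shuffle, reinsert through the normal insertion path), the C++17 node-handle API can do the same from outside the container. This is only a sketch under that assumption: `reinsert_in_random_order` is a hypothetical name, and the PR itself works on internal `__node_holder` objects and shuffles indices because, per the comment in the diff, those holders are not assignable.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper (not part of libc++): extracts every node from an
// unordered container, shuffles the node handles, and reinserts them.
// Node handles are movable and swappable, so they can be shuffled directly.
template <class Container>
void reinsert_in_random_order(Container& c, unsigned seed) {
  std::vector<typename Container::node_type> nodes;
  nodes.reserve(c.size());

  // Step 1: move all nodes into temporary storage; no elements are copied.
  while (!c.empty())
    nodes.push_back(c.extract(c.begin()));

  // Step 2: randomize the reinsertion order (seeded here for repeatability).
  std::mt19937 rng(seed);
  std::shuffle(nodes.begin(), nodes.end(), rng);

  // Step 3: reinsert through the regular insert path, which places each
  // node according to its hash and keeps equal keys grouped together.
  for (auto& nh : nodes)
    c.insert(std::move(nh));
}
```

The element set is unchanged by this operation; only the traversal order may differ afterwards, which is the property the new test asserts by comparing sorted copies.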