-
-
Notifications
You must be signed in to change notification settings - Fork 16.2k
Description
#15222 has introduced a way to enable virtual threads (which cannot extends FastThreadLocal) to use many FastThredLocalThread optimizations e.g. adaptive unshared magazines, FastThreadLocal::get, etc etc.
This has introduced a binary search vs the thread's id to verify a thread belong to the known set of registered fast thread local threads.
Since loom enable to create many virtual threads which are not fast thread local thread and we can still have a some which are event loops (see https://micronaut.io/2025/06/30/transitioning-to-virtual-threads-using-the-micronaut-loom-carrier/ and https://github.com/franz1981/Netty-VirtualThread-Scheduler) we would like a O(1) way to detect if a virtual thread doesn't belong to this set (ideally, but we need to measure it - which can be tricky!): in https://github.com/franz1981/netty/tree/4.2_not_fasttl_bench I've implemented a microbench which doesn't stress the branch predictor but still stress the log2(event loops) comparisons for such (negative) test.
I was thinking we could implement such in this way (used in openjdk/jdk#18309):
- create an array of (fast thread local) Thread's id ordered based on their hashing value
- store in a long (8 bytes) a bit for each id in the array (
hash(id) % 64-th bit position) - checking for id presence check first the (hash(id) % 64)-th bit: if there's none, it returns immediately
- if the bit is set, count the previous other bits (it's a single ASM instruction) to know the position where to start with searching in the array of ids
- etc etc
This should grant that for decently distributed thread ids and a cheap enough hashing we can have O(1)-ish for both; with slow-path w for loops search (starting from the computed position). Differently from Open addressing with linear probing we cannot just stop searching till the first "empty" is found - since the long id "bloom filter"(ish, again :P ) doesn't allow it (the order of bit set is there to allow the array of ids to be compressed - there are no empty slots, like for hash maps).
wdyt?