maint: Remove dependency on some elements of folly #1370
Conversation
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
folly itself recommends against using it: https://github.com/facebook/folly/blob/main/folly/SpinLock.h#L18-L19 Since `std::mutex` is non-copyable, we use indirection via `std::unique_ptr`.
```diff
@@ -646,7 +645,7 @@ class StringReducer {
     virtual void reduce(PipelineContextRow& context_row, size_t column_index) = 0;
 };

 using LockType = folly::SpinLock;
```
Why was this required? Spin locks tend to be faster than std::mutex (and mutexes in general) when the lock is held briefly. I'm not sure about the current state of the code, just pointing out that it can have an impact.
folly itself recommends against using SpinLock: https://github.com/facebook/folly/blob/main/folly/SpinLock.h#L18-L19
I agree that that's the general case. I've seen cases where a spin lock increases performance by a large factor, because acquiring a mutex requires a kernel call; if you hold the mutex for a short period of time, the kernel call negates any benefits. I was asking whether there was a reason for the spin lock in the first place, and whether there were any tests to prove nothing changed significantly after the switch to a mutex.
I do not know.
Probably, @willdealtry can provide some insights?
I guess if the asv benchmarks aren't showing regressions, this might not be a concern. What do you think?
ASV benchmarks are not quite complete (still better than nothing). Let's see if @willdealtry has any concerns.
This code has had quite a lot of optimization effort because it's a known bottleneck for us (and for anyone else trying to write string data). If we want to move away from using the folly spinlock, it's not difficult to write one that should be sufficient for our purposes, even if it maybe doesn't have all the trickery that the folly one has:
```cpp
#include <atomic>

#if defined(_WIN32)
#include <intrin.h>
#define PAUSE _mm_pause()
#elif defined(__arm__) || defined(__ARM_ARCH)
#define PAUSE __yield()
#else
#define PAUSE __builtin_ia32_pause()
#endif

struct SpinLock {
    std::atomic<bool> lock_ = false;

    void lock() noexcept {
        do {
            if (!lock_.exchange(true, std::memory_order_acquire))
                return;
            while (lock_.load(std::memory_order_relaxed))
                PAUSE;
        } while (true);
    }

    bool try_lock() noexcept {
        return !lock_.load(std::memory_order_relaxed) &&
               !lock_.exchange(true, std::memory_order_acquire);
    }

    void unlock() noexcept { lock_.store(false, std::memory_order_release); }
};
```
N.B. I have not compiled, much less tested this, although it's pretty canonical. I remember doing almost exactly this with volatile bool and intrinsics in the early Cretaceous period (when I was young)
In terms of the maps, std::unordered_map has generally horrible performance. I think over time we will probably end up writing some of our own maps for specific purposes, because quite a lot of the time we have use-cases where we could be much faster than a general-purpose hash map. For example in some cases, we can do without deletion, because we populate the map and then drop it entirely, and in other cases the values are known to be unique at insertion time, meaning that we can do away with a substantial part of the branching and complexity of a generic container.
If we're going to move everything away from folly and robin_hood at the same time, it would seem to be worthwhile at least migrating to a modern and performant implementation globally, then replacing that where we think we have a special case.
I would be in favour of using this one, as it's at least as fast as, if not faster than, any of the ones we currently use, is header-only, and has an MIT license: https://github.com/martinus/unordered_dense
`std::unordered_map` is for now a placeholder replacement, and I agree more performant implementations are needed. Let's wait for the completion of the benchmarks triggered by b625e1f to see how severe the performance regressions are?
The recent `boost::unordered_flat_{set,map}` are also worth considering. I have opened #1390 to see whether we can replace robin_hood with Boost's collections.
Force-pushed from 599677b to 61dc210.
While the change of hash maps will probably have an impact on performance, as long as we don't settle on a proper replacement yet, I think it's fine - we can change that later. So LGTM. 👍🏽
According to b625e1f's ASV benchmarks, there are no performance regressions. I am also in favour of replacing all the hash maps and hash sets. Let me know if this looks good to you, and I will revert b625e1f before merging this PR.
…_map`" This reverts commit 76b7fc3.
Force-pushed from 81cebbb to d867cb3.
Weirdly enough, this PR as of 76b7fc3 had worse performance than
What does this implement or fix?
Remove or replace elements of folly.