Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maint: Remove dependency on some elements of folly #1370

Merged
merged 13 commits into from
Mar 11, 2024

Conversation

jjerphan
Copy link
Collaborator

@jjerphan jjerphan commented Feb 26, 2024

Reference Issues/PRs

What does this implement or fix?

Remove or replace elements of folly.

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
folly recommends against not using it:
https://github.com/facebook/folly/blob/main/folly/SpinLock.h#L18-L19

Since `std::mutex` is non-copyable, we use indirections via
`std::unique`.

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
@@ -646,7 +645,7 @@ class StringReducer {
virtual void reduce(PipelineContextRow& context_row, size_t column_index) = 0;
};

using LockType = folly::SpinLock;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this required? Spin locks tend be faster than std::mutex and mutexes in general if the lock is held briefly, I'm not sure about the current state of the code, just pointing out that it can have an impact.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

folly itself recommends against using SpinLock:
https://github.com/facebook/folly/blob/main/folly/SpinLock.h#L18-L19

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's the general case. I've seen cases where the spin lock increases the performance by a large factor because acquiring a mutex requires kernel call. Thus if you hold the mutex for short period of time the kernel call will negate any benefits. I was asking there was a reason for the spin lock in the first place and were there any tests to prove nothing changed significantly after the switch to a mutex.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know.

Probably, @willdealtry can provide some insights?

I guess if the asv benchmarks aren't showing regressions, this might not be a concern. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASV benchmarks are not quite complete (still better than nothing). Let's see if @willdealtry has any concerns.

Copy link
Collaborator

@willdealtry willdealtry Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has had quite a lot of optimization effort because it's a known bottleneck for us (and for anyone else trying to write string data). If we want to move away from using the folly spinlock, it's not difficult to write one that should be sufficient for our purposes, even if it doesn't maybe have all the trickery that the folly one has:

#ifdef WIN32
#define PAUSE _mm_pause()
#else if defined(__arm__) || defined(__ARM_ARCH)
#define PAUSE __yield()
#else
#define PAUSE __builtin_ia32_pause()
#endif

struct SpinLock {
  std::atomic<bool> lock_ = false;

void lock() noexcept {
    do {
      if (!lock_.exchange(true, std::memory_order_acquire))
        return;
 
      while (lock_.load(std::memory_order_relaxed))
        PAUSE;
      
    } while(true);
  }

  bool try_lock() noexcept {
    return !lock_.load(std::memory_order_relaxed) && !lock_.exchange(true, std::memory_order_acquire);
  }
  
  void unlock() { lock_.store(false, std::memory_order_release); }
};

N.B. I have not compiled, much less tested this, although it's pretty canonical. I remember doing almost exactly this with volatile bool and intrinsics in the early Cretaceous period (when I was young)

Copy link
Collaborator

@willdealtry willdealtry Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of the maps, std::unordered_map has generally horrible performance. I think over time we will probably end up writing some of our own maps for specific purposes, because quite a lot of the time we have use-cases where we could be much faster than a general-purpose hash map. For example in some cases, we can do without deletion, because we populate the map and then drop it entirely, and in other cases the values are known to be unique at insertion time, meaning that we can do away with a substantial part of the branching and complexity of a generic container.

Copy link
Collaborator

@willdealtry willdealtry Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to move everything away from folly and robin_hood at the same time, it would seem to be worthwhile at least migrating to a modern and performant implementation globally, then replacing that where we think we have a special case.

I would be in favour of using this one, as it's at least as fast if not faster than any of the ones we currently use, is header-only and has an MIT license: https://github.com/martinus/unordered_dense

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::unordered_map is for now a placeholder replacement, and I agree more performant implementations are needed.

Let's wait for the completion of the benchmarks triggered by b625e1f to see how severe performances regression are?

Copy link
Collaborator Author

@jjerphan jjerphan Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recent boost::unordered_flat_{set,map} are also worth considering.

I have opened #1390 to see whether we can replace robin hood with boost's collections.

@jjerphan jjerphan marked this pull request as ready for review February 28, 2024 16:25
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Copy link
Collaborator

@Klaim Klaim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the change of hash maps will probably have an impact on performance, as long as we dont set a proper replacement, I think it's fine - we can change that later. So LGTM. 👍🏽

This reverts "Disable ec2 runners on PR (#1357)"

This reverts commit 198c9cc.
@jjerphan
Copy link
Collaborator Author

jjerphan commented Mar 1, 2024

According to b625e1f's ASV benchmarks, there are no performance regressions.

I am also for replacing all the hash maps and hash sets with martinus/unordered_dense as proposed by @willdealtry in #1370 (comment) in another PR. What do you think?

Let me know if this looks good to you, and I will revert b625e1f before merging this PR.

@jjerphan jjerphan force-pushed the maint/remove-folly-parts branch 2 times, most recently from 81cebbb to d867cb3 Compare March 7, 2024 09:05
@jjerphan
Copy link
Collaborator Author

jjerphan commented Mar 8, 2024

Weirdly enough this PR as of 76b7fc3 had worse performance than master. That is why I reverted the changes then.

@jjerphan jjerphan merged commit 4184a46 into master Mar 11, 2024
114 checks passed
@jjerphan jjerphan deleted the maint/remove-folly-parts branch March 11, 2024 12:12
@jjerphan jjerphan mentioned this pull request Mar 11, 2024
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants