
Don't cache head/tail index in Consumer/Producer #48

Merged
mgeier merged 2 commits into mgeier:main from optimize-indices on Dec 9, 2023

Conversation

@mgeier (Owner) commented on Apr 27, 2021

I'm not sure whether this is an improvement or not.

The benchmarks are inconclusive on my machine.

The results might be different on a CPU with weak memory ordering, because the number of atomic loads increases.

Any comments?
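
For context, here is a minimal sketch of the two approaches (illustrative only, not the actual rtrb code; `Shared`, `CachedProducer`, `UncachedProducer`, and `slots_free` are made-up names, and the memory orderings are simplified):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Shared {
    head: AtomicUsize, // only written by the Consumer
    tail: AtomicUsize, // only written by the Producer
    capacity: usize,
}

/// Producer that keeps a cached copy of the Consumer's `head` (the status quo).
struct CachedProducer<'a> {
    buffer: &'a Shared,
    cached_head: usize,
}

impl CachedProducer<'_> {
    fn slots_free(&mut self) -> usize {
        let tail = self.buffer.tail.load(Ordering::Relaxed);
        let mut free = self.buffer.capacity - tail.wrapping_sub(self.cached_head);
        if free == 0 {
            // Only refresh the cache (an extra atomic load of the other
            // thread's index) when the stale cached value says "full".
            self.cached_head = self.buffer.head.load(Ordering::Acquire);
            free = self.buffer.capacity - tail.wrapping_sub(self.cached_head);
        }
        free
    }
}

/// Producer without a cache (what this PR proposes): `head` is loaded
/// atomically on every call, so the result is always up to date.
struct UncachedProducer<'a> {
    buffer: &'a Shared,
}

impl UncachedProducer<'_> {
    fn slots_free(&self) -> usize {
        let tail = self.buffer.tail.load(Ordering::Relaxed);
        let head = self.buffer.head.load(Ordering::Acquire);
        self.buffer.capacity - tail.wrapping_sub(head)
    }
}
```

The cached variant trades an occasionally stale view of the other index for fewer atomic loads; this PR switches to the uncached variant, so every check performs one more atomic load but always sees the current index.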

@ghost commented on May 4, 2021

[two screenshots of benchmark results]

On my machine, this version is faster than the default one with a large buffer, but slower with a small buffer.

@mgeier (Owner, Author) commented on May 9, 2021

Thanks @zhenpingfeng!

I've also seen slight improvements in the two-threads benchmark.
However, some of the single-threaded benchmarks have shown quite strong regressions.

I'm not sure whether the results are reliable and I don't know which of the benchmarks are most relevant in practice.

But since the improvements are small and the regressions are big, I'm hesitant to merge this.

Have you also tried the single-threaded benchmarks?

What kind of CPU have you used, if I may ask?

I'm using an Intel(R) Core(TM) i5-7Y54 CPU.

@ghost commented on May 10, 2021

[four screenshots of the single-threaded benchmark results]

@ghost commented on May 10, 2021

In the single-threaded test results, the default version does have a slight advantage in some tests.
I'm using an Intel Xeon Gold 6230 CPU at 2.8 GHz.

@mgeier (Owner, Author) commented on Jun 7, 2021

Sorry for the late response.
I've run the benchmarks again on my laptop to produce some plots.

The results are quite mixed. Most differences are within ±5%.

The following plots show a few bigger differences.

blue: this PR
red: base commit of this PR (cc4191f)

Some benchmarks improved quite a lot:

single-thread-single-byte/4-write:

[plot]

single-thread-with-chunks/3-iterate-write:

[plot]

While other benchmarks regressed quite a lot:

single-thread-with-chunks/1-pop:

[plot]

single-thread-with-chunks/3-iterate-read:

[plot]

Please note that single-thread-with-chunks/3-iterate-read has a huge regression while single-thread-with-chunks/3-iterate-write has a huge improvement. Strangely, single-thread-with-chunks/3-iterate-write-uninit shows no change (but I would have expected behavior similar to single-thread-with-chunks/3-iterate-write).
Any ideas what could be the explanation for this?

The two-threads benchmarks (which I think might be the most relevant here) show only very small differences, within the noise threshold:

two-threads/large:

[plot]

two-threads/small:

[plot]

In summary, the multi-threaded benchmarks show no difference, while the single-threaded ones show both big improvements and big regressions, which makes me think they are probably not very trustworthy.

Unless there is a good explanation for the results, I'm hesitant to merge this, because it might actually be a net negative.

Any theories that could explain the observations?

Any further benchmark results?

@RamType0 (Contributor) commented on Nov 8, 2021

IMHO, Ordering::Acquire isn't required when the Producer reads tail or the Consumer reads head, is it?


@mgeier (Owner, Author) commented on Dec 6, 2021

> IMHO, Ordering::Acquire isn't required when the Producer reads tail or the Consumer reads head, is it?

Yes, you are absolutely right @RamType0. Thanks for noticing this!

I've changed those operations to use Relaxed in e94a9d6.

I also added comments in case somebody is wondering later why Relaxed was chosen.
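
To make the reasoning concrete, here is a simplified sketch of the two loads on the producer side (not a verbatim excerpt of e94a9d6; `RingBuffer` and `free_slots` are placeholder names):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct RingBuffer {
    head: AtomicUsize, // advanced by the Consumer with a Release store
    tail: AtomicUsize, // advanced by the Producer with a Release store
    capacity: usize,
}

impl RingBuffer {
    /// Number of free slots, as seen from the Producer side.
    fn free_slots(&self) -> usize {
        // `tail` is only ever written by the Producer itself, so this read
        // observes our own previous store and needs no synchronization.
        let tail = self.tail.load(Ordering::Relaxed);
        // `head` is written by the Consumer; Acquire pairs with its Release
        // store so the slots it has freed are actually safe to reuse.
        let head = self.head.load(Ordering::Acquire);
        self.capacity - tail.wrapping_sub(head)
    }
}
```

The Consumer side is symmetric: it can load its own head with Relaxed, but still needs Acquire when loading the tail written by the Producer.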

mgeier marked this pull request as ready for review on December 9, 2023 at 20:11
@mgeier (Owner, Author) commented on Dec 9, 2023

I have decided to merge this. Some benchmarks are still inconclusive, but overall it tends to be an improvement. The two-threads benchmarks don't change.

mgeier merged commit 80c6842 into mgeier:main on Dec 9, 2023 (9 checks passed)
mgeier deleted the optimize-indices branch on December 9, 2023 at 21:12