
Don't cache head/tail index in Consumer/Producer #48

Merged
mgeier merged 2 commits into mgeier:main from optimize-indices on Dec 9, 2023

Conversation

@mgeier (Owner) commented on Apr 27, 2021

I'm not sure whether this is an improvement or not.

The benchmarks are inconclusive on my machine.

The results might be different on a CPU with weak memory ordering, because the number of atomic loads increases.

Any comments?
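
For context, here is a minimal sketch of the two approaches (illustrative only, not the actual rtrb code; `Shared`, `CachedProducer`, `UncachedProducer`, and `slots_free` are made-up names, and the memory orderings are simplified):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Shared {
    head: AtomicUsize, // only written by the Consumer
    tail: AtomicUsize, // only written by the Producer
    capacity: usize,
}

/// Producer that keeps a cached copy of the Consumer's `head` (the status quo).
struct CachedProducer<'a> {
    buffer: &'a Shared,
    cached_head: usize,
}

impl CachedProducer<'_> {
    fn slots_free(&mut self) -> usize {
        let tail = self.buffer.tail.load(Ordering::Relaxed);
        let mut free = self.buffer.capacity - tail.wrapping_sub(self.cached_head);
        if free == 0 {
            // Only refresh the cache (an extra atomic load of the other
            // thread's index) when the stale cached value says "full".
            self.cached_head = self.buffer.head.load(Ordering::Acquire);
            free = self.buffer.capacity - tail.wrapping_sub(self.cached_head);
        }
        free
    }
}

/// Producer without a cache (what this PR proposes): `head` is loaded
/// atomically on every call, so the result is always up to date.
struct UncachedProducer<'a> {
    buffer: &'a Shared,
}

impl UncachedProducer<'_> {
    fn slots_free(&self) -> usize {
        let tail = self.buffer.tail.load(Ordering::Relaxed);
        let head = self.buffer.head.load(Ordering::Acquire);
        self.buffer.capacity - tail.wrapping_sub(head)
    }
}
```

The cached variant trades an occasionally stale view of the other index for fewer atomic loads; this PR switches to the uncached variant, so every check performs one more atomic load but always sees the current index.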

@ghost commented on May 4, 2021

[two screenshots of benchmark results]

On my machine, this version is faster than the default one with a large buffer, but slower with a small buffer.

@mgeier (Owner, Author) commented on May 9, 2021

Thanks @zhenpingfeng!

I've also seen slight improvements in the two-threads benchmark.
However, some of the single-threaded benchmarks have shown quite strong regressions.

I'm not sure whether the results are reliable and I don't know which of the benchmarks are most relevant in practice.

But since the improvements are small and the regressions are big, I'm hesitant to merge this.

Have you also tried the single-threaded benchmarks?

What kind of CPU have you used, if I may ask?

I'm using an Intel(R) Core(TM) i5-7Y54 CPU.

@ghost commented on May 10, 2021

[four screenshots of the single-threaded benchmark results]

@ghost commented on May 10, 2021

In the single-threaded test results, the default version does have a slight advantage in some tests.
I'm using an Intel Xeon Gold 6230 CPU at 2.8 GHz.

@mgeier (Owner, Author) commented on Jun 7, 2021

Sorry for the late response.
I've run the benchmarks again on my laptop to produce some plots.

The results are quite mixed. Most differences are within ±5%.

The following plots show a few bigger differences.

blue: this PR
red: base commit of this PR (cc4191f)

Some benchmarks improved quite a lot:

single-thread-single-byte/4-write:

[plot]

single-thread-with-chunks/3-iterate-write:

[plot]

While other benchmarks regressed quite a lot:

single-thread-with-chunks/1-pop:

[plot]

single-thread-with-chunks/3-iterate-read:

[plot]

Please note that single-thread-with-chunks/3-iterate-read has a huge regression while single-thread-with-chunks/3-iterate-write has a huge improvement. Strangely, single-thread-with-chunks/3-iterate-write-uninit shows no change (but I would have expected behavior similar to single-thread-with-chunks/3-iterate-write).
Any ideas what could be the explanation for this?

The two-threads benchmarks (which I think might be the most relevant here) show only very small differences, within the noise threshold:

two-threads/large:

[plot]

two-threads/small:

[plot]

In summary, the multi-threaded benchmarks show no difference, while the single-threaded ones show both big improvements and big regressions, which makes me think they are probably not very trustworthy.

Unless there is a good explanation for the results, I'm hesitant to merge this, because it might actually be a net negative.

Any theories that could explain the observations?

Any further benchmark results?

@RamType0 (Contributor) commented on Nov 8, 2021

IMHO, Ordering::Acquire isn't required when the Producer reads tail or the Consumer reads head, is it?


@mgeier (Owner, Author) commented on Dec 6, 2021

> IMHO, Ordering::Acquire isn't required when the Producer reads tail or the Consumer reads head, is it?

Yes, you are absolutely right @RamType0. Thanks for noticing this!

I've changed those operations to use Relaxed in e94a9d6.

I also added comments in case somebody is wondering later why Relaxed was chosen.
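
To make the reasoning concrete, here is a simplified sketch of the two loads on the producer side (not a verbatim excerpt of e94a9d6; `RingBuffer` and `free_slots` are placeholder names):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct RingBuffer {
    head: AtomicUsize, // advanced by the Consumer with a Release store
    tail: AtomicUsize, // advanced by the Producer with a Release store
    capacity: usize,
}

impl RingBuffer {
    /// Number of free slots, as seen from the Producer side.
    fn free_slots(&self) -> usize {
        // `tail` is only ever written by the Producer itself, so this read
        // observes our own previous store and needs no synchronization.
        let tail = self.tail.load(Ordering::Relaxed);
        // `head` is written by the Consumer; Acquire pairs with its Release
        // store so the slots it has freed are actually safe to reuse.
        let head = self.head.load(Ordering::Acquire);
        self.capacity - tail.wrapping_sub(head)
    }
}
```

The Consumer side is symmetric: it can load its own head with Relaxed, but still needs Acquire when loading the tail written by the Producer.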

mgeier marked this pull request as ready for review on December 9, 2023 at 20:11
@mgeier (Owner, Author) commented on Dec 9, 2023

I have decided to merge this. Some benchmarks are still inconclusive, but overall it tends to be an improvement. The two-threads benchmarks don't change.

mgeier merged commit 80c6842 into mgeier:main on Dec 9, 2023 (9 checks passed)
mgeier deleted the optimize-indices branch on December 9, 2023 at 21:12