
perf: performance issue when there is only barrier passing the cluster #17646

Open

st1page opened this issue Jul 10, 2024 · 5 comments

@st1page (Contributor) commented Jul 10, 2024

In some real-world scenarios, we have encountered the following situation:

  • No source throughput or other data; only barriers in the stream
  • The number of in-flight barriers piles up, but the max committed epoch keeps increasing
  • No very slow tasks in the await tree
  • High CPU usage, with hyper (the HTTP server) and tokio prominent in the flame graph

In this situation, the only flow in the graph is the barrier. Our assumption is that when there is no data between barriers, the barriers should flow through quickly. Otherwise, if the time for a barrier to pass through the entire graph exceeds one second, barriers will keep accumulating and the cluster will always be in a state of backpressure. More info here: https://www.notion.so/risingwave-labs/CVTE-2024-07-10-barrier-only-ecc23aa5b9ee4664a97a17c97a25d709?pvs=4
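To make the accumulation condition concrete, here is a back-of-envelope sketch. The hop count and per-hop latency are illustrative assumptions, not measurements from this incident; the point is only that under sequential (non-pipelined) barrier passing, an end-to-end passage time above the barrier interval means barriers pile up indefinitely.

```rust
// Back-of-envelope sketch of the accumulation condition described above.
// All numbers are illustrative assumptions, not measurements.
fn main() {
    let barrier_interval_ms = 1_000.0; // default 1 s barrier interval
    let exchange_hops = 40.0;          // "dozens of exchanges" in the pipeline
    let per_hop_ms = 30.0;             // hypothetical per-barrier cost per hop

    let passage_ms = exchange_hops * per_hop_ms;
    if passage_ms > barrier_interval_ms {
        println!(
            "barriers need {passage_ms} ms to cross the graph but are injected \
             every {barrier_interval_ms} ms => permanent backpressure"
        );
    }
}
```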

Below are some suspected causes and potential optimizations we are currently considering for this situation:

  • Hyper + Tokio performance issues when handling small network packets. We have discovered that in this scenario the CPU usage is abnormally high, and the flame graph shows many atomic-variable and lock calls from Hyper.
  • Backpressure caused by limiting the number of concurrent barriers in exchange. We introduced exchange_concurrent_barriers to limit concurrent barriers in exchange in feat(streaming): limit concurrent barriers in exchange based on permits #9427, which makes the backpressure more sensitive. When there are only barriers on the stream and exchange_concurrent_barriers = 1, the barrier-only stream becomes slow: all barrier messages pass through the streaming graph sequentially, i.e. the upstream cannot send the next barrier until the downstream actor has consumed the previous one and sent back an ack message (see the sketch after this list).
    • When sending a barrier downstream, we have to wait for the downstream to receive and consume the barrier before the RPC permit is returned to the upstream. Only then can the upstream send the next barrier.
    • Even after increasing exchange_concurrent_barriers to a larger value, the problem was not resolved. This suggests the issue lies in another direction: there is no batching of permits. That is, for every barrier consumed by the downstream, an RPC is sent to the upstream to return the permit.
  • The streaming pipeline is quite long. The triggering scenario involves a left-deep-tree join of 20 tables, so each barrier has to pass through dozens of exchanges along the entire pipeline.
  • Barrier interval. Obviously, increasing the barrier interval can work around this situation. Conversely, we could use a smaller barrier interval to reproduce the issue more easily.
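For the permit-based backpressure point above, here is a minimal, self-contained sketch of how a single-permit exchange serializes a barrier-only stream when permits are returned one at a time. It uses tokio's Semaphore and an mpsc channel as stand-ins and is not RisingWave's actual exchange implementation; RPC_LATENCY and the barrier count are made-up numbers.

```rust
// Minimal sketch of permit-based exchange backpressure, NOT RisingWave's
// actual exchange code: it only illustrates why, with
// exchange_concurrent_barriers = 1 and per-message permit returns, every
// barrier pays a full round-trip per exchange hop.
//
// Requires: tokio = { version = "1", features = ["full"] }

use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::{mpsc, Semaphore};

const RPC_LATENCY: Duration = Duration::from_millis(5); // pretend network hop

#[tokio::main]
async fn main() {
    let exchange_concurrent_barriers: usize = 1; // the knob discussed above
    let barriers_to_send: u64 = 100;

    // Permits model the downstream-granted budget for in-flight barriers.
    let permits = Arc::new(Semaphore::new(exchange_concurrent_barriers));
    let (tx, mut rx) = mpsc::unbounded_channel::<u64>();

    // Downstream: consume a barrier, then "RPC" a single permit back.
    // No batching: one permit-return round-trip per consumed barrier.
    let downstream_permits = permits.clone();
    let downstream = tokio::spawn(async move {
        while let Some(_barrier_epoch) = rx.recv().await {
            tokio::time::sleep(RPC_LATENCY).await;
            downstream_permits.add_permits(1);
        }
    });

    // Upstream: must hold a permit before sending the next barrier.
    let start = Instant::now();
    for epoch in 0..barriers_to_send {
        let permit = permits.acquire().await.unwrap();
        permit.forget(); // the permit travels with the in-flight barrier
        tokio::time::sleep(RPC_LATENCY).await; // sending also crosses the network
        tx.send(epoch).unwrap();
    }
    drop(tx);
    downstream.await.unwrap();

    // With concurrency 1 and per-barrier permit returns, this single hop takes
    // roughly barriers * 2 * RPC_LATENCY; a pipeline with dozens of exchanges
    // multiplies that further.
    println!("sent {barriers_to_send} barriers in {:?}", start.elapsed());
}
```

If batching permit returns turns out to be the missing piece, the same sketch with `add_permits(N)` per ack (acking every N barriers or on a timer) shows the number of round-trips dropping by roughly a factor of N.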

Even with these conjectures, I believe we first need to find a way to reliably reproduce this situation before attempting any improvements, so that we can verify the effectiveness of the optimizations.

  • reproduce the case and investigate
@st1page st1page changed the title perf: there is only barrier passing the streaming perf: there is only barrier passing the cluster Jul 10, 2024
@github-actions github-actions bot added this to the release-1.10 milestone Jul 10, 2024
@st1page st1page changed the title perf: there is only barrier passing the cluster perf: performance issue when there is only barrier passing the cluster Jul 10, 2024
@fuyufjh fuyufjh modified the milestones: release-1.10, release-1.11 Jul 10, 2024
@BugenZhao (Member)

Could this be caused by the problem described in #17612?

@fuyufjh (Member) commented Aug 19, 2024

Perhaps fixed by #17612.

@fuyufjh fuyufjh closed this as not planned Aug 19, 2024
@fuyufjh (Member) commented Aug 22, 2024

This recurred today in another case.

@fuyufjh fuyufjh reopened this Aug 22, 2024
@hzxa21 (Collaborator) commented Oct 18, 2024

This recurred today in another case.

@fuyufjh fuyufjh removed this from the release-2.0 milestone Oct 18, 2024
@fuyufjh (Member) commented Oct 18, 2024

The overhead (CPU usage) comes from passing the barriers. We have to admit that passing a barrier costs some overhead, although that overhead is negligible in most cases.

It reminds me of #6943, where we set in_flight_barrier_nums to 10000 to effectively eliminate the upper bound. This also has a side effect: the overhead of passing barriers grows along with the number of in-flight barriers.

As written in #6943, the motivation was OOM issues. Shall we reduce the number from 10000 to something lower, such as 1000 or 100, in order to reach a balance point between these two issues?
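To make the trade-off concrete, here is a toy sketch of a bounded in-flight barrier count. BarrierScheduler, tick, and collect are illustrative names rather than the actual meta-node barrier manager; the point is only that in_flight_barrier_nums caps how many injected-but-uncollected barriers can accumulate before injection pauses, while a very large value like 10000 effectively removes that cap.

```rust
// Toy sketch (illustrative names, not RisingWave's barrier manager): barriers
// are injected only while the number of in-flight (injected but not yet
// collected) barriers stays below `in_flight_barrier_nums`.
use std::collections::VecDeque;

struct BarrierScheduler {
    in_flight: VecDeque<u64>,      // epochs injected but not yet collected
    in_flight_barrier_nums: usize, // the config knob discussed above
    next_epoch: u64,
}

impl BarrierScheduler {
    /// Try to inject one barrier; returns None when the in-flight bound is hit.
    fn tick(&mut self) -> Option<u64> {
        if self.in_flight.len() >= self.in_flight_barrier_nums {
            // A lower bound (e.g. 100 instead of 10000) back-pressures the
            // barrier stream here instead of letting barriers pile up.
            return None;
        }
        let epoch = self.next_epoch;
        self.next_epoch += 1;
        self.in_flight.push_back(epoch);
        Some(epoch)
    }

    /// Called when the oldest barrier has passed the whole graph.
    fn collect(&mut self) {
        self.in_flight.pop_front();
    }
}

fn main() {
    let mut sched = BarrierScheduler {
        in_flight: VecDeque::new(),
        in_flight_barrier_nums: 3, // tiny bound just for the demo
        next_epoch: 0,
    };
    // With nothing being collected, injection stops once the bound is hit.
    let injected: Vec<_> = (0..10).filter_map(|_| sched.tick()).collect();
    assert_eq!(injected, vec![0, 1, 2]);
    sched.collect();
    assert_eq!(sched.tick(), Some(3));
}
```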

@fuyufjh fuyufjh added this to the release-2.2 milestone Oct 18, 2024