
perf: performance issue when there is only barrier passing the cluster #17646

Open

st1page opened this issue Jul 10, 2024 · 5 comments

@st1page (Contributor) commented Jul 10, 2024

In some real-world scenarios, we have encountered the following situation:

  • No source throughput or other data; only barriers in the stream
  • The number of in-flight barriers piles up, but the max committed epoch keeps increasing
  • No very slow tasks in the await tree
  • High CPU usage, with hyper (the HTTP server) and tokio prominent in the flame graph

In this situation, the only flow in the graph is the barrier. Our assumption is that when there is no data between barriers, the barriers should flow through quickly. Otherwise, if the time for a barrier to pass through the entire graph exceeds one second, barriers will keep accumulating and the cluster will always be in a state of backpressure. More info here: https://www.notion.so/risingwave-labs/CVTE-2024-07-10-barrier-only-ecc23aa5b9ee4664a97a17c97a25d709?pvs=4
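To make the accumulation condition concrete, here is a back-of-envelope sketch. The hop count and per-hop latency are illustrative assumptions, not measurements from this incident; the point is only that under sequential (non-pipelined) barrier passing, an end-to-end passage time above the barrier interval means barriers pile up indefinitely.

```rust
// Back-of-envelope sketch of the accumulation condition described above.
// All numbers are illustrative assumptions, not measurements.
fn main() {
    let barrier_interval_ms = 1_000.0; // default 1 s barrier interval
    let exchange_hops = 40.0;          // "dozens of exchanges" in the pipeline
    let per_hop_ms = 30.0;             // hypothetical per-barrier cost per hop

    let passage_ms = exchange_hops * per_hop_ms;
    if passage_ms > barrier_interval_ms {
        println!(
            "barriers need {passage_ms} ms to cross the graph but are injected \
             every {barrier_interval_ms} ms => permanent backpressure"
        );
    }
}
```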

Below are some suspected causes and potential optimizations we are currently considering for this situation:

  • Hyper + Tokio performance issues when handling small network packets. We have discovered that in this scenario the CPU usage is abnormally high, and the flame graph shows many atomic-variable and lock calls from Hyper.
  • Backpressure caused by limiting the number of concurrent barriers in exchange. We introduced exchange_concurrent_barriers to limit concurrent barriers in exchange in feat(streaming): limit concurrent barriers in exchange based on permits #9427, which makes the backpressure more sensitive. When there are only barriers on the stream and exchange_concurrent_barriers = 1, the barrier-only stream becomes slow: all barrier messages pass through the streaming graph sequentially, i.e. the upstream cannot send the next barrier until the downstream actor has consumed the previous one and sent back an ack message (see the sketch after this list).
    • When sending a barrier downstream, we have to wait for the downstream to receive and consume the barrier before the RPC permit is returned to the upstream. Only then can the upstream send the next barrier.
    • Even after increasing exchange_concurrent_barriers to a larger value, the problem was not resolved. This suggests the issue lies in another direction: there is no batching of permits. That is, for every barrier consumed by the downstream, an RPC is sent to the upstream to return the permit.
  • The streaming pipeline is quite long. The triggering scenario involves a left-deep-tree join of 20 tables, so each barrier has to pass through dozens of exchanges along the entire pipeline.
  • Barrier interval. Obviously, increasing the barrier interval can work around this situation. Conversely, we could use a smaller barrier interval to reproduce the issue more easily.
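For the permit-based backpressure point above, here is a minimal, self-contained sketch of how a single-permit exchange serializes a barrier-only stream when permits are returned one at a time. It uses tokio's Semaphore and an mpsc channel as stand-ins and is not RisingWave's actual exchange implementation; RPC_LATENCY and the barrier count are made-up numbers.

```rust
// Minimal sketch of permit-based exchange backpressure, NOT RisingWave's
// actual exchange code: it only illustrates why, with
// exchange_concurrent_barriers = 1 and per-message permit returns, every
// barrier pays a full round-trip per exchange hop.
//
// Requires: tokio = { version = "1", features = ["full"] }

use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::{mpsc, Semaphore};

const RPC_LATENCY: Duration = Duration::from_millis(5); // pretend network hop

#[tokio::main]
async fn main() {
    let exchange_concurrent_barriers: usize = 1; // the knob discussed above
    let barriers_to_send: u64 = 100;

    // Permits model the downstream-granted budget for in-flight barriers.
    let permits = Arc::new(Semaphore::new(exchange_concurrent_barriers));
    let (tx, mut rx) = mpsc::unbounded_channel::<u64>();

    // Downstream: consume a barrier, then "RPC" a single permit back.
    // No batching: one permit-return round-trip per consumed barrier.
    let downstream_permits = permits.clone();
    let downstream = tokio::spawn(async move {
        while let Some(_barrier_epoch) = rx.recv().await {
            tokio::time::sleep(RPC_LATENCY).await;
            downstream_permits.add_permits(1);
        }
    });

    // Upstream: must hold a permit before sending the next barrier.
    let start = Instant::now();
    for epoch in 0..barriers_to_send {
        let permit = permits.acquire().await.unwrap();
        permit.forget(); // the permit travels with the in-flight barrier
        tokio::time::sleep(RPC_LATENCY).await; // sending also crosses the network
        tx.send(epoch).unwrap();
    }
    drop(tx);
    downstream.await.unwrap();

    // With concurrency 1 and per-barrier permit returns, this single hop takes
    // roughly barriers * 2 * RPC_LATENCY; a pipeline with dozens of exchanges
    // multiplies that further.
    println!("sent {barriers_to_send} barriers in {:?}", start.elapsed());
}
```

If batching permit returns turns out to be the missing piece, the same sketch with `add_permits(N)` per ack (acking every N barriers or on a timer) shows the number of round-trips dropping by roughly a factor of N.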

Even with these conjectures, I believe we first need to find a way to reliably reproduce this situation before attempting any improvements, so that we can verify the effectiveness of the optimizations.

  • reproduce the case and investigate
@st1page st1page changed the title perf: there is only barrier passing the streaming perf: there is only barrier passing the cluster Jul 10, 2024
@github-actions github-actions bot added this to the release-1.10 milestone Jul 10, 2024
@st1page st1page changed the title perf: there is only barrier passing the cluster perf: performance issue when there is only barrier passing the cluster Jul 10, 2024
@fuyufjh fuyufjh modified the milestones: release-1.10, release-1.11 Jul 10, 2024
@BugenZhao (Member)

Could this be caused by the problem described in #17612?

@fuyufjh (Member) commented Aug 19, 2024

Perhaps fixed by #17612.

@fuyufjh fuyufjh closed this as not planned Aug 19, 2024
@fuyufjh (Member) commented Aug 22, 2024

This recurred today in another case.

@fuyufjh fuyufjh reopened this Aug 22, 2024
@hzxa21 (Collaborator) commented Oct 18, 2024

This recurred today in another case.

@fuyufjh fuyufjh removed this from the release-2.0 milestone Oct 18, 2024
@fuyufjh (Member) commented Oct 18, 2024

The overhead (CPU usage) comes from passing the barriers. We have to admit that passing a barrier costs some overhead, although that overhead is negligible in most cases.

It reminds me of #6943, where we set in_flight_barrier_nums to 10000 to effectively eliminate the upper bound. This also has a side effect: the overhead of passing barriers grows along with the number of in-flight barriers.

As written in #6943, the motivation was OOM issues. Shall we reduce the number from 10000 to something lower, such as 1000 or 100, in order to reach a balance point between these two issues?
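To make the trade-off concrete, here is a toy sketch of a bounded in-flight barrier count. BarrierScheduler, tick, and collect are illustrative names rather than the actual meta-node barrier manager; the point is only that in_flight_barrier_nums caps how many injected-but-uncollected barriers can accumulate before injection pauses, while a very large value like 10000 effectively removes that cap.

```rust
// Toy sketch (illustrative names, not RisingWave's barrier manager): barriers
// are injected only while the number of in-flight (injected but not yet
// collected) barriers stays below `in_flight_barrier_nums`.
use std::collections::VecDeque;

struct BarrierScheduler {
    in_flight: VecDeque<u64>,      // epochs injected but not yet collected
    in_flight_barrier_nums: usize, // the config knob discussed above
    next_epoch: u64,
}

impl BarrierScheduler {
    /// Try to inject one barrier; returns None when the in-flight bound is hit.
    fn tick(&mut self) -> Option<u64> {
        if self.in_flight.len() >= self.in_flight_barrier_nums {
            // A lower bound (e.g. 100 instead of 10000) back-pressures the
            // barrier stream here instead of letting barriers pile up.
            return None;
        }
        let epoch = self.next_epoch;
        self.next_epoch += 1;
        self.in_flight.push_back(epoch);
        Some(epoch)
    }

    /// Called when the oldest barrier has passed the whole graph.
    fn collect(&mut self) {
        self.in_flight.pop_front();
    }
}

fn main() {
    let mut sched = BarrierScheduler {
        in_flight: VecDeque::new(),
        in_flight_barrier_nums: 3, // tiny bound just for the demo
        next_epoch: 0,
    };
    // With nothing being collected, injection stops once the bound is hit.
    let injected: Vec<_> = (0..10).filter_map(|_| sched.tick()).collect();
    assert_eq!(injected, vec![0, 1, 2]);
    sched.collect();
    assert_eq!(sched.tick(), Some(3));
}
```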

@fuyufjh fuyufjh added this to the release-2.2 milestone Oct 18, 2024