shared source: don't backfill for the first MV #16576

Closed
Tracked by #16003
xxchan opened this issue May 3, 2024 · 8 comments


xxchan commented May 3, 2024

Previously I thought #16348 was enough to avoid wasted work, but that doesn't seem to be the case.

During this benchmark, we can see that the backfill never catches up with the upstream source before the benchmark finishes.


https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&from=1714671911000&to=1714673747000&var-namespace=xxtest1

Actually, for the first MV, we can directly skip the backfill stage, since the source is paused before the MV is created. However, to implement this, we need to somehow let the MV know whether it is the first one.


xxchan commented May 6, 2024

However, to implement this, we need to somehow let the MV know whether it is the first one.

I came up with a simple solution: sleep 1s after the SourceExecutor is resumed. 🤡

Another simple solution: change the poll strategy in the SourceBackfillExecutor. At the beginning, prefer the backfill side, then switch to preferring the upstream. However, I'm not sure whether the first poll can get data right after the Kafka reader has just been created.
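
A rough sketch of this idea, assuming the two inputs are merged with futures' select_with_strategy (the merge function and the caught_up flag are made-up names for illustration, not the actual executor code):

use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

use futures::stream::{select_with_strategy, PollNext, Stream};

// Minimal sketch: prefer the backfill side until `caught_up` is set by the
// executor (e.g., once the backfill offset reaches the upstream offset),
// then prefer the upstream side.
fn merge_backfill_and_upstream<T>(
    backfill: impl Stream<Item = T>,
    upstream: impl Stream<Item = T>,
    caught_up: Arc<AtomicBool>,
) -> impl Stream<Item = T> {
    select_with_strategy(backfill, upstream, move |_: &mut ()| {
        if caught_up.load(Ordering::Relaxed) {
            PollNext::Right // steady state: fresh upstream data first
        } else {
            PollNext::Left // chasing: drain the backfill side first
        }
    })
}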


xxchan commented May 6, 2024

@BugenZhao reminded me that even after this problem is solved, some scenarios are still not optimal: e.g., if we create 10 MVs together, we cannot ensure that the later MVs catch up faster (and thus share work).


xxchan commented May 6, 2024

if we create 10 MVs together, we cannot ensure the later MVs can catch up faster (and thus can share work).

If so (i.e., we don't want to optimize this scenario), maybe it's better to let the upstream SourceExecutor start directly from the latest offset.

Assume there is a lot of historical data and relatively low throughput of new data.

Let's compare the two cases (S = SourceExecutor, B = SourceBackfillExecutor):

Case A. S starts from the specified offset, same as B.

S begins with a high workload to read the historical data. The first B cannot catch up (as mentioned in this issue). We could implement some mechanism to let the first B finish immediately and simply forward S's output (i.e., B = S).

If we create a second B (B2) while S hasn't finished reading the historical data, B2 probably cannot catch up either, and it also needs to drop a lot of data from S.

After S finishes reading the historical data, it performs like case B.

Case B. S starts from the latest offset.

S begins with a low workload. All Bs perform similarly, and none of them needs to drop a lot of data from S.

A small difference from case A is that now the first B cannot finish immediately.


This also means that we would not need the special treatment mentioned in this issue. At the same time, we might need to change when the upstream source creates its source reader, to make sure it actually starts from the latest offset when resumed (i.e., when the first MV is created).


xxchan commented May 6, 2024

To conclude, the difference does not seem very large. We need to backfill N times for N MVs, and can only share work after reaching steady state.

  • The largest difference is whether the first MV backfills from the upstream or from itself.
  • But in case A, later MVs might need to drop a lot of data from S, which adds overhead.

Some further questions:

Why can't the Bs catch up with S if they start together?
Because B has some overhead iterating over the data coming from S:

// For each row coming from the upstream source, read its split and offset,
// look up that split's backfill progress, and decide the row's visibility
// (rows the backfill has not caught up to yet are dropped).
for (i, (_, row)) in chunk.rows().enumerate() {
    let split = row.datum_at(split_idx).unwrap().into_utf8();
    let offset = row.datum_at(offset_idx).unwrap().into_utf8();
    let backfill_state = backfill_stage.states.get_mut(split).unwrap();
    let vis = backfill_state.handle_upstream_row(offset);
    new_vis.set(i, vis);
}

This might also be the reason why, in this issue's figure, fragment 1 is faster than fragment 2.
I think the algorithm can be optimized: at the beginning, when the backfill is far behind the upstream, we don't need to check the upstream's offset at all; we are in a "fast chasing" stage (see the sketch below).

But even with this optimization, I think it still cannot catch up: if the backfill is fast, the upstream source also has a lot of work to do, so it has to be backpressured or rate-limited.
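
The "fast chasing" optimization could look roughly like this (ChaseState, FAST_CHASE_MARGIN, and the numeric offsets are hypothetical names for illustration; real Kafka offsets are strings, as in the snippet above):

// While the backfill is far behind the upstream, every upstream row is ahead
// of the backfill progress, so the whole chunk can be dropped without the
// per-row offset comparison.
const FAST_CHASE_MARGIN: u64 = 100_000;

struct ChaseState {
    backfill_offset: u64,        // progress of the backfill side for this split
    latest_upstream_offset: u64, // latest offset observed from the upstream
}

impl ChaseState {
    fn fast_chasing(&self) -> bool {
        self.latest_upstream_offset
            .saturating_sub(self.backfill_offset)
            > FAST_CHASE_MARGIN
    }
}

fn upstream_chunk_visibility(
    state: &ChaseState,
    row_offsets: &[u64],
    per_row_check: impl Fn(u64) -> bool, // the existing per-row handling
) -> Vec<bool> {
    if state.fast_chasing() {
        // Fast chasing: drop the whole chunk with a single comparison.
        vec![false; row_offsets.len()]
    } else {
        // Close to the upstream: fall back to the row-by-row handling.
        row_offsets.iter().copied().map(per_row_check).collect()
    }
}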

Is it possible to share work for historical data?
First we have to adopt case A.
(BTW, I think changing this behavior isn't a breaking change, so it's fine to experiment with both options.)

Then there are several ways to do it:

  • Manually rate-limit or pause the source first. I don't think this is practical, because it requires the user to have a very deep understanding of how the executors work.
  • Adaptively rate-limit the source when a downstream is backfilling. I think this is hard to implement. Besides, users may not want other jobs to be affected when creating a new MV on the source.
  • Use transactional DDL to create the MVs together.

Do we want to share work for historical data?
I'm not sure.

  • Is it common in practice to have a lot of historical data to ingest from Kafka?
  • A more important question might be: is it common to create many MVs on the same source? Why not materialize the source into a table or MV first? At the end of the day, what's the benefit of a shared source over a materialized source (besides storage)? One possible use case is routing messages from a unified topic into multiple ones (as we do in the nexmark benchmark). But in reality, why not do that routing inside Kafka?

Anyway, if such use cases exist, it's of course nice to share work.


xxchan commented May 6, 2024

I think case B (source starting from latest) is simple and works well in most simple cases.

changing this behavior isn't a breaking change

That's because backfill always starts from the specified offset; where the upstream starts only affects when it finishes backfilling.

For the source executor, this only changes its starting position. When an offset has been persisted in the state table, we will use that, so recovery is also fine.
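
A minimal sketch of that decision for the source executor under case B (resolve_source_start and StartOffset are assumed names, not the actual RisingWave API):

// Hypothetical starting-position resolution under case B.
enum StartOffset {
    Latest,     // start from the latest offset in the topic
    At(String), // resume from a concrete, previously persisted offset
}

fn resolve_source_start(persisted_offset: Option<String>) -> StartOffset {
    match persisted_offset {
        // On recovery, always resume from the offset persisted in the state table.
        Some(offset) => StartOffset::At(offset),
        // Fresh start: the SourceExecutor begins at latest. The
        // SourceBackfillExecutor still starts from the user-specified offset,
        // so this only affects when backfill finishes.
        None => StartOffset::Latest,
    }
}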

tabVersion (Contributor) commented:

Is it common to have a lot of historical data to ingest from Kafka in reality?

In most business logic, Kafka historical data does not matter; it is far from "real-time". But for a streaming database, a more common case is that online data is first written to an OLTP database for online serving and duplicated to Kafka for analysis. So while this feature is not the most frequently used one, it is essential.

tabVersion (Contributor) commented:

Actually for the first MV, we can directly skip the backfill stage, since the source is paused before the MV is created. However, to implement this, we need to somehow let the MV know whether it's the first...

I have a small concern with the PR's approach: when creating a streaming job with a source, we update relation_ref_count on the frontend. We can get the exact number there instead of resorting to hacks.


xxchan commented May 9, 2024

@tabVersion With #16626, we are no longer going to implement the original idea of this issue. We will backfill for every MV 👀

xxchan closed this as not planned on May 12, 2024