Use pull based model for Segment Replication #4577

Open · Tracked by #2194
ankitkala opened this issue Sep 23, 2022 · 9 comments
Labels
discuss (Issues intended to help drive brainstorming and decision making), enhancement (Enhancement or improvement to existing feature or request), Indexing:Replication (Issues and PRs related to core replication framework, e.g. segrep)

Comments

@ankitkala
Member

ankitkala commented Sep 23, 2022

Pull based model for Segment Replication
We're working on reusing the existing implementation of Segment Replication for cross-cluster replication (CCR). We've identified 2 major areas in the existing Segment Replication design that don't work well for cross-cluster replication. Technically we can get around these by overriding those parts of the implementation just for CCR, but we wanted to check whether it is possible to maintain a single solution that works for both cases.

The crux of the issue is that the current approach relies on 2-way communication, where the primary can talk to the replica and vice versa. This doesn't work well for cross-cluster use cases, as we establish uni-directional connections where the follower polls the leader and gets all the data. We did explore whether we could create bi-directional cross-cluster connections but decided against it.


Why not bi-directional connection for CCR?
Pros:

  • Can re-use Segment Replication without additional changes.
  • Can allow the follower to react to changes on the leader (instead of polling), e.g. listen to leader index events like close. But it might put additional overhead on the leader to communicate with followers (especially with multiple followers).

Cons:

  • Breaking change from the existing connection mechanism which is currently used for cross-cluster search as well as replication.
  • Bidirectional connection makes version upgrades tricky.
    • With the current unidirectional connection, we recommend that users upgrade the follower first to ensure that CCR doesn't break during the upgrade. With the introduction of bi-directional connections, both clusters can be the leader for different indices, which results in a cyclic dependency. Thus users can't upgrade the clusters without halting replication.
  • From a CCR long-term perspective, it doesn't add enough value to justify a major overhaul of the current design. IMHO, it'd be much easier to modify the Segment Replication design unless there is a strong reason not to.

These are the 2 changes that would align the existing Segment Replication implementation with CCR. Kudos to the folks who worked on the original design and ensured that such an implementation swap can be done easily without modifying other sub-components.

1. Use a pull mechanism for downloading the segments (instead of push)
The current implementation of Segment Replication relies on the peer recovery constructs, where it pushes the segments from the primary's node to the replica's node using the MultiChunkTransfer by invoking a write-chunk action on the replica's node. We can reverse the direction here so that the replica requests each chunk from the leader and stores it locally. This might require some additional book-keeping on the replica side but should be doable. We're already doing something similar for CCR, where we fetch the segments from the leader cluster by treating it as a snapshot repository here
Also, this aligns with the remote store integration, where the replica would need to pull the segments from the primary's remote store.
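
For illustration, here's a minimal sketch of what a replica-side pull loop could look like. `SegmentChunkClient` is a hypothetical wrapper around a transport call to the primary's node; the names, signatures, and chunk size are assumptions, not the actual OpenSearch or MultiChunkTransfer APIs.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical client: one transport round-trip per chunk requested from the primary's node. */
interface SegmentChunkClient {
    /** Returns up to maxBytes of the named segment file starting at offset; an empty array signals EOF. */
    byte[] fetchChunk(String fileName, long offset, int maxBytes) throws IOException;
}

/** Replica-side pull loop: requests each chunk from the primary and writes it locally. */
final class SegmentPuller {
    private static final int CHUNK_SIZE = 1 << 20; // 1 MiB per request, illustrative only

    private final SegmentChunkClient client;
    private final Path localSegmentDir;

    SegmentPuller(SegmentChunkClient client, Path localSegmentDir) {
        this.client = client;
        this.localSegmentDir = localSegmentDir;
    }

    /** Downloads a single segment file chunk by chunk; the offset book-keeping lives on the replica. */
    void pullFile(String fileName) throws IOException {
        Path target = localSegmentDir.resolve(fileName);
        try (OutputStream out = Files.newOutputStream(target)) {
            long offset = 0;
            while (true) {
                byte[] chunk = client.fetchChunk(fileName, offset, CHUNK_SIZE);
                if (chunk.length == 0) {
                    break; // primary reports end of file
                }
                out.write(chunk);
                offset += chunk.length;
            }
        }
    }
}
```

The direction is simply inverted compared to the push model: the replica drives the transfer and tracks its own progress, which is the extra book-keeping mentioned above.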

2. Use a polling mechanism on replica instead of listeners on primary refresh
Instead of listening to refreshes on the primary, we can model it as periodic polling, where each replica polls for the latest checkpoint and triggers the replication event. The change in itself should be simple, but we might want to do a benchmark comparison to ensure that there is no regression.
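
Roughly, the polling variant could look like the sketch below: a scheduled task on the replica compares the last checkpoint it acted on against the primary's latest one and starts replication only when it has advanced. `CheckpointSource` and `ReplicationTrigger` are hypothetical interfaces standing in for the real transport calls, and the interval is arbitrary.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical source of the primary's latest published checkpoint (e.g. a monotonically increasing segment generation). */
interface CheckpointSource {
    long latestCheckpoint();
}

/** Hypothetical hook that starts one round of segment replication up to the given checkpoint. */
interface ReplicationTrigger {
    void replicateUpTo(long checkpoint);
}

/** Replica-side poller: trades refresh listeners on the primary for a fixed-interval checkpoint poll. */
final class CheckpointPoller {
    private final CheckpointSource source;
    private final ReplicationTrigger trigger;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private volatile long lastSeen = -1;

    CheckpointPoller(CheckpointSource source, ReplicationTrigger trigger) {
        this.source = source;
        this.trigger = trigger;
    }

    void start(long intervalMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            long latest = source.latestCheckpoint();
            if (latest > lastSeen) { // only react when the primary has published new segments
                trigger.replicateUpTo(latest);
                lastSeen = latest;
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

The benchmark concern above maps directly to the polling interval: a short interval approaches the latency of the listener-based push, at the cost of more no-op polls.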

@ankitkala ankitkala added enhancement Enhancement or improvement to existing feature or request untriaged labels Sep 23, 2022
@mch2
Member

mch2 commented Sep 26, 2022

@ankitkala Thanks for raising this. IMO the right long-term idea is to let users configure the behavior based on need & add support for a pull model. However, with segrep having two sources to copy from in the near future (primary & remote store), is it worth adding a pull model with the primary-only implementation of CCR now, or should we start with CCR+segrep support only with a remote store?

@Bukhtawar
Collaborator

Thanks @mch2. I think we should also think about what the interactions would look like if we just had remote store integration with the primary (even outside the CCR use case). Wouldn't that inherently be a pull-based mechanism, where replicas poll for checkpoints and download segment diffs? So I think a pull-based mechanism will inherently be used even in a local cluster with a remote store. Now the only place we would be doing a push is in-cluster segrep with a local store.
We need to analyse whether there are any known implications of changing that to pull. Let me know if we've done a similar analysis to conclude one way or the other.

@gbbafna
Collaborator

gbbafna commented Sep 27, 2022

How are we envisioning that OpenSearch users will use the SegRep feature:

  1. in-cluster segrep with a local store.
  2. in-cluster segrep with remote store

If the ~~first~~ second one is the way forward for most of them, we might not need to solve for problems like #4245

@ankitkala
Member Author

is it worth adding a pull model with the primary only implementation of CCR now or should we start with CCR+segrep support only with a remote store

@mch2 For CCR, SegRep with remote store is definitely the first choice for us. Whether we want to support only one combination or both needs to be decided (I'll bring this up during the feature brief). But if we want to keep open the option to support segrep without remote store, it might be worthwhile to do these refactors early on.

For local SegRep, are there any pros to not relying on the remote store? The only reason I can think of is the additional effort required to set up the repository, which might affect adoption.

@ankitkala
Member Author

How are we envisioning OpenSearch users to use the SegRep feature :

  1. in-cluster segrep with a local store.
  2. in-cluster segrep with remote store

If the first one is the way forward for most of them , we might not need to solve for problems like #4245

Isn't it the other way round? With the remote store, we won't have to solve for the network bandwidth.

@anasalkouz anasalkouz added discuss Issues intended to help drive brainstorming and decision making and removed untriaged labels Sep 27, 2022
@gbbafna
Collaborator

gbbafna commented Sep 28, 2022

@ankitkala: Thanks, I've corrected my comment.

For local SegRep, are there any pros of not relying on remote store? The only reason i can think of is the additional effort required to setup the repository which might affect the adoption.

Most users already use snapshots for durability, so setting up a repository shouldn't be a hindrance IMO. Cost might be a reason not to rely on the remote store. @mch2 @Bukhtawar would know better on this.

I would prefer that OpenSearch has only a few modes of operation, for ease of operations and maintenance.

@ankitkala
Member Author

Reviving this discussion since we've now been thinking about segrep and remote store integration.
To disambiguate the discussion, the directionality (push or pull) can be thought of either in terms of data flow or control flow.


Control flow
Control flow would determine the trigger point for replicas to fetch the changes.

Pull based
A pull-based approach would rely on a polling mechanism where the replica keeps polling (the leader node or the remote store) for new changes.
Pros:

  • Complete reader/writer separation.

Cons:

  • Excessive polling can be an issue if there is a large number of shards. Similarly, we'd also be polling redundantly for primaries that aren't taking any active write traffic.

Push based
A push-based approach would imply using an event-driven mechanism where the primary shard notifies the replica after new segments are available (i.e. on refresh in the case of local segrep, and after segments are uploaded to the remote store for segrep with remote store). A small sketch follows the pros/cons list below.

Pros:

  • Simplified and deterministic behaviour for when the replica fetches the new changes.
  • Not a one-way door. We can easily flip to a pull model if required in the future.

Cons:

  • The replica can fall behind if it can't process (or misses) a particular notification. Though not ideal, this should automatically resolve with the next refresh in most cases. Another variant of such a failure is what was observed here with relocation and wait_until requests.
  • Not a complete separation of reader/writer. The interaction between the primary (writer) and the replica (reader) is minimal though (with remote store) and can definitely be fixed eventually (for example, with a messaging queue).
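
For concreteness, here's a minimal sketch of the push-based control flow described above: a refresh (or a successful remote store upload) on the primary publishes the new checkpoint to each replica, which then starts its fetch. `ReplicaNotifier` and `CheckpointPublisher` are hypothetical stand-ins for the actual publish-checkpoint transport action, so treat the names and signatures as assumptions.

```java
import java.util.List;

/** Hypothetical per-replica notification channel (e.g. a transport action targeting the replica's node). */
interface ReplicaNotifier {
    void publishCheckpoint(long primaryTerm, long segmentGeneration);
}

/** Primary-side hook, invoked after a refresh (local segrep) or after segments are uploaded to the remote store. */
final class CheckpointPublisher {
    private final List<ReplicaNotifier> replicas;

    CheckpointPublisher(List<ReplicaNotifier> replicas) {
        this.replicas = replicas;
    }

    /** Push model: the writer drives the control flow; a missed notification leaves the replica behind until the next one. */
    void onNewSegments(long primaryTerm, long segmentGeneration) {
        for (ReplicaNotifier replica : replicas) {
            try {
                replica.publishCheckpoint(primaryTerm, segmentGeneration);
            } catch (RuntimeException e) {
                // Best-effort: failure here is the "replica can fall behind" con listed above,
                // normally recovered when the next checkpoint is published.
            }
        }
    }
}
```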

Data flow:
For data flow,
- A pull-based model aligns with segment replication using the remote store, where the replica directly downloads the segments from the remote store (see the sketch after this list).
- For local segrep cases, the current approach is push-based, which is how peer recovery is implemented as well. It could be moved to a pull-based approach, but there aren't any benefits IMO.
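
A rough sketch of that pull-based data flow, assuming a hypothetical `RemoteSegmentStore` view that exposes the checkpoint's file listing and per-file downloads; the real remote store integration has its own metadata format, so the names here are placeholders only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical view of the remote store contents for one shard. */
interface RemoteSegmentStore {
    /** Segment file names that make up the checkpoint the replica is catching up to. */
    Set<String> filesAtCheckpoint(long segmentGeneration) throws IOException;

    /** Opens a download stream for one segment file. */
    InputStream download(String fileName) throws IOException;
}

/** Replica-side data flow: compute the diff against local files and pull only what is missing. */
final class RemoteStoreSegmentFetcher {
    private final RemoteSegmentStore remote;
    private final Path localSegmentDir;

    RemoteStoreSegmentFetcher(RemoteSegmentStore remote, Path localSegmentDir) {
        this.remote = remote;
        this.localSegmentDir = localSegmentDir;
    }

    void catchUpTo(long segmentGeneration) throws IOException {
        Set<String> wanted = remote.filesAtCheckpoint(segmentGeneration);
        Set<String> missing = new HashSet<>();
        for (String file : wanted) {
            if (!Files.exists(localSegmentDir.resolve(file))) {
                missing.add(file); // diff: only fetch files the replica does not already have
            }
        }
        for (String file : missing) {
            try (InputStream in = remote.download(file)) {
                Files.copy(in, localSegmentDir.resolve(file), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

The same flow works whether the trigger was a push notification from the primary or a poll, which is why the control-flow and data-flow questions can be decided independently.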


Compatibility with CCR:
For the primary use case, which is segment replication with remote store, CCR can keep parity with the approach for local segrep with remote store: the primary sends a notification to the replica/follower, and the replica/follower then downloads the new segments from the remote store.

One major blocking issue can be the bi-directional nature of the communication between leader and follower. As of now, we've not been thinking about CCR without remote store, so the data flow of local segrep shouldn't be an issue.

Just to conclude, we should still keep the control flow as a push-based mechanism. We'll need additional handling for cases like wait_until though.
Data flow for segrep with remote store will be a pull-based mechanism. For local segrep cases, we can explore whether we want to align it with the segrep + remote store implementation by making it pull, but I don't see enough ROI and it is definitely not urgent to pick up.

@anasalkouz
Member

Closing this, we don't see a need for this at the moment.

@Bukhtawar
Collaborator

Re-opening this as we want to evaluate and streamline mechanisms across local and remote replicas

@andrross andrross added Indexing:Replication Issues and PRs related to core replication framework eg segrep and removed distributed framework untriaged labels May 17, 2024