Add segment replication source implementation for remote store integration #4793

Closed
Tracked by #4448
dreamer-89 opened this issue Oct 14, 2022 · 2 comments · Fixed by #7653

dreamer-89 commented Oct 14, 2022

With remote store integration, we need to add a new replication source implementation that can pull data from the remote store. Thanks to the existing design, the new implementation only needs to implement SegmentReplicationSource, which works with the existing SegmentReplicationTargetService and SegmentReplicationTarget.

The new replication source (RemoteSegmentReplicationSource?) needs to provide the method definitions below. A replica uses this remote replication source to refresh its local store. The replica first requests metadata for the files on the primary via a getCheckpointMetadata call, which returns a CheckpointInfoResponse. It uses this response to determine which files are missing from its local store, then requests those files from the primary via getSegmentFiles.

  @Override
  public void getCheckpointMetadata(long replicationId, ReplicationCheckpoint checkpoint, ActionListener<CheckpointInfoResponse> listener) {
      // Make call to remote store to build new replication checkpoint metadata.
  }

  @Override
  public void getSegmentFiles(long replicationId, ReplicationCheckpoint checkpoint, List<StoreFileMetadata> filesToFetch, Store store, ActionListener<GetSegmentFilesResponse> listener) {
      // Make call to remote store to fetch the metadata of all files to evaluate diff. Request files missing on local disk store.
  }
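
The diff step between the two calls can be sketched roughly as follows. This is only an illustration under stated assumptions: FileMeta and filesToFetch are hypothetical names, not OpenSearch types, and comparing by length and checksum is an assumed rule rather than what SegmentReplicationTarget actually does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RemoteDiffSketch {

    /** Hypothetical stand-in for per-file metadata; NOT OpenSearch's StoreFileMetadata. */
    static final class FileMeta {
        final String name;
        final long length;
        final String checksum;

        FileMeta(String name, long length, String checksum) {
            this.name = name;
            this.length = length;
            this.checksum = checksum;
        }
    }

    /**
     * Returns the remote files that are absent from the local store, or that
     * differ from the local copy in length or checksum (assumed comparison rule).
     */
    static List<FileMeta> filesToFetch(Map<String, FileMeta> remote, Map<String, FileMeta> local) {
        List<FileMeta> toFetch = new ArrayList<>();
        for (FileMeta r : remote.values()) {
            FileMeta l = local.get(r.name);
            if (l == null || l.length != r.length || !l.checksum.equals(r.checksum)) {
                toFetch.add(r);
            }
        }
        return toFetch;
    }

    public static void main(String[] args) {
        // Metadata as it might come back from a getCheckpointMetadata-style call.
        Map<String, FileMeta> remote = new LinkedHashMap<>();
        remote.put("_0.cfs", new FileMeta("_0.cfs", 100L, "aaa"));
        remote.put("_1.cfs", new FileMeta("_1.cfs", 200L, "bbb"));

        // What the replica already has on local disk.
        Map<String, FileMeta> local = new LinkedHashMap<>();
        local.put("_0.cfs", new FileMeta("_0.cfs", 100L, "aaa"));

        // These are the files the replica would then ask for via getSegmentFiles.
        for (FileMeta f : filesToFetch(remote, local)) {
            System.out.println("fetch " + f.name);
        }
    }
}
```

With a remote store source, the same comparison would presumably run against metadata read from the remote store rather than from the primary.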

Remote store design: #2700
Segment replication integration: #4555
Remote store and segment replication integration: #4555

mch2 commented May 1, 2023

@ankitkala has put up a draft implementation here - #7028. We can build on this / add tests behind a new setting.

@ankitkala

At a high level, these are the changes we need. Let me know if I'm missing anything here:

  • Add support for triggering a replication event on segment upload (instead of on refresh). This should be behind a new feature flag.
  • Add a new replication source to pull the segment diff from the remote store.
  • Refactor SegmentReplicationTarget so that either:
    • Option 1: The getCheckpointMetadata and finalizeReplication steps are skipped entirely for remote store, which simplifies the remote store solution.
    • Option 2: These steps are supported with remote store, e.g. if we still want to use StoreFileMetadata for computing the diff (to keep parity with local SegRep):
      • Option 2.1: Figure out a way to serialize and upload the StoreFileMetadata files to the remote store.
      • Option 2.2: Reconstruct those from the existing metadata file. In the draft PR we're already reconstructing them, but we can also upload the remaining attributes (Version writtenBy & hash) to make them complete.
  • Handle corner cases around recovery, relocation, primary upgrade, etc.
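
To make Option 2.2 concrete, here is a minimal sketch of reconstructing a per-file entry from one line of an uploaded metadata file. Everything here is an assumption for illustration: the "::"-separated line layout, the Entry class, and the parse method are hypothetical and do not reflect the actual remote store metadata format or the draft PR.

```java
public class MetadataLineSketch {

    // Hypothetical separator and layout: name::length::checksum[::writtenBy::hash]
    static final String SEP = "::";

    /** Hypothetical reconstructed entry; NOT OpenSearch's StoreFileMetadata. */
    static final class Entry {
        final String name;
        final long length;
        final String checksum;
        final String writtenBy; // null when the attribute was not uploaded
        final String hash;      // null when the attribute was not uploaded

        Entry(String name, long length, String checksum, String writtenBy, String hash) {
            this.name = name;
            this.length = length;
            this.checksum = checksum;
            this.writtenBy = writtenBy;
            this.hash = hash;
        }

        /** True only when the optional attributes were uploaded as well. */
        boolean isComplete() {
            return writtenBy != null && hash != null;
        }
    }

    /** Parses one metadata line; accepts the 3-field and the 5-field layout. */
    static Entry parse(String line) {
        String[] parts = line.split(SEP, -1);
        if (parts.length != 3 && parts.length != 5) {
            throw new IllegalArgumentException("unexpected field count in: " + line);
        }
        String writtenBy = parts.length == 5 ? parts[3] : null;
        String hash = parts.length == 5 ? parts[4] : null;
        return new Entry(parts[0], Long.parseLong(parts[1]), parts[2], writtenBy, hash);
    }

    public static void main(String[] args) {
        Entry partial = parse("_0.cfs::1024::abc123");
        Entry full = parse("_0.cfs::1024::abc123::9.5.0::deadbeef");
        System.out.println("partial complete? " + partial.isComplete());
        System.out.println("full complete? " + full.isComplete());
    }
}
```

The 3-field case mirrors reconstructing an entry without writtenBy and hash; uploading those attributes too is what would let the replica rebuild a complete entry.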
