A failed after write refresh can prevent advancing the local checkpoint even when the operations were made durable by the translog #108190

Open
fcofdez opened this issue May 2, 2024 · 1 comment
Labels
:Distributed/CRUD: A catch-all label for issues around indexing, updating and getting a doc by id. Not search.
>enhancement
Team:Distributed: Meta label for distributed team

Comments

fcofdez commented May 2, 2024

Today, when an after-write refresh fails, we treat the whole AsyncAfterWriteAction as failed:

// TODO: Temporary until we fail unpromotable shard
if (refreshFailure.get() != null) {
    respond.onFailure(refreshFailure.get());
} else {
    respond.onSuccess(refreshed.get());
}

This prevents the local checkpoint from advancing to cover sequence numbers that were already safely persisted:

    primaryResult.runPostReplicationActions(new ActionListener<>() {
        @Override
        public void onResponse(Void aVoid) {
            successfulShards.incrementAndGet();
            updateCheckPoints(
                primary.routingEntry(),
                primary::localCheckpoint,
                primary::globalCheckpoint,
                () -> decPendingAndFinishIfNeeded()
            );
        }
        @Override
        public void onFailure(Exception e) {
            logger.trace("[{}] op [{}] post replication actions failed for [{}]", primary.routingEntry().shardId(), opType, request);
            // TODO: fail shard? This will otherwise have the local / global checkpoint info lagging, or possibly have replicas
            // go out of sync with the primary
            finishAsFailed(e);
        }
    });
}
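
To make the consequence concrete, here is a minimal, self-contained model of the flow in the two snippets above. It is not Elasticsearch code: `CheckpointLagSketch`, `Listener`, `Tracker` and `handleWrite` are hypothetical names, and the translog fsync is simply assumed to have succeeded before the listener is called. The only point is that the `onFailure` path never reaches the checkpoint update.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model; none of these types are Elasticsearch classes.
class CheckpointLagSketch {

    /** Minimal stand-in for an ActionListener-style callback. */
    interface Listener {
        void onResponse();
        void onFailure(Exception e);
    }

    /** Holds the primary's local checkpoint as known to the replication tracking. */
    static final class Tracker {
        final AtomicLong knownLocalCheckpoint = new AtomicLong(-1);

        void update(long localCheckpoint) {
            // Checkpoints only move forward.
            knownLocalCheckpoint.accumulateAndGet(localCheckpoint, Math::max);
        }
    }

    /**
     * Models the post-replication actions. The operation is assumed to be
     * durable in the translog already; only the refresh outcome varies.
     */
    static void runPostReplicationActions(boolean refreshFails, Listener listener) {
        if (refreshFails) {
            listener.onFailure(new RuntimeException("refresh failed"));
        } else {
            listener.onResponse();
        }
    }

    /** Returns the checkpoint the tracker knows about after handling one write. */
    static long handleWrite(long persistedSeqNo, boolean refreshFails) {
        Tracker tracker = new Tracker();
        runPostReplicationActions(refreshFails, new Listener() {
            @Override
            public void onResponse() {
                tracker.update(persistedSeqNo); // mirrors updateCheckPoints(...)
            }

            @Override
            public void onFailure(Exception e) {
                // Mirrors finishAsFailed(e): the checkpoint update is skipped.
            }
        });
        return tracker.knownLocalCheckpoint.get();
    }

    public static void main(String[] args) {
        System.out.println(handleWrite(7, false)); // prints 7
        System.out.println(handleWrite(7, true));  // prints -1
    }
}
```

In the second call the operation with seq no 7 is durable, yet the tracked checkpoint stays at -1.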

We should reconsider this behaviour and possibly still advance the local checkpoint when the refresh failed for an unpromotable shard.
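
One possible shape for this, purely as an illustration: the issue does not prescribe a design, and `AfterWriteOutcomeSketch`, `Outcome` and `complete` are hypothetical names rather than Elasticsearch APIs. The sketch keeps the translog result and the refresh result separate, so the caller can advance the checkpoint whenever the operations are durable and decide independently how to report the refresh failure.

```java
import java.util.function.LongConsumer;

// Hypothetical sketch; these names are illustrative and are not Elasticsearch APIs.
class AfterWriteOutcomeSketch {

    /** Keeps the translog result and the refresh result apart. */
    static final class Outcome {
        final boolean translogSynced;   // true when the operations are durable
        final Exception refreshFailure; // null when the refresh succeeded

        Outcome(boolean translogSynced, Exception refreshFailure) {
            this.translogSynced = translogSynced;
            this.refreshFailure = refreshFailure;
        }
    }

    /**
     * Advances the checkpoint whenever the operations are durable and returns
     * whether the write as a whole completed without any failure.
     */
    static boolean complete(Outcome outcome, long persistedSeqNo, LongConsumer advanceLocalCheckpoint) {
        if (outcome.translogSynced) {
            advanceLocalCheckpoint.accept(persistedSeqNo);
        }
        return outcome.translogSynced && outcome.refreshFailure == null;
    }

    public static void main(String[] args) {
        long[] checkpoint = { -1 };
        Outcome refreshFailed = new Outcome(true, new RuntimeException("refresh failed"));
        boolean clean = complete(refreshFailed, 7, seqNo -> checkpoint[0] = Math.max(checkpoint[0], seqNo));
        System.out.println(clean + " " + checkpoint[0]); // prints "false 7"
    }
}
```

Whether the request should still be reported to the client as failed in this case is a separate question that the sketch leaves open; it only shows that the checkpoint update does not have to depend on the refresh result.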

fcofdez added the >enhancement, :Distributed/CRUD and Team:Distributed labels May 2, 2024
@elasticsearchmachine

Pinging @elastic/es-distributed (Team:Distributed)
