Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DoUntilQuorum: don't use non-zone-aware logging when all zones are required for quorum #403

Merged
merged 4 commits into from
Oct 16, 2023

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented Oct 13, 2023

What this PR does:

This PR fixes an issue where DoUntilQuorum may generate more log messages and span events than expected when zone-awareness is enabled but all healthy zones are required for quorum. (For example, zone-awareness is enabled, three zones are running with 100 instances each, but one instance in the first zone is unavailable, so all instances in the other two zones are required for quorum.)

Previously, if zone-awareness was enabled and all healthy zones were required for quorum, both ReplicationSet.MaxErrors and ReplicationSet.MaxUnavailableZones would be 0. DoUntilQuorum would then default to non-zone-aware mode. In non-zone-aware mode, DoUntilQuorum logs a starting request to instance message for every instance.

In contrast, in zone-aware mode, DoUntilQuorum logs a starting requests to zone message for each unique zone.

In the example above, with all instances healthy, DoUntilQuorum would log two or three starting requests to zone messages, but as soon as one instance becomes unhealthy, a subsequent DoUntilQuorum call would log 200 starting request to instance messages.

This PR changes the behaviour of DoUntilQuorum to always run in zone-aware mode if zone-awareness is enabled, even if both ReplicationSet.MaxErrors and ReplicationSet.MaxUnavailableZones are 0.

Which issue(s) this PR fixes:

(none)

Checklist

  • Tests updated
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@charleskorn charleskorn force-pushed the charleskorn/dont-spam-trace-events branch from 3737ba8 to c1ad005 Compare October 13, 2023 06:24
Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

ring/replication_set.go Outdated Show resolved Hide resolved
Co-authored-by: Joshua Hesketh <josh@hesketh.net.au>
@charleskorn charleskorn enabled auto-merge (squash) October 16, 2023 01:18
@charleskorn charleskorn merged commit dcd7c1e into main Oct 16, 2023
2 of 3 checks passed
@charleskorn charleskorn deleted the charleskorn/dont-spam-trace-events branch October 16, 2023 01:23
ying-jeanne pushed a commit that referenced this pull request Nov 2, 2023
…required for quorum (#403)

* Add `ReplicationSet.ZoneAwarenessEnabled` field

* Use zone-aware context handling and logging even if no unavailable zones are permitted.

* Add changelog entry

* Update ring/replication_set.go

Co-authored-by: Joshua Hesketh <josh@hesketh.net.au>

---------

Co-authored-by: Joshua Hesketh <josh@hesketh.net.au>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants