Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ReplicationSet.ZoneCount method. #298

Merged
merged 4 commits into from
May 16, 2023

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented May 15, 2023

What this PR does:

This PR adds a ZoneCount method to ReplicationSet. This is useful for consumers that need to allocate a slice or map with enough space for an entry per zone.

This is an initial naïve implementation. A more complex implementation could take advantage of the fact that many places that create a ReplicationSet know how many zones it contains and so this could be stored in the ReplicationSet. I'm open to feedback on whether this is worth implementing.

Which issue(s) this PR fixes:

(none)

Checklist

  • Tests updated
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci pracucci self-requested a review May 15, 2023 04:56
@charleskorn charleskorn force-pushed the charleskorn/replicaset-zonecount branch from 64715e1 to f6bcd18 Compare May 15, 2023 04:57
@charleskorn charleskorn marked this pull request as ready for review May 15, 2023 04:57
@@ -243,6 +243,17 @@ func (r ReplicationSet) GetAddressesWithout(exclude string) []string {
return addrs
}

// ZoneCount returns the number of unique zones represented by instances within this replication set.
func (r *ReplicationSet) ZoneCount() int {
zones := map[string]struct{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it's faster if you store the unique set of zones as a slice, because we don't expect more than 3 zones (and iterating a slice 3 times over and over should be faster than lookup up a map). Could you add a microbenchmark and try it, please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, the slice lookup is faster for the larger sets we expect to see, and the absolute difference for smaller sets is not substantial:

goos: darwin
goarch: arm64
pkg: github.com/grafana/dskit/ring
                                                           │   map.txt    │              slice.txt               │
                                                           │    sec/op    │    sec/op     vs base                │
ReplicationSetZoneCount/1_instances_per_zone,_1_zones-10     12.83n ±  1%    25.11n ± 0%   +95.71% (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_2_zones-10     24.29n ±  7%    58.20n ± 1%  +139.65% (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_3_zones-10     31.23n ±  4%    97.15n ± 1%  +211.03% (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_1_zones-10     20.98n ±  6%    28.89n ± 0%   +37.67% (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_2_zones-10     42.05n ±  8%    70.84n ± 0%   +68.47% (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_3_zones-10     56.70n ± 11%   119.00n ± 0%  +109.90% (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_1_zones-10     42.93n ± 21%    40.18n ± 0%    -6.42% (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_2_zones-10     88.44n ± 10%   102.60n ± 2%   +16.02% (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_3_zones-10     136.2n ±  2%    183.8n ± 0%   +34.90% (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_1_zones-10    92.34n ±  3%    59.22n ± 0%   -35.87% (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_2_zones-10    201.3n ±  1%    161.4n ± 0%   -19.82% (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_3_zones-10    322.3n ±  3%    297.9n ± 0%    -7.59% (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_1_zones-10   815.9n ±  4%    408.8n ± 0%   -49.90% (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_2_zones-10   1.693µ ±  5%    1.144µ ± 0%   -32.41% (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_3_zones-10   2.777µ ±  7%    2.113µ ± 0%   -23.90% (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_1_zones-10   2.442µ ±  9%    1.172µ ± 0%   -52.01% (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_2_zones-10   5.053µ ± 11%    3.301µ ± 0%   -34.67% (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_3_zones-10   7.953µ ±  2%    6.135µ ± 1%   -22.86% (p=0.002 n=6)
geomean                                                      208.8n          222.2n         +6.45%

                                                           │   map.txt   │          slice.txt          │
                                                           │    B/op     │    B/op     vs base         │
ReplicationSetZoneCount/1_instances_per_zone,_1_zones-10     0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_2_zones-10     0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_3_zones-10      0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_1_zones-10     0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_2_zones-10     0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_3_zones-10      0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_1_zones-10     0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_2_zones-10     0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_3_zones-10      0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_1_zones-10    0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_2_zones-10    0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_3_zones-10     0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_1_zones-10   0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_2_zones-10   0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_3_zones-10    0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_1_zones-10   0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_2_zones-10   0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_3_zones-10    0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
geomean                                                                ¹   44.14       ?
¹ summaries must be >0 to compute geomean

                                                           │   map.txt    │          slice.txt          │
                                                           │  allocs/op   │ allocs/op   vs base         │
ReplicationSetZoneCount/1_instances_per_zone,_1_zones-10     0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_2_zones-10     0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_3_zones-10     0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_1_zones-10     0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_2_zones-10     0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_3_zones-10     0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_1_zones-10     0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_2_zones-10     0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_3_zones-10     0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_1_zones-10    0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_2_zones-10    0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_3_zones-10    0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_1_zones-10   0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_2_zones-10   0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_3_zones-10   0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_1_zones-10   0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_2_zones-10   0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_3_zones-10   0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
geomean                                                                 ¹   1.817       ?
¹ summaries must be >0 to compute geomean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth the complexity of passing the zone count through from places that create ReplicationSets that know how many zones there are? eg. in Ring.Get() we know the number of zones in the set if zone awareness is enabled

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would wait to see the actual impact on performance before doing any other optimization. Typically operations on the ring in the read path are neglibigle (while they may be significative on the write path).

This is much faster for larger rings, and the absolute difference for
smaller rings is not significant:

goos: darwin
goarch: arm64
pkg: github.com/grafana/dskit/ring
                                                           │   map.txt    │              slice.txt               │
                                                           │    sec/op    │    sec/op     vs base                │
ReplicationSetZoneCount/1_instances_per_zone,_1_zones-10     12.83n ±  1%    25.11n ± 0%   +95.71% (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_2_zones-10     24.29n ±  7%    58.20n ± 1%  +139.65% (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_3_zones-10     31.23n ±  4%    97.15n ± 1%  +211.03% (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_1_zones-10     20.98n ±  6%    28.89n ± 0%   +37.67% (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_2_zones-10     42.05n ±  8%    70.84n ± 0%   +68.47% (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_3_zones-10     56.70n ± 11%   119.00n ± 0%  +109.90% (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_1_zones-10     42.93n ± 21%    40.18n ± 0%    -6.42% (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_2_zones-10     88.44n ± 10%   102.60n ± 2%   +16.02% (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_3_zones-10     136.2n ±  2%    183.8n ± 0%   +34.90% (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_1_zones-10    92.34n ±  3%    59.22n ± 0%   -35.87% (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_2_zones-10    201.3n ±  1%    161.4n ± 0%   -19.82% (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_3_zones-10    322.3n ±  3%    297.9n ± 0%    -7.59% (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_1_zones-10   815.9n ±  4%    408.8n ± 0%   -49.90% (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_2_zones-10   1.693µ ±  5%    1.144µ ± 0%   -32.41% (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_3_zones-10   2.777µ ±  7%    2.113µ ± 0%   -23.90% (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_1_zones-10   2.442µ ±  9%    1.172µ ± 0%   -52.01% (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_2_zones-10   5.053µ ± 11%    3.301µ ± 0%   -34.67% (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_3_zones-10   7.953µ ±  2%    6.135µ ± 1%   -22.86% (p=0.002 n=6)
geomean                                                      208.8n          222.2n         +6.45%

                                                           │   map.txt   │          slice.txt          │
                                                           │    B/op     │    B/op     vs base         │
ReplicationSetZoneCount/1_instances_per_zone,_1_zones-10     0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_2_zones-10     0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_3_zones-10      0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_1_zones-10     0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_2_zones-10     0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_3_zones-10      0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_1_zones-10     0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_2_zones-10     0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_3_zones-10      0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_1_zones-10    0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_2_zones-10    0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_3_zones-10     0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_1_zones-10   0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_2_zones-10   0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_3_zones-10    0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_1_zones-10   0.00 ± 0%     16.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_2_zones-10   0.00 ± 0%     48.00 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_3_zones-10    0.0 ± 0%     112.0 ± 0%  ? (p=0.002 n=6)
geomean                                                                ¹   44.14       ?
¹ summaries must be >0 to compute geomean

                                                           │   map.txt    │          slice.txt          │
                                                           │  allocs/op   │ allocs/op   vs base         │
ReplicationSetZoneCount/1_instances_per_zone,_1_zones-10     0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_2_zones-10     0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/1_instances_per_zone,_3_zones-10     0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_1_zones-10     0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_2_zones-10     0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/2_instances_per_zone,_3_zones-10     0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_1_zones-10     0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_2_zones-10     0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/5_instances_per_zone,_3_zones-10     0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_1_zones-10    0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_2_zones-10    0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/10_instances_per_zone,_3_zones-10    0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_1_zones-10   0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_2_zones-10   0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/100_instances_per_zone,_3_zones-10   0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_1_zones-10   0.000 ± 0%     1.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_2_zones-10   0.000 ± 0%     2.000 ± 0%  ? (p=0.002 n=6)
ReplicationSetZoneCount/300_instances_per_zone,_3_zones-10   0.000 ± 0%     3.000 ± 0%  ? (p=0.002 n=6)
geomean                                                                 ¹   1.817       ?
¹ summaries must be >0 to compute geomean
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Another option to reducing the impact of this counting is to cache it within ReplicationsSet. Maybe an unexported field on the struct numZones which is returned if non-zero; otherwise, the loop you already have runs and stores the result in numZones. But I'm also not sure about the usage pattern of ZoneCount() and whether it will be called multiple times.

@charleskorn
Copy link
Contributor Author

LGTM

Another option to reducing the impact of this counting is to cache it within ReplicationsSet. Maybe an unexported field on the struct numZones which is returned if non-zero; otherwise, the loop you already have runs and stores the result in numZones. But I'm also not sure about the usage pattern of ZoneCount() and whether it will be called multiple times.

The one place I'm planning on using this (Mimir's read path) will only call the method once per request, so caching isn't required for my use case at least.

@charleskorn charleskorn merged commit da491e5 into main May 16, 2023
3 checks passed
@charleskorn charleskorn deleted the charleskorn/replicaset-zonecount branch May 16, 2023 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants