docs(ADR): extends the fractional operator to support up to .001% distributions #1800

---
# Valid statuses: draft | proposed | rejected | accepted | superseded
status: draft
author: Michael Beemer
created: 2025-09-10
updated: 2025-09-10
---

# High-Precision Fractional Bucketing for Sub-Percent Traffic Allocation

This ADR proposes enhancing the fractional operation to support high-precision traffic allocation down to 0.001% granularity by increasing the internal bucket count from 100 to 100,000 while maintaining the existing weight-based API.

## Background

The current fractional operation in flagd uses a 100-bucket system that maps hash values to percentages in the range [0, 100].
This approach works well for most use cases but has significant limitations in high-throughput environments where precise sub-percent traffic allocation is required.

Currently, the smallest allocation possible is 1%, which is insufficient for:

- Gradual rollouts in ultra-high-traffic systems where 1% could represent millions of users
- A/B testing scenarios requiring precise control over small experimental groups
- Canary deployments where operators need to start with very small traffic percentages (e.g., 0.1% or 0.01%)

The current implementation in `fractional.go` calculates bucket assignment using:

```go
bucket := hashRatio * 100 // in range [0, 100]
```

This limits granularity to 1% increments, making it impossible to achieve the precision required for sophisticated traffic management strategies.

## Requirements

- Support traffic allocation precision down to 0.001% (3 decimal places)
- Maintain backwards compatibility with the existing weight-based API
- Preserve deterministic bucketing behavior (same hash input always produces the same bucket)
- Ensure consistent bucket assignment across different programming languages
- Support weight values up to a reasonable maximum that works across multiple languages
- Maintain current performance characteristics
- Prevent users from being moved between buckets when only distribution percentages change
- Guarantee that any variant with weight > 0 receives some traffic allocation
- Handle edge cases gracefully without silent failures
- Validate weight configurations and provide clear error messages for invalid inputs

## Considered Options

- **Option 1: 10,000 buckets (0.01% precision)** - 1 in every 10,000 users; better, but still not sufficient for many high-throughput use cases
- **Option 2: 100,000 buckets (0.001% precision)** - 1 in every 100,000 users; meets most high-precision needs
- **Option 3: 1,000,000 buckets (0.0001% precision)** - 1 in every 1,000,000 users; likely overkill and could impact performance

> **Review thread on the considered options (lines +47 to +49):**
>
> **Member:** Is there any obvious reason that we don't want the max bucket amount to be the sum of all bucket ratios?
>
> **Member (author):** I need to think about this a bit more. I don't think my second image is accurate because, despite the bucket sizes changing, the location across the distribution should be consistent. I'll need to run a few tests to see which approach is better.
>
> **Contributor:** If I understand the challenge here properly (and I am not 100% sure, so please correct me), we want to ensure that, for a given value "X", it's always distributed into the same bucket no matter how we express the bucket proportions. So [0.3, 0.5, 0, 0.2], [3, 5, 2], [9, 15, 6], etc. should all work the same, across all platforms and all languages. Additionally, we would like to ensure that very skewed distributions (e.g. [0.1, 1000000]) don't end up simplifying some buckets to 0. Lastly, we need to be cautious about floating-point arithmetic. Overall, this is a pretty complex problem :D To meet all these requirements, I think we need to implement some sort of integer-based bucket normalization to get the canonical representation of the bucket proportions (e.g. the examples above would all normalize to [3, 5, 2]). To achieve that we might need the following:
>
> The challenge with that approach, though, is that the sum of all the buckets can now be larger than the maximum hash... There are ways around this (that would require using a different approach to "hashing", or downscaling the buckets to fit in an int), or we can just say that such cases constitute invalid inputs (I think that saying that the minimum valid resolution is 0.001% essentially guarantees that?).
>
> **Member:** Ya, I think you understand the challenge. I spoke with @beeme1mr a bit about this, and we suspect my proposal here might solve most of our concerns. I don't think we "lose determinism" in any substantially different way than alternatives. We can lean on JSON schema to mark any non-integer inputs as invalid - we don't really need to support decimals here... as long as we give users the ability to describe relative weights (this can be done with ints, obviously) we can sidestep that mess, IMO. Overflows might still be a concern for high numbers, but we can also specify and document limits to the total weight and error if we exceed that - that might be better than calculating a GCD, which adds a performance cost that's not going to be necessary in most cases (I'm quite confident most people will just use configs with weights like …).
>
> **Member:** I'm going to do a little PoC for this.
>
> **Contributor:** Even if we get rid of floats, in your proposal, how would we deal with same proportions but different values, e.g. [3, 5, 2], [9, 15, 6]?

## Proposal

Implement a 100,000-bucket system that provides 0.001% precision while maintaining the existing integer weight-based API.

### API Changes

No API changes are required. The existing fractional operation syntax remains unchanged:

```json
"fractional": [
  { "cat": [{ "var": "$flagd.flagKey" }, { "var": "email" }] },
  ["red", 50],
  ["blue", 30],
  ["green", 20]
]
```
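
With 100,000 internal buckets, these weights map to 50,000, 30,000, and 20,000 buckets respectively, preserving the intended 50% / 30% / 20% split.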

### Implementation Changes

1. **Bucket Count**: Change from 100 to 100,000 buckets by modifying the bucket calculation from `hashRatio * 100` to `hashRatio * 100000`
2. **Minimum Allocation Guarantee**: Any variant with weight > 0 receives at least 1 bucket (0.001%)
3. **Excess Bucket Handling**: Remove excess buckets from the largest variant to maintain exactly 100,000 total buckets
4. **Weight Sum Validation**: Reject configurations where the total weight exceeds the maximum safe integer value
5. **Maximum Weight Sum**: Use language-specific maximum 32-bit signed integer constants for cross-platform compatibility

### Minimum Allocation Guarantee

To prevent silent configuration failures, any variant with a positive weight will receive at least 0.001% allocation (1 bucket), even if the calculated percentage would round to zero. This ensures predictable behavior where positive weights always result in some traffic allocation.

**Example**: Configuration `["variant-a", 1], ["variant-b", 1000000]`

- Without guarantee: variant-a gets 0% (never selected)
- With guarantee: variant-a gets 0.001%, variant-b gets 99.999%
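
Working through the arithmetic: the total weight is 1,000,001, so variant-a's proportional share is floor(1 × 100,000 / 1,000,001) = 0 buckets and is bumped to the 1-bucket minimum (0.001%), while variant-b receives floor(1,000,000 × 100,000 / 1,000,001) = 99,999 buckets (99.999%). The total is exactly 100,000, so no excess buckets need to be removed.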

### Excess Bucket Management

When minimum allocations cause the total to exceed 100,000 buckets, excess buckets are removed from the variant with the largest allocation.
This approach:

- Maintains the minimum guarantee for small variants
- Has minimal impact on large variants (small relative reduction)
- Preserves deterministic behavior
- Prevents bucket count overflow

> **Review comment (Contributor) on lines +87 to +88:** The description of how excess buckets are handled is slightly inconsistent. This section states that excess buckets are removed from 'the variant with the largest allocation' (singular), while the 'Edge Case Handling' section refers to it as 'Excess distributed fairly among largest variants' (plural). The code example shows a sequential removal process. For clarity and consistency, I suggest refining the description to accurately reflect the implementation, for example: 'Excess buckets are removed sequentially from variants with the largest allocations, starting with the largest, until the total bucket count is exactly 100,000.'
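
The following is a minimal, self-contained sketch (not flagd's actual implementation) of the allocation rule described above: floor-proportional shares, a minimum of one bucket for any positive weight, and removal of excess starting from the largest allocations. The `allocate` helper and the weights are hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

const bucketCount = 100000

// allocate applies the proposed rule: floor-proportional share, at least one
// bucket per positive weight, then removal of any excess from the largest allocations.
func allocate(weights []int64) []int {
	var total int64
	for _, w := range weights {
		total += w
	}

	buckets := make([]int, len(weights))
	allocated := 0
	for i, w := range weights {
		if w > 0 {
			b := int(w * bucketCount / total)
			if b < 1 {
				b = 1 // minimum allocation guarantee
			}
			buckets[i] = b
		}
		allocated += buckets[i]
	}

	// Remove excess buckets, starting with the largest allocation
	// (tie-breaking by variant name is omitted for brevity).
	excess := allocated - bucketCount
	order := make([]int, len(buckets))
	for i := range order {
		order[i] = i
	}
	sort.Slice(order, func(a, b int) bool { return buckets[order[a]] > buckets[order[b]] })
	for _, i := range order {
		if excess <= 0 {
			break
		}
		minAllowed := 0
		if weights[i] > 0 {
			minAllowed = 1 // never drop a positive weight back to zero buckets
		}
		removable := buckets[i] - minAllowed
		if removable > excess {
			removable = excess
		}
		if removable > 0 {
			buckets[i] -= removable
			excess -= removable
		}
	}
	return buckets
}

func main() {
	// Hypothetical weights: two tiny variants alongside one very large one.
	// The small variants keep their guaranteed bucket; the single excess
	// bucket is removed from the largest allocation.
	fmt.Println(allocate([]int64{1, 2, 999997})) // [1 1 99998]
}
```

Running this prints `[1 1 99998]`, i.e. 0.001%, 0.001%, and 99.998% of traffic.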

### Weight Sum Validation

When the total weight sum exceeds the maximum safe integer value, the fractional evaluation will return a validation error with a clear message.
This prevents integer overflow issues and provides immediate feedback to users about invalid configurations.

```go
import (
	"fmt"
	"math"
)

func validateWeightSum(variants []fractionalEvaluationVariant) error {
	var totalWeight int64 = 0
	for _, variant := range variants {
		totalWeight += int64(variant.weight)
		if totalWeight > math.MaxInt32 {
			return fmt.Errorf("total weight sum %d exceeds maximum allowed value %d",
				totalWeight, math.MaxInt32)
		}
	}
	return nil
}
```

Implementations should prefer built-in language constants (e.g., `math.MaxInt32` in Go, `Integer.MAX_VALUE` in Java, `int.MaxValue` in C#) rather than hardcoded values to ensure maintainability and clarity.

### Edge Case Handling

The implementation addresses several edge cases:

1. **All weights are 0**: Returns empty string (maintains current behavior)
2. **Negative weights**: Treated as 0 (maintains current validation behavior)
3. **Single variant**: Receives all 100,000 buckets regardless of weight value
4. **Empty variants**: Returns error (maintains current validation behavior)
5. **Weight sum overflow**: Returns validation error with clear message
6. **Multiple variants with minimum allocation**: Excess distributed fairly among largest variants

### Maximum Weight Considerations

To ensure cross-language compatibility, we establish a maximum total weight sum equal to the maximum 32-bit signed integer value (2,147,483,647). This limit:

- Works reliably across all target languages (Go, Java, .NET, JavaScript, Python)
- Provides more than sufficient range for any practical use case
- Prevents integer overflow issues in 32-bit signed integer systems
- Allows for extremely fine-grained control (individual weights can be 1 out of 2+ billion)
- Uses language-native constants for better maintainability
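
For example, a configuration with two variants weighted 2,000,000,000 each has a total weight of 4,000,000,000, which exceeds 2,147,483,647 and would be rejected by the weight sum validation described above.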

> **Review comment (Member):** I think this is a good limit, regardless of this.

### Code Changes

The following shows how the core logic in `fractional.go` would be modified.

```go
const bucketCount = 100000

// bucketAllocation represents the number of buckets allocated to a variant
type bucketAllocation struct {
	variant string
	buckets int
}

func (fe *Fractional) Evaluate(values, data any) any {
	valueToDistribute, feDistributions, err := parseFractionalEvaluationData(values, data)
	if err != nil {
		fe.Logger.Warn(fmt.Sprintf("parse fractional evaluation data: %v", err))
		return nil
	}

	if err := validateWeightSum(feDistributions.weightedVariants); err != nil {
		fe.Logger.Warn(fmt.Sprintf("weight validation failed: %v", err))
		return nil
	}

	return distributeValue(valueToDistribute, feDistributions)
}

func validateWeightSum(variants []fractionalEvaluationVariant) error {
	var totalWeight int64 = 0
	for _, variant := range variants {
		totalWeight += int64(variant.weight)
		if totalWeight > math.MaxInt32 {
			return fmt.Errorf("total weight sum %d exceeds maximum allowed value %d",
				totalWeight, math.MaxInt32)
		}
	}
	return nil
}

func calculateBucketAllocations(variants []fractionalEvaluationVariant, totalWeight int) []bucketAllocation {
	allocations := make([]bucketAllocation, len(variants))
	totalAllocated := 0

	// Calculate initial allocations
	for i, variant := range variants {
		if variant.weight == 0 {
			allocations[i] = bucketAllocation{variant: variant.variant, buckets: 0}
		} else {
			// Calculate proportional allocation
			proportional := int((int64(variant.weight) * bucketCount) / int64(totalWeight))
			// Ensure minimum allocation of 1 bucket for any positive weight
			buckets := max(1, proportional)
			allocations[i] = bucketAllocation{variant: variant.variant, buckets: buckets}
		}
		totalAllocated += allocations[i].buckets
	}

	// Handle excess buckets by removing from largest allocation
	excess := totalAllocated - bucketCount
	if excess > 0 {
		// Sort indices by bucket count (descending) to find largest allocation
		indices := make([]int, len(allocations))
		for i := range indices {
			indices[i] = i
		}
		sort.Slice(indices, func(i, j int) bool {
			if allocations[indices[i]].buckets == allocations[indices[j]].buckets {
				return allocations[indices[i]].variant < allocations[indices[j]].variant // Tie-break by variant name
			}
			return allocations[indices[i]].buckets > allocations[indices[j]].buckets
		})

		// Remove excess from largest allocation, respecting minimum guarantee
		for _, idx := range indices {
			if excess <= 0 {
				break
			}

			// Don't reduce below 1 bucket if original weight > 0
			minAllowed := 0
			if variants[idx].weight > 0 {
				minAllowed = 1
			}

			canRemove := allocations[idx].buckets - minAllowed
			toRemove := min(excess, canRemove)
			allocations[idx].buckets -= toRemove
			excess -= toRemove
		}
	}

	return allocations
}
```

> **Review thread on `calculateBucketAllocations` (minimum allocation guarantee):**
>
> **Member (author):** Properly supporting guaranteed bucketing may add more complexity than I'd like. I'm sure I can address this issue but I'd like feedback on if it's worth supporting this at all. The reason I added this is to avoid configurations like …

**Replace the distribution logic:**

```go
func distributeValue(value string, feDistribution *fractionalEvaluationDistribution) string {
	if feDistribution.totalWeight == 0 {
		return ""
	}

	allocations := calculateBucketAllocations(feDistribution.weightedVariants, feDistribution.totalWeight)

	hashValue := int32(murmur3.StringSum32(value))
	hashRatio := math.Abs(float64(hashValue)) / math.MaxInt32
	bucket := int(hashRatio * bucketCount) // in range [0, bucketCount)

	currentBucket := 0
	for _, allocation := range allocations {
		currentBucket += allocation.buckets
		if bucket < currentBucket {
			return allocation.variant
		}
	}

	return ""
}
```
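
For illustration, with the 50/30/20 example configuration above, the allocations are 50,000, 30,000, and 20,000 buckets and the cumulative boundaries are 50,000, 80,000, and 100,000. A hash ratio of, say, 0.637 yields bucket 63,700, which falls below the 80,000 boundary but not the 50,000 one, so `"blue"` is returned.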

### Consequences

- Good, because it enables precise traffic control for high-throughput environments
- Good, because it matches industry-standard precision offered by leading vendors
- Good, because it maintains API backwards compatibility
- Good, because integer weights remain simple to understand and configure
- Good, because it prevents silent configuration failures through the minimum allocation guarantee
- Good, because excess handling is predictable and fair
- Good, because weight validation provides clear error messages for invalid configurations
- Bad, because it represents a behavioral breaking change for existing configurations
- Bad, because it slightly increases memory usage for bucket calculations
- Bad, because actual percentages may differ slightly from configured weights due to minimum allocations

### Implementation Plan

1. Update flagd-testbed with comprehensive test cases for high-precision fractional bucketing across all evaluation modes
2. Implement core logic in flagd to support the 100,000-bucket system with minimum allocation guarantee and excess handling
3. Update flagd providers to ensure consistent behavior and testing across language implementations
4. Update documentation, migration guides, and example configurations to demonstrate the new precision capabilities



> **Review comment:** I went back and forth on this. It isn't necessary if the flag is configured properly but I'm afraid that it wouldn't be that obvious that there's a misconfiguration. This basically prevents 0% distribution if a weight is defined.

> **Review comment:** I also feel conflicted about this, if we were to go forward with a strictly defined max bucket size. TBH I'm not sure the special handling is worth the possible user confusion in this extreme case. The other obvious solution is to add a warning at evaluation time (we do similar things for other rules, like invalid semver params).