Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incomplete example implementation of naive reservoir sampling algorithm #2205

Closed
Tracked by #3756
dyladan opened this issue Dec 8, 2021 · 1 comment · Fixed by #3760
Closed
Tracked by #3756

Incomplete example implementation of naive reservoir sampling algorithm #2205

dyladan opened this issue Dec 8, 2021 · 1 comment · Fixed by #3760
Assignees
Labels
area:sdk Related to the SDK spec:metrics Related to the specification/metrics directory

Comments

@dyladan
Copy link
Member

dyladan commented Dec 8, 2021

In exemplar defaults the SimpleExemplarReservoir is defined as:

SimpleExemplarReservoir This Exemplar reservoir MAY take a configuration parameter for the size of the reservoir pool. The reservoir will accept measurements using an equivalent of the naive reservoir sampling algorithm

bucket = random_integer(0, num_measurements_seen)
if bucket < num_buckets then
 reservoir[bucket] = measurement
end

Reading the article, it seems like the implementation given here is incorrect or at least incomplete. For the first n elements where n is the size of the reservoir the exemplar should be offered to bucket n-1. Only after the reservoir is full should exemplars be sampled by generating the random number.

A more complete example might be:

bucket = num_measurements_seen if num_measurements_seen < reservoir.size else random_integer(0, num_measurements_seen)
if bucket < num_buckets then
	reservoir[bucket] = measurement
end
@dyladan dyladan added the spec:metrics Related to the specification/metrics directory label Dec 8, 2021
@reyang reyang added the area:sdk Related to the SDK label Jan 11, 2022
@SergeyKanzhelev SergeyKanzhelev removed their assignment Feb 18, 2023
@jsuereth
Copy link
Contributor

jsuereth commented Nov 3, 2023

This is a good call out, by simplification was, partially, on purpose, assuming high-load on measurements. However, the version you list is better for low-volume instruments. We should update the example.

In practice, the important aspect of this sampler is that the likelihood of sampling changes as more measurement are seen.

@jsuereth jsuereth self-assigned this Nov 3, 2023
jsuereth added a commit to jsuereth/opentelemetry-specification that referenced this issue Nov 10, 2023
jmacd added a commit that referenced this issue Dec 1, 2023
Fixes #2205
Fixes #3674 
Fixes #3669
Partially fixes #2421

## Changes

- Update example exemplar algorithm to account for initial reservoir
fill
- Update fixed-size defaults to account for memory contention /
optimization in Java impl
- Set a default for exponential histogram aggregation
- Clarify that ExemplarFilter should be configured on MeterProvider
- Make it clear that ONE reservoir is create PER timeseries datapoint
(not one reservoir per view or metric name).
- Allow flexibility in Reservoir `offer` definition based on feedback
from Go impl.

* Related issues #3756

---------

Co-authored-by: David Ashpole <dashpole@google.com>
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK spec:metrics Related to the specification/metrics directory
Projects
None yet
4 participants