[processor/deltatocumulative] limit tracked streams #31488

Merged
merged 5 commits into from Mar 11, 2024
11 changes: 11 additions & 0 deletions internal/exp/metrics/staleness/staleness.go
@@ -13,6 +13,11 @@ import (
// We override how Now() is returned, so we can have deterministic tests
var NowFunc = time.Now

var (
_ streams.Map[any] = (*Staleness[any])(nil)
_ streams.Evictor = (*Staleness[any])(nil)
)

// Staleness is a wrapper over a map that adds an additional "staleness" value to each entry. Users can
// call ExpireOldEntries() to automatically remove all entries from the map whose staleness value is
// older than the `max`
@@ -82,3 +87,9 @@ func (s *Staleness[T]) Next() time.Time {
_, ts := s.pq.Peek()
return ts
}

func (s *Staleness[T]) Evict() identity.Stream {
id, _ := s.pq.Pop()
s.items.Delete(id)
return id
}
6 changes: 6 additions & 0 deletions internal/exp/metrics/streams/streams.go
@@ -50,3 +50,9 @@ func (m HashMap[T]) Items() func(yield func(identity.Stream, T) bool) bool {
func (m HashMap[T]) Len() int {
return len((map[identity.Stream]T)(m))
}

// Evictors remove the "least important" stream based on some strategy such as
// the oldest, least active, etc.
type Evictor interface {
Evict() identity.Stream
}
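
As an illustration of the interface above, here is a minimal sketch of one possible eviction strategy: removing streams in insertion order. It is not part of this PR. `Stream` is a stand-in for `identity.Stream`, and `OldestFirst`, `NewOldestFirst` and `Store` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Stream stands in for identity.Stream so the sketch is self-contained.
type Stream string

// Evictor mirrors the interface from the diff, using the stand-in Stream type.
type Evictor interface {
	Evict() Stream
}

// OldestFirst is a hypothetical Evictor that removes streams in insertion order.
type OldestFirst struct {
	order []Stream       // stream ids, oldest first
	items map[Stream]int // tracked values
}

func NewOldestFirst() *OldestFirst {
	return &OldestFirst{items: make(map[Stream]int)}
}

// Store tracks a value and records the stream the first time it is seen.
func (o *OldestFirst) Store(id Stream, v int) {
	if _, ok := o.items[id]; !ok {
		o.order = append(o.order, id)
	}
	o.items[id] = v
}

// Evict removes and returns the oldest stream.
// On an empty set it returns the zero Stream.
func (o *OldestFirst) Evict() Stream {
	if len(o.order) == 0 {
		return ""
	}
	id := o.order[0]
	o.order = o.order[1:]
	delete(o.items, id)
	return id
}

// Compile-time check, in the same style as the assertions in staleness.go.
var _ Evictor = (*OldestFirst)(nil)

func main() {
	o := NewOldestFirst()
	o.Store("a", 1)
	o.Store("b", 2)
	o.Store("a", 3) // updating "a" does not change its position
	fmt.Println(o.Evict(), len(o.items)) // a 1
}
```

The `Staleness` implementation in this PR instead pops from its priority queue, so the evicted stream is whichever one the queue ranks first.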
32 changes: 28 additions & 4 deletions processor/deltatocumulativeprocessor/internal/streams/limit.go
@@ -11,20 +11,31 @@ import (
"github.com/open-telemetry/opentelemetry-collector-contrib/internal/exp/metrics/streams"
)

func Limit[T any](m Map[T], max int) Map[T] {
func Limit[T any](m Map[T], max int) LimitMap[T] {
return LimitMap[T]{Map: m, Max: max}
}

type LimitMap[T any] struct {
Max int

Evictor streams.Evictor
streams.Map[T]
}

func (m LimitMap[T]) Store(id identity.Stream, v T) error {
if m.Map.Len() >= m.Max {
return ErrLimit(m.Max)
if m.Map.Len() < m.Max {
return m.Map.Store(id, v)
}

errl := ErrLimit(m.Max)
if m.Evictor != nil {
gone := m.Evictor.Evict()
if err := m.Map.Store(id, v); err != nil {
return err
}
return ErrEvicted{ErrLimit: errl, id: gone}
Contributor

Should this return an error? Yes, we had to evict something, but the store itself was a "success". IMO, these aren't actual errors, just statistics.

Contributor Author

At this point they are errors imo; the consumer (the processor) should decide what to do with them.
When #31363 is merged, they will turn into metrics.
Until then, I think they are best treated as errors.

Contributor

Sure. But I guess my point is that eviction itself isn't an error. It's just something that happens. I.e., eviction should not increment the metrics_processed{error=true} metric (I don't recall the correct name :P ).

Perhaps the eviction yes/no value should be returned as an additional return value, instead of being conflated with the error status. Thoughts?

Contributor Author

@sh0rez sh0rez Mar 6, 2024

I agree eviction should not increment that specific counter, but rather a separate one such as streams_evicted, so you can easily alert on it. Imo eviction should never be part of normal operations, as it signals a fairly serious capacity issue and thus is very close to an error scenario.

I think it's not uncommon in Go to use an error for non-fatal occurrences, as that's the whole point of having a value-based, easy to manipulate error system.

}
return m.Map.Store(id, v)
return errl
}

type ErrLimit int
@@ -37,3 +48,16 @@ func AtLimit(err error) bool {
var errLimit ErrLimit
return errors.As(err, &errLimit)
}

type ErrEvicted struct {
ErrLimit
id Ident
}

func (e ErrEvicted) Error() string {
return fmt.Sprintf("%s. evicted stream %s", e.ErrLimit, e.id)
}

func (e ErrEvicted) Unwrap() error {
return e.ErrLimit
}
@@ -33,3 +33,5 @@ func (a MapAggr[D]) Aggregate(id Ident, dp D) (D, error) {
v, _ := a.Map.Load(id)
return v, err
}

type Evictor = streams.Evictor
6 changes: 5 additions & 1 deletion processor/deltatocumulativeprocessor/processor.go
@@ -54,7 +54,11 @@ func newProcessor(cfg *Config, log *zap.Logger, next consumer.Metrics) *Processo
dps = &exp
}
if cfg.MaxStreams > 0 {
dps = streams.Limit(dps, cfg.MaxStreams)
lim := streams.Limit(dps, cfg.MaxStreams)
if proc.exp != nil {
lim.Evictor = proc.exp
}
dps = lim
}

proc.aggr = streams.IntoAggregator(dps)