feat(processors): Traffic shaper processor plugin to shape uneven distribution of incoming metrics #15354
Conversation
Thanks so much for the pull request!
!signed-cla
Hi,
If we are going to take a processor like this, the messaging to the user needs to be improved, but I still need to talk to the rest of the team if this is something we wish to support as well. I've given some initial comments.
If I send 3 metrics and use your traffic shaper as follows:
```toml
[[inputs.exec]]
  commands = [
    "echo metric,host=a value=42",
    "echo metric,host=b value=1",
    "echo metric,host=c value=2",
  ]
  data_format = "influx"

[[processors.traffic_shaper]]
  samples = 1
  buffer_size = 10000
```
I still see all 3 metrics sent at each interval.
I see the time unit is not exposed in the config, which it should be; it defaults to 1 second. If I change this to 10 seconds to match the flush interval, I then see 1, sometimes 2, metrics get produced.
What I don't see is the processor's buffer size at any given time. I think this is a major issue, as a user would have no way to know or gauge whether they are sending too many metrics at any given time.
Thanks
```toml
## No of samples to be emitted per time unit, default is seconds
## This should be used in conjunction with number of telegraf instances.
samples = 20000
```
Defaults can be commented out.
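Following that suggestion, a sample config section with defaults commented out might look like this (option names assumed from the diff above):

```toml
[[processors.traffic_shaper]]
  ## Defaults shown commented out; uncomment to override.
  # samples = 20000
  # buffer_size = 1000000
```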
Done.
```toml
## Buffer Size
## If buffer is full the incoming metrics will be dropped
buffer_size = 1000000
```
Please expose the time unit option as a `config.Duration`.
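A `config.Duration` field lets users write the interval as a Go-style duration string in the TOML config; assuming the option ends up named `rate`, usage might look like:

```toml
[[processors.traffic_shaper]]
  samples = 100
  ## interval over which `samples` metrics are emitted
  rate = "10s"
```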
Done, added a rate option in the config.
```text
output traffic is uniform

Example of uneven traffic distribution
![traffic_distribution](./docs/traffic_distribution.png)
```
I would prefer we omit the image.
Done.
```go
Queue chan *telegraf.Metric
Acc   telegraf.Accumulator
```
Do these need to be exported?
Nope, have changed it.
```go
func (t *TrafficShaper) Stop() {
	t.Log.Debugf("Got stop signal %s", time.Now().String())
	close(t.Queue)
	t.wg.Wait()
}
```
This will block telegraf from exiting until all metrics are flushed from the queue? Not sure this is the behavior we want. When someone closes or stops telegraf, things should clean up, but this could block for 100s or 1000s of seconds.
Added this as a config option so that users can choose accordingly.
```go
	"github.com/influxdata/telegraf/metric"
	"github.com/influxdata/telegraf/testutil"
)
```
Please include a test with tracking metrics. See the other processors for examples.
Done.
Have added the rate time interval as a config option, and we have exposed metrics like messagesInFlight for observability.
Summary
Use Case
An in-memory traffic shaper processor which evens out traffic so that output traffic is uniform
We use Telegraf as a proxy and receive data that is spiky in nature: every 10 minutes we receive a spike, and this affects our downstream systems, since they also need to process at the same rate. This leads to wasted resources, because CPU and memory need to be provisioned for the peaks.
Screenshot of spiky behavior before and after using this plugin; it is visible that after 1:00 the output rate is steady.
Checklist
Related issues
resolves #15353