Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset the flush interval timer when flush is requested or batch is ready. #8953

Merged
merged 2 commits into from
Mar 9, 2021

Conversation

ivorybilled
Copy link
Contributor

Reset the flush interval timer when flush is requested or batch is ready, so that timer doesn't expire (and then falsely report that flush interval has been exceeded without writing) while one of those flushes is occurring.

resolves #8941

…ady, so that timer doesn't expire while one of those flushes is occurring.
Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤝 ✅ CLA has been signed. Thank you!

@ivorybilled ivorybilled requested a review from ssoroka March 8, 2021 15:08
Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im a tiger

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roar

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shh

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i love telegraf! ahahahahaha 🐯🐯🐯🐙

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivorybilled
Copy link
Contributor Author

!retry-checks

@ivorybilled ivorybilled merged commit a6d2c4f into master Mar 9, 2021
@ivorybilled ivorybilled deleted the ResetFlushIntervalOnBatchFullOrTriggered branch March 9, 2021 19:35
ssoroka pushed a commit that referenced this pull request Mar 10, 2021
…ady. (#8953)

* Reset the flush interval timer when flush is requested or batch is ready, so that timer doesn't expire while one of those flushes is occurring.

* Update tick.go

(cherry picked from commit a6d2c4f)
@gnjack
Copy link
Contributor

gnjack commented Sep 6, 2021

@jagularr @ssoroka We've just upgraded to telegraf 1.19.3 and I believe this change has introduced a subtle bug where the buffer is not flushed fully when the metric throughput is higher than that of the batch_size / flush_interval and there are large spikes in metrics to write.

In this scenario ticker.Elapsed() is never true due to output.BatchReady calling ticker.Reset() more frequently than the flush_interval.

output.WriteBatch only flushes a single batch of metrics, unlike ticker.Elapsed() which calls output.Write flushing as many batches as in the buffer in a single call.

If you then have a massive spike of metrics (several multiples of batch_size) then BatchReady will be set to time.Now multiple times in very quick succession.

case r.BatchReady <- time.Now():

This may only actually trigger output.WriteBatch once on the next iteration of the flushLoop flushing only a single batch, leaving many possible batches unflushed. Due to ticker.Reset() being constantly called, output.Write is never called and this backlog of many possible batches in the backlog is never flushed as we are only ever flushing a single batch at a time using output.WriteBatch.

You can see this on the following graphs - metrics build up in the buffer unflushed after every spike in metrics gathered, until we get lucky and flush_interval passes without an early BatchReady induced flush, causing output.Write to be called instead of output.WriteBatch and actually flushing all buffered batches in one method call.
image

@reimda reimda mentioned this pull request Sep 23, 2021
3 tasks
powersj added a commit to powersj/telegraf that referenced this pull request Sep 23, 2021
powersj added a commit to powersj/telegraf that referenced this pull request Sep 23, 2021
powersj added a commit that referenced this pull request Sep 30, 2021
reimda pushed a commit that referenced this pull request Oct 6, 2021
…r batch is ready. (#8953)" (#9800)

This reverts commit a6d2c4f.

(cherry picked from commit 70afc94)
powersj added a commit to powersj/telegraf that referenced this pull request Aug 3, 2022
When Telegraf processes a very large amount of data, such that the
batch size is constantly hit, then Telegraf may be sending batches
constantly. At the same time, the flush interval could be hit. During
the sending of a batch, the ticker for the flush interval can go to
zero, and as a result, the batch will produce a warning about not
sending within the interval time.

Sending batches should not send messages to the user that the interval
expired as they are sepearte paths.

Previously, influxdata#8953 attempted to reset the ticker during the sending of a
batch. This introduced an issue where the buffer was never fully
flushed.

fixes: influxdata#8941
arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023
…ady. (influxdata#8953)

* Reset the flush interval timer when flush is requested or batch is ready, so that timer doesn't expire while one of those flushes is occurring.

* Update tick.go

(cherry picked from commit a6d2c4f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

did not complete within its flush interval
3 participants