Skip to content

Conversation

yaauie
Copy link
Contributor

@yaauie yaauie commented Oct 3, 2022

RETROACTIVE EDIT TO HIGHLIGHT ISSUE

v3.1.1 of this plugin made changes to attempt to make the plugin more-interruptible, but had the undesired side-effect of very high CPU in the default configuration of data_timeout => -1 (no timeout, read repeatedly until socket stops existing).

A workaround was suggested along with the bug report to set the data_timeout to a very high number (like 315360000 "ten years") which prevents the high-CPU issue, but a plugin in this configuration connected to a socket that was not receiving traffic could not be interrupted because it is blocking on IO#Select for the entire data_timeout and not checking for plugin shutdown state.

In order to both observe shutdown state and the default no-timeout for the data channel, we need to decouple blocking IO limits from our data_timeout.


Introduces Unix#io_interruptable_readpartial, which emulates IO#readpartial while observing both the plugin's stop condition and an optional timeout.

Internally, this helper limits its blocking calls to 10s or less whether or not the plugin is configured with a data_timeout, allowing it to periodically observe the plugin's stop condition to exit gracefully in a reasonable amount of time when the socket is not receiving writes.

This fixes a regression introduced in v3.1.1 of this plugin (#29) in which the default configuration of data_timeout => -1 (timeout disabled) consumed excess CPU attempting non-blocking reads of the IO when it had no readable bytes.

Introduces `Unix#io_interruptable_readpartial`, which emulates `IO#readpartial`
while observing both the plugin's stop condition and an optional timeout.

Internally, this helper limits its blocking calls to 10s or less whether or not
the plugin is configured with a `data_timeout`, allowing it to periodically
observe the plugin's stop condition to exit gracefully in a reasonable amount
of time when the socket is not receiving writes.

This fixes a regression introduced in v3.1.1 of this plugin in which the
default configuration of `data_timeout => -1` (timeout disabled) consumed
excess CPU attempting non-blocking reads of the IO when it had no readable
bytes.
Add link to PR
result = readable_io.read_nonblock(maxlen, exception: false)

return result if result.kind_of?(String)
raise EOFError if result.nil?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a case where esult.nil? is true?

Copy link

@mashhurs mashhurs Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on IO#read_nonblock doc:
At EOF, it will return nil instead of raising EOFError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. When using IO#read_nonblock with exception: false we must handle all of the failure conditions ourselves.

@logger.info("Closing connection after read timeout", :path => @path)
return
elsif data == :stopping
@logger.trace("Shutdown in progress", :path => @path)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we make the log info level for better visibility?

Copy link
Contributor Author

@yaauie yaauie Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally added at trace level because this is noise about the plugin stopping reads from the socket that is not likely to be useful when a user is going about their normal business and already has signal about the plugin and pipeline getting closed.

Copy link

@mashhurs mashhurs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@yaauie yaauie merged commit 68a1982 into logstash-plugins:main Oct 3, 2022
@yaauie yaauie deleted the data_timeout-fixes branch October 3, 2022 18:09
@yaauie yaauie self-assigned this Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants