Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add read timeout to socket_listener #2571

Closed
wants to merge 9 commits into from

Conversation

soldierkam
Copy link
Contributor

@soldierkam soldierkam commented Mar 24, 2017

Add TCP timeout support to socket_listener

Required for all PRs:

  • CHANGELOG.md updated (we recommend not updating this until the PR has been approved by a maintainer)
  • Sign CLA (if not already signed)
  • README.md updated (if adding a new plugin)

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with the rest of telegraf, we should use an internal.Duration for the duration.

Perhaps we should have a default though to avoid half open sockets. What do you think @phemmer ?

## Read deadline (timeout).
## Only applies to stream sockets (e.g. TCP).
## 0 (default) is unlimited.
# read_deadline = 30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be timeout, the reason is that it is a relative time where a deadline would be absolute and I think the "read" part is implied because of the input type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well maybe read_timeout is better, naming is so hard :)

@phemmer
Copy link
Contributor

phemmer commented Mar 31, 2017

Perhaps we should have a default though to avoid half open sockets. What do you think @phemmer ?

Hrm, hard question. I think I lean more towards no default. If the socket being left open and idle is a problem, then I would say it's up to the sender to shut it down, not the recipient. In fact the recipient shutting it down can cause data loss if it shuts it down while a message is in transit. So yeah, definitely no default.
However this does raise the issue of dead connection detection. We probably should add a feature for controlling TCP keep alives.

Oh, and we should never have a half-open socket. Once the sender closes their side, telegraf shuts down its side.

@phemmer
Copy link
Contributor

phemmer commented Mar 31, 2017

Oh, why TCP only? Read timeouts (SetReadDeadline) is valid for datagram (UDP) sockets too.

Nevermind, yes it's valid, but different use case. For datagram we only have a single socket. It's not TCP where we have a separate socket for every connection. We don't want to shut down the whole socket if it's idle.

@danielnelson
Copy link
Contributor

What distinguishes a half-open socket from a dead connection?

@phemmer
Copy link
Contributor

phemmer commented Mar 31, 2017

What distinguishes a half-open socket from a dead connection?

Half-open is when one side has issued a Close() on the socket but the other side has not. It's perfectly acceptable (as far as the TCP protocol goes) for the reading side of a socket to be closed, but still send data on the writing side.

Dead connection can happen on a full-open, or half-open. Without TCP keep alives, the recipient will never notice if a connection is severed uncleanly (no FIN/RST packet sent. E.G. sender's power cord pulled). With the TCP keep alive, one side sends an empty packet, and the other side responds with an ACK. No response means dead connection.

@nhaugo nhaugo added this to the 1.3.0 milestone Mar 31, 2017
Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead an update the CHANGELOG.md file as well.

@@ -112,6 +117,7 @@ type SocketListener struct {
ServiceAddress string
MaxConnections int
ReadBufferSize int
ReadTimeout int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been more clear, this is the value that should be an internal.Duration. This will allow people to specify the timeout like "30s", so make sure to update the README as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's me. Sorry for that.

-Rename timeout parameter
-Use internal.Duration
Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to SetReadDeadline before the first read.

@@ -78,6 +80,9 @@ func (ssl *streamSocketListener) read(c net.Conn) {
for _, m := range metrics {
ssl.AddFields(m.Name(), m.Fields(), m.Tags(), m.Time())
}
if ssl.ReadTimeout.Duration > 0 {
c.SetReadDeadline(time.Now().Add(ssl.ReadTimeout.Duration * time.Second))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this is a duration we don't want to multiple it

@phemmer
Copy link
Contributor

phemmer commented Apr 7, 2017

I think we need to SetReadDeadline before the first read.

Correct. I'd recommend restructuring the loop a bit. Meaning don't set the deadline before the loop, and then again at the end of it. Just do something like:

for {
  if ssl.ReadTimeout.Duration > 0 { set deadline here }
  if !scnr.Scan() { break }
  ...
}

@danielnelson
Copy link
Contributor

When testing I receive:

2017-04-07T23:16:32Z E! Error in plugin [inputs.socket_listener]: read tcp [::1]:8094->[::1]:60636: i/o timeout

I'm just connecting with nc localhost 8094 and sending nothing.

@soldierkam
Copy link
Contributor Author

What "read_timeout" value you've got in config file and how long it take to receive this message?
I tested this on Linux 4.4.1 and it works ok. Im using telnet for testing.
I receive this message only if I set read_timeout to some value and I do not send anything for that amount of time. It would be better if this will be warning message.

@danielnelson
Copy link
Contributor

Yeah this is on the actual timeout. I think we shouldn't log anything on timeout, or it should be debug level.

@danielnelson danielnelson modified the milestones: 1.3.0, 1.4.0 Apr 27, 2017
@soldierkam
Copy link
Contributor Author

How can I log it at debug level? Accumulator have only AddError method. I see three options:

  1. add AddDebug method to Accumulator
  2. add Debug flag to plugin conf and use it like in https://github.com/influxdata/telegraf/blob/master/plugins/inputs/mailchimp/chimp_api.go#L150
  3. just log message without if statement like in https://github.com/influxdata/telegraf/blob/master/internal/models/makemetric.go#L125

@danielnelson
Copy link
Contributor

Go with #3 for now.

@danielnelson
Copy link
Contributor

Merged as f5a8415

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants