Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve syslog connection handling #4369

Merged
merged 10 commits into from
Jul 17, 2018
Merged

Improve syslog connection handling #4369

merged 10 commits into from
Jul 17, 2018

Conversation

glinton
Copy link
Contributor

@glinton glinton commented Jul 2, 2018

This change reads from a syslog client in a separate step from the log parsing. This allows fewer errors in telegraf's logs about exceeding read timeouts. This should also limit the amount of reconnecting syslog clients must do when streaming logs to telegraf.

Previously, the syslog plugin would immediately start reading from a connection and if data was not immediately provided (or a connection was made and closed), found EOF, expecting a MSGLEN would be printed to telgraf's log.

By reading the data on the connection in a separate step, we are able to handle connection issues separately from parsing issues, thus resulting in less of them. This will also allow for syslog client connections to stay alive longer.

Resolves #4335

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leodido Is timeout handling something we can push down to go-syslog? I think we can't handle this here unless we were to also handle the framing.

return
}
// other error, log and return
s.store(rfc5425.Result{Error: fmt.Errorf("failed reading from syslog client - %s", err.Error())}, acc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use acc.AddError(err) directly.

}

parseMsg:
var p *rfc5425.Parser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would only make one parser for each call to handle.

p.ParseExecuting(func(r *rfc5425.Result) {
s.store(*r, acc)
})
n, err := io.Copy(data, conn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this read until EOF? How many messages can syslog send over a single connection. If it allows persistent connection then this could be very large.

Copy link
Contributor

@leodido leodido Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A receiver MUST use the message length to delimit a syslog messages, not EOF.
Ie., 14 octets for the following message <1>1 - - - - -, thus 14 <1>1 - - - - -.

RFC5425 specifies only that receivers SHOULD be able to process messages with lengths up to and including 8192 octets.

There is no maximum.

@leodido
Copy link
Contributor

leodido commented Jul 6, 2018

@danielnelson no we cannot push down to go-syslog imho, RFC5425 says nothing about timeout.

@leodido
Copy link
Contributor

leodido commented Jul 6, 2018

Anyway the found EOF, expecting MSGLEN error it the intended behavior (by RFC) at the moment for empty inputs, test here.

@russorat russorat added this to the 1.7.2 milestone Jul 6, 2018
@danielnelson
Copy link
Contributor

Without duplicating the syslog tcp framing, we can't do the message splitting in Telegraf since we don't know where messages begin or ends. I think this means we have to send in the connection io.Reader.

So, I think we could handle the error in the ParseExecuting function (Syslog.store) by watching for the found EOF, expecting MSGLEN message. The downside of this method is we don't know if the connection was closed, a timeout occured or something else. This makes good error messages hard to produce. Also, we would need to match on the error string which is not preferred. As an aside, it also seems like we need to extend the deadline in this function to prevent errors every $timeout seconds.

However, what I think would be ideal is if go-syslog could return errors when returned from the Read function of the io.Reader provided. The application could then check the error and handle it in an way depending on the type of the io.Reader.

@danielnelson danielnelson modified the milestones: 1.7.2, 1.7.3 Jul 17, 2018
@glinton glinton added the fix pr to fix corresponding bug label Jul 17, 2018
@glinton glinton modified the milestones: 1.7.3, 1.7.2 Jul 17, 2018
@glinton glinton merged commit 69d22af into master Jul 17, 2018
@glinton glinton deleted the bugfix/4335 branch July 17, 2018 22:47
glinton added a commit that referenced this pull request Jul 17, 2018
otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Telegraf Syslog plugin.
4 participants