New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default connectionCheckTimeout is too long #120

Closed
markdascher opened this Issue Mar 19, 2018 · 6 comments

Comments

3 participants
@markdascher

markdascher commented Mar 19, 2018

#87 fixed the original problem of disconnecting/reconnecting for each message, but a couple related issues were just reported to us:

  • We're still seeing a delay of ~500 ms in between each message, but at least it doesn't disconnect each time.
  • The default for connectionCheckTimeout is actually 500000, rather than the documented 100. That means we're inserting a delay of 500 ms in between each message. UdpConfig looks similar.

In this case, changing it to an actual value of 100 (via NLog.config) seems to solve the problem. Could the fix be that easy, or is there a reason it was set to 500000 initially?

@luigiberrettini

This comment has been minimized.

Show comment
Hide comment
@luigiberrettini

luigiberrettini Mar 20, 2018

Owner

I opened #117 to update documentation on connectionCheckTimeout

As you can see the Socket connection check relies on the Poll method that has a timeout: this means that it should not wait for that timeout on each check but only if the socket is not connected

This is the only way documented to check if the connection was closed, reset, or terminated:
documentation says that using SelectMode.SelectWrite you can determine if the Socket is writable but only with SelectMode.SelectRead you can check for connection problems.

IMHO if you change connectionCheckTimeout to 0 you can disable the check but either disabling it or reducing the timeout does not help when you have network problems: you are simply saying discard connection check after a timeout

Owner

luigiberrettini commented Mar 20, 2018

I opened #117 to update documentation on connectionCheckTimeout

As you can see the Socket connection check relies on the Poll method that has a timeout: this means that it should not wait for that timeout on each check but only if the socket is not connected

This is the only way documented to check if the connection was closed, reset, or terminated:
documentation says that using SelectMode.SelectWrite you can determine if the Socket is writable but only with SelectMode.SelectRead you can check for connection problems.

IMHO if you change connectionCheckTimeout to 0 you can disable the check but either disabling it or reducing the timeout does not help when you have network problems: you are simply saying discard connection check after a timeout

@luigiberrettini luigiberrettini self-assigned this Mar 20, 2018

@markdascher

This comment has been minimized.

Show comment
Hide comment
@markdascher

markdascher Mar 20, 2018

I'm not sure a documentation update is the best solution, as this is causing NLog.Targets.Syslog to be limited to 2 messages/sec by default. We were working with a customer whose logs were over 24 hours behind, because it was working through a gradually-increasing backlog due to exceeding this limit. Reducing connectionCheckTimeout to 100 allows for much higher message throughput.

My understanding of the socket.Poll method is that it returns under three conditions: available data, connection closed, or timeout expired. For syslog, first two conditions only happen at the end of the connection, so 99% of the time we're going to wait for the full timeout to expire.

Honestly, passing an actual timeout of 0 to socket.Poll might even work, as it would at least tell us if the remote end has initiated a connection close. (We don't even need the socket.Available == 0 condition, as the only data that'll ever appear from a syslog receiver is a TLS close_notify.)

The only downside to a low timeout is that we might end up sending (and losing) a bunch of back-to-back messages before realizing that there's a problem. Specifically, if the local host isn't aware that the connection is down, it'll send as much data as it can until receiving a TCP RST from the remote end. Waiting a full round-trip delay in between each message gives us time to wait for a potential TCP RST in this situation. I'm not sure it's worth the severe rate-limiting, though.

If it were up to me, I wouldn't set the default to anything higher than 50000, or 50 ms. That's a full round-trip delay for many Internet destinations, and even in the worst-case we'll only lose 3-4 messages before getting a response from the remote end. But I'd also seriously consider dropping it much lower, as the resulting maximum of 20 messages/sec still isn't unheard of.

markdascher commented Mar 20, 2018

I'm not sure a documentation update is the best solution, as this is causing NLog.Targets.Syslog to be limited to 2 messages/sec by default. We were working with a customer whose logs were over 24 hours behind, because it was working through a gradually-increasing backlog due to exceeding this limit. Reducing connectionCheckTimeout to 100 allows for much higher message throughput.

My understanding of the socket.Poll method is that it returns under three conditions: available data, connection closed, or timeout expired. For syslog, first two conditions only happen at the end of the connection, so 99% of the time we're going to wait for the full timeout to expire.

Honestly, passing an actual timeout of 0 to socket.Poll might even work, as it would at least tell us if the remote end has initiated a connection close. (We don't even need the socket.Available == 0 condition, as the only data that'll ever appear from a syslog receiver is a TLS close_notify.)

The only downside to a low timeout is that we might end up sending (and losing) a bunch of back-to-back messages before realizing that there's a problem. Specifically, if the local host isn't aware that the connection is down, it'll send as much data as it can until receiving a TCP RST from the remote end. Waiting a full round-trip delay in between each message gives us time to wait for a potential TCP RST in this situation. I'm not sure it's worth the severe rate-limiting, though.

If it were up to me, I wouldn't set the default to anything higher than 50000, or 50 ms. That's a full round-trip delay for many Internet destinations, and even in the worst-case we'll only lose 3-4 messages before getting a response from the remote end. But I'd also seriously consider dropping it much lower, as the resulting maximum of 20 messages/sec still isn't unheard of.

@luigiberrettini

This comment has been minimized.

Show comment
Hide comment
@luigiberrettini

luigiberrettini Mar 20, 2018

Owner

Sorry but you can set the timeout to whatever limit you want through configuration.

I still do not understand why Poll, in your case, is waiting the whole timeout for each message... the definition of timeout is something that is waited AT MOST not always.

Moreover data should not be available to read on a connection used to send and the connection should not be closed on each send.

Owner

luigiberrettini commented Mar 20, 2018

Sorry but you can set the timeout to whatever limit you want through configuration.

I still do not understand why Poll, in your case, is waiting the whole timeout for each message... the definition of timeout is something that is waited AT MOST not always.

Moreover data should not be available to read on a connection used to send and the connection should not be closed on each send.

@karlra

This comment has been minimized.

Show comment
Hide comment
@karlra

karlra May 16, 2018

Hello,

I was trying this library and I was pulling my hair regarding why it would only write 2 messages per second, even though I was queueing them at high speed. I found this thread and changing exceptionSyslogTarget.MessageSend.Udp.ConnectionCheckTimeout to 100 resolves the issue. But just so you know, out of the box it is limited to two messages per second and this can't be right, can it? I also don't understand exactly what I'm doing with this 100 value so that doesn't feel right either.

This is the config I am using:

var exceptionSyslogTarget = new NLog.Targets.Syslog.SyslogTarget();

exceptionSyslogTarget.MessageSend.Udp.ConnectionCheckTimeout = 100;
exceptionSyslogTarget.MessageCreation.Facility = NLog.Targets.Syslog.Settings.Facility.Local1;
exceptionSyslogTarget.MessageCreation.Rfc = NLog.Targets.Syslog.Settings.RfcNumber.Rfc5424;
exceptionSyslogTarget.MessageCreation.Rfc5424.Hostname = "${machinename}";
exceptionSyslogTarget.MessageCreation.Rfc5424.AppName = "DAEMON.App";
exceptionSyslogTarget.MessageCreation.Rfc5424.ProcId = "${processid}";
exceptionSyslogTarget.MessageCreation.Rfc5424.MsgId = "1";
exceptionSyslogTarget.MessageCreation.Rfc5424.DisableBom = true;
exceptionSyslogTarget.Name = "errors";
exceptionSyslogTarget.Layout = "${message}";
var exceptionRule = new LoggingRule("Errors", LogLevel.Debug, exceptionSyslogTarget);
config.LoggingRules.Add(exceptionRule);
config.AddTarget(exceptionSyslogTarget);

karlra commented May 16, 2018

Hello,

I was trying this library and I was pulling my hair regarding why it would only write 2 messages per second, even though I was queueing them at high speed. I found this thread and changing exceptionSyslogTarget.MessageSend.Udp.ConnectionCheckTimeout to 100 resolves the issue. But just so you know, out of the box it is limited to two messages per second and this can't be right, can it? I also don't understand exactly what I'm doing with this 100 value so that doesn't feel right either.

This is the config I am using:

var exceptionSyslogTarget = new NLog.Targets.Syslog.SyslogTarget();

exceptionSyslogTarget.MessageSend.Udp.ConnectionCheckTimeout = 100;
exceptionSyslogTarget.MessageCreation.Facility = NLog.Targets.Syslog.Settings.Facility.Local1;
exceptionSyslogTarget.MessageCreation.Rfc = NLog.Targets.Syslog.Settings.RfcNumber.Rfc5424;
exceptionSyslogTarget.MessageCreation.Rfc5424.Hostname = "${machinename}";
exceptionSyslogTarget.MessageCreation.Rfc5424.AppName = "DAEMON.App";
exceptionSyslogTarget.MessageCreation.Rfc5424.ProcId = "${processid}";
exceptionSyslogTarget.MessageCreation.Rfc5424.MsgId = "1";
exceptionSyslogTarget.MessageCreation.Rfc5424.DisableBom = true;
exceptionSyslogTarget.Name = "errors";
exceptionSyslogTarget.Layout = "${message}";
var exceptionRule = new LoggingRule("Errors", LogLevel.Debug, exceptionSyslogTarget);
config.LoggingRules.Add(exceptionRule);
config.AddTarget(exceptionSyslogTarget);

@luigiberrettini

This comment has been minimized.

Show comment
Hide comment
@luigiberrettini

luigiberrettini May 16, 2018

Owner

Hi @karlra ConnectionCheckTimeout, as stated in the README, is the time, in microseconds, to wait for a response when checking the UDP socket connection status (default: 500000; 0 means the only check performed is UdpClient and inner socket != null)

Can you please create a test project replicating the issue?
You can start from something like this:

[Fact]
public void SendMessage()
{
    const string ipAddress = '127.0.0.1';
    const int port = '8192';
    const int connectionCheckTimeout = 500000;

    Stopwatch stopwatch = new Stopwatch();
    using (var udp = new UdpClient(ipAddress, port))
    {
        stopwatch.Start();
        IsConnected(udp?.Client, connectionCheckTimeout);
        Assert.True(stopwatch.ElapsedMilliseconds < connectionCheckTimeout)

        var bytes = new ASCIIEncoding().GetBytes($"my message 1");
        udp.Send(bytes, bytes.Length);
        
        stopwatch.Restart();
        stopwatch.Stop();
        Assert.True(stopwatch.ElapsedMilliseconds < connectionCheckTimeout)
    }
}

private bool IsConnected(Socket socket, int timeout)
{
    if (socket == null)
        return true;

    if (timeout <= 0)
        return true;
    
    var isDisconnected = socket?.Poll(timeout, SelectMode.SelectRead) == true && socket?.Available == 0;
    return !isDisconnected;
}
Owner

luigiberrettini commented May 16, 2018

Hi @karlra ConnectionCheckTimeout, as stated in the README, is the time, in microseconds, to wait for a response when checking the UDP socket connection status (default: 500000; 0 means the only check performed is UdpClient and inner socket != null)

Can you please create a test project replicating the issue?
You can start from something like this:

[Fact]
public void SendMessage()
{
    const string ipAddress = '127.0.0.1';
    const int port = '8192';
    const int connectionCheckTimeout = 500000;

    Stopwatch stopwatch = new Stopwatch();
    using (var udp = new UdpClient(ipAddress, port))
    {
        stopwatch.Start();
        IsConnected(udp?.Client, connectionCheckTimeout);
        Assert.True(stopwatch.ElapsedMilliseconds < connectionCheckTimeout)

        var bytes = new ASCIIEncoding().GetBytes($"my message 1");
        udp.Send(bytes, bytes.Length);
        
        stopwatch.Restart();
        stopwatch.Stop();
        Assert.True(stopwatch.ElapsedMilliseconds < connectionCheckTimeout)
    }
}

private bool IsConnected(Socket socket, int timeout)
{
    if (socket == null)
        return true;

    if (timeout <= 0)
        return true;
    
    var isDisconnected = socket?.Poll(timeout, SelectMode.SelectRead) == true && socket?.Available == 0;
    return !isDisconnected;
}
@karlra

This comment has been minimized.

Show comment
Hide comment
@karlra

karlra May 16, 2018

Hello,

Here's a minimal project demonstrating the issue. I needed to make a real project to show the issue so I couldn't start with your unit test, sorry.

If you take the compiled files in /bin/Release/netcoreapp2.0/publish and upload them to a VM with rsyslog listening on UDP, you will notice that it's sending one message every 500 ms. Uncomment the commented line to immediately fix the issue.

Thanks for your hard work on this project, it works really well apart from this small issue!

SyslogProblem.zip

karlra commented May 16, 2018

Hello,

Here's a minimal project demonstrating the issue. I needed to make a real project to show the issue so I couldn't start with your unit test, sorry.

If you take the compiled files in /bin/Release/netcoreapp2.0/publish and upload them to a VM with rsyslog listening on UDP, you will notice that it's sending one message every 500 ms. Uncomment the commented line to immediately fix the issue.

Thanks for your hard work on this project, it works really well apart from this small issue!

SyslogProblem.zip

Repository owner deleted a comment from karlra Jul 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment