Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add send/recv timeouts for the dispatcher on app sockets. #1356

Merged
merged 3 commits into from
Dec 11, 2017

Conversation

kormat
Copy link
Contributor

@kormat kormat commented Dec 8, 2017

The current dispatcher design will deadlock if an application is failing
to read from its socket and that socket's buffer fills up.
#1278 is the relevant issue.

This patch adds a 2s read/write timeout for app sockets, after which the
dispatcher will log an error and close the socket. This doesn't fully
fix the issue, but it does mean the dispatcher can recover, and there
will be clear logging to ease debugging, while hopefully avoiding any
packet loss during normal operation.

Also:

  • Fix a memory leak in parse_request.

This change is Reviewable

@scrye
Copy link
Contributor

scrye commented Dec 8, 2017

Reviewed 2 of 2 files at r1.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


c/dispatcher/dispatcher.c, line 472 at r1 (raw file):

    timeout.tv_sec = 2;
    timeout.tv_usec = 0;
    errno = 0;

You don't need to set errno, functions that fail will set it anyway.


c/dispatcher/dispatcher.c, line 478 at r1 (raw file):

            return;
    }
    errno = 0;

Same as above.


c/dispatcher/dispatcher.c, line 479 at r1 (raw file):

    }
    errno = 0;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&timeout, sizeof(timeout)) < 0) {

This actually solves a really nasty issue where a truncated packet would cause the dispatcher to hang.


c/dispatcher/dispatcher.c, line 1200 at r1 (raw file):

int deliver_data(int sock, HostAddr *from, uint8_t *buf, int len)
{
    errno = 0;

You don't need to set errno, functions that fail will set it anyway.


c/dispatcher/dispatcher.c, line 1202 at r1 (raw file):

    errno = 0;
    if (send_dp_header(sock, from, len) != 0) {
        zlog_warn(zc, "Failed to send dp header to app on fd %d (errno: %s)", sock, strerror(errno));

(errno: %s) should be rephrased, as it's not printing the error number.


c/dispatcher/dispatcher.c, line 1206 at r1 (raw file):

        return -1;
    }
    errno = 0;

You don't need to set errno, functions that fail will set it anyway.


c/dispatcher/dispatcher.c, line 1209 at r1 (raw file):

    int sent = send_all(sock, buf, len);
    if (sent != len) {
        zlog_warn(zc, "Failed to send all data to app on fd %d, expected:%dB got:%dB (errno: %s)",

(errno: %s) should be rephrased, as it's not printing the error number.


Comments from Reviewable

The current dispatcher design will deadlock if an application is failing
to read from its socket and that socket's buffer fills up.
scionproto#1278 is the relevant issue.

This patch adds a 2s read/write timeout for app sockets, after which the
dispatcher will log an error and close the socket. This doesn't fully
fix the issue, but it does mean the dispatcher can recover, and there
will be clear logging to ease debugging, while hopefully avoiding any
packet loss during normal operation.

Also:
- Fix a memory leak in parse_request.
@kormat
Copy link
Contributor Author

kormat commented Dec 11, 2017

Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


c/dispatcher/dispatcher.c, line 472 at r1 (raw file):

Previously, scrye (Sergiu Costea) wrote…

You don't need to set errno, functions that fail will set it anyway.

It's not quite that simple. See https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152351 for some background. I've ran into the problem too many times where a stale non-zero errno caused a program (or humans, reading the logs) to believe an error had occurred. So as a matter of principle i try to set it to 0 before a call if it's going to be used in the error handling/reporting after the call. That way it is unambiguous where/when the errno was set, and there can't be confusion about stale values.


c/dispatcher/dispatcher.c, line 1200 at r1 (raw file):

Previously, scrye (Sergiu Costea) wrote…

You don't need to set errno, functions that fail will set it anyway.

That's not guaranteed, send_dp_header isn't a syscall. It's possible for it to fail due to a logic error, without any syscall failing, in which case a stale non-zero errno value would give a completely bogus reason in the logs.


c/dispatcher/dispatcher.c, line 1202 at r1 (raw file):

Previously, scrye (Sergiu Costea) wrote…

(errno: %s) should be rephrased, as it's not printing the error number.

Done.


c/dispatcher/dispatcher.c, line 1209 at r1 (raw file):

Previously, scrye (Sergiu Costea) wrote…

(errno: %s) should be rephrased, as it's not printing the error number.

Done.


Comments from Reviewable

@scrye
Copy link
Contributor

scrye commented Dec 11, 2017

:lgtm:


Reviewed 2 of 2 files at r2.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


c/dispatcher/dispatcher.c, line 472 at r1 (raw file):

Previously, kormat (Stephen Shirley) wrote…

It's not quite that simple. See https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152351 for some background. I've ran into the problem too many times where a stale non-zero errno caused a program (or humans, reading the logs) to believe an error had occurred. So as a matter of principle i try to set it to 0 before a call if it's going to be used in the error handling/reporting after the call. That way it is unambiguous where/when the errno was set, and there can't be confusion about stale values.

The link refers to C standard library calls; those are generally more complex, and do fancier stuff with errno.

Nearly all syscall wrappers from the Linux API (like the one below) have the same behavior, returning <0 on errors (or a -1 pointer) and setting errno (as specified in their man pages). In this case setting errno beforehand is not needed. For examples, see coreutils sources: http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/.

I find errno resets a bit confusing, because it hints to me that the program is trying to do something unnatural with it (e.g., using it during decision-making).

In any case, we can leave it as is.


c/dispatcher/dispatcher.c, line 1200 at r1 (raw file):

Previously, kormat (Stephen Shirley) wrote…

That's not guaranteed, send_dp_header isn't a syscall. It's possible for it to fail due to a logic error, without any syscall failing, in which case a stale non-zero errno value would give a completely bogus reason in the logs.

Hmm, then those logic errors cannot be told apart based on the logging message below.

I'm fine with leaving them as is; the dispatcher's error handling is quite flaky in multiple places and spending time on it isn't a priority.


Comments from Reviewable

@kormat kormat merged commit b80071f into scionproto:master Dec 11, 2017
@kormat kormat deleted the disp_timeout branch December 11, 2017 13:57
andviane pushed a commit to andviane/scion that referenced this pull request Feb 6, 2018
…#1356)

The current dispatcher design will deadlock if an application is failing
to read from its socket and that socket's buffer fills up.
scionproto#1278 is the relevant issue.

This patch adds a 2s read/write timeout for app sockets, after which the
dispatcher will log an error and close the socket. This doesn't fully
fix the issue, but it does mean the dispatcher can recover, and there
will be clear logging to ease debugging, while hopefully avoiding any
packet loss during normal operation.

Also:
- Fix a memory leak in parse_request.
- Log assigned port when request was for port 0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants