Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

websocket: tcpconn_get() error gets turned into E_OUT_OF_MEM which becomes "500 No Error" #47

Closed
wdoekes opened this issue Jan 19, 2015 · 3 comments

Comments

@wdoekes
Copy link
Contributor

wdoekes commented Jan 19, 2015

My websocket TLS server is full of these kinds of messages:

Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19701]: NOTICE: <script>:
  http:217.120.x.x:55386: WS connection closed
...
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: WARNING: <core>
  [msg_translator.c:2506]: via_builder(): TCP/TLS connection (id: 0) for WebSocket could not be found
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: <core>
  [msg_translator.c:1722]: build_req_buf_from_sip_req(): could not create Via header
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: tm
  [t_fwd.c:527]: prepare_new_uac(): could not build request
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: tm
  [t_fwd.c:1773]: t_forward_nonack(): ERROR: t_forward_nonack: failure to add branches
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: sl
  [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: No error (2/SL)
(repeat these last errors for a bunch of attempted NOTIFY forwards)

The route block does basically something like this:

    # add_contact_alias(); # only for requests from the outside
    loose_route();
    if (!t_relay()) {
            sl_reply_error();
    }

The problem arises here:

}else if (send_info->proto==PROTO_WS){
...
    con = tcpconn_get(send_info->id, &ip, port, from, 0)
...
    if (con == NULL) {
        LM_WARN("TCP/TLS connection (id: %d) for WebSocket could not be found\n", send_info->id);

The NULL failure status gets returned up to prepare_new_uac in t_fwd.c:

    shbuf=build_req_buf_from_sip_req( i_req, &len, dst, BUILD_IN_SHM);
    if (!shbuf) {
            LM_ERR("could not build request\n");
            ret=E_OUT_OF_MEM;
            goto error01;
    }

At this point, ser_error will become E_OUT_OF_MEM while it should be something like E_SEND.

And E_OUT_OF_MEM gets translated to 500 No Error because we're not running in DEBUG mode.

What causes the connection to drop in the first place, you ask?

18:10:18.690738 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [S], seq 1323983240, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
18:10:18.690863 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [S.], seq 4077761781, ack 1323983241, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 3], length 0
18:10:18.710846 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 1, win 256, length 0
18:10:18.808751 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [P.], seq 1:246, ack 1, win 256, length 245
...
18:10:19.233415 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [.], ack 31348, win 5126, length 0
18:10:26.489764 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [P.], seq 31348:32473, ack 34578, win 255, length 1125
...
18:10:26.501409 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [P.], seq 42255:42916, ack 46010, win 5046, length 661
18:10:26.527755 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 36993, win 252, length 0
18:10:26.527860 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [.], seq 42916:47296, ack 46010, win 5278, length 4380
18:10:26.527888 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [FP.], seq 47296:48663, ack 46010, win 5278, length 1367
18:10:26.529179 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 40501, win 254, length 0
18:10:26.529200 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 42916, win 251, length 0
18:10:26.547276 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 48664, win 251, length 0
18:10:26.549712 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [F.], seq 46010, ack 48664, win 251, length 0
18:10:26.549750 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [.], ack 46011, win 5278, length 0

Where you see that the FIN is initiated by 195.35.x.x which is the Kamailio websocket server.

The cause (probably) is the WS client closing the connection. In this case after re-subscribing with Expires:0. The presence server attempts to reply with a bunch of NOTIFYs with Subscription-State: terminated;reason=timeout but they bounce on the broken connection. If Kamailio would return a nice "477 Unfortunately error on sending to next hop occurred" it'd be prettier.

Getting less "error" messages (a total of 6 per expired/unsubscribed subscription) after this error --which is apparently very common -- would be beneficial too.

As for fixing:

  • We could change the via_builder to set ser_error (and check that in build_req_buf_from_sip_req), or
  • add error-code-out-parameters to all calls from build_req_buf_from_sip_req and down.
    I'm not sure if either is the best way.

As for the excessive error reporting, would looking at ser_error before printing (another) error be an acceptable fix?

Cheers,
Walter Doekes
OSSO B.V.

@miconda
Copy link
Member

miconda commented Jan 23, 2015

What version are you using? The lines reported in logs are not matching latest 4.2 or master branches.

Using ser_error should work to differentiate on what happened inside via_builder() is ok. You can make a patch for that and if works for you, submit it here.

Not sure I get how you would protect with ser_error against excessive logging, but perhaps when I see the code it will be more clear. Make this a second patch to be reviewed separately, allowing the first to be merged without dependency on this one.

@wdoekes
Copy link
Contributor Author

wdoekes commented Jan 23, 2015

That's a 4.1.6 (patched with #26), but I believe I checked master when looking into the source why this happens.

Thanks for the feedback. Now I'll "just" have to reliably reproduce this first :)

@miconda
Copy link
Member

miconda commented Apr 1, 2015

Closing this one not having a follow up for two months. Open a new one whenever you have new details to provide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants