Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to make client retransmit its message #163

Closed
hongnguyen-tma opened this issue Apr 7, 2018 · 21 comments
Closed

How to make client retransmit its message #163

hongnguyen-tma opened this issue Apr 7, 2018 · 21 comments

Comments

@hongnguyen-tma
Copy link

hongnguyen-tma commented Apr 7, 2018

Hi
Do you know how to make the client retransmit its message?
From the source code, it seems that libcoap will invoke the nack_handler with reason = COAP_NACK_TOO_MANY_RETRIES when max-retransmit is reached and I need to test my client with this case.
https://github.com/obgm/libcoap/blob/develop/src/net.c#L935

I tried by stopping the server before making client to send message, but it still not fall into this case. What I got is just :

Apr 07 13:06:57 WARN coap_network_read: unreachable Apr 07 13:06:57 DEBG *** 127.0.0.1:35438 <-> 127.0.0.1:4646 DTLS: session reset Apr 07 13:06:57 DEBG ** 127.0.0.1:35438 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay

Please share with me if you know any way to make this case happen.

Thanks

@mrdeep1
Copy link
Collaborator

mrdeep1 commented Apr 7, 2018

Hi Hong,
I have just updated the manual page for coap_recovery in PR #139 with the following which should help you here.

TESTING

The libcoap recovery/re-transmit logic will only work for confirmable requests.

To see what is happening (other than by sniffing the network traffic), the
logging level needs to be set to LOG_DEBUG in the client by using
coap_set_log_level(LOG_DEBUG) and coap_dtls_set_log_level(LOG_DEBUG).

The client needs to be sending confirmable requests during the test.

The server can either be stopped, or if packet loss levels are set to 100% by
using coap_debug_set_packet_loss("100%") when receiving the client requests.

Note: If the server end of the connection is returning ICMP unreachable packets
after being turned off, you will get a debug message of the form
"coap_network_read: unreachable", so libcoap will stop doing the retries. If
this is the case, then you need to make use of (on the server)
coap_debug_set_packet_loss("100%") or put in some packet filtering to drop the
packets.

The client should then restart transmitting the requests based on the
ack_timeout, ack_random_factor and max_retransmit values. The client's
nack_handler will get called with COAP_NACK_TOO_MANY_RETRIES when the
confirmable request cannot be successfully transmitted.

@hongnguyen-tma
Copy link
Author

hongnguyen-tma commented Apr 9, 2018

Hi mrdeep1
Thank you for replying me. I set coap_debug_set_packet_loss("100%") in server side as you said
and still waiting for max-retransmission being reached. The interval for each retransmission is about 8 minutes. I'm using all default setting values (ack-timeout, ack-random-factor, max-restransmit....) According to rfc7252#section-4.8, page29, the total of retransmission should be only 93 seconds, right?

Apr 09 11:23:22 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS tid=29546: retransmission #1
Apr 09 11:23:22 DEBG SSL_connect:before SSL initialization
Apr 09 11:23:22 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:23:22 DEBG SSL_connect:SSLv3/TLS write client hello
Apr 09 11:23:22 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS tid=29546: removed
Apr 09 11:23:22 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
Apr 09 11:23:24 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:23:24 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:23:27 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:23:27 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:23:32 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:23:32 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:23:40 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:23:40 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:23:57 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:23:57 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:24:30 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:24:30 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:25:31 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:25:31 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:26:32 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:26:32 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:27:32 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:27:32 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:28:33 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:28:33 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:29:34 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:29:34 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:30:35 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:30:35 DEBG * 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 09 11:31:35 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 09 11:31:35 DEBG *** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS: session disconnected
Apr 09 11:31:35 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
Apr 09 11:31:35 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (2156ms)
Apr 09 11:31:38 DEBG IN-coap_retransmit
Apr 09 11:31:38 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS tid=29546: retransmission #2

Thanks
Hong

@mrdeep1
Copy link
Collaborator

mrdeep1 commented Apr 9, 2018

Hi Hong,

What actually is happening here is the DTLS session is trying many times before giving up. This is not the retransmit of a CoAP message.

That said, I have just raised PR #165 so that the DTLS startup is also subject to the MAX-TRANSMIT value.

@hongnguyen-tma
Copy link
Author

hongnguyen-tma commented Apr 9, 2018

Hi mrdeep1

This is not the retransmit of a CoAP message.

Can you explain a little bit more? I don't know the result I got is correct or not.

The above log is when I made my client sent out an Empty confirmable message (type = 0, code=0). And I saw in the log, that the total retransmits is 4 times as the default value of libcoap. It finally gave up retrying with this message:

"Apr 09 11:56:28 DEBG ** 127.0.0.1:49048 <-> 127.0.0.1:4646 DTLS tid=29546: give up after 4 attempts"
It seems works correctly, except that too long retry-time among 2 retransmissions

@mrdeep1
Copy link
Collaborator

mrdeep1 commented Apr 9, 2018

The CoAP message retry rules are
1st retransmit after 1 * ack_timeout * ack_random factor (3 seconds)
2nd retransmit after 2 * ack_timeout * ack_random factor (6 seconds)
3rd retransmit after 3 * ack_timeout * ack_random factor (12 seconds)
4th retransmit after 4 * ack_timeout * ack_random factor (24 seconds)
5th retransmit after 5 * ack_timeout * ack_random factor (48 seconds)

As max_transmit is 4, then the 5th retransmit does not get sent, but at that point COAP_NACK_TOO_MANY_RETRIES gets raised in your nack_handler (if defined). Note that the sum of the seconds is 93.

Your log entries were showing many DTLS session setup retries before your CoAP Ping was then retried which as you say eventually gave up after 4 retries. The 4 retries did fail after 93 seconds of CoAP message retries, but there was the additional long time consumed by DTLS trying to set up a session (which my code change limits to the default of 4 retransmits).

@hongnguyen-tma
Copy link
Author

I got it. Thank mrdeep1.

@hongnguyen-tma
Copy link
Author

hongnguyen-tma commented Apr 12, 2018

Hi
Is it possible to send many confirmable requests during ping retry? I made coap_debug_set_packet_loss("100%") in server side and get client send another confirmable request (type = 0, code = 1 , which is different from the ping request) during ping retry, the program went crashed as following :

Apr 12 16:02:25 DEBG ** 127.0.0.1:52034 <-> 127.0.0.1:4646 DTLS tid=59092: retransmission #1
Apr 12 16:02:25 DEBG SSL_connect:before SSL initialization
Apr 12 16:02:25 DEBG * 127.0.0.1:52034 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 12 16:02:25 DEBG SSL_connect:SSLv3/TLS write client hello
Apr 12 16:02:25 DEBG ** 127.0.0.1:52034 <-> 127.0.0.1:4646 DTLS tid=59092: removed
dots_client: src/coap_session.c:250: coap_session_delay_pdu: Assertion `removed == node' failed.

Thanks
Hong

@hongnguyen-tma
Copy link
Author

hongnguyen-tma commented Apr 12, 2018

It also crashed if I stopped the ping function, and made my client sends one confirmable message only (type 0 , code 1). Do you know what that problem is?

Apr 12 16:13:11 DEBG SSL_connect:before SSL initialization
Apr 12 16:13:11 DEBG * 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 12 16:13:11 DEBG SSL_connect:SSLv3/TLS write client hello
Apr 12 16:13:11 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
Apr 12 16:13:13 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 12 16:13:13 DEBG * 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 12 16:13:13 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
Apr 12 16:13:15 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
Apr 12 16:13:16 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 12 16:13:16 DEBG *** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS: session disconnected
Apr 12 16:13:16 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
Apr 12 16:13:16 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (2156ms)
Apr 12 16:13:16 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
Apr 12 16:13:16 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (500ms)
Apr 12 16:13:16 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
Apr 12 16:13:16 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (282ms)
Apr 12 16:13:19 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546: retransmission #1
Apr 12 16:13:19 DEBG SSL_connect:before SSL initialization
Apr 12 16:13:19 DEBG * 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 12 16:13:19 DEBG SSL_connect:SSLv3/TLS write client hello
Apr 12 16:13:19 DEBG ** 127.0.0.1:51350 <-> 127.0.0.1:4646 DTLS tid=29546: removed
dots_client: src/coap_session.c:250: coap_session_delay_pdu: Assertion `removed == node' failed.

@mrdeep1
Copy link
Collaborator

mrdeep1 commented Apr 12, 2018

Hi Hong,

Which version of code are you working with?

Is this a PKI or PSK connection?

I cannot reproduce what you are seeing at the moment. A full -v9 log may help here.

Have you tried debugging your executable to see what 'removed' and 'node' variables contain?

@hongnguyen-tma
Copy link
Author

Hi mrdeep1
I am using version which is closed to f3c1542, PKI connection. At this version it seems that the highest log level is v7. Below is the log I re-run with debug info for 'removed' and 'node'. Please share with me if you can figure out anything.

Apr 13 09:46:27 DEBG SSL_connect:before SSL initialization
Apr 13 09:46:27 DEBG * 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 13 09:46:27 DEBG SSL_connect:SSLv3/TLS write client hello
Apr 13 09:46:27 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
Apr 13 09:46:29 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 13 09:46:29 DEBG * 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 13 09:46:29 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
Apr 13 09:46:31 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
Apr 13 09:46:32 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 13 09:46:32 DEBG *** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS: session disconnected
Apr 13 09:46:32 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
Apr 13 09:46:32 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (2156ms)
Apr 13 09:46:32 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
Apr 13 09:46:32 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (500ms)
Apr 13 09:46:32 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
Apr 13 09:46:32 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (282ms)
Apr 13 09:46:35 DEBG IN-coap_retransmit
Apr 13 09:46:35 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546: retransmission #1
Apr 13 09:46:35 DEBG SSL_connect:before SSL initialization
Apr 13 09:46:35 DEBG * 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 13 09:46:35 DEBG SSL_connect:SSLv3/TLS write client hello
Apr 13 09:46:35 DEBG ** 127.0.0.1:51289 <-> 127.0.0.1:4646 DTLS tid=29546: removed
Apr 13 09:46:35 DEBG ******node.id = 29546, node.timeout=2156, node.retransmi_cnt=1, note.t = 6530
Apr 13 09:46:35 DEBG ******removed.id = 29546, removed.timeout=2656, removed.retransmi_cnt=0, removed.t = 500
dots_client: src/coap_session.c:252: coap_session_delay_pdu: Assertion `removed == node' failed.

@mrdeep1
Copy link
Collaborator

mrdeep1 commented Apr 13, 2018

Hi Hong,

Using version f3c1542, I have modified examples coap-client to be able to do PKI. I am running coap-server as 'coap-server -v9 -l 100%' and coap-client as 'coap-client -v7 -c server-cert.pem coaps://127.0.0.1' (-c option is new) and get as output

Apr 13 09:53:44 DEBG *** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: new outgoing session
Apr 13 09:53:44 DEBG SSL_connect:before SSL initialization
Apr 13 09:53:44 DEBG * 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: sent 233 bytes
Apr 13 09:53:44 DEBG SSL_connect:SSLv3/TLS write client hello
Apr 13 09:53:44 DEBG sending CoAP request:
v:1 t:CON c:GET i:736a {} [ ]
Apr 13 09:53:44 DEBG ** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS tid=29546: delayed
Apr 13 09:53:44 DEBG timeout is set to 90 seconds
Apr 13 09:53:46 DEBG ** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: DTLS retransmit timeout
Apr 13 09:53:46 DEBG * 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: sent 233 bytes
Apr 13 09:53:49 DEBG ** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: DTLS retransmit timeout
Apr 13 09:53:49 DEBG * 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: sent 233 bytes
Apr 13 09:53:54 DEBG ** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: DTLS retransmit timeout
Apr 13 09:53:54 DEBG * 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: sent 233 bytes
Apr 13 09:54:03 DEBG ** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: DTLS retransmit timeout
Apr 13 09:54:03 DEBG * 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: sent 233 bytes
Apr 13 09:54:20 DEBG ** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: DTLS retransmit timeout
Apr 13 09:54:20 DEBG * 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: sent 233 bytes
Apr 13 09:54:53 DEBG ** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: DTLS retransmit timeout
Apr 13 09:54:53 DEBG * 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: sent 233 bytes
Apr 13 09:55:14 INFO timeout
Apr 13 09:55:14 DEBG *** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: session closed

There are some notable differences.
Line 1 not there in your output
Apr 13 09:53:44 DEBG *** 127.0.0.1:57598 <-> 127.0.0.1:5684 DTLS: new outgoing session
This tells me that in the client to create a new outgoing sesson you are not using coap_new_client_session_pki(), or there is a failure in coap_new_client_session_pki() - using coap_new_client_session() or coap_make_session() is not valid if you want PKI.

@mrdeep1
Copy link
Collaborator

mrdeep1 commented Apr 13, 2018

Hi Hong,

It is may be worth making the following change to see if the error message is getting reported

diff --git a/src/coap_openssl.c b/src/coap_openssl.c
index dad0db1..0c0ac0c 100644
--- a/src/coap_openssl.c
+++ b/src/coap_openssl.c
@@ -725,8 +725,10 @@ void *coap_dtls_new_client_session(coap_session_t *session) {
coap_openssl_context_t *context = ((coap_openssl_context_t *)session->context->dtls_context);
coap_dtls_context_t *dtls = &context->dtls;

- if (!context->psk_pki_enabled)
+ if (!context->psk_pki_enabled) {
+ coap_log(LOG_ERR, "coap_new_client_session: PKI or PSK have not been defined\n");
goto error;
+ }
ssl = SSL_new(dtls->ctx);
if (!ssl)
goto error;

@hongnguyen-tma
Copy link
Author

Hi mrdeep1
Thank you for replying me.
The log message " DTLS: new outgoing session" was output when I started my client. Sorry for not copying full log. Below is the log when I run again :

  • When I started my client :
    Apr 13 17:08:46 DEBG *** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: new outgoing session
    Apr 13 17:08:46 DEBG SSL_connect:before SSL initialization
    Apr 13 17:08:46 DEBG * 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
    Apr 13 17:08:46 DEBG SSL_connect:SSLv3/TLS write client hello
    Apr 13 17:08:48 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
    Apr 13 17:08:48 DEBG * 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
    Apr 13 17:08:51 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
    Apr 13 17:08:51 DEBG *** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: session disconnected

  • When I made my client send a request (type = 0, code = 1)
    Apr 13 17:09:57 DEBG SSL_connect:before SSL initialization
    Apr 13 17:09:57 DEBG * 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
    Apr 13 17:09:57 DEBG SSL_connect:SSLv3/TLS write client hello
    Apr 13 17:09:57 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
    Apr 13 17:09:59 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
    Apr 13 17:09:59 DEBG * 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
    Apr 13 17:09:59 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
    Apr 13 17:10:01 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
    Apr 13 17:10:02 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
    Apr 13 17:10:02 DEBG *** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: session disconnected
    Apr 13 17:10:02 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
    Apr 13 17:10:02 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (2156ms)
    Apr 13 17:10:02 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
    Apr 13 17:10:02 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (500ms)
    Apr 13 17:10:02 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
    Apr 13 17:10:02 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (282ms)
    Apr 13 17:10:05 DEBG IN-coap_retransmit
    Apr 13 17:10:05 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546: retransmission README: translate oder => or #1
    Apr 13 17:10:05 DEBG SSL_connect:before SSL initialization
    Apr 13 17:10:05 DEBG * 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
    Apr 13 17:10:05 DEBG SSL_connect:SSLv3/TLS write client hello
    Apr 13 17:10:05 DEBG ** 127.0.0.1:48906 <-> 127.0.0.1:4646 DTLS tid=29546: removed
    Apr 13 17:10:05 DEBG ******node.id = 29546, node.timeout=2156, node.retransmi_cnt=1, note.t = 6530
    Apr 13 17:10:05 DEBG ******removed.id = 29546, removed.timeout=2656, removed.retransmi_cnt=0, removed.t = 500
    dots_client: src/coap_session.c:252: coap_session_delay_pdu: Assertion `removed == node' failed.

Note :
I 'm not using coap_new_client_session_pki(). Instead of that, I use the followings, which seems to be similar.
coap_new_context()
coap_dtls_context_set_pki()
coap_new_client_session()

@mrdeep1
Copy link
Collaborator

mrdeep1 commented Apr 13, 2018

Hi Hong,
coap_dtls_context_set_pki() is shortly not to be exposed publically any more (as are a lot of coap_dtls_ functions) in Pull Request #151 , and so you need to move over to using coap_new_client_session_pki(). This is not the issue here.
You appear to have modified some library code - e.g. timeout/retry count in coap_dtls_handle_timeout(), and it appears that after second "session disconnected", there are 3 versions of the PDU getting re-transmitted, and then when one of the 3 PDUs gets deleted, the wrong one is chosen.

It looks like you are calling coap_send() for the same PDU 3 times - correct?
(sent at 17:09:57, 17:09:59 and 17:10:01)

@hongnguyen-tma
Copy link
Author

Thank mrdeep1 for great help. It was my mistake for setting retry for message in my client. I removed retry and got it work.

Hong

@hongnguyen-tma
Copy link
Author

hongnguyen-tma commented Apr 14, 2018

Hi mrdeep1

The retransmission works fine for 1 ~ 2 different messages. When I made my client send out 3 different messages, it crashed again. There seems to be a problem with searching for the target node inside function "net.c/ coap_remove_from_queue()". I added some debug code to have more detail.

Apr 14 15:10:07 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29546: delayed
Apr 14 15:10:07 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29547: delayed
Apr 14 15:10:07 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29548: delayed
Apr 14 15:10:07 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 14 15:10:07 DEBG * 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 14 15:10:10 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS: DTLS retransmit timeout
Apr 14 15:10:10 DEBG *** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS: session disconnected
Apr 14 15:10:10 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29546: not transmitted after delay
Apr 14 15:10:10 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29546 added to retransmit queue (2313ms)
Apr 14 15:10:10 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29547: not transmitted after delay
Apr 14 15:10:10 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29547 added to retransmit queue (687ms)
Apr 14 15:10:10 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29548: not transmitted after delay
Apr 14 15:10:10 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29548 added to retransmit queue (2281ms)
Apr 14 15:10:13 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29548: retransmission #1
Apr 14 15:10:13 DEBG SSL_connect:before SSL initialization
Apr 14 15:10:13 DEBG * 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS: sent 233 bytes
Apr 14 15:10:13 DEBG SSL_connect:SSLv3/TLS write client hello
Apr 14 15:10:13 DEBG queue content:
Apr 14 15:10:13 DEBG ******ep.id = 29546, ep.timeout=2313, ep.retransmi_cnt=0, ep.t = 32
Apr 14 15:10:13 DEBG ******ep.id = 29547, ep.timeout=3000, ep.retransmi_cnt=0, ep.t = 687
Apr 14 15:10:13 DEBG ******ep.id = 29548, ep.timeout=2281, ep.retransmi_cnt=1, ep.t = 6843
Apr 14 15:10:13 DEBG IN-coap_remove_from_queue() : target-id = 29548
Apr 14 15:10:13 DEBG After do-while : stopped at q->id = 29547
Apr 14 15:10:13 DEBG ** 127.0.0.1:38693 <-> 127.0.0.1:4646 DTLS tid=29548: removed
Apr 14 15:10:13 DEBG OUT-coap_remove_from_queue() with removed node->id=29547
Apr 14 15:10:13 DEBG ******node.id = 29548, node.timeout=2281, node.retransmi_cnt=1, node.t = 7530
Apr 14 15:10:13 DEBG ******removed.id = 29547, removed.timeout=3000, removed.retransmi_cnt=0, removed.t = 687
dots_client: src/coap_session.c:253: coap_session_delay_pdu: Assertion `removed == node' failed.

The condition expression for do-while() looks strange.
do { p = q; q = q->next; } while (q && session != q->session && id != q->id);

That allows the loop to stop at any node with the same session value but different id. I try changing it as below at got it worked

do { p = q; q = q->next; } while (q && (session != q->session || id != q->id));

How you do think?

@mrdeep1
Copy link
Collaborator

mrdeep1 commented Apr 16, 2018

Hi Hong,

Whilst not explicitly stated that multiple confirmable requests can be sent using the same session in RFC7272, it does say
"4.2 Messages Transmitted Reliably" Retransmission is controlled by two things that a CoAP endpoint MUST keep track of for each Confirmable message it sends while waiting for an acknowledgement (or reset): a timeout and a retransmission counter.

and RFC8223 (for TCP) it is more explicit
"3.3 Message transmission" A CoAP client can send multiple requests without waiting for a response, and the CoAP server can return responses in any order.

So as I read this, multiple confirmable requests can be sent using a single session before acknowledgements are seen. So, your code change looks good.

@hongnguyen-tma
Copy link
Author

Thank mrdeep1 for replying me.

@obgm
If possible, please consider to reflect this change in your next release.

@obgm
Copy link
Owner

obgm commented Apr 18, 2018

Will so, thank you @hongnguyen-tma and @mrdeep1

obgm added a commit that referenced this issue Apr 18, 2018
Issue #163 [1] points out that coap_remove_from_queue() may release
the wrong PDU from the retransmit queue due to a false hit in the
do/while loop. This change was proposed by hongnguyen-tma to fix
this issue.

[1] #163
@mrdeep1
Copy link
Collaborator

mrdeep1 commented Nov 28, 2018

@hongnguyen-tma Can this now be closed from your perspective?

@hongnguyen-tma
Copy link
Author

@mrdeep1
I have no problem with this now so let's close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants