Retry timeout do not work as expected #197

mbaril · 2018-06-05T21:03:24Z

I think the retry timeout do not work correctly when multiple CONF messages are sent in parallel.

Test procedure:

Step 1: I run the CoAP server example (coap-server)
Step 2: I create a CoAP observer on the resource /time.
Step 3: I kill the CoAP observer process

Test #1: OK
I use COAP_RESOURCE_CHECK_TIME of 120 and a COAP_OBS_MAX_FAIL is equal to 0.

Messages sent by server after step 3:

After 4 retries the CoAP server remove the observer entry as expected.
In this test the delay between each retries are correct.

Test #2: Delay between retries are not good

I use COAP_RESOURCE_CHECK_TIME of 5 sec and a COAP_OBS_MAX_FAIL is equal to 0.

Messages sent by server after step 3:

This time, the delay between retries do not seems to be correct.

MsgID= 29557.
Retry 1 at 2.2 sec (OK)
Retry 2 at 7.3 sec (OK)
Retry 3 at 12.1 sec (Not ok. Should be approx. 15.4 sec)
Retry #4 at 15.4 sec (Not ok.. Should approx. 33 sec)

MsgID= 29558.
Retry 1 at 3.0 sec (OK)
Retry 2 at 7.8 sec (Not ok. Should be approx. 9 sec)
Retry 3 at 9.8 sec (Not ok. Should be approx. 21 sec)
Retry #4 at 12.8 sec sec (Not ok.. Should approx. 45 sec)

mbaril · 2018-06-05T21:07:39Z

Coap server logs
serverLogs.zip

mrdeep1 · 2018-06-06T12:05:23Z

With test 1, there is only one outstanding unsolicited Observe response waiting for an ACK from the client, and so the retry counters are as expected.

With test 2, the server continues to send additional unsolicited Observe responses at the repeat frequency (in your case every 5 secs) until the logic times things completely out and drops the observer. So in your test 2 you have 4 outstanding Observe responses waiting for an ACK before observer is dropped.

The issue here is that coap_insert_node() is biasing the timeouts (making it a delta time between 2 entries) of a queue of outstanding requests, making the re-transmit times shorter as this delta bias not re-adjusted when the head of the queue is taken off and processed (because of the nextpdu->t = 0, and so coap_pop_next(ctx) does not update the delta).

Fixing coap_insert_node() by not making it time deltas fixes your issue, but causes the sendqueue tests to blow up.

diff --git a/src/net.c b/src/net.c
index c7cbfa6..526e833 100644
--- a/src/net.c
+++ b/src/net.c
@@ -201,21 +201,16 @@ coap_insert_node(coap_queue_t **queue, coap_queue_t *node) {
   if (node->t < q->t) {
     node->next = q;
     *queue = node;
-    q->t -= node->t;           /* make q->t relative to node->t */
     return 1;
   }
 
   /* search for right place to insert */
   do {
-    node->t -= q->t;           /* make node-> relative to q->t */
     p = q;
     q = q->next;
   } while (q && q->t <= node->t);
 
   /* insert new item */
-  if (q) {
-    q->t -= node->t;           /* make q->t relative to node->t */
-  }
   node->next = q;
   p->next = node;
   return 1;

Fixing coap_write() to keep the bias and cause it to get added to the first entry to the queue also works, but also the LWIP coap_retransmittimer_execute() function needs to get fixed - I suspect in the same way, but I do not have a LWIP version to test against. The CONTIKI version appears to have no issues here from code inspection.

diff --git a/src/coap_io.c b/src/coap_io.c
index 174d239..265bd0b 100644
--- a/src/coap_io.c
+++ b/src/coap_io.c
@@ -1093,7 +1093,6 @@ coap_write(coap_context_t *ctx,
   nextpdu = coap_peek_next(ctx);
 
   while (nextpdu && now >= ctx->sendqueue_basetime && nextpdu->t <= now - ctx->sendqueue_basetime) {
-    nextpdu->t = 0;
     coap_retransmit(ctx, coap_pop_next(ctx));
     nextpdu = coap_peek_next(ctx);
   }
diff --git a/src/net.c b/src/net.c
index c7cbfa6..c035e34 100644
--- a/src/net.c
+++ b/src/net.c
@@ -2304,7 +2304,6 @@ static void coap_retransmittimer_execute(void *arg) {
       break;
     } else {
       elapsed -= nextinqueue->t;
-      nextinqueue->t = 0;
       coap_retransmit(ctx, coap_pop_next(ctx));
       nextinqueue = coap_peek_next(ctx);
     }

@obgm What was the reasoning for maintaining retry times as deltas between each entry in the coap_queue_t list?
Not having them as deltas I think simplifies the code.
I understand why the times should be relative to sendqueue_basetime.

obgm · 2018-06-06T13:57:56Z

The reasoning was that numbers will be smaller and, more important, when you insert a node into the list, you only need to update the t value of the subsequent node (and set the value of the inserted node). I still think that this is a good idea (but have not looked into the issue discussed here).

mrdeep1 · 2018-06-06T14:09:11Z

The issue is when the node is pop'd off the list - the t value of the next entry does not get adjusted.

It is quite easy to reproduce using standard coap-client and coap-server

coap-server -v9
coap-client  -s 30 coap://127.0.0.1/time

Then abort coap-client after it has received about 2 time updates. Capture output from coap-server and grep for a tid value to see the re-transmit times.

The node queue uses time deltas between each entry in the queue for retransmissions. The head entry does not have a time delta. When the head of the head of the queue is taken off for retransmission the next in the queue needs to have the time delta updated to be an absolute value. However, this does not happen if the current head has had the time offset fore set to 0 and so the retransmit timer fires more frequently than expected. There are 2 ways to fix this - not maintain time deltas between queue entries, or not set the head entries time to 0. Doing the latter means that code that relies on the deltas (I can only find the test suite) continue to work before calling coap_retransmit(). See Issue obgm#197 for further information.

mrdeep1 · 2018-08-01T07:47:57Z

@mbaril Can you please confirm the issue has now been fixed in the latest develop code.

obgm · 2018-11-06T12:35:11Z

Can anyone confirm this to be fixed?

obgm · 2019-09-20T10:39:09Z

Closing as the issue seems to be solved.

mrdeep1 mentioned this issue Jun 23, 2018

Update code to build LwIP and Contiki from the examples directory #208

Merged

mrdeep1 mentioned this issue Jul 10, 2018

coap_io.c,net.c: Fix head of node queue re-transmit timeouts #221

Merged

obgm closed this as completed Sep 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retry timeout do not work as expected #197

Retry timeout do not work as expected #197

mbaril commented Jun 5, 2018

mbaril commented Jun 5, 2018

mrdeep1 commented Jun 6, 2018

obgm commented Jun 6, 2018

mrdeep1 commented Jun 6, 2018

mrdeep1 commented Aug 1, 2018

obgm commented Nov 6, 2018

obgm commented Sep 20, 2019

Retry timeout do not work as expected #197

Retry timeout do not work as expected #197

Comments

mbaril commented Jun 5, 2018

mbaril commented Jun 5, 2018

mrdeep1 commented Jun 6, 2018

obgm commented Jun 6, 2018

mrdeep1 commented Jun 6, 2018

mrdeep1 commented Aug 1, 2018

obgm commented Nov 6, 2018

obgm commented Sep 20, 2019