Async GET for Observe resource causes core dump #306

Closed
leenowell opened this issue Mar 3, 2019 · 171 comments

Comments

@leenowell

leenowell commented Mar 3, 2019

Hi All,

I modified the example client and server "Hello World" example to make the async GET resource "Observable". The initial GET works fine, but after setting the resource as dirty and calling coap_check_notify() it core dumps. The reason appears to be that the handler receives a NULL request pointer, and the call to coap_register_async calls coap_transaction_id, which ultimately dereferences the NULL pointer.

I have tried checking for a NULL request and artificially creating a request PDU to pass to the coap_register_async call. This prevents the core dump, but the client then rejects it (I assume because the ID is unknown to it). On the server I get the message "ALRT got RST for message X", with no errors on the client. I have tried the following, and none of them work:

  1. storing the ID of the original request (i.e. request->hdr->id) and setting the ID of temp request PDU
  2. setting the ID of the temp request PDU to 0
  3. creating a new ID for the temp request PDU
    Using this technique, the client receives the initial response then 12 updates and nothing else received.

The async handler is (sorry not sure why the code won't render correctly)

`static void
async_handler(coap_context_t *ctx, struct coap_resource_t *resource,
              const coap_endpoint_t *local_interface, coap_address_t *peer,
              coap_pdu_t *request, str *token, coap_pdu_t *response)
{
    if (request == NULL)
    {
        ESP_LOGI(TAG, "Request is NULL so creating one ID[%d]", nRequestID);

        request = coap_new_pdu();
        if (request)
        {
            uint8_t get_method = 1;
            request->hdr->type = COAP_MESSAGE_CON;
            request->hdr->id   = nRequestID; //coap_new_message_id(ctx);
            request->hdr->code = get_method;
            unsigned char buf[3];
            coap_add_option(request, COAP_OPTION_OBSERVE, coap_encode_var_bytes(buf, COAP_OBSERVE_ESTABLISH), buf);

            //coap_add_option(request, COAP_OPTION_URI_PATH, uri.path.length, uri.path.s);
        }
        else
            return;
    }
    else
        nRequestID = request->hdr->id;

    async = coap_register_async(ctx, peer, request, COAP_ASYNC_SEPARATE | COAP_ASYNC_CONFIRM, (void*)"no data");
}
`

Is this a bug or am I doing something wrong?

thanks

Lee.

@obgm
Owner

obgm commented Mar 3, 2019

Handlers for observable resources may be called with the request parameter set to NULL. This tells a handler that there is no actual request but the handler has been called because the observed resource has been marked dirty. As a consequence, when you want to register a request for a separate response, you will need to make a copy of the original request and pass it to coap_register_async(). But as notifications are separate responses per definition, you will never have any reason to do that, anyway.
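
A minimal sketch of the NULL check described above. The types and names here (`mock_pdu_t`, `handle_get`, `last_request_id`) are hypothetical stand-ins and not libcoap's API; the only point is that the handler has to test `request` before reading anything from it, because a call triggered by a dirty resource passes NULL.

```c
#include <stddef.h>

/* Hypothetical stand-in for a CoAP PDU; this is not libcoap's coap_pdu_t. */
typedef struct {
    unsigned short id;
} mock_pdu_t;

typedef enum { CALL_REQUEST, CALL_NOTIFY } call_kind_t;

/* Message ID of the last real request seen (illustrative only). */
unsigned short last_request_id;

/* Decides why the handler was invoked without ever dereferencing NULL. */
call_kind_t handle_get(const mock_pdu_t *request)
{
    if (request == NULL) {
        /* Resource was marked dirty: there is no request to inspect. */
        return CALL_NOTIFY;
    }
    last_request_id = request->id;
    return CALL_REQUEST;
}
```

In a real handler the `CALL_NOTIFY` branch would be where the notification payload is prepared, and the `CALL_REQUEST` branch where the initial GET is answered.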

@obgm obgm added the question label Mar 3, 2019
@leenowell
Author

Thanks @obgm for getting back to me. I understand your comments about the request being NULL and it makes sense to me. Unfortunately, I didn't quite follow the second part of your answer - sorry. Are you saying that on the initial get of an observed resource I need to store the request such that then if the handler is called with a NULL (i.e. the resource has been updated) I can add the saved request to the call? If I am reading the last sentence correctly, do you mean that I shouldn't call coap_register_async() for the notification anyway? If so, what should I do? I guess maybe the bit I am missing is without the request, how does it know where to send the response to. Sorry if I am being dozy!

Do you have an example of this scenario by any chance?

thanks again

Lee.

@mrdeep1
Collaborator

mrdeep1 commented Mar 4, 2019

The async code is primarily to test out the asynchronous responses inherent within the CoAP protocol.

It is unclear as to what you are expecting to get sent back from the coap-server with an observe triggered response. When the async process completes (delayed by the defined amount of time), it simply sends back a "done" message. For example for coap-client -v9 coap://127.0.0.1/async?3

Mar 04 12:40:31 DEBG *  127.0.0.1:42090 <-> 127.0.0.1:5683 UDP : sent 12 bytes
v:1 t:CON c:GET i:736a {} [ Uri-Path:async, Uri-Query:3 ]
Mar 04 12:40:31 DEBG *  127.0.0.1:42090 <-> 127.0.0.1:5683 UDP : received 4 bytes
v:1 t:ACK c:0.00 i:736a {} [ ]
Mar 04 12:40:34 DEBG *  127.0.0.1:42090 <-> 127.0.0.1:5683 UDP : received 9 bytes
v:1 t:CON c:2.05 i:736a {} [ ] :: 'done'

With the current hnd_get_async() function in coap-server, if it is called by an observe trigger (which happens once per second with the current code, assuming the resource for hnd_get_async() is set as observable), then, as there is no request from the client per se, the request variable will be NULL. However, response will have been set up, and you would be expected to populate it at a minimum with the token and a response code before returning from the function. Data may be useful to add in, but this data needs to be defined by you based on the current observe-triggered response. For an observable call to hnd_get_async() you should not set up another coap_register_async(), as this async has already been registered.

Then at some point in the future (based on the delay count), the "done" message will get sent.
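
For reading the trace above, here is a small sketch that decodes the fixed 4-byte CoAP header into the fields the log prints (`v`, `t`, `c`, `i`). It assumes the standard header layout from RFC 7252; the struct and function names are made up for this illustration and are not part of libcoap.

```c
#include <stdint.h>

/* Fields of the fixed 4-byte CoAP header (RFC 7252, section 3). */
typedef struct {
    uint8_t  version;      /* 2 bits */
    uint8_t  type;         /* 2 bits: 0=CON, 1=NON, 2=ACK, 3=RST */
    uint8_t  token_length; /* 4 bits */
    uint8_t  code_class;   /* upper 3 bits of the code byte, e.g. 2 in 2.05 */
    uint8_t  code_detail;  /* lower 5 bits of the code byte, e.g. 5 in 2.05 */
    uint16_t message_id;   /* big-endian on the wire */
} header_fields_t;

/* Splits the first four bytes of a CoAP message into its header fields. */
header_fields_t decode_coap_header(const uint8_t bytes[4])
{
    header_fields_t h;
    h.version      = (uint8_t)(bytes[0] >> 6);
    h.type         = (uint8_t)((bytes[0] >> 4) & 0x03);
    h.token_length = (uint8_t)(bytes[0] & 0x0f);
    h.code_class   = (uint8_t)(bytes[1] >> 5);
    h.code_detail  = (uint8_t)(bytes[1] & 0x1f);
    h.message_id   = (uint16_t)((bytes[2] << 8) | bytes[3]);
    return h;
}
```

Under that layout, the 4-byte empty ACK in the trace (`v:1 t:ACK c:0.00 i:736a`, empty token) corresponds to the bytes `60 00 73 6a`. These byte values are reconstructed from the printed fields, not captured from the wire.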

@leenowell
Author

leenowell commented Mar 4, 2019

Thanks for the explanation. Between the two responses I think I see the gap in my understanding, so thanks both for that. I had incorrectly assumed I needed the request to determine the token and where to send the response. Also, I had assumed that once you started with async, all responses had to go the same way. I have now updated my GET handler to do the following, which seems to work (although for some reason the first 4 calls get a RST on the server). The first call does the async response, and subsequent ones are updates sent when the resource is marked dirty.

static void
async_handler(coap_context_t *ctx, struct coap_resource_t *resource,
              const coap_endpoint_t *local_interface, coap_address_t *peer,
              coap_pdu_t *request, str *token, coap_pdu_t *response)
{
    if (request == NULL)
    {
        /* Called because the resource was marked dirty, not by a client request. */
        unsigned char buf[3];
        char response_data[50];
        snprintf(response_data, sizeof(response_data), "Hello Obs [%d]", nNumbUpdates);

        response = coap_pdu_init(COAP_MESSAGE_CON, COAP_RESPONSE_CODE(205), 0, COAP_MAX_PDU_SIZE);
        if (response == NULL)
            return;
        response->hdr->id = coap_new_message_id(ctx);
        /* token is a str *, so sizeof(token) is the pointer size, not the token length */
        coap_add_token(response, token->length, token->s);
        coap_add_option(response, COAP_OPTION_OBSERVE, coap_encode_var_bytes(buf, COAP_OBSERVE_ESTABLISH), buf); // LEE ADDED
        coap_add_option(response, COAP_OPTION_CONTENT_TYPE, coap_encode_var_bytes(buf, COAP_MEDIATYPE_TEXT_PLAIN), buf);

        ESP_LOGE(TAG, "Responding with [%s] length [%d]", response_data, (int)strlen(response_data));
        coap_add_data(response, strlen(response_data)+1, (unsigned char *)response_data);

        if (coap_send(ctx, local_interface, peer, response) == COAP_INVALID_TID) {
            /* send failed; nothing else is done here */
        }
    }
    else
        async = coap_register_async(ctx, peer, request, COAP_ASYNC_SEPARATE | COAP_ASYNC_CONFIRM, (void*)"no data");
}

My overall scenario is that I have a number of devices which are sending updates to this device using a different mechanism. What I would like to happen is that this node then sends these updates automatically to the observing clients. My logic was to send the initial GET and return the async ack from the server. Then as the new messages come in, I flag the resource as dirty and then it sends the updates to the observing clients.

Having said this, having now got the hard coded message sent back OK, I am now unclear how I would get the data into the handler to then add to the response message. I could use a queue or something but unclear when to take the message off the queue. So have a few questions

  1. Is there a better way of achieving this?
  2. If I have several observers, will the handler be called once and then send the same message to each observer, or will it be called once per observer? If the latter, how will I know when to remove the message from the queue?

I have seen some detailed documentation on the web describing the protocol and the API guide for libcoap but was wondering whether there was any other broader documentation containing the sort of information you both have provided above.

Thanks again for your help

Lee.

@mrdeep1
Collaborator

mrdeep1 commented Mar 4, 2019

I guess that I am confused by the parameters you have defined for async_handler() - it looks like you are not using the latest master / develop / 4.2.0 code.

Any callback handler for a resource request will always have the response PDU already initialized (whether triggered by an actual GET (by handle_request()) or Observe trigger (by coap_notify_observers())) and when the callback returns, the response PDU is either forwarded on or dropped (primarily on whether response->code is set or not for handle_request()).

So, in your case, you should NOT be creating a new PDU and sending it - update response PDU as appropriate.

I do not think using async is appropriate here; it is primarily set up for testing async delayed responses. I think that you should be modelling things on hnd_get() in examples/coap-server.c, which returns the updated value to all of the observing clients. man coap_observe(3) may help you here as well.
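
To illustrate "update the response PDU rather than create a new one", here is a rough sketch using a stand-in response type. `mock_response_t` and `fill_observe_response` are hypothetical names, not libcoap's; the `RESPONSE_CODE` macro only mirrors the usual CoAP encoding of a code such as 2.05 (class in the top 3 bits, detail in the low 5). The handler fills in what it was given and returns, leaving the sending to the caller.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* CoAP-style code encoding: class in the top 3 bits, detail in the low 5. */
#define RESPONSE_CODE(n) ((((n) / 100) << 5) | ((n) % 100))

/* Hypothetical stand-in for the response PDU that the caller passes in. */
typedef struct {
    unsigned char code;        /* 0 is treated here as "not set" */
    char          payload[64];
    size_t        payload_len;
} mock_response_t;

/* Fills in the caller-provided response; allocates nothing and sends nothing. */
void fill_observe_response(mock_response_t *response, int update_count)
{
    int n = snprintf(response->payload, sizeof(response->payload),
                     "Hello Obs [%d]", update_count);
    if (n < 0 || (size_t)n >= sizeof(response->payload)) {
        return; /* leave the code unset so the caller can drop the response */
    }
    response->payload_len = (size_t)n;
    response->code = RESPONSE_CODE(205); /* 2.05 Content */
}
```

The same function can serve both the initial GET and later notifications, since in both cases the caller owns the response and decides whether to forward it.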

@leenowell
Author

Hi,

Thanks for your reply - very helpful. I am running this on an ESP32 and the version of libcoap which is distributed with it is

git describe --tags
v4.1.1-401-g6468887

When you say the response PDU is already initialised, does that mean that all the options etc. are automatically set and all I need to do is add the data? Also I don't need to explicitly call coap_send()?

I guess for my initial GET I could just return a "no data" response synchronously and then go from there. Digging around a bit, I found that I could add attributes to the resource using coap_add_attr() before setting it as dirty. I was thinking that I could use this to send the necessary data to the GET handler to enable it to create the response (I will need to know the device ID and the data string sent from the sending device). Is this the right technique?

Finally, neither of the following works on my system

man coap_observe -  returns "No manual entry for coap_observe"
man coap_observe(3) - returns "bash: syntax error near unexpected token `('"

I believe I have followed the installation instructions using autogen.sh, but this was a few weeks ago so I may have missed something.

Thanks once again for all your help; it has made a big difference to my understanding.

Lee.

@mrdeep1
Collaborator

mrdeep1 commented Mar 5, 2019

Your code version is old. If possible, I suggest that you go with the current develop / master / 4.2.0 code, where a lot of things that you may stumble into have been addressed. Your version does not have the man pages; I suggest you go to https://libcoap.net/doc/reference/4.2.0/manpage.html and look there.

> Also I don't need to explicitly call coap_send()?

Correct - all this is handled for you by the caller of the get handler callback. You just need, at a minimum, to set response->code.

coap_add_attr() is the wrong thing to use - it is used for describing attributes for the resource which can be looked up by using (for example) coap-client coap://127.0.0.1/.well-known/core.

How you store the data you send off to the observe clients is up to you: it could be a static variable that observer responses pick up and send, or it could be a static list keyed by the resource name if there are to be multiple variables monitored, etc. hnd_get() in examples/coap-server in 4.2.0 or later does a dynamic resource lookup to get the right data to send back to the client. The dynamic resource lists are set up elsewhere in the code.
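
As one possible shape for the "static list keyed by the resource name" option, here is a small sketch. The resource names, the fixed-size value buffer, and the function names are all assumptions made for illustration; this is not how coap-server's own dynamic resource lookup is implemented.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One monitored value per resource name (names here are hypothetical). */
typedef struct {
    const char *name;
    char        value[32];
} resource_value_t;

static resource_value_t values[] = {
    { "temperature", "unknown" },
    { "humidity",    "unknown" },
};

/* Returns the entry for a resource name, or NULL if it is not in the list. */
resource_value_t *find_value(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof(values) / sizeof(values[0]); i++) {
        if (strcmp(values[i].name, name) == 0) {
            return &values[i];
        }
    }
    return NULL;
}

/* Stores a new value (truncated to fit); returns 0 on success, -1 if unknown. */
int set_value(const char *name, const char *value)
{
    resource_value_t *entry = find_value(name);
    if (entry == NULL) {
        return -1;
    }
    snprintf(entry->value, sizeof(entry->value), "%s", value);
    return 0;
}
```

The code that receives a device update would call `set_value()` and then mark the matching resource dirty; the GET handler would call `find_value()` with the resource's name to build the payload.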

@leenowell
Author

Thanks for this, the man pages will be a big help. I will upgrade tonight - hopefully it is compatible with the ESP-IDF I have :)

The overall paradigm I am trying to replicate is sort of like observing a list of items such that if one updates it sends only that update to the observers. In this case, I would either need to send the update to the handler or the unique ID such that the handler can retrieve the data. I thought this scenario would be achieved using the resource name to get the list and then using query syntax to access individual items if needed. Is this incorrect?

Clearly, in my specific example there is additional complexity in that there is no actual data store to retrieve the data from and the requests come the other way, but the paradigm is still the same.

Thanks again

Lee.

@mrdeep1
Collaborator

mrdeep1 commented Mar 5, 2019

coap_resource_t (4.2.0) has

  /**
   * This pointer is under user control. It can be used to store context for
   * the coap handler.
   */
  void *user_data;

which can be used to hold whatever your changing data is. Then whenever this data is changed, you just need to call coap_resource_notify_observers(your_resource, NULL); and then all the client observers will get updated (via your hnd_get() callback function). Any client that subscribes to observe 'your resource' will get updated.
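
A rough sketch of that pattern with stand-in types, so it can be read without the library. Only the `user_data` field mirrors the snippet quoted above; `mock_resource_t`, `device_update_t`, `mock_notify_observers` and the other names are hypothetical, and the notify function here just sets a flag in place of the real call.

```c
#include <stddef.h>

/* Hypothetical stand-in for a resource; only user_data mirrors the real field. */
typedef struct {
    void *user_data;
    int   dirty;
} mock_resource_t;

/* Example of application data a handler might need (fields are assumptions). */
typedef struct {
    int  device_id;
    char reading[32];
} device_update_t;

/* Stand-in for notifying observers: here it only records that a notify is due. */
void mock_notify_observers(mock_resource_t *resource)
{
    resource->dirty = 1;
}

/* Attaches the latest update to the resource, then triggers the notification. */
void publish_update(mock_resource_t *resource, device_update_t *update)
{
    resource->user_data = update;
    mock_notify_observers(resource);
}

/* What a GET handler would do: read the data back from the resource. */
const device_update_t *current_update(const mock_resource_t *resource)
{
    return (const device_update_t *)resource->user_data;
}
```

One design point to keep in mind: the object that `user_data` points at has to stay valid until every pending notification has been built, so a stack variable in the code that receives updates would not be a safe choice.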

@leenowell
Author

Ah perfect thanks... Unfortunately I am struggling to update the libcoap version that I am using as part of the ESP_IDF. Although I have dropped the new version into the same place as the old, for some reason the compiler can no longer see the header files. I have raised a question on their forum to see how to do this.

Thanks once again for your help

Lee.

@obgm
Owner

obgm commented Mar 6, 2019

You may need to adjust the include paths, i.e. -I must point to the directory containing coap2. Most likely, you need to adjust COMPONENT_ADD_INCLUDEDIRS in components/coap/component.mk.

@leenowell
Author

Thanks for this. It looks like component.mk didn't come with the new version of libcoap, so I have copied it across from the old one and adjusted the settings to get past the error. I have now got past most of the errors, so hopefully there are just a couple more tweaks to go.

Thanks

Lee.

@obgm
Owner

obgm commented Mar 6, 2019

Correct: component.mk is part of the ESP-IDF and thus included there, not in libcoap. If you succeed with your adjustments, please contribute as PR to the ESP-IDF.

@leenowell
Author

Will do. Will take a look tonight.

@leenowell
Author

So have tried to get this working and am struggling. I took a clean version of the library, extracted it into the component/coap directory and renamed it to libcoap. I then did the following

./autogen.sh
./configure  --disable-documentation --enable-examples --disable-dtls --enable-shared
make

I had to remove the --enable-tests option as it gave errors about CUnit not being found and I couldn't resolve it.

I think I have done the config wrong as I get a variety of issues, so I wonder whether I should be specifying the build parameter or cross compiling, maybe.

  1. it is looking for syslog.h so I commented out the define in config.h
  2. Including the header files from coap2 async.c fails as debug.h is not found
  3. Including the header files from coap I get the following error message
    /opt/esp/esp-idf/components/coap/libcoap/include/coap/address.h:102:19: error: unknown type name 'coap_address_t'
    coap_address_init(coap_address_t *addr) {
    Which to be honest I don't understand as coap_address_t is typedef'd at the top of the file.

I assume there is some conflict between the lib being built on Ubuntu and the project that uses it being built for ESP32?

@obgm obgm added the esp32 label Mar 7, 2019
@obgm
Owner

obgm commented Mar 7, 2019

Which version of libcoap do you use? There should not be any config.h or debug.h.

Which definition of coap_address_t do you think is being used? Your configure invocation suggests that you are building for POSIX, but for ESP32 you will most likely want to have the lwIP port. You might want to take a look into examples/lwip/Makefile for a working lwIP configuration.

@leenowell
Author

So I went to code, selected develop, picked release-4.2.0 and downloaded the zip file. I then extracted that into the components directory that is part of the IDF structure and then did the above. So... I believe 4.2.0, but maybe I did something wrong? I wonder if there are legacy files from the old version somewhere, although I did try autogen.sh --clean.

For my situation, I was wondering whether I should be setting the build / cross compile options on configure to tell it that I don't want the code generated for the host OS but for ESP; the #define to use syslog was pointing me toward this line of thinking.

@obgm
Owner

obgm commented Mar 7, 2019

Sorry, I just noted that the esp-idf sets WITH_POSIX and brings its own socket adaptation. So, it might be possible to go for the POSIX variant. Anyway, you do not have to (and should not) run configure, because everything you need to configure is already provided in components/coap/port (you may need to add some things, though).

@leenowell
Author

Ah ok, will give it a try without config. From memory, though, before I ran it I believe some of the header files didn't exist and were .h.in or something. The port directory was in the old version but not the new one. Does that mean I should copy that across too? Sorry, not at my PC to check properly.

@obgm
Owner

obgm commented Mar 7, 2019

Yes, you will need to copy/keep/adjust from components/coap the following files and directories:

CMakeLists.txt
component.mk
Makefile.projbuild
port

And then clone libcoap as subdirectory libcoap there as well.

In addition to COMPONENT_ADD_INCLUDEDIRS you will have to adjust COMPONENT_OBJS in component.mk. (And you should hopefully be able to remove the flags that suppress the compiler warnings.)

@leenowell
Author

leenowell commented Mar 7, 2019

Hi,

I have updated as you suggested but it seems like initially when I extract everything and do autogen the include files are in include/coap2 and notably no debug.h. I then try to make my coap server project and somehow it seems to generate a bunch of files in include/coap and notably a debug.h file. It seems that some of the libcoap header and source files #include debug.h.

File list in coap is


address.h  coap.h.in    encode.h     mem.h     prng.h       uri.h
async.h    coap_io.h    hashkey.h    net.h     resource.h   uthash.h
bits.h     coap_time.h  libcoap.h    option.h  str.h        utlist.h
block.h    debug.h      lwippools.h  pdu.h     subscribe.h

File list in coap2 is

address.h     coap_dtls.h     coap_io.h       lwippools.h  prng.h       uthash.h
async.h       coap_event.h    coap_session.h  mem.h        resource.h   utlist.h
bits.h        coap_hashkey.h  coap_time.h     net.h        str.h
block.h       coap.h.in       encode.h        option.h     subscribe.h
coap_debug.h  coap.h.windows  libcoap.h       pdu.h        uri.h

This is the output of "diff --brief" on the coap vs coap2 directory. Seems like a bunch of files are very different between the 2.

Files coap/address.h and coap2/address.h differ
Files coap/async.h and coap2/async.h differ
Files coap/bits.h and coap2/bits.h differ
Files coap/block.h and coap2/block.h differ
Only in coap2: coap_debug.h
Only in coap2: coap_dtls.h
Only in coap2: coap_event.h
Only in coap2: coap_hashkey.h
Files coap/coap.h.in and coap2/coap.h.in differ
Only in coap2: coap.h.windows
Files coap/coap_io.h and coap2/coap_io.h differ
Only in coap2: coap_session.h
Files coap/coap_time.h and coap2/coap_time.h differ
Only in coap: debug.h
Files coap/encode.h and coap2/encode.h differ
Only in coap: hashkey.h
Files coap/libcoap.h and coap2/libcoap.h differ
Files coap/lwippools.h and coap2/lwippools.h differ
Files coap/mem.h and coap2/mem.h differ
Files coap/net.h and coap2/net.h differ
Files coap/option.h and coap2/option.h differ
Files coap/pdu.h and coap2/pdu.h differ
Files coap/prng.h and coap2/prng.h differ
Files coap/resource.h and coap2/resource.h differ
Files coap/str.h and coap2/str.h differ
Files coap/subscribe.h and coap2/subscribe.h differ
Files coap/uri.h and coap2/uri.h differ
Files coap/uthash.h and coap2/uthash.h differ
Files coap/utlist.h and coap2/utlist.h differ


Adding just coap2 gives compile errors looking for debug.h. Adding both coap2 and coap gives a load of conflicts. I assume we are not expecting the coap directory to be generated? Does debug.h normally get generated during the compile?

Definitely one step closer

Thanks

Lee.

@obgm
Owner

obgm commented Mar 8, 2019

No.
Just remove anything from your old libcoap version that comes with the esp-idf and clone libcoap-4.2.0 as described before. Do not autogenerate anything. The files you need are already there. If some file wants to include debug.h from libcoap, just fix that include directive (try including coap_debug.h instead). There is no debug.h any more in libcoap.

@mrdeep1
Collaborator

mrdeep1 commented Mar 8, 2019

I had a quick look last night. The issue (in part) is down to the git submodule logic in esp-idf pulling in the old version of libcoap and hence the ongoing confusion.

I am looking at how easy it is to bring esp-idf up to the 4.2.0 standard; I could have something ready by Monday. The biggest change will be to port/coap_io_socket.c, as well as /examples/protocols/coap_client and /examples/protocols/coap_server.

@obgm
Owner

obgm commented Mar 8, 2019

In the long run this would require mbedTLS integration for libcoap :-)

@mrdeep1
Collaborator

mrdeep1 commented Mar 8, 2019

.. a step at a time !

@leenowell
Author

Ah thanks.... I have some time tonight to look at this so if there is anything you want me to check / do / validate, please let me know.

@mrdeep1
Collaborator

mrdeep1 commented Mar 8, 2019

@leenowell If you go to ~/esp/esp-idf/components/coap/libcoap and then do git checkout release-4.2.0 , that should get you the correct version of libcoap in your build tree. Then in ~/esp/esp-idf, install this patch esp-idf.txt on a clean environment as patch -p1 < esp-idf.txt which then enables libcoap to be built.

esp-idf.txt

I have built the coap_client and coap_server esp-idf examples, but have not been able to run the code on my build system.

If this works for you, I can then create a PR for esp-idf.

@leenowell
Author

Unfortunately, I fell at the first hurdle. git checkout release-4.2.0 fails with

error: pathspec 'release-4.2.0' did not match any file(s) known to git.

I tried git checkout -b release-4.2.0 obgm/libcoap

and got

4.2.0 obgm/libcoap
fatal: Cannot update paths and switch to branch 'release-4.2.0' at the same time.
Did you intend to checkout 'components/coap/libcoap/obgm/libcoap' which can not be resolved as commit?

After a bit of googling I then tried this


lee@leelaptop:/opt/esp/esp-idf/components/coap/libcoap$ git checkout -b release-4.2.0
D	components/coap/CMakeLists.txt
D	components/coap/Makefile.projbuild
D	components/coap/component.mk
D	components/coap/port/coap_io_socket.c
D	components/coap/port/include/coap/coap.h
D	components/coap/port/include/coap_config.h
D	components/coap/port/include/coap_config_posix.h
Switched to a new branch 'release-4.2.0'
lee@leelaptop:/opt/esp/esp-idf/components/coap/libcoap$ ls
lee@leelaptop:/opt/esp/esp-idf/components/coap/libcoap$ git checkout release-4.2.0
D	components/coap/CMakeLists.txt
D	components/coap/Makefile.projbuild
D	components/coap/component.mk
D	components/coap/port/coap_io_socket.c
D	components/coap/port/include/coap/coap.h
D	components/coap/port/include/coap_config.h
D	components/coap/port/include/coap_config_posix.h
Already on 'release-4.2.0'
lee@leelaptop:/opt/esp/esp-idf/components/coap/libcoap$ ls
lee@leelaptop:/opt/esp/esp-idf/components/coap/libcoap$ git fetch
remote: Enumerating objects: 2534, done.
remote: Counting objects: 100% (2534/2534), done.
remote: Compressing objects: 100% (111/111), done.
remote: Total 4104 (delta 2437), reused 2500 (delta 2413), pack-reused 1570
Receiving objects: 100% (4104/4104), 1.97 MiB | 885.00 KiB/s, done.
Resolving deltas: 100% (3111/3111), completed with 967 local objects.
From https://github.com/espressif/esp-idf
 * [new branch]      bd8733f7   -> origin/bd8733f7
 * [new branch]      d96f6d6b   -> origin/d96f6d6b
   a62cbfe..bba89e1  master     -> origin/master
   3fc3282..1b1053c  release/v3.0 -> origin/release/v3.0
   7fe18ef..cea310d  release/v3.1 -> origin/release/v3.1
   bed50a9..a7dc804  release/v3.2 -> origin/release/v3.2
 * [new branch]      release/v3.3 -> origin/release/v3.3
 * [new tag]         v3.3-beta2 -> v3.3-beta2
 * [new tag]         v3.1.3     -> v3.1.3
 * [new tag]         v3.2-beta3 -> v3.2-beta3
Fetching submodule components/bt/lib
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (33/33), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 33 (delta 22), reused 23 (delta 12), pack-reused 0
Unpacking objects: 100% (33/33), done.
From https://github.com/espressif/esp32-bt-lib
   06c3f28..48ecf82  master     -> origin/master
   bc66c9d..20feea1  release/v3.1 -> origin/release/v3.1
   f718106..1f6837b  release/v3.2 -> origin/release/v3.2
Fetching submodule components/esp32/lib
remote: Enumerating objects: 273, done.
remote: Counting objects: 100% (273/273), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 353 (delta 266), reused 267 (delta 262), pack-reused 80
Receiving objects: 100% (353/353), 2.03 MiB | 1.07 MiB/s, done.
Resolving deltas: 100% (302/302), completed with 31 local objects.
From https://github.com/espressif/esp32-wifi-lib
   4123071..61530b0  master     -> origin/master
   3a6449c..bcb6ae7  release/v3.0 -> origin/release/v3.0
   f5ce277..21ffb68  release/v3.1 -> origin/release/v3.1
   ec07b86..4a4b808  release/v3.2 -> origin/release/v3.2

Still nothing in the directory.... wonder if I have now goosed my environment :)

Any ideas?

thanks
Lee.

@leenowell
Author

ok - I have set logging to 9 and got it invoking gdb stub..... Output is....

I (2715) CoAP_client: Connected to AP
I (2725) CoAP_client: DNS lookup succeeded. IP=104.196.15.150
Guru Meditation Error: Core  0 panic'ed (LoadStoreError). Exception was unhandled.
Core 0 register dump:
PC      : 0x4008ee62  PS      : 0x00050033  A0      : 0x4008ed6b  A1      : 0x3ffc6160  
0x4008ee62: vPortYieldFromInt at /opt/esp/esp-idf/components/freertos/portasm.S:595

0x4008ed6b: _frxt_int_exit at /opt/esp/esp-idf/components/freertos/portasm.S:206

A2      : 0x4008ecec  A3      : 0x00000000  A4      : 0x00000001  A5      : 0x4008ecec  
0x4008ecec: _frxt_int_enter at /opt/esp/esp-idf/components/freertos/portasm.S:119

0x4008ecec: _frxt_int_enter at /opt/esp/esp-idf/components/freertos/portasm.S:119

A6      : 0x00000001  A7      : 0x00000000  A8      : 0x80081522  A9      : 0x3ffb0660  
A10     : 0x3ffb0c94  A11     : 0x00000001  A12     : 0x3ffc61d8  A13     : 0x0000cdcd  
A14     : 0x00000001  A15     : 0x00000000  SAR     : 0x00000016  EXCCAUSE: 0x00000003  
EXCVADDR: 0x4008ecec  LBEG    : 0x4000c28c  LEND    : 0x4000c296  LCOUNT  : 0x00000000  
0x4008ecec: _frxt_int_enter at /opt/esp/esp-idf/components/freertos/portasm.S:119


ELF file SHA256: 32db1a6ddd2a3b395e6643ce8bce9feda3d21027fbc2707894fdb516c54cd082

Backtrace: 0x4008ee62:0x3ffc6160 0x4008ed68:0x3ffc6170
0x4008ee62: vPortYieldFromInt at /opt/esp/esp-idf/components/freertos/portasm.S:595

0x4008ed68: _frxt_int_exit at /opt/esp/esp-idf/components/freertos/portasm.S:205


Entering gdb stub now.

where is....

(gdb) where
#0  vPortYieldFromInt () at /opt/esp/esp-idf/components/freertos/portasm.S:595
#1  0x4008ed6b in _frxt_int_exit ()
    at /opt/esp/esp-idf/components/freertos/portasm.S:205
(gdb) 

thread apply all where


(gdb) thread apply all where

Thread 1 (Thread <main>):
#0  vPortYieldFromInt () at /opt/esp/esp-idf/components/freertos/portasm.S:595
#1  0x4008ed6b in _frxt_int_exit ()
    at /opt/esp/esp-idf/components/freertos/portasm.S:205
(gdb)

@leenowell
Author

So have had a look at the latest one that triggered gdb and must say I have absolutely no idea what this file is doing so I'm no help at all I'm afraid.

@mrdeep1
Collaborator

mrdeep1 commented Mar 12, 2019

My educated guess is that by the time gdb was triggered, everything had simply exited.
No idea as to why setting the logging level higher makes a difference, unless you were not using the updated coap_server/coap_client in the new push I did to esp-idf earlier today.
The following lines were deleted in this push

coap_set_log_handler(logging_handler);
coap_set_show_pdu_output(0);

I suggest that you put in this change again to see what happens in components/coap/port/include/coap_config_posix.h

+#define coap_log(level, ...) printf("log %s %d\n", __FUNCTION__, __LINE__)
+
 #endif /* WITH_POSIX */
 #endif /* COAP_CONFIG_POSIX_H_ */

Beyond that, without an ESP environment I am not going to be able to help much more

@leenowell
Author

I have applied that fix and looks like we have some progress. It has detected a stack overflow....

I (2222) CoAP_client: DNS lookup succeeded. IP=104.196.15.150
log coap_new_client_session 674
log coap_session_send 231
***ERROR*** A stack overflow in task coap has been detected.
abort() was called at PC 0x4008714c on core 0
0x4008714c: vApplicationStackOverflowHook at /opt/esp/esp-idf/components/esp32/panic.c:715


ELF file SHA256: d1d061ba675623d4ef2125165b789233cfc06e10ce30f7579db3a7060dc73a43

Backtrace: 0x40086ee0:0x3ffc5890 0x40087135:0x3ffc58b0 0x4008714c:0x3ffc58d0 0x4008d364:0x3ffc58f0 0x4008edb8:0x3ffc5910 0x4008ed6e:0x00000000
0x40086ee0: invoke_abort at /opt/esp/esp-idf/components/esp32/panic.c:715

0x40087135: abort at /opt/esp/esp-idf/components/esp32/panic.c:715

0x4008714c: vApplicationStackOverflowHook at /opt/esp/esp-idf/components/esp32/panic.c:715

0x4008d364: vTaskSwitchContext at /opt/esp/esp-idf/components/freertos/tasks.c:5093

0x4008edb8: _frxt_dispatch at /opt/esp/esp-idf/components/freertos/portasm.S:406

0x4008ed6e: _frxt_int_exit at /opt/esp/esp-idf/components/freertos/portasm.S:206


Entering gdb stub now.
$T0b#e6GNU gdb (crosstool-NG crosstool-ng-1.22.0-73-ge28a011) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-build_pc-linux-gnu --target=xtensa-esp32-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/esp/esp-idf/examples/protocols/coap_client/build/coap_client.elf...done.
Remote debugging using /dev/ttyUSB0
0x4008cbc9 in xQueueGenericReceive (xQueue=0x3ffbfc48, pvBuffer=0x3ffc1070, 
    xTicksToWait=4294967295, xJustPeeking=0)
    at /opt/esp/esp-idf/components/freertos/queue.c:1592
1592					portYIELD_WITHIN_API();
(gdb) where
#0  0x4008cbc9 in xQueueGenericReceive (xQueue=0x3ffbfc48, pvBuffer=0x3ffc1070, 
    xTicksToWait=4294967295, xJustPeeking=0)
    at /opt/esp/esp-idf/components/freertos/queue.c:1592
#1  0x400f0f74 in queue_recv_wrapper (queue=0x3ffbfc48, item=0x3ffc1070, 
    block_time_tick=4294967295)
    at /opt/esp/esp-idf/components/esp32/esp_adapter.c:302
#2  0x4008b280 in ppTask ()
#3  0x4008cd6c in vPortTaskWrapper (pxCode=0x4008b250 <ppTask>, pvParameters=0x0)
    at /opt/esp/esp-idf/components/freertos/port.c:143
(gdb) thread apply all where

Thread 10 (Thread 9):
#0  0x4008cbc9 in xQueueGenericReceive (xQueue=0x3ffafe10, pvBuffer=0x0, 
    xTicksToWait=4294967295, xJustPeeking=0)
    at /opt/esp/esp-idf/components/freertos/queue.c:1592
#1  0x4008203f in ipc_task (arg=0x0) at /opt/esp/esp-idf/components/esp32/ipc.c:51
#2  0x4008cd6c in vPortTaskWrapper (pxCode=0x40082010 <ipc_task>, pvParameters=0x0)
    at /opt/esp/esp-idf/components/freertos/port.c:143

Thread 9 (Thread 8):
#0  0x4008cbc9 in xQueueGenericReceive (xQueue=0x3ffbe8c4, pvBuffer=0x3ffbf940, 
    xTicksToWait=4294967295, xJustPeeking=0)
    at /opt/esp/esp-idf/components/freertos/queue.c:1592
#1  0x400f19ea in esp_event_loop_task (pvParameters=0x0)
    at /opt/esp/esp-idf/components/esp32/event_loop.c:53
#2  0x4008cd6c in vPortTaskWrapper (pxCode=0x400f19d8 <esp_event_loop_task>, 
    pvParameters=0x0) at /opt/esp/esp-idf/components/freertos/port.c:143

Thread 8 (Thread 7):
#0  0x4008cbc9 in xQueueGenericReceive (xQueue=0x3ffaea50, pvBuffer=0x0, 
    xTicksToWait=4294967295, xJustPeeking=0)
    at /opt/esp/esp-idf/components/freertos/queue.c:1592
#1  0x400d29e7 in timer_task (arg=0x0)
    at /opt/esp/esp-idf/components/esp32/esp_timer.c:316
#2  0x4008cd6c in vPortTaskWrapper (pxCode=0x400d29d4 <timer_task>, 
    pvParameters=0x0) at /opt/esp/esp-idf/components/freertos/port.c:143

Thread 7 (Thread 6):
#0  0x4008cbc9 in xQueueGenericReceive (xQueue=0x3ffb9958, pvBuffer=0x0, 
    xTicksToWait=4294967295, xJustPeeking=0)
    at /opt/esp/esp-idf/components/freertos/queue.c:1592
#1  0x4008203f in ipc_task (arg=0x1) at /opt/esp/esp-idf/components/esp32/ipc.c:51
#2  0x4008cd6c in vPortTaskWrapper (pxCode=0x40082010 <ipc_task>, pvParameters=0x1)
    at /opt/esp/esp-idf/components/freertos/port.c:143

Thread 6 (Thread 5):
#0  0x4008eaa0 in prvProcessTimerOrBlockTask (xNextExpireTime=<optimized out>, 
    xListWasEmpty=<optimized out>)
    at /opt/esp/esp-idf/components/freertos/timers.c:589
#1  0x4008eb93 in prvTimerTask (pvParameters=0x0)
    at /opt/esp/esp-idf/components/freertos/timers.c:544
#2  0x4008cd6c in vPortTaskWrapper (pxCode=0x4008eb84 <prvTimerTask>, 
    pvParameters=0x0) at /opt/esp/esp-idf/components/freertos/port.c:143

Thread 5 (Thread 4):
#0  0x4008cbc9 in xQueueGenericReceive (xQueue=0x3ffbd7b8, pvBuffer=0x3ffbe5d0, 
    xTicksToWait=9, xJustPeeking=0)
    at /opt/esp/esp-idf/components/freertos/queue.c:1592
#1  0x40137fa8 in sys_arch_mbox_fetch (mbox=<optimized out>, msg=0x3ffbe5d0, 
    timeout=<optimized out>)
    at /opt/esp/esp-idf/components/lwip/port/esp32/freertos/sys_arch.c:297
#2  0x4012a859 in sys_timeouts_mbox_fetch (mbox=0x3ffb84b8 <mbox>, msg=0x3ffbe5d0)
    at /opt/esp/esp-idf/components/lwip/lwip/src/core/timeouts.c:430
#3  0x4012753f in tcpip_thread (arg=<optimized out>)
    at /opt/esp/esp-idf/components/lwip/lwip/src/api/tcpip.c:109
#4  0x4008cd6c in vPortTaskWrapper (pxCode=0x40127524 <tcpip_thread>, 
    pvParameters=0x0) at /opt/esp/esp-idf/components/freertos/port.c:143

Thread 4 (Thread 3):
#0  0x401429e6 in esp_pm_impl_waiti ()
    at /opt/esp/esp-idf/components/esp32/pm_esp32.c:487
#1  0x400d32a2 in esp_vApplicationIdleHook ()
    at /opt/esp/esp-idf/components/esp32/freertos_hooks.c:63
#2  0x4008dd94 in prvIdleTask (pvParameters=0x0)
    at /opt/esp/esp-idf/components/freertos/tasks.c:3412
#3  0x4008cd6c in vPortTaskWrapper (pxCode=0x4008dd88 <prvIdleTask>, 
    pvParameters=0x0) at /opt/esp/esp-idf/components/freertos/port.c:143

Thread 3 (Thread 2):
#0  0x401429e6 in esp_pm_impl_waiti ()
    at /opt/esp/esp-idf/components/esp32/pm_esp32.c:487
#1  0x400d32a2 in esp_vApplicationIdleHook ()
    at /opt/esp/esp-idf/components/esp32/freertos_hooks.c:63
#2  0x4008dd94 in prvIdleTask (pvParameters=0x0)
    at /opt/esp/esp-idf/components/freertos/tasks.c:3412
#3  0x4008cd6c in vPortTaskWrapper (pxCode=0x4008dd88 <prvIdleTask>, 
    pvParameters=0x0) at /opt/esp/esp-idf/components/freertos/port.c:143

Thread 2 (Thread 1):
#0  0x4008cbc9 in xQueueGenericReceive (xQueue=0x3ffbfc48, pvBuffer=0x3ffc1070, 
    xTicksToWait=4294967295, xJustPeeking=0)
    at /opt/esp/esp-idf/components/freertos/queue.c:1592
#1  0x400f0f74 in queue_recv_wrapper (queue=0x3ffbfc48, item=0x3ffc1070, 
    block_time_tick=4294967295)
    at /opt/esp/esp-idf/components/esp32/esp_adapter.c:302
#2  0x4008b280 in ppTask ()
#3  0x4008cd6c in vPortTaskWrapper (pxCode=0x4008b250 <ppTask>, pvParameters=0x0)
    at /opt/esp/esp-idf/components/freertos/port.c:143

Thread 1 (Remote target):
#0  0x4008cbc9 in xQueueGenericReceive (xQueue=0x3ffbfc48, pvBuffer=0x3ffc1070, 
    xTicksToWait=4294967295, xJustPeeking=0)
    at /opt/esp/esp-idf/components/freertos/queue.c:1592
#1  0x400f0f74 in queue_recv_wrapper (queue=0x3ffbfc48, item=0x3ffc1070, 
    block_time_tick=4294967295)
    at /opt/esp/esp-idf/components/esp32/esp_adapter.c:302
#2  0x4008b280 in ppTask ()
#3  0x4008cd6c in vPortTaskWrapper (pxCode=0x4008b250 <ppTask>, pvParameters=0x0)
    at /opt/esp/esp-idf/components/freertos/port.c:143
(gdb) 

@leenowell
Author

leenowell commented Mar 12, 2019

Looking at this further, a lot of the threads seem to be sitting in xQueueGenericReceive. I wonder if there is a recursive loop or something that has gone a bit wild and is causing the stack overflow?

@mrdeep1
Collaborator

mrdeep1 commented Mar 12, 2019

There needs to be some proper debugging of what is going on - stepping through the code to see what is breaking there. I do not have an ESP environment to do this against.

For example, coap_example_task does not appear in your gdb output.

However, https://github.com/Ebiroll/qemu_esp32 may be of help here - I need the rom.bin and rom1.bin files as described in the README.md to see if I can use this environment.

2. Dump rom1.bin and rom.bin
~/esp/esp-idf/components/esptool_py/esptool/esptool.py --chip esp32 -b 921600 -p /dev/ttyUSB0 dump_mem 0x40000000 0x000C2000 rom.bin
~/esp/esp-idf/components/esptool_py/esptool/esptool.py --chip esp32 -b 921600 -p /dev/ttyUSB0 dump_mem 0x3FF90000 0x00010000 rom1.bin

I do not think you need to do Step 1. but...... These files are needed for me to do Step 6.

These rom*.bin files will be too large to attach to this issue. Please email them to me as zipped attachments.

@leenowell
Author

OK will take a look at this tomorrow and send them over.

Depending where you are based, I could potentially loan you an ESP32 but do you have a JTAG adapter to connect it to? I don't have one and without it I don't believe you can do proper debugging and are essentially left with what we have already been doing.

@leenowell
Author

Have just emailed the files. Please let me know if you don't get them and I will resend.

@jitin17

jitin17 commented Mar 13, 2019

@leenowell @mrdeep1 I have verified the PR. The stack size of the coap_client example was 2K, which is extremely low, and this is the reason behind the crash. I found the stack requirement to be a little less than 4K. So, @leenowell, try updating the stack size of the coap_example_task task in the coap_client example to 4K and let me know if it works.
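
One way to arrive at a figure like this is to measure the task's stack high-water mark. FreeRTOS exposes it through uxTaskGetStackHighWaterMark(), which relies on the stack having been pre-filled with a known byte. The sketch below imitates that idea on an ordinary buffer so it can be tried on a host. `FILL_BYTE`, `stack_prepare` and `stack_unused_bytes` are illustrative names, not FreeRTOS or ESP-IDF APIs, and the sketch assumes a stack that grows downwards from the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILL_BYTE 0xA5u  /* pattern written to every byte before the stack is used */

/* Fill the whole (simulated) stack with the pattern. */
void stack_prepare(unsigned char *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, FILL_BYTE, size);
}

/* Count the bytes at the low end that still hold the pattern.
 * Assumes the stack grows downwards, so the low end is touched last.
 * The result is the margin that was never used. */
size_t stack_unused_bytes(const unsigned char *stack, size_t size)
{
    size_t n = 0;

    assert(stack != NULL);
    while (n < size && stack[n] == FILL_BYTE) {
        n++;
    }
    return n;
}
```

A margin close to zero after a test run suggests the stack is too small for that workload.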

@leenowell
Author

I tried updating

xTaskCreate(coap_example_task, "coap", 2048, NULL, 5, NULL);
to

xTaskCreate(coap_example_task, "coap", 4096, NULL, 5, NULL);
but updating to

xTaskCreate(coap_example_task, "coap", 10000, NULL, 5, NULL);

Seems to do the trick :). The log is now....


I (0) cpu_start: App cpu up.
I (407) heap_init: Initializing. RAM available for dynamic allocation:
I (414) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (420) heap_init: At 3FFB9538 len 00026AC8 (154 KiB): DRAM
I (427) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (433) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (439) heap_init: At 4008FB60 len 000104A0 (65 KiB): IRAM
I (446) cpu_start: Pro cpu start user code
I (128) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (232) wifi: wifi driver task: 3ffc1134, prio:23, stack:3584, core=0
I (232) wifi: wifi firmware version: 53ea8b1
I (232) wifi: config NVS flash: enabled
I (232) wifi: config nano formating: disabled
I (242) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (252) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (282) wifi: Init dynamic tx buffer num: 32
I (282) wifi: Init data frame dynamic rx buffer num: 32
I (282) wifi: Init management frame dynamic rx buffer num: 32
I (282) wifi: Init management short buffer num: 32
I (282) wifi: Init static rx buffer size: 1600
I (292) wifi: Init static rx buffer num: 10
I (292) wifi: Init dynamic rx buffer num: 32
I (392) phy: phy_version: 4100, 6fa5e27, Jan 25 2019, 17:02:06, 0, 0
I (392) wifi: mode : sta (3c:71:bf:96:e6:40)
I (512) wifi: new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (1502) wifi: state: init -> auth (b0)
I (1502) wifi: state: auth -> assoc (0)
I (1512) wifi: state: assoc -> run (10)
I (1562) wifi: connected with Lounge, channel 1, bssid = 00:f2:01:18:c9:c0
I (1562) wifi: pm start, type: 1

I (2712) event: sta ip: 192.168.1.232, mask: 255.255.255.0, gw: 192.168.1.1
I (2712) CoAP_client: Connected to AP
I (2732) CoAP_client: DNS lookup succeeded. IP=104.196.15.150
log coap_new_client_session 674
log coap_session_send 231
v:1 t:CON c:GET i:46d0 {} [ ]
log coap_wait_ack 870
log coap_read_session 1152
v:1 t:ACK c:2.05 i:46d0 {} [ Content-Format:text/plain, Block2:0/M/64, Size2:448 ] :: '************************************************************\x0ACoA'
log coap_remove_from_queue 1413
Received:
************************************************************
CoAlog coap_session_send 231
v:1 t:CON c:GET i:46d1 {} [ Block2:1/_/64 ]
log coap_wait_ack 870
log coap_read_session 1152
v:1 t:ACK c:2.05 i:46d1 {} [ Content-Format:text/plain, Block2:1/M/64 ] :: 'P RFC 7252                              Cf 2.0.0-SNAPSHOT\x0A******'
log coap_remove_from_queue 1413
P RFC 7252                              Cf 2.0.0-SNAPSHOT
******log coap_session_send 231
v:1 t:CON c:GET i:46d2 {} [ Block2:2/_/64 ]
log coap_wait_ack 870
log coap_read_session 1152
v:1 t:ACK c:2.05 i:46d2 {} [ Content-Format:text/plain, Block2:2/M/64 ] :: '******************************************************\x0AThis serv'
log coap_remove_from_queue 1413
******************************************************
This servlog coap_session_send 231
v:1 t:CON c:GET i:46d3 {} [ Block2:3/_/64 ]
log coap_wait_ack 870
log coap_read_session 1152
v:1 t:ACK c:2.05 i:46d3 {} [ Content-Format:text/plain, Block2:3/M/64 ] :: 'er is using the Eclipse Californium (Cf) CoAP framework\x0Apublishe'
log coap_remove_from_queue 1413
er is using the Eclipse Californium (Cf) CoAP framework
publishelog coap_session_send 231
v:1 t:CON c:GET i:46d4 {} [ Block2:4/_/64 ]
log coap_wait_ack 870
log coap_read_session 1152
v:1 t:ACK c:2.05 i:46d4 {} [ Content-Format:text/plain, Block2:4/M/64 ] :: 'd under EPL+EDL: http://www.eclipse.org/californium/\x0A\x0A(c) 2014, '
log coap_remove_from_queue 1413
d under EPL+EDL: http://www.eclipse.org/californium/

(c) 2014, log coap_session_send 231
v:1 t:CON c:GET i:46d5 {} [ Block2:5/_/64 ]
log coap_wait_ack 870
log coap_read_session 1152
v:1 t:ACK c:2.05 i:46d5 {} [ Content-Format:text/plain, Block2:5/M/64 ] :: '2015, 2016 Institute for Pervasive Computing, ETH Zurich and oth'
log coap_remove_from_queue 1413
2015, 2016 Institute for Pervasive Computing, ETH Zurich and othlog coap_session_send 231
v:1 t:CON c:GET i:46d6 {} [ Block2:6/_/64 ]
log coap_wait_ack 870
log coap_read_session 1152
v:1 t:ACK c:2.05 i:46d6 {} [ Content-Format:text/plain, Block2:6/_/64 ] :: 'ers\x0A************************************************************'
log coap_remove_from_queue 1413
ers
************************************************************
log coap_session_free 184
I (4592) CoAP_client: Connected to AP
I (4592) CoAP_client: DNS lookup succeeded. IP=104.196.15.150
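
The Block2 entries in the log above follow the block option layout from RFC 7959: a block number, a "more" flag and a size exponent (SZX), where the block size is 2^(SZX+4) bytes. A minimal sketch of that arithmetic, using illustrative names (`block_opt_t`, `block_decode`, `blocks_needed`) rather than libcoap's own API:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t num;   /* block number */
    int      more;  /* 1 if further blocks follow ("M" in the log) */
    uint32_t size;  /* block size in bytes: 2^(SZX + 4) */
} block_opt_t;

/* Split a Block2 option value into NUM, M and the block size. */
block_opt_t block_decode(uint32_t value)
{
    block_opt_t b;
    uint32_t szx = value & 0x7u;

    assert(szx != 7u);  /* SZX 7 is reserved in RFC 7959 */
    b.num  = value >> 4;
    b.more = (int)((value >> 3) & 1u);
    b.size = 1u << (szx + 4u);
    return b;
}

/* Number of blocks needed to carry total_size bytes (rounds up). */
uint32_t blocks_needed(uint32_t total_size, uint32_t block_size)
{
    assert(block_size > 0u);
    return (total_size + block_size - 1u) / block_size;
}
```

For the transfer logged above, Size2:448 with 64-byte blocks gives seven blocks, numbered 0 to 6, and only the last one arrives without the M flag.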

Does this mean we have got to the bottom of the issue? Thanks both for all your help :). Could you let me know when to take a clean pull with the fix for the server too, and I will be able to test them together later.

@mrdeep1
Collaborator

mrdeep1 commented Mar 13, 2019

@jitin17 Many thanks for looking into this - it was driving me nuts!

@leenowell Before I make the changes and do another push of the code, please try removing

#define coap_log(level, ...) printf("log %s %d\n", __FUNCTION__, __LINE__) from components/coap/port/include/coap_config_posix.h and retest coap_client.

Then make the same stack size change to coap_server and see if that now works as well.

@leenowell
Author

Yes huge shout out to @mrdeep1. An incredibly tough nut to crack with multiple underlying problems and to make it more difficult no device to test it on :) Thanks for your tenacity. :)

I am out all day today so won't be able to run the test until this evening, but I will try to talk someone else through it so I can give you the update earlier.

Out of interest, what did qemu_esp32 enable you to do?

@mrdeep1
Collaborator

mrdeep1 commented Mar 13, 2019

I have pushed the code with the stack size changed to 10240 for both coap_client and coap_server for testing.

The use of qemu_esp32 etc. in theory gives me an ESP emulation environment that I can test things out against from my Linux environment.

@leenowell
Author

Ok thanks. So all I need to do is take a fresh pull of your ESP-IDF branch again and compile both client and server and test?

Did qemu_esp32 work? I have been looking for something like that to ease the build/test cycle; it also means I can develop on the go. I wonder if it enables proper debugging on Linux.

@mrdeep1
Collaborator

mrdeep1 commented Mar 13, 2019

Yes, a fresh pull and git checkouts.

qemu_esp32 is a work in progress - I have not had a chance to try out your .bin files to find out what the next hurdle is.

@leenowell
Author

Assuming we have sorted the other issue I'll take a look tonight too and see whether I can get it working.

@leenowell
Author

Hi,

Well... looks like we have a breakthrough :). Both client and server compile and run fine. I have tested the client against the default URI and the server URI and both work fine. For the server, the client receives "no data" which looking at the code looks correct. Only thing to say is that the client seems to fire requests off very quickly so could do with a pause between each get.

So.... where does this leave us? Was the ultimate root cause the stack size issue or are there other changes needed to libcoap to support ESP? I seem to recall you mentioning that ESP may not be supporting different fields on the recv and also we disabled logging etc.

What is the best way for me to revert back to the normal ESP_IDF branch and also get the necessary fixes? Will have a look at QEMU now....

@mrdeep1
Collaborator

mrdeep1 commented Mar 13, 2019

Excellent news

There are several things needed here

  1. Increase stack size in coap_server / coap_client
  2. Update changes to upgrade libcoap to 4.2.0 in ESP-IDF
  3. Make some changes (hence port/coap_io.c) to support the ESP-IDF way of doing i/o and get around limitations of the ESP-IDF port of recvmsg() and partly of the sendmsg() implementation.

No idea as to why the client repeats firing - I will double check, but I am sure that this is done the same way in the original code.

Otherwise. I will work on getting these changes into ESP-IDF.

To revert, you just need to git clone the current esp-idf repository (espressif, not mrdeep1, in the path). But then you will not be able to use libcoap 4.2.0 - all the fixes are in my copy of the repository under branch libcoap-4.2.0.

@mrdeep1
Collaborator

mrdeep1 commented Mar 13, 2019

The original client will fire off many times as far as I can tell from reading the code until it is terminated. I am going to be including the following change in the code I push, so it only fires off once.

diff --git a/examples/protocols/coap_client/main/coap_client_example_main.c b/examples/protocols/coap_client/main/coap_client_example_main
index 483b31b..76810a7 100644
--- a/examples/protocols/coap_client/main/coap_client_example_main.c
+++ b/examples/protocols/coap_client/main/coap_client_example_main.c
@@ -293,6 +293,8 @@ clean_up:
         if (session) coap_session_release(session);
         if (ctx) coap_free_context(ctx);
         coap_cleanup();
+        /* Only send the request off once */
+        break;
     }
 
     vTaskDelete(NULL);

@leenowell
Author

So for the moment, should I continue with your branch so I can use 4.2.0?

In terms of the client example, the repeated firing I think is useful. If you added a delay of a couple of seconds that means that you can see the 2 communicating and not miss everything because it is too quick.

In terms of QEMU - I am having a few issues with it. I suspect the .bin files I sent you are not correct. Also, how you would add this to your own project (e.g. the coap_client) is unclear. If you are planning on trying to get it working, let me know and I will send you some fresh .bin files and send you lessons learned so far :)

Thanks for all your help on the other issue.

@mrdeep1
Collaborator

mrdeep1 commented Mar 13, 2019

You are welcome to put in a sleep(2) if you want. Current pushed code has the code above in it.
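
If the repeated request is kept, the loop might be shaped roughly as below. This is only a sketch: `send_request_fn`, `delay_fn` and `run_requests` are hypothetical names, and the callbacks stand in for the example's CoAP exchange and for whatever delay call is used (sleep(2) as suggested here; under FreeRTOS, vTaskDelay() would be the usual choice). `fake_send` and `fake_delay` exist only so the loop can be tried on a host.

```c
#include <assert.h>
#include <stddef.h>

typedef int  (*send_request_fn)(void *arg);      /* returns 0 on success */
typedef void (*delay_fn)(unsigned int seconds);

/* Send up to max_requests requests, pausing between consecutive ones.
 * Stops early on the first failure. Returns the number that succeeded. */
unsigned int run_requests(send_request_fn send, delay_fn delay,
                          unsigned int max_requests,
                          unsigned int pause_seconds, void *arg)
{
    unsigned int done = 0;

    assert(send != NULL && delay != NULL);
    for (unsigned int i = 0; i < max_requests; i++) {
        if (i > 0) {
            delay(pause_seconds);   /* no pause before the first request */
        }
        if (send(arg) != 0) {
            break;
        }
        done++;
    }
    return done;
}

/* Host-side stand-ins: count requests and accumulate the requested delay. */
unsigned int slept_seconds = 0;

void fake_delay(unsigned int seconds)
{
    slept_seconds += seconds;
}

int fake_send(void *arg)
{
    unsigned int *count = (unsigned int *)arg;
    (*count)++;
    return 0;
}
```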

In terms of the QEMU stuff - I was going in that direction to see if I could debug what is going on. The .bin files could be wrong - but I was not able to get things going with the ones you sent me. I don't plan to spend much time on it at present. That said, it would be good to have a fresh copy of the .bin files and see what you learnt (send it by email).

@leenowell
Author

OK - let me try and get the coap_client running in QEMU and I will send you the working bin files and how I got it to work - assuming I get there!

@leenowell
Author

Hi,

I have tried to get QEMU working and, to be honest, am struggling, so I have raised a question on that forum. So.... I have turned my attention back to the original project above.

In my project, I have an ESP32 which receives data on ESPNow forwards it to ESPMesh and ultimately I need to get these updates to the end client via an observable get. In esp-idf, ESPNow and ESPMesh run on separate tasks/ thread so I have had to bridge between the two with an xQueue.

Based on my current plan, I would need to call coap_resource_notify_observers in the mesh receive callback but don't have a coap resource to pass to the call as I would have if I were to do this from a coap handler.

So... my question is....

  1. Do I need to worry about which task/ thread I am calling coap_resource_notify_observers from?
  2. What is the best way to handle resource situation?
  3. In the server example, the put handler calls coap_resource_notify_observers at the beginning of the function but I am unclear how in my scenario (given I am not in a coap handler) when / how I would add my data to a response for it then to be sent on.

Thanks for your help

Lee.

@mrdeep1
Collaborator

mrdeep1 commented Mar 16, 2019

I likewise have not had any time to play with the QEMU stuff.

Do I need to worry about which task/ thread I am calling coap_resource_notify_observers from?

Yes - libcoap is single threaded with no multi-thread protection, so it can only be called by the coap application.

What is the best way to handle resource situation?

It can only be done in the coap application. I know nothing about the esp-idf environment, but given that you added an xQueue to pass the data from ESPNow to ESPMesh, is there any reason why you cannot do the same with the coap application?

In the server example, the put handler calls coap_resource_notify_observers at the beginning of the function but I am unclear how in my scenario (given I am not in a coap handler) when / how I would add my data to a response for it then to be sent on.

You could check in the while() loop that contains the coap_run_once() whether there is any new data, and if so, perhaps save it away somewhere and then call coap_resource_notify_observers().
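
That suggestion might look roughly like the following host-side sketch. The ring buffer stands in for the FreeRTOS queue and `notify_observers_stub()` stands in for coap_resource_notify_observers(); all names in the block (`reading_queue_t`, `queue_push`, `queue_pop`, `resource_state_t`, `coap_loop_iteration`) are hypothetical. A plain ring buffer like this is not safe across threads on its own, which is why a real FreeRTOS queue would be used between the tasks.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_LEN 8

typedef struct {
    int    items[QUEUE_LEN];
    size_t head;   /* next slot to read */
    size_t count;  /* number of queued items */
} reading_queue_t;

/* Producer side (e.g. the mesh receive callback). Returns 0 if full. */
int queue_push(reading_queue_t *q, int value)
{
    assert(q != NULL);
    if (q->count == QUEUE_LEN) {
        return 0;
    }
    q->items[(q->head + q->count) % QUEUE_LEN] = value;
    q->count++;
    return 1;
}

/* Consumer side. Returns 0 if the queue is empty. */
int queue_pop(reading_queue_t *q, int *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return 0;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return 1;
}

typedef struct {
    int          latest;         /* value the GET handler would report */
    unsigned int notifications;  /* how many times observers were notified */
} resource_state_t;

/* Stand-in for coap_resource_notify_observers(). */
void notify_observers_stub(resource_state_t *res)
{
    res->notifications++;
}

/* One pass of the CoAP task's loop body, run on the CoAP thread only:
 * drain the queue, keep the newest value, and notify once if any queued
 * value differed from the stored one. In the real loop this would sit
 * next to the coap_run_once() call. */
void coap_loop_iteration(reading_queue_t *q, resource_state_t *res)
{
    int value;
    int changed = 0;

    assert(res != NULL);
    while (queue_pop(q, &value)) {
        if (value != res->latest) {
            res->latest = value;
            changed = 1;
        }
    }
    if (changed) {
        notify_observers_stub(res);
    }
}
```

Keeping every libcoap call inside the loop iteration means the producer task never touches libcoap state directly.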

@leenowell
Author

Thanks for getting back to me. All the above (i.e. the mesh, espnow and coap) is in the same application. When you refer to "coap application" above, do you mean the thread that is running the run once loop? If so, sounds like xQueue is the answer.

I wasn't sure what you meant by your last answer. If I use an xQueue, I assume my mesh receive will add the message to the queue and the runonce coap loop will look for items on the queue, add the data to the userdata on the resource and then call to notify the observers. Is this what you were suggesting?

@mrdeep1
Collaborator

mrdeep1 commented Mar 16, 2019

Yes - the coap specific code can only be on a single thread - that is what I was referring to as the "coap application" - sorry about the confusion.

I wasn't sure what you meant by your last answer. If I use an xQueue, I assume my mesh receive will add the message to the queue and the runonce coap loop will look for items on the queue, add the data to the userdata on the resource and then call to notify the observers. Is this what you were suggesting?

Yes - call coap_resource_notify_observers() only whenever there is a change.
However, https://tools.ietf.org/html/rfc7641#section-3.3.1 and https://tools.ietf.org/html/rfc7641#section-4.3.1 indicate that the server should set the Max-Age option and should send a notification to observers when the Max-Age expires; otherwise the client should separately request an update when the Max-Age expires.
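
Put as a decision rule, a notification goes out either because the value changed or because the previously sent representation has reached its Max-Age. A minimal sketch of that check, with hypothetical names (`observe_state_t`, `should_notify`) and times in seconds taken from any monotonic clock:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t last_sent;  /* time the last notification was sent */
    uint32_t max_age;    /* Max-Age value sent with it, in seconds */
} observe_state_t;

/* Returns 1 if observers should be notified at time `now`. */
int should_notify(const observe_state_t *st, uint32_t now, int changed)
{
    assert(st != NULL);
    if (changed) {
        return 1;
    }
    /* Unsigned subtraction stays correct across counter wrap-around. */
    return (uint32_t)(now - st->last_sent) >= st->max_age;
}
```

In practice a server may prefer to refresh slightly before the Max-Age runs out, so that clients never hold a stale representation.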

@leenowell
Author

I have implemented it with xQueue between the threads and seems to work fine. Thanks for your help. Please ping me if you get anywhere with QEMU and I will do the same.

@mrdeep1
Collaborator

mrdeep1 commented Apr 24, 2019

@leenowell Can this be closed now?

@obgm
Owner

obgm commented Sep 20, 2019

Closing as the issue seems to be solved.

@obgm obgm closed this as completed Sep 20, 2019