-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async GET for Observe resource causes core dump #306
Comments
Handlers for observable resources may be called with the |
Thanks @obgm for getting back to me. I understand your comments about the request being NULL and it makes sense to me. Unfortunately, I didn't quite follow the second part of your answer - sorry. Are you saying that on the initial get of an observed resource I need to store the request such that then if the handler is called with a NULL (i.e. the resource has been updated) I can add the saved request to the call? If I am reading the last sentence correctly, do you mean that I shouldn't call coap_register_async() for the notification anyway? If so, what should I do? I guess maybe the bit I am missing is without the request, how does it know where to send the response to. Sorry if I am being dozy! Do you have an example of this scenario by any chance? thanks again Lee. |
The async code is primarily to test out the asynchronous responses inherent within the CoAP protocol. It is unclear as to what you are expecting to get sent back from the coap-server with an observe triggered response. When the async process completes (delayed by the defined amount of time), it simply sends back a "done" message. For example for
With the current hnd_get_async() in coap-server function, if called by an observe trigger (happens once per sec with current code, assuming the resource for hnd_get_async() is set as observeable), then as there is not a request from the client per-se, the request variable will be NULL. However, response will have been set up and you would be expected to populate it at a minimum with the token and a response code before returning from the function. Data may be useful to add in, but this data needs to be defined by you based on the current observe triggered response. For an observeable call to hnd_get_async() you should not set up another coap_register_async() as this async has already been registered. Then at some point in the future (based on the delay count), the "done" message will get sent. |
Thanks for the explanation. I think between the 2 the responses I think I see the gap in my understanding so thanks both for that. I had incorrectly assumed I needed the request to determine where to send the response to and the token. Also, I had assumed once you started with async all responses had to go the same way. I have now updated my get handler to do the following which seems to work (although for some reason the first 4 calls get a RST on the server). First call does the Asych response and then subsequent ones are updates when the resource is marked dirty.
My overall scenario is that I have a number of devices which are sending updates to this device using a different mechanism. What I would like to happen is that this node then sends these updates automatically to the observing clients. My logic was to send the initial GET and return the async ack from the server. Then as the new messages come in, I flag the resource as dirty and then it sends the updates to the observing clients. Having said this, having now got the hard coded message sent back OK, I am now unclear how I would get the data into the handler to then add to the response message. I could use a queue or something but unclear when to take the message off the queue. So have a few questions
I have seen some detailed documentation on the web describing the protocol and the API guide for libcoap but was wondering whether there was any other broader documentation containing the sort of information you both have provided above. Thanks again for your help Lee. |
I guess that I am confused by the parameters you have defined for async_handler() - it looks like you are not using the latest master / develop / 4.2.0 code. Any callback handler for a resource request will always have the response PDU already initialized (whether triggered by an actual GET (by handle_request()) or Observe trigger (by coap_notify_observers())) and when the callback returns, the response PDU is either forwarded on or dropped (primarily on whether response->code is set or not for handle_request()). So, in your case, you should NOT be creating a new PDU and sending it - update response PDU as appropriate. I do not think using async is appropriate here - primarily set up for testing async delayed responses. I think that you should be modelling things based on hnd_get() in examples/coap-server.c which then returns the updated value to all of the observing clients. man coap_observe(3) may hep you here as well. |
Hi, Thanks for your reply - very helpful. I am running this on an ESP32 and the version of libcoap which is distributed with it is
When you say the response PDU is already initialised, does that mean that all the options etc. are automatically set and all I need to do is add the data? Also I don't need to explicitly call coap_send()? I guess for my initial GET I could just return a "no data" response synchronously and then go from there. Digging around a bit I found that I could add attributes to the resource using coap_add_attr() before setting it as dirty. Was thinking that I could use this to send the necessary data to the get handler to enable it to create the response (I will need to know the device ID and the data string sent from the sending device). Is this right technique? Finally, neither of the following works on my system
I believe I have followed the installation instructions using autogen.sh but this was a few weeks ago so may have missed something, Thanks once again for all your help it has made a big difference to my understanding, Lee. |
Your code version is old - I suggest that you go with the current develop / master / 4.2.0 code where a lot of things have been addressed that you may stumble into if possible. Your version does not have the man pages - I suggest you go to https://libcoap.net/doc/reference/4.2.0/manpage.html and look there.
Correct - all this is handled for you by the caller of the get handler callback. You just need, at a minimum, to set response->code. coap_add_attr() is the wrong thing to use - it is used for describing attributes for the resource which can be looked up by using (for example) How you store the date you send off to the observe clients is up to you - it could be a static variable that observer responses pick up and send, or it could be in a static list keyed by the resource name if there are to be multiple variables monitored etc. hnd_get() in examples/coap-server in 4.2.0 or later does a dynamic resource lookup to get the right data to send back to the client. The dynamic resource lists are set up elsewhere in the code. |
Thanks for this, the man pages will be a big help. I will upgrade tonight - hopefully it is compatible with the ESP-IDF I have :) The overall paradigm I am trying to replicate is sort of like observing a list of items such that if one updates it sends only that update to the observers. In this case, I would either need to send the update to the handler or the unique ID such that the handler can retrieve the data. I thought this scenario would be achieved using the resource name to get the list and then using query syntax to access individual items if needed. Is this incorrect? Clearly in my specific example there is additional complexity around there is no actual data store to retrieve the data from and the requests come the other way but the paradigm is still the same. Thanks again Lee. |
coap_resource_t (4.2.0) has
which can be used to hold whatever your changing data is. Then whenever this data is changed, you just need to call |
Ah perfect thanks... Unfortunately I am struggling to update the libcoap version that I am using as part of the ESP_IDF. Although I have dropped the new version into the same place as the old, for some reason the compiler can no longer see the header files. I have raised a question on their forum to see how to do this. Thanks once again for your help Lee. |
You may need to adjust the include paths, i.e. |
Thanks for this. Looks like component.mk didn't come with the new version of libcoap so have copied it across from the old one and adjusted the settings to get past the error. I now have got past most of them so hopefully a couple of more tweaks to go. Thanks Lee. |
Correct: |
Will do. Will take a look tonight. |
So have tried to get this working and am struggling. I took a clean version of the library, extracted it into the component/coap directory and renamed it to libcoap. I then did the following
I had to remove the –enable-tests option as it gave errors about CUnit not found and I couldn't resolve it. I think I have done the config wrong as I get a variety of issues so wonder whether I should be specifying the build parameter or cross compile maybe/.
I assume there is some conflict between the lib being built on Ubuntu and the project that uses is build for ESP32? |
Which version of libcoap do you use? There should not be any Which definition of coap_address_t do you think is being used? Your configure invocation suggests that you are building for POSIX but for ESP32 you will most likely want to have the LWiP port. You might want to take a look into |
So I went to code, selected develop, picked release-4.2.0 and downloaded the zip file. I then extracted that into the components directory that is part of the IDF structure and then did the above. So... I believe 4.2.0 but maybe I did something wrong? I wonder if there are legacy files from the old version somewhere although I did try an auto-generated --clean. For my situation, I was wondering whether I should be setting the build / cross compile options on the configure to tell it that I don't want the code generated for the host OS but for ESP - the fact that the #define to use syslog was pointing me in this line of thinking |
Sorry, I just noted that the esp-idf sets |
Ah ok will give it a try without config. From memory though before I run it I believe some of the header files didnt exist and we .h.in or something. The port directory was in the old version but not the new one. Does that mean I should copy that across too? Sorry not at my pc to check properly. |
Yes, you will need to copy/keep/adjust from
And then clone libcoap as subdirectory libcoap there as well. In addition to |
Hi, I have updated as you suggested but it seems like initially when I extract everything and do autogen the include files are in include/coap2 and notably no debug.h. I then try to make my coap server project and somehow it seems to generate a bunch of files in include/coap and notably a debug.h file. It seems that some of the libcoap header and source files #include debug.h. File list in coap is
File list in coap2 is
This is the output of "diff --brief" on the coap vs coap2 directory. Seems like a bunch of files are very different between the 2.
Adding just coap2 gives compile errors looking for debug.h. adding both coap2 and coap gives a load of conflicts. I assume we are not expecting the coap directory to be generated? Does debug.h get generated during the compile normally? Definitely one step closer Thanks Lee. |
No. |
I had a quick look last night. The issue (in part) is down to the git submodule logic in esp-idf pulling in the old version of libcoap and hence the ongoing confusion. I am looking at how easy it is to bring esp-idf up to the 4.2.0 standard - I could have something ready by Monday. The biggest change will be to port/coap_io_socket.c, as well as /examples/protocols/coap_clent and /examples/protocols/coap_server. |
In the long run this would require mbedTLS integration for libcoap :-) |
.. a step at a time ! |
Ah thanks.... I have some time tonight to look at this so if there is anything you want me to check / do / validate, please let me know. |
@leenowell If you go to I have build coap_client and coap_server esp-idf examples, but have not been able to run the code on my build system. If this works for you, I can then create a PR for esp-idf. |
Unfortunately, I fell at the first hurdle.
I tried and got
After a bit of googling I then tried this
Still nothing in the directory.... wonder if I have now goosed my environment :) Any ideas? thanks |
ok - I have set logging to 9 and got it invoking gdb stub..... Output is....
where is....
thread apply all where
|
So have had a look at the latest one that triggered gdb and must say I have absolutely no idea what this file is doing so I'm no help at all I'm afraid. |
My educated guess is that by the time gdb was triggered, everything had simply exited.
I suggest that you put in this change again to see what happens in components/coap/port/include/coap_config_posix.h
Beyond that, without an ESP environment I am not going to be able to help much more |
I have applied that fix and looks like we have some progress. It has detected a stack overflow....
|
Looking at this further. Seems a load of threads on xQueueGenericReceive wonder if there is a recursive loop or something gone a bit wild and hence causing stack overflow? |
There needs to be some proper debugging of what is going on - stepping through the code to see what is breaking there. I do not have a ESP environment to do this against. For example, coap_example_task does not appear in your gdb output. However https://github.com/Ebiroll/qemu_esp32 may be of help here - I need the rom.bin and rom1.bin file as described in the README.md to see if I can use this environment.
I do not think you need to do Step 1. but...... These files are needed for me to do Step 6. These rom*.bin files will be too large to add this issue. Please email me them as zip'd attachments. |
OK will take a look at this tomorrow and send them over. Depending where you are based, I could potentially loan you an ESP32 but do you have a JTAG adapter to connect it to? I don't have one and without it I don't believe you can do proper debugging and are essentially left with what we have already been doing. |
Have just emailed the files. Please let me know if you don;t get them and will resend. |
@leenowell @mrdeep1 I have verified PR. The stack size of |
I tried updating
Seems to do the trick :). The log is now....
Does this mean we have got to the bottom of the issue? Thanks both for all your help :). Could you let me know when to take a clean pull and fix for the server too and I will be able to test the together later. |
@jitin17 Many thanks for looking into this - it was driving me nuts! @leenowell Before I make the changes and do another push of the code, please try removing
Then make the same stack size change to coap_server and see if that now works as well. |
Yes huge shout out to @mrdeep1. An incredibly tough nut to crack with multiple underlying problems and to make it more difficult no device to test it on :) Thanks for your tenacity. :) I am out all day today so won't be able to run the test until this evening but . Will try and talk someone else through it so I can give you the update earlier. Out of interest, what did qemu_esp32 enable you to do? |
I have pushed the code with the stack size changed to 10240 for both coap_client and coap_server for testing. The use of qemu_esp32 etc. in theory gives me an ESP emulation environment that I can test things out against from my Linux environment. |
Ok thanks. So all I need to do is take a fresh pull of your ESP-IDF branch again and compile both client and server and test? Did qemu_esp32 work? I have been looking for something like that to ease the build test cycle also means I can develop on the go. Wonder if it enables proper debugging on Linux. |
Yes, a fresh pull and git checkouts. qemu_esp32 is a work in progress - I have not had a chance to try out your .bin files to find out what the next hurdle is. |
Assuming we have sorted the other issue I'll take a look tonight too and see whether I can get it working. |
Hi, Well... looks like we have a breakthrough :). Both client and server compile and run fine. I have tested the client against the default URI and the server URI and both work fine. For the server, the client receives "no data" which looking at the code looks correct. Only thing to say is that the client seems to fire requests off very quickly so could do with a pause between each get. So.... where does this leave us? Was the ultimate root cause the stack size issue or are there other changes needed to libcoap to support ESP? I seem to recall you mentioning that ESP may not be supporting different fields on the recv and also we disabled logging etc. What is the best way for me to revert back to the normal ESP_IDF branch and also get the necessary fixes? Will have a look at QEMU now.... |
Excellent news There are several things needed here
No idea as to why the client repeats firing - I will double check, but I am sure that this is done the same way in the original code. Otherwise. I will work on getting these changes into ESP-IDF. To revert back , you just need to git clone the current esp-idf repository (espressif, not mrdeep1 in hte path). But then you will not be able to use libcoap 4.2.0 - all the fixes are in my copy of the repository under branch libcoap-4.2.0. |
The original client will fire off many times as far as I can tell from reading the code until it is terminated. I am going to be including the following change in the code I push, so it only fires off once.
|
So for the moment, should I continue with your branch so I can use 4.2.0? In terms of the client example, the repeated firing I think is useful. If you added a delay of a couple of seconds that means that you can see the 2 communicating and not miss everything because it is too quick. In terms of QEMU - I am having a few issues with it. I suspect the .bin files I sent you are not correct. Also, how you would add this to your own project (e.g. the coap_client) is unclear. If you are planning on trying to get it working, let me know and I will send you some fresh .bin files and send you lessons learned so far :) Thanks for all your help on the other issue. |
You are welcome to put in a sleep(2) if you want. Current pushed code has the code above in it. In terms of the QEMU stuff - I was going in that direction to see if I could debug what is going on. The .bin files could be wrong - but I was not able to get things going with the ones you sent me. I don't plan to spend much time on it at present. That said, it would be could to have a fresh copy of the .bin files and see what you learnt (send it by email). |
OK - let me try and get the coap_client running in QEMU and I will send you the working bin files and how I got it to work - assuming I get there! |
Hi, I have tried to get QEMU working and to be honest am struggling so have raised an question on that forum. So.... have reverted my attention back to the original project above. In my project, I have an ESP32 which receives data on ESPNow forwards it to ESPMesh and ultimately I need to get these updates to the end client via an observable get. In esp-idf, ESPNow and ESPMesh run on separate tasks/ thread so I have had to bridge between the two with an xQueue. Based on my current plan, I would need to call So... my question is....
Thanks for your help Lee. |
I likewise have not had any time to play with the QEMU stuff.
Yes - libcoap is only single threaded with no multi-thread protection, so can only be called by the coap application
It can only be done in the coap application. I know nothing about the esp-idf environment, but given that you added in a xQueue to send the data to ESPnow to ESPMesh, is there any reason why you cannot do the same with the coap application?
You could check in the while() loop that contains the coap_run_once() whether there is any new data, and if so (perhaps save it away somewhere) and then call coap_resource_notify_observers() |
Thanks for getting back to me. All the above (i.e. the mesh, espnow and coap ) is in the same application. When you refer to "coap application'" above do you mean the thread that is running the run once loop? If so, sounds like xQueue is the answer. I wasn't sure what you mean but you last answer. If I use an xQueue, I assume my mesh receive will add the message to the queue and the runonce coap loop will look for items on the queue add the data to the userdata on the resource and then call to notify the observers. Is this what you were suggesting? |
Yes - the coap specific code can only be on a single thread - that is what I was referring to as the "coap application" - sorry about the confusion.
Yes - call coap_resource_notify_observers() only whenever there is a change. |
I have implemented it with xQueue between the threads and seems to work fine. Thanks for your help. Please ping me if you get anywhere with QEMU and I will do the same. |
@leenowell Can this be closed now? |
Closing as the issue seems to be solved. |
Hi All,
I modified the example client and server "Hello World" example to make the async GET resource"Observable". The initial GET works fine but then having set the resource as dirty and called coap_check_notify() it core dumps. It looks like the reason is that handler receives a NULL request pointer and the call to coap_register_async calls coap_transaction_id which ultimately derefences the NULL pointer.
I have tried checking for NULL request and artificially creating a request PDU to send to the coap_register_async call. This prevents the core dump but the client then rejects it (I assume because the id is unknown to it). On the server I get the following message "ALRT got RST for message X" and no errors on the client. I have tried the following and none work
Using this technique, the client receives the initial response then 12 updates and nothing else received.
The async handler is (sorry not sure why the code won't render correctly)
`static void
async_handler(coap_context_t *ctx, struct coap_resource_t *resource,
const coap_endpoint_t *local_interface, coap_address_t *peer,
coap_pdu_t *request, str *token, coap_pdu_t *response)
{
if (request == NULL)
{
ESP_LOGI(TAG, "Request is NULL so creating one ID[%d]", nRequestID);
}
`
Is this a bug or am I doing something wrong?
thanks
Lee.
The text was updated successfully, but these errors were encountered: