Crash (heap corruption) when rapidly refreshing the web page #394
Some stack traces:
|
Firefox 61.0.1 x64 doesn't cause crashes, whereas Chrome 68.0.3440.84 x64 does. |
The first and last seem to be caused by freeing something that has already been freed, and the second one looks similar. This could be related to the “Connection: close” header for web sockets |
Could you please give more details? Can I fix it as a user of the library, or is it something you need to fix? There is no active websocket in the example, by the way, although the browser tries to connect to it. I will try to remove it from the web page code and report back. |
No difference after removing the websocket code from the stored web page. I've noticed that "Connection: close" is sent with every response, not just web sockets. Is that a problem? |
This looks like a duplicate of #324, but I'm not sure. |
Pull the latest changes for AsyncTCP and the WebServer. I just pushed a probable fix for all of those errors. |
It looks a bit better, but it still throws the exception from time to time:
Decoded
One thing I noticed: after it crashed once, I can't get it to crash again. Am I seeing things? |
It's just really hard to crash it. I know why it happens, but I don't have a final fix yet. If you use the server normally, all is fine. It crashes when you refresh the browser while the page is still loading (that closes connections prematurely and opens new ones). |
It seems that it works better than before. |
After 3 hours of uptime, it crashed: `Backtrace: 0x4008888c:0x3ffd9e70 0x4008898b:0x3ffd9e90 0x400e8e53:0x3ffd9eb0 0x400885b5:0x3ffd9ee0 0x40084306:0x3ffd9f00 0x40084889:0x3ffd9f20 0x4000bec7:0x3ffd9f40 0x40105ff1:0x3ffd9f60 0x401060b3:0x3ffd9f80 0x400de7cd:0x3ffd9fa0 0x40102675:0x3ffd9fc0 Rebooting...` |
Please decode any backtrace that you post ;) I am still hunting for the causes of the bad heap... I am getting to the point where they make no sense. I am checking everything for access to already freed resources (that is usually what causes the above), but so far I'm coming up empty. There are a few fixes to both AsyncTCP and the WebServer; make sure you have them all :) |
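The kind of bug being hunted here, a deferred callback touching a connection that the client has already aborted, can be modeled in a few lines of standard C++. This is a hypothetical sketch, not AsyncTCP's code: `Connection`, `Loop`, and `deferWrite` are illustrative names. The callback holds a `std::weak_ptr`, so it becomes a no-op once the connection has been destroyed.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical connection object. In the real stack the equivalent object is
// freed when the client aborts, e.g. when the browser refreshes mid-load.
struct Connection {
    std::string sent;
    void write(const std::string& s) { sent += s; }
};

// Hypothetical event loop that runs deferred callbacks later.
struct Loop {
    std::vector<std::function<void()>> pending;
    void post(std::function<void()> fn) { pending.push_back(std::move(fn)); }
    void runAll() {
        for (auto& fn : pending) fn();
        pending.clear();
    }
};

// Defers a write without extending the connection's lifetime. Capturing a
// raw Connection* instead would be a use-after-free if the connection were
// destroyed before the loop ran the callback.
void deferWrite(Loop& loop, const std::shared_ptr<Connection>& conn,
                std::string data, int& dropped) {
    std::weak_ptr<Connection> weak = conn;
    loop.post([weak, data = std::move(data), &dropped] {
        if (auto c = weak.lock()) {
            c->write(data);  // still alive: safe to use
        } else {
            ++dropped;       // already destroyed: skip instead of crashing
        }
    });
}
```

Capturing a `std::shared_ptr` instead would also prevent the crash, but it keeps the connection object alive until every pending callback has run.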
I don't know if it helps you that we keep posting more and more exceptions, but here you go:
Decoded
|
Is it possible that the heap was already silently corrupted earlier, so the backtrace doesn't show the root cause? |
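A toy model of why that can happen. Heap poisoning of the kind that prints `Bad head ... Expected 0xabba1234` checks a guard word (canary) in a block's header when the block is freed, so the crash is reported at the free (one of the traces in this thread ends in `function: multi_heap_free`), not where the out-of-bounds write happened. The layout below is a hypothetical simplification in standard C++; only the canary value is taken from those messages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy heap with two fixed blocks laid out back to back in one byte array:
// [canary0][payload0][canary1][payload1]. Illustrative layout only.
constexpr std::uint32_t kHeadCanary = 0xABBA1234;
constexpr std::size_t kHeader = sizeof(std::uint32_t);
constexpr std::size_t kPayload = 8;
constexpr std::size_t kBlock = kHeader + kPayload;

struct ToyHeap {
    unsigned char mem[2 * kBlock];

    ToyHeap() {
        std::memset(mem, 0, sizeof mem);
        for (std::size_t i = 0; i < 2; ++i)
            std::memcpy(mem + i * kBlock, &kHeadCanary, kHeader);
    }

    unsigned char* payload(std::size_t i) { return mem + i * kBlock + kHeader; }

    // In this toy, the canary is only checked here, when the block is freed.
    bool freeOk(std::size_t i) const {
        std::uint32_t head;
        std::memcpy(&head, mem + i * kBlock, kHeader);
        return head == kHeadCanary;
    }
};

// Buggy writer: no bounds check, so len > kPayload spills from block 0 into
// block 1's header. Nothing fails at this point.
void overflowingWrite(ToyHeap& h, const unsigned char* src, std::size_t len) {
    std::memcpy(h.payload(0), src, len);
}
```

In a real program the overflowing write and the later free can be far apart in time and in unrelated code paths, which is why a decoded backtrace may point at innocent code.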
I went back and forth and covered everything I can come up with. Given that exceptions happen even when running normal web server code (no web sockets or server-sent events) and that the server runs in a single thread... I am totally out of ideas. I double-checked, triple-checked... nothing is freed twice or accessed after free, so why the corruption? I poked @igrr for ideas (no response yet)... no idea where to go next. I have traced, JTAGed, and done everything I can come up with so far to figure out what chain of events might cause this... nothing. BTW, when I use the server normally (no crazy reloads or the like), everything is fine. I can see how there can be some issues with the web sockets, and I will patch those down the line, but in the web server itself... I see nothing. |
BTW, I could not crash the server with pages served from flash (PROGMEM, not SPIFFS). |
Thanks, that's an interesting idea! I will check and report back. Maybe it's SPIFFS that corrupts the heap, and the web server is just the poor victim.
I communicate with the server through a page rendered as a Chrome app.
@bbasil2012 is this with the latest AsyncTCP and web server libraries? |
@me-no-dev Yes |
I've found the following paragraph at https://arduino-esp8266.readthedocs.io/en/latest/faq/a02-my-esp-crashes.html
Wait, so memory corruption can happen simply by staying too long in an asynchronous callback?? Or does this apply only to ESP8266, not to ESP32? |
Folks (I am totally new to GitHub, so bear with me; I'm willing to learn), I am getting a similar (the same?) issue with almost unmodified example code.
Some of my exceptions:
(this one is rare; the others are the usual ones)
|
One additional note: sometimes I get different behavior.
|
I also got many errors while serving a page with many resources from SPIFFS and rapidly refreshing via AJAX:
Most of the errors went away when I uncommented `#define DEBUGF(...) Serial.printf(__VA_ARGS__)`, but I still have a problem uploading multipart data with an HTTP_POST request (I only get the first three parts of the data, then it fails).
The problem was that I sent a 200 for every chunk of data. It seems to work only if the 200 is sent once, at the end of the data. |
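A sketch of that fix in standard C++. It is modeled loosely on the library's upload callback, which is handed each chunk together with its `index` (byte offset), `len`, and a `final` flag; `UploadState` and `onUploadChunk` are hypothetical names, and the returned status stands in for actually sending the response. Every chunk is consumed, but a status is produced exactly once, for the final chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-upload state. On the device this would wrap the flash
// writer (e.g. Update.write) instead of an in-memory vector.
struct UploadState {
    std::vector<std::uint8_t> received;
    bool failed = false;
};

// Called once per chunk; `index` is the byte offset of this chunk.
// Returns an HTTP status only for the final chunk. For every earlier chunk
// it returns std::nullopt, meaning "do not send a response yet".
std::optional<int> onUploadChunk(UploadState& st, std::size_t index,
                                 const std::uint8_t* data, std::size_t len,
                                 bool final) {
    if (index != st.received.size()) st.failed = true;  // unexpected offset
    st.received.insert(st.received.end(), data, data + len);
    if (!final) return std::nullopt;
    return st.failed ? 500 : 200;
}
```

On the device the same rule applies: consume every chunk in the upload handler, and send the single response only after the last one.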
Simple .bin upload: `Update.write` uploads only three parts, then crashes... Part of the code: `server.on("/update", HTTP_POST, [](AsyncWebServerRequest *request){ Update.write(data, len)` Debug log: `rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT) Backtrace: 0x4008f9c4:0x3ffc9b80 0x4008fb99:0x3ffc9ba0 0x400e05a9:0x3ffc9bc0 0x400819fb:0x3ffc9be0 0x400d8605:0x3ffc9c00 0x400d817c:0x3ffc9c20 0x400d8361:0x3ffc9c40 0x400d1d25:0x3ffc9c60 0x4014d0c3:0x3ffc9c90 0x4014cfea:0x3ffc9ce0 0x400d481a:0x3ffc9d10 0x400d5bae:0x3ffc9d70 0x400d5d79:0x3ffc9dc0 0x400d3739:0x3ffc9de0 0x400d3782:0x3ffc9e20 0x400d3a22:0x3ffc9e40 0x4008ba0d:0x3ffc9e70` Exception Decoder NOTE: |
But where did you put the `SPIFFS.format()`?
Here, |
@me21 did you find a resolution for this error? I am having a similar failure that I am trying to track down. From my main page, if I click a link to another page, return, and repeat this several times, it crashes and reboots with: `CORRUPT HEAP: Bad head at 0x3ffb8768. Expected 0xabba1234 got 0x3ffb8940 Backtrace: 0x4008d540:0x3ffbb360 0x4008d771:0x3ffbb380 0x400fd93f:0x3ffbb3a0 0x4008d19d:0x3ffbb3d0 0x400858f2:0x3ffbb3f0 0x40086f1d:0x3ffbb410 0x4000bec7:0x3ffbb430 0x40136503:0x3ffbb450 0x40136563:0x3ffbb470 0x401365ab:0x3ffbb490 0x40138cca:0x3ffbb4b0 0x40138d32:0x3ffbb4d0 0x400edcd9:0x3ffbb4f0 0x40134edc:0x3ffbb510 0x40089491:0x3ffbb540` Decoding stack results |
It appears my issue may be solved. I'll try my best to explain what was happening. My program had a main page that used XMLHttpRequest to load data into it dynamically on a 1-second loop. This calls another page that fetches the info in the background and posts it to the page you are viewing. The main page had links to other pages. If I switched to one of these other pages and back several times, I would eventually trigger the error. What I think was happening: I clicked a link at the same moment the function was requesting new data, so it was switching to the new page while I was also getting the response from the background page, and that fouled things up. What seems to have fixed it is a flag: my function sends the response with the new data to the main page only if the flag is not set. Each of the other pages sets the flag as soon as it loads, so the response from the background page is not run. |
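A minimal sketch of that flag, written as standard C++ with hypothetical names (`PageState`, `onMainPageLoad`, `onNavigate`, `onBackgroundData`). The comment doesn't say where the flag is cleared; this sketch assumes it is cleared when the main page loads again.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical state shared by the page handlers.
struct PageState {
    std::atomic<bool> leftMainPage{false};
    std::string lastDelivered;

    // Main page (re)loaded: background updates are wanted again (assumption).
    void onMainPageLoad() { leftMainPage = false; }

    // Any other page loaded: set the flag immediately.
    void onNavigate() { leftMainPage = true; }

    // Background data arrived; deliver it only while the flag is not set.
    bool onBackgroundData(const std::string& data) {
        if (leftMainPage) return false;  // stale response: drop it
        lastDelivered = data;
        return true;
    }
};
```

The flag narrows the window in which a late background response meets a page switch; it does not by itself fix whatever the library does wrong in that situation.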
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
I'm seeing a similar error. I am working on characterising it better. If I repeatedly load a page with Edge, I get heap corruption; if I do the same thing with Chrome, it is robust and does not generate the error. |
[STALE_CLR] This issue has been removed from the stale queue. Please ensure activity to keep it open in the future. |
up |
My issue seems to be a conflict with async requests to an NTP server. Hard to reproduce in a useful way. |
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
The stale bot is not useful... it degrades development... please remove this feature from this repository... |
[STALE_CLR] This issue has been removed from the stale queue. Please ensure activity to keep it open in the future. |
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions. |
This should be reopened |
indeed |
Same problem: `ELF file SHA256: 0000000000000000 Backtrace: 0x4000c29b:0x3ffd7430 0x400dab22:0x3ffd7440 0x400dad25:0x3ffd74d0 0x4018a6be:0x3ffd7520 0x400d6f11:0x3ffd7540 0x4017590b:0x3ffd7560 0x40175a74:0x3ffd7590 0x401760c5:0x3ffd75b0 0x4008a30e:0x3ffd75e0 Rebooting... rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT) PC: 0x4000c29b` Decoding stack results |
Hi there, same here. I haven't found a final solution, but... a Chrome update (v54) and Firefox remove the ability to make synchronous XMLHttpRequest calls.
@kalifsoup You can still do synchronous HTTP calls, but you need to approach it differently...
This approach allows at most one request to be sent at a time; a similar approach can be used to load various content in the background on demand. |
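The gating idea behind this approach, modeled in C++ for illustration (the commenter's own code presumably ran in the browser; `SingleFlightPoller`, `tick`, and `complete` are hypothetical names): a timer tick starts a new request only when the previous one has finished, so at most one request is outstanding at a time.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical poller: each timer tick may start a request, but only when
// no earlier request is still outstanding.
class SingleFlightPoller {
public:
    explicit SingleFlightPoller(std::function<void()> startRequest)
        : start_(std::move(startRequest)) {}

    // Called by the periodic timer (e.g. once per second).
    // Returns true if a new request was started.
    bool tick() {
        if (inFlight_) return false;  // previous request still pending: skip
        inFlight_ = true;
        start_();
        return true;
    }

    // Called when the response, an error, or a timeout arrives.
    void complete() { inFlight_ = false; }

private:
    std::function<void()> start_;
    bool inFlight_ = false;
};
```

Clearing the flag on errors and timeouts, not only on success, matters here; otherwise a single lost response stalls polling for good.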
I know I'm pulling this thread up from the grave, but I've been trying to resolve similar issues of my own and have come up with a novel solution. Ultimately, I was unable to resolve seemingly random heap corruption issues inside the async callback... so I gave up trying. Instead, I changed my approach entirely: I now use the async callback only to place the request data from the client into a custom queue, which is processed by my own thread. This gave me extremely reliable performance, and more flexibility in how I handle async requests from the client via a Web Socket. Anyway, this leads me to suspect the likely cause of the issue: the async task's heap size is insufficient! I believe the heap allocated to the async task is simply too small for what a lot of us are doing inside that callback (especially parsing string data, or serialising to and from JSON). Why do I suspect this? If I set too low a heap size on my own threads, I get the same behaviour; increase the heap size, and all is good. You can actually test this in your own callbacks by calling: Now, I'll be releasing a library shortly that sits on top of this one and provides the queuing and queue-handling behaviour. I'll come back and share the link once it's ready for human consumption (I first have to fully separate my solution from my program and prepare it as a separate library). Hope it helps :) |
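A minimal sketch of that hand-off pattern, using `std::thread`, `std::mutex`, and `std::condition_variable` as stand-ins for a FreeRTOS task and queue (on the ESP32 one would more likely use `xQueueSend`/`xQueueReceive` and a task created with `xTaskCreate`, whose size argument is the task's stack size). `MessageQueue`, `onMessage`, and `runWorker` are hypothetical names; this is not the library announced in the comment.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Thread-safe FIFO: the network callback pushes, a worker thread pops.
class MessageQueue {
public:
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(msg));
        }
        cv_.notify_one();
    }

    // Blocks until a message is available or close() has been called.
    // Returns false once the queue is closed and fully drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool closed_ = false;
};

// What the async callback does: copy the payload, enqueue it, return.
void onMessage(MessageQueue& q, const char* data, std::size_t len) {
    q.push(std::string(data, len));
}

// Worker: all parsing / JSON work happens here, on its own thread.
std::vector<std::string> runWorker(MessageQueue& q) {
    std::vector<std::string> handled;
    std::string msg;
    while (q.pop(msg)) handled.push_back("handled:" + msg);
    return handled;
}
```

The callback's only work is one copy and one push, so it returns quickly no matter how expensive the processing turns out to be.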
@LK-Simon Sounds great! Thanks for taking your time to investigate and releasing a solution! ❤️ |
I don't want to be a jackass, but there is already a stable library with almost exactly the same syntax as this one, without all the Async TCP. It is written on top of ESP-IDF and it is stable AF. It's called PsychicHTTP. I am using it so far, with no crashes at all. It has everything that AsyncTCP has and more. |
Well, it is, but for those of us who use both ESP32 and ESP8266 and want to write firmware for both SoCs, PsychicHttp, with no ESP8266 support, is unfortunately useless so far IMHO. |
This is true. It does not support the 8266 for now. If more people contributed, it could get there. |
I just want to reiterate a point here: We should NOT be processing requests INSIDE the callback! The reasons:
And there's a third consideration specifically with ESPAsyncWebServer's Web Sockets...
We're supposed to use the callback to ensure that a message is complete, then pass the (unprocessed) message off to our own processing queue, to be handled either by the idle task (thread) or by our own separate, appropriately allocated task (thread). Side note: I'm saying "task (thread)" because in this context the two terms are used interchangeably. I'm packaging up my complete solution for this into a library others can consume (lightweight and dev-friendly)... but the processing part can be applied to literally anything! It's basically an "Event Engine" (the same as I've written for Pascal/Delphi, Windows/Mac/Linux C/C++, Objective-C, Swift, Python, Lua, and even Java in a former life)... and the inbound messages from clients on the socket get handed off to the Event Dispatcher, which then (on its own thread) hands them off to the appropriate "Event Listeners" (which can be either a simple callback on the idle thread or separate threads descending from my "Event Thread" type). Ultimately, this means we return from the Web Socket callback method very (very) quickly, and the heap requirements don't exceed the max heap allocation for that task (thread), because we don't copy or allocate any strings on the heap (which is the most common cause of the problems you've all reported in this issue). As for this PsychicHttp library, I'll take a look... I'm always happy to help flesh out compatibility with other hardware where I can :) |
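The dispatcher/listener arrangement described above can be sketched like this. It is a hypothetical, simplified illustration, not the commenter's Event Engine: `Event`, `EventDispatcher`, `subscribe`, `post`, and `drain` are illustrative names. The socket callback is assumed to call only `post`, and `drain` is meant to be called from whichever task does the processing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct Event {
    std::string type;     // e.g. "ws:text" (hypothetical event names)
    std::string payload;
};

class EventDispatcher {
public:
    using Listener = std::function<void(const Event&)>;

    void subscribe(const std::string& type, Listener fn) {
        std::lock_guard<std::mutex> lock(m_);
        listeners_[type].push_back(std::move(fn));
    }

    // Cheap: called from the socket callback; it only records the event.
    void post(Event e) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(std::move(e));
    }

    // Called from the processing task. Listeners run here, outside the lock,
    // on a snapshot so they may subscribe or post while being called.
    std::size_t drain() {
        std::vector<Event> batch;
        std::map<std::string, std::vector<Listener>> snapshot;
        {
            std::lock_guard<std::mutex> lock(m_);
            batch.swap(pending_);
            snapshot = listeners_;
        }
        std::size_t delivered = 0;
        for (const Event& e : batch) {
            auto it = snapshot.find(e.type);
            if (it == snapshot.end()) continue;  // nobody listening
            for (const Listener& fn : it->second) {
                fn(e);
                ++delivered;
            }
        }
        return delivered;
    }

private:
    std::mutex m_;
    std::map<std::string, std::vector<Listener>> listeners_;
    std::vector<Event> pending_;
};
```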
I want to add one more observation: if the client sends a string that reaches the maximum packet size of the ESPAsyncWebServer library's Web Socket classes, even attempting to write that string out to the Serial Monitor will exceed the maximum allocated heap for the callback's executing task (thread). This is actually a bug in this library that needs to be resolved: the allocation available should definitely not be lower than the de facto allocation required for the maximum packet size a client is permitted to send! |
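An application-side mitigation in that spirit, sketched under assumptions: reserve storage once for the largest message you are willing to accept, copy each incoming message into a free slot, and reject anything longer, so the callback never allocates per message. `kMaxMessage`, `kSlots`, `Buffer`, and `MessageSlots` are hypothetical; the limit should match whatever maximum your WebSocket setup actually permits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>

constexpr std::size_t kMaxMessage = 256;  // hypothetical per-message limit
constexpr std::size_t kSlots = 4;         // hypothetical queue depth

using Buffer = std::array<char, kMaxMessage>;

// Fixed-capacity FIFO of message slots. All storage is inside the object,
// so nothing is allocated per message. Keep the instance in static storage
// rather than on a task's stack.
class MessageSlots {
public:
    // Callback side: copy into a free slot or reject; never allocates.
    bool tryPush(const char* data, std::size_t len) {
        std::lock_guard<std::mutex> lock(m_);
        if (len > kMaxMessage || count_ == kSlots) return false;
        Slot& s = slots_[(head_ + count_) % kSlots];
        std::memcpy(s.bytes.data(), data, len);
        s.len = len;
        ++count_;
        return true;
    }

    // Processing side: copy the oldest message out; false if empty.
    bool tryPop(Buffer& out, std::size_t& len) {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) return false;
        const Slot& s = slots_[head_];
        std::memcpy(out.data(), s.bytes.data(), s.len);
        len = s.len;
        head_ = (head_ + 1) % kSlots;
        --count_;
        return true;
    }

private:
    struct Slot {
        Buffer bytes{};
        std::size_t len = 0;
    };
    std::mutex m_;
    std::array<Slot, kSlots> slots_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

Rejecting oversized or overflowing messages outright is a design choice; a real application would probably also report the rejection to the client.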
I've constructed an example that shows the crashes. Just connect to the ESP32 access point, open its default web page in a browser, and hit F5 several times rapidly.
You should see a crash like this:
`CORRUPT HEAP: Bad head at 0x3f801240. Expected 0xabba1234 got 0x3f801334`
`assertion "head != NULL" failed: file "/Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/heap/multi_heap_poisoning.c", line 205, function: multi_heap_free`
I have attached the project, which should trigger the bug. I use an ESP32-WROVER for this, if it matters.
espasyncwebserverissue.zip