Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User exception (panic/abort/assert) after random number of successful requests #6811

Closed
6 tasks done
NiallRees opened this issue Nov 19, 2019 · 17 comments · Fixed by #6819
Closed
6 tasks done

User exception (panic/abort/assert) after random number of successful requests #6811

NiallRees opened this issue Nov 19, 2019 · 17 comments · Fixed by #6819
Assignees
Milestone

Comments

@NiallRees
Copy link

NiallRees commented Nov 19, 2019

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: [Feather HUZZAH ESP8266]
  • Core Version: [2.6.1]
  • Development Env: [Arduino IDE]
  • Operating System: [MacOS]

Settings in IDE

  • Module: [Feather HUZZAH ESP8266]
  • Flash Mode: [qio|dio|other]
  • Flash Size: [4MB]
  • lwip Variant: [v2 Lower Memory]
  • Reset Method: [ck|nodemcu]
  • Flash Frequency: [40Mhz]
  • CPU Frequency: [160MHz]
  • Upload Using: [SERIAL]
  • Upload Speed: [115200] (serial upload only)

Problem Description

I'm using the WiFiClientSecure client and sometimes during connection, other times after a request, User exception (panic/abort/assert) is raised and the board restarts. If it successfully makes a request, it crashes on the next attempt.

I've attached a decoded stack trace along with heap memory debug. Would really appreciate some help with this.

MCVE Sketch (WiFi creds left blank)

#include <ESP8266WiFi.h>
#include <WiFiClientSecure.h>
 
const char* ssid     = "";
const char* password = "";
 
const char* host = "script.google.com";
const int httpsPort = 443; 
String SCRIPT_ID = "AKfycbyt1zJXaOvHo2_cz7Mfp6ivhan6XcGtnQO_UQzGzq6ECh3G4Zgj";
 
void setup() {
  Serial.begin(115200);
  delay(100);
 
  Serial.println();
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(ssid);
  
  WiFi.begin(ssid, password);
  
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
 
  Serial.println("");
  Serial.println("WiFi connected");  
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());
}
 
void loop() {
  delay(5000);
 
  Serial.print("connecting to ");
  Serial.println(host);
  
  // Use WiFiClient class to create TCP connections
  WiFiClientSecure client;
  client.setInsecure();
  if (!client.connect(host, httpsPort)) {
    Serial.println("Connection failed");
    return;
  } else {
    Serial.println("Connection success!");
  }
  
  String url = String("/macros/s/" + SCRIPT_ID + "/exec?Datetime=1X=2&Y=3&Z=4");
  Serial.print("requesting URL: ");
  Serial.println(url);

  //get the sensor data in script format using get method
  client.print(String("GET ") + url + " HTTP/1.1\r\nHost: script.google.com\r\nUser-Agent: BuildFailureDetectorESP8266\r\nConnection: close\r\n\r\n");

  Serial.println("request sent");
}

Debug Messages

WiFi connected
IP address: 
192.168.1.190
connecting to script.google.com

User exception (panic/abort/assert)
Abort called

>>>stack>>>

ctx: bearssl
sp: 3fff178c end: 3fff180c offset: 0000
3fff178c:  3ffe992c 00000047 00000004 deadbeef  
3fff179c:  000000fe 00000000 00000000 00000000  
3fff17ac:  00000000 00000000 00000000 3fff17dc  
3fff17bc:  00000004 00000090 00000004 3ffe992c  
3fff17cc:  00003a98 00000000 3ffffeb0 40205c8e  
3fff17dc:  3ffe883e 00000000 4021d844 40205ca4  
3fff17ec:  3fff66c1 3fff2004 deadbeef 4020513d  
3fff17fc:  deadbeef 00000000 3ffffeb0 40203cbc  

ctx: cont
sp: 3ffffdd0 end: 3fffffc0 offset: 0000
3ffffdd0:  00003a98 00000000 00000008 402036d2  
3ffffde0:  00000073 00000005 9d2f9dce 00000000  
3ffffdf0:  00000001 00002834 3fff1814 4021d21e  
3ffffe00:  3fff1d14 3ffe87b0 3fff1814 00000001  
3ffffe10:  00000001 00000001 3ffffeb0 40203910  
3ffffe20:  3ffe87b0 00000000 3ffffeb0 40203af3  
3ffffe30:  40105249 3ffee960 3ffee960 0000000e  
3ffffe40:  00000000 00000000 00000000 40205d7f  
3ffffe50:  00000000 3ffef984 3ffffeb0 402026e0  
3ffffe60:  000001bb 00000d50 3fffff2c 3ffee904  
3ffffe70:  000001bb 3ffffeb0 3ffe87b0 3ffee904  
3ffffe80:  000001bb 3ffffeb0 3ffe87b0 40203c11  
3ffffe90:  40207928 aed33ad8 40207928 aed33ad8  
3ffffea0:  3fffdad0 3ffe84c0 3ffee87c 4020118d  
3ffffeb0:  402077c8 00000000 00003a98 40204eb9  
3ffffec0:  00000000 3ffef984 3fff1814 3ffefb3c  
3ffffed0:  00000000 00000000 3fff1814 00000000  
3ffffee0:  00000000 3fff6934 3fff6f04 00000000  
3ffffef0:  00000000 3fff256c 3ffefd14 3fff66bc  
3fffff00:  3fff691c 00000000 00000000 00000000  
3fffff10:  00004145 00000255 72000000 00000000  
3fffff20:  00000000 00000000 00000000 00000000  
3fffff30:  00000000 00000000 3ffe0001 3ffee904  
3fffff40:  40204be4 3ffee87c 3ffe87a0 40004bf0  
3fffff50:  00000000 3ffee87c 00000000 00000000  
3fffff60:  3ffe8600 00000000 00000000 40204f34  
3fffff70:  00000000 00ffffff 0101a8c0 40204fa8  
3fffff80:  3ffe84c8 3ffee844 3ffee87c 402010d4  
3fffff90:  40207928 be01a8c0 feefeffe feefeffe  
3fffffa0:  3fffdad0 00000000 3ffee8c4 402058dc  
3fffffb0:  feefeffe feefeffe 3ffe84e8 40100ea5  
<<<stack<<<

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
v482516e3

Decoded

0x40205c8e: operator+(StringSumHelper const&, String const&) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/WString.cpp line 408
0x4021d844: br_ssl_engine_recvrec_ack at src/ssl/ssl_engine.c line 290
0x40205ca4: operator+(StringSumHelper const&, String const&) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/WString.cpp line 410
0x4020513d: fs::FS::open(char const*, char const*) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/2.5.0-4-b40a506/xtensa-lx106-elf/include/c++/4.8.2/bits/shared_ptr_base.h line 781
0x40203cbc: BearSSL::WiFiClientSecure::available() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClientSecureBearSSL.cpp line 386
0x402036d2: std::__shared_ptr ::operator=(std::__shared_ptr &&) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/2.5.0-4-b40a506/xtensa-lx106-elf/include/c++/4.8.2/bits/shared_ptr_base.h line 858
0x4021d21e: cc_do_sign at src/ssl/ssl_ccert_single_rsa.c line 131
0x40203910: BearSSL::WiFiClientSecure::~WiFiClientSecure() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClientSecureBearSSL.cpp line 119
0x40203af3: BearSSL::WiFiClientSecure::_run_until(unsigned int, bool) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClientSecureBearSSL.cpp line 533
0x40205d7f: String::equals(String const&) const at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/WString.cpp line 501
0x402026e0: ClientContext::_write_some() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/include/ClientContext.h line 504
0x40203c11: BearSSL::WiFiClientSecure::_write(unsigned char const*, unsigned int, bool) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClientSecureBearSSL.cpp line 288
0x40207928: WiFiClient::peek() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClient.cpp line 284
0x40207928: WiFiClient::peek() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClient.cpp line 284
0x4020118d: connectToWifi() at /Users/niallwoodward/silverton-dev/smart-crunch/smart-crutch-huzzah/smart-crutch-huzzah.ino line 45
0x402077c8: _GLOBAL__sub_D__Z15settimeofday_cbOSt8functionIFvvEE() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/sntp-lwip2.cpp line 494
0x40204eb9: TwoWire::write(unsigned char) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/Wire/Wire.cpp line 191
0x40204be4: BearSSL::CertStore::freeHashedTA(void*, br_x509_trust_anchor const*) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/CertStoreBearSSL.cpp line 219
0x40204f34: _GLOBAL__sub_I__ZN7TwoWire8rxBufferE() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/Wire/Wire.cpp line 333
0x40204fa8: fs::File::peek() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/FS.cpp line 73
0x402010d4: connectToWifi() at /Users/niallwoodward/silverton-dev/smart-crunch/smart-crutch-huzzah/smart-crutch-huzzah.ino line 38
0x40207928: WiFiClient::peek() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/libraries/ESP8266WiFi/src/WiFiClient.cpp line 284
0x402058dc: String::init() at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/WString.cpp line 131
0x40100ea5: umm_realloc(void*, size_t) at /Users/niallwoodward/Library/Arduino15/packages/esp8266/hardware/esp8266/2.6.1/cores/esp8266/umm_malloc/umm_malloc.cpp line 804

@NiallRees
Copy link
Author

NiallRees commented Nov 19, 2019

I've also just found that using host as script.google.com in https://github.com/esp8266/Arduino/blob/919c7535631d60e085f87d2b4c72d05a88651eed/libraries/ESP8266WiFi/examples/BearSSL_Validation/BearSSL_Validation.ino causes the exact same error.

@earlephilhower earlephilhower self-assigned this Nov 19, 2019
@earlephilhower
Copy link
Collaborator

@NiallRees can you try something simple on your end? I'm traveling for a few days and may not be able to run anything on real HW for a while.

Can you make:

#define _stackSize (5748/4)

into

#define _stackSize (6748/4) 

and re-try the connections with your app or the Validation example? That would add an additional 1K to the available stack for BearSSL and narrow it down to a stack overflow issue vs. something else.

@NiallRees
Copy link
Author

@earlephilhower thank you so much! It's working absolutely fine now. I've seen you've increased the stack size in previous releases, is the problem that certificates are getting larger and larger?

@earlephilhower
Copy link
Collaborator

Could you add the following lines to your sketch to figure out exactly how much bigger the stack usage was?

First, add

#include <StackThunk.h>

at the top of your sketch, then after connect:

Serial.printf("BSSL stack: %d\n", stack_thunk_get_max_usage());

It's not really a certificate size thing (they're processed byte-by-byte as they come in), but it could be a new encryption mode which is using more stack than others we've seen.

Thx!

@NiallRees
Copy link
Author

BSSL stack: 5636

@earlephilhower
Copy link
Collaborator

Thanks, @NiallRees . Unfortunately, that's actually saying that the required stack is smaller than the default value (5636 vs. 5748) which gives more questions than answers.

Is that # stable through multiple connections? Do you see some fluctuations between runs/connections?

My thinking is that there might be some ISR called which requires more than (5748-5636 = 112 bytes) of space, but it's not always called so you have the case where the unmodified code works most times, but when that ISR happens the SYS needs >112 bytes of stack...

The alternative is that it's not the stack, but memory corruption somewhere (possible, but we do run valgrind on the host to check for stuff like that). By increasing the stack you don't avoid the corruption, but the corruption ends up in some part of the stack that's not used anyway so it's invisible to the app.

@TD-er
Copy link
Contributor

TD-er commented Nov 20, 2019

What part is using so much stack?
Is it in the client object?
What if you declare the client object on the heap?

@devyte
Copy link
Collaborator

devyte commented Nov 20, 2019

@TD-er it's the BSSL stack, different from the SYS or CONT. There's a stack thunk mechanism involved to dynamically allocate and destroy it.

@TD-er
Copy link
Contributor

TD-er commented Nov 20, 2019

Ah, OK, didn't know that.

@earlephilhower earlephilhower added this to the 2.6.2 milestone Nov 21, 2019
@earlephilhower
Copy link
Collaborator

@NiallRees thanks for the detailed info. I was able to reproduce it using the BSSL_Validation.ino like you said. Sometimes it worked fine w/o any changes, but other times it would crash.

The issue is sometimes the max BSSL stack usage is 5636, which is fine, but other times I get that it's used 5828 which is past the stack end.

I'll bump up the stack allocation and tweak the validation exmable to dump the BSSL stack as well.

earlephilhower added a commit to earlephilhower/Arduino that referenced this issue Nov 21, 2019
Fixes esp8266#6811 which found an issue where connecting to scripts.google.com
would *occasionally* cause a crash.  On inspection, it was found that up
to 5828 bytes of stack were used once in a while, so expand the stack to
5900 bytes to cover this case plus a little extra.
earlephilhower added a commit that referenced this issue Nov 21, 2019
Fixes #6811 which found an issue where connecting to scripts.google.com
would *occasionally* cause a crash.  On inspection, it was found that up
to 5828 bytes of stack were used once in a while, so expand the stack to
5900 bytes to cover this case plus a little extra.
@jeffbuot
Copy link

jeffbuot commented Dec 23, 2019

Hi @earlephilhower , I'm a total newbie and I just started learning Arduino using nodemcu esp8266. Do you have details on how to fix this?

@NiallRees can you try something simple on your end? I'm traveling for a few days and may not be able to run anything on real HW for a while.

Can you make:

#define _stackSize (5748/4)

into

#define _stackSize (6748/4) 

and re-try the connections with your app or the Validation example? That would add an additional 1K to the available stack for BearSSL and narrow it down to a stack overflow issue vs. something else.

@Teddyz
Copy link

Teddyz commented Jan 1, 2020

FYI

I have (had) the same problem. I am connected to my local MQTT-broker, but close this connection for a short connection to Pushover (to send messages to my phone). This led to crash after a number of connects (in the range 2 to 20-something).

So far:
01 BSSL stack: 0 -> 4616
02 BSSL stack: 4616 ->5736
03 BSSL stack: 5736 -> 6088

And no crashes yet.

//#define _stackSize (5900/4)
#define _stackSize (6900/4)

@earlephilhower
Copy link
Collaborator

@Teddyz, you're saying you saw stack_thunk_get_max_usage return 6088? Like I said, the # in the code is empirically determined, so it's possible to need more than the 5900.

I think 6100 would be a better number. Could you give it a try in your app and report back? If that's good, we can update the main code to give it another 200bytes and hopefully get this all over with.

@Teddyz
Copy link

Teddyz commented Jan 1, 2020

I had this BSSL-check in my program running today:

#include <StackThunk.h>
uint16_t max_stack_usage = 0;
...
// In loop
  if (now() - prev1s >= 1UL) {
    //checkDelay("", 1);
    prev1s = now();
    relay (4); // update relay
    yield();
    //checkDelay("do every 1 seconds", 2);
    uint16_t stack_usage = stack_thunk_get_max_usage();
    static uint16_t usage_increases = 0;
    if ( stack_usage > max_stack_usage ) {
      Serial.printf("%02d BSSL stack: %d -> %d \n", ++usage_increases, max_stack_usage, stack_usage);
      max_stack_usage = stack_usage;
    }
    return;
  }

01 BSSL stack: 0 -> 4484
02 BSSL stack: 4484 -> 5796
03 BSSL stack: 5796 -> 5928
04 BSSL stack: 5928 -> 6056
05 BSSL stack: 6056 -> 6136

Soft WDT Crash after > 100 connections to api.pushover.net/1/messages.json. Plus the same number of connections to my local MQTT-broker.

01 BSSL stack: 0 -> 4516
02 BSSL stack: 4516 -> 5736
03 BSSL stack: 5736 -> 5796
04 BSSL stack: 5796 -> 5828
05 BSSL stack: 5828 -> 6104
06 BSSL stack: 6104 -> 6120 a

So far no crash.

The Soft WDT reset had the explanation below. I do not understand where it went wrong, but I also do not think it is related to this problem.

0x40100984: millis at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_wiring.cpp line 188
0x40100969: millis at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_wiring.cpp line 185
0x4020a4ea: lwmqtt_arduino_timer_get(void*) at D:\Mina Dokument\Arduino\libraries\MQTT\src/MQTTClient.cpp line 25
0x4010088c: ets_post at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_main.cpp line 160
0x4020abe8: now() at D:\Mina Dokument\Arduino\libraries\Time/Time.cpp line 252 (discriminator 1)
0x40205afc: loop at D:\Mina Dokument\Arduino\MQTT\mqtt_pocsag_v001/mqtt_pocsag_v001.ino line 252
0x4010088c: ets_post at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_main.cpp line 160
0x4020e1d8: loop_wrapper() at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/core_esp8266_main.cpp line 180
0x40101675: cont_wrapper at C:\Users\Dick\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.6.3\cores\esp8266/cont.S line 81

@Thanasist2
Copy link

i had problem with resets on boot after "Refreshing JWT" but since the mod everything works fine for 2 days now.
BSSL stack: 6132

@stasysdr
Copy link

Changing BSSL stack to 6648 was not enough for me, BSSL was timing out.
I tried to use netClient.setBufferSizes(1024, 1024) as explained here https://arduino-esp8266.readthedocs.io/en/latest/esp8266wifi/bearssl-client-secure-class.html#setbuffersizes-int-recv-int-xmit
and it works now without timeout.

@JamesDelfini
Copy link

Changing BSSL stack to 6648 was not enough for me, BSSL was timing out. I tried to use netClient.setBufferSizes(1024, 1024) as explained here https://arduino-esp8266.readthedocs.io/en/latest/esp8266wifi/bearssl-client-secure-class.html#setbuffersizes-int-recv-int-xmit and it works now without timeout.

I have similar issue on this thread causes exception (panic/abort/assert) but with different cause. Thanks to @stasysdr I also set the setBufferSizes to 1024, the default is 512 at WiFiClientBearSSL.cpp.

The error was due to the conflict between ESPAsyncWebServer, which receives web requests from the client and ESP8266 as the web server. During the client.connect if it receives a request then the exception will occur, maybe the connection needed more buffer than the default due to the "hanging process that got stuck" based on my testing.

I did not change the default stack size defined as #define _stackSize (6200/4) on StackThunk.cpp. Since the I was receiving BSSL stack size around 4300 to 4400.

No more crashes have occurred after I adjusted the buffer size. Before, sometimes it would go through, and sometimes it wouldn't when client.connect was hanging which is not good for ESP to get stuck on a process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants