ESp8266 reboot when receiving data #108

Closed
bill-orange opened this issue Jul 24, 2019 · 26 comments

Comments

@bill-orange

bill-orange commented Jul 24, 2019

I have a difficult-to-diagnose issue with an ESP8266 running EspSoftwareSerial (latest version). At random intervals of between 10 minutes and 6 hours, the ESP8266 reboots while receiving data at 9600 baud. At much faster baud rates the problem seems to disappear.

Oddly, there is no debug information to decode. I see a couple of random characters such as "~fi" and then the reboot. This suggests a WDT issue.

I know that I am not providing much information. This is all I have. I suspect some sort of race condition.

Thoughts? Troubleshooting suggestions?

EDIT

With debug enabled during compilation, I get this prior to the reboot:

dev 1153

@dok-net
Collaborator

dok-net commented Jul 29, 2019

@bill-orange Are you using ESP8266 Arduino latest master from GitHub, have you tried EspSoftwareSerial 5.2.8 or another version, which one exactly?
If I understand you correctly,

random characters such as "~fi"

or

dev 1153

are on the USB serial debug output? Neither is from EspSoftwareSerial.
If you could provide an MCVE sketch to showcase the issue, it would be helpful; otherwise, there's no known issue that correlates with your bug. I am sorry in that case.

@bill-orange
Author

bill-orange commented Jul 29, 2019

I just double-checked: yes, I am using version 5.2.8. I do have USB serial turned on and it is reporting the dev 1153 error. That error is apparently associated with some sort of WDT timeout.

Other folks have reported similar errors but not recently and not often.
example:
https://www.bountysource.com/issues/29875478-softwareserial-wdt-reset

I have continued to "play" with this code and have discovered a few things. If I turn off just about all other code (writing to the OLED via I2C, and WiFi), the problem goes away. The problem also goes away if I raise the baud rate above 9600 (using a different GPS module). Oddly, increasing the processor speed does not change the crash frequency. Another odd thing: as I looked around for other examples of this problem, they all were at 9600 baud. Coincidence?

I am thinking that this is a race condition.

I will try to come up with a minimal example but this is going to be tough. GPS modules tend to send data in bursts. The rate and pattern of the data at 9600 baud could be an issue. Even with an MCVE sketch, it would likely be necessary to reproduce most of my hardware environment. Nevertheless, I will work on it. Maybe I can write some sort of loopback that will generate a similar software serial load.

What would perhaps be helpful is some insight on how to troubleshoot this within my hardware environment. What debug print statements, yield() statements or other devices could I add to my code or to EspSoftwareSerial 5.2.8 that might shed some light on this?

Bill

@dok-net
Collaborator

dok-net commented Aug 27, 2019

@bill-orange Sorry about the delay - if you've not fixed this in your code yet, here's a question:
Could you please check, if the suspected WDT occurs during write()s? Just add some debug message before and after entering EspSoftwareSerial's write functions. Thanks.
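
(Editorial sketch, not posted by a participant.) One way to bracket a call as suggested above is to timestamp it on entry and exit and keep the worst case, which is cheaper than printing on every call. It complements the before/after messages rather than replacing them. This is host-portable C++ under stated assumptions: `nowUs` is a hypothetical clock hook (on the device it could wrap `micros()`), and `CallTimer` is an illustrative name, not part of EspSoftwareSerial.

```cpp
#include <cassert>

// Tracks the longest observed duration of a wrapped call.
struct CallTimer {
    unsigned long worstUs = 0;  // longest duration seen so far
    unsigned long calls = 0;    // number of measured calls

    // Runs fn() and measures the elapsed ticks with the supplied clock.
    template <typename Clock, typename Fn>
    void measure(Clock nowUs, Fn fn) {
        const unsigned long t0 = nowUs();
        fn();
        const unsigned long dt = nowUs() - t0;  // unsigned math is wrap-safe
        if (dt > worstUs) worstUs = dt;
        ++calls;
    }
};
```

Printing `worstUs` from the main loop every few seconds would show whether any single read or write call comes anywhere near the watchdog period.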

@bill-orange
Author

@dok-net

The WDT occurs during read() with the exact position being somewhat random. If other tasks are not used in the sketch, like writing to the OLED display and WiFi, then the WDTs either disappear or are much less frequent. Slow baud rates like 9600 or 4800 baud appear to be the problem. Looking at the posts of others experiencing this problem, the 'burst' nature of GPS transmissions may also be a factor. So, the triggers are:

SoftwareSerial read()
low baud rate
data comes in bursts
other tasks running like OLED and WiFi in the sketch
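
(Editorial sketch, not posted by a participant.) As a rough back-of-the-envelope check on the "low baud rate" and "bursts" triggers above, the function below computes how long a burst occupies the line. It assumes 8N1 framing, that is 10 bits on the wire per byte; the function name and the framing are assumptions for illustration and are not taken from the thread or from EspSoftwareSerial.

```cpp
#include <cstdint>

// Microseconds that a burst of `bytes` occupies the serial line.
// Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte.
constexpr std::uint64_t burstMicros(std::uint32_t bytes, std::uint32_t baud) {
    return (static_cast<std::uint64_t>(bytes) * 10u * 1000000u) / baud;
}
```

Under these assumptions a 150-byte burst keeps the line busy for about 156 ms at 9600 baud, versus about 13 ms at 115200 baud, so a slow link stretches the period during which receive interrupts compete with other work.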

Scattering yields() throughout write() in the library has no effect.

What I finally did to work around this problem is to write the system state to EEPROM (TNC on/off, display on/off, etc.) and retrieve the system state on a reboot from WDT. A button press during a manual reboot erases the current system state. This way the system becomes fault tolerant. This has the added advantage of allowing a graceful recovery from replacing a dead battery in the field.
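
(Editorial sketch, not posted by a participant.) The workaround described above might look roughly like the following. It is host-portable C++ that only models the record layout and its validation; on the device the 4-byte record would live in EEPROM, and the restore would run only after a watchdog reset. The `SystemState` fields, the magic value, and the checksum are assumptions for illustration, not bill-orange's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical state record; field names are illustrative only.
struct SystemState {
    std::uint8_t tncOn;
    std::uint8_t displayOn;
};

constexpr std::uint8_t kMagic = 0xA5;   // marks a record as written
constexpr std::size_t kRecordSize = 4;  // magic + 2 fields + checksum

// Simple additive checksum over the first three bytes of the record.
inline std::uint8_t checksum(const std::uint8_t* rec) {
    return static_cast<std::uint8_t>(rec[0] + rec[1] + rec[2]);
}

// Serialize the state into a 4-byte record (to be stored persistently).
inline void saveState(const SystemState& s, std::uint8_t* rec) {
    rec[0] = kMagic;
    rec[1] = s.tncOn;
    rec[2] = s.displayOn;
    rec[3] = checksum(rec);
}

// Restore the state; returns false if the record is blank or corrupt,
// in which case the caller should fall back to defaults.
inline bool loadState(const std::uint8_t* rec, SystemState& out) {
    if (rec[0] != kMagic || rec[3] != checksum(rec)) return false;
    out.tncOn = rec[1];
    out.displayOn = rec[2];
    return true;
}

// Erase the record (the "button pressed during a manual reboot" case).
inline void clearState(std::uint8_t* rec) {
    std::memset(rec, 0xFF, kRecordSize);
}
```

On the ESP8266 the record could be moved in and out of flash-backed storage with the Arduino EEPROM library; the magic byte and checksum keep a blank or half-written record from being mistaken for valid state.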

Regardless of the success of the fix above, I would like to get to the bottom of this. I have another project that reports temperature, humidity, etVOC and eCO2 to Amazon's Alexa. I would like to add air particle counts. The sensor is now down to the affordable $25 range. Unfortunately, the sensor reports the data via serial rather than I2C. My solution of saving the system state would not work very well here, since the etVOC sensor takes some time to stabilize after a reboot.

@dok-net
Collaborator

dok-net commented Aug 28, 2019

@bill-orange You got me wondering, what do you mean by "tasks" in this context? Are you using my CoopTasks, or some other library that modifies yield() and/or delay()?
In recent versions of EspSoftwareSerial, I can't imagine how read() is affected by bitrate.
"Scattering yields()" into the write() code would be a very awkward thing to do in the latest releases, are you sure you're looking at the same sources that I am :-) ?
I am myself using the SDS011 particulate matter sensor, at 9600bps on software serial. Btw, are you aware that the ESP8266 can Serial.swap() from USB to GPIOs D7/D8 (IOs 13/15), and you can serve the USB serial via SoftwareSerial then for diagnostics?
Now, about the read() watchdog timeout, have you ever placed yield() calls just ahead of and after calls to read()? I still can't imagine read() in current EspSoftwareSerial taking any serious amount of time, but if it's just pushing over the limit because of prior long-running activity without yield(), who knows?

@bill-orange
Author

@dok-net

Sorry for the confusion. By other tasks, I mean WiFi and OLED use. I am using a Smart Delay function, however.

// ----------------------------------------------- Smart Delay ----------------------------------------------

// This custom version of delay() ensures that the gps object
// is being "fed".

static void smartDelay(unsigned long ms) {
 char inbyte;
 unsigned long start = millis();
 //make sure in loop you call handleFTP()!!
 //ftpSrv.handleFTP();

 while (ss.available())
 {
   inbyte = ss.read();
   currentSentence += inbyte;
   if (gps.encode(inbyte))
   {
     lastSentence = currentSentence;
     if (lastSentence.indexOf("GPGGA") >= 0 && lastSentence.length() <= 86 && debugPrint) {
       savedSentence = currentSentence;
     }
     if (lastSentence.indexOf("GPGGA") >= 0 && lastSentence.length() <= 86 && gps.location.isValid() && gps.satellites.value() > 1 && TNCwrite && NEMAPrintFlag) {
       lastSentence = lastSentence;
       Serial.print (lastSentence);
       NEMAPrintFlag = true;
     }
     currentSentence = "";
     return;
   }
   if (millis() - start >= ms) {
     return;
   }
 }
}
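
(Editorial sketch, not posted by a participant.) For comparison, here is a minimal feed loop that waits out the whole interval and gives the scheduler a chance to run on every pass. The function above leaves its loop as soon as `ss.available()` returns 0, so it can return well before `ms` has elapsed. This is a host-portable template, not a drop-in replacement: `nowMs`, `idle`, and `onByte` are hypothetical hooks that on the ESP8266 could map to `millis()`, `yield()`, and the GPS parser.

```cpp
#include <cassert>

// Feeds bytes from `port` to `onByte` until `ms` milliseconds have passed.
// Port must provide available() and read(); nowMs() returns a millisecond
// tick; idle() is called once per pass (yield() on the device).
// Returns the number of bytes that were fed.
template <typename Port, typename Clock, typename Idle, typename Sink>
unsigned feedFor(Port& port, unsigned long ms, Clock nowMs, Idle idle, Sink onByte) {
    unsigned fed = 0;
    const unsigned long start = nowMs();
    // Unsigned subtraction stays correct across tick-counter wraparound.
    while (nowMs() - start < ms) {
        if (port.available() > 0) {
            onByte(static_cast<char>(port.read()));
            ++fed;
        }
        idle();  // let background work run at least once per byte
    }
    return fed;
}
```

Reading at most one byte per pass keeps the time between two `idle()` calls short even when a long burst is waiting in the receive buffer.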

I have not heard of Serial.swap() before. I will have to do some research.

Yea, that's the particle sensor that I have been eyeing.

I have tried yields on either side of the read() call. I am pretty sure that I am working with the current library.

@bill-orange
Author

This is getting off topic but I have a question regarding the SDS011. The eBay ads show an operating voltage of 5 V. Are you running yours on 5 volts and did you have to do level conversion on the serial output?

@dok-net
Collaborator

dok-net commented Aug 30, 2019

Per the spec sheet, 5V supply, Rx and Tx at 3.3V :-) I'm running power off the USB serial adapter on the ESP8266 breakout board, Rx/Tx directly connected to the GPIOs. Works as well with ESP32.
https://github.com/dok-net/esp_sds011/releases

@bill-orange
Author

bill-orange commented Sep 17, 2019

Since I recompiled with version 5.3.0 I have not experienced a WDT crash. It's a bit early to declare victory since the crashes are random. At this point, it's safe to say that they are considerably less frequent.

@nvicchi

nvicchi commented Sep 23, 2019

Could you give a hint on how to compile the latest version? With the version included in esp8266 2.5.2 the same thing happens to me (with burst data entry I get a WDT reset).
But when I download and try to compile 5.3.0, I get the following error:

C:\Users\xxx\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h: At global scope:
C:\Users\xxx\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:145:8: error: expected ';' at end of member declaration
T& IRAM_ATTR pushpeek()
^
C:\Users\xxx\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:145:27: error: ISO C++ forbids declaration of 'pushpeek' with no type [-fpermissive]
T& IRAM_ATTR pushpeek()
C:\Users\xxx\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:157:10: error: expected ';' at end of member declaration
bool IRAM_ATTR push();
^

Version 5.3.0 installed via the Library Manager, but I had to delete the version included in 2.5.2 to get Arduino to use it.

Thanks!

@bill-orange
Author

@nvicchi Wow. I had no issues getting it to compile at all. There must be something different in your environment. Are you using the latest Arduino IDE? @dok-net Can you weigh in on this one?

@nvicchi

nvicchi commented Sep 23, 2019

Arduino 1.8.10
NodeMCU 1.0 (ESP-12E Module).
esp8266 by ESP8266 Community (2.5.2)
Had to manually replace original library with 5.3.0

I just read that dok-net told you in another issue to update to the ESP8266 Community master. I will try that.

EDIT: It seems master is 2.5.2. I am lost :(

@bill-orange
Author

bill-orange commented Sep 23, 2019

I am running Arduino 1.8.9. I would expect 1.8.10 to be fine. I am using the ESP 8266 core off the board manager. It appears to be 2.5.2. My library updated to 5.3.0 off the Library Manager. I have no clue what the problem might be. You could post to the ESP8266 Github site since this is likely to be an issue outside of espsoftwareserial anyway.

You have to be really careful in your posts there as far as completeness and format. I get my hand slapped frequently.

@nvicchi

nvicchi commented Sep 23, 2019

Bill, thanks a lot. I was actually using 1.8.9 and updated in order to see if that was the issue.
I originally updated to 5.3.0 from the Library Manager, but it complained about having 2 candidates and said it would use the ESP8266 version, ignoring 5.3.0.

I will keep looking into this. If I have to, I will make a clean install in a virtual machine.

By the way, since your last post on the original issue was a week ago, could you confirm it is still working? Did you get any wdt resets?

@bill-orange
Author

The GPS is still running fine. The trouble with an intermittent fault is that you never know when it is really fixed!

I am now also running an SDS011 sensor (like @dok-net) on a Huzzah Feather ESP8266 without any problems. Provisionally, things are looking good.

Good idea to test in the VM.

@nvicchi

nvicchi commented Sep 23, 2019

Statistically, it's a huge improvement. In my case, the error is 100% repeatable. I know exactly how to trigger it. So when and if I can compile the 5.3.0 version, I will let you know if it's fixed or not.

@kross222

Arduino 1.8.10
ESP-12E Module
esp8266 by ESP8266 Community (2.5.2)
original library 5.3.0

Does not compile:

In file included from C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/SoftwareSerial.h:27:0,

       from C:\Users\non\Documents\Arduino\sketch_sep24a\sketch_sep24a.ino:5:

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:145:8: error: expected ';' at end of member declaration

 T& IRAM_ATTR pushpeek()

    ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:145:27: error: ISO C++ forbids declaration of 'pushpeek' with no type [-fpermissive]

 T& IRAM_ATTR pushpeek()

                       ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:157:10: error: expected ';' at end of member declaration

 bool IRAM_ATTR push();

      ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:157:10: error: redeclaration of 'bool circular_queue::IRAM_ATTR'

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:145:8: note: previous declaration 'T& circular_queue::IRAM_ATTR'

 T& IRAM_ATTR pushpeek()

    ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:157:25: error: ISO C++ forbids declaration of 'push' with no type [-fpermissive]

 bool IRAM_ATTR push();

                     ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:164:10: error: expected ';' at end of member declaration

 bool IRAM_ATTR push(T&& val);

      ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:164:10: error: redeclaration of 'bool circular_queue::IRAM_ATTR'

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:145:8: note: previous declaration 'T& circular_queue::IRAM_ATTR'

 T& IRAM_ATTR pushpeek()

    ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:164:32: error: ISO C++ forbids declaration of 'push' with no type [-fpermissive]

 bool IRAM_ATTR push(T&& val);

                            ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:171:10: error: expected ';' at end of member declaration

 bool IRAM_ATTR push(const T& val)

      ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:171:10: error: redeclaration of 'bool circular_queue::IRAM_ATTR'

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:145:8: note: previous declaration 'T& circular_queue::IRAM_ATTR'

 T& IRAM_ATTR pushpeek()

    ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:171:37: error: ISO C++ forbids declaration of 'push' with no type [-fpermissive]

 bool IRAM_ATTR push(const T& val)

                                 ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:253:16: error: expected initializer before 'circular_queue'

bool IRAM_ATTR circular_queue::push()

            ^

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src/circular_queue/circular_queue.h:268:16: error: expected initializer before 'circular_queue'

bool IRAM_ATTR circular_queue::push(T&& val)

            ^

Several libraries found for "SoftwareSerial.h"
Use: C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial
exit status 1
Compilation error for the Generic ESP8266 Module board.

@nvicchi

nvicchi commented Sep 24, 2019

Kross, if you declare:
#define IRAM_ATTR
outside the #ifndef at the beginning of the header, you will be able to get further. This is OK since, from what I could find, by default it will be executing from RAM, without the need to declare IRAM_ATTR.
Sadly, when you declare it, then the compiler will complain about "attachInterruptArg" not existing.

I could not look into this any further yet, but it seems that compatibility was dropped from version 5.0.4 to 5.2.

If you can modify 5.3.0 so that it compiles for ESP8266, please let me know.

There is only one thing I cannot understand, and that is why it works for bill-orange. Which board exactly are you choosing? It seems it's only compatible with ESP32, but not with ESP8266.

@kross222

kross222 commented Sep 24, 2019

I choose "generic esp8266" in the Arduino IDE.

@nvicchi

nvicchi commented Sep 24, 2019

That last phrase was for bill-orange :)
Did you try modifying
#if !defined(ESP32) && !defined(ESP8266)
#define ICACHE_RAM_ATTR
#define IRAM_ATTR
#endif

to

#if !defined(ESP32) && !defined(ESP8266)
#define ICACHE_RAM_ATTR
#define IRAM_ATTR
#endif
#define IRAM_ATTR
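
(Editorial sketch, not posted by a participant.) A hedged alternative to an unconditional `#define IRAM_ATTR` is to define each macro only when the core has not already done so, which avoids redefining it on cores that do provide it. Mapping `IRAM_ATTR` onto `ICACHE_RAM_ATTR` is an assumption about older ESP8266 cores that only provide the latter; it is not taken from EspSoftwareSerial, and `onEdge` is a hypothetical interrupt handler. Note that with an empty definition the attribute has no effect at all, so whether the handler ends up in RAM then depends on the core's defaults.

```cpp
#include <cassert>

// Define the attribute macros only if the core has not provided them.
#ifndef ICACHE_RAM_ATTR
#define ICACHE_RAM_ATTR  // empty on hosts and cores without this attribute
#endif
#ifndef IRAM_ATTR
#define IRAM_ATTR ICACHE_RAM_ATTR  // assumption: reuse the older macro name
#endif

static volatile unsigned edgeCount = 0;

// Hypothetical handler; the macro expands to whatever the core defines.
void IRAM_ATTR onEdge() {
    edgeCount = edgeCount + 1;
}
```

This only addresses the `IRAM_ATTR` errors; it does nothing about a missing `attachInterruptArg`, which depends on the version of the ESP8266 core itself.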

@bill-orange
Author

@nvicchi @kross222 It's a mystery to me why I can compile. I have to assume that some other fix for some other reason fixed this too.

That being said, take a look at closed issues 112 and 107 in this git.

@kross222

I tried modifying it. Now I get:
C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src\SoftwareSerial.cpp: In member function 'void SoftwareSerial::enableRx(bool)':

C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial\src\SoftwareSerial.cpp:140:121: error: 'attachInterruptArg' was not declared in this scope

         attachInterruptArg(digitalPinToInterrupt(m_rxPin), reinterpret_cast<void (*)(void*)>(rxBitISR), this, CHANGE);

Several libraries found for "SoftwareSerial.h"
Used: C:\Users\non\Documents\Arduino\libraries\EspSoftwareSerial
exit status 1
Compilation error for the Generic ESP8266 Module board.

@nvicchi

nvicchi commented Sep 24, 2019

Success! I just tried it in a fresh Arduino install on a Mac.
I installed Arduino to the Applications folder, updated esp8266 from GitHub (just followed the readme instructions), then updated EspSoftwareSerial as in issue 107.
I only tried to compile the test example, but it worked!

Thanks bill!

Kross, follow those instructions and you should be fine!

@bill-orange
Author

@nvicchi Excellent news! Now we can see if the new version resolves your wdt problem. Keep us posted.

@nvicchi

nvicchi commented Sep 25, 2019

In fact it did! It doesn't fail anymore.
My test case was simple. If a burst of more than a certain amount of bytes (not sure how many, about 150 I might guess) arrived, it would immediately fail. If the number of bytes is of importance I can check. Now it doesn't fail anymore.

I still can make it fail if, at the same time that the burst arrives, I access a web server running in parallel. I don't care actually a lot about this situation right now.

By the way, I could successfully compile it on Windows too. It was harder: the doc states "the arduino directory", which is pretty straightforward for Linux and Mac, but not for Windows, where you have the /user/AppData location, the Program Files location, and the Documents location. I tried several times and it eventually worked in the Documents/Arduino location.

@bill-orange
Author

Excellent! I will close this. @kross222 if you are still having problems it would be best to open a new issue. The title of this issue really does not reflect your problem.
