Debug SPI arbitration between FS and MCU? #1576

Closed
nouser2013 opened this issue Feb 5, 2016 · 11 comments
Comments

@nouser2013
Contributor

Greetings advanced devs,

I was having trouble with a sketch that serves HTTP requests while at the same time reading from SPIFFS (~200 bytes every 10ms from one single open file) for a Ticker-based background task.

I assume that there are three parties accessing SPI flash:

  • The http server reads its PROGMEM stored data independently from SPIFFS via different methods (webserver send_P(), strcpy_P()+send()).
  • Then, there is the SPIFFS file reading.
  • Also, the MCU itself accesses the memory to fetch new instructions (I presume, as the whole sketch probably does not fit into memory).

This scenario crashes the ESP randomly, as described here. Either the IP subsystem freezes (not sending any wifi frames anymore), while background tickers continue to run (LED blinks, buttons work, serial output fine) or the ESP crashes with a stack trace completely.

I tracked this down to SPIFFS. When I stop reading from the file completely, the webserver continues to run indefinitely without any problems. Increasing the read interval only prolongs the time until a crash but does not prevent it, perhaps because at some point there is a clash by random chance.

Is there any way to debug this ("SPIFFS" does not seem to be in the Debug-Level select list)?

@WereCatf
Contributor

WereCatf commented Feb 5, 2016

Are you reading from SPIFFS inside a Ticker function? If so, you shouldn't be doing that. Ticker functions are supposed to be kept really short and you shouldn't perform blocking actions in them; instead, just set a flag there and perform the actual blocking action inside loop().

@nouser2013
Contributor Author

Ah that's a valid point, thanks. I tried moving the code to loop(), but it really doesn't make a difference. ESP stops sending IP packets after 2-5 minutes with SPIFFS access, so I'm still leaning in the direction of arbitration.

@WereCatf
Contributor

WereCatf commented Feb 5, 2016

Well, I have no idea what's wrong. I serve files from SPIFFS all the time without an issue and I also allow for uploading of files to SPIFFS via the web-server.

@igrr
Member

igrr commented Feb 5, 2016

If you are using SPIFFS with me-no-dev's async webserver, then crashes are kind of expected. SPIFFS is not thread-safe, and it expects to be called from the Arduino task only. This isn't the case with the async stack.
If you are using it with the sync webserver, then please share the sketch so we can reproduce.
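
One way to keep file access on the Arduino task can be sketched in plain C++. This is a minimal sketch, assuming that callbacks and loop() run cooperatively and never preempt each other in the middle of a statement: the callback only posts a request, and loop() takes it and does the SPIFFS work. `FsRequest` and `FsMailbox` are hypothetical names used for illustration; they are not part of the ESP8266 core or of any async library.

```cpp
// Hypothetical single-slot mailbox (illustration only): a callback that runs
// outside the Arduino task posts a request, loop() performs the file access.
enum class FsRequest { None, ReadNextFrame, ListDirectory };

class FsMailbox {
 public:
  // Called from the callback: only records what should happen later.
  bool post(FsRequest r) {
    if (pending_ != FsRequest::None) return false;  // slot busy, caller retries
    pending_ = r;
    return true;
  }

  // Called from loop(): returns the pending request and clears the slot.
  FsRequest take() {
    FsRequest r = pending_;
    pending_ = FsRequest::None;
    return r;
  }

 private:
  volatile FsRequest pending_ = FsRequest::None;
};
```

In loop(), a `switch` on `mailbox.take()` would then dispatch to the actual read or directory listing, so that every SPIFFS call is made from the same context.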


@nouser2013
Contributor Author

Uhm, I may not completely understand. The webserver never reads SPIFFS; it just passes some globally declared char * buffers to AsyncClient::write(), which are filled either via sprintf() or via strcpy_P() for flash / PROGMEM access. I'm fairly certain I have no buffer overruns. But at no point in any callback of the webserver do I access SPIFFS.

I'm not using AsyncWebserver, but AsyncTCP with a tiny GET parser with small c string functions, just to eliminate causes.

I had the one and only SPIFFS read access (File declared globally) in a Ticker callback, but moved it to loop() as suggested by @WereCatf. The code still behaves the same: first the IP stack freezes, then, 5 minutes later, the ESP crashes completely with ~30 lines of stack trace.

And when IP stack is frozen, SPIFFS access still works, I checked via interactive shell on UART.

If I do not read from SPIFFS in loop(), the system runs for 30 minutes under TCP request "bombardments" with no errors and still delivers the sprintf'ed dynamic content. At times, 2-3 connections wait for web content in a connection queue, and this works without problems.

I can try to use SyncTCP again, performance should be the same (if LWIP.a has a too small tcp_snd_buf compiled in, I'll have the 200ms Windows ACK delay). Will update as soon as I have something.

@nouser2013
Contributor Author

Alright, after a lot of testing, here goes. The project drives WS2812 LEDs, so I use Adafruit's function to write to the LEDs. The data sent to the LEDs is read from a SPIFFS file, which also stores the delay until the next LED data is to be sent (an "animation"). Sending the LED data works flicker-free for a single animation frame (60 LEDs) only if I disable interrupts before the output function and re-enable them afterwards.

I removed everything asynchronous from the sketch; only the ESP8266 webserver with its handleClient() is in loop(). I also updated the sketch so that the delay Ticker sets a flag, which is evaluated in loop() and triggers loading the new LED data and displaying those values. Under those conditions the sketch runs stable for hours, but as soon as the webserver is accessed and delivers data, the LED display gets stuck, obviously for as long as the webserver needs.

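// Set by the Ticker callback, consumed in loop().
volatile bool loop_loadNewFrame;
void setup() { ... }
void loop() {
  webserver.handleClient();
  if (loop_loadNewFrame) {
    loop_loadNewFrame = false;
    ws2812_displayCurrentFrame();
    ws2812_loadNextFrameFromSPIFFS();
    // Re-arm the one-shot Ticker with the delay stored for the current frame.
    animationDisplayTicker.once(ws2812_currentFrameDelay, animationDisplayTimer);
  }
}
// Ticker callback: only sets the flag, the actual work happens in loop().
void animationDisplayTimer() {
  loop_loadNewFrame = true;
}
void ws2812_displayCurrentFrame() {
  // Interrupts stay disabled for the whole frame to avoid flicker.
  noInterrupts(); ws2812_write(); interrupts();
}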

If I move the ws2812 code from loop() into the Ticker callback and access SPIFFS from there, the sketch will eventually lock up the WiFi IP core, followed shortly afterwards by a WDT reset.

On the other hand, I need the webserver to run independently from the animation. Perhaps I'm doing something wrong? Loading the whole animation into RAM does not seem feasible...

@me-no-dev
Collaborator

@igrr where is that optimistic_yield in SPIFFS that you think is making SPIFFS usage not thread safe?

@nouser2013
Contributor Author

Hmm, I may have been wrong after all. Even when reading from loop(), the sketch (with the pseudo code structure above) was leaking memory every second. I then removed ws2812_displayCurrentFrame() completely while keeping everything else. The leak is gone. Even @me-no-dev's Async classes work perfectly now.

I then put the ws2812_displayCurrentFrame() back in but without the two interrupts statements. Of course, LEDs will flicker now, but no leaking, and runs indefinitely stable with 37k heap, and one http request every 0.5s.

I've seen the source of the interrupts() / noInterrupts() macros, but why do they have such a large influence on the whole device? When writing 60 ws2812 LEDs, interrupts are disabled for 60 * 3 * 8 * 1.25us + 50us = 1850us ~= 1.85ms. Is this bad? What else could I do to send the string to the LEDs flicker-free? I used to do the same thing on NodeMCU, and it worked without difficulties.
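
Evaluating that expression step by step gives 60 * 3 * 8 = 1440 bits, 1440 * 1.25us = 1800us, and 1850us once the 50us is added. A small sketch in plain C++ that checks this at compile time, assuming the figures from the comment (3 bytes per LED, 1.25us per bit, 50us after the frame); the names are illustrative and not taken from any library:

```cpp
// Illustrative helper, not part of any library: how long interrupts stay
// disabled while bit-banging one WS2812 frame.
// Assumptions from the comment: 3 bytes per LED, 8 bits per byte,
// 1.25 us per bit, plus 50 us after the last bit.
constexpr double kBitTimeUs = 1.25;
constexpr double kLatchTimeUs = 50.0;

constexpr double ws2812FrameTimeUs(int ledCount) {
  return ledCount * 3 * 8 * kBitTimeUs + kLatchTimeUs;
}

// 60 LEDs: 1440 bits * 1.25 us + 50 us = 1850 us.
static_assert(ws2812FrameTimeUs(60) == 1850.0, "60 LEDs take 1850 us");
```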

@me-no-dev
Collaborator

Use the I2S implementation or maybe even the serial one :)
I2S has DMA, which can hold the data and let you do your thing without making the sketch wait.
The serial implementation also has a buffer (128 bytes), and if 30 LEDs fit into that then you'll be fine with it as well.

@nouser2013
Contributor Author

Small update here. I'm using @cnlohr's I2S implementation (stripped of the parts I don't need). It seems to work stably only if I make sure that only one single SPIFFS function is "active" at a time. If I'm reading a file and that gets interrupted by e.g. a SPIFFS dirlist, the ESP crashes. The drawback obviously is that there is no serial input anymore :( but I can live with that.
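
A minimal sketch of how "only one SPIFFS function active at a time" could be enforced, written in plain C++. `FsGuard`, `listDirectory()` and `demo()` are hypothetical names for illustration and not part of the SPIFFS API. The plain `bool` flag is an assumption that all callers run cooperatively; it is not atomic and would not be safe against true preemption.

```cpp
#include <cassert>

// Hypothetical scope guard (illustration only): the first caller acquires
// it, any nested caller sees acquired() == false and has to retry later.
class FsGuard {
 public:
  FsGuard() : owns_(!busy_) { if (owns_) busy_ = true; }
  ~FsGuard() { if (owns_) busy_ = false; }
  FsGuard(const FsGuard&) = delete;
  FsGuard& operator=(const FsGuard&) = delete;
  bool acquired() const { return owns_; }

 private:
  static bool busy_;  // true while some file system call is in progress
  bool owns_;         // true if this instance set busy_
};
bool FsGuard::busy_ = false;

// Stand-in for a directory listing; returns false if it had to be refused.
bool listDirectory() {
  FsGuard guard;
  if (!guard.acquired()) return false;  // another call is active
  // ... the actual SPIFFS directory listing would go here ...
  return true;
}

// Usage sketch: a listing requested while a read is in progress is refused.
void demo() {
  FsGuard readGuard;          // e.g. held during the frame read in loop()
  assert(readGuard.acquired());
  assert(!listDirectory());   // refused while the read holds the guard
}
```

A refused call is simply skipped here; in a real sketch the caller would set a flag and retry on a later pass through loop().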

@cnlohr

cnlohr commented Apr 4, 2016

I am curious what you think may cause that? I am unaware of any cases where buffer underflows, etc. can cause a reboot!

@igrr igrr closed this as completed Jun 2, 2016
5 participants