Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git: Webserver leaks on LmacRxBlk and crashes (exception 9 & 24) #1458

Closed
andig opened this issue Jan 17, 2016 · 24 comments
Closed

git: Webserver leaks on LmacRxBlk and crashes (exception 9 & 24) #1458

andig opened this issue Jan 17, 2016 · 24 comments

Comments

@andig
Copy link
Contributor

andig commented Jan 17, 2016

After adding HTTPClient to my sketch I've started to see various restarts.

Update not related to HTTPClient. Also getting those crashes with webserver.

Crash logs:

http://pastebin.com/31v49QZC
http://pastebin.com/PnjZaGzC
http://pastebin.com/QwhjzZPL
http://pastebin.com/vdZBxTGs

The first might be most interesting as it shows something else, too. Free heap before adding httpclient was 31k, after adding it went down to 25k.

All new variables are local to a single function (except for global httpclient). This is the (new) code that causes the problem:

void OneWirePlugin::upload() {
  for (int8_t i=0; i<devs; i++) {
    char uuid_c[40];
    getUuid(uuid_c, i);

    // uuid configured?
    if (strlen(uuid_c) > 0) {
      float val = getValue(i);

      char val_c[16];
      if (isnan(val))
        strcpy(val_c, "null");
      else
        dtostrf(val, -4, 2, val_c);

      String uri = g_middleware + "/data/" + String(uuid_c) + ".json?value=" + String(val_c);
      http.begin(uri);
      int httpCode = http.POST("");
      DEBUG_ONEWIRE("[1wire] upload %d %s\n", httpCode, uri.c_str());
      http.end();
    }
  }
}
@Links2004
Copy link
Collaborator

you never read the data coming from server.
this can make problems, since the TCP buffer get full.

latest git has this:
fb55e91

and may more imported rc1 runs on SDK 1.5.0 which is very buggy.
latest git has SDK 1.5.1, if possible update to latest git.

@andig
Copy link
Contributor Author

andig commented Jan 18, 2016

latest git has SDK 1.5.1, if possible update to latest git.

I'm already on latest git, including fb55e91. I can furthermore confirm crashes even without HTTPClient and with sufficient heap available.

@andig andig changed the title 2.1.0-rc1/git: Various crashes with HTTPClient (exception 9 & 24) 2.1.0-rc1/git: Various crashes with webserver (exception 9 & 24) Jan 18, 2016
@andig
Copy link
Contributor Author

andig commented Jan 18, 2016

I've done some further testing. Crashes are happening once there is some load on the system.

Test scenario: 2 sequences calling the json api in parallel, heap is being monitored to serial. This is running fine for ages.

Then: use browser (Windows = delayed ACK) to load index.html. Log starts to see LmacRxBlk:1 messages. Once those messages appear loading the index.html stalls and sonner or later the ESP crashes. For log see http://pastebin.com/vDmNrvpm#174.

LmacRxBlk:1 also seems related to #1450 but I couldn't find where its generated in the code?

@andig
Copy link
Contributor Author

andig commented Jan 18, 2016

The problem apparently is with LmacRxBlk in that it leaks memory even when the web server doesn't have any handlers associated.

Once LmacRxBlk has been received once, even if load is taken off the esp, it will eventually loose and restart wifi connection and/or crash (with current git).

To put load on the system use this html file. in my case running on Windows PC- might be related to the delayed ACK issue?

<!DOCTYPE html>
<html>
<body>
<script src="https://code.jquery.com/jquery-2.1.4.min.js"></script>
<script>
$(document).ready(function() {
    window.setInterval(function() {
        $.getJSON("http://vzero-edd834.local", {
            timeout: 100
        });
    }, 100);
});
</script>
</body>
</html>

Update I've tried reproducing the memory leak on 2.0.0 stable and wasn't able to do so. It seems that something in current git since 2.0 might be causing the crash behavior.

@andig andig changed the title 2.1.0-rc1/git: Various crashes with webserver (exception 9 & 24) git: Webserver leaks on LmacRxBlk and crashes (exception 9 & 24) Jan 18, 2016
@andig
Copy link
Contributor Author

andig commented Jan 18, 2016

This is the shortest log I could find to repro the crash with the sketch/ calling code above:

[wifi] waiting for connection..............................
[wifi] IP address: 192.168.0.30
OTA server at: vzero-edd834.local:8266
41320
41312
41192
40872
40848
40712
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
pm open,type:2 0
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
LmacRxBlk:1
36592
Fatal exception 9(LoadStoreAlignmentCause):
epc1=0x401055b5, epc2=0x00000000, epc3=0x00000000, excvaddr=0xa65d893a, depc=0x00000000

@andig
Copy link
Contributor Author

andig commented Jan 18, 2016

And here is a log of core+wifi up to the exception: http://pastebin.com/rRBzsRJr

@Links2004
Copy link
Collaborator

LmacRxBlk is a overflow of the input buffer,
the ESP is simple overloaded.
see: #1450

you can test the Async webserver:
https://github.com/me-no-dev/ESPAsyncWebServer

@andig
Copy link
Contributor Author

andig commented Jan 18, 2016

Thanks @Links2004 I'll check the async server. However- even if the input buffer overflows- should the WS handle this gracefully? The mem leak/ corruption shouldn't be happening in any case?

@igrr
Copy link
Member

igrr commented Jan 18, 2016

The overflow happens at MAC level, so this is even below the TCP/IP stack.
It's pretty hard to handle this condition gracefully at app layer. There is
a solution for this, but it relies on dynamic memory allocation... So you
can still flood the ESP easily.

On Mon, Jan 18, 2016, 21:17 andig notifications@github.com wrote:

Thanks @Links2004 https://github.com/Links2004 I'll check the async
server. However- even if the input buffer overflows- should the WS handle
this gracefully? The mem leak/ corruption shouldn't be happening in any
case?


Reply to this email directly or view it on GitHub
#1458 (comment).

@andig
Copy link
Contributor Author

andig commented Jan 18, 2016

The overflow happens at MAC level, so this is even below the TCP/IP stack. It's pretty hard to handle this condition gracefully at app layer.

Pheew- thats bad news indeed.

@andig
Copy link
Contributor Author

andig commented Jan 20, 2016

I have some troubles converting my handlers to the async server (sorry- C is not my primary language). The handlers have the signature of ArRequestHandlerFunction:

void handleRoot(AsyncWebServerRequest *request) {}

yet

g_server.on("/index.html", handleRoot);

gives compile error:

sketch\webserver.cpp: In function 'void webserver_start()':
webserver.cpp:308: error: no matching function for call to 'AsyncWebServer::on(const char [12], <unresolved overloaded function type>)'
   g_server.on("/index.html", handleRoot);
                    ^
sketch\webserver.cpp:308:40: note: candidates are:
In file included from sketch\webserver.cpp:8:0:
C:\andi\arduino\hardware\eps8266com\esp8266\libraries\ESPAsyncWebServer\src/ESPAsyncWebServer.h:258:10: note: void AsyncWebServer::on(const char*, ArRequestHandlerFunction)
     void on(const char* uri, ArRequestHandlerFunction onRequest);
      ^
C:\andi\arduino\hardware\eps8266com\esp8266\libraries\ESPAsyncWebServer\src/ESPAsyncWebServer.h:258:10: note:   no known conversion for argument 2 from '<unresolved overloaded function type>' to 'ArRequestHandlerFunction {aka std::function<void(AsyncWebServerRequest*)>}'
C:\andi\arduino\hardware\eps8266com\esp8266\libraries\ESPAsyncWebServer\src/ESPAsyncWebServer.h:259:10: note: void AsyncWebServer::on(const char*, WebRequestMethod, ArRequestHandlerFunction)
     void on(const char* uri, WebRequestMethod method, ArRequestHandlerFunction onRequest);
      ^

As workaround I'm simply wrapping it, but would like to know if I can do without the wrapper:

g_server.on("/", [](AsyncWebServerRequest *request) {
    handleRoot(request);
});

Sorry if this is more a C than an ESP question ;)

@igrr
Copy link
Member

igrr commented Jan 20, 2016

no known conversion for argument 2 from '<unresolved overloaded function type>' to 'ArRequestHandlerFunction {aka std::function<void(AsyncWebServerRequest*)>}'

Do you have another function in the sketch with the same name? Compiler thinks that this is an overloaded function.

@drmpf
Copy link

drmpf commented Jan 20, 2016

I think I had a similar problem, so I used

void pfodESP8266WebConfig::handleRoot(pfodESP8266WebConfig* ptr) {
  ptr->rootHandler();
}

void pfodESP8266WebConfig::handleConfig(pfodESP8266WebConfig* ptr) {
  ptr->configHandler();
}
void pfodESP8266WebConfig::handleNotFound(pfodESP8266WebConfig* ptr) {
  ptr->rootHandler();
}

void pfodESP8266WebConfig::init(uint8_t _basicStartAddress) {
  basicStartAddress = _basicStartAddress;
  webserver.begin();
  webserverPtr = &webserver;
  webserverPtr->on ( "/", std::bind(&pfodESP8266WebConfig::handleRoot, this));
  webserverPtr->on ( "/config", std::bind(&pfodESP8266WebConfig::handleConfig, this));
  webserverPtr->onNotFound( std::bind(&pfodESP8266WebConfig::handleNotFound, this));
}

@andig
Copy link
Contributor Author

andig commented Jan 20, 2016

Do you have another function in the sketch with the same name? Compiler thinks that this is an overloaded function.

No- it's the only handleRoot function. Wrapping it is fine, so never mind (might be added to ESPAsync samples @me-no-dev).

Async server is reproducible panicing for me as soon as I start using SPIFFS when reading a (larger) file though:

[webserver] uri: / args: 0
[core] heap: 27712
[webserver] string constructed
[webserver] file opened

Panic C:\andi\arduino\hardware\eps8266com\esp8266\cores\esp8266\core_esp8266_main.cpp:85 __yield

ctx: sys 
sp: 3ffffbc0 end: 3fffffb0 offset: 01b0

>>>stack>>>
3ffffd70:  402484dc 0000000b 3ffffdd8 402136dd  

This is the code:

void handleRoot(AsyncWebServerRequest *request)
{
  DEBUG_SERVER("[webserver] uri: %s args: %d\n", request->url().c_str(), request->params());
  DEBUG_HEAP;

  String indexHTML = F("<!DOCTYPE html><html><head><title>File not found</title></head><body><h1>File not found.</h1></body></html>");
  DEBUG_SERVER("[webserver] string constructed\n");

  File indexFile = SPIFFS.open(F("/index.html"), "r");
  DEBUG_SERVER("[webserver] file opened\n");

  if (indexFile) {
    indexHTML = indexFile.readString();
    DEBUG_SERVER("[webserver] file read\n");
    indexFile.close();
    DEBUG_SERVER("[webserver] file closed\n");
    DEBUG_HEAP;
    DEBUG_SERVER("[webserver] content length: %d\n", indexHTML.length());
  }

  request->send(200, "text/html", indexHTML);
}


void webserver_start()
{
  g_server.on("/", [](AsyncWebServerRequest *request) {
    handleRoot(request);
  });

  // start server
  g_server.begin();
}

@igrr
Copy link
Member

igrr commented Jan 20, 2016

Could you please move further AsyncWebServer discussion to https://github.com/me-no-dev/ESPAsyncWebServer/issues?

The reason it is panicking is that it calls File system APIs which expect to be used on the main thread.

@me-no-dev
Copy link
Collaborator

and you can request->send("/index.html"); and let the server deal with the file and all.
check the example supplied with the server library for SPIFFS usage and responses :)

@me-no-dev
Copy link
Collaborator

@igrr doesn't that __yield panic mean that something using yield was called from non-switching context?
since Async callbacks are such context I imagine. If so maybe the indexFile.readString(); is what triggers it?

@igrr
Copy link
Member

igrr commented Jan 20, 2016

Yes, every flash operation within SPIFFS HAL calls optimistic_yield, because flash access is a potentially long operation. I think we can relax this for reads, so you can at least read files from SPIFFS from tcp stack callbacks.

Edit: try removing this line:
https://github.com/esp8266/Arduino/blob/master/cores/esp8266/spiffs_hal.cpp#L46

@me-no-dev
Copy link
Collaborator

The example I provided uses SPIFFS and is a copy of the FSBrowser example we have for the regular WebServer. So I'm am positive that SPIFFS used as intended is working great (I have tested with files up to 2MB upload/download). There is also the static handler thing that I have ported over. Maybe since I do async reads of small portions, they do not trigger the optimistic_yield(), thus I have never seen that error before

@andig
Copy link
Contributor Author

andig commented Jan 20, 2016

and you can request->send("/index.html"); and let the server deal with the file and all.

@me-no-dev sending file via SPIFFS handler is wfm.

I need to use AsyncBasicResponse response though as I need to do some replacements inside the file before serving. Panic definitely happens on file read. I'm a little confused as to why this isn't an issue with the AsyncFileResponse as well since it also needs to read the file?

I'll move to ESPAsyncWebServer as requested.

@igrr as the root cause was crashes/leaks from MAC layer on overload- should this issue be closed?

@igrr
Copy link
Member

igrr commented Jan 20, 2016

Not really, i hope that one day we will find a solution or at least a workaround for these issues.

@me-no-dev
Copy link
Collaborator

@igrr how do you feel about replacing yield here and there with optimisic_yield or better have yield be optimistic.
The reason asking is because we have written over the time many methods to yield (like Stream::readString() above), which was fine up till now because all was blocking and called from the loop context, but those methods can no longer be used in async contexts without making them not to yield unnecessary.

@andig
Copy link
Contributor Author

andig commented Jan 23, 2016

tl:dr

I've migrated to the async webserver. After fixing some issues with code using yield (e.g. Stream::readString) I'm almost overwhelmed with the performance I'm getting. No more issues with parallel requests.

@me-no-dev
Copy link
Collaborator

@andig maybe close this now :)

@andig andig closed this as completed Jan 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants