-
Notifications
You must be signed in to change notification settings - Fork 7.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mbedtls_ssl_handshake crash (PSRAM unicore + memw workaround) (IDFGH-3068) #5090
Comments
Thanks for reporting, we will look into. @szmodz |
updated the test code - noticed a bug, but I don't think it's relevant to the problem. |
maybe related: #5112 |
@iggr @Spritetm Currently testing with the above example, seems to work much better, will leave it a while. |
I've also pushed a modified version of the nop fix (not tested). Note the dupldst fix turned out not to be completely stable in my application (I think I know why). |
Can you expand a bit more on what you tried to fix in those modifications? I can reverse engineer them from your code probably, but if you can write down a line or two, that's probably faster. |
@Spritetm mostly the crashes as described in the linked example. You can reproduce them, right? Mostly i think this is a bug: https://github.com/szmodz/gcc/blob/9ec9bb29ecbc935015e487ae4d9397a8976099e7/gcc/config/xtensa/xtensa.c#L2342 Other changes mostly aimed at improving readability and preventing some redundant memws. Do you really need two memws for a 8/16 bit store near an unconditional jump? Let me know if something is still not clear - I'll try to expand or rephrase. |
@igrr @Spritetm the original nop-based workaround has a bug too. insns_since_store is incremented only here: it should also be incremented here:
[edit] shouldn't make a difference though as more nops would actually be put in. I actually tested my version of the nop workaround with neoniousTR's original test case, and it seems to work ok? |
pre-built linux toolchain: I've also made a minor forced-push to the repo. |
Wait, doesn't the esp32 have a 7 stage pipeline? it says so in the DS. The nop patch seems to assume a 5 stage pipeline. |
Yes, it does, but at that time experiments showed that the first two stages (or the last two, or the first and the last; I don't know exactly) seemingly weren't involved in the load/store inversion. |
Also, a bit late perhaps but I can verify the bug and the fact that this is likely a PSRAM bug. On my ESP32 ECO1, the program hangs; on an ESP32 ECO3, the program continues to run and crashes eventually, possibly because an out-of-memory error. Wrt the bugs: they indeed look like valid fixes/optimizations to me. Unless you think there's more work to be done there, we can take them and run them through our paces in our CI system. It's not perfect (as seemingly it didn't catch the previous bugs in the code) but it should at least tell us if there's some bug the mods may tickle. |
Good to go I suppose. Obviously I don't know the whole story behind the PSRAM issues. I didn't touch the dupldst workaround though, similar issues still apply. ECO3 crash - was that with the original https_mbedtls example, or with the modified mbedtls_crash? |
I don't think there are any memory leaks in mbedtls_crash. It's been running for days with the fixed memw workaround. I added printing of free internal and external RAM just now and it also looks ok. |
The ECO3 crash was with your modified code. As far as I can see, it spins up a lot of threads creating a TLS connection in parallel, so I assumed it would be an out-of-memory error. |
It does spin up threads, but the connection is closed as soon as the TLS handshake is done, it doesn't download any actual content. Should probably investigate. I'm not getting any crashes with the fixed toolchain: |
It could possibly be an Internet speed thing. If the connection is made fast enough, the tasks end pretty soon and they don't 'heap up'. My connection here isn't that great, and as such the tasks do get the chance to all be active at the same time, which presumably leads to the out-of-memory condition. |
Also, the number of TLS tasks is a constant 15. Each has an infinite loop. DRAM usage is constant, doesn't change at all during operation. Only PSRAM is freed / allocated. |
updated the example to include free memory information: the updated example is running 23 hours (fixed toolchain), no crashes / hangs so far. |
Made a change to the patch. Don't ask me why it matters, but it does. |
@szmodz Hi! https://github.com/szmodz/esp-idf/tree/mbedtls_crash is 404 Could you show the example? |
@antmak it's back up. |
Code to reproduce is based on a modified https_mbedtls example from esp-idf:
https://github.com/szmodz/esp-idf/commits/mbedtls_crash
The code always ends up causing a WDT timeout sooner or later (timeout set to 60s) in UNICORE mode. CONFIG_MBEDTLS_HARDWARE_xxx are disabled, but it doesn't really matter.
Seems to be MUCH more stable with dupldst rather than memw workaround (not sure if it's completely stable though - too soon to tell). No other changes were made, and the effect on stability is obvious.
Perhaps it would be a good idea to also supply newlib and sdk libs compiled with dupldst?
The handshake is using nonblocking sockets and resets the WDT, so I don't think it should ever exceed the 60s timeout. That, plus it seems stable (so far) with dupldst.
It's a pretty intensive load, but that way the problem can be reproduced quickly. It also occurs under normal load conditions, but not as often. I noticed it in my actual application code.
xtensa-esp32-elf-gcc --version
to find it): 2020r1example output:
The text was updated successfully, but these errors were encountered: