RTOS-SDK, ESP32 and the way forward #1319

@jmattsson

Edit: the below progress update refers to the dev-rtos branch which targeted the RTOS-SDK and the ESP31B. With the final release of the ESP32, Espressif abandoned the RTOS SDK in favour of their new IoT Development Framework (IDF). While the IDF is vastly superior to the previous SDKs, it does set our porting effort back a fair bit. Progress updates on the IDF/ESP32 port of NodeMCU can be found further down in this discussion.

With the ESP32 release coming up in a few months' time, it's time to seriously start thinking about the way forward. I think it's a given that we'd all like to see NodeMCU run on the ESP32 as well. However, the only SDK available for the ESP32 is the RTOS SDK, which means we really need to consider how to get ourselves switched from the non-OS SDK over to RTOS.

Since $work is rather interested in shifting some of our products over to the ESP32 I've had a bit of time to investigate the effort that will be required in terms of NodeMCU. I've been "spiking" over on the DiUS dev-rtos branch to see what I can get going. Here's the overview so far:

  • Make NodeMCU compile with RTOS-SDK headers rather than non-OS SDK headers. There's a bunch of glue in the sdk-overrides/ directory which would need cleaning up, but overall this step wasn't too bad - the SDK functions are largely the same.
  • Make NodeMCU link with RTOS SDK. This took a bunch of changes, and a couple of functions needed to be stubbed for now.
  • Reimplement NodeMCU task interface on top of RTOS tasks. Thanks to Terry's earlier work, this seems to be relatively straightforward. Once it gets run-tested some issues may surface though (cue Jaws music...)
  • Complete reimplementation of our exception handler to allow constants in flash. Turns out the RTOS-SDK doesn't use any of the ROM functions for hooking exceptions (probably for performance reasons), and the documented method of installing user hooks simply does not exist. It took a fair chunk of work to find a good way to hijack the UserExceptionVector, but on the upside the new handler is also a whole lot faster than the previous one.
  • Remove NodeMCU's partial libc implementation. This was conflicting with the SDK's libc and causing complete hangs. On the upside, there now is a real libc available. Almost all the various c_ prefixed functions (and a bunch of os_ prefixed ones) have been consolidated back to standard C library names.
  • Fixed SPI flash reading functions. The RTOS-SDK changed the flashchip variable from a pointer to a struct, so our use of it bombed completely...
  • Understand why printf() now doesn't work, but ets_printf() does. printf now working, without bounce buffers.
  • Get to the Lua prompt being printed. So far, so good.
  • Make UART driver RTOS compatible. The UART driver wasn't to blame, my buggy task.c implementation was. Whoopsie.
  • Get to a (mostly) working Lua prompt. This would be a major milestone, and hopefully be the starting point for others to join in the effort.
  • Deal with timer callbacks executing from a different task context. Considering that pretty much all NodeMCU code is written expecting run-to-completion semantics, switching to a preemptive OS framework has huge potential for random lockups and crashes. My current approach is to dedicate a single RTOS task to running all of NodeMCU in, which should hopefully mitigate most issues. My original plan was to wrap the timer API so that the actual callbacks get posted back to the NodeMCU task for execution. Having done the transition for the tmr module, however, it appears easier to change each place where needed than to attempt to wrap everything. Besides, having high-priority timer callbacks might be useful for some drivers.
  • Understand which tasks the SDK callbacks execute from, and develop a strategy for dealing with that. Similar to the timers, but a bit more challenging, possibly. Our earlier work in not allowing Lua callbacks to run from SDK callbacks should help here and limit the amount of rework needed. I hope. As expected, callbacks seem to be called from various RTOS tasks directly, such as the rtT high priority timer task and the tiT TCP/IP task, not to mention the uiT task which is what user_init() runs in. It will be up to everyone who is taking an SDK callback to either deal with it fully within that callback without referencing data used by other tasks, or copy the necessary information from said callback and relay it back to the main nodemcu RTOS task. Appropriate locking must be done in either case. I've updated the sntp module as an exercise, and while it grew a little bit it was pretty straightforward. [cue everyone pointing out things now wrong with it...]
  • Make output redirection work. Needs a putc handler installed which can queue characters across into the nodemcu task.
  • Make silencing of SDK output work. Espressif didn't provide a system_set_os_print() function in the RTOS-SDK and wants you to install a putc handler to suppress everything instead (which is useless, since then we'd have to put a mutex around each printf call if we want system_set_os_print()-like functionality). I fixed this by placing all the SDK functions first in irom, and then checking the return address in the wrapped printf() - if it's in the SDK part of irom and we've flagged off SDK prints, then we suppress the output. Rather sneaky, but it works well and with almost no cost.
  • Revisit printf override. The internal print() function takes peculiar arguments, but we now match those to the letter I believe.
  • Find out what's using so much stack space. Currently I'm running the NodeMCU task with an unsupported stack size just to prevent everything from falling over due to the stack being smashed. If anyone has any stack analysis tools that could work for the ESP platform, I'd love to hear about them, since -fstack-usage is not available.
  • ??? (no profit guaranteed)
  • Reimplement the net module (and others) on top of lwIP API, since espconn is only partially supported on the ESP8266 RTOS, and not at all on the ESP32 RTOS. Probably look at including mbedTLS for TLS support.
  • Fix whatever other issues and races we encounter. There will be races, I'm sure. There will also be regressions.
  • Look at upgrading various components and drivers to take better advantage of the RTOS aspects.
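
To make the "relay it back to the main nodemcu RTOS task" idea from the timer and SDK callback items a bit more concrete, here is a minimal sketch in portable C. All names (`relay_post`, `relay_drain`, `RELAY_DEPTH`) are made up for illustration and are not the actual NodeMCU task API. The ring buffer is only a stand-in: on the real thing it would be an RTOS queue (for example FreeRTOS's `xQueueSend`), which also provides the locking this sketch deliberately leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the NodeMCU task interface. */
typedef void (*relay_fn)(void *arg);

typedef struct {
    relay_fn fn;   /* handler to run in the main task */
    void *arg;     /* data copied out of the original callback */
} relay_msg;

#define RELAY_DEPTH 8

static relay_msg relay_buf[RELAY_DEPTH];
static size_t relay_head, relay_tail, relay_count;

/* Would be called from a "foreign" context (timer task, TCP/IP task, ...).
 * NOT thread-safe as written: a real port would use an RTOS queue here. */
bool relay_post(relay_fn fn, void *arg)
{
    assert(fn != NULL);
    if (relay_count == RELAY_DEPTH)
        return false;                 /* queue full; caller decides what to do */
    relay_buf[relay_tail].fn = fn;
    relay_buf[relay_tail].arg = arg;
    relay_tail = (relay_tail + 1) % RELAY_DEPTH;
    relay_count++;
    return true;
}

/* Called only from the main task: runs each pending handler to completion,
 * in the order posted, and returns how many handlers were run. */
size_t relay_drain(void)
{
    size_t ran = 0;
    while (relay_count > 0) {
        relay_msg m = relay_buf[relay_head];
        relay_head = (relay_head + 1) % RELAY_DEPTH;
        relay_count--;
        m.fn(m.arg);
        ran++;
    }
    return ran;
}

/* Example handler: bumps an int counter passed via arg. */
void relay_example_bump(void *arg)
{
    *(int *)arg += 1;
}
```

The point of the split is that whatever runs inside `relay_drain()` keeps the run-to-completion semantics the existing modules expect, while the posting side does nothing more than copy a pointer and return.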

If we can get our current NodeMCU to run stably on the ESP8266 with the RTOS SDK, I believe it should be quite easy to add ESP32 support. If/when I get my hands on ESP32 hardware, I'll have an even better idea.

Oh, and the dev-rtos branch is subject to force-pushing and other unpleasant things, and it is most assuredly not ready for public consumption, but if you want to track my progress you'll see it there.
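
For anyone curious about the return-address trick mentioned in the "silencing of SDK output" item, the decision itself boils down to a range check. The sketch below is an assumption-laden illustration rather than the code on the branch: the `print_filter` struct, its field names and the address values are all hypothetical, and in a real build the bounds would come from linker-script symbols. Inside a wrapped printf() on GCC, the caller address could be obtained with `__builtin_return_address(0)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of where the SDK's code lives in irom.
 * Real bounds would be taken from linker-script symbols. */
typedef struct {
    uintptr_t sdk_start;        /* first address of SDK code */
    uintptr_t sdk_end;          /* one past the last SDK address */
    bool sdk_print_enabled;     /* false = silence prints made by the SDK */
} print_filter;

/* Decide whether a print call should produce output, given the address
 * the print function will return to (i.e. where the caller lives). */
bool print_allowed(const print_filter *f, uintptr_t caller)
{
    assert(f != NULL);
    bool from_sdk = caller >= f->sdk_start && caller < f->sdk_end;
    /* Non-SDK callers always print; SDK callers only when enabled. */
    return f->sdk_print_enabled || !from_sdk;
}
```

Since this is just two comparisons and a flag test per print call, it fits the "almost no cost" observation, and it needs no mutex because nothing is swapped in or out at runtime.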
