Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should schedule_function() call esp_schedule()? #7661

Open
4 of 5 tasks
matthijskooijman opened this issue Oct 17, 2020 · 31 comments
Open
4 of 5 tasks

Should schedule_function() call esp_schedule()? #7661

matthijskooijman opened this issue Oct 17, 2020 · 31 comments

Comments

@matthijskooijman
Copy link

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
    I tried, but the latest master requires GCC10 and I'm not sure how to get this installed.
  • I have searched the issue tracker for a similar issue.
  • I have filled out all fields below.

Platform

  • Hardware: Feather Huzzah ESP8266
  • Core Version: 2.7.4
  • Development Env: Arduino IDE
  • Operating System: Ubuntu

Settings in IDE

-fqbn=esp8266:esp8266:huzzah:xtal=80,vt=flash,exception=legacy,ssl=all,eesz=4M2M,ip=lm2f,dbg=Serial,lvl=CORE,wipe=none,baud=460800

Problem Description

I'm building a sketch that essentially always sleeps, but needs to wakeup on a pin interrupt and then do some network traffic.

Some observations & assumptions:

  • When using light sleep, AFAIU I need to put esp_yield() in my loop for when it has nothing left to do (to ensure the CPU can sleep, rather than continuously having to keep calling loop()).
  • A sketch can be woken up from sleep by an interrupt using attachInterrupt() with e.g. the ONHIGH_WE.
  • You cannot do real work from an ISR (e.g. do network traffic), this must be done from the main loop (or another task). This can be done using schedule_function to call some code after the next loop iteration.
  • In the main loop function, the esp_yield(), will not return until esp_schedule() is called elsewhere.
  • So to make this work, from the interrupt handler, I now have to call both schedule_function() to schedule some code to run later, and esp_schedule() to actually make sure the loop function continues and the scheduled function can run.
  • When using other code that calls schedule_function() (e.g. Ticker.once_scheduled()) that I have no control over, this breaks (so e.g. for Ticker I have to do a non-scheduled Ticker.once() with a function that calls both schedule_function() and esp_schedule(), but that might not be always possible with other interfaces).

Maybe I'm misunderstanding how to approach my original problem, but if the above approach is correct, then maybe it would be good if schedule_function() would just always call esp_schedule()? (or, if it is harmful to call it when already inside the loop, then maybe the call can be done only when outside the main loop, e.g. by checking can_yield() or so?).

Doing so would mean that schedule_function() would just always work to ensure the given function is called, even when the mainloop is not currently running?

Sketch

To illustrate the problem, here's something that I think should turn the led on pin 0 on (LOW), and schedule a function that turns the led off after 5 seconds. I left out the actual sleeping and interrupt handling out, to simplify the example. If it helps to clarify, I can provide a more complete example with those as well.

#include <Arduino.h>
#include <Ticker.h>

Ticker ticker;

void setup() {
  pinMode(0, OUTPUT);
  digitalWrite(0, LOW);
}

extern "C" void esp_yield(void);

void loop() {
  ticker.once_scheduled(5, [](){ digitalWrite(0, HIGH); });
  esp_yield();
}

Debug Messages

SDK:2.2.2-dev(38a443e)/Core:2.7.3-3-g2843a5ac=20703003/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-30-g92add50/BearSSL:5c771be

(nothing else appears on serial)

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 21, 2020

Here is a quick hack for a test-proposal:

  • use recurrent scheduled function instead of scheduled functions (interval can be 0 and the lambda must return false so it is executed once, if that matches your use-case),
  • apply this minor change in the core (cores/esp8266/core_esp8266_main.cpp)
static inline void esp_yield_within_cont() __attribute__((always_inline));
static void esp_yield_within_cont() {
+       run_scheduled_recurrent_functions();
        cont_yield(g_pcont);
        s_cycles_at_yield_start = ESP.getCycleCount();
-       run_scheduled_recurrent_functions();
}

extern "C" void __esp_yield() {
    if (can_yield()) {
        esp_yield_within_cont();
    }
}

With this patch, (recurrent) scheduled functions are called before effectively yielding instead of after.

Background: recurrent scheduled functions are scheduled functions:

  • that auto trigger themselves at given interval until they return false
  • of which interval checking occurs at every loop and every yield

edit: embedded Ticker library does not use recurrent scheduled functions.

@d-a-v d-a-v self-assigned this Oct 22, 2020
@d-a-v d-a-v added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Oct 22, 2020
@matthijskooijman
Copy link
Author

Thanks for your suggestion. However, I think it does not solve the problem yet, since recurrent functions are also only called when the loop is either running, or about to be suspended with esp_yield()? IOW, they wil not cause the loop to resume, just like normal scheduled functions cannot do so?

I guess that this does mean that my original request to call esp_schedule from schedule_function should maybe be extended to schedule_recurrent_function as well?

I haven't tested this yet (no hardware nearby), but IIUC you suggest changing my tests sketch like this:

#include <Arduino.h>
#include <Ticker.h>

Ticker ticker;

void setup() {
  pinMode(0, OUTPUT);
  digitalWrite(0, LOW);
}

extern "C" void esp_yield(void);

void loop() {
  ticker.once(5, [](){
    schedule_recurrent_function_us(0 /*us*/, []() {
      digitalWrite(0, HIGH);
      return false; /* Run only once */
    });
  });
  esp_yield();
}

AFAICS, even with your suggested patch to the core, this will not run the recurrent function?

Given that the recurrent function can be run at a later time as well, it could also replace the Ticker completely. However, AFAICS that won't fix the problem (now nothing runs after 5 seconds, rather than just the ticker firing and scheduling a function that does not run). Also, AFAICS recurrent functions do not actually set timers, so they cannot cause wakeup from sleep, unlike Ticker (which is the reason I'm esp_yielding in the first place).

For completeness, I think this would look like this:

#include <Arduino.h>

void setup() {
  pinMode(0, OUTPUT);
  digitalWrite(0, LOW);
}

extern "C" void esp_yield(void);

void loop() {
  schedule_recurrent_function_us(5000000 /*us*/, []() {
    digitalWrite(0, HIGH);
    return false; /* Run only once */
  });
  esp_yield();
}

@devyte
Copy link
Collaborator

devyte commented Oct 22, 2020

Why are you calling esp_yield inside the loop? remove that.
Edit: oh, I think I understand what you're trying to do. You basically want to suspend CONT, and have it resume later when a pin interrupt happens.

  • How often does the external pin interrupt happen?
  • Why light sleep?
  • Have you looked at deep sleep?
  • Why a scheduled function at all?

I would add logic to the loop to detect the start and end of your network transaction.

  • The start condition would get triggered by the external interrupt.
  • The end condition would get triggered once the transaction is done, and would result in suspending CONT
  • The interrupt would just trigger the start and wouldn't schedule anything, nor would it do any work.

@matthijskooijman
Copy link
Author

From my original post:

When using light sleep, AFAIU I need to put esp_yield() in my loop for when it has nothing left to do (to ensure the CPU can sleep, rather than continuously having to keep calling loop()).

And:

I left out the actual sleeping and interrupt handling out, to simplify the example. If it helps to clarify, I can provide a more complete example with those as well.

@devyte
Copy link
Collaborator

devyte commented Oct 22, 2020

See my edit above.

@matthijskooijman
Copy link
Author

How often does the external pin interrupt happen?

When I press a button, usually minutes apart.

Why light sleep?

To stay connected to the AP and have the fastest possible response time after pressing the button combined with the lowest power usage (battery powered application).

Have you looked at deep sleep?

Yes, but that would have longer response time due to needing to reassociate to the network.

Why a scheduled function at all?

Because doing network traffic (i.e. an HTTP request and waiting from the response) did not work from inside the button ISR (which makes sense), but also not from the Ticker handler IIRC (which I think is just another task and not a timer ISR, but it still didn't work). It's been a while since I dug into this, but IIRC some of the networking code was written in a way that only worked from inside the main loop / CONT.

The start condition would get triggered by the external interrupt.
The end condition would get triggered once the transaction is done, and would result in suspending CONT
The interrupt would just trigger the start and wouldn't schedule anything, nor would it do any work.

Note that my actual case is slightly complicated since the work is done 1000ms after the interrupt (which is what I'm using Ticker for now), but the flow could be the same (ISR triggers Ticker, ticker triggers start condition).

But this would also require the interrupt to call esp_schedule to resume CONT, right? So then I would call esp_schedule() and flag a start condition, which is handled by the loop. In that case, I might as well call esp_schedule() and schedule_function() (which allows slightly nicer code structure, since the code for the work can be kept where it is triggered, rather than in the loop), which is also my current workaround.

But the main point of this request is that IMHO schedule_function should just work, even when CONT is suspended, by resuming CONT when it is suspended (regardless of why it is suspended of who is calling schedule_function why).

@devyte
Copy link
Collaborator

devyte commented Oct 22, 2020

Because doing network traffic (i.e. an HTTP request and waiting from the response) did not work from inside the button ISR (which makes sense), but also not from the Ticker handler IIRC (which I think is just another task and not a timer ISR, but it still didn't work). It's been a while since I dug into this, but IIRC some of the networking code was written in a way that only worked from inside the main loop / CONT.

That's not what I meant by "why a scheduled function at all". You can't do network ops from an ISR, and the Ticker cb executes in SYS, not in CONT, so also no. But that doesn't mean a scheduled function. What I meant was: why are you set on a scheduled function at all? There are other ways to trigger loop code, such as a global flag, which is what I'm suggesting with the start/end.

You can think of it as a state machine.

  • The initial state, I. e. first time through, it calls the sleep code (yield)
  • An external trigger, I. e. the ISR, resumes the loop (schedule), and transitions to start state
  • After whatever network initial comms are done, you transition to a comms state to actually do the communications
  • Once the comms are done, you transition into the initial state again, I. e. sleep (yield)
  • the above should have appropriate error handling and timeouts
  • all of the above doesn't happen in a single loop pass, the loop executes as many times as needed to achieve all of the transitions. In each pass, the code decides which state handler has to be called

Of course, I'm simplifying. There could be more states needed in your code, that's something you would need to figure out.

@matthijskooijman
Copy link
Author

There are other ways to trigger loop code, such as a global flag, which is what I'm suggesting with the start/end.

Thanks for that suggestion, I can see how that would indeed work. However, that's not the way I want to structure my code, scheduling functions to be called ASAP with schedule_function is pretty much exactly what I do want. So, I come back to my original suggestion.

Any response to the suggestion itself? If the answer to my suggestion is "No, schedule_function should not call esp_schedule "because that causes problems or is unwanted for other reasons, and/or "If you call esp_yield, you're responsible to call esp_schedule yourself too, that's fine, then this issue can be closed and I'll stick to my current approach (using the non-scheduled TIcker and calling both schedule_function and esp_schedule in the non-scheduled callback).

@devyte
Copy link
Collaborator

devyte commented Oct 22, 2020

At top level, you're not supposed to use esp_yield/esp_schedule yourself. Those are low level functions that deal with the interactions between SYS and CONT. The user is supposed to only call yield/delay, which wraps the details. As such, if you do decide to use them, strictly speaking you are responsible for doing it correctly.

Having said that, your general idea seems to have merit, and I would say it's a valid use case.

Changing how the scheduled functions are currently called on switching to SYS and back could cause trouble. We've already gone through several incarnations of that, and I'd prefer to keep it as it is, i. e. call on returning.

There is also the fact that the interrupt could come in while there is an ongoing delay, in which case things might get... interesting.

Let's see if your current approach can be improved.

@d-a-v d-a-v removed the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Oct 22, 2020
@d-a-v
Copy link
Collaborator

d-a-v commented Oct 23, 2020

@matthijskooijman

From OP:

When using other code that calls schedule_function() (e.g. Ticker.once_scheduled()) that I have no control over, this breaks (so e.g. for Ticker I have to do a non-scheduled Ticker.once() with a function that calls both schedule_function() and esp_schedule(), but that might not be always possible with other interfaces).

it would be good if schedule_function() would just always call esp_schedule()?

After rereading, there is a misunderstanding from my side.
Let's suppose esp_schedule() (that will "unsuspend" the next call to esp_yield() - or resume the current one) is called by schedule_function() which is itself called by ticker.once_scheduled(5 seconds, ...)

void loop() {
  ticker.once_scheduled(5, [](){ digitalWrite(0, HIGH); });
  esp_yield();
}

The esp_yield() above will be "unsuspended" because the added call to schedule_function().
That will lead to the end of the loop but will not trigger your user tickered function (delayed to 5s later).
Then loop restarts, and the tickered function is replaced by a new one that is supposed to be fired 5s later.
And this story loops over. The sketch will not be sleeping, but the tickered function will not be fired.

Scheduled function will not work (*) if there is an esp_yield() that prevents any further loop (with or without any esp_schedule() at registering time). Once esp_yield() has been executed, the esp has to be waken up by something coming from SYS (an ISR, a timer, network event...).

(*) except if delay is 0 and recurrent scheduled functions are used, that leads to your first answer:

If we claim that every esp_yield() should match an esp_schedule(), (push/pop, call/return, malloc/free) then your first example in your first answer

void loop() {
  ticker.once(5, [](){
    schedule_recurrent_function_us(0 /*us*/, []() {
      digitalWrite(0, HIGH);
      return false; /* Run only once */
    });
  });
  esp_yield();
}

should be written like:

void loop() {
  ticker.once(5, [](){
    schedule_recurrent_function_us(0 /*us*/, []() {
      digitalWrite(0, HIGH);
      return false; /* Run only once */
    });
    esp_schedule(); // <== unhidden esp_yield() counterpart
  });
  esp_yield();
}

Thus, I agree with you about this:

But this would also require the interrupt to call esp_schedule to resume CONT, right? So then I would call esp_schedule() and flag a start condition, which is handled by the loop. In that case, I might as well call esp_schedule() and schedule_function() (which allows slightly nicer code structure, since the code for the work can be kept where it is triggered, rather than in the loop), which is also my current workaround.

And disagree with that:

But the main point of this request is that IMHO schedule_function should just work, even when CONT is suspended, by resuming CONT when it is suspended (regardless of why it is suspended of who is calling schedule_function why).

.. because

  • one _yield one _schedule is a nice rule :)
  • resuming CONT (calling esp_schedule()) at registering time makes no sense when the time interval is not 0.
    Would it make sense to call esp_schedule() only when time interval is 0 ?

However, that's not the way I want to structure my code, scheduling functions to be called ASAP with schedule_function is pretty much exactly what I do want. So, I come back to my original suggestion.
Any response to the suggestion itself?

You had the answer before I made my own way through it:

"If you call esp_yield, you're responsible to call esp_schedule yourself too"

and I'll stick to my current approach (using the non-scheduled TIcker and calling both schedule_function and esp_schedule in the non-scheduled callback).

That's indeed my POV. And your use-case should go into or update an already existing example.

@matthijskooijman
Copy link
Author

Thanks for more in-depth replies, much appreciated.

Responding to @devyte first:

At top level, you're not supposed to use esp_yield/esp_schedule yourself. Those are low level functions that deal with the interactions between SYS and CONT. The user is supposed to only call yield/delay, which wraps the details.

Yeah, I guessed as much, but I couldn't think of any way to get the light sleeping to work correctly without esp_yield().

Having said that, your general idea seems to have merit, and I would say it's a valid use case.

Ideally, there would be a well-supported/well-defined API to support this usecase, if esp_yield() / esp_schedule() is not it :-)

There is also the fact that the interrupt could come in while there is an ongoing delay, in which case things might get... interesting.

Do you mean the case where, inside the delay CONT is currently suspended but will be resumed after a timer, so calling esp_schedule() in that case would end up with one resume to many in the queue? If so, this is essentially the one suspend == one resume rule that @d-a-v also states.

This one-on-one mapping is something that might indeed be tricky to take into account (In my original sketch that I did not show here, these "extra resumes" might also occur, though since the loop does not do anything except esp_yield() that probably does not cause any issues).

Thinking about this, maybe a higher level API for this might also solve the one-on-one mapping problem, by exposing a suspend_loop() function that suspends the loop and resume_loop() that resumes the loop only if it is currently suspended? I.e. something like:

bool loop_suspended = false;
void suspend_loop() {
  loop_suspended = true;
  esp_yield();
}
void resume_loop() {
  if (loop_suspended) {
    loop_suspended = false
    esp_schedule();
  }
}

The idea as that there is a separate mechanism (the loop_suspended variable) that ensures that for these higher level API functions will balance their yield/schedule calls. And then things like schedule_function could maybe just call resume_loop, which just does the right thing in all cases.

However, after writing the above, I realize that this does not actually work like this yet, i.e. if you suspend_loop and something else calls esp_schedule and then you call suspend_loop again, the balance is off. I'd have to think a bit about how this API should work exactly (and what the constraints on them are), but I'm out of time for now :-)

Then, @d-a-v writes:

The esp_yield() above will be "unsuspended" because the added call to schedule_function().
That will lead to the end of the loop but will not trigger your user tickered function (delayed to 5s later).

resuming CONT (calling esp_schedule()) at registering time makes no sense when the time interval is not 0.
Would it make sense to call esp_schedule() only when time interval is 0 ?

I think both of these statements suggest that you're thinking about calling esp_schedule at the moment where you start the ticker? My suggestion would to call it inside schedule_function, which is called by Ticker after 5s, not when starting the ticker?

Another way to look at it, is to note that schedule_function always schedules a function to be run ASAP, which somewhat corresponds to interval=0.

However, I do now realize that my suggestion could only possibly work for schedule_function, not for schedule_recurrent_function, since that does not trigger a timer, but just polls the time and thus only works when the mainloop is still running. So my suggestion would introduce some inconsistency then, if schedule_function would work like this and schedule_recurrent_function would not.

one _yield one _schedule is a nice rule :)

Yeah, I think I can imagine how that would be required. Then having a function that maybe calls esp_schedule is going to be counter-productive.

That's indeed my POV. And your use-case should go into or update an already existing example.

That would indeed make sense. Maybe the documentation could also be somewhat updated about how this stuff works internally (even if a better API for this would be added).

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 23, 2020

The esp_yield() above will be "unsuspended" because the added call to schedule_function().
That will lead to the end of the loop but will not trigger your user tickered function (delayed to 5s later).

resuming CONT (calling esp_schedule()) at registering time makes no sense when the time interval is not 0.
Would it make sense to call esp_schedule() only when time interval is 0 ?

I think both of these statements suggest that you're thinking about calling esp_schedule at the moment where you start the ticker? My suggestion would to call it inside schedule_function, which is called by Ticker after 5s, not when starting the ticker?

This API is tricky. esp_schedule() does not necessarily act at the time of calling it. This function posts an event in the underlying OS (that should be freertos hidden behind the nonosdkapi). If it was posted in CONT, this event is taken into account after esp_yield() is called; When posted from SYS, it wakes up from last already-called-yield.
Briefly, this event triggers resuming to the last esp_yield() at next context change.

So my suggestion would introduce some inconsistency then, if schedule_function would work like this and schedule_recurrent_function would not.

Historically, scheduled functions are triggerred "ASAP" meaning at next loop. They can be much delayed in some situations.
There was a need to trigger them both regularly and also before next loop, in case of too large delay. So recurrent scheduled function appeared, with polled timers, but are checked also when yield() is called. They are both polled/software only.

That would indeed make sense. Maybe the documentation could also be somewhat updated about how this stuff works internally (even if a better API for this would be added).

Someone may correct me about what's above. Indeed that should be documented. That is (probably) mainly the result of fine hacking nonos-sdk to cope with both its requirement and arduino needs (just check cont.S). This is not current maintainers work but should give insights on how to move to rtos-sdk while still keeping up with the current model.

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 2, 2020

@matthijskooijman Did we answer to the issue you opened ?

edit:

latest master requires GCC10 and I'm not sure how to get this installed.

by running cd tools; ./get.py

@matthijskooijman
Copy link
Author

@matthijskooijman Did we answer to the issue you opened ?

Well, I think the answer is "No, schedule_function() should not call esp_schedule(), since that messes up the balance", but then I think a new API (for suspending the mainloop and returning to it from elsewhere) might be useful to better support my usecase. I'm not quite sure how I'd want this to work yet, though, that needs a bit more thought. Would it be ok to leave this issue open for further discussion on this subject?

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 10, 2020

We didn't consider a call to esp_schedule() from inside the Ticker's ISR, after the user function is scheduled.

That would break the "one esp_yield() = one esp_schedule()" pseudo rule from above but it can be made possible with a clear indication from the API name:

We could add a Ticker::once_scheduled_plus_esp_schedule() (Ticker::wake_and_schedule() ? or a any meaning name) that would call esp_shedule() from under the hood.

(edit: to be clear: that would be a new method from ticker with this functionality: "schedule_function()+esp_schedule() in ISR")

@matthijskooijman
Copy link
Author

We could add a Ticker::once_scheduled_plus_esp_schedule() (Ticker::wake_and_schedule() ? or a any meaning name) that would call esp_shedule() from under the hood.

That would work, but I was thinking of (also) adding some extra API to be called from the loop() method with easier to use semantics than calling esp_yield() (maybe using some refcounting or so to ensure the yield/schedule balance is preserved). But as I said, haven't thought this through yet...

@dok-net
Copy link
Contributor

dok-net commented Apr 10, 2021

@matthijskooijman This is taken from the LowPowerDemo.ino example, and in my understanding does exactly what you wanted, all within the limits of public and documented APIs. What do you think?

Sketch source:

#include <ESP8266WiFi.h>
#include <coredecls.h>         // crc32()

[removed for readability, latest version in posting below, last version that was here is available via edit history]

Output:

Reset reason = External System
I'm awake and starting the Low Power tests
connecting to WiFi xxxxxxxxxx
IP address: 
aaa.bbb.ccc.ddd
Enter loop
Forced Light Sleep, wake with GPIO interrupt
CPU going to sleep, pull WAKE_UP_PIN low to wake it immediately (press the switch)
millis() = 8982

pushing button

Woke from Light Sleep - this is the callback
millis() = 9848
Woke from Light Sleep - this is the resumed sleep function
WiFi didn't resume, trying to connect from scratch
connecting to WiFi dok-net.1
IP address: 
192.168.178.35
wakeupPinIsr triggered times = 74098
Exit loop
Woke from Light Sleep - this was scheduled from the ISR
millis() = 13653
Woke from Light Sleep - this was scheduled from the callback

This suggests, that no, there is no need to call esp_schedule() from schedule_function(), this issue could be closed.
Caveat: I haven't yet measured electrical current.

@dok-net
Copy link
Contributor

dok-net commented Apr 12, 2021

So here's another one, with the buzzer as "proof" that at least the PWM/waveform timer gets restored. Actual effect on power consumption is pending, @Tech-TX will add that info shortly:

#include <ESP8266WiFi.h>
#include <coredecls.h>         // crc32()
[removed for readability, latest version in posting below, last version that was here is available via edit history]

@dok-net
Copy link
Contributor

dok-net commented Apr 12, 2021

@mcspr AFAIK the changes in PR #7956 don't contain any fixes to the internals of the power saving modes, only the API is being cleaned up. If that's right, please let me point out that WIFI_SHUTDOWN / WIFI_RESUME do not work with light sleep, the MCU does not return from light sleep when the pin is pulled low. As you can tell from the sketch above, WIFI_OFF and the full connect sequence do work at present.
I've pulled #7956, actually, light sleep is not entered at all by the sketch.

@Tech-TX
Copy link
Contributor

Tech-TX commented Apr 12, 2021

Current during Timed/Forced Light Sleep is roughly 0.49mA, well within the range for Light Sleep. That's for the second chunk of code. I'll test the first one presently.
edit: the first test code is running the same current.

I'm running debug level = CORE, should I disable that? It tracks WiFi connections and the sleep mode commands, but may be corrupting the output as the core tries to tell us it's going to sleep.

connected with >SSID<, channel 8
dhcp client start...
pm open,type:2 0
ip:192.168.0.23,mask:255.255.255.0,gw:192.168.0.1
IP address:
192.168.0.23
Exit loop
Woke from Light Sleep - this was scheduled from the callback
Enter loop
Forced Light Sleep, wake with GPIO interrupt
state: 5 -> 0 (0)
rm 0
pm close 7
del if0
usl
mode : null
CPU going to sleep, pull WAKE_UP_PIN low to wake it immediately (press the switch)
millis() = 307777
fpm open,type:1 0

When it wakes up, there's junk in the TX buffer, so apparently that didn't get flushed.
(can't paste the text from the monitor 'cos there's an illegal ASCII character in that junk...)
Untitled 1

Oh, THAT'S rude. I turned off debug, and got an illegal instruction as it was initially trying to connect WiFi. It only did it once, it was OK after that.

Exception 0: Illegal instruction
PC: 0x4021d29f
EXCVADDR: 0x00000000

Decoding stack results
0x401003c4: ets_post(uint8, ETSSignal, ETSParam) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 181
0x402030c0: loop_task(ETSEvent*) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 213
0x401003c4: ets_post(uint8, ETSSignal, ETSParam) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 181
0x401003c4: ets_post(uint8, ETSSignal, ETSParam) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 181
0x401003c4: ets_post(uint8, ETSSignal, ETSParam) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 181
0x401003c4: ets_post(uint8, ETSSignal, ETSParam) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 181
0x401003c4: ets_post(uint8, ETSSignal, ETSParam) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 181
0x401003c4: ets_post(uint8, ETSSignal, ETSParam) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 181
0x401003c4: ets_post(uint8, ETSSignal, ETSParam) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 181
0x402010a0: esp8266::polledTimeout::timeoutTemplate >::expiredOneShot() const at C:<source path>\cores\esp8266/PolledTimeout.h line 234
0x40201260: setup() at C:<source path>\cores\esp8266/PolledTimeout.h line 167
0x40202c41: String::changeBuffer(unsigned int) at C:<source path>\cores\esp8266\WString.cpp line 202
0x4020424c: uart_write(uart_t*, char const*, size_t) at C:<source path>\cores\esp8266\uart.cpp line 546
0x4020424c: uart_write(uart_t*, char const*, size_t) at C:<source path>\cores\esp8266\uart.cpp line 546
0x401004d9: millis() at C:<source path>\cores\esp8266\core_esp8266_wiring.cpp line 193
0x401003c4: ets_post(uint8, ETSSignal, ETSParam) at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 181
0x40201db8: ESP8266WiFiSTAClass::localIP() at C:<source path>\cores\esp8266/Printable.h line 33
0x402031eb: __yield() at C:<source path>\cores\esp8266/core_esp8266_features.h line 65
0x40201286: setup() at C:\Users\Admin\Documents\Arduino\Dirk__1/Dirk__1.ino line 116
0x40203270: loop_wrapper() at C:<source path>\cores\esp8266\core_esp8266_main.cpp line 198

BTW, turning off debug DID eliminate the random junk in the TX buffer when it came out of Light Sleep.

@dok-net
Copy link
Contributor

dok-net commented Apr 13, 2021

One can see that I switched from gpio_pin_wakeup_enable(GPIO_ID_PIN(WAKE_UP_PIN), GPIO_PIN_INTR_LOLEVEL); to attachInterrupt(WAKE_UP_PIN, wakeupPinIsr, ONLOW_WE);. I have determined that calling gpio_pin_wakeup_enable predictably alters the ISR mode to level interrupt. I believe this would be quite destructive to most IRQ-based libs and sketches that are not themselves prepared for the burst of interrupts this causes, in fact, more often than not the overhead of the ISR invocations leads to a WDT exception.
It would seem like a good idea to review which protocols are interrupt based and if they expose re-associating their ISRs on-the-fly, such that before light sleep, specialized ISRs can be attached that are called by level, defer once to the operational ISR, and re-enable that ISR in place of themselves with the proper running IRQ mode. How about HW serial, SW serial, I2C? What else?

@mcspr
Copy link
Collaborator

mcspr commented Apr 13, 2021

Not sure of the ping and the example above. Following the note about the isr, this works... as expected?

#include <Arduino.h>
#include <Ticker.h>
#include <ESP8266WiFi.h>

bool isr { false };
bool fpm { false };

void isr_wakeup() IRAM_ATTR;
void isr_wakeup() {
    isr = true;
}

void fpm_wakeup() IRAM_ATTR;
void fpm_wakeup() {
    fpm = true;
}

void setup() {
#if 1
    enableWiFiAtBootTime();
#endif
    Serial.begin(115200);

    WiFi.persistent(false);
    WiFi.mode(WIFI_OFF);

    pinMode(5, INPUT_PULLUP);
    attachInterrupt(5, isr_wakeup, ONLOW_WE);

    static Ticker sleep;
    sleep.once_ms(100, []() {
        Serial.println("opening fpm");
        wifi_fpm_set_sleep_type(LIGHT_SLEEP_T);
        wifi_fpm_open();
        wifi_fpm_set_wakeup_cb(fpm_wakeup);
        wifi_fpm_do_sleep(0xFFFFFFF);
    });
}

void loop() {
    delay(100);
    if (isr) {
        isr = false;
        Serial.println("triggered isr wakeup cb");
    }

    if (fpm) {
        fpm = false;
        Serial.println("triggered fpm wakeup cb");
    }
    Serial.print('.');
}

On setup() system is put to a halt, touching GND with GPIO5 wakes things up and starts flooding the console with dots

@dok-net
Copy link
Contributor

dok-net commented Apr 13, 2021

@mcspr Um. No. That's an obfuscation. Using a timer when timers need to be stopped. The way that the sleep timer is supposed to trigger while the delay(100) is suspended works by pure chance, delay(99) and the loop runs. Plus, don't replicate the code from my sample, let's prove it works and then implement it in core and expose an API. Let's negotiate that API.

@mcspr
Copy link
Collaborator

mcspr commented Apr 13, 2021

I have not read the example, just the comment about the ONLOW_WE mode.
Can you elaborate on why timer is an obfuscation? The code above just meant to run the fpm functions in sys, loop is irrelevant as it does not do anything useful

@dok-net
Copy link
Contributor

dok-net commented Apr 13, 2021

You don't think you code example is open to race conditions? What are you trying to prove with it, irrelevant loop, no schedule_function?

@dok-net
Copy link
Contributor

dok-net commented Apr 13, 2021

This may be a blueprint for inclusion as a light-sleep convenience API. It also attempts to show how ISRs can be switched between wakeup from sleep mode and regular working mode. There is a small window where bouncing signals on the GPIO may prevent the wakeup mode to enter correctly, in that case, the MCU sleeps depending on the timeout parameter.

#include <user_interface.h>
#include <coredecls.h>
#include <interrupts.h>

extern os_timer_t* timer_list;
namespace {
    sleep_type_t saved_sleep_type;
    os_timer_t* saved_timer_list;
    fpm_wakeup_cb saved_wakeupCb;
}

bool forcedLightSleepBegin(uint32_t duration_us = 0, fpm_wakeup_cb wakeupCb = nullptr) {
    // Setting duration to 0xFFFFFFF, it disconnects the RTC timer
    if (!duration_us || duration_us > 0xFFFFFFF) {
        duration_us = 0xFFFFFFF;
    }
    wifi_fpm_close();
    saved_sleep_type = wifi_fpm_get_sleep_type();
    wifi_fpm_set_sleep_type(LIGHT_SLEEP_T);
    wifi_fpm_open();
    saved_wakeupCb = wakeupCb;
    wifi_fpm_set_wakeup_cb([]() {
        if (saved_wakeupCb) saved_wakeupCb();
        esp_schedule();
        });
    {
        esp8266::InterruptLock lock;
        saved_timer_list = timer_list;
        timer_list = nullptr;
    }
    return wifi_fpm_do_sleep(duration_us) == 0;
}

/// The prior sleep type is restored, but only as automatic.
/// If any forced sleep mode was effective before forcedLightSleepBegin, it must be explicitly re-entered.
void forcedLightSleepEnd(bool cancel = false) {
    if (!cancel) {
#ifdef HAVE_ESP_SUSPEND
        esp_suspend();
#else
        esp_yield();  // it goes to sleep from SYS context and waits for an interrupt
#endif
    }
    {
        esp8266::InterruptLock lock;
        timer_list = saved_timer_list;
    }
    wifi_fpm_close();
    wifi_fpm_set_sleep_type(saved_sleep_type);
}

#include <Schedule.h>
#include <PolledTimeout.h>

#define WAKE_UP_PIN D3  // D3/GPIO0, can also force a serial flash upload with RESET
// you can use any GPIO for WAKE_UP_PIN except for D0/GPIO16 as it doesn't support interrupts

void IRAM_ATTR wakeupPinIsr() {
    schedule_function([]() { Serial.println("GPIO went from HI to LO"); });
}

void IRAM_ATTR wakeupPinIsrWE() {
    schedule_function([]() { Serial.println("GPIO wakeup IRQ"); });
    wakeupPinIsr();
    attachInterrupt(WAKE_UP_PIN, wakeupPinIsr, FALLING);
}

//void wakeupCallback() {
//}

void setup() {
    Serial.begin(74880);
    while (!Serial);
    delay(100);
    pinMode(LED_BUILTIN, OUTPUT);  // activity and status indicator
    digitalWrite(LED_BUILTIN, LOW);  // turn on the LED
    pinMode(WAKE_UP_PIN, INPUT_PULLUP);  // polled to advance tests, interrupt for Forced Light Sleep
    attachInterrupt(WAKE_UP_PIN, wakeupPinIsr, FALLING);
}

using oneShotYieldMs = esp8266::polledTimeout::timeoutTemplate<false, esp8266::polledTimeout::YieldPolicy::YieldOrSkip>;
oneShotYieldMs gotoSleep(2000);

void loop() {
    if (gotoSleep && forcedLightSleepBegin(10000000/*, wakeupCallback*/)) {
        // No new timers, no delay(), between forcedLightSleepBegin() and forcedLightSleepEnd().
        // Only ONLOW_WE or ONHIGH_WE interrupts work, no edge, that's an SDK or CPU limitation.
        // If the GPIO is in wakeup state while attaching the interrupt, it cannot trigger a wakeup,
        // but any sleep duration will be honored.
        bool wakeupPinIsHigh = digitalRead(WAKE_UP_PIN);
        // the GPIO might still bounce to LOW between both digital reads, disabling wakeup
        if (wakeupPinIsHigh) attachInterrupt(WAKE_UP_PIN, wakeupPinIsrWE, ONLOW_WE);
        wakeupPinIsHigh &= digitalRead(WAKE_UP_PIN);
        digitalWrite(LED_BUILTIN, HIGH);  // turn the LED off so they know the CPU isn't running
        forcedLightSleepEnd(!wakeupPinIsHigh);
        digitalWrite(LED_BUILTIN, LOW);  // turn on the LED
        if (wakeupPinIsHigh) gotoSleep.reset();
    }
}

@Tech-TX
Copy link
Contributor

Tech-TX commented Apr 13, 2021

@dok-net Works fine for me Dirk, 0.495mA when it's in Sleep, wakes on timer expire or with external interrupt.

@dok-net
Copy link
Contributor

dok-net commented Apr 14, 2021

@d-a-v d-a-v removed their assignment Apr 15, 2021
@matthijskooijman
Copy link
Author

@dok-net, thanks for your suggestions. Took me a while to find time to look at all comments, working my way through them now. It's been a while since I worked with ESP8266 code for this project, so when I say things that do not make sense, I might be working from faulty memory, so apologies in advance :-p

In the first version of your sketch (found in the edit history of #7661 (comment)), you used a delay() inside loop(). I suspect this may solve some of the problems regarding yield/schedule by internally balancing those calls, but it also means that there is a forced (an unneeded) wakeup everytime the delay ends. Also, if you want to do other things in the loop (i.e. between wakeups), that delay will add latency between the wakeup and when the loop runs again.

Your last example seems to indeed solve this by using (through forcedLightSleepEnd) esp_yield() instead, which is also what I'm doing in my code that prompted this question.

Looking at your last example, I believe that instead of letting schedule_function() call esp_schedule() (which I originally suggested, but as discussed before messes up the balance), you let the wakeup callback function call esp_schedule(). Since the sleep happens only after esp_yield(), I guess this ensures to keep the balance of yield/schedule, so this sounds like a reasonable approach. Also, you're now using a forced light sleep, rather than automatic light sleep as I was using, but if you just do repeated light sleeps (possibly with unlimited timeout) in the loop, the net effect is probably the same. In other words, having a proper force light sleep API is probably sufficient to solve my usecase.

However, I do have a few questions about your code:

  • AFAICS, the esp_schedule() call only happens in the callback passed to wifi_fpm_set_wakeup_cb(). I think I considered this as well, but the documentation for wifi_fpm_do_sleep says "fpm_wakeup_cb_func will be called after system wakes up only if the force sleep time out (wifi_fpm_do_sleep and the parameter is not 0xFFFFFFF).", which I've interpreted as "it will not be called when waking up due to GPIO". Or is that an incorrect interpretation?
  • Why the split in forcedLightSleepBegin() and forcedLightSleepEnd()? At first glance it looks like a way to prevent a race condition involving the GPIO ISR triggering just before the sleep is started, but I'm not entirely sure what happens in forcedLightSleepBegin() that must really happen before the ISR setup and GPIO reads? At best, it seems that this approach makes the window between reading the GPIO and the esp_yield() that triggers the sleep smaller, but does not close it entirely (AFAICS you can still have that the digitalReads in loop() return HIGH, so the sleep is not canceled, but the ISR triggers between the last digitalRead and the actual sleep, which then switches the ISR from ONLOW_WE to FALLING and goes to sleep without GPIO wakup enabled -> race condition).
    On AVR, you typically fix a similar race condition (where the ISR can decide there is more work to do and the sleep must be cancelled) by disabling interrupts between deciding to go to sleep and the actual sleep, which works because re-enabling interrupts just before sleeping is guaranteed to not trigger any ISRs before the sleep. No clue if any such thing is available on ESP8266, though, with the SDK layer on top.
    In my own ESP8266 code, I think I solved this issue by letting the ISR itself just switch between ONLOW_WE and ONHIGH_WE (with some extra code to only do work on falling edges), so there is a GPIO wakeup available all the time. I think this approach might also work here to close this race condition (if it actually exists). This does not prevent sleeping when the pin is high, though.
  • Why do you clear the timer_list when doing a sleep? If there are timers active, shouldn't these still be serviced (and delay the sleep or limit its duration instead)?

@dok-net
Copy link
Contributor

dok-net commented Apr 24, 2021

@matthijskooijman It's good seeing that you mostly agree with the way I've implemented things. With regard to where and when the wakeup callback does or does not get invoked, I would like to ask you to run the supplied example sketches and see for yourself - testing and reporting back from another person beats me checking my own code any time.

Clearing and restoring the timer_list was previously discussed elsewhere, I've also performed testing on it. I would like to point on that enforcement of the forced light sleep is the stated purpose of the API, so no, it's not permitted for any other timers but the RTC timer, to end the forced light sleep. In fact, it would only happen as a race condition anyway, given how these timers cannot trigger when light sleep is truly engaged, stopping the CPU and in consequence the responsible timer clock for them.

Now, concerning your observations about forcedLightSleepBegin(), forcedLightSleepEnd() and the example ForcedLightSleep.ino. The example is synthetic, no doubt. I take it that your concerns are about the example code, less so about the implementation of the new API functions. My adhoc code for toggling between level and edge triggering may well be imperfect, just outlining a general possible approach to the issue that's not completely thought out. Disabling interrupts as you suggest would IMHO not be a perfect solution, as I expect entering the idle task with disabled interrupts would defeat at minimum any possible wakeup by interrupt - so you would be able to make the window of opportunity only smaller for an interrupt that occurs on the wakeup GPIO before force sleep is engaged, but not close it. I think all the Espressif examples suffer this. If the callback actually occurs for wakeup from GPIO, that is not an appropriate place to switch to the edge triggering instead, if I recall correctly, the level-trigger IRQ almost completely hogs the MCU if left attached while the level doesn't change.
Please, by all means, run the code, and report any findings based on it and possibly suggest better alternatives.

@dok-net
Copy link
Contributor

dok-net commented Apr 24, 2021

I now see that my comment // the GPIO might still bounce to LOW between both digital reads, disabling wakeup reveals that I knew this example's solution was imperfect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants