Skip to content

[Schedule Commands] Only schedule commands not running Arduino stack#1838

Closed
TD-er wants to merge 2 commits intoletscontrolit:megafrom
TD-er:bugfix/schedule_commands_based_on_stack
Closed

[Schedule Commands] Only schedule commands not running Arduino stack#1838
TD-er wants to merge 2 commits intoletscontrolit:megafrom
TD-er:bugfix/schedule_commands_based_on_stack

Conversation

@TD-er
Copy link
Copy Markdown
Member

@TD-er TD-er commented Oct 1, 2018

Since command scheduling may be calling yield, move commands to the queue only when called from a callback function.

Since command scheduling may be calling `yield`, move commands to the queue only when called from a callback function.
@s0170071
Copy link
Copy Markdown
Contributor

s0170071 commented Oct 2, 2018

I would like to understand what you did here but I just don't get it. Can you please add a sentence or two to help me out here ? I try to keep up to speed on stability issues ;-)

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Oct 2, 2018

The old implementation executed the command immediately (without placing it in the queue) when called from the web interface. That would make sense, because then you get the output also shown in the browser.

2 weeks ago, I changed all (back) to run immediately, but that led to reported crashes.
So this is my new attempt (and I am happy you're also looking into this before it will be merged) to make the commands execute immediately.
There is however a number of commands that would call either delay or yield and maybe called from a callback function.

Since I'm looking into how the stack(s) work in ESPxx, I found this function to determine if it is called from system stack or Arduino stack. (meaning can you call 'yield' and other functions only available in Arduino stack)

Now to what's this PR about...
The queue handler is called from Arduino stack, so if for some reason a command is issued from a callback function (not sure yet when that will happen), it will be re-scheduled to be handled from the Arduino stack.

So I will make a test build from this PR, so anyone can test and hopefully not crash the nodes.
It is meant to make the commands run immediately again, since rules depend on some order of actions.

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Oct 2, 2018

Related issues:
#1828
#1712
#1759
#1625

And maybe related:
#1709
#1662
#1683
#1647

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Oct 2, 2018

I added a fix for SendToHTTP timeouts to this PR. (see #1830 )
I will later look into other uses of WiFiClient and think about how to handle the timeouts.
Maybe we should also add a client/stream timeout to each controller setting, since timeout definitions are also depending on the set host. (if it is a slow server for example)

The default timeout of Stream is 1000 msec, which may be a little too long.
I guess 100 msec is more than enough for most and in a local LAN, it could be even less.

@sakinit
Copy link
Copy Markdown
Contributor

sakinit commented Oct 2, 2018

I also don't get it. I think it is best to discern the ESP8266 from the ESP32 to get things clear.
On the ESP8266 isn't all code called from setup() and loop() using the Arduino stack? I don't see that you have to use canYield() within that code.

The ESP32 is a different story. But in that context talking about an Arduino and System stack is unclear to me. In that context, each created task (e.g. created by xTaskCreatePinnedToCore()) has its own stack. In this context you get task-to-task communication and one of the methods to isolate code to one environment is to push commands on a queue and let only one tasks handle the commands. Locking should be used when queue handling is involved.

@sakinit
Copy link
Copy Markdown
Contributor

sakinit commented Oct 2, 2018

In the ESP32 environment you can also compare the active task with the task that should run the code and decide to run or 'schedule' the code but I rather prefer the previous method mentioned.

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Oct 2, 2018

You can call code from other stack than from the Arduino stack.
That's the whole reason I tried to move it to a scheduler. (see #1574 and #1625 )

When you run code from a callback function, it is not within the scope of setup or loop.
You then end up with crashes.
Example of such an error:
Panic C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\ hardware\esp8266\2.4.1\cores\esp8266\core_esp8266_main.cpp:99 __yield

This is not ESP32 related.
And like I said, I did remove the command handling from the scheduler before which resulted in a number of 'crash reports'. So I really would like to know what is causing these crashes and to rule out the yield, I am now trying to use that as a filter to run commands from the scheduler.
When commands are originating from a callback, the result is not likely to be shown somewhere, so some async execution is not a problem. (that's what I also thought about commands from rules, but that appeared to be a problem :( )
Either way, if this filter will cause command execution to be moved to the scheduler, it would have gone wrong when executed, since a lot of commands directly or indirectly call delay or yield.

Also don't confuse the idea of the scheduler I am now talking about with the RTOS scheduler.
The scheduler used in ESPeasy is something I wrote to make sure everything is executed in the right order and preferably at the right moment/interval.

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Oct 3, 2018

Did anybody test this PR?
I would rather not have to revert it -again- if it is still failing
Normal 4096 build for testing.
I will later today add other calls to setTimeout() for other instances of WiFiclient. (not yet included)

@clumsy-stefan
Copy link
Copy Markdown
Contributor

@TD-er: I'm just compiling and flashing a few units with this code... will report if I find issues with it!

@clumsy-stefan
Copy link
Copy Markdown
Contributor

unfortunately no luck... sending a comand (event) via controller or on the web-console immediately triggers a software watchdog (see below).... can't currently debug more as I'm remote...
please note: it's self compiled...

however using latest mega-commit from this morning (b1c777e) works without issues (also self compiled)...

Free Mem | 15192 (13464 - sendContentBlocking)
Free Stack | 3552 (1508 - parseTemplate)
Boot | Manual reboot (3)
Reset Reason | Software Watchdog

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Oct 3, 2018

@clumsy-stefan do you have an example of such a command?

@clumsy-stefan
Copy link
Copy Markdown
Contributor

Event,ch1=1
With a rule that checks for this event and sets a gpio accordingly.

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Oct 3, 2018

Can you give the corresponding rules lines?
Since simply sending an event is not really triggering a crash here.

@clumsy-stefan
Copy link
Copy Markdown
Contributor

clumsy-stefan commented Oct 3, 2018

sure, sorry:

on ch1#switch=0 do 
  gpio,12,1 
endon 
on ch1#switch=1 do 
  gpio,12,0 
endon 
on ch1 do 
  inputswitchstate,4,%eventvalue% 
  TaskRun,5 
endon

where Task Nr. 5 is labelled "ch1" with a value of "switch" configured as "push button active low" on gpio-1

PS: sorry, I can't get the lines to format correctly in here..

@artua
Copy link
Copy Markdown

artua commented Oct 8, 2018

Hello,
I have an issue of the Dummy device value not set immediately ONLY if I have OpenHAB MQTT enabled. Is this the same issue or I should post another one?

Rules:

on SW2#On=1 do
  TaskValueSet 5,2,1
 if [touch#run2]=1
// run timer only if dummy device set
  timerSet,2,1
 endif
endon

Log file (MQTT enabled):

482314541: EVENT: SW2#On=1.00
482314550: ACT  : TaskValueSet 5,2,1
482314576: Lev.1: [if 0.00=1]=false
482314583: EVENT: SW2#On=1.00 Processing time:42 milliSeconds
482314592: MQTT : /L_Kitchen/SW2/On 1
482314634: Command: taskvalueset

Log file (MQTT disabled):

482453482: EVENT: SW2#On=1.00
482453491: ACT : TaskValueSet 5,2,1
482453517: Lev.1: [if 1.00=1]=true
482453520: ACT : timerSet,2,1
482453532: EVENT: SW2#On=1.00 Processing time:50 milliSeconds
482453543: Command: taskvalueset
482453545: Command: timerset

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Oct 8, 2018

@artua at first glance it looks to be related to the commands still running from the queue.
I really must find out why reverting that change is causing crashes.

@uzi18
Copy link
Copy Markdown
Contributor

uzi18 commented Nov 7, 2018

@TD-er it is good idea to only schedule network related commands, but before schedule parse strings to change all data to actual at moment of schedule,what do you think?

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Nov 7, 2018

@uzi18 I am not really sure.
The main problem I didn't think of when moving the commands to the scheduler, is that calling a command from within a rule will no longer guarantee it has been performed.
So the flow of a rule has changed.

If we move it back to how it was (without running via the scheduler), it will cause resets. (no idea why, stack overflow???)
Also, if we only run network related commands via the scheduler, I don't know if that will also result in counter-intuitive response, where commands may not yet be executed.

@artua
Copy link
Copy Markdown

artua commented Nov 8, 2018 via email

@Grovkillen
Copy link
Copy Markdown
Member

@artua are you sure they are slow? I would think they are "too fast" since the will trigger simultaneously.

@giig1967g
Copy link
Copy Markdown
Contributor

@artua are you sure they are slow? I would think they are "too fast" since the will trigger simultaneously.

Hi @Grovkillen
what do you men that they are too fast?
Please elaborate so I can fine tune the parameters according to the user experience.

@Grovkillen
Copy link
Copy Markdown
Member

If the commands are part of the scheduler they will trigger almost simultaneously which is not how the rules used to work. They were sequential then.

I actually love the new way of handling rules but many users do not grasp that aspect of rules. So we have discussed having both sequential and simultaneous execution of commands. So it's not actually something for this PR but a complete over haul of how the rules engine should work.

@giig1967g
Copy link
Copy Markdown
Contributor

If the commands are part of the scheduler they will trigger almost simultaneously which is not how the rules used to work. They were sequential then.

I actually love the new way of handling rules but many users do not grasp that aspect of rules. So we have discussed having both sequential and simultaneous execution of commands. So it's not actually something for this PR but a complete over haul of how the rules engine should work.

Sorry, I thought you were talking about the speed of double click and longpress.

@artua: can you elaborate regarding your experience with doubleclick and longpress? It's useful for me to tweak the parameters.

@artua
Copy link
Copy Markdown

artua commented Nov 8, 2018 via email

@giig1967g
Copy link
Copy Markdown
Contributor

See my first message. That’s part of the code to make a short/long press logics. It didn’t work as expected. The dummy device var didn’t set before processing next line so the timer didn’t run.

I understand.
You could try for now the embedded doubleclick/longpress logic for switches.
Enable them in the Switch options and change the rule as:

on SW2#On=10 do
  //longpress actions
endon
on SW2#On=3 do
  //doubleclick actions
endon

@uzi18
Copy link
Copy Markdown
Contributor

uzi18 commented Nov 9, 2018

@TD-er did You checked if you got crash also on ESP32 where more memory is available?
It is better to fix rules to work realible.
If You cant send data by controller from whatever reason it is not possible to react with rules, default queue policy in controller is drop something - new or old one.
Question is how to know if command must be queued or executed?
Some are interpreted and executed by plugins and some with ESPEasy core.

@TD-er
Copy link
Copy Markdown
Member Author

TD-er commented Nov 9, 2018

@uzi18 An event has an origin, so we can use that to decide how to execute it.
I already check for the webserver as source (run command from web page) to skip the scheduler.
But when I redirect all to execute immediately, it will crash.

I did notice yesterday, it is hardly doing any checks on UDP data and from there you may also execute commands.
So maybe we should test again by setting the (new) origin "rules" and also execute directly, just to test how that will work out.

@artua
Copy link
Copy Markdown

artua commented Nov 18, 2018

@giig1967g the embedded double/long click works, but I receive two events anyway:

280968: SW : GPIO=0 State=0 Output value=1
280971: EVENT: SW1#Switch=1.00
282068: SW : LongPress: GPIO= 0 State=0 Output value=11
282070: EVENT: SW1#Switch=11.00
283181: SW : GPIO=0 State=1 Output value=0
283184: EVENT: SW1#Switch=0.00
284268: SW : LongPress: GPIO= 0 State=1 Output value=10
284270: EVENT: SW1#Switch=10.00

But I need to drive two different scenarios separately (short press and long press), so I need to go old way and set the values of Dummy device and then check them on Switch off. But the values checking still broken (I think) because of command queue.. Any ideas how to deal with that new short/long press without setting/checking values?

BTW, TaskValueSet seems to be broken, no effect on

On SW1#On=11
TaskValueSet 5,1,1
endon

@TD-er

@giig1967g
Copy link
Copy Markdown
Contributor

@artua:

BTW, TaskValueSet seems to be broken, no effect on

On SW1#On=11
TaskValueSet 5,1,1
endon

I think it should be:

On SW1#Switch=11
TaskValueSet 5,1,1
endon

Regarding the longpress, please tell me how you would like it.
As fas as I understand, you would prefer to receive the event on button release instead of button press. Correct?

@uzi18
Copy link
Copy Markdown
Contributor

uzi18 commented Nov 18, 2018

On SW1#On=0 do
TaskValueSet 5,2,2
endon

@artua
Copy link
Copy Markdown

artua commented Nov 18, 2018

Ok, now it's working again.

On SW1#On=11 do
 TaskValueSet 5,1,1
 event,sw_long
endon

On SW1#On=0 do
 if [touch#run1]=0
  event,sw_short
 else
  TaskValueSet 5,1,0
 endif
endon

Thanks everyone!

But I don't think the same can be done with double click.. Any ideas?

@TD-er TD-er closed this Nov 19, 2018
@TD-er TD-er deleted the bugfix/schedule_commands_based_on_stack branch November 19, 2018 10:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants