Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[P016] DecodeType UNKNOWN was not on the web interface (decodetypes UNKNOWN and UNUSED were mixed up!) #4907

Merged
merged 31 commits into from
Feb 11, 2024

Conversation

uwekaditz
Copy link
Contributor

NEW: For using any IR message in the [P016] Command map the protocol RAW needs to be set
FIX: uint64ToString() in a debug message in PLUGIN_TEN_PER_SECOND can not handle decode_type_t::UNKNOWN (-1)
@tonhuisman I'm not sure if DECODE_HASH should get its own protocol?

src/_P016_IR.ino Outdated
}
else {
Log += F("\' type: 0x");
Log += uint64ToString(results.decode_type);
Copy link
Member

@TD-er TD-er Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have a base argument here, so it isn't a proper hex value and thus the leading 0x in the string is a bit misleading to say the least.

In ESPEasy src/src/Helpers/StringConverter_Numerical.h there are ull2String and ll2String to print uint64_t and int64_t respectively.

Also IMHO there should be no negative hex values, so better not mark it as a hex formatted value (leading 0x)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@tonhuisman
Copy link
Contributor

tonhuisman commented Dec 29, 2023

@tonhuisman I'm not sure if DECODE_HASH should get its own protocol?

Hm, though I've done quite some work on this plugin and the IR library, I'm absolutely no expert on the matter, so that question should probably be redirected to the maintainers of the IRremoteESP8266 library.

@uwekaditz
Copy link
Contributor Author

@TD-er and @tonhuisman:
After the last two merges any command that is an oledframedcmd results with
line 226 ExecuteCommand_all(EventValueSource::Enum::VALUE_SOURCE_SYSTEM, CommandLines[i].Command);
in P016_data_struct::ExecuteCode() in an exception and a restart of the whole system.
I didn't change anything and before merging everything run fine.
Note:
The function P016_data_struct::ExecuteCode() is called within the PLUGIN_TEN_PER_SECOND but it was no problem before.

I have already tried

  • String(CommandLines[i].Command).c_str() in the function call -> still exception
  • checked that all commands used in my [P016] setting are correctly terminated by a zero (0x00)
  • running another command (e.g. TaskValueSet,Temp,TempA,1.0) -> runs fine
  • using a different EventValueSource (e.g. EventValueSource::Enum::VALUE_SOURCE_SERIAL)-> still exception
  • tested each oledframedcmd used in my [P016] setting in Tools->Command and Submit -> runs fine

What is wrong?

@tonhuisman
Copy link
Contributor

line 226 ExecuteCommand_all(EventValueSource::Enum::VALUE_SOURCE_SYSTEM, CommandLines[i].Command);
in P016_data_struct::ExecuteCode() in an exception and a restart of the whole system.

Can you get a stacktrace of that exception? (When using VSCode/PIO, using the Monitor feature, it will try to catch that when the unit crashes, though it may take a few attempts for the monitor to actually grab the stacktrace...)

@tonhuisman
Copy link
Contributor

There seem to be 2 potential issues:

  • The parsing of the command doesn't decently handle an empty command (spaces only, f.e.) (fixed by TD-er PR, you can cherry-pick that)
  • The plugin code doesn't a) check the vector size when checking for commands, and b) if the command is really not empty

Size check could be fixed by this change: (line 222 of P016_data_struct.cpp)

      if ((i < CommandLines.size()) && (CommandLines[i].Command[0] != 0)) {

The empty command check could be done at saving the commands, didn't try to fix that yet

@uwekaditz
Copy link
Contributor Author

I did the changes @tonhuisman proposed and merged the latest mega (Fix crash in command arg parsing with nullptr string) into this PR.
The system stll crashes if any oledframedcmd is execute by receiving a defined IR code.

For me it looks like a stack overflow because the crash dump not even fits in the VS terminal window.
I captured one crash, but I'm not sure if the decoding started at the beginning of the dump because I can't see the beginning.
Here is also the result after building

RAM:   [======    ]  56.8% (used 46536 bytes from 81920 bytes)
Flash: [========= ]  90.7% (used 947056 bytes from 1044464 bytes)

Crash dump.txt

@tonhuisman
Copy link
Contributor

For me it looks like a stack overflow because the crash dump not even fits in the VS terminal window.

That's a very plausible cause, as both P016 and P036 are using quite a lot of memory and stack.
One way of reducing the size of P016 is to copy CustomIR-sample.h to CustomIR.h (this file is in .gitignore, so you can't commit it by accident), disable any unneeded IR protocols and create a Custom IR build. That could improve things.

@uwekaditz
Copy link
Contributor Author

For me it looks like a stack overflow because the crash dump not even fits in the VS terminal window.

That's a very plausible cause, as both P016 and P036 are using quite a lot of memory and stack. One way of reducing the size of P016 is to copy CustomIR-sample.h to CustomIR.h (this file is in .gitignore, so you can't commit it by accident), disable any unneeded IR protocols and create a Custom IR build. That could improve things.

I always have my own CustomIR.h, only DECODE_HASH and DECODE_NEC are set.
CustomIR.h.txt

@tonhuisman
Copy link
Contributor

Would it be possible for you to test this on an ESP32, to determine if it's a code-issue or 'simply' stack related?

@tonhuisman
Copy link
Contributor

Is your oledframedcmd command directly executed from the IR plugin, or have you moved it to rules, by using event,thisIRevent as an intermediate?

@TD-er
Copy link
Member

TD-er commented Jan 18, 2024

For me it looks like a stack overflow because the crash dump not even fits in the VS terminal window.

That's a very plausible cause, as both P016 and P036 are using quite a lot of memory and stack. One way of reducing the size of P016 is to copy CustomIR-sample.h to CustomIR.h (this file is in .gitignore, so you can't commit it by accident), disable any unneeded IR protocols and create a Custom IR build. That could improve things.

I always have my own CustomIR.h, only DECODE_HASH and DECODE_NEC are set. CustomIR.h.txt

This file doesn't have P036 enabled?

Which command do you give, how it is it being processed? (e.g received via MQTT, from rules, from HTTP-GET, via serial?)

@uwekaditz
Copy link
Contributor Author

For me it looks like a stack overflow because the crash dump not even fits in the VS terminal window.

That's a very plausible cause, as both P016 and P036 are using quite a lot of memory and stack. One way of reducing the size of P016 is to copy CustomIR-sample.h to CustomIR.h (this file is in .gitignore, so you can't commit it by accident), disable any unneeded IR protocols and create a Custom IR build. That could improve things.

I always have my own CustomIR.h, only DECODE_HASH and DECODE_NEC are set. CustomIR.h.txt

This file doesn't have P036 enabled?

Which command do you give, how it is it being processed? (e.g received via MQTT, from rules, from HTTP-GET, via serial?)

The P036 is enabled in my custom.h.

As I mentioned above the oledframedcmd is processed by
ExecuteCommand_all(EventValueSource::Enum::VALUE_SOURCE_SYSTEM, CommandLines[i].Command);
in P016_data_struct::ExecuteCode() (line 226).
The function P016_data_struct::ExecuteCode() is called within the PLUGIN_TEN_PER_SECOND but it was no problem before the merges.

@uwekaditz
Copy link
Contributor Author

Is your oledframedcmd command directly executed from the IR plugin, or have you moved it to rules, by using event,thisIRevent as an intermediate?

As I mentioned above the oledframedcmd is processed by
ExecuteCommand_all(EventValueSource::Enum::VALUE_SOURCE_SYSTEM, CommandLines[i].Command);
in P016_data_struct::ExecuteCode() (line 226).
The function P016_data_struct::ExecuteCode() is called within the PLUGIN_TEN_PER_SECOND but it was no problem before the merges.

@TD-er
Copy link
Member

TD-er commented Jan 19, 2024

In that function, there is a stack allocated object, which is relatively large:

case PLUGIN_TEN_PER_SECOND:
    {
      decode_results results;

This is 80 bytes.

But anyway, what is the origin of the oledframedcmd command?
Where does it come from?
Do you trigger events in the rules when handling the event?

@uwekaditz
Copy link
Contributor Author

I also have also a modiefied section in platformio_esp82xx_envs.ini to enable custom.h:

[env:custom_IR_ESP8266_4M1M]
extends                   = esp8266_4M1M
platform                  = ${ir.platform}
platform_packages         = ${ir.platform_packages}
build_flags               = ${ir.build_flags} 
                            ${esp8266_4M1M.build_flags} 
                            -DPLUGIN_BUILD_CUSTOM
; commented                           -DPLUGIN_BUILD_IR
; following two line added
                            -D_IR_ENABLE_DEFAULT_=false
                            -DUSE_IR
lib_ignore                = ESP32_ping
                            ESP32WebServer
                            ServoESP32
                            ESP32HTTPUpdateServer
                            adafruit/Adafruit GFX Library@^1.11.1
                            LOLIN_EPD
                            Adafruit ILI9341 ESPEasy
                            adafruit/Adafruit BusIO
                            Adafruit NeoPixel
                            NeoPixelBus_wrapper
                            NeoPixelBus by Makuna
                            Adafruit NeoMatrix via NeoPixelBus
                            Adafruit Motor Shield V2 Library
                            Adafruit_ST77xx
                            Adafruit NeoMatrix
                            I2C AXP192 Power management
                            EspSoftwareSerial
; following line added
                            HeatpumpIR
extra_scripts             = pre:tools/pio/pre_custom_esp82xx_IR.py
                            ${extra_scripts_esp8266.extra_scripts}
                            pre:tools/pio/ir_build_check.py

@uwekaditz
Copy link
Contributor Author

In that function, there is a stack allocated object, which is relatively large:

case PLUGIN_TEN_PER_SECOND:
    {
      decode_results results;

This is 80 bytes.

But anyway, what is the origin of the oledframedcmd command? Where does it come from? Do you trigger events in the rules when handling the event?

The oledframe commands are defined inside the P016 settings (see jpg)
settings

@TD-er
Copy link
Member

TD-er commented Jan 19, 2024

OK, and it is when using this exact pull request?

@uwekaditz
Copy link
Contributor Author

Yes, it is except my custom.h

@TD-er
Copy link
Member

TD-er commented Jan 19, 2024

I will have a look at your code to see if things can be improved or maybe something a-typical is happening when receiving IR commands.

Copy link
Contributor Author

@uwekaditz uwekaditz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo error in the first commit
@TD-er I'm not sure if I should merge your branch, too.
Therefore I reverted the merge.

@TD-er
Copy link
Member

TD-er commented Jan 22, 2024

Typo error in the first commit @TD-er I'm not sure if I should merge your branch, too. Therefore I reverted the merge.

Ehh now you really lost me...
I spent all evening/night yesterday in reducing the stack scope of P016 and today about 6 hours fixing out-of-bound issues on the OLED display library and now those must be reverted?

Why?????

@TD-er
Copy link
Member

TD-er commented Jan 22, 2024

I found the first commit in your branch feature/ESP32_IDF_5_1 that causes the crashes. It is from 12.11.2023 18:22 marked as

[ESP8266 Setup] Fix bootloop during initial setup Custom ESP8266 build.

The changes of this commit are quit simple:

  • Adding a global boolean in _Plugin_init.cpp
    bool _Plugin_init_setupDone = false;
  • and adding a local boolean in EspEasy_Console_t::reInit()
    bool somethingChanged = false;

So you mean that with this commit it all became unstable?

d4c7252

src/_P016_IR.ino Outdated
log += description;
addLogMove(LOG_LEVEL_INFO, log);
}
addLogMove(LOG_LEVEL_INFO, strformat(F("AC State: %s"), description));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails compilation as you have to pass the .c_str() pointer for strformat to handle in the %s parameter.

@uwekaditz
Copy link
Contributor Author

uwekaditz commented Jan 23, 2024 via email

@uwekaditz
Copy link
Contributor Author

uwekaditz commented Jan 23, 2024 via email

@uwekaditz
Copy link
Contributor Author

uwekaditz commented Jan 23, 2024 via email

@TD-er
Copy link
Member

TD-er commented Jan 23, 2024

I already split the P036 code to another PR, so I guess we can keep this PR just about the P016.

I'm looking into the code a bit more and I'm still not 100% convinced it is a stack overflow.
But if it is, then we should make a separate "command queue" to be handled separately.
Just like the event queue.

When receiving commands from some external source, which does not need an immediate reply, then those could be executed in the next loop without any issue. So I guess we should add some queue for at least the P016 and maybe some others too.

If you could test this PR (with only the P036 code) to make sure I didn't break anything, then we can merge it and then merge it into this PR so we essentially move this PR to be on top of the other code changes.

@uwekaditz
Copy link
Contributor Author

uwekaditz commented Jan 23, 2024 via email

@TD-er
Copy link
Member

TD-er commented Jan 23, 2024

@uwekaditz
I added a separate PR to allow to queue the commands to run from a lower position on the stack.
See: #4947

uwekaditz and others added 2 commits January 23, 2024 19:52
Use the new property addToQueue in ExecuteCommand_all() due to the lack of resources
Using strformat() for the debug messages
Heap and memory can be reported (P016_CHECK_HEAP)
@uwekaditz
Copy link
Contributor Author

@TD-er Thank you very much 👋
Using the new property addToQueue in ExecuteCommand_all() works very well.

  • I tested all my [P016] setting using the oledframedcmd.
  • The system is also stable if the web interface is open.
  • Even saving the P016 settings on the devices page did not crash anymore.

src/_P016_IR.ino Outdated
Comment on lines 682 to 690
# ifdef P016_CHECK_HEAP
if (loglevelActiveFor(LOG_LEVEL_INFO)) {
addLogMove(LOG_LEVEL_INFO, strformat(
F("Before decode: FreeMem: %d FreeStack:%d / After: FreeMem: %d FreeStack:%d"),
fMem, fFreeStack,
FreeMem(), getCurrentFreeStack()));
}
# endif // ifdef P016_CHECK_HEAP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not indenting the #ifdef's, and not formatting the source using Uncrustify, horribly breaks the indentation guides (lines) in modern editors like VSCode... 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn' uncrustify my changes. 😭
On the other hand it is much easier for me to read the code if the #ifdef's start at the beginning of the line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to read the code if the #ifdef's start at the beginning of the line.

Nope, not for me, that totally breaks the normal flow of the code...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the Uncrustify function in VScode, but position of the #ifdef's were not changed!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not configured in the settings for Uncrustify 🥲 and I haven't found the courage to learn yet another complicated config...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means it has to be done manually?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I usually do, as I like my sourcecode prettified 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope you enjoy my code now too 😃

The position of the #ifdef's were not changed!
@TD-er
Copy link
Member

TD-er commented Jan 23, 2024

@TD-er Thank you very much 👋 Using the new property addToQueue in ExecuteCommand_all() works very well.

  • I tested all my [P016] setting using the oledframedcmd.
  • The system is also stable if the web interface is open.
  • Even saving the P016 settings on the devices page did not crash anymore.

That's good to hear :)

The position of the #ifdef's were not changed!
@uwekaditz
Copy link
Contributor Author

@tonhuisman I'm not sure if DECODE_HASH should get its own protocol?

Hm, though I've done quite some work on this plugin and the IR library, I'm absolutely no expert on the matter, so that question should probably be redirected to the maintainers of the IRremoteESP8266 library.

I created an issue about the protocol type for DECODE_HASH, see crankyoldgit/IRremoteESP8266#2057

@uwekaditz
Copy link
Contributor Author

@tonhuisman I'm not sure if DECODE_HASH should get its own protocol?

Hm, though I've done quite some work on this plugin and the IR library, I'm absolutely no expert on the matter, so that question should probably be redirected to the maintainers of the IRremoteESP8266 library.

I created an issue about the protocol type for DECODE_HASH, see crankyoldgit/IRremoteESP8266#2057

The maintainers of the IRremoteESP8266 library do suggest not to use DECODE_HASH at all because it is not sure if the same key on an IR remote will always issue the same code (timing should be a big issue!).
I checked it on a cheap, noname Chinese IR remote used for Christmas illumination.
Over more than one year I get exactly the same code as hash. Even on a second IR of the same type I get the identical code.
I'm going to change this PR to use decode type UNKNOWN as the result of the hash decoding as it is defined in the IRremoteESP8266 library.

Up to now, in the P016 settings decode type UNKNOWN is also used to mark a valid command that should not be used.
This I'm going to change.

BUG: Decode types UNKNOWN and UNUSED were mixed up
CHG: Reverted changes for this PR in \lib\IRremoteESP8266\src\IRtext.cpp
CHG: Workaround for decode type UNKNOWN is not necessary
CHG: Initalisation of the variables with decode type UNUSED (=0)
@uwekaditz uwekaditz changed the title [P016] Add protocol RAW to UI if 'Accept DecodeType UNKNOWN' is set [P016] DecodeType UNKNOWN was not on the web interface (decodetypes UNKNOWN and UNUSED were mixed up!) Jan 26, 2024
@uwekaditz
Copy link
Contributor Author

At the end it was so simpel 😆

@TD-er TD-er merged commit 4463cfa into letscontrolit:mega Feb 11, 2024
168 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants