Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuleEngine tries to enable UI rules before automation addons have started #3272

Closed
florian-h05 opened this issue Dec 27, 2022 · 29 comments · Fixed by #3275
Closed

RuleEngine tries to enable UI rules before automation addons have started #3272

florian-h05 opened this issue Dec 27, 2022 · 29 comments · Fixed by #3275
Assignees

Comments

@florian-h05
Copy link
Contributor

Description

When openHAB starts up, the Rule Engine tries to load and activate UI rules, but fails with

[ERROR] [ipt.internal.ScriptEngineManagerImpl] - ScriptEngine for language 'application/javascript;version=ECMAScript-2021' could not be found for identifier: 90826c44-30e5-4e92-badc-f3a1f8f56f6f

This problem seems to already exist in openHAB 3.4, but it was discovered on the openHAB 4 SNAPSHOTS.

See https://community.openhab.org/t/openhab-4-0-snapshot-discussion/142322/26?u=florian-h05 for NashornJS on openHAB 4.
See https://community.openhab.org/t/js-scripting-performance-problems-compared-to-nashhorn/142568?u=florian-h05 for GraalJS on openHAB 3.4.

Steps To Reproduce

On openHAB 4.0:

  1. Install the JS Scripting (Nashorn) addon
  2. Create a UI rule with a system start level trigger using NashornJS
  3. Restart openHAB
  4. During openHAB startup, you'll watch such an error message

Expected behaviour

The Rule Engine should wait for availability of the ScriptEngines provided by automation addons such as NashornJS or GraalJS before attempting to load or run a UI rule.

/cc @J-N-K

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/js-scripting-performance-problems-compared-to-nashhorn/142568/4

@J-N-K
Copy link
Member

J-N-K commented Dec 27, 2022

But that is expected behavior and IMO correct. You can create rules with more than one action, one in JS, one in DSL and one ItemCommandAction. If a script engine cannot be created, execution fails and that is logged. There is no way to detect if the scripting add-on is just not started or not installed.

a bug would be if that results in OH lock-up (like reported in the forum thread). I can‘t reproduce that.

@florian-h05
Copy link
Contributor Author

The ScriptEngine creation fails because the addon is not loaded yet.
That`s because the rule engine is enabled before the automation addons are enabled.
IMO this is a bug, either the start level for the rule engine is too low or the start level for automation addons is too high.
If I rethink this, I would say that the automation addons start to late, they should have a lower start level to start before the rule engine.

@J-N-K
Copy link
Member

J-N-K commented Dec 27, 2022

But as I said: in an OSGi environment there is no guaranteed bundle loading/activation order (except dependencies). But since scripting add-ons are optional we can‘t know which are installed but not yet loaded (or even failing to load).

Anyway, I‘ll have a look if we can improve the situation. But I fail to see how that would result in a total lock-up.

@J-N-K J-N-K self-assigned this Dec 27, 2022
@florian-h05
Copy link
Contributor Author

Okay thanks.
I think we are misunderstanting us a bit:

But I fail to see how that would result in a total lock-up.

This issue is not about the big problem in the community post where openHAB stopped working.
This issue is only about the log message and the problem related to this, that start level triggers don't work.

@florian-h05
Copy link
Contributor Author

IMO it's just a decision whether the Rule Engine starts later or the automation addons start earlier.

@J-N-K
Copy link
Member

J-N-K commented Dec 27, 2022

I can't reproduce it with the JS scripting add-on installed (maybe my test setup is too small to create significant delays in bundle-loading, but I do see the messages when I uninstall JS scripting. They stop immediately when I install the add-on, so it seems to work as expected.

We can't control which bundle is loaded first if we don't know which bundles need to be loaded. If we know that the rule engine needs language 1 and language 2, then we can make it wait for both to be activated (by using ready markers).

But that is not the case here: The rule engine can work even if no language is installed at all: there are actions that do not depend on scripting.

Besides the log messages (which may be a bit too high with ERROR): is there anything not working as expected?

@florian-h05
Copy link
Contributor Author

I can only sometimes reproduce on my test setup using the following rule:

configuration: {}
triggers:
  - id: "2"
    configuration:
      startlevel: 70
    type: core.SystemStartlevelTrigger
conditions: []
actions:
  - inputs: {}
    id: "1"
    configuration:
      blockSource: <xml xmlns="https://developers.google.com/blockly/xml"><block
        type="oh_print" id="rnN_-b6n(g|{_$sj:Q0;" x="144" y="161"><value
        name="message"><shadow type="text" id="kV1XotX;Xt%!N?vyRh`i"><field
        name="TEXT">Hello world</field></shadow></value></block></xml>
      type: application/javascript;version=ECMAScript-2021
      script: |
        print('Hello world');
    type: script.ScriptAction

I can't reproduce it with the JS scripting add-on installed (maybe my test setup is too small to create significant delays in bundle-loading, but I do see the messages when I uninstall JS scripting. They stop immediately when I install the add-on, so it seems to work as expected.

Yes, I think bundle loading is just too fast on both our test systems, but this issue will definitely occur, because rule engine starts at start level 50 and automation addons have start level 80 (see https://github.com/openhab/openhab-addons/blob/a4a8d5d85f4d3d5ec148bf036255454dbf0a41f0/bundles/org.openhab.automation.jsscripting/src/main/feature/feature.xml#L8).

But that is not the case here: The rule engine can work even if no language is installed at all: there are actions that do not depend on scripting.

I undertstand that.

Besides the log messages (which may be a bit too high with ERROR): is there anything not working as expected?

Start level triggers simply don't work for file based rules and you can't use scripts as actions for UI rules with start level triggers.

WDYT about changing the start level of automation addons?

@J-N-K
Copy link
Member

J-N-K commented Dec 27, 2022

Don't confuse openHAB start levels with OSGi start levels.

The start-levels in the feature.xml are OSGi start levels. Except some special bundles they start at level 80 (that is configured either in the feature or as global default in etc/confiog.properties). Since the add-ons depend on the org.openhab.core.automation` bundle, they cannot be started earlier than that bundle.

The openHAB start-level is totally independent from that and depend on the availability of ready markers that are set by openHAB components when a certain condition is fulfilled. If such a start level is not reached, this does not prevent the bundle from starting, it may only prevent execution of some parts of code.

You can see the effect of that if you look at the openHAB start-level of a system where some things fail to load. It's 70 but your scripting add-on works fine. The start-level trigger triggers on openHAB start-levels, not on OSGi start-levels.

In StartLevelService we set the openHAB start-level to 10 when the OSGi start-level reaches 80:

private void handleOSGiStartlevel() {
        FrameworkStartLevel sl = this.bundleContext.getBundle(0).adapt(FrameworkStartLevel.class);
        int defaultStartLevel = sl.getInitialBundleStartLevel();
        int startLevel = sl.getStartLevel();
        if (startLevel >= defaultStartLevel && openHABStartLevel < 10) {
            setStartLevel(10);
        } else if (startLevel < defaultStartLevel && openHABStartLevel >= 10) {
            setStartLevel(0);
        }
    }

It may make sense to implement #3073, that makes startup more deterministic, but it still does not prevent the error messages.

@florian-h05
Copy link
Contributor Author

Don't confuse openHAB start levels with OSGi start levels.

Thanks for your explanations, this helps a lot.
I am not sure how to proceed with the problem, that UI rules can run/load without the script engine they require being available.

Also see #3199, this is about problems with start level triggers in JSR223 rules

@J-N-K
Copy link
Member

J-N-K commented Dec 27, 2022

It depends on what you want to trigger. I would say that rule execution should be prevented before start level 50 (that is „rule engine ready“).

We could try to add ready markers for „all add-ons installed“ (from distribution feature installer and add-on services) to start level 10 (or a new level 15) and then use a bundle tracker that inspects all bundles and sets a ready marker for level 40 when all automation add-ons are active. Not sure if that’s working, but at least an idea.

@J-N-K
Copy link
Member

J-N-K commented Dec 28, 2022

Unfortunately I can't test it, because I don't see the issue on my systems, but I'm pretty confident that it works.

@florian-h05
Copy link
Contributor Author

I will see if I can test this, but I can't promise anything, not sure if I can reproduce on my dev system.

@J-N-K
Copy link
Member

J-N-K commented Dec 28, 2022

Please note that you need to modify the start-level configuration similar to the attached PR in openhab-distro to see any effect.

@Flole998
Copy link
Member

Is this similar/identical to #3073? I wanted to start a discussion about that a while ago, would be great if I finally found people to discuss it with ;)

@J-N-K
Copy link
Member

J-N-K commented Dec 29, 2022

It‘s related, but not the same. The difference is that in this case rules can’t run because the scripting engine is not available (or better: they run, it they fail). #3073 is about rules being triggered before the rule engine has finished initialization. And finally #3199 is about missed triggers if rules are added after the start level has already been reached.

The common thing is that startup was not really deterministic and I think we‘ll do much better once the PR are merged.

I‘m preparing a fix for #3073, too.

@florian-h05
Copy link
Contributor Author

I am not sure what I am doing wrong, but my system hangs a start level 30 because the org.openhab.core.automation.module.script bundle does never set the ready marker, because it never starts on my system:

openhab> bundle:list org.openhab.core.automation.module.script                                                       
START LEVEL 100 , List Threshold: 50
 ID │ State    │ Lvl │ Version            │ Name
────┼──────────┼─────┼────────────────────┼─────────────────────────────────────────────────────────────────────────
254 │ Resolved │  80 │ 4.0.0.202212301111 │ openHAB Core :: Bundles :: Automation Script Modules

I am on the latest 4.0.0-SNAPSHOT (Build #3258) on amd64.

@J-N-K
Copy link
Member

J-N-K commented Dec 30, 2022

Without the changes from this PR? What happens if you start 254?

@florian-h05
Copy link
Contributor Author

nano somehow destroyed the runtime/services.cfg when adding the ready marker as in your PR.
I've fixed my services.cfg, now the org.openhab.core.automation.module.script bundle properly starts.

Two things I noticed:

  • without the startlevel configuration change from your distro PR, the rule engine loads fine
  • with the startlevel configuration change from your distro PR, openHAB hungs at start level 30

@J-N-K
Copy link
Member

J-N-K commented Dec 30, 2022

If you change the startlevel configuration you need to also add the changes from this PR.

@florian-h05
Copy link
Contributor Author

This PR here: #3275?

@J-N-K
Copy link
Member

J-N-K commented Dec 30, 2022

Yes.

@florian-h05
Copy link
Contributor Author

My org.openhab.core.automation.module.script bundle is compiled from your PR.
The comment change to the StartLevelService.java can be ignored.

@florian-h05
Copy link
Contributor Author

My startup log, the ScriptEngineFactoryBundleTracker seems to recognize the bundle activations:

13:07:36.045 [INFO ] [del.core.internal.ModelRepositoryImpl] - Loading model 'issue_12577.items'
13:07:36.221 [DEBUG] [rnal.ScriptEngineFactoryBundleTracker] - Modified org.openhab.core.model.script.runtime: STARTING
13:07:36.223 [DEBUG] [rnal.ScriptEngineFactoryBundleTracker] - Modified org.openhab.core.model.script.runtime: ACTIVE
13:07:37.709 [INFO ] [b.core.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
13:07:38.390 [DEBUG] [rnal.ScriptEngineFactoryBundleTracker] - Modified org.openhab.automation.jsscripting: STARTING
13:07:38.549 [DEBUG] [rnal.ScriptEngineFactoryBundleTracker] - Modified org.openhab.automation.jsscripting: ACTIVE
13:07:38.554 [DEBUG] [rnal.ScriptEngineFactoryBundleTracker] - Modified org.openhab.automation.jsscriptingnashorn: STARTING
13:07:38.688 [DEBUG] [rnal.ScriptEngineFactoryBundleTracker] - Modified org.openhab.automation.jsscriptingnashorn: ACTIVE
13:07:38.862 [INFO ] [hab.ui.habpanel.internal.HABPanelTile] - Started HABPanel at /habpanel

@J-N-K
Copy link
Member

J-N-K commented Dec 30, 2022

Found it. If the tracker is activated when the ready marker for start level 10 was already set then the ready marker for the script engine factories was never set.

@florian-h05
Copy link
Contributor Author

Great, now openHAB starts again.

I have a rule that runs on start level xx and used the Nashorn addon to log hello world, for the different start levels this happens:

  • 40: works fine now
  • 50: works fine now
  • 70: works fine now

I guess I don't need to test start levels > 70.

@florian-h05
Copy link
Contributor Author

One interesting thing I noticed: when I stop openHAB. this is logged:

2022-12-30 13:51:30.748 [INFO ] [nal.ScriptEngineFactoryBundleTracker] - All automation bundles ready.

@J-N-K
Copy link
Member

J-N-K commented Dec 30, 2022

I guess it's more cosmetic, but see my latest commit.

@florian-h05
Copy link
Contributor Author

This now solved as well. thanks 👍

I hope this gets reviewed soon.
You have many great PRs, sometimes I feel like they are too many to review in a shorter time ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants