Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[velux] Fix synchronisation of handler initialization and disposal #10449

Merged
merged 19 commits into from Apr 22, 2021

Conversation

andrewfg
Copy link
Contributor

@andrewfg andrewfg commented Apr 3, 2021

The Velux hub initialization and shutdown processes are very slow and deliberate. And sometimes when OH reloads the binding, it was possible for a newly created VeluxBridgeHandler instance's .initialize() method to be called before the prior VeluxBridgeHandler instance's .dispose() method had finished processing i.e. it was attempting to shut down and re-start the hub communication at the same time!

See this forum post

Therefore this PR introduces the following fixes resp. improvements:

  • The .initialize() / .dispose() methods now defer slow processes to scheduled tasks, so the OH caller thread returns faster.
  • The .initializeScheduled() method now waits for the Future<> result of a prior .disposeScheduled() (if any) before proceeding.
  • If a socket timeout occurs during the shut down process, the communication polling loop will exit faster.
  • Cleaned up the logging a bit, to eliminate un- needed logger.info() reports, and make others more easily understandable.
  • Cleaned up the initialization of the Velux Binding Info thing.

Signed-off-by: Andrew Fiddian-Green software@whitebear.ch

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@andrewfg andrewfg added bug An unexpected problem or unintended behavior of an add-on additional testing preferred The change works for the pull request author. A test from someone else is preferred though. labels Apr 3, 2021
@andrewfg andrewfg added this to the 3.1 milestone Apr 3, 2021
@andrewfg
Copy link
Contributor Author

andrewfg commented Apr 3, 2021

@gs4711 feedback appreciated

@openhab-bot
Copy link
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/velux-new-openhab2-binding-feedback-welcome/32926/1261

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Copy link
Contributor

@gs4711 gs4711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Are your sure to remove the synchronized of the dispose method?

@andrewfg
Copy link
Contributor Author

andrewfg commented Apr 5, 2021

Are your sure to remove the synchronized of the dispose method?

Hmm. I have never actually observed OH calling dispose() on the same class instance more than once. Also the dispose() method of the parent class (that is overridden) is not declared synchronized, so I suspect the creators of OH did not foresee multiple calls to be made. ??

The real problem that I am trying to address, is when OH calls dispose() on one (old) class instance, and initialize() on another newly created class instance for the same hub. That is why the Future<> get() is made to cross synchronise the two methods between two separate class instances that relate to the same Bridge (IP address).

However, do you have a specific reason why it should be synchronized?

@andrewfg
Copy link
Contributor Author

andrewfg commented Apr 5, 2021

Guenther, some more thoughts on this..

  • If any method should be synchronized, it should really be disposeScheduled().
  • And since that is a scheduled task, it is possible that the thread scheduler would serialize the execution of the two tasks anyway.
  • However perhaps an even safer approach would be for disposeSceduled() to set a boolean flag disposeAlreadyDone and use it to shortcut any subsequent calls to disposeScheduled()..

Copy link
Contributor

@gs4711 gs4711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go for it for the time being!

But, in fact, IMHO it should be fixed in the framework.

@andrewfg
Copy link
Contributor Author

andrewfg commented Apr 6, 2021

@gs4711 I have a question concerning a potential a problem also with the Velux_Binding_Handler class.

During a jar reload I saw VeluxBindingHandler.initialize() being called before VeluxBridgeHandler.disposeScheduled() had completed (see log below) i.e. well before VeluxBridgeHandler.initializeScheduled() had completed. And since VeluxBindingHandler has Channels that depend on the state and quantity of the VeluxBridgeHandler's Channels, I think its call to updateVisibleInformation() might also interfere with the disposal and re-initialization sequence of the respective Bridges. Or??

So I think I should also defer the execution of VeluxBindingHandler.initialize() (resp. updateVisibleInformation()) until all of the respective Bridges have been cleanly disposed and re-initialized? What do you think?

2021-04-06 13:37:45.822 [INFO ] [internal.handler.VeluxBindingHandler] - Initializing VeluxBindingHandler for 'velux:binding:org_openhab_binding_velux'.
2021-04-06 13:37:46.068 [INFO ] [.internal.handler.VeluxBridgeHandler] - Velux Bridge 'velux:klf200:g24' shut down completed.

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@andrewfg
Copy link
Contributor Author

andrewfg commented Apr 6, 2021

updateVisibleInformation() might also interfere with the disposal and re-initialization sequence of the respective Bridges

No, it does not actually interfere with the disposal and re-initialization. However the Velux Binding Handler Thing(s) display the wrong information because the respective Velux Bridge Handler Thing(s) have not had time to fully discover all their Things.

defer the execution .. until all of the respective Bridges have been cleanly disposed and re-initialized

My last three commits defer the binding information update until after the Bridges & Things are disposed and re-initialized.

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@andrewfg andrewfg removed the additional testing preferred The change works for the pull request author. A test from someone else is preferred though. label Apr 8, 2021
@andrewfg andrewfg requested a review from a team April 8, 2021 18:59
Copy link
Member

@kaikreuzer kaikreuzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewfg Thanks for your refactoring here.
Looking at the code, I noticed that there are still many info logs that should indeed be debug level instead. I see that they were in there before, but as you cleaned up the logging, it would be nice if you could address those, too.

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Copy link
Member

@fwolter fwolter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the log levels!

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@andrewfg
Copy link
Contributor Author

raise a sh*t storm, that Velux fixes this bug :)

We have tried this. But Velux won't even answer the emails. They say it is provided "as is"..

@andrewfg
Copy link
Contributor Author

keep in mind that everything you do, does delay the execution of systemctl stop openhab

Yes I know that; and I can see it in my tests; in my case 'stop' takes about 15 seconds; whereby I have 17 bindings, and I am not sure if the delay is due to the Velux or something else..

Anyway, let me think about this overnight, and I will eventually push another change. I will ping you back when I am done.

… timeout explicit; disable HSM on shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@andrewfg andrewfg added the additional testing preferred The change works for the pull request author. A test from someone else is preferred though. label Apr 18, 2021
@andrewfg
Copy link
Contributor Author

I will eventually push another change. I will ping you back when I am done

I made another commit just now with the following changes..

  • Instead of synchronizing initializeSchedulerJob() and disposeSchedulerJob() tasks via a Future<> it now synchronizes them via the Strings.intern() object of the bridge's IP address. This is a much, much better solution since it prevents multiple/cross, initialization/disposal, via same/different class instances, to the same physical bridge (IP address).
  • Instead of shutdown() it now uses shutdownNow() to avoid waiting for scheduled communications tasks that have not yet been started.
  • To prevent OH systemctl stop openhab.service from stalling, the call to awaitTermination() times out after an explicitly declared constant period of COMMUNICATION_TASK_MAX_WAIT_SECS = 10 seconds.
  • The dispose() method additionally sends a disable House Status Monitor command to the bridge; thus stopping it queueing fresh events in its send buffer.

@fwolter I have tested start/ stop combinations of a) OH itself, b) the binding, and c) specific bridges, so I think you can complete your review of the new code now if you want. But notwithstanding this I set the 'Additional Testing Required' flag, so I can let it run for a few days in my live system; so please don't actually merge the PR until I clear that flag again.

@andrewfg
Copy link
Contributor Author

The dispose() method additionally sends a disable House Status Monitor command to the bridge; thus stopping it queueing fresh events in its send buffer

@gs4711 this might (finally) be the key that resolves the 'zombie socket' problem. => What do you think?

@andrewfg andrewfg requested a review from fwolter April 18, 2021 16:53
…M shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@andrewfg
Copy link
Contributor Author

@fwolter I have been pressure testing this for a few days now, and it is all good to go.

@andrewfg andrewfg removed the additional testing preferred The change works for the pull request author. A test from someone else is preferred though. label Apr 21, 2021
Copy link
Member

@fwolter fwolter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fwolter fwolter merged commit 0b375dc into openhab:main Apr 22, 2021
themillhousegroup pushed a commit to themillhousegroup/openhab2-addons that referenced this pull request May 10, 2021
…penhab#10449)

* [velux] break on socket timeout if interrupted flag set

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] pause initialize until pending dispose is done

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synchronization key is IP address not ThingUID

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level, improve user comprehensibility

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] access updateBindingState() externally via static method

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] tweak logging

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] defer refreshBindingInfo until after initialized / disposed

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] refactor method names to show the type of scheduler they run on

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't terminate thread early

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] restore calls to updateBindingState

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] logger cosmetics

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging levels

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] change logger level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synch on ip address object instead of Future<>; make shutdown timeout explicit; disable HSM on shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] add lock modifier, remove duplicate synchronized, refactor HSM shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] shutdown code was not being called

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't convert InterruptedException to IOException

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] let shutdown explicity stop the polling loop

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: John Marshall <john.marshall.au@gmail.com>
computergeek1507 pushed a commit to computergeek1507/openhab-addons that referenced this pull request Jul 13, 2021
…penhab#10449)

* [velux] break on socket timeout if interrupted flag set

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] pause initialize until pending dispose is done

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synchronization key is IP address not ThingUID

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level, improve user comprehensibility

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] access updateBindingState() externally via static method

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] tweak logging

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] defer refreshBindingInfo until after initialized / disposed

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] refactor method names to show the type of scheduler they run on

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't terminate thread early

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] restore calls to updateBindingState

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] logger cosmetics

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging levels

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] change logger level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synch on ip address object instead of Future<>; make shutdown timeout explicit; disable HSM on shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] add lock modifier, remove duplicate synchronized, refactor HSM shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] shutdown code was not being called

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't convert InterruptedException to IOException

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] let shutdown explicity stop the polling loop

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
frederictobiasc pushed a commit to frederictobiasc/openhab-addons that referenced this pull request Oct 26, 2021
…penhab#10449)

* [velux] break on socket timeout if interrupted flag set

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] pause initialize until pending dispose is done

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synchronization key is IP address not ThingUID

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level, improve user comprehensibility

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] access updateBindingState() externally via static method

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] tweak logging

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] defer refreshBindingInfo until after initialized / disposed

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] refactor method names to show the type of scheduler they run on

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't terminate thread early

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] restore calls to updateBindingState

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] logger cosmetics

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging levels

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] change logger level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synch on ip address object instead of Future<>; make shutdown timeout explicit; disable HSM on shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] add lock modifier, remove duplicate synchronized, refactor HSM shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] shutdown code was not being called

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't convert InterruptedException to IOException

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] let shutdown explicity stop the polling loop

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
thinkingstone pushed a commit to thinkingstone/openhab-addons that referenced this pull request Nov 7, 2021
…penhab#10449)

* [velux] break on socket timeout if interrupted flag set

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] pause initialize until pending dispose is done

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synchronization key is IP address not ThingUID

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level, improve user comprehensibility

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] access updateBindingState() externally via static method

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] tweak logging

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] defer refreshBindingInfo until after initialized / disposed

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] refactor method names to show the type of scheduler they run on

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't terminate thread early

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] restore calls to updateBindingState

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] logger cosmetics

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging levels

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] change logger level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synch on ip address object instead of Future<>; make shutdown timeout explicit; disable HSM on shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] add lock modifier, remove duplicate synchronized, refactor HSM shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] shutdown code was not being called

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't convert InterruptedException to IOException

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] let shutdown explicity stop the polling loop

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@andrewfg andrewfg deleted the velux-initialize-dispose-fix branch February 7, 2022 19:15
marcfischerboschio pushed a commit to bosch-io/openhab-addons that referenced this pull request May 5, 2022
…penhab#10449)

* [velux] break on socket timeout if interrupted flag set

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] pause initialize until pending dispose is done

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synchronization key is IP address not ThingUID

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging level, improve user comprehensibility

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] access updateBindingState() externally via static method

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] tweak logging

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] defer refreshBindingInfo until after initialized / disposed

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] refactor method names to show the type of scheduler they run on

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't terminate thread early

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] restore calls to updateBindingState

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] logger cosmetics

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] reduce logging levels

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] change logger level

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] synch on ip address object instead of Future<>; make shutdown timeout explicit; disable HSM on shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] add lock modifier, remove duplicate synchronized, refactor HSM shutdown

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] shutdown code was not being called

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] don't convert InterruptedException to IOException

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>

* [velux] let shutdown explicity stop the polling loop

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants