[gardena] Fix handling of websocket connection losses that causes memory leaks #11825
…ory leaks * The binding no longer restarts websockets more than once if the connection is lost * Fixes openhab#10516 Signed-off-by: Nico Brüttner <n@bruettner.de>
lgtm, thank you! Just one small question below.
@gerrieg Would you also want to have a look?
logger.warn("Restarting GardenaSmart Webservice ({})", socket.getSocketID());
socket.stop();
// restart after 3 seconds
scheduler.schedule(() -> {
Is it ok to run this asynchronously, i.e. not within the synchronized call anymore? Or would a simple Thread.sleep(3000) be a safer choice?
Hi Kai,
sorry for the late answer.
I did it this way because onWebSocketError and onWebSocketClose are both called at the same time if the socket connection closes unexpectedly. With only a Thread.sleep before restarting the socket, the socket would restart twice.
To overcome this problem, I used scheduler.schedule() to wait 3 seconds before restarting the socket. Because the restart runs in a new thread, the first call to restartWebsocket leaves the synchronized block immediately, so when the second call enters the synchronized block, the socket is already closed but not yet restarted. This prevents a second restart of the websocket.
I think this is safe, and it has been working fine in my openHAB installation for several months now.
But if this is not good practice, there are other options to solve this problem. For example:
- Only restart the socket when onWebSocketClose is called, and use onWebSocketError for logging purposes only. This also works fine for me, because every time the connection is closed unexpectedly, both functions are called. But I am not a Java developer, so I don't know whether it is guaranteed that onWebSocketClose is called every time an error occurs...
- If we still want to be able to restart the socket on calls to onWebSocketError and/or onWebSocketClose, I could refactor the synchronized block to get rid of scheduler.schedule() and use Thread.sleep(3000) instead. I already tested this, and it works fine. The websocket is still only restarted once.
What do you prefer?
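For readers following along, the guard described above can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the binding's actual code: the class GuardedRestarter, its running flag, and the helper simulateConnectionLoss() are hypothetical names, and a plain boolean stands in for the real socket state.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the binding's restart logic; not the real implementation.
class GuardedRestarter {
    private final ScheduledExecutorService scheduler;
    private final long delayMillis;
    private boolean running = true; // guarded by "this"; stands in for the socket state
    private int restarts = 0;       // guarded by "this"

    GuardedRestarter(ScheduledExecutorService scheduler, long delayMillis) {
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
    }

    // Called from both the error callback and the close callback.
    synchronized void restartWebsocket() {
        if (!running) {
            return; // the first callback already stopped the socket; a restart is pending
        }
        running = false; // corresponds to socket.stop()
        // The delayed start runs on the scheduler thread, so this method returns at once
        // and the second callback finds the socket stopped but not yet restarted.
        scheduler.schedule(() -> start(), delayMillis, TimeUnit.MILLISECONDS);
    }

    private synchronized void start() {
        running = true;
        restarts++;
    }

    synchronized int restartCount() {
        return restarts;
    }

    // Simulates one connection loss that triggers both callbacks and reports the restart count.
    static int simulateConnectionLoss() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        GuardedRestarter restarter = new GuardedRestarter(scheduler, 50);
        restarter.restartWebsocket(); // as if called from onWebSocketError
        restarter.restartWebsocket(); // as if called from onWebSocketClose
        scheduler.shutdown(); // tasks that are already scheduled still run by default
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return restarter.restartCount();
    }

    public static void main(String[] args) {
        System.out.println("restarts: " + simulateConnectionLoss());
    }
}
```

The short 50 ms delay is only there to keep the demonstration quick; the PR itself uses 3 seconds.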
See my comments today about the logger level. Personally I would suggest you do something like the following..
private @Nullable ScheduledFuture<?> restartTask;

@Override
public void onWebSocketClose() {
    // do your logging
    ...
    // do your other stuff
    ...
    scheduleRestart();
}

@Override
public void onWebSocketError() {
    // do your logging
    ...
    // do your other stuff
    ...
    scheduleRestart();
}

private synchronized void scheduleRestart() {
    // local copy of the field, so the null check stays valid
    ScheduledFuture<?> restartTask = this.restartTask;
    if ((restartTask == null) || restartTask.isDone()) {
        // store the new task in the field (not in the local copy), otherwise the check above never sees it
        this.restartTask = scheduler.schedule(() -> { restartWebsockets(); }, 3, TimeUnit.SECONDS);
    }
}

@Override
public synchronized void restartWebsockets() {
    // your restart code here
    this.restartTask = null;
}
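A self-contained variant of this suggestion, runnable outside openHAB, might look like the sketch below. It assumes a plain ScheduledExecutorService in place of the binding's scheduler; the class name RestartScheduler and the helper simulate() are hypothetical, and a counter replaces the real restart code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified version of the suggested pattern; not the binding's code.
class RestartScheduler {
    private final ScheduledExecutorService scheduler;
    private final long delayMillis;
    private ScheduledFuture<?> restartTask; // guarded by "this"
    private int restarts = 0;               // guarded by "this"

    RestartScheduler(ScheduledExecutorService scheduler, long delayMillis) {
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
    }

    // Safe to call from any number of callbacks; only one restart is pending at a time.
    synchronized void scheduleRestart() {
        ScheduledFuture<?> task = this.restartTask;
        if (task == null || task.isDone()) {
            this.restartTask = scheduler.schedule(() -> restartWebsockets(), delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    private synchronized void restartWebsockets() {
        restarts++;              // placeholder for the real restart code
        this.restartTask = null; // allow the next connection loss to schedule again
    }

    synchronized int restartCount() {
        return restarts;
    }

    // Fires the given number of callbacks in quick succession and reports the restart count.
    static int simulate(int callbacks) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        RestartScheduler restartScheduler = new RestartScheduler(scheduler, 50);
        for (int i = 0; i < callbacks; i++) {
            restartScheduler.scheduleRestart();
        }
        scheduler.shutdown(); // the pending delayed task still runs by default
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return restartScheduler.restartCount();
    }

    public static void main(String[] args) {
        System.out.println("restarts after 3 callbacks: " + simulate(3));
    }
}
```

Compared with a state flag on the socket, keeping the ScheduledFuture also makes it possible to cancel a pending restart when the handler is disposed, which is a design consideration rather than something this thread confirms the binding does.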
Signed-off-by: Nico Brüttner <n@bruettner.de>
Signed-off-by: Nico Brüttner <n@bruettner.de>
@Bruetti1991 gentle reminder: you promised three small fixes above, but did not yet commit them. => Could you please do this, and therefore close the respective open issues, so that the PR can be merged?
PS @Bruetti1991 I wonder how many of the other issues on this binding may have been resolved by your fix? => Any thoughts?
@andrewfg Sorry for the delay. I just did not have much time for it in the last couple of weeks. I will also have a look at the other issues as soon as possible.
Signed-off-by: Nico Brüttner <n@bruettner.de>
Signed-off-by: Nico Brüttner <n@bruettner.de>
…with HTTP 429 error (Too Many Requests) Signed-off-by: Nico Brüttner <n@bruettner.de>
…HTTP errors (except 429) Signed-off-by: Nico Brüttner <n@bruettner.de>
Signed-off-by: Nico Brüttner <n@bruettner.de>
@andrewfg I finally found the time to commit the requested fixes. I also changed two additional things:
Many thanks for the feedback. And the work. I added the 'fixes' keyword so that those issues will 'auto-close' when your PR is merged.
Ok, I will have a look at those issues separately to see if I have some suggestions on how to fix them.
@Bruetti1991 fyi, I will build this version and test it for a few days on my own system.
Signed-off-by: Nico Brüttner <n@bruettner.de>
LGTM
@Bruetti1991 I have been testing it for about one month now. Unfortunately I have been noticing more disconnects on this build than I normally did on the official release version. However it is not 100% clear to me why I am seeing such errors. I will keep observing & testing it.
@Bruetti1991 to be even more specific, I am getting the following errors in my log, repeated exactly every two hours..
I am pretty sure that these errors are not related to my code changes. You would also get these errors with the official version of the binding. I think this is because of the many websocket disconnections other than 'Going away'...
I don't know the API very well, but I wonder if we can eliminate the token expired disconnections by renewing the token just before it expires?
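As a rough illustration of renewing a token shortly before it expires, the timing could be computed as in the sketch below. Everything here is an assumption for illustration: the class TokenRefreshPlanner, the method delayUntilRefresh, and the safety margin are hypothetical, and the sketch says nothing about how the Gardena API reports token lifetimes.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; not part of the binding or the Gardena API.
class TokenRefreshPlanner {
    // Returns how long to wait from "now" before refreshing a token that was issued at
    // "issuedAt" and is valid for "lifetime", refreshing "safetyMargin" before expiry.
    // Returns zero if that moment has already passed.
    static Duration delayUntilRefresh(Instant issuedAt, Duration lifetime, Duration safetyMargin, Instant now) {
        Instant refreshAt = issuedAt.plus(lifetime).minus(safetyMargin);
        Duration delay = Duration.between(now, refreshAt);
        return delay.isNegative() ? Duration.ZERO : delay;
    }

    public static void main(String[] args) {
        Instant issuedAt = Instant.now();
        Duration delay = delayUntilRefresh(issuedAt, Duration.ofHours(1), Duration.ofMinutes(1), issuedAt);
        System.out.println("refresh in " + delay.getSeconds() + " s");
    }
}
```

The resulting delay could then be passed to a scheduler so the refresh runs before the old token lapses.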
@Bruetti1991 I think there is a fundamental problem in how the access token expiration is being handled.
Problem
IMHO ... and that (therefore) WRONGLY triggers the going away log message here ...
Proposed Solution
I think that ...
@Bruetti1991 @kaikreuzer => any thoughts on this?
@andrewfg I didn't follow your discussion and the code changes, but if it is possible to refresh the access token through the established websocket connection, I'd fully agree with you that the connection should not be closed and only the token should be refreshed.
@Bruetti1991 / @kaikreuzer after doing some further digging, I discovered that it is NOT a token expiry problem. It seems that the Gardena remote server is unilaterally closing the WebSocket after 120 minutes. This seems rather odd because the client (the binding) is sending regular WebSocket pings and getting regular pongs from the server. I have two hypotheses about why the server might be unilaterally closing the WebSocket, as below, but I would appreciate your thoughts too.
In the next days, I shall do some tests to see if either of the above may solve the problem. And I will get back to you. @Bruetti1991 while playing around with your code I also found some compiler warnings and a potential memory leak. Once I have finished the above-mentioned tests, I will come back with a further code review with suggested changes.
The server does not send pings, so no client pongs are needed, so this is NOT the issue.
This may indeed be the reason why the server "goes away". My reason for thinking this is that the server goes away after exactly 120 minutes, which is the default time limit for the TCP socket keep-alive mechanism. However, in the Jetty framework there seems to be no way to get at the underlying TCP socket, so I cannot test it. Therefore I think this issue of the server prematurely "going away" is NOT something we can solve in this PR, and may be something to look out for in future. (I will open an Issue to mark it.) Instead, the 'solution' of not logging "going away" errors is probably the right / only solution. => I suggest to simply 'logger.debug()' all server socket close messages. I will therefore just post a new review with some further change requests on your code, so we can close off this PR asap.
Here a new (hopefully the last) review.
Signed-off-by: Nico Brüttner <n@bruettner.de>
Signed-off-by: Nico Brüttner <n@bruettner.de>
…d or timed out Signed-off-by: Nico Brüttner <n@bruettner.de>
Signed-off-by: Nico Brüttner <n@bruettner.de>
Right. I discovered the same thing. When getting the first token, the response contains ...
I also took a look at the API for getting the websocket URLs. Along with the websocket URL you do get a validity value, but it only describes the "Time window for connection, starting from issuing this POST request, (seconds)". It is set to 10 seconds, so it is not useful for us. I agree with you that this issue most likely has something to do with how Jetty handles the connection. So this should be fixed in another PR.
Agreed. See #12896
Signed-off-by: Nico Brüttner <n@bruettner.de>
Signed-off-by: Nico Brüttner <n@bruettner.de>
LGTM
Many thanks for your work @Bruetti1991 and thanks for reviewing @andrewfg!
This pull request has been mentioned on openHAB Community. There might be relevant details there: https://community.openhab.org/t/gardena-error-429-limit-exceeded/137419/22
…ory leaks (openhab#11825) * [gardena] Fix handling of websocket connection losses that causes memory leaks * The binding no longer restarts websockets more than once if the connection is lost Signed-off-by: Nico Brüttner <n@bruettner.de>
…ory leaks (openhab#11825) * [gardena] Fix handling of websocket connection losses that causes memory leaks * The binding no longer restarts websockets more than once if the connection is lost Signed-off-by: Nico Brüttner <n@bruettner.de> Signed-off-by: Andras Uhrin <andras.uhrin@gmail.com>
Fixes #10516
Fixes #11474
Fixes #10455
Fixes #12481