
[WIP] updated disconnection logic and clean up unused noise #381

Merged (1 commit, Oct 16, 2022)

Conversation

digitaldan (Contributor)

Signed-off-by: Dan Cunningham <dan@digitaldan.com>

@digitaldan digitaldan self-assigned this Aug 22, 2022
@digitaldan digitaldan force-pushed the disconnect-cleanup branch 4 times, most recently from 344f4ea to 810415e on August 22, 2022 05:22
@digitaldan digitaldan changed the title [WIP] updated disconnection logic [WIP] updated disconnection logic and clean up unused noise Aug 22, 2022
app.js Outdated
```
@@ -458,41 +428,31 @@ io.use(function (socket, next) {
io.sockets.on('connection', function (socket) {
    logger.info('openHAB-cloud: Incoming openHAB connection for uuid ' + socket.handshake.uuid);
    socket.join(socket.handshake.uuid);
    // Remove openHAB from offline array if needed
    delete offlineOpenhabs[socket.handshake.uuid];
    Openhab.findOne({
```

@ssalonen commented Aug 22, 2022

This could be replaced with an atomic pre = findOneAndUpdate(.., returnOriginal=true) operation?
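
A minimal sketch of what that could look like against the Mongoose Openhab model in app.js; the update fields and the callback body here are illustrative rather than the PR's exact code:

```js
// Sketch only: one atomic read-and-write instead of findOne() + save().
// With { new: false } (Mongoose's spelling of returnOriginal = true),
// the callback receives the document as it was *before* the update.
Openhab.findOneAndUpdate(
    { uuid: socket.handshake.uuid },
    { status: 'online', last_online: new Date() },
    { new: false },
    function (error, pre) {
        if (error || !pre) {
            return;
        }
        // "pre" holds the previous state, so an offline -> online
        // transition can be detected without a second Mongo call
        if (pre.status === 'offline') {
            // send the "connected" notification here
        }
    }
);
```

Since the pre-update document comes back from the same round trip, the separate read that findOne() was doing goes away entirely.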

@digitaldan (Contributor, Author)

Nice find! Yes, I think we could in fact use this method, and as an extra bonus, save ourselves an extra Mongo call as well. I eliminated another expensive write to an unused collection in this function, so combined with this change, both could have a real positive impact on performance when we have to restart a container and thousands of openHABs try to reconnect.

@digitaldan (Contributor, Author)

Done!


👍

app.js Outdated
```
    }
});

//actually would redis be better to store? How would we coordinate who send notification?
```

@ssalonen commented Aug 22, 2022


Indeed, is this notification sending a completely independent task? It could be triggered regularly, by checking Redis for instances to notify?

Basically a completely different process.

Would offlineOpenhabs then be a Redis-backed object?
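
A rough sketch of the Redis-backed variant being floated here (not what this PR implements); the key name, grace period, and notification hook are all made up for illustration:

```js
var redis = require('redis');
var client = redis.createClient();

// On disconnect: record when this openHAB went offline.
function markOffline(uuid) {
    client.hset('offlineOpenhabs', uuid, String(Date.now()));
}

// On reconnect: drop the pending entry so no notification goes out.
function clearOffline(uuid) {
    client.hdel('offlineOpenhabs', uuid);
}

// A separate notifier process (or a single elected node) polls the hash
// and notifies for anything offline longer than the grace period.
setInterval(function () {
    client.hgetall('offlineOpenhabs', function (error, entries) {
        if (error || !entries) {
            return;
        }
        Object.keys(entries).forEach(function (uuid) {
            if (Date.now() - Number(entries[uuid]) > 5 * 60 * 1000) {
                clearOffline(uuid);
                // sendOfflineNotification(uuid); // hypothetical helper
            }
        });
    });
}, 60 * 1000);
```

With one shared hash and one poller, the "who sends the notification" question reduces to which process runs the notifier loop.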

@digitaldan (Contributor, Author)

So, that was a note I left to myself when running through the code; I did not mean to check it in :-)

If I were starting from scratch or doing a major refactor, then yes, I would use Redis (or some queue/messaging-style backend) to persist this kind of state in a more distributed-friendly fashion. Again, I refrained from rewriting too much, to keep the changes small so that if something does go wrong it's easier to debug, and quicker to get this out.


Agree, makes sense not to bundle it in here.

@digitaldan digitaldan force-pushed the disconnect-cleanup branch 5 times, most recently from dd1b618 to a074496 on August 25, 2022 03:27
@digitaldan (Contributor, Author)

I ended up removing a bunch of dead code that has never been used. I also put a size cap on our notification log, as that collection in Mongo is uncapped and has grown to over 40 GB in production for no good reason.
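
For reference, a sketch of how such a cap can be expressed in a Mongoose schema; the schema name and limits below are illustrative, not the PR's actual values:

```js
// Capped collections discard the oldest documents once the limit is hit,
// so the log can no longer grow without bound.
var NotificationLogSchema = new mongoose.Schema({
    user: mongoose.Schema.Types.ObjectId,
    message: String,
    created: { type: Date, default: Date.now }
}, {
    capped: { size: 256 * 1024 * 1024, max: 1000000 } // ~256 MB, at most 1M docs
});
```

Note that an already-existing uncapped collection has to be converted separately (MongoDB's convertToCapped command); the schema option only takes effect when the collection is first created.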

Signed-off-by: Dan Cunningham <dan@digitaldan.com>
@ssalonen commented Sep 19, 2022

For the record, this potentially resolves a race condition which leads to the cloud thinking that a client is offline even though it is connected. See #134 (comment) for more info.

Quoting from our 1on1 discussion:

The only failure scenario that comes to mind is a race condition:

  1. Client disconnects on node X; code continues to prepare to send the “disconnected” notification and store the new status to Mongo
  2. Node X hangs for a moment
  3. Client reconnects on node Y; code continues to prepare to send the “connected” notification and store the new status to Mongo
  4. Process on node Y commits to Mongo (online status)
  5. Process on node X commits to Mongo (offline status)

I realized that any false online status will always be cleared by ping/pong. However, there is no mechanism in place to correct a false offline status.

This PR utilizes unique IDs for sessions, combined with Mongo atomic query/update commands, to resolve the possible race condition.
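
In outline, the guarded disconnect path looks something like this; the connectionId field name and the notification step are paraphrased from the approach rather than copied from the diff:

```js
// Only commit "offline" if this socket still owns the session. A stale
// disconnect from node X matches nothing, because node Y has already
// stored a newer connectionId, so the false offline write is a no-op.
Openhab.findOneAndUpdate(
    { uuid: socket.handshake.uuid, connectionId: socket.connectionId },
    { status: 'offline', last_online: new Date() },
    function (error, openhab) {
        if (error || !openhab) {
            return; // a newer connection owns this session; nothing to do
        }
        // safe to schedule the "disconnected" notification here
    }
);
```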

@ssalonen

@digitaldan, how do we get this merged?

@digitaldan (Contributor, Author)

Yeah, I have been procrastinating a little, as I need to block off time to do a proper deployment to the general service once we merge. I also then need to remove a few unused (but very large) collections from Mongo. I'll probably do that Sunday, and I'll post something to the forums later today about the upcoming maintenance.
