Rework event distribution #6173

Open
stgraber opened this issue Sep 9, 2019 · 2 comments

@stgraber (Member) commented Sep 9, 2019

A common issue we have with clustering is the network chatter caused by our current broadcast approach to event distribution, where every single member connects to every other member, so the number of connections grows quadratically with cluster size.

To improve this, we should switch to using event hubs: effectively, selected LXD servers (ideally not database nodes) that receive events from all members and then forward them on to all other members.
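
A minimal sketch of what a single hub does, assuming in-memory Go channels stand in for the members' websocket connections. `Event`, `Hub`, `Attach` and `Publish` are hypothetical names for illustration, not LXD code. The hub skips the originating member, since that member already has the event locally.

```go
package main

import "sync"

// Event is a hypothetical, simplified event record. Location names the
// cluster member the event originated on.
type Event struct {
	Type     string
	Location string
}

// Hub is a hypothetical in-memory event hub: members attach with a channel,
// and every event published by one member is forwarded to all the others.
type Hub struct {
	mu      sync.Mutex
	members map[string]chan Event
}

func NewHub() *Hub {
	return &Hub{members: make(map[string]chan Event)}
}

// Attach registers a member and returns the channel it receives events on.
// The buffer size of 16 is arbitrary for this sketch.
func (h *Hub) Attach(name string) <-chan Event {
	h.mu.Lock()
	defer h.mu.Unlock()
	ch := make(chan Event, 16)
	h.members[name] = ch
	return ch
}

// Publish forwards ev to every attached member except the one it came from.
func (h *Hub) Publish(ev Event) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for name, ch := range h.members {
		if name == ev.Location {
			continue // the origin already has this event locally
		}
		ch <- ev
	}
}
```

Note that `Publish` sends while holding the lock, so a member whose buffer fills up would stall the whole hub; a real hub would need per-client send queues or a policy for dropping slow clients.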

The setup should allow for multiple hubs to operate at the same time, with each LXD server picking one at random and the hubs forwarding events between themselves.
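
One way hubs could forward between themselves without looping, sketched under the assumption that the hubs are fully meshed with each other: a hub relays an event to its peers only when it came straight from a member, and marks the relayed copy so peers just deliver it locally. `MeshHub`, `FromHub` and `Delivered` are hypothetical; local delivery is recorded in a slice instead of being sent over a websocket.

```go
package main

// Event is a hypothetical, simplified event. FromHub marks an event that was
// relayed by another hub, so it is not relayed a second time.
type Event struct {
	Location string
	Action   string
	FromHub  bool
}

// MeshHub is one of several hypothetical hubs. Hubs are assumed to be fully
// meshed: every hub lists every other hub in Peers.
type MeshHub struct {
	Name      string
	Peers     []*MeshHub
	Delivered []string // "member:action" for each event handed to a local member
	members   []string // members attached to this hub
}

func (h *MeshHub) AttachMember(name string) { h.members = append(h.members, name) }

// Receive handles an event arriving either from an attached member or from a peer hub.
func (h *MeshHub) Receive(ev Event) {
	// Relay to peers only if the event came directly from a member. With a
	// full mesh, one hop reaches every hub, so no relay loop can form.
	if !ev.FromHub {
		relayed := ev
		relayed.FromHub = true
		for _, p := range h.Peers {
			p.Receive(relayed)
		}
	}
	// Deliver to every locally attached member except the originating one.
	for _, m := range h.members {
		if m != ev.Location {
			h.Delivered = append(h.Delivered, m+":"+ev.Action)
		}
	}
}
```

The flag is only enough because of the full-mesh assumption; with a partial mesh, hubs would need something like per-event IDs to de-duplicate.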

On the sending side, sending an event should block until it has successfully made it to a hub.
The hub will then attempt to forward it to all other hubs, as well as send it to all of its attached clients using our normal filtering logic.
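
A sketch of the sending side, assuming a hypothetical `HubSender` whose `Send` only returns once the hub has acknowledged the event. It tries the hubs in a random order, which also spreads members across hubs, and moves on to the next hub on failure. Unlike the behavior described above, it gives up with the hypothetical `ErrNoHub` once every hub has failed; a real implementation would likely keep retrying.

```go
package main

import (
	"errors"
	"math/rand"
)

// HubSender is a hypothetical stand-in for a connection to one event hub.
// Send returns nil only once the hub has acknowledged the event.
type HubSender interface {
	Send(event string) error
}

// HubFunc adapts a plain function to the HubSender interface.
type HubFunc func(event string) error

func (f HubFunc) Send(event string) error { return f(event) }

// ErrNoHub is returned when no hub accepted the event (hypothetical).
var ErrNoHub = errors.New("no event hub accepted the event")

// sendEvent blocks until some hub has acknowledged the event, trying the
// hubs in a random order and falling back to the next one on error.
func sendEvent(hubs []HubSender, event string) error {
	for _, i := range rand.Perm(len(hubs)) {
		if err := hubs[i].Send(event); err == nil {
			return nil
		}
	}
	return ErrNoHub
}
```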

This is all hidden away from the user. Users will keep connecting to /1.0/events and getting the usual websocket stream of events from any of the API servers; they do not need to talk to a hub directly. With this model, all LXD servers still receive all events. They just do so over a single websocket connection to a hub rather than having to deal with one connection per server.
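
To put rough numbers on the difference, here is a sketch that counts connections under two simplifying assumptions: in the current model every member opens a connection to every other member, and in the hub model each non-hub member keeps a single connection to one hub while the hubs are fully meshed with each other. The function names are hypothetical.

```go
package main

// meshConnections counts connections when every member opens a connection
// to every other member (the current broadcast approach).
func meshConnections(members int) int {
	return members * (members - 1)
}

// hubConnections counts connections in the hub model, assuming each non-hub
// member holds one connection to a single hub and every hub connects to
// every other hub.
func hubConnections(members, hubs int) int {
	return (members - hubs) + hubs*(hubs-1)
}
```

For 100 members, the full mesh works out to 9900 connections, while 3 hubs bring it down to 103 (97 member connections plus 6 between hubs). For 50 members it's 2450 versus 53.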

This will build on top of #6172 using the event-hub role.

Further down the line, we may look into redirecting /1.0/events to an event hub so that not every LXD server has to receive all events, or into a more complex subscription mechanism where LXD servers subscribe only to the events they care about. That's work for later, though; the reliability and load improvements from this first pass should be good enough for the cluster sizes we're looking at now (50 to 100 nodes).
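
The subscription idea could look something like this, purely as a sketch: `Subscriber` and `recipients` are hypothetical, and the type names are the ones /1.0/events already filters on (`lifecycle`, `logging`, `operation`). A hub would consult the subscriptions before sending instead of sending every event to every member.

```go
package main

// Subscriber is a hypothetical member that only wants certain event types.
type Subscriber struct {
	Name  string
	Types map[string]bool // e.g. {"lifecycle": true}
}

// recipients returns the names of the subscribers that asked for events of
// the given type, in subscription order.
func recipients(subs []Subscriber, eventType string) []string {
	var out []string
	for _, s := range subs {
		if s.Types[eventType] {
			out = append(out, s.Name)
		}
	}
	return out
}
```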

stgraber added the Feature label Sep 9, 2019
stgraber added this to the soon milestone Sep 9, 2019
@turtle0x1 commented Sep 12, 2019

> using a single websocket connection to a hub rather than having to deal with one connection per server.

That would be a dream! Will you still be including the server each event came from in the response?

If I understand correctly, this is the new process:

  1. Connect to /1.0/events on an LXD server (IP: 10.10.10.1)
  2. Get redirected to another LXD instance (IP: 10.10.10.2) that is an "event-hub"
  3. Establish a websocket
  4. Receive events

My concern is that I still need to know whether a "container was started" event (or any event, really) happened on host 10.10.10.1 rather than 10.10.10.2.

@stgraber (Member, Author) commented Sep 13, 2019

@turtle0x1, that's what the Location field is for: every event records the cluster member it originated on.

```
root@fw01:~# lxc monitor --type=lifecycle
location: fw01
metadata:
  action: container-stopped
  source: /1.0/containers/dnsr01
timestamp: "2019-09-13T01:45:30.765837451Z"
type: lifecycle

location: fw01
metadata:
  action: container-started
  source: /1.0/containers/dnsr01
timestamp: "2019-09-13T01:45:31.107453084Z"
type: lifecycle

location: fw02
metadata:
  action: container-stopped
  source: /1.0/containers/dnsr02
timestamp: "2019-09-13T01:44:34.670684968Z"
type: lifecycle

location: fw02
metadata:
  action: container-started
  source: /1.0/containers/dnsr02
timestamp: "2019-09-13T01:44:34.919073674Z"
type: lifecycle
```
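
On the client side, each event from /1.0/events arrives over the websocket as a JSON object, so telling which host a start happened on comes down to reading its `location` field. A minimal sketch, assuming the JSON keys match the ones in the monitor output; `lifecycleEvent` and `startedOn` are hypothetical helpers, not LXD's own client types.

```go
package main

import (
	"encoding/json"
	"time"
)

// lifecycleEvent is a hypothetical local type mirroring the fields visible
// in the lxc monitor output; it is not LXD's own Go type.
type lifecycleEvent struct {
	Type      string    `json:"type"`
	Timestamp time.Time `json:"timestamp"`
	Location  string    `json:"location"`
	Metadata  struct {
		Action string `json:"action"`
		Source string `json:"source"`
	} `json:"metadata"`
}

// startedOn reports whether a raw JSON event says a container was started
// on the given cluster member.
func startedOn(raw []byte, member string) (bool, error) {
	var ev lifecycleEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return false, err
	}
	return ev.Type == "lifecycle" &&
		ev.Metadata.Action == "container-started" &&
		ev.Location == member, nil
}
```
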

stgraber self-assigned this Oct 7, 2019
stgraber modified the milestones: soon, lxd-3.19 Oct 7, 2019