@markgoddard

When around 60 baremetal nodes are attached to a single switch, it
takes a long time to execute the commands for all of their ports. This
gets worse when the number of concurrent SSH connections is limited.

This change batches commands so that they are sent to the switch
together over a single connection. The results of each port's commands
are returned as soon as they are available.

This is implemented using etcd as a queueing system. Commands are added
to an input key, then a worker thread processes the available commands
for a particular switch device. We pull commands off the queue in order
of the version at which their keys were added, giving a FIFO-style
queue. The results of each command set are added to an output key, which
the original request thread is watching. Distributed locks are used to
serialise the processing of commands for each switch device.
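
The flow described above can be sketched roughly as follows. This is a
minimal in-memory illustration, not the code in this change: it does not
talk to etcd, and `FakeRevisionedStore`, `SwitchBatcher`, `send_batch`
and the `/input/...` and `/output/...` key layout are all hypothetical
names chosen for the example. A `threading.Lock` stands in for the
distributed lock and a condition variable stands in for an etcd watch.

```python
import threading
import uuid


class FakeRevisionedStore:
    """In-memory stand-in for etcd (assumption: NOT the real etcd API).

    Every put is stamped with a global, monotonically increasing revision.
    """

    def __init__(self):
        self._changed = threading.Condition()
        self._revision = 0
        self._data = {}  # key -> (revision at which the key was added, value)

    def put(self, key, value):
        with self._changed:
            self._revision += 1
            self._data[key] = (self._revision, value)
            self._changed.notify_all()
            return self._revision

    def pop_prefix_in_order(self, prefix):
        """Remove and return (key, value) pairs under prefix, oldest first."""
        with self._changed:
            keys = [k for k in self._data if k.startswith(prefix)]
            keys.sort(key=lambda k: self._data[k][0])
            return [(k, self._data.pop(k)[1]) for k in keys]

    def wait_for(self, key, timeout=5.0):
        """Block until key exists, then remove and return its value."""
        with self._changed:
            if not self._changed.wait_for(lambda: key in self._data, timeout):
                raise TimeoutError(key)
            return self._data.pop(key)[1]


class SwitchBatcher:
    """Queues command sets for one switch and runs them as a single batch."""

    def __init__(self, store, device, send_batch):
        self.store = store
        self.device = device
        # Hypothetical callable: takes a list of commands, returns one
        # output per command, using a single connection to the switch.
        self.send_batch = send_batch
        # Stand-in for the per-device distributed lock.
        self.lock = threading.Lock()

    def submit(self, commands):
        """Request side: add a command set to the input queue."""
        request_id = uuid.uuid4().hex
        self.store.put(f"/input/{self.device}/{request_id}", list(commands))
        return request_id

    def process_pending(self):
        """Worker side: drain the queue in FIFO order and publish results."""
        with self.lock:
            pending = self.store.pop_prefix_in_order(f"/input/{self.device}/")
            if not pending:
                return 0
            flat = [cmd for _, cmds in pending for cmd in cmds]
            outputs = iter(self.send_batch(flat))
            for key, cmds in pending:
                request_id = key.rsplit("/", 1)[1]
                result = [next(outputs) for _ in cmds]
                self.store.put(f"/output/{self.device}/{request_id}", result)
            return len(pending)

    def wait_result(self, request_id, timeout=5.0):
        """Request side: wait on the output key for this request."""
        key = f"/output/{self.device}/{request_id}"
        return self.store.wait_for(key, timeout)
```

In this sketch a request thread would call `submit()` and then
`wait_result()`, while a worker calls `process_pending()`. Two command
sets queued before the worker runs are sent through a single
`send_batch` call, and each requester receives only its own outputs.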

Several etcd features are used here to alleviate some of the issues of
distributed task coordination, including transactions, leases, watches
and historical key/value tracking.

Co-Authored-By: Mark Goddard <mark@stackhpc.com>

Change-Id: I8c458bbc94df5630cfede5434bcdbe527988059c
(cherry picked from commit a915d2b39273c8c65eee4346d6ac4552ef170fd6)

@markgoddard markgoddard requested a review from a team as a code owner February 8, 2023 10:42
@markgoddard markgoddard self-assigned this Feb 8, 2023
@markgoddard markgoddard closed this Feb 8, 2023
@markgoddard markgoddard reopened this Feb 8, 2023
JohnGarbutt and others added 2 commits March 2, 2023 16:13
(cherry picked from commit 45b237b)
The api_path argument was only added to the client helper function in
etcd3gw 2.1.0. That version is available in upper constraints for Zed.

Change-Id: Ide34499f64f8e8a92be80a132ece6090701733a9
@markgoddard markgoddard merged commit c197957 into stackhpc/yoga Mar 6, 2023
@markgoddard markgoddard deleted the yoga-batching branch March 6, 2023 09:44