Support batching up commands #54

Merged: 1 commit, Mar 7, 2023

@jovial jovial commented Mar 6, 2023

When you have around 60 baremetal nodes attached to a single switch, it takes a long time to execute all those commands. This gets worse when you limit the number of concurrent ssh connections.

Here we look to batch up commands to send to the switch together using a single connection. The results of each port's commands are returned when available.

This is implemented using etcd as a queueing system. Commands are added to an input key, then a worker thread processes the available commands for a particular switch device. We pull items off the queue in order of the version at which their keys were added, giving a FIFO-style queue. The results of each command set are added to an output key, which the original request thread is watching. Distributed locks are used to serialise the processing of commands for each switch device.

Various neat etcd features are used here to alleviate some of the issues of distributed task coordination, including transactions, leases, watches, historical key/value tracking, etc.
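
To illustrate the queueing idea described above, here is a minimal, self-contained sketch. It does not use etcd or the code in this PR: `FakeEtcd` is a hypothetical in-memory stand-in that only mimics etcd's increasing revision numbers, the key layout (`/batch/<switch>/input/<request_id>`) is an assumption made up for the example, and a local `threading.Lock` stands in for the distributed lock. Results are simply written to an output key; the watch mechanism is left out.

```python
import itertools
import threading
from collections import defaultdict


class FakeEtcd:
    """Hypothetical in-memory stand-in for etcd (not the real client).

    Each put is stamped with an increasing revision, which is the only
    etcd behaviour this sketch relies on.
    """

    def __init__(self):
        self._mutex = threading.Lock()
        self._next_revision = itertools.count(1)
        self._store = {}  # key -> (revision, value)

    def put(self, key, value):
        with self._mutex:
            self._store[key] = (next(self._next_revision), value)

    def get(self, key):
        with self._mutex:
            entry = self._store.get(key)
            return None if entry is None else entry[1]

    def pop_prefix_oldest_first(self, prefix):
        """Remove and return (key, value) pairs under prefix, by revision."""
        with self._mutex:
            found = sorted(
                (rev, key, value)
                for key, (rev, value) in self._store.items()
                if key.startswith(prefix)
            )
            for _, key, _ in found:
                del self._store[key]
            return [(key, value) for _, key, value in found]


# Local locks standing in for the per-switch distributed lock.
SWITCH_LOCKS = defaultdict(threading.Lock)


def submit(kv, switch, request_id, commands):
    """Queue one port's command set under the switch's input prefix."""
    kv.put(f"/batch/{switch}/input/{request_id}", list(commands))


def run_batches(kv, switch, send_commands):
    """Send all pending command sets for a switch in a single call.

    send_commands takes a list of commands and returns one output per
    command; it represents a single connection to the switch.
    Returns the number of command sets processed.
    """
    with SWITCH_LOCKS[switch]:
        pending = kv.pop_prefix_oldest_first(f"/batch/{switch}/input/")
        if not pending:
            return 0
        flat = [cmd for _, cmds in pending for cmd in cmds]
        outputs = send_commands(flat)
        pos = 0
        for key, cmds in pending:
            request_id = key.rsplit("/", 1)[1]
            # Each requester gets back only the outputs of its own commands.
            kv.put(f"/batch/{switch}/output/{request_id}",
                   outputs[pos:pos + len(cmds)])
            pos += len(cmds)
        return len(pending)
```

In this sketch, command sets are drained in the order they were added rather than in key order, and a single `send_commands` call covers every pending command set for the switch.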

Co-Authored-By: Mark Goddard <mark@stackhpc.com>

Change-Id: I8c458bbc94df5630cfede5434bcdbe527988059c
(cherry picked from commit 45b237b)
(cherry picked from commit 465c979)

@jovial jovial requested a review from a team as a code owner March 6, 2023 15:53
@jovial jovial marked this pull request as draft March 6, 2023 16:39
@jovial jovial force-pushed the feature/wallaby/batching branch 2 times, most recently from b54a8af to f1fd180 Compare March 6, 2023 16:51

jovial commented Mar 6, 2023

Changes from upstream version

Removed unsupported argument:

+++ b/networking_generic_switch/batching.py
@@ -321,11 +321,9 @@ class SwitchBatch(object):
             ca_cert = params.get('ca_cert')
             cert_key = params.get('cert_key')
             cert_cert = params.get('cert_cert')
-            api_version = params.get('api_version', 'v3alpha')
             etcd_client = etcd3gw.client(
                 host=host, port=port, protocol=protocol,
                 ca_cert=ca_cert, cert_key=cert_key, cert_cert=cert_cert,
-                api_path='/' + api_version + '/',
                 timeout=30)
             self.queue = SwitchQueue(switch_name, etcd_client)
         else:

get_hostname() not available in wallaby:

--- a/networking_generic_switch/devices/netmiko_devices/__init__.py
+++ b/networking_generic_switch/devices/netmiko_devices/__init__.py
@@ -136,7 +136,7 @@ class NetmikoSwitch(devices.GenericSwitchDevice):
         elif CONF.ngs_coordination.backend_url:
             self.locker = coordination.get_coordinator(
                 CONF.ngs_coordination.backend_url,
-                ('ngs-' + device_utils.get_hostname()).encode('ascii'))
+                ('ngs-' + CONF.host).encode('ascii'))
             self.locker.start()
             atexit.register(self.locker.stop)
 

@jovial jovial marked this pull request as ready for review March 6, 2023 17:00
@jovial jovial merged commit a419423 into stackhpc/wallaby Mar 7, 2023
@jovial jovial deleted the feature/wallaby/batching branch March 7, 2023 09:14
jovial added a commit to stackhpc/stackhpc-kayobe-config that referenced this pull request Mar 7, 2023
This is used by scientific-openstack and is opt in via a config
option.

See: stackhpc/networking-generic-switch#54