
Add support for OpenStack instances that require boot volumes #485

Open · wants to merge 5 commits into master

Conversation

berghaus
@berghaus berghaus commented Aug 2, 2019

The OpenStackNative cloud type now accepts two parameters, `boot_volume` and `boot_volume_gb_per_core`, which instruct cloudscheduler to create a boot volume on instance creation. The size of the boot volume is controlled by the second option.
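The interplay of the two options can be sketched roughly as follows. This is a minimal illustration, not the cloudscheduler code itself: `make_boot_volume` is a hypothetical helper, and the `cinder` argument stands in for an authenticated cinderclient object.

```python
# Sketch of the boot-volume logic described above (hypothetical helper,
# not the actual cloudscheduler implementation).

def make_boot_volume(cinder, name, image_id, cpu_cores,
                     boot_volume_gb_per_core=None, default_gb=20):
    """Create a bootable volume sized per core and return a
    block-device mapping suitable for nova.servers.create()."""
    size = (boot_volume_gb_per_core * cpu_cores
            if boot_volume_gb_per_core else default_gb)
    volume = cinder.volumes.create(name="vol-{}".format(name),
                                   size=size,
                                   imageRef=image_id)
    # "<volume-id>:::1" tells nova to boot from this volume and to
    # delete it when the instance is terminated.
    return {'vda': str(volume.id) + ':::1'}
```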

@berghaus berghaus requested a review from rseuster August 2, 2019 09:14
@berghaus
Author

berghaus commented Aug 2, 2019

Where did the conflict come from... hmmm.

@berghaus
Author

berghaus commented Aug 2, 2019

Must have had a fork of the repo from before, and was therefore missing a few of the most recent changes.

Contributor

@rseuster rseuster left a comment


Nice! I like it. Just the indentation in one place needs fixing.

cloudscheduler/openstackcluster.py
else:
bdm = None
log.debug("creating boot volume")
try:
Contributor

Memo for a similar implementation in v2:
during my testing I got the impression that nested try/except won't work; this should be checked and reviewed before we move this into v2.

Author

Yeah, I had trouble with nested try/except blocks in the past. Let me try to find the type of exceptions cinder throws... maybe we can be smarter here.
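One flat alternative to nesting is to track which phase failed and handle everything in a single handler. The sketch below uses a hypothetical `VolumeError` where the real code would catch the cinder client's exception classes; the callables stand in for the nova/cinder calls.

```python
# Flat error handling instead of nested try/except (sketch with
# hypothetical names; cinderclient's exception classes would take
# the place of VolumeError in the real code).

class VolumeError(Exception):
    pass

def boot_with_volume(create_volume, create_server):
    phase = "volume"
    try:
        vol = create_volume()
        phase = "server"
        return create_server(vol)
    except Exception as e:
        # One handler for both phases; the phase marker tells us
        # which step failed without nesting try blocks.
        raise RuntimeError("failed during {} creation: {}".format(phase, e))
```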

bdm = {'vda': str(cv.id) + ':::1'}
except Exception as e:
log.error("failed to create boot volume: {}".format(e))
raise e
Contributor

For v2 we should also think about possible failures: during testing, volumes were created fine but creation of VMs failed, e.g. due to wrong networks. This left the volume lying around, and it needed to be removed by hand. There are several ways to handle it, IMHO, but at this point in time nothing is needed for v1.
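One way to avoid the orphaned volume is to delete it in the same handler that catches the failed server creation. A minimal sketch, with the callables standing in for the nova/cinder calls (names hypothetical):

```python
# Sketch: best-effort volume cleanup when server creation fails,
# so no orphaned volume is left behind (hypothetical helper).

def create_vm_with_cleanup(create_volume, create_server, delete_volume):
    vol = create_volume()
    try:
        return create_server(vol)
    except Exception:
        # Server creation failed; try to remove the volume we just
        # made, then re-raise the original error.
        try:
            delete_volume(vol)
        except Exception:
            pass  # cleanup is best-effort
        raise
```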

Author

So I have added something to the exception handling in vm_create, but adding some cleanup to the VM deletion means we have to change the database model for the VMs. Not sure how significant an edit that is in the end.

@@ -382,6 +382,8 @@ def _cluster_from_config(cconfig, cluster):
keep_alive=keep_alive,
user_domain_name=get_or_none(cconfig, cluster, "user_domain_name"),
project_domain_name=get_or_none(cconfig, cluster, "project_domain_name")
Contributor

comma missing in line 384 ;)

Author

indeed, added

@@ -293,8 +334,8 @@ def vm_destroy(self, vm, return_resources=True, reason=""):
except novaclient.exceptions.NotFound as e:
log.error("VM %s not found on %s: removing from CS" % (vm.id, self.name))
except Exception as e:
try:
log.error("Unhandled exception while destroying VM on %s : %s" % (self.name,e))
try:
Contributor

These two lines seem to have accidentally picked up additional spaces.

Author

how did those get there, hmmm

Fix indentation to prevent infinite loop, remove nested exception,
delete volume if creation failed, and remove some unwanted spaces. The
volume deletion after instance deletion is not checked at the moment.

Remove nested exception

Nesting exceptions does not work. Properly catch the exception thrown
by the cinder client. Not sure if this works with the import of the
cinder client exceptions in the conditional.

Delete volume if instance creation failed

Add a comma on line 384

Remove accidental spaces
@rseuster
Contributor

rseuster commented Aug 6, 2019

The first VM from cloudscheduler v1 ran jobs; however, two more files required changes:

# diff -u cloudconfig.py.orig cloudconfig.py
--- cloudconfig.py.orig 2019-08-06 14:05:02.868418103 -0700
+++ cloudconfig.py      2019-08-06 14:04:37.156218534 -0700
@@ -89,7 +89,7 @@
     :param name: The name of cloud to operate on
     :return: True if conf good, False if problem detected
     """
-    valid_option_names = {'access_key_id', 'auth_dat_file', 'auth_url', 'blob_url', 'boot_timeout', 'cacert',
+    valid_option_names = {'access_key_id', 'auth_dat_file', 'auth_url', 'blob_url', 'boot_timeout', 'boot_volume', 'boot_volume_gb_per_core', 'cacert',
                           'cloud_type', 'contextualization', 'cpu_archs', 'cpu_cores', 'host',
                           'image_attach_device', 'key_name', 'keycert', 'max_vm_mem', 'max_vm_storage', 'memory',
                           'networks', 'password', 'placement_zone', 'port', 'priority', 'project_id', 'project_domain_name',

and

# diff -u openstackcluster.py.orig openstackcluster.py
--- openstackcluster.py.orig    2019-08-06 13:39:28.135490782 -0700
+++ openstackcluster.py 2019-08-06 13:49:10.896025215 -0700
@@ -70,6 +70,8 @@
         self.cacert = cacert
         self.user_domain_name = user_domain_name if user_domain_name is not None else "Default"
         self.project_domain_name = project_domain_name if project_domain_name is not None else "Default"
+        self.boot_volume = boot_volume
+        self.boot_volume_gb_per_core = boot_volume_gb_per_core
         self.session = None
         try:
             authsplit = self.auth_url.split('/')
@@ -116,9 +118,9 @@
         import novaclient.exceptions
         use_cloud_init = use_cloud_init or config.use_cloud_init
         nova = self._get_creds_nova_updated()
-        if boot_volume:
+        if self.boot_volume:
             cinder = self._get_creds_cinder()
-            from cinderclient import exceptions as ccexceptions
+        from cinderclient import exceptions as ccexceptions
         if len(securitygroup) != 0:
             sec_group = []
             for group in securitygroup:
@@ -249,7 +251,7 @@
         if name:
             log.info("Trying to create VM on %s: " % self.name)
             try:
-                if not boot_volume:
+                if not self.boot_volume:
                     instance = nova.servers.create(name=name,
                                                    image=imageobj,
                                                    flavor=flavor,
@@ -262,8 +264,8 @@
                     bdm = None
                     log.debug("creating boot volume")
                     bv_name = "vol-{}".format(name)
-                    if boot_volume_gb_per_core:
-                        bv_size = boot_volume_gb_per_core * cpu_cores
+                    if self.boot_volume_gb_per_core:
+                        bv_size = self.boot_volume_gb_per_core * cpu_cores
                     else:
                         bv_size = 20
                     cv = cinder.volumes.create(name=bv_name,

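The import move in the last hunk (pulling `from cinderclient import exceptions as ccexceptions` out of the `if self.boot_volume:` block) matters because a name bound only inside a conditional import is undefined on the other branch, so any later `except ccexceptions...` clause would itself raise a NameError. A minimal illustration, using `json` as a stand-in for the cinderclient module:

```python
# Illustrates the conditional-import hazard: the name is unbound
# whenever the condition is false (json is a stand-in module here).

def conditional_import(enabled):
    if enabled:
        import json as ccexceptions  # bound only on this branch
    try:
        ccexceptions  # any later reference to the module name
        return True
    except NameError:  # UnboundLocalError is a subclass of NameError
        return False
```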
All worked - the created volumes also disappeared after the jobs finished and the VM retired!

I also had to update the Python bindings to OpenStack - cinderclient wasn't installed previously on the machine where I tried this (verifycs.heprc).

@rseuster
Contributor

rseuster commented Aug 6, 2019

BTW - the running code is on verifycs in /usr/local/lib/python2.7/site-packages/cloudscheduler

Add boot volume options to list of valid options. Set boot volume
options as members of the OpenStack cluster class.
@berghaus
Author

So the last commit should have addressed the missing points. I'll try it at CERN.

Do the python thing and try deletion and handle expected failures
@berghaus
Author

So that last push is what is running on the CERN CS.

The CPU cores need to come from the class. The boot volume configuration
is delivered from the config as strings. Convert the strings to a bool
or int as the case may be.
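The string-to-bool/int conversion the commit describes could look roughly like this. These helper names are hypothetical, a sketch of the idea rather than the cloudscheduler code:

```python
# Config files deliver every option as a string; sketch of the
# conversions the commit message describes (hypothetical helpers).

def to_bool(value):
    """Interpret common truthy config strings as True."""
    return str(value).strip().lower() in ("true", "yes", "on", "1")

def to_int(value, default):
    """Convert a config string to int, falling back to a default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default
```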