
Add support for OpenStack instances that require boot volumes #485

Open · wants to merge 5 commits into master

Conversation

berghaus
@berghaus berghaus commented Aug 2, 2019

The OpenStackNative cloud type now accepts two parameters, `boot_volume` and `boot_volume_gb_per_core`, which instruct cloudscheduler to create a boot volume on instance creation. The size of the boot volume is controlled by the second option.
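The interplay of the two options can be sketched roughly as follows. This is a minimal illustration, not the cloudscheduler code itself: `make_boot_volume` is a hypothetical helper, and the `cinder` argument stands in for an authenticated cinderclient object.

```python
# Sketch of the boot-volume logic described above (hypothetical helper,
# not the actual cloudscheduler implementation).

def make_boot_volume(cinder, name, image_id, cpu_cores,
                     boot_volume_gb_per_core=None, default_gb=20):
    """Create a bootable volume sized per core and return a
    block-device mapping suitable for nova.servers.create()."""
    size = (boot_volume_gb_per_core * cpu_cores
            if boot_volume_gb_per_core else default_gb)
    volume = cinder.volumes.create(name="vol-{}".format(name),
                                   size=size,
                                   imageRef=image_id)
    # "<volume-id>:::1" tells nova to boot from this volume and to
    # delete it when the instance is terminated.
    return {'vda': str(volume.id) + ':::1'}
```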

@berghaus berghaus requested a review from rseuster August 2, 2019 09:14
@berghaus
Author

berghaus commented Aug 2, 2019

Where did the conflict come from... hmmm.

@berghaus
Author

berghaus commented Aug 2, 2019

Must have had a fork of the repo from before, and was therefore missing a few of the most recent changes.

Contributor

@rseuster rseuster left a comment


Nice! I like it. Just the indentation in one place needs fixing.

cloudscheduler/openstackcluster.py
else:
bdm = None
log.debug("creating boot volume")
try:
Contributor

Memo for a similar implementation in v2:
during my testing I got the impression that nested try/except won't work; this should be checked and reviewed before we move this into v2.

Author

Yeah, I had trouble with nested try/except blocks in the past. Let me try to find the type of exceptions cinder throws... maybe we can be smarter here.
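One flat alternative to nesting is to track which phase failed and handle everything in a single handler. The sketch below uses a hypothetical `VolumeError` where the real code would catch the cinder client's exception classes; the callables stand in for the nova/cinder calls.

```python
# Flat error handling instead of nested try/except (sketch with
# hypothetical names; cinderclient's exception classes would take
# the place of VolumeError in the real code).

class VolumeError(Exception):
    pass

def boot_with_volume(create_volume, create_server):
    phase = "volume"
    try:
        vol = create_volume()
        phase = "server"
        return create_server(vol)
    except Exception as e:
        # One handler for both phases; the phase marker tells us
        # which step failed without nesting try blocks.
        raise RuntimeError("failed during {} creation: {}".format(phase, e))
```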

bdm = {'vda': str(cv.id) + ':::1'}
except Exception as e:
log.error("failed to create boot volume: {}".format(e))
raise e
Contributor

For v2 we should also think about possible failures: during testing, volumes were created fine but creation of VMs failed, e.g. due to wrong networks. This left the volume lying around, and it needed to be removed by hand. There are several ways to handle it, IMHO, but at this point in time nothing is needed for v1.
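One way to avoid the orphaned volume is to delete it in the same handler that catches the failed server creation. A minimal sketch, with the callables standing in for the nova/cinder calls (names hypothetical):

```python
# Sketch: best-effort volume cleanup when server creation fails,
# so no orphaned volume is left behind (hypothetical helper).

def create_vm_with_cleanup(create_volume, create_server, delete_volume):
    vol = create_volume()
    try:
        return create_server(vol)
    except Exception:
        # Server creation failed; try to remove the volume we just
        # made, then re-raise the original error.
        try:
            delete_volume(vol)
        except Exception:
            pass  # cleanup is best-effort
        raise
```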

Author

So I have added something to the exception handling in vm_create, but adding some cleanup to the VM deletion means we have to change the database model for the VMs. Not sure how significant an edit that is in the end.

@@ -382,6 +382,8 @@ def _cluster_from_config(cconfig, cluster):
keep_alive=keep_alive,
user_domain_name=get_or_none(cconfig, cluster, "user_domain_name"),
project_domain_name=get_or_none(cconfig, cluster, "project_domain_name")
Contributor

comma missing in line 384 ;)

Author

indeed, added

@@ -293,8 +334,8 @@ def vm_destroy(self, vm, return_resources=True, reason=""):
except novaclient.exceptions.NotFound as e:
log.error("VM %s not found on %s: removing from CS" % (vm.id, self.name))
except Exception as e:
try:
log.error("Unhandled exception while destroying VM on %s : %s" % (self.name,e))
try:
Contributor

These two lines seem to have accidentally picked up additional spaces.

Author

how did those get there, hmmm

Fix indentation to prevent infinite loop, remove nested exception,
delete volume if creation failed, and remove some unwanted spaces. The
volume deletion after instance deletion is not checked at the moment.

Remove nested exception

Nesting exceptions does not work. Properly catch the exception thrown
by the cinder client. Not sure if this works with the import of the
cinder client exceptions in the conditional.

Delete volume if instance creation failed

Add a comma on line 384

Remove accidental spaces
@rseuster
Contributor

rseuster commented Aug 6, 2019

The first VM from cloudscheduler v1 ran jobs; however, two more files required changes:

# diff -u cloudconfig.py.orig cloudconfig.py
--- cloudconfig.py.orig 2019-08-06 14:05:02.868418103 -0700
+++ cloudconfig.py      2019-08-06 14:04:37.156218534 -0700
@@ -89,7 +89,7 @@
     :param name: The name of cloud to operate on
     :return: True if conf good, False if problem detected
     """
-    valid_option_names = {'access_key_id', 'auth_dat_file', 'auth_url', 'blob_url', 'boot_timeout', 'cacert',
+    valid_option_names = {'access_key_id', 'auth_dat_file', 'auth_url', 'blob_url', 'boot_timeout', 'boot_volume', 'boot_volume_gb_per_core', 'cacert',
                           'cloud_type', 'contextualization', 'cpu_archs', 'cpu_cores', 'host',
                           'image_attach_device', 'key_name', 'keycert', 'max_vm_mem', 'max_vm_storage', 'memory',
                           'networks', 'password', 'placement_zone', 'port', 'priority', 'project_id', 'project_domain_name',

and

# diff -u openstackcluster.py.orig openstackcluster.py
--- openstackcluster.py.orig    2019-08-06 13:39:28.135490782 -0700
+++ openstackcluster.py 2019-08-06 13:49:10.896025215 -0700
@@ -70,6 +70,8 @@
         self.cacert = cacert
         self.user_domain_name = user_domain_name if user_domain_name is not None else "Default"
         self.project_domain_name = project_domain_name if project_domain_name is not None else "Default"
+        self.boot_volume = boot_volume
+        self.boot_volume_gb_per_core = boot_volume_gb_per_core
         self.session = None
         try:
             authsplit = self.auth_url.split('/')
@@ -116,9 +118,9 @@
         import novaclient.exceptions
         use_cloud_init = use_cloud_init or config.use_cloud_init
         nova = self._get_creds_nova_updated()
-        if boot_volume:
+        if self.boot_volume:
             cinder = self._get_creds_cinder()
-            from cinderclient import exceptions as ccexceptions
+        from cinderclient import exceptions as ccexceptions
         if len(securitygroup) != 0:
             sec_group = []
             for group in securitygroup:
@@ -249,7 +251,7 @@
         if name:
             log.info("Trying to create VM on %s: " % self.name)
             try:
-                if not boot_volume:
+                if not self.boot_volume:
                     instance = nova.servers.create(name=name,
                                                    image=imageobj,
                                                    flavor=flavor,
@@ -262,8 +264,8 @@
                     bdm = None
                     log.debug("creating boot volume")
                     bv_name = "vol-{}".format(name)
-                    if boot_volume_gb_per_core:
-                        bv_size = boot_volume_gb_per_core * cpu_cores
+                    if self.boot_volume_gb_per_core:
+                        bv_size = self.boot_volume_gb_per_core * cpu_cores
                     else:
                         bv_size = 20
                     cv = cinder.volumes.create(name=bv_name,

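The import move in the last hunk (pulling `from cinderclient import exceptions as ccexceptions` out of the `if self.boot_volume:` block) matters because a name bound only inside a conditional import is undefined on the other branch, so any later `except ccexceptions...` clause would itself raise a NameError. A minimal illustration, using `json` as a stand-in for the cinderclient module:

```python
# Illustrates the conditional-import hazard: the name is unbound
# whenever the condition is false (json is a stand-in module here).

def conditional_import(enabled):
    if enabled:
        import json as ccexceptions  # bound only on this branch
    try:
        ccexceptions  # any later reference to the module name
        return True
    except NameError:  # UnboundLocalError is a subclass of NameError
        return False
```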
All worked - the created volumes also disappeared after the jobs finished and the VM retired!

I also had to update the Python bindings to OpenStack - cinderclient wasn't installed previously on the machine where I tried this (verifycs.heprc).

@rseuster
Contributor

rseuster commented Aug 6, 2019

BTW - the running code is on verifycs in /usr/local/lib/python2.7/site-packages/cloudscheduler

Add boot volume options to list of valid options. Set boot volume
options as members of the OpenStack cluster class.
@berghaus
Author

So the last commit should have addressed the missing points. I'll try it at CERN.

Do the python thing and try deletion and handle expected failures
@berghaus
Author

So that last push is what is running on the CERN CS.

The CPU cores need to come from the class. The boot volume configuration
is delivered from the config as strings. Convert the strings to a bool
or int as the case may be.
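The string-to-bool/int conversion the commit describes could look roughly like this. These helper names are hypothetical, a sketch of the idea rather than the cloudscheduler code:

```python
# Config files deliver every option as a string; sketch of the
# conversions the commit message describes (hypothetical helpers).

def to_bool(value):
    """Interpret common truthy config strings as True."""
    return str(value).strip().lower() in ("true", "yes", "on", "1")

def to_int(value, default):
    """Convert a config string to int, falling back to a default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default
```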