Add support for OpenStack instances that require boot volumes #485
base: master
Conversation
Where did the conflict come from... hmmm.
The OpenStackNative cloud type now allows the specification of two parameters, boot_volume and boot_volume_gb_per_core, that instruct cloudscheduler to create a boot volume on instance creation. The size of the boot volume is controlled by the second option.
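For illustration only, the two options would be set per cloud in the cloud resources configuration, roughly along these lines; only the option names and the OpenStackNative cloud type come from this change, the section name, file format, and values are assumptions:

[my-openstack-cloud]
cloud_type: OpenStackNative
boot_volume: true
boot_volume_gb_per_core: 10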
Must have had a fork of the repo from before and was therefore missing a few of the most recent changes.
Nice! I like it. Just the indentation in one place needs fixing.
cloudscheduler/openstackcluster.py
Outdated
else:
bdm = None
log.debug("creating boot volume")
try:
Memo for a similar implementation in v2:
During my testing I got the impression that nested try/except won't work; this should be checked and reviewed before we move this into V2.
Yeah, I had trouble with nested try/except blocks in the past. Let me try to find the type of exceptions cinder throws... maybe we can be smarter here.
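A minimal sketch of that "smarter" variant, catching the cinder client's own exception type instead of nesting bare try/except blocks; the names cinder, cv, image_id, and volume_size_gb are placeholders, not the code in this PR:

from cinderclient import exceptions as cinder_exceptions

try:
    # ask cinder for a bootable volume built from the VM image
    cv = cinder.volumes.create(size=volume_size_gb, imageRef=image_id)
    bdm = {'vda': str(cv.id) + ':::1'}
except cinder_exceptions.ClientException as e:
    log.error("failed to create boot volume: {}".format(e))
    raise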
cloudscheduler/openstackcluster.py
Outdated
bdm = {'vda': str(cv.id) + ':::1'}
except Exception as e:
log.error("failed to create boot volume: {}".format(e))
raise e
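For reference, the mapping quoted above uses novaclient's legacy block_device_mapping string format, which as far as I recall is <id>:<source type>:<size>:<delete on termination>:

# Hedged reading of the "volume_id:::1" string built above: boot device vda is
# backed by volume cv.id, and the trailing 1 asks Nova to delete the volume
# together with the instance.
bdm = {'vda': "{}:::1".format(cv.id)}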
For V2 we should also think about possible failures: e.g. during testing, volumes were created fine but the creation of VMs failed, e.g. due to wrong networks. This left the volume lying around, and it needed to be removed by hand. There are several ways to handle it, IMHO, but at this point in time nothing is needed for V1.
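One possible shape of that cleanup, as a sketch only (nova, cinder, cv, bdm, and the other names are placeholders, not this PR's code): if the server creation fails after the volume was created, delete the volume before re-raising.

try:
    instance = nova.servers.create(name=hostname, image=image, flavor=flavor,
                                   nics=nics, block_device_mapping=bdm)
except Exception as e:
    log.error("VM creation failed on %s: %s" % (self.name, e))
    if cv is not None:
        # don't leave the freshly created boot volume lying around
        cinder.volumes.delete(cv)
    raise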
So I have added something to the exception handling in vm_create, but adding some cleanup to the VM deletion means we have to change the database model for the VMs. Not sure how significant an edit that is in the end.
cloudscheduler/cloud_management.py
Outdated
@@ -382,6 +382,8 @@ def _cluster_from_config(cconfig, cluster):
keep_alive=keep_alive,
user_domain_name=get_or_none(cconfig, cluster, "user_domain_name"),
project_domain_name=get_or_none(cconfig, cluster, "project_domain_name")
comma missing in line 384 ;)
indeed, added
cloudscheduler/openstackcluster.py
Outdated
@@ -293,8 +334,8 @@ def vm_destroy(self, vm, return_resources=True, reason=""):
except novaclient.exceptions.NotFound as e:
log.error("VM %s not found on %s: removing from CS" % (vm.id, self.name))
except Exception as e:
try:
log.error("Unhandled exception while destroying VM on %s : %s" % (self.name,e))
try:
These two lines accidentally got additional spaces, it seems.
how did those get there, hmmm
Fix indentation to prevent infinite loop, remove nested exception, delete volume if creation failed, and remove some unwanted spaces. The volume deletion after instance deletion is not checked at the moment.
Remove nested exception: nesting exceptions does not work. Handle properly catching the exception thrown by the cinder client. Not sure if this works with the import of the cinder client exceptions in the conditional.
Delete volume if instance creation failed.
Add a comma on line 384.
Remove accidental spaces.
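The "import of the cinder client exceptions in the conditional" presumably refers to a pattern along these lines; this is a sketch under that assumption, not the PR's actual code:

# Import the cinder bindings only if they are available, so clouds that do not
# use boot volumes keep working on machines without python-cinderclient.
try:
    from cinderclient import client as cinder_client
    from cinderclient import exceptions as cinder_exceptions
    CINDER_AVAILABLE = True
except ImportError:
    CINDER_AVAILABLE = False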
The first VM from cloudscheduler v1 ran jobs; however, two more files required changes:
and
All worked - the created volumes also disappeared after the jobs finished and the VM retired! I also had to update the python bindings to OpenStack - cinderclient wasn't installed previously on the machine where I tried (which was verifycs.heprc).
BTW - the running code is on verifycs in /usr/local/lib/python2.7/site-packages/cloudscheduler
Add boot volume options to list of valid options. Set boot volume options as members of the OpenStack cluster class.
So the last commit should have addressed the missing points. I'll try it at CERN.
Do the Python thing: try the deletion and handle expected failures.
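A minimal sketch of that EAFP pattern, assuming the cinderclient exceptions module and a placeholder volume_id; not the PR's actual code:

# "Do the Python thing": attempt the deletion and handle the expected failure,
# instead of first checking whether the volume still exists.
try:
    cinder.volumes.delete(volume_id)
except cinder_exceptions.NotFound:
    log.debug("boot volume %s already gone" % volume_id)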
So that last push is what is running on the CERN CS.
The CPU cores need to come from the class. The boot volume configuration is delivered from the config as strings. Convert the strings to a bool or int as the case may be.
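As a rough sketch of that conversion, reusing the get_or_none pattern visible in the diff above; exact names and defaults are assumptions:

# Config values arrive as strings (or None); turn them into the types we need.
boot_volume = get_or_none(cconfig, cluster, "boot_volume")
boot_volume = bool(boot_volume) and boot_volume.lower() in ("true", "yes", "1")

boot_volume_gb_per_core = get_or_none(cconfig, cluster, "boot_volume_gb_per_core")
boot_volume_gb_per_core = int(boot_volume_gb_per_core) if boot_volume_gb_per_core else 0

# The CPU core count comes from the cluster class, so the actual volume size
# ends up as boot_volume_gb_per_core multiplied by the instance's cores.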