
Add creation of RBD volumes and document it. #56

Merged: 5 commits, Oct 28, 2019

Conversation

@olifre (Contributor) commented Jul 14, 2018

Fixes #54.

I'm unsure what to make configurable - the cache option should likely be made configurable (and could go into ceph.conf), but discard (which works only with bus=scsi) seems like the best default to me.

Also, it seems deletion of volumes created this way fails with fog:
https://projects.theforeman.org/issues/12063

@alexjfisher

I'm interested in this feature, but this PR was created quite a while back now. Could it be revived?

@olifre (Contributor, Author) commented May 26, 2019

Could it be revived?

What do you mean by "revived"?
I'm still here and could rebase, but I have not done anything yet since I received no reply from anybody upstream 😢

@alexjfisher

@olifre Hi! Sorry, I'm just a (foreman) user hoping this feature would be accepted.

I guess I should have pinged @plribeiro3000? Thanks.

@alexjfisher

Or maybe @strzibny?

@@ -32,13 +32,36 @@
<% end -%>
<clock offset='utc'/>
<devices>
<% args = {}
File.readlines('/etc/foreman/ceph.conf').each do |line|
Member

What happens if the file does not exist?

Contributor (Author)

Thanks - that's handled in the commit I pushed just now.
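For illustration, such a guarded parse could be sketched as follows (the simple key=value format and the helper name read_ceph_conf are assumptions made for this sketch, not the PR's actual code):

```ruby
# Hypothetical sketch: read simple key=value lines into a hash,
# returning an empty hash when the file does not exist.
def read_ceph_conf(path = '/etc/foreman/ceph.conf')
  args = {}
  return args unless File.file?(path)
  File.readlines(path).each do |line|
    key, value = line.strip.split('=', 2)
    args[key] = value if key && value
  end
  args
end
```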

@plribeiro3000 (Member)

LGTM, except for some details where I don't have the proper knowledge.

Is there a scenario where /etc/foreman/ceph.conf exists but vol.pool_name.include? args["libvirt_ceph_pool"] evaluates to false? And what about vice versa? Is that something we should be worried about here?

@olifre (Contributor, Author) commented Jul 17, 2019

Is there a scenario where /etc/foreman/ceph.conf exists but vol.pool_name.include? args["libvirt_ceph_pool"] evaluates to false? And what about vice versa? Is that something we should be worried about here?

I think both these cases can only happen on user misconfiguration (e.g. filling /etc/foreman/ceph.conf with a broken pool name). So I do not think we need to worry about it here.

@plribeiro3000 (Member)

@olifre In case the user misconfigures, which error are they going to face?

My worry is that the error might not point to the right spot. If a simple condition can give better output in case of an error, I believe it's worth it.

Can we add something to check this? Or even use the same condition in both lines?

@olifre (Contributor, Author) commented Jul 18, 2019

@plribeiro3000 Rethinking about the two possibilities:

  1. The first possibility would be that the user creates a volume in a pool which does not have a configuration in /etc/foreman/ceph.conf. At this point we have no way to distinguish whether that is meant to be a Ceph pool or, for example, an LVM-based pool (we use the user's configuration, i.e. the file, to decide). So it's good that vol.pool_name.include? args["libvirt_ceph_pool"] evaluates to false even if /etc/foreman/ceph.conf is present, since this may be what the user wants - the user might not be using Ceph exclusively. In other words: it would be bad to generate an error here, since that would mean that once the file exists, the user could only use Ceph RBD and any use of other pools would be disallowed.
  2. I'm not really sure when the vice-versa case could happen. The expression can never evaluate to true if args is not filled from the file - right?

@plribeiro3000 (Member)

@olifre I guess my concern is that we should evaluate the same condition in both scenarios, as it seems the 2 code blocks are co-dependent.

Maybe we can have an extra conditional in one of the blocks that evaluates whether one of those 2 scenarios is happening and logs a message specifying it. WDYT?

@olifre (Contributor, Author) commented Jul 18, 2019

@plribeiro3000 What exactly do you mean by "co-dependent"?

Maybe we can have an extra conditional in one of the blocks that evaluates whether one of those 2 scenarios is happening and logs a message specifying it. WDYT?

I don't think it's a good idea to log something in the case of wanted behaviour, since this may confuse the user. Which kind of log message would you propose?
And how exactly would the second scenario be triggered?

@strzibny (Member)

Shouldn't there be a nicer opt-in than just reading the /etc/foreman/ceph.conf file? I mean, other features have an attribute in Compute::Server to determine the feature. Perhaps if there is an attribute with the Ceph configuration, then we know the user is interested in this.

@olifre (Contributor, Author) commented Jul 18, 2019

It seemed the most straightforward solution to me as a Ceph user, since the format of the Ceph configuration file is similar and needs to be maintained by any Ceph user anyway (at least for the list of mons) - and the libvirt secret also has to be maintained outside of libfog.

@plribeiro3000 (Member)

@olifre My understanding is that the 2 blocks need to work together for this feature to succeed.
But each of them has a different check, which may lead to confusion if the user is not an expert with Ceph.

The best approach here would be to either have a configuration option like @strzibny suggested or a way to inform the user of the misconfiguration.

IMHO this code is not good, because most users do not have proper knowledge of the tools, and this kind of implementation does not help besides adding more complexity.

@plribeiro3000 (Member) commented Jul 18, 2019

@plribeiro3000 What exactly do you mean by "co-dependent"?

I mean that the 2 blocks need to be executed for this feature to work; it can't be just one or the other, but rather both.

@olifre (Contributor, Author) commented Jul 18, 2019

My understanding is that the 2 blocks need to work together for this feature to succeed.

Yes and no. The first block parses the config file. The second block is executed if and only if the user creates a volume inside a Ceph RBD pool - and must not be executed if the volume is created in a different pool. So both need different conditionals, and it's impossible to detect a misconfiguration algorithmically, since the user may have both a Ceph RBD pool and a non-Ceph pool, and may use both. So even if the configuration is there, there is a valid use case for not executing the second part for some volumes, in case the VM which is created has its volumes (or only some volumes) in a different pool.

But each of them has a different check, which may lead to confusion if the user is not an expert with Ceph.

There is no Ceph knowledge needed here apart from the documentation part which I also added in this PR.
To rephrase the conditional:

  • The config file is always read and is - in a way - global configuration.
  • The second part (volume creation) depends on which pool the volume is created in. If it's a Ceph RBD pool, the newly added code should be executed; if it's in a different pool, it should not be executed, and that's not an error.

The best approach here would be to either have a configuration option like @strzibny suggested or a way to inform the user of the misconfiguration.

I don't understand how to detect the actual misconfiguration case - can you point it out to me?

I also do not understand how a configuration option would help - having the configuration file is something globally needed, while the actual template generation depends on which pool the volume is created in, which may differ for each single volume if the user creates the volumes in different pools. Where would this global configuration option be, how should a tool like Foreman interface with it (there's no global libfog configuration in there as far as I can see), and how should the user maintain it? Adding a different kind of configuration in addition to the Ceph configuration any Ceph user has to maintain is an extra complication in my opinion. @alexjfisher , how would you expect configuration to be?

@plribeiro3000 (Member)

@olifre As I mentioned before, I do not know anything about Ceph, so I can't discuss the actual process of it; I'm trying to make sense out of it.

I do understand what you are saying about this being a user misconfiguration, but as I see it there is still some improvement we can do here to help out newcomers.

The way I see it, the conditional should be equal in both statements, or at least written in a way that the system will tell the user if something unexpected happens (like configuring a Ceph file for a pool where no volume is Ceph).

But I won't block this, since I'm not an active contributor. If @strzibny thinks it's good, I'm OK with it.

@olifre (Contributor, Author) commented Jul 18, 2019

The way I see it, the conditional should be equal in both statements, or at least written in a way that the system will tell the user if something unexpected happens (like configuring a Ceph file for a pool where no volume is Ceph).

I think you did not get the point I tried to make (or I did not get your point): if the condition were the same, you would force the user to only ever create Ceph RBD volumes, breaking the existing use case of creating non-RBD libvirt volumes in a different pool. As the code is now:

  • The config file is always read (if it exists).
  • Depending on the libvirt pool the volume is created in, it will either be configured as a Ceph RBD volume (using the new code branch) or as a "classic" volume (existing code).

The user actively has to create a config file containing libvirt_ceph_pool=name_of_the_libvirt_pool_holding_ceph_rbd_volumes as I documented, but even if the user does that, volumes are still allowed to be created in other pools (which is an existing use case). So I don't see a chance for misconfiguration.
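For reference, a hypothetical /etc/foreman/ceph.conf along these lines might look as follows (the key names mirror the args lookups in the template; every value here is made up for illustration):

```ini
libvirt_ceph_pool=rbd_pool
monitor=mon1.example.com,mon2.example.com,mon3.example.com
port=6789
auth_username=libvirt
auth_uuid=457eb676-33da-42ec-9a8c-9293d545c337
```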

@plribeiro3000 (Member)

Agree that we are not on the same page. Communication is hard. 😅

I understand that there are other possibilities here, but I would like to make it a little bit better for a scenario where the user wants to use Ceph and does create a file but no volume is Ceph, or where they select volumes that are Ceph but do not create the configuration file.

If vol.pool_name.include? args["libvirt_ceph_pool"] evaluates to false, you are executing the old code, so it is not true that with the same conditional the user won't be able to create other non-RBD libvirt volumes in a different pool.

I would change it to check both states at least.

<% if File.file?('/etc/foreman/ceph.conf') && vol.pool_name.include?(args["libvirt_ceph_pool"]) %>
  <disk type='network' device='disk'>
    <driver name='qemu' type='<%= vol.format_type %>' cache='writeback' discard='unmap'/>
    <source protocol='rbd' name='<%= vol.path %>'>
      <% args["monitor"].split(",").each do |mon| %>
        <host name='<%= mon %>' port='<%= args["port"] %>'/>
      <% end %>
    </source>
    <auth username='<%= args["auth_username"] %>'>
      <secret type='ceph' uuid='<%= args["auth_uuid"] %>'/>
    </auth>
    <target dev='sd<%= ('a'..'z').to_a[volumes.index(vol)] %>' bus='scsi'/>
  </disk>
<% else %>
  <disk type='file' device='disk'>
    <driver name='qemu' type='<%= vol.format_type %>'/>
    <source file='<%= vol.path %>'/>
    <%# we need to ensure a unique target dev -%>
    <target dev='vd<%= ('a'..'z').to_a[volumes.index(vol)] %>' bus='virtio'/>
  </disk>
<% end %>

This condition makes a clearer statement of the condition necessary to execute the block of code:

If i have a configuration file AND this volume is in a CEPH pool
do this
otherwise
do the usual

@olifre (Contributor, Author) commented Jul 19, 2019

Agree that we are not on the same page. Communication is hard. 😅

Indeed - only now do I get what you meant 😉.
Yes, that is indeed more readable - the behaviour will be the same, since args will in any case not be filled if the file does not exist, but the code shows the intended behaviour better (the performance loss of an additional stat syscall is acceptable, I think). Will change, thanks!

Make the "config file exists" condition part
of both checks.
@plribeiro3000 (Member)

I'm glad we figured it out.

Thank you for bearing with me as we got to understand each other. 🎉

@plribeiro3000 (Member)

GTG for me!

@olifre (Contributor, Author) commented Jul 19, 2019

@plribeiro3000 Thanks, also for bearing with me until we cleared this communication hurdle 😄.

@Bluewind (Contributor) commented Sep 9, 2019

We'd love to use this as well. What's blocking this from getting merged?

<auth username='<%= args["auth_username"] %>'>
<secret type='ceph' uuid='<%= args["auth_uuid"] %>'/>
</auth>
<target dev='sd<%= ('a'..'z').to_a[volumes.index(vol)] %>' bus='scsi'/>
Contributor

Why does this use scsi instead of virtio? virtio appears to work fine for me and the regular disk block below also uses it.

Contributor (Author)

virtio does not support discard / fstrim yet (unless you use very recent kernels). scsi, however, is mature, similar in performance, has device names matching what is encountered on a bare-metal host, and supports discard on LTS distributions, which can yield order-of-magnitude space savings when used with Ceph RBD.

<% end %>
</source>
<auth username='<%= args["auth_username"] %>'>
<secret type='ceph' uuid='<%= args["auth_uuid"] %>'/>
@Bluewind (Contributor) commented Sep 9, 2019

You could also fetch the secret by its name/usage with <secret type="ceph" usage="foo-secret-name-foo"/>. This makes the config much easier to read and to deploy since you do not have to deal with the secret UUID.

Contributor (Author)

This is a good idea - will give it a go (not sure if I manage this week, though). Thanks!

@olifre (Contributor, Author) commented Sep 14, 2019

@Bluewind I think the new syntax is not yet understood by libvirt as shipped with CentOS 7, which is still in wide use nowadays (that is probably also the reason why the Ceph docs still use the UUID notation at https://docs.ceph.com/docs/nautilus/rbd/libvirt/).
At least I tried:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth type='ceph' username='libvirt'>
        <secret usage='client.libvirt secret'/>
      </auth>
      <source protocol='rbd' name='rbd/test-vm.XXXX-disk1'>
        ....
      </source>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

just now on CentOS 7, and it tells me:

Error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/domain.rng
Extra element devices in interleave
Element domain failed to validate content

So I think for portability /compatibility reasons, we have to stay with UUID for now.

Contributor

I see. Could you maybe add settings for both options and use the name only when it is set to a non-empty string? In our setup (Ubuntu 18.04) the name already works just fine. Also, if you add the option now, people can switch whenever their environment supports it and they don't have to wait for another patch.

Contributor (Author)

Good idea! Since I cannot test this myself, can you confirm the syntax I posted is correct?
I will then implement this hopefully next month (I am travelling to conferences a lot currently).

Contributor (Author)

@Bluewind: To spell it out, you mean:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='libvirt'>
        <secret type='ceph' usage='client.libvirt secret'/>
      </auth>
      <source protocol='rbd' name='rbd/test-vm.XXXX-disk1'>
        ....
      </source>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

works fine for you? This gives the same error for me on CentOS 7 as I stated above, so if you tell me this works, I'll take it for granted and implement it ;-).

Contributor

Weird. Yes, that configuration looks like the one I tested.

I'm currently on vacation so I can't test this, but I think you can just set the type attribute with both elements and then it will probably work in either case. I can test this in around 1-2 weeks if you want.

Contributor (Author)

@Bluewind I think setting type on both must fail due to XML schema validation. Since the version in CentOS 7 does not seem to support usage at all, I will then implement as follows:

  • If the user has specified a UUID, use that (compatible also with older versions).
  • If not, and a usage string has been specified, use that (works only with more recent distros).

In both cases, leaving the type attribute as part of the secret tag should work just fine.
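To spell that fallback out, a hypothetical ERB sketch (key names like args["auth_usage"] are assumptions for this sketch; the pushed commit may name them differently):

```ruby
require 'erb'

# Hypothetical sketch of the secret fallback: prefer an explicit UUID
# (compatible with older libvirt), otherwise reference the secret by
# its usage name (works only on more recent libvirt versions).
AUTH_TEMPLATE = ERB.new(<<~XML, trim_mode: '-')
  <auth username='<%= args["auth_username"] %>'>
  <% if args["auth_uuid"] -%>
    <secret type='ceph' uuid='<%= args["auth_uuid"] %>'/>
  <% else -%>
    <secret type='ceph' usage='<%= args["auth_usage"] %>'/>
  <% end -%>
  </auth>
XML

def auth_xml(args)
  AUTH_TEMPLATE.result(binding)
end
```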

Contributor (Author)

@Bluewind This is implemented in the commit I pushed just now, including documentation. I tested that the XMLs are generated fine, as expected.

Contributor

After a quick look, I'd say that this looks good. Thanks!

Only supported on more recent libvirt versions,
allows to specify the RBD secret by name instead of by UUID.
@olifre (Contributor, Author) commented Oct 5, 2019

@strzibny I believe the commit I added just now addresses all of the suggestions by @Bluewind,
and I can confirm it creates the XML files correctly (while I can't test the usage attribute myself in my environment, it matches the documentation and the tests by @Bluewind).

@strzibny (Member) commented Oct 6, 2019

I am unsure about coupling the RBD feature with Foreman by hardcoding the /etc/foreman/ceph.conf path.

The official documentation talks about the following locations (https://docs.ceph.com/docs/jewel/rados/configuration/ceph-conf/):

The default Ceph configuration file locations in sequential order include:

    $CEPH_CONF (i.e., the path following the $CEPH_CONF environment variable)
    -c path/path (i.e., the -c command line argument)
    /etc/ceph/ceph.conf
    ~/.ceph/config
    ./ceph.conf (i.e., in the current working directory)

I am not a Ceph guy, mind you. I am just suspicious about this path being the only option.

@olifre (Contributor, Author) commented Oct 6, 2019

@strzibny The Ceph configuration file has a different syntax (and actually, in more recent Ceph versions, it is going away). The config file used here only contains the minimum information needed to access Ceph as a client, plus additional information about the necessary libvirt secret to use.
So this is a different configuration file than the ceph.conf from the Ceph documentation and is, in fact, specific to this use case.

Also, in common setups, Foreman will run on a node without direct access to Ceph (in our case, there's nothing from Ceph installed on the node) - but the configuration is still needed to pass it on to remote libvirt nodes upon creation of new VMs.

@strzibny (Member) commented Oct 7, 2019

In that case I think it's okay. If someone requires RBD in different setups, we can revisit it.

@strzibny (Member) commented Oct 7, 2019

@olifre can I ask you about https://projects.theforeman.org/issues/12063 which you raised as a concern in the beginning?

@olifre (Contributor, Author) commented Oct 7, 2019

@strzibny This issue still persists. I am unsure about the correct fix - in principle, one could change:

    def domain_volumes xml
      xml_elements(xml, "domain/devices/disk/source", "file")
    end

to fall back to the name attribute in case the file attribute does not exist, but I did not yet have the time to test this out. However, this affects any existing VMs not based on files (not only RBD volumes).

@olifre (Contributor, Author) commented Oct 7, 2019

@strzibny I have finally investigated this in more depth; using the following code instead:

    def domain_volumes xml
      vols_by_file = xml_elements(xml, "domain/devices/disk/source", "file")
      vols_by_name = xml_elements(xml, "domain/devices/disk/source", "name")
      vols = []
      vols_by_file.zip(vols_by_name).each do |by_file, by_name|
        vols.push(by_file.nil? ? by_name : by_file)
      end
      vols
    end

fixes it for any volume which is not file-based but has a unique name (which, I think, covers everything) and should not break existing cases.
Would you like to see this as part of this PR, or separately?
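For illustration, the zip-and-prefer step could also be expressed more compactly (a sketch over plain arrays, since xml_elements needs a full domain XML; the helper name prefer_file is made up here):

```ruby
# Sketch: merge the two parallel lists from the XML lookup,
# preferring the file path and falling back to the network name.
def prefer_file(vols_by_file, vols_by_name)
  vols_by_file.zip(vols_by_name).map { |by_file, by_name| by_file || by_name }
end
```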

@olifre (Contributor, Author) commented Oct 14, 2019

@strzibny Let me know which approach you prefer, and I'll take it 😉.

@strzibny (Member)

@olifre Cool. Can you add it here as a new commit, perhaps?

For network-based disks, the unique key is the name
and there is no underlying file. Accept both from the XML,
and prefer the file name if present.
@olifre (Contributor, Author) commented Oct 15, 2019

@strzibny Done 😄.
Btw, if you have a suggestion on how to make this Ruby snippet more efficient, let me know - I'm not a Ruby expert, but it works well in all my tests.

@olifre (Contributor, Author) commented Oct 21, 2019

@strzibny Just a friendly ping: Do you see anything still missing here?

@strzibny (Member)

I am going to merge this and do a release soon, thanks for your contributions everyone.

Successfully merging this pull request may close these issues.

Ceph RBD volume creation