
Add creation of RBD volumes and document it. #56

Merged: 5 commits, Oct 28, 2019

Conversation

@olifre (Contributor) commented Jul 14, 2018

Fixes #54.

I'm unsure what to make configurable - the cache option should likely be made configurable (and could go into ceph.conf), but discard (which works only with bus=scsi) seems like the best default to me.

Also, it seems deletion of volumes created this way fails with fog:
https://projects.theforeman.org/issues/12063

@alexjfisher

I'm interested in this feature, but this PR was created quite a while back now. Could it be revived?

@olifre (Contributor, Author) commented May 26, 2019

Could it be revived?

What do you mean by "revived"?
I'm still here and could rebase, but I have not done anything yet since I received no reply from anybody upstream 😢

@alexjfisher

@olifre Hi! Sorry, I'm just a (foreman) user hoping this feature would be accepted.

I guess I should have pinged @plribeiro3000? Thanks.

@alexjfisher

Or maybe @strzibny?

@@ -32,13 +32,36 @@
<% end -%>
<clock offset='utc'/>
<devices>
<% args = {}
File.readlines('/etc/foreman/ceph.conf').each do |line|
Member

What happens if the file does not exist?

Contributor (Author)

Thanks - that's handled in the commit I pushed just now.
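For illustration, such a guarded parse could be sketched as follows (the simple key=value format and the helper name read_ceph_conf are assumptions made for this sketch, not the PR's actual code):

```ruby
# Hypothetical sketch: read simple key=value lines into a hash,
# returning an empty hash when the file does not exist.
def read_ceph_conf(path = '/etc/foreman/ceph.conf')
  args = {}
  return args unless File.file?(path)
  File.readlines(path).each do |line|
    key, value = line.strip.split('=', 2)
    args[key] = value if key && value
  end
  args
end
```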

@plribeiro3000 (Member)

LGTM, except for some details where I don't have the proper knowledge.

Is there a scenario where /etc/foreman/ceph.conf exists but vol.pool_name.include? args["libvirt_ceph_pool"] evaluates to false? And what about vice versa? Is that something we should be worried about here?

@olifre (Contributor, Author) commented Jul 17, 2019

Is there a scenario where /etc/foreman/ceph.conf exists but vol.pool_name.include? args["libvirt_ceph_pool"] evaluates to false? And what about vice versa? Is that something we should be worried about here?

I think both these cases can only happen on user misconfiguration (e.g. filling /etc/foreman/ceph.conf with a broken pool name). So I do not think we need to worry about it here.

@plribeiro3000 (Member)

@olifre In case the user misconfigures, which error are they going to face?

My worry is that the error might not point to the right spot. If a simple condition can give better output in case of an error, I believe it's worth it.

Can we add something to check this? Or even use the same condition in both lines?

@olifre (Contributor, Author) commented Jul 18, 2019

@plribeiro3000 Rethinking about the two possibilities:

  1. The first possibility would be that the user creates a volume in a pool which does not have a configuration in /etc/foreman/ceph.conf. At this point we have no way to distinguish whether that is meant to be a Ceph pool or, for example, an LVM-based pool (we use the user's configuration, i.e. the file, to decide). So it's good that vol.pool_name.include? args["libvirt_ceph_pool"] evaluates to false even if /etc/foreman/ceph.conf is present, since this may be what the user wants - the user might not be using Ceph exclusively. In other words: it would be bad to generate an error here, since that would mean that once the file exists, the user could only use Ceph RBD and any use of other pools would be disallowed.
  2. I'm not really sure when the vice-versa case could happen. The expression can never evaluate to true if args is not filled from the file - right?

@plribeiro3000 (Member)

@olifre I guess my concern is that we should evaluate the same condition in both scenarios, as it seems the 2 code blocks are co-dependent.

Maybe we can have an extra conditional in one of the blocks that evaluates whether one of those 2 scenarios is happening and logs a message specifying it. WDYT?

@olifre (Contributor, Author) commented Jul 18, 2019

@plribeiro3000 What exactly do you mean by "co-dependent"?

Maybe we can have an extra conditional in one of the blocks that evaluates whether one of those 2 scenarios is happening and logs a message specifying it. WDYT?

I don't think it's a good idea to log something in the case of wanted behaviour, since this may confuse the user. Which kind of log message would you propose?
And how exactly would the second scenario be triggered?

@strzibny (Member)

Shouldn't there be a nicer opt-in than just reading the /etc/foreman/ceph.conf file? I mean, other features have an attribute in Compute::Server to determine the feature. Perhaps if there is an attribute with the Ceph configuration, then we know the user is interested in this.

@olifre (Contributor, Author) commented Jul 18, 2019

It seemed the most straightforward solution to me as a Ceph user, since the format of the Ceph configuration file is similar and needs to be maintained by any Ceph user anyway (at least for the list of mons) - and the libvirt secret also has to be maintained outside of libfog.

@plribeiro3000 (Member)

@olifre My understanding is that the 2 blocks need to work together for this feature to succeed.
But each of them has a different check, which may lead to confusion if the user is not an expert with Ceph.

The best approach here would be to either have a configuration option like @strzibny suggested or a way to inform the user of the misconfiguration.

IMHO this code is not good, because most users do not have proper knowledge of the tools, and this kind of implementation does not help besides adding more complexity.

@plribeiro3000 (Member) commented Jul 18, 2019

@plribeiro3000 What exactly do you mean by "co-dependent"?

I mean that the 2 blocks need to be executed for this feature to work; it can't be just one or the other, but rather both.

@olifre (Contributor, Author) commented Jul 18, 2019

My understanding is that the 2 blocks need to work together for this feature to succeed.

Yes and no. The first block parses the config file. The second block is executed if and only if the user creates a volume inside a Ceph RBD pool - and must not be executed if the volume is created in a different pool. So both need different conditionals, and it's impossible to detect a misconfiguration algorithmically, since the user may have both a Ceph RBD pool and a non-Ceph pool, and may use both. So even if the configuration is there, there is a valid use case for not executing the second part for some volumes, in case the VM which is created has its volumes (or only some volumes) in a different pool.

But each of them has a different check, which may lead to confusion if the user is not an expert with Ceph.

There is no Ceph knowledge needed here apart from the documentation part which I also added in this PR.
To rephrase the conditional:

  • The config file is always read and is - in a way - global configuration.
  • The second part (volume creation) depends on which pool the volume is created in. If it's a Ceph RBD pool, the newly added code should be executed; if it's in a different pool, it should not be executed, and that's not an error.

The best approach here would be to either have a configuration option like @strzibny suggested or a way to inform the user of the misconfiguration.

I don't understand how to detect the actual misconfiguration case - can you point it out to me?

I also do not understand how a configuration option would help - having the configuration file is something globally needed, while the actual template generation depends on which pool the volume is created in, which may differ for each single volume if the user creates the volumes in different pools. Where would this global configuration option be, how should a tool like Foreman interface with it (there's no global libfog configuration in there as far as I can see), and how should the user maintain it? Adding a different kind of configuration in addition to the Ceph configuration any Ceph user has to maintain is an extra complication in my opinion. @alexjfisher , how would you expect configuration to be?

@plribeiro3000 (Member)

@olifre As I mentioned before, I do not know anything about Ceph, so I can't discuss the actual process of it; I'm trying to make sense out of it.

I do understand what you are saying about this being a user misconfiguration, but as I see it there is still some improvement we can do here to help out newcomers.

The way I see it, the conditional should be equal in both statements, or at least written in a way that the system will tell the user if something unexpected happens (like configuring a Ceph file for a pool where no volume is Ceph).

But I won't block this, since I'm not an active contributor. If @strzibny thinks it's good, I'm OK with it.

@olifre (Contributor, Author) commented Jul 18, 2019

The way I see it, the conditional should be equal in both statements, or at least written in a way that the system will tell the user if something unexpected happens (like configuring a Ceph file for a pool where no volume is Ceph).

I think you did not get the point I tried to make (or I did not get your point): if the condition were the same, you would force the user to only ever create Ceph RBD volumes, breaking the existing use case of creating non-RBD libvirt volumes in a different pool. As the code is now:

  • The config file is always read (if it exists).
  • Depending on the libvirt pool the volume is created in, it will either be configured as a Ceph RBD volume (using the new code branch) or as a "classic" volume (existing code).

The user actively has to create a config file containing libvirt_ceph_pool=name_of_the_libvirt_pool_holding_ceph_rbd_volumes as I documented, but even if the user does that, volumes are still allowed to be created in other pools (which is an existing use case). So I don't see a chance for misconfiguration.
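For reference, a hypothetical /etc/foreman/ceph.conf along these lines might look as follows (the key names mirror the args lookups in the template; every value here is made up for illustration):

```ini
libvirt_ceph_pool=rbd_pool
monitor=mon1.example.com,mon2.example.com,mon3.example.com
port=6789
auth_username=libvirt
auth_uuid=457eb676-33da-42ec-9a8c-9293d545c337
```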

@plribeiro3000 (Member)

Agree that we are not on the same page. Communication is hard. 😅

I understand that there are other possibilities here, but I would like to make it a little bit better for a scenario where the user wants to use Ceph and does create a file but no volume is Ceph, or where they select volumes that are Ceph but do not create the configuration file.

If vol.pool_name.include? args["libvirt_ceph_pool"] evaluates to false, you are executing the old code, so it is not true that with the same conditional the user won't be able to create other non-RBD libvirt volumes in a different pool.

I would change it to check both states at least.

<% if File.file?('/etc/foreman/ceph.conf') && vol.pool_name.include?(args["libvirt_ceph_pool"]) %>
  <disk type='network' device='disk'>
    <driver name='qemu' type='<%= vol.format_type %>' cache='writeback' discard='unmap'/>
    <source protocol='rbd' name='<%= vol.path %>'>
      <% args["monitor"].split(",").each do |mon| %>
        <host name='<%= mon %>' port='<%= args["port"] %>'/>
      <% end %>
    </source>
    <auth username='<%= args["auth_username"] %>'>
      <secret type='ceph' uuid='<%= args["auth_uuid"] %>'/>
    </auth>
    <target dev='sd<%= ('a'..'z').to_a[volumes.index(vol)] %>' bus='scsi'/>
  </disk>
<% else %>
  <disk type='file' device='disk'>
    <driver name='qemu' type='<%= vol.format_type %>'/>
    <source file='<%= vol.path %>'/>
    <%# we need to ensure a unique target dev -%>
    <target dev='vd<%= ('a'..'z').to_a[volumes.index(vol)] %>' bus='virtio'/>
  </disk>
<% end %>

This condition makes a clearer statement of the condition necessary to execute the block of code:

If i have a configuration file AND this volume is in a CEPH pool
do this
otherwise
do the usual

@olifre (Contributor, Author) commented Jul 19, 2019

Agree that we are not on the same page. Communication is hard. 😅

Indeed - only now do I get what you meant 😉.
Yes, that is indeed more readable - the behaviour will be the same, since args will in any case not be filled if the file does not exist, but the code shows the intended behaviour better (the performance loss of an additional stat syscall is acceptable, I think). Will change, thanks!

Make the "config file exists" condition part
of both checks.
@plribeiro3000 (Member)

I'm glad we figured it out.

Thank you for bearing with me as we got to understand each other. 🎉

@plribeiro3000 (Member)

GTG for me!

@olifre (Contributor, Author) commented Jul 19, 2019

@plribeiro3000 Thanks, also for bearing with me until we cleared this communication hurdle 😄.

@Bluewind (Contributor) commented Sep 9, 2019

We'd love to use this as well. What's blocking this from getting merged?

<auth username='<%= args["auth_username"] %>'>
<secret type='ceph' uuid='<%= args["auth_uuid"] %>'/>
</auth>
<target dev='sd<%= ('a'..'z').to_a[volumes.index(vol)] %>' bus='scsi'/>
Contributor

Why does this use scsi instead of virtio? virtio appears to work fine for me and the regular disk block below also uses it.

Contributor (Author)

virtio does not support discard / fstrim yet (unless you use very recent kernels). scsi, however, is mature, similar in performance, has device names matching what is encountered on a bare-metal host, and supports discard on LTS distributions, which can yield order-of-magnitude space savings when used with Ceph RBD.

<% end %>
</source>
<auth username='<%= args["auth_username"] %>'>
<secret type='ceph' uuid='<%= args["auth_uuid"] %>'/>
@Bluewind (Contributor) commented Sep 9, 2019

You could also fetch the secret by its name/usage with <secret type="ceph" usage="foo-secret-name-foo"/>. This makes the config much easier to read and to deploy since you do not have to deal with the secret UUID.

Contributor (Author)

This is a good idea - will give it a go (not sure if I manage this week, though). Thanks!

@olifre (Contributor, Author) commented Sep 14, 2019

@Bluewind I think the new syntax is not yet understood by libvirt as shipped with CentOS 7, which is still in wide use nowadays (that is probably also the reason why the Ceph docs still use the UUID notation at https://docs.ceph.com/docs/nautilus/rbd/libvirt/).
At least I tried:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth type='ceph' username='libvirt'>
        <secret usage='client.libvirt secret'/>
      </auth>
      <source protocol='rbd' name='rbd/test-vm.XXXX-disk1'>
        ....
      </source>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

just now on CentOS 7, and it tells me:

Error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/domain.rng
Extra element devices in interleave
Element domain failed to validate content

So I think for portability /compatibility reasons, we have to stay with UUID for now.

Contributor

I see. Could you maybe add settings for both options and use the name only when it is set to a non-empty string? In our setup (Ubuntu 18.04) the name already works just fine. Also, if you add the option now, people can switch whenever their environment supports it and they don't have to wait for another patch.

Contributor (Author)

Good idea! Since I cannot test this myself, can you confirm the syntax I posted is correct?
I will then implement this hopefully next month (I am travelling to conferences a lot currently).

Contributor (Author)

@Bluewind: To spell it out, you mean:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
      <auth username='libvirt'>
        <secret type='ceph' usage='client.libvirt secret'/>
      </auth>
      <source protocol='rbd' name='rbd/test-vm.XXXX-disk1'>
        ....
      </source>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

works fine for you? This gives the same error for me on CentOS 7 as I stated above, so if you tell me this works, I'll take it for granted and implement it ;-).

Contributor

Weird. Yes, that configuration looks like the one I tested.

I'm currently on vacation so I can't test this, but I think you can just set the type attribute with both elements and then it will probably work in either case. I can test this in around 1-2 weeks if you want.

Contributor (Author)

@Bluewind I think setting type on both must fail due to XML schema validation. Since the version in CentOS 7 does not seem to support usage at all, I will then implement as follows:

  • If the user has specified a UUID, use that (compatible also with older versions).
  • If not, and a usage string has been specified, use that (works only with more recent distros).

In both cases, leaving the type attribute as part of the secret tag should work just fine.
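To spell that fallback out, a hypothetical ERB sketch (key names like args["auth_usage"] are assumptions for this sketch; the pushed commit may name them differently):

```ruby
require 'erb'

# Hypothetical sketch of the secret fallback: prefer an explicit UUID
# (compatible with older libvirt), otherwise reference the secret by
# its usage name (works only on more recent libvirt versions).
AUTH_TEMPLATE = ERB.new(<<~XML, trim_mode: '-')
  <auth username='<%= args["auth_username"] %>'>
  <% if args["auth_uuid"] -%>
    <secret type='ceph' uuid='<%= args["auth_uuid"] %>'/>
  <% else -%>
    <secret type='ceph' usage='<%= args["auth_usage"] %>'/>
  <% end -%>
  </auth>
XML

def auth_xml(args)
  AUTH_TEMPLATE.result(binding)
end
```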

Contributor (Author)

@Bluewind This is implemented in the commit I pushed just now, including documentation. I tested that the XMLs are generated fine, as expected.

Contributor

After a quick look, I'd say that this looks good. Thanks!

Only supported on more recent libvirt versions,
allows to specify the RBD secret by name instead of by UUID.
@olifre (Contributor, Author) commented Oct 5, 2019

@strzibny I believe the commit I added just now addresses all of the suggestions by @Bluewind,
and I can confirm it creates the XML files correctly (while I can't test the usage attribute myself in my environment, it matches the documentation and the tests by @Bluewind).

@strzibny (Member) commented Oct 6, 2019

I am unsure about coupling the RBD feature with Foreman by hardcoding the /etc/foreman/ceph.conf path.

The official documentation talks about the following locations (https://docs.ceph.com/docs/jewel/rados/configuration/ceph-conf/):

The default Ceph configuration file locations in sequential order include:

    $CEPH_CONF (i.e., the path following the $CEPH_CONF environment variable)
    -c path/path (i.e., the -c command line argument)
    /etc/ceph/ceph.conf
    ~/.ceph/config
    ./ceph.conf (i.e., in the current working directory)

I am not a Ceph guy, mind you. I am just suspicious about this path being the only option.

@olifre (Contributor, Author) commented Oct 6, 2019

@strzibny The Ceph configuration file has a different syntax (and actually, in more recent Ceph versions, it is going away). The config file used here only contains the minimum information needed to access Ceph as a client, plus additional information about the necessary libvirt secret to use.
So this is a different configuration file than the ceph.conf from the Ceph documentation and is, in fact, specific to this use case.

Also, in common setups, Foreman will run on a node without direct access to Ceph (in our case, there's nothing from Ceph installed on the node) - but the configuration is still needed to pass it on to remote libvirt nodes upon creation of new VMs.

@strzibny (Member) commented Oct 7, 2019

In that case I think it's okay. If someone requires RBD in different setups, we can revisit it.

@strzibny (Member) commented Oct 7, 2019

@olifre can I ask you about https://projects.theforeman.org/issues/12063 which you raised as a concern in the beginning?

@olifre (Contributor, Author) commented Oct 7, 2019

@strzibny This issue still persists. I am unsure about the correct fix - in principle, one could change:

    def domain_volumes xml
      xml_elements(xml, "domain/devices/disk/source", "file")
    end

to fall back to the name attribute in case the file attribute does not exist, but I did not yet have the time to test this out. However, this affects any existing VMs not based on files (not only RBD volumes).

@olifre (Contributor, Author) commented Oct 7, 2019

@strzibny I have finally investigated this in more depth; using the following code instead:

    def domain_volumes xml
      vols_by_file = xml_elements(xml, "domain/devices/disk/source", "file")
      vols_by_name = xml_elements(xml, "domain/devices/disk/source", "name")
      vols = []
      vols_by_file.zip(vols_by_name).each do |by_file, by_name|
        vols.push(by_file.nil? ? by_name : by_file)
      end
      vols
    end

fixes it for any volume which is not file-based but has a unique name (which, I think, covers everything) and should not break existing cases.
Would you like to see this as part of this PR, or separately?
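For illustration, the zip-and-prefer step could also be expressed more compactly (a sketch over plain arrays, since xml_elements needs a full domain XML; the helper name prefer_file is made up here):

```ruby
# Sketch: merge the two parallel lists from the XML lookup,
# preferring the file path and falling back to the network name.
def prefer_file(vols_by_file, vols_by_name)
  vols_by_file.zip(vols_by_name).map { |by_file, by_name| by_file || by_name }
end
```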

@olifre (Contributor, Author) commented Oct 14, 2019

@strzibny Let me know which approach you prefer, and I'll take it 😉.

@strzibny (Member)

@olifre Cool. Can you add it here as a new commit, perhaps?

For network-based disks, the unique key is the name
and there is no underlying file. Accept both from the XML,
and prefer the file name if present.
@olifre (Contributor, Author) commented Oct 15, 2019

@strzibny Done 😄.
Btw, if you have a suggestion on how to make this Ruby snippet more efficient, let me know - I'm not a Ruby expert, but it works well in all my tests.

@olifre (Contributor, Author) commented Oct 21, 2019

@strzibny Just a friendly ping: Do you see anything still missing here?

@strzibny (Member)

I am going to merge this and do a release soon, thanks for your contributions everyone.

Successfully merging this pull request may close these issues.

Ceph RBD volume creation