
Iscsi auth using k8s secrets #358

Merged: 12 commits merged into kubevirt:master on Aug 18, 2017

Conversation

@davidvossel (Member) commented Aug 4, 2017

This feature lets us map k8s secrets to VMs for iscsi auth.

Workflow

  1. Create a k8s secret containing the username and password fields.
cat << END > my-chap-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-chap-secret
  namespace: default
type: "kubernetes.io/iscsi-chap"
data:
  node.session.auth.username: $(echo -n "myUsername" | base64 -w0)
  node.session.auth.password: $(echo -n "mySuperSecretPassword" | base64 -w0)
END
  2. Create a VM that references the k8s secret's name in the disk's iscsi auth usage field.
cat << END > my-vm.yaml
kind: VM
metadata:
  name: testvm
  namespace: default
spec:
  domain:
    devices:
      disks:
      - Auth:
          secret:
            type: iscsi
            usage: my-chap-secret
        type: network
        snapshot: external
        device: disk
        driver:
          name: qemu
          type: raw
          cache: none
        source:
          host:
            name: iscsi-demo-target.default
            port: "3260"
          protocol: iscsi
          name: iqn.2017-01.io.kubevirt:sn.42/2
        target:
          dev: vda
    memory:
      unit: MB
      value: 64
    os:
      type:
        os: hvm
    type: qemu
END
  3. Add the secret and VM to the cluster.
kubectl create -f my-chap-secret.yaml
kubectl create -f my-vm.yaml
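
Optionally, as a sanity check (plain kubectl, nothing specific to this PR), verify the secret exists and has the expected type:

kubectl get secret my-chap-secret -o yaml
kubectl get secret my-chap-secret -o jsonpath='{.type}'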

From there, the password field in the k8s secret will automatically be mapped to a libvirt secret when the VM is scheduled to a node, allowing the iSCSI auth to work without any further configuration.
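
Roughly speaking (illustrative sketch, not the exact XML virt-handler generates), this amounts to defining a libvirt secret whose usage ID matches the k8s secret name, storing the password as its value, and referencing it from the disk's auth element:

<!-- libvirt secret; its value (the password) is set separately via the secret API -->
<secret ephemeral='no' private='yes'>
  <usage type='iscsi'>
    <target>my-chap-secret</target>
  </usage>
</secret>

<!-- disk auth element referencing that secret by usage ID -->
<auth username='myUsername'>
  <secret type='iscsi' usage='my-chap-secret'/>
</auth>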

Container Registry Disk iSCSI Auth

The second part of this PR involves auto-generating k8s username/password secrets for container registry disks associated with a VM. This ensures CRD disks can only be accessed by their corresponding VM process.
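
Purely for illustration (the name, label, and values below are invented, not what virt-controller actually emits), an auto-generated per-VM secret would look roughly like this:

apiVersion: v1
kind: Secret
metadata:
  name: registry-disk-auth-testvm    # hypothetical name
  namespace: default
  labels:
    kubevirt.io/vm: testvm           # hypothetical label tying it to the VM
type: "kubernetes.io/iscsi-chap"
data:
  node.session.auth.username: dGVzdHZtLXVzZXI=            # generated per VM
  node.session.auth.password: cmFuZG9tbHktZ2VuZXJhdGVk    # generated per VM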

In the future it would also be wise for us to introduce a network policy that further locks down CRD disks to the local node.

func (l *LibvirtDomainManager) RemoveVMSecrets(vm *v1.VM) error {
	domName := VMNamespaceKeyFunc(vm)

	// TODO this is horribly inefficient. Maybe there's a better way?
davidvossel (Member, Author):

Out of everything in this patch series, this is the part that has given me the most grief. Somehow we need to garbage collect libvirt secrets associated with VMs once those VMs are undefined.

The problem here is that there's no efficient (or elegant) way of maintaining this relationship between the secret and the vm... at least none that I have found.

The result is I have to index every local secret stored in libvirt and use the secret's 'description' field to maintain this mapping between secret and domain.

I'm not happy with this. I don't have a better solution right now. Maybe someone else has a clever solution?

davidvossel (Member, Author):

@rmohr made a good observation offline about this. Since virt-handler is the only consumer of the libvirt api, we can cache the secret list in a map to make this more efficient.

I'll update the patch shortly to do this.
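
A minimal sketch of the idea (illustrative only, not the actual patch; the type and method names are invented): virt-handler keeps its own map from domain name to the libvirt secret usage IDs it created, so removal no longer needs to scan every secret.

package cache

import "sync"

// secretCache is an in-memory index of libvirt secret usage IDs,
// keyed by the domain (VM) they were created for.
type secretCache struct {
	lock     sync.Mutex
	byDomain map[string][]string
}

func newSecretCache() *secretCache {
	return &secretCache{byDomain: map[string][]string{}}
}

// add records that usageID was defined in libvirt on behalf of domName.
func (c *secretCache) add(domName, usageID string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.byDomain[domName] = append(c.byDomain[domName], usageID)
}

// pop returns and forgets all usage IDs for domName; the caller can then
// undefine exactly those libvirt secrets instead of listing all of them.
func (c *secretCache) pop(domName string) []string {
	c.lock.Lock()
	defer c.lock.Unlock()
	ids := c.byDomain[domName]
	delete(c.byDomain, domName)
	return ids
}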

@mykaul commented Aug 6, 2017

David, why not use K8S iSCSI w/chap and attach a volume to the VM's container and use it from there?
What's the benefit of involving libvirt here?
I'm looking at https://github.com/kubernetes/examples/blob/master/staging/volumes/iscsi/README.md and I'm not sure I'm seeing any differences in functionality?

@davidvossel (Member, Author):

hey @mykaul

I'm looking at https://github.com/kubernetes/examples/blob/master/staging/volumes/iscsi/README.md and I'm not sure I'm seeing any differences in functionality?

I can see why this looks similar. The big difference is that in the document you're referencing, k8s has an iscsi device backing a volume that is mounted into the container.

This quote from the doc you referenced might help with making the distinction.
"If you have followed the instructions in the links above you may have partitioned the device, the iSCSI volume plugin does not currently support partitions so format the device as one partition or leave the device raw and Kubernetes will partition and format it one first mount."

Using libvirt, we want to consume iscsi as a block device, and not a mounted filesystem.

In the future, it looks likely that k8s will offer block storage device access. There's a good chance we will be able to support using k8s features like this at some point, but at the moment it doesn't appear to offer what we need for virtualization.

You've brought up a good point though. We need to start working with the upstream k8s community and make them aware of our use case. That could influence priorities of k8s supported features like this.

@mykaul commented Aug 7, 2017

@davidvossel, in general I'd say that KubeVirt's strategic direction is to work with upstream Kubernetes and assist, develop, and review features there so we can consume them natively. Of course, this may not always be feasible from either a feature-set or a time perspective, but I think in cases where we diverge and implement our own we might be creating a gap.

K8S are indeed working on raw block @ kubernetes/community#805 .

@davidvossel (Member, Author):

Of course, this may not always be feasible from both feature-set or time-perspective, but I think in cases where we diverge and implement our own we might be creating a gap.

I agree, this is a tough balance for us right now.

@mykaul The doc you referenced about k8s iscsi support gave me an idea.

K8s requires iscsi auth secrets to be formatted like this. https://github.com/kubernetes/examples/blob/master/staging/volumes/iscsi/chap-secret.yaml

I'll just modify my PR to require the k8s format for the iscsi secrets. In the future we'll be able to switch to k8s iscsi support transparently on our backend if we want.

Basically, this lets us move forward with iscsi auth, but do it in a way that lets us converge with k8s easily in the future if they provide iscsi block device access.

@davidvossel force-pushed the iscsi-auth-v1 branch 2 times, most recently from e8eafd6 to bfd6e9f on August 7, 2017 at 20:15
@davidvossel (Member, Author):

@rmohr @mykaul

I've rebased the patch and made a few changes.

  1. libvirt secrets are stored in a cache which makes removal much more efficient.

  2. We now format our k8s secrets exactly like the iSCSI CHAP secrets k8s natively supports: https://github.com/kubernetes/examples/tree/master/staging/volumes/iscsi. I updated the PR description to reflect this change as well.

Item 2 also fixes the weird situation where the username was put in the VM definition, but the password was in the k8s secret. Now both username and password are in the k8s secret. The k8s secret is referenced in the 'usage' field of the disk auth. Behind the scenes we do all the magic to make that work in virt-handler when the VM is launching.

By formatting our secrets like this, we can silently begin using the native k8s iscsi support once we have direct access to the block device (rather than k8s trying to mount it as a filesystem).

@mykaul commented Aug 8, 2017

I assume similar to OpenStack[1] it does not support Discovery CHAP.
I assume similar to OpenStack[?] it does not support mutual auth.

Both are reasonable limitations, but perhaps should be documented somewhere.

[1] https://github.com/openstack/cinder/blob/master/cinder/volume/targets/iscsi.py#L153

@@ -33,6 +33,7 @@ import (
 	corev1 "kubevirt.io/kubevirt/pkg/api/v1"
 	"kubevirt.io/kubevirt/pkg/logging"
 	"kubevirt.io/kubevirt/pkg/precond"
+	registrydisk "kubevirt.io/kubevirt/pkg/registry-disk"
fabiand (Member):

This is becoming a little too ugly if we do storage-specific initialization in the controller.

davidvossel (Member, Author):

@fabiand The registry disk feature involves placing containers in the virt-launcher pod (which is created/managed by virt-controller). That's why there's registry-disk coordination bits in the controller. So yes, it is storage specific, but it is also directly involved with the pod the virt-controller manages.

fabiand (Member):

Yes, thanks - I'm getting a better feeling for the issue now.

	precond.MustNotBeEmpty(string(vm.GetObjectMeta().GetUID()))

	err := registrydisk.Initialize(vm, v.KubeCli)
fabiand (Member):

Yes - Initialization is okay, but it should not be tied to the storage method.

I.e. I do see that we want to attach secrets in this step, but this should be about attaching the secret - and the registry disk is just one thing consuming it later on.

davidvossel (Member, Author):

@fabiand the iscsi secrets feature and registry disks are completely independent things.

virt-controller is not involved at all for someone who is using iscsi auth for their VM explicitly. For example, all someone has to do to use iscsi auth is create the k8s secret and define the VM with an Auth field that references the k8s secret. Only virt-handler will be involved in that flow.

The registry disk feature consumes the iscsi auth feature by dynamically generating k8s secrets in virt-controller. That's what is being done in this initialization step. We're initializing everything that needs to be filled in before the VM is scheduled related to registry disks.

Because the registry disk feature involves the VM's virt-launcher pod (where registry disks are launched), it seems logical that virt-controller coordinates these parts.

fabiand (Member):

Thanks for the explanation.

@davidvossel (Member, Author):

I assume similar to OpenStack[1] it does not support Discovery CHAP.
I assume similar to OpenStack[?] it does not support mutual auth.
@mykaul
Right, the only thing we get with this feature is the target authenticates the initiator... so just client authentication. This is a limitation of how we're using libvirt for iscsi.

I'll note this. I need to move what I have in the PR description to a .md file in the docs directory, and I'll note this limitation there.

@davidvossel (Member, Author):

Another thing @rmohr has pointed out to me is that there is currently a 1:1 relationship between k8s secrets and VMs: because of the way libvirt secrets are managed and garbage collected, we can't safely use the same secret for multiple VMs. I've figured out a way to fix this. I'll update the patch with that and the .md doc shortly.

@davidvossel (Member, Author):

patches updated.

  • added iscsi related documentation in /docs folder
  • added the ability to associate a single k8s secret with multiple VMs
  • improved the libvirt secret cache by keying off the domain name

@davidvossel (Member, Author):

retest this please

@davidvossel (Member, Author):

alrighty, got tests passing again

@rmohr @fabiand @mykaul How do you all feel about this patch now? Is this something everyone is comfortable with merging at this point?

We have a pretty solid plan for migrating to native k8s iscsi support in the future once k8s provides the actual block device rather than requiring volumes to be a mountable filesystem.

The k8s secret used for KubeVirt iscsi now matches exactly what is used in native k8s. This means we can pivot on the backend to use native k8s for iscsi in the future without impacting users.

@rmohr (Member) commented Aug 11, 2017

retest this please

@fabiand (Member) left a comment:

I really like the secret introduction, but a few things:
You might want to provide the secret feature in a different PR, including a test case; that should be easy and straightforward to merge.

When it comes to the usage of the secret in the CRD (we should find a different acronym, now that CRDs are so popular in K8s) then it looks like a mechanism to ensure exclusive access to the freshly created iSCSI target.
To me this implementation looks redundant to the PV accessModes and the PVC concept.

What you are trying to solve (IIUIC, and please correct me if I am wrong) here is to give a consumer (pod/VM) the exclusive right to consume a specific volume (the iSCSI target).
This problem has been solved with PV, PVC and access modes as well.

In order to avoid reimplementing this functionality and to strengthen Kubernetes, do you also see a way to achieve your goal by leveraging the Kubernetes PV/PVC accessMode functionality?

Btw - A known gap is that not all K8s volume plugins are enforcing the access modes - but this is something we could work on.

metadata:
  name: myK8sSecretID
  namespace: default

fabiand (Member):

Do you want a document delimiter here?

davidvossel (Member, Author):

Nah, I don't think that's necessary. The secret name can be formatted however k8s allows it to be; usually '-' is the delimiter. I don't think we need to be explicit about it.

  name: my-chap-secret
type: "kubernetes.io/iscsi-chap"
data:
  node.session.auth.username: $(echo "myUsername" | base64 -w0)
fabiand (Member):

Worth mentioning, and worth adding a link to the stock K8s iSCSI CHAP secret to show that the format aligns (https://github.com/kubernetes/kubernetes/blob/master/examples/volumes/iscsi/chap-secret.yaml).

davidvossel (Member, Author):

added

  domain:
    devices:
      disks:
      - Auth:
fabiand (Member):

Is this a typo, or any reason for a capital A in Auth?

davidvossel (Member, Author):

Typo, I'll fix it.

          name: qemu
          type: raw
          cache: none
        source:
fabiand (Member):

How does the auth feature work when a PVC is used?

davidvossel (Member, Author):

An iSCSI persistent volume is just a wrapper around the options available to an iscsi volume; I verified this in the k8s code base.

The example below shows what iSCSI auth should look like with a PV/PVC.

In this example, the user would create the chap-secret k8s secret and then define this PV which references the secret.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: iscsi-disk-custom
  labels:
    os: "none"
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    iqn: iqn.2017-01.io.kubevirt:sn.42
    lun: 1
    targetPortal: iscsi-demo-target.default.svc.cluster.local
    chapAuthDiscovery: true
    chapAuthSession: true
    secretRef: 
       name: chap-secret
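
For completeness, an illustrative PVC that could bind the PV above (not part of this PR; the claim name is invented):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: iscsi-disk-custom-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi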

fabiand (Member):

Right, I understand. What I was actually after is: how is the auth parameter interpreted if a PVC is used as a disk?

davidvossel (Member, Author):

Ah, good point. I need to add a couple of lines of code there to do the mapping correctly on the virt-handler side. It will work fine once I do that.

davidvossel (Member, Author):

Fixed in the latest update.


@davidvossel (Member, Author):

When it comes to the usage of the secret in the CRD (we should find a different acronym, now that CRDs are so popular in K8s) then it looks like a mechanism to ensure exclusive access to the freshly created iSCSI target.
To me this implementation looks redundant to the PV accessModes and the PVC concept.

@fabiand These are separate concepts solving separate problems. PVCs are solving a problem with exclusive access that we don't have with CRDs. Iscsi auth is a security feature that both PVCs and CRDs need.

Access modes are a k8s cluster feature that (attempt to) provide guarantees about how volumes are accessed by pods. ReadWriteOnce, ReadOnlyMany, ReadWriteMany...

For CRDs we don't have a situation where multiple pods/nodes/vms attempt to grab ownership of the same iscsi device. There is no resource contention. The CRD is directly correlated to a specific VM and no other VM is going to attempt to access another VMs CRDs. This is inherent to how the feature is designed. We don't need PVCs to express this for CRDs. Basically we get it for free.

Authentication is something different. Auth gives us guarantees that nothing outside of the clusters control can access the CRDs. For example, with auth we limit the ability for an unauthorized user or app to attempt to connect to the iscsi device. This is for security. Both CRDs and iscsi devices managed by PVs need auth as a security feature.

@davidvossel (Member, Author):

In order to avoid reimplemting this functionality and to strengthen Kubernetes, do you also see a way to achieve your goal by leveraging the Kubernetes PV/PVC, accessMode functionality?

sorry, I kind of implicitly answered this, but probably need to directly answer it.

PVCs are a feature for managing access to network block devices that there is contention for. With the ReadWriteOnce access mode, there is an assumption that there are potentially multiple consumers of a data source that k8s needs to coordinate access to. Without contention, the access modes feature wouldn't be necessary.

We don't have that contention with a CRD. The disk is actually a part of the thing that consumes it. The virt-launcher pod + VM process are an atomic unit. The CRDs are containers in the virt-launcher pod and directly tied to the VM lifecycle. Because of all this, we're guaranteed a 1:1 ratio. no contention.

Here's an example that might make it more clear. If the VM consuming a CRD fails and is started again on another node, the CRDs between the first and second run are completely different. If the same VM definition happened to run twice for some reason because of a cluster network partition, the CRDs associated with each VM process are not shared. As long as we maintain the 1:1 relationship of virt-launcher pods to VMs, there will never be a time where a CRD is accessed by more than one VM process.

All that is to say, without data contention, the CRD feature doesn't gain anything of value by introducing the PVC concept. We already have exclusive access for free.

Signed-off-by: David Vossel <dvossel@redhat.com>
Signed-off-by: David Vossel <dvossel@redhat.com>
…crets

Signed-off-by: David Vossel <dvossel@redhat.com>
Signed-off-by: David Vossel <dvossel@redhat.com>
Signed-off-by: David Vossel <davidvossel@gmail.com>
… for disk auth

Signed-off-by: David Vossel <davidvossel@gmail.com>
Signed-off-by: David Vossel <dvossel@redhat.com>
Signed-off-by: David Vossel <dvossel@redhat.com>
Signed-off-by: David Vossel <dvossel@redhat.com>
Signed-off-by: David Vossel <davidvossel@gmail.com>
Signed-off-by: David Vossel <davidvossel@gmail.com>
@davidvossel (Member, Author):

latest update includes

  • kubevirt trunk rebase
  • iscsi auth documentation updates
  • iscsi PVs with auth now work

@fabiand (Member) left a comment:

I'm just merging this to unblock further testing.

Despite that, there is agreement that the registry disk mechanism needs to be reworked once we have POCed some relevant mechanics (sharing disks between containers).

@davidvossel (Member, Author):

Despite that, there is agreement that the registry disk mechanism needs to be reworked once we have POCed some relevant mechanics (sharing disks between containers).

I created an issue related to this: #373

@fabiand merged commit 64f2da5 into kubevirt:master on Aug 18, 2017