Iscsi auth using k8s secrets #358
Conversation
pkg/virt-handler/virtwrap/manager.go
Outdated
func (l *LibvirtDomainManager) RemoveVMSecrets(vm *v1.VM) error {
	domName := VMNamespaceKeyFunc(vm)

	// TODO this is horribly inefficient. Maybe there's a better way?
Out of everything in this patch series, this is the part that has given me the most grief. Somehow we need to garbage collect libvirt secrets associated with VMs once those VMs are undefined.
The problem here is that there's no efficient (or elegant) way of maintaining this relationship between the secret and the vm... at least none that I have found.
The result is I have to index every local secret stored in libvirt and use the secret's 'description' field to maintain this mapping between secret and domain.
I'm not happy with this. I don't have a better solution right now. Maybe someone else has a clever solution?
@rmohr made a good observation offline about this. Since virt-handler is the only consumer of the libvirt api, we can cache the secret list in a map to make this more efficient.
I'll update the patch shortly to do this.
David, why not use K8S iSCSI w/chap and attach a volume to the VM's container and use it from there?
hey @mykaul

I can see why this looks similar. The big difference is that in the document you're referencing, k8s has an iscsi device backing a volume that is mounted into the container. This quote from the doc you referenced might help with making the distinction. Using libvirt, we want to consume iscsi as a block device, and not a mounted filesystem.

In the future, it looks likely that k8s will offer block storage device access. There's a good chance we will be able to support using k8s features like this at some point, but at the moment it doesn't appear to offer what we need for virtualization.

You've brought up a good point though. We need to start working with the upstream k8s community and make them aware of our use case. That could influence the priorities of k8s supported features like this.
@davidvossel, in general I'd say that KubeVirt's strategic direction is to work with upstream Kubernetes and assist, develop and review features there so we can consume them natively from it. Of course, this may not always be feasible from either a feature-set or time perspective, but I think in cases where we diverge and implement our own we might be creating a gap. K8s is indeed working on raw block @ kubernetes/community#805.
I agree, this is a tough balance for us right now.

@mykaul The doc you referenced about k8s iscsi support gave me an idea. K8s requires iscsi auth secrets to be formatted like this: https://github.com/kubernetes/examples/blob/master/staging/volumes/iscsi/chap-secret.yaml

I'll just modify my PR to require the k8s format for the iscsi secrets. In the future we'll be able to switch to k8s iscsi support transparently on our backend if we want. Basically, this lets us move forward with iscsi auth, but do it in a way that lets us converge with k8s easily in the future if they provide iscsi block device access.
I've rebased the patch and made a few changes.
Item 2 also fixes the weird situation where the username was put in the VM definition, but the password was in the k8s secret. Now both username and password are in the k8s secret. The k8s secret is referenced in the 'usage' field of the disk auth. Behind the scenes we do all the magic to make that work in virt-handler when the VM is launching.

By formatting our secrets like this, we can silently begin using the native k8s iscsi support once we have direct access to the block device (rather than k8s trying to mount it as a filesystem).
I assume, similar to OpenStack [1], it does not support Discovery CHAP. Both are reasonable limitations, but perhaps should be documented somewhere.

[1] https://github.com/openstack/cinder/blob/master/cinder/volume/targets/iscsi.py#L153
pkg/virt-controller/services/vm.go
Outdated
@@ -33,6 +33,7 @@ import (
	corev1 "kubevirt.io/kubevirt/pkg/api/v1"
	"kubevirt.io/kubevirt/pkg/logging"
	"kubevirt.io/kubevirt/pkg/precond"
	registrydisk "kubevirt.io/kubevirt/pkg/registry-disk"
This is becoming a little too ugly if we do storage-specific initialization in the controller.
@fabiand The registry disk feature involves placing containers in the virt-launcher pod (which is created/managed by virt-controller). That's why there's registry-disk coordination bits in the controller. So yes, it is storage specific, but it is also directly involved with the pod the virt-controller manages.
Yes, thanks - I'm getting a better feeling for the issue now.
	precond.MustNotBeEmpty(string(vm.GetObjectMeta().GetUID()))

	err := registrydisk.Initialize(vm, v.KubeCli)
Yes - Initialization is okay, but it should not be tied to the storage method.
I.e. I do see that we want to attach secrets in this step, but this should be about attaching the secret - and the registry disk is just one thing consuming it later on.
@fabiand the iscsi secrets feature and registry disks are completely independent things.
virt-controller is not involved at all for someone who is using iscsi auth for their VM explicitly. For example, all someone has to do to use iscsi auth is create the k8s secret and define the VM with an Auth field that references the k8s secret. Only virt-handler will be involved in that flow.
The registry disk feature consumes the iscsi auth feature by dynamically generating k8s secrets in virt-controller. That's what is being done in this initialization step. We're initializing everything that needs to be filled in before the VM is scheduled related to registry disks.
Because the registry disk feature involves the VM's virt-launcher pod (where registry disks are launched), it seems logical that virt-controller coordinates these parts.
Thanks for the explanation.
I'll note this. I need to move what I have in the PR description to a .md file in the docs directory. I'll note it there.
another thing @rmohr has pointed out to me is that there is currently a 1:1 relationship between k8s secrets and vms... meaning, because of the way libvirt secrets are managed and garbage collected, we can't safely use the same secret for multiple VMs. I've figured out a way to fix this. I'll update the patch with that and the .md doc shortly.
patches updated.
retest this please
alrighty, got tests passing again.

@rmohr @fabiand @mykaul How do you all feel about this patch now? Is this something everyone is comfortable with merging at this point?

We have a pretty solid plan for migrating to native k8s iscsi support in the future once k8s provides the actual block device rather than requiring volumes to be a mountable filesystem. The k8s secret used for KubeVirt iscsi now matches exactly what is used in native k8s. This means we can pivot on the backend to use native k8s for iscsi in the future without impacting users.
retest this please
I really like the secret introduction, but a few things:
You might want to provide the secret feature in a different PR, including a test case; that should be easy and straightforward to merge.
When it comes to the usage of the secret in the CRD (we should find a different acronym, now that CRDs are so popular in K8s) then it looks like a mechanism to ensure exclusive access to the freshly created iSCSI target.
To me this implementation looks redundant to the PV accessModes and the PVC concept.
What you are trying to solve (IIUIC, and please correct me if I am wrong) here is to give a consumer (pod/VM) the exclusive right to consume a specific volume (the iSCSI target).
This problem has been solved with PV, PVC and access modes as well.
In order to avoid reimplementing this functionality and to strengthen Kubernetes, do you also see a way to achieve your goal by leveraging the Kubernetes PV/PVC, accessMode functionality?
Btw - A known gap is that not all K8s volume plugins are enforcing the access modes - but this is something we could work on.
metadata:
  name: myK8sSecretID
  namespace: default
Do you want a document delimiter here?
nah, i don't think that's necessary. The secret name can just be formatted however k8s allows it to be. usually '-' is the delimiter. I don't think we need to be explicit about it.
  name: my-chap-secret
type: "kubernetes.io/iscsi-chap"
data:
  node.session.auth.username: $(echo "myUsername" | base64 -w0)
Worth mentioning, and adding a link to, the stock iSCSI K8s auth example to showcase that the format aligns (https://github.com/kubernetes/kubernetes/blob/master/examples/volumes/iscsi/chap-secret.yaml)
added
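For reference, the stock k8s CHAP secret format the comments above link to looks roughly like this (the name and values below are illustrative placeholders; each data value is base64-encoded in a real secret):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: chap-secret
type: "kubernetes.io/iscsi-chap"
data:
  # credentials for discovery CHAP (if chapAuthDiscovery is enabled)
  discovery.sendtargets.auth.username: <base64 username>
  discovery.sendtargets.auth.password: <base64 password>
  # credentials for session CHAP (if chapAuthSession is enabled)
  node.session.auth.username: <base64 username>
  node.session.auth.password: <base64 password>
```

Matching this shape is what lets KubeVirt swap in native k8s iscsi support later without users changing their secrets.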
docs/iscsi-authentication.md
Outdated
domain: | ||
devices: | ||
disks: | ||
- Auth: |
Is this a typo, or is there any reason for a capital A in Auth?
typo, i'll fix it
name: qemu
type: raw
cache: none
source:
How does the auth feature work when a PVC is used?
An iscsi persistent volume is just a wrapper around the options available to an iscsi volume. I verified this in the k8s code base.

The example below shows what iscsi auth looks like with a PVC. The user would create the chap-secret k8s secret and then define this PV, which references the secret.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: iscsi-disk-custom
  labels:
    os: "none"
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    iqn: iqn.2017-01.io.kubevirt:sn.42
    lun: 1
    targetPortal: iscsi-demo-target.default.svc.cluster.local
    chapAuthDiscovery: true
    chapAuthSession: true
    secretRef:
      name: chap-secret
Right, I understand - I was actually after understanding: How is the auth parameter interpreted if a PVC is used as a disk?
ah, good point. I need to add a couple of lines of code there to do the mapping correctly on the virt-handler side. It will work fine once I do that.
fixed in latest update
@fabiand These are separate concepts solving separate problems. PVCs solve a problem with exclusive access that we don't have with CRDs. Iscsi auth is a security feature that both PVCs and CRDs need.

Access modes are a k8s cluster feature that (attempts to) provide guarantees about how volumes are accessed by pods: ReadWriteOnce, ReadOnlyMany, ReadWriteMany. For CRDs we don't have a situation where multiple pods/nodes/vms attempt to grab ownership of the same iscsi device. There is no resource contention. The CRD is directly correlated to a specific VM, and no other VM is going to attempt to access another VM's CRDs. This is inherent to how the feature is designed. We don't need PVCs to express this for CRDs; basically we get it for free.

Authentication is something different. Auth gives us guarantees that nothing outside of the cluster's control can access the CRDs. For example, with auth we limit the ability of an unauthorized user or app to connect to the iscsi device. This is for security. Both CRDs and iscsi devices managed by PVs need auth as a security feature.
sorry, I kind of implicitly answered this, but probably need to answer it directly.

PVCs are a feature for managing access to network block devices that there is contention for. With the ReadWriteOnce access mode, there is an assumption that there are potentially multiple consumers of a data source that k8s needs to coordinate access to. Without contention, the access modes feature wouldn't be necessary.

We don't have that contention with a CRD. The disk is actually a part of the thing that consumes it. The virt-launcher pod + VM process are an atomic unit. The CRDs are containers in the virt-launcher pod and directly tied to the VM lifecycle. Because of all this, we're guaranteed a 1:1 ratio. No contention.

Here's an example that might make it more clear. If the VM consuming a CRD fails and is started again on another node, the CRDs between the first and second run are completely different. If the same VM definition happened to run twice for some reason because of a cluster network partition, the CRDs associated with each VM process are not shared. As long as we maintain the 1:1 relationship of virt-launcher pods to VMs, there will never be a time where a CRD is accessed by more than one VM process.

All that is to say, without data contention, the CRD feature doesn't gain anything of value by introducing the PVC concept. We already have exclusive access for free.
latest update includes
I'm just merging this to unblock further testing.
That said, there is agreement that the registry disk mechanism needs to be reworked once we have POCed some relevant mechanics (sharing disks between containers).
I created an issue related to this: #373
This feature lets us map k8s secrets to VMs for iscsi auth.
Workflow
From there, the password field in the k8s secret will automatically be mapped to a libvirt secret when the VM is scheduled to a node allowing the iscsi auth to work without any further configuration.
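As a sketch, a VM disk using this workflow might look like the following. The field names mirror the examples elsewhere in this PR (lowercase auth with a secret usage referencing the k8s secret), but this is illustrative, not the final API:

```yaml
spec:
  domain:
    devices:
      disks:
      - type: network
        device: disk
        auth:
          secret:
            type: iscsi
            # 'usage' names the k8s secret of type kubernetes.io/iscsi-chap;
            # virt-handler maps its username/password to a libvirt secret
            usage: my-chap-secret
        source:
          host:
            name: iscsi-demo-target.default.svc.cluster.local
            port: "3260"
          protocol: iscsi
          name: iqn.2017-01.io.kubevirt:sn.42/1
```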
Container Registry Disk Iscsi Auth
The second part of this PR involves auto-generating k8s user/secret combos for the container registry disks associated with a VM. This ensures CRD disks can only be accessed by their corresponding VM process.
In the future it would also be wise for us to introduce a network policy that further locks down CRD disks to the local node.