Iscsi auth using k8s secrets #358
Conversation
pkg/virt-handler/virtwrap/manager.go
Outdated
func (l *LibvirtDomainManager) RemoveVMSecrets(vm *v1.VM) error {
	domName := VMNamespaceKeyFunc(vm)

	// TODO this is horribly inefficient. Maybe there's a better way?
Out of everything in this patch series, this is the part that has given me the most grief. Somehow we need to garbage collect libvirt secrets associated with VMs once those VMs are undefined.
The problem here is that there's no efficient (or elegant) way of maintaining this relationship between the secret and the vm... at least none that I have found.
The result is I have to index every local secret stored in libvirt and use the secret's 'description' field to maintain this mapping between secret and domain.
I'm not happy with this. I don't have a better solution right now. Maybe someone else has a clever solution?
@rmohr made a good observation offline about this. Since virt-handler is the only consumer of the libvirt api, we can cache the secret list in a map to make this more efficient.
I'll update the patch shortly to do this.
David, why not use K8S iSCSI w/chap and attach a volume to the VM's container and use it from there?
hey @mykaul

I can see why this looks similar. The big difference is that in the document you're referencing, k8s has an iscsi device backing a volume that is mounted into the container. This quote from the doc you referenced might help with making the distinction. Using libvirt, we want to consume iscsi as a block device, and not a mounted filesystem.

In the future, it looks likely that k8s will offer block storage device access. There's a good chance we will be able to support using k8s features like this at some point, but at the moment it doesn't appear to offer what we need for virtualization.

You've brought up a good point though. We need to start working with the upstream k8s community and make them aware of our use case. That could influence the priorities of k8s supported features like this.
@davidvossel, in general I'd say that KubeVirt's strategic direction is to work with upstream Kubernetes and assist, develop and review features there so we can consume them natively from it. Of course, this may not always be feasible from either a feature-set or time perspective, but I think in cases where we diverge and implement our own we might be creating a gap. K8s is indeed working on raw block @ kubernetes/community#805.
I agree, this is a tough balance for us right now.

@mykaul The doc you referenced about k8s iscsi support gave me an idea. K8s requires iscsi auth secrets to be formatted like this: https://github.com/kubernetes/examples/blob/master/staging/volumes/iscsi/chap-secret.yaml

I'll just modify my PR to require the k8s format for the iscsi secrets. In the future we'll be able to switch to k8s iscsi support transparently on our backend if we want. Basically, this lets us move forward with iscsi auth, but do it in a way that lets us converge with k8s easily in the future if they provide iscsi block device access.
I've rebased the patch and made a few changes.
Item 2 also fixes the weird situation where the username was put in the VM definition, but the password was in the k8s secret. Now both username and password are in the k8s secret. The k8s secret is referenced in the 'usage' field of the disk auth. Behind the scenes we do all the magic to make that work in virt-handler when the VM is launching.

By formatting our secrets like this, we can silently begin using the native k8s iscsi support once we have direct access to the block device (rather than k8s trying to mount it as a filesystem).
I assume, similar to OpenStack [1], it does not support Discovery CHAP. Both are reasonable limitations, but perhaps should be documented somewhere.

[1] https://github.com/openstack/cinder/blob/master/cinder/volume/targets/iscsi.py#L153
pkg/virt-controller/services/vm.go
Outdated
@@ -33,6 +33,7 @@ import (
	corev1 "kubevirt.io/kubevirt/pkg/api/v1"
	"kubevirt.io/kubevirt/pkg/logging"
	"kubevirt.io/kubevirt/pkg/precond"
	registrydisk "kubevirt.io/kubevirt/pkg/registry-disk"
This is becoming a little too ugly if we do storage-specific initialization in the controller.
@fabiand The registry disk feature involves placing containers in the virt-launcher pod (which is created/managed by virt-controller). That's why there's registry-disk coordination bits in the controller. So yes, it is storage specific, but it is also directly involved with the pod the virt-controller manages.
Yes, thanks - I'm getting a better feeling for the issue now.
	precond.MustNotBeEmpty(string(vm.GetObjectMeta().GetUID()))

	err := registrydisk.Initialize(vm, v.KubeCli)
Yes - Initialization is okay, but it should not be tied to the storage method.
I.e. I do see that we want to attach secrets in this step, but this should be about attaching the secret - and the registry disk is just one thing consuming it later on.
@fabiand the iscsi secrets feature and registry disks are completely independent things.
virt-controller is not involved at all for someone who is using iscsi auth for their VM explicitly. For example, all someone has to do to use iscsi auth is create the k8s secret and define the VM with an Auth field that references the k8s secret. Only virt-handler will be involved in that flow.
The registry disk feature consumes the iscsi auth feature by dynamically generating k8s secrets in virt-controller. That's what is being done in this initialization step. We're initializing everything that needs to be filled in before the VM is scheduled related to registry disks.
Because the registry disk feature involves the VM's virt-launcher pod (where registry disks are launched), it seems logical that virt-controller coordinates these parts.
Thanks for the explanation.
I'll note this. I need to move what I have in the PR description to a .md file in the docs directory. I'll note it there.
another thing @rmohr has pointed out to me is that there is currently a 1:1 relationship between k8s secrets and vms... meaning, because of the way libvirt secrets are managed and garbage collected, we can't safely use the same secret for multiple VMs. I've figured out a way to fix this. I'll update the patch with that and the .md doc shortly.
patches updated.
retest this please
alrighty, got tests passing again.

@rmohr @fabiand @mykaul How do you all feel about this patch now? Is this something everyone is comfortable with merging at this point?

We have a pretty solid plan for migrating to native k8s iscsi support in the future once k8s provides the actual block device rather than requiring volumes to be a mountable filesystem. The k8s secret used for KubeVirt iscsi now matches exactly what is used in native k8s. This means we can pivot on the backend to use native k8s for iscsi in the future without impacting users.
retest this please
I really like the secret introduction, but a few things:
You might want to provide the secret feature in a different PR, including a test case; that should be easy and straightforward to merge.
When it comes to the usage of the secret in the CRD (we should find a different acronym, now that CRDs are so popular in K8s) then it looks like a mechanism to ensure exclusive access to the freshly created iSCSI target.
To me this implementation looks redundant to the PV accessModes and the PVC concept.
What you are trying to solve (IIUIC, and please correct me if I am wrong) here is to give a consumer (pod/VM) the exclusive right to consume a specific volume (the iSCSI target).
This problem has been solved with PV, PVC and access modes as well.
In order to avoid reimplementing this functionality and to strengthen Kubernetes, do you also see a way to achieve your goal by leveraging the Kubernetes PV/PVC, accessMode functionality?
Btw - A known gap is that not all K8s volume plugins are enforcing the access modes - but this is something we could work on.
metadata:
  name: myK8sSecretID
  namespace: default
Do you want a document delimiter here?
nah, i don't think that's necessary. The secret name can just be formatted however k8s allows it to be. usually '-' is the delimiter. I don't think we need to be explicit about it.
  name: my-chap-secret
type: "kubernetes.io/iscsi-chap"
data:
  node.session.auth.username: $(echo "myUsername" | base64 -w0)
Worth mentioning, and adding a link to, the stock iSCSI K8s auth example to showcase that the format aligns (https://github.com/kubernetes/kubernetes/blob/master/examples/volumes/iscsi/chap-secret.yaml)
added
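For reference, the stock k8s CHAP secret format the comments above link to looks roughly like this (the name and values below are illustrative placeholders; each data value is base64-encoded in a real secret):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: chap-secret
type: "kubernetes.io/iscsi-chap"
data:
  # credentials for discovery CHAP (if chapAuthDiscovery is enabled)
  discovery.sendtargets.auth.username: <base64 username>
  discovery.sendtargets.auth.password: <base64 password>
  # credentials for session CHAP (if chapAuthSession is enabled)
  node.session.auth.username: <base64 username>
  node.session.auth.password: <base64 password>
```

Matching this shape is what lets KubeVirt swap in native k8s iscsi support later without users changing their secrets.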
docs/iscsi-authentication.md
Outdated
domain: | ||
devices: | ||
disks: | ||
- Auth: |
Is this a typo, or is there any reason for a capital A in Auth?
typo, i'll fix it
name: qemu
type: raw
cache: none
source:
How does the auth feature work when a PVC is used?
An iscsi persistent volume is just a wrapper around the options available to an iscsi volume. I verified this in the k8s code base.

The example below shows what iscsi auth looks like with a PVC. The user would create the chap-secret k8s secret and then define this PV, which references the secret.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: iscsi-disk-custom
  labels:
    os: "none"
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    iqn: iqn.2017-01.io.kubevirt:sn.42
    lun: 1
    targetPortal: iscsi-demo-target.default.svc.cluster.local
    chapAuthDiscovery: true
    chapAuthSession: true
    secretRef:
      name: chap-secret
Right, I understand - I was actually after understanding: How is the auth parameter interpreted if a PVC is used as a disk?
ah, good point. I need to add a couple of lines of code there to do the mapping correctly on the virt-handler side. It will work fine once I do that.
fixed in latest update
@fabiand These are separate concepts solving separate problems. PVCs solve a problem with exclusive access that we don't have with CRDs. Iscsi auth is a security feature that both PVCs and CRDs need.

Access modes are a k8s cluster feature that (attempts to) provide guarantees about how volumes are accessed by pods: ReadWriteOnce, ReadOnlyMany, ReadWriteMany. For CRDs we don't have a situation where multiple pods/nodes/vms attempt to grab ownership of the same iscsi device. There is no resource contention. The CRD is directly correlated to a specific VM, and no other VM is going to attempt to access another VM's CRDs. This is inherent to how the feature is designed. We don't need PVCs to express this for CRDs; basically we get it for free.

Authentication is something different. Auth gives us guarantees that nothing outside of the cluster's control can access the CRDs. For example, with auth we limit the ability of an unauthorized user or app to connect to the iscsi device. This is for security. Both CRDs and iscsi devices managed by PVs need auth as a security feature.
sorry, I kind of implicitly answered this, but probably need to answer it directly.

PVCs are a feature for managing access to network block devices that there is contention for. With the ReadWriteOnce access mode, there is an assumption that there are potentially multiple consumers of a data source that k8s needs to coordinate access to. Without contention, the access modes feature wouldn't be necessary.

We don't have that contention with a CRD. The disk is actually a part of the thing that consumes it. The virt-launcher pod + VM process are an atomic unit. The CRDs are containers in the virt-launcher pod and directly tied to the VM lifecycle. Because of all this, we're guaranteed a 1:1 ratio. No contention.

Here's an example that might make it more clear. If the VM consuming a CRD fails and is started again on another node, the CRDs between the first and second run are completely different. If the same VM definition happened to run twice for some reason because of a cluster network partition, the CRDs associated with each VM process are not shared. As long as we maintain the 1:1 relationship of virt-launcher pods to VMs, there will never be a time where a CRD is accessed by more than one VM process.

All that is to say, without data contention, the CRD feature doesn't gain anything of value by introducing the PVC concept. We already have exclusive access for free.
latest update includes
I'm just merging this to unblock further testing.
That said, there is agreement that the registry disk mechanism needs to be reworked once we have POCed some relevant mechanics (sharing disks between containers).
I created an issue related to this: #373
This feature lets us map k8s secrets to VMs for iscsi auth.
Workflow
From there, the password field in the k8s secret will automatically be mapped to a libvirt secret when the VM is scheduled to a node allowing the iscsi auth to work without any further configuration.
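As a sketch, a VM disk using this workflow might look like the following. The field names mirror the examples elsewhere in this PR (lowercase auth with a secret usage referencing the k8s secret), but this is illustrative, not the final API:

```yaml
spec:
  domain:
    devices:
      disks:
      - type: network
        device: disk
        auth:
          secret:
            type: iscsi
            # 'usage' names the k8s secret of type kubernetes.io/iscsi-chap;
            # virt-handler maps its username/password to a libvirt secret
            usage: my-chap-secret
        source:
          host:
            name: iscsi-demo-target.default.svc.cluster.local
            port: "3260"
          protocol: iscsi
          name: iqn.2017-01.io.kubevirt:sn.42/1
```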
Container Registry Disk Iscsi Auth
The second part of this PR involves auto-generating k8s user/secret combos for the container registry disks associated with a VM. This ensures CRD disks can only be accessed by their corresponding VM process.
In the future it would also be wise for us to introduce a network policy that further locks down CRD disks to the local node.