
Remaining work for Azure support #10412

Closed
2 of 10 tasks
kenji-cloudnatix opened this issue Dec 11, 2020 · 28 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@kenji-cloudnatix
Contributor

kenji-cloudnatix commented Dec 11, 2020

This issue tracks the remaining TODOs from #3957 (Azure support):

  • Be able to configure some more parameters for VM Scale Set such as Windows
  • Be able to configure some more parameters for Azure Disks such as zone
  • Be able to configure some more parameters for subnets such as security groups
  • Support DNS
  • Support tag update of virtual networks
  • Support ACL for Azure blob VFS
  • Implement azureCloudImplementation.GetApiIngressStatus. Currently we don't create any resources for ingress to the API server.
  • Be able to mount etcd volumes from protokube. Currently we rely on etcd-manager for volume mount
  • Populate node labels from nodeIdentifier.IdentifyNodeChange.
  • Implement azureCloudImplementation.buildCloudInstanceGroup so that kops can correctly tell whether a VM is up-to-date or not.
@Aut0R3V

Aut0R3V commented Jan 9, 2021

Anything I can do here?

@olemarkus
Member

Any of these are up for grabs, I would think.

@rifelpet
Member

rifelpet commented Jan 9, 2021

I think we can remove "Be able to mount etcd volumes from protokube. Currently we rely on etcd-manager for volume mount" from the list. That is only needed for the "legacy" etcd option that predates etcd-manager. Given that it will eventually go away, I don't see a need to add support for it in Azure.

@ngalantowicz
Contributor

Not sure how tasks are divided up for an issue like this, or whether started work needs to be reported, but the folks I work with and I are interested in starting on:

Implement azureCloudImplementation.GetApiIngressStatus. Currently we don't create any resources for ingress to the API server.

This will be our first time contributing to kops, so any pertinent contributing info that may be omitted from the relevant docs is much appreciated.

@rifelpet
Member

@ngalantowicz thanks for stepping up! Here is the function that needs implementing. You'll see the return type contains either an IP or a Hostname. For clusters that use API load balancers with hostnames we specify the hostname; otherwise we use the API load balancer's IP(s). For clusters that don't use API load balancers we return a list of the master instances' IPs. A good example of this is the OpenStack implementation.

For Azure, we don't yet support API load balancers so you'll only need to handle getting the master VMs' IPs.

Unfortunately we don't yet have Azure code for listing VMs, only VMScaleSets. You'll likely need to add a new file and client for VMs (it only needs to support listing them, not create/update/delete) and add the client to the AzureCloud interface.

With the VM listing functionality you'll be able to list them all and filter for the VMs that are associated with the cluster and that have the master role tag.
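To make the shape of that concrete, here is a rough sketch (not compilable as-is) of what the Azure implementation could look like; the vmsClient and resourceGroupName fields, the tag keys, and the ipAddressesOf helper are illustrative assumptions, not existing kops code:

// Hypothetical sketch of GetApiIngressStatus for Azure: list the cluster's VMs,
// keep the ones tagged with the master role, and report their IPs.
func (c *azureCloudImplementation) GetApiIngressStatus(cluster *kops.Cluster) ([]fi.ApiIngressStatus, error) {
	// vmsClient is the new "list only" VMs client suggested above (assumed name).
	vms, err := c.vmsClient.List(context.TODO(), c.resourceGroupName)
	if err != nil {
		return nil, fmt.Errorf("listing VMs: %w", err)
	}

	var statuses []fi.ApiIngressStatus
	for _, vm := range vms {
		// Keep only VMs that belong to this cluster and run the master role
		// (the tag keys here are placeholders for the real kops cloud tags).
		if !hasTag(vm.Tags, "KubernetesCluster", cluster.Name) || !hasTag(vm.Tags, "k8s.io_role_master", "1") {
			continue
		}
		// No API load balancer support on Azure yet, so report the master VM IPs directly.
		// ipAddressesOf would look up the VM's NIC via a network interfaces client.
		for _, ip := range ipAddressesOf(vm) {
			statuses = append(statuses, fi.ApiIngressStatus{IP: ip})
		}
	}
	return statuses, nil
}

// hasTag reports whether tags contains key with the given value (Azure tag values are *string).
func hasTag(tags map[string]*string, key, value string) bool {
	v, ok := tags[key]
	return ok && v != nil && *v == value
}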

I hope this helps. Let me know if you need any additional assistance.

@kenji-cloudnatix
Contributor Author

Thanks, @ngalantowicz, @rifelpet! Just wanted to make a quick comment about listing VMs. I think we can reuse the code in etcd-manager like this.

@NickSchleicher
Contributor

Is there any additional information regarding
Populate node labels from nodeIdentifier.IdentifyNodeChange?

I assume it just means labeling the nodes with the labels that can be specified in the instance group, as seen below, by grabbing the tags from the cloud, but I want to be sure I cover the whole depth of it.

  nodeLabels:
    tier: web
    ...

I was going to start work on that, but it looks like there might be an existing error in this area:

I0120 18:14:43.199009       1 node_controller.go:144] sending patch for node "web000001": "{\"metadata\":{\"labels\":{\"\":\"web\"}}}"
W0120 18:14:43.204913       1 node_controller.go:110] failed to patch node labels on web000001: error applying patch to node: Node "web000001" is invalid: [metadata.labels: Invalid value: "": name part must be non-empty, metadata.labels: Invalid value: "": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')]

InstanceGroupNameTag = "kops.k8s.io_instancegroup"

info.Labels[strings.TrimPrefix(k, InstanceGroupNameTag)] = *v

[Screenshot attached: Screen Shot 2021-01-20 at 1.47.30 PM]

The tag that is in the cloud for the VMSS is also the "prefix", so the key is left as an empty string. Do we need this label, and if so, what should the key be?

Also partially related: I noticed an instance group is required to have the label kops.k8s.io/cluster, or else you'll get the error
error: must specify "kops.k8s.io/cluster" label with cluster name to create instanceGroup
Is it an issue that this label has a "/" in it and never makes it to the cloud as a tag?

@kenji-cloudnatix
Contributor Author

Thanks for looking into this, @NickSchleicher! This looks like a bug in my original code. I believe we need to trim with clusterNodeTemplateLabel, defined as:

clusterNodeTemplateLabel = "k8s.io_cluster_node-template_label_"

You're right about the kops.k8s.io/cluster label as well. Currently CloudTagsForInstanceGroup has logic to replace all occurrences of "/" with "_". I believe we need to convert "_" back to "/" in:

info.Labels[strings.TrimPrefix(k, InstanceGroupNameTag)] = *v
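Reading those two suggestions together, a self-contained sketch of the tag-to-label conversion could look like this (the function name is illustrative; the prefix constants and the "_" to "/" conversion are the parts taken from the comments above):

import "strings"

const (
	InstanceGroupNameTag     = "kops.k8s.io_instancegroup"
	clusterNodeTemplateLabel = "k8s.io_cluster_node-template_label_"
)

// labelsFromVMSSTags is a hypothetical helper showing the intended conversion:
// the instance-group tag has its "_" restored to "/", and node-template tags are
// trimmed with clusterNodeTemplateLabel instead of InstanceGroupNameTag.
func labelsFromVMSSTags(tags map[string]*string) map[string]string {
	labels := map[string]string{}
	for k, v := range tags {
		if v == nil {
			continue
		}
		switch {
		case k == InstanceGroupNameTag:
			// "kops.k8s.io_instancegroup" -> "kops.k8s.io/instancegroup"
			labels[strings.ReplaceAll(k, "_", "/")] = *v
		case strings.HasPrefix(k, clusterNodeTemplateLabel):
			// "k8s.io_cluster_node-template_label_tier" -> "tier"
			labels[strings.TrimPrefix(k, clusterNodeTemplateLabel)] = *v
		}
	}
	return labels
}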

@ngalantowicz
Contributor

Reaching out for more info on
Support tag update of virtual networks

Seeing:

// Only allow tags to be updated.
if changes.Tags == nil {
	return nil
}

// TODO(kenji): Fix this. Update is not supported yet as updating the tag will recreate a virtual network,
// which causes an InUseSubnetCannotBeDeleted error.
klog.Infof("Skip updating a Virtual Network with name: %s", fi.StringValue(e.Name))
return nil

An interesting behavior to note regarding route tables (whose task and client files seem to mirror the vnet files):
When creating a cluster with an existing route table, no new route table is created; however, the InUse error does occur, and cluster delete fails because of it. If I manually remove the tags from the route table and run delete again, there are no issues.

Are these two things similar, and is there any vision/insight on how you'd go about implementing tag update support?
I'm thinking the goal is to update tags when using pre-existing resources, and to remove those tags again when the cluster is deleted?

@kenji-cloudnatix

@kenji-cloudnatix
Contributor Author

I'm thinking the goal is to update tags when using pre-existing resources, and to remove those tags again when the cluster is deleted?

Yes, I think so, but I'm not certain whether removing the tag is what kops does for other CSPs. I think other people can chime in.

The issue with updating a tag of the vnet was that it causes an InUseSubnetCannotBeDeleted error. I haven't fully figured out the cause, but my guess is that we need to set the Subnet field of VirtualNetworkPropertiesFormat.

https://raw.githubusercontent.com/Azure/azure-sdk-for-go/master/services/network/mgmt/2020-06-01/network/models.go

Otherwise, the update request attempts to delete an existing subnet (and then the request fails).
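If that guess is right, a minimal sketch of a tag-only update that echoes the existing properties back might look like the following (assuming the azure-sdk-for-go network package; the function name is illustrative and this hasn't been tested against the InUseSubnetCannotBeDeleted error):

import (
	"context"
	"fmt"

	"github.com/Azure/azure-sdk-for-go/services/network/mgmt/2020-06-01/network"
)

// updateVNetTags is a hypothetical sketch: read the current virtual network and send
// its properties (including Subnets) back in the update, so that only the tags change
// and Azure doesn't try to delete the existing subnets.
func updateVNetTags(ctx context.Context, client network.VirtualNetworksClient, resourceGroup, vnetName string, tags map[string]*string) error {
	vnet, err := client.Get(ctx, resourceGroup, vnetName, "" /* expand */)
	if err != nil {
		return fmt.Errorf("getting virtual network %s: %w", vnetName, err)
	}
	// Only the tags change; VirtualNetworkPropertiesFormat (with its Subnets) is preserved.
	vnet.Tags = tags
	future, err := client.CreateOrUpdate(ctx, resourceGroup, vnetName, vnet)
	if err != nil {
		return fmt.Errorf("updating virtual network %s: %w", vnetName, err)
	}
	if err := future.WaitForCompletionRef(ctx, client.Client); err != nil {
		return fmt.Errorf("waiting for virtual network %s update: %w", vnetName, err)
	}
	return nil
}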

@rifelpet
Member

FWIW, based on the code in pkg/resources it doesn't seem that Kops removes any tags on shared resources when a cluster is deleted.

@NickSchleicher
Contributor

At a glance, it looks like we don't have support yet for multi-AZ instances:
https://github.com/kubernetes/kops/blob/master/pkg/model/azuremodel/vmscaleset.go#L68

I'm coming from an AWS background where subnets are zonal, but from looking at the other models, it seems that implementing this would more closely track the GCE model?

@olemarkus
Member

Is the CSI driver used for Azure Disks?
K8s is looking to disable the in-tree disk plugins as of kubernetes 1.22 and we need to be ready for that. See #10777

@kenji-cloudnatix
Contributor Author

@olemarkus , I see. No, the CSI driver is not used for Azure Disks. Thanks!

@hakman
Member

hakman commented Feb 23, 2021

kOps v1.20.0-beta.1 was just released today. The plan is to have another beta in 1-2 weeks and the final release in 3-4 weeks.
Please test the new Azure support as much as possible and report any issues you find.

@rifelpet
Member

Would folks find terraform support for azure clusters useful? If someone is interested in implementing it I can provide assistance.

@finnje

finnje commented Mar 9, 2021

I wanted to test out kOps in azure. Where can I get the kops binary that supports the --azure args?

@ngalantowicz
Contributor

I wanted to test out kOps in azure. Where can I get the kops binary that supports the --azure args?

@finnje You can build from the master branch, or use the kops v1.20.0-beta.1 release, which supports the Azure features.

@hakman
Member

hakman commented Apr 17, 2021

@kenji-cloudnatix Do you think we could also add an entry with a recommended image in the channels file for Azure?
I guess the URN would be something like Canonical:0001-com-ubuntu-server-focal:20_04-lts:20.04.20210315 and the entry would be similar to the GCE one:

kops/channels/alpha

Lines 77 to 80 in 5bb54a3

- name: "ubuntu-os-cloud/ubuntu-2004-focal-v20210315"
providerID: gce
architectureID: amd64
kubernetesVersion: ">=1.18.0"
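
For reference, a hypothetical Azure entry mirroring the GCE one might then look roughly like this (the providerID value is an assumption; the URN is the one suggested above, not a committed entry):

  - name: "Canonical:0001-com-ubuntu-server-focal:20_04-lts:20.04.20210315"
    providerID: azure
    architectureID: amd64
    kubernetesVersion: ">=1.18.0"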

@kenji-cloudnatix
Contributor Author

I see. @hakman, let me take a look!

@alexivanoff

Is it possible to configure Availability zone for the master and node Scale Sets?
I tried to use "--zones eastus-1, eastus-2, eastus-3" but kops just created 3 Scale Sets without "Availability zone" settings.

@johngmyers
Member

Would it be possible for Azure to implement a fi.Authenticator/fi.Verifier pair so that instances can authenticate bootstrap requests to kops-controller using cloud-provided instance credentials?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 3, 2021
@rifelpet
Member

rifelpet commented Oct 4, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
