New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support antrea as network policy provider in kube-up #100736
Conversation
@anfernee: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: anfernee The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
@anfernee would it be possible to first get a linux / antrea change in? (before we try to do windows?). I'd expect a PR that introduces antrea on linux with network policy switched on, followed by a periodic CI job that exercises these changes and runs the network policy tests (and the results show up in test-grid). Then we figure out how to do the same with windows. WDYT? |
I tried linux version as well, but it failed to start on COS images. I don't know why yet (created antrea-io/antrea#2021). The change will help us testing antrea-windows compatibility, which could be used to build a verification pipeline for antrea. It would be very helpful for that task if this can be merged separately. |
@anfernee you don't need to use COS images, you can use kube-up with ubuntu images too! |
@anfernee So let's see if there is a limited set of changes that will help run network policy tests on linux first. |
@@ -0,0 +1,208 @@ | |||
apiVersion: apps/v1 | |||
kind: DaemonSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @jkh52
There are concerns with the total number of DaemonSets we are running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the commit references windows but I notice this says linux in the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I repurposed this to support linux as well. @jkh52 what is the concern about the number of DaemonSets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience with DaemonSets, when there are enough nodes (example: hundreds or more) and HA clusters (3 masters) the proxy-server could have runaway memory ballooning (at times like rolling master restarts, when we expect many agent grpc connections). I found the root cause to be: the agent authentication path in proxy-server was getting throttled by client-go and accumulating resources.
I found stability improvements by tuning:
- proxy-agent syncinterval
- proxy-server k8s client throttling (Add flags to allow tuning k8s client throttling. kubernetes-sigs/apiserver-network-proxy#192)
- capping the maximum number of agents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there node startup scripts that this could be placed in? Or is this solving a chicken/egg problem where Antrea network fabric is not known to be selected before the node boots?
@cheftako: GitHub didn't allow me to request PR reviews from the following users: jkh52. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@@ -566,3 +573,18 @@ export WINDOWS_ENABLE_DSR="${WINDOWS_ENABLE_DSR:-false}" | |||
# TLS_CIPHER_SUITES defines cipher suites allowed to be used by kube-apiserver. | |||
# If this variable is unset or empty, kube-apiserver will allow its default set of cipher suites. | |||
export TLS_CIPHER_SUITES="" | |||
|
|||
# Optional: URL to download antrea-cni.exe for Windows node | |||
export WINDOWS_ANTREA_CNI_BINARY_URL="${WINDOWS_ANTREA_CNI_BINARY_URL:-https://github.com/vmware-tanzu/antrea/releases/download/v0.13.1/antrea-cni-windows-x86_64.exe}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want CNI and Agent at the same version? If so it may make sense to use a common version or URL base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely, but I am not assuming that. It could be different .gke-x
versions in the future, so I would vote keep the flexibility there. Also, the URL from github.com and gcr.io might be using different patterns.
# Download ovs installer | ||
$tmp_dir = 'C:\k8s_tmp' | ||
New-Item -Force -ItemType 'directory' $tmp_dir | Out-Null | ||
$filename = 'Install-OVS.ps1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work for GKE with Google-signed OVS.
This script only works for self-signed OVS.
We have to change the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the latest version of the script, we added a new command-line option, ImportCertificate
(https://github.com/vmware-tanzu/antrea/blob/002379983cbf685ffb1623cfb953d4443b18ef14/hack/windows/Install-OVS.ps1#L21). You can set it to False
when you bring your own signed binaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Sure, that could be a good solution, but even then, my other comment (line 483) still applies: input arguments must change.
- The other issue we have is that this file is not the right place to become so specific about the network policy component.
Here we have to only decide if we do or don't want to install a nwk-policy component, and if yes, we have to delegate the control to a more component-specific file (which would be Antrea installer in this case) to take care of the installation for us. If we directly install Antrea here, over time as more and more network components emerge this file will get polluted and messy with all the new installation calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll keep it as is. @mmottaghi can you add a patch to support officially signed ovs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
fyi @jayunit100 |
/cc @jayunit100 |
Don't we already run network policy tests with kube-up on linux with calico? cc @aojea |
yes we do, https://testgrid.k8s.io/sig-network-gce#network-policies,%20google-gce , it is Calico because the code was there since the beginning and we needed a way to test the new testing implemented in the e2e framework for network policies I thought this was only for windows, I didn't know the goal was to set this for linux too ... I think we shouldn't build a CNI testing matrix in k/k ... there are more than 20 CNI plugins in the web https://kubernetes.io/docs/concepts/cluster-administration/networking/ , what happen if a new plugin wants to add its job too? |
also, this has 26 fields in v1alpha1, is this stable enough to gate code in k/k? |
antrea recently reached 1.0.0 milestone. @antoninbas Is there a plan to promote the api to |
Original purpose is to support windows. There was a comment to enable that on Linux nodes as well, because Linux nodes are required, and it's strange to leave it behind. I think it makes sense, so added both. |
We do intend to promote the APIs to Beta and eventually GA, but we make a decision for each CRD separately for each new Antrea release. The Antrea policy API is still going through changes, even though we consider the implementation itself stable. Note that these Antrea-specific APIs are of course not exercised by any upstream test… |
/hold |
Let me suggest a different direction, why don't you use the new repo that the network policy folks are going to create? You can use prow from there and we don't keep growing the tech debt in k/k ,
Also, we avoid this ^^^, Kubernetes network policy is the same as Ingress, it only defines the API not the implementation |
I've removed the v1alpha1 APIs and changed the config to disable other features that relies on the alpha apis. |
* Linux Support Use the following environment to enable antrea on linux for kube-up: ``` export NETWORK_POLICY_PROVIDER=antrea ``` Currently, antrea is not supported on cos yet, so use ubuntu instead. For example, the following environments: ``` export KUBE_NODE_OS_DISTRIBUTION=ubuntu export KUBE_GCE_NODE_IMAGE=ubuntu-2004-focal-v20200423 export KUBE_GCE_NODE_PROJECT=ubuntu-os-cloud ``` * Windows support Use the following environment to enable antrea on windows for kube-up: ``` export WINDOWS_NETWORK_POLICY_PROVIDER=antrea ``` This commit also allows to specify which windows image to use in the cluster by ``` export WINDOWS_NODE_IMAGE_PROJECT=<project_name> export WINDOWS_NODE_IMAGE=<image_name> ```
These CRDs are not required for basic functionality, so remove them.
@anfernee: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Can someone take another look at this :) |
repeating again, please feel to experiment in this PR. but we will not merge this as mentioned several times earlier in this PR. |
Hey @dims this is already the smallest change to support antrea. What do you suggest to make this mergeable? |
hostNetwork: true | ||
containers: | ||
- name: node-init | ||
image: gcr.io/google-containers/startup-script:v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use v2? v1 was built in 2016.
@@ -0,0 +1,208 @@ | |||
apiVersion: apps/v1 | |||
kind: DaemonSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there node startup scripts that this could be placed in? Or is this solving a chicken/egg problem where Antrea network fabric is not known to be selected before the node boots?
valueFrom: | ||
fieldRef: | ||
fieldPath: spec.nodeName | ||
image: projects.registry.vmware.com/antrea/antrea-ubuntu:v1.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these images be hosted from a canonical Kubernetes repository?
@@ -167,7 +167,7 @@ export ENABLE_DOCKER_REGISTRY_CACHE=true | |||
|
|||
# Optional: Deploy a L7 loadbalancer controller to fulfill Ingress requests: | |||
# glbc - CE L7 Load Balancer Controller | |||
export ENABLE_L7_LOADBALANCING="${KUBE_ENABLE_L7_LOADBALANCING:-glbc}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad merge? Why is this getting deleted? Or is the comment above meant to be preserved?
@@ -83,17 +83,20 @@ function set-linux-node-image() { | |||
# Requires: | |||
# WINDOWS_NODE_OS_DISTRIBUTION | |||
# Sets: | |||
# WINDOWS_NODE_IMAGE_PROJECT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the set-windows-node-image function being changed? I wouldn't expect that to be updated.
@@ -117,6 +117,25 @@ write_files: | |||
[Install] | |||
WantedBy=multi-user.target | |||
|
|||
- path: /lib/systemd/system/google-network-daemon.service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will google-network-daemon be installed as a systemd service on all COS/GCI images going forward even when Antrea isn't installed or is there a flag somewhere that ensures this is only active when Antrea is enabled.
AntreaPolicy: false | ||
Traceflow: false | ||
clientConnection: | ||
kubeconfig: \etc\kubernetes\antrea.kubeconfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefix with C:
or use /
delimiters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeremyje \etc\kubernetes\antrea.kubeconfig
works.
This is the way the Antrea team uses it.
@@ -966,6 +1140,9 @@ function Prepare-CniNetworking { | |||
Install_Cni_Binaries | |||
Configure_Dockerd_CniNetworking | |||
} | |||
if (${env:WINDOWS_NETWORK_POLICY_PROVIDER} -eq "antrea") { | |||
Configure_Antrea_CniNetworking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this tested with containerd as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried linux version as well, but it failed to start on COS images. I don't know why yet (created antrea-io/antrea#2021). The change will help us testing antrea-windows compatibility, which could be used to build a verification pipeline for antrea. It would be very helpful for that task if this can be merged separately.
In my opinion, new addons and callers should not be added to this directory. Every new thing added is one more obstacle to cleaning up it and removing it.
- apiGroups: | ||
- core.antrea.tanzu.vmware.com | ||
resources: | ||
- clustergroups | ||
verbs: | ||
- get | ||
- list | ||
- watch | ||
- create | ||
- update | ||
- patch | ||
- delete | ||
- apiGroups: | ||
- crd.antrea.io | ||
resources: | ||
- clustergroups | ||
verbs: | ||
- get | ||
- list | ||
- watch | ||
- create | ||
- update | ||
- patch | ||
- delete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this addon doing with these permissions (and all the CRUD permissions below). this seems like a very wide set of permissions
@anfernee: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There is ongoing discussion where this change should land, so mark it as draft. |
@ALL It looks like this |
Use the following environment to enable antrea on linux for kube-up:
Currently, antrea is not supported on cos yet, so use ubuntu instead.
For example, the following environments:
Use the following environment to enable antrea on windows for kube-up:
This commit also allows to specify which windows image to use in the
cluster by
What type of PR is this?
/kind feature
What this PR does / why we need it:
The PR allows to create antrea-enabled windows nodes with
kube-up.sh
. It only allows to create windows nodes now, because there is some problem running antrea on COS based images.Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
n/a