Support antrea as network policy provider in kube-up #100736

Closed
wants to merge 2 commits into from
1,616 changes: 1,616 additions & 0 deletions cluster/addons/antrea/common.yaml

Large diffs are not rendered by default.

262 changes: 262 additions & 0 deletions cluster/addons/antrea/linux/daemonset.yaml
@@ -0,0 +1,262 @@
kind: DaemonSet
apiVersion: apps/v1
metadata:
  labels:
    app: antrea
    addonmanager.kubernetes.io/mode: Reconcile
    component: antrea-node-init
  name: antrea-node-init
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: antrea
      component: antrea-node-init
  template:
    metadata:
      labels:
        app: antrea
        component: antrea-node-init
    spec:
      nodeSelector:
        antrea.io/ds-ready: "true"
        kubernetes.io/os: linux
      hostPID: true
      hostNetwork: true
      containers:
        - name: node-init
          image: gcr.io/google-containers/startup-script:v1
Contributor

Use v2? v1 was built in 2016.

          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /etc/default/
              name: host-etc-default
          env:
            - name: STARTUP_SCRIPT
              value: |
                #! /bin/bash
                set -o errexit
                set -o pipefail
                set -o nounset

                echo "Node initialization start"
                sed 's/^ip_aliases = .*/ip_aliases = false/g' -i /etc/default/instance_configs.cfg

                # Kill google_network_daemon; systemd on the host will then restart it.
                killall google_network_daemon

                echo "Node initialization complete"
      volumes:
        - hostPath:
            path: /etc/default
          name: host-etc-default
---
apiVersion: apps/v1
kind: DaemonSet
Member

/cc @jkh52
There are concerns with the total number of DaemonSets we are running.

Member

Also the commit references windows but I notice this says linux in the path.

Member Author

I repurposed this to support linux as well. @jkh52 what is the concern about the number of DaemonSets?

Contributor

@jkh52 jkh52 Apr 28, 2021

In my experience with DaemonSets, when there are enough nodes (for example, hundreds or more) in an HA cluster (3 masters), the proxy-server can suffer runaway memory growth, especially at times like rolling master restarts, when we expect many agent gRPC connections. I found the root cause to be that the agent authentication path in proxy-server was being throttled by client-go and accumulating resources.

I found stability improvements by tuning:

Contributor

Are there node startup scripts that this could be placed in? Or is this solving a chicken/egg problem where Antrea network fabric is not known to be selected before the node boots?

metadata:
  labels:
    app: antrea
    addonmanager.kubernetes.io/mode: Reconcile
    component: antrea-agent
  name: antrea-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: antrea
      component: antrea-agent
  template:
    metadata:
      labels:
        app: antrea
        component: antrea-agent
    spec:
      containers:
        - args:
            - --config
            - /etc/antrea/antrea-agent.conf
            - --logtostderr=false
            - --log_dir=/var/log/antrea
            - --alsologtostderr
            - --log_file_max_size=100
            - --log_file_max_num=4
            - --v=0
          command:
            - antrea-agent
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          image: projects.registry.vmware.com/antrea/antrea-ubuntu:v1.0.0
Contributor

Should these images be hosted from a canonical Kubernetes repository?

          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - container_liveness_probe agent
            failureThreshold: 5
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
          name: antrea-agent
          ports:
            - containerPort: 10350
              name: api
              protocol: TCP
          readinessProbe:
            failureThreshold: 5
            httpGet:
              host: localhost
              path: /readyz
              port: api
              scheme: HTTPS
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
          resources:
            requests:
              cpu: 200m
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /etc/antrea/antrea-agent.conf
              name: antrea-config
              readOnly: true
              subPath: antrea-agent.conf
            - mountPath: /var/run/antrea
              name: host-var-run-antrea
            - mountPath: /var/run/openvswitch
              name: host-var-run-antrea
              subPath: openvswitch
            - mountPath: /var/lib/cni
              name: host-var-run-antrea
              subPath: cni
            - mountPath: /var/log/antrea
              name: host-var-log-antrea
            - mountPath: /host/proc
              name: host-proc
              readOnly: true
            - mountPath: /host/var/run/netns
              mountPropagation: HostToContainer
              name: host-var-run-netns
              readOnly: true
            - mountPath: /run/xtables.lock
              name: xtables-lock
        - args:
            - --log_file_max_size=100
            - --log_file_max_num=4
          command:
            - start_ovs
          image: projects.registry.vmware.com/antrea/antrea-ubuntu:v1.0.0
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - timeout 10 container_liveness_probe ovs
            failureThreshold: 5
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 10
          name: antrea-ovs
          resources:
            requests:
              cpu: 200m
          securityContext:
            capabilities:
              add:
                - SYS_NICE
                - NET_ADMIN
                - SYS_ADMIN
                - IPC_LOCK
          volumeMounts:
            - mountPath: /var/run/openvswitch
              name: host-var-run-antrea
              subPath: openvswitch
            - mountPath: /var/log/openvswitch
              name: host-var-log-antrea
              subPath: openvswitch
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      initContainers:
        - command:
            - install_cni
          image: projects.registry.vmware.com/antrea/antrea-ubuntu:v1.0.0
          name: install-cni
          resources:
            requests:
              cpu: 100m
          securityContext:
            capabilities:
              add:
                - SYS_MODULE
          volumeMounts:
            - mountPath: /etc/antrea/antrea-cni.conflist
              name: antrea-config
              readOnly: true
              subPath: antrea-cni.conflist
            - mountPath: /host/etc/cni/net.d
              name: host-cni-conf
            - mountPath: /host/opt/cni/bin
              name: host-cni-bin
            - mountPath: /lib/modules
              name: host-lib-modules
              readOnly: true
            - mountPath: /var/run/antrea
              name: host-var-run-antrea
      nodeSelector:
        antrea.io/ds-ready: "true"
        kubernetes.io/os: linux
      priorityClassName: system-node-critical
      serviceAccountName: antrea-agent
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoSchedule
          operator: Exists
        - effect: NoExecute
          operator: Exists
      volumes:
        - configMap:
            name: antrea-config-5ct9ktdt77
          name: antrea-config
        - hostPath:
            path: /etc/cni/net.d
          name: host-cni-conf
        - hostPath:
            path: /home/kubernetes/bin
          name: host-cni-bin
        - hostPath:
            path: /proc
          name: host-proc
        - hostPath:
            path: /var/run/netns
          name: host-var-run-netns
        - hostPath:
            path: /var/run/antrea
            type: DirectoryOrCreate
          name: host-var-run-antrea
        - hostPath:
            path: /var/log/antrea
            type: DirectoryOrCreate
          name: host-var-log-antrea
        - hostPath:
            path: /lib/modules
          name: host-lib-modules
        - hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
          name: xtables-lock
  updateStrategy:
    type: RollingUpdate
---
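The node-init startup script in daemonset.yaml rewrites `ip_aliases` in the host's instance_configs.cfg and then restarts the network daemon. The substitution itself can be tried in isolation. This is a minimal sketch that runs against a temporary file; the sample file contents are an assumption for illustration and not a copy of a real GCE config.

```shell
# Sketch only: operate on a throwaway file, never on /etc/default.
cfg="$(mktemp)"

# Assumed sample contents; a real instance_configs.cfg has more sections and keys.
cat > "${cfg}" <<'EOF'
[IpForwarding]
ip_aliases = true
target_instance_ips = true
EOF

# Same sed expression as the STARTUP_SCRIPT. The DaemonSet edits in place with
# GNU sed's -i; this sketch writes to a new file so it also runs where -i differs.
sed 's/^ip_aliases = .*/ip_aliases = false/g' "${cfg}" > "${cfg}.new"
mv "${cfg}.new" "${cfg}"

cat "${cfg}"
```

Only lines that start with `ip_aliases = ` are rewritten; the other keys pass through unchanged.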
4 changes: 4 additions & 0 deletions cluster/gce/config-common.sh
@@ -163,3 +163,7 @@ export WINDOWS_INFRA_CONTAINER="k8s.gcr.io/pause:3.4.1"
export CSI_PROXY_STORAGE_PATH="https://storage.googleapis.com/gke-release/csi-proxy"
# Version for csi-proxy
export CSI_PROXY_VERSION="v0.2.2-gke.0"
# Path for antrea kubeconfig file on Windows nodes.
export WINDOWS_ANTREA_KUBECONFIG_FILE="${WINDOWS_K8S_DIR}\antrea.kubeconfig"
# Path for antrea config file on Windows nodes
export WINDOWS_ANTREA_CONFIG_FILE="${WINDOWS_K8S_DIR}\antrea.conf"
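The two new paths join `WINDOWS_K8S_DIR` and a file name with a bare backslash inside double quotes. A short sketch of how the shell treats that backslash follows; the value given to `WINDOWS_K8S_DIR` here is an assumption for illustration, since its definition is outside this hunk.

```shell
# Assumed value for illustration; the real WINDOWS_K8S_DIR is defined elsewhere.
WINDOWS_K8S_DIR='C:\etc\kubernetes'

# Inside double quotes a backslash stays literal unless it precedes $, `, ", \ or
# a newline, so "\a" below remains a backslash followed by "a".
WINDOWS_ANTREA_KUBECONFIG_FILE="${WINDOWS_K8S_DIR}\antrea.kubeconfig"

# printf '%s' prints the value verbatim; some echo implementations would
# interpret "\a" as an escape sequence instead.
printf '%s\n' "${WINDOWS_ANTREA_KUBECONFIG_FILE}"
# prints C:\etc\kubernetes\antrea.kubeconfig
```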
33 changes: 29 additions & 4 deletions cluster/gce/config-default.sh
@@ -167,7 +167,7 @@ export ENABLE_DOCKER_REGISTRY_CACHE=true

# Optional: Deploy a L7 loadbalancer controller to fulfill Ingress requests:
# glbc - CE L7 Load Balancer Controller
export ENABLE_L7_LOADBALANCING="${KUBE_ENABLE_L7_LOADBALANCING:-glbc}"
Contributor

Bad merge? Why is this getting deleted? Or is the comment above meant to be preserved?



# Optional: Enable Metrics Server. Metrics Server should be enabled everywhere,
# since it's a critical component, but in the first release we need a way to disable
@@ -198,6 +198,17 @@ export MASTER_NODE_LABELS="${KUBE_MASTER_NODE_LABELS:-}"
NON_MASTER_NODE_LABELS="${KUBE_NON_MASTER_NODE_LABELS:-}"
WINDOWS_NON_MASTER_NODE_LABELS="${WINDOWS_NON_MASTER_NODE_LABELS:-}"

# Network Policy plugin specific settings for Linux.
# none - No network policy plugin installed on Linux
# calico - Install calico on Linux nodes to provide network policy support
# antrea - Install antrea on Linux nodes to provide network policy support
NETWORK_POLICY_PROVIDER="${NETWORK_POLICY_PROVIDER:-none}"

# Network Policy plugin specific settings for Windows.
# none - No network policy plugin installed on Windows
# antrea - Install Antrea on Windows nodes to provide network policy support
export WINDOWS_NETWORK_POLICY_PROVIDER="${WINDOWS_NETWORK_POLICY_PROVIDER:-none}"

if [[ "${PREEMPTIBLE_MASTER}" == "true" ]]; then
  NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-preemptible=true"
  WINDOWS_NODE_LABELS="${WINDOWS_NODE_LABELS},cloud.google.com/gke-preemptible=true"
@@ -211,6 +222,8 @@ fi
# Windows nodes do not support Calico.
if [[ ${NETWORK_POLICY_PROVIDER:-} == "calico" ]]; then
  NON_MASTER_NODE_LABELS="${NON_MASTER_NODE_LABELS:+${NON_MASTER_NODE_LABELS},}projectcalico.org/ds-ready=true"
elif [[ ${NETWORK_POLICY_PROVIDER:-} == 'antrea' ]] || [[ "${WINDOWS_NETWORK_POLICY_PROVIDER:-}" == "antrea" ]]; then
  NON_MASTER_NODE_LABELS="${NON_MASTER_NODE_LABELS:+${NON_MASTER_NODE_LABELS},}antrea.io/ds-ready=true"
fi
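
The label assignments above rely on the `${VAR:+${VAR},}` expansion to add a comma only when the list already has entries. A minimal sketch of that idiom, using a hypothetical helper `append_label` that is not part of the PR:

```shell
# Hypothetical helper for illustration: append one label to a comma-separated list.
append_label() {
  local labels="$1"
  local new_label="$2"
  # ${labels:+${labels},} expands to "<labels>," only when labels is non-empty,
  # so an empty list does not gain a leading comma.
  printf '%s\n' "${labels:+${labels},}${new_label}"
}

append_label "" "antrea.io/ds-ready=true"
# prints antrea.io/ds-ready=true
append_label "env=test" "antrea.io/ds-ready=true"
# prints env=test,antrea.io/ds-ready=true
```

The `antrea.io/ds-ready=true` label produced this way is what the `nodeSelector` in both DaemonSets matches on.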

# Optional: Enable netd.
@@ -402,9 +415,6 @@ STORAGE_BACKEND=${STORAGE_BACKEND:-}
# Networking plugin specific settings.
NETWORK_PROVIDER="${NETWORK_PROVIDER:-kubenet}" # none, kubenet

# Network Policy plugin specific settings.
NETWORK_POLICY_PROVIDER="${NETWORK_POLICY_PROVIDER:-none}" # calico

export NON_MASQUERADE_CIDR="0.0.0.0/0"

# How should the kubelet configure hairpin mode?
@@ -566,3 +576,18 @@ export WINDOWS_ENABLE_DSR="${WINDOWS_ENABLE_DSR:-false}"
# TLS_CIPHER_SUITES defines cipher suites allowed to be used by kube-apiserver.
# If this variable is unset or empty, kube-apiserver will allow its default set of cipher suites.
export TLS_CIPHER_SUITES=""

# Optional: URL to download antrea-cni.exe for Windows node
export WINDOWS_ANTREA_CNI_BINARY_URL="${WINDOWS_ANTREA_CNI_BINARY_URL:-https://github.com/vmware-tanzu/antrea/releases/download/v0.13.1/antrea-cni-windows-x86_64.exe}"
Member

Do we want CNI and Agent at the same version? If so it may make sense to use a common version or URL base.

Member Author

@anfernee anfernee Apr 27, 2021

Most likely, but I am not assuming that. There could be different .gke-x versions in the future, so I would vote to keep the flexibility there. Also, the URLs from github.com and gcr.io might use different patterns.


# Optional: URL to download antrea-agent.exe for Windows node
export WINDOWS_ANTREA_AGENT_BINARY_URL="${WINDOWS_ANTREA_AGENT_BINARY_URL:-https://github.com/vmware-tanzu/antrea/releases/download/v0.13.1/antrea-agent-windows-x86_64.exe}"

# Optional: URL to a script that downloads and installs OVS for Windows node
export WINDOWS_OVS_INSTALLER_URL="${WINDOWS_OVS_INSTALLER_URL:-https://raw.githubusercontent.com/vmware-tanzu/antrea/v0.13.1/hack/windows/Install-OVS.ps1}"

# Optional: Image project for Windows node
WINDOWS_NODE_IMAGE_PROJECT=${WINDOWS_NODE_IMAGE_PROJECT:-windows-cloud}

# Optional: Image name for Windows node
WINDOWS_NODE_IMAGE=${WINDOWS_NODE_IMAGE:-}
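
The review thread on the download URLs weighs a shared version against per-URL flexibility. One possible way to get both is sketched below, under stated assumptions: `WINDOWS_ANTREA_VERSION` and `antrea_release_base` are hypothetical names that do not appear in the PR, and the agent asset name is assumed from the naming pattern of the CNI asset.

```shell
# Hypothetical variable, not in the PR: one version shared by the default URLs.
WINDOWS_ANTREA_VERSION="${WINDOWS_ANTREA_VERSION:-v0.13.1}"

# Hypothetical base URL, built from the release path used in the PR's defaults.
antrea_release_base="https://github.com/vmware-tanzu/antrea/releases/download/${WINDOWS_ANTREA_VERSION}"

# Each URL keeps its own override, so the CNI and agent can still diverge.
WINDOWS_ANTREA_CNI_BINARY_URL="${WINDOWS_ANTREA_CNI_BINARY_URL:-${antrea_release_base}/antrea-cni-windows-x86_64.exe}"
# Assumption: the agent asset follows the same naming pattern as the CNI asset.
WINDOWS_ANTREA_AGENT_BINARY_URL="${WINDOWS_ANTREA_AGENT_BINARY_URL:-${antrea_release_base}/antrea-agent-windows-x86_64.exe}"

printf '%s\n' "${WINDOWS_ANTREA_CNI_BINARY_URL}" "${WINDOWS_ANTREA_AGENT_BINARY_URL}"
```

Setting only `WINDOWS_ANTREA_VERSION` would move both defaults together, while setting either URL variable explicitly would still win over the derived value.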
3 changes: 3 additions & 0 deletions cluster/gce/config-test.sh
@@ -274,6 +274,9 @@ export ENABLE_NODELOCAL_DNS=${KUBE_ENABLE_NODELOCAL_DNS:-false}
# Windows nodes do not support Calico.
if [[ ${NETWORK_POLICY_PROVIDER:-} = 'calico' ]]; then
  NON_MASTER_NODE_LABELS="${NON_MASTER_NODE_LABELS:+${NON_MASTER_NODE_LABELS},}projectcalico.org/ds-ready=true"
elif [[ ${NETWORK_POLICY_PROVIDER:-} = 'antrea' ]]; then
  # antrea-
  NON_MASTER_NODE_LABELS="${NON_MASTER_NODE_LABELS:+${NON_MASTER_NODE_LABELS},}antrea.io/ds-ready=true"
fi

# Enable metadata concealment by firewalling pod traffic to the metadata server
14 changes: 14 additions & 0 deletions cluster/gce/gci/configure-helper.sh
@@ -760,6 +760,9 @@ function create-master-auth {
  if [[ -n "${KUBE_PROXY_TOKEN:-}" ]]; then
    append_or_replace_prefixed_line "${known_tokens_csv}" "${KUBE_PROXY_TOKEN}," "system:kube-proxy,uid:kube_proxy"
  fi
  if [[ -n "${ANTREA_TOKEN:-}" ]]; then
    append_or_replace_prefixed_line "${known_tokens_csv}" "${ANTREA_TOKEN}," "system:antrea,uid:antrea"
  fi
  if [[ -n "${NODE_PROBLEM_DETECTOR_TOKEN:-}" ]]; then
    append_or_replace_prefixed_line "${known_tokens_csv}" "${NODE_PROBLEM_DETECTOR_TOKEN}," "system:node-problem-detector,uid:node-problem-detector"
  fi
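
The new `ANTREA_TOKEN` branch adds a `token,user,uid` row to the static token file. The sketch below shows the effect on a temporary file. `append_or_replace_prefixed_line_sketch` is a simplified stand-in written for this example; the real helper lives in configure-helper.sh and may differ in detail, and the token value is a placeholder.

```shell
# Simplified stand-in for append_or_replace_prefixed_line (assumption, see above).
append_or_replace_prefixed_line_sketch() {
  local file="$1"
  local prefix="$2"
  local suffix="$3"
  local tmpfile
  tmpfile="$(mktemp)"
  touch "${file}"
  # Keep every line that does not start with the prefix, then append the new row.
  awk -v pfx="${prefix}" 'substr($0, 1, length(pfx)) != pfx { print }' "${file}" > "${tmpfile}"
  printf '%s%s\n' "${prefix}" "${suffix}" >> "${tmpfile}"
  mv "${tmpfile}" "${file}"
}

known_tokens_csv="$(mktemp)"
ANTREA_TOKEN="example-token"  # placeholder; real tokens are generated elsewhere

# Calling it twice leaves a single row, because the first row is replaced.
append_or_replace_prefixed_line_sketch "${known_tokens_csv}" "${ANTREA_TOKEN}," "system:antrea,uid:antrea"
append_or_replace_prefixed_line_sketch "${known_tokens_csv}" "${ANTREA_TOKEN}," "system:antrea,uid:antrea"

cat "${known_tokens_csv}"
# prints example-token,system:antrea,uid:antrea
```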
@@ -2772,6 +2775,7 @@ EOF
    setup-addon-manifests "admission-controls" "limit-range" "gce"
  fi
  setup-addon-manifests "addons" "admission-resource-quota-critical-pods"

  if [[ "${NETWORK_POLICY_PROVIDER:-}" == "calico" ]]; then
    setup-addon-manifests "addons" "calico-policy-controller"

@@ -2781,7 +2785,17 @@ EOF
    # Configure Calico CNI directory.
    local -r ds_file="${dst_dir}/calico-policy-controller/calico-node-daemonset.yaml"
    sed -i -e "s@__CALICO_CNI_DIR__@/home/kubernetes/bin@g" "${ds_file}"

  fi

  if [[ "${WINDOWS_NETWORK_POLICY_PROVIDER:-}" == "antrea" || "${NETWORK_POLICY_PROVIDER:-}" == "antrea" ]]; then
    setup-addon-manifests "addons" "antrea"
  fi

  if [[ "${NETWORK_POLICY_PROVIDER:-}" == "antrea" ]]; then
    setup-addon-manifests "addons" "antrea/linux"
  fi

  if [[ "${ENABLE_DEFAULT_STORAGE_CLASS:-}" == "true" ]]; then
    storage-class/gce
  fi
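
The two Antrea conditionals above decide which addon directories are installed for each combination of Linux and Windows providers. A small sketch of that decision follows; `antrea_addon_dirs` is a hypothetical helper written for this example, and it only prints directory names rather than calling `setup-addon-manifests`.

```shell
# Hypothetical helper: print the addon directories the two conditionals select.
# $1 = NETWORK_POLICY_PROVIDER, $2 = WINDOWS_NETWORK_POLICY_PROVIDER
antrea_addon_dirs() {
  local linux_provider="${1:-none}"
  local windows_provider="${2:-none}"
  local dirs=""
  # The common manifests are needed if either OS uses Antrea.
  if [ "${windows_provider}" = "antrea" ] || [ "${linux_provider}" = "antrea" ]; then
    dirs="antrea"
  fi
  # The Linux DaemonSets are added only when Linux nodes use Antrea.
  if [ "${linux_provider}" = "antrea" ]; then
    dirs="${dirs} antrea/linux"
  fi
  printf '%s\n' "${dirs}"
}

antrea_addon_dirs antrea none   # prints antrea antrea/linux
antrea_addon_dirs none antrea   # prints antrea
antrea_addon_dirs calico none   # prints an empty line
```

With Antrea selected only for Windows, the common manifests are installed without the Linux DaemonSets.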