Add option to set FELIX_IGNORELOOSERPF for Calico #10441

Closed
mosheavni opened this issue Dec 17, 2020 · 2 comments · Fixed by #10442

mosheavni commented Dec 17, 2020

1. What kops version are you running? The command kops version will display
this information.

Version 1.18.2

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-13T18:42:23Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:09:48Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
AWS

4-6. I updated my cluster's instance groups (nodes + masters) to use this image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201112.1, which ships with the sysctl net.ipv4.conf.all.rp_filter set to 2.
This causes calico-node to fail to start and enter a CrashLoopBackOff with the error:

2020-12-17 09:21:27.382 [WARNING][501] int_dataplane.go 357: Failed to query VXLAN device error=Link not found
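
To confirm the image default, the value can be read directly on an affected node. A minimal check, assuming SSH access to one of the Ubuntu Focal nodes:

# expected to print 2 with this image
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.default.rp_filter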

According to this: kubernetes-sigs/kind#891 (comment)
Setting the env var FELIX_IGNORELOOSERPF=true on the calico-node DaemonSet resolves the issue (see the commands below); however, the change is not persisted.
Is there any way to tell Calico to start with this env var? Is there a different, saner solution?
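
A sketch of the non-persistent workaround mentioned above, assuming Calico was deployed by kops as the calico-node DaemonSet in kube-system (the change is lost the next time the addon manifest is re-applied):

# set the Felix env var on the running DaemonSet
kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true
# wait for the updated pods to roll out
kubectl -n kube-system rollout status daemonset/calico-node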

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: clustername-us-east-1.k8s.local
spec:
  api:
    loadBalancer:
      type: Internal
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    datadog: "true"
    k8s.io/cluster-autoscaler/enabled: "true"
    k8s.io/cluster-autoscaler/clustername-us-east-1.k8s.local: "true"
  cloudProvider: aws
  configBase: s3://state-store/clustername-us-east-1.k8s.local
  containerRuntime: docker
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1c
      name: c
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1b
      name: b
    - instanceGroup: master-us-east-1c
      name: c
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeDNS:
    nodeLocalDNS:
      enabled: true
    provider: CoreDNS
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
  - 192.168.0.0/16
  - 10.1.0.0/16
  kubernetesVersion: v1.15.12
  masterInternalName: api.internal.clustername-us-east-1.k8s.local
  masterPublicName: api.clustername-us-east-1.k8s.local
  networkCIDR: 10.28.0.0/16
  networkID: vpc-redacted
  networking:
    calico:
      majorVersion: v3
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 192.168.0.0/16
  subnets:
  - cidr: 10.28.160.0/19
    name: us-east-1a
    type: Private
    zone: us-east-1a
  - cidr: 10.28.192.0/19
    name: us-east-1b
    type: Private
    zone: us-east-1b
  - cidr: 10.28.224.0/19
    name: us-east-1c
    type: Private
    zone: us-east-1c
  - cidr: 10.28.12.0/22
    name: utility-us-east-1a
    type: Utility
    zone: us-east-1a
  - cidr: 10.28.16.0/22
    name: utility-us-east-1b
    type: Utility
    zone: us-east-1b
  - cidr: 10.28.20.0/22
    name: utility-us-east-1c
    type: Utility
    zone: us-east-1c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-12-15T09:55:50Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: clustername-us-east-1.k8s.local
  name: master-us-east-1a
spec:
  cloudLabels:
    NI-Category: EC2
    NI-CostType: RI
    NI-Environment: Production
    NI-GroupName: Platform
    NI-Name: Production-k8s-master-RI
    NI-ServiceName: k8s-master
    NI-System: xSite
    datadog: "true"
    datadog_aws: "true"
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201112.1
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1a
  role: Master
  subnets:
  - us-east-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-12-15T09:55:50Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: clustername-us-east-1.k8s.local
  name: master-us-east-1b
spec:
  cloudLabels:
    NI-Category: EC2
    NI-CostType: RI
    NI-Environment: Production
    NI-GroupName: Platform
    NI-Name: Production-k8s-master-RI
    NI-ServiceName: k8s-master
    NI-System: xSite
    datadog: "true"
    datadog_aws: "true"
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201112.1
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1b
  role: Master
  subnets:
  - us-east-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-12-15T09:55:50Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: clustername-us-east-1.k8s.local
  name: master-us-east-1c
spec:
  cloudLabels:
    NI-Category: EC2
    NI-CostType: RI
    NI-Environment: Production
    NI-GroupName: Platform
    NI-Name: Production-k8s-master-RI
    NI-ServiceName: k8s-master
    NI-System: xSite
    datadog: "true"
    datadog_aws: "true"
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201112.1
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1c
  role: Master
  subnets:
  - us-east-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-12-15T10:04:04Z"
  generation: 4
  labels:
    kops.k8s.io/cluster: clustername-us-east-1.k8s.local
  name: production-rts
spec:
  cloudLabels:
    Category: EC2
    CostType: RI
    Environment: Production
    GroupName: Platform
    Name: Production-k8s-worker-rts-RI
    ServiceName: k8s-worker-rts
    System: xSite
    datadog: "true"
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/clustername-us-east-1.k8s.local: ""
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201112.1
  machineType: m5.xlarge
  maxSize: 20
  minSize: 20
  nodeLabels:
    kops.k8s.io/instancegroup: production-rts
  role: Node
  subnets:
  - us-east-1a
  - us-east-1b
  - us-east-1c
mosheavni changed the title from "Need to set env var for calico networking" to "Need to set env var for calico DaemonSet" on Dec 17, 2020

hakman commented Dec 17, 2020

@MosheM123 I am familiar with the issue and will try to add this new option to kOps 1.19, which will be released soon.

hakman changed the title from "Need to set env var for calico DaemonSet" to "Add option to set FELIX_IGNORELOOSERPF for Calico" on Dec 17, 2020

hakman commented Dec 17, 2020

/assign
