Problem with deploying KIP on AWS EKS cluster #169

Closed
ghost opened this issue Aug 27, 2020 · 5 comments · Fixed by #173
Labels: bug (Something isn't working)

ghost commented Aug 27, 2020

Hello. I have a problem running KIP on an AWS EKS cluster. First I created the EKS cluster following this guide: https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html. I tested the cluster and I can run deployments. Next, I cloned the KIP repo and added credentials (accessKeyID and secretAccessKey) to the kip/base/provider.yaml file. Then I ran "kustomize build base/ | kubectl apply -f -" and the kip container fails. Do you know what is going on, or do you have more detailed instructions? Maybe I forgot something. I also tried adding the minimum IAM permissions to the role created for EKS. I tried the Minikube version as well and it worked perfectly.
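
For reference, the credentials section I added looks roughly like this, with real values replaced by placeholders (the exact cloud/aws nesting here is my own sketch; only the two key names come from the file itself):

cloud:
  aws:
    region: us-east-1
    accessKeyID: "<access key id>"
    secretAccessKey: "<secret access key>"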

Log output:

F0825 08:20:15.743785       1 main.go:133] error initializing provider kip: error configuring cloud client: Error setting up cloud client: Could not configure AWS cloud client authorization: Error validationg connection to AWS: AuthFailure: AWS was not able to validate the provided access credentials
        status code: 401, request id: 7205bc35-2b6a-48ff-baeb-0eddfd2d4824

Description of the pod with the failing container:

Name:                      kip-provider-0
Namespace:                 kube-system
Priority:                  0
Node:                      ip-192-168-1-242.ec2.internal/192.168.1.242
Start Time:                Wed, 26 Aug 2020 09:23:09 +0000
Labels:                    app=kip-provider
                           controller-revision-hash=kip-provider-6d97b44c7
                           statefulset.kubernetes.io/pod-name=kip-provider-0
Annotations:               kubernetes.io/psp: eks.privileged
Status:                    Terminating (lasts 3h42m)
Termination Grace Period:  30s
IP:                        192.168.1.126
IPs:
  IP:           192.168.1.126
Controlled By:  StatefulSet/kip-provider
Init Containers:
  init-cert:
    Container ID:  docker://2f67aa09fb9204565ef8f8129e43e2179163f4b6e2e9df4c1a33e028c303754e
    Image:         elotl/init-cert:latest
    Image ID:      docker-pullable://elotl/init-cert@sha256:781e404f73ab2e78ba1de2aba9ed569fe9f8fe920c5aeb3cd4143b1eb39facc1
    Port:          <none>
    Host Port:     <none>
    Command:
      bash
      -c
      mkdir -p $(CERT_DIR) && /opt/csr/get-cert.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 26 Aug 2020 09:23:27 +0000
      Finished:     Wed, 26 Aug 2020 09:23:29 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      NODE_NAME:  kip-provider-0 (v1:metadata.name)
      CERT_DIR:   /data/kubelet-pki
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kip-provider-token-n64lj (ro)
Containers:
  kip:
    Container ID:  docker://7adef4f167d4b756b048d3e6b0e20facb66a5ed90043bd30404e39e8bb6009c7
    Image:         elotl/kip:latest
    Image ID:      docker-pullable://elotl/kip@sha256:11508b91c7420e933b935f96d7235ca1d7133d4bd1e1b878935b23d9ab876143
    Port:          <none>
    Host Port:     <none>
    Command:
      /kip
      --provider
      kip
      --provider-config
      /etc/kip/provider.yaml
      --network-agent-secret
      kube-system/kip-network-agent
      --disable-taint
      --klog.logtostderr
      --klog.v=2
      --metrics-addr=:10255
      --nodename=$(VKUBELET_NODE_NAME)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 26 Aug 2020 09:26:27 +0000
      Finished:     Wed, 26 Aug 2020 09:26:29 +0000
    Ready:          False
    Restart Count:  5
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:     10m
      memory:  100Mi
    Environment:
      NODE_NAME:                 (v1:spec.nodeName)
      VKUBELET_NODE_NAME:       kip-provider-0 (v1:metadata.name)
      APISERVER_CERT_LOCATION:  /opt/kip/data/kubelet-pki/$(VKUBELET_NODE_NAME).crt
      APISERVER_KEY_LOCATION:   /opt/kip/data/kubelet-pki/$(VKUBELET_NODE_NAME).key
    Mounts:
      /etc/kip from provider-yaml (rw)
      /lib/modules from lib-modules (ro)
      /opt/kip/data from data (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kip-provider-token-n64lj (ro)
  kube-proxy:
    Container ID:  docker://b1057a53b6ba6e416f7c984082be86a8633fa075ac2c5a9a5812dd4f9acf67f7
    Image:         k8s.gcr.io/kube-proxy:v1.18.3
    Image ID:      docker-pullable://k8s.gcr.io/kube-proxy@sha256:6a093c22e305039b7bd6c3f8eab8f202ad8238066ed210857b25524443aa8aff
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      exec kube-proxy --oom-score-adj=-998 --bind-address=127.0.0.1 --v=2
    State:          Running
      Started:      Wed, 26 Aug 2020 09:23:30 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kip-provider-token-n64lj (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-kip-provider-0
    ReadOnly:   false
  provider-yaml:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kip-config-t56f8td654
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/kip-xtables.lock
    HostPathType:  FileOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  kip-provider-token-n64lj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kip-provider-token-n64lj
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
@myechuri (Contributor) commented:

@241423: thank you for trying kip!

F0825 08:20:15.743785       1 main.go:133] error initializing provider kip: error configuring cloud client: Error setting up cloud client: Could not configure AWS cloud client authorization: Error validationg connection to AWS: AuthFailure: AWS was not able to validate the provided access credentials
        status code: 401, request id: 7205bc35-2b6a-48ff-baeb-0eddfd2d4824

From the log above, it looks like the AWS credentials kip received are invalid. Can you please confirm that the accessKeyID and secretAccessKey filled in the kip/base/provider.yaml file are valid?

Meanwhile, I will reproduce your steps and see if I get the same error. I will update the issue with my findings. Thanks.

@ldx (Contributor) commented Aug 30, 2020

@241423 if you rely on IAM instance profiles for AWS authentication (which is the default if you used kustomize build base | kubectl apply -f -), please make sure the permissions documented at https://github.com/elotl/kip/blob/master/docs/kip-iam-permissions.md are present in the IAM instance profile attached to the worker nodes.

Another common source of issues is a typo in the access keys, as @myechuri pointed out (if you configured access keys via provider.yaml or environment variables), or an inaccurate system clock on the instance where the kip pod lands.
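
A quick way to sanity-check all three from the worker node (or any shell using the same credentials) — this is just standard AWS CLI, nothing KIP-specific:

# verify which credentials/instance profile AWS actually sees
aws sts get-caller-identity
# list the policies attached to the node instance role (substitute your role name)
aws iam list-attached-role-policies --role-name <your-node-instance-role>
# a clock skewed by more than a few minutes breaks AWS request signing
date -u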

@myechuri (Contributor) commented:

Thanks for the pointers, @ldx!

ghost (Author) commented Sep 3, 2020

Hi guys! Thank you for the fast response. The first problem has been solved: I prepared the environment from scratch and KIP started. It may have been my fault, attaching the policies to the wrong node role. Now I have another problem, and it could be related to the source code. After starting KIP, I stopped it with this command:

kubectl delete statefulset kip-provider -n kube-system

I could not stop KIP with the command below, which is the one described on the main Git page:

kubectl delete -n kube-system statefulset kip
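
For reference, the actual name can be seen in the pod description above (Controlled By: StatefulSet/kip-provider) or by listing the statefulsets:

kubectl get statefulsets -n kube-system

In my cluster this shows kip-provider, not kip.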

The next time I wanted to run KIP with the same settings, I hit another problem that I can't resolve. I tried cloning the repo again, using the kip Docker image from the day it worked, and trying an older release. In summary, when I check the kip container logs I see the output below. Have you ever seen this error? I blacked out some info. Maybe it is related to the code; I found this article describing a similar error: https://www.joeshaw.org/understanding-go-panic-output/

2020-08-31 11:54:55.222787 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379]} to cluster xxxxxxxxxxxxxxxx
I0831 11:54:55.222809       1 etcd.go:129] Etcd server is ready to serve requests
I0831 11:54:55.222834       1 server.go:100] validating write access to etcd (will block until we can connect)
I0831 11:54:55.223822       1 server.go:110] write to etcd successful
I0831 11:54:55.223859       1 server.go:221] ControllerID: xxxxxxxxxxxxxxxxxxxxxxxxxxx
I0831 11:54:58.384109       1 aws.go:113] detected AWS region: "xxxxxx"
I0831 11:54:58.384129       1 config.go:228] using AWS region "xxxxx"
I0831 11:54:58.384134       1 config.go:248] Validating connection to AWS
I0831 11:54:58.384203       1 aws.go:122] Checking for credential errors
I0831 11:55:01.488246       1 aws.go:127] Using credentials from xxxxxxxxxxx
I0831 11:55:01.488264       1 aws.go:133] Validating read access
I0831 11:55:01.832908       1 config.go:252] Validated access to AWS
I0831 11:55:05.100324       1 network.go:147] Current vpc:  vpc-xxxxxxxxxxxxxxxxx
I0831 11:55:05.101430       1 network.go:154] Getting subnets and availability zones for VPC vpc-xxxxxxxxxxxxxxxxxxxxxx
I0831 11:55:05.281960       1 aws.go:196] cells will run in a private subnet (no route to internet gateway)
I0831 11:55:05.281980       1 config.go:453] controller will connect to nodes via private IPs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1d76cdd]
 
goroutine 1 [running]:
github.com/elotl/kip/pkg/server/cloud/aws.awsSGToMilpa(0xc000*6****, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/travis/gopath/src/github.com/elotl/kip/pkg/server/cloud/aws/security_groups.go:2*6 +0x21d
github.com/elotl/kip/pkg/server/cloud/aws.(*AwsEC2).FindSecurityGroup(0xc000*****0, 0xc001454***, 0x30, 0xc00138****, 0x569***, 0xc00145****)
        /home/travis/gopath/src/github.com/elotl/kip/pkg/server/cloud/aws/security_groups.go:107 +0x572
github.com/elotl/kip/pkg/server/cloud/aws.(*AwsEC2).EnsureSecurityGroup(0xc000c3****, 0xc001454**0, 0x30, 0xc0002*****, 0x4, 0x4, 0xc0000c****, 0x1, 0x1, 0x0, ...)
        /home/travis/gopath/src/github.com/elotl/kip/pkg/server/cloud/aws/security_groups.go:117 +0x5d
github.com/elotl/kip/pkg/server/cloud/aws.(*AwsEC2).EnsureMilpaSecurityGroups(0xc000c3****, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/travis/gopath/src/github.com/elotl/kip/pkg/server/cloud/aws/security_groups.go:68 +0x1d3
github.com/elotl/kip/pkg/server.ConfigureCloud(0xc00051****, 0xc0000****, 0x1a, 0xc0009b****, 0x3, 0x1, 0x1, 0x0, 0x0)
        /home/travis/gopath/src/github.com/elotl/kip/pkg/server/config.go:455 +0x18f
github.com/elotl/kip/pkg/server.NewInstanceProvider(0x7ffdb6ef6077, 0x16, 0x7ffdb6****, 0xe, 0xc0002a2***, 0xd, 0x0, 0x0, 0x2c9a3cd, 0xd, ...)
        /home/travis/gopath/src/github.com/elotl/kip/pkg/server/server.go:230 +0x65e
main.main.func1(0x7ffdb6ef****, 0x16, 0x7ffdb6ef****, 0xe, 0x2c8d***, 0x5, 0x0, 0x0, 0x280a, 0x2c***cd, ...)
        /home/travis/gopath/src/github.com/elotl/kip/cmd/kip/main.go:107 +0x205
github.com/elotl/kip/vendor/github.com/elotl/node-cli/internal/commands/root.runRootCommandWithProviderAndClient(0x350e840, 0xc00050****, 0xc0000c****, 0x356****, 0xc0001e****, 0xc0001*****, 0x0, 0x0)
        /home/travis/gopath/src/github.com/elotl/kip/vendor/github.com/elotl/node-cli/internal/commands/root/root.go:142 +0x7**
github.com/elotl/kip/vendor/github.com/elotl/node-cli/internal/commands/root.runRootCommand(0x350e840, 0xc00050*****, 0xc0001a****, 0xc00016****, 0x0, 0x0)
        /home/travis/gopath/src/github.com/elotl/kip/vendor/github.com/elotl/node-cli/internal/commands/root/root.go:74 +0xfb
github.com/elotl/kip/vendor/github.com/elotl/node-cli/internal/commands/root.NewCommand.func1(0xc00056****, 0xc00054****, 0x0, 0xa, 0x0, 0x0)
        /home/travis/gopath/src/github.com/elotl/kip/vendor/github.com/elotl/node-cli/internal/commands/root/root.go:55 +0x50
github.com/elotl/kip/vendor/github.com/spf13/cobra.(*Command).execute(0xc0005****, 0xc0000e****, 0xa, 0xa, 0xc0005****, 0xc0000e****)
        /home/travis/gopath/src/github.com/elotl/kip/vendor/github.com/spf13/cobra/command.go:838 +0x460
github.com/elotl/kip/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc00056****, 0xc000***0**, 0xc000******, 0xc0005*****)
        /home/travis/gopath/src/github.com/elotl/kip/vendor/github.com/spf13/cobra/command.go:943 +0x317
github.com/elotl/kip/vendor/github.com/spf13/cobra.(*Command).Execute(...)
        /home/travis/gopath/src/github.com/elotl/kip/vendor/github.com/spf13/cobra/command.go:883
github.com/elotl/kip/vendor/github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /home/travis/gopath/src/github.com/elotl/kip/vendor/github.com/spf13/cobra/command.go:876
github.com/elotl/kip/vendor/github.com/elotl/node-cli.(*Command).Run(0xc00056****, 0x350***, 0xc00050****, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/travis/gopath/src/github.com/elotl/kip/vendor/github.com/elotl/node-cli/cli.go:170 +0x84
main.main()
        /home/travis/gopath/src/github.com/elotl/kip/cmd/kip/main.go:132 +0x6ea
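
If I read the trace right (following that article), something in security_groups.go dereferences a nil pointer returned by a lookup. A tiny self-contained Go illustration of that failure mode — my own sketch, not KIP's actual code:

package main

import "fmt"

type securityGroup struct {
	groupID *string
}

// findSecurityGroup simulates a lookup that finds no match and
// returns nil instead of an error.
func findSecurityGroup(name string) *securityGroup {
	return nil
}

// sgName dereferences its argument without a nil check, which is
// what produces "invalid memory address or nil pointer dereference".
func sgName(sg *securityGroup) string {
	return *sg.groupID
}

func main() {
	sg := findSecurityGroup("kip-cells")
	fmt.Println(sgName(sg)) // panics here with SIGSEGV
}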

One more question: I don't see where I can attach an extra security group ID to KIP. The Minikube setup has an extraSecurityGroups field in the provider.yaml file. I tried adding this piece of configuration to provider.yaml in the base directory and it isn't working. Could you advise where I can add it?

ldx added the bug label Sep 3, 2020
@ldx (Contributor) commented Sep 3, 2020

@241423 extraSecurityGroups should go inside cells in provider.yaml, for example:

cells:
  defaultInstanceType: t3.nano
  bootImageSpec:
    owners: 689494258501
    filters: name=elotl-kip-*
  itzo:
    url: https://itzo-kip-download.s3.amazonaws.com
    version: latest
  extraSecurityGroups:
  - sg-12345678
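
Note that kustomize renders provider.yaml into a hash-suffixed ConfigMap (you can see kip-config-t56f8td654 in your pod description), so after editing it you need to re-render and apply, e.g. with the same command you used before:

kustomize build base/ | kubectl apply -f -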

As for the nil pointer dereference, I opened a PR that should fix the issue. If you would like to check whether the new build fixes the problem in your setup, please use image: elotl/kip:v1.0.0-19-g000f5b4 for the kip container in the statefulset.
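
If you prefer not to re-render the manifests, one way to swap just the image (a suggestion; any equivalent method works):

kubectl -n kube-system set image statefulset/kip-provider kip=elotl/kip:v1.0.0-19-g000f5b4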

ldx closed this as completed in #173 on Sep 4, 2020
ldx reopened this on Sep 4, 2020
ghost closed this as completed on Sep 8, 2020