This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Failed to put CRD error, frameworkcontroller frequently restarts #66

@mf-giwoong-lee


I set up a Kubernetes cluster with kubeadm.

It has one master node and one worker node.

I use Calico as the pod network.

I ran into a problem when setting up frameworkcontroller.

frameworkcontroller restarts frequently (about every minute), and I see the following output from kubectl logs frameworkcontroller-0:

I1028 02:01:51.888234      10 controller.go:207] Initializing frameworkcontroller
I1028 02:01:51.888637      10 controller.go:210] With Config:
kubeApiServerAddress: https://localhost:40443
kubeConfigFilePath: ""
kubeClientQps: 200
kubeClientBurst: 300
workerNumber: 500
largeFrameworkCompression: true
crdEstablishedCheckIntervalSec: 1
crdEstablishedCheckTimeoutSec: 60
objectLocalCacheCreationTimeoutSec: 300
frameworkCompletedRetainSec: 2592000
frameworkMinRetryDelaySecForTransientConflictFailed: 60
frameworkMaxRetryDelaySecForTransientConflictFailed: 900
logObjectSnapshot:
  framework:
    onFrameworkRetry: true
    onFrameworkDeletion: true
  task:
    onTaskRetry: true
    onTaskDeletion: true
  pod:
    onPodDeletion: true
podFailureSpec: []
I1028 02:01:51.889430      10 controller.go:427] Recovering frameworkcontroller
E1028 02:02:21.890073      10 runtime.go:69] Observed a panic: &errors.errorString{s:"Failed to put CRD: Get https://localhost:40443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/frameworks.frameworkcontroller.microsoft.com: dial tcp: i/o timeout"} (Failed to put CRD: Get https://localhost:40443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/frameworks.frameworkcontroller.microsoft.com: dial tcp: i/o timeout)
/go/src/github.com/microsoft/frameworkcontroller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/microsoft/frameworkcontroller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/microsoft/frameworkcontroller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/panic.go:522
/go/src/github.com/microsoft/frameworkcontroller/pkg/internal/utils.go:66
/go/src/github.com/microsoft/frameworkcontroller/pkg/controller/controller.go:428
/go/src/github.com/microsoft/frameworkcontroller/cmd/frameworkcontroller/main.go:35
/usr/local/go/src/runtime/proc.go:200
/usr/local/go/src/runtime/asm_amd64.s:1337
E1028 02:02:21.890143      10 panic.go:522] Stopping frameworkcontroller
panic: Failed to put CRD: Get https://localhost:40443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/frameworks.frameworkcontroller.microsoft.com: dial tcp: i/o timeout [recovered]
        panic: Failed to put CRD: Get https://localhost:40443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/frameworks.frameworkcontroller.microsoft.com: dial tcp: i/o timeout

goroutine 1 [running]:
github.com/microsoft/frameworkcontroller/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/github.com/microsoft/frameworkcontroller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x11f84e0, 0xc0003f8110)
        /usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/microsoft/frameworkcontroller/pkg/internal.PutCRD(0xc000338960, 0xc0003438c0, 0xc000047590, 0xc000047598)
        /go/src/github.com/microsoft/frameworkcontroller/pkg/internal/utils.go:66 +0x173
github.com/microsoft/frameworkcontroller/pkg/controller.(*FrameworkController).Run(0xc000107290, 0xc0000e4960)
        /go/src/github.com/microsoft/frameworkcontroller/pkg/controller/controller.go:428 +0x157
main.main()
        /go/src/github.com/microsoft/frameworkcontroller/cmd/frameworkcontroller/main.go:35 +0x47

I use the example YAML from the frameworkcontroller guide (https://github.com/Microsoft/frameworkcontroller/tree/master/example/run):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: frameworkcontroller
  namespace: default
spec:
  serviceName: frameworkcontroller
  selector:
    matchLabels:
      app: frameworkcontroller
  replicas: 1
  template:
    metadata:
      labels:
        app: frameworkcontroller
    spec:
      # Using the ServiceAccount with granted permission
      # if the k8s cluster enforces authorization.
      serviceAccountName: frameworkcontroller
      containers:
      - name: frameworkcontroller
        image: frameworkcontroller/frameworkcontroller
        # Using k8s inClusterConfig, so usually, no need to specify
        # KUBE_APISERVER_ADDRESS or KUBECONFIG
        env:
        - name: KUBE_APISERVER_ADDRESS
          value: https://localhost:40443 #{http[s]://host:port}
        #- name: KUBECONFIG
        #  value: {Pod Local KubeConfig File Path}
        command: [
          "bash", "-c",
          "cp /frameworkcontroller-config/frameworkcontroller.yaml . &&
          ./start.sh"]
        volumeMounts:
        - name: frameworkcontroller-config
          mountPath: /frameworkcontroller-config
      volumes:
      - name: frameworkcontroller-config
        configMap:
          name: frameworkcontroller-config

In addition, frameworkbarrier cannot find the frameworks custom resource. My guess is that frameworkcontroller is not working properly, so it never creates the frameworks custom resource definition.
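
A minimal sketch of how that guess could be checked, assuming kubectl is configured against the same cluster. The CRD name is taken from the URL in the log above; the function name framework_crd_exists is hypothetical.

```shell
# Hypothetical helper: succeed only if the Framework CRD is registered.
# The CRD name comes from the URL in the controller log.
framework_crd_exists() {
  kubectl get crd frameworks.frameworkcontroller.microsoft.com >/dev/null 2>&1
}

if framework_crd_exists; then
  echo "Framework CRD is registered"
else
  echo "Framework CRD is missing; the controller has not created it"
fi
```

If the CRD is missing, the frameworkbarrier error would be a consequence of the controller failure rather than a separate problem.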

Here is the full script I use to launch frameworkcontroller:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml

sleep 10

kubectl create serviceaccount frameworkcontroller --namespace default
kubectl create clusterrolebinding frameworkcontroller \
  --clusterrole=cluster-admin \
  --user=system:serviceaccount:default:frameworkcontroller

sleep 5

#kubectl create -f frameworkcontroller-with-default-config.yaml

# custom config
kubectl create -f frameworkcontroller-customized-config.yaml
kubectl create -f frameworkcontroller-with-customized-config.yaml

sleep 15

kubectl create serviceaccount frameworkbarrier --namespace default
kubectl create clusterrole frameworkbarrier --verb=get,list,watch --resource=frameworks
kubectl create clusterrolebinding frameworkbarrier --clusterrole=frameworkbarrier --user=system:serviceaccount:default:frameworkbarrier
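
The fixed sleep calls in the script only guess at how long each step takes. As a sketch of an alternative, the helper below retries a command until it succeeds; the name retry and its argument order are hypothetical, and the commented usage line assumes kubectl is configured for the cluster.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 as soon as the command succeeds, 1 otherwise.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt is still to come.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Possible use in place of "sleep 15" before the frameworkbarrier steps:
# retry 30 2 kubectl get crd frameworks.frameworkcontroller.microsoft.com
```

With this, the script would stop waiting as soon as the CRD appears and would fail visibly if it never does.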
