
Creation of ES instance fails on OCP 4.1.2 #166

Closed
kevinearls opened this issue Jun 27, 2019 · 6 comments
@kevinearls

When I try to create an Elasticsearch-backed Jaeger instance using one of the example YAML files (https://github.com/jaegertracing/jaeger-operator/blob/master/deploy/examples/simple-prod-deploy-es.yaml), the Jaeger instance never starts because the Elasticsearch pod never gets out of the Pending state.
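For reference, the example CR looks roughly like this (abridged sketch; the exact contents are in the linked file), and it is created with oc create -f deploy/examples/simple-prod-deploy-es.yaml:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  storage:
    type: elasticsearch
    elasticsearch:
      nodeCount: 1
      resources: {}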

There is nothing in the log, but I get the following output from oc get elasticsearch -o yaml:

ovpn-118-43:jaeger-operator kearls$ oc get elasticsearch -o yaml
apiVersion: v1
items:
- apiVersion: logging.openshift.io/v1
  kind: Elasticsearch
  metadata:
    creationTimestamp: "2019-06-27T12:56:31Z"
    generation: 8
    labels:
      app: jaeger
      app.kubernetes.io/component: elasticsearch
      app.kubernetes.io/instance: simple-prod
      app.kubernetes.io/name: elasticsearch
      app.kubernetes.io/part-of: jaeger
    name: elasticsearch
    namespace: fud
    ownerReferences:
    - apiVersion: jaegertracing.io/v1
      controller: true
      kind: Jaeger
      name: simple-prod
      uid: fbd9d6b0-98da-11e9-8a21-fa163e292f36
    resourceVersion: "412161"
    selfLink: /apis/logging.openshift.io/v1/namespaces/fud/elasticsearches/elasticsearch
    uid: fc4ed617-98da-11e9-8a21-fa163e292f36
  spec:
    managementState: Managed
    nodeSpec:
      resources: {}
    nodes:
    - nodeCount: 1
      resources: {}
      roles:
      - client
      - data
      - master
      storage: {}
    redundancyPolicy: ""
  status:
    cluster:
      activePrimaryShards: 0
      activeShards: 0
      initializingShards: 0
      numDataNodes: 0
      numNodes: 0
      pendingTasks: 0
      relocatingShards: 0
      status: ""
      unassignedShards: 0
    clusterHealth: ""
    conditions:
    - lastTransitionTime: "2019-06-27T12:58:33Z"
      message: Previously used GenUUID "x1s6chde" is no longer found in Spec.Nodes
      reason: Invalid Spec
      status: "True"
      type: InvalidUUID
    nodes:
    - conditions:
      - lastTransitionTime: "2019-06-27T12:56:31Z"
        message: '0/5 nodes are available: 5 node(s) didn''t match node selector.'
        reason: Unschedulable
        status: "True"
        type: Unschedulable
      deploymentName: elasticsearch-cdm-x1s6chde-1
      upgradeStatus: {}
    pods:
      client:
        failed: []
        notReady:
        - elasticsearch-cdm-x1s6chde-1-644b55ccdd-b2pzg
        ready: []
      data:
        failed: []
        notReady:
        - elasticsearch-cdm-x1s6chde-1-644b55ccdd-b2pzg
        ready: []
      master:
        failed: []
        notReady:
        - elasticsearch-cdm-x1s6chde-1-644b55ccdd-b2pzg
        ready: []
    shardAllocationEnabled: shard allocation unknown
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
@richm
Contributor

richm commented Jun 27, 2019

@ewolinetz does this have anything to do with openshift/cluster-logging-operator#205?

@kevinearls
Author

FYI, this worked previously in an OCP 4.1 cluster on AWS.

@pavolloffay
Member

I get the same error when deploying locally on OCP 3.11:

Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  8s (x10 over 1m)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.
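To see which label the scheduler is rejecting, one way (illustrative commands; take the deployment name from the status output or from oc get deployments) is to compare the labels actually present on the nodes with the nodeSelector that ends up on the ES deployment:

# which os-related labels do the nodes actually carry?
oc get nodes -L kubernetes.io/os -L beta.kubernetes.io/os

# which nodeSelector did the ES operator put on the deployment?
oc get deployment <elasticsearch-cdm-...> -o jsonpath='{.spec.template.spec.nodeSelector}'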

The ES operator shows an error; not sure if that is related though:

oc logs elasticsearch-operator-6f94cb54bb-dnwp5 -n openshift-logging
{"level":"info","ts":1561645015.1384616,"logger":"cmd","msg":"Go Version: go1.10.8"}
{"level":"info","ts":1561645015.1385818,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1561645015.1385922,"logger":"cmd","msg":"Version of operator-sdk: v0.7.0"}
{"level":"info","ts":1561645015.1387944,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1561645015.210691,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1561645015.2138357,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1561645015.2449596,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1561645015.245202,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1561645015.3064365,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched, <nil>"}
{"level":"info","ts":1561645015.3065236,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1561645015.4067228,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1561645015.5070202,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
{"level":"error","ts":1561645302.627982,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"myproject/elasticsearch","error":"Failed to reconcile Elasticsearch deployment spec: Unsupported change to UUIDs made: Previously used GenUUID \"5ojat11f\" is no longer found in Spec.Nodes","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"

@objectiser

@richm @ewolinetz As a potential workaround for the UUID issue, it seems that the CR can specify the GenUUID value to be used, instead of a random value being created by the es-operator. Do you see any issues with this approach?

We only expect to have a single ES cluster (per tenant) so hopefully using some stable UUID value would not be a problem. Although I guess each node in the CR should have a different value?

@ewolinetz
Contributor

Do you see any issues with this approach?

No, that should work fine so long as you don't attempt to change it after the initial creation.

Although I guess each node in the CR should have a different value?

Yes, each node's value should be unique within a CR so that the operator can correctly check whether that node's configuration has changed.
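For illustration, a minimal sketch of what that could look like in the Elasticsearch CR, assuming the node entries accept a genUUID field as discussed above (the field name and value here are illustrative, not verified against the CRD):

apiVersion: logging.openshift.io/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  managementState: Managed
  nodes:
  # each node entry gets its own stable identifier, chosen by whoever creates
  # the CR instead of being randomly generated by the es-operator
  - nodeCount: 1
    genUUID: node0001   # illustrative; must differ per node and never change after creation
    roles:
    - client
    - data
    - master
    resources: {}
    storage: {}
  redundancyPolicy: ""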

@pavolloffay
Member

I think this issue can be closed.

The main reason it was opened is that a change in master broke our test pipeline. We have switched our tests to use the release-4.1 branch instead of master to avoid further breakage. We will bump the es-operator version once a new stable version is out, e.g. for OCP 4.2.

The check on kubernetes.io/os=linux could also accept the older beta.kubernetes.io/os=linux label to support older OCP versions; that way it would not break us. A sketch of the idea is below.
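For example, instead of a hard nodeSelector on the new label only, the pod spec could use node affinity terms that accept either label (a sketch of the idea, not the es-operator's actual implementation):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:   # terms are ORed, so matching either label is enough
      - matchExpressions:
        - key: kubernetes.io/os
          operator: In
          values: [linux]
      - matchExpressions:
        - key: beta.kubernetes.io/os
          operator: In
          values: [linux]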
