Setting maxReplicas in OpenTelemetryCollector crashes the Otel Operator #803

Closed
logamanig opened this issue Apr 1, 2022 · 2 comments · Fixed by #798
Labels: area:collector (Issues for deploying collector), bug (Something isn't working), good first issue (Good for newcomers)

Comments

@logamanig

Setting maxReplicas: 2 crashes the OTel Operator. I also tried explicitly setting replicas: null, with the same result.

The operator starts successfully but crashes after a few minutes with the following error in its log. Removing maxReplicas solves the problem.

Operator Log:


{"level":"info","ts":1648778190.841319,"logger":"collector-upgrade","msg":"skipping upgrade for OpenTelemetry Collector instance, as it's newer than our latest version","name":"otel-collector","namespace":"monitoring","version":"0.47.0","latest":"0.43.0"}
{"level":"info","ts":1648778190.841519,"logger":"instrumentation-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":1648778190.8420982,"logger":"controller.opentelemetrycollector","msg":"Starting workers","reconciler group":"opentelemetry.io","reconciler kind":"OpenTelemetryCollector","worker count":1}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1f87c56]

goroutine 818 [running]:
github.com/open-telemetry/opentelemetry-operator/pkg/collector/reconcile.expectedDeployments({_, _}, {{0x0, {{0x2ceacd0, 0xc0008ce840}, 0x0}, {0x2c7d660, 0xc00082edb0}, 0x12a05f200, {0x0, ...}, ...}, ...}, ...)
	/workspace/pkg/collector/reconcile/deployment.go:106 +0x936
github.com/open-telemetry/opentelemetry-operator/pkg/collector/reconcile.Deployments({_, _}, {{0x0, {{0x2ceacd0, 0xc0008ce840}, 0x0}, {0x2c7d660, 0xc00082edb0}, 0x12a05f200, {0x0, ...}, ...}, ...})
	/workspace/pkg/collector/reconcile/deployment.go:45 +0x3be
github.com/open-telemetry/opentelemetry-operator/controllers.(*OpenTelemetryCollectorReconciler).RunTasks(_, {_, _}, {{0x0, {{0x2ceacd0, 0xc0008ce840}, 0x0}, {0x2c7d660, 0xc00082edb0}, 0x12a05f200, ...}, ...})
	/workspace/controllers/opentelemetrycollector_controller.go:163 +0x11f
github.com/open-telemetry/opentelemetry-operator/controllers.(*OpenTelemetryCollectorReconciler).Reconcile(0xc000279320, {0xc000aa11d0, 0x23edea0}, {{{0xc0007ff350, 0xa}, {0xc0007ff370, 0xe}}})
	/workspace/controllers/opentelemetrycollector_controller.go:153 +0x3cd
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc00088e000, {0x2ccb538, 0xc000aa11d0}, {{{0xc0007ff350, 0x2746040}, {0xc0007ff370, 0x413a94}}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114 +0x26f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00088e000, {0x2ccb490, 0xc000862c00}, {0x24e3100, 0xc000c7c400})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311 +0x33e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00088e000, {0x2ccb490, 0xc000862c00})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:223 +0x357
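The trace ends in expectedDeployments at pkg/collector/reconcile/deployment.go:106. One plausible reading, not confirmed from the operator source in this thread, is that the autoscaling path dereferences the optional replicas pointer, which is nil when only maxReplicas is set. The Go sketch below reproduces that pattern with hypothetical types and helpers (CollectorSpec, desiredReplicasUnsafe, desiredReplicas, panics); current stands for the replica count reported by the existing Deployment. It is an illustration, not the actual change made in #798.

```go
package main

import "fmt"

// CollectorSpec is a hypothetical, trimmed-down stand-in for the
// OpenTelemetryCollector spec; only the two relevant fields are modeled.
type CollectorSpec struct {
	Replicas    *int32 // optional: nil when omitted or set to null
	MaxReplicas *int32 // optional: non-nil means autoscaling is requested
}

// desiredReplicasUnsafe mimics the suspected bug pattern: with autoscaling
// enabled it dereferences Replicas without a nil check.
func desiredReplicasUnsafe(spec CollectorSpec, current int32) int32 {
	if spec.MaxReplicas != nil && *spec.Replicas > current { // panics if Replicas is nil
		return *spec.Replicas
	}
	return current
}

// desiredReplicas is a nil-safe variant. An unset Replicas falls back to 1,
// an assumed default chosen for illustration.
func desiredReplicas(spec CollectorSpec, current int32) int32 {
	minReplicas := int32(1)
	if spec.Replicas != nil {
		minReplicas = *spec.Replicas
	}
	if spec.MaxReplicas == nil {
		// No autoscaling: run exactly the requested number of replicas.
		return minReplicas
	}
	// Autoscaling: keep the current count unless it is below the minimum.
	if minReplicas > current {
		return minReplicas
	}
	return current
}

// panics reports whether calling f panicked.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	two := int32(2)
	spec := CollectorSpec{MaxReplicas: &two} // maxReplicas: 2, replicas unset

	fmt.Println(panics(func() { desiredReplicasUnsafe(spec, 1) })) // true
	fmt.Println(desiredReplicas(spec, 0))                         // 1
	fmt.Println(desiredReplicas(spec, 2))                         // 2
}
```

Treating a nil pointer as "use a default" keeps the reconcile loop alive instead of crashing the whole operator process, which matches the symptom reported above.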

OpenTelemetryCollector custom resource:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
spec:
  mode: deployment
  maxReplicas: 2

  podSecurityContext:
    fsGroup: 1000
    runAsUser: 1000
    runAsNonRoot: true
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    readOnlyRootFilesystem: true
    runAsNonRoot: true

  resources:
    requests:
      cpu: 250m
      memory: 768Mi
    limits:
      cpu: 250m
      memory: 768Mi
      
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      jaeger:
        protocols:
          grpc:
          thrift_binary:
          thrift_compact:
          thrift_http:
          
    processors:
      batch:

    exporters:
      logging:

      jaeger:
        endpoint: jaeger-operator-jaeger-collector.monitoring:14250
        tls:
          insecure: true

    service:
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          processors: [batch]
          exporters: [logging, jaeger]


Operator Helm chart values.yaml (chart version 0.6.5):

manager:
  image:
    tag: v0.47.0
  collectorImage:
    repository: otel/opentelemetry-collector-contrib
    tag: 0.48.0
  resources:
    limits:
      cpu: 150m
      memory: 192Mi
    requests:
      cpu: 100m
      memory: 64Mi

kubeRBACProxy:
  image:
    tag: v0.8.0
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 128Mi

admissionWebhooks:
  failurePolicy: Fail # Tried Ignore as well but still crashes
  enabled: true

  ## Provide the issuer kind and name to do the cert auth job.
  ## By default, OpenTelemetry Operator will use self-signer issuer.
  certManager:
    enabled: true
    issuerRef: {}
      # kind:
      # name:
jpkrohling added the bug and good first issue labels on Apr 1, 2022.
pavolloffay added the area:collector label on Apr 1, 2022.

@pavolloffay (Member) commented on Apr 1, 2022:

PR #798 will resolve this issue.

@jpkrohling (Member)

I knew I'd seen something about this before but couldn't find the issue. Turns out it was a PR :-)
