Promtail pod CrashLoopBackoff - error creating promtail #1457

Closed
victoriaalee opened this issue Dec 27, 2019 · 7 comments

@victoriaalee
Contributor

Describe the bug
A pod in the promtail DaemonSet is in CrashLoopBackOff.

We've had the loki-stack chart deployed. As we've added nodes to our Kubernetes cluster, the new pods added to the DaemonSet have run correctly. However, one of the oldest pods entered this CrashLoopBackOff state some time after the initial deployment.

I've attempted to delete the pod, but the restarted pod produces the same error.

To Reproduce

  1. Deploy loki-stack Helm chart version 0.18.1 with defaults.

Expected behavior
All promtail pods run properly without crashing.

Environment:

  • Infrastructure: Kubernetes 1.12
  • Deployment tool: Helm 2

Screenshots, Promtail config, or terminal output

level=error ts=2019-12-27T19:00:36.919645077Z caller=main.go:56 msg="error creating promtail" error="yaml: line 107: could not find expected ':'"
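
(For reference, a rough sketch of commands that surface this kind of output; kubectl usage is standard, but the loki-promtail name comes from the chart defaults and <crashing-pod> is a placeholder for the affected pod:)

# Find the promtail pod stuck in CrashLoopBackOff and the node it runs on
kubectl get pods -o wide | grep loki-promtail

# Pull the log from the previously crashed container of that pod
kubectl logs <crashing-pod> --previous
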
@owen-d
Member

owen-d commented Dec 30, 2019

This is interesting. It appears to be a YAML issue, but it's unusual that you're only seeing it on one pod and that the problem persists across restarts.

If your configuration is not sensitive, would you mind posting it?

@victoriaalee
Contributor Author

victoriaalee commented Dec 30, 2019

promtail.yaml in the loki-promtail ConfigMap:

client:
  backoff_config:
    maxbackoff: 5s
    maxretries: 20
    minbackoff: 100ms
  batchsize: 102400
  batchwait: 1s
  external_labels: {}
  timeout: 10s
positions:
  filename: /run/promtail/positions.yaml
server:
  http_listen_port: 3101
target_config:
  sync_period: 10s

scrape_configs:
- job_name: kubernetes-pods-name
  pipeline_stages:
    - docker: {}

  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels:
    - __meta_kubernetes_pod_label_name
    target_label: __service__
  - source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: __host__
  - action: drop
    regex: ''
    source_labels:
    - __service__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    replacement: $1
    separator: /
    source_labels:
    - __meta_kubernetes_namespace
    - __service__
    target_label: job
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: instance
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container_name
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    source_labels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    target_label: __path__
- job_name: kubernetes-pods-app
  pipeline_stages:
    - docker: {}

  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: drop
    regex: .+
    source_labels:
    - __meta_kubernetes_pod_label_name
  - source_labels:
    - __meta_kubernetes_pod_label_app
    target_label: __service__
  - source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: __host__
  - action: drop
    regex: ''
    source_labels:
    - __service__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    replacement: $1
    separator: /
    source_labels:
    - __meta_kubernetes_namespace
    - __service__
    target_label: job
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: instance
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container_name
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    source_labels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    target_label: __path__
- job_name: kubernetes-pods-direct-controllers
  pipeline_stages:
    - docker: {}

  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: drop
    regex: .+
    separator: ''
    source_labels:
    - __meta_kubernetes_pod_label_name
    - __meta_kubernetes_pod_label_app
  - action: drop
    regex: '[0-9a-z-.]+-[0-9a-f]{8,10}'
    source_labels:
    - __meta_kubernetes_pod_controller_name
  - source_labels:
    - __meta_kubernetes_pod_controller_name
    target_label: __service__
  - source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: __host__
  - action: drop
    regex: ''
    source_labels:
    - __service__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    replacement: $1
    separator: /
    source_labels:
    - __meta_kubernetes_namespace
    - __service__
    target_label: job
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: instance
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container_name
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    source_labels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    target_label: __path__
- job_name: kubernetes-pods-indirect-controller
  pipeline_stages:
    - docker: {}

  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: drop
    regex: .+
    separator: ''
    source_labels:
    - __meta_kubernetes_pod_label_name
    - __meta_kubernetes_pod_label_app
  - action: keep
    regex: '[0-9a-z-.]+-[0-9a-f]{8,10}'
    source_labels:
    - __meta_kubernetes_pod_controller_name
  - action: replace
    regex: '([0-9a-z-.]+)-[0-9a-f]{8,10}'
    source_labels:
    - __meta_kubernetes_pod_controller_name
    target_label: __service__
  - source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: __host__
  - action: drop
    regex: ''
    source_labels:
    - __service__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    replacement: $1
    separator: /
    source_labels:
    - __meta_kubernetes_namespace
    - __service__
    target_label: job
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: instance
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container_name
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    source_labels:
    - __meta_kubernetes_pod_uid
    - __meta_kubernetes_pod_container_name
    target_label: __path__
- job_name: kubernetes-pods-static
  pipeline_stages:
    - docker: {}

  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: drop
    regex: ''
    source_labels:
    - __meta_kubernetes_pod_annotation_kubernetes_io_config_mirror
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_label_component
    target_label: __service__
  - source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: __host__
  - action: drop
    regex: ''
    source_labels:
    - __service__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    replacement: $1
    separator: /
    source_labels:
    - __meta_kubernetes_namespace
    - __service__
    target_label: job
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: instance
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    target_label: container_name
  - replacement: /var/log/pods/*$1/*.log
    separator: /
    source_labels:
    - __meta_kubernetes_pod_annotation_kubernetes_io_config_mirror
    - __meta_kubernetes_pod_container_name
    target_label: __path__
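
(For reference, one rough way to confirm that this rendered config parses as YAML, pulled straight from the ConfigMap; this assumes the ConfigMap is named loki-promtail as above, lives in the current namespace, and that PyYAML is available locally:)

# Extract the promtail.yaml key from the ConfigMap and run it through a YAML parser
kubectl get configmap loki-promtail -o jsonpath='{.data.promtail\.yaml}' \
  | python3 -c 'import sys, yaml; yaml.safe_load(sys.stdin); print("parses cleanly")'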

@owen-d
Member

owen-d commented Dec 30, 2019

OK, that appears to be valid YAML, and after taking a closer look at the error message, it seems the error arises after the config has been parsed. My next guess would be the positions file, which, according to your configuration, can be found at /run/promtail/positions.yaml.

This is likely mounted via a hostPath volume, which would help explain why it keeps failing even after you delete the pod -- the replacement pod mounts the same host path (and therefore the same positions file). Can you take a look at /run/promtail/positions.yaml on that host, make sure it looks OK, and post it here?

It should look like:

positions:
  fileA: "1234"
  fileB: "5678"
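
(A rough sketch for getting at that file, assuming the chart mounts /run/promtail from the host and you have SSH access to the node; <crashing-pod> is a placeholder:)

# Find which node the crashing promtail pod is scheduled on
kubectl get pod <crashing-pod> -o wide

# Then, on that node, inspect the positions file directly
cat /run/promtail/positions.yaml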

@victoriaalee
Contributor Author

Looks like that's the problem! The last line of positions.yaml is truncated, making the file invalid. What would be the recommended way to recover from this? And any ideas what could have caused this?

@owen-d
Copy link
Member

owen-d commented Dec 30, 2019

Ha, good to know we found it! Honestly, I'd delete the truncated last line and let promtail restart from there. You may see some out-of-order errors for a bit from pods on that node that have already been tailed, but that can't really be helped.
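
(A minimal sketch of that recovery, assuming SSH access to the node; the file path comes from the config above and <crashing-pod> is a placeholder:)

# On the node: back up the file, then drop the truncated final line
cp /run/promtail/positions.yaml /run/promtail/positions.yaml.bak
sed -i '$d' /run/promtail/positions.yaml

# Back on the cluster: delete the crashing pod so the DaemonSet recreates it
kubectl delete pod <crashing-pod>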

If you're still seeing problems with promtail not catching up due to the out-of-order errors, I'd suggest deleting the pods on that node and letting them be rescheduled so that promtail can start tailing them anew. Since they'll be new pods, you shouldn't see out-of-order issues (they'll be assigned to new log streams). Hopefully nothing delicate like a StatefulSet was affected there.

I'm curious what caused this truncation -- it's an interesting failure condition.

Feel free to close the issue if you get everything resolved. I'll open an issue for discussion about this.

@victoriaalee
Contributor Author

Thank you for the help!

@owen-d
Member

owen-d commented Jan 6, 2020

I'm guessing the fix was introduced in fd25e6d, which was not included in the 0.18.1 release. Updating should mitigate this issue.
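
(For anyone landing here later, a rough sketch of the upgrade with Helm 2, assuming the release is named loki and the chart repository is already added under the name loki; the target version is a placeholder:)

# Refresh the repo index, then upgrade the release to a chart version that includes the fix
helm repo update
helm upgrade loki loki/loki-stack --version <newer-chart-version>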
