Bug Report
Describe the bug
We are forwarding logs to Splunk and from time to time we see the errors below in the Fluent Bit pods.
Fluent Bit pods:
[2023/03/28 12:37:55] [error] [net] TCP connection failed: splunk-fluentd.monitoring.svc.cluster.local:24240 (Connection refused)
[2023/03/28 12:37:55] [error] [output:forward:forward.0] no upstream connections available
[2023/03/28 12:37:55] [ warn] [engine] failed to flush chunk '1-1680007074.805347894.flb', retry in 7 seconds: task_id=0, input=tail.0 > output=forward.0 (out_id=0)
[2023/03/28 12:37:55] [error] [net] TCP connection failed: splunk-fluentd.monitoring.svc.cluster.local:24240 (Connection refused)
[2023/03/28 12:37:55] [error] [output:forward:forward.0] no upstream connections available
[2023/03/28 12:37:55] [ warn] [engine] failed to flush chunk '1-1680007074.856843682.flb', retry in 6 seconds: task_id=2, input=tail.0 > output=forward.0 (out_id=0)
[2023/03/28 12:37:56] [error] [net] TCP connection failed: splunk-fluentd.monitoring.svc.cluster.local:24240 (Connection refused)
[2023/03/28 12:37:56] [error] [output:forward:forward.0] no upstream connections available
[2023/03/28 12:37:56] [error] [net] TCP connection failed: splunk-fluentd.monitoring.svc.cluster.local:24240 (Connection refused)
[2023/03/28 12:37:56] [error] [output:forward:forward.0] no upstream connections available
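"Connection refused" means nothing was accepting TCP connections on splunk-fluentd.monitoring.svc.cluster.local:24240 at that moment, typically because the fluentd aggregator was restarting or not yet ready. On the aggregator side, the forward listener would be a fluentd source block along the lines of the following sketch (the port and bind values are assumptions chosen to match the Fluent Bit forward output shown further below):

```
# fluentd aggregator side (sketch): the forward input must listen on the
# same port the Fluent Bit forward output targets (24240 here)
<source>
  @type forward
  port 24240
  bind 0.0.0.0
</source>
```

If this listener is up but the errors persist, it is worth checking whether the Service has ready endpoints during the failure windows, since a fluentd pod in CrashLoopBackOff or failing its readiness probe produces exactly this "Connection refused" pattern.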
The fluentd pods are also logging errors:
2023-03-29 09:52:08 +0000 [warn]: #0 [flow:outputflowname] failed to flush the buffer. retry_times=5 next_retry_time=2023-03-29 09:52:40 +0000 chunk="5f806e32832e16de5cdf89bf8714d9e6" error_class=RuntimeError error="Server error (502) for POST https://splunkendpoint/services/collector, response: <title>502 Server Error</title> Error: Server Error. The server encountered a temporary error and could not complete your request. Please try again in 30 seconds."
2023-03-29 09:52:25 +0000 [warn]: #0 [flow:outputflowname] failed to flush the buffer. retry_times=0 next_retry_time=2023-03-29 09:52:26 +0000 chunk="5f806eaa0c2dec2aa728034f9cc4b3d0" error_class=RuntimeError error="Server error (502) for POST https://splunkendpoint/services/collector, response: <title>502 Server Error</title> Error: Server Error. The server encountered a temporary error and could not complete your request. Please try again in 30 seconds."
2023-03-29 09:52:26 +0000 [warn]: #0 [flow:outputflowname] failed to flush the buffer. retry_times=6 next_retry_time=2023-03-29 09:53:30 +0000 chunk="5f806e1c8fce315d338f5e689b8a0e03" error_class=RuntimeError error="Server error (502) for POST https://splunkendpoint/services/collector, response: <title>502 Server Error</title> Error: Server Error. The server encountered a temporary error and could not complete your request. Please try again in 30 seconds."
Fluent Bit config:
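The 502 responses from the Splunk HEC endpoint are transient server-side errors ("try again in 30 seconds"), so on the fluentd side they are normally absorbed by the output's buffer retry settings rather than fixed in the pipeline. A minimal sketch, assuming the fluent-plugin-splunk-hec output; the match pattern, host, token source, and buffer path are hypothetical, not taken from this deployment:

```
<match **>
  @type splunk_hec
  hec_host splunkendpoint
  hec_port 443
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  <buffer>
    @type file
    path /buffers/splunk            # hypothetical on-disk buffer path
    retry_type exponential_backoff  # back off between retries
    retry_wait 10s                  # first retry after 10s
    retry_max_interval 5m           # cap the backoff interval
    retry_timeout 1h                # give up on a chunk after 1h
    flush_thread_count 4
  </buffer>
</match>
```

With a file buffer and a retry window longer than a typical Splunk outage, the warn-level flush failures above become harmless retries instead of data loss.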
[SERVICE]
Flush 1
Grace 5
Daemon Off
Log_Level warning
Parsers_File parsers.conf
Coro_Stack_Size 24576
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
storage.path /buffers
[INPUT]
Name tail
DB /tail-db/tail-containers-state.db
DB.locking true
Exclude_Path kube-system,cnrm-system,monitoring,bats-test,management-system,argocd,managed-operators,configconnector-operator-system
Mem_Buf_Limit 128MB
Parser cri
Path /var/log/containers/*.log
Refresh_Interval 5
Skip_Long_Lines On
Tag kubernetes.*
[FILTER]
Name kubernetes
Buffer_Size 0
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Tag_Prefix kubernetes.var.log.containers
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_URL https://kubernetes.default.svc:443
Match kubernetes.*
Merge_Log On
Use_Kubelet Off
[OUTPUT]
Name forward
Match *
Host splunk-fluentd.monitoring.svc.cluster.local
Port 24240
net.keepalive on
net.keepalive_idle_timeout 30
net.keepalive_max_recycle 100
Retry_Limit 50
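One thing worth noting about the config above: the [SERVICE] section sets storage.path, but the tail input does not set storage.type, so chunks are held in memory only and can be dropped once Retry_Limit 50 is exhausted while the aggregator is unreachable. A sketch of enabling filesystem buffering so chunks survive upstream outages (the limit values are assumptions, not recommendations):

```
[SERVICE]
    storage.path              /buffers
    storage.sync              normal
    storage.backlog.mem_limit 64M    # cap memory used replaying backlog chunks

[INPUT]
    Name          tail
    storage.type  filesystem         # persist chunks to /buffers while upstream is down
```

This does not prevent the "Connection refused" errors themselves, but it changes their consequence from eventual chunk loss to delayed delivery.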
To Reproduce
- Rubular link if applicable:
- Example log message if applicable:
{"log":"YOUR LOG MESSAGE HERE","stream":"stdout","time":"2018-06-11T14:37:30.681701731Z"}
- Steps to reproduce the problem:
Expected behavior
We would like to fix the errors reported by these pods so that log delivery to Splunk is reliable.
Screenshots
Your Environment
Version used:
- name: logging-operator
  # repository: https://kubernetes-charts.banzaicloud.com
  version: 3.17.9
- fluent/fluent-bit:1.9.5
- fluentd:v1.14.6-alpine-5
- Configuration:
- Environment name and version (e.g. Kubernetes? What version?):
- Server type and version:
- Operating System and version:
- Filters and plugins:
Additional context