This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

Unstable using latest from 0.13-dev branch #17

Open
StevenACoffman opened this issue Jan 17, 2018 · 14 comments


StevenACoffman commented Jan 17, 2018

I am experiencing a lot of instability when applying the latest changes from the 0.13-dev branch, specifically #16.

Eventually, if a pod crashes on a busy node and enters CrashLoopBackOff, it never recovers. I am still investigating, but if you can see anything obvious, I would really appreciate your insight.

At first I thought it was the memory and/or CPU limits, so I removed those, and crashes now happen much less consistently. Without limits, I'm still seeing what look like multiple failure modes. I changed the namespace (to kangaroo) and the Kafka topic (to k8s-firehose), and I changed the Log_Level to debug. With kubernetes.default.svc in the Kube_URL I got a few "Temporary failure in name resolution" errors, so I changed it to kubernetes.default.svc.cluster.local and I have not seen that error since.
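For reference, the Kube_URL change amounts to something like the following (a sketch: the Match pattern and port follow the stock filter_kubernetes defaults and may differ from the actual configmap in this repo):

```
[FILTER]
    Name     kubernetes
    Match    kube.*
    # Fully-qualified service name avoids the intermittent
    # "Temporary failure in name resolution" seen with the short form.
    Kube_URL https://kubernetes.default.svc.cluster.local:443
```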

I am using kail to follow all the daemonset pods in parallel, but that's quite chatty, so I filter it down to errors with some context using:

kail --ds=fluent-bit | grep -A 10 -B 10 error

The output I get is:

kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (910 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (912 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (885 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (896 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (850 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (854 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [out_kafka] enqueued message (871 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:49:05] [debug] [ou
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:49:41] [debug] [retry] re-using retry for task_id=1 attemps=9
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:49:41] [debug] [sched] retry=0x7fdd38017938 1 in 234 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:49:47] [debug] [retry] re-using retry for task_id=5 attemps=9
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:49:47] [debug] [sched] retry=0x7fbce1a17938 5 in 329 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:00] [debug] [retry] re-using retry for task_id=2 attemps=9
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:00] [debug] [sched] retry=0x7f2aa9a17a00 2 in 101 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:15] [debug] [retry] re-using retry for task_id=1 attemps=9
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:50:15] [debug] [sched] retry=0x7f2aa9a179d8 1 in 749 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:50:48] [debug] [retry] re-using retry for task_id=3 attemps=9
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:50:48] [debug] [sched] retry=0x7fbce1a17960 3 in 691 seconds
--
--
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (867 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (910 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (912 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (884 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (896 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (922 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [debug] [out_kafka] enqueued message (850 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:54:13] [de
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:23] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:23] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:54:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 13:54:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [retry] re-using retry for task_id=2 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [sched] retry=0x7fbce1a178e8 2 in 1974 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [retry] re-using retry for task_id=5 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [sched] retry=0x7fbce1a17938 5 in 1763 seconds
--
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [retry] re-using retry for task_id=2 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:54:36] [debug] [sched] retry=0x7fbce1a178e8 2 in 1974 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [retry] re-using retry for task_id=5 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 13:55:15] [debug] [sched] retry=0x7fbce1a17938 5 in 1763 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:57:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 13:57:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [ info] [engine] started
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] inotify watch fd=20
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] scanning path /var/log/containers/*.log
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-xkv8g_kangaroo_fluent-bit-9e77c3d34cae27579fb2236fd361cc4d8d0f4018e2f1cb76a68a4d8f0b16b774.log, offset=0
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-cfjlw_teachers_go-spew-500d900f34d18b7e084a8bd024fce038bc4ee79994b1e13ad2ee7d8604926a4a.log, offset=491254
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-dfv6k_teachers_go-spew-99b1ea45b08173409691edc456a5f112b950424eea12034a7c0c36cc90c99a3a.log, offset=2778795
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/kube-proxy-ip-172-28-82-94.ec2.internal_kube-system_kube-proxy-42d02ce390db2f79131df096e7aa5153052e0ad9bdf7ca0fee4be782950a8577.log, offset=13507
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-3ade77e31c2d067214e178c71332a655396fa8ad4eab5c4bffe7d4e61ed94b0a.log, offset=160
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-4e7d2b495c01f0317306bfb3e9d09a327142f79645e6c30d9ae6958dac25b348.log, offset=1319893
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:24] [debug] [in_tail] add to scan queue /var/log/containers/logs-fluentbit-6b95b54d7b-n7mxc_test-kafka_testcase-df39898bc6379370606389693ec0c32d76aae1acbe66745fcc36c325f2ef4835.log, offset=284611
--
--
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (922 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] message delivered (1112 bytes, partition 4)
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (910 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (912 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (885 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (896 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_kafka] enqueued message (850 bytes) for topic 'k8s-firehose'
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 13:59:30] [debug] [out_
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:08] [debug] [retry] re-using retry for task_id=5 attemps=9
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:08] [debug] [sched] retry=0x7fdd38017988 5 in 1123 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:00:11] [debug] [retry] re-using retry for task_id=3 attemps=9
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:00:11] [debug] [sched] retry=0x7f2aa9a17a50 3 in 1609 seconds
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:53] [debug] [retry] re-using retry for task_id=3 attemps=10
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:00:53] [debug] [sched] retry=0x7fdd380178e8 3 in 1386 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:08] [debug] [retry] re-using retry for task_id=1 attemps=9
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:08] [debug] [sched] retry=0x7fbce1a178c0 1 in 1636 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [retry] re-using retry for task_id=2 attemps=11
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [sched] retry=0x7f2aa9a17a00 2 in 839 seconds
--
--
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [retry] re-using retry for task_id=2 attemps=11
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:13] [debug] [sched] retry=0x7f2aa9a17a00 2 in 839 seconds
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:18] [debug] [retry] re-using retry for task_id=3 attemps=10
kangaroo/fluent-bit-7lvsj[fluent-bit]: [2018/01/17 14:02:18] [debug] [sched] retry=0x7fbce1a17960 3 in 888 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:43] [debug] [retry] re-using retry for task_id=1 attemps=10
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:02:43] [debug] [sched] retry=0x7f2aa9a179d8 1 in 1755 seconds
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:03:35] [debug] [retry] re-using retry for task_id=0 attemps=10
kangaroo/fluent-bit-sb8kf[fluent-bit]: [2018/01/17 14:03:35] [debug] [sched] retry=0x7f2aa9a179b0 0 in 1741 seconds
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:03:56] [debug] [retry] r
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:29] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-2.broker.kafka.svc.cluster.local:9092/2]: kafka-2.broker.kafka.svc.cluster.local:9092/2: Receive failed: Disconnected
kangaroo/fluent-bit-z7rk5[fluent-bit]: [2018/01/17 14:04:34] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-2.broker.kafka.svc.cluster.local:9092/2]: kafka-2.broker.kafka.svc.cluster.local:9092/2: Receive failed: Disconnected
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [ info] [engine] started
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] inotify watch fd=20
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] scanning path /var/log/containers/*.log
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-xkv8g_kangaroo_fluent-bit-1ee969753bd79567e97d12a8c82e10542c4b720db6d0aa1f78a4009c6d064920.log, offset=0
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-cfjlw_teachers_go-spew-500d900f34d18b7e084a8bd024fce038bc4ee79994b1e13ad2ee7d8604926a4a.log, offset=515164
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-765547c587-dfv6k_teachers_go-spew-99b1ea45b08173409691edc456a5f112b950424eea12034a7c0c36cc90c99a3a.log, offset=2800532
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/kube-proxy-ip-172-28-82-94.ec2.internal_kube-system_kube-proxy-42d02ce390db2f79131df096e7aa5153052e0ad9bdf7ca0fee4be782950a8577.log, offset=13507
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-3ade77e31c2d067214e178c71332a655396fa8ad4eab5c4bffe7d4e61ed94b0a.log, offset=160
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bcz9n_kube-system_kube2iam-4e7d2b495c01f0317306bfb3e9d09a327142f79645e6c30d9ae6958dac25b348.log, offset=1346840
kangaroo/fluent-bit-xkv8g[fluent-bit]: [2018/01/17 14:04:39] [debug] [in_tail] add to scan queue /var/log/containers/logs-fluentbit-6b95b54d7b-n7mxc_test-kafka_testcase-df39898bc6379370606389693ec0c32d76aae1acbe66745fcc36c325f2ef4835.log, offset=335508
@StevenACoffman StevenACoffman changed the title Unstable using latest from master Unstable using latest from 0.13-dev branch Jan 17, 2018
edsiper (Member) commented Jan 17, 2018

Do you know if there is any best practice for using a DNS name to reach the API server? filter_kubernetes by default uses kubernetes.default.svc, but what about kubernetes.default.svc.cluster.local? (cc: @solsson)

solsson (Contributor) commented Jan 17, 2018

I've seen a lot of [service].[namespace] names, as well as names ending in .svc.cluster.local, but rarely names ending in .svc. https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#a-records seems to recommend the full name. I don't know whether there are specific conventions for Kubernetes API access.

edsiper (Member) commented Jan 17, 2018

Thanks for the feedback, I will go ahead and change that.

edsiper (Member) commented Jan 29, 2018

@StevenACoffman is this still an issue?

StevenACoffman (Author) commented Jan 30, 2018

@edsiper You mentioned a lack of "check" in out_kafka in #18

The instability I'm seeing seems entirely attributable to the memory (and CPU?) limits on nodes with lots of pre-existing logs.

From @solsson on Jan 26, 2018:

Spikes in memory use at pod start are impractical. Can log processing be halted when kafka buffers hit a size limit? Would it be possible to add output buffer size to prometheus metrics?

From @leahnp on May 17, 2017:

Add a memory limit to the deployment YAML. Test the special case: in long-running clusters with lots of pre-existing logs, deploy Fluent Bit; the initial workload is very heavy and then evens out. If it hits the memory limit during this initial processing, it will be continually killed and re-created.

Copied from original issue: samsung-cnct/kraken-logging-fluent-bit-daemonset#5 and Moved to samsung-cnct/chart-fluent-bit#9

solsson (Contributor) commented Jan 30, 2018

@edsiper I think you can merge #18 as further increases in limit would make no difference. Only caps to buffer sizes will.

What's the effect of Mem_Buf_Limit on the input plugin at start? The desired behavior for Tail would be that parsing stops temporarily. According to http://fluentbit.io/documentation/0.12/configuration/backpressure.html#membuflimit it can be set on output plugins too, but am I correct to interpret your earlier remarks as saying this has no effect because the Kafka client does the buffering? Maybe fluent/fluent-bit#495 can help cap that, through queued.max.messages.kbytes etc.

edsiper (Member) commented Jan 30, 2018

@solsson merged, thanks.

Mem_Buf_Limit only applies to input plugins, to pause data ingestion into the engine. Since out_kafka buffers the data (it does not confirm delivery), Fluent Bit issues an "OK", so in_tail keeps ingesting data. The fix is to add logic to out_kafka to actually check whether a message was delivered.

If you see memory grow with a different output plugin, there is definitely something wrong; I will double-check the code anyway.
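To illustrate where Mem_Buf_Limit applies, a sketch of a tail input with the limit set (the Path matches the daemonset's log path from the scan-queue output above; the 5MB value is illustrative, not taken from this repo's config):

```
[INPUT]
    Name          tail
    Path          /var/log/containers/*.log
    # Pauses ingestion for this input once its in-memory buffer
    # reaches the limit, resuming after a successful flush. It does
    # not bound librdkafka's own producer queue in out_kafka.
    Mem_Buf_Limit 5MB
```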

solsson (Contributor) commented Jan 30, 2018

I'm only running out_kafka. I will try out_kafka with queue.buffering.max.kbytes after the 0.13 release. The default seems to be 1 GB, so with fluent/fluent-bit#495 I could set it to something like 10 MB instead. The docs say "Maximum total message size sum allowed on the producer queue.", and "allowed" suggests that out_kafka would be notified when the max is reached.
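If fluent/fluent-bit#495's property passthrough lands as proposed, the cap might look something like this (a sketch: the rdkafka. key prefix is an assumption based on that PR, and the broker/topic values mirror the ones in the logs above):

```
[OUTPUT]
    Name    kafka
    Match   *
    Brokers bootstrap.kafka:9092
    Topics  k8s-firehose
    # librdkafka producer-side cap (in KB): bounds the total size of
    # queued messages so a slow or disconnected broker cannot grow
    # memory without limit. 10240 KB = 10 MB vs. the ~1 GB default.
    rdkafka.queue.buffering.max.kbytes 10240
```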

StevenACoffman (Author) commented Jan 30, 2018

Hmm... I pulled the latest, removed the CPU and memory limits, and I'm getting some CrashLoopBackOff pods terminated with exitCode 139, which I think indicates a segmentation fault (SIGSEGV, signal 11). No termination message. This is not on a node with excessive existing logs.
The logs from fluent-bit look entirely normal. I tried changing the Log_Level to debug and deleted the old pod; when the new one is created, it logs normal debug messages:

[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (1075 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (1116 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (980 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (2110 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] message delivered (983 bytes, partition 0)
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (2111 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (983 bytes) for topic 'k8s-firehose'
[2018/01/30 22:26:08] [debug] [out_kafka] enqueued message (997 bytes) for topic 'k8s-firehose'

Earlier in the fluent-bit pod's output, I am getting a lot of:

[2018/01/30 22:26:08] [debug] [filter_kube] could not merge log as requested

I cannot seem to get the pod on that node to come up healthy, regardless of restarts or terminate-and-re-create attempts, but the rest do.
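The exit-code reading above can be checked mechanically: container exit codes above 128 encode 128 plus the number of the fatal signal, so 139 decodes to signal 11, SIGSEGV. A minimal sketch:

```python
import signal

exit_code = 139  # as reported in the pod's containerStatuses
if exit_code > 128:
    # Exit codes > 128 follow the shell convention 128 + signal number.
    sig = signal.Signals(exit_code - 128)
    print(f"killed by signal {sig.value} ({sig.name})")
```

Running this prints `killed by signal 11 (SIGSEGV)`, confirming the segfault interpretation.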

StevenACoffman (Author) commented:

I altered the configmap and changed the filter-kubernetes.conf to:

        Merge_Log           Off
        K8S-Logging.Parser  Off

Applying the change, deleting the pod, the daemonset recreated the pod and it came up healthy after several hours of other failed attempts.
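In context, those two keys sit in the filter_kubernetes section, roughly as follows (a sketch: the surrounding keys follow the stock daemonset config and may differ from the actual configmap):

```
[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    # Disable merging the record's log field as structured JSON and
    # disable per-pod parser annotations; the merge path was implicated
    # in the pod that would not come up healthy.
    Merge_Log           Off
    K8S-Logging.Parser  Off
```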

edsiper (Member) commented Feb 6, 2018

edsiper (Member) commented Feb 22, 2018

solsson (Contributor) commented Feb 23, 2018

I've upgraded and it looks good to me.

StevenACoffman (Author) commented:

0.13-dev:0.9 is very solid so far (20 hours, large volume).
