Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filebeat shuts down early (stdin + once enabled) #3315

Closed
urso opened this issue Jan 9, 2017 · 10 comments
Closed

Filebeat shuts down early (stdin + once enabled) #3315

urso opened this issue Jan 9, 2017 · 10 comments
Labels
bug Filebeat Filebeat Stalled Team:Integrations Label for the Integrations team

Comments

@urso
Copy link

urso commented Jan 9, 2017

Original investigation in this dicussion: https://discuss.elastic.co/t/filebeat-does-not-exit-on-eof-input-type-stdin/69545/13

I found a race-condition due to incorrect shutdown signaling in crawler/prospector/harvester shutdown. That is, shutdown timeout is working correctly once the events enter the pipeline, but shutdown is not correctly waiting for the harvester (stdin reader) being finished and simple continues with shutdown. This results in a race condition between the shutdown routine and the still active reader trying to publish an event. One time I even found the harvester being started after crawler/prospector shutdown. Unfortunately a fix for this is not that easy.

Testing with file-based prospectors and stdin, the race did only occur for stdin.

@urso urso added the bug label Jan 9, 2017
@ruflin ruflin added the Filebeat Filebeat label Jan 9, 2017
@martinscholz83
Copy link
Contributor

There are 3 cases

  1. Shutdown waits until harvester is finished
    image

  2. Harvester is starting but shutdown is not waiting until harvester is finished
    image

  3. Harvester is starting after crawler and prospector stopped
    image

@ruflin
Copy link
Member

ruflin commented Jan 12, 2017

@maddin2016 Thanks for looking into this. It seems like the stdin harvester is somehow not blocking Start and Stop.

@martinscholz83
Copy link
Contributor

Hi @ruflin, i think this time i know why it's not working. Here you started the harvester without increment the prospector waitgroup.
If i change to

type ProspectorStdin struct {
	harvester  *harvester.Harvester
	Prospector *Prospector
	started    bool
}

...
...

func NewProspectorStdin(p *Prospector) (*ProspectorStdin, error) {

	prospectorer := &ProspectorStdin{}
	prospectorer.Prospector = p

...
...

func (p *ProspectorStdin) Run() {

	// Make sure stdin harvester is only started once
	if !p.started {
		reader, err := p.harvester.Setup()
		if err != nil {
			logp.Err("Error starting stdin harvester: %s", err)
			return
		}

		p.Prospector.wg.Add(1)
		// startHarvester is not run concurrently, but atomic operations are need for the decrementing of the counter
		// inside the following go routine
		atomic.AddUint64(&p.Prospector.harvesterCounter, 1)
		go func() {
			defer func() {
				atomic.AddUint64(&p.Prospector.harvesterCounter, ^uint64(0))
				p.Prospector.wg.Done()
			}()

			// Starts harvester and picks the right type. In case type is not set, set it to defeault (log)
			p.harvester.Harvest(reader)
			p.started = true
		}()
	}

shutdown is always waiting for the harvester to be finished.

@ruflin
Copy link
Member

ruflin commented Jan 15, 2017

LGTM. Can you open a PR with that?

@lesinigo
Copy link

I think we have been bitten by this while trying to write scripts to automate ingestion of old, archived and compressed log files (eg. bzcat old_data.bz2 | filebeat -c /tmp/filebeat-xyzzy.yml -once).

We tried various configuration options but AFAIK there is currently no workaround, is that correct?
I mean, is there any way to let stdin work or is it actually totally broken in 5.5.1?

@waldner
Copy link

waldner commented Sep 13, 2017

We're also seeing this in filebeat 5.5.2. Is this going to be fixed?

@hamsterready
Copy link

I was experiencing similar problem while running https://github.com/willfarrell/docker-filebeat/blob/master/5-stdin/docker-entrypoint.sh

One possible workaround is to use tail -f $fifo | filebeat ... while cating the data to the $fifo and I guess it can be applied to @lesinigo use case.

@hamsterready
Copy link

For the record, the tail -f workaround is not really a good approach as tail -f on fifo behaves a little bit different then one may think ;) https://stackoverflow.com/questions/20409071/tail-f-on-a-named-pipe-behaves-strange

@benriou
Copy link

benriou commented Jul 11, 2018

Just upgraded filebeats in latest version
filebeat version 6.3.1 (amd64), libbeat 6.3.1 [ed42bb8 built 2018-06-29 21:09:35 +0000 UTC]

And still suffering same bug - sometimes the process filebeat just stops immediately after starting.

[root@$server10]# zcat $logfile_obfuscated.esw3ccs_ghostip_S.201807100900-1000-0.gz | filebeat --once -v -e -c /etc/filebeat/filebeat.pipe.yml 2>&1 2018-07-10T17:06:16.336+0200 INFO instance/beat.go:492 Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat] 2018-07-10T17:06:16.336+0200 INFO instance/beat.go:499 Beat UUID: 4c438e13-87c2-4cc6-893e-5c5abe62d512 2018-07-10T17:06:16.337+0200 INFO [beat] instance/beat.go:716 Beat info {"system_info": {"beat": {"path": {"config": "/etc/filebeat", "data": "/var/lib/filebeat", "home": "/usr/share/filebeat", "logs": "/var/log/filebeat"}, "type": "filebeat", "uuid": "4c438e13-87c2-4cc6-893e-5c5abe62d512"}}} 2018-07-10T17:06:16.337+0200 INFO [beat] instance/beat.go:725 Build info {"system_info": {"build": {"commit": "ed42bb85e72ae58cc09748dc1825159713e0ffd4", "libbeat": "6.3.1", "time": "2018-06-29T21:09:35.000Z", "version": "6.3.1"}}} 2018-07-10T17:06:16.337+0200 INFO [beat] instance/beat.go:728 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":8,"version":"go1.9.4"}}} 2018-07-10T17:06:16.340+0200 INFO [beat] instance/beat.go:732 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2017-03-22T11:08:38+01:00","containerized":true,"hostname":"$server","ips":["127.0.0.1/8","::1/128","$server_ipv4/23","$server_ipv6/64"],"kernel_version":"2.6.32-573.12.1.el6.x86_64","mac_addresses":["00:50:56:98:6c:68"],"os":{"family":"redhat","platform":"redhat","name":"Red","version":"6.9 (Santiago)","major":6,"minor":9,"patch":0,"codename":"Santiago"},"timezone":"CEST","timezone_offset_sec":7200}}} 2018-07-10T17:06:16.340+0200 INFO [beat] instance/beat.go:761 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"effective":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"ambient":null}, "cwd": "/$LOGFILE", "exe": "/usr/share/filebeat/bin/filebeat", "name": "filebeat", "pid": 30562, "ppid": 30178, "seccomp": {"mode":""}, "start_time": "2018-07-10T17:06:15.520+0200"}}} 2018-07-10T17:06:16.341+0200 INFO instance/beat.go:225 Setup Beat: filebeat; Version: 6.3.1 2018-07-10T17:06:16.342+0200 INFO pipeline/module.go:81 Beat name: $server 2018-07-10T17:06:16.343+0200 INFO instance/beat.go:315 filebeat start running. 2018-07-10T17:06:16.343+0200 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s 2018-07-10T17:06:16.343+0200 INFO registrar/registrar.go:116 Loading registrar data from /var/lib/filebeat/registry 2018-07-10T17:06:16.345+0200 INFO registrar/registrar.go:127 States Loaded from registrar: 68 2018-07-10T17:06:16.345+0200 WARN beater/filebeat.go:354 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning. 2018-07-10T17:06:16.346+0200 INFO crawler/crawler.go:48 Loading Inputs: 1 2018-07-10T17:06:16.346+0200 INFO input/input.go:88 Starting input of type: stdin; ID: 15719311329577953965 2018-07-10T17:06:16.347+0200 INFO input/input.go:139 Stopping Input: 15719311329577953965 2018-07-10T17:06:16.347+0200 INFO crawler/crawler.go:82 Loading and starting Inputs completed. Enabled inputs: 1 2018-07-10T17:06:16.347+0200 INFO beater/filebeat.go:370 Running filebeat once. Waiting for completion ... 2018-07-10T17:06:16.347+0200 INFO beater/filebeat.go:372 All data collection completed. Shutting down. 2018-07-10T17:06:16.347+0200 INFO crawler/crawler.go:109 Stopping Crawler 2018-07-10T17:06:16.347+0200 INFO crawler/crawler.go:119 Stopping 1 inputs 2018-07-10T17:06:16.347+0200 INFO log/harvester.go:228 Harvester started for file: - 2018-07-10T17:06:16.347+0200 INFO crawler/crawler.go:135 Crawler stopped 2018-07-10T17:06:16.349+0200 INFO beater/signalwait.go:76 Continue shutdown: All enqueued events being published. 2018-07-10T17:06:16.349+0200 INFO registrar/registrar.go:247 Stopping Registrar 2018-07-10T17:06:16.349+0200 INFO registrar/registrar.go:173 Ending Registrar 2018-07-10T17:06:16.350+0200 INFO harvester/forwarder.go:35 Input outlet closed 2018-07-10T17:06:16.372+0200 INFO [monitoring] log/log.go:132 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":10,"time":{"ms":14}},"total":{"ticks":50,"time":{"ms":58},"value":50},"user":{"ticks":40,"time":{"ms":44}}},"info":{"ephemeral_id":"cbf22b23-4f3b-43ef-8193-930f4aa6b976","uptime":{"ms":50}},"memstats":{"gc_next":4194304,"memory_alloc":1490560,"memory_total":3288568,"rss":13561856}},"filebeat":{"events":{"active":4,"added":4},"harvester":{"closed":1,"open_files":-1,"running":0,"started":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"logstash"},"pipeline":{"clients":0,"events":{"active":4,"published":4,"total":4}}},"registrar":{"states":{"current":68},"writes":{"success":1,"total":1}},"system":{"cpu":{"cores":8},"load":{"1":0.69,"15":0.79,"5":0.93,"norm":{"1":0.0862,"15":0.0988,"5":0.1163}}}}}} 2018-07-10T17:06:16.372+0200 INFO [monitoring] **log/log.go:133 Uptime: 53.884677ms** 2018-07-10T17:06:16.372+0200 INFO [monitoring] log/log.go:110 Stopping metrics logging. 2018-07-10T17:06:16.372+0200 INFO instance/beat.go:321 filebeat stopped. [root@$server 10]#

And when it's working OK, the same command :

[root@server 10]# zcat $filename_obfuscatedesw3ccs_ghostip_S.201807100900-1000-0.gz | filebeat --once -v -e -c /etc/filebeat/filebeat.pipe.yml 2>&1 2018-07-10T17:05:31.852+0200 INFO instance/beat.go:492 Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat] 2018-07-10T17:05:31.852+0200 INFO instance/beat.go:499 Beat UUID: 4c438e13-87c2-4cc6-893e-5c5abe62d512 2018-07-10T17:05:31.852+0200 INFO [beat] instance/beat.go:716 Beat info {"system_info": {"beat": {"path": {"config": "/etc/filebeat", "data": "/var/lib/filebeat", "home": "/usr/share/filebeat", "logs": "/var/log/filebeat"}, "type": "filebeat", "uuid": "4c438e13-87c2-4cc6-893e-5c5abe62d512"}}} 2018-07-10T17:05:31.853+0200 INFO [beat] instance/beat.go:725 Build info {"system_info": {"build": {"commit": "ed42bb85e72ae58cc09748dc1825159713e0ffd4", "libbeat": "6.3.1", "time": "2018-06-29T21:09:35.000Z", "version": "6.3.1"}}} 2018-07-10T17:05:31.853+0200 INFO [beat] instance/beat.go:728 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":8,"version":"go1.9.4"}}} 2018-07-10T17:05:31.855+0200 INFO [beat] instance/beat.go:732 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2017-03-22T11:08:38+01:00","containerized":true,"hostname":"$server","ips":["127.0.0.1/8","::1/128","$server_ipv4","$server_ipv6/64"],"kernel_version":"2.6.32-573.12.1.el6.x86_64","mac_addresses":["00:50:56:98:6c:68"],"os":{"family":"redhat","platform":"redhat","name":"Red","version":"6.9 (Santiago)","major":6,"minor":9,"patch":0,"codename":"Santiago"},"timezone":"CEST","timezone_offset_sec":7200}}} 2018-07-10T17:05:31.856+0200 INFO [beat] instance/beat.go:761 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"effective":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63"],"ambient":null}, "cwd": "$logpath", "exe": "/usr/share/filebeat/bin/filebeat", "name": "filebeat", "pid": 30538, "ppid": 30178, "seccomp": {"mode":""}, "start_time": "2018-07-10T17:05:31.030+0200"}}} 2018-07-10T17:05:31.857+0200 INFO instance/beat.go:225 Setup Beat: filebeat; Version: 6.3.1 2018-07-10T17:05:31.858+0200 INFO pipeline/module.go:81 Beat name: $server 2018-07-10T17:05:31.859+0200 INFO instance/beat.go:315 filebeat start running. 2018-07-10T17:05:31.859+0200 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s 2018-07-10T17:05:31.859+0200 INFO registrar/registrar.go:116 Loading registrar data from /var/lib/filebeat/registry 2018-07-10T17:05:31.864+0200 INFO registrar/registrar.go:127 States Loaded from registrar: 68 2018-07-10T17:05:31.865+0200 WARN beater/filebeat.go:354 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning. 2018-07-10T17:05:31.865+0200 INFO crawler/crawler.go:48 Loading Inputs: 1 2018-07-10T17:05:31.865+0200 INFO input/input.go:88 Starting input of type: stdin; ID: 15719311329577953965 2018-07-10T17:05:31.866+0200 INFO input/input.go:139 Stopping Input: 15719311329577953965 2018-07-10T17:05:31.866+0200 INFO crawler/crawler.go:82 Loading and starting Inputs completed. Enabled inputs: 1 2018-07-10T17:05:31.866+0200 INFO log/harvester.go:228 Harvester started for file: - 2018-07-10T17:05:31.866+0200 INFO beater/filebeat.go:370 Running filebeat once. Waiting for completion ... 2018-07-10T17:05:31.866+0200 INFO beater/filebeat.go:372 All data collection completed. Shutting down. 2018-07-10T17:05:31.867+0200 INFO crawler/crawler.go:109 Stopping Crawler 2018-07-10T17:05:31.867+0200 INFO crawler/crawler.go:119 Stopping 1 inputs 2018-07-10T17:05:31.867+0200 INFO crawler/crawler.go:135 Crawler stopped 2018-07-10T17:06:01.866+0200 INFO [monitoring] log/log.go:124 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":2230,"time":{"ms":2239}},"total":{"ticks":21420,"time":{"ms":21435},"value":21420},"user":{"ticks":19190,"time":{"ms":19196}}},"info":{"ephemeral_id":"2dab54fb-966a-4242-bc0d-b7ab70dcd6ba","uptime":{"ms":30030}},"memstats":{"gc_next":29148752,"memory_alloc":18411528,"memory_total":1158387856,"rss":44654592}},"filebeat":{"events":{"active":4117,"added":120853,"done":116736},"harvester":{"open_files":0,"running":1,"started":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":116736,"active":4096,"batches":59,"total":120832},"read":{"bytes":342},"type":"logstash","write":{"bytes":43819359}},"pipeline":{"clients":1,"events":{"active":4117,"published":120852,"retry":8192,"total":120853},"queue":{"acked":116736}}},"registrar":{"states":{"current":0}},"system":{"cpu":{"cores":8},"load":{"1":0.69,"15":0.79,"5":0.94,"norm":{"1":0.0862,"15":0.0988,"5":0.1175}}}}}} 2018-07-10T17:06:09.051+0200 INFO log/harvester.go:251 End of file reached: . Closing because close_eof is enabled. 2018-07-10T17:06:10.393+0200 INFO beater/signalwait.go:76 Continue shutdown: All enqueued events being published. 2018-07-10T17:06:10.393+0200 INFO registrar/registrar.go:247 Stopping Registrar 2018-07-10T17:06:10.393+0200 INFO registrar/registrar.go:173 Ending Registrar 2018-07-10T17:06:10.403+0200 INFO [monitoring] log/log.go:132 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":2720,"time":{"ms":2724}},"total":{"ticks":26420,"time":{"ms":26426},"value":26420},"user":{"ticks":23700,"time":{"ms":23702}}},"info":{"ephemeral_id":"2dab54fb-966a-4242-bc0d-b7ab70dcd6ba","uptime":{"ms":38566}},"memstats":{"gc_next":17014192,"memory_alloc":13696192,"memory_total":1441591784,"rss":45985792}},"filebeat":{"events":{"added":151059,"done":151059},"harvester":{"closed":1,"open_files":-1,"running":0,"started":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":151059,"batches":74,"total":151059},"read":{"bytes":444},"type":"logstash","write":{"bytes":53564213}},"pipeline":{"clients":0,"events":{"active":0,"published":151059,"retry":8192,"total":151059},"queue":{"acked":151059}}},"registrar":{"states":{"current":68},"writes":{"success":1,"total":1}},"system":{"cpu":{"cores":8},"load":{"1":0.67,"15":0.79,"5":0.93,"norm":{"1":0.0838,"15":0.0988,"5":0.1163}}}}}} 2018-07-10T17:06:10.403+0200 INFO [monitoring] **log/log.go:133 Uptime: 38.568974739s** 2018-07-10T17:06:10.403+0200 INFO [monitoring] log/log.go:110 Stopping metrics logging. 2018-07-10T17:06:10.403+0200 INFO instance/beat.go:321 filebeat stopped. [root@server10]#

Here is the configuration file used :
`[root@$server tmp]# cat /etc/filebeat/filebeat.pipe.yml
filebeat:
inputs:

  • type: stdin
    encoding: utf-8
    fields:
    • $fieldA: fieldA'

registry_file: /var/lib/filebeat/registry

output:
logstash:
hosts: ["$ip_logstash1:9207", "$ip_logstash2:9207"]
worker: 2

max_retries: 3
loadbalance: true`

@faec faec added the Team:Beats label Dec 6, 2019
@andresrc andresrc added Team:Integrations Label for the Integrations team and removed Team:Beats labels Mar 6, 2020
@botelastic
Copy link

botelastic bot commented Feb 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Feb 4, 2021
@botelastic botelastic bot closed this as completed Mar 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Filebeat Filebeat Stalled Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants