Bug: file plugin does not always read files from the beginning despite start_position=beginning #173

Open
fbaligand opened this issue Mar 27, 2018 · 4 comments
fbaligand commented Mar 27, 2018

When I configure the file input to read all files in a folder from the beginning, I notice that not all files are read from the beginning. For example, some files are read starting at line 306; all previous lines are skipped and therefore lost.
This is a serious problem, because we can't rely on the plugin's behavior.

The problem seems to happen when there are lots of files (more than 500).
According to my tests, it is tied to the size of the sincedb file. When it becomes too big, newly arriving files are not read from the beginning, and sometimes are not read at all.

If I delete the sincedb file and restart Logstash, it works fine for a few days, until the sincedb file becomes big again.

My suggested fix is to remove from the sincedb file the entries for files that are older than the ignore_older option.
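
The pruning idea above could be sketched roughly as follows. This is an illustration only, not the plugin's code: it assumes the six-column sincedb layout that appears later in this thread (inode, major and minor device numbers, byte offset, last-activity timestamp, path), and the function name `prune_sincedb` and its `max_age_seconds` parameter are hypothetical.

```python
import time


def prune_sincedb(lines, max_age_seconds, now=None):
    """Return only the sincedb lines whose last-activity timestamp is recent enough.

    Assumed layout per line (whitespace-separated):
    inode, major, minor, offset, timestamp, path.
    Lines with fewer than five columns carry no timestamp and are kept untouched.
    """
    now = time.time() if now is None else now
    kept = []
    for line in lines:
        # The path may contain spaces, so split into at most six fields.
        fields = line.split(None, 5)
        if len(fields) < 5:
            kept.append(line)
            continue
        last_active = float(fields[4])
        if now - last_active <= max_age_seconds:
            kept.append(line)
    return kept
```

Passing the value of ignore_older (in seconds) as `max_age_seconds` would drop every entry whose file has been inactive for longer than that window.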

Logstash version: 6.1.3
File input configuration:

	file {
		path => ["/path/to/file-inputs/*.csv"]
		start_position => "beginning"
		codec => plain {
			charset => "UTF-8"
		}
	}
@fbaligand fbaligand changed the title File first lines are - sometimes - not read Bug: file plugin does not always read files from the beginning despite start_position=beginning Mar 28, 2018
@guyboertje
Contributor

Same problem reported in LS repo. elastic/logstash#8929

@guyboertje guyboertje self-assigned this Apr 30, 2018
@guyboertje
Contributor

Awaiting feedback on whether the sincedb_clean_after setting fixes this problem.

@fbaligand
Author

Hi @guyboertje,

I will run tests this week with file input version 4.1.0 to check whether it fixes the issue.
And of course, I will give you feedback!

Fabien

wjq commented Jan 10, 2019

Hi @guyboertje,
sincedb_clean_after is not working for me on the current version, 6.5.4. Here are my steps:

  1. Create a new sincedb file, /tmp/sincedb/sincedb_test1, clear the Elasticsearch index with curl, and make sure the sincedb file is writable.
  2. Update my input to:
input {
  file {
    path => "/vagrant/elk_log_test/*.log"
    #mode => "read"
    start_position => "beginning"
    sincedb_clean_after => "20s"
    sincedb_path => "/tmp/sincedb/sincedb_test1"
  }
}
  3. Run Logstash with the command:
    bin/logstash -f ~/elk_config/elk_config.conf -w 1 --pipeline.unsafe_shutdown --config.debug --log.level debug
  4. Once it is running, create a new apache_access1.log file under the elk_log_test folder.
  5. Add one line to the log. Nothing happens except:
[2019-01-10T09:34:17,843][DEBUG][filewatch.tailmode.handlers.grow] read_to_eof: get chunk
[2019-01-10T09:34:17,957][DEBUG][filewatch.sincedbcollection] writing sincedb (delta since last write = 1547112857)
  6. Add 4 more lines to the log. This time it works and I can see results; the delta since last write is:
[2019-01-10T09:34:39,688][DEBUG][filewatch.sincedbcollection] writing sincedb (delta since last write = 22)

The sincedb file then contained:

123 0 25 1092 1547112879.687172 /vagrant/elk_log_test/apache-access1.log
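
For anyone decoding that record, here is a minimal sketch, assuming the column layout documented for recent versions of the file input (inode, major device number, minor device number, byte offset, last-activity timestamp, last known path). The `SincedbRecord` type and the `parse_sincedb_line` helper are hypothetical names used for illustration. Read as UTC (an assumption about the log's time zone), the timestamp column lines up with the 09:34:39 sincedb write in the debug log above.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class SincedbRecord(NamedTuple):
    inode: int
    major: int
    minor: int
    offset: int
    last_active: float
    path: str


def parse_sincedb_line(line: str) -> SincedbRecord:
    # Split into at most six fields so a path containing spaces stays intact.
    inode, major, minor, offset, last_active, path = line.strip().split(None, 5)
    return SincedbRecord(
        int(inode), int(major), int(minor), int(offset), float(last_active), path
    )


record = parse_sincedb_line(
    "123 0 25 1092 1547112879.687172 /vagrant/elk_log_test/apache-access1.log"
)
last_write = datetime.fromtimestamp(record.last_active, tz=timezone.utc)
print(record.offset)           # bytes of the file already read
print(last_write.isoformat())  # last activity, rendered as a UTC timestamp
```

The first "delta since last write" value in the log, 1547112857, is itself an epoch timestamp (09:34:17 UTC). That would be consistent with the delta being measured from zero on the very first write, although this reading is an inference from the log and not something confirmed in the thread.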
