[Filebeat] filestream input resends whole log files after restart #36541
Comments
I think we are running into the same problem with a Suricata setup producing about ±100 eve.*.json files. After a restart, Filebeat doesn't seem to know where it left off and starts almost all over again, leading to duplicate data.
We ran into this again, and this time it was at the end of the day, so the whole 400 GB gets reindexed again, leading to double data and a SIEM that's lagging many hours behind.
We are experiencing this issue too; we are using version 8.11.1. Is there any ETA for this?
Haven't seen anyone responding in months, so no clue :)
+1 on this one - every restart, for some reason, Filebeat reads the rotated files (not the current file though) - we are using version 8.11.2
The logs and information @oshmyrko provided show something quite weird happening. By default the Filestream input uses the native file identity: Filebeat considers a file (identified by inode + device ID) truncated when the size of the file on disk is smaller than the size Filebeat has stored in memory. If a file was just moved, its size is the same, so Filebeat will detect the change in filename and move on without re-ingesting anything. However, based on the logs, Filebeat perceived something different.
The inode is the same as before, and Filebeat detects a file truncation, but this time the path shows a different file name.
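For context, here is a minimal sketch of that default spelled out explicitly; the input ID and path are placeholders, and the explicit file_identity.native line simply states what Filebeat uses when nothing is configured:

filebeat.inputs:
  - type: filestream
    id: example-input            # placeholder ID, not from this thread
    paths:
      - /var/log/example/*.log
    # Default file identity: the file is tracked by its inode + device ID.
    file_identity.native: ~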
I'll further investigate it and try to reproduce it on my dev environment. Once I have some updates I'll post them here.
For folks still experiencing this issue, please make sure all of your filestream inputs have a unique ID. If an ID is re-used, or more than one input does not have an ID set, you can run into data duplication. Our documentation is very clear about the need for unique IDs:
Also look at your logs (level error) for entries like this:
If it is there, then that's likely the reason for the data duplication. Setting unique IDs for the filestream inputs will fix the issue. Bear in mind that all files harvested by inputs with new/changed IDs will be re-ingested from the beginning once; after that Filebeat should stop duplicating data. A minimal example follows.
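As an illustration only (the IDs and paths below are placeholders, not taken from this thread), a configuration where every filestream input carries its own stable, unique ID could look like this:

filebeat.inputs:
  - type: filestream
    id: nginx-access             # must be unique within this Filebeat process
    paths:
      - /var/log/nginx/access.log*
  - type: filestream
    id: nginx-error              # a different, stable ID for the second input
    paths:
      - /var/log/nginx/error.log*

The same file can be shipped to every server; the IDs only need to be unique among the inputs of a single Filebeat process.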
@belimawr, all my filestream inputs have unique IDs (you can see them below), and I've just reproduced this issue with Filebeat.
# cat /etc/filebeat/inputs.d/app-log.yml
- type: filestream
id: app-log
paths:
- /home/ubuntu/my-app/logs/*-json.log*
parsers:
- ndjson:
ignore_decoding_error: true
processors:
- timestamp:
field: timestamp
layouts:
- '2006-01-02T15:04:05Z'
- '2006-01-02T15:04:05.999Z'
- '2006-01-02T15:04:05.999-07:00'
test:
- '2023-04-16T17:45:35.999Z'
ignore_missing: true
- drop_fields:
fields: [timestamp]
ignore_missing: true
---
# cat /etc/filebeat/inputs.d/docker.yml
- type: container
paths:
- '/var/lib/docker/containers/*/*.log'
json:
keys_under_root: true
ignore_decoding_error: true
---
# cat /etc/filebeat/inputs.d/syslog.yml
- type: filestream
id: syslog
paths:
- '/var/log/syslog*'
processors:
- add_fields:
target: ''
fields:
type: 'syslog'
---
# cat /etc/filebeat/inputs.d/kernel.yml
- type: filestream
id: kernel
paths:
- '/var/log/kern.log'
processors:
- add_fields:
target: ''
fields:
type: 'kernel'
---
# cat /etc/filebeat/inputs.d/cloud-init.yml
- type: filestream
id: cloud-init
paths:
- '/var/log/cloud-init-output.log'
ignore_older: '2h'
close_inactive: '10m'
exclude_lines: ['^\|', '^\+']
parsers:
- multiline:
type: pattern
pattern: '^ci-info.*'
negate: false
match: after
processors:
- add_fields:
target: ''
fields:
type: 'cloud-init'
Thanks @oshmyrko! I'll look into reproducing this issue and investigating it further.
@belimawr, just want to add some details. Here are rough numbers: 500 MB of logs every ~3 min and about 1,200K log lines in the file:
I have been running into this repeatedly and have documented my findings on Discuss. My understanding is that when Filebeat detects truncation on files, it re-reads from 0 but does not reset the offset in the registry.
What solved it for us was to give a unique ID to each prospector we have for the same Filebeat agent - it is documented here. I have to say, this is a poor choice for the product: maintaining a unique ID for each prospector we have (over 400 servers, 100+ files we read) is such an overhead. I get that it is easier to debug issues this way, but the fact that changing an ID, or changing ID conventions in the future, will end up producing duplicated events means we are bound to these IDs. Can't Filebeat generate a UID for each prospector that does not have one, just for the sole purpose of not re-reading the same events after every restart?
Thanks for the new information @oshmyrko!
That sounds like inode reuse could be the root cause for you. Anyway, it's on my list to further investigate this. @redbaron4 Thanks a lot for the detailed report. I wrote a more detailed answer on Discuss. Here goes a summary:
@SadeGili, the IDs need to be unique within a Filebeat process/configuration. So if you have multiple servers (400+ as you mentioned), they can all share the exact same Filebeat configuration. As long as each Filebeat process has unique IDs in its configuration file(s)/filestream inputs, you will have no issue. One of the features of the Filestream input is to allow different instances of the input to read from the same file and keep independent states. For that to work we need a way to tie a file state/identity to a Filestream input instance; we use the input ID defined in the configuration for that.
Unfortunately that is not a simple task. If we generate an ID for a filestream input that does not have one, we have to ensure this ID is unique, stable, persisted, and linked to that specific input configuration. The problem is that an input configuration can be edited by the user (adding processors, fields, more file paths, etc.), moved to different parts of the configuration file (like changing the ordering of the input definitions), moved to an external file, etc. That is not an easy problem to solve, and it might not even be solvable given how many things can change in an input configuration.
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
@redbaron4 I believe I managed to reproduce the case you described in your Discuss post. There are two pre-conditions to experience the file being re-ingested as you described.
When Filebeat restarts, those two conditions cause the file to be re-ingested from the beginning. The real bug here is that when Filebeat detects the file truncation, the offset stored in the registry is not updated. There are legitimate cases of file truncation that also trigger this bug. @oshmyrko, your case seems to have the same root cause (registry not updated on truncation).
I created an issue with a step-by-step to reproduce the truncation offset-tracking bug: #38070
Setting close.on_state_change.removed: false and clean_removed: false, combined with any sort of identity reuse (inode reuse for the native file identity, or file path/name reuse for the path file identity), will trigger this issue. If the states can be removed from the registry, the file will be treated as a completely new file and Filebeat should work as expected. A sketch of that combination follows.
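For reference, here is a sketch of that combination, with a placeholder ID and path and the option names as used elsewhere in this thread; with both options set to false, the states of removed files linger in the registry, and identity reuse can then match them against unrelated new files:

filebeat.inputs:
  - type: filestream
    id: app-logs                 # placeholder ID
    paths:
      - /var/log/app/*.log
    # Keep the registry entry (and reader) around even after the file is removed.
    close.on_state_change.removed: false
    clean_removed: false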
@oshmyrko, I've been trying to reproduce what you described, but no luck so far. You posted this set of logs:
Because of the filter there, not all entries are shown, so the last offset recorded in the registry for those files is not visible in them. If we look at the two truncated files from that set of logs (I removed some fields for brevity):
{
"@timestamp": "2023-09-08T15:04:21.915Z",
"log.origin": {
"file.name": "filestream/input.go",
"file.line": 321
},
"message": "Reader was closed. Closing.",
"source_file": "filestream::app-log::native::520698-66305",
"path": "/home/ubuntu/my-app/logs/my-svc-json.log.1",
"state-id": "native::520698-66305"
}
{
"@timestamp": "2023-09-08T15:03:43.672Z",
"log.origin": {
"file.name": "filestream/input.go",
"file.line": 321
},
"message": "Reader was closed. Closing.",
"source_file": "filestream::app-log::native::525441-66305",
"path": "/home/ubuntu/my-app/logs/my-svc-json.log.2",
"state-id": "native::525441-66305"
}
When Filebeat starts again we have:
{
"@timestamp": "2023-09-08T15:04:22.366Z",
"log.origin": {
"file.name": "filestream/input.go",
"file.line": 267
},
"message": "File was truncated. Reading file from offset 0. Path=/home/ubuntu/my-app/logs/my-svc-json.log.2",
"source_file": "filestream::app-log::native::520698-66305",
"path": "/home/ubuntu/my-app/logs/my-svc-json.log.2",
"state-id": "native::520698-66305"
}
{
"@timestamp": "2023-09-08T15:04:22.370Z",
"log.origin": {
"file.name": "filestream/input.go",
"file.line": 267
},
"message": "File was truncated. Reading file from offset 0. Path=/home/ubuntu/my-app/logs/my-svc-json.log.1",
"source_file": "filestream::app-log::native::525439-66305",
"path": "/home/ubuntu/my-app/logs/my-svc-json.log.1",
"state-id": "native::525439-66305"
}
Which looks normal for a file rotation strategy that moves files. Because I do not have access to Filebeat's registry nor the complete logs, I cannot say exactly what happened. If you're experiencing inode reuse, then you might be falling into one of those cases. On a normal operation and file rotation, Filebeat will remove the state once the file is removed, as the debug logs below show:
{"log.level":"debug","@timestamp":"2024-02-21T12:38:55.580+0100","log.logger":"file_watcher","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileWatcher).watch","file.name":"filestream/fswatch.go","file.line":120},"message":"Start next scan","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:38:55.580+0100","log.logger":"file_watcher","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileWatcher).watch","file.name":"filestream/fswatch.go","file.line":224},"message":"File scan complete","service.name":"filebeat","total":4,"written":1,"truncated":0,"renamed":0,"removed":0,"created":0,"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:38:55.580+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).onFSEvent","file.name":"filestream/prospector.go","file.line":187},"message":"File /tmp/logs/foo.log has been updated","service.name":"filebeat","id":"filestream-input-id","prospector":"file_prospector","operation":"write","source_name":"native::85729-34","os_id":"85729-34","new_path":"/tmp/logs/foo.log","old_path":"/tmp/logs/foo.log","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:38:55.580+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.(*defaultHarvesterGroup).Start","file.name":"input-logfile/harvester.go","file.line":139},"message":"Starting harvester for file","service.name":"filebeat","id":"filestream-input-id","source_file":"filestream::filestream-input-id::native::85729-34","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:38:55.581+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1","file.name":"input-logfile/harvester.go","file.line":205},"message":"Stopped harvester for file","service.name":"filebeat","id":"filestream-input-id","source_file":"filestream::filestream-input-id::native::85729-34","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:39:00.582+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*logFile).shouldBeClosed","file.name":"filestream/filestream.go","file.line":215},"message":"close.on_state_change.removed is enabled and file /tmp/logs/foo.log has been removed","service.name":"filebeat","id":"filestream-input-id","source_file":"filestream::filestream-input-id::native::85589-34","path":"/tmp/logs/foo.log","state-id":"native::85589-34","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:39:05.581+0100","log.logger":"file_watcher","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileWatcher).watch","file.name":"filestream/fswatch.go","file.line":120},"message":"Start next scan","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:39:05.581+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).onFSEvent","file.name":"filestream/prospector.go","file.line":215},"message":"File /tmp/logs/foo.log has been renamed to /tmp/logs/foo.log.1","service.name":"filebeat","id":"filestream-input-id","prospector":"file_prospector","operation":"rename","source_name":"native::85729-34","os_id":"85729-34","new_path":"/tmp/logs/foo.log.1","old_path":"/tmp/logs/foo.log","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:39:05.581+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).onFSEvent","file.name":"filestream/prospector.go","file.line":215},"message":"File /tmp/logs/foo.log.1 has been renamed to /tmp/logs/foo.log.2","service.name":"filebeat","id":"filestream-input-id","prospector":"file_prospector","operation":"rename","source_name":"native::85674-34","os_id":"85674-34","new_path":"/tmp/logs/foo.log.2","old_path":"/tmp/logs/foo.log.1","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:39:05.581+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).onFSEvent","file.name":"filestream/prospector.go","file.line":215},"message":"File /tmp/logs/foo.log.2 has been renamed to /tmp/logs/foo.log.3","service.name":"filebeat","id":"filestream-input-id","prospector":"file_prospector","operation":"rename","source_name":"native::85625-34","os_id":"85625-34","new_path":"/tmp/logs/foo.log.3","old_path":"/tmp/logs/foo.log.2","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:39:05.581+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).onFSEvent","file.name":"filestream/prospector.go","file.line":210},"message":"File /tmp/logs/foo.log.3 has been removed","service.name":"filebeat","id":"filestream-input-id","prospector":"file_prospector","operation":"delete","source_name":"native::85589-34","os_id":"85589-34","old_path":"/tmp/logs/foo.log.3","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:39:05.581+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).onRemove","file.name":"filestream/prospector.go","file.line":241},"message":"Stopping harvester as file native::85589-34 has been removed and close.on_state_change.removed is enabled.","service.name":"filebeat","id":"filestream-input-id","prospector":"file_prospector","operation":"delete","source_name":"native::85589-34","os_id":"85589-34","old_path":"/tmp/logs/foo.log.3","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:39:05.581+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).onRemove","file.name":"filestream/prospector.go","file.line":246},"message":"Remove state for file as file removed: /tmp/logs/foo.log.3","service.name":"filebeat","id":"filestream-input-id","prospector":"file_prospector","operation":"delete","source_name":"native::85589-34","os_id":"85589-34","old_path":"/tmp/logs/foo.log.3","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-21T12:39:05.581+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*filestream).Run.func1","file.name":"filestream/input.go","file.line":151},"message":"Closing reader of filestream","service.name":"filebeat","id":"filestream-input-id","source_file":"filestream::filestream-input-id::native::85589-34","path":"/tmp/logs/foo.log","state-id":"native::85589-34","ecs.version":"1.6.0"}
To experiment with the file rotation I've been using this script:
#!/bin/bash
cd /tmp/logs
# Rotate the existing files logrotate-style: drop the oldest, shift the rest up by one.
rm foo.log.3
mv foo.log.2 foo.log.3
mv foo.log.1 foo.log.2
mv foo.log foo.log.1
for i in {1..1000000}
do
# Each log line is 50 bytes
printf "log line with some padding %10d ===========\n" $i >> /tmp/logs/foo.log
done
And my Filebeat configuration:
filebeat.inputs:
- type: filestream
id: filestream-input-id
enabled: true
paths:
- /tmp/logs/foo.log*
output:
file:
enabled: true
codec.json:
pretty: false
path: ${path.home}
filename: "output"
rotate_on_startup: false
queue.mem:
flush:
timeout: 1s
min_events: 32
filebeat.registry.flush: 1s
logging:
level: debug
selectors:
- crawler
- file_watcher
- input
- input.filestream
- input.harvester
- registrar
- scanner
- service
metrics:
enabled: false
I'm running a build from our
@belimawr, thank you for the effort you've spent trying to reproduce the issue. BTW, some of our log entries are 100-500K bytes in size.
That would be great @oshmyrko! If you could provide me with debug logs and the registry, that would be perfect. I believe the main issue you're facing is inode reuse. I cannot confirm that with the information I have, but there is evidence. If you look at this:
The newer file (well, I assume it's newer given the usual file rotation) has got a smaller inode, so inodes seem to be reused rather quickly by your system. I'm already working on #38070, so a fix should be coming up soon. If you're really facing inode reuse it will help, but it won't fix all problems.
A possible workaround is to use the fingerprint file identity:
filebeat.inputs:
- type: filestream
id: id-unique-id
enabled: true
paths:
- /tmp/logs/foo.log*
file_identity.fingerprint: ~
prospector.scanner.fingerprint:
enabled: true
WARNING: By changing the file identity, any file that is not currently ignored will be re-ingested from the beginning!
Another thing to keep in mind is that, because the identification of a file changes with its content, once a file is truncated Filebeat will log messages stating it was removed, e.g.:
{"log.level":"debug","@timestamp":"2024-02-23T15:26:20.140+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*logFile).handleEOF","file.name":"filestream/filestream.go","file.line":259},"message":"File was truncated as offset (2111) > size (0): /tmp/logs/foo.log","service.name":"filebeat","id":"id","source_file":"filestream::id::fingerprint::413983b17a792f86ea64f2119474677eae2c1389c8ed8ebc08a841edacbd2c98","path":"/tmp/logs/foo.log","state-id":"fingerprint::413983b17a792f86ea64f2119474677eae2c1389c8ed8ebc08a841edacbd2c98","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-02-23T15:26:20.140+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*filestream).readFromSource","file.name":"filestream/input.go","file.line":334},"message":"File was truncated, nothing to read. Path='/tmp/logs/foo.log'","service.name":"filebeat","id":"id","source_file":"filestream::id::fingerprint::413983b17a792f86ea64f2119474677eae2c1389c8ed8ebc08a841edacbd2c98","path":"/tmp/logs/foo.log","state-id":"fingerprint::413983b17a792f86ea64f2119474677eae2c1389c8ed8ebc08a841edacbd2c98","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-23T15:26:20.140+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1","file.name":"input-logfile/harvester.go","file.line":247},"message":"Stopped harvester for file","service.name":"filebeat","id":"id","source_file":"filestream::id::fingerprint::413983b17a792f86ea64f2119474677eae2c1389c8ed8ebc08a841edacbd2c98","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-23T15:26:20.140+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*filestream).Run.func1","file.name":"filestream/input.go","file.line":151},"message":"Closing reader of filestream","service.name":"filebeat","id":"id","source_file":"filestream::id::fingerprint::413983b17a792f86ea64f2119474677eae2c1389c8ed8ebc08a841edacbd2c98","path":"/tmp/logs/foo.log","state-id":"fingerprint::413983b17a792f86ea64f2119474677eae2c1389c8ed8ebc08a841edacbd2c98","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-23T15:26:22.116+0100","log.logger":"file_watcher","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileWatcher).watch","file.name":"filestream/fswatch.go","file.line":120},"message":"Start next scan","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-02-23T15:26:22.116+0100","log.logger":"scanner","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileScanner).GetFiles","file.name":"filestream/fswatch.go","file.line":389},"message":"cannot create a file descriptor for an ingest target \"/tmp/logs/foo.log\": filesize of \"/tmp/logs/foo.log\" is 0 bytes, expected at least 1024 bytes for fingerprinting","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-23T15:26:22.116+0100","log.logger":"file_watcher","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileWatcher).watch","file.name":"filestream/fswatch.go","file.line":224},"message":"File scan complete","service.name":"filebeat","total":0,"written":0,"truncated":0,"renamed":0,"removed":1,"created":0,"ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-23T15:26:22.116+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).onFSEvent","file.name":"filestream/prospector.go","file.line":210},"message":"File /tmp/logs/foo.log has been removed","service.name":"filebeat","id":"id","prospector":"file_prospector","operation":"delete","source_name":"fingerprint::413983b17a792f86ea64f2119474677eae2c1389c8ed8ebc08a841edacbd2c98","fingerprint":"413983b17a792f86ea64f2119474677eae2c1389c8ed8ebc08a841edacbd2c98","os_id":"50761-34","old_path":"/tmp/logs/foo.log","ecs.version":"1.6.0"} After enough content is added to calculate a new fingerprint, the file is detected as a new file, ex: {"log.level":"debug","@timestamp":"2024-02-23T15:39:22.118+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*fileProspector).onFSEvent","file.name":"filestream/prospector.go","file.line":179},"message":"A new file /tmp/logs/foo.log has been found","service.name":"filebeat","id":"id","prospector":"file_prospector","operation":"create","source_name":"fingerprint::d154c77df40cb626acaf9b2f3ea03af24a092e4eecef61c9a05ef683c34f4720","fingerprint":"d154c77df40cb626acaf9b2f3ea03af24a092e4eecef61c9a05ef683c34f4720","os_id":"50761-34","new_path":"/tmp/logs/foo.log","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-23T15:39:22.118+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.(*defaultHarvesterGroup).Start","file.name":"input-logfile/harvester.go","file.line":139},"message":"Starting harvester for file","service.name":"filebeat","id":"id","source_file":"filestream::id::fingerprint::d154c77df40cb626acaf9b2f3ea03af24a092e4eecef61c9a05ef683c34f4720","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-23T15:39:22.118+0100","log.logger":"input.filestream","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream.(*filestream).open","file.name":"filestream/input.go","file.line":185},"message":"newLogFileReader with config.MaxBytes:10485760","service.name":"filebeat","id":"id","source_file":"filestream::id::fingerprint::d154c77df40cb626acaf9b2f3ea03af24a092e4eecef61c9a05ef683c34f4720","path":"/tmp/logs/foo.log","state-id":"fingerprint::d154c77df40cb626acaf9b2f3ea03af24a092e4eecef61c9a05ef683c34f4720","ecs.version":"1.6.0"} |
Hi @belimawr, thank you for these details. I hope your fix solves this issue.
We have the same issue with Filebeat 8.15.0. Here is the config excerpt:
Here is the logfile:
Just in case, the free space:
So every time Filebeat is restarted or, more annoyingly, it gets reconnected to Logstash, it resends the whole file. What other data should I collect in order to help fix this issue?
Hi @rlevytskyi, to take a look at this I'd need debug logs, Filebeat's registry, and your full Filebeat configuration (please redact any sensitive information like output credentials, IPs, etc.). For the debug logs, to control their verbosity, here is a good logging configuration:
logging:
level: debug
selectors:
- crawler
- file_watcher
- input
- input.filestream
- input.harvester
- registrar
- scanner
- service
If you can consistently reproduce this issue, a step-by-step would be very helpful as well. Does any restart at any point make Filebeat re-ingest the logs, or is there also a time component, like "after Filebeat has been running for 1h, logs are re-ingested if it is restarted"?
Thank you for your prompt reply @belimawr! I've applied the debug settings you suggested. As far as I remember, a few months ago we had the same issue with one of our instances, and it behaved the same way: logs were repeated ~20 times over several days, then the repetition stopped by itself. So for now I see no other way than to close the issue. Have a nice day!
Thanks for the update @rlevytskyi. If you face this issue again, please first try using the fingerprint file identity I suggested in #36541 (comment) to rule out issues with inode reuse. If the issue still persists after migrating to the fingerprint file identity, don't hesitate to post here/contact me again.
Filebeat filestream resends whole log files after restart, but only in cases where several log files were rotated.
Not all log files are necessarily resent; often it resends files with the second or third index.
Details
Filebeat versions: 8.6.1 and 8.9.1
OS: Ubuntu 22.04.3 LTS
In our case, we use logback with FixedWindowRollingPolicy which renames files during rotation.
Configurations
/etc/filebeat/filebeat.yml:
/etc/filebeat/inputs.d/app-log.yml:
Logs
Log files before Filebeat restart:
Log files after Filebeat restart:
Filebeat logs, pay attention to "File was truncated. Reading file from offset 0" for my-svc-json.log.1-2:
After a few min, readers for files with no activity were closed:
One more restart: