Filestream input duplicating events after every restart #30061
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
Yes, I'll jump on this right now.
There is a workaround for this issue. You have to configure an ID for each input, so the input can find the appropriate state in the registry.
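As a rough illustration of this workaround, a configuration with two filestream inputs might look like the sketch below. The `id` values and the paths are made up for the example and are not taken from the reporter's setup; the only point is that each input carries its own distinct `id`.

```yaml
filebeat.inputs:
  # Hypothetical input 1: the id and path are placeholders.
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log

  # Hypothetical input 2: a different id, so its registry state
  # is kept separate from the first input's state.
  - type: filestream
    id: system-logs
    paths:
      - /var/log/system/*.log
```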
Should we require that each input have an ID if it doesn't work without it?
I do not have a strong opinion on this. I am afraid that if we just require the ID, we will never fix this bug.
Ah, if the root cause of the bug is unrelated to the missing IDs, then we should investigate that first before applying a permanent workaround.
Any news on this issue? This is a major problem when using elastic-agent, as we are receiving a ton of duplicate events every time we update a policy that triggers a restart of the integrated Filebeat.
@rroemer, didn't #30061 (comment) do the trick?
@jlind23 Unfortunately, I checked it a few days ago and Agent does not bubble down the configured ID to the filestream input.
I would like to better understand why we require an ID. I assume this is related to the way the filestream input writes its state. With the log input we had inode + device ID as the identifier. I remember we also added the filename as an option. What is used for filestream? How does the state of a single file map to an input ID if the input specifies a file pattern?
The TLDR is:
Because all inputs share the same ID ([…]) […] The effect of the "remove from registry" is to set the offset to […]. On this comment I added some details of what is happening; let me know if it's not clear enough.
I still have the same problem using just one filestream input. Even if I set the ID, I get duplicates on each restart.
@dschweinbenz can you start a thread on https://discuss.elastic.co/c/elastic-stack/beats/ about this? We'd need your Filebeat version, full configuration file, and logs if you can provide them. We'll open a new issue if we confirm it is a new bug.
Setting clean_removed: false seems to help in our case. In our setup, the log file is overwritten with each new data set, and Filebeat seems to have trouble recognizing this. Our apps write files to a different location, and afterwards each file is moved to overwrite the file that Filebeat scans. With type: log everything worked great, but with filestream Filebeat produces duplicates on each restart. The registry folder is mounted as a volume so that it is persistent.
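For anyone wanting to try the same setting, a minimal sketch of where it would sit in a filestream input is shown below. The `id` and the path are placeholders invented for this example; only `clean_removed: false` comes from the comment above, and whether it helps likely depends on how the files are replaced.

```yaml
filebeat.inputs:
  - type: filestream
    # Hypothetical id and path, for illustration only.
    id: overwritten-dataset
    paths:
      - /data/current.log
    # Keep the registry entry even when the file is no longer found on disk.
    clean_removed: false
```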
I just migrated from log inputs to filestream inputs, and spent half a day trying to figure out why all my logs were resent every time I restarted Filebeat. I don't think logging a warning when a filestream has no ID is sufficient. If the service behaves wildly incorrectly when an ID is not specified, then an ID should be required. That means when an ID is not present, refuse to initialize the filestream.
Generally we agree with this, but this would be a breaking change that could result in data loss (filestream would not be running) instead of data duplication. We haven't made this change for that reason; the trade-off might be acceptable to you, but not to all users. We definitely regret getting ourselves into this situation, and are still thinking about ways to address it in a better way.
Under some circumstances the filestream input reprocesses all events after every restart.
For example the following configuration works fine in a Filebeat running on Kubernetes (static input, no autodiscover):
But if we add a second input (actually from the same disk), then Filebeat resends everything after every restart:
I've tried to apply file_identity.inode_marker.path: /var/log/.filebeat-marker but the result is the same, and with a single input everything works as expected.
The inodes of the files do not change after every restart. I don't know the volume UUID because it's not reported by lsblk (checked from the Filebeat container itself).
Doc reference: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#filestream-file-identity