From 211c19148d20158d35d951de4c04cffcd5145553 Mon Sep 17 00:00:00 2001
From: Tiago Queiroz
Date: Mon, 29 Apr 2024 20:24:21 +0200
Subject: [PATCH] Document harvester_limit for Filestream input and fix typo (#39244)

This commit documents `harvester_limit` for the filestream input and
replaces `close_*` by the correct key `close.on_state_change.*`.

(cherry picked from commit 59421bb12602eab337cee0fe6e689262cba89763)
---
 .../input-filestream-file-options.asciidoc    | 24 +++++++++++++++++++
 .../docs/inputs/input-filestream.asciidoc     |  5 ++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/filebeat/docs/inputs/input-filestream-file-options.asciidoc b/filebeat/docs/inputs/input-filestream-file-options.asciidoc
index a3be665e28e..ceb701c8e2f 100644
--- a/filebeat/docs/inputs/input-filestream-file-options.asciidoc
+++ b/filebeat/docs/inputs/input-filestream-file-options.asciidoc
@@ -516,6 +516,30 @@ less than or equal to `prospector.scanner.check_interval`
 If `backoff.max` needs to be higher, it is recommended to close the file handler
 instead and let {beatname_uc} pick up the file again.
 
+[float]
+[id="{beatname_lc}-input-{type}-harvester-limit"]
+===== `harvester_limit`
+
+The `harvester_limit` option limits the number of harvesters that are started in
+parallel for one input. This directly relates to the maximum number of file
+handlers that are opened. The default for `harvester_limit` is 0, which means
+there is no limit. This configuration is useful if the number of files to be
+harvested exceeds the open file handler limit of the operating system.
+
+Setting a limit on the number of harvesters means that potentially not all files
+are opened in parallel. Therefore we recommend that you use this option in
+combination with the `close.on_state_change.*` options to make sure
+harvesters are stopped more often so that new files can be picked up.
+
+Currently, if a new harvester can be started again, the harvester is picked
+randomly. This means it's possible that the harvester for a file that was just
+closed and then updated again might be started instead of the harvester for a
+file that hasn't been harvested for a longer period of time.
+
+This configuration option applies per input. You can use this option to
+indirectly set higher priorities on certain inputs by assigning a higher
+limit of harvesters.
+
 [float]
 ===== `file_identity`
 
diff --git a/filebeat/docs/inputs/input-filestream.asciidoc b/filebeat/docs/inputs/input-filestream.asciidoc
index 47d1b24a8e8..54283d6cce7 100644
--- a/filebeat/docs/inputs/input-filestream.asciidoc
+++ b/filebeat/docs/inputs/input-filestream.asciidoc
@@ -11,8 +11,9 @@ Use the `filestream` input to read lines from active log files. It is the
 new, improved alternative to the `log` input. It comes with various improvements
 to the existing input:
 
-1. Checking of `close_*` options happens out of band. Thus, if an output is blocked,
-{beatname_uc} can close the reader and avoid keeping too many files open.
+1. Checking of `close.on_state_change.*` options happens out of
+band. Thus, if an output is blocked, {beatname_uc} can close the
+reader and avoid keeping too many files open.
 
 2. Detailed metrics are available for all files that match the `paths` configuration
 regardless of the `harvester_limit`. This way, you can keep track of all files,