Add more metrics to Filebeat harvesters #13395

Merged
merged 10 commits into from
Jan 9, 2020

Conversation

kvch
Contributor

@kvch kvch commented Aug 29, 2019

I have added a few new metrics to harvesters of Filebeat:

  • "last_event_timestamp": @timestamp of the last published event
  • "last_event_published_time": the time when the last event was published and the offset was updated
  • "size": file size in bytes
  • "read_offset": offset of the file
  • "start_time": harvester start time

By reporting these metrics it is possible to create more complex checks in Kibana to help to diagnose issues in harvesters.

"harvester": {
    "files": {
        "0be8e828-e39f-42ba-8468-029a08451a37": {
            "last_event_timestamp": "2019-08-29T13:17:22.961Z",
            "last_event_published_time": "2019-08-29T13:17:22.961Z",
            "name": "/var/log/dpkg.log",
            "read_offset": 42417,
            "size": 42417,
            "start_time": "2019-08-29T13:17:19.908Z"
        },
        "6950906f-ef19-4d99-aff9-ca81b49e024f": {
            "last_event_timestamp": "",
            "last_event_published_time": "",
            "name": "/var/log/pm-powersave.log",
            "start_time": "2019-08-29T13:17:22.907Z"
        },
        "752bb2c3-2a61-4055-b3c6-5b8b87204f2b": {
            "last_event_timestamp": "2019-08-29T13:17:22.905Z",
            "last_event_published_time": "2019-08-29T13:17:22.905Z",
            "name": "/var/log/vbox-setup.log",
            "read_offset": 140,
            "size": 140,
            "start_time": "2019-08-29T13:17:19.908Z"
        },
        "c339d330-cfa6-41ae-8a77-83a019ca99ab": {
            "last_event_timestamp": "2019-08-29T13:17:22.924Z",
            "last_event_published_time": "2019-08-29T13:17:22.925Z",
            "name": "/var/log/fontconfig.log",
            "read_offset": 2269,
            "size": 2269,
            "start_time": "2019-08-29T13:17:22.915Z"
        },
        "edb8b270-904a-40dc-bb3b-e6ef278585a2": {
            "last_event_timestamp": "2019-08-29T13:17:22.915Z",
            "last_event_published_time": "2019-08-29T13:17:22.915Z",
            "name": "/var/log/alternatives.log",
            "read_offset": 5752,
            "size": 5752,
            "start_time": "2019-08-29T13:17:22.913Z"
        },
        "f63a4922-164b-47bb-9edd-1b446685090c": {
            "last_event_timestamp": "",
            "last_event_published_time": "",
            "name": "/var/log/pm-suspend.log",
            "start_time": "2019-08-29T13:17:22.906Z"
        }
    },
    "open_files": 6,
    "running": 6,
    "started": 6
}

Closes #7743

@kvch kvch added the in progress, review, and Filebeat labels Aug 29, 2019
@kvch kvch requested review from ph and urso August 29, 2019 11:30
@kvch kvch added the [zube]: In Review label and removed the in progress label Nov 22, 2019
@kvch kvch force-pushed the feature-filebeat-additional-input-metrics branch from 0765b6a to 5e3a410 Compare November 22, 2019 11:24
@urso

urso commented Nov 23, 2019

@ycombinator @cachedout It would be nice if we could display this information in the stack monitoring UI. sum(size) - sum(read_offset) gives you an idea of how much data is still to be shipped. If we display this, we should also display the number of active harvesters.
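
The sum(size) - sum(read_offset) idea can be sketched as follows. This is only an illustration under assumptions: the type and function names are hypothetical, the input is the JSON value of the `"harvester"` key from the PR description, and the byte counts in `main` are made up. Fields missing from an entry decode to zero, so files without `size` and `read_offset` add nothing to either sum.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileMetrics mirrors the per-file fields from the PR description.
// Fields absent from the JSON decode to their zero values.
type fileMetrics struct {
	Name       string `json:"name"`
	Size       int64  `json:"size"`
	ReadOffset int64  `json:"read_offset"`
}

// harvesterSnapshot is a hypothetical type for the "harvester" object.
type harvesterSnapshot struct {
	Files   map[string]fileMetrics `json:"files"`
	Running int                    `json:"running"`
}

// parseSnapshot decodes the JSON value of the "harvester" key.
func parseSnapshot(raw []byte) (harvesterSnapshot, error) {
	var s harvesterSnapshot
	err := json.Unmarshal(raw, &s)
	return s, err
}

// backlogBytes returns sum(size) - sum(read_offset) over all reported files.
func backlogBytes(s harvesterSnapshot) int64 {
	var size, offset int64
	for _, f := range s.Files {
		size += f.Size
		offset += f.ReadOffset
	}
	return size - offset
}

func main() {
	// Made-up sample values, shaped like the output in the PR description.
	raw := []byte(`{
		"files": {
			"a": {"name": "/var/log/dpkg.log", "size": 42417, "read_offset": 40000},
			"b": {"name": "/var/log/pm-powersave.log"}
		},
		"running": 2
	}`)
	s, err := parseSnapshot(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("active harvesters: %d, bytes still to ship: %d\n", s.Running, backlogBytes(s))
}
```

Because only open files are reported, the result covers data the running harvesters have not read yet and says nothing about files that are not being harvested.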

@kvch why do some entries have no size and read_offset? If we know the file needs to be processed soonish, we should set size to the actual file size and read_offset to 0.

How about adding a status telling us whether the file is currently open or not? In the future we could add more info like removed or renamed (assuming we have a separate file watcher).

@cachedout
Contributor

Thanks for doing this, @kvch, and thanks for tagging us, @urso.

From the @elastic/stack-monitoring side, there's an important caveat here: right now we display the same set of metrics regardless of which type of Beat is being monitored. So we have a bit of foundational work to do before we can display this kind of data. I have broadly outlined the work here:

elastic/kibana#51573

@kvch
Contributor Author

kvch commented Nov 25, 2019

@urso size and read_offset are zero, and zero-valued metrics are not logged due to our monitoring implementation. So the metrics are available in the data structure, but they do not show up in the logs. AFAIK all metrics, including zeros, are included in monitoring events.
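
To illustrate why size and read_offset are missing from the logged output, here is a hypothetical sketch of a log writer that skips zero-valued counters. It is not the libbeat implementation; the function name `dropZeros` and the flat map layout are assumptions made for the example.

```go
package main

import "fmt"

// dropZeros returns a copy of snapshot without zero-valued counters.
// Hypothetical illustration only; this is not how libbeat is written.
func dropZeros(snapshot map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(snapshot))
	for k, v := range snapshot {
		if v != 0 {
			out[k] = v
		}
	}
	return out
}

func main() {
	// A harvester that has not read anything yet: both counters exist but are zero.
	idle := map[string]int64{"size": 0, "read_offset": 0}
	// A harvester that has read a whole 140-byte file.
	busy := map[string]int64{"size": 140, "read_offset": 140}

	fmt.Println(len(dropZeros(idle)), len(dropZeros(busy)))
}
```

A consumer of the logged metrics can therefore treat an absent size or read_offset as zero, which is what a typed JSON decoder does by default.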

In this PR only open files are reported. It would indeed be interesting to show other file states, but that should be done at the input level, not in the harvester.

@kvch kvch force-pushed the feature-filebeat-additional-input-metrics branch from fe3c0a1 to 60f97c0 Compare November 26, 2019 08:21
@urso

urso commented Nov 26, 2019

> But that should be done on the input level, not in the harvester.

Agreed. I think the other file states are important, though. Otherwise the lag stats are wrong and confusing when we compare the sums of size and read_offset.

@kvch
Contributor Author

kvch commented Nov 26, 2019

@urso Should I add that functionality to this PR?

@urso

urso commented Nov 26, 2019

If it's not too complicated, yes, please.

@kvch kvch self-assigned this Nov 26, 2019
@kvch kvch force-pushed the feature-filebeat-additional-input-metrics branch from 60f97c0 to 7a376b0 Compare November 28, 2019 21:39
filebeat/input/log/harvester.go (outdated, resolved)
filebeat/harvester/registry.go (outdated, resolved)
@kvch
Contributor Author

kvch commented Nov 28, 2019

Current format for reporting file states other than running:

{"filebeat":
  {"log":
    {"463bc38-6737-4938-9371-9afef4efb3c0":
      {"harvesters":
        {"/home/n/go/src/github":
          {"com/elastic/beats/filebeat/test":
             {"log":"file was removed"}}}}}}

I need to revisit the format, because it is not human-readable: filenames are split along dots...
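
The splitting can be reproduced with a small sketch. It assumes that the registry treats every dot in a key as a nesting separator, which matches the output above; `putDotted` is a hypothetical helper rather than Filebeat code, and the path in `main` is invented because the one in the output above is truncated.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// putDotted stores value under key, treating every "." in key as a nesting
// separator. Hypothetical helper that mimics the behaviour described above.
func putDotted(root map[string]interface{}, key string, value interface{}) {
	parts := strings.Split(key, ".")
	node := root
	for _, p := range parts[:len(parts)-1] {
		child, ok := node[p].(map[string]interface{})
		if !ok {
			child = map[string]interface{}{}
			node[p] = child
		}
		node = child
	}
	node[parts[len(parts)-1]] = value
}

func main() {
	root := map[string]interface{}{}
	// Invented path: the dots in "example.d" and "app.log" become nesting levels.
	putDotted(root, "harvesters./tmp/example.d/app.log", "file was removed")

	out, err := json.Marshal(root)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Any fix has to keep dots in the file name from being read as separators, for example by storing the path as a value (like the `name` field in the harvester metrics) instead of as a key.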

@kvch kvch force-pushed the feature-filebeat-additional-input-metrics branch from fa2da63 to 3f2e91f Compare November 29, 2019 13:36
@kvch
Contributor Author

kvch commented Nov 29, 2019

As you can see from my comment above, I started working on reporting file states. However, I stopped because the change was growing a lot in complexity. I would rather add this feature in a follow-up PR.

This PR already provides better visibility into the harvesters' souls.

@kvch
Contributor Author

kvch commented Nov 29, 2019

I opened a follow-up issue: #14860

@kvch kvch removed their assignment Nov 29, 2019
@kvch kvch force-pushed the feature-filebeat-additional-input-metrics branch from d450d49 to 5f33d27 Compare January 6, 2020 16:15
@kvch
Contributor Author

kvch commented Jan 6, 2020

Updated and rebased the PR.

@kvch kvch requested a review from urso January 6, 2020 16:15
@urso

urso commented Jan 7, 2020

Jenkins, test this.

@kvch
Contributor Author

kvch commented Jan 8, 2020

jenkins test this

@kvch
Contributor Author

kvch commented Jan 9, 2020

Failing tests are unrelated.

@kvch kvch merged commit 33a638b into elastic:master Jan 9, 2020
@kvch kvch added the needs_backport label Jan 9, 2020
kvch added a commit to kvch/beats that referenced this pull request Jan 9, 2020
(cherry picked from commit 33a638b)
@kvch kvch added the v7.6.0 label and removed the needs_backport label Jan 9, 2020
kvch added a commit that referenced this pull request Jan 9, 2020
(cherry picked from commit 33a638b)
@kovyrin
Contributor

kovyrin commented Jan 9, 2020

This is great! Thank you for doing it. We can finally build ingestion lag monitoring for our logging pipeline!

Labels
Filebeat, review, Team:Integrations, v7.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Filebeat inputs metrics visibility to help identify back pressure
6 participants