LOG-1213: Metric for inbound log data loss at the collector #2070
I suggest metric names:
- fluentd_input_status_total_bytes_logged
- fluentd_input_status_total_bytes_collected
This fits the pattern of other fluentd metrics. The "on_disk" part seems redundant.
The code is hard to review because so much of it is copied from fluentd. Can you subclass the original plugin classes and just override what you need instead of copying the code?
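A minimal sketch of the subclassing approach suggested above. `BaseTailInput` and its methods are hypothetical stand-ins, not fluentd's real `in_tail` classes; the point is only that a subclass can override one hook and call `super` instead of copying the whole file.

```ruby
# Hypothetical stand-in for an upstream plugin class (NOT fluentd's real API).
class BaseTailInput
  attr_reader :emitted

  def initialize
    @emitted = []
  end

  # Upstream behaviour that we want to keep unchanged.
  def receive_lines(lines)
    @emitted.concat(lines)
  end
end

# The subclass overrides only what it needs and delegates the rest.
class CountingTailInput < BaseTailInput
  attr_reader :bytes_collected

  def initialize
    super
    @bytes_collected = 0
  end

  def receive_lines(lines)
    # Count the bytes of each line, then hand off to the original logic.
    @bytes_collected += lines.sum(&:bytesize)
    super
  end
end

input = CountingTailInput.new
input.receive_lines(["hello\n", "world\n"])
puts input.bytes_collected
```

With this shape, a review diff would show only the counting logic rather than a full copy of the upstream file.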
Force-pushed from b74b907 to cdf1430.
Some simple clean-up comments inline.
We will need a test to prove that this works before we merge it.
I think it would be sufficient to test it on a stand-alone fluentd with a very simple dummy configuration, and use the Go HTTP client to scrape the metrics and validate them.
Check out tests/functional and check with @jcantrill on using his functional framework, it's designed to deploy a stand-alone fluentd without setting up the whole logging system.
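A rough sketch of the validation step described above. The review suggests the Go HTTP client; this sketch uses Ruby to match the plugin code, and it parses a hard-coded sample rather than scraping a live fluentd. The metric name is the one proposed earlier in this review, while the `path` label and the sample value are made up for illustration.

```ruby
# Parse Prometheus text-format samples into {metric name => [[labels, value], ...]}.
# Simplified: ignores timestamps and escaped characters in label values,
# and assumes plain numeric sample values.
def parse_metrics(text)
  samples = Hash.new { |h, k| h[k] = [] }
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#") # skip HELP/TYPE comment lines
    m = line.match(/\A([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/)
    next unless m
    labels = (m[2] || "").scan(/(\w+)="([^"]*)"/).to_h
    samples[m[1]] << [labels, Float(m[3])]
  end
  samples
end

# Made-up scrape output standing in for the response body of the metrics endpoint.
SAMPLE = <<~METRICS
  # TYPE fluentd_input_status_total_bytes_collected counter
  fluentd_input_status_total_bytes_collected{path="/var/log/test.log"} 2048
METRICS

metrics = parse_metrics(SAMPLE)
labels, value = metrics.fetch("fluentd_input_status_total_bytes_collected").first
raise "no bytes collected" unless value > 0
puts "#{labels["path"]}: #{value}"
```

In a real functional test the sample text would be replaced by the body fetched from the stand-alone fluentd, and the assertion would compare the value against the number of bytes the test wrote to the log file.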
/hold
Force-pushed from 464437e to fda80db.
get_gauge_or_counter is not working as expected and throws errors, so I have left the total_bytes_collected metric as a gauge (get_gauge) for now. I need some help there. I tested the functionality by checking on the Prometheus dashboard that total_bytes_collected is published correctly, and it works fine in my stand-alone test environment.
Force-pushed from fda80db to 9ed90c9.
Almost there!
Force-pushed from 3abf323 to 5ba119c.
A couple of points to clean up, otherwise LGTM.
/approve
Force-pushed from c94553a to 658b5ff.
…b and in_tail/in_tail.rb in_tail/write_watcher.rb keeping them in /origin-aggregated-logging/fluentd/lib/
Force-pushed from 658b5ff to c1b2be4.
Final names for the metrics:
@@ -0,0 +1,120 @@
require 'fluent/plugin/input'
This file should either be removed entirely, appear as a patch to vendored_src, or be a new file in vendored_src, since you can't patch it.
@jcantrill I would like to know where we can keep this file under version control as the single source of truth for all the changes used to create the patch over the vendored_src baseline implementation of this plugin.
We should be able to add it as a net new patch to the lib. Additionally, there are examples where we copy a file wholesale into the vendor dir before building. This may be a valid choice for now.
fluentd/lib/in_tail/in_tail.rb
@@ -0,0 +1,1048 @@
#
This file should either be removed entirely, appear as a patch to vendored_src, or be a new file in vendored_src, since you can't patch it.
…es_total metric in fluentd/lib/fluent-plugin-prometheus/in_prometheus_tail_monitor.rb plugin
…ace podname containername labels
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alanconway, pmoogi-redhat
New changes are detected. LGTM label has been removed.
@alanconway @jcantrill kindly review: added Kubernetes regex-based filename parsing.
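The filename parsing mentioned above might look roughly like the sketch below. It assumes the conventional Kubernetes container log path layout, `/var/log/containers/<pod>_<namespace>_<container>-<64-hex-id>.log`; the regex, the constant name, and the helper name are illustrative and are not taken from this PR's code.

```ruby
# Assumed layout: /var/log/containers/<pod>_<namespace>_<container>-<64-hex-id>.log
K8S_LOG_PATH = %r{
  \A/var/log/containers/
  (?<podname>[^_/]+)_
  (?<namespace>[^_/]+)_
  (?<containername>.+)-
  (?<container_id>[0-9a-f]{64})
  \.log\z
}x

# Returns a hash of labels, or nil when the path does not match the assumed layout.
def labels_from_path(path)
  m = K8S_LOG_PATH.match(path)
  return nil unless m
  { namespace: m[:namespace], podname: m[:podname], containername: m[:containername] }
end

path = "/var/log/containers/web-0_default_nginx-#{"a" * 64}.log"
p labels_from_path(path)
```

Returning nil for non-matching paths lets the caller decide whether to skip the labels or fall back to the raw path.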
@jcantrill @pmoogi-redhat I think we can refactor this code into a single, fully independent plug-in belonging to us, with no patches to fluentd. That will solve the labelling problems.
Force-pushed from 69e1909 to 1357a78.
@pmoogi-redhat: The following tests failed, say
A separate PR has been raised to enable this new metric as a new plugin: new_plugin_Metric-for-inbound-log-data-loss-at-the-collector-JIRA-ticket-LOG-1032. Hence closing this PR, which contains the in_tail and fluent-plugin-prometheus changes.
Description
Currently the in_tail plugin doesn't support publishing inbound log loss, i.e. the difference between the total bytes written to disk (the log file) and the total bytes collected or read by fluentd. This PR changes fluentd/lib/fluent/plugin/in_tail.rb and fluent-plugin-prometheus/lib/fluent/plugin/in_prometheus_tail_monitor.rb to enable publishing of the parameters below.
/cc @alanconway @jcantrill
/assign @alanconway
/cherry-pick
Links
Depending on PR(s):
Bugzilla:
Github issue:
JIRA: https://issues.redhat.com/browse/LOG-1213
Enhancement proposal: many rotations are missed by fluentd. The next enhancement proposal is to let fluentd know about all actual rotations done by the CRI-O/conmon process by reading extra metadata. Based on which rotations fluentd tracked and which it missed, log loss is computed as [number of missed rotations * maxsizeoflogfile] + [sum over all rotations tracked by fluentd of (totalbytes_logged_in_disk - totalbytes_collected_from_disk)].
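The proposed computation can be sketched as a small function. The function and parameter names are illustrative, and the sketch assumes, as the formula does, that every missed rotation lost a full file of the maximum size, so the result is an upper-bound estimate.

```ruby
# Estimate inbound log loss in bytes, following the formula in the proposal.
# All names here are illustrative, not identifiers from the PR.
#   missed_rotations  - number of rotations fluentd never saw
#   max_log_file_size - maximum size of one log file, in bytes
#   tracked           - array of [bytes_logged, bytes_collected] pairs,
#                       one pair per rotation that fluentd did track
def estimated_log_loss(missed_rotations, max_log_file_size, tracked)
  missed = missed_rotations * max_log_file_size
  partial = tracked.sum { |logged, collected| logged - collected }
  missed + partial
end

# Two missed rotations of a 10,000-byte file, plus 500 bytes missed in one tracked rotation.
puts estimated_log_loss(2, 10_000, [[10_000, 9_500], [4_000, 4_000]])
```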