Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NR-228258 Logging too many files warning msg #1823

Conversation

Sivakumar3695
Copy link
Contributor

Description:

When there are too many files watched by the FB process, the OS will stop the process. To help users understand what went wrong, a warning log message will be produced which the user can take a look to understand the number of files targeted by each file paths provided for log forwarding.

With the changes in this PR, now, the Infra-Agent (while starting up) will compute the amount of target files for each file path config. If the total target files exceed the recommended maximum, the warning message will be printed in the logs.

@Sivakumar3695 Sivakumar3695 requested a review from a team as a code owner February 29, 2024 09:28
Copy link
Contributor

@jsubirat jsubirat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work @Sivakumar3695 ! I left you some comments for you to consider. 🙇

}
files, err := filepath.Glob(l.File)
if err != nil {
cfgLogger.WithField("filePath", l.File).Error("Error while reading files under the given file path")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This will also tell us if any of the filepaths is haveing any permission issue. Would it be possible to include the err in the logged message as well? What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have included the error in the logger as below:
cfgLogger.WithField("filePath", l.File).WithError(err).Error("Error while reading files under the given file path")

pkg/integrations/v4/logs/cfg.go Outdated Show resolved Hide resolved
pkg/integrations/v4/logs/cfg.go Outdated Show resolved Hide resolved
pkg/integrations/v4/logs/cfg_template.go Outdated Show resolved Hide resolved
pkg/integrations/v4/logs/cfg_template.go Outdated Show resolved Hide resolved
pkg/integrations/v4/logs/cfg_template.go Outdated Show resolved Hide resolved
name string
logCfg LogCfg
expectedCount int
}{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add test cases with:

  • One that contains a glob of an existing folder but with no matches (should produce a 0 in the output message). i.e. ./*_test.starwars
  • One that contains a glob of a non-existing folder (should throw an error due to not having access to that folder or not existing): ./starwars/*_test.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a test case for the first one -> with no matches, will produce 0 target files

For the second test case suggested, the current code consumes the error inside the method itself. Hence, that cannot be tested in the unit testing. Instead, I have added a unit test case to check if 0 is returned as Target files count. I hope this is fine. Please feel free to improve or provide alternative solution.

pkg/integrations/v4/logs/cfg_test.go Outdated Show resolved Hide resolved
pkg/integrations/v4/logs/cfg_test.go Outdated Show resolved Hide resolved
@Sivakumar3695
Copy link
Contributor Author

Please take a look at the updated warning message
Screenshot 2024-03-04 at 5 37 21 PM

Also, note the error message while reading the non-existing folder and non-matching file path.

Thank you!

@Sivakumar3695 Sivakumar3695 force-pushed the NR-228258-Logging-too-many-files-warning-msg branch from 12e51a3 to 949d800 Compare March 4, 2024 13:06
Copy link
Contributor

@jsubirat jsubirat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks a lot Siva! I have made up my mind again regarding the displayed message, feel free to ignore the suggestion if you think it's already alright.

Comment on lines 178 to 179
Please note that this is a friendly warning message. If your operating system allows more than {{ .DefaultFileLimit }} file descriptors/inotify watchers
or if you already increased their maximum amount by following the above link, you can safely ignore this message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Please note that this is a friendly warning message. If your operating system allows more than {{ .DefaultFileLimit }} file descriptors/inotify watchers
or if you already increased their maximum amount by following the above link, you can safely ignore this message.
Please note that this is a friendly warning message. You can safely ignore this message if your operating system allows more than {{ .DefaultFileLimit }} file descriptors/inotify watchers
or if you have already increased their maximum amount by following the above link.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. This would be much more opt than the previous message. I agree with you on this one and have update the message accordingly. Nice suggestion @jsubirat

@Sivakumar3695
Copy link
Contributor Author

Here is the test screenshot with the latest message

image

Thank you!

Copy link
Contributor

@rogercoll rogercoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM, but I have a minor concern regarding the amount of lines this warning can generate. Actually, the infastructure-agent does not have any multiline log entry and the format can be changed to "json" instead of plain text. Could it happen that the logging directory has many subdirectories? If so, the warning text would be quite long. Is it required to show detailed information of each directory? Could we move the details of each subdirectory to trace level?

@rogercoll
Copy link
Contributor

@jsubirat @Sivakumar3695 Currently, the Fluent-bit infrastructure agent's supervisor auto removes temporal configuration files if there are more than 50: https://github.com/newrelic/infrastructure-agent/blob/master/pkg/integrations/v4/supervisor_fb.go#L32

Should the fb supervisor auto remove this files too?

@rogercoll
Copy link
Contributor

Related to the generated warning message, would it be correctly parsed by fluent-bit itself? Would it take into account the multiline text? Have you considered pointing to public documentation URL or local file path instead?

@Sivakumar3695
Copy link
Contributor Author

Hi @rogercoll
Thanks for your feedback. Here are my thoughts on your feedback comments:

The code LGTM, but I have a minor concern regarding the amount of lines this warning can generate. Actually, the infastructure-agent does not have any multiline log entry and the format can be changed to "json" instead of plain text. Could it happen that the logging directory has many subdirectories? If so, the warning text would be quite long. Is it required to show detailed information of each directory? Could we move the details of each subdirectory to trace level?

I will try to work on this and will come up with an alternative way to convey the same message. Moving to Json? Idk, probably, I need to check to confirm this. Currently, we print the generated fluent-bit config file in debug and I will look if the same can be done here as well.

@Sivakumar3695
Copy link
Contributor Author

Currently, the Fluent-bit infrastructure agent's supervisor auto removes temporal configuration files if there are more than 50: https://github.com/newrelic/infrastructure-agent/blob/master/pkg/integrations/v4/supervisor_fb.go#L32. Should the fb supervisor auto remove this files too?

No. These are log files of the users that they want to ship the log data to newrelic. This is not something that can't be removed by the Infra-agent. We provide the warning message to inform the user that "Too many files are opened for log forwarding" and they need to increase the number of files that can be opened by a process in an OS.

@Sivakumar3695
Copy link
Contributor Author

Related to the generated warning message, would it be correctly parsed by fluent-bit itself? Would it take into account the multiline text? Have you considered pointing to public documentation URL or local file path instead?

I will check once on whether the multi-line warning message can be correctly processed by fluent-bit. For the public doc link suggested by you, in fact, we provided a public documentation URL in the warning message itself. However, the detailed warning message is only to help users / a GTSE engineer to understand the root cause of the issue by giving insights into the folders containing too many files.

@Sivakumar3695
Copy link
Contributor Author

@jsubirat anything that you would like to add here?

@Sivakumar3695
Copy link
Contributor Author

@rogercoll Would it be fine to move the warning message to a log field as below?

cfgLogger.WithField("tooManyFilesWarning", warningMessage).Warn(
"The amount of open files targeted by your Log Forwarding configuration " +
"files exceeds the recommended maximum.")

This way, the multi-line warning message in discussion would be wrapped inside a log field similar to how we log fluent-bit configuration file here.

@jsubirat We can reduce the log level for this to Trace. WDYT?

@Sivakumar3695 Sivakumar3695 force-pushed the NR-228258-Logging-too-many-files-warning-msg branch from e16ddd4 to 197dca1 Compare March 13, 2024 11:42
@Sivakumar3695 Sivakumar3695 force-pushed the NR-228258-Logging-too-many-files-warning-msg branch from 197dca1 to 66c55d6 Compare March 13, 2024 12:28
@Sivakumar3695
Copy link
Contributor Author

Hi @rogercoll.
After our discussion, I have updated the warning message as below:

  1. Now, we have a non-multiline warning log message with log level -> Warn
Screenshot 2024-03-13 at 5 48 22 PM
  1. We moved the detailed information of each directory to individual log message in Trace level
Screenshot 2024-03-13 at 5 48 11 PM

Copy link
Contributor

@rogercoll rogercoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Sivakumar3695 ! New code changes look good to me, just added a small comment

pkg/integrations/v4/logs/cfg.go Outdated Show resolved Hide resolved
@Sivakumar3695 Sivakumar3695 force-pushed the NR-228258-Logging-too-many-files-warning-msg branch from ede93ce to e09b06a Compare March 18, 2024 04:36
Copy link
Contributor

@rogercoll rogercoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rogercoll rogercoll merged commit 745d8a4 into newrelic:master Mar 18, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants