New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flb_utils: remove false positive error 'file not found'. #7806
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hold this until tomorrow please, it seems to me that it actually introduces an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove the flb_errno
in the fread
error code pathbecause there are scenarios where a meaningful value for errno
is not to be expected so we should probably improve that check instead of assuming that 0
means trouble. We should also add an flb_errno
call in the fopen
failure code path and a result check for that fileno
call but I think we can safely ignore the result of fclose
.
In any case, I can't really see the benefit of removing the flb_calloc
flb_errno
call because flb_calloc
doesn't do it and adding those to the client code can actually print erroneous information and cause issues (eg. flb_utils_get_machine_id
fails because RAND_bytes
failed or the flb_malloc
call in flb_utils_get_machine_id
which already calls flb_errno
so we repeat the same message).
Could you please explain the rationale behind this change so we are on the same page?
The problem is that This PR simply removes the code that reports the error message, which is reported by the |
This will definitely affect other use cases, if all you care about is silencing that error message selectively then add a flag to that function named |
The only two places that function are called have been accounted for. We really should decide where we log error messages. If we log them immediately always it can cause spurious messages, like this specific case. This one also harkens back to the "[lib] Backend failed" error message which has caused nothing but confusion. Adding the Another idea would be to restore the |
That type of behavior change has to be agreed upon so let's not make changes in a rush. Please revert that change and make it an explicit opt-in. Edit: to address your additions, reprioritizing is an option, still, it would mean either reprioritizing |
I'm not against |
Most of the times flb_errno are called are of little to no value. It reports the compiled file name where it is invoked, which is confusing for ENOENT, it has little to no context, etc...
As much as I hate to say it, that is frightfully true. I would not be surprised to hear of more than a few spacebar heater workflows based on how There are approximately 1092 instances of ~/Projects/work/flb/fluent-bit/ [master] grep -r flb_errno src plugins | wc -l
1092 About half of those are in the plugins: ~/Projects/work/flb/fluent-bit/ [pwhelan-fix-flb_utils_false_error*] grep -r flb_errno plugins | wc -l
665 That would at least somewhat validate my approach of moving the policy of reporting the error out of |
I think improving the logging paradigm is a project that we haven't yet started so I'm pretty sure your participation would be appreciated. In regards to this change, sadly the argument doesn't convince me. |
More than a few plugins already report flb_errno themselves after calling flb APIs. I can include some examples: wildcard->key = flb_strndup(kv->val, flb_sds_len(kv->val));
if (wildcard->key == NULL) {
flb_errno();
flb_free(wildcard);
return -1;
} /* flb_sds_printf allocs if the incoming sds is not at least 64 bytes */
hash_str = flb_sds_create_size(64);
if (!hash_str) {
flb_errno();
return NULL;
} label_key = flb_sds_create_len(val->via.str.ptr, val->via.str.size);
if (label_key == NULL) {
flb_errno();
return -1;
} Here is a case of an API calling an API and logging tmp = flb_realloc(state->tokens, new_size);
if (!tmp) {
flb_errno();
return -1;
} And src/flb_network.c seems to always report with At the moment it looks to me like the outlier, but I have not checked every single instead inside src to see which ones apply to the runtime or the API (which is made more difficult by the fact that the separation between them is subjective, at best). It is a toss up between functions reporting it themselves or the functions that call them reporting it, but with some precedence favoring reporting it from the API itself, but no precedence for some magical I personally do not favor adding a magical flag since it seems to me like some weird special case that is likely to amount to future technical debt. It pretty much just makes sense if there are concrete plans to actually remove it in the mid-term future. |
But you do understand that the difference here is that this is a meta operation and it's clear that you can't expect errno to be valid after the call right? because that's what I pointed out in my first message, this change is not reliable. As for these lines you quoted, the client code is wrong in half of them and the others are just very thin wrappers on top of lower level functions (with So please, let's just stop arguing about something as meaningless as this, unlike we just demote flb_errno, that flag is the only one of the presented options that doesn't swap one corner case for another and a even though a refactor of the logging approach is something that we need to do it's not something that's appropriate for this PR. |
Signed-off-by: Phillip Whelan <phil@calyptia.com>
28d3651
to
a017736
Compare
I rewrote the solution to use It only checks them with F_OK and not R_OK on purpose to not avoid situations where permissions are denied so that that situation can get logged. This solution should not introduce any corner cases or technical debt and localizes the changes to the single function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
Signed-off-by: Phillip Whelan <phil@calyptia.com>
Summary
When retrieving the machine_id for agents, and in some other rare cases, the following error can be spuriously reported:
This pushes out the error log so it is not reported when not necessary.
Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test
label to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.