Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow 0 as Retry_Limit to disable retrying #3219

Merged
merged 5 commits into from Apr 17, 2021
Merged

Conversation

nokute78
Copy link
Collaborator

Fixes #3190

This PR allows Retry_Limit 0 and it disables retrying.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • Documentation required for this feature

Example Configuration

[INPUT]
    Name dummy
    Samples 1

[OUTPUT]
    Name es
    Retry_Limit 0

Debug output

Stop Elasticsearch before testing.

$ ../bin/fluent-bit -c a.conf 
Fluent Bit v1.8.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/03/13 12:52:41] [ info] [engine] started (pid=100875)
[2021/03/13 12:52:41] [ info] [storage] version=1.1.1, initializing...
[2021/03/13 12:52:41] [ info] [storage] in-memory
[2021/03/13 12:52:41] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/03/13 12:52:41] [ info] [sp] stream processor started
[2021/03/13 12:52:46] [error] [src/flb_http_client.c:1156 errno=32] Broken pipe
[2021/03/13 12:52:46] [ warn] [output:es:es.0] http_do=-1 URI=/_bulk
[2021/03/13 12:52:46] [ info] [engine] chunk '100875-1615607562.413281173.flb' is not retried (no retry config): task_id=0, input=dummy.0 > output=es.0 (out_id=0)
^C[2021/03/13 12:53:17] [engine] caught signal (SIGINT)
[2021/03/13 12:53:17] [ warn] [engine] service will stop in 5 seconds
[2021/03/13 12:53:21] [ info] [engine] service stopped

Valgrind output

$ valgrind ../bin/fluent-bit -c a.conf 
==100878== Memcheck, a memory error detector
==100878== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==100878== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==100878== Command: ../bin/fluent-bit -c a.conf
==100878== 
Fluent Bit v1.8.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/03/13 12:53:48] [ info] [engine] started (pid=100878)
[2021/03/13 12:53:48] [ info] [storage] version=1.1.1, initializing...
[2021/03/13 12:53:48] [ info] [storage] in-memory
[2021/03/13 12:53:48] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/03/13 12:53:48] [ info] [sp] stream processor started
==100878== Warning: client switching stacks?  SP change: 0x57e48b8 --> 0x4c6bd20
==100878==          to suppress, use: --max-stackframe=12028824 or greater
==100878== Warning: client switching stacks?  SP change: 0x4c6bc98 --> 0x57e48b8
==100878==          to suppress, use: --max-stackframe=12028960 or greater
==100878== Warning: client switching stacks?  SP change: 0x57e48b8 --> 0x4c6bc98
==100878==          to suppress, use: --max-stackframe=12028960 or greater
==100878==          further instances of this message will not be shown.
[2021/03/13 12:53:52] [error] [src/flb_http_client.c:1156 errno=32] Broken pipe
[2021/03/13 12:53:52] [ warn] [output:es:es.0] http_do=-1 URI=/_bulk
[2021/03/13 12:53:52] [ info] [engine] chunk '100878-1615607628.446813184.flb' is not retried (no retry config): task_id=0, input=dummy.0 > output=es.0 (out_id=0)
^C[2021/03/13 12:54:32] [engine] caught signal (SIGINT)
[2021/03/13 12:54:32] [ warn] [engine] service will stop in 5 seconds
[2021/03/13 12:54:36] [ info] [engine] service stopped
==100878== 
==100878== HEAP SUMMARY:
==100878==     in use at exit: 0 bytes in 0 blocks
==100878==   total heap usage: 534 allocs, 534 frees, 1,652,774 bytes allocated
==100878== 
==100878== All heap blocks were freed -- no leaks are possible
==100878== 
==100878== For lists of detected and suppressed errors, rerun with: -s
==100878== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
src/flb_output.c Outdated
Comment on lines 611 to 613
else {
ins->retry_limit = 0;
ins->retry_limit = FLB_OUT_RETRY_NONE;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What user input leads to this else case being run? I'm trying to follow the logic and I don't get it..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can set via command line. -p "Retry_Limit="
e.g. $ ../bin/fluent-bit -i dummy -o es -p "Retry_Limit=" --sosreport

Without this patch, if we set retry_limit=0, fluent-bit will retry again and again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(By the way, I think this value should be 1 (=default value) at this else case.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(By the way, I think this value should be 1 (=default value) at this else case.)

I agree

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.
I've pushed a commit to set default value 1.

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
@PettitWesley
Copy link
Contributor

@edsiper Can we merge this?

@PettitWesley
Copy link
Contributor

@nokute78 Can you cherry-pick these commits onto the 1.7 branch so that we can get this change released in the next 1.7 patch release?

@nokute78
Copy link
Collaborator Author

@PettitWesley No I can't.
I've never released fluent-bit..

@PettitWesley
Copy link
Contributor

@nokute78 Yes you can! Lol I just meant open a pull request against the 1.7 branch. I can merge it for you. I can't do releases either but if we merge it then it will go out in the next release.

@nokute78
Copy link
Collaborator Author

@PettitWesley Sure.
I open new pull request #3261 for 1.7 branch.

@edsiper
Copy link
Member

edsiper commented Mar 20, 2021

@nokute78

can we use some words instead of numbers as the suggested values ?, e.g:

  • the current context per configuration variable name is retry_limit, which aims to specify "what's the limit of retries ?"

if we set "0" from a reading perspective might mean "there is no limit", so I propose the following values and behaviors:

value description / behavior
1-9 specify the maximum limit of retries that can be done. The value is any integer value greater than zero.
no_retries do not attempt to perform any retry
no_limits or false retry forever until it succeeds

my concern is the user readability of the expected behavior more than the technical implementation.

@PettitWesley comments?

@nokute78
Copy link
Collaborator Author

nokute78 commented Mar 21, 2021

@edsiper Thank you for comment. I agree. I want to wait @PettitWesley 's comment.

Or another way is to create new option.

Fluentd supports so many configuration parameters for retrying.
It means that there are so many requests for controlling retrying.
https://docs.fluentd.org/output#control-retrying

Currently retry_limit supports functionalities retry_forever and retry_max_times of Fluentd.
To extract the meaning of retry_limit may not handle the so many use cases.

@edsiper
Copy link
Member

edsiper commented Mar 21, 2021

I would avoid adding more options if the original configuration property can handle all situations

@edsiper
Copy link
Member

edsiper commented Mar 25, 2021

this PR is on hold until the logic of the PR is changed as suggested on #3219 (comment)

@edsiper
Copy link
Member

edsiper commented Mar 25, 2021

@niedbalski can you check pls why the message check is failing?

@nokute78
Copy link
Collaborator Author

can you check pls why the message check is failing?

https://github.com/fluent/fluent-bit/blob/master/.github/workflows/check-commit.yaml#L17
This pattern doesn't allow /, so the commit message
output: define unlimited/none retry configurtion constant
is invalid.

$ irb
irb(main):001:0> str = "output: define unlimited/none retry configurtion constant"
irb(main):002:0> p str.match /^[a-z\-_]+\:[ ]{0,1}[a-z]+[a-zA-Z0-9 \-\.\:_]+$/
nil
=> nil
irb(main):003:0> p str.match /^[a-z\-_]+\:[ ]{0,1}[a-z]+[a-zA-Z0-9 \/\-\.\:_]+$/
#<MatchData "output: define unlimited/none retry configurtion constant">
=> #<MatchData "output: define unlimited/none retry configurtion constant">

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
@nokute78
Copy link
Collaborator Author

I updated this PR to follow #3219 (comment)

@nokute78
Copy link
Collaborator Author

@PettitWesley Would you comment about #3219 (comment) ?
I already pushed the patch to follow that comment.

@PettitWesley
Copy link
Contributor

@nokute78 @edsiper

if we set "0" from a reading perspective might mean "there is no limit",
I would avoid adding more options if the original configuration property can handle all situations

I agree with Eduardo's reasoning, and his proposal. I'd use no_retries to mean don't retry at all- and I would use false for no limit.

@nokute78
Copy link
Collaborator Author

nokute78 commented Apr 1, 2021

@PettitWesley Thank you for comment.

@edsiper Can we merge this ?

irb(main):001:0> str = 'output: define unlimited/none retry configurtion constan
t'
irb(main):002:0> str.match /^[a-z0-9A-Z\-_\s\,\.\/]+\:[ ]{0,1}[a-zA-Z]+[a-zA-Z0-
9 \-\.\:_\#\(\)=\/\'\"\,><\+\[\]\!\*\\]+$/
=> #<MatchData "output: define unlimited/none retry configurtion constant">

@edsiper
Copy link
Member

edsiper commented Apr 10, 2021

@nokute78 we are good to go, please just do the minor changes to pass the CI

@nokute78
Copy link
Collaborator Author

nokute78 commented Apr 10, 2021

@edsiper CI status is #3219 (comment)

There are two errors, but one is not related this PR and the other should be fixed by latest regex pattern.

====

  • publish the docker images for PR failed. I think it doesn't relate this PR by log
  • Commit message check failed, but current master pattern will not fail.
irb(main):001:0> str = 'output: define unlimited/none retry configurtion constan
t'
irb(main):002:0> str.match /^[a-z0-9A-Z\-_\s\,\.\/]+\:[ ]{0,1}[a-zA-Z]+[a-zA-Z0-
9 \-\.\:_\#\(\)=\/\'\"\,><\+\[\]\!\*\\]+$/
=> #<MatchData "output: define unlimited/none retry configurtion constant">

@nokute78
Copy link
Collaborator Author

Note: Commit message regex pattern is not updated by re-run CI job.

@nokute78
Copy link
Collaborator Author

Run integration tests for PR is failed at Wait for docker build to success.
It took 1hour, it means time out happened.

However previous step Build jakejarvis/wait-action@master was success.

@niedbalski
Is it related following issue ? It reports wrong sha issue.
fountainhead/action-wait-for-check#8

@nokute78
Copy link
Collaborator Author

Only Commit Message Check failed and code owner review required.
What should I do ?

@PettitWesley
Copy link
Contributor

Looks like the check wasn't run on your latest commits: https://github.com/fluent/fluent-bit/pull/3219/checks?check_run_id=2311012858

@nokute78
Copy link
Collaborator Author

@PettitWesley Thank you for comment.
"Check Commit Message" will not be success.
Because its regex pattern is not updated by re-running CI. The pattern is different from current master...

I tested by hand and it was success #3219 (comment)

@PettitWesley
Copy link
Contributor

@PettitWesley Thank you for comment.
"Check Commit Message" will not be success.
Because its regex pattern is not updated by re-running CI. The pattern is different from current master...

I tested by hand and it was success #3219 (comment)

I think only @edsiper or @agup006 Can fix this problem

@nokute78
Copy link
Collaborator Author

@PettitWesley I think so.

@edsiper @agup006 @niedbalski What should I do?

@agup006 agup006 merged commit a9801ab into fluent:master Apr 17, 2021
@agup006
Copy link
Member

agup006 commented Apr 17, 2021

Merged for now, @niedbalski maybe we need the Commit message check to rerun on fix?

@nokute78
Copy link
Collaborator Author

I removed draft status from document repo.
fluent/fluent-bit-docs#510

@agup006 @niedbalski Thank you.

FYI: CI was succeeded when it was merged master.
https://github.com/fluent/fluent-bit/tree/a9801ab0a9e464c3098be868b52057330019120a

@nokute78 nokute78 deleted the zero_retry branch April 17, 2021 06:26
JeffLuoo pushed a commit to JeffLuoo/fluent-bit that referenced this pull request Apr 30, 2021
* output: define unlimited/none retry configurtion constant

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* engine: support no retry configuration(fluent#3190)

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* sosreport: use FLB_OUT_RETRY_UNLIMITED define

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* output: set default value when invalid retry_limit is set

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* output: support no_retries/no_limits as retry_limits value

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
DrewZhang13 pushed a commit to DrewZhang13/fluent-bit that referenced this pull request May 3, 2021
* output: define unlimited/none retry configurtion constant

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* engine: support no retry configuration(fluent#3190)

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* sosreport: use FLB_OUT_RETRY_UNLIMITED define

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* output: set default value when invalid retry_limit is set

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* output: support no_retries/no_limits as retry_limits value

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
DrewZhang13 pushed a commit to DrewZhang13/fluent-bit that referenced this pull request May 3, 2021
* output: define unlimited/none retry configurtion constant

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* engine: support no retry configuration(fluent#3190)

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* sosreport: use FLB_OUT_RETRY_UNLIMITED define

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* output: set default value when invalid retry_limit is set

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* output: support no_retries/no_limits as retry_limits value

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
edsiper pushed a commit that referenced this pull request May 13, 2021
* output: define unlimited/none retry configurtion constant

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* engine: support no retry configuration(#3190)

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* sosreport: use FLB_OUT_RETRY_UNLIMITED define

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* output: set default value when invalid retry_limit is set

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>

* output: support no_retries/no_limits as retry_limits value

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
edsiper pushed a commit that referenced this pull request May 13, 2021
* output: define unlimited/none retry configurtion constant
* engine: support no retry configuration(#3190)
* sosreport: use FLB_OUT_RETRY_UNLIMITED define
* output: set default value when invalid retry_limit is set
* output: support no_retries/no_limits as retry_limits value

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Option to Disable retries (or allow 0 for retries)
6 participants