config: increase coro stack size if it is less than pagesize(#3716) #4434

Merged
2 commits merged into fluent:master on Dec 13, 2021


nokute78
Collaborator

Fixes #3716

On some environments, PTHREAD_STACK_MIN is smaller than the page size.
The coro stack size then ends up smaller than the page size, which triggers the "Invalid coroutine stack size" abort at startup.
This patch ensures that the minimum coro stack size is greater than or equal to the page size.
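
The check described above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the merged code: the helper names `query_page_size` and `clamp_coro_stack_size` are hypothetical, the 4096-byte fallback is an arbitrary choice for the sketch, and `sysconf(_SC_PAGESIZE)` stands in for the `getpagesize()` call that appears in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not part of Fluent Bit): return the system page
 * size, falling back to 4096 bytes if sysconf() reports an error. */
size_t query_page_size(void)
{
    long page = sysconf(_SC_PAGESIZE);
    return page > 0 ? (size_t) page : 4096;
}

/* Hypothetical helper: raise a too-small default stack size to one page,
 * and leave any size that is already large enough untouched. */
size_t clamp_coro_stack_size(size_t requested, size_t page_size)
{
    assert(page_size > 0);
    return requested < page_size ? page_size : requested;
}
```

With a 4096-byte page, `clamp_coro_stack_size(10, 4096)` yields 4096, which matches the "10 -> 4096 bytes" adjustment shown in the debug log.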


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • [N/A] Documentation required for this feature

Debug log

I set FLB_CORO_STACK_SIZE to 10 and tested.

$ bin/fluent-bit -i cpu -o stdout
[2021/12/11 17:06:50] [ info] [config] change coro_stack_size 10 -> 4096 bytes
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/12/11 17:06:50] [ info] [engine] started (pid=36742)
[2021/12/11 17:06:50] [ info] [storage] version=1.1.5, initializing...
[2021/12/11 17:06:50] [ info] [storage] in-memory
[2021/12/11 17:06:50] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/12/11 17:06:50] [ info] [cmetrics] version=0.2.2
[2021/12/11 17:06:50] [ info] [sp] stream processor started
^C[2021/12/11 17:06:50] [engine] caught signal (SIGINT)
[2021/12/11 17:06:50] [ info] [input] pausing cpu.0
[0] cpu.0: [1639210010.226823507, {"cpu_p"=>4.000000, "user_p"=>3.000000, "system_p"=>1.000000, "cpu0.p_cpu"=>4.000000, "cpu0.p_user"=>3.000000, "cpu0.p_system"=>1.000000}]
[2021/12/11 17:06:50] [ warn] [engine] service will shutdown in max 5 seconds
[2021/12/11 17:06:51] [ info] [engine] service has stopped (0 pending tasks)

If the user explicitly sets a coro stack size that is too small (here, -s 10), Fluent Bit still aborts at startup.

$ bin/fluent-bit -i cpu -o stdout -s 10
[2021/12/11 17:07:56] [ info] [config] change coro_stack_size 10 -> 4096 bytes
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

Error: Invalid coroutine stack size. Aborting
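
The abort above suggests that an explicitly requested size is validated rather than silently raised. The following is a hedged sketch of that kind of validation, not Fluent Bit's actual option handling: `parse_coro_stack_size` and its return convention are hypothetical, the minimum is passed in by the caller (assumed here to be one page), and only plain decimal byte counts are handled.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical validator for a user-supplied stack size string.
 * Returns 0 and stores the value in *out when the size is acceptable,
 * or -1 when the caller should report an error and abort. */
int parse_coro_stack_size(const char *arg, size_t min_size, size_t *out)
{
    char *end = NULL;
    unsigned long value;

    assert(arg != NULL && out != NULL);

    /* strtoul() accepts a leading sign, so reject anything that does
     * not start with a digit before converting. */
    if (!isdigit((unsigned char) arg[0])) {
        return -1;
    }

    errno = 0;
    value = strtoul(arg, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;              /* overflow or trailing garbage */
    }
    if (value < min_size) {
        return -1;              /* smaller than the required minimum */
    }

    *out = (size_t) value;
    return 0;
}
```

Under these assumptions, the string "10" is rejected against a 4096-byte minimum, while "24576" is accepted unchanged.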

Valgrind output

$ valgrind --leak-check=full bin/fluent-bit -i cpu -o stdout 
==36747== Memcheck, a memory error detector
==36747== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==36747== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==36747== Command: bin/fluent-bit -i cpu -o stdout
==36747== 
[2021/12/11 17:08:21] [ info] [config] change coro_stack_size 10 -> 4096 bytes
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/12/11 17:08:22] [ info] [engine] started (pid=36747)
[2021/12/11 17:08:22] [ info] [storage] version=1.1.5, initializing...
[2021/12/11 17:08:22] [ info] [storage] in-memory
[2021/12/11 17:08:22] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/12/11 17:08:22] [ info] [cmetrics] version=0.2.2
[2021/12/11 17:08:22] [ info] [sp] stream processor started
^C[2021/12/11 17:08:24] [engine] caught signal (SIGINT)
[2021/12/11 17:08:24] [ info] [input] pausing cpu.0
==36747== Warning: client switching stacks?  SP change: 0x57e5958 --> 0x4c7d720
==36747==          to suppress, use: --max-stackframe=11960888 or greater
==36747== Warning: client switching stacks?  SP change: 0x4c7d6c8 --> 0x57e5958
==36747==          to suppress, use: --max-stackframe=11960976 or greater
==36747== Warning: client switching stacks?  SP change: 0x57e5958 --> 0x4c7d6c8
==36747==          to suppress, use: --max-stackframe=11960976 or greater
==36747==          further instances of this message will not be shown.
[0] cpu.0: [1639210102.238438251, {"cpu_p"=>5.000000, "user_p"=>5.000000, "system_p"=>0.000000, "cpu0.p_cpu"=>5.000000, "cpu0.p_user"=>5.000000, "cpu0.p_system"=>0.000000}]
[1] cpu.0: [1639210103.227339212, {"cpu_p"=>8.000000, "user_p"=>7.000000, "system_p"=>1.000000, "cpu0.p_cpu"=>8.000000, "cpu0.p_user"=>7.000000, "cpu0.p_system"=>1.000000}]
[2] cpu.0: [1639210104.226913246, {"cpu_p"=>2.000000, "user_p"=>2.000000, "system_p"=>0.000000, "cpu0.p_cpu"=>2.000000, "cpu0.p_user"=>2.000000, "cpu0.p_system"=>0.000000}]
[2021/12/11 17:08:24] [ warn] [engine] service will shutdown in max 5 seconds
[2021/12/11 17:08:25] [ info] [engine] service has stopped (0 pending tasks)
==36747== 
==36747== HEAP SUMMARY:
==36747==     in use at exit: 0 bytes in 0 blocks
==36747==   total heap usage: 1,003 allocs, 1,003 frees, 569,159 bytes allocated
==36747== 
==36747== All heap blocks were freed -- no leaks are possible
==36747== 
==36747== For lists of detected and suppressed errors, rerun with: -s
==36747== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

src/flb_config.c Outdated
@@ -232,6 +232,11 @@ struct flb_config *flb_config_init()

    /* Set default coroutines stack size */
    config->coro_stack_size = FLB_CORO_STACK_SIZE;
    if (config->coro_stack_size < getpagesize()) {
        flb_info("[config] change coro_stack_size %u -> %u bytes",
Member

suggestion: use slightly different wording: "changing coro_stack_size from %u to %u bytes"

Collaborator Author

Thank you.
I modified the log message.
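
For reference, the reworded message can be reproduced with a plain `snprintf` call. This is only a sketch: `format_stack_size_change` is a hypothetical helper, and the real patch passes the format string to `flb_info` instead. Since `getpagesize()` returns an `int`, a caller would convert it to `unsigned int` to match the `%u` specifier.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the reworded log message into buf.
 * Returns the snprintf() result, i.e. the number of characters the
 * full message needs, excluding the terminating NUL. */
int format_stack_size_change(char *buf, size_t len,
                             unsigned int from, unsigned int to)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "[config] changing coro_stack_size from %u to %u bytes",
                    from, to);
}
```

Calling it with 10 and 4096 produces the text "[config] changing coro_stack_size from 10 to 4096 bytes", the same wording that appears in the updated debug log.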

@edsiper added the waiting-for-user label (Waiting for more information, tests or requested changes) and removed the docs-required label on Dec 12, 2021
)

On some environments, PTHREAD_STACK_MIN is smaller than the page size.
The coro stack size then ends up smaller than the page size, which triggers the "Invalid coroutine stack size" abort at startup.
This patch ensures that the minimum coro stack size is greater than or equal to the page size.

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
@nokute78
Collaborator Author

Debug log

[2021/12/12 16:49:56] [ info] [config] changing coro_stack_size from 10 to 4096 bytes

Full log:

$ bin/fluent-bit -i cpu -o stdout
[2021/12/12 16:49:56] [ info] [config] changing coro_stack_size from 10 to 4096 bytes
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/12/12 16:49:56] [ info] [engine] started (pid=10962)
[2021/12/12 16:49:56] [ info] [storage] version=1.1.5, initializing...
[2021/12/12 16:49:56] [ info] [storage] in-memory
[2021/12/12 16:49:56] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/12/12 16:49:56] [ info] [cmetrics] version=0.2.2
[2021/12/12 16:49:56] [ info] [sp] stream processor started
[0] cpu.0: [1639295397.196740875, {"cpu_p"=>3.000000, "user_p"=>2.000000, "system_p"=>1.000000, "cpu0.p_cpu"=>3.000000, "cpu0.p_user"=>2.000000, "cpu0.p_system"=>1.000000}]
[1] cpu.0: [1639295398.196250706, {"cpu_p"=>1.000000, "user_p"=>1.000000, "system_p"=>0.000000, "cpu0.p_cpu"=>1.000000, "cpu0.p_user"=>1.000000, "cpu0.p_system"=>0.000000}]
[2] cpu.0: [1639295399.196372439, {"cpu_p"=>1.000000, "user_p"=>1.000000, "system_p"=>0.000000, "cpu0.p_cpu"=>1.000000, "cpu0.p_user"=>1.000000, "cpu0.p_system"=>0.000000}]
[3] cpu.0: [1639295400.196297001, {"cpu_p"=>8.000000, "user_p"=>8.000000, "system_p"=>0.000000, "cpu0.p_cpu"=>8.000000, "cpu0.p_user"=>8.000000, "cpu0.p_system"=>0.000000}]
^C[2021/12/12 16:50:01] [engine] caught signal (SIGINT)
[2021/12/12 16:50:01] [ info] [input] pausing cpu.0
[0] cpu.0: [1639295401.196760056, {"cpu_p"=>0.000000, "user_p"=>0.000000, "system_p"=>0.000000, "cpu0.p_cpu"=>0.000000, "cpu0.p_user"=>0.000000, "cpu0.p_system"=>0.000000}]
[2021/12/12 16:50:01] [ warn] [engine] service will shutdown in max 5 seconds
[2021/12/12 16:50:02] [ info] [engine] service has stopped (0 pending tasks)

@edsiper edsiper merged commit 614e54d into fluent:master Dec 13, 2021
0Delta pushed a commit to 0Delta/fluent-bit that referenced this pull request Jan 20, 2022
Labels
docs-required waiting-for-user Waiting for more information, tests or requested changes
Development

Successfully merging this pull request may close these issues.

Co-routine stack size check is failure prone
2 participants