Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Co-routine stack size check is failure prone #3716

Closed
ericonr opened this issue Jul 3, 2021 · 12 comments · Fixed by #4434
Closed

Co-routine stack size check is failure prone #3716

ericonr opened this issue Jul 3, 2021 · 12 comments · Fixed by #4434
Assignees
Labels

Comments

@ericonr
Copy link

ericonr commented Jul 3, 2021

Bug Report

Describe the bug

As seen in #2464, launching fluent-bit makes it spit out this error message:

Error: Invalid coroutine stack size. Aborting

This happens due to

if (config->coro_stack_size < getpagesize()) {
and
#define FLB_CORO_STACK_SIZE ((3 * PTHREAD_STACK_MIN) / 2)

musl defines PTHREAD_STACK_MIN to 2048, which means that the result of PTHREAD_STACK_MIN * 3 / 2 is 3072, which is definitely smaller than what getpagesize() returns (4096 on most platforms). I'm not sure what the best solution here is, either don't use getpagesize as the basis for the check, or use the maximum between getpagesize and PTHREAD_STACK_MIN * 3 / 2.

This also fails on platforms that support 64k pages (aarch64 and ppc64), even when using glibc. Most aarch64 machines won't be configured that way, but to my knowledge all ppc64 distros besides Void Linux use 64k pages.

To Reproduce

Simply try to launch fluent-bit when built against musl libc.

Expected behavior

It will at least launch.

Your Environment

  • Version used: built from master (305590f)
  • Operating System and version: any musl based distro
@bgdnlp
Copy link

bgdnlp commented Jul 15, 2021

The same thing happens on FreeBSD. Running just fluent-bit immediately complains that

Fluent Bit v1.7.9
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

Error: Invalid coroutine stack size. Aborting

fluent-bit --help says:

  -s, --coro_stack_size   set coroutines stack size in bytes (default: 3072)

Would it be acceptable to simply set coro_stack_size to getpagesize() if the default is smaller and it isn't otherwise configured?

@github-actions
Copy link
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Aug 15, 2021
@ericonr
Copy link
Author

ericonr commented Aug 15, 2021

I would ask of the repo maintainers to remove the auto stale bot. I have nothing to do with upstream not answering this issue yet. If the bot pings the issue again I will just close it and give up on the bug report.

@github-actions github-actions bot removed the Stale label Aug 16, 2021
@girgen
Copy link
Contributor

girgen commented Sep 2, 2021

It seems stupid that the binary just fails to run unless you set the coro_stack_size?

I maintain the FreeBSD port, and I'm patching the source to double the stack size, in wait for a better fix.

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Sep 2, 2021
There is a known problem with the default coro_stack_size being too
small. Double it to avoid failing to start. [1]

Since the daemon option seems to be somewhat funky [2], use daemon to
wrap the fluent-bit binary when running from the rc.d script.

[1] fluent/fluent-bit#3716
PR:	255593 [2]
@github-actions
Copy link
Contributor

github-actions bot commented Oct 5, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 5, 2021
@bgdnlp
Copy link

bgdnlp commented Oct 5, 2021

Could someone please have a look at this issue, maybe it's not controversial and can be fixed easily?

@github-actions github-actions bot removed the Stale label Oct 6, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Nov 6, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Nov 6, 2021
@bgdnlp
Copy link

bgdnlp commented Nov 6, 2021

Aaaand, bump

@github-actions github-actions bot removed the Stale label Nov 7, 2021
@github-actions
Copy link
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Dec 10, 2021
@krystiannowak
Copy link

another bump

@edsiper
Copy link
Member

edsiper commented Dec 10, 2021

note: we are adjusting the auto-stale bot, quite noisy.

assigning @nokute78 to review this.

nokute78 added a commit to nokute78/fluent-bit that referenced this issue Dec 11, 2021
)

On some environment, PTHREAD_STACK_MIN is less than pagesize.
The coro stack size is less than page size and it causes aborting coro stack size error.
This patch is to ensure minimum coro stack size is greater equal pagesize.

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
@nokute78
Copy link
Collaborator

Would it be acceptable to simply set coro_stack_size to getpagesize() if the default is smaller and it isn't otherwise configured?

I agree.

I sent a patch #4434 to change the default size if it is smaller than page size.
User can overwrite via config/option.

nokute78 added a commit to nokute78/fluent-bit that referenced this issue Dec 12, 2021
)

On some environment, PTHREAD_STACK_MIN is less than pagesize.
The coro stack size is less than page size and it causes aborting coro stack size error.
This patch is to ensure minimum coro stack size is greater equal pagesize.

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
nokute78 added a commit to nokute78/fluent-bit that referenced this issue Dec 12, 2021
)

On some environment, PTHREAD_STACK_MIN is less than pagesize.
The coro stack size is less than page size and it causes aborting coro stack size error.
This patch is to ensure minimum coro stack size is greater equal pagesize.

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
edsiper pushed a commit that referenced this issue Dec 13, 2021
…4434)

On some environment, PTHREAD_STACK_MIN is less than pagesize.
The coro stack size is less than page size and it causes aborting coro stack size error.
This patch is to ensure minimum coro stack size is greater equal pagesize.

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
0Delta pushed a commit to 0Delta/fluent-bit that referenced this issue Jan 20, 2022
) (fluent#4434)

On some environment, PTHREAD_STACK_MIN is less than pagesize.
The coro stack size is less than page size and it causes aborting coro stack size error.
This patch is to ensure minimum coro stack size is greater equal pagesize.

Signed-off-by: Takahiro Yamashita <nokute78@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants