Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gadgettracermanager: Make gRPC buffer size configurable. #2181

Merged
merged 1 commit into from
Nov 30, 2023

Conversation

eiffel-fl
Copy link
Member

@eiffel-fl eiffel-fl commented Oct 26, 2023

This commit adds flags and environment values to set gRPC buffer size rather than having it hardcoded.
The default value size is set to 16384 to improve horizontal scaling.

Copy link
Member

@flyth flyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having this number configurable is a really good thing. At which point did it lead to issues? Can you please attach some data/benchmarks around it?

Also, this is about the buffer of events (in number of events, not number of bytes of event data) that we're holding before marshaling them, so it's got nothing to do at that stage with gRPC, yet (gRPC uses buffers on its own, so let's not confuse by calling this gRPC buffer (naming again, I know... hehe) - even though it eventually will get sent over gRPC). So I think eventBufferSize or eventBufferLen would be a better name.

Just fyi, sending looks something like this:

Event -> __Event Buffer__ -> Marshaling -> gRPC -> TCP send buffer -> wire

The buffer is mainly used to throw away events if the client is slow in reading the events (and thus the tcp write on the server is blocking), and it especially handles bursts at the start of a gadget run - it also kicks in if marshaling is too slow to keep up (if the CPU is under heavy load for example). It would be interesting to know if increasing the TCP write buffer would also help.

cmd/ig/daemon.go Outdated Show resolved Hide resolved
pkg/gadget-service/service.go Outdated Show resolved Hide resolved
@eiffel-fl
Copy link
Member Author

eiffel-fl commented Oct 27, 2023

Hi.

I think having this number configurable is a really good thing. At which point did it lead to issues? Can you please attach some data/benchmarks around it?

As already discussed, everything is available here:
inspektor-gadget/inspektor-gadget-scaling#2
The first set of results is with #2146, the second with 811a541.

Also, this is about the buffer of events (in number of events, not number of bytes of event data) that we're holding before marshaling them, so it's got nothing to do at that stage with gRPC, yet (gRPC uses buffers on its own, so let's not confuse by calling this gRPC buffer (naming again, I know... hehe) - even though it eventually will get sent over gRPC). So I think eventBufferSize or eventBufferLen would be a better name.

Just fyi, sending looks something like this:

Event -> __Event Buffer__ -> Marshaling -> gRPC -> TCP send buffer -> wire

The whole problem here is that since the refactoring we do not have any document and/or figures which depicts the current architecture regarding of events are sent.
All our current figures are no more up to date.

The buffer is mainly used to throw away events if the client is slow in reading the events (and thus the tcp write on the server is blocking), and it especially handles bursts at the start of a gadget run - it also kicks in if marshaling is too slow to keep up (if the CPU is under heavy load for example). It would be interesting to know if increasing the TCP write buffer would also help.

I still have a 5% difference between how many events where generated by stress-ng and what is reported by Inspektor Gadget, this could be due to the TCP buffer size.
In any case, having hardcoded values for buffer is clearly something wrong and it will clearly lead to troubles as we clearly not are in a situation of "one size fits all".

Best regards.

This commit adds flags and environment values to set events buffer length rather
than having it hardcoded.
The default value size is set to 16384 to improve horizontal scaling.

Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
Copy link
Member

@burak-ok burak-ok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side it looks good.

@eiffel-fl
Copy link
Member Author

Thank you for the review!

@eiffel-fl eiffel-fl merged commit 6ad3f44 into main Nov 30, 2023
50 checks passed
@eiffel-fl eiffel-fl deleted the francis/grpc-buffers branch November 30, 2023 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants