
Bump gRPC max tx/rx to 100MB for ingester and distributor #149

Open · wants to merge 2 commits into base: main
Conversation

amckinley
Contributor

This just changes the send and receive limits to match what was configured in the query-frontend (and changes the definition of 100MB from `100 << 20` to `1024 * 1024 * 100`). My logs are full of `ResourceExhausted desc = trying to send message larger than max`, and I'm guessing it was just an oversight that the ingester and distributor didn't get their default limits bumped to match.
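For what it's worth, the two spellings denote exactly the same byte count, so the second part of the change is purely cosmetic. A quick standalone check (plain Go, independent of the Cortex codebase):

```go
package main

import "fmt"

func main() {
	const shifted = 100 << 20             // 100 MiB expressed as a bit shift
	const multiplied = 1024 * 1024 * 100  // the same value, spelled out

	// Both evaluate to 104857600 bytes.
	fmt.Println(shifted, multiplied, shifted == multiplied)
}
```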

@amckinley amckinley requested a review from a team as a code owner July 29, 2020 00:40
@pracucci
Collaborator

Thanks @amckinley for opening this PR and raising the discussion around the gRPC message size limit. The current config is not an oversight; we're aware of cases where the limit can be hit. Let me take a step back.

Cortex internally uses gRPC to communicate between its services. Different services use gRPC to transfer different types of data; for some communication we use gRPC streaming (which suffers less from this issue) and for others we don't.

In your setup you can increase the limits as a quick workaround, but I don't think it's wise to raise all the limits to 100MB by default. Instead, we should understand which channel hits the limit, and why. If this happens between the ingester and querier when running the blocks storage, then it's a known issue we want to work on (cortexproject/cortex#2945), so my suggestion would be to override it in your setup but not change the default here. If it happens anywhere else, please let us know where so we can investigate further.
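For reference, a per-deployment override along these lines is what's being suggested (a YAML sketch; the field names assume the `server` config block Cortex embeds from weaveworks/common, so check the configuration reference for your Cortex version):

```yaml
server:
  # Raise the server-side gRPC message limits for this deployment only,
  # instead of changing the project-wide defaults.
  grpc_server_max_recv_msg_size: 104857600  # 100 MiB
  grpc_server_max_send_msg_size: 104857600  # 100 MiB
```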

@amckinley
Contributor Author

Hi @pracucci, what is the purpose of these limits? It doesn't look like Cortex is capable of "chunking" any of the data it returns, so hitting these limits just causes hard failures. In my deployment, I've been forced to keep increasing these limits every time I find a new Grafana dashboard that refuses to render because of the max gRPC message size. Most recently we hit `grpc: trying to send message larger than max (219597294 vs. 104857600)` when trying to render a dashboard that parameterizes on k8s namespace, in a cluster where we have ~1000 unique namespaces. Wouldn't it be better to leave these limits uncapped everywhere?
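To put the numbers from that error message in perspective: the rejected response was more than double the 100 MiB limit, so no single bump of the default would have been a durable fix. A standalone check (plain Go):

```go
package main

import "fmt"

func main() {
	const limit = 104857600     // 100 MiB, the configured max message size
	const attempted = 219597294 // rejected response size, from the error above

	// The response is roughly 2.09x the configured limit.
	fmt.Printf("attempted/limit = %.2fx\n", float64(attempted)/float64(limit))
}
```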

Base automatically changed from master to main March 3, 2021 14:44

Labels: none yet
Projects: none yet
3 participants