A gossip spy node will OOM if zeros are shoveled at it. #8175

Closed
mvines opened this issue Feb 8, 2020 · 1 comment · Fixed by #8328

@mvines mvines commented Feb 8, 2020

STR:

  1. $ solana-gossip spy --gossip-port 1234
  2. $ dd if=/dev/zero bs=1232 > /dev/udp/127.0.0.1/1234

Within a minute or two, the spy node will be killed by the kernel.

cc: #5414

@sagar-solana sagar-solana commented Feb 11, 2020

I spent some time debugging this.

At first it looked like the calls to "verify" the incoming gossip messages were causing the recycler to blow up. After further inspection, it became clear that if the channel's consumer takes even a few nanoseconds per item, the socket receiver is able to collect items at a much higher rate than they are consumed.
This leads to the Recycler making new allocations instead of recycling old ones.

It is not a memory leak, just resource exhaustion: the channel's consumer isn't fast enough.
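
To make the failure mode concrete, here is a minimal, deterministic sketch of a pool that allocates whenever nothing is free to recycle. `ToyRecycler` and `simulate` are hypothetical names made up for illustration and are not the actual Recycler code; the 1232-byte buffer size is borrowed from the `dd` command in the STR.

```rust
use std::collections::VecDeque;

/// Toy buffer pool (NOT the real Recycler): reuses a free buffer when one
/// exists, otherwise allocates a new one and counts the allocation.
struct ToyRecycler {
    free: Vec<Vec<u8>>,
    total_allocated: usize,
}

impl ToyRecycler {
    fn new() -> Self {
        Self { free: Vec::new(), total_allocated: 0 }
    }

    fn allocate(&mut self, size: usize) -> Vec<u8> {
        match self.free.pop() {
            Some(buf) => buf,
            None => {
                self.total_allocated += 1;
                vec![0u8; size]
            }
        }
    }

    fn recycle(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }
}

/// Each tick the receiver queues `produced` buffers and the consumer
/// returns at most `consumed` of them to the pool.
/// Returns (total allocations, buffers still queued).
fn simulate(ticks: usize, produced: usize, consumed: usize) -> (usize, usize) {
    let mut pool = ToyRecycler::new();
    let mut queue: VecDeque<Vec<u8>> = VecDeque::new();
    for _ in 0..ticks {
        for _ in 0..produced {
            queue.push_back(pool.allocate(1232));
        }
        for _ in 0..consumed {
            match queue.pop_front() {
                Some(buf) => pool.recycle(buf),
                None => break,
            }
        }
    }
    (pool.total_allocated, queue.len())
}
```

With a consumer that keeps up, `simulate(100, 10, 10)` settles at 10 allocations and an empty queue. With a consumer that is only slightly slower, `simulate(100, 10, 9)` ends with 109 allocations and 100 buffers still queued, and both numbers keep growing for as long as the input does.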

Suggested fix:
In Gossip, perform a greedy receive on the receiver. If the number of items exceeds some computed limit (number of nodes * expected messages per node * expected messages per second), use stakes to drop lower-staked messages. If no stakes are known, drop items at random.

While that approach seems simple enough, it needs to be efficient, otherwise there will be no improvement. Figuring out the stakes and which ones are "lower" will require 2 passes over the incoming packets, or will need to cache some of the data, possibly increasing memory consumption (by a negligible amount).
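
A rough sketch of what the greedy receive plus stake-based shedding could look like, using only the standard library. `Packet`, `greedy_recv`, and `shed_load` are hypothetical names, a `u64` stands in for a node's pubkey, and the limit is passed in rather than computed. One deliberate simplification: when no stakes are known, this sketch keeps the earliest arrivals instead of dropping at random, to avoid pulling in an RNG.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;
use std::sync::mpsc::Receiver;

/// Hypothetical packet type: sender identity plus payload.
#[derive(Debug, Clone, PartialEq)]
struct Packet {
    from: u64, // stand-in for a node pubkey (assumption)
    data: Vec<u8>,
}

/// Greedy receive: drain everything currently queued without blocking.
fn greedy_recv(receiver: &Receiver<Packet>) -> Vec<Packet> {
    receiver.try_iter().collect()
}

/// Keep at most `limit` packets, preferring higher-staked senders.
/// Senders missing from `stakes` are treated as having zero stake.
fn shed_load(
    mut packets: Vec<Packet>,
    stakes: &HashMap<u64, u64>,
    limit: usize,
) -> Vec<Packet> {
    if packets.len() <= limit {
        return packets;
    }
    if !stakes.is_empty() {
        // Stable sort, highest stake first; ties keep arrival order.
        packets.sort_by_key(|p| Reverse(stakes.get(&p.from).copied().unwrap_or(0)));
    }
    // With no stake information this keeps the earliest arrivals
    // (a simplification of the "drop at random" fallback).
    packets.truncate(limit);
    packets
}
```

The full sort here costs O(n log n) per batch, which may or may not be acceptable given the efficiency concern above; a partial selection of the top `limit` entries would be one way to reduce that.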

@mvines mvines modified the milestones: Tofino v0.23.3, Tofino v0.23.4 Feb 12, 2020
@mvines mvines modified the milestones: Tofino v0.23.5, Tofino v0.23.6 Feb 15, 2020
@mvines mvines added this to Needs triage in TdS Potholes via automation Feb 24, 2020
@mvines mvines moved this from Needs triage to Closed in TdS Potholes Feb 24, 2020