Need to retune % of data going to unvetted nodes #6011

Closed
thepaul opened this issue Jul 1, 2023 · 12 comments

@thepaul
Contributor

thepaul commented Jul 1, 2023

Description

Currently, node selection draws 5% of the nodes it picks from the pool of unvetted nodes. This was meant to limit how much data went to unvetted nodes. However, (somewhat) recent changes have made it so that nodes can be vetted significantly faster than before, leaving the pool of unvetted nodes much smaller. This, in turn, leads to each unvetted node getting a larger share of data than it would have gotten when we first tuned the 5% value.

There are some pretty dramatic bandwidth screenshots illustrating this effect at https://forum.storj.io/t/unvetted-vetted-node-traffic/23086. These might not be reflective of the average node experience, though; a small number of unvetted nodes sharing a /24 network with many vetted nodes would see this bandwidth effect multiplied, and that may be what's happening there.

Possible Fix

We should lower the 5% parameter. How much we should lower it is not totally clear to me after researching the historical fraction of unvetted nodes, but we should probably lower it to be at most 2%.

Distinct last_nets with unvetted nodes currently make up about 2.8% of all distinct last_nets, so we probably want the new value lower than that, at least.

The parameter is overlay.node.new-node-fraction in satellite config.
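
To make the tuning argument concrete, here is a rough sketch of how the per-node share shifts with the parameter. It assumes a simplified model that is not taken from the satellite code: a fraction `f` of selections is spread evenly over the unvetted pool and the rest evenly over the vetted pool. The function name `perNodeShareRatio` is hypothetical; only the 2.8% figure and the candidate fractions come from this issue.

```go
package main

import "fmt"

// perNodeShareRatio returns how much upload traffic one unvetted entry receives
// relative to one vetted entry, under a simplified model (an assumption of this
// sketch): a fraction f of selections is spread evenly over the unvetted pool
// and the remaining 1-f evenly over the vetted pool.
//
// unvettedShare is the unvetted pool's share of all selectable networks.
func perNodeShareRatio(f, unvettedShare float64) float64 {
	perUnvetted := f / unvettedShare
	perVetted := (1 - f) / (1 - unvettedShare)
	return perUnvetted / perVetted
}

func main() {
	// About 2.8% of distinct last_nets hold unvetted nodes, per the issue text.
	const unvettedShare = 0.028
	for _, f := range []float64{0.05, 0.02, 0.01} {
		fmt.Printf("new-node-fraction=%.2f -> unvetted/vetted ratio %.2f\n",
			f, perNodeShareRatio(f, unvettedShare))
	}
}
```

Under these assumptions the ratio is above 1 at 5% (unvetted entries get more than vetted ones) and drops below 1 at 2% and at 1%, which matches the reasoning that the new value should sit below 2.8%.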

@thepaul thepaul added the Bug Something isn't working label Jul 1, 2023
@thepaul thepaul self-assigned this Jul 1, 2023
@storjrobot

This issue has been mentioned on Storj Community Forum (official). There might be relevant details there:

https://forum.storj.io/t/unvetted-vetted-node-traffic/23086/5

@thepaul
Contributor Author

thepaul commented Jul 1, 2023

Thinking more about the multiplicative effect where unvetted nodes share a last_net with vetted nodes. The last_net for the unvetted nodes effectively gets counted as a separate network from the last_net of the vetted nodes, which is why it can make such a difference.

We might need to do a bigger refactor of node selection code in order to mitigate that effect, so it doesn't become an incentive to keep lots of nodes unvetted.
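
The multiplier described above can be illustrated with a small sketch. It assumes, purely for illustration, that each pool receives a fixed fraction of selections, that selection is uniform over the distinct last_nets in that pool, and uniform over the nodes of that pool on the chosen last_net. The function `nodeShare` and all the numbers in `main` are hypothetical and are not measurements from the network.

```go
package main

import "fmt"

// nodeShare returns the expected fraction of selections landing on one node,
// assuming (for this sketch only) that its pool receives poolFraction of all
// selections, that selection is uniform over the pool's distinct last_nets,
// and uniform over the nodes of that pool sharing the chosen last_net.
func nodeShare(poolFraction float64, distinctNets, nodesOnNet int) float64 {
	return poolFraction / float64(distinctNets) / float64(nodesOnNet)
}

func main() {
	// Hypothetical numbers: 972 vetted and 28 unvetted last_nets, and one /24
	// that holds 10 vetted nodes plus 1 unvetted node.
	const f = 0.05
	vetted := nodeShare(1-f, 972, 10) // shares its last_net with 9 other vetted nodes
	unvetted := nodeShare(f, 28, 1)   // alone on the same last_net in the unvetted pool
	fmt.Printf("unvetted node gets %.1fx the share of each vetted neighbor\n", unvetted/vetted)
}
```

With these made-up numbers the lone unvetted node ends up with roughly 18 times the share of each vetted node on the same /24, because it does not split its last_net entry with its neighbors.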

@ReneSmeekes
Contributor

Suggestion: Create a satellite setting to set the % of normal node ingress an unvetted node should get. Calculate the number of pieces to go to unvetted nodes by: (nUnvettedNodes / nTotalNodes) * unvettedTraffic% * nPiecesToSelect
This won't usually be a whole number. In fact, in the future this will likely be below 1. To resolve that, round that number down and generate a random number between 0 and 1. If that number is below the remainder, add 1 to the rounded down number of unvetted nodes to select.
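
A minimal sketch of the rounding rule suggested above, written in Go. The function name `unvettedToSelect`, its parameters, and the sample values are assumptions for illustration; this is not existing satellite code. The random draw is passed in so the rule can be checked deterministically, and `unvettedTraffic` is expressed as a fraction (0.5 meaning 50%).

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// unvettedToSelect computes
//   (nUnvetted / nTotal) * unvettedTraffic * nPieces
// rounds it down, and adds 1 when draw falls below the fractional remainder.
// draw must be a uniform random number in [0, 1).
func unvettedToSelect(nUnvetted, nTotal int, unvettedTraffic float64, nPieces int, draw float64) int {
	if nTotal <= 0 {
		return 0
	}
	expected := float64(nUnvetted) / float64(nTotal) * unvettedTraffic * float64(nPieces)
	whole := math.Floor(expected)
	n := int(whole)
	if draw < expected-whole {
		n++ // happens with probability equal to the remainder
	}
	return n
}

func main() {
	// Hypothetical values: 500 of 20000 nodes unvetted, unvetted nodes should
	// get 50% of normal ingress, 110 pieces selected per upload.
	n := unvettedToSelect(500, 20000, 0.5, 110, rand.Float64())
	fmt.Println("unvetted nodes to select:", n)
}
```

With these sample values the expected count is 1.375, so the function returns 1 most of the time and 2 when the draw is below the remainder; averaged over many uploads the count converges on the unrounded value.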

@storjrobot

This issue has been mentioned on Storj Community Forum (official). There might be relevant details there:

https://forum.storj.io/t/outrageous-upload-from-some-nodes/23937/4

@thepaul
Contributor Author

thepaul commented Dec 4, 2023

We're currently evaluating whether we still even want to guarantee a percentage of data to new nodes, and what changes to the filter DSL would need to be made to avoid the traffic-multiplier effect.

Putting this back in Todo until we resolve those questions.

@shaupt131

@shaupt131 to add this issue to arch review agenda to have a broader discussion and come to some decisions.

@shaupt131

Decided during arch review to move forward with 1%.

@storjrobot

This issue has been mentioned on Storj Community Forum (official). There might be relevant details there:

https://forum.storj.io/t/it-looks-like-a-ssd-is-much-faster-full-and-vetted-than-an-hdd/24818/11

@storj-gerrit

storj-gerrit bot commented Jan 11, 2024

Change satellite/overlay: change % of data going to new nodes from 5 to 1 mentions this issue.

@storj-gerrit

storj-gerrit bot commented Jan 18, 2024

Change satellite/overlay: change % of data going to new nodes from 5 to 1 mentions this issue.

storjBuildBot pushed a commit that referenced this issue Jan 22, 2024
This parameter is marked deprecated, but as far as I can tell it is
still the place to make the change we want, until this concern is
translated to placement definitions.

Refs: #6011
Change-Id: Iafa7d58e58429dd961d8cd1405fb258ddc398e08
ihaid pushed a commit that referenced this issue Jan 30, 2024
This parameter is marked deprecated, but as far as I can tell it is
still the place to make the change we want, until this concern is
translated to placement definitions.

Refs: #6011
Change-Id: Iafa7d58e58429dd961d8cd1405fb258ddc398e08
@iglesiasbrandon
Contributor

moving to deployed.

@storjrobot

This issue has been mentioned on Storj Community Forum (official). There might be relevant details there:

https://forum.storj.io/t/no-audit-traffic-and-no-repair-traffic/25425/43

Projects
Status: Done/Deployed