Skip to content

Commit

Permalink
satellite/overlay: configurable meaning of last_net
Browse files Browse the repository at this point in the history
Up to now, we have been implementing the DistinctIP preference with code
in two places:

 1. On check-in, the last_net is determined by taking the /24 or /64
    (in ResolveIPAndNetwork()) and we store it with the node record.
 2. On node selection, a preference parameter defines whether to return
    results that are distinct on last_net.

It can be observed that we have never yet had the need to switch from
DistinctIP to !DistinctIP, or from !DistinctIP to DistinctIP, on the
same satellite, and we will probably never need to do so in an automated
way. It can also be observed that this arrangement makes tests more
complicated, because we often have to arrange for test nodes to have IP
addresses in different /24 networks (a particular pain on macOS).

Those two considerations, plus some pending work on the repair framework
that will make repair take last_net into consideration, motivate this
change.

With this change, in the #2 place, we will _always_ return results that
are distinct on last_net. We implement the DistinctIP preference, then,
by making the #1 place (ResolveIPAndNetwork()) more flexible. When
DistinctIP is enabled, last_net will be calculated as it was before. But
when DistinctIP is _off_, last_net can be the same as address (IP and
port). That will effectively implement !DistinctIP because every
record will have a distinct last_net already.

As a side effect, this flexibility will allow us to change the rules
about last_net construction arbitrarily. We can do tests where last_net
is set to the source IP, or to a /30 prefix, or a /16 prefix, etc., and
be able to exercise the production logic without requiring a virtual
network bridge.

This change should be safe to make without any migration code, because
all known production satellite deployments use DistinctIP, and the
associated last_net values will not change for them. They will only
change for satellites with !DistinctIP, which are mostly test
deployments that can be recreated trivially. For those satellites which
are both permanent and !DistinctIP, node selection will suddenly start
acting as though DistinctIP is enabled, until the operator runs a single
SQL update "UPDATE nodes SET last_net = last_ip_port". That can be done
either before or after deploying software with this change.

I also assert that this will not hurt performance for production
deployments. It's true that adding the distinct requirement to node
selection makes things a little slower, but the distinct requirement is
already present for all production deployments, and they will see no
change.

Refs: #5391
Change-Id: I0e7e92498c3da768df5b4d5fb213dcd2d4862924
  • Loading branch information
thepaul authored and Storj Robot committed Mar 9, 2023
1 parent 67ad792 commit 2522ff0
Show file tree
Hide file tree
Showing 14 changed files with 215 additions and 179 deletions.
1 change: 1 addition & 0 deletions private/testplanet/planet.go
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ type Config struct {
MultinodeCount int

IdentityVersion *storj.IDVersion
LastNetFunc overlay.LastNetFunc
Reconfigure Reconfigure

Name string
Expand Down
4 changes: 4 additions & 0 deletions private/testplanet/satellite.go
Original file line number Diff line number Diff line change
Expand Up @@ -533,6 +533,10 @@ func (planet *Planet) newSatellite(ctx context.Context, prefix string, index int
return nil, errs.Wrap(err)
}

if planet.config.LastNetFunc != nil {
peer.Overlay.Service.LastNetFunc = planet.config.LastNetFunc
}

err = db.Testing().TestMigrateToLatest(ctx)
if err != nil {
return nil, errs.Wrap(err)
Expand Down
4 changes: 2 additions & 2 deletions satellite/contact/endpoint.go
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ func (endpoint *Endpoint) CheckIn(ctx context.Context, req *pb.CheckInRequest) (
return nil, rpcstatus.Error(rpcstatus.FailedPrecondition, errCheckInIdentity.New("failed to add peer identity entry for ID: %v", err).Error())
}

resolvedIP, port, resolvedNetwork, err := overlay.ResolveIPAndNetwork(ctx, req.Address)
resolvedIP, port, resolvedNetwork, err := endpoint.service.overlay.ResolveIPAndNetwork(ctx, req.Address)
if err != nil {
endpoint.log.Info("failed to resolve IP from address", zap.String("node address", req.Address), zap.Stringer("Node ID", nodeID), zap.Error(err))
return nil, rpcstatus.Error(rpcstatus.InvalidArgument, errCheckInNetwork.New("failed to resolve IP from address: %s, err: %v", req.Address, err).Error())
Expand Down Expand Up @@ -205,7 +205,7 @@ func (endpoint *Endpoint) PingMe(ctx context.Context, req *pb.PingMeRequest) (_
Address: req.Address,
}

resolvedIP, _, _, err := overlay.ResolveIPAndNetwork(ctx, req.Address)
resolvedIP, _, _, err := endpoint.service.overlay.ResolveIPAndNetwork(ctx, req.Address)
if err != nil {
endpoint.log.Info("failed to resolve IP from address", zap.String("node address", req.Address), zap.Stringer("Node ID", nodeID), zap.Error(err))
return nil, rpcstatus.Error(rpcstatus.InvalidArgument, errCheckInNetwork.New("failed to resolve IP from address: %s, err: %v", req.Address, err).Error())
Expand Down
38 changes: 9 additions & 29 deletions satellite/nodeselection/uploadselection/state.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,12 +23,7 @@ type State struct {
stats Stats
// netByID returns subnet based on storj.NodeID
netByID map[storj.NodeID]string
// nonDistinct contains selectors for non-distinct selection.
nonDistinct struct {
Reputable SelectByID
New SelectByID
}
// distinct contains selectors for distinct slection.
// distinct contains selectors for distinct selection.
distinct struct {
Reputable SelectBySubnet
New SelectBySubnet
Expand All @@ -39,9 +34,6 @@ type State struct {
type Stats struct {
New int
Reputable int

NewDistinct int
ReputableDistinct int
}

// Selector defines interface for selecting nodes.
Expand All @@ -65,18 +57,12 @@ func NewState(reputableNodes, newNodes []*Node) *State {
state.netByID[node.ID] = node.LastNet
}

state.nonDistinct.Reputable = SelectByID(reputableNodes)
state.nonDistinct.New = SelectByID(newNodes)

state.distinct.Reputable = SelectBySubnetFromNodes(reputableNodes)
state.distinct.New = SelectBySubnetFromNodes(newNodes)

state.stats = Stats{
New: state.nonDistinct.New.Count(),
Reputable: state.nonDistinct.Reputable.Count(),

NewDistinct: state.distinct.New.Count(),
ReputableDistinct: state.distinct.Reputable.Count(),
New: state.distinct.New.Count(),
Reputable: state.distinct.Reputable.Count(),
}

return state
Expand All @@ -86,7 +72,6 @@ func NewState(reputableNodes, newNodes []*Node) *State {
type Request struct {
Count int
NewFraction float64
Distinct bool
ExcludedIDs []storj.NodeID
Placement storj.PlacementConstraint
ExcludedCountryCodes []string
Expand Down Expand Up @@ -119,19 +104,14 @@ func (state *State) Select(ctx context.Context, request Request) (_ []*Node, err

criteria.Placement = request.Placement

if request.Distinct {
criteria.AutoExcludeSubnets = make(map[string]struct{})
for _, id := range request.ExcludedIDs {
if net, ok := state.netByID[id]; ok {
criteria.AutoExcludeSubnets[net] = struct{}{}
}
criteria.AutoExcludeSubnets = make(map[string]struct{})
for _, id := range request.ExcludedIDs {
if net, ok := state.netByID[id]; ok {
criteria.AutoExcludeSubnets[net] = struct{}{}
}
reputableNodes = state.distinct.Reputable
newNodes = state.distinct.New
} else {
reputableNodes = state.nonDistinct.Reputable
newNodes = state.nonDistinct.New
}
reputableNodes = state.distinct.Reputable
newNodes = state.distinct.New

// Get a random selection of new nodes out of the cache first so that if there aren't
// enough new nodes on the network, we can fall back to using reputable nodes instead.
Expand Down
114 changes: 64 additions & 50 deletions satellite/nodeselection/uploadselection/state_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,106 +16,118 @@ import (
"storj.io/storj/satellite/nodeselection/uploadselection"
)

func TestState_Select(t *testing.T) {
func TestState_SelectNonDistinct(t *testing.T) {
ctx := testcontext.New(t)
defer ctx.Cleanup()

reputableNodes := joinNodes(
createRandomNodes(2, "1.0.1"),
createRandomNodes(3, "1.0.2"),
createRandomNodes(2, "1.0.1", false),
createRandomNodes(3, "1.0.2", false),
)
newNodes := joinNodes(
createRandomNodes(2, "1.0.3"),
createRandomNodes(3, "1.0.4"),
createRandomNodes(2, "1.0.3", false),
createRandomNodes(3, "1.0.4", false),
)

state := uploadselection.NewState(reputableNodes, newNodes)
require.Equal(t, uploadselection.Stats{
New: 5,
Reputable: 5,
NewDistinct: 2,
ReputableDistinct: 2,
New: 5,
Reputable: 5,
}, state.Stats())

{ // select 5 non-distinct subnet reputable nodes
const selectCount = 5
selected, err := state.Select(ctx, uploadselection.Request{
Count: selectCount,
NewFraction: 0,
Distinct: false,
ExcludedIDs: nil,
})
require.NoError(t, err)
require.Len(t, selected, selectCount)
}

{ // select 2 distinct subnet reputable nodes
const selectCount = 2
{ // select 6 non-distinct subnet reputable and new nodes (50%)
const selectCount = 6
const newFraction = 0.5
selected, err := state.Select(ctx, uploadselection.Request{
Count: selectCount,
NewFraction: 0,
Distinct: true,
NewFraction: newFraction,
ExcludedIDs: nil,
})
require.NoError(t, err)
require.Len(t, selected, selectCount)
require.Len(t, intersectLists(selected, reputableNodes), selectCount*(1-newFraction))
require.Len(t, intersectLists(selected, newNodes), selectCount*newFraction)
}

{ // try to select 5 distinct subnet reputable nodes, but there are only two 2 in the state
const selectCount = 5
{ // select 10 distinct subnet reputable and new nodes (100%), falling back to 5 reputable
const selectCount = 10
const newFraction = 1.0
selected, err := state.Select(ctx, uploadselection.Request{
Count: selectCount,
NewFraction: 0,
Distinct: true,
NewFraction: newFraction,
ExcludedIDs: nil,
})
require.Error(t, err)
require.Len(t, selected, 2)
require.NoError(t, err)
require.Len(t, selected, selectCount)
require.Len(t, intersectLists(selected, reputableNodes), 5)
require.Len(t, intersectLists(selected, newNodes), 5)
}
}

{ // select 6 non-distinct subnet reputable and new nodes (50%)
const selectCount = 6
const newFraction = 0.5
func TestState_SelectDistinct(t *testing.T) {
ctx := testcontext.New(t)
defer ctx.Cleanup()

reputableNodes := joinNodes(
createRandomNodes(2, "1.0.1", true),
createRandomNodes(3, "1.0.2", true),
)
newNodes := joinNodes(
createRandomNodes(2, "1.0.3", true),
createRandomNodes(3, "1.0.4", true),
)

state := uploadselection.NewState(reputableNodes, newNodes)
require.Equal(t, uploadselection.Stats{
New: 2,
Reputable: 2,
}, state.Stats())

{ // select 2 distinct subnet reputable nodes
const selectCount = 2
selected, err := state.Select(ctx, uploadselection.Request{
Count: selectCount,
NewFraction: newFraction,
Distinct: false,
NewFraction: 0,
ExcludedIDs: nil,
})
require.NoError(t, err)
require.Len(t, selected, selectCount)
require.Len(t, intersectLists(selected, reputableNodes), selectCount*(1-newFraction))
require.Len(t, intersectLists(selected, newNodes), selectCount*newFraction)
}

{ // select 4 distinct subnet reputable and new nodes (50%)
const selectCount = 4
const newFraction = 0.5
{ // try to select 5 distinct subnet reputable nodes, but there are only two 2 in the state
const selectCount = 5
selected, err := state.Select(ctx, uploadselection.Request{
Count: selectCount,
NewFraction: newFraction,
Distinct: true,
NewFraction: 0,
ExcludedIDs: nil,
})
require.NoError(t, err)
require.Len(t, selected, selectCount)
require.Len(t, intersectLists(selected, reputableNodes), selectCount*(1-newFraction))
require.Len(t, intersectLists(selected, newNodes), selectCount*newFraction)
require.Error(t, err)
require.Len(t, selected, 2)
}

{ // select 10 distinct subnet reputable and new nodes (100%), falling back to 5 reputable
const selectCount = 10
const newFraction = 1.0
{ // select 4 distinct subnet reputable and new nodes (50%)
const selectCount = 4
const newFraction = 0.5
selected, err := state.Select(ctx, uploadselection.Request{
Count: selectCount,
NewFraction: newFraction,
Distinct: false,
ExcludedIDs: nil,
})
require.NoError(t, err)
require.Len(t, selected, selectCount)
require.Len(t, intersectLists(selected, reputableNodes), 5)
require.Len(t, intersectLists(selected, newNodes), 5)
require.Len(t, intersectLists(selected, reputableNodes), selectCount*(1-newFraction))
require.Len(t, intersectLists(selected, newNodes), selectCount*newFraction)
}
}

Expand All @@ -124,12 +136,12 @@ func TestState_Select_Concurrent(t *testing.T) {
defer ctx.Cleanup()

reputableNodes := joinNodes(
createRandomNodes(2, "1.0.1"),
createRandomNodes(3, "1.0.2"),
createRandomNodes(2, "1.0.1", false),
createRandomNodes(3, "1.0.2", false),
)
newNodes := joinNodes(
createRandomNodes(2, "1.0.3"),
createRandomNodes(3, "1.0.4"),
createRandomNodes(2, "1.0.3", false),
createRandomNodes(3, "1.0.4", false),
)

state := uploadselection.NewState(reputableNodes, newNodes)
Expand All @@ -140,7 +152,6 @@ func TestState_Select_Concurrent(t *testing.T) {
nodes, err := state.Select(ctx, uploadselection.Request{
Count: selectCount,
NewFraction: 0.5,
Distinct: false,
ExcludedIDs: nil,
})
require.Len(t, nodes, selectCount)
Expand All @@ -152,7 +163,6 @@ func TestState_Select_Concurrent(t *testing.T) {
nodes, err := state.Select(ctx, uploadselection.Request{
Count: selectCount,
NewFraction: 0.5,
Distinct: true,
ExcludedIDs: nil,
})
require.Len(t, nodes, selectCount)
Expand All @@ -162,7 +172,7 @@ func TestState_Select_Concurrent(t *testing.T) {
}

// createRandomNodes creates n random nodes all in the subnet.
func createRandomNodes(n int, subnet string) []*uploadselection.Node {
func createRandomNodes(n int, subnet string, shareNets bool) []*uploadselection.Node {
xs := make([]*uploadselection.Node, n)
for i := range xs {
addr := subnet + "." + strconv.Itoa(i) + ":8080"
Expand All @@ -171,9 +181,13 @@ func createRandomNodes(n int, subnet string) []*uploadselection.Node {
ID: testrand.NodeID(),
Address: addr,
},
LastNet: subnet,
LastIPPort: addr,
}
if shareNets {
xs[i].LastNet = subnet
} else {
xs[i].LastNet = addr
}
}
return xs
}
Expand Down
2 changes: 0 additions & 2 deletions satellite/overlay/benchmark_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -290,7 +290,6 @@ func BenchmarkNodeSelection(b *testing.B) {
ExcludedNetworks: nil,
MinimumVersion: "v1.0.0",
OnlineWindow: time.Hour,
DistinctIP: false,
AsOfSystemInterval: -time.Microsecond,
}
excludedCriteria := &overlay.NodeCriteria{
Expand All @@ -299,7 +298,6 @@ func BenchmarkNodeSelection(b *testing.B) {
ExcludedNetworks: excludedNets,
MinimumVersion: "v1.0.0",
OnlineWindow: time.Hour,
DistinctIP: false,
AsOfSystemInterval: -time.Microsecond,
}

Expand Down
12 changes: 7 additions & 5 deletions satellite/overlay/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,11 +40,13 @@ type AsOfSystemTimeConfig struct {
// NodeSelectionConfig is a configuration struct to determine the minimum
// values for nodes to select.
type NodeSelectionConfig struct {
NewNodeFraction float64 `help:"the fraction of new nodes allowed per request" releaseDefault:"0.05" devDefault:"1"`
MinimumVersion string `help:"the minimum node software version for node selection queries" default:""`
OnlineWindow time.Duration `help:"the amount of time without seeing a node before its considered offline" default:"4h" testDefault:"1m"`
DistinctIP bool `help:"require distinct IPs when choosing nodes for upload" releaseDefault:"true" devDefault:"false"`
MinimumDiskSpace memory.Size `help:"how much disk space a node at minimum must have to be selected for upload" default:"500.00MB" testDefault:"100.00MB"`
NewNodeFraction float64 `help:"the fraction of new nodes allowed per request" releaseDefault:"0.05" devDefault:"1"`
MinimumVersion string `help:"the minimum node software version for node selection queries" default:""`
OnlineWindow time.Duration `help:"the amount of time without seeing a node before its considered offline" default:"4h" testDefault:"1m"`
DistinctIP bool `help:"require distinct IPs when choosing nodes for upload" releaseDefault:"true" devDefault:"false"`
NetworkPrefixIPv4 int `help:"the prefix to use in determining 'network' for IPv4 addresses" default:"24" hidden:"true"`
NetworkPrefixIPv6 int `help:"the prefix to use in determining 'network' for IPv6 addresses" default:"64" hidden:"true"`
MinimumDiskSpace memory.Size `help:"how much disk space a node at minimum must have to be selected for upload" default:"500.00MB" testDefault:"100.00MB"`

AsOfSystemTime AsOfSystemTimeConfig

Expand Down

0 comments on commit 2522ff0

Please sign in to comment.