
engine: Add tests for initialization failure #2238

Merged: 4 commits into support/v0.35 on Feb 6, 2023

Conversation

fyrchik (Contributor) commented on Feb 6, 2023

Fix bugs along the way.
Invalid mode on a directory is an approximation of missing media.
We expect any Open/Init errors to be logged and a shard to be disabled.

codecov bot commented on Feb 6, 2023

Codecov Report

Merging #2238 (5fec25c) into support/v0.35 (e3f1804) will increase coverage by 0.10%.
The diff coverage is 64.10%.

```
@@                Coverage Diff                @@
##           support/v0.35    #2238      +/-   ##
=================================================
+ Coverage          30.88%   30.99%   +0.10%
=================================================
  Files                383      383
  Lines              28395    28419      +24
=================================================
+ Hits                8770     8808      +38
+ Misses             18878    18870       -8
+ Partials             747      741       -6
```

| Impacted Files | Coverage Δ |
| --- | --- |
| cmd/neofs-node/config.go | 0.00% <0.00%> (ø) |
| pkg/local_object_storage/shard/control.go | 76.19% <50.00%> (-0.32%) ⬇️ |
| pkg/local_object_storage/engine/control.go | 84.02% <85.18%> (+11.83%) ⬆️ |
| pkg/local_object_storage/engine/shards.go | 70.50% <0.00%> (+2.87%) ⬆️ |


pkg/local_object_storage/engine/control.go
```go
		c.log.Info("shard attached to engine", zap.Stringer("id", id))
	}
}
if shardsAttached == 0 {
```
carpawell (Member) commented on Feb 6, 2023:

Is this a discussed behaviour? I mean, could an admin expect a node to start with just an error in the logs but without some of the planned shards?

fyrchik (Contributor, Author) replied:

Currently we still drop shards in Init, so this behaviour is not new.
In our model a shard is an (almost) independent domain of failure. I am mostly thinking about automatic node restart after hardware failures. We can discuss this in the future; I am not sure what the expected behaviour here is: for 11 shards, repetitive config manipulation is a laborious and error-prone task.

fyrchik (Contributor, Author) replied:

Or do you mean that we could start it in degraded mode if we cannot write the ID to the metabase?

A project Member commented:
The biggest problem of this approach is that you can lose shards (and potentially data) on benign misconfiguration. This will happen.

carpawell (Member) commented on Feb 6, 2023:

> so this behaviour is not new

But we had `fatalOnErr(err)` previously if a shard had not been attached. As I understand it, that was very different. Losing shards without some huge "you are losing shards" banner bothers me, but I don't have any good ideas on how to solve that right now.

A project Member commented:

> In our model shard is an (almost) independent domain of failure.

Also, I can't fully agree with this. One shard could contain Locks and Tombstones that protect/remove data in another, so losing one shard is not just losing some objects. But that is our general problem.

```go
e.mtx.RLock()
defer e.mtx.RUnlock()
```

```go
e.mtx.Lock()
defer e.mtx.Unlock()
```
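For context, a minimal sketch of the locking pattern in the snippets above: read-only paths take the read lock, while shard removal needs the write lock. The `engine` type here is a hypothetical stand-in, not the real StorageEngine:

```go
package main

import (
	"fmt"
	"sync"
)

// engine is a minimal stand-in: the shard map is guarded by an RWMutex.
type engine struct {
	mtx    sync.RWMutex
	shards map[string]struct{}
}

// removeShard drops a shard (e.g. one that failed to open); it mutates
// the map, so it must take the write lock.
func (e *engine) removeShard(id string) {
	e.mtx.Lock()
	defer e.mtx.Unlock()
	delete(e.shards, id)
}

// shardCount only reads the map, so the read lock suffices.
func (e *engine) shardCount() int {
	e.mtx.RLock()
	defer e.mtx.RUnlock()
	return len(e.shards)
}

func main() {
	e := &engine{shards: map[string]struct{}{"good": {}, "broken": {}}}
	e.removeShard("broken")
	fmt.Println(e.shardCount()) // prints 1
}
```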
A project Member commented:

Why do you duplicate this much code from Init()?

fyrchik (Contributor, Author) replied:

Could you elaborate?
This line is here because, after this PR, we can remove shards in Open.

A project Member replied:

It's not about this particular line; I just had to attach this comment somewhere. There seems to be quite some duplication between Open and Init now.

fyrchik (Contributor, Author) replied:

Oh, I see.
I have thought about it, though I am not sure it would be simpler with some ad-hoc parallelizing function which accepts functions as arguments.


fyrchik (Contributor, Author) commented on Feb 6, 2023:

> The biggest problem of this approach is that you can lose shards (and potentially data) on benign misconfiguration. This will happen.

I agree, but the alternative here is losing all data vs losing some data.
And to be clear, we don't lose anything here; the data just becomes (possibly) unavailable.

carpawell previously approved these changes on Feb 6, 2023, leaving a comment:

I would add changes that affect the node's actions to the CHANGELOG.

1. Both could initialize shards in parallel.
2. Both should close shards after an error.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
…rrors

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
@fyrchik fyrchik merged commit f92c14f into nspcc-dev:support/v0.35 Feb 6, 2023