go/e2e/txsource: periodically restart runtime nodes #3124

Merged
ptrus merged 3 commits into master from ptrus/feature/txsource-restart-runtime-nodes
Jul 27, 2020

ptrus
Member

@ptrus ptrus commented Jul 22, 2020

TODO:

  • set up 2 keymanagers for the long-running test and also enable restarts there
  • set up 4 compute nodes
  • keep one node of each type always running (so we can compare that node's memory/CPU/disk usage with the others)
  • revert the scripts/daily_txsource.sh change before merging

Fixes:

@codecov

codecov bot commented Jul 22, 2020

Codecov Report

Merging #3124 into master will decrease coverage by 0.18%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #3124      +/-   ##
==========================================
- Coverage   68.72%   68.53%   -0.19%     
==========================================
  Files         374      374              
  Lines       36990    37000      +10     
==========================================
- Hits        25421    25358      -63     
- Misses       8331     8396      +65     
- Partials     3238     3246       +8     
Impacted Files Coverage Δ
go/consensus/tendermint/roothash/roothash.go 73.29% <0.00%> (-0.69%) ⬇️
go/consensus/api/transaction/results/results.go 0.00% <0.00%> (-100.00%) ⬇️
go/oasis-node/cmd/common/metrics/disk.go 65.51% <0.00%> (-20.69%) ⬇️
go/consensus/tendermint/api/errors.go 86.66% <0.00%> (-13.34%) ⬇️
go/worker/compute/executor/committee/state.go 74.07% <0.00%> (-11.12%) ⬇️
go/runtime/host/sandbox/sandbox.go 66.54% <0.00%> (-10.79%) ⬇️
go/oasis-node/cmd/common/metrics/resource.go 84.00% <0.00%> (-8.00%) ⬇️
go/worker/storage/service_external.go 47.31% <0.00%> (-6.46%) ⬇️
...n/crypto/signature/signers/memory/memory_signer.go 71.42% <0.00%> (-4.77%) ⬇️
go/worker/common/committee/runtime_host.go 65.71% <0.00%> (-4.77%) ⬇️
... and 27 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02a2481...f26c4bb.

@ptrus ptrus force-pushed the ptrus/feature/txsource-restart-runtime-nodes branch 7 times, most recently from 7a9bec2 to d407da1 Compare July 23, 2020 11:17
.changelog/3124.bugfix.md (review thread, outdated, resolved)
go/consensus/tendermint/roothash/roothash.go (4 review threads, outdated, resolved)
@kostko
Member

kostko commented Jul 23, 2020

Nice catch!

@ptrus ptrus force-pushed the ptrus/feature/txsource-restart-runtime-nodes branch 3 times, most recently from 325b6fd to 6c52c09 Compare July 23, 2020 15:56
@ptrus ptrus marked this pull request as ready for review July 23, 2020 15:57
@ptrus ptrus force-pushed the ptrus/feature/txsource-restart-runtime-nodes branch from 6c52c09 to 6ebd10f Compare July 23, 2020 15:58
@ptrus ptrus force-pushed the ptrus/feature/txsource-restart-runtime-nodes branch from 6ebd10f to 91aecd8 Compare July 24, 2020 08:22
.changelog/3124.intenral.md (review thread, outdated, resolved)
@ptrus ptrus force-pushed the ptrus/feature/txsource-restart-runtime-nodes branch from 91aecd8 to 9e02474 Compare July 24, 2020 12:25
@ptrus ptrus force-pushed the ptrus/feature/txsource-restart-runtime-nodes branch from 9e02474 to ed35e5a Compare July 24, 2020 21:07
@ptrus ptrus force-pushed the ptrus/feature/txsource-restart-runtime-nodes branch from ed35e5a to f26c4bb Compare July 24, 2020 21:09
@ptrus
Member Author

ptrus commented Jul 25, 2020

Hm, got a "failed to run scenario: log watcher compute-3/log: log assertion failed: timeout detected" in the short-txsource test on CI: https://buildkite.com/oasisprotocol/oasis-core-ci/builds/918#2cdbef61-4395-4526-894a-bc040620f727 (where we don't do any restarts, so the only change is a few more nodes). We may need a beefier instance for the test due to the increased number of nodes.

The reason seems to be that one of the merge workers was a bit behind. I'll merge this (since it only happened once in 10+ runs) and revisit it if it occurs again.

@ptrus ptrus merged commit 24bcb82 into master Jul 27, 2020
@ptrus ptrus deleted the ptrus/feature/txsource-restart-runtime-nodes branch July 27, 2020 07:55
@kostko
Member

kostko commented Jul 27, 2020

Based on a quick look at the logs, the timeout has to do with one of the storage nodes returning "permission denied" for some updates. Need to investigate this more.
