PoH on SLP2 is going very slowly #8445

Closed
mvines opened this issue Feb 25, 2020 · 8 comments

Comments

@mvines (Member) commented Feb 25, 2020

PoH should be running at ~2.5 slots per second, but it seems to be running at more like ~0.25 slots per second.
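The target rate follows from the tick parameters. The sketch below assumes 160 ticks per second and 64 ticks per slot (my understanding of the cluster defaults; the constant names are hypothetical local names, not the real ones), and converts both the expected and the reported rate into slot durations.

```rust
// Assumed tick parameters (hypothetical local constants; check the
// cluster's genesis config for the real values).
const TICKS_PER_SECOND: f64 = 160.0;
const TICKS_PER_SLOT: f64 = 64.0;

/// Expected slot rate implied by the assumed tick parameters.
fn expected_slots_per_second() -> f64 {
    TICKS_PER_SECOND / TICKS_PER_SLOT
}

/// Converts a slot rate into milliseconds per slot.
fn slot_duration_ms(slots_per_second: f64) -> f64 {
    1000.0 / slots_per_second
}

fn main() {
    let expected = expected_slots_per_second();
    let observed = 0.25; // rate reported in this issue
    println!(
        "expected {:.2} slots/s ({:.0} ms/slot), observed {:.2} slots/s ({:.0} ms/slot), {:.0}x slower",
        expected,
        slot_duration_ms(expected),
        observed,
        slot_duration_ms(observed),
        expected / observed
    );
}
```

Under these assumptions a healthy cluster produces a slot every 400 ms, while the reported rate means roughly 4 s per slot, a 10x slowdown.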

mvines added this to the Tofino v0.23.7 milestone on Feb 25, 2020
@mvines (Member Author) commented Feb 25, 2020

I moved the bootstrap validator to a colo machine; not sure that helped, though.

@mvines (Member Author) commented Feb 25, 2020

The issue reproduces when a test SLP cluster is launched with https://github.com/solana-labs/cluster

mvines added this to Needs triage in TdS Potholes via automation on Feb 25, 2020
@mvines (Member Author) commented Feb 25, 2020

cc: #8450

mvines moved this from Needs triage to TdS Stage 1 Blockers in TdS Potholes on Feb 25, 2020
@mvines (Member Author) commented Feb 25, 2020

The regression range is v0.23.2 to v0.23.6. Something in this window has caused PoH to slow down significantly: v0.23.2...v0.23.6

@garious (Member) commented Feb 25, 2020

@pgarg66, I recall you tweaking the PoH thread affinity. Any chance that's related?

@mvines (Member Author) commented Feb 26, 2020

Update: I can make v0.23.6 PoH as fast as v0.23.2 with some genesis config changes.

Using the v0.23.6 release binaries:

  1. Slow PoH can be reproduced by creating a genesis config with --slots-per-epoch 432000 and no warm-up epochs.
  2. Normal PoH can be reproduced by creating a genesis config with --slots-per-epoch 8192 and no warm-up epochs.

So there is some O(slots-per-epoch) code running in the PoH hot path.
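To make the O(slots-per-epoch) reasoning concrete, here is a minimal sketch of the kind of code that would produce this behavior: a hypothetical helper (`scan_epoch`, not a real Solana function) that walks the remaining slots of the epoch on every call. The thread does not identify which code actually does this in the PoH path; the sketch only shows how such a cost scales between the two `--slots-per-epoch` values above.

```rust
/// Hypothetical per-call helper that walks every remaining slot in the epoch
/// looking for one that satisfies `is_match`. NOT the actual Solana code; it
/// only illustrates work that is linear in slots_per_epoch.
/// Returns the matching slot (if any) and the number of slots examined.
fn scan_epoch<F: Fn(u64) -> bool>(start: u64, slots_per_epoch: u64, is_match: F) -> (Option<u64>, u64) {
    let mut examined = 0;
    for slot in start..slots_per_epoch {
        examined += 1;
        if is_match(slot) {
            return (Some(slot), examined);
        }
    }
    (None, examined)
}

fn main() {
    // Worst case: nothing matches, so the whole epoch is walked on every call.
    let (_, bad) = scan_epoch(0, 432_000, |_| false);
    let (_, good) = scan_epoch(0, 8_192, |_| false);
    println!("432000-slot epoch: {} slots examined per call", bad);
    println!("8192-slot epoch:   {} slots examined per call", good);
    println!("ratio: {:.1}x", bad as f64 / good as f64);
}
```

In the worst case one call walks about 53x more slots with 432000 slots per epoch than with 8192. The sketch ignores the fixed cost of hashing, so it should not be read as a prediction of the ~10x slowdown reported here.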

@mvines (Member Author) commented Feb 26, 2020

v0.23.2 behaves the same as v0.23.6, so this is not a regression. The bug surfaced because I disabled warm-up epochs, which made the slow PoH visible right from epoch 0 instead of 1-2 weeks in, when the cluster finally reaches the normal epoch length.
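Warm-up explains why the problem stayed hidden: early epochs are short, so any O(slots-per-epoch) work is cheap at first. The sketch below assumes the warm-up rule I understand `EpochSchedule` to use (the first epoch has 32 slots and each warm-up epoch doubles until reaching the configured `--slots-per-epoch`); treat the minimum length and the doubling rule as assumptions, and note that `MIN_SLOTS_PER_EPOCH` and `warmup_epoch_lengths` are hypothetical names.

```rust
/// Assumed length of the first warm-up epoch (hypothetical constant name).
const MIN_SLOTS_PER_EPOCH: u64 = 32;

/// Lists the warm-up epoch lengths that precede the first full-length epoch,
/// assuming each warm-up epoch is twice as long as the previous one.
fn warmup_epoch_lengths(slots_per_epoch: u64) -> Vec<u64> {
    let mut lengths = Vec::new();
    let mut len = MIN_SLOTS_PER_EPOCH;
    while len < slots_per_epoch {
        lengths.push(len);
        len *= 2;
    }
    lengths
}

fn main() {
    for &slots_per_epoch in &[8_192u64, 432_000] {
        let warmup = warmup_epoch_lengths(slots_per_epoch);
        let warmup_slots: u64 = warmup.iter().sum();
        println!(
            "--slots-per-epoch {}: {} warm-up epochs ({} slots) before the first full-length epoch",
            slots_per_epoch,
            warmup.len(),
            warmup_slots
        );
    }
}
```

Under these assumptions a 432000-slot config passes through 14 warm-up epochs (524256 slots) before reaching full length, so with warm-up enabled the slowdown would build up gradually instead of appearing at genesis.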

@mvines (Member Author) commented Feb 26, 2020

Steps to reproduce (STR) on master:

  1. Apply this patch. Note that the issue reproduces with sleepy PoH (`--hashes-per-tick sleep`) too!

```diff
diff --git a/multinode-demo/setup.sh b/multinode-demo/setup.sh
index ebb8ac8d8..fe2de2ce8 100755
--- a/multinode-demo/setup.sh
+++ b/multinode-demo/setup.sh
@@ -27,7 +27,8 @@ $solana_keygen new --no-passphrase -so "$SOLANA_CONFIG_DIR"/bootstrap-validator/
 $solana_keygen new --no-passphrase -so "$SOLANA_CONFIG_DIR"/bootstrap-validator/storage-keypair.json
 
 args=("$@")
-default_arg --enable-warmup-epochs
+default_arg --slots-per-epoch 432000 # Bad
+#default_arg --slots-per-epoch 8192  # Good
 default_arg --bootstrap-validator-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/identity-keypair.json
 default_arg --bootstrap-vote-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/vote-keypair.json
 default_arg --bootstrap-stake-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/stake-keypair.json
@@ -35,6 +36,6 @@ default_arg --bootstrap-storage-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/
 default_arg --ledger "$SOLANA_CONFIG_DIR"/bootstrap-validator
 default_arg --faucet-pubkey "$SOLANA_CONFIG_DIR"/faucet-keypair.json
 default_arg --faucet-lamports 500000000000000000
-default_arg --hashes-per-tick auto
+default_arg --hashes-per-tick sleep
 default_arg --operating-mode development
 $solana_genesis "${args[@]}"
```

  2. Run `./multinode-demo/setup.sh && ./multinode-demo/bootstrap-validator.sh`

The slow slot progression is easy to see in the validator's standard output. Another way to observe the problem once the bootstrap validator has started is to run `cargo run --bin solana -- live-slots`.

TdS Potholes automation moved this from TdS Stage 2 Blockers to Closed Feb 26, 2020
Projects
TdS Potholes
  
Closed

2 participants