
Sync broken on release-0.10 with segwit enabled #557

Closed
ihavenoface opened this issue Mar 11, 2021 · 29 comments

@ihavenoface
Contributor

Just copy-pasting from Discord here.

This happens on testnet, and presumably on mainnet as well.
The sequence I'm trying goes like this (all from a clean slate, with fresh profile directories):

  • start one node
  • exec getnewaddress
  • exec generatetoaddress 1 <address> (this can be any number of blocks)
  • start a second node and connect it to the first one
  • get this in debug on the first node:
2021-03-11T15:01:15Z received getdata (1 invsz) peer=0
2021-03-11T15:01:15Z received getdata for: witness-block 17821b487854a6930811b560bc3ca3d6d1b59e8363f6d53f144e3b2e05677cac peer=0
  • get this in debug on the second node (or any other nodes connecting later on):
2021-03-11T15:01:15Z Ignoring getheaders from peer=0 because node is in initial block download
2021-03-11T15:01:15Z received: headers (87 bytes) peer=0
2021-03-11T15:01:15Z Synchronizing blockheaders, height: 1 (~100.00%)
2021-03-11T15:01:15Z Requesting block 17821b487854a6930811b560bc3ca3d6d1b59e8363f6d53f144e3b2e05677cac (1) peer=0
2021-03-11T15:01:15Z sending getdata (37 bytes) peer=0
2021-03-11T15:01:15Z received: block (171 bytes) peer=0
2021-03-11T15:01:15Z ProcessMessages(block, 171 bytes): Exception 'CDataStream::read(): end of data: iostream error' (NSt8ios_base7failureB5cxx11E) caught
2021-03-11T15:01:15Z ProcessMessages(block, 171 bytes) FAILED peer=0

Somewhat amusingly, when running the nodes at the same time this only occurs on every other reset.
But as soon as you shut down the first peer and restart it with a fresh profile, it starts failing again.
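
For reference, the sequence above translates roughly into the following commands (a sketch only; the binary names, extra ports, and the /tmp datadir paths are my assumptions, not part of the original report):

# Node 1: fresh datadir, generate one block to a new address
mkdir -p /tmp/node1 /tmp/node2
peercoind -testnet -datadir=/tmp/node1 -daemon
sleep 5   # give the daemon a moment to bring up its RPC server
ADDR=$(peercoin-cli -testnet -datadir=/tmp/node1 getnewaddress)
peercoin-cli -testnet -datadir=/tmp/node1 generatetoaddress 1 "$ADDR"

# Node 2: fresh datadir, connect only to node 1 (distinct ports so both
# daemons can share one machine), then watch its debug.log
peercoind -testnet -datadir=/tmp/node2 -daemon -port=19905 -rpcport=19906 -connect=127.0.0.1
tail -f /tmp/node2/testnet/debug.log   # the testnet subdirectory name may differ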

Given that this seems to be vaguely related to segwit, I have tried this patch:

diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 0d98b5c67..6d50cfff7 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -2062,7 +2062,7 @@ bool ProcessMessage(CNode* pfrom, const std::string& msg_type, CDataStream& vRec
         if((nServices & NODE_WITNESS))
         {
             LOCK(cs_main);
-            State(pfrom->GetId())->fHaveWitness = true;
+            State(pfrom->GetId())->fHaveWitness = false;
         }
 
         // Potentially mark this peer as a preferred download peer.
diff --git a/src/validation.cpp b/src/validation.cpp
index 773e00266..bbed352e8 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -3197,6 +3197,7 @@ bool CheckBlock(const CBlock& block, BlockValidationState& state, const Consensu
 
 bool IsWitnessEnabled(const CBlockIndex* pindexPrev, const Consensus::Params& params)
 {
+    return false;
     return pindexPrev ? IsBTC16BIPsEnabled(pindexPrev->nTime) : false; // pindexPrev == null on genesis block
 }
 
@@ -3385,7 +3386,7 @@ static bool ContextualCheckBlock(const CBlock& block, BlockValidationState& stat
             if (memcmp(hashWitness.begin(), &block.vtx[0]->vout[commitpos].scriptPubKey[6], 32)) {
                 return state.Invalid(BlockValidationResult::BLOCK_MUTATED, "bad-witness-merkle-match", strprintf("%s : witness merkle commitment mismatch", __func__));
             }
-            fHaveWitness = true;
+            fHaveWitness = false;
         }
     }
 

That makes it work, but it seems like a bad idea to simply disable segwit for what appears to be a minor networking issue.

This same sequence does not fail on the release-0.9 branch.

peerchemist added this to the v0.10 milestone Mar 11, 2021
@backpacker69
Member

Can you please check on the latest 0.10?

@ihavenoface
Contributor Author

IIRC I tested with DEFAULT_ADDRESS_TYPE{OutputType::LEGACY} and it did not work, but yes I'll try again, just to make sure.

@backpacker69
Member

I think the issue only affected testnet, and only the case where the chain is recreated and synced using headers-only sync, so the fix of copying nFlags to the block index from the header could have an effect on this issue.

@ihavenoface
Contributor Author

Testing on the latest commit gives the same issue described above.

There appears to be a small chance that a block gets through to the connecting node, but this results in the following error being logged:

2021-03-27T12:34:17Z Requesting block 7db91a16b4f6280e436c286d9471a6ed38c7ec07a1cf50af8b552fceb5e811e7 (1) peer=0
2021-03-27T12:34:17Z sending getdata (37 bytes) peer=0
2021-03-27T12:34:17Z received: block (174 bytes) peer=0
2021-03-27T12:34:17Z received block 7db91a16b4f6280e436c286d9471a6ed38c7ec07a1cf50af8b552fceb5e811e7 peer=0
2021-03-27T12:34:17Z BlockChecked: block hash=7db91a16b4f6280e436c286d9471a6ed38c7ec07a1cf50af8b552fceb5e811e7 state=bad-txnmrklroot, hashMerkleRoot mismatch
2021-03-27T12:34:17Z Misbehaving: seed peer=0 (3436900 -> 3437000)
2021-03-27T12:34:17Z ERROR: ProcessNewBlock: AcceptBlock FAILED (bad-txnmrklroot, hashMerkleRoot mismatch)

backpacker69 pushed a commit that referenced this issue Mar 28, 2021
@ihavenoface
Contributor Author

I just realized that I should actually test this on mainnet as well. I doubt anyone would have fun with fresh 0.10 nodes mining/minting new blocks and then potentially being unable to share them with other fresh 0.10 peers.
Do you reckon it's fine to allow any difficulty when generating blocks, or would that somehow skew the results?

@peerchemist
Member

Please do the same testing procedure on mainnet with rc5 and report back. Thank you.

@ihavenoface
Contributor Author

The same procedure requires me to produce new blocks, which is not feasible at the current mainnet difficulty unless you have a couple of months to spare.

@peerchemist
Member

Do you experience sync issues with rc5 on mainnet maybe?

@ihavenoface
Contributor Author

ihavenoface commented Mar 28, 2021

I do not. I have not fully tested sync with the new client on mainnet yet, but how many similar nodes (running rc5) are there on mainnet?

I believe the main concern with this issue right now is that it's unclear where it stems from. It's likely a relay issue, but what if it happens right at block creation?

@peerchemist
Member

but how many similar nodes (running rc5) on the mainnet are there?

I see three here: https://chainz.cryptoid.info/ppc/#!network

More rc5 nodes should come online during this week.

@ihavenoface
Contributor Author

ihavenoface commented Mar 31, 2021

I just ran https://github.com/peercoin/electrumx-docker with no modifications for ease of use and here is some output:

peercoind_1  | 2021-03-31 08:00:58 Misbehaving: 178.49.167.228:9901 peer=4 (0 -> 100) BAN THRESHOLD EXCEEDED
peercoind_1  | 2021-03-31 08:00:58 ERROR: ProcessNewBlock: AcceptBlock FAILED (hashMerkleRoot mismatch)
peercoind_1  | 2021-03-31 08:00:58 Misbehaving: 104.54.203.174:9901 peer=7 (0 -> 100) BAN THRESHOLD EXCEEDED
peercoind_1  | 2021-03-31 08:00:58 ERROR: ProcessNewBlock: AcceptBlock FAILED (hashMerkleRoot mismatch)

Looking at https://chainz.cryptoid.info/ppc/#!network right now, the node 104.54.203.174 is running /Satoshi:0.20.1/Peercoin:0.10.0(v0.10.0ppc.rc5 Scarab)/, so this seems in line with #557 (comment).

@sandakersmann
Contributor

RC5 does not include this commit: 4c36519

@peerchemist
Member

@ihavenoface do you have any further reports?

@ihavenoface
Contributor Author

ihavenoface commented May 13, 2021

@peerchemist
I haven't had the time to look into this further up until now, but try running a 0.9 node now, and see it banning 0.10 nodes upon sight:
[Screenshot 2021-05-13 220008]

This whole thing seems sporadic, if anything. I synced a 0.10 node just yesterday, launched it again today, only to find no connections. Only after a restart was the 0.10 node able to find peers again.

@7h0m45

7h0m45 commented May 19, 2021

I'd like to attest to this issue as well. It looks like a critical problem. In my experience v0.9 nodes inevitably fail to sync with v0.10 nodes. I agree that it seems sporadic, so it's difficult to pinpoint the problem. It's easy to demonstrate, though, if you launch a v0.9 node connected only to v0.10 nodes.

@7h0m45

7h0m45 commented May 20, 2021

@peerchemist

I'm concerned by your remark on Discord that you are no longer tracking this issue. It's significant and reproducible. The protocol layer may still be stable, but the network layer is broken.

v0.9 nodes are banning v0.10 nodes for the following:
ERROR: ProcessNewBlock: AcceptBlock FAILED (proof of work failed)

Currently, using the official builds, it appears that v0.9 nodes will only accept new blocks from other v0.9 nodes.

@backpacker69
Member

@peerchemist

I'm concerned by your remark on Discord that you are no longer tracking this issue. It's significant and reproducible. The protocol layer may still be stable, but the network layer is broken.

v0.9 nodes are banning v0.10 nodes for the following:
ERROR: ProcessNewBlock: AcceptBlock FAILED (proof of work failed)

Currently, using the official builds, it appears that v0.9 nodes will only accept new blocks from other v0.9 nodes.

Can you please try building the latest 0.9 branch to see if it solves the problem?
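
For anyone following along, building a branch locally should look roughly like this (a sketch assuming the usual Bitcoin-style autotools build; dependency installation and configure flags are left to your setup):

git clone https://github.com/peercoin/peercoin.git
cd peercoin
git checkout release-0.9        # or release-0.10 for the other side of the test
./autogen.sh
./configure
make -j"$(nproc)"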

@ihavenoface
Contributor Author

ihavenoface commented May 20, 2021

I have cherry-picked that commit to my repo. Builds are running here https://gitlab.com/noface/peercoin/-/pipelines/306456014 and will be available here https://gitlab.com/noface/peercoin/-/releases/v0.9.0initNflags once done.

Obviously the usual safety precautions should be taken, so run those builds in a VM if you choose to use them.

@ihavenoface
Contributor Author

ihavenoface commented May 20, 2021

Still getting

2021-05-20 11:01:49 Misbehaving: 176.37.82.11:9901 peer=1 (0 -> 100) BAN THRESHOLD EXCEEDED
2021-05-20 11:01:49 ERROR: ProcessNewBlock: AcceptBlock FAILED (hashMerkleRoot mismatch)

with that build.
And this right off the bat:
[screenshot]

@backpacker69
Member

Still getting

Can you please check with the latest commit in release-0.10?

@7h0m45

7h0m45 commented May 28, 2021

All systems nominal on v0.10.1
Thank you!

@ihavenoface
Contributor Author

Looks good on my end.
Feel free to close this as you see fit.

@ihavenoface
Contributor Author

Or maybe not. On sync, and when trying to import a bootstrap.dat created with the same client version (0.10.1), I'm now getting

2021-05-28T23:01:56Z ERROR: AcceptBlockHeader: Consensus::CheckBlockHeader: 514e141ef5778299506b779145c9c94d78b0603281246741cb31f112c6635365, high-hash, proof of work failed

@backpacker69
Member

Can you please provide step-by-step instructions to reproduce? This shouldn't happen on the correct 0.10.1 tag.

@ihavenoface
Contributor Author

  1. Go to https://github.com/peercoin/peercoin/releases/tag/v0.10.1ppc
  2. Select peercoin-0.10.1-win64-setup-unsigned.exe and download it
  3. Open %appdata% in Windows Explorer, find and delete the directory called Peercoin
  4. Execute the downloaded installer
  5. Open the recently installed Peercoin application
  6. Let it sit syncing for a while
  7. Open %appdata%\Peercoin\debug.log
  8. Notice the error mentioned above or in this log https://gitlab.com/noface/peercoin/-/jobs/1303009055#L380
  9. Download the produced bootstrap.dat ( https://gitlab.com/noface/peercoin/-/jobs/1303009055/artifacts/browse ) and use it in conjunction with -loadblock on yet another fresh node, in a VM with networking disabled, to see it get stuck at around block height ~6000 (see the sketch below)

This all works fine on 0.9.
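
Step 9 can be reproduced headlessly with something like the following (the datadir and bootstrap path are placeholders; the point is a fresh datadir with networking disabled):

mkdir -p /tmp/fresh-node
peercoind -datadir=/tmp/fresh-node -connect=0 -listen=0 -loadblock=/path/to/bootstrap.dat
# then watch debug.log in that datadir; with 0.10.1 the import gets stuck
# around block height ~6000, as described in step 9 above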

@peerchemist
Member

The problem could be in the bootstrap you are using.

@ihavenoface
Contributor Author

ihavenoface commented May 30, 2021

Most certainly. But given that it is produced by the 0.10.1 client and then fails to be reimported by that same client, I think it's reasonable to assume something is afoot where it shouldn't be: either in the bootstrap generation, or further up in the client code that writes the blocks in the first place.

@ymgve

ymgve commented Jun 13, 2021

This might be connected to a serialization issue I'm seeing. I noticed that a node with version "/Satoshi:0.20.1/Peercoin:0.10.0(v0.10.0.0-0e0fa7bbc Scarab)/" and service flags 1033 would give this data when requesting the genesis block:
010000000000000000000000000000000000000000000000000000000000000000000000c293592c05905698290c89eb6ddef0cf8aa5a148c68c55ac7ad1b4fa858f2d3c7f5b2c50ffff001dab82e5810101000000a2592c50010000000000000000000000000000000000000000000000000000000000000000ffffffff5404ffff001d020f274b4d61746f6e69732030372d4155472d3230313220506172616c6c656c2043757272656e6369657320416e642054686520526f61646d617020546f204d6f6e65746172792046726565646f6dffffffff010000000000000000000000000000

while a node with version '/Satoshi:0.20.1/Peercoin:0.10.0(v0.10.1ppc Scarab)/' and service flags 1033 would give this data:
010000000000000000000000000000000000000000000000000000000000000000000000c293592c05905698290c89eb6ddef0cf8aa5a148c68c55ac7ad1b4fa858f2d3c7f5b2c50ffff001dab82e581000000000101000000a2592c50010000000000000000000000000000000000000000000000000000000000000000ffffffff5404ffff001d020f274b4d61746f6e69732030372d4155472d3230313220506172616c6c656c2043757272656e6369657320416e642054686520526f61646d617020546f204d6f6e65746172792046726565646f6dffffffff010000000000000000000000000000

The difference seems to be that the former doesn't include the header flags in the serialization for some reason: the latter carries four extra zero bytes immediately after the 80-byte header, while the former omits them.
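
To make the difference easy to spot, here is a byte-wise diff of the two blobs (old.hex and new.hex are hypothetical files, each holding one of the hex strings above on a single line):

diff <(fold -w2 old.hex) <(fold -w2 new.hex)
# the only difference is four extra "00" bytes inserted right after byte 80
# of new.hex, i.e. immediately after the classic 80-byte block header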

@backpacker69
Member

And the reason for it is c2cb099.
