New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for verifychain #634
Fix for verifychain #634
Conversation
A new build of 91e731d has completed succesfully! |
A new build of 70a96a7 has completed succesfully! |
Tested locally and fixed with fcc66d1, ready for review |
A new build of fcc66d1 has completed succesfully! |
A new build of ceb4a5e has completed succesfully! |
Although I've initially found that partial runs were failing in mainnet, but each run of
datadir=/Users/alex/navcoin-core/data
navcli="/Users/alex/navcoin-core/src/navcoin-cli -devnet -datadir=$datadir"
blocks=`$navcli getinfo|jq .blocks`
for i in $(seq 1 $blocks);
do
verifyoutput=`$navcli verifychain 4 $i`
if [[ "$verifyoutput" == "false" ]];
then
verifyoutput+=`echo ' - ' && echo failed at $(grep 'ERROR: VerifyDB()' $datadir/devnet/debug.log |tail -1|sed 's/.*block at \(\d*\)/\1/')`
fi
echo Rewinding to \
$(bc <<< $blocks-$i) \
- reconnecting up to $blocks \
- verifychain 4 $i -> $verifyoutput;
done Running this script would be very useful against a mainnet fully synced node, as it would give us certainty about how well the cfunddb behaves with long reorganizations. But it would take an immense amount of time. How to efficiently run this test in mainnet is an open question. A possible solution would be to run it against an stressed testnet, which could even give a richer context than mainnet (due to an elevated count of injected proposals and payment requests). Any other ideas? Coming back to the script output, I was able to find it was failing disconnecting down to the following ranges:
In devnet, voting cycles last 30 blocks. Blocks cycle from 0 to 29 as in Let's see when there's been votes in the chain:
Having a closer look at what happened in the critical blocks:
Why? When we disconnect a block, the state of all the entries which had an update in that block is reseted. This is a wrong design of the cfunddb, because it assumes the entry state will be correctly calculated in the next reorganization, which does not happen all the time. For example, when we disconnect Proposal 1:
Payment request 1:
Now we can stop disconnecting and reconnect This situation happens unless we reorganise lower than I have an idea on how to fix this, but would need every wallet to reindex/resync. I'll leave it here waiting for feedback from others. |
I've committed a fix in 29f404d for the previously described issue. This will require every node to reindex the first time it's opened. Verified with the aforementioned script, With the new patch, the whole history of states is stored in a map. Every time the state changes it is stored and associated with the hash of the black where the transition happened. When an entry is reorganised the state associated with the hash of the disconnected block is erased. The new state will be that one associated with the block with higher height part of the active chain. This completely substitutes the previous reorganisation method. |
A new build of 29f404d has completed succesfully! |
This PR is failing the
|
A new build of 2eee7d5 has completed succesfully! |
@aguycalled the cfunddb-fork-reorg-* tests seem to be failing in this build. |
A new build of 1bd70c8 has completed succesfully! |
A new build of 475cf7b has completed succesfully! |
compiles, running tests. Will review the diff in the morning. |
RPC test suite passes, except getting some error with it not being able to find the file |
Reindex initiated on OSX both testnet & mainnet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RPC test suite reviewed.
@@ -16,6 +16,7 @@ class CfundForkReorgProposal(NavCoinTestFramework): | |||
def __init__(self): | |||
super().__init__() | |||
self.setup_clean_chain = True | |||
self.node_args = [['-debug=dao'], ['-debug=dao']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this node initialisation done the same way as it is done in cfund-fork-reorg-preq.py
consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -96,7 +97,7 @@ def run_test(self): | |||
assert_equal(self.nodes[0].getblock(self.nodes[0].getproposal(proposalHash0)["blockHash"]), self.nodes[1].getblock(self.nodes[1].getproposal(proposalHash0)["blockHash"])) | |||
assert_equal(self.nodes[0].getproposal(proposalHash0), self.nodes[1].getproposal(proposalHash0)) | |||
|
|||
# End cycle 2 | |||
# End cycle 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lots of whitespace diffs, i wonder if we should commit some code linter, or auto formatter config to the repo to help make this more consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
@@ -169,36 +167,30 @@ def run_test(self): | |||
for x in range(self.num_nodes): | |||
assert(self.nodes[x].getbestblockhash() == bestBlockHash) | |||
assert(self.nodes[x].getpaymentrequest(paymentReq)["state"] == 1) | |||
assert(self.nodes[x].getpaymentrequest(paymentReq)["paidOnBlock"] == "0000000000000000000000000000000000000000000000000000000000000000") | |||
assert(self.nodes[x].getpaymentrequest(paymentReq)["stateChangedOnBlock"] != "0000000000000000000000000000000000000000000000000000000000000000") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why has this assertion been inverted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paidOnBlock does not exist anymore, but stateChangedOnBlock is used combined with state==6 to show in which block a payment request has been paid. line 172 checks the payment request is not null
|
||
# disconnect the nodes and generate the payout on each node | ||
url = urllib.parse.urlparse(self.nodes[1].url) | ||
self.nodes[0].disconnectnode(url.hostname+":"+str(p2p_port(1))) | ||
|
||
time.sleep(1) | ||
|
||
while self.nodes[0].getpaymentrequest(paymentReq)["paidOnBlock"] == "0000000000000000000000000000000000000000000000000000000000000000": | ||
while self.nodes[0].getpaymentrequest(paymentReq)["state"] == 6: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't state 6 the final state of paid
? Or will it move from this to something else eventually? The diff says the while condition was previously paidonblock == "000.."
which i thought meant loop until it has been paid, but now the while runs until the state is no longer paid? I must be misinterpreting this, the test passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ qa/rpc-tests/cfund-paymentrequest-payout.py
Initializing test directory /var/folders/qh/7_l8lmqx2sb5ncxgtzzyb51r0000gn/T/testfn5g4cou/27872
Stopping nodes
Cleaning up
Tests successful
@@ -150,7 +151,7 @@ def run_test(self): | |||
self.nodes[0].paymentrequestvote(paymentrequestid0, "remove") | |||
|
|||
assert(self.nodes[0].getpaymentrequest(paymentrequestid0)["state"] == 0) | |||
assert(self.nodes[0].getpaymentrequest(paymentrequestid0)["status"] == "accepted waiting for end of voting period") | |||
assert_equal(self.nodes[0].getpaymentrequest(paymentrequestid0)["status"], "accepted waiting for end of voting period") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any benefit changing from assert
with the string comparison to assert_equal
? or just for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for debug reasons
|
||
self.sync_all() | ||
|
||
assert_equal(self.nodes[0].getbestblockhash(), self.nodes[1].getbestblockhash()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we again check that the cfundstatehash
matches on each node after they are reconnected? or is it implied they will match because their bestblockhash
is the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ qa/rpc-tests/cfunddb-statehash.py
Initializing test directory /var/folders/qh/7_l8lmqx2sb5ncxgtzzyb51r0000gn/T/testivmhhxdf/28317
Stopping nodes
Cleaning up
Tests successful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some more review added
{ | ||
if (!it.second.IsNull()) | ||
{ | ||
writer << it.second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be enough to only concatenate the proposal hashes and payment request hashes here since we can already assume those are hashes of the contents of the objects? Or should we be completely verbose and concatenate the whole object like this to be sure? What was the benchmarking results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the proposal and payment hashes are the hash of the transaction where they were created and do not change with the objects, hence we need to serialize and hash them
@@ -310,7 +310,7 @@ struct CCacheEntry | |||
typedef CCacheEntry<CCoins> CCoinsCacheEntry; | |||
typedef boost::unordered_map<uint256, CCoinsCacheEntry, SaltedTxidHasher> CCoinsMap; | |||
typedef std::map<uint256, CProposal> CProposalMap; | |||
typedef boost::unordered_map<uint256, CPaymentRequest, SaltedTxidHasher> CPaymentRequestMap; | |||
typedef std::map<uint256, CPaymentRequest> CPaymentRequestMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is SaltedTxidHasher and why is it removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.cppreference.com/w/cpp/container/unordered_map
what i would say here, is that in std::map
the order is guaranteed while boost::unordered_map
does not guarantee it. guaranteed order is needed for cfunddbstate hash consistency. SaltedTxidHasher is removed because now we use std::map
@@ -221,28 +221,95 @@ bool CFund::IsValidPaymentRequest(CTransaction tx, CCoinsViewCache& coins, int n | |||
|
|||
} | |||
|
|||
flags CFund::CPaymentRequest::GetLastState() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As i understand it we're storing the block height of each state change into a <blockhash,state> map against the payment request (and proposal) now correct? How was this done previously? Was it not needed? Did we just kept the last state and the last blockhash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we store a map of block hash and states (the whole state history, so its easy to roll back).
before we just stored the last state and blockhash, which was the main source of issues related to reorganizations (how to recover previous state)
if (fState == ACCEPTED && mapBlockIndex.count(blockhash) > 0) { | ||
CBlockIndex* pBlockIndex = mapBlockIndex[blockhash]; | ||
ret.pushKV("expiresOn", pBlockIndex->GetBlockTime() + (uint64_t)nDeadline); | ||
if ((fState == ACCEPTED || fState == PAID) && pblockindex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't pblockindex
always evaluate true unless the proposal has not got any state at all? Is that what you're supposed to be checking in this condition? Or should we be using GetLastStateBlockIndexForState()
to check a particular state has been correctly achieved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's true that (fState == ACCEPTED || fState == PAID)
can't give us a null pointer in pblockindex
, but i think the check does not hurt for sanity
@@ -767,94 +887,97 @@ void CFund::CFundStep(const CValidationState& state, CBlockIndex *pindexNew, con | |||
|
|||
CBlockIndex* pblockindex = mapBlockIndex[prequest->txblockhash]; | |||
|
|||
CProposal proposal; | |||
|
|||
if (!view.GetProposal(prequest->proposalhash, proposal)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we need this check anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFundStep has been rewritten a bit
if((pindexNew->nHeight + 1) % Params().GetConsensus().nBlocksPerVotingCycle == 0) | ||
CProposal proposal; | ||
|
||
if (!view.GetProposal(prequest->proposalhash, proposal)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can see it's moved here. what was the motive of moving it here? so we get the logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFundStep has been rewritten a bit, i'm not sure if this check would make more sense earlier. it's less likely a payment request has no parent proposal than a payment request being out of chain (previous checks)
CProposal tmp; | ||
CProposal oldproposal = CProposal(); | ||
|
||
if (fLog) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this also moved further down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFundStep has been rewritten a bit to accomodate the changes of this pull request. this moved some parts of the code
A new build of f630567 has completed succesfully! |
A new build of 561339d has completed succesfully! |
The RPC tests are failing in travis now. For me, either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all reviewed except diff on main.cpp
return nAmount == b.nAmount | ||
&& fState == b.fState | ||
&& thisMapState == bMapState | ||
&& hash == b.hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming how these operator overloads work. In this context hash
is the hash
property of the CPaymentRequest on the left side of the operator and b.hash is the hash
property of the CPaymentRequest on the right side of the operator? Or should we be explicitly defining both objects in parenthesis as lhs and rhs like in the spec?
inline bool operator==(const X& lhs, const X& rhs){ /* do actual comparison */ }
inline bool operator!=(const X& lhs, const X& rhs){ return !(lhs == rhs); }
https://en.cppreference.com/w/cpp/language/operators
Either way is fine if it works, i just don't want to make assumptions on my side about how this functions exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes thats how it works
ui->labelPrequestPaymentHashTitle->setVisible(false); | ||
ui->labelPrequestPaymentHash->setVisible(false); | ||
} | ||
ui->labelPrequestPaymentHashTitle->setVisible(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this still be wrapped in the if
condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paymenthash does not exist anymore, so it makes sense to hide it always
This reverts commit 561339d.
A new build of 3b9d623 has completed succesfully! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did stress testing with the script https://gist.github.com/aguycalled/13f362c29ea9c38624e8dd4593bc37f8 on ubuntu 18.10 and 19.04 for a week and no problem was found besides some scripting issues.
src/consensus/cfund.cpp
Outdated
@@ -829,9 +829,9 @@ void CFund::CFundStep(const CValidationState& state, CBlockIndex *pindexNew, con | |||
CProposalModifier proposal = view.ModifyProposal(it->first); | |||
proposal->nVotesYes = it->second.first; | |||
proposal->nVotesNo = it->second.second; | |||
proposal->fDirty = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of moving this assignment to inside the if statements?
@@ -3911,6 +3881,7 @@ bool static ConnectTip(CValidationState& state, const CChainParams& chainparams, | |||
int64_t nTime3; | |||
int64_t nTime4; | |||
LogPrint("bench", " - Load block from disk: %.2fms [%.2fs]\n", (nTime2 - nTime1) * 0.001, nTimeReadFromDisk * 0.000001); | |||
uint256 statehash; | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a bit of a side note, what is the purpose of these curly braces which don't appear to be enclosing any conditional, loop or function definition? Is it some notation for a constructor or some way of encapsulating the contents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@proletesseract I think that is for scoping, IIRC any variables declared in the curlies will be destroyed once the code passes the closing curly.
I'm still running the scenarios:
Ubuntu 19.10 self compiled using depends dir. |
I've also verified that running this branch on an old datadir forces a reindex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a linter like @proletesseract suggested, I'm not sure how to go about implementing it though.
Fixes #630
See issue for more details.
Changes introduced
getcfunddbstatehash
covered by functional test cfunddb-statehash.py6
when those are paid.paidOnBlock
is substituted bystateChangedOnBlock
when state is6
verifychain
now checks for the consistency of the CFundDB state hash when level4
is specified.Important
This set of changes will require older clients to reindex on launch, keeping the node offline for some hours at best. In order to reduce downtime, node operators can proceed as follows if needed:
mkdir /tmp/reindexdata
;cp -rf <data_folder>/blocks /tmp/reindexdata/
;cp -rf <data_folder>/chainstate /tmp/reindexdata/
-reindex -datadir=/tmp/reindexdata/
rm -rf <data_folder>/blocks <data_folder>/chainstate
;cp -rf /tmp/reindexdata/* <data_folder>
What to test
verifychain 4 0
. This should return true.verifychain 4 n
return true (being n an arbitrary value between 1 and the length of the chain).