Fix nodes getting stuck syncing #256
When determining if a beacon is old, compare its age to the age of the last known block and not the system time. Without this, comparing to the system time would cause unsynced nodes to get stuck if they try to reconnect after being offline for a while. The beacon age is a construct to save local space on the nodes, so it needs to be compared within the nodes' time frames.
int64_t iAge = pindexBest != NULL
    ? pindexBest->nTime - mvApplicationCacheTimestamp[key]
    : 0;
Why not compare the beacon age with age of block you are currently checking?
That is much better. I'll check it when I get back to the comp.
@tomasbrod Alright, there might not be a block to validate against, since this also seems to apply when broadcasting your own beacon. What do you think about changing the signature to GetBeaconPublicKey(const std::string& cpid, uint64_t referenceTime)? That way we can use the current time for beacon publishing and the block time when validating blocks.
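A minimal sketch of the idea behind passing a reference time, not the actual Gridcoin code. `kMaxBeaconAgeSeconds` and `IsBeaconExpired` are hypothetical names, and the six-month limit is taken loosely from the discussion in this thread. The point is that the same beacon gives the same answer on every node when judged against the block's timestamp, while the system clock gives a different answer to a node that syncs much later.

```cpp
#include <cstdint>

// Hypothetical limit: roughly six months, expressed in seconds.
constexpr int64_t kMaxBeaconAgeSeconds = 60LL * 60 * 24 * 30 * 6;

// Hypothetical helper: a beacon counts as expired when it is older than the
// limit relative to the supplied reference time. The caller decides whether
// that reference is a block timestamp (validation) or the current time
// (publishing a new beacon).
bool IsBeaconExpired(int64_t beaconTime, int64_t referenceTime)
{
    return referenceTime - beaconTime > kMaxBeaconAgeSeconds;
}

// Illustrative timestamps (Unix seconds), chosen only for this example.
const int64_t kBeaconTime = 1500000000;                       // beacon advertised
const int64_t kBlockTime  = kBeaconTime + 90LL * 86400;       // block staked 90 days later
const int64_t kLateClock  = kBlockTime + 365LL * 86400;       // node syncing a year afterwards
```

With these values, `IsBeaconExpired(kBeaconTime, kBlockTime)` is false for every node regardless of when it syncs, whereas `IsBeaconExpired(kBeaconTime, kLateClock)` is true, which is the situation that would make a late-syncing node reject a block the rest of the network accepted.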
@@ -7170,7 +7170,7 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
         return false;
     }

-    if (pfrom->nVersion < 180322 & !fTestNet && pindexBest->nHeight > 855000)
+    if (pfrom->nVersion < 180322 && !fTestNet && pindexBest->nHeight > 855000)
This is probably a large part of what caused the major syncing issue and 3.5.8.6 clients not getting dropped.
This one is a typo, but if you expand it, it just happens to work out. It's merely a nuisance :)
I added some parentheses to the previous version of the expression:
((pfrom->nVersion < 180322) & (!fTestNet)) && (pindexBest->nHeight > 855000)
The result of the first '<' is 0/1 and the result of '!' is 0/1, so '&' gives the same result on them as '&&' would. That result then goes to '&&' as a boolean, which means that the two expressions are equivalent.
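The equivalence can be checked by brute force. The sketch below is an illustration only: `OldCondition`, `NewCondition`, and `ConditionsAgree` are hypothetical stand-alone functions whose parameters stand in for `pfrom->nVersion`, `fTestNet`, and `pindexBest->nHeight`. It relies on `<` and `!` both yielding `bool`, and on `&` binding tighter than `&&`.

```cpp
// Hypothetical stand-in for the condition before the fix. Written with the
// explicit grouping the compiler applies to
// "version < 180322 & !testNet && height > 855000".
bool OldCondition(int version, bool testNet, int height)
{
    return ((version < 180322) & (!testNet)) && (height > 855000);
}

// Hypothetical stand-in for the condition after the fix.
bool NewCondition(int version, bool testNet, int height)
{
    return version < 180322 && !testNet && height > 855000;
}

// Tries values on both sides of every threshold and reports whether the two
// conditions ever disagree.
bool ConditionsAgree()
{
    const int versions[] = {180321, 180322};
    const bool nets[] = {false, true};
    const int heights[] = {855000, 855001};
    for (int v : versions)
        for (bool t : nets)
            for (int h : heights)
                if (OldCondition(v, t, h) != NewCondition(v, t, h))
                    return false;
    return true;
}
```

Because none of the operands has side effects, the only difference between the two forms is that `&` always evaluates both sides while `&&` may short-circuit, which does not change the result here.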
I want to fix this as much as you all do, but we have to get this right, as it would be too embarrassing to not fix it permanently and ask exchanges to upgrade. Therefore we need to focus on validating this and getting the network back to smooth sailing (i.e. instead of improving the code in this ticket).

Moving on to the last change, comparing the block timestamp as opposed to GetAdjustedTime: either way works as a consensus, as GetAdjustedTime is a network-agreed timestamp for the current blocks being checked. Something changed. I don't know if it's a portion of the network checking blocks that was truly online longer than the old bug exploited or what, but looking at this change, it does not look like it will help. I think we need more research, and we need a very simple change to be applied as a mandatory. Then, denravonska, you can improve the code and I'll turn the reins over to you. I am going to send you build scripts for Windows and all my Windows dependencies ASAP, btw.

As for what we can do right now, the main solution I am thinking of is allowing the client to pass the block check (for BAD CPID) when the block is older than BlockNeedsChecked, as that would allow clients to sync to the top. Then I could ensure the client does load more than 6 months in during the initial load (which I believe we do already), and possibly put a more stringent upgrade penalty in. I will look into this more now, but I welcome comments.
@gridcoin Actually, this PR is intended as a bugfix and not an improvement or refactoring :) Disregarding that particular issue, there is something very strange going on. I added a
@denravonska, OK, you are correct. Now I see why I had it set the old way originally: nodes that are not fully synced would have the same view of the past 6 months of beacons. Before the mandatory, the main bug was that nodes that stayed online for longer than 6 months had too many beacons in memory, causing them to validate blocks they should actually have failed (i.e. the consensus was not accurate across our network).

OK, I am on board with your change now. I will pull it in and see if I can add any additional logic regarding blocks that don't need to be checked because they are old, and then we can do a new mandatory. Looking at this with a fresh view, I believe our last mandatory did not solve the problem, and it now appears to me that this change most likely will. With a new grandfather block, the latest snapshot should allow people to sync to the top.
Regarding the CPID in question: it should have been declined by 51%, and I assume it was accepted because of the nodes staying online for an extended period.
Thanks for all the hard work. I think we are on the right track.
@gridcoin Getting stuck between block 854400 and 855000 is also very common due to forking and a lot of old clients still being online. If the disconnect from old clients happened before block 854400, syncing would be smoother.
@gridcoin Cool. My fix caused @skcin to get past @skcin I think you were correct about your initial, grandfather change. I think there are blocks with very old beacons in the 854400 series so using only .7 clients should not help. Or did it fix it in your test?
Excellent spirit and team play regardless of outcome! To the for..
@denravonska Disconnecting earlier would just prevent you from getting dragged onto a fork and getting stuck there. Especially if you are syncing from 0, you might currently be connected to several 3.5.8.6 clients. Between block 854400 and 855000 they can drag you onto a fork, and you soon get stuck since they accept blocks you don't. I think this is a temporary issue though, until most old clients are gone.
@denravonska, the change looks solid as far as logic goes. He probably either went onto a new chain, or the chain may be full of investor blocks (which don't get checked). I think it will work.
@Dantali0n @tomasbrod - What are your GRC addresses? Small bounty for aiding in development.
@grctest SA1aEz3wpSUK5NqKGsGYh9vzBie5tqJJyL thank you that's very kind :)
@tomasbrod any particular reason for the 'thumb down'?
Sent! TX: 64ea219e125bd57b7f62d8ddae0438217a43c8e7058fc3cc90653e9191ce2ac3-000
@grctest SD24cwWaqCd1gNNYnwbK2gTkkjVKWUuPuG (just generated). Do I deserve it? I just talked in the issue. But hey, thank you. @philipswift Because it was off topic and ... unfinished? "To the for.." <- what does it mean?
@tomasbrod sent! tx: d04dd70a846323244e6ed40a686fafbebdaf11e8cb0577684c897daa4ff29179-000
@tomasbrod lol, you give me a thumbs down because you don't understand? Skewed logic and quite an insight. It was finished. Maybe be a little less hasty to judge and try not to assume. Assuming makes an ASS out of U and ME.
When determining if a beacon is old, compare its age to the age of the last known block and not the system time.
Without this, comparing to the system time would cause unsynced nodes to get stuck if they try to reconnect after being offline for a while. The beacon age is a construct to save local space on the nodes, so it needs to be compared within the nodes' time frames. Doing additional testing on this.
This closes #254.
It's possible that I introduced a bug in the new time limit so that it allows everything instead. Master doesn't have unit tests, so it's hard to test. I'll double check it.