Fix nodes getting stuck syncing #256

denravonska · 2017-03-31T21:12:38Z

When determining if a beacon is old, compare its age to the age of th…e last known block and not the system time.

Without this comparing to the system time would cause unsynced nodes to get stuck if they try to reconnect while being offline for a while. The beacon age is a construct to save local space on the nodes so it needs to be compared within the nodes' time frames. Doing additional testing on this.

This closes #254.

It's possible that I made a bug in the new time limit so it allows everything instead. Master doesn't have unit tests so it's hard to test. I'll double check it.

…e last known block and not the system time. Without this comparing to the system time would cause unsynced nodes to get stuck if they try to reconnect while being offline for a while. The beacon age is a construct to save local space on the nodes so it needs to be compared within the nodes' time frames.

tomasbrod · 2017-04-01T12:15:25Z

src/rpcblockchain.cpp

+    int64_t iAge = pindexBest != NULL
+          ? pindexBest->nTime - mvApplicationCacheTimestamp[key]
+          : 0;
+


Why not compare the beacon age with age of block you are currently checking?

That is much better. I'll check it when I get back to the comp.

@tomasbrod Alright, there might not be a block to validate against since it also seems to apply when broadcasting your own beacon. What do you think about changing the signature to GetBeaconPublicKey(const std::string& cpid, uint64_t referenceTime);? That way we can use the current time for beacon publishing and the block time when validating blocks.

Dantali0n · 2017-04-01T19:41:51Z

src/main.cpp

@@ -7170,7 +7170,7 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
            return false;
        }

-		if (pfrom->nVersion < 180322 & !fTestNet && pindexBest->nHeight > 855000)
+		if (pfrom->nVersion < 180322 && !fTestNet && pindexBest->nHeight > 855000)


This is probably a large part of what caused the major syncing issue and 3.5.8.6 clients not getting dropped.

This one is a typo, but if you expand it it just happens to work out. It's merely a nuisance :)

I added som braces to the previous version expression:
((pfrom->nVersion < 180322) & (!fTestNet)) && (pindexBest->nHeight > 855000)

result of first '<' is 0/1 and result of '!' is 0/1 so the '&' does the same thing to it
then it goes to '&&' as boolean
which means that the two expressions are equivalent

gridcoin · 2017-04-02T10:50:18Z

I want to fix this as much as you all do but we have to get this right as it will be too embarassing to not fix it permanently and ask exchanges to upgrade. Therefore we need to focus on validating this and getting the network back to smooth sailing (IE instead of improving the code in this ticket).
So looking at the proposed changes, as Tomas said, that boolean agebra typo actually means the same thing - so thats not actually helping (fix the issue). On a side note, we could increment the network version, forcing the entire network to upgrade, but the problem with that is it creates a massive fork and risks GRC to be lost on trading shenanigans and that is especially dangerous when we have such an acute upgrade timeframe like this one (IE unplanned emergency upgrade) so I am shying away from that.
Regarding adding the If pindex == null then return valid, that doesnt really help either because we have the nGrandfather int all through the program where we dont check those low blocks anyway (when current block is null), as soon as wallet syncs more than 1k in there is always a pindexbest.

Moving on to the last change, comparing block timestamp as opposed to getAdjustedTime - either way works as a consensus - as GetAdjustedTime is a network agreed timestamp for the current blocks being checked.

Something changed, I dont know if its a portion of the network checking blocks that were truly online longer than the old bug exploited or what, but looking at this change it does not look like it will help.

I think we need more research, and we need a very simple change to be applied as a madatory - and then Devranoska you can then improve the code and Ill turn the reigns to you. I am going to send you build scripts for windows and all my windows dependencies asap btw.

As far as what can we do right now, the main solution I am thinking of is allowing the client to Pass the block check (for BAD CPID) when the block is older than BlockNeedsChecked- as that would allow clients to sync to the top. Then I could ensure the client does load MORE than 6 months In during the initial load (which I believe we do already). And possibly put a more stringent upgrade penalty in..... Will look into this more now- but I welcome comments.

denravonska · 2017-04-02T11:22:59Z

@gridcoin Actually, this PR is intended as a bugfix and not an improvement or refactoring :) GetAdjustedTime() is the host's local time with a network time offset which means that the wallet may toss out beacons it needs to catch up back in sync. Consider a wallet which is taken offline for 6 months at block x. Now every beacon it used to know are old so it can never validate block x+1. That's why I think it should stay the way it was before, that is the age is compared to the age of the wallet's newest block. The proposal in your last paragraph will also solve it and make this PR redundant.

Disregarding that particular issue, there is something very strange going on. I added a findbeacon command to see where the beacon for a CPID of a particular block resided and found that for block 859239 (a current known "get-stuck-at" block) the CPID is 18c25e9a719c92bedd77a3e1c5ac536d and its closest beacon towards the root of the chain is in block 641919. That beacon is too old no matter how you look at it; height, current time or time from block 859239. Can anyone figure out how it got there or am I off in the way I'm thinking?

gridcoin · 2017-04-02T11:49:41Z

@denravonska, Ok, you are correct, now I see why I had it set originally the old way - in that nodes that are Not fully synced would have the same view of the past 6 months of beacons, and before the Mandatory, the main bug that existed before- was that nodes who stayed online for longer than 6 months had TOO many beacons in memory causing them to validate blocks they should have actually failed (IE the consensus was not accurate across all of our network).

OK, I am onboard with your change now. I will pull it in and see if I can add any additional logic regarding blocks that dont need checked because they are old, and then we can do a new mandatory.

Looking at this with a fresh view, I believe our last mandatory did not solve the problem and now this appears to me that it most likely will solve the problem. With a new grandfather block the latest snapshot should allow people to sync to the top.

gridcoin · 2017-04-02T11:51:00Z

Regarding the CPID in question: It should have been declined by 51% and I assume it was accepted because of the nodes staying on an extended period.

gridcoin · 2017-04-02T12:13:39Z

Thanks for all the hard work.. I think we are on the right track.

skcin · 2017-04-02T12:22:43Z

@gridcoin getting stuck between block 854400 and 855000 is also very common due to forking and alot of old clients still online. If the disconnect from old clients would happen before block 854400 syncing would be smoother.

denravonska · 2017-04-02T12:29:22Z

@gridcoin Cool. My fix caused @skcin to get past 859238 but that shouldn't be possible. Either my fix is buggy or I have pulled the incorrect data for the beacons. Are beacons always stored in block.tx[>0]?

@skcin I think you were correct about your initial, grandfather change. I think there are blocks with very old beacons in the 854400 series so using only .7 clients should not help. Or did it fix it in your test?

philipswift · 2017-04-02T12:37:05Z

Excellent spirit and team play regardless of outcome! To the for..

skcin · 2017-04-02T12:49:39Z

@denravonska disconnecting earlier would just prevent you getting dragged on a fork and getting stuck there. Especially if you are syncing from 0 you might currently be connected to several 3.5.8.6 clients. Between block 854400 and 855000 they can drag you on a fork and you get stuck soon since they accept blocks you don't. I think this this is a temorary issue though, unitl all most old clients are gone.

gridcoin · 2017-04-02T12:51:49Z

@denravonska, the change looks solid, as far as logic. He probably either went on a new chain, or the chain may be full of investor blocks (that dont get checked). I think it will work.

grctest · 2017-04-05T14:02:28Z

@Dantali0n @tomasbrod - What are your GRC addresses? Small bounty for aiding in development.

Dantali0n · 2017-04-06T05:46:18Z

@grctest SA1aEz3wpSUK5NqKGsGYh9vzBie5tqJJyL thank you that's very kind :)

philipswift · 2017-04-06T11:10:57Z

@tomasbrod any particular reason for the 'thumb down'?

grctest · 2017-04-06T13:53:42Z

@Dantali0n:
@grctest SA1aEz3wpSUK5NqKGsGYh9vzBie5tqJJyL thank you that's very kind :)

Sent! TX: 64ea219e125bd57b7f62d8ddae0438217a43c8e7058fc3cc90653e9191ce2ac3-000

tomasbrod · 2017-04-06T20:46:07Z

@grctest SD24cwWaqCd1gNNYnwbK2gTkkjVKWUuPuG (Just generated) Do I deserve it? I just talked into the issue. But hey, Thank you.

@philipswift Becouse it was off topic and ... unfinished? "To the for.." <- what does it mean?

grctest · 2017-04-07T01:29:48Z

@tomasbrod sent! tx: d04dd70a846323244e6ed40a686fafbebdaf11e8cb0577684c897daa4ff29179-000

philipswift · 2017-04-07T08:12:29Z

@tomasbrod lol, you give me a thumbs down because you don't understand? Skewed logic and quite an insight. It was finished. Maybe be a little less hasty to judge and try not to assume. Assuming makes an ASS out of U and ME.

denravonska changed the title ~~When determining if a beacon is old, compare its age to the age of th…~~ Fix nodes getting stuck syncing Mar 31, 2017

denravonska mentioned this pull request Mar 31, 2017

25k GRC bounty -> finding additional root causes & delivering solutions (e.g. users being out-of-sync) Erkan-Yilmaz/Gridcoin-tasks#96

Closed

denravonska added 2 commits April 1, 2017 08:28

Fix issue with logical vs conditional AND operation.

2a0aaf8

If we have no current block, all beacons are considered valid.

c7b1b22

tomasbrod reviewed Apr 1, 2017

View reviewed changes

Dantali0n reviewed Apr 1, 2017

View reviewed changes

denravonska mentioned this pull request Apr 2, 2017

Beacon age threshold mismatch #257

Closed

gridcoin merged commit 419db74 into gridcoin-community:master Apr 2, 2017

skcin mentioned this pull request Sep 22, 2017

[CLOSED] 25k GRC bounty -> finding additional root causes & delivering solutions (e.g. users being out-of-sync) gridcoin-community/Gridcoin-Tasks#101

Closed

denravonska deleted the sync-fix branch July 14, 2021 06:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix nodes getting stuck syncing #256

Fix nodes getting stuck syncing #256

denravonska commented Mar 31, 2017 •

edited

Loading

tomasbrod Apr 1, 2017

denravonska Apr 1, 2017

denravonska Apr 1, 2017

Dantali0n Apr 1, 2017

denravonska Apr 1, 2017

tomasbrod Apr 1, 2017

tomasbrod Apr 1, 2017

gridcoin commented Apr 2, 2017

denravonska commented Apr 2, 2017 •

edited

Loading

gridcoin commented Apr 2, 2017

gridcoin commented Apr 2, 2017

gridcoin commented Apr 2, 2017

skcin commented Apr 2, 2017

denravonska commented Apr 2, 2017 •

edited

Loading

philipswift commented Apr 2, 2017

skcin commented Apr 2, 2017

gridcoin commented Apr 2, 2017

grctest commented Apr 5, 2017

Dantali0n commented Apr 6, 2017

philipswift commented Apr 6, 2017

grctest commented Apr 6, 2017

tomasbrod commented Apr 6, 2017

grctest commented Apr 7, 2017

philipswift commented Apr 7, 2017 •

edited

Loading

Fix nodes getting stuck syncing #256

Fix nodes getting stuck syncing #256

Conversation

denravonska commented Mar 31, 2017 • edited Loading

tomasbrod Apr 1, 2017

Choose a reason for hiding this comment

denravonska Apr 1, 2017

Choose a reason for hiding this comment

denravonska Apr 1, 2017

Choose a reason for hiding this comment

Dantali0n Apr 1, 2017

Choose a reason for hiding this comment

denravonska Apr 1, 2017

Choose a reason for hiding this comment

tomasbrod Apr 1, 2017

Choose a reason for hiding this comment

tomasbrod Apr 1, 2017

Choose a reason for hiding this comment

gridcoin commented Apr 2, 2017

denravonska commented Apr 2, 2017 • edited Loading

gridcoin commented Apr 2, 2017

gridcoin commented Apr 2, 2017

gridcoin commented Apr 2, 2017

skcin commented Apr 2, 2017

denravonska commented Apr 2, 2017 • edited Loading

philipswift commented Apr 2, 2017

skcin commented Apr 2, 2017

gridcoin commented Apr 2, 2017

grctest commented Apr 5, 2017

Dantali0n commented Apr 6, 2017

philipswift commented Apr 6, 2017

grctest commented Apr 6, 2017

tomasbrod commented Apr 6, 2017

grctest commented Apr 7, 2017

philipswift commented Apr 7, 2017 • edited Loading

denravonska commented Mar 31, 2017 •

edited

Loading

denravonska commented Apr 2, 2017 •

edited

Loading

denravonska commented Apr 2, 2017 •

edited

Loading

philipswift commented Apr 7, 2017 •

edited

Loading