Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCHAIN-993 unit test showing state not catching up with a fix. #2060

Conversation

dzajkowski
Copy link
Contributor

@dzajkowski dzajkowski commented Jan 2, 2019

Overview

Introduces a unit test that shows that a change in how the blocks are added to the 2-way graph fixes a catch up problem.

essentially the fix boils down to:
if a block is a parent of 2 block then this fact needs to make its way to blockBufferDependencyDag. it is not sufficient to hold only one of the dependencies (out of potentially many children).
The clash happens when a parent block is also a justification for a younger block (c2 and f2).

JIRA ticket:

https://rchain.atlassian.net/projects/RCHAIN/issues/RCHAIN-993

Notes

none really. it's a step in identifying a bug that's been in the project for over 4 months.

Please make sure that this PR:

Bors cheat-sheet:

  • bors r+ runs integration tests and merges the PR (if it's approved),
  • bors try runs integration tests for the PR,
  • bors delegate+ enables non-maintainer PR authors to run the above.

for {
_ <- nodes(0).receive()
_ <- nodes(1).receive()
_ <- nodes(2).receive() //nodes(2) gets the 5th block and starts requesting the dag
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests uses a mocked version of the transport layer. When you call receive on a node, it takes things off its mocked network queue and then processes them. If you want to simulate the process of node A asking for a block from node B and having node B send it back, you have to do something like

// node A sends block request
nodeB.receive() // node B receives block request and sends the block
nodeA.receive() // node A receives block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why it's done 20 times - to let the network do its thing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; so in theory you only need to do like 8~9 times but you just do it 20 times to be sure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in theory yes BUT:
I've observed a funny "self healing" mechanism - if the network layer holds enough requested blocks it will make the catch up mechanism look like it works (it processes stale requested blocks). This way I'm making sure all the blocks are processed.


br <- deploy(nodes(0), deployDatasFs(0).apply())

_ <- propagate(nodes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would make sense to put those propagates into a collection and traverse it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@dzajkowski dzajkowski changed the title RCHAIN-993 unit test showing state not catching up. RCHAIN-993 unit test showing state not catching up with a fix. Jan 3, 2019
@dzajkowski
Copy link
Contributor Author

bors try

bors bot added a commit that referenced this pull request Jan 3, 2019
@bors
Copy link
Contributor

bors bot commented Jan 3, 2019

try

Build succeeded

@dzajkowski
Copy link
Contributor Author

bors try

bors bot added a commit that referenced this pull request Jan 3, 2019
@bors
Copy link
Contributor

bors bot commented Jan 3, 2019

try

Build succeeded

Copy link
Contributor

@lukasz-golebiewski lukasz-golebiewski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


br <- deploy(nodes(0), deployDatasFs(0).apply())

_ <- List.fill(22)(propagate(nodes)).sequence//make the network chooch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does chooch mean? No results on my dictionary.

for {
_ <- nodes(0).receive()
_ <- nodes(1).receive()
_ <- nodes(2).receive() //nodes(2) gets the 5th block and starts requesting the dag
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; so in theory you only need to do like 8~9 times but you just do it 20 times to be sure?

@KentShikama
Copy link
Collaborator

Nice catch. Thanks for cleaning up after me; I always had a single parent block model in my head to simplify things but it backfired here.

Copy link
Collaborator

@KentShikama KentShikama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait nvm this PR has nothing to do with multiple parents

_ <- CommUtil.sendBlockRequest[F](BlockRequest(Base16.encode(hash.toByteArray), hash))
blockBufferDependencyDag =>
DoublyLinkedDagOperations
.add[BlockHash](blockBufferDependencyDag, hash, childBlock.blockHash)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blockBufferDependencyDag represents blocks that aren't in the block buffer yet. If the block is in the block buffer why add the block to the blockBufferDependencyDag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it can be a missing parent of a block that's already been added to blockBuffer.

IOW two blocks can have one parent.

when the "missing blocks" status is being resolved for a block - this block will be added to the blockBufferDependencyDag (with its parents) and to the block buffer.

the blockBufferDependencyDag is used in reattempt to get the dependencyFree blocks.

now if a block is already in blockBuffer but it's a parent of another block (which was requested but not yet processed) then it will never be added to blockBufferDependencyDag (because it's in the blockBuffer).
This results in the processing getting "stuck" - the blockBuffer holds a block that should be processed (so it's not allowing new blocks via the network) and the internal "catch up" mechanism isn't aware that the second block has a parent that is already available and should be added (hence blocking the catch up process).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh whoops; I see my mistake... essentially blockBufferDependencyDag represents blocks that aren't in the block dag. And so blocks in the block buffer don't count.

@KentShikama
Copy link
Collaborator

But I'll have to think a bit more. I don't quite understand why this change makes the tests pass.

@@ -1057,6 +1057,78 @@ class HashSetCasperTest extends FlatSpec with Matchers {
} yield result
}

it should "ask peers for blocks it is missing and add them" in effectTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to draw out a diagram of what this test looks like?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore this unless you've already started. I quickly drew it out.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well thanks anyways

Copy link
Collaborator

@KentShikama KentShikama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually one small thing before you merge: can you rename the blockBufferDependencyDag to dependencyDag (or something else) as it represents blocks that are both missing and in the block buffer?

@dzajkowski
Copy link
Contributor Author

bors r+

bors bot added a commit that referenced this pull request Jan 4, 2019
2060: RCHAIN-993 unit test showing state not catching up with a fix. r=dzajkowski a=dzajkowski



Co-authored-by: Dominik Zajkowski <dzajkowski@invisibl.es>
@bors
Copy link
Contributor

bors bot commented Jan 4, 2019

Build failed

@dzajkowski
Copy link
Contributor Author

bors r+

bors bot added a commit that referenced this pull request Jan 4, 2019
2060: RCHAIN-993 unit test showing state not catching up with a fix. r=dzajkowski a=dzajkowski



Co-authored-by: Dominik Zajkowski <dzajkowski@invisibl.es>
@bors
Copy link
Contributor

bors bot commented Jan 4, 2019

Build succeeded

@bors bors bot merged commit fb0b7c2 into rchain:dev Jan 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants