Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid Root #3016

Closed
j01tz opened this issue Sep 3, 2019 · 12 comments · Fixed by #3017

Comments

@j01tz
Copy link
Contributor

commented Sep 3, 2019

The invalid root issue addressed in #3005 doesn't seem to be solved completely. My node has encountered the case twice today on the latest master branch. Haven't had a chance yet to investigate much further.

$ git rev-parse HEAD
2cebdc58680396323407b2bb483e13439d8269fb
20190902 10:38:57.864 DEBUG grin_chain::txhashset::txhashset - txhashset: rewind_to_pos: header 659976, output 3700006, kernel 1975340
20190902 10:38:57.866 DEBUG grin_chain::txhashset::txhashset - Error returned, discarding txhashset extension: Invalid Root 
 Cause: Unknown 
 Backtrace: 
20190902 10:38:57.866 INFO grin_chain::chain - Rejected block 000019880935 at 329992: Error { inner: 

Invalid Root }
20190902 10:38:57.866 DEBUG grin_servers::common::adapters - process_block: block 000019880935 refused by chain: Other Error: Error { inner: 

Invalid Root }
20190902 20:06:36.789 DEBUG grin_chain::txhashset::txhashset - txhashset: rewind_to_pos: header 661097, output 3707760, kernel 1979678
20190902 20:06:36.790 DEBUG grin_chain::txhashset::txhashset - Error returned, discarding txhashset extension: Invalid Root 
 Cause: Unknown 
 Backtrace: 
20190902 20:06:36.791 INFO grin_chain::chain - Rejected block 00000ea1e252 at 330553: Error { inner: 

Invalid Root }
20190902 20:06:36.791 DEBUG grin_servers::common::adapters - process_block: block 00000ea1e252 refused by chain: Other Error: Error { inner: 

Invalid Root }
@garyyu

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

Looks like the fix of #3005 flipped the coin to another side, now the "Invalid Root" problem could happen in a high possibility once the node see a fork. (both 329991 and 330552 are the fork block).

I checked some of my servers, most of those servers upgraded on latest master branch has this Invalid Root problem now, but not all.

@garyyu garyyu added the bug label Sep 3, 2019
@antiochp

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

Looks like the fix of #3005 flipped the coin to another side, now the "Invalid Root" problem could happen in a high possibility once the node see a fork. (both 329991 and 330552 are the fork block).

Interesting.

@garyyu, @j01tz can you try with the branch here #3004 and see if you can reproduce the "Invalid Root" problem?

Edit: (after I fix that branch, its got a bad merge on it right now). (branch is good now)

@antiochp

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

I'm spending some time today thinking through possible permutations of block and header processing. I suspect we do not have good test coverage for some of these.
i.e. It is possible to process "header first" among blocks in a different order to the full blocks. And a lot of our test coverage makes assumptions about one fork coming first - but there may actually be interleaved as we see here.

@garyyu

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

@antiochp OK, will try later.
I'm thinking why the Travis-CI tests for the fork case just passed on #3005, will investigate the tests. Now, I just draft a quick test case and looks like I can always reproduce the Invalid Root error.

@antiochp

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

Can you post your failing test here?

@garyyu

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

OK, Here:

`Invalid Root` failing test code
#[test]
fn process_a_simple_fork() {
	let chain_dir = ".grin.simplefork";
	clean_output_dir(chain_dir);

	global::set_mining_mode(ChainTypes::AutomatedTesting);
	let kc = ExtKeychain::from_random_seed(false).unwrap();
	let genesis = pow::mine_genesis_block().unwrap();

	let last_status = RwLock::new(None);
	let adapter = Arc::new(StatusAdapter::new(last_status));
	let chain = setup_with_status_adapter(chain_dir, genesis.clone(), adapter.clone());

	let prev = chain.head_header().unwrap();
	assert_eq!(prev, genesis.header);

	// add 1 header + 1 block
	let b0 = prepare_block(&kc, &prev, &chain, 1);
	chain
		.process_block_header(&b0.header, chain::Options::SKIP_POW)
		.unwrap();
	chain
		.process_block(b0.clone(), chain::Options::SKIP_POW)
		.unwrap();

	let prev = chain.head_header().unwrap();
	assert_eq!(prev, b0.header);

	// add 1 header + 1 block
	let b1 = prepare_block(&kc, &prev, &chain, 1_000_086_937_600_726u64);
	chain
		.process_block_header(&b1.header, chain::Options::SKIP_POW)
		.unwrap();
	chain
		.process_block(b1.clone(), chain::Options::SKIP_POW)
		.unwrap();

	// Now mine a fork block and process this block.
	let b2 = prepare_block(&kc, &prev, &chain, 1_000_086_937_600_726u64);
	chain
		.process_block(b2.clone(), chain::Options::SKIP_POW)
		.unwrap();

	// Check head and header_head both reflect b1 (first come wins).
	let head = chain.head().unwrap();
	assert_eq!(head, Tip::from_header(&b1.header));
	let header_head = chain.header_head().unwrap();
	assert_eq!(header_head, Tip::from_header(&b1.header));

	// Now mine a block b3 to continue b2.
	let prev = b2.header;
	let b3 = prepare_block(&kc, &prev, &chain, 3);
	chain
		.process_block_header(&b3.header, chain::Options::SKIP_POW)
		.unwrap();

	// now header_head should references b3.
	let head_header = chain.header_head().unwrap();
	assert_eq!(head_header, Tip::from_header(&b3.header));

	// Now process the full block for b3.
	chain
		.process_block(b3.clone(), chain::Options::SKIP_POW)
		.unwrap();

	// Check head and header_head both reflect b3, as this is the winning full block at this height.
	let head = chain.head().unwrap();
	assert_eq!(head, Tip::from_header(&b3.header));
	let header_head = chain.header_head().unwrap();
	assert_eq!(header_head, Tip::from_header(&b3.header));

	clean_output_dir(chain_dir);
}

To run this test:

$ RUST_BACKTRACE=1 cargo test -p grin_chain -- process_a_simple_fork --nocapture
@antiochp

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

Cool thanks.

Confirmed this fails on master but passes successfully on the branch from #3004.

@garyyu

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

What's the root cause of this Invalid Root?
Perhaps still deserve a dedicated fix PR for this.

@antiochp

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

Still trying to work this out - and yes agreed it would be good to fix this on master.

@antiochp

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

Looks like the fix of #3005 flipped the coin to another side

Yes, that is almost exactly what happened - the fix in #3005 now allows certain scenarios to correctly handle the header (by correctly rewinding the header MMR) but then when subsequently processing the full block our txhashset MMRs are not correctly rewound.

The fix in #3017 ensures the following rules are followed -

  • rewind the header MMR based on header_head
  • rewind the txhashset MMRs based on head

Normally these are consistent and track the same chain head. But during periods when these diverge it is critical these rules are followed correctly.

@j01tz

This comment has been minimized.

Copy link
Contributor Author

commented Sep 3, 2019

Should #3017 be able to unstick the node in this state or do I need to remove chain_data to test properly? This is what I'm getting on that branch on my previously stuck node:

20190903 08:22:56.523 DEBUG grin_chain::txhashset::txhashset - txhashset: rewind_to_pos: header 661093, output 3707754, kernel 1979674
20190903 08:22:56.523 DEBUG grin_chain::txhashset::txhashset - Error returned, discarding txhashset extension: Already Spent: Commitment(081ed1fce34d44eaab2d29eea5b06241f82a08c19f12d16dde81f91d4b73c5e9d3) 
 Cause: Unknown 
 Backtrace: 
20190903 08:22:56.523 INFO grin_chain::chain - Rejected block 00000ea1e252 at 330553: Error { inner: 

Already Spent: Commitment(081ed1fce34d44eaab2d29eea5b06241f82a08c19f12d16dde81f91d4b73c5e9d3) }
20190903 08:22:56.524 DEBUG grin_servers::common::adapters - process_block: block 00000ea1e252 refused by chain: Other Error: Error { inner: 

Already Spent: Commitment(081ed1fce34d44eaab2d29eea5b06241f82a08c19f12d16dde81f91d4b73c5e9d3) }
@antiochp

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

@j01tz Already Spent indicates the UTXO got corrupted locally - the output MMR and the leafset got out of sync somehow.

I wish we had some way of allowing a node from recovering from this (mainly for testing purposes for scenarios like this). But right now unfortunately I don't think there is any way to recover.

Edit: Tracking some thoughts around recovering from this on a local node here #3018. It should be possible to do this robustly without too much work (and will make our lives so much easier).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.