Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pool++limits #1767

Merged
merged 31 commits into from
May 26, 2020
Merged

Fix pool++limits #1767

merged 31 commits into from
May 26, 2020

Conversation

andreibancioiu
Copy link
Collaborator

@andreibancioiu andreibancioiu commented May 19, 2020

This branch will contain fixes applied on top of pool++limits.

Original pool++limits PR: #1716

Helpers to diff with original PR:

Narrative of the bug on pool++limits
Bug behavior: missing transaction 0xabba is requested, resolved, but never added to cache, which results in it being requested again - functional deadlock.

Two conflicting goroutines concur to add a transaction the same one - hash=0xabba - in the cache. Say this is the first transaction of a sender (for example, due to a recent sender sweeping).

  • Routine 1 enters txListBySenderMap.getOrAddListForSender, obtains a list object ID=0xABCD and inserts the transaction ID=0xAAAA with hash=0xabba
  • Routine 2 enters txListBySenderMap.getOrAddListForSender, obtains (a different) a list object ID=0xDCBA and inserts the transaction ID=0xBBBB with hash=0xabba
  • Routine 1 arrives first to insert the transaction in the map by hashes. It does so, ID=0xAAAA.
  • Routine 2 will not insert the transaction in the map of hashes (transaction already there).
  • Routine 1's list object ID=0xABCD [(ID=0xAAAA, hash=0xabba)] will be lost forever from memory.
  • Routine 2's list object ID=0xDCBA [(ID=0xBBBB, hash=0xabba)] will be saved in the map by senders.
  • An arbitrary routine will intends to remove transaction hash=0xabba. This is found in the map by hashes, with pointer ID=0xAAAA, and removed. Object ID=0xAAAA is then searched in the list object ID=0xDCBA. The pointer comparison in txListForSender.findListElementWithTx fails to find the transaction with hash=0xabba. Not removed - bug, (ID=0xBBBB, hash=0xabba) stays there - bug.
  • Processing requires the transaction hash=0xabba. This is not found in the map by hashes (recently deleted, previous step), thus the transaction is requested from others. The transaction arrives.
  • The transaction hash=0xabba cannot be inserted in the cache, as the de-duplication logic in txListForSender.findInsertionPlace() detects duplication.
  • Processing requires the transaction again. Not found (in map by hashes).
  • Again, not added, due to duplication.
  • Functional deadlock.

Fixes

  1. Undo insertion order (first in map by hashes, then in map by senders)
  2. Independent de-duplication in both structures - try to add in both structures, only add if needed
  3. Independent transaction removal in both structures not possible.
  4. Use bytes.Equal(hash of a, hash of b) to compare transactions.
  5. Most importantly, add double-checked locking when creating a txListForSender.

Can be tested using these log levels:

"*:DEBUG,api:INFO,txcache:TRACE,dataretriever/requesthandlers:TRACE,process/block/preprocess:TRACE"

@andreibancioiu andreibancioiu changed the title [WIP] Fix pool++limits Fix pool++limits May 20, 2020
@andreibancioiu andreibancioiu marked this pull request as ready for review May 20, 2020 21:31
@miiu96 miiu96 closed this May 21, 2020
@miiu96 miiu96 deleted the pool++limits-fixed branch May 21, 2020 06:08
@AdoAdoAdo AdoAdoAdo restored the pool++limits-fixed branch May 21, 2020 06:13
andreibancioiu and others added 4 commits May 21, 2020 11:52
# Conflicts:
#	dataRetriever/factory/txpool/txPoolFactory_test.go
#	dataRetriever/txpool/shardedTxPool.go
#	dataRetriever/txpool/shardedTxPool_test.go
#	storage/txcache/config.go
#	storage/txcache/eviction_test.go
#	storage/txcache/txCache_test.go
#	storage/txcache/txListForSender_test.go
@sasurobert sasurobert requested review from sasurobert and removed request for iulianpascalau and SebastianMarian May 22, 2020 08:07
SizeInBytes = 524288000
SizeInBytesPerSender = 12288000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only 12MBs per sender ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is a much larger limit (x20) than we originally thought of (~600kb). I've used this (high) limit to ease testing via TxGen. After some rounds of testing, we can lower it to ~600kb & 1000 txs per sender. @AdoAdoAdo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of having bigger limits. In case of smart contracts, oracle's this limit is too small.

dataRetriever/txpool/argShardedTxPool.go Show resolved Hide resolved
dataRetriever/txpool/shardedTxPool.go Show resolved Hide resolved
storage/txcache/config.go Show resolved Hide resolved
storage/txcache/monitoring.go Outdated Show resolved Hide resolved
storage/txcache/txCache.go Outdated Show resolved Hide resolved
storage/txcache/txCache.go Show resolved Hide resolved
storage/txcache/txCache.go Show resolved Hide resolved
storage/txcache/txCache.go Outdated Show resolved Hide resolved
storage/txcache/txListBySenderMap.go Show resolved Hide resolved
Copy link
Contributor

@sasurobert sasurobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small changes

storage/txcache/eviction.go Outdated Show resolved Hide resolved
storage/txcache/testutils_test.go Show resolved Hide resolved
storage/txcache/testutils_test.go Outdated Show resolved Hide resolved
storage/txcache/txCache.go Show resolved Hide resolved
storage/txcache/txCache.go Outdated Show resolved Hide resolved
storage/txcache/txCache_test.go Outdated Show resolved Hide resolved
storage/txcache/txCache_test.go Outdated Show resolved Hide resolved
storage/txcache/txListBySenderMap_test.go Show resolved Hide resolved
Copy link
Contributor

@sasurobert sasurobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those verification should be done with > then a base constant

Copy link
Contributor

@LucianMincu LucianMincu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System tests passed.

@LucianMincu LucianMincu merged commit a20308d into development May 26, 2020
@LucianMincu LucianMincu deleted the pool++limits-fixed branch May 26, 2020 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants