Refactor TxPool, improve its robustness #1802
Conversation
# Conflicts:
#   dataRetriever/txpool/shardedTxPool.go
#   storage/txcache/config.go
#   storage/txcache/disabledCache.go
#   storage/txcache/txByHashMap.go
#   storage/txcache/txListBySenderMap_test.go
#   storage/txcache/txListForSender.go
#   storage/txcache/txListForSenderAsSortedMapItem_test.go
#   storage/txcache/txListForSender_test.go
#   storage/txcache/wrappedTransaction.go
@@ -0,0 +1,65 @@
package preprocess
Why not create a sub-package called `factory`? And why not split this file? It contains both the factory and a disabled component, which is hard to figure out just from reading the filename.
As stated at 8), this PR removes the old, deprecated sorting operations of the old transactions cache. `preprocess/transactions` handles the `TxCache` using the `Cacher` interface. However, when it selects transactions at proposal time, it requires the selection operation, which the `Cacher` interface does not provide. This is why we needed the adapter in the first place. The adapter factory was switching between the new `sortedTxsProvider` and the old one - the old one was used if the cast `cache.(*TxCache)` failed (never the case in real life). Now, since we've removed the old components, we had to use a disabled substitute. Of course, this is suboptimal design. We actually have to fix `transactions.go` to depend on a well-defined interface instead, and discard the adapters and the adapter factory altogether. That is not doable in this PR though; I've added a TODO for now. I will try to figure out another way in the next PR, `pool++cross`.
Furthermore, the factory and the adapters were kept in separate files before this PR. I grouped them together to increase the cohesion of smelly things that we should remove sometime soon.
Is this reasonable (that is, doing this non-trivial removal / change incrementally, to avoid larger changes in the code at once)?
It is reasonable as it is for now.
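The adapter factory discussed above can be sketched roughly as follows. This is a minimal, self-contained illustration with stand-in types (`Cacher`, `TxCache`, `otherCache`, `disabledSortedTransactionsProvider` are simplified hypothetical stand-ins, not the actual repository code): the factory returns the real provider when the type assertion succeeds, and a disabled no-op substitute otherwise.

```go
package main

// SortedTransactionsProvider exposes the selection operation that the
// generic Cacher interface lacks (simplified stand-in).
type SortedTransactionsProvider interface {
	GetSortedTransactions() [][]byte
}

// Cacher is a minimal stand-in for the storage.Cacher interface.
type Cacher interface {
	Len() int
}

// TxCache is a stand-in for the real cache that supports selection.
type TxCache struct {
	txs [][]byte
}

func (cache *TxCache) Len() int { return len(cache.txs) }

func (cache *TxCache) GetSortedTransactions() [][]byte { return cache.txs }

// otherCache is any Cacher that is not a *TxCache.
type otherCache struct{}

func (cache *otherCache) Len() int { return 0 }

// disabledSortedTransactionsProvider is the no-op substitute used when
// the cast fails; its selection result is always empty.
type disabledSortedTransactionsProvider struct{}

func (provider *disabledSortedTransactionsProvider) GetSortedTransactions() [][]byte {
	return make([][]byte, 0)
}

// createSortedTransactionsProvider mirrors the "simple factory": it returns
// the real provider when the type assertion succeeds, and the disabled
// substitute otherwise.
func createSortedTransactionsProvider(cache Cacher) SortedTransactionsProvider {
	txCache, ok := cache.(*TxCache)
	if ok {
		return txCache
	}
	return &disabledSortedTransactionsProvider{}
}
```

As the comment thread notes, depending on a well-defined interface in `transactions.go` would remove the need for the type assertion (and the disabled substitute) entirely.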
)

// createSortedTransactionsProvider is a "simple factory" for "SortedTransactionsProvider" objects
func createSortedTransactionsProvider(transactionsPreprocessor *transactions, cache storage.Cacher, cacheKey string) SortedTransactionsProvider {
`transactionsPreprocessor` and `cacheKey` are not used.
Fixed, removed leftovers :)
storage/txcache/score.go
Outdated
listForSender.scoreChunkMutex.Lock()
listForSender.scoreChunk = scoreChunk
listForSender.scoreChunkMutex.Unlock()

func (computer *disabledScoreComputer) computeScore(scoreParams senderScoreParams) uint32 {
Unused parameter; maybe use `_`.
Fixed.
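The fix suggested above is the standard Go idiom: a sketch, assuming the disabled computer simply returns zero (the `senderScoreParams` stand-in here is hypothetical, not the real struct).

```go
package main

// senderScoreParams is a stand-in for the real struct in storage/txcache.
type senderScoreParams struct{}

type disabledScoreComputer struct{}

// The blank identifier "_" documents that the disabled implementation
// ignores its input, while still matching the scoreComputer signature.
func (computer *disabledScoreComputer) computeScore(_ senderScoreParams) uint32 {
	return 0
}
```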
for _, txHash := range txHashes {
txMap.removeTx(string(txHash))
_, removed := txMap.removeTx(string(txHash))
The `removeTx` implementation seems a bit weak: for example, if we concurrently try to remove the same txHash, we might end up with a wrong count and wrong size (the removal succeeds only once, but we decrement the counter and the size twice).
Very nice catch! `tx, ok := txMap.getTx(txHash)` could return `ok == true` for both of the concurrent routines, and both will decrement, which, indeed, is not correct.
The issue can appear if transactions are removed during eviction (on high load) and also removed (at the same time) on the "commit / finalize block" event.
In `monitoring.go`, in the function `diagnose()`, we have:
numTxsEstimate := int(cache.CountTx()) // this comes from the possibly wrong counter
numTxsInChunks := cache.txByHash.backingMap.Count()
If the estimate does not equal `numTxsInChunks`, a `log.Debug()` is emitted and we should see the difference. This `diagnose()` is unfortunately (currently) called only upon transaction selection (on my turn).
A possible solution to this issue: in `txMap.backingMap.Remove`, when the map chunk is locked (under mutex), do the "item exists" check and call an `onItemRemovedCallback()` to decrement the atomic counters. Would this be fine?
Another solution would be to apply corrections from time to time - that would work fine for the counter, but the corrections for the size counter would be quite inefficient.
I've added a TODO for this correction (it's an old condition).
I was thinking the same: add the check in `Remove` under the lock, and have the function return a bool indicating whether the remove was actually done.
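A minimal sketch of this fix, with hypothetical names (`mapChunk`, `onItemRemoved` stand in for the real backing-map chunk and the proposed `onItemRemovedCallback`): the existence check and the counter adjustment both happen while the chunk is locked, so two goroutines removing the same key cannot both decrement.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// mapChunk is a hypothetical stand-in for one locked chunk of the
// concurrent backing map.
type mapChunk struct {
	mutex sync.Mutex
	items map[string]struct{}
}

// remove does the "item exists" check while the chunk is locked and invokes
// the callback only when this call actually removed the item; it returns
// true only for the single caller whose removal took effect.
func (chunk *mapChunk) remove(key string, onItemRemoved func()) bool {
	chunk.mutex.Lock()
	defer chunk.mutex.Unlock()

	_, ok := chunk.items[key]
	if !ok {
		return false
	}

	delete(chunk.items, key)
	onItemRemoved()
	return true
}

// decrementExample shows the callback wiring: the counter starts at 1 and
// is decremented exactly once, even if remove is called twice for the key.
func decrementExample() int64 {
	counter := int64(1)
	chunk := &mapChunk{items: map[string]struct{}{"txHash": {}}}
	onRemoved := func() { atomic.AddInt64(&counter, -1) }

	chunk.remove("txHash", onRemoved) // succeeds, decrements
	chunk.remove("txHash", onRemoved) // no-op, callback not invoked
	return atomic.LoadInt64(&counter)
}
```

The returned bool also lets callers (such as the bulk-removal loop in the diff above) count only the removals that actually happened.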
@@ -9,6 +9,7 @@ import (
)

var _ storage.Cacher = (*TxCache)(nil)
var _ txCache = (*TxCache)(nil)
`Put` and `HasOrAdd` have some unused parameters.
Fixed, using `_` (these methods of the `Cacher` interface are not implemented by the `TxCache`). For `disabledCache` I've left them as they were, since we plan to remove that struct in a future PR anyway (see comments above). Is this fine?
storage/txcache/txCacheFailsafe.go
Outdated
// Remove delegates to backing cache
func (decorator *txCacheFailsafeDecorator) Remove(key []byte) {
decorator.RemoveTxByHash(key)
Unhandled error; use `_ =` to make the discard explicit.
The failSafe component was removed in the fix-after-review.
func (txMap *txListBySenderMap) notifyScoreChange(txList *txListForSender, scoreParams senderScoreParams) {
	score := txMap.scoreComputer.computeScore(scoreParams)
	txList.setLastComputedScore(score)
	txMap.backingMap.NotifyScoreChange(txList, score)
}
Line 106: `removeSender` can act incorrectly if called concurrently for the same sender. E.g. removing the only tx for that sender, and removing the sender due to low tx - both will call `removeSender`. The removal is done only once, but the counter is decremented twice.
Nice catch! As above, added a TODO. Is this ok?
@@ -108,6 +108,11 @@ func (txMap *txListBySenderMap) removeSender(sender string) bool {
return false
}

// TODO / bugfix: when we concurrently try to remove the same sender,
// we might end up decrementing the counter twice.
// Possible solution: use an "onItemRemoved" callback in the "backingMap"
To prevent double decrementing, we might use a mutex or an atomic operation to make the decrement concurrency-safe.
System tests passed.
- `cacheConfig` etc.
- `bucketSortedMap` does not call `computeScore` anymore, but receives the score instead.
- `txListForSender` does not compute its score; it only computes the `senderScoreParams`.
- `txListForSender.RemoveTxsBulk` (it would have a benign overflow; the value is only used to monitor / diagnose eviction).