
go/runtime/txpool: transactions in incoming messages #4681

Merged: pro-wh merged 17 commits into master from pro-wh/feature/txunion on Aug 23, 2022

Conversation

@pro-wh (Contributor) commented Apr 22, 2022

  1. roothash incoming messages' data is no longer arbitrary bytes; it is now a runtime transaction
  2. expand txpool to allow multiple tx queues. each queue can have its own way of organizing transactions. the existing "schedule queue" is now called the main queue (rough sketch after this list)
  3. add a queue that exposes roothash incoming messages' transactions
  4. while we're at it, add a queue for local transactions that we want to prioritize and preserve the order of. may come in handy later
  5. add a mutex so that we don't recheck/republish/schedule at the same time
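
a minimal Go sketch of that shape; every name here is hypothetical, not the final API:

```go
import "sync"

// usableSource is a placeholder for the per-queue interface worked out in
// the design notes further down this thread.
type usableSource interface {
	GetSchedulingSuggestion() [][]byte
}

// txPool holds several queues behind one interface (item 2), plus a mutex
// so recheck, republish, and scheduling never run at the same time (item 5).
type txPool struct {
	drainLock sync.Mutex
	queues    []usableSource // main queue, local queue, incoming messages queue
}
```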

@pro-wh (Contributor Author) commented Apr 23, 2022

simplified map of executor worker


source
```dot
digraph {
    node [shape=none]
    "(block from consensus)" -> HandleNewBlockLocked -> maybeStartProcessingBatchLocked
    "(message with signed proposal)" -> queueBatchBlocking -> handleExternalBatchLocked -> maybeStartProcessingBatchLocked -> startProcessingBatchLocked
    startProcessingBatchLocked -> resolveBatchLocked -> "ub.resolve"
    g1 [label="goroutine"]
    resolveBatchLocked -> g1 -> requestMissingTransactions -> "(import from incoming messages)"
    requestMissingTransactions -> "(request through txsync)"
    g2 [label="goroutine"]
    startProcessingBatchLocked -> g2
    g2 -> getRtStateAndRoundResults -> "(read from consensus)"
    g2 -> runtimeExecuteTxBatch -> "(entry into runtime)"
}
```

inserted a routine to import transactions from incoming messages at the beginning of requestMissingTransactions (outside of backoff-retry)

older proposal, not used: seems like this would need to start a goroutine, as we have some locks held and we'd need to go off and access consensus state. probably going to need a new state, in that case, to help coordinate the completion of "get transactions from roothash incoming messages" and "request the rest over txsync"

or it might be okay to add the step in the goroutine before requestMissingTransactions. I'll check if we're guaranteed to be caught up to the block in the proposal

@pro-wh (Contributor Author) commented Apr 25, 2022

updated, now a routine runs during requestMissingTransactions

uses the existing SubmitProposedBatch. is this appropriate? it seems to meet our need here (sketch after the list):

  • arranges for these transactions to undergo CheckTx
  • marks them as in our possession, under txPool.proposedTxs
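
a sketch of the inserted import step, assuming SubmitProposedBatch takes a raw [][]byte batch; the decode helper is passed in because its exact shape is still being worked out:

```go
import (
	"github.com/oasisprotocol/oasis-core/go/roothash/api/message"
	"github.com/oasisprotocol/oasis-core/go/runtime/txpool"
)

// importInMsgTxs runs once at the top of requestMissingTransactions,
// outside the backoff-retry loop. extractTx is a hypothetical helper that
// decodes an embedded tx out of msg.Data, returning nil if there is none.
func importInMsgTxs(
	pool txpool.TransactionPool,
	inMsgs []*message.IncomingMessage,
	extractTx func(data []byte) []byte,
) {
	var batch [][]byte
	for _, msg := range inMsgs {
		if tx := extractTx(msg.Data); tx != nil {
			batch = append(batch, tx)
		}
	}
	if len(batch) > 0 {
		pool.SubmitProposedBatch(batch)
	}
}
```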

(review thread on go/roothash/api/api.go, resolved)
@pro-wh pro-wh force-pushed the pro-wh/feature/txunion branch 2 times, most recently from 46be265 to db4180d on April 26, 2022 23:48
@pro-wh (Contributor Author) commented Apr 26, 2022

updated this to repurpose .Data instead of adding a new .Transaction field. adapted the "incoming message data" structure from the runtime side oasisprotocol/oasis-sdk#832.
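
for reference, a hedged sketch of what the repurposed Data might decode to on the node side. field names loosely follow the linked oasis-sdk draft and should be treated as assumptions:

```go
import "github.com/oasisprotocol/oasis-core/go/common/cbor"

// IncomingMessageData is a guess at the versioned envelope carried in the
// roothash IncomingMessage's Data field.
type IncomingMessageData struct {
	V  uint16 `json:"v"`            // format version
	UT []byte `json:"ut,omitempty"` // embedded runtime transaction, if any
}

// extractTx returns the embedded tx, or nil if the data is malformed or
// carries none.
func extractTx(data []byte) []byte {
	var d IncomingMessageData
	if err := cbor.Unmarshal(data, &d); err != nil {
		return nil
	}
	return d.UT
}
```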

@pro-wh (Contributor Author) commented Apr 28, 2022

bleh I need to change this some more, likely to make it have a separate queue in the txpool struct

  • right now it's really fiddly in the runtime, where we loop through the incoming messages and process transactions, but then we have to watch out for those transactions in the provided "batch" so that we don't accidentally process them a second time
  • if we have a separate queue, we can avoid sending the incoming messages' transactions in the "batch" parameter

some other thoughts:

  • might want to send the transactions in a separate parameter to the runtime, so that we don't have to parse again in the runtime
  • in this current draft, we parse once outside the runtime and once inside. if those two ever disagree, it'll be somehow bad, in a way I haven't even worked out yet

@pro-wh (Contributor Author) commented May 10, 2022

various thoughts from today:

requirements for separate incoming message embedded transactions queue

  • far less stateful: we will always know the contents by taking messages from consensus roothash. doesn't need step-by-step maintenance of adding and removing finished transactions
  • for txsync, counts as known. doesn't need to request it, able to provide it
  • doesn't undergo checktx. any garbage, dupes, silly ordering, whatever get processed as is
  • the runtime may be written to eliminate duplicates internally in schedule_and_execute_batch. but as always, the proposer may propose anything, not necessarily what the runtime implements
  • the pool's data structure doesn't need to preserve order or allow duplicates; i.e., a set would work. the runtime will actually read from the incoming messages queue
  • if a tx appears in this queue, we remove it from the main queue (a minimal sketch of such a queue follows this list)
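
a minimal sketch matching those requirements, reusing the hypothetical extractTx from the earlier sketch; all names are placeholders:

```go
import (
	"sync"

	"github.com/oasisprotocol/oasis-core/go/common/crypto/hash"
	"github.com/oasisprotocol/oasis-core/go/roothash/api/message"
)

// rimQueue holds txs embedded in roothash incoming messages. the whole map
// is rebuilt from consensus state each block, so there's no incremental
// add/remove bookkeeping, and a set (no order, no duplicates) suffices.
type rimQueue struct {
	l   sync.RWMutex
	txs map[hash.Hash][]byte
}

// Load replaces the queue contents from the block's incoming messages.
func (q *rimQueue) Load(inMsgs []*message.IncomingMessage) {
	newTxs := make(map[hash.Hash][]byte, len(inMsgs))
	for _, msg := range inMsgs {
		if tx := extractTx(msg.Data); tx != nil {
			newTxs[hash.NewFromBytes(tx)] = tx
		}
	}
	q.l.Lock()
	defer q.l.Unlock()
	q.txs = newTxs
}

// GetTxByHash lets txsync count these txs as known and able to be provided.
func (q *rimQueue) GetTxByHash(h hash.Hash) []byte {
	q.l.RLock()
	defer q.l.RUnlock()
	return q.txs[h]
}
```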

decoding discrepancies between node and runtime

  • maybe it's safer to let the runtime do the only decoding
  • we can still look at the transactions it made in the input write log. but how well would that be supported?
  • if the node sees a transaction that the runtime doesn't see, no worries. it just sits around
  • if the runtime sees a transaction that the node doesn't see, the runtime has to halt, because workers aren't able to obtain the transactions for a block over txsync

@pro-wh (Contributor Author) commented May 17, 2022

(diagram) mapping out how the codebase interacts with txPool (updated May 17, 2022)

@pro-wh (Contributor Author) commented May 17, 2022

design notes from today:

  • txPool already has multiple stores of transactions. one is its scheduleQueue (field is named schedulerQueue), another is a proposedTxs map. we should be able to add another store, with transactions from incoming messages
  • there's some duplication that I don't understand between the txPool's proposedTxs and the node's state's unresolvedBatch
  • we can separate off a "main queue" that interacts with all of the existing features:
    • incoming transactions first undergo checktx
    • prioritization and indexing by scheduleQueue
    • automatic push via TxPublish
    • automatic recheck
    • available to pull via txSync
  • we add a new concept generalizing the "main queue": the "schedulable queue" (to distinguish it from other queues nearby in the system, such as the checktx queue)
  • we add a new schedulable queue for transactions from incoming messages, with fewer features:
    • no checktx. any junk that won't work is already paid for from the consensus layer
    • no priority sorting, no indexing to delete extra txs from same account. one account can even submit multiple txs with increasing nonce and they should work in order. any other junk that won't work is already paid for from the consensus layer
    • no push via TxPublish. all nodes should have access from consensus
    • no recheck, as there was no check in the first place
    • possibly available to pull via txSync, as a nicety to other software. but for efficiency, oasis-node shan't request
  • how do we keep the transactions in the queues disjoint?
    • when we load a tx into the incoming messages queue, something somewhere will remove that tx from the main queue
    • probably have an overarching index of all txs in all queues, in order to handle GetKnownBatch and txSync. can consult this before adding to the main queue (sketch after these notes)
  • we can rename + redocument several existing methods to indicate that they only work on the main queue
    • seenCache: not sure
    • txPool.proposedTxs: collects from all queues
    • unresolvedBatch: not sure, seems like it ought to track txPool.proposedTxs and thus collect from all queues as well, but in some cases doesn't
    • republish worker: operates only on main queue
    • recheck worker: operates only on main queue
    • txPool.GetKnownBatch: operates on all queues
    • txSync: txs from all queues available, only used to fill in main queue
    • requestMissingTransactions -- startProcessingBatchLocked -- resolveBatchLocked circle of doom plus txPool.PromoteProposedBatch and txPool.SubmitProposedBatch:
      • requestMissingTransactions needs a name change. it does more than request. it maintains a lot of state outside of itself too
      • requestMissingTransactions uses txSync, which retrieves from all queues on other nodes
      • requestMissingTransactions currently sends txs to SubmitProposedBatch, I would guess, so that if this round fails and this node becomes proposer, it can check the txs for itself and propose them. we should be able to have it continue to do this, just filter out any that show up in the incoming messages queue, the same way SubmitTx shall have to
      • resolveBatchLocked dunno
      • startProcessingBatchLocked I think we won't have to change
      • PromoteProposedBatch I think we won't have to change
      • SubmitProposedBatch I think we won't have to change, but there's some duplicate work between it and PromoteProposedBatch
  • to be determined: we currently have a txpool (lowercase p, meaning the go package rather than the struct) Transaction type that's highly specialized for the main queue. should we come up with another type for less fully featured schedulable queues?
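
a sketch of the overarching-index idea from these notes; every name is hypothetical:

```go
import "github.com/oasisprotocol/oasis-core/go/common/crypto/hash"

type queueID int

const (
	mainQueueID queueID = iota
	localQueueID
	rimQueueID
)

// txIndex tracks which queue currently holds each tx, to serve
// GetKnownBatch and txSync and to keep the queues disjoint.
type txIndex struct {
	byHash map[hash.Hash]queueID
}

// tryAddMain records a tx for the main queue unless some queue already has
// it, so a tx never lives in two queues at once.
func (i *txIndex) tryAddMain(h hash.Hash) bool {
	if _, ok := i.byHash[h]; ok {
		return false
	}
	i.byHash[h] = mainQueueID
	return true
}
```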

@pro-wh (Contributor Author) commented May 19, 2022

design notes from today:

  • new interfaces (sketched in code after these notes):
    • UsableTransactionSource - a place to retrieve transactions that are "good enough." "good enough" variously means checktx'd, came from incoming message, or came from our own node
      • GetSchedulingSuggestion() -> tx[] - return some number of transactions to give to the scheduler, up to the implementation
      • GetTxByHash(hash) -> tx? - return the specific tx, if it is in this queue. used for resolving a batch from hashes and serving txSync
      • HandleTxsUsed(hash[]) -> void - callback to indicate that the scheduler is done with a set of txs, by hash. for most implementations, remove them from internal storage. this method may need refinement to pass information about whether each tx was successfully scheduled and executed or failed
    • RecheckableTransactionStore - methods used for rechecking
      • TakeAll() -> tx[] - remove all txs and return them
      • OfferChecked(tx[]) -> void - add txs that are checked
    • RepublishableTransactionSource - a place to get transactions that we want to push
      • GetTxsToPublish(now) -> (tx[], next) - get txs that this queue wants to publish between the last call and the given now, as well as the next time that it wants to publish any transactions. if there's nothing to publish in the future, return next=now+republishInterval so that the republish worker checks again if new txs come in
    • no common interface for directly editing queue. those are all queue-specific. most changes can probably go through OfferChecked and HandleTxsUsed though
  • new queues:
    • main queue: a priority queue, implements UsableTransactionSource and RecheckableTransactionStore
      • has internal index on sender, to implement existing behavior of replacing txs
      • did we used to republish everyone else's txs too? can maybe implement RepublishableTransactionSource too
    • local txs queue: implements UsableTransactionSource, RecheckableTransactionStore, RepublishableTransactionSource
      • no priority reordering, keep our own txs in order
      • scheduling suggestion is always everything
      • we can give txs from this queue to the runtime first
      • could even have a non-checked variant. although we don't particularly desire that. would rather know if there are problems with our tx before spending gas on chain
    • incoming messages txs queue: implements UsableTransactionSource
      • HandleTxsUsed does no removal, roothash module manages the set of pending incoming messages for us
      • scheduling suggestion is always empty, as the runtime will actually get transactions from incoming messages list instead of from tx batch parameter
  • uses of queues by surrounding systems:
    • get initial batch to give to scheduler: UsableTransactionSource[], call GetSchedulingSuggestion, concatenate. pool will set it up in order of incoming messages txs queue, local txs queue, main queue
    • give scheduler more txs on request: some non-common interface going to the main queue only? maybe. forgot to put this on the map lol
    • txSync server and GetKnownBatch: UsableTransactionSource[], calls GetTxByHash in sequence
    • checktx worker: RecheckableTransactionStore[] to offer into, checktx requests have an attached RecheckableTransactionStore reference for where to put checked tx
    • recheck worker: RecheckableTransactionStore[] to pull from, puts them in the checktx queue along with the queue it got each tx from
    • republish worker: RepublishableTransactionSource[] to pull from. publishes any it gets, waits until the earliest next-publish time. assuming newly added txs can't move the next-publish time earlier, as a new tx would want to be republished farthest in the future
  • changes, including changes as a result of the above:
    • tx type is only the raw tx. metadata e.g. the current Transaction type will be internal to queues
    • previously a field tracked a tx's checkedness state, now it'll be managed by moving the tx from the main queue to the check tx queue
    • previously when rechecking, txs remain in the main queue while being rechecked. now they will be removed and not reinstated until checked
    • check tx queue will maintain metadata indicating which queue to put the finished tx in
      • in recheck, set it to which queue it got the tx from
      • isLocal flag will instead have the destination be the local txs queue
      • isDiscard flag will instead have the destination be nil
      • seems in one place we did local and discard, but the effect would have been the same as only discard. local currently means we publish after checking, but discard comes first and prevents it
    • seenCache had previously included transactions that are currently in queues. let's change it only to include txs that are known done (successful only)
      • deduping is now handled by queues, which was all in memory anyway
      • tracking when to publish what is now up to the queues
  • dunno how to implement:
    • early flush if we have enough txs
      • we don't even have gas info on incoming message txs because we don't check them
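
the three interfaces above, as a Go sketch. signatures are approximations with a bare []byte standing in for a transaction, not the final API:

```go
import (
	"time"

	"github.com/oasisprotocol/oasis-core/go/common/crypto/hash"
)

// UsableTransactionSource is a place to retrieve txs that are "good
// enough": checktx'd, from an incoming message, or from our own node.
type UsableTransactionSource interface {
	// GetSchedulingSuggestion returns some txs to give to the scheduler;
	// how many is up to the implementation.
	GetSchedulingSuggestion() [][]byte
	// GetTxByHash returns the tx if this queue has it, else nil. used for
	// resolving a batch from hashes and for serving txSync.
	GetTxByHash(h hash.Hash) []byte
	// HandleTxsUsed tells the queue the scheduler is done with these txs;
	// most implementations remove them from internal storage.
	HandleTxsUsed(hashes []hash.Hash)
}

// RecheckableTransactionStore has the methods used for rechecking.
type RecheckableTransactionStore interface {
	// TakeAll removes all txs and returns them.
	TakeAll() [][]byte
	// OfferChecked adds txs that have passed CheckTx.
	OfferChecked(txs [][]byte)
}

// RepublishableTransactionSource is a place to get txs we want to push.
type RepublishableTransactionSource interface {
	// GetTxsToPublish returns the txs this queue wants published between
	// the last call and now, plus the next time it wants to publish.
	GetTxsToPublish(now time.Time) ([][]byte, time.Time)
}
```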

@kostko (Member) left a comment

Would be cool if you started drafting some of this in code form to make it easier to follow. I think that going with multiple queues is the right approach.

Also it would be great if we didn't need to break the runtime host protocol for this change. What exactly will be changing in a breaking way? Can it be solved with features instead of breaking the protocol?

(review thread on go/roothash/api/message/incoming_message.go, resolved)
@pro-wh pro-wh force-pushed the pro-wh/feature/txunion branch 2 times, most recently from ec73c49 to eb69ec2 on May 20, 2022 23:49
@pro-wh (Contributor Author) commented May 20, 2022

there are indeed a lot of places where we relied on the cached tx hash. if many of these remain, let's use a much simplified {raw: byte[], hash: hash} Transaction struct
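
that simplified struct as a sketch (the final name and details may differ):

```go
import "github.com/oasisprotocol/oasis-core/go/common/crypto/hash"

// Transaction pairs a raw tx with its cached hash, and nothing else.
type Transaction struct {
	raw  []byte
	hash hash.Hash
}

func newTransaction(raw []byte) *Transaction {
	return &Transaction{raw: raw, hash: hash.NewFromBytes(raw)}
}

func (t *Transaction) Raw() []byte     { return t.raw }
func (t *Transaction) Hash() hash.Hash { return t.hash }
```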

@kostko (Member) left a comment

I would suggest splitting this into two steps/PRs:

  • Refactoring the transaction pool.
  • Adding the RIM queue.

Also why are we going back to untagged transactions (e.g. ones without metadata) instead of having something like the current Transaction type which follows the transaction over its lifetime in the queues? The general one could be simplified (hash and first seen time?) and then queues can wrap with their own metadata if needed?

(review threads on go/runtime/txpool/local_queue.go, go/runtime/txpool/queues.go, go/runtime/txpool/main_queue.go, and go/runtime/txpool/txpool.go)
@pro-wh (Contributor Author) commented May 26, 2022

I would suggest splitting this into two steps/PRs:

* Refactoring the transaction pool.

* Adding the RIM queue.

adding the RIM queue is too important to the design of the multiple queue system for us to do the refactor without it. may be able to do both and then remove the RIM queue in one PR though

Also why are we going back to untagged transactions (e.g. ones without metadata) instead of having something like the current Transaction type which follows the transaction over its lifetime in the queues? The general one could be simplified (hash and first seen time?) and then queues can wrap with their own metadata if needed?

not so much that we must go to untagged transactions; we're experimenting with it because, while exploring the existing code and considering the goals, we noted no strong uses of the metadata outside of the queues. we may find some reasons to have tagged transactions during implementation

@pro-wh (Contributor Author) commented May 27, 2022

changed: OfferChecked now takes a single tx. additional metadata and an error return will probably come in the future

currently pondering details of republishing. it seems we currently republish all txs, even ones received from outside. local txs we publish immediately after checktx; nonlocal ones we defer until the next republish wakeup. thereafter they all get republished every minute. if a republish attempt fails (although the current p2p implementation never reports any error), it'll be retried on the next wakeup

@kostko (Member) commented May 27, 2022

it seems we currently republish all txs, even ones received from outside. local txs we publish immediately after checktx; nonlocal ones we defer until the next republish wakeup. thereafter they all get republished every minute. if a republish attempt fails (although the current p2p implementation never reports any error), it'll be retried on the next wakeup

The interval is actually updated based on the P2P min republish interval (which I believe bumps it somewhere over 2 minutes currently).

Transactions received from outside don't need to be republished immediately (as they will be propagated further via gossipsub). The only reason why republish exists at all is because gossipsub is not reliable. Although since we added txsync maybe this is not really critical anymore.

@pro-wh (Contributor Author) commented May 28, 2022

stuff from today:

adding some ideas about managing republishing, to space txs out uniformly throughout the republish interval. the hypothesis is that evening out the publish bandwidth is somehow good. since we'll have two queues that want to republish, we could extract out this common interval management functionality.

pragmatically the two queues will push to the republisher rather than the republisher pulling, because the general design of these usable tx queues doesn't expose when txs are added or removed.

it would have been nice to have the check queue give to a usable tx queue, the usable tx queue give to the republish queue, and the republish worker get from the republish queue, eliminating the need for the check queue to republish directly when completing local txs. and the idea was for the local queue to use PushNow to insert things at the front of the republish queue, but this has the effect of reversing the transaction order, and of not really being immediate, since the uniform spacing leaves a finite amount of time between txs. might have to have a special code path for "publish right now but also put it at the end of the queue" after all.

and the whole handling of failed publishes is missing (although the p2p system doesn't currently report publish failures)

@pro-wh (Contributor Author) commented May 31, 2022

ah let's continue to use the seenCache for republishing 🤷 ain't nobody asked us to even out the republish bandwidth

@pro-wh (Contributor Author) commented May 31, 2022

actually yeah we'll definitely need to record timestamps for republishing. I just checked the default settings, and the recheck interval is 5 rounds, so ~40 sec on emerald. if we keep rechecking and saying "publish these ~2 minutes from now" but then removing them after 40 seconds (as we would in this new transactions-being-rechecked-are-removed-from-the-queue design), they'll never really be republished

@pro-wh (Contributor Author) commented May 31, 2022

notes from today:

  • adding a modest amount of metadata to txs: they now carry their hash
  • undoing the new republish ideas. republishable sources will only give a list of all their transactions, and the republish worker will still consult the seen cache like before (sketch below)
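
a sketch of that retained approach, with RepublishableTransactionSource as sketched in the May 19 notes; the seen-cache helper and publish hook stand in for existing machinery, and all names are hypothetical:

```go
import (
	"time"

	"github.com/oasisprotocol/oasis-core/go/common/crypto/hash"
)

// seenCache answers whether a tx was published recently enough to skip.
type seenCache interface {
	PublishedRecently(h hash.Hash, now time.Time) bool
}

// republishAll walks every republishable source, consults the seen cache,
// and publishes whatever is due.
func republishAll(
	now time.Time,
	sources []RepublishableTransactionSource,
	seen seenCache,
	publish func(tx []byte),
) {
	for _, src := range sources {
		txs, _ := src.GetTxsToPublish(now)
		for _, tx := range txs {
			if seen.PublishedRecently(hash.NewFromBytes(tx), now) {
				continue
			}
			publish(tx)
		}
	}
}
```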

@pro-wh (Contributor Author) commented Aug 19, 2022

oasis_txpool_pending_schedule_size now reports the size of the main queue (number of txs)
new metrics oasis_txpool_local_queue_size and oasis_txpool_rim_queue_size track the other two queues (declaration sketch below)

main queue and local queue are updated in:

  • after we check txs and offer them (increase, like before)
  • after HandleTxsUsed (decrease, similar to previous remove batch)
  • after we empty them out for recheck (decrease)

roothash incoming queue is updated in:

  • process block, when we parse in the next batch (covers txs used and newly arrived txs)
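
a sketch of how the two new gauges might be declared, assuming the Prometheus Go client and a per-runtime label like the existing txpool metrics (the label name is an assumption):

```go
import "github.com/prometheus/client_golang/prometheus"

var (
	localQueueSize = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "oasis_txpool_local_queue_size",
			Help: "Number of transactions in the local queue.",
		},
		[]string{"runtime"},
	)
	rimQueueSize = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "oasis_txpool_rim_queue_size",
			Help: "Number of transactions in the roothash incoming messages queue.",
		},
		[]string{"runtime"},
	)
)
```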

@pro-wh pro-wh force-pushed the pro-wh/feature/txunion branch 2 times, most recently from a05f5dd to 3b7b1db on August 19, 2022 20:35
(review threads on go/runtime/txpool/main_queue.go, go/runtime/txpool/queues.go, and go/runtime/txpool/schedule_queue.go)
@pro-wh pro-wh force-pushed the pro-wh/feature/txunion branch 3 times, most recently from d42912b to 23fdef9 on August 22, 2022 20:25
@pro-wh (Contributor Author) commented Aug 23, 2022

thanks for reviewing!

@pro-wh pro-wh merged commit 4c15373 into master Aug 23, 2022
@pro-wh pro-wh deleted the pro-wh/feature/txunion branch August 23, 2022 18:13