Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: analyzer/runtime: Store a more-parsed tx in the db #368

Merged
merged 8 commits into from
Mar 30, 2023

Conversation

mitjat
Copy link
Collaborator

@mitjat mitjat commented Mar 29, 2023

This PR changes the stage at which we process runtime txs: The processing used to be spread between the analyzer and a "renderer" at http api call time; it now happens entirely in the analyzer, and the parsed tx is stored in the db.

This is a no-op externally, but allows us

  1. one-off queries/analyses in the db
  2. implementation of filters on inner fields, such as method = accounts.Transfer

The first two commits are just pre-work and could well be separate small PRs. I recommend reviewing separately.

Testing: reindex_and_run.sh for 5000 rounds.
Diffs: The body of consensus.Withdraw txs is now serialized in a more verbose/interpretable way: Before:

      "body": {
        "amount": [
          "HLs6P/CNCAAA",
          ""
        ],
        "to": "AJifyI0wEVN16b6H61PxYQ/Hkvaw"
      },

After:

      "body": {
        "amount": {
          "Amount": "530000000000000000000",
          "Denomination": ""
        },
        "to": "oasis1qzvfljydxqg4xa0fh6r7k5l3vy8u0yhkkqak5uaq"
      },

Copy link
Collaborator

@Andrew7234 Andrew7234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall; would appreciate another pair of eyes on extract.go to double-check that it's not missing anything from logic.go.

storage/client/types.go Show resolved Hide resolved
transactionData.Method,
transactionData.Body,
transactionData.To,
transactionData.Amount,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's also a common.BigInt, should this be

Suggested change
transactionData.Amount,
transactionData.Amount.String(),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversion finally works "automatically" (we do provide a special serialization method on BigInt) as it should since we've migrated to pgx v5. (Edit: Not really, read on)

If you're alluding to transactionData.Fee.String() above, that did require the String() for some reason ... The difference is that Fee is BigInt whereas Amount is *BigInt. I change Fee.String() to &Fee and now everything (still) works, with no explicit String(). 😬

Comment on lines +336 to +346
LEFT JOIN chain.runtime_transaction_signers AS signer0 USING (runtime, round, tx_index)
LEFT JOIN chain.address_preimages AS signer_eth ON
(signer0.signer_address = signer_eth.address) AND
(signer_eth.context_identifier = 'oasis-runtime-sdk/address: secp256k1eth') AND (signer_eth.context_version = 0)
LEFT JOIN chain.address_preimages AS to_eth ON
(txs.to = to_eth.address) AND
(to_eth.context_identifier = 'oasis-runtime-sdk/address: secp256k1eth') AND (to_eth.context_version = 0)
LEFT JOIN chain.runtime_related_transactions AS rel ON
(txs.round = rel.tx_round) AND
(txs.tx_index = rel.tx_index) AND
(txs.runtime = rel.runtime) AND
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this query is quite impressive; do we expect any performance issues that might arise?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually expect it to be faster. (Famous last words ...)
They're very lightweight JOINs, they each look up a single row per tx, and using an index. The nested SELECT we had before is where the engine is more likely to have a hard time optimizing it into a reasonable plan. This just replaces the nested SELECT (with a JOIN of its own, no less!) with two new JOINs.

($4::text IS NOT NULL)
WHERE
(txs.runtime = $1) AND
(signer0.signer_index = 0) AND
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be a dumb q, but is it worth moving this line to the join above? eg, LEFT JOIN chain.runtime_transaction_signers AS signer0 USING (runtime, round, tx_index) ON (signer0.signer_index = 0)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not dumb at all, might be a little more readable. I debated it myself. A gotcha is that you cannot mix ON and USING, so we'd have to expand the USING into a much longer ON block, which I think hurts readability overall. Happy to change it if you have stronger feelings here; I don't.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question about this: does factoring out this WHERE signer0.signer_index = 0 change the meaning of the join outer-sidedness? if we tried doing the same for a hypothetical signer1 field, would the dbms conceptually first come up with a record:

method=accounts.Transfer, signer0=oasis1xxxx, signer1=NULL

and then determine NULL != 0 and discard it?

we're lucky here that everything actually has a signer0

Copy link
Collaborator Author

@mitjat mitjat Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought they were equivalent. I now see there's a subtle difference:

a restriction placed in the ON clause is processed before the join, while a restriction placed in the WHERE clause is processed after the join. That does not matter with inner joins, but it matters a lot with outer joins.

See "Notice that placing the restriction in the WHERE clause produces a different result:" here.

Thanks. For our case ... Perhaps we should keep the WHERE even though it's less "pure"? It makes the join less "outer": In the hypothetical case of a tx without a signer, WHERE would discard that tx, whereas ON would return the tx but with a null signer. But our data types don't support null signers. Hrm it's actually good that we'd then crash and return a 500 for a result involving such a tx.

OK let's change the query. It's all pretty philosophical because we always have a signer, but still.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing in #367


fee UINT_NUMERIC NOT NULL,
gas_limit UINT63 NOT NULL,
gas_used UINT63 NOT NULL,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a tx fails, will gas_used still be not null? or do we assume it's either 0 or gas_limit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do assume a number. Having NULL here is an interesting option ... one not discussed in #362 (comment).
I'd like to keep it NOT NULL here so we're not introducing another feature/change in this PR. This PR just puts into the DB what we already had in Go variables, with unchanged semantics.

@mitjat mitjat force-pushed the mitjat/cobalt-runtime branch 2 times, most recently from 7a221cb to 72ad55f Compare March 30, 2023 20:11
Base automatically changed from mitjat/cobalt-runtime to main March 30, 2023 20:37
@mitjat mitjat force-pushed the mitjat/store-parsed-runtime-tx-in-db branch from 86e046a to 13298d3 Compare March 30, 2023 20:43
@mitjat
Copy link
Collaborator Author

mitjat commented Mar 30, 2023

would appreciate another pair of eyes on extract.go to double-check that it's not missing anything from logic.go.

That is my biggest concern too; e2e regressions at least cover the most common events though. (They might well miss rarer events if they don't happen to appear in that single API query that run.sh does.)

@mitjat mitjat enabled auto-merge March 30, 2023 21:05
@mitjat mitjat merged commit 307f879 into main Mar 30, 2023
@mitjat mitjat deleted the mitjat/store-parsed-runtime-tx-in-db branch March 30, 2023 21:07
@@ -355,7 +355,7 @@ const (
($5::text IS NULL OR type = $5::text) AND
($6::text IS NULL OR evm_log_signature = $6::text) AND
($7::text IS NULL OR related_accounts @> ARRAY[$7::text])
ORDER BY round DESC, tx_index
ORDER BY round DESC, tx_index, type, body::text
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bleh, we ought to file a low priority issue about preserving event order from the node


-- Transaction contents.
method TEXT NOT NULL, -- accounts.Transter, consensus.Deposit, consensus.Withdraw, evm.Create, evm.Call
body JSON NOT NULL, -- For EVM txs, the EVM method and args are encoded in here.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this EVM only? or is the comment about this having a different meaning in some cases?

) AS eth_addr_lookup
txs.method,
txs.body,
-- .to, .to_eth, and .amout have to be inferred from the body post-hoc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amout -> amount

Copy link
Collaborator

@pro-wh pro-wh Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's our strategy for to_eth now? second query? next commit, we decide one during analysis

}
// Fancy-format eth addresses: Apply checksum capitalization, prepend 0x.
if t.Sender0Eth != nil {
*t.Sender0Eth = ethCommon.HexToAddress(*t.Sender0Eth).Hex()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

view logic in storage layer 🙃

AND txs.tx_index = rel.tx_index
AND txs.runtime = rel.runtime
LEFT JOIN chain.runtime_transaction_signers AS signer0 USING (runtime, round, tx_index)
LEFT JOIN chain.address_preimages AS signer_eth ON
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AS signer0_eth

@@ -325,7 +325,9 @@ const (
txs.size,
txs.method,
txs.body,
-- .to, .to_eth, and .amout have to be inferred from the body post-hoc
txs.to,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok

@@ -21,6 +21,6 @@ func StringifyBytes(value []byte) string {

func QuantityFromBytes(value []byte) quantity.Quantity {
q := *quantity.NewQuantity()
q.FromBigInt(new(big.Int).SetBytes(value))
_ = q.FromBigInt(new(big.Int).SetBytes(value))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's this to put the errors into 🕳️

@@ -304,7 +304,7 @@ func (m *Main) queueDbUpdates(batch *storage.QueryBatch, data *BlockData) {
transactionData.Index,
transactionData.Hash,
transactionData.EthHash,
transactionData.Fee.String(),
&transactionData.Fee, // pgx bug? Needs a *BigInt (not BigInt) to know how to serialize.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we defined a method only on *BigInt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v tangential but encoding/gob complains if you use pointer receivers like func (t *MyType) MarshalBytes... go pointers are funky i guess

Copy link
Collaborator

@pro-wh pro-wh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that was less hectic than I was afraid of

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants