Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the block execution pipeline #17

Merged
merged 52 commits into from
Feb 6, 2024
Merged

Implement the block execution pipeline #17

merged 52 commits into from
Feb 6, 2024

Conversation

jsign
Copy link
Owner

@jsign jsign commented Jan 11, 2024

This PR rebuilds our block execution pipeline to something rather complete. The implementation is based on the execution-spec and the EVMC docs to integrate with EVMOne.

Laterally, this also fixed existing bugs, added more signature support, RLP decoding for types, and many things needed to make this work.

Instead of writing a long PR description, I'll do some heavy PR commenting to help review and explain these other things.

…ions

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
…ich requied fixes

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
…ing snapshoting more seameless

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
.root_source_file = .{ .path = "src/main.zig" },
.root_source_file = .{ .path = "src/lib.zig" },
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually recommended to put library tests in their own file, to separate the unit_test executable from the "real" executable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's good, but maybe give it a more specific name than lib ?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe. I'll think about it.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rockstars in this PR are two new files:

  • src/blockchain/blockchain.zig
  • src/blockchain/vm.zig

The former (this file) contains most of the execution specs code, which includes:

  • Block validation
  • Tx validation
  • Gas fee calcs (base fee, refunds, intrinsic costs, etc)
  • Access lists initialization
  • Tx -> Message creation (i.e: the first message to exec)

The latter (i.e: .../vm.zig) has the message execution pipeline, which mostly translates to EVMOne implementation.

const VM = vm.VM;
const Keccak256 = std.crypto.hash.sha3.Keccak256;

pub const Blockchain = struct {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the top level entity that manages the execution pipeline.

Comment on lines +31 to +33
state: *StateDB,
prev_block: BlockHeader,
last_256_blocks_hashes: [256]Hash32, // ordered in asc order
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the three main fields that it tracks:

  • The current state.
  • The previous block, which will be used when asked to run the next block to do some validations.
  • The last 256 block hashes since this state is needed for BLOCKHASH.
  • The txn signer for this chain id (to be used in getting sender from signature).

};
}

pub fn runBlock(self: *Blockchain, block: Block) !void {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On every new block, the client calls this fn to move forward.

@@ -24,24 +24,38 @@ pub const Txn = union(TxnTypes) {
}

// decode decodes a transaction from bytes. The provided bytes are referenced in the returned transaction.
pub fn decode(bytes: []const u8) !Txn {
pub fn decode(arena: Allocator, bytes: []const u8) !Txn {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we receive an allocator, since now we fixed zig-rlp to allow decoding the data field which is a slice that requires an allocation.

Note something important. The rlp.deserialize(...) receives an allocator and might potentially do a lot of allocations (in theory, depending how nested the struct is and how many slices might have).

But... there's no deinit(). So actually the RLP library is using the allocator in a way that is not easy to do a "deinit" unless the caller knows all the fields that might have been allocated, etc.

This is clearly a case for an arena allocator, which how I use the library. (SImilar to how std.json works). The client should create an arena, send that as the allocator, and for "deinit()" simply destroy the arena.


return error.UnsupportedTxnType;
}

pub fn decodeFromRLP(self: *Txn, arena: Allocator, serialized: []const u8) !usize {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a custom decoder I had to do, since the Txn type is an union and obviously that type isn't (and can't) be supported in zig-rlp.

Moreover, I need a logic to see if the encoded tx is a LegacyTx or is envelop-based one.

This is done by checking if the first byte:

  • If it's a RLP-struct, then the serialized is a LegacyTx directly.
  • If it isn't, we have to first decode it as a string and then we can delegate to Txn.decode(..) so it can decode it appropriately depending on the envelope tx type.

Probably the niciest example of a custom RLP decoder for zig-rlp.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually we could probably implement rlp unions, assuming that they work exactly like transactions: you start with an index in the union and then the payload of the subtype. But let's see if that shows up anywhere else than here. I'll make a pr when this one is merged, so that the logic is abstracted to comptime stuff.

Comment on lines +47 to +49
var ltx: LegacyTxn = undefined;
const size = rlp.deserialize(LegacyTxn, arena, serialized, &ltx);
self.* = .{ .LegacyTxn = ltx };
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that (unfortunately) I cant do self.* = try Txn.decode(arena, str); here (as done in L54), since I need the usize of rlp.deserialize to return in the custom deserializer.

In the case of envelope based tx (L52-L54), I can use Txn.decode since I had to first deserialize serialized as a string with rlp.deserialized which gave me the size I needed. Pretty funny!

@@ -1,8 +1,8 @@
const std = @import("std");
Copy link
Owner Author

@jsign jsign Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I was tempted to do the Txn->Tx renaming in this PR but I prefer to do that in a separate one to avoid even more changed lines. I'll do that soon after.

Comment on lines +136 to +139
const prev_block_hash = try common.decodeRLPAndHash(BlockHeader, allocator, prev_block, null);
if (!std.mem.eql(u8, &curr_block.parent_hash, &prev_block_hash))
return error.InvalidParentHash;
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this decodeRLPAndHash(...) is that I found this (fixed now) bug.

The usual thing of "Ah, doing this check is easy" and you get in a rabbit hole. :P

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐇

@jsign jsign requested a review from gballet January 11, 2024 18:37
@jsign jsign changed the title [draft] Implement the block execution pipeline Implement the block execution pipeline Jan 11, 2024
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
return evmc.EVMC_ACCESS_COLD;
}

fn call(ctx: ?*evmc.struct_evmc_host_context, msg: [*c]const evmc.struct_evmc_message) callconv(.C) evmc.struct_evmc_result {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Days after comment: this call is missing some things for contract creation transactions. I'm working on that now and it will be in another PR. Just mentioning.

Copy link
Collaborator

@gballet gballet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that was intense! I left a lot of questions and comments, but they can all be ignored for now, except the one about the max code size. But feel free to merge whenever you feel it's ready.

.root_source_file = .{ .path = "src/main.zig" },
.root_source_file = .{ .path = "src/lib.zig" },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's good, but maybe give it a more specific name than lib ?

pub const Blockchain = struct {
allocator: Allocator,
chain_id: config.ChainId,
state: *StateDB,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meh, that name strikes terror in the hearts of men. Let's name it something more contributor-friendly like just State. It's not meant to be a db, is it?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you refer to the field name or the type name?
StateDB is a database, yes (in memory one).
state refers to the state of the blockchain.

Are you suggesting State instead of StateDB?
I don't have a strong opinion, just checking that's your suggestion.

Comment on lines +58 to +60
var arena = std.heap.ArenaAllocator.init(self.allocator);
defer arena.deinit();
const allocator = arena.allocator();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't we make the top-level allocator an area allocator, and just pass self.allocator() when we call runBlock ? and then let the caller deinit() it? This way, we can use other types of allocators for "unit" tests without having to worry about it, and also so that some data can be extracted by the client call after runBlock ? I fear that, otherwise, it might force us to allocate+copy data that we then need to pass to the called, and then have to make the runBlock function signature impossibly long.

Comment on lines +65 to +82
pub fn processMessageCall(self: *VM, msg: Message) !evmc.struct_evmc_result {
const evmc_message: evmc.struct_evmc_message = .{
.kind = if (msg.target != null) evmc.EVMC_CALL else evmc.EVMC_CREATE,
.flags = 0,
.depth = 0,
.gas = @intCast(msg.gas),
.recipient = toEVMCAddress(msg.current_target),
.sender = toEVMCAddress(msg.caller),
.input_data = msg.data.ptr,
.input_size = msg.data.len,
.value = blk: {
var txn_value: [32]u8 = undefined;
std.mem.writeIntSliceBig(u256, &txn_value, msg.value);
break :blk .{ .bytes = txn_value };
},
.create2_salt = undefined, // EVMC docs: field only mandatory for CREATE2 kind which doesn't apply at depth 0.
.code_address = toEVMCAddress(msg.code_address),
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok if that's what you need, I was wondering if we could instead just abstract the struct_evmc_message as a Message and not have to worry about the internals/conversion, but I guess it's not necessary right now since we don't have any alternative interpreter.


// EVMOneHost contains the implementation of the EVMC host interface.
// https://evmc.ethereum.org/structevmc__host__interface.html
const EVMOneHost = struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this to its own emvc.zig file for cleanliness

allocator: Allocator,
chain_id: config.ChainId,
state: *StateDB,
prev_block: BlockHeader,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about making this a ?BlockHeader since we execute in stateless and might not have that previous block?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh? A stateless client must always have the previous block.
Even if it's the first one that it validates. If that isn't the case, how would verify the witness proof?

Comment on lines +136 to +139
const prev_block_hash = try common.decodeRLPAndHash(BlockHeader, allocator, prev_block, null);
if (!std.mem.eql(u8, &curr_block.parent_hash, &prev_block_hash))
return error.InvalidParentHash;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐇


fn applyBody(allocator: Allocator, chain: *Blockchain, state: *StateDB, block: Block) !BlockExecutionResult {
var gas_available = block.header.gas_limit;
for (block.transactions) |tx| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it should be called applyTransactions or something of the sort, because the body contains more information than just the txs, like uncles (not implemented, but still) and withdrawals. If you want to process witndrawals in here was well, then maybe applyStateTransition ?

Comment on lines +289 to +291
// TODO: self destruct processing
// for address in output.accounts_to_delete:
// destroy_account(env.state, address)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree, but in spite of what I said in other comments about the tooling, we should start by supporting our basic use case. I said otherwise before I got to this massive file 😉

return false;
if (tx.getNonce() >= (2 << 64) - 1)
return false;
if (tx.getTo() == null and tx.getData().len > 2 * params.max_code_size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 2* here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was surprising to me too. Turns out that the inint code size seems to be double the max code size. This is done this way in the spec, and also double checked in geth.

@jsign jsign merged commit e6be1bc into main Feb 6, 2024
4 checks passed
@jsign jsign deleted the jsign-vm2 branch February 6, 2024 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants