Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Message encoding / decoding / logging #18

Merged
merged 24 commits into from
Dec 13, 2022
Merged

Conversation

tkporter
Copy link
Collaborator

@tkporter tkporter commented Dec 7, 2022

This is a fun one!

It's super important that the Fuel/Sway Mailboxes encode messages and keccak256 them in the exact same way they do in the EVM/Solidity world. Unfortunately, Fuel/Sway has some quirks that makes it hard to get this out of the box:

  1. No equivalent of abi.encodePacked. The only encoding that sort of exists is the implied encoding of primitive types in structs. However these aren't packed tightly -- they're padded to fit in whole words. A word in Fuel is 64 bits or 8 bytes. So the following struct would look like this:
struct Foo {
  bar: u8,
  baz: u32,
}

let foo = Foo {
  bar: u8::max(),
  baz: u32::max(),
};

// In memory:
00 00 00 00 00 00 00 FF // <-- the u8 max, left padded with zeroes into the first word
00 00 00 00 FF FF FF FF // <-- the u32 max, left padded with zeroes into the second word
  1. Vec<u8> in Sway allocates its elements on the heap. Similar to structs, it doesn't tightly pack its elements-- each u8 element has it's own 64 bit word allocated to it! The leftmost 7 bytes are zeroes, the rightmost byte is the element. You also can't keccak or log the Vec type out of the box (you pretty much need to write assembly to do this)

  2. The new Bytes type hasn't yet been shipped but has been merged. It provides tight packing of bytes but nothing to help with encoding / decoding non u8 types.

So what can we do?

Originally, I wrote some mostly hand-crafted bitwise operations for EncodedMessage to craft each word and read / write to memory directly. This is pretty fragile and doesn't translate well to other use cases

So instead, this PR provides bytes_extended, which is an extension to the Bytes type. It essentially lets you read and write types at specific offsets in the Bytes. It also provides access to keccak256 and logging over these bytes.

Copy link
Contributor

@yorhodes yorhodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work on the byte packing, super hairy

Comment on lines +99 to +104
fn msg_sender_b256() -> b256 {
match msg_sender().unwrap() {
Identity::Address(address) => address.into(),
Identity::ContractId(id) => id.into(),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

hyperlane-mailbox/tests/harness.rs Show resolved Hide resolved
Comment on lines +28 to +30
fn id(message: Message) -> b256 {
EncodedMessage::from(message).id()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we replace all of these with

Suggested change
fn id(message: Message) -> b256 {
EncodedMessage::from(message).id()
}
fn encode(message: Message) -> EncodedMessage {
EncodedMessage::from(message)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this even need to be a contract?

Copy link
Collaborator Author

@tkporter tkporter Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need it to be a contract - I don't think the Rust-based tests could otherwise get access to the library methods for EncodedMessage otherwise. If you try to run abigen in a way where it needs to return the struct EncodedMessage in a function, it'll complain because there's no equivalent for the raw_ptr. Also confirmed by running cargo expand --tests that it just gives you naive bindings to struct types but not to any functions of the structs, so even if we could use raw_ptr we wouldn't be able to call library functions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird

Comment on lines 283 to 288
// each of these words. We only care about the rightmost byte for each
// word that's logged.
let body: Vec<u8> = body_log_data
.chunks(BYTES_PER_WORD)
.map(|buf| buf[BYTES_PER_WORD - 1])
.collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brutal

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lmao

Comment on lines 105 to 106
let mut word: u64 = (version << 56);
word = word | (nonce << 24) | (origin >> 8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to assign multiple times?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately yes lol - there's some jankiness in the compiler that requires this because otherwise it things word is of a non-u64 type. super weird, I describe it here https://github.com/hyperlane-xyz/fuel-contracts/pull/18/files#diff-29cc6efee7411b24b785a04ecdee0529fdef41ff993f0f37d3f4a97fc50102f6R92

//
// recipient_word3 [0:40] - 5 bytes (continued from previous word)
// body [40:??]
word = (recipient_word3 << 24);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to reason through all of this but tbh it seems way too easy to miss something minor here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do trust your test coverage tho

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wonder if we should wait for struct and array packing
seems like a pretty important feature

Comment on lines 258 to 261
(word1 << 8) | (word2 >> 56), // The last 7 bytes of word 1 and the first byte of word 2
(word2 << 8) | (word3 >> 56), // The last 7 bytes of word 2 and the first byte of word 3
(word3 << 8) | (word4 >> 56), // The last 7 bytes of word 3 and the first byte of word 4
(word4 << 8) | (word5 >> 56), // The last 7 bytes of word 4 and the first byte of word 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this almost makes me want to rethink the layout lol

Comment on lines 287 to 317
pub fn body(self) -> Vec<u8> {
let body_len = self.buffer.bytes_len - PREFIX_BYTES;

let mut body: Vec<u8> = Vec::with_capacity(body_len);

// The body starts in word 9.
let mut current_word_index = 9;
let mut word = self.buffer.read_word(current_word_index);

let mut body_index = 0;
while body_index < body_len {
// Where 0 means the furthest left byte in the word.
let byte_index_within_word = (body_index + BODY_START_BYTE_IN_WORD) % BYTES_PER_WORD;
let right_shift = ((7 - byte_index_within_word) * BITS_PER_BYTE);
// Push the byte to the Vec.
let byte: u8 = (word >> right_shift) & 0xff;
body.push(byte);

// If this was the last byte in the word, read the next word.
if byte_index_within_word == 7u64 {
// Move to the next word.
current_word_index += 1;

word = self.buffer.read_word(current_word_index);
}

body_index += 1;
}

body
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really would think we can just return a pointer with len into the buffer without looping

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately can't with the Vec<u8> type because each u8 isn't tightly packed and lives in it's own 8 byte word :/

Comment on lines +25 to +26
// [any prior receipts..., LogData with reason, Revert, ScriptResult]
// We want the LogData with the reason, which is utf-8 encoded as the `data`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

);
};

// Null bytes `\0` will be padded to the end of the revert string, so we remove them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol are we sure we want to remove this? isn't this the null terminator

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe null termination isn't relevant to rust strs (see the first couple sentences here https://docs.rs/c_str/latest/c_str/). The string that fuel was giving was padded with a bunch of null bytes so that the string fit cleanly into words -- I think this is just cuz they pad a bunch of zero bytes and not anything to do with null termination

Base automatically changed from trevor/mailbox-and-merkle to main December 12, 2022 13:19
name = "bytes_extended"

[dependencies]
std = { git = 'https://github.com/fuellabs/sway', branch = "master" }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for now to get access to FuelLabs/sway#3454 which has not yet been shipped in a forc release. Once they ship, we should rm


impl b256 {
/// Returns a pointer to the b256's packed bytes.
fn packed_bytes(self) -> raw_ptr {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this should probably be a trait but if you see the note here https://fuellabs.github.io/sway/v0.31.1/advanced/traits.html?search=#custom-types-structs-enums, you can't yet have where clauses :( so you can't have functions take in generics that implement a trait yet

Copy link
Contributor

@yorhodes yorhodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super happy where we landed
lets ship

Comment on lines +4 to +8
asm(ptr) {
move ptr sp; // Copy the stack pointer (sp) register into `ptr`.
cfei i8; // Add 8 bytes (1 word) to the stack pointer, giving the memory at `ptr` a size of 8 bytes.
ptr: raw_ptr // Return `ptr`.
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not?

Suggested change
asm(ptr) {
move ptr sp; // Copy the stack pointer (sp) register into `ptr`.
cfei i8; // Add 8 bytes (1 word) to the stack pointer, giving the memory at `ptr` a size of 8 bytes.
ptr: raw_ptr // Return `ptr`.
}
alloc_bytes(BYTES_PER_WORD)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a possibly incorrect feeling that we should prefer the stack over the heap because (to my understanding) is more ephemeral than the heap. I'm waiting to hear back from the Fuel guys if we're better off on the heap tho. If it's ok with you let's merge and can address this as we hear from the Fuel ppl

Comment on lines 26 to 57
fn write_value_to_stack(value: u64, byte_count: u64) -> raw_ptr {
// Allocate a whole word on the stack.
let stack_word_ptr = alloc_stack_word();
// Write the value onto the stack.
stack_word_ptr.write::<u64>(value);
// Move the pointer forward to ignore any left padded zero bytes, and to point
// directly to the start of the value's contents.
let left_padding_byte_count = BYTES_PER_WORD - byte_count;
stack_word_ptr.add_uint_offset(left_padding_byte_count)
}

/// Reads a value that is `byte_count` bytes in length from `ptr`.
/// Returns this value as a u64.
///
/// ### Arguments
/// * `ptr` - A pointer to memory where the value begins. The `byte_count` bytes
/// starting at `ptr` are read.
/// * `byte_count` - The number of bytes of the original value. E.g. if the value
/// being read is a u32, this should be 4 bytes.
fn read_value_from_memory(ptr: raw_ptr, byte_count: u64) -> u64 {
// Allocate a whole word on the stack.
let stack_word_ptr = alloc_stack_word();
// Copy the `byte_count` bytes from `ptr` into `stack_word_ptr`.
// Note if e.g. 4 bytes are read from `ptr`, these are copied into the
// first 4 bytes of `stack_word_ptr`. These bytes must be shifted to the
// right to be correctly read into a 4-byte u32.
ptr.copy_bytes_to(stack_word_ptr, byte_count);
// Get the word at stack_word_ptr.
let word = stack_word_ptr.read::<u64>();
// Bit shift as neccesary.
word >> (BITS_PER_BYTE * (BYTES_PER_WORD - byte_count))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much more readable now!

Comment on lines +282 to +291
pub fn read_bytes(ref mut self, offset: u64, len: u64) -> Bytes {
let read_ptr = self.get_read_ptr(
offset,
len,
);

let mut bytes = Bytes::with_length(len);
bytes.write_packed_bytes(0u64, read_ptr, len);
bytes
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we return a new Bytes into the existing memory?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm this would be really cool...

I could see this maybe being a bad practice to have contention over the same memory, but I think this would actually be fine. I'm gonna open #25 so we can think about this and just merge this for now, but I think this is probably a good move

bytes-extended/src/main.sw Outdated Show resolved Hide resolved
bytes-extended/src/main.sw Show resolved Hide resolved
Comment on lines +28 to +30
fn id(message: Message) -> b256 {
EncodedMessage::from(message).id()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird

EncodedMessage::from(message).recipient()
}

/// Vec/Bytes return types aren't supported by the Rust SDK.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if they have something WIP here as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hope so, we should ask for sure

hyperlane-message/src/main.sw Outdated Show resolved Hide resolved
run_tests.sh Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants