Skip to content
This repository has been archived by the owner on Oct 21, 2022. It is now read-only.

Changing widths of integers across round-trips breaks relocations #198

Closed
Arnavion opened this issue Mar 12, 2018 · 12 comments
Closed

Changing widths of integers across round-trips breaks relocations #198

Arnavion opened this issue Mar 12, 2018 · 12 comments

Comments

@Arnavion
Copy link
Contributor

I pointed out in #196 (comment) that LLVM/lld sometimes compiles integers to larger representations than they need. That particular case was an i32.const 0 compiled as 0x41 0x80 0x80 0x80 0x80 0x00 instead of just 0x41 0x00. parity-wasm can read it just fine, but it serializes it back out as 0x41 0x00 instead of the original representation.

This is a problem for relocations since relocations are measured in byte offsets from the start of the corresponding section. So a round-trip breaks all relocations after such a value.

It seems to me from testing that lld does this for every relocatable value, probably intentionally so that loaders / linkers doing have space to write the relocated value without needing to move bytes around.

How do you think this should be fixed?

  1. Each RelocationEntry could have some reference to the corresponding Var*Int* (or to the section containing the Var*Int*. But this crosses section boundaries and be brittle.

  2. Or, each Var*Int could store the width it originally had, and the serializer could artificially expand it to that width. But this breaks the From and Into impls for these types.

@NikVolf
Copy link
Contributor

NikVolf commented Mar 12, 2018

I believe we can just use separate type with custom serialization for such cases

@pepyakin
Copy link
Contributor

Is it really about original length?

As far as I remember, wabt has a switch like “non canonical lebs” for such cases, and if it set then all lebs will be encoded with it’s maximal length.

@Arnavion
Copy link
Contributor Author

Arnavion commented Mar 12, 2018

Is it really about original length?

Yes. Only relocatable addresses are being emitted as 5-byte values. Non-address constants are being emitted as the smallest width. See res/cases/v1/relocatable.wasm that I added in my PR - there is (call $console_log (i32.const 0) (i32.const 2)) where the 0 is 5 bytes (address of relocatable global) and the 2 is 1 byte (string length of that global).

@NikVolf
Copy link
Contributor

NikVolf commented Mar 12, 2018

I suggest to just add VarInt32NonCan or similar, and let it serialize to full 5 bytes always or remember it's deserialized length (not sure the latter is even required)

@Arnavion
Copy link
Contributor Author

It does mean that the const opcodes like I32Const and the load/store opcodes like I32Load will have to change to hold VarInt32NonCan instead of i32 / u32

@NikVolf
Copy link
Contributor

NikVolf commented Mar 12, 2018

Yeah, this is no-go, i thought it was happening in relocation sections only

@NikVolf
Copy link
Contributor

NikVolf commented Mar 13, 2018

Another solution will require a lot of work
code/global section can be generic over VarInt32/VarInt32NonCan(or "VarInt32FixedLen"), with regular VarInt32 being default

Not sure if it worth troubles

@Arnavion
Copy link
Contributor Author

How would that help? Like I said earlier, some opcodes contain smallest width and some opcodes contain largest width. It's not always one or always the other.

I'm thinking of storing a map of original offset -> opcode index in the code section and having the relocation section serialize differently based on how the code section was serialized. Haven't yet written the code to see how feasible it is because it means I need to have a custom Writer to be able to get the new opcode offsets out from CodeSection::serialize and then somehow propagate them to RelocSection::serialize...

@NikVolf
Copy link
Contributor

NikVolf commented Mar 13, 2018

something like this:
https://gist.github.com/anonymous/da3588dedc6427ec1c49e804a26ec12f#file-playground-rs

but surely it is not worth it

@Arnavion
Copy link
Contributor Author

  • Parameterizing the whole Opcode means that opcodes like I32Load(u32, u32) can't have different widths for their two values.

  • Opcodes needs to hold both type of Opcode like I said above so you cannot parameterize it.

@NikVolf
Copy link
Contributor

NikVolf commented Mar 14, 2018

You just add two width fields (if both operands actually do require so), and you don't need to hold both types of Opcode, you just use width-preserving variant whenever you need it, and use it for all opcodes (since its abilities to serialize match those of regular variant if the representation is the most compact originally).

@Arnavion
Copy link
Contributor Author

I made a prototype of how it would look for individual opcodes to store the original width with the value: https://github.com/Arnavion/parity-wasm/commits/min-width I only did it for I32Const and Call, the two opcodes that relocatable.wasm uses. I don't think it's particularly invasive. Tell me what you think.

That said, I want parity-wasm to be able to relocate modules in the future, which means it needs to be able to update these relocated values arbitrarily. I don't know if I want to depend on lld to always leave enough space like it does now. So I'm still going to try to implement my idea of storing a map in the CodeSection for original offset -> opcode, and changing the offsets in the RelocSection according to how the CodeSection is serialized.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants