Try to guess symbol address in Wasm #539

Merged on May 5, 2023 (4 commits)

@ia0 ia0 commented Apr 23, 2023

Fixes #538

philipc commented Apr 24, 2023

I'm not familiar enough with wasm to review this as is. Can you provide a test file that exhibits the problem you are fixing? Something minimal that is suitable for https://github.com/gimli-rs/object-testfiles/tree/master/wasm would be good.

ia0 commented Apr 24, 2023

I'm actually not able to completely fix my problem with this. I'm trying to make defmt work in Wasm (knurling-rs/defmt#738). There are a few issues, and having symbol addresses is only one of them. Here's a high-level overview for context:

  • The goal of defmt is to reduce binary size (and data transfer) for logging.
  • The idea is to keep the literal string and argument formatting outside the binary.
  • With ELF, this is done by generating, for each string literal or formatting instruction, a 1-byte symbol in a non-loaded section that starts at address 1, and using the address of that symbol to refer to the string literal or formatting instruction. The symbols all get nicely numbered starting from 1 during linking. During decoding, the ELF is parsed to produce a table from address to symbol name (the name contains the actual information to decode) using that custom non-loaded section.
  • Wasm doesn't have a notion of non-loaded section, it actually doesn't even have a notion of section. So currently all symbols must be parsed and they consume space in the Wasm linear memory (although not in the binary).
  • Stripping those symbols needs to be done manually because tools won't remove exported symbols. This is not trivial because it will shift the indices of globals.
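The ELF scheme described in the list above boils down to a lookup table keyed by symbol address. A minimal sketch of the decoding side, assuming the symbols have already been read from the non-stripped binary (the `Symbol` type and the function names are hypothetical, not defmt's or object's API):

```rust
use std::collections::BTreeMap;

/// Hypothetical symbol record: the name carries the log string,
/// the address is the identifier sent over the wire.
pub struct Symbol<'a> {
    pub name: &'a str,
    pub address: u64,
}

/// Build the decoder's table from identifier (address) to symbol name.
pub fn build_table<'a>(symbols: &[Symbol<'a>]) -> BTreeMap<u64, &'a str> {
    symbols.iter().map(|s| (s.address, s.name)).collect()
}

/// Recover the original string for an identifier received from the device.
pub fn decode<'a>(table: &BTreeMap<u64, &'a str>, id: u64) -> Option<&'a str> {
    table.get(&id).copied()
}
```

With symbols `hello` at address 1 and `world` at address 2, `decode(&table, 2)` returns `Some("world")`, and an unknown identifier returns `None`.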

Here's an example program for illustration:

#![no_std]

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

extern "C" {
    fn send(msg: u16);
}

macro_rules! send {
    ($msg:literal) => {{
        #[link_section = ".magic"]
        #[export_name = concat!(".magic.", $msg)]
        static X: u8 = 0;
        unsafe { send(&X as *const _ as u16) };
    }};
}

#[no_mangle]
pub fn main() {
    send!("hello");
    send!("world");
}

The end user should ideally only use the send! macro. And the strings it takes as argument should not take any space in the stripped binary. But the number that is sent should be sufficient to recover the original string from a non-stripped binary.

The implementation of send! in the example above is close to what defmt does for ELF. Here's what we get in Wasm:

(module
 (type $i32_=>_none (func (param i32)))
 (type $none_=>_none (func))
 (import "env" "send" (func $send (param i32)))
 (global $__stack_pointer (mut i32) (i32.const 1048576))
 (global $global$1 i32 (i32.const 1048576)) ;; we want to strip this
 (global $global$2 i32 (i32.const 1048577)) ;; we want to strip this
 (global $global$3 i32 (i32.const 1048578))
 (global $global$4 i32 (i32.const 1048592))
 (memory $0 17)
 (data $.rodata (i32.const 1048576) "\00\00") ;; wasm-opt strips this, but the memory is still lost
 (export "memory" (memory $0))
 (export "main" (func $main))
 (export ".magic.hello" (global $global$1)) ;; we want to strip this
 (export ".magic.world" (global $global$2)) ;; we want to strip this
 (export "__data_end" (global $global$3))
 (export "__heap_base" (global $global$4))
 (func $main
  (call $send ;; wasm-opt optimizes this to (i32.const 0)
   (i32.and
    (i32.const 1048576)
    (i32.const 65535)
   )
  )
  (call $send ;; wasm-opt optimizes this to (i32.const 1)
   (i32.and
    (i32.const 1048577)
    (i32.const 65535)
   )
  )
 )
)
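The constants that wasm-opt folds in the listing above follow from the `as u16` cast, which keeps only the low 16 bits of the address. A small sketch of that arithmetic (the function name is hypothetical; 1048576 is the data base address shown in the listing):

```rust
/// Keep only the low 16 bits of a linear-memory address,
/// mirroring the `(i32.and addr 65535)` in the generated Wasm.
pub fn truncate_to_u16(address: u32) -> u16 {
    (address & 0xFFFF) as u16
}
```

Since 1048576 is 0x100000, its low 16 bits are zero, so the two statics map to 0 and 1. Any two addresses exactly 65536 bytes apart would map to the same value, which is one reason a truncated address is only usable as an identifier while the symbols stay packed together.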

So to conclude, although this PR correctly recovers the address of symbols in Wasm, it doesn't solve my problem. I would need to find another solution to generate unique identifiers for literal strings and formatting arguments. And I believe it's better to not merge this unless someone else has another problem that this solves (for example needing access to __heap_base or other custom symbols that the embedder could use).

philipc commented Apr 25, 2023

Even if it doesn't solve your problem, I think this is still an improvement, so I would like to merge something like this.

bjorn3 commented Apr 25, 2023

> Wasm doesn't have a notion of non-loaded section, it actually doesn't even have a notion of section. So currently all symbols must be parsed and they consume space in the Wasm linear memory (although not in the binary).

Custom sections in a webassembly file are not loaded into the linear memory. They are only accessible by parsing the wasm file itself. Rustc should emit custom sections when you use #[link_section].

ia0 commented Apr 25, 2023

> Custom sections in a webassembly file are not loaded into the linear memory. They are only accessible by parsing the wasm file itself. Rustc should emit custom sections when you use #[link_section].

Only if the code doesn't reference those static objects. In this case there are 2 reasons why they are referenced: because they are exported (with export_name) and because their address is taken. In either case, Rust will allocate the static in the linear memory.

bjorn3 commented Apr 25, 2023

Yeah, just tried it locally. I think you will need to do something like storing the message in #[link_section = ".magic.<message_hash>"] and then pass <message_hash> as an integer to send(). If you prefix the message with its length, you can detect hash collisions that cause two messages to be concatenated and, for example, prompt the user to set an env var to seed the hasher.
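The collision check suggested above could look roughly like the following on the decoder side. This is only a sketch under assumptions: each message is stored as one length byte followed by its UTF-8 bytes (so at most 255 bytes), and the contents of one `.magic.<message_hash>` section have already been extracted from the Wasm file. The names `SectionError` and `decode_section` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum SectionError {
    /// The section contains no message at all.
    Empty,
    /// A length prefix points past the end of the section.
    Truncated,
    /// Several messages were concatenated into the same section.
    Collision { messages: usize },
    InvalidUtf8,
}

/// Split the section payload into its length-prefixed records.
fn split_records(mut data: &[u8]) -> Result<Vec<&[u8]>, SectionError> {
    let mut records = Vec::new();
    while let Some((&len, rest)) = data.split_first() {
        let len = len as usize;
        if rest.len() < len {
            return Err(SectionError::Truncated);
        }
        let (record, tail) = rest.split_at(len);
        records.push(record);
        data = tail;
    }
    Ok(records)
}

/// Return the single message of a section, or report a hash collision
/// when more than one message ended up in it.
pub fn decode_section(data: &[u8]) -> Result<&str, SectionError> {
    let records = split_records(data)?;
    match records.as_slice() {
        [] => Err(SectionError::Empty),
        [single] => std::str::from_utf8(*single).map_err(|_| SectionError::InvalidUtf8),
        many => Err(SectionError::Collision { messages: many.len() }),
    }
}
```

A section holding `\x05hello` decodes to `hello`, while `\x05hello\x05world` is reported as a collision of 2 messages, at which point the tooling could ask the user to reseed the hasher.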

ia0 commented Apr 25, 2023

Yes, I was also thinking of using hashes, but then we lose a bit in compactness by having all identifiers fixed length instead of variable length. Using the linker is a nice way to get consecutive numbers. But this is probably an acceptable price to pay. The variable-length technique is anyway only effective as long as you have fewer than 127 or so things to identify. After that point, you don't control which ones will take 2 bytes, since that is the linker's choice. And this only affects data transfer, not binary size, so it is a little less important. (EDIT: This might actually impact binary size with LTO or post-linking optimizations. Not sure how often this happens though.)

I'll update knurling-rs/defmt#738 with this solution. Might be worth exploring.
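To illustrate the compactness trade-off discussed above, here is a sketch of a variable-length encoding. It assumes an unsigned LEB128-style varint (7 payload bits per byte); whether defmt uses exactly this encoding is an assumption, and the function name is hypothetical.

```rust
/// Encode `value` as unsigned LEB128: 7 payload bits per byte,
/// with the high bit set on every byte except the last.
pub fn encode_uleb128(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}
```

Under this encoding, identifiers up to 127 fit in a single byte and 128 already needs two, which matches the "127 or so" threshold. A fixed-width hash costs the same number of bytes for every identifier, regardless of how many messages exist.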

@ia0 ia0 marked this pull request as ready for review April 25, 2023 18:41

ia0 commented Apr 25, 2023

Not sure how to deal with examples. But here is one in Rust and C:

// foo.rs
// rustc --crate-type=cdylib --target=wasm32-unknown-unknown -O foo.rs
#![no_std]

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[no_mangle]
pub static FOO: u8 = 1; // TEST: should have an address, namely 1048576

#[no_mangle]
pub static BAR: u8 = 2; // TEST: should have an address, namely 1048577

extern "C" {
    fn send(msg: usize);
}

#[no_mangle]
pub fn entry() {
    unsafe { send(&FOO as *const _ as usize) };
    unsafe { send(&BAR as *const _ as usize) };
}

// foo.c
// clang --target=wasm32 -O -c foo.c
// wasm-ld foo.o -o foo.wasm --no-entry --allow-undefined --export=foo --export=bar --export=entry
#include <stdint.h>

extern char foo;
extern char bar;
extern void entry(void);
extern void send(uintptr_t);

char foo = 1; // TEST: should have an address, namely 1024
char bar = 2; // TEST: should have an address, namely 1025
void entry(void)
{
	send((uintptr_t)&foo);
	send((uintptr_t)&bar);
}

@philipc philipc merged commit 214eecb into gimli-rs:master May 5, 2023
@ia0 ia0 deleted the wasm_addr branch May 5, 2023 08:33
mcbegamerxx954 pushed a commit to mcbegamerxx954/object that referenced this pull request Jun 15, 2024