Skip to content
This repository has been archived by the owner on May 11, 2020. It is now read-only.

wasm: when function names are present, include them #131

Merged
merged 2 commits into from
Jul 13, 2019

Conversation

justinclift
Copy link
Member

@justinclift justinclift commented Jun 20, 2019

LLVM generates (optional) function names, including them in a custom section on the end of the .wasm file.

It's useful to have them handy when working with decoded wasm data, so adding them directly to the Function struct seems like a good spot.


This PR originally included code by @mastersingh24, from the Life project. The below comment was regarding allowing that piece of code to be incorporated here with BSD licencing.

However, it turns out to be unneeded, as Wagon already has the ability to decode Custom Name sections, so it was simpler to reuse that.

@mastersingh24
Copy link

@justinclift - I don't have any issues with that

@codecov-io
Copy link

codecov-io commented Jun 20, 2019

Codecov Report

Merging #131 into master will increase coverage by 0.6%.
The diff coverage is 68.42%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #131     +/-   ##
=========================================
+ Coverage   68.48%   69.09%   +0.6%     
=========================================
  Files          42       42             
  Lines        4848     4866     +18     
=========================================
+ Hits         3320     3362     +42     
+ Misses       1254     1220     -34     
- Partials      274      284     +10
Impacted Files Coverage Δ
wasm/module.go 74.57% <ø> (+8.47%) ⬆️
wasm/section.go 53.33% <100%> (+4.04%) ⬆️
wasm/index.go 56.19% <64.7%> (+1.64%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2691113...47abda2. Read the comment docs.

@justinclift
Copy link
Member Author

@mastersingh24 Excellent. Thanks. 😄

@justinclift
Copy link
Member Author

justinclift commented Jun 20, 2019

As an aside, the custom_sections.wasm file in the test data seems a bit strange:

$ wasm2wat custom_section.wasm 
0000058: error: invalid function count 36, only 1 bytes left in section

@justinclift
Copy link
Member Author

Looking into adding some new test data now, incorporating a custom section with function names.

@justinclift
Copy link
Member Author

justinclift commented Jun 20, 2019

Adding some TinyGo (via LLVM) generated test data with function names in custom sections probably isn't going to be workable at the moment. 😦

Testing various TinyGo generated .wasm files locally here. The way encoding the 0 value is done for Call instructions is strange. (eg "call function 0")

They're padded with four 0x80 bytes prior to the 0x0, which while technically valid, means they're not binary identical when Wagon reassembles them (more efficiently) for the asm_test.go test:

wagon/disasm/asm_test.go

Lines 54 to 56 in c2df253

if !bytes.Equal(f.Code, code) {
t.Fatal("code is different")
}

@justinclift
Copy link
Member Author

It looks like this padding is common to all .wasm generated via LLVM at the moment. At least with LLVM ~8.0.1-rc2.

Saying that after trying with C compiled to wasm too (using clang). Same problem.

Using wasm-ld with --compress-relocations does seem to remove the padding bytes (when quickly checking via hexedit), however that option requires either --strip-debug or --strip-all, both of which remove the function names from the end of the file. 🤷‍♂️

justinclift added a commit to justinclift/WASim that referenced this pull request Jun 20, 2019
@justinclift
Copy link
Member Author

justinclift commented Jun 20, 2019

Just created a PR adding myself to the CONTRIBUTORS and AUTHORS files, as per the contributor guidelines: go-interpreter/license#16

@justinclift justinclift changed the title wasm: when function names are present, include them [WIP] wasm: when function names are present, include them Jun 20, 2019
@justinclift justinclift changed the title [WIP] wasm: when function names are present, include them wasm: when function names are present, include them Jun 20, 2019
@justinclift justinclift force-pushed the use_func_names branch 2 times, most recently from 26a9f34 to 46b51bf Compare June 21, 2019 12:53
@justinclift
Copy link
Member Author

justinclift commented Jun 21, 2019

Just recreated this, based on the exist Wagon unmarshalling code for name sections.

Sorry @mastersingh24... your code wasn't needed after all. 😉

Copy link
Contributor

@sbinet sbinet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the overall code looks fine.

could you add a test for this, though?

@justinclift
Copy link
Member Author

Not 100% sure how at the moment. No LLVM generated wasm will pass the current Wagon test suite due to the Call function padding problem (above). 😦

If I can somehow get something generated which incorporates custom functions, then I can write a test.

@justinclift
Copy link
Member Author

Oh. I'll investigate using Wagon to generate .wasm files directly. That might work (haven't looked). 😄

@justinclift justinclift deleted the use_func_names branch June 25, 2019 12:16
@justinclift justinclift restored the use_func_names branch June 25, 2019 12:18
@justinclift justinclift reopened this Jun 25, 2019
@justinclift
Copy link
Member Author

Killed the branch when cleaning up my fork, and it turns out GitHub uses that as a signal to automatically close the PR. Oops. 😉

@sbinet
Copy link
Contributor

sbinet commented Jun 27, 2019

to follow-up on this, one possible avenue (instead of relying on the LLVM machinery), one could perhaps generate a .wasm file from this definition:

(module
  (type (;0;) (func (param i32 i32) (result i32)))
  (func $add (type 0) (param i32 i32) (result i32)
    get_local 0
    get_local 1
    i32.add)
  (export "add" (func $add))
)

and make sure we correctly parse that .wasm file and display add wherever expected.

WDYT?

wasm/index.go Outdated Show resolved Hide resolved
@justinclift
Copy link
Member Author

Ahhh, sorry, got busy with other stuff. I'll get back to this in the next few days. 🙄

@justinclift
Copy link
Member Author

justinclift commented Jul 10, 2019

Well, this is interesting. That failure is looking suspiciously like a problem with Wagon's EncodeModule() function not being deterministic.

Saying that because I took a fairly cheating approach to generate the wasm test data... 😉

Wrote a small util to read in an LLVM generated .wasm file, re-encode it with Wagon, and write that out.

It looks like the encoded byte stream created by Wagon will have the exports in any order. Running it multiple times, even on its own re-encoded output, can ~randomly change around the ordering:

$ hexdump -C hello-world-tinygo.wasm > hello-world-tinygo.wasm-dump
$ hexdump -C hello-world-tinygo.wasm-re-encoded > hello-world-tinygo.wasm-re-encoded-dump
$ diff hello-world-tinygo.wasm-dump hello-world-tinygo.wasm-re-encoded-dump
10,12c10,12
< 00000090  00 0b 5f 5f 68 65 61 70  5f 62 61 73 65 03 01 0a  |..__heap_base...|
< 000000a0  5f 5f 64 61 74 61 5f 65  6e 64 03 02 11 5f 5f 77  |__data_end...__w|
< 000000b0  61 73 6d 5f 63 61 6c 6c  5f 63 74 6f 72 73 00 02  |asm_call_ctors..|
---
> 00000090  00 0b 5f 5f 68 65 61 70  5f 62 61 73 65 03 01 11  |..__heap_base...|
> 000000a0  5f 5f 77 61 73 6d 5f 63  61 6c 6c 5f 63 74 6f 72  |__wasm_call_ctor|
> 000000b0  73 00 02 0a 5f 5f 64 61  74 61 5f 65 6e 64 03 02  |s...__data_end..|

Note __data_end and __wasm_call_ctors have been swapped around. 🤕

That's probably going to break / disallow any non-trivial example data with the current approach for testing.

Bug?

@sbinet
Copy link
Contributor

sbinet commented Jul 10, 2019

We're probably iterating over a map somewhere and not ensuring some order...

@justinclift
Copy link
Member Author

justinclift commented Jul 10, 2019

Yeah, it's a pretty classic case of that. I'll try and take a look in the next few days, to see if changing to a slice or something with stable ordering is simple.

@justinclift
Copy link
Member Author

justinclift commented Jul 10, 2019

This was the culprit for the unstable order:

return entries[i].Index < entries[j].Index

It compares by numerical index, not taking into account it's common to have several things with the same index number.

It'd probably be better to check for that condition first, and use a straight string compare of the export name for those. Everything else can still be sorted by index number correctly.

@justinclift
Copy link
Member Author

justinclift commented Jul 10, 2019

This should be ok to merge now.

Updated that small util to re-write each of the code body sections using Assemble() prior to running EncodeModule(), which when combined with the stable sort order should be a complete fix. 🕺

Nicely, because Wagon's byte code assembler is a bit more efficient than the current LLVM one, a side effect is slightly reduced file size.

That util could probably be used as a general purpose way to shave off a few more bytes from the file size when a wasm2wat->wat2wasm pass isn't suitable. eg when wanting to preserve the function names 😉

Copy link
Contributor

@sbinet sbinet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

(apologies for the belated feedback: holidays)

@sbinet sbinet merged commit 55a1639 into go-interpreter:master Jul 13, 2019
@justinclift justinclift deleted the use_func_names branch July 13, 2019 22:56
@justinclift
Copy link
Member Author

justinclift commented Jul 13, 2019

Good stuff @sbinet. 😄

Paging @agnivade. You're up next. 😀

@agnivade
Copy link

Roger that.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants