Large wasm file sizes, potential causes, and how to avoid them? #5

jakedeichert opened this Issue Dec 30, 2017 · 5 comments

jakedeichert commented Dec 30, 2017

Seems like there's some gotchas when it comes to what parts of Rust you use and how large it can cause your wasm output to be.

The current wasm output file size is 88kb before wasm-gc and 71kb after. This has been bugging me for a few days, but I hadn't looked into it until now.

A basic hello world I created was 17kb before wasm-gc and 103 bytes after, which sounds much better.

After disabling some code, I've found a few factors that seem to heavily affect the wasm output size.

Creating a vec! adds ~8.5kb to the file size.

I disabled a bunch of code to get down to this point. I haven't recreated a minimal example yet though.

Indexing a vec! with a random number adds another 31kb (~40kb in total).
This accounts for more than half of the final wasm size. If I index the vec with a hard-coded number less than the vec's length, the file size stays at ~8.5kb as noted above. But if I hard-code a number >= the vec's length, the file size also grows to ~40kb.
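One plausible explanation for the indexing cost, consistent with the panic discussion further down this thread, is bounds checking: when the compiler can't prove an index is in range, `v[i]` keeps a check that panics on failure, and that panic path pulls in the panic and formatting machinery. The minimal sketch below (function names are hypothetical) contrasts `[]` indexing with `slice::get`, which turns the out-of-range case into an `Option` the caller handles. Whether the panic path actually disappears depends on the rest of the program, so it's worth rebuilding and comparing the wasm size after a change like this.

```rust
/// Indexing with `[]`: if `i` is out of range this panics, so the compiler
/// has to keep the bounds-check panic path unless it can prove `i` is valid.
fn value_at_panicking(values: &[u32], i: usize) -> u32 {
    values[i]
}

/// Same lookup via `slice::get`: an out-of-range index yields `None`, which
/// is replaced by a caller-chosen default instead of panicking.
fn value_at_or(values: &[u32], i: usize, default: u32) -> u32 {
    values.get(i).copied().unwrap_or(default)
}

fn main() {
    let values = vec![10, 20, 30];
    let i = 7; // stands in for an index coming from a random number generator
    // println! is only used for this native demo; it brings in std::fmt itself.
    println!("{}", value_at_or(&values, i, 0));
    println!("{}", value_at_panicking(&values, 1));
}
```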


I'll post any more findings here. Hoping someone can enlighten me on ways to keep the file size down, and perhaps on why indexing a vec costs so much?

@jakedeichert jakedeichert changed the title from Large wasm file sizes and potential causes to Large wasm file sizes, potential causes, and how to avoid them? Dec 30, 2017

alexcrichton commented Dec 31, 2017

On the good news front, Rust-the-language is more than amenable to tiny file sizes. We've had lots of examples in the past of tiny programs doing things like games, running on tiny microcontrollers, etc. On the bad news side, though, idiomatic Rust doesn't always lend itself well to tiny file sizes. In that sense, if you're optimizing for file size you probably won't be writing "normal" Rust code; rather, you'll be avoiding various things and/or adhering to very specific patterns. The wasm project I've been helping with has very strict size requirements, and we're currently clocking in at about 2k of Rust code gzipped, about 4k un-gzipped.

In general here's what you should do before you actually change any code:

  • Always run wasm-gc. There are bugs in the wasm toolchain that require this, but eventually, once LLD is used as the linker, it won't be necessary.
  • Always run wasm-opt from binaryen. The default output of LLVM isn't the best (apparently) and wasm-opt can typically make a wasm binary ~10% smaller (for free).
  • Compile with -Os instead of -O2. This tells LLVM to optimize for size rather than speed and can make different inlining decisions, for example. You can do this by setting this in Cargo.toml:
    [profile.release]
    opt-level = 's'
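The three steps above can be chained into one pipeline. The sketch below builds the commands with `std::process::Command` but only prints them, so it runs even where the tools aren't installed. Assumptions: the crate name `my_crate` and the output file names are hypothetical, wasm-gc is assumed to take positional input and output paths, and wasm-opt's `-Os` (optimize for size) and `-o` (output file) flags are used.

```rust
use std::process::Command;

/// Hypothetical helper: builds (but does not run) the size-reduction pipeline.
/// `wasm_in` is where cargo writes the release wasm; the other two paths are
/// arbitrary names chosen for this example.
fn size_pipeline(wasm_in: &str, gc_out: &str, opt_out: &str) -> [Command; 3] {
    // Release build for the wasm target; `opt-level = 's'` comes from Cargo.toml.
    let mut build = Command::new("cargo");
    build.args(["build", "--release", "--target", "wasm32-unknown-unknown"]);

    // wasm-gc: strip unused code, reading `wasm_in` and writing `gc_out`.
    let mut gc = Command::new("wasm-gc");
    gc.args([wasm_in, gc_out]);

    // wasm-opt (binaryen): optimize the gc'd module for size into `opt_out`.
    let mut opt = Command::new("wasm-opt");
    opt.args(["-Os", gc_out, "-o", opt_out]);

    [build, gc, opt]
}

fn main() {
    let steps = size_pipeline(
        "target/wasm32-unknown-unknown/release/my_crate.wasm",
        "my_crate.gc.wasm",
        "my_crate.opt.wasm",
    );
    for step in &steps {
        // Printed rather than executed; iterate over `&mut steps` and call
        // `.status()` on each command to actually run the pipeline.
        println!("{:?}", step);
    }
}
```

Comparing the file sizes after each step shows how much wasm-gc and wasm-opt each contribute for a given project.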

If that's not small enough then at that point it's time to start changing code. This is where the guidelines may not be very idiomatic and can sometimes even be hard to adhere to as well. In any case...

  • Don't panic. The panicking machinery in libstd is quite heavy in terms of code size. This is one of the biggest sources of large wasm binaries, and it can often be removed relatively easily. Panics are often hidden, though, either in the standard library (e.g. Vec::with_capacity can panic) or in basic language operations (integer a / b panics if b is 0). Unfortunately there's no great way to figure out why your program might panic, so the best strategy I've found is to start with a tiny program that doesn't panic and then incrementally add all your functionality. At each step, watch the binary size, and if it jumps, try to figure out why you might be panicking.

  • Don't format. Like panicking, std::fmt is not a small piece of code. This affects anything that may transitively use format_args!. That includes panic! (above), but you can also hit it with format! or assert!.

  • Don't allocate. The raw allocator (dlmalloc) weighs in at about 2k of gzipped code (4k un-gzipped, I think?). It's not always easy to avoid, but if you can, it's a great way to reduce code size even further.
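The panic and allocation guidelines above can be illustrated in one small sketch: `checked_div` replaces a division that could panic on zero, and a hypothetical `FixedStack` keeps its values in an inline array so it never calls the allocator, reporting "full" with a `bool` instead of panicking or reallocating. The names and the capacity of 8 are assumptions for illustration; as the advice above says, the reliable check is to rebuild and compare the wasm size.

```rust
/// Integer `a / b` panics when `b == 0`; `checked_div` reports that case as
/// `None`, so the caller decides what to do instead of entering the panic path.
fn safe_div(a: u32, b: u32) -> Option<u32> {
    a.checked_div(b)
}

/// Hypothetical fixed-capacity stack: storage is an inline array, so it never
/// touches the allocator. The capacity of 8 is an arbitrary choice.
struct FixedStack {
    items: [u32; 8],
    len: usize,
}

impl FixedStack {
    fn new() -> Self {
        FixedStack { items: [0; 8], len: 0 }
    }

    /// Returns `false` when full instead of panicking or reallocating.
    fn push(&mut self, value: u32) -> bool {
        match self.items.get_mut(self.len) {
            Some(slot) => {
                *slot = value;
                self.len += 1;
                true
            }
            None => false,
        }
    }

    /// Uses `get` rather than `[..]` slicing, so there is no slicing panic path.
    fn as_slice(&self) -> &[u32] {
        self.items.get(..self.len).unwrap_or(&[])
    }
}

fn main() {
    let mut stack = FixedStack::new();
    for n in [12, 0, 4] {
        // Skip the division by zero instead of panicking.
        if let Some(q) = safe_div(100, n) {
            stack.push(q);
        }
    }
    // println! is only for this native demo; prints [8, 25]
    println!("{:?}", stack.as_slice());
}
```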

Overall, the code guidelines for writing small executables are the same as those for writing an embedded system (which isn't a coincidence!). There are also a lot of opportunities in libstd to optimize for code size (AFAIK libstd has basically never been optimized for code size), so there may be some low-hanging fruit on the panicking/std::fmt front.

I'd recommend having allocations/formatting/panics as much as you can in debug mode, but figuring out how to remove it all in release mode is the trick.

jsonnull commented Dec 31, 2017

I was just looking into this myself and this answer popped up. That's a very good answer!

I was thinking a lot of it boiled down to items from std, so it's very cool to see specific ones like std::fmt called out. That's enlightening.

Compiling this project with the size optimization Alex mentioned, I saw the binary go from 66k to 64k.

I was also intrigued to realize that wasm-opt can be run in addition to wasm-gc, I'll have to try that.

jakedeichert commented Dec 31, 2017

@alexcrichton thanks for the great detail! I think I use format! in a few places for console logging... I guess I should take that out.
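For the console-logging case, one way to keep format! out of the build is to convert numbers to text by hand and pass plain `&str`s to the logger. A minimal sketch, assuming a hypothetical `log` function: on wasm it would presumably be a `console.log` binding imported from JS, while here it writes raw bytes to stdout so the example runs natively. The `buf[i]` writes may still carry bounds checks unless the optimizer proves them away.

```rust
use std::io::Write;

/// Writes `n` in decimal into `buf` and returns the digits as a `&str`,
/// without using `format!` or `std::fmt`. Ten bytes fit any `u32`
/// (u32::MAX is 4294967295).
fn u32_to_str(mut n: u32, buf: &mut [u8; 10]) -> &str {
    let mut i = buf.len();
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    // Only ASCII digits were written, so this never falls back to "".
    std::str::from_utf8(&buf[i..]).unwrap_or("")
}

/// Hypothetical logger standing in for an imported `console.log` binding.
/// Writing raw bytes avoids the formatting machinery that `println!` uses.
fn log(msg: &str) {
    let _ = std::io::stdout().write_all(msg.as_bytes());
}

fn main() {
    let mut buf = [0u8; 10];
    log("score: ");
    log(u32_to_str(1200, &mut buf));
    log("\n");
}
```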

> I'd recommend having allocations/formatting/panics as much as you can in debug mode, but figuring out how to remove it all in release mode is the trick.

This sounds like a challenge! For node projects, typically NODE_ENV=production is checked for during the builds. Does Rust have something similar that I could check for in release mode and wrap code blocks with that check?

I wasn't even aware of wasm-opt. @jsonnull did you try wasm-opt too? And it only went down 2k?

I'm also using HashSet and HashMap, which makes me wonder how much space they take up. I noticed that even when I took out my vec!-related code, the size didn't drop by 40k as I expected based on my findings above, so something else is taking up a lot of space too, or perhaps it's using vec! under the hood.

alexcrichton commented Dec 31, 2017

@jakedeichert

> Does Rust have something similar that I could check for in release mode and wrap code blocks with that check?

The closest analog in Rust is cfg(debug_assertions), which is set during cargo build and not set during cargo build --release. That'll let you differentiate between the two build modes and compile in different code.
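A minimal sketch of how `cfg(debug_assertions)` can keep logging and assertions in debug builds while compiling them out of release builds. `debug_log` and `value_or_zero` are hypothetical names; `debug_assert!` and `cfg!` are standard macros that key off the same setting.

```rust
/// Debug builds (`cargo build`) get a real logger.
#[cfg(debug_assertions)]
fn debug_log(msg: &str) {
    eprintln!("[debug] {}", msg);
}

/// Release builds (`cargo build --release`) get an empty function, so calls
/// to it, and the formatting in the debug version, can be optimized away.
#[cfg(not(debug_assertions))]
fn debug_log(_msg: &str) {}

/// `debug_assert!` is only checked when debug_assertions is enabled, so its
/// panic (and the formatting it implies) is not compiled into release builds.
fn value_or_zero(values: &[u32], i: usize) -> u32 {
    debug_assert!(i < values.len(), "index out of range");
    values.get(i).copied().unwrap_or(0)
}

fn main() {
    debug_log("starting");
    let v = value_or_zero(&[1, 2, 3], 2);
    if cfg!(debug_assertions) {
        // `cfg!` is the expression form: both branches must compile, but the
        // condition is a compile-time constant, so the dead branch is dropped.
        debug_log("running a debug build");
    }
    if v != 3 {
        debug_log("unexpected value");
    }
}
```

The attribute form swaps out whole items per build mode, while `cfg!` is handy for small inline checks; both follow the same debug/release split as cargo's profiles.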

> I noticed that even if I took out my vec! related code, I didn't drop 40k as I kind of expected based on what I found above, so something else is taking up a lot of space too or perhaps using vec! under the hood.

Yeah, HashSet and HashMap both allocate and likely both contain panics as well. The allocations can't really be avoided, and the panics are most likely assertions that LLVM couldn't optimize away (if one ever triggers, that's a bug in HashMap, for example).

jakedeichert commented Dec 31, 2017

@alexcrichton thanks again, this is some great stuff.

I'll check out cfg(debug_assertions). Sounds like exactly what I need.