
WebAssembly support #269

Closed
bminixhofer opened this issue May 21, 2020 · 21 comments
@bminixhofer
Contributor

Hi!

At the moment, there is no easy way to reliably run ONNX models in the browser. ONNX.js exists but is apparently unmaintained and lacks support for important operations like Conv1d and LSTM.

The alternative is TensorFlow.js, which does not directly support ONNX, so a model would have to be converted from ONNX to TF and then to a TFJS model, which also does not work at the moment (see onnx/onnx-tensorflow#490).

So there is a bit of a gap in the ecosystem there.

That gap could be filled by compiling tract to WASM and exposing a higher-level API (i.e. load and predict functions) to JavaScript. WebGL support would of course be missing, but that is out of scope.

I did some prototyping today and got the latest release (tract-onnx = "0.6.3") to work in the browser without any changes. So I think a JS wrapper would not be too hard to make.

I'll start working on this in the next couple of days. Depending on how it goes and if there is interest on your side, this could be merged back into tract at a later point.

It would be great if you could officially support compiling to WASM and add WASM to the CI (e.g. the current master branch does not compile to WASM because of the memory maps from 99c622a).

Thanks, and please let me know what you think!

@kali
Collaborator

kali commented May 21, 2020

Hey, thanks for your interest in tract!

I have never touched wasm myself, but I'll be glad to help in any way I can.

Sorry for breaking the support so recently; I think we can use conditional compilation to deal with the memory maps. I have no objection to putting a WASM build under CI. Feel free to send me one or two pointers, and I'll try to do this over the weekend. We are also in dire need of a release, but I keep postponing it because I want to wrap up the command line refactoring.
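Something along these lines should do it (a minimal sketch; the function name and the memmap usage are illustrative, not the actual tract code):

// Native targets can memory-map the model file for cheap access.
#[cfg(not(target_arch = "wasm32"))]
fn read_model_bytes(path: &std::path::Path) -> std::io::Result<Vec<u8>> {
    let file = std::fs::File::open(path)?;
    let map = unsafe { memmap::Mmap::map(&file)? };
    Ok(map.to_vec())
}

// wasm32 has no mmap, so fall back to an ordinary read.
#[cfg(target_arch = "wasm32")]
fn read_model_bytes(path: &std::path::Path) -> std::io::Result<Vec<u8>> {
    use std::io::Read;
    let mut buf = Vec::new();
    std::fs::File::open(path)?.read_to_end(&mut buf)?;
    Ok(buf)
}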

Depending on how big the JS wrapper is, and how much coupling it will have with the API, we could consider inclusion in the main repo, and cover it with the same CI.

As for WebGL, it would be interesting to see if we can plug it in tract-linalg. I agree it is a separate issue. I think we should also consider tract-tensorflow (which really should be similar to tract-onnx), but I'm perfectly happy to start with just tract-onnx.

Also, I think @Geal did some work on this very same area just a few months ago.

I'm very enthusiastic about this! I can't wait to see a static page browser-side demo with mobilenet and a webcam! Or any kind of voice demo of course.

@bminixhofer
Contributor Author

Awesome! I'm excited to start working on this.

No worries about breaking support for WASM, it's actually remarkable that WASM was supported until so recently without anyone explicitly trying to do so :)

Plugging WebGL into tract-linalg sounds interesting. I'm not familiar enough with either of them to judge whether that would be possible, but if so, it'd be fantastic. However, it shouldn't be a top priority for initial browser support.

Regarding CI: the first and most important step is just building the relevant crates for WebAssembly, i.e. cargo build --target wasm32-unknown-unknown. I suppose that would be:

  • tract-onnx
  • tract-tensorflow
  • tract-linalg
  • tract-hir
  • tract-core

Tests will likely be more difficult to run since WASM inherently does not have access to the file system via std::fs. I'll look a little further into that.
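As an aside, the missing file system is not a blocker for inference itself, since the model can be fed in as a byte buffer through the reader-based API. A rough sketch, assuming the 0.6-era names (exact signatures and types may differ between releases):

use std::io::Cursor;
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // In the browser these bytes would come from JS (e.g. a fetch);
    // include_bytes! just keeps the sketch self-contained.
    let bytes: &[u8] = include_bytes!("squeezenet.onnx");
    let model = tract_onnx::onnx()
        .model_for_read(&mut Cursor::new(bytes))?
        .into_optimized()?;
    let plan = SimplePlan::new(model)?;
    let input = Tensor::zero::<f32>(&[1, 3, 224, 224])?;
    let _outputs = plan.run(tvec!(input.into()))?;
    Ok(())
}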

I'm also curious to see how tract in the browser compares to the WASM backend of ONNX.js and TFJS, maybe that'd open up some opportunities for performance improvement.

@kali
Collaborator

kali commented May 22, 2020

Also, I have a few ideas about how much we need to expose (I've been thinking about doing an FFI interface for a while). I don't think you can realistically do it with just load + predict, but here is a possible subset that will cover simple stateless workflows, like image categorisation. It's really what transpires from https://github.com/snipsco/tract/blob/master/examples/tensorflow-mobilenet-v2/src/main.rs . ONNX works exactly the same way. So if I'm trying to simplify this, this is where I think we can get:

1/ For simple cases, we can collapse the framework instantiation with model loading.

let tf = tensorflow();
let mut model = tf.model_for_path("tests/models/plus3.pb").unwrap();

could become let mut model = tract_onnx::model_for_path(...); with trivial convenience functions (same goes for model_for_reader).

2/ Unfortunately, we cannot really skip the size hinting part. We have to do it after loading and before optimising. It's a bit of a pain, but we can try to cover only the simplest cases for a start: maybe assume everything is f32, that there is only one input, and that the output can be found automatically. The only missing bit would then be the input shape, so a simple array of ints would work. We can do a full TensorFact binding in the future if we need it.

3/ We can collapse into_optimized() with SimplePlan::new(). A simple convenience method into_optimized_plan() on the model would do the trick.

4/ predict boils down to calling SimplePlan::run.

We could eventually collapse 1, 2 and 3 into a single function call that would be the load function. It would need the model buffer and the input shape. I'm not sure this is much of a gain over a streamlined OO API with the right convenience functions. I would also not be averse to extending the core API to allow building the model in a more fluent style (typically duplicating set_ methods with with_ methods):

let model = tract_onnx::model_for_path(...).with_input_fact(0, ...).into_optimized_plan();
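Spelled out a bit more (a sketch of the proposed surface; with_input_fact chaining at this level and into_optimized_plan are the suggested additions, not existing API):

use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    // 1/ load + 2/ hint the input fact + 3/ optimise and plan, in one chain
    let plan = tract_onnx::model_for_path("mobilenet_v2.onnx")?
        .with_input_fact(
            0,
            TensorFact::dt_shape(f32::datum_type(), tvec!(1, 3, 224, 224)),
        )?
        .into_optimized_plan()?;
    // 4/ predict boils down to SimplePlan::run
    let input = Tensor::zero::<f32>(&[1, 3, 224, 224])?;
    let outputs = plan.run(tvec!(input.into()))?;
    println!("{:?}", outputs[0]);
    Ok(())
}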

I like the idea, if it is practical, of trying to keep a consistent API between the various interfaces. It does not mean I want to bind everything from tract-core to JS, but if there are high-level shortcuts we want in a JS interface, I can't see why we should not have them at the tract-core level. So I'm all ears about what you find out :)

You mentioned LSTM and Conv1D, so I suspect you're after some real-time voice or something... For that, we will need to expose the SimpleState too. And depending on the model, we may also need to expose the magic "S" streaming dimension and the pulse API (which I still think of as experimental: it does work, but it is not nice to use).

@bminixhofer
Contributor Author

bminixhofer commented May 22, 2020

Thanks for the additional info!

I like the idea, if it is practical, of trying to keep a consistent API between the various interfaces. It does not mean I want to bind everything from tract-core to JS, but if there are high-level shortcuts we want in a JS interface, I can't see why we should not have them at the tract-core level.

That makes a lot of sense. I considered the tract API to be fixed in my initial message. If we can add higher-level functions to tract core, it would not only make the interfaces consistent but also make working with tract from Rust much nicer.

Ideally, in my opinion, that would be just load and predict in the simplest case, with intuitive extensions for more complex cases.

Actually, I've been rewriting an NLP project (https://github.com/bminixhofer/nnsplit) where I previously had parallel implementations of the model in TF and PyTorch; the model became more complex and I didn't want to keep writing everything twice. I'm not at all familiar with voice applications.

we may also need to expose the magic "S" streaming dimension and the pulse API (which I still think of as experimental: it does work, but it is not nice to use)

From the brief description I read in your readme, streaming does sound very cool and useful for your audio applications. It is not something I need right now, though.

To get started I'd propose the following:

  1. I develop proof-of-concept JavaScript bindings, running MobileNetV2 in the browser like in the example. I'd want this to be not much more code than the example on the Rust side, and not much more code than simple ONNX.js inference on the JS side (see here).
    Currently I think this would work without any changes in tract-onnx. There could be some unanticipated roadblocks though, so I think a quick end-to-end tracer bullet makes sense.
  2. We work out a higher-level API for tract which is consistent across JS and Rust, and ideally would smoothly extend to more complex use cases like streaming.

What do you think about this?

@kali
Collaborator

kali commented May 22, 2020

Sounds good to me. It's good that you don't need streaming, it makes the exercise much more incremental.

Let's see what comes out of the POC, and we'll take it from there.

@bminixhofer
Contributor Author

bminixhofer commented May 25, 2020

I'm done with a proof of concept which runs SqueezeNet in the browser!

Check out https://bminixhofer.github.io/tract-js/demos/dist/ (similar to https://microsoft.github.io/onnxjs-demo/#/squeezenet).

The JS code is here.

Problems

I ran into a couple of problems, none however on the tract side:

  1. As you know, I originally wanted to use MobileNetV2, but I couldn't load the ONNX file using tract because of a TractError(Msg("BatchNormalization: attribute 'spatial' is not supported (deprecated by ONNX operator set 9)")). If that error message is correct, that is not a problem with tract, I guess.
  2. I got very weird outputs from the ONNX SqueezeNet models here. I got the same outputs using onnxruntime though, so that is also not a problem with tract. I settled on using the SqueezeNet file from onnxjs-demo, which worked out of the box.
  3. Using a WASM module with webpack is still annoying because async imports are awkward, so a bootstrap.js file is required. There is, however, work being done on this.

Binding Structure

Writing the bindings themselves went very smoothly, though, and I didn't encounter any major problems.

The bindings currently expose a Model and Tensor class.

The Model has a constructor that takes a URL and a predict function which takes a Tensor and returns a Tensor.

A Tensor can be created from a Float32Array (the data) and an Array (the shape). Internally, a tract_core::tensor::Tensor stores the data. It might be better to store the data internally as a JS ArrayBuffer instead and add a convenience function to convert to tract_core::tensor::Tensor.

See the code here. It is just ~100 LOC.
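For a rough idea, here is a condensed sketch of the wasm-bindgen approach (not the actual tractjs code: the real Model loads from a URL via fetch and exposes a proper Tensor class, and the exact tract type names vary between releases):

use std::io::Cursor;
use tract_onnx::prelude::*;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct Model {
    plan: TypedSimplePlan<TypedModel>,
}

#[wasm_bindgen]
impl Model {
    // A byte-buffer constructor keeps the sketch synchronous; the real
    // binding fetches a URL and feeds the bytes in from JS.
    #[wasm_bindgen(constructor)]
    pub fn new(bytes: &[u8]) -> Result<Model, JsValue> {
        let model = tract_onnx::onnx()
            .model_for_read(&mut Cursor::new(bytes))
            .and_then(|m| m.into_optimized())
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        let plan = SimplePlan::new(model).map_err(|e| JsValue::from_str(&e.to_string()))?;
        Ok(Model { plan })
    }

    // Raw f32 data + shape instead of a Tensor class, to keep things short.
    pub fn predict(&self, data: Vec<f32>, shape: Vec<u32>) -> Result<Vec<f32>, JsValue> {
        let shape: Vec<usize> = shape.into_iter().map(|d| d as usize).collect();
        let input: Tensor =
            tract_ndarray::ArrayD::from_shape_vec(tract_ndarray::IxDyn(&shape), data)
                .map_err(|e| JsValue::from_str(&e.to_string()))?
                .into();
        let outputs = self
            .plan
            .run(tvec!(input.into()))
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        let view = outputs[0]
            .to_array_view::<f32>()
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        Ok(view.iter().cloned().collect())
    }
}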

Limitations

  1. It only works for models which can be into_optimized() without any type or shape hints.
  2. It only works for models where the outputs can be auto-detected.
    2.1 Input and output nodes cannot be set by the user.
  3. predict can only be called with a single input tensor, and always returns the first output tensor.
  4. Only float32 inputs and outputs are supported.
  5. Error handling is missing.

Takeaways

  • Tensor will have to be exposed to JS in one way or another to associate a JS Array with a shape for creating inputs and handling outputs.
  • I'm not sure it is practical to keep the API consistent between JS and Rust, i.e. it would be idiomatic in JS to pass options to the Model as a dictionary like
new tractjs.Model("model.onnx", {
  "facts": {
    0: ["float32", [1, 224, 224, 3]]
  },
  "optimize": true,
  // ...
})

while for Rust, some sort of builder pattern would be a good fit for a higher-level API as you mentioned.

  • tract is actually very good in terms of speed! On my machine, inference on SqueezeNet takes ~270ms with tract, while it takes ~170ms with the WASM backend of ONNX.js. I didn't expect them to be this close, especially if the ONNX.js benchmarks are still up to date.
  • Size is a problem. The optimized .wasm file is ~3.6MB (850kB gzipped). Could be worse, but could be better ;)

Next steps

I'm going to work on a small demo which lets the user upload an .onnx model and runs inference with random numbers of a user-specified shape as input, to test the speed of some LSTMs I'm curious about and compare them to TFJS.

If you wouldn't mind, it'd be great to discuss the higher-level API (and tract and ONNX in general) some time via a call or IRC, whenever you have time. You can send me an email at bminixhofer@gmail.com.

@bminixhofer
Contributor Author

bminixhofer commented Jun 9, 2020

I've made nice progress with the bindings at https://github.com/bminixhofer/tractjs.

  • The model now runs in a web worker to avoid blocking the UI thread.
  • There is a small Typescript wrapper on top of the Rust code to make it ergonomic to use.
  • It builds to a single file, tractjs.min.js, by inlining the WASM and the web worker, making it easily usable without a build system (CommonJS + ES6 modules are built as well).
  • Added TensorFlow support.

It was hard to get all the moving pieces (TypeScript + web workers + WASM) to work together, but it works nicely now.

I have a couple of questions, @kali, that we should address before releasing the library:

  1. I noticed that Datum is not implemented for u32, so there are no u32 tensors. Is that intentional? I'd imagine u32 tensors are quite common, e.g. for embeddings.
  2. Could you take a look at the API (docs here) and see if you agree with the design + naming / have suggestions for improvement? As we agreed terminology should be the same as in tract-core.
  3. Do you think there's any low-hanging fruit for reducing build size? tractjs.min.js is 8MB (2.4MB gzipped), which is very large for web applications. The raw WASM is 4.4MB (there is some overhead from inlining and other code), which is still quite large. I was thinking that maybe e.g. pulsing could be made into a separate feature if it significantly contributes to code size. But I'm not sure how that aligns with your priorities for tract.
  4. I'm not sure how to go about testing. I think some integration tests (e.g. SqueezeNet, some TF models) which mainly exercise the different options (custom inputs, custom outputs, ...) will do the job, since tract itself is well tested anyway. Do you agree?

@kali
Collaborator

kali commented Jun 9, 2020

Nicely done, thanks for taking the time to do TF! Not that I am versed enough in 2020 web stuff to judge :)

Questions:
1, 3. u32 would be pretty easy to add. I have not seen that many of them in the wild, and there is a catch: a lot of the bloat in tract-core actually comes from operator implementations instantiated for each and every Datum type... In terms of size reduction, I have an ongoing epic to split tract-core in two and add a serializing format. The idea would be to be able to precompile a model to either the decluttered (aka MIR) or the codegen (aka LIR) stage, save it, and have a smaller library (tract-vm maybe) that can be embedded. But this will take months. As for the pulsing stuff... it is not that heavy, but it should split away nicely too if we want it to; most of it falls on the "compiling" side of things anyway. I also need to try compiling tract with logs set to release_max_level_info and see if that makes a difference.
4. We don't need to re-test tract for tractjs; we don't even need to test tract and SqueezeNet at the integration level, as it is part of the ONNX test suite (we could argue it looks like a different version, but...). So really, the only thing we need to test is the JS bindings, and they are relatively light, so they should break relatively neatly (API changes being the obvious way). I have no idea about the state of the art here. Would a command-line Node.js test be enough to ensure we are not breaking compatibility? We already have some of these models in the CI, so we can re-use them.

  One question of my own: is there a way to make a compact example without any browser paraphernalia? Sorry if this is stupid, but would adding a "main" in simple programmatic form help people get started? I haven't done web front-end in a very long time, so I am not sure what people in this universe are expecting. I'm also thinking of non-browser usages of WebAssembly, as @Geal was playing with.

@bminixhofer
Contributor Author

Thanks!

In that case it is fine to leave u32 out, I guess. Embeddings can be represented with i32 or i64 as well.

It's good to know that you are considering binary size. I am happy with a "there is ongoing work in tract to reduce size" disclaimer in tractjs and leaving it at that for now. For my application, larger size is not a deal breaker since the library will be lazy-loaded anyway and it's just a demo. But I'd imagine that some potential users need a smaller size.

You're right, the bindings are very light and should break cleanly in case of changes. I'm just not 100% confident in the correctness of the current implementation (especially the options). I guess the best way is to have a couple of tests using different options and some different models from the tract CI.
In the past I've tested a browser JS library using Cypress, which worked well. I'll try that again here.
Testing in Node.js is not practical because 1) it is not the same environment as the browser, so even though something works in Node.js it might not work in a browser (the obvious example being a missing browser feature), and 2) the bindings in their current state actually don't work in Node.js, because they rely on fetch and web workers, which would have to be polyfilled (e.g. with https://www.npmjs.com/package/node-fetch and https://github.com/audreyt/node-webworker-threads). And I didn't plan to do that for now.

Regarding a non-browser example:

Browsers have built-in WASM support. To run WASM outside of a browser you need a runtime such as wasmtime or wasmer, and then, currently, quite a bit of glue code to interact with the WASM module and make a neat API.
I've been using wasm-bindgen, which automatically generates lots of JS glue code so that you can work with e.g. strings and vectors of primitives directly on the Rust side. But it is inherently bound to the web, because it is not a runtime; it just generates glue code.

So in short, non-browser usage of WASM is quite different from what I've done and should probably be a different project altogether. Hopefully that gets easier (and aligns more) in the future :)

@kali
Collaborator

kali commented Jun 10, 2020

  • Thanks for the clarification about Node.js. Is the name tractjs still fitting in that case? Should we name this tract-wasm then?
  • About the size issue, I'm also considering making some types and ops into features or plugins, but this leads to a huge number of different combinations to test, and I'm concerned it would slow me down a lot in the long run. It would be different if I were a team of 5 people :)

Do you still feel like migrating this as a sub-directory inside the main project? We can probably leave it out of the lock-step release process, but having whatever tests we come up with in the same CI would make sense.

@bminixhofer
Contributor Author

I'd argue tractjs is fitting because it is JS bindings for tract. tract-wasm would be fitting if it were not bound to the browser (i.e. if it used a WASM runtime like wasmer).

I'm not sure about migrating it into this repo, to be honest. The same CI would definitely be better, and it is not a lot of code, but:

  • It would add dependencies on the JS ecosystem and TypeScript code to this repo.
  • In a larger Rust repo with workspaces, I'd still expect cargo build and cargo test to build and test everything, respectively. That would be hard (and probably hacky) to achieve with the TypeScript wrapper and the browser tests.
  • I'm not sure how the website at https://bminixhofer.github.io/tractjs/ should be handled. https://snipsco.github.io/tract would be confusing because it is not a website for tract, just for the JS usage of tract. And https://snipsco.github.io/tract/js would be confusing because it implies the existence of a main website.

So I think the negatives outweigh the positives at the moment.

That said, I started working on tests. I have one more question: tract works with TF 1.x SavedModels, right? No TFLite, and no TF 2.x support, right?

@kali
Collaborator

kali commented Jun 10, 2020

Ok, let's leave it on your side then. The tract repository will move around a bit too in the coming weeks (switching to the sonos GitHub account and dropping the snips namespace). I will add links to your project in the main README.md when you feel it's time.

tract-tf works with the TF 1.x "frozen" model format, and without version compatibility really established. The SavedModel format is actually what makes TF 2 so difficult to handle. There is a small collection of models that the CI runs (see .travis/native.sh), which are hosted on public URLs...

@bminixhofer
Contributor Author

Just added the benchmarks to https://bminixhofer.github.io/tractjs/. Looks very good for tract!

LSTMs especially are much faster than in TFJS, even compared to the WebGL backend.

The WASM backends of TFJS and ONNX.js are missing though, I'll see if I can add them.

@bminixhofer
Contributor Author

bminixhofer commented Jun 18, 2020

There was some more work left to do, but aside from some minor changes and doc updates I still have to do, tractjs is ready for release 🎉

One remaining question is whether I should wait for #300 to be merged. It's not a must but it would be very nice. I am not in a hurry, but I'm not sure how much work is left for that.

Thoughts? Or any other objections to releasing tractjs?

@kali
Collaborator

kali commented Jun 18, 2020

I would like to merge the branch containing the fix for #300 (among other things) tomorrow, so...

@bminixhofer
Contributor Author

Ok, great!

@kali
Collaborator

kali commented Jun 19, 2020

Done!

@bminixhofer
Contributor Author

tractjs is officially released! 🎉 We can close this.

@kali
Collaborator

kali commented Jun 20, 2020

Great! I just noticed the last entry in the FAQ, about invalid shapes and ranks not being caught early or not at all. Does it still happen? Because I thought I had done what was required.

@bminixhofer
Contributor Author

Oh, to be honest, I haven't checked in a while; I assumed it hadn't changed.
If it doesn't happen anymore, I'll remove it.

@kali
Collaborator

kali commented Jun 20, 2020

The change is here: 5847c17#diff-c89b37e22115a5dc0b72425de44d246f

I did not make an entry for it in the changelog, as I consider it a safety fix, but it is actually kind of breaking (as it makes tract less lenient).
