Integer-sizing a decluttered streaming TypedModel without Pulse (for non causal models) #300
In other, better news, custom outputs are now tried and tested in tractjs, so I'm pretty confident this issue (and possibly #296) are the last blockers on the tract side for the initial tractjs release :)
Well, now this one is interesting... Once again you're pushing the envelope a tiny bit.
Yes. So the simple fix is to make whatever types we need from hir::ops public. We'll certainly do that; I am actually a bit surprised the compiler let us have a public type alias with private types, but... ok. But we could do better: storing an InferenceModel and doing the full optimization every time is quite expensive, while I think we are equipped to perform a significant part of it (up to declutter()) while working with the symbolic streaming S dimension. This is what we do when we go the pulsing route for our real-time voice networks: into_normalize(), then creating a pulse network, then into_type(), obtaining a network with only numerical dimensions this time, and finally codegen(). If your network were causal, using the pulsing transformation would be an option. But of course, bidi LSTMs are not causal. In your case we could imagine having a new "route": first just go up to declutter using the S dimension as for a streaming network, and we could store the obtained TypedModel. From there, we would introduce new stuff: a simple transformation that just puts a value on S (the actual sequence length, taken as a parameter of the transformation), and evaluates all fact expressions that depend on it to obtain a numerically-sized TypedModel. From there, we could call codegen() as usual. If you're interested in getting your hands dirty and revisiting / fixing more half-baked APIs in the process, I'd be glad to help you do this. I don't mind at all just fixing the visibility stuff right away (PR welcome) to release tractjs, then consider doing this later.
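A rough sketch of that proposed route, using names that only materialize later in this thread (`concretize_stream_dim` did not exist yet at this point in the discussion, the integer type of the length parameter is a guess, and `model.onnx` is a placeholder path):

```rust
use tract_core::dim::ToDim;
use tract_onnx::prelude::*;

// Sketch of the proposed route: declutter once with the symbolic S
// dimension, then substitute a concrete length and optimize per shape.
fn build_for_len(len: i32) -> TractResult<TypedSimplePlan<TypedModel>> {
    let decluttered = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .with_input_fact(
            0,
            InferenceFact::dt_shape(u8::datum_type(), tvec!(1.to_dim(), TDim::s())),
        )?
        .into_typed()? // TypedModel, with S still symbolic
        .declutter()?; // the expensive part, done once per model
    decluttered
        .concretize_stream_dim(len)? // S := len, cheap, done per sequence length
        .optimize()?
        .into_runnable()
}
```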
Sounds good! I'm not sure how much time I'll have in the coming weeks, so I'd rather try to do an initial release of tractjs now. But I might give it a shot in the future :)
FYI, I actually implemented it (turns out I started needing it in the tract stream-check subcommand). I'll merge it soon.
Wow, awesome, you're on fire with the changes to tract recently! :) Just now adding a test for the custom inputs to tractjs from #296.
Great! One more question, and I'm sorry, but: there can only ever be one streaming dimension, right? Because the LSTM in CI would technically have two: batch size and sequence length.
Reopening because of some issues... I did not realize a few operators need specific treatment, as they have size expressions in their attributes. Identified so far: TypedReshape, MultiBroadcast; there may be others. So beware. If these are the only ones, I'll fix them soon. And yes, at this stage, we can only have one variable dimension, which is kind of rigged to be the time axis (but maybe only the streaming/pulsing code assumes that... not too sure). I think I need to start thinking about generalizing this (it may also be a way to get runtime-dynamically shaped tensors...). So... I need to think about this before anything happens on this front.
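To illustrate why these operators need special treatment, here is a toy sketch (these are not tract's actual types; the real symbolic dimension type is TDim): an op like TypedReshape carries size expressions in its attributes, so a concretization pass has to substitute S there too, not only in the tensor facts of the graph.

```rust
// Toy stand-in for a symbolic dimension.
#[derive(Clone, Copy, Debug)]
enum Dim {
    Const(usize),
    S, // the single symbolic streaming dimension
}

// A reshape-like op stores its target shape as an *attribute*.
struct Reshape {
    target_shape: Vec<Dim>,
}

// Substituting S in the tensor facts is not enough: attributes like this
// one must be rewritten too, which is the special treatment TypedReshape
// and MultiBroadcast need.
fn concretize_reshape(op: &Reshape, s: usize) -> Vec<usize> {
    op.target_shape
        .iter()
        .map(|d| match d {
            Dim::Const(c) => *c,
            Dim::S => s,
        })
        .collect()
}

fn main() {
    let op = Reshape { target_shape: vec![Dim::Const(1), Dim::S] };
    assert_eq!(concretize_reshape(&op, 40), vec![1, 40]);
}
```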
Ok, sure. In that case I think it is best to release tractjs with the current feature set - nothing speaks against adding this later :)
@DreamerMind I think we have what we need with this.
Sure, I can try that and the invalid rank catching from #269 along with it.
Ok, I'm not sure I am trying this correctly.

```toml
[package]
name = "sblstmtest"
version = "0.1.0"
authors = ["Benjamin Minixhofer <bminixhofer@gmail.com>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
tract-onnx = { git = "https://github.com/snipsco/tract", branch = "fixes-for-concretize-dim" }
```

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .into_optimized()?
        .concretize_stream_dim(1)?
        .into_runnable()?;

    let input: Tensor = tract_ndarray::Array2::<u8>::zeros((1, 50)).into();
    model.run(tvec!(input))?;

    Ok(())
}
```

This throws an error upon calling.
When I fix this by providing a fixed shape, again only that shape works:

```rust
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .with_input_fact(0, InferenceFact::dt_shape(u8::datum_type(), tvec!(1, 100)))?
        .into_optimized()?
        .concretize_stream_dim(1)?
        .into_runnable()?;

    let input: Tensor = tract_ndarray::Array2::<u8>::zeros((1, 50)).into();
    model.run(tvec!(input))?;

    Ok(())
}
```
So I'm not sure how to use `concretize_stream_dim`.
Oh and the shape + rank checking really works. Thanks for that! I am removing it from the FAQ. Already one down ;)
So what I had in mind was:

```rust
use tract_core::dim::ToDim;
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let decluttered_model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .with_input_fact(0, InferenceFact::dt_shape(u8::datum_type(), tvec!(1.to_dim(), TDim::s())))?
        .into_typed()?
        .declutter()?;

    // [...]

    let optimized_model = decluttered_model
        .concretize_stream_dim(40)? // borrows &self
        .into_optimized()?
        .into_runnable()?;

    let input: Tensor = tract_ndarray::Array2::<u8>::zeros((1, 40)).into();
    optimized_model.run(tvec!(input))?;

    Ok(())
}
```

It's a bit of a mouthful; we may need more ergonomics once we've established it does work. But there still are bugs :) TypedReshape is giving me a lot of pain, specifically for something I ultimately want to exclude from the core set (see #307), but I'll still give a shot at fixing it.
Hmm, so where we are trying to get is that I can load and optimize the model first, then predict inputs of different shapes. Is that already what the code above achieves? I would have thought that...
For now, we're trying to "save" the analysis and declutter() work, so it does not have to be redone for every sequence length.
Ok, that makes sense. I still don't know how the graph from an ONNX model can be "decluttered" (because when I view the graph, all of the ops look pretty essential to me ;)), but that's probably a topic for another time. I'll try the code snippet from above.
Na, don't rush, I'm still working out some issues. :) I'll tell you when I think it's ok :)
As for the term "declutter", it is true that ONNX graphs are not the worst that I have seen, and the concept has evolved since I introduced it. The main idea is actually to convert as much as possible to what is shaping up as the tract-core operator set.
On your model, the "decluttering" aspect is not obvious, as the decluttered form looks bigger than the original form. But on some networks where machine learners abused shape and padding computations in tensor form, the decluttered network can be 10 times smaller (in number of ops) than the input graph... In comparison, optimize is relatively cheap. As a matter of fact, most operators do not have an optimized form that differs from the decluttered one. The remaining ones (Scan, MatMul, Conv mostly) are just translated one-to-one. Then MatMul can aggregate a few simple operations that happen at its output (like a bias addition, a relu or a quantization fix), but all of these optimisations only require "observing" and fixing the graph locally, while decluttering may visit and rewrite big portions of the graph.
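As a toy illustration of this kind of local "observe and fix" rewrite (this is not tract's internal representation, just a sketch of the idea): a peephole pass that fuses a MatMul with a following bias addition and relu only ever looks at a constant-size window of the op sequence.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Op {
    MatMul,
    AddBias,
    Relu,
    FusedMatMul { bias: bool, relu: bool },
    Other,
}

// Peephole fusion over a linear chain: MatMul [+ AddBias] [+ Relu]
// collapses into a single node. Unlike declutter(), which may rewrite
// big portions of the graph, this only inspects a small local window.
fn fuse(ops: &[Op]) -> Vec<Op> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < ops.len() {
        if ops[i] == Op::MatMul {
            let bias = matches!(ops.get(i + 1), Some(Op::AddBias));
            let relu = matches!(ops.get(i + 1 + bias as usize), Some(Op::Relu));
            out.push(Op::FusedMatMul { bias, relu });
            i += 1 + bias as usize + relu as usize;
        } else {
            out.push(ops[i].clone());
            i += 1;
        }
    }
    out
}

fn main() {
    let fused = fuse(&[Op::MatMul, Op::AddBias, Op::Relu, Op::Other]);
    assert_eq!(fused, vec![Op::FusedMatMul { bias: true, relu: true }, Op::Other]);
}
```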
Ok, that makes sense. Thanks!
@bminixhofer I think it's ready for you to give it a shot now :) thanks for your patience, I'm on vacation, so keeping it light ;)
@kali Sure, will do. Enjoy your vacation! Well deserved after all the recent tract improvements ;)
Ok, I've tried the following:

```toml
[package]
name = "sblstmtest"
version = "0.1.0"
authors = ["Benjamin Minixhofer <bminixhofer@gmail.com>"]
edition = "2018"

[dependencies]
tract-onnx = { git = "https://github.com/snipsco/tract", branch = "fixes-for-concretize-dim" }
tract-core = { git = "https://github.com/snipsco/tract", branch = "fixes-for-concretize-dim" }
```

```rust
use tract_core::dim::ToDim;
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let _model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .with_input_fact(
            0,
            InferenceFact::dt_shape(u8::datum_type(), tvec!(1.to_dim(), TDim::s())),
        )?
        .into_typed()?
        .declutter()?;

    Ok(())
}
```

I get an error, so it seems there is still some problem.
Mmmm... a bit weird, I have this working:

Do you see what we do differently?
The relevant code looks the same to me...
Also, I tried your main; no error. (And FYI, tract_core is re-exported from tract_onnx.)
Hmm, that's really strange. Can you try...
I think you just need a...
Oh, sorry, I thought deleting... It is working now, and I get the same output using:

```rust
use tract_onnx::prelude::*;

// Declutter once, then concretize the stream dim and optimize per length.
pub fn infer_dynamic(model: &TypedModel) -> TractResult<TVec<Arc<Tensor>>> {
    let optimized_model = model
        .concretize_stream_dim(40)? // borrows &self
        .optimize()?
        .into_runnable()?;

    let input: Tensor = tract_ndarray::Array2::<u8>::zeros((1, 40)).into();
    optimized_model.run(tvec!(input))
}

// Run the unoptimized InferenceModel plan directly.
pub fn infer_unoptimized_dynamic(
    model: &InferenceSimplePlan<InferenceModel>,
) -> TractResult<TVec<Arc<Tensor>>> {
    let input: Tensor = tract_ndarray::Array2::<u8>::zeros((1, 40)).into();
    model.run(tvec!(input))
}

// Run a plan that was fully optimized ahead of time for a fixed shape.
pub fn infer_static(model: &TypedSimplePlan<TypedModel>) -> TractResult<TVec<Arc<Tensor>>> {
    let input: Tensor = tract_ndarray::Array2::<u8>::zeros((1, 40)).into();
    model.run(tvec!(input))
}
```

and running a simple criterion benchmark with these functions:
So this is already a 2x speedup, but still significantly slower than `infer_static`.
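For reference, a hypothetical criterion harness along those lines (the actual benchmark code and numbers from this exchange were not preserved; it assumes the `infer_dynamic` function above is in scope, the criterion crate as a dev-dependency, and the same `model.onnx` path as the snippets above):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use tract_core::dim::ToDim;
use tract_onnx::prelude::*;

fn bench_dynamic(c: &mut Criterion) {
    // Analysis + declutter happen once, outside the measured loop.
    let decluttered = tract_onnx::onnx()
        .model_for_path("model.onnx")
        .unwrap()
        .with_input_fact(
            0,
            InferenceFact::dt_shape(u8::datum_type(), tvec!(1.to_dim(), TDim::s())),
        )
        .unwrap()
        .into_typed()
        .unwrap()
        .declutter()
        .unwrap();

    // Measured: concretize + optimize + run, as in infer_dynamic above.
    c.bench_function("infer_dynamic", |b| {
        b.iter(|| infer_dynamic(&decluttered).unwrap())
    });
}

criterion_group!(benches, bench_dynamic);
criterion_main!(benches);
```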
All right! I'm going to merge this then. Thanks for doing the bench; I thought we would be closer to infer_static, but I may have found one easy optimisation opportunity impacting both decluttering and optimizing. We'll see how it goes.
Going to close this one to switch to #313 for the next possible steps.
@bminixhofer you may want to check the impact of #312 on your bench if you still have it around. I can see huge decluttering time improvements on some networks.
Wow, looking much better!
Yeah. That's more like it :) And it will leave me time to think about the next steps :)
Hey, I came across another problem trying the bidirectional LSTM model in a browser.

It is the same LSTM that is now in CI (download link). Now normally I'd use code similar to this:

but I get an error:

Running it without `into_optimized`, or with an input fact, works. So I understand that the model can not be optimized because the shape of the input (batch size and seq len) is not known at the time of building. Is that correct?

In practice I don't want to fix the input shape at build time because it has to work with different batch sizes.

Now so far it wouldn't be a problem; I'd just add an `optimize` option to the JS API to turn optimization on or off depending on whether dynamic shapes are needed during inference. The problem comes when I try to store the model that I got by calling `into_runnable` without calling `into_optimized` before.

I get a model of type `SimplePlan<InferenceFact, Box<dyn InferenceOp>, ModelImpl<InferenceFact, Box<dyn InferenceOp>>>`. When I want to store such a model in a struct, I get an error which says that the module `ops` is private, so I can't store the result. Am I missing something? And if not, is there some way to work around this?
Thanks for all your help :)
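For completeness, the workaround that eventually emerged in this thread (once the visibility fix landed) is to name the plan through the prelude's type aliases instead of the private module paths. A minimal sketch, reusing the `InferenceSimplePlan` alias that appears in the benchmark code above:

```rust
use tract_onnx::prelude::*;

// Store the unoptimized runnable plan via the prelude's alias, keeping
// input shapes dynamic (no into_optimized() call at build time).
struct LoadedModel {
    plan: InferenceSimplePlan<InferenceModel>,
}

fn load(path: &str) -> TractResult<LoadedModel> {
    let plan = tract_onnx::onnx()
        .model_for_path(path)?
        .into_runnable()?;
    Ok(LoadedModel { plan })
}
```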