
Question about Pulse #496

Closed
Rikorose opened this issue May 18, 2021 · 18 comments

@Rikorose
Contributor

Hi there,

I work with sequential data and need an output from the model at every time step. With RNN-like models this is straightforward: I can pass an input of [B,T,F] to ONNX, where T=1 and, for simplicity, batch B=1; F are the features.

For convolutions this is a little more complicated if the convolution kernel covers multiple time steps (say 3, in a 3x1xC_in x C_out kernel).
During inference, we need a buffer for each convolution that holds the convolution input of size 3x1xC_in, to be able to run the model in a loop over time steps.
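For illustration, the manual buffering I have in mind would be something like this (a hypothetical sketch, not tract API):

use std::collections::VecDeque;

// Hypothetical per-convolution ring buffer: keep the last `kernel_size`
// frames of C_in features (zero-initialized), sliding by one frame per
// time step, so a "valid" convolution can be evaluated at every step.
struct ConvBuffer {
    frames: VecDeque<Vec<f32>>,
}

impl ConvBuffer {
    fn new(kernel_size: usize, c_in: usize) -> Self {
        ConvBuffer { frames: (0..kernel_size).map(|_| vec![0.0f32; c_in]).collect() }
    }

    // Push one new frame and return the kernel_size x C_in window the
    // convolution kernel should see at this time step.
    fn step(&mut self, frame: Vec<f32>) -> &VecDeque<Vec<f32>> {
        self.frames.pop_front();
        self.frames.push_back(frame);
        &self.frames
    }
}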

From looking at the code, the Pulse module seems to be meant to cover this use case. Is this correct? I haven't found any example or documentation on this.

If not, the buffer handling could obviously also be done outside of tract, by providing an input for each convolution or by splitting the convolutions into separate models.

@kali
Collaborator

kali commented May 18, 2021

That's exactly what tract-pulse is about. It works for convolutional and recurrent nets, but the API is a bit... well, convoluted, so I have not documented it, as I consider it semi-experimental (the code does work well; it's critical for our usage). I will help.

You can "infer" a bit of the process by looking at cli/src/params.rs in the pipeline function: there are sections activated by --pulse. There are also tests in harness that unit-test parts of the code and may give you inspiration.

The gist of it:
- declare the streaming input by setting an input fact whose time dimension is the symbolic S (stream_dim),
- bring the network to typed and decluttered form,
- convert it with PulsedModel::new, passing the pulse length,
- then bring it back to typed, optimized and runnable form, and loop over the input stream with SimpleState, one pulse at a time.

The command line can be helpful too (try tract -i 1,S,<F> --pulse 1 model.onnx dump for instance, with an actual capitalized S and the real value for F).
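For example, with F=16 that would be: tract -i 1,S,16 --pulse 1 model.onnx dump.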

Try it, show me where you get stuck. I will help :)

@Rikorose
Contributor Author

Thanks for your explanation and pointers! I will probably need some time to get things rolling, but will come back eventually.

@Rikorose
Contributor Author

Another question, about the delay:
Assuming the convolution buffers get initialized with zeros, the first pulse should already be somewhat meaningful, no?
During training I align the convolutions in time so that they don't introduce any algorithmic delay. Thus, tract will only report the delay required to fill all convolution buffers, but does not know about the algorithmic delay, right?

@kali
Collaborator

kali commented May 28, 2021

Tell me if this helps:

1D convolution, kernel of 3, valid:

input:    1   2   3   4   5   6   7   8   9
output:   *   *   123 234 345 456 567 678 789

123 represents the application of the convolution kernel to inputs 1, 2, and 3.

tract delay: 2; the first two frames output have undefined values.

1D convolution, kernel of 3, left padded to same size:

input:    1   2   3   4   5   6   7   8   9
output:   001 012 123 234 345 456 567 678 789

Here, 0 is the padding value; the first actual sample fed to tract is still 1.

tract reports a delay of 0.

@Rikorose
Contributor Author

Yes, I mean the second example. Can the left padding to same size be specified per convolution, or for the entire model or input?

@kali
Collaborator

kali commented May 28, 2021

Well, you have both options. ONNX will let you provide explicit padding for each convolution and put it all at the left (aka "before"), mimicking the second case.

Alternatively, you could declare the convolutions VALID in the ONNX model and feed the network enough zeros to flush the buffers of their undefined values before starting to push the actual signal. That would be:

input:    0   0   1   2   3   4   5   6   7   8   9
output:   *   *   001 012 123 234 345 456 567 678 789

tract would still report a delay of 2: you need to feed two rounds of zeros and skip their matching outputs. Then you can feed your signal and get the expected result.
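In code, the flushing could look roughly like this (a sketch, assuming a pulsed model named pulsed built with a pulse of 1; the pulse shape is a placeholder):

// Sketch: query the delay the pulsed model reports, feed that many
// rounds of zeros (with a pulse of 1, one pulse per delayed frame),
// and discard the matching undefined outputs.
let delay = pulsed.output_fact(0)?.delay;
let model = pulsed.into_typed()?.into_optimized()?.into_runnable()?;
let mut state = SimpleState::new(model)?;
for _ in 0..delay {
    let zeros = Tensor::zero::<f32>(&[1, 1, 1])?; // placeholder pulse shape
    let _ = state.run(tvec!(zeros.into()))?; // undefined output, skip it
}
// then push the actual signal, one pulse per call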

@Rikorose
Contributor Author

I'm having some trouble with the streaming inputs.
I have two inputs: one with a streaming dim, and one as an initial state without a streaming dim:
python:

import onnx
m = onnx.load("enc.onnx")
print(m.graph.input[0])
print(m.graph.input[1])
name: "erb_feat"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 1
      }
      dim {
        dim_param: "time"
      }
      dim {
        dim_value: 16
      }
    }
  }
}

name: "h0emb"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 256
      }
    }
  }
}

I tried the following so far:

use tract_onnx::{
    prelude::{tvec, Datum, Framework, InferenceFact, InferenceModelExt, TractResult},
    tract_hir::shapefactoid,
};
use tract_pulse::internal::ToDim;
use tract_pulse::model::*;

fn main() -> TractResult<()> {
    let s = tract_pulse::fact::stream_dim();
    let erb_feat = InferenceFact::dt_shape(f32::datum_type(), shapefactoid!(1, 1, s, 16));
    let hemb = InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 1, 256));
    let mut enc = tract_onnx::onnx()
        .model_for_path("../out/enc.onnx")?
        .with_input_fact(0, erb_feat)?
        .with_input_fact(1, hemb)?
        .with_input_names(&["erb_feat", "h0emb"])?;
    enc.analyse(true)?;
    let enc = enc.into_typed()?;
    let enc = enc.declutter()?;
    let pulsed = PulsedModel::new(&enc, 1)?;
    Ok(())
}

Resulting in the following error:

Error: Translating node #13 "h0emb" Source Pulsifier(1)

Caused by:
    Can not pulse a tensor with no streaming dim

Stack backtrace:
   0: anyhow::private::new_adhoc
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.41/src/lib.rs:632:36
   1: tract_pulse::fact::PulsedFact::from_tensor_fact_pulse::{{closure}}
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-pulse-0.14.2/src/fact.rs:51:28
   2: core::option::Option<T>::ok_or_else
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/core/src/option.rs:595:25
   3: tract_pulse::fact::PulsedFact::from_tensor_fact_pulse
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-pulse-0.14.2/src/fact.rs:48:27
   4: tract_pulse::ops::source::pulsify
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-pulse-0.14.2/src/ops/source.rs:14:23
   5: tract_pulse::ops::source::register_all::{{closure}}
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-pulse-0.14.2/src/macros.rs:35:25
   6: core::ops::function::FnOnce::call_once
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/core/src/ops/function.rs:227:5
   7: <tract_pulse::model::Pulsifier as tract_core::model::translator::Translate<tract_core::model::fact::TypedFact,alloc::boxed::Box<dyn tract_core::ops::TypedOp>,tract_pulse::fact::PulsedFact,alloc::boxed::Box<dyn tract_pulse::ops::PulsedOp>>>::translate_node
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-pulse-0.14.2/src/model.rs:104:13
   8: tract_core::model::translator::Translate::translate_model_with_mappings
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.14.2/src/model/translator.rs:35:27
   9: <tract_core::model::graph::Graph<tract_pulse::fact::PulsedFact,alloc::boxed::Box<dyn tract_pulse::ops::PulsedOp>> as tract_pulse::model::PulsedModelExt>::new_with_mapping
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-pulse-0.14.2/src/model.rs:28:9
  10: <tract_core::model::graph::Graph<tract_pulse::fact::PulsedFact,alloc::boxed::Box<dyn tract_pulse::ops::PulsedOp>> as tract_pulse::model::PulsedModelExt>::new
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-pulse-0.14.2/src/model.rs:20:12
  11: clc_tract::main
             at /home/hendrik/projects/clcrs/clc-tract/src/main.rs:20:18
  12: core::ops::function::FnOnce::call_once
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/core/src/ops/function.rs:227:5
  13: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/std/src/sys_common/backtrace.rs:125:18
  14: std::rt::lang_start::{{closure}}
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/std/src/rt.rs:49:18
  15: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/core/src/ops/function.rs:259:13
      std::panicking::try::do_call
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/std/src/panicking.rs:401:40
      std::panicking::try
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/std/src/panicking.rs:365:19
      std::panic::catch_unwind
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/std/src/panic.rs:434:14
      std::rt::lang_start_internal
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/std/src/rt.rs:34:21
  16: std::rt::lang_start
             at /rustc/24bdc6d73a75dce9a7013ebc7c037013ff4ea099/library/std/src/rt.rs:48:5
  17: main
  18: __libc_start_main
  19: _start

@kali
Collaborator

kali commented Jun 18, 2021

I think it is a bug. Can you give me the model? You can use @.fr if you prefer to share by email.

@Rikorose
Contributor Author

Rikorose commented Jun 18, 2021

This allows reproducing the error:

import torch
from torch import nn
import onnx

layer_count = 4

model = nn.LSTM(10, 20, num_layers=layer_count, bidirectional=True)
model.eval()

with torch.no_grad():
    input = torch.randn(5, 3, 10)
    h0 = torch.randn(layer_count * 2, 3, 20)
    c0 = torch.randn(layer_count * 2, 3, 20)
    output, (hn, cn) = model(input, (h0, c0))

    # default export
    torch.onnx.export(model, (input, (h0, c0)), 'lstm.onnx')
    onnx_model = onnx.load('lstm.onnx')
    # input shape [5, 3, 10]
    print(onnx_model.graph.input[0])

    # export with `dynamic_axes`
    torch.onnx.export(model, (input, (h0, c0)), 'lstm.onnx',
                    input_names=['input', 'h0', 'c0'],
                    output_names=['output', 'hn', 'cn'],
                    dynamic_axes={'input': {0: 'sequence'}, 'output': {0: 'sequence'}})
    onnx_model = onnx.load('lstm.onnx')
    # input shape ['sequence', 3, 10]
    print(onnx_model.graph.input[0])
    print(onnx_model.graph.input[1])
    print(onnx_model.graph.input[2])
rust:

use tract_onnx::{
    prelude::{tvec, Datum, Framework, InferenceFact, InferenceModelExt, TractResult},
    tract_hir::shapefactoid,
};
use tract_pulse::internal::ToDim;
use tract_pulse::model::*;

fn main() -> TractResult<()> {
    let s = tract_pulse::fact::stream_dim();
    let input = InferenceFact::dt_shape(f32::datum_type(), shapefactoid!(s, 3, 10));
    let h0 = InferenceFact::dt_shape(f32::datum_type(), tvec!(4*2, 3, 20));
    let c0 = InferenceFact::dt_shape(f32::datum_type(), tvec!(4*2, 3, 20));
    let mut enc = tract_onnx::onnx()
        .model_for_path("lstm.onnx")?
        .with_input_fact(0, input)?
        .with_input_fact(1, h0)?
        .with_input_fact(2, c0)?
        .with_input_names(&["input", "h0", "c0"])?;
    enc.analyse(true)?;
    let enc = enc.into_typed()?;
    let enc = enc.declutter()?;
    let pulsed = PulsedModel::new(&enc, 1)?;
    Ok(())
}

@kali
Collaborator

kali commented Jun 19, 2021

Ok, there may be a bug, at least a not-so-helpful error message. But there will be a deeper issue here: you cannot use pulses with a bidirectional LSTM, as its output depends on inputs that will only be given to the network in subsequent pulses.

The issue you observe is still present with simple causal LSTM layers, so I'm going to investigate it.

@kali
Collaborator

kali commented Jun 19, 2021

Next issue... The error message is still not great (there is still an opportunity to get more robust), but you can fix it by just giving a value to h0 and c0 (hopefully this makes sense in your case). That way they will get absorbed early by the Scan op that implements LSTMs, and it will deal with the pulsing aspect. If you need to delay providing h0 and c0, then we need some more work here.

    let h0 = Tensor::zero::<f32>(&[4 * 2, 3, 20])?;
    let c0 = Tensor::zero::<f32>(&[4 * 2, 3, 20])?;
[...]
        .with_input_fact(1, h0.into())?
        .with_input_fact(2, c0.into())?
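(Here h0.into() and c0.into() build InferenceFacts that carry the concrete tensor values, via InferenceFact's From<Tensor> impl, so analysis can propagate them into the Scan op.)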

But we have one more blocking one:

Caused by:
    Pulsification not implemented for more than one input to Concat

I think this is basically just a TODO :) Having a look.

@Rikorose
Contributor Author

Ok, there may be a bug, at least a not-so-helpful error message. But there will be a deeper issue here: you cannot use pulses with a bidirectional LSTM, as its output depends on inputs that will only be given to the network in subsequent pulses.

The issue you observe is still present with simple causal LSTM layers, so I'm going to investigate it.

I get the same error with unidirectional LSTMs.

@kali
Collaborator

kali commented Jun 19, 2021

The issue you observe is still present with simple causal LSTM layers, so I'm going to investigate it.

I get the same error with unidirectional LSTMs.

Yep, I'm trying to fix these.

@Rikorose
Contributor Author

Another point which might also be related: RNN-type models with state outputs (i.e. (hn, cn) in the LSTM example above) return the following error:

[2021-06-24T14:35:21.973847707Z ERROR tract] Translating node #19 "GRU_86" Scan Pulsifier(1)

    Caused by:
        Scan pulse only supports full outputs

    Stack backtrace:
       0: tract_pulse::ops::scan::<impl tract_pulse::ops::PulsedOp for tract_core::ops::scan::mir::Scan>::pulsed_output_facts
       1: tract_pulse::model::<impl tract_core::model::graph::SpecialOps<tract_pulse::fact::PulsedFact,alloc::boxed::Box<dyn tract_pulse::ops::PulsedOp>> for tract_core::model::graph::Graph<tract_pulse::fact::PulsedFact,alloc::boxed::Box<dyn tract_pulse::ops::PulsedOp>>>::wire_node
       2: core::ops::function::FnOnce::call_once
       3: <tract_pulse::model::Pulsifier as tract_core::model::translator::Translate<tract_core::model::fact::TypedFact,alloc::boxed::Box<dyn tract_core::ops::TypedOp>,tract_pulse::fact::PulsedFact,alloc::boxed::Box<dyn tract_pulse::ops::PulsedOp>>>::translate_node
       4: tract_core::model::translator::Translate::translate_model_with_mappings
       5: tract::params::Parameters::pipeline
       6: tract::params::Parameters::from_clap
       7: tract::main
       8: std::sys_common::backtrace::__rust_begin_short_backtrace
       9: main
      10: <unknown>
      11: _start

Theoretically, the input and output states that do not have a pulse dimension could also be handled entirely by tract-pulse. That is, the model would initialize the first hidden state itself, and pulse would make sure the state is passed back to the network on every call.

@kali
Collaborator

kali commented Jun 24, 2021

Here we go. I knew this was going to be terrible, as this API and its quirks are obviously not ready for generic consumption...

You may need 0.15.0.

use tract_onnx::prelude::*;
use tract_onnx::tract_hir::shapefactoid;

use tract_pulse::internal::ToDim;
use tract_pulse::model::*;

fn constantize_input(
    model: &mut InferenceModel,
    name: &str,
    value: Arc<Tensor>,
) -> TractResult<()> {
    let node_id = model.node_by_name(name)?.id;
    model.node_mut(node_id).op =
        tract_onnx::tract_core::ops::konst::Const::new(value.clone()).into();
    model.node_mut(node_id).outputs[0].fact = InferenceFact::from(value);
    Ok(())
}

fn main() -> TractResult<()> {
    let s = tract_pulse::fact::stream_dim();
    let input = InferenceFact::dt_shape(f32::datum_type(), shapefactoid!(s, 3, 10));
    let mut enc = tract_onnx::onnx().model_for_path("lstm.onnx")?.with_input_fact(0, input)?;
    // need to erase h0 and c0 from network interface so pulse and scan do their thing
    // first the state inputs are made into constant
    let h0 = Tensor::zero::<f32>(&[4 * 2, 3, 20])?;
    let c0 = Tensor::zero::<f32>(&[4 * 2, 3, 20])?;
    constantize_input(&mut enc, "h0", h0.into())?;
    constantize_input(&mut enc, "c0", c0.into())?;

    // then we erase the state vector outputs from the interface, as tract will deal with them
    enc = enc.with_input_names(&["input"])?.with_output_names(&["Squeeze_15"])?;

    // bring network to typed and declutter form
    enc.analyse(true)?;
    let enc = enc.into_typed()?;
    let enc = enc.declutter()?;
    // transform to pulse form
    let pulsed = PulsedModel::new(&enc, 1)?;
    let delay = pulsed.output_fact(0)?.delay;
    assert_eq!(delay, 0); // no convolution in the net; LSTMs do not introduce delay

    // then back to typed, then optimized and runnable form
    let model = pulsed.into_typed()?.into_optimized()?.into_runnable()?;

    // we cannot use model.run, we have to do a bit of explicit state management here
    let mut state = SimpleState::new(model)?;
    // loop over input stream
    for v in 0..4 {
        // mocked input chunks
        let input = tensor0(v as f32).broadcast_scalar_to_shape(&[1, 3, 10])?;
        let output = state.run(tvec!(input.into()))?;
        eprintln!("{:?}", output);
    }
    Ok(())
}
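Each call to state.run consumes one pulse (a single time step here, since the pulse length is 1) and returns the matching output pulse; the LSTM state is carried inside SimpleState between calls.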

@kali
Collaborator

kali commented Jun 24, 2021

So... yeah, basically tract already does what you were suggesting, but the ONNX export has its own state management that gets in the way, so we need to kill part of the network (the state-maintenance inputs and outputs) to make the pulsification happy.

@Rikorose
Contributor Author

Awesome, thanks for your help! Maybe it makes sense to include this in the examples folder?

@kali
Collaborator

kali commented Jun 24, 2021

I don't think it's ready for making an example. How about turning this issue into a discussion instead of closing it, so people may be able to use it as a reference?

@kali kali closed this as completed Jun 24, 2021
@sonos sonos locked and limited conversation to collaborators Jun 24, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
