
Delay less conv pulsification #840

Merged
kali merged 11 commits into main from delay-less-conv-pulsification on Oct 15, 2022

kali (Collaborator) commented Oct 13, 2022

No description provided.

kali (Collaborator, Author) commented Oct 14, 2022

Valid convolution:
Conv(kernel_size = 3) =>
--[∂]--> Delay(overlap = 2) ---[∆]---> Conv ---[∂+2]--->

Making sense of ∆ is super hard (and not necessarily super helpful).
If the first upstream frame arrives in pulse #2, subframe #3, the first valid frame out of the
Delay will also arrive in pulse #2. But ∆ is meant to be expressed as a frame count, so it would
be something like:
floor(∂/pulse) * (pulse + overlap) + ∂%pulse + overlap
or, equivalently, ∂ + overlap * (floor(∂/pulse) + 1)

This is not what Delay outputs when it has overlap: it actually reports ∂+overlap (the two only
agree when ∂ < pulse). But it does not matter much.
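To sanity-check the two expressions for ∆, here is a small Rust sketch. It assumes each Delay output pulse is laid out as `overlap` history frames followed by `pulse` fresh frames; `delta_by_pulse` and `delta_simplified` are illustrative names, not tract functions. It checks that the pulse-by-pulse count simplifies to ∂ + overlap * (floor(∂/pulse) + 1).

```rust
/// First valid frame of the Delay output, counted in output frames.
/// Assumes each output pulse holds `overlap` history frames followed by
/// `pulse` fresh frames (hypothetical layout, for illustration only).
fn delta_by_pulse(delta: usize, pulse: usize, overlap: usize) -> usize {
    // Whole pulses before the one holding the first valid frame, each
    // `pulse + overlap` frames long, then the offset inside that pulse,
    // shifted by its `overlap` history frames.
    (delta / pulse) * (pulse + overlap) + delta % pulse + overlap
}

/// Same quantity, simplified: `delta` plus `overlap` extra frames for every
/// pulse up to and including the one holding the first valid frame.
fn delta_simplified(delta: usize, pulse: usize, overlap: usize) -> usize {
    delta + overlap * (delta / pulse + 1)
}

fn main() {
    // First frame in pulse #2, subframe #3, with pulse = 5 and overlap = 2.
    let (pulse, overlap) = (5, 2);
    let delta = 2 * pulse + 3; // 13
    let d = delta_by_pulse(delta, pulse, overlap);
    println!("∆ = {}, Delay reports ∂ + overlap = {}", d, delta + overlap);

    // Both forms agree on a grid of small values.
    for p in 1..8 {
        for o in 0..6 {
            for dl in 0..40 {
                assert_eq!(delta_by_pulse(dl, p, o), delta_simplified(dl, p, o));
            }
        }
    }
}
```

For the pulse #2, subframe #3 example with pulse = 5 and overlap = 2 (so ∂ = 13), this gives ∆ = 19, while Delay reports ∂ + overlap = 15.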

On top of that, how does the convolution compute its actual delay? Delay semantics would suggest
the convolution computes its final delay based on ∆. But at this stage, it just forwards the incoming delay
(which is locally wrong, but globally correct). Basically, we are ignoring the actual ∆ value and
hijacking the delay from the intermediary overlapping stream to inform the Conv operator about its
actual delay...

Also, what is the first "valid" frame if the overlap is bigger than the pulse? In that case, the
delay+overlap output could cover the input start in more than one pulse. Is it the first frame beyond
which there is no invalid input anymore? This wire is only here to support the convolution. The
convolution delay is what we must get right, and as long as there is invalid data in subsequent
frames, the convolution output is invalid, so it does not matter that we may have observed a couple
of valid frames.

So why worry about ∆? Because of padding.

A simple generic approach to padding is to treat it separately, replacing
Conv(padding=same) with Padding(padding = 1,1) followed by Conv(valid):
--[∂]--> Delay(delay = 1)
-[∂+1]-> PulsePad(before=1) <- adds one 0 to the left of the stream, so it's valid earlier
--[∂]--> Delay(overlap = 2)
--[∆]--> Conv
-[∂+2]->
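The ∂ annotations on this diagram can be reproduced with a small Rust model. The `PulsedOp` enum and `propagate` function are hypothetical, not tract's API; they assume Delay reports its input delay plus `delay + overlap` (∂+overlap as described above, rather than ∆), PulsePad removes `before` frames of delay, and Conv forwards its input delay.

```rust
/// Hypothetical model of the pulsed ops in the chain above (not tract's types).
#[derive(Debug, Clone, Copy)]
enum PulsedOp {
    /// Delays the stream by `delay` frames and prepends `overlap` history frames.
    Delay { delay: usize, overlap: usize },
    /// Inserts `before` constant frames ahead of the stream start.
    PulsePad { before: usize },
    /// Valid convolution: forwards the incoming delay as is.
    Conv,
}

/// Delay reported after each op: Delay adds `delay + overlap` (∂ + overlap,
/// not ∆), PulsePad removes `before` frames of delay, Conv forwards.
fn propagate(input_delay: usize, ops: &[PulsedOp]) -> Vec<usize> {
    let mut d = input_delay;
    ops.iter()
        .map(|op| {
            d = match *op {
                PulsedOp::Delay { delay, overlap } => d + delay + overlap,
                PulsedOp::PulsePad { before } => d
                    .checked_sub(before)
                    .expect("PulsePad needs at least `before` frames of upstream delay"),
                PulsedOp::Conv => d,
            };
            d
        })
        .collect()
}

fn main() {
    // Conv(padding=same), kernel 3, rewritten as in the diagram, with ∂ = 0.
    let same = [
        PulsedOp::Delay { delay: 1, overlap: 0 }, // delay-for-pad
        PulsedOp::PulsePad { before: 1 },
        PulsedOp::Delay { delay: 0, overlap: 2 },
        PulsedOp::Conv,
    ];
    // Expected to follow the diagram: ∂+1, ∂, ∂+2, ∂+2 with ∂ = 0.
    println!("{:?}", propagate(0, &same));
}
```

With ∂ = 0 this yields 1, 0, 2, 2, the same ∂ values as the SAME dump in the next comment.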

kali (Collaborator, Author) commented Oct 14, 2022

Side effect of this approach to padding: it adds extra delay to the output. With a "SAME" convolution, it looks like this:

┏ 0 PulsedSource input
┃   ━━━ 1,1,5,F32 [pulse axis:2 ∂:0 full dim:S]
┣ 1 Delay output.delay-for-pad
┃   * axis: 2 delay: 1 overlap: 0
┃   * buffer: [Val(1), Val(1), Val(1)] F32
┃   ━━━ 1,1,5,F32 [pulse axis:2 ∂:1 full dim:S]
┣ 2 PulsePad output.pad
┃   * Mode: Constant(,F32 0), axis: 2 before: 1 after: 1
┃   ━━━ 1,1,5,F32 [pulse axis:2 ∂:0 full dim:S+2]
┣ 3 Delay output.delay
┃   * axis: 2 delay: 0 overlap: 2
┃   * buffer: [Val(1), Val(1), Val(2)] F32
┃   ━━━ 1,1,7,F32 [pulse axis:2 ∂:2 full dim:S+2]
┣ 4 ConvUnary output
    * Data format: NCHW
    * Kernel shape:[3] (strides:None, padding:Explicit([0], [0], false), dilations:None)
    * Kernel OIHW (groups:1), 1,1,3,F32 1, 2, 3
    ━━━ 1,1,5,F32 [pulse axis:2 ∂:2 full dim:S]

The convolution is delayed by 2, while the first valid output frame only needs two frames of input, so it could be computed with a delay of only 1.
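The "delay of 1" figure follows from the kernel geometry. A minimal Rust sketch, assuming stride 1 and dilation 1; `min_conv_delay` is a hypothetical helper, not part of tract:

```rust
/// Smallest extra delay a convolution needs before its first output frame,
/// assuming stride 1, dilation 1 and a non-empty kernel (hypothetical helper).
/// The first output covers padded frames 0..kernel; the last of those is real
/// input frame `kernel - 1 - before`, or a padding frame if that is negative.
fn min_conv_delay(kernel: usize, before: usize) -> usize {
    (kernel - 1).saturating_sub(before)
}

fn main() {
    let k = 3;
    println!("valid:  {}", min_conv_delay(k, 0)); // [x0, x1, x2]: needs frames 0..=2
    println!("same:   {}", min_conv_delay(k, 1)); // [0, x0, x1]: needs frames 0..=1
    println!("causal: {}", min_conv_delay(k, 2)); // [0, 0, x0]: needs frame 0 only
}
```

With kernel 3, this gives 2 for a valid convolution, 1 for SAME and 0 when all the padding is on the left, whereas the SAME dump above ends with a delay of 2.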

And with a "padded causal" convolution, we are actually making it worse. In this case, all the padding goes to the left, so the first convolution is computed on [0, 0, first frame].

┏ 0 PulsedSource input
┃   ━━━ 1,1,5,F32 [pulse axis:2 ∂:0 full dim:S]
┣ 1 Delay output.delay-for-pad
┃   * axis: 2 delay: 2 overlap: 0
┃   * buffer: [Val(1), Val(1), Val(2)] F32
┃   ━━━ 1,1,5,F32 [pulse axis:2 ∂:2 full dim:S]
┣ 2 PulsePad output.pad
┃   * Mode: Constant(,F32 0), axis: 2 before: 2 after: 0
┃   ━━━ 1,1,5,F32 [pulse axis:2 ∂:0 full dim:S+2]
┣ 3 Delay output.delay
┃   * axis: 2 delay: 0 overlap: 2
┃   * buffer: [Val(1), Val(1), Val(2)] F32
┃   ━━━ 1,1,7,F32 [pulse axis:2 ∂:2 full dim:S+2]
┣ 4 ConvUnary output
    * Data format: NCHW
    * Kernel shape:[3] (strides:None, padding:Explicit([0], [0], false), dilations:None)
    * Kernel OIHW (groups:1), 1,1,3,F32 1, 2, 3
    ━━━ 1,1,5,F32 [pulse axis:2 ∂:2 full dim:S]

We still generate a delay of 2, while the first output frame can actually be computed in the very first pulse, without any additional delay.
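To see that no extra delay is needed in the causal case, here is a pulse-by-pulse simulation in Rust. It is a sketch under stated assumptions (one channel, stride 1, dilation 1, zero-initialized history), and `causal_conv_pulsed` is a hypothetical function, not tract's pulsed convolution. The zero-initialized overlap buffer plays the role of the causal left padding, so the first output frame comes out of the first pulse.

```rust
/// Hypothetical sketch: runs a causal convolution (all padding on the left)
/// over `input` in pulses of `pulse` frames, one channel, stride 1, dilation 1.
/// The `kernel.len() - 1` overlap frames start as zeros, which is exactly the
/// causal left padding, so no delay-for-pad is needed. Assumes a non-empty kernel.
fn causal_conv_pulsed(input: &[f32], kernel: &[f32], pulse: usize) -> Vec<Vec<f32>> {
    let overlap = kernel.len() - 1;
    let mut history = vec![0.0f32; overlap];
    let mut outputs = Vec::new();
    for chunk in input.chunks(pulse) {
        // Window = `overlap` history frames followed by the fresh pulse.
        let mut window = history.clone();
        window.extend_from_slice(chunk);
        // Cross-correlation, as in NN convolutions: one output per fresh frame.
        let out: Vec<f32> = window
            .windows(kernel.len())
            .map(|w| w.iter().zip(kernel).map(|(x, k)| x * k).sum::<f32>())
            .collect();
        outputs.push(out);
        // Keep the last `overlap` frames as history for the next pulse.
        history = window[window.len() - overlap..].to_vec();
    }
    outputs
}

fn main() {
    let input: Vec<f32> = (1..=10).map(|i| i as f32).collect();
    let kernel = [1.0f32, 2.0, 3.0]; // same kernel as in the dumps
    let pulses = causal_conv_pulsed(&input, &kernel, 5);
    // Output frame 0 is 3 * x0, available in the very first pulse.
    println!("{:?}", pulses);
}
```

With input 1..=10, the kernel 1, 2, 3 from the dumps and a pulse of 5, the first pulse yields [3.0, 8.0, 14.0, 20.0, 26.0]: output frame 0 is 3 * x0, with no delay.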

kali (Collaborator, Author) commented Oct 14, 2022

Strategy for delay-less translation:

Conv(padding=causal)
--[∂]--> Delay(overlap = 2)
--[?]--> PulsePad(before=2)
--[?]--> Conv
-[∂+2]->

Should Delay start reporting something obviously invalid as its delay when there is overlap? And should the convolution "know" its output delay and just ignore the input delay?
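One way to read these two questions, sketched in Rust: the overlapping Delay reports an explicitly undefined delay (`None`), and the convolution computes its output delay itself from the stream upstream of that Delay plus its own kernel and padding. `StreamFact`, `delay_fact` and `conv_fact` are hypothetical names, and the Conv is assumed to have access to the fact upstream of the Delay.

```rust
/// Hypothetical stream fact: `delay: None` is the "obviously invalid" marker
/// a Delay with overlap could report (illustration only, not tract's API).
#[derive(Clone, Copy, Debug, PartialEq)]
struct StreamFact {
    delay: Option<usize>,
}

/// Delay: a plain delay shifts the stream; with overlap, refuse to report one.
fn delay_fact(input: StreamFact, delay: usize, overlap: usize) -> StreamFact {
    if overlap > 0 {
        StreamFact { delay: None }
    } else {
        StreamFact { delay: input.delay.map(|d| d + delay) }
    }
}

/// Conv ignores the delay of its (overlapped) input and derives its output
/// delay from the stream before the overlapping Delay plus its own geometry:
/// the first output needs real frames up to `kernel - 1 - before`.
/// Assumes stride 1, dilation 1 and a non-empty kernel.
fn conv_fact(before_overlap: StreamFact, kernel: usize, before: usize) -> StreamFact {
    StreamFact {
        delay: before_overlap
            .delay
            .map(|d| d + (kernel - 1).saturating_sub(before)),
    }
}

fn main() {
    let source = StreamFact { delay: Some(0) };
    let overlapped = delay_fact(source, 0, 2);
    assert_eq!(overlapped.delay, None); // nothing downstream can misuse it
    // Valid convolution, kernel 3, no padding.
    println!("{:?}", conv_fact(source, 3, 0));
}
```

For a valid convolution with kernel 3, this yields ∂ + 2, the same as the first diagram; a SAME convolution would get ∂ + 1.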

kali marked this pull request as ready for review October 14, 2022 22:14
kali merged commit ccd1947 into main Oct 15, 2022
kali deleted the delay-less-conv-pulsification branch November 30, 2022 09:43

Labels

None yet

Projects

None yet


1 participant