
About inference in real-time #3

Closed
liamsun2019 opened this issue Apr 6, 2022 · 10 comments

@liamsun2019

Hi author,

Thanks for your excellent work. I did some training and tests based on your paper and code, and the results are good. I am now curious about real-time inference. My intention is to estimate the 3D coordinates while a video is playing back. According to your strategy and demo code, estimating a center frame requires the 2D poses both before and after it, which means the 3D pose of a given frame cannot be obtained until the 2D poses after it have been computed. For real-time inference, however, the 2D pose sequence after a certain frame is not yet available during playback.

I am now in a dilemma. I already have a 2D pose estimator that achieves a good balance between accuracy and speed, even after quantization and deployment on a mobile device. My plan is to combine it with P-STMO to build a real-time 3D pose estimator, i.e., first get the 2D poses and then recover the 3D pose. Actually, I am a little confused about the training strategy. My understanding is that the frames before the current frame should be enough for prediction, so why are the frames after the current one also collected for training? That is what I see in your training code. My naive idea is to use only the preceding frames as the input sequence for inference, excluding the following frames. I'd appreciate your comments, big thanks.

@paTRICK-swk
Owner

Frames after the current one are used to maintain the continuity of the movement. For real-time inference, you can replace the symmetric convolutions in MOFA with causal convolutions, as used in this paper.
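
(For readers landing here: a minimal PyTorch sketch of what a causal 1D convolution looks like as a drop-in for a symmetric Conv1d; the class name and defaults are illustrative, not part of this repo.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D convolution whose output at frame t depends only on frames <= t.

    Instead of padding symmetrically on both sides, all padding goes on
    the left, so no future frames are needed at inference time.
    """
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        # Left padding that keeps output length equal to input length.
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              dilation=dilation, padding=0)

    def forward(self, x):
        # x: (batch, channels, frames)
        x = F.pad(x, (self.left_pad, 0))  # pad the past only, never the future
        return self.conv(x)

x = torch.randn(1, 256, 243)                  # batch, channels, 243 frames
y = CausalConv1d(256, 256, kernel_size=3)(x)
print(y.shape)                                # torch.Size([1, 256, 243])
```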

@liamsun2019
Author

Thanks for your prompt reply. My understanding is that strided_transformer_encoder.py is supposed to be the MOFA module. But for Conv1d, I don't see any logic for dilation/padding/kernel_size in this module; it looks like the 'dilation' argument is always left at its default, i.e., 1. Could you explain the replacement with causal convolution in more detail?

On the other hand, I ran some tests where only left_padding is applied to the input training sequence, i.e., right_padding is always 0. The resulting model also achieves good accuracy on Human3.6M.
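
(A minimal sketch of this left-only padding, assuming a 2D keypoint sequence of shape (frames, joints, 2); left_padding/right_padding are the names used in the comment above, not variables in the released code.)

```python
import numpy as np

def pad_left_only(keypoints_2d, receptive_field=243):
    """Pad a 2D pose sequence so every frame has a full causal window.

    keypoints_2d: (frames, joints, 2). The left side is padded by
    repeating the first frame; right_padding is always 0, so no
    future frames are ever required.
    """
    left_padding = receptive_field - 1
    right_padding = 0
    return np.pad(keypoints_2d,
                  ((left_padding, right_padding), (0, 0), (0, 0)),
                  mode='edge')

seq = np.random.randn(100, 17, 2)   # 100 frames, 17 joints
print(pad_left_only(seq).shape)     # (342, 17, 2)
```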

@paTRICK-swk
Owner

For more details about causal convolution, please refer to Figure 6 of the paper I mentioned. It performs 1D convolutions only on the frames up to the current one. This approach is essentially the same as your implementation (right_padding=0).

@liamsun2019
Copy link
Author

Appreciate your help. Will ask you for advice in case of further questions. This issue could be closed.


@vicentowang

vicentowang commented Jul 21, 2022

> Frames after the current one are used to maintain the continuity of the movement. For real-time inference, you can replace the symmetric convolutions in MOFA with causal convolutions, as used in this paper.

@paTRICK-swk @liamsun2019
I just changed the pad parameter from 121 to 243, which means the many-to-one frame aggregator will target the last frame. Is that the right way to achieve real-time 3D pose estimation?

@paTRICK-swk
Owner

> I just changed the pad parameter from 121 to 243, which means the many-to-one frame aggregator will target the last frame. Is that the right way to achieve real-time 3D pose estimation?

No, changing the parameter from 121 to 243 will still pad both the left and right sides of the current frame. You need to modify the convolution itself to achieve real-time 3D pose estimation. You can refer to this repo for causal convolutions.
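
(To make the distinction concrete, a small sketch of which frames each scheme reads when predicting frame t; receptive_field=243 follows the thread, and the function names are made up for illustration.)

```python
def symmetric_window(t, receptive_field=243):
    """Frames read to predict frame t with symmetric padding (121 per side)."""
    half = receptive_field // 2                   # 121
    return range(t - half, t + half + 1)          # includes t+1 .. t+121

def causal_window(t, receptive_field=243):
    """Frames read to predict frame t with causal convolutions."""
    return range(t - receptive_field + 1, t + 1)  # past frames only

print(max(symmetric_window(500)))  # 621: needs 121 future frames -> not real-time
print(max(causal_window(500)))     # 500: no future frames -> real-time capable
```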

@vicentowang

vicentowang commented Aug 16, 2022

> No, changing the parameter from 121 to 243 will still pad both the left and right sides of the current frame. You need to modify the convolution itself to achieve real-time 3D pose estimation. You can refer to this repo for causal convolutions.

I mean changing the pad parameter from 121 to 243 so that the supervised frame becomes the last frame, i.e., the network learns to predict the pose of the last frame. In that case, symmetric versus causal convolutions may make little difference.

@Edu4444

Edu4444 commented Sep 19, 2022

> My understanding is that strided_transformer_encoder.py is supposed to be the MOFA module. But for Conv1d, I don't see any logic for dilation/padding/kernel_size in this module; it looks like the 'dilation' argument is always left at its default, i.e., 1. Could you explain the replacement with causal convolution in more detail?
>
> On the other hand, I ran some tests where only left_padding is applied to the input training sequence, i.e., right_padding is always 0. The resulting model also achieves good accuracy on Human3.6M.

> For more details about causal convolution, please refer to Figure 6 of the paper I mentioned. It performs 1D convolutions only on the frames up to the current one. This approach is essentially the same as your implementation (right_padding=0).

Hello. I'm also interested in causal convolutions for real-time processing, but I can't find left_padding or right_padding in the code. Where are these variables?

@noahcoolboy

Have you been successful at training the causal model? It would save me quite some money if it has already been trained.
