
Clarification on Pose to Body #8

Open
cuuupid opened this issue Aug 21, 2018 · 12 comments

@cuuupid

cuuupid commented Aug 21, 2018

I have a few questions regarding the pose --> body task. From the paper,

Dance video dataset. We download YouTube dance videos for the pose to human motion
synthesis task. Each video is about 3 ∼ 4 minutes at 1280 × 720 resolution, and we crop the
central 512×720 regions. We extract human poses with the DensePose [21] and the OpenPose [5]
algorithms, and directly concatenate the results together. The training set includes a dance video
from a single dancer, while the test set contains videos of other dance motions or from other
dancers.

  1. By "directly concatenate" do you also layer the DensePose UV pose on the OpenPose color-coded pose? Or just use one of the two, and if so which performs better?

  2. The provided example shows a still background behind the dancer. If I understand correctly, would a video with a changing background or multiple people cause issues? e.g. the demonstrations for DensePose and OpenPose include a fast-paced multi-person dance video. However, the poses generated there are all more or less synchronized, and the background is not encoded in any way. Would the model generate mostly noise for the background in this case, and would it be able to synthesize human bodies from multiple pose estimations?

  3. How does the test set perform against other dancers? Do changes such as height, limb length, etc. cause issues in generation?

  4. Lastly, is there a dockerized version of this repository available? Alternatively, would this model compile inside the Flownet2 Docker container?

Thanks!

@tcwang0509
Contributor

  1. Yes, we concatenate both of them together. If you can only use one, DensePose works better; we added OpenPose mainly for the face and hand regions.
  2. A changing background might be an issue, as the network would get confused. An easier way might be to just mask out the background (see the sketch after this list), or, as you mentioned, encode the background into another vector. Multiple people still kind of works in our experiments, but usually not as well as for a single person.
  3. Small changes are fine. If the changes are large, there will be some quality degradation.
  4. We currently don't have a Docker image yet. But since the code only depends on PyTorch (except for the FlowNet2 part), getting it running shouldn't be hard. Let me know if you run into any problems.
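
For point 2, a minimal sketch of the background-masking idea, assuming you already have a per-frame person segmentation mask (e.g. derived from the DensePose output); the function and names are illustrative, not part of this repo:

import numpy as np

def mask_background(frame, person_mask, fill_value=0):
    """Keep only the person region of a frame.

    frame:       H x W x 3 uint8 image
    person_mask: H x W array, nonzero where a person is detected
                 (e.g. from a DensePose part-index map)
    """
    keep = (person_mask > 0)[..., None]                 # H x W x 1 boolean
    return np.where(keep, frame, fill_value).astype(frame.dtype)

Applying this to every training frame would spare the network from having to model a moving background.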

@Tetsujinfr

Tetsujinfr commented Aug 23, 2018

Can you please elaborate on the steps of the workflow for the pose-to-body application?
I am not clear about the training data format, e.g. do you take DensePose/OpenPose binary masks? Do you use gradient masks?
When you say "concatenate" for DensePose and OpenPose, does it mean you take the union of the masks?
What about the target video: do you inject some style image on top of the pose data to generate the target video? I am not clear on the nature of the data for training or for inference. Could you clarify a bit or point me to materials with the details (I could not find many details in the paper)?
Thank you

@tcwang0509
Contributor

Both DensePose and OpenPose will generate a (3-channel) color image out of the box. I simply concatenated these two images together, forming a 6-channel input to the network.
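
In case it is useful, a minimal sketch of that concatenation; the file names are illustrative, and it assumes both renderings were saved at the same resolution for each frame:

import numpy as np
from PIL import Image

# Load the two 3-channel pose renderings for the same frame
densepose = np.asarray(Image.open('frame_0001_densepose.png').convert('RGB'))
openpose  = np.asarray(Image.open('frame_0001_openpose.png').convert('RGB'))

# Stack along the channel axis: H x W x 3 + H x W x 3 -> H x W x 6
pose_input = np.concatenate([densepose, openpose], axis=-1)
assert pose_input.shape[-1] == 6   # the 6-channel network input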

@dustinfreeman

dustinfreeman commented Aug 28, 2018

@pshah123 I added a functioning Docker as a pull request at #23
Let me know if you find any issues!

@cuuupid
Author

cuuupid commented Sep 17, 2018

@dustinfreeman haven't been able to get either to work for Pose2Body, getting CUDA issues as described in #32

@tsing90

tsing90 commented Oct 13, 2018

@tcwang0509 Hi, this is amazing work. Regarding the pose-to-body task: during my training I found the dancer's face was the most difficult part to generate with your model. Is there any trick you applied to optimize the face specifically, like the one used in the paper "Everybody Dance Now"?
One more question regarding the pose-to-body task: when doing inference with a well-trained model, does it apply a pose transform (e.g. adjusting the limb ratio between the source and target person) to the extracted pose?
Thanks!

@tcwang0509
Contributor

Regarding the face, do you observe quality degradation on training or test images? If it's training, there's an additional face discriminator (enabled by --add_face_disc) whose weight you can increase (face_weight in vid2vid_model_D.py). If it's test, try --remove_face_labels so the network relies less on the face labels during training.
During inference, currently no transformation is done. There is a normalize_pose function in pose_dataset.py, which only normalizes the overall size and position of the person.
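
For intuition only, a rough sketch of what normalizing overall size and position could look like on 2D keypoints; this is an illustration under my own assumptions, not the actual normalize_pose code from pose_dataset.py:

import numpy as np

def normalize_size_and_position(src_pts, tgt_pts):
    """Scale and shift source keypoints (N x 2 array of x, y) so that
    their overall height and center match the target person's.
    Illustrative only; the repo's normalize_pose may differ in detail."""
    src_h = src_pts[:, 1].max() - src_pts[:, 1].min()    # source body height
    tgt_h = tgt_pts[:, 1].max() - tgt_pts[:, 1].min()    # target body height
    scaled = src_pts * (tgt_h / src_h)                   # match overall size
    offset = tgt_pts.mean(axis=0) - scaled.mean(axis=0)  # match position
    return scaled + offset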

@tsing90

tsing90 commented Oct 23, 2018

@tcwang0509 Many thanks for your suggestions, very useful. It's much better now.

@wswdx

wswdx commented Oct 25, 2018

Regarding the face, do you observe quality degradation on training or test images? If it's training, there's an additional face discriminator (enabled by --add_face_disc) whose weight you can increase (face_weight in vid2vid_model_D.py). If it's test, try --remove_face_labels so the network relies less on the face labels during training.
During inference, currently no transformation is done. There is a normalize_pose function in pose_dataset.py, which only normalizes the overall size and position of the person.

Excuse me, when I use the option --add_face_disc, I get an error:

Traceback (most recent call last):
  File "train.py", line 295, in <module>
    train()
  File "train.py", line 115, in train
    losses = modelD(0, reshape([real_B, fake_B, fake_B_raw, real_A, real_B_prev, fake_B_prev, flow, weight, flow_ref, conf_ref]))
  File "/home/weidx/anaconda3/envs/pytorchgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/weidx/anaconda3/envs/pytorchgan/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/weidx/anaconda3/envs/pytorchgan/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/weidx/anaconda3/envs/pytorchgan/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/weidx/anaconda3/envs/pytorchgan/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/weidx/anaconda3/envs/pytorchgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/weidx/vid2vid/models/vid2vid_model_D.py", line 202, in forward
    real_A[:,:,ys:ye,xs:xe], real_B[:,:,ys:ye,xs:xe], fake_B[:,:,ys:ye,xs:xe])
  File "/data/weidx/vid2vid/models/vid2vid_model_D.py", line 102, in compute_loss_D
    loss_D_real = self.criterionGAN(pred_real, True)
  File "/data/weidx/vid2vid/models/networks.py", line 731, in __call__
    if isinstance(input[0], list):
IndexError: list index out of range

Do you know why and how to fix it? Thanks in advance~

@wswdx

wswdx commented Oct 25, 2018


Good news! I've just found the solution. The reason is that when you add the face discriminator, its number of patch scales is set to "num_D - 2" in the code, at line 40 of "vid2vid_model_D.py":

self.netD_f = networks.define_D(netD_input_nc, opt.ndf, opt.n_layers_D, opt.norm,
                                opt.num_D - 2, not opt.no_ganFeat, gpu_ids=self.gpu_ids)

"num_D" is set to 2 in the pose2body scripts, so the face discriminator's number of patch scales becomes 0, which is invalid and causes the above error. The fix is to change "opt.num_D - 2" to a fixed positive number (e.g. 1).
Hope this answer helps! I wish the contributors would update the code to fix this error. @tcwang0509
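
One possible patch, a sketch of the fix described above rather than an official change, is to clamp the scale count so it never drops below one:

# models/vid2vid_model_D.py, around line 40 (illustrative patch, not merged upstream)
# Clamp the face discriminator's patch scales to at least 1 so that
# --num_D 2 (as in the pose2body scripts) no longer yields 0 scales.
self.netD_f = networks.define_D(netD_input_nc, opt.ndf, opt.n_layers_D, opt.norm,
                                max(opt.num_D - 2, 1), not opt.no_ganFeat, gpu_ids=self.gpu_ids)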

@jakeelwes

jakeelwes commented Mar 1, 2020

@pshah123 with regard to your questions about multiple pose estimations, did you work this out, and did you find a way to control it?

I have a model containing multiple people and it would be great to be able to interpolate between them; I wondered if you had any ideas? Maybe some sort of z-vector to control the generator when testing; at the moment it just goes to whichever training person was closest to the DensePose/OpenPose data.

With the paper you released a video showing multiple outputs for face-to-edge; are these contained within one model (or 3 different models), and if so, how do you control which one it generates?
https://youtu.be/LivIo2mB-gA

@FangSen9000

FangSen9000 commented May 15, 2023

@dustinfreeman @tcwang0509 @cuuupid @jakeelwes @tsing90
Hello everyone. I ran into the following problem when training pose2body on the sample small dataset. When I type the command below, it creates the dataset, initializes the neural network and creates the folders, and then nothing happens. The dataset is the provided example, so the dataset itself is not the problem.
I have two RTX 3090s and use multi-GPU training. --input_nc is 3 because I only use OpenPose, and of course I also pass the related flags; setting --num_D to 1 or 2 makes no difference, and the error is the same.
I have configured flownet2 and tested it with main.py, which works normally. The numpy version is appropriate: if version 1.2 or higher is used, an error is reported that xx should be a floating point number.
I think it may be a command issue, or I have overlooked something. Can someone give some advice? Anything is fine.

vid2vid# python train.py --name pose2body_256p \
    --dataroot datasets/pose --dataset_mode pose \
    --input_nc 3 --num_D 2 \
    --resize_or_crop randomScaleHeight_and_scaledCrop --loadSize 384 --fineSize 256 \
    --gpu_ids 0,1 --batchSize 8 --max_frames_per_gpu 3 \
    --niter 5 --niter_decay 5 \
    --no_first_img --n_frames_total 12 --max_t_step 4

This thread is about pose2body, so I hope it is appropriate to post this here.
