
Additional training for Motion Imitation #21

Closed

ryo12882 opened this issue Oct 4, 2019 · 4 comments

Comments

ryo12882 commented Oct 4, 2019

Hi, thank you for your awesome work.

By the way, I tried to transfer my own appearance onto other target images.
Basically it works, but the head doesn't look like me: my hairstyle and face are not reflected in the result.
I assume this happens because of the pre-trained model.
I looked at the datasets and noticed that most of the people have short black hair.
What do you think? And if so, how do I train on more data?

I also tried it with the fashion model you provided, and that works well!!
I'm wondering what's going on.

Thanks in advance.

StevenLiuWen (Collaborator) commented Oct 4, 2019

@ryo12882 Hi, your guess is right. Our dataset (iPER, which focuses on video) is not very big (though we have tried to make it larger): the training set contains around 20 different people wearing 82 outfits with different textures, and all (or most) of them have short black hair. So when the model is trained on our iPER dataset, the results tend to reproduce the training patterns (face, hair, and style of clothes).

The Fashion dataset has more variety in clothing and hair styles than our iPER dataset. However, it only provides paired images of the same person from different views (not videos). So the best model is obtained by training on the two datasets combined, as sketched below.
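
To make "combining these two datasets" concrete, here is a minimal, hypothetical PyTorch sketch. The random tensors stand in for real iPER video frames and Fashion image pairs; the actual impersonator training code had not been released when this thread was written, so none of the names below are its API.

```python
# Hedged sketch only: "combining" the two datasets can be as simple as
# concatenating them so every shuffled batch mixes iPER video frames
# with Fashion image pairs.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def fake_triplets(n):
    # Stand-in for (source image, target pose map, target image) triplets.
    return TensorDataset(torch.rand(n, 3, 64, 64),
                         torch.rand(n, 3, 64, 64),
                         torch.rand(n, 3, 64, 64))

iper = fake_triplets(200)     # would be frames sampled from iPER videos
fashion = fake_triplets(100)  # would be paired views from Fashion

# Shuffling over the concatenation mixes both sources in every batch.
loader = DataLoader(ConcatDataset([iper, fashion]), batch_size=8, shuffle=True)

src, pose, tgt = next(iter(loader))
print(src.shape)  # torch.Size([8, 3, 64, 64])
```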

There are two ways to improve the results:

  1. Improve the generalization of the model.
    First, collect as many videos of people dancing as you can, from the Internet or elsewhere, and do some data cleaning. Then train on your new, larger dataset. (The code for training a model on your own dataset is being cleaned up; once that is done, we will release it as soon as possible.)

  2. Specialize the model to a specific person.
    If you just want the model to work on yourself or a few specific people, you can record a short video in the style of our iPER dataset: the person holds an A-pose and then performs some random motions.
    Then, starting from our pretrained model, fine-tune the networks on your prepared video (a rough sketch follows this list). This can produce higher-fidelity results; the obvious shortcoming is that everyone needs to fine-tune their own model. However, this requirement is reasonable, and the code is under testing and cleaning. Once it is done, we will release all of the mentioned code.
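
For readers wondering what the per-person fine-tuning in point 2 looks like in practice, here is a rough PyTorch sketch. Everything in it is a placeholder (the toy generator, the random stand-in data, the file names), not the impersonator code, and the real objective also includes adversarial and perceptual terms beyond the reconstruction loss shown.

```python
# Hedged sketch of per-person fine-tuning; all names are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the pretrained generator: (source image, target pose
# map) -> synthesized image. The real network is far larger.
class ToyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, src_img, pose_map):
        return self.net(torch.cat([src_img, pose_map], dim=1))

generator = ToyGenerator()
# In practice you would load the released pretrained weights here, e.g.:
# generator.load_state_dict(torch.load("pretrained_iper.pth"))

# Random stand-ins for frames from the user's short A-pose video:
# (source frame, target pose map, target frame), images in [-1, 1].
n = 64
data = TensorDataset(torch.rand(n, 3, 64, 64) * 2 - 1,
                     torch.rand(n, 3, 64, 64),
                     torch.rand(n, 3, 64, 64) * 2 - 1)
loader = DataLoader(data, batch_size=4, shuffle=True)

# Small learning rate: adapt to the new person without forgetting what
# the model learned during pretraining.
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-5)
l1 = nn.L1Loss()

generator.train()
for epoch in range(5):  # a handful of epochs per person is the idea
    for src_img, pose_map, target_img in loader:
        fake = generator(src_img, pose_map)
        loss = l1(fake, target_img)  # reconstruction term only, for brevity
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save(generator.state_dict(), "finetuned_for_one_person.pth")
```

The point is only the shape of the workflow: load pretrained weights, iterate briefly over one person's video, and save a per-person checkpoint.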

ryo12882 (Author) commented Oct 5, 2019

Thank you for your quick response.

I understand everything you explained.
I would like to try both approaches, but especially fine-tuning!

I also have a few questions about the two approaches:

  1. When is your team going to release these?
  2. How much data do you think is required to get reasonable results?
  3. How long a video do you think is needed for fine-tuning?

And finally, I would like to join this project and help with these tasks.

Thank you.

ryo12882 closed this as completed Nov 6, 2019
OldChi commented Dec 2, 2019

> [quoting @StevenLiuWen's reply of Oct 4, 2019 in full]

Could you please explain how to fine-tune the networks when the result is of low quality?

leesky1c commented Aug 5, 2020

> [quoting @StevenLiuWen's reply of Oct 4, 2019 in full]

Hi, would you mind telling us how to train the model with the Fashion dataset?
