
Training with new dataset #10

Closed
MingZJU opened this issue Dec 25, 2021 · 37 comments
Comments


MingZJU commented Dec 25, 2021

Hi, Happy Christmas!

I'd like to train the model with high-quality images (320×320) and need some help.

There is a loss error at line 133 of "color_syncnet_train.py": the input to the loss falls outside (0, 1).

It seems to work when I change the loss function
from
logloss = nn.BCELoss()
to
logloss = nn.BCEWithLogitsLoss()

Is it OK? Do I have to make other changes?

The training has been running for more than 4 days now; the loss is around 0.54 after 940,000 steps.

Thanks in advance.
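For reference, here is a minimal, self-contained sketch of the change described above in a Wav2Lip-style sync loss; the embedding sizes and batch shape are placeholders, not values from the repo:

```python
import torch
import torch.nn as nn

# nn.BCELoss() requires its input to lie strictly inside (0, 1), but cosine
# similarity lies in [-1, 1], which triggers the error mentioned above.
# nn.BCEWithLogitsLoss() applies a sigmoid internally, so it accepts any
# real-valued input.
logloss = nn.BCEWithLogitsLoss()

def cosine_loss(a, v, y):
    d = nn.functional.cosine_similarity(a, v)  # shape (batch,), values in [-1, 1]
    return logloss(d.unsqueeze(1), y)          # y: (batch, 1) sync labels in {0, 1}

a = torch.randn(8, 512)                        # audio embeddings (hypothetical size)
v = torch.randn(8, 512)                        # video embeddings (hypothetical size)
y = torch.randint(0, 2, (8, 1)).float()
print(cosine_loss(a, v, y).item())
```

The sigmoid inside BCEWithLogitsLoss also makes the computation numerically more stable than applying sigmoid and BCELoss separately.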


ghost commented Dec 26, 2021

I've used logloss = nn.BCEWithLogitsLoss() to solve this problem, but then I realized it still tended toward the vanishing-gradient problem. So I decided to apply a Wasserstein distance loss.


MingZJU commented Dec 27, 2021

Many thanks, I'll try this loss.

@ghost ghost closed this as completed Dec 27, 2021

MingZJU commented Mar 30, 2022

I recently trained this model, but the discriminator's eval loss is only 0.30 and cannot go down to ~0.25.
Is that reasonable?
Thanks a lot.


ghost commented Mar 30, 2022

Let me know your batch_size, learning_rate, and your training logs.


ghost commented Mar 30, 2022

Can you share some of your samples?


MingZJU commented Mar 31, 2022

Sorry for the late reply.
Parameters: batch size 128, learning rate 5e-5, AdamW. I have not kept detailed training logs.

One sample is:

en1_88.mp4

It doesn't sync well.


ghost commented Mar 31, 2022

Have you used the Wasserstein distance or BCE loss?
Did you scale up the audio block in the wav2lip model?


MingZJU commented Mar 31, 2022

I used both ReLU and BCELoss. I also tried the Wasserstein distance, but it was even worse.

What do you mean by "scale up the audio block"?


ghost commented Mar 31, 2022

Why don't you use LeakyReLU instead of ReLU? ReLU can lead to the vanishing-gradient ("dying ReLU") problem.
By "scale up the audio block" I mean you need to add more hidden layers.
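For illustration, here is what that swap can look like in a convolutional block; the block layout below is an assumption modeled on a typical Wav2Lip-style Conv2d wrapper, not the repository's exact code:

```python
import torch
import torch.nn as nn

class Conv2d(nn.Module):
    """Conv + BatchNorm block with the activation swapped to LeakyReLU."""
    def __init__(self, cin, cout, kernel_size, stride, padding):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.Conv2d(cin, cout, kernel_size, stride, padding),
            nn.BatchNorm2d(cout),
        )
        # LeakyReLU keeps a small gradient (slope 0.01 by default) for
        # negative inputs, mitigating the dying-ReLU issue mentioned above.
        self.act = nn.LeakyReLU(0.01, inplace=True)

    def forward(self, x):
        return self.act(self.conv_block(x))

block = Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
out = block(torch.randn(2, 3, 96, 96))
print(out.shape)
```

The negative slope is a tunable hyperparameter; values around 0.01–0.2 are common.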


MingZJU commented Mar 31, 2022

When I used LeakyReLU, the loss did not converge, so I tried ReLU.

Many thanks, I will scale up the audio block.


ghost commented Apr 1, 2022

OK. If you need more help, you can ask in this thread.


MingZJU commented Apr 4, 2022

Can you tell me how to compute the Wasserstein distance in syncnet? My current attempt:

def cosine_loss(a, v, y):
    d = nn.functional.cosine_similarity(a, v)
    loss = torch.mean(torch.squeeze(y)) - torch.mean(d)
    return loss

There are other variants, like torch.mean(y * d), or using scipy.stats.wasserstein_distance, but none of them seem right.
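For reference, one common WGAN-style surrogate treats the similarity score as a critic output and separates in-sync from off-sync pairs. This is only a sketch of the idea, not the loss anyone in this thread settled on; a full WGAN setup also needs a Lipschitz constraint (weight clipping or a gradient penalty):

```python
import torch

# Critic-style loss: maximize the margin between in-sync (y=1) and
# off-sync (y=0) scores. Assumes every batch contains both classes,
# otherwise an empty mean() returns NaN.
def wasserstein_loss(d, y):
    y = torch.squeeze(y)
    pos = d[y == 1]                  # scores for in-sync pairs
    neg = d[y == 0]                  # scores for off-sync pairs
    return neg.mean() - pos.mean()   # more negative = better separation

d = torch.tensor([0.9, 0.8, -0.2, 0.1])   # e.g. cosine similarities
y = torch.tensor([1.0, 1.0, 0.0, 0.0])
print(wasserstein_loss(d, y).item())       # ≈ -0.9
```

Note this loss is unbounded below, so it cannot be compared directly to the ~0.25 BCE targets discussed elsewhere in this thread.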


ghost commented Apr 4, 2022

You need to understand the theory and implement it from scratch.


ghost commented Apr 27, 2022

Hi, what is your infrastructure? 10 GPUs? Tesla V100?


MingZJU commented Apr 28, 2022

Now I use 4 × V100.


ghost commented Apr 28, 2022

With batch size 128, how much memory does it take up?


MingZJU commented Apr 29, 2022

With batch size 64, it takes less than 13 GB.
You cannot use 128 unless you enable Automatic Mixed Precision or your V100 has 32 GB of memory.

My V100s are 16 GB.
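For reference, a minimal sketch of the Automatic Mixed Precision pattern mentioned above; the linear model, shapes, and hyperparameters are placeholders, not the syncnet:

```python
import torch
import torch.nn as nn

# AMP roughly halves activation memory, which is what can make batch_size
# 128 fit on a 16 GB V100. This sketch falls back to fp32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = nn.Linear(512, 1).to(device)          # stand-in for the syncnet
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(128, 512, device=device)
y = torch.randint(0, 2, (128, 1), device=device).float()

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):
    loss = criterion(model(x), y)
scaler.scale(loss).backward()    # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)           # unscales gradients, then steps the optimizer
scaler.update()
print(loss.item())
```

The same three scaler calls wrap any existing training step; nothing else in the loop has to change.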


ghost commented May 4, 2022

With color_syncnet_train.py, I trained with batch_size 128 and lr=1e-5, and it has converged to 0.23.
You need to use syncnet to evaluate your dataset; the score needs to be in the range [-5, 5].


ghost commented May 4, 2022

I'm checking my Wasserstein distance loss.


MingZJU commented May 5, 2022

Thanks.
I do use syncnet to correct my datasets.
The loss is only 0.28; I guess my Wasserstein loss is still not right.


ghost commented May 5, 2022

can you show me your Wasserstein loss?


MingZJU commented May 6, 2022

I followed this blog.
The loss is very simple: torch.mean(torch.mul(y_true, y_pred)), which is not right.
I may reimplement it from scratch in my spare time.

This example was generated using BCELoss and ReLU.

result_voice.mp4

@ghost ghost mentioned this issue May 27, 2022

ghost commented May 27, 2022

Good job, bro :) I can see your result is very good.

@NikitaKononov

Many thanks, I will scale up audio block.

Hello, can you please suggest how to scale up the audio block, and does it make sense?

@NikitaKononov

why don't you use leaky instead of relu?

Hello :)
I can see the PReLU activation function in Conv2d and Conv2dTranspose.
LeakyReLU is in nonorm_Conv2d.
The original implementation uses ReLU.
Where should I place LeakyReLU? In all blocks?

@ZestfulCitrus

I followed this blog. The loss is very simple: torch.mean(torch.mul(y_true, y_pred)), which is not right. I may reimplement it from scratch in my spare time.

This example was generated using BCELoss and ReLU.

result_voice.mp4

I would like to buy this model. How much is it? Contact me at 1243137612@qq.com @MingZJU


Curisan commented Jan 29, 2023

I followed this blog. The loss is very simple: torch.mean(torch.mul(y_true, y_pred)), which is not right. I may reimplement it from scratch in my spare time.

This example was generated using BCELoss and ReLU.

result_voice.mp4

Did you train the model using the AVSpeech dataset? @MingZJU


MingZJU commented Jan 29, 2023 via email


Curisan commented Jan 30, 2023

Thanks. I use the AVSpeech dataset, and I did the following:

  1. chose the 25 fps videos
  2. used syncnet_python to filter the dataset to the range [-1, 1]

Finally I used 32,000 videos to train the syncnet. I also use BCELoss and ReLU, but the loss only drops to 0.5. Can you give me some suggestions? And how long did it take you to drop the loss to 0.3? @MingZJU
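On step 1, a small sketch of how the 25 fps check might be scripted; the ffprobe invocation is an assumption (it requires FFmpeg on PATH, and any tool that reports the frame rate works):

```python
import subprocess

# parse_fps handles the rational "num/den" frame-rate strings ffprobe
# reports (e.g. "25/1", "30000/1001").
def parse_fps(rate: str) -> float:
    num, den = rate.split("/")
    return float(num) / float(den)

def video_fps(path: str) -> float:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=avg_frame_rate",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True,
    ).stdout.strip()
    return parse_fps(out)

print(parse_fps("25/1"))   # 25.0
# keep = [p for p in paths if abs(video_fps(p) - 25.0) < 0.01]
```

Videos that fail the check can also be re-encoded to 25 fps with ffmpeg rather than discarded.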


MingZJU commented Jan 30, 2023 via email


Curisan commented Jan 30, 2023

Thanks. Also, how many epochs did it take you to drop the loss below 0.3? @MingZJU

@NikitaKononov

Thanks. Also, how many epochs did it take you to drop the loss below 0.3? @MingZJU

It takes 750-1250 epochs to get there, depending on your data, optimizer, and LR.


Curisan commented Feb 6, 2023

Thanks for your suggestion. I use ReLU, BCELoss, and AdamW, and the train loss can drop below 0.3. But the eval loss is about 0.42, and it seems to increase as the train loss decreases; it looks like overfitting. I now use 32,000 videos to train the syncnet. Did you encounter this problem? Can you give me some suggestions? @MingZJU @NikitaKononov
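Since the eval loss rises while the train loss falls, one generic mitigation is early stopping on the eval loss. This is only a sketch with placeholder names and thresholds, illustrated on a made-up loss curve:

```python
def early_stop_epoch(eval_losses, patience=3, min_delta=1e-4):
    """Index of the epoch where training would stop: the eval loss has not
    improved on its best value by min_delta for `patience` epochs."""
    best, bad = float("inf"), 0
    for i, loss in enumerate(eval_losses):
        if loss < best - min_delta:
            best, bad = loss, 0   # new best: reset the patience counter
        else:
            bad += 1
            if bad >= patience:
                return i
    return len(eval_losses) - 1

# made-up curve showing the symptom: improves, then overfits
curve = [0.50, 0.45, 0.42, 0.43, 0.44, 0.45, 0.47]
print(early_stop_epoch(curve))   # 5 (three non-improving epochs after 0.42)
```

In practice the checkpoint from the best eval epoch is kept, and more data or stronger augmentation also helps against overfitting.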


ThetaRgo commented Apr 19, 2023

2. using syncnet_python to filter dataset in range [-1,1]

Hello, I got stuck at "using syncnet_python to filter dataset in range [-1,1]". Can you paste your processing code?
My code is:

import argparse
import glob
import os
import shutil

from SyncNetInstance import SyncNetInstance  # from the syncnet_python repo

def parse_args():
    parser = argparse.ArgumentParser(description="SyncNet Python - Filter Videos")
    parser.add_argument('--tmp_dir', type=str)
    parser.add_argument('--reference', type=str)
    parser.add_argument('--batch_size', type=int, default=20)
    parser.add_argument('--vshift', type=int, default=15)
    return parser.parse_args()

if __name__ == "__main__":
    opt = parse_args()
    syncnet_instance = SyncNetInstance()
    # NOTE: the pretrained weights still need to be loaded here

    input_videos_folder = '/output/pyavi/'    # your input videos directory
    output_videos_folder = 'filtered_videos'  # directory to save the filtered videos
    if not os.path.exists(output_videos_folder):
        os.makedirs(output_videos_folder)

    flist = glob.glob(os.path.join(input_videos_folder, '*', '*.avi'))
    print('flist', flist)

    for video_file in flist:
        opt.tmp_dir = 'temp'
        opt.reference = os.path.basename(video_file).split('.')[0]
        # Evaluate synchronization; evaluate() returns (offset, conf, dist)
        offset, conf, dist = syncnet_instance.evaluate(opt, videofile=video_file)
        print(offset, conf, dist)

        # Check if the video meets the filtering criteria
        if 6 <= conf <= 9 and -3 <= offset <= 3:
            # Copy the video to the output folder
            shutil.copy(video_file, os.path.join(output_videos_folder, os.path.basename(video_file)))
        else:
            print('skipping a bad video:', video_file)
    print("Filtering complete!")

I get an error I cannot solve: RuntimeError: mat1 dim 1 must match mat2 dim 0. Following that issue, I have executed run_pipeline.py first, but I still get the same error.

@kike-0304

Thanks. I use the AVSpeech dataset, and I did the following:

  1. chose the 25 fps videos
  2. used syncnet_python to filter the dataset to the range [-1, 1]

    Finally I used 32,000 videos to train the syncnet. I also use BCELoss and ReLU, but the loss only drops to 0.5. Can you give me some suggestions? And how long did it take you to drop the loss to 0.3? @MingZJU

May I ask how you use syncnet_python to batch-filter datasets within the range [-1, 1]? I have thousands of videos.

@Yyyyyyxh

Thanks. I use the AVSpeech dataset, and I did the following:

  1. chose the 25 fps videos
  2. used syncnet_python to filter the dataset to the range [-1, 1]

    Finally I used 32,000 videos to train the syncnet. I also use BCELoss and ReLU, but the loss only drops to 0.5. Can you give me some suggestions? And how long did it take you to drop the loss to 0.3? @MingZJU

May I ask how you use syncnet_python to batch-filter datasets within the range [-1, 1]? I have thousands of videos.

Do you know how to filter the dataset, and how to use syncnet_python?


tailangjun commented Dec 6, 2023

Thanks. I use the AVSpeech dataset, and I did the following:

  1. chose the 25 fps videos
  2. used syncnet_python to filter the dataset to the range [-1, 1]

    Finally I used 32,000 videos to train the syncnet. I also use BCELoss and ReLU, but the loss only drops to 0.5. Can you give me some suggestions? And how long did it take you to drop the loss to 0.3? @MingZJU

May I ask how you use syncnet_python to batch-filter datasets within the range [-1, 1]? I have thousands of videos.

Do you know how to filter the dataset, and how to use syncnet_python?

On Linux I run the command bash calculate_scores_real_videos.sh your_folder_name; this generates a results file all_scores.txt under the syncnet_python directory. The key code is: offset, conf, dist = s.evaluate(opt, videofile=fname)
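Building on that, here is a sketch of filtering clips using the scores in all_scores.txt; the per-line layout assumed below (filename, offset, confidence) is a guess, so check the file your script actually generates before relying on it:

```python
import os
import tempfile

# Keep clips whose AV offset lies in [-1, 1] (and, optionally, whose
# confidence clears a threshold). Column layout is an assumption.
def filter_scores(path, max_abs_offset=1, min_conf=0.0):
    keep = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue
            fname, offset, conf = parts[0], float(parts[1]), float(parts[2])
            if abs(offset) <= max_abs_offset and conf >= min_conf:
                keep.append(fname)
    return keep

# demo on a made-up scores file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("clip_a.avi 0 7.2\nclip_b.avi 4 6.1\nclip_c.avi -1 8.0\n")
    demo = f.name
result = filter_scores(demo)
os.remove(demo)
print(result)   # ['clip_a.avi', 'clip_c.avi']
```

The list of kept filenames can then drive a copy or symlink step into the training directory.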
