
Training with new dataset #10

Closed
MingZJU opened this issue Dec 25, 2021 · 37 comments
Comments


MingZJU commented Dec 25, 2021

Hi, Happy Christmas!

I'd like to train the model with high-quality images (320×320) and need some help.

There is a loss error at line 133 of "color_syncnet_train.py": the input to the loss falls outside (0, 1).

It seems to work when I change the loss function
from
logloss = nn.BCELoss()
to
logloss = nn.BCEWithLogitsLoss()

Is it OK? Do I have to make other changes?

The training has been running for more than 4 days now; the loss is around 0.54 after 940,000 steps.

Thanks in advance.
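For reference, here is a minimal, self-contained sketch of the change described above in a Wav2Lip-style sync loss; the embedding sizes and batch shape are placeholders, not values from the repo:

```python
import torch
import torch.nn as nn

# nn.BCELoss() requires its input to lie strictly inside (0, 1), but cosine
# similarity lies in [-1, 1], which triggers the error mentioned above.
# nn.BCEWithLogitsLoss() applies a sigmoid internally, so it accepts any
# real-valued input.
logloss = nn.BCEWithLogitsLoss()

def cosine_loss(a, v, y):
    d = nn.functional.cosine_similarity(a, v)  # shape (batch,), values in [-1, 1]
    return logloss(d.unsqueeze(1), y)          # y: (batch, 1) sync labels in {0, 1}

a = torch.randn(8, 512)                        # audio embeddings (hypothetical size)
v = torch.randn(8, 512)                        # video embeddings (hypothetical size)
y = torch.randint(0, 2, (8, 1)).float()
print(cosine_loss(a, v, y).item())
```

The sigmoid inside BCEWithLogitsLoss also makes the computation numerically more stable than applying sigmoid and BCELoss separately.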


ghost commented Dec 26, 2021

I've used logloss = nn.BCEWithLogitsLoss() to solve this problem, but then I realized it still tended toward the vanishing-gradient problem. So I decided to apply a Wasserstein distance loss.


MingZJU commented Dec 27, 2021

Many thanks, I'll try this loss.

@ghost ghost closed this as completed Dec 27, 2021

MingZJU commented Mar 30, 2022

I recently trained this model, but the discriminator's eval loss is only 0.30 and cannot go down to ~0.25.
Is that reasonable?
Thanks a lot.


ghost commented Mar 30, 2022

Let me know your batch_size, learning_rate, and your training logs.


ghost commented Mar 30, 2022

Can you share some of your samples?


MingZJU commented Mar 31, 2022

Sorry for the late reply.
Parameters: batch size 128, learning rate 5e-5, AdamW. I have not kept detailed training logs.

One sample is:

en1_88.mp4

It doesn't sync well.


ghost commented Mar 31, 2022

Have you used the Wasserstein distance or BCE loss?
Did you scale up the audio block in the wav2lip model?


MingZJU commented Mar 31, 2022

I used both ReLU and BCELoss. I also tried the Wasserstein distance, but it was even worse.

What do you mean by "scale up the audio block"?


ghost commented Mar 31, 2022

Why don't you use LeakyReLU instead of ReLU? ReLU can lead to the vanishing-gradient ("dying ReLU") problem.
By "scale up the audio block" I mean you need to add more hidden layers.
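For illustration, here is what that swap can look like in a convolutional block; the block layout below is an assumption modeled on a typical Wav2Lip-style Conv2d wrapper, not the repository's exact code:

```python
import torch
import torch.nn as nn

class Conv2d(nn.Module):
    """Conv + BatchNorm block with the activation swapped to LeakyReLU."""
    def __init__(self, cin, cout, kernel_size, stride, padding):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.Conv2d(cin, cout, kernel_size, stride, padding),
            nn.BatchNorm2d(cout),
        )
        # LeakyReLU keeps a small gradient (slope 0.01 by default) for
        # negative inputs, mitigating the dying-ReLU issue mentioned above.
        self.act = nn.LeakyReLU(0.01, inplace=True)

    def forward(self, x):
        return self.act(self.conv_block(x))

block = Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
out = block(torch.randn(2, 3, 96, 96))
print(out.shape)
```

The negative slope is a tunable hyperparameter; values around 0.01–0.2 are common.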


MingZJU commented Mar 31, 2022

When I used LeakyReLU, the loss did not converge, so I tried ReLU.

Many thanks, I will scale up the audio block.


ghost commented Apr 1, 2022

OK. If you need more help, you can ask in this thread.


MingZJU commented Apr 4, 2022

Can you tell me how to compute the Wasserstein distance in syncnet? My current attempt:

def cosine_loss(a, v, y):
    d = nn.functional.cosine_similarity(a, v)
    loss = torch.mean(torch.squeeze(y)) - torch.mean(d)
    return loss

There are other variants, like torch.mean(y * d), or using scipy.stats.wasserstein_distance, but none of them seem right.
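For reference, one common WGAN-style surrogate treats the similarity score as a critic output and separates in-sync from off-sync pairs. This is only a sketch of the idea, not the loss anyone in this thread settled on; a full WGAN setup also needs a Lipschitz constraint (weight clipping or a gradient penalty):

```python
import torch

# Critic-style loss: maximize the margin between in-sync (y=1) and
# off-sync (y=0) scores. Assumes every batch contains both classes,
# otherwise an empty mean() returns NaN.
def wasserstein_loss(d, y):
    y = torch.squeeze(y)
    pos = d[y == 1]                  # scores for in-sync pairs
    neg = d[y == 0]                  # scores for off-sync pairs
    return neg.mean() - pos.mean()   # more negative = better separation

d = torch.tensor([0.9, 0.8, -0.2, 0.1])   # e.g. cosine similarities
y = torch.tensor([1.0, 1.0, 0.0, 0.0])
print(wasserstein_loss(d, y).item())       # ≈ -0.9
```

Note this loss is unbounded below, so it cannot be compared directly to the ~0.25 BCE targets discussed elsewhere in this thread.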


ghost commented Apr 4, 2022

You need to understand the theory and implement it from scratch.


ghost commented Apr 27, 2022

Hi, what is your infrastructure? 10 GPUs? Tesla V100?


MingZJU commented Apr 28, 2022

Now I use 4 × V100.


ghost commented Apr 28, 2022

With batch size 128, how much memory does it take up?


MingZJU commented Apr 29, 2022

With batch size 64, it takes less than 13 GB.
You cannot use 128 unless you enable Automatic Mixed Precision or your V100 has 32 GB of memory.

My V100s are 16 GB.
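For reference, a minimal sketch of the Automatic Mixed Precision pattern mentioned above; the linear model, shapes, and hyperparameters are placeholders, not the syncnet:

```python
import torch
import torch.nn as nn

# AMP roughly halves activation memory, which is what can make batch_size
# 128 fit on a 16 GB V100. This sketch falls back to fp32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = nn.Linear(512, 1).to(device)          # stand-in for the syncnet
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(128, 512, device=device)
y = torch.randint(0, 2, (128, 1), device=device).float()

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):
    loss = criterion(model(x), y)
scaler.scale(loss).backward()    # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)           # unscales gradients, then steps the optimizer
scaler.update()
print(loss.item())
```

The same three scaler calls wrap any existing training step; nothing else in the loop has to change.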


ghost commented May 4, 2022

With color_syncnet_train.py, I trained with batch_size 128 and lr=1e-5, and it has converged to 0.23.
You need to use syncnet to evaluate your dataset; the score needs to be in the range [-5, 5].


ghost commented May 4, 2022

I'm checking my Wasserstein distance loss.


MingZJU commented May 5, 2022

Thanks.
I do use syncnet to correct my datasets.
The loss is only 0.28; I guess my Wasserstein loss is still not right.


ghost commented May 5, 2022

can you show me your Wasserstein loss?


MingZJU commented May 6, 2022

I followed this blog.
The loss is very simple: torch.mean(torch.mul(y_true, y_pred)), which is not right.
I may reimplement it from scratch in my spare time.

This example was generated using BCELoss and ReLU.

result_voice.mp4

@ghost ghost mentioned this issue May 27, 2022

ghost commented May 27, 2022

Good job, bro :) I can see your result is very good.

@NikitaKononov

Many thanks, I will scale up audio block.

Hello, can you please suggest how to scale up the audio block, and does it make sense?

@NikitaKononov

why don't you use leaky instead of relu?

Hello :)
I can see the PReLU activation function in Conv2d and Conv2dTranspose.
LeakyReLU is in nonorm_Conv2d.
The original implementation uses ReLU.
Where should I place LeakyReLU? In all blocks?

@ZestfulCitrus

I followed this blog. The loss is very simple: torch.mean(torch.mul(y_true, y_pred)), which is not right. I may reimplement it from scratch in my spare time.

This example was generated using BCELoss and ReLU.

result_voice.mp4

I would like to buy this model. How much is it? Contact me at 1243137612@qq.com @MingZJU


Curisan commented Jan 29, 2023

I followed this blog. The loss is very simple: torch.mean(torch.mul(y_true, y_pred)), which is not right. I may reimplement it from scratch in my spare time.

This example was generated using BCELoss and ReLU.

result_voice.mp4

Did you train the model using the AVSpeech dataset? @MingZJU


MingZJU commented Jan 29, 2023 via email


Curisan commented Jan 30, 2023

Thanks. I use the AVSpeech dataset, and I did the following:

  1. chose the 25 fps videos
  2. used syncnet_python to filter the dataset to the range [-1, 1]

Finally I used 32,000 videos to train the syncnet. I also use BCELoss and ReLU, but the loss only drops to 0.5. Can you give me some suggestions? And how long did it take you to drop the loss to 0.3? @MingZJU
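On step 1, a small sketch of how the 25 fps check might be scripted; the ffprobe invocation is an assumption (it requires FFmpeg on PATH, and any tool that reports the frame rate works):

```python
import subprocess

# parse_fps handles the rational "num/den" frame-rate strings ffprobe
# reports (e.g. "25/1", "30000/1001").
def parse_fps(rate: str) -> float:
    num, den = rate.split("/")
    return float(num) / float(den)

def video_fps(path: str) -> float:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=avg_frame_rate",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True,
    ).stdout.strip()
    return parse_fps(out)

print(parse_fps("25/1"))   # 25.0
# keep = [p for p in paths if abs(video_fps(p) - 25.0) < 0.01]
```

Videos that fail the check can also be re-encoded to 25 fps with ffmpeg rather than discarded.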


MingZJU commented Jan 30, 2023 via email


Curisan commented Jan 30, 2023

Thanks. Also, how many epochs did it take you to drop the loss below 0.3? @MingZJU

@NikitaKononov

Thanks. Also, how many epochs did it take you to drop the loss below 0.3? @MingZJU

It takes 750-1250 epochs to get there, depending on your data, optimizer, and LR.


Curisan commented Feb 6, 2023

Thanks for your suggestion. I use ReLU, BCELoss, and AdamW, and the train loss can drop below 0.3. But the eval loss is about 0.42, and it seems to increase as the train loss decreases; it looks like overfitting. I now use 32,000 videos to train the syncnet. Did you encounter this problem? Can you give me some suggestions? @MingZJU @NikitaKononov
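Since the eval loss rises while the train loss falls, one generic mitigation is early stopping on the eval loss. This is only a sketch with placeholder names and thresholds, illustrated on a made-up loss curve:

```python
def early_stop_epoch(eval_losses, patience=3, min_delta=1e-4):
    """Index of the epoch where training would stop: the eval loss has not
    improved on its best value by min_delta for `patience` epochs."""
    best, bad = float("inf"), 0
    for i, loss in enumerate(eval_losses):
        if loss < best - min_delta:
            best, bad = loss, 0   # new best: reset the patience counter
        else:
            bad += 1
            if bad >= patience:
                return i
    return len(eval_losses) - 1

# made-up curve showing the symptom: improves, then overfits
curve = [0.50, 0.45, 0.42, 0.43, 0.44, 0.45, 0.47]
print(early_stop_epoch(curve))   # 5 (three non-improving epochs after 0.42)
```

In practice the checkpoint from the best eval epoch is kept, and more data or stronger augmentation also helps against overfitting.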


ThetaRgo commented Apr 19, 2023

2. using syncnet_python to filter dataset in range [-1,1]

Hello, I got stuck at "using syncnet_python to filter dataset in range [-1,1]". Can you paste your processing code?
My code is:

import argparse
import glob
import os
import shutil

from SyncNetInstance import SyncNetInstance  # from the syncnet_python repo

def parse_args():
    parser = argparse.ArgumentParser(description="SyncNet Python - Filter Videos")
    parser.add_argument('--tmp_dir', type=str)
    parser.add_argument('--reference', type=str)
    parser.add_argument('--batch_size', type=int, default=20)
    parser.add_argument('--vshift', type=int, default=15)
    return parser.parse_args()

if __name__ == "__main__":
    opt = parse_args()
    syncnet_instance = SyncNetInstance()
    # NOTE: the pretrained weights still need to be loaded here

    input_videos_folder = '/output/pyavi/'    # your input videos directory
    output_videos_folder = 'filtered_videos'  # directory to save the filtered videos
    if not os.path.exists(output_videos_folder):
        os.makedirs(output_videos_folder)

    flist = glob.glob(os.path.join(input_videos_folder, '*', '*.avi'))
    print('flist', flist)

    for video_file in flist:
        opt.tmp_dir = 'temp'
        opt.reference = os.path.basename(video_file).split('.')[0]
        # Evaluate synchronization; evaluate() returns (offset, conf, dist)
        offset, conf, dist = syncnet_instance.evaluate(opt, videofile=video_file)
        print(offset, conf, dist)

        # Check if the video meets the filtering criteria
        if 6 <= conf <= 9 and -3 <= offset <= 3:
            # Copy the video to the output folder
            shutil.copy(video_file, os.path.join(output_videos_folder, os.path.basename(video_file)))
        else:
            print('skipping a bad video:', video_file)
    print("Filtering complete!")

I get an error I cannot solve: RuntimeError: mat1 dim 1 must match mat2 dim 0. Following that issue, I have executed run_pipeline.py first, but I still get the same error.

@kike-0304

Thanks. I use the AVSpeech dataset, and I did the following:

  1. chose the 25 fps videos
  2. used syncnet_python to filter the dataset to the range [-1, 1]

    Finally I used 32,000 videos to train the syncnet. I also use BCELoss and ReLU, but the loss only drops to 0.5. Can you give me some suggestions? And how long did it take you to drop the loss to 0.3? @MingZJU

May I ask how you use syncnet_python to batch-filter datasets within the range [-1, 1]? I have thousands of videos.

@Yyyyyyxh

Thanks. I use the AVSpeech dataset, and I did the following:

  1. chose the 25 fps videos
  2. used syncnet_python to filter the dataset to the range [-1, 1]

    Finally I used 32,000 videos to train the syncnet. I also use BCELoss and ReLU, but the loss only drops to 0.5. Can you give me some suggestions? And how long did it take you to drop the loss to 0.3? @MingZJU

May I ask how you use syncnet_python to batch-filter datasets within the range [-1, 1]? I have thousands of videos.

Do you know how to filter the dataset, and how to use syncnet_python?


tailangjun commented Dec 6, 2023

Thanks. I use the AVSpeech dataset, and I did the following:

  1. chose the 25 fps videos
  2. used syncnet_python to filter the dataset to the range [-1, 1]

    Finally I used 32,000 videos to train the syncnet. I also use BCELoss and ReLU, but the loss only drops to 0.5. Can you give me some suggestions? And how long did it take you to drop the loss to 0.3? @MingZJU

May I ask how you use syncnet_python to batch-filter datasets within the range [-1, 1]? I have thousands of videos.

Do you know how to filter the dataset, and how to use syncnet_python?

On Linux I run the command bash calculate_scores_real_videos.sh your_folder_name; this generates a results file all_scores.txt under the syncnet_python directory. The key code is: offset, conf, dist = s.evaluate(opt, videofile=fname)
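Building on that, here is a sketch of filtering clips using the scores in all_scores.txt; the per-line layout assumed below (filename, offset, confidence) is a guess, so check the file your script actually generates before relying on it:

```python
import os
import tempfile

# Keep clips whose AV offset lies in [-1, 1] (and, optionally, whose
# confidence clears a threshold). Column layout is an assumption.
def filter_scores(path, max_abs_offset=1, min_conf=0.0):
    keep = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue
            fname, offset, conf = parts[0], float(parts[1]), float(parts[2])
            if abs(offset) <= max_abs_offset and conf >= min_conf:
                keep.append(fname)
    return keep

# demo on a made-up scores file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("clip_a.avi 0 7.2\nclip_b.avi 4 6.1\nclip_c.avi -1 8.0\n")
    demo = f.name
result = filter_scores(demo)
os.remove(demo)
print(result)   # ['clip_a.avi', 'clip_c.avi']
```

The list of kept filenames can then drive a copy or symlink step into the training directory.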
