
Can we train with this yet? #10

Open
EmElleE opened this issue Aug 5, 2021 · 8 comments

EmElleE commented Aug 5, 2021

Just wondering if we can train with LJS on this implementation thanks!

keonlee9420 (Owner) commented

Hi @EmElleE, yes you could, but you would need to tune the hyperparameters for the residual encoder; other than that it is really close.

ArEnSc commented Aug 10, 2021

@keonlee9420 Quick question: do you have the LJS model? I would like to fine-tune on it. Do you know how much data is required for fine-tuning? Also, is the quality close to Tacotron 2? These days people seem to use Tacotron 2 because it works well for cloning voices. Do you think Parallel-Tacotron2 is similarly capable?

keonlee9420 (Owner) commented

Hi @ArEnSc, I don't have it yet, but I'll share it when I do. Please note that the result will likely be much worse than expected, since the maximum batch size is much smaller than in the original paper.
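
For context, one common workaround for a small per-step batch is gradient accumulation. The sketch below is generic PyTorch, not code from this repo; the tiny model and dummy batches are placeholders used only to show the pattern.

import torch
from torch import nn

# Generic gradient-accumulation sketch (not from this repo): accumulate
# gradients over several small batches to approximate a larger effective
# batch size when GPU memory limits the per-step batch.
model = nn.Linear(10, 1)                      # dummy stand-in for the TTS model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(32)]  # dummy batches of size 4

accum_steps = 8                               # effective batch = 8 * 4 = 32
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()           # scale so accumulated grads match one large batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Note that accumulation only approximates a larger batch: per-step statistics (e.g., for batch normalization, if any) are still computed on the small batch.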

huypl53 commented Aug 15, 2021

Take a look at this:

speaker_embedding_m = speaker_embedding.unsqueeze(1).expand(
    -1, max_mel_len, -1
)

position_enc = self.position_enc[
    :, :max_mel_len, :
].expand(batch_size, -1, -1)

enc_input = torch.cat([position_enc, speaker_embedding_m, mel], dim=-1)

speaker_embedding_m and mel both have max_mel_len in dimension 1, but position_enc has max_seq_len + 1 there, which is different. Therefore torch.cat will raise an exception.
Am I right?
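
A minimal repro of the mismatch with dummy shapes (the sizes below are illustrative, not the repo's config values):

import torch

batch_size, max_seq_len, max_mel_len = 2, 1000, 1200   # illustrative sizes only
d_model, n_mels = 256, 80

# The precomputed table has max_seq_len + 1 positions, so slicing to
# :max_mel_len returns at most max_seq_len + 1 rows.
position_enc_table = torch.randn(1, max_seq_len + 1, d_model)
position_enc = position_enc_table[:, :max_mel_len, :].expand(batch_size, -1, -1)

speaker_embedding = torch.randn(batch_size, d_model)
speaker_embedding_m = speaker_embedding.unsqueeze(1).expand(-1, max_mel_len, -1)
mel = torch.randn(batch_size, max_mel_len, n_mels)

print(position_enc.shape)         # torch.Size([2, 1001, 256])
print(speaker_embedding_m.shape)  # torch.Size([2, 1200, 256])

try:
    enc_input = torch.cat([position_enc, speaker_embedding_m, mel], dim=-1)
except RuntimeError as e:
    print("torch.cat fails:", e)  # sizes must match except in the concatenation dim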

keonlee9420 (Owner) commented Aug 17, 2021

Hi @phamlehuy53, position_enc also has max_seq_len in that dimension.

huypl53 commented Aug 17, 2021

> Hi @phamlehuy53, position_enc also has max_seq_len in that dimension.

But you notice that speaker_embedding_m and mel have max_mel_len instead, don't you?

keonlee9420 (Owner) commented Aug 17, 2021

Oh, sorry, I mistyped: position_enc has max_mel_len, not max_seq_len.

position_enc = self.position_enc[
    :, :max_mel_len, :
].expand(batch_size, -1, -1)

huypl53 commented Aug 17, 2021

> Oh, sorry, I mistyped: position_enc has max_mel_len, not max_seq_len.
>
> position_enc = self.position_enc[
>     :, :max_mel_len, :
> ].expand(batch_size, -1, -1)

Yep, but when max_mel_len is larger than max_seq_len, dimension 1 of position_enc is still only max_seq_len in length, which causes the dimension mismatch in torch.cat's arguments.
I'm sorry for leaving this detail out of the first question. Thanks!
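
One possible fix is to fall back to an on-the-fly sinusoid table whenever max_mel_len exceeds the precomputed length, as FastSpeech2-style decoders do. The sketch below is standalone and only illustrates the idea; the helper names and shapes are assumptions, not the repo's actual code.

import torch

def sinusoid_table(n_position, d_model):
    # Standard sinusoidal positional encoding, shape (1, n_position, d_model).
    # Assumes an even d_model for simplicity.
    position = torch.arange(n_position, dtype=torch.float32).unsqueeze(1)
    div_term = torch.pow(10000.0, -torch.arange(0, d_model, 2, dtype=torch.float32) / d_model)
    table = torch.zeros(n_position, d_model)
    table[:, 0::2] = torch.sin(position * div_term)
    table[:, 1::2] = torch.cos(position * div_term)
    return table.unsqueeze(0)

def safe_position_enc(position_enc_table, batch_size, max_mel_len):
    # position_enc_table: precomputed buffer of shape (1, max_seq_len + 1, d_model).
    # If the batch's mel length fits, slice as before; otherwise build a longer
    # table on the fly so the torch.cat shapes agree.
    if max_mel_len <= position_enc_table.size(1):
        table = position_enc_table[:, :max_mel_len, :]
    else:
        table = sinusoid_table(max_mel_len, position_enc_table.size(2)).to(position_enc_table.device)
    return table.expand(batch_size, -1, -1)

# Quick check with dummy sizes:
buf = torch.randn(1, 1001, 256)
print(safe_position_enc(buf, 2, 800).shape)   # torch.Size([2, 800, 256])
print(safe_position_enc(buf, 2, 1200).shape)  # torch.Size([2, 1200, 256])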
