How to fix the noise during inference time? #23

Closed
xinghua-qu opened this issue Jul 27, 2021 · 5 comments

Comments

@xinghua-qu

xinghua-qu commented Jul 27, 2021

Hi Jaehyeon,

May I ask how to fix the stochastic noise during inference time? I want the generated audio to be reproducible, so I need to fix the random noise.
Currently it seems I can only control the noise scale.

sid = torch.LongTensor([1]) # speaker identity
stn_tst = get_text("Tell me the answer please", hps_ms)

with torch.no_grad():
    x_tst = stn_tst.unsqueeze(0)
    x_tst_lengths = torch.LongTensor([stn_tst.size(0)])
    audio = net_g_ms.infer(x_tst, x_tst_lengths, sid = sid, noise_scale=1, noise_scale_w=2, length_scale=1)[0][0,0].data.float().numpy()
ipd.display(ipd.Audio(audio, rate=hps_ms.data.sampling_rate))
@BridgetteSong

Always use the same input in the infer function in models.py:
z_p = m_p + torch.randn_like(m_p) * torch.exp(logs_p) * noise_scale
You can replace torch.randn_like with the same fixed input:

  1. Print torch.randn_like(m_p) once and save it.
  2. Always use the result of step 1 as the input tensor (see the sketch below).
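
A minimal sketch of that idea, assuming the input text (and therefore the shape of m_p) stays the same between runs; the file name fixed_noise.pt is just an example:

import torch

# First run: sample the noise once and keep it on disk.
noise = torch.randn_like(m_p)
torch.save(noise, "fixed_noise.pt")

# Later runs: load the saved tensor and reuse it instead of resampling,
# so z_p (and therefore the generated audio) is identical every time.
noise = torch.load("fixed_noise.pt")
z_p = m_p + noise * torch.exp(logs_p) * noise_scale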

@xinghua-qu
Author

xinghua-qu commented Jul 27, 2021

Always use the same input in the infer function in models.py:
z_p = m_p + torch.randn_like(m_p) * torch.exp(logs_p) * noise_scale
You can replace torch.randn_like with the same fixed input:

  1. Print torch.randn_like(m_p) once and save it.
  2. Always use the result of step 1 as the input tensor.

Thanks for the reply.
But the dimensions of m_p vary between runs, so it cannot simply be replaced with a constant tensor.
Here are the shapes of m_p printed across several runs:

torch.Size([1, 192, 124])
torch.Size([1, 192, 126])
torch.Size([1, 192, 123])
torch.Size([1, 192, 125])
torch.Size([1, 192, 124])

To my understanding, setting the hyperparameter noise_scale_w to zero makes the predicted durations deterministic, so the dimensions of m_p stay constant. But if noise_scale_w is not equal to 0, is there any way to reproduce the same generated audio?

@BridgetteSong

Because you use a StochasticDurationPredictor for duration prediction, it also contains a sampling step like this:
e_q = torch.randn(w.size(0), 2, w.size(2)).to(device=x.device, dtype=x.dtype) * x_mask
You should fix this noise in the same way as above (a seeded alternative is sketched below).
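
One way to pin that sampling down without saving tensors to disk is to draw it from a dedicated, seeded torch.Generator. This is only a sketch of the idea, not code from the repository; dp_generator and the seed value are made-up names:

import torch

# A private generator, re-seeded at the start of every inference call so
# the duration predictor draws exactly the same noise sequence each run.
dp_generator = torch.Generator()
dp_generator.manual_seed(1234)

# Replacement for the sampling line inside StochasticDurationPredictor:
e_q = torch.randn(w.size(0), 2, w.size(2), generator=dp_generator)
e_q = e_q.to(device=x.device, dtype=x.dtype) * x_mask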

@CookiePPP

Just set the seed in torch before you run inference. Use torch.random.fork_rng if you want the other components to be unaffected by the seed.
It will always give you the same output for the same input.

@CookiePPP

CookiePPP commented Jul 27, 2021

with torch.random.fork_rng():

https://pytorch.org/docs/stable/random.html

And

torch.manual_seed(0)

https://pytorch.org/docs/stable/notes/randomness.html

applied when you call the model forward should be enough for reproduction.
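
For example, applied to the snippet from the original post (same variable names), the whole inference call can be wrapped like this; the seed value 0 is arbitrary:

import torch

with torch.no_grad(), torch.random.fork_rng():
    # manual_seed inside fork_rng fixes the RNG state only for this block,
    # so randomness elsewhere in the program is left untouched.
    torch.manual_seed(0)
    x_tst = stn_tst.unsqueeze(0)
    x_tst_lengths = torch.LongTensor([stn_tst.size(0)])
    audio = net_g_ms.infer(x_tst, x_tst_lengths, sid=sid,
                           noise_scale=1, noise_scale_w=2,
                           length_scale=1)[0][0, 0].data.float().numpy()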
