Matplotlib API change & NaNs for short clips & new hop_length #8
Hello. It's strange. Maybe they changed APIs in the latest versions. The version of
Thanks for your quick support.
Training went well (and reproducibly) until iteration 55. Then it ran into problems calculating the loss stats.
Any idea on that?
No, it is not connected to
Thanks for your reply, and sorry that I hadn't seen this existing helpful issue before. After reducing the batch size, the "nan" problem occurs later:
Is there a better place for best-practice config discussion than within this issue?
@thorstenMueller, I am planning to push a new version of WaveGrad soon, which should be more robust to the loss explosion problem. Please check it in a few days.
Thanks, sounds good.
@thorstenMueller Hello, sorry for being a bit late. I have updated the repo. I believe it should be more robust to the loss explosion issue now.
Hey @ivanvovk . Thanks for the huge update 👍 . I've set up a training run with my available German dataset (https://github.com/thorstenMueller/deep-learning-german-tts/) and training has been running for 1 day without stopping because of errors. But I could use some help understanding its progress. Do you have an account on Mozilla Discourse so we could discuss my questions there (https://discourse.mozilla.org/t/contributing-my-german-voice-for-tts/) to not blow up this "issue"? The following things are on my mind right now:
This is my wavegrad config. Taco2 training is based on "hop_length": 256, so I'll need to adjust "factors" in the config. Currently the WaveGrad training uses hop_length = 300. Would be great if you could support me on this :-).
Sorry, I have no account there. Write to me at my mail iyuvovk@yandex.ru and we'll decide where to continue the discussion.
Okay, I've sent you an email.
Hey @ivanvovk .
TensorBoard was running while this error occurred. Is a running TensorBoard a problem?
I have the same error when training a Vietnamese dataset.
My config:
@dodoproptit99 Hello. Okay, that seems like a problem with PyTorch mixed-precision training. I've just pushed a small update to the repo, where I added support for turning it off. Please pull the new version, disable fp16-training here and decrease the batch size (I suggest 48). I suppose it should help. And please report whether it helps or not.
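For context, a minimal sketch of how fp16 training is typically toggled with PyTorch's native AMP API; the `use_fp16` flag name and the tiny model below are hypothetical illustrations, not the repo's exact config or code:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

use_fp16 = False  # disable mixed precision to avoid NaN losses

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = GradScaler(enabled=use_fp16)  # acts as a no-op when fp16 is off

x, y = torch.randn(8, 4), torch.randn(8, 1)
with autocast(enabled=use_fp16):       # runs in plain fp32 when disabled
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()          # unscaled backward when disabled
scaler.step(optimizer)
scaler.update()
```

With `enabled=False`, both `autocast` and `GradScaler` pass everything through untouched, so one flag cleanly switches between mixed-precision and full-precision training.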
@ivanvovk Thanks for your reply! I tried decreasing the batch size to 48 and then 24 and disabling fp16-training, but I still get this error :(
@dodoproptit99 what TensorBoard output do you have?
@dodoproptit99 this is really strange. Can you please run the following script? Put it in the root folder of WaveGrad and run
This is my output. Can you tell me more about that?
@dodoproptit99 Okay, I found the origin of the problem. It seems like your data contains audio clips of length less than
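For anyone hitting this, a quick way to spot the offending clips is to scan the dataset for WAV files shorter than the configured segment length. A minimal sketch using only the standard library; `SEGMENT_LENGTH` and the data directory are hypothetical placeholders for your own config values:

```python
import wave
from pathlib import Path

SEGMENT_LENGTH = 7200          # hypothetical: use segment_length from your config
DATA_DIR = Path("data/wavs")   # hypothetical: your dataset directory

def short_clips(data_dir, min_frames):
    """Yield (path, n_frames) for WAV files shorter than min_frames samples."""
    for path in sorted(data_dir.glob("*.wav")):
        with wave.open(str(path), "rb") as f:
            n = f.getnframes()
        if n < min_frames:
            yield path, n

if DATA_DIR.is_dir():
    for path, n in short_clips(DATA_DIR, SEGMENT_LENGTH):
        print(f"{path}: {n} samples (< {SEGMENT_LENGTH})")
```

Files flagged by this scan can then be removed from the filelist or padded, depending on how you want to handle short recordings.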
@ivanvovk It works ^^
@dodoproptit99 glad to hear that! @thorstenMueller also check it out; it will probably solve your problem with NaNs too (if it's still relevant).
Thanks @ivanvovk for pinging me and updating the code.
I'd like to try your tip, but I'm not sure how to do this:
@thorstenMueller oh, sorry, I see what's wrong. Besides the upsampling factors, you also need to update the segment length, which should be divisible by the hop length. Change
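For reference, in WaveGrad-style configs the hop length is the product of the upsampling factors, so a small sanity check can catch mismatches before training starts. The factor lists and segment lengths below are illustrative examples, not the repo's exact defaults:

```python
from math import prod

def check_config(factors, segment_length):
    """Hop length is the product of the upsampling factors; the training
    segment must be an integer number of hops."""
    hop_length = prod(factors)
    assert segment_length % hop_length == 0, (
        f"segment_length={segment_length} is not divisible "
        f"by hop_length={hop_length}")
    return hop_length

# Hypothetical hop-300 setup vs a Tacotron2-compatible hop-256 setup.
print(check_config([5, 5, 3, 2, 2], 7200))   # 300
print(check_config([4, 4, 4, 2, 2], 7168))   # 256
```

If you switch the factors to match `hop_length = 256`, pick a segment length that is a multiple of 256, or the check above will fail.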
Thanks @ivanvovk .
Hey @ivanvovk . The next step will be checking whether the generated mel spectrograms are compatible with the Mozilla TTS project.
@thorstenMueller glad that it works and you're welcome! Closing this issue. |
I'm trying to run training on an NVIDIA Xavier AGX device running an NVIDIA Docker container based on these instructions: https://ngc.nvidia.com/catalog/containers/nvidia:l4t-pytorch.
But I receive the following error:
```
Initializing logger...
Initializing model...
Number of parameters: 15810401
Initializing optimizer, scheduler and losses...
Initializing data loaders...
Traceback (most recent call last):
  File "train.py", line 185, in <module>
    run(config, args)
  File "train.py", line 72, in run
    logger.log_specs(0, specs)
  File "/media/908f901d-e80b-4a8e-8a16-9e0f1b896732/TTS/thorsten-de/models/model-v02/WaveGrad/logger.py", line 53, in log_specs
    self.add_image(key, plot_tensor_to_numpy(image), iteration, dataformats='HWC')
  File "/media/908f901d-e80b-4a8e-8a16-9e0f1b896732/TTS/thorsten-de/models/model-v02/WaveGrad/utils.py", line 66, in plot_tensor_to_numpy
    im = ax.imshow(tensor, aspect="auto", origin="bottom", interpolation='none', cmap='hot')
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/__init__.py", line 1438, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/axes/_axes.py", line 5521, in imshow
    resample=resample, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/image.py", line 905, in __init__
    **kwargs
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/image.py", line 246, in __init__
    cbook._check_in_list(["upper", "lower"], origin=origin)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/cbook/__init__.py", line 2257, in _check_in_list
    .format(v, k, ', '.join(map(repr, values))))
ValueError: 'bottom' is not a valid value for origin; supported values are 'upper', 'lower'
```
python3 -V: Python 3.6.9
pip3 -V: 20.2.3
Running pip3 list shows the following installed packages:
I tried matplotlib 3.3.1 and 3.3.2, both with the same result.
Any ideas what I'm missing?
Thank you.
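For what it's worth, the `ValueError` above comes from Matplotlib 3.3 removing the deprecated `'bottom'`/`'top'` aliases for `imshow`'s `origin` argument (matching the issue title's "Matplotlib API change"). A sketch of the likely one-line fix in `utils.py`, shown here against dummy data rather than the repo's actual tensor:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for training servers
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Before (accepted only by Matplotlib < 3.3):
#   im = ax.imshow(tensor, aspect="auto", origin="bottom",
#                  interpolation='none', cmap='hot')
# After: 'bottom' was an alias for 'lower' that Matplotlib 3.3 removed.
im = ax.imshow([[0.0, 1.0], [1.0, 0.0]], aspect="auto", origin="lower",
               interpolation="none", cmap="hot")
plt.close(fig)
```

Alternatively, pinning `matplotlib<3.3` avoids editing the code, but changing the argument to `origin="lower"` is the forward-compatible fix.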