
Floating point exception (core dumped) #49

Closed
ZohaibAhmed opened this issue Apr 9, 2019 · 10 comments

@ZohaibAhmed

I tried to train the tacotron model on top of the LJ pretrained checkpoint you provide. I just ran train_tacotron.py, but when I run gen_tacotron.py, I get the following:

Initialising WaveRNN Model...

Trainable Parameters: 4.481M

Loading Weights: "checkpoints/lj.wavernn/latest_weights.pyt"


Initialising Tacotron Model...

Trainable Parameters: 11.078M

Loading Weights: "checkpoints/lj.tacotron/latest_weights.pyt"

+---------+----------+---+-----------------+----------------+-----------------+
| WaveRNN | Tacotron | r | Generation Mode | Target Samples | Overlap Samples |
+---------+----------+---+-----------------+----------------+-----------------+
|  804k   |   197k   | 1 |     Batched     |     11000      |       550       |
+---------+----------+---+-----------------+----------------+-----------------+
 

| Generating 1/6
Floating point exception (core dumped)

Any ideas on how I can go on debugging this?

@fatchord
Owner

fatchord commented Apr 9, 2019

@ZohaibAhmed Unfortunately I don't see the same error on my end - can you do me a small favor? If you have an IDE with breakpoints, can you check which function is causing that in gen_tacotron.py (it should be somewhere in the loop starting on line 91)?

If you don't have breakpoints, you can just add print('a', True), print('b', True) after each function call in that loop to see which one is throwing the error.

Thanks.
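
A minimal sketch of that marker-print idea (the two generate calls below are placeholders standing in for whatever gen_tacotron.py actually calls in that loop, not the real signatures; flush=True just makes sure each marker shows up before a hard crash):

# Sketch only: print a marker after each step of the generation loop, so the
# last marker seen before the crash points at the call that failed.
# tts_generate / voc_generate are placeholders for the real calls.
for i, text in enumerate(input_texts, 1):
    print(f'| Generating {i}/{len(input_texts)}', flush=True)
    mel = tts_generate(text)      # Tacotron: text -> mel spectrogram
    print('a', flush=True)        # reached: Tacotron finished
    wav = voc_generate(mel)       # WaveRNN: mel -> waveform
    print('b', flush=True)        # reached: vocoder finished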

@ZohaibAhmed
Author

ZohaibAhmed commented Apr 9, 2019

Looks like the issue is in the vocoder's generate function in fatchord_wavernn, specifically when it calls:

h1 = rnn1(x, h1)

Note that using the pretrained model out of the box seems to work; the error only occurs when I train the model further.

More details about my setup:

ubuntu16.04
pytorch=1.0.0
cuda10.0
cudnn7.4.1_1
GPU: RTX 2080 Ti
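
One thing worth ruling out (just a sketch, not code from this repo, and it assumes the .pyt file is a plain state_dict) is NaN/Inf values creeping into the fine-tuned weights, since that would only appear after further training and could crash inside the rnn1 call:

import torch

# Sketch: scan the fine-tuned checkpoint for NaN/Inf parameters before
# blaming the rnn1 call itself. Path taken from the log output above.
state = torch.load('checkpoints/lj.tacotron/latest_weights.pyt', map_location='cpu')
for name, tensor in state.items():
    if torch.is_floating_point(tensor):
        if torch.isnan(tensor).any() or torch.isinf(tensor).any():
            print('non-finite values in', name)

# The same check applies to the mel tensor Tacotron hands to the vocoder:
# printing its shape and torch.isnan(mel).any() right before the vocoder's
# generate call would show whether the mel itself is already broken.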

@fatchord
Owner

@ZohaibAhmed can I get the exact steps you went through to get that error? Have you tried training a fresh model for a couple of epochs and then tried generating?

Also is there no other error message besides "Floating point exception (core dumped)"?

@ZohaibAhmed
Author

@fatchord - training a model from scratch seems to work.

The exact steps I did were as follows:

  1. Take your pretrained models.
  2. Get a different dataset and run the preprocessor on it (the dataset is structured exactly like LJ):
     Input File     : '100.wav'
     Channels       : 1
     Sample Rate    : 22050
     Precision      : 16-bit
     Duration       : 00:00:03.42 = 75411 samples ~ 256.5 CDDA sectors
     File Size      : 151k
     Bit Rate       : 353k
     Sample Encoding: 16-bit Signed Integer PCM
  3. Run train_tacotron.py for a bit.
  4. Run gen_tacotron.py after the first checkpoint (I made it save after 500 steps instead of the default).

And that's how I get to that error. Even if I keep the WaveRNN as the pretrained model, it still results in the Floating point exception (core dumped). There's no other stack trace.

@fatchord
Owner

@ZohaibAhmed can you try training LJ from scratch to see if you get the same error?

@ZohaibAhmed
Author

ZohaibAhmed commented Apr 12, 2019

@fatchord training Tacotron from scratch makes it work. But I don't have enough data for my own dataset to effectively train the model.

Have you had any success with fine-tuning?

EDIT: The main issue seems to be that the decoder is producing all silent values.

It looks like the shape of the output from the original pretrained model is different than when I train on top of it:

Original:
torch.Size([1, 80, 338])

Tuned:
torch.Size([1, 80, 1])

Looks like I hit the condition that breaks when silent frames are present:

if (mel_frames < -3.8).all() : break

This is what the alignment plot looks like while training tacotron:

[alignment plot image attached]

@candlewill

candlewill commented Apr 14, 2019

@ZohaibAhmed I ran into the same error. The reason is that the first frame of mel_frames is all silence (< -3.8), which makes the Tacotron output empty. You could fix it with the following code:

if (mel_frames < -3.8).all() and i > 10 : break
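
For context, that condition sits inside Tacotron's autoregressive decoder loop; a rough sketch of the patched loop (decoder_step and steps are made-up names here, only the break condition itself is from this thread):

# Sketch of the decoder loop with the patched stop condition.
# decoder_step / steps are placeholders, not names from the repo.
mel_outputs = []
for i in range(steps):
    mel_frames = decoder_step(i)   # one decoder step -> a block of mel frames
    mel_outputs.append(mel_frames)
    # The old check broke as soon as a block was entirely "silent" (< -3.8),
    # which can fire on the very first frame and leave a [1, 80, 1] output.
    # Requiring i > 10 prevents that premature stop.
    if (mel_frames < -3.8).all() and i > 10:
        break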

@fatchord
Owner

@candlewill Nice catch, I'll push a fix for that later today.

@ZohaibAhmed
Author

@candlewill - I still largely get silence (with some static). Did you try to train your model on top of the checkpoint that @fatchord provided? Or did you just train it from scratch?

@fatchord
Owner

Tacotron has been updated to fix the premature stopping of generation.
