Generating audio while training the provided default mode and default arguments #29

Open
mukul74 opened this issue Dec 21, 2021 · 3 comments


mukul74 commented Dec 21, 2021

Hello @relativeflux, thanks for reviving SampleRNN in TensorFlow.
I have a question about audio generation with a model trained on a single 8-second audio file, just to test inference. The validation set is also a single audio file.

--data_dir ./chunks --num_epochs 100 --batch_size 1 --max_checkpoints 1 --checkpoint_every 10 --output_file_dur 10 --sample_rate 11025

Audio Sampling_rate: 11025
I trained the model for around 40 epochs. Training accuracy reaches 100% and validation accuracy is 4.132, as expected.
For reference:
Epoch: 40/100, Step: 82/86, Loss: 0.000, Accuracy: 100.000, (0.440 sec/step)
Epoch: 40/100, Step: 83/86, Loss: 0.000, Accuracy: 100.000, (0.449 sec/step)
Epoch: 40/100, Step: 84/86, Loss: 0.000, Accuracy: 100.000, (0.438 sec/step)
Epoch: 40/100, Step: 85/86, Loss: 0.000, Accuracy: 100.000, (0.434 sec/step)
Epoch: 40/100, Step: 86/86, Loss: 0.000, Accuracy: 100.000, (0.437 sec/step)

Epoch: 40/100, Total Steps: 86, Loss: 0.000, Accuracy: 100.000, Val Loss: 13.038, Val Accuracy: 4.132 (1 min 0.427 sec)
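To compare checkpoints across epochs, the per-epoch summary lines can be pulled out of the training log. The following is a minimal sketch that assumes the summary line always has the format shown above; the names `SUMMARY_RE` and `parse_summary` are hypothetical helpers, not part of the repository.

```python
import re

# Hypothetical helper: matches the per-epoch summary line format quoted above.
SUMMARY_RE = re.compile(
    r"Epoch: (?P<epoch>\d+)/(?P<num_epochs>\d+), Total Steps: (?P<steps>\d+), "
    r"Loss: (?P<loss>[\d.]+), Accuracy: (?P<accuracy>[\d.]+), "
    r"Val Loss: (?P<val_loss>[\d.]+), Val Accuracy: (?P<val_accuracy>[\d.]+)"
)


def parse_summary(line):
    """Return the metrics of an epoch summary line as a dict, or None if the
    line is not a summary line (for example, a per-step line)."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "epoch": int(fields["epoch"]),
        "num_epochs": int(fields["num_epochs"]),
        "steps": int(fields["steps"]),
        "loss": float(fields["loss"]),
        "accuracy": float(fields["accuracy"]),
        "val_loss": float(fields["val_loss"]),
        "val_accuracy": float(fields["val_accuracy"]),
    }


line = ("Epoch: 40/100, Total Steps: 86, Loss: 0.000, Accuracy: 100.000, "
        "Val Loss: 13.038, Val Accuracy: 4.132 (1 min 0.427 sec)")
metrics = parse_summary(line)
# Gap between validation and training loss for this epoch.
print(metrics["epoch"], metrics["val_loss"] - metrics["loss"])
```

Collecting these dicts for every epoch makes it easier to see whether anything in the logged metrics changes between the checkpoints being compared.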

But when I listen to the audio generated from this checkpoint, I hear only a short sequence of recognizable data, mostly corrupted by noise, and nothing else. The generated audio was sampled for 10 seconds. If I am not mistaken, since the model has overfit, the generated audio should reproduce the training data exactly, or something very similar.

I just wanted to ask: am I doing something wrong, or is this an expected result?
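One factor that may be worth checking, though nothing in this thread confirms it as the cause, is that a loss printed as 0.000 is only rounded to three decimals. The sketch below works through the arithmetic under stated assumptions: the logged loss is the mean cross-entropy in nats per audio sample, the logged accuracy is measured with the true previous samples as input, and generation draws each sample from the predicted distribution. Under those assumptions, the probability of reproducing an N-sample training clip exactly is exp(-N * loss). The function name `exact_reproduction_probability` and the two example loss values are hypothetical.

```python
import math

SAMPLE_RATE = 11025      # from --sample_rate
TRAIN_SECONDS = 8        # length of the single training file
OUTPUT_SECONDS = 10      # from --output_file_dur

train_samples = SAMPLE_RATE * TRAIN_SECONDS    # samples in the training clip
output_samples = SAMPLE_RATE * OUTPUT_SECONDS  # samples requested at generation


def exact_reproduction_probability(mean_loss, num_samples):
    """Probability that sampling picks the training value at every step.

    Assumes mean_loss is the mean per-sample cross-entropy in nats, so the
    product of the per-step probabilities equals exp(-num_samples * mean_loss).
    """
    return math.exp(-mean_loss * num_samples)


print("training samples:", train_samples, "requested samples:", output_samples)

# Two hypothetical loss values that a 3-decimal log would both show as 0.000.
for mean_loss in (4e-4, 1e-6):
    shown = f"{mean_loss:.3f}"
    p = exact_reproduction_probability(mean_loss, train_samples)
    print(f"loss={mean_loss:g} is logged as {shown}; P(exact copy)={p:.3g}")
```

The requested 10-second output is also longer than the 8-second training clip, so even a perfectly memorized model has to produce about two seconds of audio beyond what it was trained on.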

@relativeflux
Member

@mukul74 Hi, and thanks for getting in touch about this. I'm actually not sure about this one. I've done quite a bit of experimentation with small datasets, but not much with reducing the batch size. I'd be interested to see how this compares to the same experiment carried out with, say, WaveNet. I'll run some tests.

@mukul74
Author

mukul74 commented Dec 22, 2021

@relativeflux Hi, thanks for the reply. Here are my observations from training the default model on a single audio file.

  1. Even after training for 40 epochs, the generated audio file was almost garbage, although the training accuracy was 100% and the loss was 0.000.
  2. For example: Epoch: 40/100, Step: 86/86, Loss: 0.000, Accuracy: 100.000, (0.437 sec/step)
  3. But after Epoch 60/100, Step: 86/86, Loss: 0.000, Accuracy: 100.000, the generated file was very similar to the training data, as it should be.
  4. The only thing that seems a little odd to me is that similar results were not generated at epoch 40.
  5. Anyway, thanks for the working TensorFlow implementation. I need to build a PyTorch version of this repo for my project and then extend that model for my research.
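The "very similar to training data" judgement above was made by ear. One way to put a number on it is to slide the start of the generated audio over the training clip and look for the best-matching offset. This is a minimal pure-Python sketch, assuming both signals are already loaded as equal-rate lists of numbers; `normalized_correlation` and `best_alignment` are hypothetical helper names, and loading the WAV files is left out.

```python
import math
import random


def normalized_correlation(a, b):
    """Pearson correlation of two equal-length sample sequences, in [-1, 1]."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant signal carries no shape to correlate
    return cov / math.sqrt(var_a * var_b)


def best_alignment(reference, generated, window, max_lag):
    """Compare the first `window` samples of `generated` against `reference`
    at every offset from 0 to `max_lag`; return (best_lag, best_score)."""
    if len(generated) < window:
        raise ValueError("generated signal is shorter than the window")
    probe = generated[:window]
    best_lag, best_score = 0, -1.0
    for lag in range(max_lag + 1):
        segment = reference[lag:lag + window]
        if len(segment) < window:
            break  # ran off the end of the reference
        score = normalized_correlation(segment, probe)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Synthetic check: a "generated" clip that is a copy of the reference from offset 37.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(300)]
generated = reference[37:237]
print(best_alignment(reference, generated, window=100, max_lag=60))
```

A score near 1 at some offset would support the epoch-60 observation, while a score near 0 at every offset would match the epoch-40 description. For full-length clips, a NumPy or FFT-based correlation would be much faster than this loop.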

@relativeflux
Member

@mukul74 Thank you so much for this, very useful information. I'll do some further investigation on this when I get back in the new year.
