
Questions about chunks #16

Open
wangshiyao-1119 opened this issue Apr 15, 2021 · 6 comments

@wangshiyao-1119

Hello, thank you very much for your code.
I have a question.
Why do I need to split the .wav file into chunks? Is the chunk file the training set?
Which training set is more appropriate? Or do you have a trained model for reference?
Thank you very much!

@relativeflux
Member

relativeflux commented Apr 15, 2021

Hi @wangshiyao-1119, thanks for your interest in the project. The folder containing the chunks is indeed the training set, but it also contains the validation set, which by default is 10% of the total (the entire dataset is randomly shuffled first, then split into the two sub-datasets; the shuffle is guaranteed to produce the same distribution if you resume the same training run after terminating).
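The shuffle-then-split scheme described above can be sketched roughly as follows (this is an illustrative sketch, not the project's actual code; the function name and seed handling are assumptions). The key point is that a fixed seed and a fixed starting order make the split identical when training is resumed:

```python
import random

def split_dataset(chunk_paths, val_fraction=0.1, seed=0):
    """Deterministically shuffle chunk files, then split into train/validation.

    Sketch only: sorting gives a fixed starting order, and seeding the RNG
    makes the shuffle (and therefore the split) identical on every run.
    """
    paths = sorted(chunk_paths)   # fixed starting order, independent of input order
    rng = random.Random(seed)     # fixed seed -> same shuffle every run
    rng.shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]  # (train, validation)
```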

We will be releasing some generated output from the Beethoven 32 Piano Sonatas dataset in the next few days - I will add a link to the README when ready. You can create your own dataset from archive.org. I concatenated all 32 sonatas together, then split the result into 8-second chunks (no overlap). I used the same hparams as in the original 2017 paper: if I recall correctly, 1 RNN layer, frame sizes of 2 and 8, dimensionality of 1024, and a sequence length of 512. No skip connections, because those only apply with more than 1 RNN layer. Batch size 128. Trained on a single NVIDIA GeForce RTX 3080, at roughly 20 minutes per epoch, I believe.
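The chunking step (one long .wav split into fixed-length, non-overlapping pieces) can be sketched with Python's standard-library `wave` module. This is a minimal illustration of the preprocessing described above, not the project's own script; the function name is an assumption, and trailing audio shorter than one chunk is simply dropped:

```python
import wave

def chunk_wav(path, out_prefix, chunk_seconds=8):
    """Split a PCM .wav file into fixed-length chunks with no overlap.

    Sketch only: writes out_prefix_00000.wav, out_prefix_00001.wav, ...
    and discards any trailing audio shorter than chunk_seconds.
    """
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        n_chunks = params.nframes // frames_per_chunk
        out_paths = []
        for i in range(n_chunks):
            data = src.readframes(frames_per_chunk)
            out_path = f"{out_prefix}_{i:05d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same rate/width/channels as the source
                dst.writeframes(data)  # header frame count is patched on close
            out_paths.append(out_path)
    return out_paths
```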

The generated output from the Beethoven is quite impressive, but any large-enough dataset should be able to produce something interesting. The model favours audio that is sonically fairly uniform, however - which is why the Beethoven works well. For the Beethoven the dataset was about 5,000 chunks in size (no overlap).

@wangshiyao-1119
Author


Thank you very much for your answer! I have started training with my own data set and look forward to the results!

@wangshiyao-1119
Author


Hello, I have one more question. Can this project only run on the CPU? I am currently running very slowly on the CPU. Can the program run on the GPU? (Except for hyperparameter optimization)

@relativeflux
Member


Oh yes, of course! I wouldn't bother running it on the CPU at all. The Beethoven was trained on one of the latest NVIDIA GPUs, from the new Ampere generation. Have you tried running it in the provided Colab notebook? See the README for the link.

@wangshiyao-1119
Author


Thanks again for the reply! Can the program run on my own GPU? I have installed CUDA 10.0, but when I run the training and generation scripts, the program automatically uses the CPU.

@relativeflux
Member

relativeflux commented Apr 19, 2021

TensorFlow should automatically use your GPU, if you have one. If you run Python (from within a conda environment, highly recommended) and import TensorFlow, what does the following give you:

tf.config.list_physical_devices('GPU')

You might see something like:

Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory

That means TensorFlow can't find the relevant dynamic libraries.
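A quick way to check this outside TensorFlow is to ask the system's dynamic loader directly whether the CUDA libraries are findable. This is a rough diagnostic sketch (not part of the project); `ctypes.util.find_library` searches roughly the same paths the loader uses, so a `None` here usually means TensorFlow will also fail to load that library:

```python
from ctypes.util import find_library

def cuda_libs_status(names=("cudart", "cublas", "cudnn")):
    """Report which CUDA-related shared libraries the dynamic loader can find.

    Returns a dict mapping each library name to the resolved library name
    (e.g. 'libcublas.so.10') or None if the loader cannot locate it.
    """
    return {name: find_library(name) for name in names}

# Any None values point at libraries missing from the loader's search path.
missing = [name for name, found in cuda_libs_status().items() if found is None]
```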

Unfortunately the training script silences TensorFlow warnings, which is perhaps not very helpful. To see what TF is telling you, comment out the following lines at the top of the script, as shown:

#os.environ["KMP_AFFINITY"] = "noverbose"
#os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import tensorflow as tf
#tf.get_logger().setLevel('ERROR')
#tf.autograph.set_verbosity(3)
