Testing the network on music datasets #104
I started training it on violin samples today as well; we'll see how it goes. EDIT: I really need a GPU...
I've made a little experiment. I trained the WaveNet model on just a single wav file (a 4-second piano wav, 16-bit, mono). As I guessed, the loss dropped to ~0.060 in barely 4k steps, but when I generated audio (passing 64000 samples as the parameter, for 4 seconds of output), the result was a little strange. You can hear that the model is playing something like a piano sound (and the tempo seems pretty similar), but there is still a noisy background that I expected would not be there, given that the loss was only 0.060. Could this noise be caused by the residual cross-entropy error in the prediction?

It's worth noting that I had to lower the sample window at training time to 10,000 (SAMPLE_SIZE = 10000) because I have no GPU and cannot wait 20 sec/step. The other parameters are the defaults from the master json file.

This is the original wav file: https://soundcloud.com/samu-283712554/piano-base And this is the progression of the training:

Could somebody with a good GPU repeat this experiment using the same base wav file (https://soundcloud.com/samu-283712554/piano-base) but with a SAMPLE_SIZE of 100k and the other parameters unchanged? I may be wrong, but I think that if the model is right, this experiment should quickly end with a very smooth piano sound (without any noise), shouldn't it? If somebody gets around to this, I'll be very pleased :). Regards.

I made another test with this. I generated audio again, but this time I seeded the model with the original base wav file. This is the result: https://soundcloud.com/samu-283712554/piano-3500-seed (the first 4 seconds are the seeded original wav file, followed by the 4 seconds generated using the trained WaveNet model at 4k steps).
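One thing worth noting about the residual noise question: even a perfectly trained model can at best reproduce the 8-bit mu-law quantized signal, so the quantization itself sets a noise floor that no loss value removes. A minimal self-contained sketch of that floor, assuming the standard 256-channel mu-law companding (not the repo's actual implementation):

```python
import numpy as np

def mu_law_encode(audio, channels=256):
    # Compress float audio in [-1, 1] into integer bins via mu-law
    # companding, the standard 8-bit quantization WaveNet predicts over.
    mu = channels - 1
    magnitude = np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    signal = np.sign(audio) * magnitude
    return ((signal + 1) / 2 * mu + 0.5).astype(np.int32)

def mu_law_decode(bins, channels=256):
    # Invert the companding: integer bins back to float audio in [-1, 1].
    mu = channels - 1
    signal = 2 * (bins.astype(np.float64) / mu) - 1
    return np.sign(signal) / mu * ((1 + mu) ** np.abs(signal) - 1)

# Round-trip a test tone: the residual is quantization noise that is
# present regardless of how low the cross-entropy loss gets.
t = np.linspace(0, 1, 16000, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
noise = tone - mu_law_decode(mu_law_encode(tone))
snr_db = 10 * np.log10(np.mean(tone ** 2) / np.mean(noise ** 2))
print(f"mu-law round-trip SNR: {snr_db:.1f} dB")
```

That said, 8-bit mu-law noise alone is fairly mild; a loud hiss more likely comes from sampling errors compounding during generation, as discussed below.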
@Zeta36: I don't think there's an easy way to download your audio file from soundcloud.
I'm sorry, @ibab. Here you have it: https://github.com/Zeta36/tensorflow-wavenet/blob/master/test/piano_base.wav
@ibab, don't you think I should have gotten a clean and smooth wav after reaching a loss of only 0.060 in the test I described above? Did you try it yourself?
@Zeta36: Can't use the GPUs at the moment, as they're needed for other things. If you look at #47, several people have posted very clean sounding samples, so it appears that this is something that gets better with more training samples and a substantially longer training period. The network configuration you've used could also be relevant: the results in #47 were achieved with substantially deeper networks than the one in the default configuration. Finally, there's also the possibility that things are better when using the waveform directly as an input to the network instead of one-hot encoding it first, which is implemented in #106.
"That means that if one of the waveform samples it created deviates from what it received during training, the output will quickly leave the region of sample space that the network is familiar with." Yes, that's true. Thank you for the comments.
Here's a 5-second sample that is kind of not completely terrible (the first second is the seed): https://soundcloud.com/maxhodak/tycho-wavenet-version-1/s-YHbBX. Trained to 20k steps, loss around ~2.7, on wav files of Tycho's whole discography (individual tracks per file).
I tried training that out to 40k steps and it actually sounded a lot worse. The smoothed loss was pretty flat, between 2.6 and 2.8, from 20k to 40k steps. I'm trying to understand why it subjectively sounded much worse at 40k.
@maxhodak I love Tycho! :)
@maxhodak nice! the drums sound spot on...
This could be a result of the network overfitting the dataset.
Quick note -- for online listening and easy downloads with no upload limit, there's Freesound.
@lemonzi: I wasn't aware of Freesound, thanks for pointing it out!
Just another music test to share… I think this is pretty nice/interesting! https://soundcloud.com/robinsloan/sets/tensorflow-wavenet-temperature-demo

Model trained on this album to a loss of ~2.8 with these params:

```json
{
  "filter_width": 2,
  "sample_rate": 8000,
  "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                1, 2, 4, 8, 16, 32, 64],
  "residual_channels": 32,
  "dilation_channels": 32,
  "quantization_channels": 256,
  "skip_channels": 1024,
  "use_biases": false
}
```
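For reference, the receptive field implied by a dilation stack like the one above works out to well under half a second of audio context, which is one way to see why long-range musical structure is hard at this depth. A back-of-envelope sketch (the formula ignores the extra initial causal convolution layer):

```python
# Receptive field of stacked dilated causal convolutions: each layer
# with filter width 2 and dilation d extends the field by d samples.
filter_width = 2
dilations = ([1, 2, 4, 8, 16, 32, 64, 128, 256, 512] * 3
             + [1, 2, 4, 8, 16, 32, 64])
receptive_field = (filter_width - 1) * sum(dilations) + 1
print(receptive_field)           # 3197 samples
print(receptive_field / 8000)    # ~0.4 s of context at sample_rate 8000
```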
@robinsloan sounds very nice! I'm also sharing some early results, though mine are much less coherent. This is a sample generated from 2.5 hours of opera (Paisiello's Il Barbiere di Siviglia / The Barber of Seville), with minimal pre-processing: https://soundcloud.com/genekogan/il-barbiere-di-siviglia-wavenet. It was generated after 25k steps with the following parameters, a learning rate of 0.001, and a loss around ~2.6.
It's mostly noise/gibberish, but to my ears there is some coherent operatic material in there: a sort of orchestral drone (sounding like a warm-up/tuning) along with some faint baritone voices. Right now it sounds more like granular synthesis than something coherent.

I know I was too ambitious in trying to train it on too many different sections, so I've started working on a script that uses librosa's analysis tools to narrow a folder of music down to a more homogeneous subset of it. I need to clean it up a bit and will post it as an IPython notebook shortly.

I'm trying to get a sense of ways I can improve it. The obvious one I mentioned: limit to a smaller, more uniform set of audio. Parametrically: perhaps more dilations, a lower sample rate, etc.
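The "narrow down to a homogeneous subset" idea can be sketched even without librosa: compute one crude timbre feature per clip and keep the clips closest to the median. The feature choice and function names below are my own illustration, not genekogan's script:

```python
import numpy as np

def spectral_centroid(audio, sr=16000, frame=2048):
    # Mean spectral centroid over Hann-windowed frames:
    # a crude one-number timbre summary of a clip.
    n = len(audio) // frame * frame
    frames = audio[:n].reshape(-1, frame) * np.hanning(frame)
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    totals = spectra.sum(axis=1)
    totals[totals == 0] = 1e-9
    return float(np.mean(spectra @ freqs / totals))

def pick_homogeneous(clips, sr=16000, keep=0.5):
    # Keep the fraction of clips whose centroid is closest to the
    # median centroid, discarding timbral outliers before training.
    feats = np.array([spectral_centroid(c, sr) for c in clips])
    dist = np.abs(feats - np.median(feats))
    order = np.argsort(dist)
    return sorted(order[: max(1, int(len(clips) * keep))].tolist())

# Three similar low tones plus one obvious outlier:
t = np.arange(16000) / 16000
clips = [np.sin(2 * np.pi * f * t) for f in (220, 230, 240, 4000)]
print(pick_homogeneous(clips, keep=0.75))  # → [0, 1, 2]
```

A real script would use richer features (e.g. MFCCs) over whole tracks, but the selection step would look much the same.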
@genekogan, I like that a lot! There are some really interesting things happening in that clip. I mean it's almost like, forget simulation; on its own merits that's a novel, compelling sound. More, more! Corpus selection is interesting. I went with SK Kakraba's gyil music because it's (a) stylistically pretty uniform, and (b) naturally quite noisy -- both of which seemed useful in this context. |
I've been training on the MagnaTagATune dataset with clips that are tagged as solo piano. https://soundcloud.com/evan-dunn-676478257/sets/magnatagatune-solo-piano I am training with batches of 4, with the loss fluctuating between ~1.7 and ~2.4. The first second of audio is seeded.
@dunnevan nice! What optimizer did you use? I noticed you ran 400,000 steps; did you adjust the learning rate during training?
@lmaxwell I'm using Adam. I started the learning rate at 1e-3 with L2 at 1e-4 for the first 80,000 steps, then moved to 1e-4 and 1e-5. I just recently pushed it down another order of magnitude to see if it helps. My feeling is that feeding it more data is the most important thing: even 400,000 steps, with 4 buckets at 8192 sampling, is only about 3 minutes of trained audio.
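A schedule like the one described is a piecewise-constant (step) decay. A minimal sketch; only the first drop at 80k steps comes from the comment above, and the second boundary is an invented placeholder:

```python
def step_decay_lr(step, base_lr=1e-3, boundaries=(80_000, 160_000), factor=0.1):
    # Piecewise-constant decay: multiply the learning rate by `factor`
    # each time `step` passes a boundary.
    lr = base_lr
    for boundary in boundaries:
        if step >= boundary:
            lr *= factor
    return lr

print(step_decay_lr(0))        # 0.001
print(step_decay_lr(100_000))  # ~1e-4
print(step_decay_lr(200_000))  # ~1e-5
```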
@dunnevan
I've started to play around with the MagnaTagATune dataset. There's a small change that needs to be made to the code when training on this dataset: because it uses mp3 instead of wav, the file pattern in `wavenet/audio_reader.py` needs to be adjusted. It would be nice to write a `MagnaReader` class that inherits from `AudioReader` (or contains one) and can filter the content by genre using the provided metadata.
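A rough sketch of what those two pieces could look like. The column names (`mp3_path`, one binary column per tag) are an assumption about MagnaTagATune's annotations file, so check them against your copy; this is not code from the repo:

```python
import csv
import fnmatch
import os

def find_files(directory, pattern='*.mp3'):
    # Same role as AudioReader's file glob, but matching mp3 files.
    matches = []
    for root, _, filenames in os.walk(directory):
        for filename in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(root, filename))
    return matches

def filter_by_tag(annotations_path, tag):
    # MagnaTagATune ships tab-separated annotations with one binary
    # column per tag; keep clips where the requested tag column is "1".
    with open(annotations_path, newline='') as f:
        reader = csv.DictReader(f, delimiter='\t')
        return [row['mp3_path'] for row in reader if row.get(tag) == '1']
```

A `MagnaReader` could then feed the filtered file list into (or subclass) `AudioReader`, decoding the mp3s with a library such as librosa before enqueueing.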