Are there supposed to be spectral gaps? And are the latents tiled during diffusion? #18
Hi @enn-nafnlaus - good observations. The seed image was manually tiled to be seamless. The generations are all independent, and there is currently no effort to make them tile, so you can hear clicking in the web app when the audio transitions. It would be a nice win to fix this, and I consider it fairly high priority. On the inference side, one option would be to generate a little extra beyond the desired loop and then blend the spectrogram images before converting to audio. Another option would be to inpaint the seams. I would also be interested in any options for generating tileable spectrograms; I haven't looked into that. The web app currently makes an independent call per image, so blending in spectrogram space rather than in audio would require a change there. But the app could simply fade the audio if a little extra were generated.
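To make the "generate a little extra and blend" idea concrete, here is a minimal sketch of a linear crossfade over the overlapping region. It is not what the app does today; `crossfade` and `overlap` are hypothetical names, and it assumes both clips are NumPy arrays with time on the last axis, so the same function would apply to 1-D audio samples or to (frequency, time) spectrogram arrays.

```python
import numpy as np

def crossfade(a, b, overlap):
    """Join a and b along the last (time) axis, blending the last
    `overlap` steps of a with the first `overlap` steps of b.

    Hypothetical helper: assumes the extra `overlap` steps were
    generated on purpose so that they can be blended away here.
    """
    if overlap <= 0:
        return np.concatenate([a, b], axis=-1)
    if overlap > min(a.shape[-1], b.shape[-1]):
        raise ValueError("overlap is longer than one of the clips")

    # Linear weights: b fades in from 0 while a fades out from 1.
    # The two weights always sum to 1, so a constant signal stays constant.
    fade_in = np.linspace(0.0, 1.0, overlap, endpoint=False)
    fade_out = 1.0 - fade_in

    blended = a[..., -overlap:] * fade_out + b[..., :overlap] * fade_in
    return np.concatenate([a[..., :-overlap], blended, b[..., overlap:]], axis=-1)
```

The result is `overlap` steps shorter than plain concatenation, so the extra generated length would need to be chosen to keep the loop duration unchanged. A linear ramp is only one choice of fade curve; whether it sounds acceptable on these clips is untested.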
So here's what the OG spectrogram looks like tiled - no gaps.
But when I tile the spectrograms I generate, it looks like this:
Obvious gaps at the start of each riff. Is this expected behavior? Are they not diffused as horizontally-tilable latents? Also, I can hear the (awkward) gaps when I concatenate the mp3 riffs too, so it's not just a graphical bug. Are you doing something to smooth them out in the webapp?
In fact, come to think of it, I'd expect the latents to be diffused not simply so that each one tiles with itself (its left side mirroring its own right side), but so that the left side of each latent mirrors the right side of the previous latent.
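One rough way to put a number on the seam, as a sketch only: compare the jump between the last column of one spectrogram and the first column of the next against the typical jump between adjacent columns inside a clip. `seam_score` is a hypothetical helper, and it assumes the spectrograms are already loaded as 2-D NumPy arrays of shape (frequency, time).

```python
import numpy as np

def seam_score(prev, nxt):
    """Ratio of the jump across the seam to the average jump between
    neighbouring columns inside `prev`.

    Values near 1 suggest the seam is no rougher than the clip itself;
    much larger values suggest a visible or audible discontinuity.
    """
    # Mean absolute difference between the two columns that meet at the seam.
    seam = np.abs(nxt[:, 0] - prev[:, -1]).mean()
    # Mean absolute difference between adjacent columns within the clip.
    inner = np.abs(np.diff(prev, axis=1)).mean()
    # Small constant avoids division by zero on a perfectly flat clip.
    return seam / (inner + 1e-12)
```

Passing the same array as both arguments measures how well a single clip tiles with itself; passing consecutive clips measures the case described above.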
Anyway, just something I ran into while messing around with the riffs... I was wanting to try to align them with a beat detector (which I'm having trouble installing due to packaging errors with lame :Þ ). But if these were diffused as per the above, I'd expect the beats to auto-align.