While chasing down some strange artifacts in sndfile-spectrogram's output for real music, I made a simple example that exhibits the same artifacts, visible at the start and end of a 1-second 60Hz tone burst:
sndfile-spectrogram --no-border 60Hz.wav 400 8192 60Hz-kaiser.png
The following image is a 4x magnification of the bottom of the output:
At normal screen resolution these florets are practically invisible, but when an external log-frequency-axis wrapper is applied to the output they are very visible at the lower frequencies of the musical range.
I assume it is also this artifact that is causing the apparent tremolo in the constant tone.
Audacity, given the same input file, shows a dead-level line at the same FFT size and magnification.
Hypothesis: It's an artifact of our windowing function
Experiment: Try the same transform with --nuttall and see if it changes.
Observation: With the other windowing functions, the output exhibits the same artifacts.
Rejected: It's not the windowing function.
In the above examples, a 2-second sample is analysed into an image 400 pixels wide, giving 200 pixels per second. The tone is at 60Hz and each floret is 100Hz high. It looks like the interference pattern depends on the pixels-per-second rate of the time axis.
Hypothesis: The height of each floret, in Hz, is half the pixels-per-second value.
Experiment: Change the pixels-per-second rate by increasing the width of the graph. If the hypothesis is correct, the florets will grow to twice the frequency of the 60Hz line (120Hz at 240 pixels per second).
sndfile-spectrogram --no-border 60Hz.wav 480 8192 60Hz-kaiser.png
Result: Here at 240pps the florets are 120Hz high.
Confirmed: the height of the interference pattern is proportional to the pixels-per-second rate.
I think these are pretty cool, and I've seen lots of spectrograms with artifacts like these.
Funny and pretty, yes.
I've compared with sox's output for what should be the same parameters.
Sox had arbitrary limits on output size, which I have removed; with those gone, there are no signs of artifacts either in the burst or in the steady-state line. (It's also INCREDIBLY SLOW: where sndfile-spectrogram finishes in 1.5 seconds, sox takes 2m27s, which makes it 98 times slower!)
sox/src/sox 60Hz.wav -n spectrogram -r -x 400 -y 8192 -z 180 -w kaiser -o 60Hz-sox-400-8192-dyn180-kaiser.png
Happy to have you fix this if you can, but unfortunately I'm not able to offer much in the way of time and effort here. I'm currently way too busy with work-related stuff.
When samples are read from the audio file to produce a column of pixels, the time point around which they are read is truncated to an integer, the starting sample number. Since the time between adjacent pixel columns is not necessarily a whole number of samples, this effectively introduces a time jitter of up to one sample period. We could mitigate this by interpolating between samples.
Hypothesis: Time jitter of up to one sample due to starting moment quantization is causing this effect.
Experiment: Interpolate linearly between samples to compensate for the time offset, and see if this improves the output.
Experimental patch: 2f57eb7
Observation: At width 400 the output files are identical, because the fractional error is always 0. At width 413 the output files are not identical, but they look just the same.
sndfile-spectrogram --no-border 60Hz.wav 413 8192 60Hz-kaiser-413-8192.png
sndfile-spectrogram --no-border --precise 60Hz.wav 413 8192 60Hz-kaiser-413-8192-precise.png
Conclusion: The florets are not due to quantization of time to a sample boundary.
I wonder if these florets are due to the side lobes of the window functions. If you look at, say, the Nuttall window (https://en.wikipedia.org/wiki/Window_function#Blackman.E2.80.93Nuttall_window), the largest side lobes are at about -98dB. If you choose the Nuttall window and a dynamic range of 95dB, do these florets disappear?
sndfile-spectrogram --dyn-range=95 --no-border --nuttall 60Hz.wav 413 8192 60Hz-nuttall-413-8192-dyn95.png
Nope. Still there.
Hypothesis: It's caused by the scaling of the Y axis from speclen to height.
Rejected: speclen == height in the 8192-high examples, so no interpolation is done, yet the effect persists.
Got it. The calculation of magnitudes from the real and imaginary parts in FFTW's "halfcomplex" output format was out by one in the index of the imaginary part, so each magnitude mag = sqrt(re * re + im * im) combined a bin's real part with the wrong imaginary part. Here's a detail of the first graph with commit 53de8b7:
Oh, well done! I suppose this can be closed now.