Audio sync issues on long videos #37

Closed
veltman opened this issue Aug 9, 2016 · 7 comments
Comments

veltman commented Aug 9, 2016

A few reports of audio getting gradually out of sync with the waveform in a long video.

It doesn't seem like the number of frames generated is wrong, so either:

- The detected waveform data is getting offset by a little bit (a problem with waveform or the math to split up the samples)
- FFmpeg's frame spacing is getting offset somehow

Things to investigate:

- Audio file rounding
- Setting the framerate on inputs vs. output
- Disabling -shortest
- Waveforming a long alternating tone file and confirming that the frames line up
- Specifying -aframes
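
For the alternating-tone test above, something like this could generate a long file with an audible marker every second (a rough sketch: it assumes ffmpeg is on the PATH and uses Node's child_process; the filename and the lavfi sine options are illustrative and worth double-checking locally):

```ts
import { execFileSync } from "child_process";

// Ten minutes of a 440 Hz sine with a short beep every second, so each second
// boundary should be visible when the file is waveformed and rendered to frames.
// (ffmpeg is assumed to be on the PATH; option names are worth double-checking.)
execFileSync("ffmpeg", [
  "-f", "lavfi",
  "-i", "sine=frequency=440:beep_factor=4:duration=600",
  "-ar", "44100",
  "-y",
  "test-tone.wav",
]);
```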

veltman commented Aug 10, 2016

After testing out some contrived audio + frames with FFmpeg, the sync seems fine even for very long audio, which leads me to believe the problem is with waveform or the math of splitting up the samples. I'm not sure how it picks samples when the number of samples to pull isn't an even factor of the total number of samples, which could explain some of the drift.

Going to try extracting the raw PCM data and manually getting the samples to see whether that fixes the alignment.
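
For a sense of scale, here's a sketch (with made-up numbers, not waveform's actual internals) of how much drift a truncated samples-per-frame value could introduce on its own:

```ts
// Hypothetical numbers: 5 minutes of 44.1 kHz audio rendered at 24 fps,
// so samples-per-frame (1837.5) is not a whole number.
const sampleRate = 44100;
const totalSamples = sampleRate * 300;
const totalFrames = 24 * 300;

// If samples-per-frame is truncated once and reused as a fixed chunk size,
// the dropped half-sample per frame adds up over the whole file.
const chunk = Math.floor(totalSamples / totalFrames);                   // 1837
const driftSeconds = (totalSamples - chunk * totalFrames) / sampleRate; // ~0.08s
console.log(driftSeconds);
```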

veltman commented Aug 10, 2016

Another test confirms the drift is coming from waveform - even with a number of samples that divides evenly, it seems to stretch slightly, so that after a few minutes the lag is 10–20 frames.

veltman commented Aug 10, 2016

Tried this with raw PCM data, skipping waveform entirely. Will take some testing for mono/stereo, etc., but it looks promising. There's still a little bit of drift depending on the rounding and sample rate, but much less. At 20 fps, waveform is drifting by about 1.5s after 5 minutes, whereas this drifts by about 0.1s, and we might be able to reduce that a bit further with smart frame math.
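
One way the "smart frame math" could work (a sketch, not taken from the actual code) is to compute each frame's sample boundaries from the frame index, so rounding never compounds:

```ts
// Derive each frame's sample range from the frame index, so rounding error
// stays under one sample at any boundary and never accumulates.
function frameBounds(i: number, totalSamples: number, totalFrames: number): [number, number] {
  const start = Math.round((i * totalSamples) / totalFrames);
  const end = Math.round(((i + 1) * totalSamples) / totalFrames);
  return [start, end];
}
```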

@markusvoelter

0.1s after a few minutes would certainly be perfectly OK; that's much better than what we have now. I don't think further optimization is required beyond that.

veltman commented Aug 11, 2016

The pcm branch takes a crack at this by removing the waveform library entirely and replacing it with a bespoke PCM data sampler that segments samples piped from FFmpeg. The alignment can still be off by up to 1 frame total, since it doesn't try to reallocate slop evenly across frames, but lag shouldn't accumulate over time.
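
Roughly, the approach looks like the following simplified sketch (not the actual code on the branch; it assumes ffmpeg on the PATH, mono 16-bit output, and a hypothetical getFramePeaks helper, and it truncates samples-per-frame, which is where the up-to-one-frame slop comes from):

```ts
import { spawn } from "child_process";

// Pipe mono 16-bit PCM out of ffmpeg and reduce it to one [min, max] pair per video frame.
function getFramePeaks(audioPath: string, fps: number, cb: (peaks: Array<[number, number]>) => void) {
  const sampleRate = 44100;
  const samplesPerFrame = Math.floor(sampleRate / fps); // truncation = up to one frame of slop at the end

  const ffmpeg = spawn("ffmpeg", [
    "-i", audioPath,
    "-f", "s16le", "-ac", "1", "-ar", String(sampleRate),
    "-",
  ]);

  const chunks: Buffer[] = [];
  ffmpeg.stdout.on("data", (d: Buffer) => chunks.push(d));
  ffmpeg.stdout.on("end", () => {
    const pcm = Buffer.concat(chunks);
    const peaks: Array<[number, number]> = [];
    const bytesPerFrame = samplesPerFrame * 2;

    for (let offset = 0; offset + bytesPerFrame <= pcm.length; offset += bytesPerFrame) {
      let min = 0;
      let max = 0;
      for (let i = 0; i < samplesPerFrame; i++) {
        const v = pcm.readInt16LE(offset + i * 2) / 32768; // scale to -1..1
        if (v < min) min = v;
        if (v > max) max = v;
      }
      peaks.push([min, max]);
    }
    cb(peaks);
  });
}
```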

A few notes:

  • Removing the waveform dependency makes the installation a lot simpler and makes Windows native installation possible (last I checked, libgroove was the holdup). Need to test the updated installation in various environments.
  • The vertical positioning puts the zero baseline in the middle, where it currently is, but now that it's asymmetrical, that leaves some unused space. It might be preferable to always scale it so that the minimum value is at waveBottom and the maximum is at waveTop, even if that means the baseline will vary from video to video.
  • Waveform data is now being pulled as -1 to 1 instead of scaled to positive and mirrored, so the shape is somewhat different. It would be easy enough to rescale to match the old look but that may or may not be desirable.
  • For now, the bricks/equalizer patterns are using only the positive peak for each point, but it may make more sense to take the max of the two absolute values or something.
  • Also need to benchmark how long the new waveforming takes - it's probably still a small fraction compared to the frame rendering, but it's somewhat slower than the old method. It would potentially be possible to segment the samples into frames as they're coming into the stream instead of at the end, which would be a lot more efficient.

veltman commented Aug 23, 2016

Related to the vertical positioning above, we'll likely want to switch to a logarithmic scale to better fill the space and translate back into perceived loudness (which libgroove was handling before).
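
As a rough illustration of what that scaling could look like (the dB floor and the linear dB-to-height mapping below are arbitrary placeholder choices, not a spec):

```ts
// Map a linear peak in (0, 1] to a 0..1 display height on a dB-style scale.
// The -60 dB floor and the straight-line mapping of dB to height are arbitrary.
function logScale(peak: number, floorDb = -60): number {
  const db = 20 * Math.log10(Math.max(Math.abs(peak), 1e-6));
  return Math.min(1, Math.max(0, 1 - db / floorDb));
}
```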

veltman commented Oct 12, 2016

This has been merged to master.

veltman closed this as completed Oct 12, 2016