Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify the "first sound" #4773

Merged
merged 30 commits into from
Dec 13, 2022
Merged

Verify the "first sound" #4773

merged 30 commits into from
Dec 13, 2022

Conversation

daschuer
Copy link
Member

Currently only a warning is printed to mixxx.log

If the first sound has moved, the waveform, beat grid, and all other track annotations are likely not valid anymore. The user should be informed about it to mix the track carefully by ear only.

Currently the only fix is to re-analyze the track and adjust all annotations.
In case everything is moved by an equal offset, Mixxx might be in future able to adjust it without bothering the user, but this is difficult because of edge cases.

The situation can happen if the track file was replaced or has been edited. It also happens when the file is decoded with a different or updated version of the decoder.

This PR is a prerequisite if we want to replace all our sound sources with ffmpeg. With ffmpeg we no longer have control which encoder is used. Offset issues or other encoder issues my happen with any update without a notice.

The next step is to de-duplicate the threshold logic and consider a suitable user feedback.
I am considering a static text overlay on top of the waveform.
We may also open a pop up when this happens the fist time in a Mixxx run, because ALL tracks might be affected if the encoder has changed.

What do you think?

Copy link
Contributor

@uklotzde uklotzde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider this an ugly hack.

A more sustainable solution would re-analyze mixxx::CueType::AudibleSound (only the start position) on every track load and compare the old and new positions. The calculated offset (if it exceeds a predefined minimum threshold) could then be used to auto-adjust all other positions.

The user does not need to be bothered. Only if the offset exceeds a predefined maximum, e.g. 100 ms or more. The auto-adjust should ignore those outliers.

@daschuer
Copy link
Member Author

Let's go in small steps!

As I already wrote, this is only the first step.
This is a very fast check, that comes with almost no CPU and no extra HD overhead.

When this fails, we may lock the track for playing, until the silence detector has been run again and calculate an offset.

I really don't want to run the analyzer on every track load. This will introduce nearly the double of HD action during a gig, for an issue that happens only due rare casionall events like a changed detector or a messed up track due to an external tool.

Is this acceptable?

@Holzhaus
Copy link
Member

Holzhaus commented May 29, 2022

Is the first sound detector really that heavy? I'd believe it would be pretty cheap. Isn't the verification code in this PR basically a copy of the first sound detector?

I think as a first step it's sufficient to print it to the terminal, but I agree with @uklotzde should be that we should strive for a solution that works without user interaction.

@uklotzde
Copy link
Contributor

Is this acceptable?

I don't think that we should start moving in the wrong direction. This code differs substantially from what is desirable.

If we decide to merge it, it should be better isolated into functions with accompanying comments.

@daschuer
Copy link
Member Author

daschuer commented May 29, 2022

@Holzhaus

Is the first sound detector really that heavy? I'd believe it would be pretty cheap. Isn't the verification code in this PR basically a copy of the first sound detector?

The AnalyzerSilence decodes the whole track and compares every single sample with the threshold in the dedicated analyzer thread. Currently the Analyzer is only running when preparing new tracks. It is not started when loading prepared tracks during a gig.

This verification code compares only four Samples to check if the First Sound Position is still in place, that's all. It does when the caching reader reads that particular chunk anyway and only once after track load. So this simple check is really light weight, compared to using AnalyzerSilence.

I agree with @uklotzde should be that we should strive for a solution that works without user interaction.

I also agree to that.

@uklotzde

I don't think that we should start moving in the wrong direction. This code differs substantially from what is desirable.

Any solution that solves the issue needs to be started can be triggered by this light weight check. So I consider this a part of it.

@uklotzde
Copy link
Contributor

In my opinion this code does not belong into CachingReader. Implementing a pseudo-analyzer or a pseudo-verifier for AnalyzerSilence in CachingReader is wrong and only increases the technical debt.

@daschuer
Copy link
Member Author

I am preparing a solution that moves the check into a common function of AnalyzerSilence

@daschuer
Copy link
Member Author

Done

/// -60 dB or -1 to start with the following index.
static SINT findLastSound(const CSAMPLE* pIn, SINT iLen, SINT signalStart);

/// Returns true if the first sound if found at the given frame and logs a warning message if not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not belong to the analyzer. It must not care about beat grids and such.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not about beat grid, it is just a verify function for the data it has formally analyzed. That's it.

But anyway, if you have an idea for improvement, please propose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log message mentions beat grids. This clearly shows why this does not belong to the analyzer. Interpreting the results in a higher-level context does not belong in here.

The whole function could be implemented in terms of the public API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rephrase the warning message to mention only the first sound. But that is not relay helpful, because the message itself is correct.
I can imagine to send an signal form the caching reader worker thread that to the GUI thread. Than we can receive the signal for instance in the player class and write the log there. But this feels over-engineered just because the message wording. I would like to postpone the GUI interaction to a future PR, and continue in small steps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The placement is wrong, whatever the exact wording of the log message is. I just tried to explain how you could discover such software design flaws by applying some basic reasoning.

SINT AnalyzerSilence::findLastSound(const CSAMPLE* pIn, SINT iLen, SINT firstSound) {
DEBUG_ASSERT(firstSound >= -1);
SINT lastSound = firstSound;
for (SINT i = firstSound + 1; i < iLen; ++i) {
Copy link
Contributor

@uklotzde uklotzde Jun 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After splitting the functions findLastSound() could now start searching from the end of the buffer in reverse direction and probably exit early.

Why is firstSound passed as a parameter instead of adjusting pIn and iLen when calling this function? This function should do the same as findFirstSound(), just in the opposite direction. Then both functions do not need to know about each other and the DEBUG_ASSERT becomes obsolete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After splitting the functions findLastSound() could now start searching from the end of the buffer in reverse direction and probably exit early.

I did actually consider this as well, but it did not work, because the analyzer is called in chunks and we cannot go backward to a previous chunk.

Why is firstSound passed as a parameter instead of adjusting pIn and iLen

Without it we need to add it to pIn and substract it from size and and add it later again. It felt more save to not mess around with the pIn pointer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename the functions to findFirstSoundInChunk()/findFirstSoundInChunk() and my arguments will become clearer. The caller is the one decides who decides what it considers a "chunk". A "chunk" should always be a pointer and a length. It keeps the arguments of the functions aligned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also doesn't matter in which order the chunks are processed. Individual chunks are still ideally processed in front-to-back (first) or back-to-front (last) order.

The last sound's position is the maximum of the last sound positions from each chunk. You can even skip processing of whole chunks depending on the min/max position so far.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "chunk" should always be a pointer and a length. It keeps the arguments of the functions aligned.

sidenote: perfect usecase for std::/gsl::span

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With iterators the functions could even be combined into one. I just didn't want to open Pandora's box in this review ;)

}

// This can happen in case of track edits or replacements, changed encoders or encoding issues.
qWarning() << "First sound has been moved! The beatgrid and "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalyzerSilence cannot decide why this might be a problem in a different context. It just knows about some threshold and how to find the first and last sound in a given buffer depending on the threshold. That's it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure, but this is just a static message that gives the user more context in case of hitting such issue.
Once we have a GUI representation we can move writing the log entry elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't belong here. Inline the code in CachingReaderWorker if this is the out most opportunity so far. Or place it in a function in an anonymous namespace there. But not here.

SINT AnalyzerSilence::findLastSound(const CSAMPLE* pIn, SINT iLen, SINT firstSound) {
DEBUG_ASSERT(firstSound >= -1);
SINT lastSound = firstSound;
for (SINT i = firstSound + 1; i < iLen; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename the functions to findFirstSoundInChunk()/findFirstSoundInChunk() and my arguments will become clearer. The caller is the one decides who decides what it considers a "chunk". A "chunk" should always be a pointer and a length. It keeps the arguments of the functions aligned.

SINT AnalyzerSilence::findLastSound(const CSAMPLE* pIn, SINT iLen, SINT firstSound) {
DEBUG_ASSERT(firstSound >= -1);
SINT lastSound = firstSound;
for (SINT i = firstSound + 1; i < iLen; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also doesn't matter in which order the chunks are processed. Individual chunks are still ideally processed in front-to-back (first) or back-to-front (last) order.

The last sound's position is the maximum of the last sound positions from each chunk. You can even skip processing of whole chunks depending on the min/max position so far.

}

// This can happen in case of track edits or replacements, changed encoders or encoding issues.
qWarning() << "First sound has been moved! The beatgrid and "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't belong here. Inline the code in CachingReaderWorker if this is the out most opportunity so far. Or place it in a function in an anonymous namespace there. But not here.

/// -60 dB or -1 to start with the following index.
static SINT findLastSound(const CSAMPLE* pIn, SINT iLen, SINT signalStart);

/// Returns true if the first sound if found at the given frame and logs a warning message if not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The placement is wrong, whatever the exact wording of the log message is. I just tried to explain how you could discover such software design flaws by applying some basic reasoning.

@daschuer
Copy link
Member Author

daschuer commented Jun 5, 2022

Done.

}
}
return i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The results of these functions are now inconsistent. findFirstSoundInChunk returns an out-of-range index when not found and findLastSoundInChunk returns the first index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That's because of going backwards. Is this a problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both return an out of bounds index, by the way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. findLastSoundInChunk returns 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And -1 if the buffer size was 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

static SINT findFirstSoundInChunk(const CSAMPLE* pIn, SINT iLen);

/// returns the index of the last sample in the buffer that is above -60 dB
/// or -1 if no sample is found. signalStart can be set to a known sample above
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is outdated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is correct, the code was wrong. I will amend the last commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter signalStart does not exist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right.

@daschuer
Copy link
Member Author

This is a related bug: https://bugs.launchpad.net/mixxx/+bug/1981726

@daschuer
Copy link
Member Author

Done

@JoergAtGithub
Copy link
Member

Slightly o.t., but I wonder, if the imported data from Serato, Rekordbox or Traktor contain a similar information? Isn't this needed for any import?

@daschuer
Copy link
Member Author

Slightly o.t., but I wonder, if the imported data from Serato, Rekordbox or Traktor contain a similar information? Isn't this needed for any import?

We have offset correction code in place for imports. This would do the wrong thing after the offset of the playback in Mixxx itself changes changes. I am not aware of test data we can use, because any data will be affected by a shift.

In the current state, this one verifies whether the cue points are still good after updating Mixxx or the OS. (We have a pending issue when upgrading to Windows 11). This happens without user awareness. When you import tracks from a third party App, it is an action on demand, where you need to verify the ipmorted data.

Copy link
Member

@Swiftb0y Swiftb0y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also look at any remaining comments of my previous review.

src/analyzer/analyzergain.cpp Outdated Show resolved Hide resolved
src/analyzer/analyzersilence.h Show resolved Hide resolved
src/analyzer/analyzersilence.h Outdated Show resolved Hide resolved
src/analyzer/analyzersilence.cpp Outdated Show resolved Hide resolved
@daschuer
Copy link
Member Author

Ready.

Copy link
Member

@Swiftb0y Swiftb0y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Do you have concrete plans on how to move the verification out of the caching chunk reader?

@daschuer
Copy link
Member Author

My plan is to do it when turning this int s real offset detection with compensation. Than we need more measures to destinguish simple offset from an edited or replaced tack. This can go to a new facility, along with the delayed detection of the real track length, which hangs also between the lines.

The next step for now is to add a visible indicator for outdated analysis data and cue points. I am curious if the Windows 11 bug is able to trigger this check and if ffmpeg is compatible to the win 10 sound source.

@Swiftb0y
Copy link
Member

Ok makes sense. I'll go ahead and merge since the code seems fairly robust to me now (apart from the caching reader code).

@Swiftb0y Swiftb0y merged commit 3aa2318 into mixxxdj:main Dec 13, 2022
@daschuer
Copy link
Member Author

Thank you for the detailed review.

@Swiftb0y
Copy link
Member

Thank you for your patience

@JoergAtGithub
Copy link
Member

The next step for now is to add a visible indicator for outdated analysis data and cue points. I am curious if the Windows 11 bug is able to trigger this check
The message occurs now always on Windows11. Also after re-analysing of the track.

@daschuer daschuer deleted the offset_detect branch January 3, 2023 08:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants